Magic Pipes spec added by alaricsp on Fri Feb 8 11:28:25 2013
** TODO Magic Pipes :PROJECT:MEDIUM:
:PROPERTIES:
:ID: aa0e602d-0513-4b9f-a80c-d9e480e1341b
:END:

See http://www.snell-pym.org.uk/archives/2009/06/25/magic-pipes/

Rather than taking expressions and evaluating them with INPUT etc. bound to a value, perhaps accept an expression that must evaluate to a procedure. This makes it easier to use existing procedures, lets people choose their own bound names, and is more like Scheme HOFs.

User expressions are read using the full Chicken reader, but s-expressions read from standard input are always read with a limited read-table that disables #, #+, #<#, and any other non-standard read syntax that might open up security vulnerabilities.

*** Global arguments

These are accepted by any of the tools below:

+ -u Use the supplied unit.
+ -d Evaluate the supplied expression.
+ -i Evaluate the contents of the supplied file.

(-u), (-d) and (-i) arguments are processed in the order supplied, before evaluating any other user-supplied expressions. The expressions evaluated by (-d) and (-i) have access to the current error port, but not input or output. The intent of these is to set up utility procedures/macros to be used by other expressions later on.

*** mpfilter ...

Reads s-expressions from stdin, and outputs each one to stdout if every procedure listed on the command line returns true when applied to it.

current-input-port and current-output-port are banned, but current-error-port is accessible.

*** mpmap

Reads s-expressions from stdin, applies the procedure on the command line to each of them, and writes the results to stdout.

current-input-port and current-output-port are banned, but current-error-port is accessible.

If the procedure returns multiple values, they are output as separate s-expressions; thus "mpmap '(lambda (x) (values x x))'" will duplicate each s-expression in the input.
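As a rough illustration of the mpfilter and mpmap behaviour described above, here is a minimal sketch that works on in-memory string ports. The names mp-map, mp-filter and run-on-string are hypothetical and exist only for this example. The sketch uses the plain reader rather than the limited read-table, and it does not try to ban the current input and output ports.

```scheme
;; Sketch only: models the mpmap and mpfilter semantics on in-memory ports.
;; mp-map, mp-filter and run-on-string are hypothetical names.

;; Read each datum from IN, apply PROC, and write every value it returns
;; to OUT as its own s-expression, one per line.
(define (mp-map proc in out)
  (let loop ((datum (read in)))
    (if (not (eof-object? datum))
        (begin
          (call-with-values
              (lambda () (proc datum))
            (lambda results
              (for-each (lambda (r) (write r out) (newline out)) results)))
          (loop (read in))))))

;; Write a datum from IN to OUT only if every predicate in PREDS
;; returns true for it. Predicates are tried in order and stop at
;; the first false result.
(define (mp-filter preds in out)
  (define (all-true? datum)
    (let check ((ps preds))
      (or (null? ps)
          (and ((car ps) datum) (check (cdr ps))))))
  (let loop ((datum (read in)))
    (if (not (eof-object? datum))
        (begin
          (if (all-true? datum)
              (begin (write datum out) (newline out)))
          (loop (read in))))))

;; Helper for trying the sketch: run TOOL over the text INPUT and
;; return everything it wrote as a string.
(define (run-on-string tool input)
  (let ((in (open-input-string input))
        (out (open-output-string)))
    (tool in out)
    (get-output-string out)))
```

With these definitions, (run-on-string (lambda (in out) (mp-map (lambda (x) (values x x)) in out)) "1 (a b)") yields each input s-expression twice, matching the duplication example in the text.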
*** mpfold [-o ] [ ]

The first expression evaluates to a two-argument procedure, the second (the initial accumulator) to any value; #f is the default if none is specified. (-o) specifies a single-argument output procedure; the identity function is the default.

Applies the procedure to each s-expression from the input in turn, with the current accumulator as the second argument. At the end, outputs the result of applying the output procedure to the final accumulator.

current-input-port is banned, but current-error-port is accessible. current-output-port is usable, for convenience in writing pipelines that summarise each line of some input then finally write a "totals" line.

*** mpsort [-c] [-r] [-p ] [ [ ]]

The first expression must produce a two-argument comparison procedure, and defaults to "smart<" if none is present. The second expression must produce a single-argument key extraction procedure, which defaults to the identity.

Reads in all the expressions from the input, sorts them by applying the comparison procedure to the results of applying the extraction procedure to the expressions, then outputs the result.

If (-c) is specified, then the extraction procedure is assumed to be expensive, and its result is computed and cached at load time.

If (-r) is specified, then the sort order is reversed.

Provide smart< and smart> procedures, which compare things in a type-agnostic way: < for numbers, string< for strings, recursive testing for pairs and vectors.

As usual, the procedures have no access to current input or output ports, but can write to the error port.

If (-p) is specified, then rather than sorting in-memory, we instead start the specified number of threads, each of which reads s-expressions from a bounded FIFO and sends them to a child mpsort process. A master thread then reads s-expressions from standard input and round-robins them to the FIFOs, skipping any FIFOs that are "full" and blocking if they all are.
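The master thread's choice of FIFO, as described above, might look something like the following sketch. It models only the selection policy, with no threads or real queues. The names next-fifo and dispatch-all, and the idea of tracking each FIFO's length in a vector, are assumptions made for the example rather than part of the spec.

```scheme
;; Sketch only: the round-robin FIFO selection policy, without threads.
;; next-fifo and dispatch-all are hypothetical names.

;; FILLS is a vector holding the current length of each FIFO, CAPACITY
;; the bound shared by all of them, and START the index to try first.
;; Returns the index of the first FIFO at or after START (wrapping
;; around) that still has room, or #f when every FIFO is full.
(define (next-fifo fills capacity start)
  (let ((n (vector-length fills)))
    (let loop ((tried 0))
      (if (= tried n)
          #f
          (let ((i (modulo (+ start tried) n)))
            (if (< (vector-ref fills i) capacity)
                i
                (loop (+ tried 1))))))))

;; Dispatch ITEMS without ever draining the FIFOs, to show the policy
;; in isolation. Returns the list of chosen indices; #f marks each
;; point where a real master thread would have to block.
(define (dispatch-all items fifo-count capacity)
  (let ((fills (make-vector fifo-count 0)))
    (let loop ((items items) (start 0) (chosen '()))
      (if (null? items)
          (reverse chosen)
          (let ((i (next-fifo fills capacity start)))
            (if i
                (begin
                  (vector-set! fills i (+ (vector-ref fills i) 1))
                  (loop (cdr items)
                        (modulo (+ i 1) fifo-count)
                        (cons i chosen)))
                (loop (cdr items) start (cons #f chosen))))))))
```

For example, (dispatch-all '(a b c d e) 2 2) returns (0 1 0 1 #f): the first four items alternate between the two FIFOs, and the fifth finds both full. In a real implementation the child-feeding threads would drain the FIFOs concurrently, so the master would wait at that point instead of recording #f.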
Each child process also has a reader thread that reads its sorted output and loads it into another FIFO, and a final output thread merges the sorted FIFO outputs into a final sorted output to standard output. #!eof is used as a marker in the FIFOs to record the actual end of the file, to distinguish EOF from an empty FIFO due to the source not having produced anything yet.

Is it worth having an option to go multi-machine by running mpsort from inetd (perhaps in parallel mode to use multiple cores) on remote machines and parallelising via TCP rather than running a child process? That would be kind of cool and not too hard.

Or for huge sorts (where there's not enough memory available), we could have a flag that splits the input into temporary files of up to a certain size, sorts them individually one by one, then merges the results together.

*** mplookup [-m] [-f|-F] {lookup