Magic Pipes spec added by alaricsp on Fri Feb 8 11:28:25 2013

** TODO Magic Pipes               :PROJECT:MEDIUM:
   :PROPERTIES:
   :ID:       aa0e602d-0513-4b9f-a80c-d9e480e1341b
   :END:
See http://www.snell-pym.org.uk/archives/2009/06/25/magic-pipes/

Rather than taking expressions and evaluating them with INPUT
etc. bound to a value, perhaps accept an expression that must evaluate
to a procedure. This makes it easier to use existing procedures, lets
people choose their own bound names, and is more like Scheme HOFs.

User expressions are read using the full Chicken reader, but
s-expressions read from standard input are always read with a limited
read-table that disables #, #+, #<#, and any other non-standard read
syntax that might open up security vulnerabilities.

*** Global arguments

These are accepted by any of the tools below:

+ -u   Use the supplied unit.
+ -d   Evaluate the supplied expression.
+ -i   Evaluate the contents of the supplied file.

(-u), (-d) and (-i) arguments are processed in the order supplied,
before evaluating any other user-supplied expressions.

The expressions evaluated by (-d) and (-i) have access to the current
error port, but not input or output.

The intent of these is to set up utility procedures/macros to be used
by other expressions later on.

*** mpfilter ...

Reads s-expressions from stdin, and outputs them to stdout if, when
passed to all the procedures listed on the command line, they all
return true. current-input-port and current-output-port are banned,
but current-error-port is accessible.
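
A minimal sketch of the filtering loop, assuming plain "read" in place of
the restricted read-table described above. The names all-accept? and
filter-port are hypothetical, and the ports are passed explicitly so the
sketch can be tried on string ports:

```scheme
;; Hypothetical helper: #t only if every predicate in PREDS accepts DATUM.
(define (all-accept? preds datum)
  (or (null? preds)
      (and ((car preds) datum)
           (all-accept? (cdr preds) datum))))

;; Read data from IN and write the accepted ones to OUT, one per line.
;; Assumption: plain `read` stands in for the restricted reader.
(define (filter-port preds in out)
  (let loop ((datum (read in)))
    (if (not (eof-object? datum))
        (begin
          (if (all-accept? preds datum)
              (begin (write datum out) (newline out)))
          (loop (read in))))))

;; Usage: keep only numbers greater than 1.
(define filter-demo-out (open-output-string))
(filter-port (list number? (lambda (x) (> x 1)))
             (open-input-string "1 2 foo 3")
             filter-demo-out)
(get-output-string filter-demo-out) ; => "2\n3\n"
```

Because the predicates are tried in order and the test stops at the first
failure, a cheap type check such as number? can guard a later predicate
that would otherwise raise an error.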

*** mpmap 

Reads s-expressions from stdin, applies the procedure on the command
line to them, and then writes the results to
stdout. current-input-port and current-output-port are banned, but
current-error-port is accessible.

If the procedure returns multiple values, they are output as separate
s-expressions; thus "mpmap '(lambda (x) (values x x))'" will duplicate
each s-expression in the input.
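
The multiple-values behaviour can be sketched with call-with-values. This
is only an illustration: map-port is a hypothetical name, and plain "read"
is assumed rather than the restricted reader:

```scheme
;; Apply PROC to each datum read from IN; write every value it returns
;; to OUT as a separate s-expression, one per line.
(define (map-port proc in out)
  (let loop ((datum (read in)))
    (if (not (eof-object? datum))
        (begin
          (call-with-values
              (lambda () (proc datum))
            ;; RESULTS collects however many values PROC returned.
            (lambda results
              (for-each (lambda (r) (write r out) (newline out))
                        results)))
          (loop (read in))))))

;; Usage: duplicate every input s-expression.
(define map-demo-out (open-output-string))
(map-port (lambda (x) (values x x))
          (open-input-string "a (b c)")
          map-demo-out)
(get-output-string map-demo-out) ; => "a\na\n(b c)\n(b c)\n"
```

A procedure that returns zero values, for example via (values), would
simply drop that input, so the same loop also covers filtering.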

*** mpfold [-o ]  []

The first expression evaluates to a two-argument procedure, the second
(the initial accumulator) to any value; #f is the default if none is
specified. (-o) specifies a single-argument output procedure; the
identity function is the default.

Applies the procedure to each s-expression from the input
in turn, with the current accumulator as the second argument. At the
end, outputs the result of applying the output procedure to the final
accumulator.
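
As a rough sketch of that loop (fold-port is a hypothetical name, and
plain "read" is assumed in place of the restricted reader):

```scheme
;; PROC takes (datum accumulator), INIT is the initial accumulator and
;; FINISH is the single-argument output procedure given by (-o).
(define (fold-port proc init finish in out)
  (let loop ((acc init))
    (let ((datum (read in)))
      (if (eof-object? datum)
          (begin (write (finish acc) out) (newline out))
          (loop (proc datum acc))))))

;; Usage: sum the input, using the identity as the output procedure.
(define fold-demo-out (open-output-string))
(fold-port + 0 (lambda (x) x)
           (open-input-string "1 2 3")
           fold-demo-out)
(get-output-string fold-demo-out) ; => "6\n"
```

With cons as the procedure, '() as the initial accumulator and reverse as
the output procedure, the same loop collects the whole input into a
single list.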

current-input-port is banned, but current-error-port is
accessible. current-output-port is usable, for convenience in writing
pipelines that summarise each line of some input then finally write a
"totals" line.

*** mpsort [-c] [-r] [-p ] [ []]

The first expression must produce a two-argument comparison procedure,
and defaults to "smart<" if none is present. The second expression must
produce a single-argument key extraction procedure, which defaults to
the identity.

Reads in all the expressions from the input, sorts them by applying
the comparison procedure to the results of applying the extraction
procedure to the expressions, then returns the result.

If (-c) is specified, then the extraction procedure is assumed to be
expensive, and its result computed and cached at load time.

If (-r) is specified, then the sort order is reversed.

Provide smart< and smart> procedures, which compare things in a
type-agnostic way: < for numbers, string< for strings, recursive
testing for pairs and vectors.
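
One possible shape for smart<, as a sketch only. The spec above does not
say how values of different types compare, so the type-rank ordering used
here (numbers, then strings, then the empty list, pairs, vectors, and
everything else) is purely an assumption:

```scheme
;; Assumed cross-type order; not part of the spec.
(define (type-rank x)
  (cond ((number? x) 0)
        ((string? x) 1)
        ((null? x) 2)
        ((pair? x) 3)
        ((vector? x) 4)
        (else 5)))

(define (smart< a b)
  (cond ((and (number? a) (number? b)) (< a b))   ; assumes real numbers
        ((and (string? a) (string? b)) (string<? a b))
        ((and (pair? a) (pair? b))
         ;; Lexicographic: compare the cars, then recurse into the cdrs.
         (cond ((smart< (car a) (car b)) #t)
               ((smart< (car b) (car a)) #f)
               (else (smart< (cdr a) (cdr b)))))
        ((and (vector? a) (vector? b))
         (smart< (vector->list a) (vector->list b)))
        (else (< (type-rank a) (type-rank b)))))

(define (smart> a b) (smart< b a))

;; Usage
(smart< '(1 2) '(1 3))  ; => #t
(smart< '(1) '(1 2))    ; => #t, a proper prefix sorts first
(smart< 5 "a")          ; => #t under the assumed type ranking
```

Ranking the empty list below pairs is what makes a shorter list sort
before any longer list that extends it.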

As usual, the procedures have no access to current input or output
ports, but can write to the error port.

If (-p) is specified, then rather than sorting in-memory, we instead
start the specified number of threads, each of which reads
s-expressions from a bounded FIFO and sends them to a child mpsort
process. A master thread then reads s-expressions from standard input
and round-robins them to the FIFOs, skipping any FIFOs that are "full"
and blocking if they all are. Each child process also has a reader
thread that reads its sorted output and loads them into another FIFO,
and a final output thread merges the sorted FIFO outputs into a final
sorted output to standard output. #!eof is used as a marker in the
FIFOs to record the actual end of the file, to distinguish EOF from an
empty FIFO due to the source not having produced anything yet.
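
The final merge step could look roughly like the sketch below. It uses
already-sorted lists as stand-ins for the sorted FIFO outputs and leaves
out the threads, blocking and #!eof handling entirely; merge-two and
merge-all are hypothetical names:

```scheme
;; Merge two sorted lists into one sorted list. When elements tie, the
;; one from A is emitted first.
(define (merge-two less? a b)
  (cond ((null? a) b)
        ((null? b) a)
        ((less? (car b) (car a))
         (cons (car b) (merge-two less? a (cdr b))))
        (else
         (cons (car a) (merge-two less? (cdr a) b)))))

;; Merge any number of sorted lists by folding merge-two over them.
(define (merge-all less? lists)
  (if (null? lists)
      '()
      (merge-two less? (car lists) (merge-all less? (cdr lists)))))

;; Usage: three sorted "child outputs" merged into one sorted result.
(merge-all < '((1 4 7) (2 5) (3 6 8))) ; => (1 2 3 4 5 6 7 8)
```

A streaming version would instead repeatedly take the smallest head among
the FIFOs, which is why an empty FIFO has to be distinguishable from one
that has reached its end.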

Is it worth having an option to go multi-machine by running mpsort
from inetd (perhaps in parallel mode to use multiple cores) on remote
machines and parallelising via TCP rather than running a child
process? That would be kind of cool and not too hard.

Or for huge sorts (where there's not enough memory available), we
could have a flag that splits the input into temporary files of up to
a certain size, sorts them individually one by one, then merges the
results together.

*** mplookup [-m] [-f|-F] {lookup |revlookup }

It would be convenient to have a simple command-line tool to handle
look-up tables, mapping one s-expression to one or more other
s-expressions. By default, each output s-expression is a list of
results from the corresponding input s-expression, which is empty if
there is no mapping. If (-f) is specified then only the first result is
returned, not wrapped in a list, and #f is used if there is none. If
(-F) is used then the first result is returned, and #f if there is
none or more than one.
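
The three output shapes can be sketched as a single procedure. Here
shape-results and the mode symbols all, first and unique are hypothetical
stand-ins for the default behaviour, (-f) and (-F) respectively:

```scheme
;; RESULTS is the list of values found for one input s-expression.
(define (shape-results results mode)
  (case mode
    ;; (-f): first result unwrapped, or #f if there is none.
    ((first) (if (null? results) #f (car results)))
    ;; (-F): the result only if there is exactly one, otherwise #f.
    ((unique) (if (and (pair? results) (null? (cdr results)))
                  (car results)
                  #f))
    ;; Default: the whole list, which is empty if there is no mapping.
    (else results)))

;; Usage
(shape-results '(a b) 'all)    ; => (a b)
(shape-results '(a b) 'first)  ; => a
(shape-results '(a b) 'unique) ; => #f
```

Note that with (-f) or (-F) a stored value of #f cannot be told apart
from a missing mapping, which the default list form avoids.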

File type detection is performed on the map file. There is support for
sqlite databases in a special format (ending .mbm; magic binary map),
or plain text files with a sequence of ( . ) pairs (ending
.msm; magic sexpr map) or /etc/aliases format files (default), which
are treated as string->string mappings.

If (-m) is specified, then the map file is not a file name, but the
name of a meta-map from a list: uid<->name, gid<->name, ip->list of
hostnames, hostname->list of ips, hostname->list of arbitrary DNS
records, port<->service, ... But more heavyweight things like a
PostgreSQL/MySQL lookup tool would be best handled by using mpmap with
a suitable interface egg.

**** mplookup-set  [ ]

In the given map file (which, if nonexistent, is created), set expr1
to map to expr2.

If the exprs are omitted, then sexprs are read from standard input,
and must be pairs, the first element of which is treated as expr1 and
the second as expr2, and are all set into the map in order.

**** mplookup-delete  []

Deletes the given mapping from the given map file. If the expression
is omitted, then expressions are read from stdin and removed from the
map file. If the map file does not exist, an error is raised.

**** mplookup-dump 

Spits out the contents of the map file as a sequence of pairs, with
the car being the key and the cdr the value. This can be piped into
mplookup-set to effect map file format conversions.

*** FIXME: mprandom ???

Take random samples of the input - either pick any s-expression with a
given chance, or read all the s-expressions into RAM and pick N at
random.

*** FIXME: mpshuffle ???

Read input s-expressions into a list, shuffle, and output the result.

*** mpflatten

Reads input s-expressions, and if they are lists, writes the elements
of the list as separate s-expressions, otherwise writes them as-is.
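
A sketch of that behaviour, assuming only one level of list structure is
removed and that plain "read" stands in for the restricted reader.
flatten-port is a hypothetical name:

```scheme
;; Lists are written element by element; anything else is written as-is.
(define (flatten-port in out)
  (define (emit x) (write x out) (newline out))
  (let loop ((datum (read in)))
    (if (not (eof-object? datum))
        (begin
          (if (list? datum)
              (for-each emit datum)
              (emit datum))
          (loop (read in))))))

;; Usage: nested lists survive, since only the outer level is flattened.
(define flatten-demo-out (open-output-string))
(flatten-port (open-input-string "(1 2) foo (a (b c))")
              flatten-demo-out)
(get-output-string flatten-demo-out) ; => "1\n2\nfoo\na\n(b c)\n"
```

One consequence of this reading is that an empty list in the input
produces no output at all.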

*** mpgroup [-a] [-t] [-f|-l] 

The expression must be a single-argument procedure. It is applied to
each input s-expression to obtain a "key" for each input s-expression.

As usual, the procedure has no access to current input or output
ports, but can write to the error port.

If (-a) is specified, then the s-expressions are accumulated in memory
by their keys, into a hash table. If (-f) is specified, then only the
first s-expression for each key is kept; if (-l) is specified, then
only the last is kept. At the end, the hash table is written out; if
(-t) is specified, it is written as one list per key, the first
element being the key value and the rest being the s-expressions with
that key. If (-t) is not specified, then it is just one list per key,
but without the key as the first element. The order in which keys are
listed is undefined, but if neither (-f) nor (-l) is specified, the
s-expressions within a key are in the order they were read.

If (-a) is not specified, then the s-expressions are not accumulated
and spat out in a single batch; instead, they are output in the same
order that they were read in, but grouped into lists of s-expressions
having the same key in a contiguous run. If (-t) is specified, the key
value is prepended to the list. If (-f) is specified, then only the
first s-expression in each run of the same key value is listed (and if
(-t) is not specified, then it is output as-is rather than as a
single-element list). Likewise, if (-l) is specified, then only the
last s-expression in each run of the same key value is listed,
and unless (-t) is specified, it's written as-is without a
single-element list enclosing it.
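
The run-grouping mode (without -a, -f or -l) might be sketched as follows,
working on a list rather than a port for brevity. group-runs is a
hypothetical name, the with-key? argument plays the role of (-t), and
comparing keys with equal? is an assumption:

```scheme
;; Split ITEMS into runs of consecutive elements with equal? keys.
;; Each run becomes a list; if WITH-KEY? is true the key is prepended.
(define (group-runs key-of items with-key?)
  (define (finish key run)
    (let ((members (reverse run)))
      (if with-key? (cons key members) members)))
  (if (null? items)
      '()
      (let loop ((rest (cdr items))
                 (key (key-of (car items)))
                 (run (list (car items)))
                 (done '()))
        (if (null? rest)
            (reverse (cons (finish key run) done))
            (let ((k (key-of (car rest))))
              (if (equal? k key)
                  ;; Same key: extend the current run.
                  (loop (cdr rest) key (cons (car rest) run) done)
                  ;; New key: close the current run and start another.
                  (loop (cdr rest) k (list (car rest))
                        (cons (finish key run) done))))))))

;; Usage
(group-runs even? '(1 3 2 4 5) #f) ; => ((1 3) (2 4) (5))
(group-runs even? '(1 3 2 4 5) #t) ; => ((#f 1 3) (#t 2 4) (#f 5))
```

Unlike the (-a) mode, the same key can appear in several output lists if
its s-expressions are not adjacent in the input.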

*** mpforeach ...

Run the supplied Scheme procedure(s) on each s-expression from the
input. Ignore anything returned, and the Scheme procedure can access
stdout/stderr if required, but has no access to stdin.

*** mpparse [|-p ] [-o ]

The argument, if present, must be a valid SRE; or, if (-p) is used, a
POSIX regexp. If not present, it defaults to "(seq bos (* any) eos)".

Reads in lines of text from stdin and converts them to s-expressions
by applying the regular expression. Lines that do not match the regexp
are ignored.

If (-o) is specified, then the expression must be a single-argument
procedure which is applied to each irregex match object to generate
the output s-expression. If not, then a default is used which has the
following behaviour:

If the regexp has no captures, then the entire matching string is
returned.

If it has only numbered submatches, then a list of the submatches is
returned.

If it has named (and maybe also numbered) submatches, then an alist of
them is returned, with names used where available and numbers where not.

*** mpprintf [-n] ...

Calls "printf" on each input sexpr, with the arguments (concatenated
with spaces) as the format string. Appends a newline unless (-n) is
specified.

*** mpls [-r] [-x|-l|-a|-o ] [-f ]... []...

Write an "ls"-equivalent tool that outputs s-expressions, with a choice
of formats (see list below) or (-o) an arbitrary function to be
applied to each filename (with access to all the posix unit functions,
such as file-stat and friends) to generate the output, and optional
filter expression(s) (-f) which are ANDed together. By default, the
filter accepts all files, and the output is just the filenames as
strings.

Give it the option (-r) to recurse, in which case the filenames passed to
the function are multi-stage relative paths.

Takes an optional list of files on the command line to just list
those, a la "ls".

Standard formats:

+ -x - a pair with the filename as the first element and the result of
  file-stat (a vector) as the second

+ -l - a list with the filename as the first element, a single-letter
  type code as the second (d=directory, r=regular, etc.), mode, uid,
  gid, size, mtime, and for symlinks, the link target as an extra
  element.

+ -a - an alist, with all the data from -x, but as a nicely accessible
  alist.

Add some utility functions to provide advanced "find" functionality,
such as (older (file-creation-time f) (days 5)).

In the expressions, current-input-port and current-output-port are
banned, but current-error-port is accessible.

mpls -r -f 'regular-file?' -f '(lambda (file) (older (file-creation-time file) (days 5)))' |
  mpforeach 'delete-file'
