Wednesday, May 24, 2006

Arc Suggestions


Paul Graham posted the collected suggestions for (his) "new Lisp" here. Some nice points: concurrency continuations, OO, macros... But I liked:

*** Dan Milstein: Concurrency

One problem which, IMHO, no popular language has come even
close to solving well is allowing a programmer to write multithreaded
code. This is particularly important in server-side programming, one of
Arc's major targets. I've written a good deal of multithreaded Java, and
the threading model is deeply, deeply wrong. As a programmer, there's
almost no one way to write the kind of abstractions which let you forget
about the details. You're always sitting there, trying to work through
complicated scenarios in your head, visualizing the run-time structure of
your program.

I didn't see another way until I read John H. Reppy's "Concurrent
Programming in ML". Instead of building his concurrency constructs around
monitored access to shared memory, he builds them around a message passing
model (both synchronous and asynchronous). What's more, he provides
powerful means of capturing a concurrent pattern in an abstraction which
hides the details.

I highly recommend giving that book a read. Here's an example of some of
what you get (not the abstraction, actually, just the basic power of
message-passing over shared memory). The abstraction facilities are
complex enough that, like Lisp macros, a small example doesn't really
capture their power. I'm in no way familiar with concurrent extensions to
Lisp, so I'm not able to provide the code for how much harder it would be
in CL or Scheme. Assuming they were augmented with a shared memory model
(as Java is), which forces the programmer to deal with synchrnoized access
to memory, I can only imagine it would be significantly more complex.

A producer/consumer buffer. You want a buffer with a finite number of
cells. If a producer tries to add an element to a full buffer, it should
block until a consumer removes an element. If a consumer tries to remove
an element from an empty buffer, it should block until a producer adds
something.

In Concurrent ML:

datatype 'a buffer = BUF of {
insCh : 'a chan,
remCh : 'a chan
}

fun buffer () = let
val insCh = channel() and remCh = channel()

fun loop [] = loop [recv inCh]
| loop buf = if (length buf > maxlen)
then (send (remCh, hd buf); loop (tl, buf))
else (select remCh!(hd buf) => loop (tl buf)
or insCh?x => loop (buf @ [x]))

in
spawn loop;
BUF{
insCh = insCh,
remCh = remCh
}
end

fun insert (BUF{insCh, ...}, v) = send (insCh, v)

fun remove (BUF{remCh, ...}) = recv remCh

---------------

Translated into a Lisp-ish syntax (very easy to do from ML), this would
look something like:

(defstruct buffer
ins-ch
rem-ch)

(defun create-buffer ()
(let ((ins-ch (make-channel))
(rem-ch (make-channel)))
(labels ((loop (buf)
(cond ((null buf)
(loop (list (recv ins-ch))))
((> (length buf) maxlen)
(send rem-ch (car buf))
(loop (cdr buf)))
(t
(select
(rem-ch ! (car buf) => (loop (cdr buf)))
(ins-ch ? x => (loop (append buf (list x)))))))))
(spawn #'loop)
(make-buffer :ins-ch ins-ch :rem-ch rem-ch))))

(defun insert (b v)
(send (buffer-ins-ch b) v))

(defun remove (b)
(recv (buffer-rem-ch b)))


Key things to notice:

1) Language features:

Communication between threads *only* occurs over channel objects, which can
be thought of as one-element queues. In CML, channels are typed, but in
Lisp they probably wouldn't be.

send/recv: synchronous (blocking) communication over the channel. A thread
can attempt to send an object over the channel, and will then block until
another thread does a recv on the channel.

Creating a new thread is done via 'spawn', which takes a function as its
argument. (I can't remember what the signature of that function is
supposed to be -- clearly, in this case it can't be a function of no
arguments, but imagine it to be something like that).

Selective communication: the call to 'select' is one of the very powerful
features. It is like a concurrent conditional -- it simultaneously blocks
on a list of send/recv calls, and executes the associated code with
whichever call returns first (and then drops the rest of the calls). The !
syntax means an attempt to send, the ? means an attempt to receieve. In
both cases, the '=>' connects the associated code to execute. (I haven't
really come up with a Lispy translation of that syntax).

I don't think select could be efficiently implemented without language
support. It requires a sort of 'partial blocking', which is tricky to
implement on top of normal blocking.


2) The Idiom

The buffer is implemented as a separate thread which has connections to two
channels and an internal list to keep track of the elements of the buffer.
This thread runs through a loop forever, taking the current state of the
buffer as its argument, and waiting on the channels in the body of its
code. It is tail-recursive.

Note the absolute lack of any code to deal with synchronization or locking.
You might notice the inefficient list mechanism (that append is going to
get costly in terms of new cons cells), and think that this is only safe
code because of the inefficient functional programming style. In fact,
that's not true! The 'loop' function could desctructively modify a list
(or array) to which only it had access. There would still be no potential
for sync'ing problems, since only that one thread has access to the
internal state of the buffer, and it automatically syncs on the sends and
receieves. It's only handling one request at a time, automatically, so it
can do whatever it wants during that time. It could even be safely
rewritten as a do loop.

What I find so enormously powerful and cool about this is that the
programmer doesn't need to worry about the run-time behavior of the system
at all. At all. The lexical structure of the system captures the run-time
behavior -- if there is mutating code inside of the 'loop' function, you
don't have to look at every other function in the file to see if it is
maybe modifying that same structure. This is akin to the power of lexical
scoping over global scoping. I have never seen concurrent code which lets
me ignore so much.

This really just scratches the surface of Concurrent ML (and doesn't touch
on the higher-level means of abstraction). But I hope it gives a sense of
how worthwhile a language it is to learn from.


3) Issues

I think that channels themselves would be fairly easy to implement on top
of the usual operating system threading constructs (without needing a
thread for each one). However, the style which this message-passing model
promotes can easily lead to a *lot* of threads -- if you have a lot of
buffers, and each of them has its own thread, things can get out of hand
quickly. I believe that Mr. Reppy has explored these very issues in his
implementation of CML.

http://cm.bell-labs.com/cm/cs/who/jhr/sml/cml/

Insofar as I have time (which, realistically, I don't) I would love nothing
more than to play around with implementing CML'ish concurrency constructs
in a new version of Lisp like Arc.



No comments: