These are the things we discussed, in no particular order yet.
Threads. We know how to support concurrency with lightweight threads which are started up typically when output actions are executed.
Demands You can imagine a GUI system where every screen component has a thread demanding a separate output stream from the program. It is only separate independent outputs which produce multiple demands and thus real externally-visible concurrency.
Speculation A par construct or equivalent can be used to produce speculative threads which add non-essential concurrency which does not affect logical behaviour.
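GHC already supplies a `par` combinator of this kind (in GHC.Conc); a minimal sketch, assuming the GHC runtime. The spark is purely advisory, so the logical result is identical whether or not the speculative thread ever runs:

```haskell
-- A speculative thread via GHC's `par`: `spec` may be sparked in
-- parallel, but the printed result is the same either way.
import GHC.Conc (par, pseq)

main :: IO ()
main = do
  let spec = sum [1 .. 1000000 :: Int]   -- speculative (non-essential) work
      need = product [1 .. 10 :: Int]    -- essential work
  -- spark spec, force need first; logical behaviour is unchanged
  print (spec `par` (need `pseq` (need + spec)))
```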
Heaps. We know how to support distribution, in principle, using several heaps with in-tables and out-tables supporting far pointers. The collection of heaps is completely equivalent to a single giant heap, providing consistency. The way in which the program is spread over the heaps, and the way in which nodes migrate, is controlled by programmer annotations and/or run time resource/efficiency/convenience decisions.
Histories Nodes have birth and death life-histories in the form of cooperation with the garbage collector. This allows, eg, a far pointer which dies to cause the decrementing of the reference count at its target. It allows file descriptors which die to cause the files to be closed. It allows a general cache mechanism in which a node which is referenced by a cache but nothing else is deleted, whereas a node which is still in use elsewhere is retained. Eg a CAF's value is retained while anyone is using it, but the value is GC'ed and recomputed when no-one refers to it any more.
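GHC's weak pointers already provide one concrete form of such death-histories: a finalizer attached to a node runs when the node dies. A minimal sketch using System.Mem.Weak (the finalizer message and the refcount comment are illustrative assumptions, not part of the design above):

```haskell
import Data.IORef (newIORef, readIORef)
import System.Mem.Weak (mkWeakPtr, deRefWeak)

main :: IO ()
main = do
  ref <- newIORef (42 :: Int)
  -- attach a death-history action: it runs when `ref` is collected,
  -- eg to decrement a remote reference count or close a file
  w <- deRefWeak =<< return =<< pure =<< mkWeakPtr ref
         (Just (putStrLn "target died: decrement remote refcount"))
  -- while ref is still live, the weak pointer dereferences successfully
  case w of
    Just ref' -> readIORef ref' >>= print      -- normally prints 42
    Nothing   -> putStrLn "already collected"
```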
Policy. There are questions to do with whether evaluation of a node happens before or after delivery to a new heap, whether nodes are given away or copied, how many nodes are sent at once in response to a request, and how wrappers are used, either semantics-free or strictness-increasing, to improve efficiency. We have mechanisms for all this, but need policy decisions to go with them.
Modularity. We have a more modular approach to state (see paper with Eleni) than usual. In the usual (Glasgow) approach, there is a single linear action sequence which deals with all I/O and all C procedure calls to external services. Internal (ie self-contained) stateful services such as arrays are always dealt with via references; there is an anonymous empty state representing "the heap", and any number of stateful items in it accessed via references. In our approach, an arbitrary procedural library or service, whether external or internal, can be given a functional interface. The features of our system are:
Encapsulation The new and open facilities have a conventional continuation-like form rather than a monadic ">>="-like ("GrGrEq") form, in order to give new and open types which encapsulate the new reference using Glasgow's second-order polymorphism trick. It is possible to adapt the do notation so that the actions inside the second argument form the remainder of the main sequence, ie the references remain in scope to the end of the do.
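The second-order polymorphism trick referred to is the one behind GHC's runST, whose rank-2 type stops a new reference escaping its state thread. A small illustration (sumST is our own example name):

```haskell
-- runST :: (forall s. ST s a) -> a
-- The rank-2 `forall s` means the STRef created by `new` cannot
-- leak out of the encapsulated state thread.
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef, readSTRef)

sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0                  -- new encapsulated reference
  mapM_ (modifySTRef acc . (+)) xs   -- in scope to the end of the do
  readSTRef acc

main :: IO ()
main = print (sumST [1, 2, 3, 4])   -- prints 10
```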
Scope With open, an action in ioacts inside the scope of the handle which attempts to re-open the same file is subject to a suitable locking policy (eg multiple readers or a single writer). An action after the open, when the handle is no longer in scope, acts on the final state, causing the (lazy) action sequence on the file substate (defined by the acts in ioacts) to complete. For example, a lazy readfile followed by opening the same file for writing causes the entire file to be read into memory (or a copy of the old file contents made).
Threads Also, with an open representing output, a new thread would normally be established as part of the open call, to drive the output actions independently of the main I/O thread. This is where genuine concurrency comes from.
Prototyping. Apart from true concurrency, there is only one thing preventing us from producing a complete standard Haskell prototype of these facilities. That is the problem that the state type changes as substates are added; Haskell's type system can't cope. Nevertheless, we have a reasonable prototype which shows that the substate and state-splitting ideas are semantically sound.
Modularity One of the nice things about the approach is that we can, any time we want, add a new module which introduces a new state type. An internal state type such as arrays will allow run and new, whereas an external service will have open-like specialised services forcing it to be (eg) a substate of the main I/O state.
Algorithms The ability to create an internal substate as a substate of anything we like means that we can support such things as multi-array algorithms easily. Just create several subarrays as substates of a main state, all accessible via references.
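A small ST-monad illustration of the multi-array idea, with two arrays living as independent substates of one state, each accessed through its own reference (swapHeads is an example name, not part of the design):

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STArray, newListArray, readArray, writeArray, getElems)

-- swap the first elements of two independent array substates
swapHeads :: ([Int], [Int])
swapHeads = runST $ do
  a <- newListArray (0, 2) [1, 2, 3] :: ST s (STArray s Int Int)
  b <- newListArray (0, 2) [7, 8, 9] :: ST s (STArray s Int Int)
  x <- readArray a 0
  y <- readArray b 0
  writeArray a 0 y
  writeArray b 0 x
  (,) <$> getElems a <*> getElems b

main :: IO ()
main = print swapHeads   -- ([7,2,3],[1,8,9])
```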
Streams. It is important for state-based facilities to be as flexible as possible. We have mechanisms for flipping between a monadic view of a state, and a stream-processing view. To convert a stream function ys
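One simple way of flipping between the two views can be sketched with a step function: a step-at-a-time (monadic-style) processor on one hand, a whole-stream function on the other (SP, runSP and fromState are illustrative names, not the mechanism alluded to above):

```haskell
-- a stream processor as a step function (one input in, one output out)
newtype SP i o = SP { step :: i -> (o, SP i o) }

-- stream view from the step view
runSP :: SP i o -> [i] -> [o]
runSP _  []       = []
runSP sp (x : xs) = let (y, sp') = step sp x in y : runSP sp' xs

-- step view from a state-transition function
fromState :: (s -> i -> (o, s)) -> s -> SP i o
fromState f s = SP (\i -> let (o, s') = f s i in (o, fromState f s'))

-- running totals, usable in either view
totals :: SP Int Int
totals = fromState (\s i -> (s + i, s + i)) 0

main :: IO ()
main = print (runSP totals [1, 2, 3])   -- [1,3,6]
```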
This has to do with files, and with long-running programs, and with programs providing operating system services such as file and process managers, and with programs which cooperate with procedural shells/programs/services, and with programs which carry out compilations and then incorporate the resulting code into themselves.
Identifiers Files and anything else persistent and/or globally accessible should have unique universal identifiers. More importantly, different versions should have different identities, and old versions should be kept until not referenced any more. Check the Windows registry mechanism for version identities.
Undo. Keeping old versions allows us to have a clean way of killing out-of-control programs in a functional shell (file & process manager). The way to kill a program is to shut down all of its outputs, so that nothing is demanding anything from the program. There are then no threads executing the program, and no references to its resources, so the GC can collect its space. Shutting down the outputs means closing the output windows, and "deleting" the files it is producing output to. Since the goal is to leave the system in exactly the state it was in before running the program, this amounts to undoing the command which started the program up, and restoring the output files to their former contents. Husseyin is looking at undo; one problem is undoing old commands from the history list, without affecting subsequent commands.
Procedural programs. The question is, how to allow procedural programs to run in a clean functional shell environment. The answer is to run them in a protected mode. An old idea we had was to run a program in a current directory which is locked (actually a double layer is needed), and also to lock one's home directory below which the program has write access. In this setting, the program cannot use relative paths to go up out of the locked subdirectory, and can only use absolute paths to access read-only global constant files, eg its own installation or customisation files. Neil's newer idea for achieving something similar is to use chroot to set a program's root directory to a custom-built NFS filing system which we use to shadow the real one, providing whatever access policies we want to impose.
Security. What should a program be allowed to do? Problems with viruses etc stem largely from the fact that procedural programs are currently allowed to do anything they like. Java recognises this and does not allow applets (which are clearly foreign untrustworthy code) to do anything persistent, ie they cannot access your local files. This is crippling. Java does allow applications to do anything they like; this is too lenient because they too may be downloaded over the net and so be foreign untrustworthy code.
The solution appears to be to restrict access, probably for any program, to interactively specified resources (eg files given to or taken from the program), or to authenticated sets of resources (as with signed applets). There is a link with the idea of non-discretionary access control. (Trojan horses are harder; they require a Trusted Code Base on your local machine accessed eg via CTRL-ALT-DEL.)
Implementation of this is not too hard. The easiest implementation strategy is to run programs as functions, ie to specify at the start what resources are going in and coming out. For longer-running programs with dynamic (interactive) resource requirements, loading a new file (say) must consist of a pair of commands; one to the shell to send the file (which establishes the "time"), and one to the program to receive it. This fits neatly with the drag-and-drop paradigm, where a single gesture naturally involves both file manager and program.
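A toy model of the programs-as-functions strategy (all types and names here are hypothetical): the shell filters the file store down to the interactively specified resources before the program ever sees it, so the program simply cannot touch anything else.

```haskell
import Data.Char (toUpper)

type FileName = String
type Contents = String

-- a program is a pure function from the resources it is handed
-- to the resources it produces
type Program = [(FileName, Contents)] -> [(FileName, Contents)]

-- the shell grants access only to the interactively specified files
runRestricted :: [FileName] -> Program -> [(FileName, Contents)] -> [(FileName, Contents)]
runRestricted allowed prog store =
  prog [f | f@(name, _) <- store, name `elem` allowed]

-- example program: upcases every file it is given
upcase :: Program
upcase = map (\(n, c) -> (n, map toUpper c))

main :: IO ()
main = print (runRestricted ["a.txt"] upcase [("a.txt", "hi"), ("secret", "x")])
-- [("a.txt","HI")]  -- the unlisted file is invisible to the program
```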
GC Given that we have to keep old versions of things in case they are still referenced, how do we garbage collect? Locally, logging out or committing may remove references to old versions. Remotely, we have to keep track of references. One way is to insist that (in effect) each directory is managed by a long running server. Its heap contains nodes representing the persistent files it is managing, and remote heaps have far references to these nodes. Then GC can happen in "the normal way".
Heaps. This is to do with very long running programs rather than very long lasting data. If a group of processes in separate heaps are equivalent to a single concurrent program in a single heap, several questions arise, to which we probably don't have a clear answer at the moment. Do all the linked users and programs in existence form a single program? How does a new user/process/heap join the group? How does a process/heap shut down and leave the group? What if one process/heap goes wrong? The last question needs to be broken down some more.
Exceptions. Problems come in several levels. First, exceptions are semantically explainable, programmed, but rare events such as division by zero or taking the head of an empty list.
Errors. Second, errors are I/O problems which are more or less predictable, and although non-deterministic, can be blamed on the outside world (operating system).
Failures. Third, failures are unpredictable and potentially fatal problems such as running out of memory, network connections going down, or hardware crashes. These tend to "come out of nowhere", but we need to find ways to minimise their impact.
Bottom. Exceptions can be handled using something like Haskell's error function. This produces values which are semantically explainable as bottom, and which have an operational theory in terms of propagating upwards to the topmost level without being "caught". This allows reasoning about correctness of programs; it is possible to produce exception-free programs.
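For instance, Haskell's error does produce such a bottom, and its propagation to the topmost level can be observed there (safeishHead is an example name; the catching uses the standard Control.Exception interface):

```haskell
-- `error` yields a value that is semantically bottom; it propagates
-- upwards until forced, here caught only at the top level.
import Control.Exception (try, evaluate, ErrorCall)

safeishHead :: [a] -> a
safeishHead []      = error "head of empty list"
safeishHead (x : _) = x

main :: IO ()
main = do
  r <- try (evaluate (safeishHead ([] :: [Int]))) :: IO (Either ErrorCall Int)
  case r of
    Left e  -> putStrLn ("propagated to top: " ++ show e)
    Right v -> print v
```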
Threads. Exceptions in a concurrent setting have to be handled carefully to retain correctness reasoning. An exception propagates upwards until it reaches the source of demand, which is an output to a file or window, say. There must be some way of flagging such error outputs, without preventing other threads from continuing unaffected. In the case of speculative threads, once the exception has propagated to the top, the result of the evaluation is left as an error-node which may be activated later through some other demand, or may never be activated.
I/O. The Haskell I/O system provides a way of handling errors. An I/O operation which doesn't work raises an error, and that error propagates until explicitly caught (rather like Java's exception handling). However, this has a perfectly clean explanation in terms of "Either t1 t2" types which are hidden by an abstraction, the monadic one. There is no bottom involved.
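The "Either t1 t2" explanation can be made concrete in a few lines (Err, raiseErr and catchErr are illustrative names, not the Haskell library's own): errors are ordinary Left values that propagate through bind until explicitly caught, with no bottom involved.

```haskell
-- a bottom-free model of I/O errors as Either values
type Err a = Either String a

raiseErr :: String -> Err a
raiseErr = Left

-- errors propagate through >>= until a handler intercepts them
catchErr :: Err a -> (String -> Err a) -> Err a
catchErr (Left e) handler = handler e
catchErr ok       _       = ok

readConfig :: Bool -> Err Int
readConfig present =
  if present then Right 42 else raiseErr "file not found"

main :: IO ()
main = do
  print (readConfig True >>= \n -> Right (n + 1))      -- Right 43
  print (readConfig False `catchErr` \_ -> Right 0)    -- Right 0
```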
Timeout. A timeout on, say, a remote fetch of a node needed for execution counts as a failure. It is interesting to look at the possible options:
What issues are raised by trying to apply the previous stuff to the delivery of continuous media? An explicit representation of demand can be used to request services and smaller blocks of data within services. Higher order ideas can be used to instantiate general services into specific delivery services, by-passing the original setup overheads to produce something very slimline and efficient.
Wantons. A wanton is a request for a stream or piece of data. Current experiments just ask for the data; later variants might specify a desired time or rate of delivery. Performance behaviour can then become explicit.
Channels. We want a high level of flexibility, like Henk's channels-along-channels, but via annotations rather than explicit distributed programming. Carter's work on treating functional processes and channels as a kind of process algebra might be relevant. There, processes are dynamic. As processes can only communicate with other processes that they know about (not using names), a new process can only communicate with its parent. If the parent can communicate with a third party, it can pass items straight from the child to the third party and/or vice versa. Thus replugging is needed for reasonable efficiency.
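The channels-along-channels idea can be sketched directly with GHC's Control.Concurrent.Chan: one channel is sent along another, after which the child talks on it directly rather than routing through its parent (the replugging motivation above):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

main :: IO ()
main = do
  meta  <- newChan :: IO (Chan (Chan Int))
  reply <- newChan
  -- child: receives a channel along `meta`, then uses it directly
  _ <- forkIO $ do
    c <- readChan meta
    writeChan c 7
  writeChan meta reply     -- send a channel along a channel
  v <- readChan reply      -- blocks until the child replies
  print v   -- 7
```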
Services. Given modular monadic methods, we should be able to take any procedural thing, service, data structure or library and give it a semantically correct functional interface.
Distribution. We may want a distribution library; a collection of functions with possibly no semantic effect to allow efficient splitting of programs.
Exceptions. To extend what was said before about exceptions, it might be worth investigating David Turner's work on bottomless semantics. It is possible to restrict recursion to certain forms in order to eliminate infinite loops and partial data. I think it is still possible to reason about infinite data, but I'm not sure how reasoning about lazy interactive behaviour works.
Rate. Can we build, out of the above ideas, a rate-based programming language?
Video. Take shared video on demand as an example. Wantons give rate and timing control. Shared global memory gives a possibility for detecting duplicate demands, perhaps leading to two people sharing a single video data stream if they start watching the same thing at about the same time.
Diary. The distributed diary example is an important one. You carry a most-up-to-date cache of the diary in your organiser. The diary forms a replicated database which may be updated by several people in the separate versions before the versions are merged (as with Access replicas or CVS). You create speculative entries which become confirmed at merge time. We want some higher order capture of the distributed multiple writers problem.
Examples. There may be other examples, such as safety critical software.
Copyright © 1998 University of Bristol. All rights reserved.
Author: Ian Holyer
Last modified: 21 Sep 1998 12:43
Authored in CALnet