CS[ Bristol CS | Index ]

Deterministic Communications

What does it mean for concurrent processes to communicate in a deterministic way? It means that timing restrictions of some kind have to be put on the communications, particularly where message streams are merged. These restrictions tend to reduce concurrency, so it is important to find a minimal set of restrictions which keep concurrency at a maximum.

Here, we present a set of timing restrictions, based on the notion of a reactive system, which meets these requirements. The restrictions are not based on real wall-clock time, because maintaining and distributing a universal notion of time is difficult, and because it would be likely to lead to a system which was only approximately determinism. [reference Lamport etc?]

Instead, the restrictions are based on a logical notion of time which puts a total linear ordering on all the messages in the system. This ordering allows concurrency, and indeed allows messages to be processed out of order. The ordering only comes into play where message streams are merged.

Hierarchical Time

Consider an arbitrary network of processes and uni-directional channels along which messages can be sent. (The channels are uni-directional for data messages, not necessarily for acknowledgements etc.) Each message has a timestamp associated with it. These timestamps need not necessarily be implemented directly; they just establish the theoretical requirements for restricting the communications in the system.

The first assumption that has to be made about the system is that there is a single incoming stream of messages from the outside world. This may be achieved even in multi-user systems if incoming streams are merged by some external real-time mechanism before entering the system. These external messages (events) can be given timestamps 1, 2, 3, 4, ...

The next assumption is that each process is sequential, at least to the extent that the interleaving of its receive and send operations can be determined. A process is assumed to be reactive; when it sends messages, these are taken to be triggered by the most recently received message. If a process receives message 3, say, then the messages it sends before the next receive are given timestamps 3.1, 3.2, 3.3, 3.4, ... A timestamp may have many components, and each process which a series of triggered messages passes through adds an extra component. Thus if a process receives a message 3.7.2, then it responds with 3.7.2.1, 3.7.2.2, ... Timestamps are linearly ordered lexicographically.

There is one timestamp per message. If a message is forwarded, this is regarded as a new message with a new timestamp containing a copy of the data in the old message. Also, a message is only ever delivered to one place. If a message is to be distributed, it is sent as a sequence of messages containing identical data, but each with its own timestamp.

What about the initial messages which a process sends before it does its first receive? [picture?] This can be dealt with if we assume that each process is created by a parent process. The situation can be treated as if the new process is woken by a start-up message from its parent at creation time.

Similarly, any other dynamic reconfigurations of the network must be assumed to be carried out in some controlled way. A process cannot suddenly start communicating with another, unconnected process. It can only establish a new channel via the ones it already has, (eg by allowing some mechanism for sending channels along channels). [Only really consider static nets here]

The Deterministic Restrictions

Now we can describe the restrictions that need to placed on the system to make it deterministic. For each process which has more than one input channel, the messages on these channels need to be merged in timestamp order before the process receives them. However, messages must not be delayed indefinitely when progress is possible. [Actually, a process can have multiple inputs, as long as it has no select facility. Must it be sequential?]

For example, suppose that a process has two input channels A and B, and assume that a message has arrived at the merge point along A. It is too restrictive to wait for a message on B to compare the timestamps to see which of the two messages to forward. The message on A must be forwarded as soon as it is possible to determine, from a complete knowledge of the messages currently in the system and the communication state of all the processes, that it is not possible for an earlier message (in the timestamp sense) to arrive on channel B.

The Behaviour of Processes

The normal behaviour of a process is this. It receives a message with a timestamp 3.7.2 say. It sets its local clock by adding one more component to give 3.7.2.1. Then, each time it sends a message, it attaches the current local time as a timestamp, then increments the last component.

There is an exception to this behaviour. It is possible in this setting for a process to receive a message which is earlier than its current clock time. For example, suppose the current clock time is 3.7.2.5. Now, no message with a timestamp earlier than 3.7.2 can arrive, because of the merging rule. However, in a process network with cycles, it is possible for the process to receive a message which was triggered by one of its own earlier messages, eg with a timestamp of 3.7.2.3..... (triggered by 3.7.2.3 which the process sent earlier).

You might think that the process should delay sending 3.7.2.4 until all the messages triggered by 3.7.2.3 have been finished. This would be too restrictive, reducing concurrency to the point where essentially only one message is processed at once. Thus instead, we introduce the rule that if a process receives a message which is older than its current time, it does not reset its clock . Thus, on receipt of 3.7.2.3...., the above example process continues with 3.7.2.6 etc.

Acknowledgements

Suppose you have a cyclic group of processes A -> B -> C -> A, and process A has an input message stream from a process D outside the group, as well as the one from C. Consider the merging problem at A. When a message m from D is received by A, it triggers messages which may go round the loop and come back to A. While any such triggered message remains in the cycle, it is possible for A to receive, from C, a message triggered by m. Thus the next message from D cannot be accepted until all activity associated with m has died away.

In order to detect the cessation of activity, we introduce a system of acknowledgements. We assume that for each message sent from P to Q, an acknowledgement from Q to P will be received at some later time. These acknowledgement are invisible to the processes themselves; they are for gathering information behind the scenes.

The acknowledgement for a message m indicates not just that m has been delivered and received, but that all messages triggered by m, directly or indirectly, have been delivered and received, ie all activity associated with m has ceased.

When process P receives a new message m, ie one which is newer than its own local clock, let messages m1, m2, ... mn be the ones that P sends out in direct response (up until the clock is next reset, ie ignoring the receipt of old messages). P keeps track of acknowledgements for these messages, and when all the acknowledgements have been received, P sends an acknowledgement for message m back to the process that sent it.

When P receives an old message, one triggered by a previous message of its own, it sends back an acknowledgement immediately.

It should be clear that this way of handling acknowledgements has exactly the right effect; an acknowledgement is received once all activity associated with a message has ceased.

How to do the Merging

Suppose a process C has input message streams from A and B. How are the these two streams merged? If there is at least one outstanding message on each stream, their timestamps are compared and the earlier message is forwarded to process C for receipt.

If a message m has arrived from A but not from B, then the merger must determine the status of B to see if an earlier message than m could come along that route. What's more, if B's status is not favourable, the merger must track B's progress until either a message arrives from B or B's status indicates that no message earlier than M is possible. This could either be by B sending periodic messages, or by A sending repeated requests for information.

The status of B is almost completely determined by its clock, which usually indicates the timestamp of the next message which B will send. However, once B has stopped responding to (say) 3.7.2 and has sent back an acknowledgment for it, it should set its clock to 3.7.3 to indicate that it cannot send an earlier message than this.

[Further, if it determines from its input channels that no message earlier than 3.7.4 can arrive, it could set its clock to 3.7.4.1; is this necessary?] [If a process knows that it is not in a cycle, need it wait for an acknowledgement? Should it wait?] [Can we get a better theory of the acyclic problem?] [Note: although the comms system may need to be active when a process isn't, it does not need to be active simultaneously with the process, so everything can be hidden in sequential-looking send/receive calls.] [NB: a merger may or may not add a marker to a message to say which channel it arrived on].

Examples

[Examples to show how concurrency might still be achieved in this setting, and what sort of restrictions might happen, eg cyclic. Radio buttons?].

Implementation

The next section(s) describe how this restriction might be imposed in practice.


Dr. Ian Holyer, ian@cs.bris.ac.uk. Last modified on Tuesday 26 August 1997 at 11:33.