Composition of several transition systems: Constructing a global model by "reachability analysis"

Here we consider that a system consists of several components and each component is described as a transition system. The question that arises is how can the behavior of the composed system be characterized. This can be answered by reachability analysis, finding out what global system states can be reached by the system from its initial state ( A see also CourseNotes3, starting slide 14).

In the following we assume that all interactions between the components are visible (in addition to the interactions of the components with the system environment). What happens if some of these internal interactions are hidded from the system environment is also very important (but this is not discussed in this chapter).

Before we can perform reachability analysis, we have to determine how we can model the system as a whole. The idea is to model it again as a transition system. The transitions of the global system correspond to the transitions of the components, and the state space of the global system is determined by the state spaces of the components. Let us suppose that the global system consists out of n components Ci (i = 1, ..., n) and that the state spaces of these components are Si (i = 1, ..., n). If there are no messages in transit between the components, then the state space of the global system S is given by the Cartesian project of the component state spaces, that is, S = S1 x S2 x ... x Sn. This means, that a particular state of the global system has the form <s1, s2, ..., sn> where si is a state in Si (for i = 1, ..., n). If in a given state, the global system in general also contains some messages in transit, then the state of the message transmission medium (in terms of what messages are in transit to which destination) must also be taken into account. Let us write SM for the state space of this medium, and sM for a particular state of messages in transit, then the global system state space would be S = S1 x S2 x ... x Sn x SM, and a particular state would be of the form <s1, s2, ..., sn, SM> .

What are the transitions of the overall system depends on the communication paradigm used for inter-component interactions within the modeling paradigm. In the following, we first discuss this question for finite state transition models, and consider later the case of extended models and the general case.

Reachability analysis for finite state transition models

As explained at the beginning of the SEG-2106 course note page, the following communication paradigms are the following: procedure calls, shared variables, rendezvous communication and communication with message passing. For finite state transition models, only the rendezvous and message passing interactions are considered. For message passing, there are two variants:

Directly coupled input-output interactions (synchronous, sometimes called message passing without queuing). In this case, the transition of one component that sends the message is performed at the same time as a transition of the destination component that receives (and consumes) the message. These two transitions are considered an non-interruptable action and represent a single transition of the overall system. This was the communication paradigm I used 1978 in the paper Finite State Description of Communication Protocols before the IOA formalism was invented.
Asynchronous message passing. The sending and receiving actions are separate events taking place at different times. When a message is sent and not yet received, it is considered to be in transit, possibly already stored in an input queue of the destination process. The sending transition and the receiving transition represent two separate transitions of the overall system.

These different communication paradigms are used as follows:

The communication between components modelled as LTS are normally rendezvous communication.
The communication between components modelled as IOA (or IOTS, Input-Output Transition Systems) are normally directly coupled input-output interactions.
The communication between components modelled as FSM is normally asynchonous message passing. Please note:
- The resulting composed system can not in general be described as a single FSM since the number of inputs in the queues may grow to infinity.
- The precise properties of the composed system depends on the queuing model used. The most popular queuing models are:
  - Single input queue for each component: inputs from all other system components (or from the system environment) are entered into the same input queue.
  - Multiple queues, one for each source. This gives more freedom to the receiving FSM module; if several queues contain an input, the FSM specification does not define which one should be consumed first (there is a higher degree of non-determinism).
  - An input pool where received messages are stored until they will be consumed whenever the state machine is ready to receive it. Note that this scheme avoid many non-specified receptions when different messages arrive from different components and their order of arrival cannot be predicted.
  - In all these cases, there are the following variants of message transmission:
    - message transmission is reliable (no loss, nor error) or unreliable (losses may occur)
    - for messages coming from the same source, message reception is either in order or not (may arrive in a different order than they were sent)

Reachability analysis consists of starting in the initial state of the global system and determining all global system states that can be reached from the initial state. This is done by exhaustively considering all transtions of the global system that are possible from each state of the global system that is known to be reachable. When all transitions have been explored the set of all reachable global system states is known. The set of reachable global system states is a subset of the state space of the global system, often a relatively small subset.

Reachability analysis for LTS

For simplifying the exposition, we assume in the following that the global system consists of two components C1 and C2.

Before we can perform a reachability analysis, we have to determine for each transition of C1 what are the transitions of C2 with which it is to be executed in rendezvous. There are different ways for specifying which pair of transitions must be performed in rendezvous. A simple approach is to list these pairs explicitely << see for instance in the work by André Arnold; for instance his book Finite Transition Systems: Semantics of Communicating Systems >>, while the others can be executed by each component separately. Another approach requires matching identically named transitions. This was adopted in languages such as CSP, CCS and LOTOS. In the following, we will assume that transitions of different components that have the same name must be executed in rendezvous.

To simplify the following formalism, we introduce the concept of the alphabet of a state transition model. This is the set of all transition names of the model. For LTS, the name of the transition is the label of the transition. We write s1 -- a --> s1' for a transition where s1 and s1' are states and a is a transition name in the alphabet.

The composition of two LTS models, C1 and C2, with alphabet A1 and A2, respectively, can be modeled by an LTS that represents the behavior of these two LTS and their interactions, in the following called C. The model of C is the following LTS:

The states of C are of the form <s1, s2>, where s1 is a state of C1 and s2 is a state of C2.
The initial state of C is the set of states <s1-initial, s2-initial> where si-initial are initial states of Ci (i = 1, 2).
The transitions of C from a state <s1, s2> are the following:
- if s1 -- a --> s1' is a transition in C1 and s2 -- a --> s2' is a transition in C2 : then <s1, s2> -- a --> <s1', s2'> // this is the rendezvous interaction a)
- if s1 -- a --> s1' is a transition in C1 and a is not in the alphabet A2 : then <s1, s2> -- a --> <s1', s2> (transition a is done by C1 alone)
- similarly, for a transition by C2 alone

Reachability analysis for IOA

Reachability analysis for IOA is in some sense similar as for LTS. Here transitions have the name !m (output of the message m) or ?m (input of the message m). Transitions that are named with the same message are performed jointly.

However, there is one very important difference: Whether an output transition is performed is decided by the outputting component alone. If the component that should receive the message is in a state where it does not have an input transition for this message, one encounters a so-called non-specified reception situation (which can be interpreted in different ways - see discussion). Note that similar situations for interacting LTS sometimes represent deadlocks (no transition is possible for the global system).

If Ii is the set of input messages of a component Ci, and Oi is the set of its output messages, we assume that Ii intersection Oi is empty and we define the alphabet Ai = Ii union Oi. In a composition, a given message may be output only from one component; but it may be input to several components (for instance, it may be output generated by the environment and be input to several components). We assume here that all messages between two components are also visible by the environment; this means that an interaction that is output from one component is also output from the composition to the environment. Note: A hiding operation for the inter-component messages may be performed after the reachability analysis.

The composition of two IOA models, C1 and C2, with alphabet A1 = I1 union O1, and A2 = I2 union O2, respectively, can be modeled by a global IOA that represents the behavior of these two component IOA and their interactions, in the following called C. The model of C is the following IOA (rules (1) and (2) are the same as for LTS):

The states of C are of the form <s1, s2>, where s1 is a state of C1 and s2 is a state of C2.
The initial state of C is the set of states <s1-initial, s2-initial> where si-initial are initial states of Ci (i = 1, 2).
The transitions of C from a state <s1, s2> are the following:
- for m in O1 and in I2
  - if there are transitions s1 -- !m --> s1' in C1 and s2 -- ?m --> s2' in C2 : then <s1, s2> -- !m --> <s1', s2'> (input and output transitions executed jointly; this is output to the environment)
  - if there is a transition s1 -- !m --> s1' in C1, but no transition s2 -- ?m --> to any state in C2 : non-specified reception (design error)
  - if there is no transition s1 -- !m --> to any state in C1, but there is the transition s2 -- ?m --> s2' in C2 : no transition in composition (maybe the transition of C2 will never be executed - this depends on the other states of C1 that may occur jointly with s2)
- for m in A1 and m not in A2 (C2 is not involved in this interaction - direct interaction between C1 with the environment)
  - if there is a transition s1 -- m --> s1' : then <s1, s2> -- m --> <s1', s2> (transition by C1 alone)
- similarly as above, for C1 and C2 interchanged
- for m in I1 and in I2
  - if there are transitions s1 -- ?m --> s1' in C1 and s2 -- ?m --> s2' in C2 : then <s1, s2> -- ?m --> <s1', s2'> (joint transition when input m comes from environment)
  - if there is a transition s1 -- ?i --> s1' in C1 , but no transition s2 -- ?i --> to any state in C2 : possibly a design error, but this depends on the behavior of the environment (non-specified reception for C2 if input comes from environment when C2 is in this state); otherwise the transition of C1 will never be executed. Similarly, for C1 and C2 interchanged.
- for m in O1 and in O2 : this is not allowed according to what is said above - miss-match of the interfaces between the two components - conflicting output transitions.

Reachability analysis for FSM

Although the difference between the IOA model and the FSM model is the queuing of messages in the case of communicating FSM, this makes a huge difference for reachability analysis, because the state of the messages in transit (or in the queues) must be taken into account, and the sending and receiving transitions of the components give rise to separate transitions of the composed system. Note: We assume in the following that a transition involving input and output is cut into two transitions with an intermediate state: an input transition followed by an output transition.

The states of C are of the form < q21, s1, q12, s2> , where s1 is a state of C1, s2 is a state of C2, and qij is queue from Ci to Cj
The initial state of C is the set of states is <empty, s1-initial, empty, s2-initial> where si-initial are initial states of Ci (i = 1, 2).
The transitions of C from a state <q21, s1, q12, s2> are the following:
- for any sending transition of C1: s1 -- !i --> s1' with i in O1 : there is the global transition <q21, s1, q12, s2> --!i --> <q21, s1', i.q12, s2>
- if q12 = q12' . i (interaction i is in front of the queue to be received by C2; q12' are the elements behind i in the queue - this may be empty) : then if there is a transition s2 -- ?i --> s2' then there is the global transition <q21, s1, q12, s2> -- ?i --> <q21, s1, q12', s2'>; otherwise there is a non-specified reception (design error)
- similarly, for C1 and C2 interchanged.

Examples of reachability analysis

See for example Exercise-2 from previous years. See also the example of the alternating bit protocol in my 1978 paper Finite State Description of Communication Protocol.

Reachability analysis for extended state transition models

Reachability anaysis may also be performed when the components are specified as extended state machines. For the "finite state aspect" of the specifications, the reachability analysis approach discussed above can be used, while assertion proving may be used to analyse the possible values of state variables and output parameter values that may be reached.

Examples: The alternating bit protocol is discussed in my paper from 1977. The example of a queue with users is an example without a "major state" (only variables).

Results of reachability analysis

Reachability analysis allows us to obtain a description of the global behavior of the system that is defined as the composition of several components, each with its specified behavior. For this global behavior, we are interested in two aspects:

General properties: These are properties that, in general, should be satisfied by any global system behavior. These properties may include the following:
- Absense of Deadlock (accessible state without any transition enabled): For a system of communicating FSM, if all queues are empty, this is a “global deadlock”; it could be the desired final state. If one of the queues contains an input, but the receiving FSM does not have a transition to consume it, this falls under the category of “unspecified receptions” (see below). One calls “local deadlock” a situation where certain FSMs are blocked forever, but other FSMs continue operations. - Similar for composed LTS.
- Absense of non-executable transitions: a transition of a given state machine is called “non-executable” if there is no reachable global state in which it could be taken. The system would have the same behavior if this transition was removed from behavior of that component (making it more abstract - more situations of undefined input).
- Absense of non-specified receptions: If a global state is accessible where one state machine receives a particular input in a state where no transition is specified for that input (more detailed discussion). Note: This cannot happen for communicating LTS because they communicate by rendezvous; however, deadlock may occur if one side is not ready to accept the only interaction that the other side can do.
- Bounded queues: If the size of the queues in all reachable global states is bounded by some integer, then the number of reachable states is finite. The question whether the queues are bounded for a given set of communicating FSMs is in general non-decidable.
- Self-stabilization: A system of communicating state machines is called “self-stabilizing” if for all (arbitrary) initial states, the system will enter, after some finite time, the normal mode of operation. (An example is given in [Boch 78]; recent results in this area have been presented by Gouda).
- Stable states (called “adjoint states” in [Boch 78]): A state vector (representing the states of all component machines) is called “stable” if for all reachable global states corresponding to this vector, all queues are empty.
Specific properties: In our survey paper of 1980 on communcation protocol design, we make the distinction between general properties (see above) and specific properties which depend on the particuar system. Usually, these specific properties are defined by the system requirements. The specific properties represent a more abstract view of the system. When one checks that the analysed model (the "more detailed model") satisfies the specific properties of the more abstract model, one talks about Model Checking. Note, however, that Model Checking also includes the checking of the general properties above. Note: Some people say that the abstract model is checked to see whether it describes the detailed model; other people say that the detailed model is checked to see whether it satisfies the requirements of the abstract model). Anyway, the abstract model must be defined in some suitable language. Various approaches can be used to describe the abstract model:
- state machine models (involving only the externally visible interactions)
- assertions associated with the variables and interaction parameters of the analysed model (the "more detailed model")
- Temporal logic (linear time, or branching time Temporal logic)

Difficulties of reachability analysis

The main difficulty of reachability is the so-called state space explosion, meaning that the number of reachable states grows very large. There are three reasons why often the number of reachable states is too big to be manageable, even with powerful tools:

The number of states of each sub-system is very large because one has introduced supplementary state variables (extended models) for capturing the behavior of the sub-system, e.g. sequence numbers in protocols.
During reachability analysis, one considers the Cartesian product of several sub-systems, which leads to a multiplication of the number of states of all sub-system.
An additional “blow-up” normally due to the varying number of messages in transit (even if this number is often artificially bounded during analysis).

One uses normally automated tools for doing this analysis. Many automatic reachability analysis tools exist (see for instance [Boch 90g]). Some tools combine this with model checking (verifying that the composed system satisfies certain given temporal logic properties). Holzmann proposed a method for minimizing the required memory for remembering which global state has already been analyzed (using only one bit per state), but also reducing the amount of information which is available. Analysis methods that do not construct the whole reachable global state space, but only travel through the whole space (for instance through a depth-first search) and simultaneously check the interesting properties are called “on-the-fly” analysis methods. In opposition, traditional methods construct the whole state space and then check the interesting properties.

The number of reachable states in not only often very large, in many cases it is even infinite. Consider for instance an FSM that has a spontaneous transition that sends a message to its partner and loops back to the same state. In this case, the number of message that may be in transit at any given time is not bounded. Therefore the number of global reachable states is infinite. Clearly such spaces cannot be covered completely. The question is: Is it necessary to explore global states with an arbitrary number of messages in transit ? - In practice, one usually explores only states with a number of messages in transit within a bound. But it is not clear what such a bound should be. Sometimes the reachability analysis shows (if a complete exploration has been done) that the number of messages in transit from a given source component to some given destination component is always less than same bound. However, it has been shown that the problem of deciding whether two given communication state machines may get into a deadlock state is undecidable. Also the question whether the number of messages in transit remains bounded is undecidable.

The following approaches have been proposed to reduce the effect of state space explosion:

Problem decomposition (e.g. analyze different operational phases separately)
Reduced reachability analysis: The idea is to reduce the number of different interleavings of concurrent activities that must be considered, but taking care that one still maintains the guarantee that all design errors will be detected
Ad-hoc analytical methods: Ad-hoc methods may be used for showing that certain types of properties are satisfied for certain types of specifications. They often depend on the particular form of the behavior specification.
Analysis using assertions (in the case of extended state machines): The idea is to treat the properties related to the variables using program proof methods involving assertions and/or invariants. See for example in my paper from 1977.
Incomplete analysis: For instance
- partial exploration of the reachable state space (e.g. random walk [Holz 88]);
- statistical exploration corresponding to the probabilities of the real environment;
- exploration by simulation guided by the user (this is always possible as long as the specification is executable)

Tools for reachability analysis

The most popular tools helping in the analysis of distributed systems are probably the following. For more details see Project Page:

SPIN (for communicating FSM (extended models, including the option of direct coupling (zero queue length), linear temporal logic for defining specific properties)
UPPAAL (for communicating IOA, including possibly hard real-time constraints, branching-time temporal logic for defining specific properties)
LTSA (for communicating LTS, specific properties can be defined by state machine models)

Last update: January 27, 2013