SEG-2106 "Software Construction"

Implementation Design Choices

(based on the book by Braek et al.: "Engineering Real Time Systems", Chapter 9; this book considers co-design, that is, system design choices concerning both hardware and software design)

1. What is "Implementation Design" ?

Purpose: define the correspondence between the functional specification and the system implementation (to be realized in hardware and software)
Work to be done: make the necessary decisions for defining the properties of the implementation and provide detailed enough documentation.
Result: Description of the implementation design which explains how the abstract functions are to be realized.

The scope of implementation design is shown on the following diagram:

Example of the Access Control System - Question: "Which part should be implemented in hardware, which in software ?"

The following two diagrams show (a) the simple case (where the implementation has the same structure as the functional specification), and (b) the case where the design of the implementation leads us to revise the SDL specification (called "SDL description" of the system) in order to include functional aspects that are pertinent for the implementation or to adapt the design to the implementation constraints (in particular, the non-functional requirements).

Steps during the implementation design phase

Choice ("trade-off") between hardware and software
Definition of the hardware architecture
Definition of the software architecture
Revision and refinement of the functional specification

2. Aspects that are ignored by the functional specification

There are two reasons for differences between the functional specification and the description of the implementation elaborated during the implementation design phase:

The inherent concepts of the language used to describe the functional specification do not always correspond to the concepts used for the implementation of the corresponding components in the implemented system. It is therefore necessary to think about the question of how the concepts used for describing the functional specification (for instance, UML State machines and message passing mechanisms possibly between different computers) can be realized in the implementation. This is not further discussed here.
Certain aspects of the system to be constructed are often ignored by the functional specification. These aspects must be considered during the implementation design phase, in particular, the following:

2.1 Execution time

Note: In SDL, the time required for the execution of the transitions and the propagation delay of messages are ignored; these aspects must be dealt with in the design of the implementation.
The non-functional requirements normally mention the following two aspects:
- response time of the system (this is the delay between the production of a request and the reception of the response).
- throughput (number of requests handled per second).
The design of the implementation must find hardware/software solutions for these aspects.
The interfaces between hardware and software components require special attention: The software must be fast enough to handle the signals and interrupts that occur at the hardware interfaces. This is the role of "drivers" (see the processes IO-Channel in Figure 10.1 below; they are concurrent software processes that execute within the multiprogramming environment provided by the Concurrency Support of the operating system).
Estimating the performance of the planned implementation: It is important to estimate the performance of the planned implementation before it is built. It is also important to consider that, for a system supporting many users, the response time depends on the load (throughput) of the system: the more user requests arrive, the longer the response time becomes; at maximum throughput the system is completely overloaded and the response time is unacceptable. In order to obtain an estimate of the performance, one needs the following:
- Estimation of a realistic workload: (a) frequency of service requests; (b) relative frequency of the different types of service requests, which may have different requirements on the system resources (CPU, disk access, communication needs). Consider both average load and peak load. So-called benchmarks are scenarios of service requests that are used to compare the performance of different existing systems; they could also be applied to a prototype of the planned system.
- Estimation of the resources required for each type of service request: This may be an estimation of the number of CPU instructions required to execute a simple service request; knowing the clock cycle of the computer, this gives an estimation of the CPU time required for performing the service request. This may involve estimating the number of times a loop will be executed. The estimation may also be performed at a higher level of abstraction, e.g. by counting the number of SDL transitions to be executed in the functional specification and estimating the average CPU time required for performing one SDL transition.
- Methods for estimating performance based on given workload and resource requirements: The following methods are most commonly used:
  - Straightforward reasoning: An example is given in the book by Braek et al.:
    - to determine the number of SDL transitions that are involved in the preparation of a response to the request,
    - estimate the CPU time required for the execution of an average SDL transition,
    - estimate the fraction of time that the CPU is allocated to the task in question (among all the tasks that must be performed concurrently).
    - If one wants to estimate the throughput offered by a given implementation design, one has to do the above estimation for all types of requests and one has to estimate the relative frequency of the different types of requests.
  - Queuing models and simulation models: see earlier course notes
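The straightforward reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the two request types and all numbers are hypothetical and not taken from the book by Braek et al. The last function uses the simplest queuing model (M/M/1, which assumes random arrivals and a single server) to show how the response time grows with the load; the earlier course notes may use other models.

```python
def cpu_time_per_request(transitions, cpu_time_per_transition_s, cpu_share):
    """Time needed to process one request, in seconds.

    transitions: number of SDL transitions executed to prepare the response
    cpu_time_per_transition_s: estimated CPU time of an average transition
    cpu_share: fraction of the CPU allocated to this task (0 < share <= 1)
    """
    if not 0 < cpu_share <= 1:
        raise ValueError("cpu_share must be in (0, 1]")
    return transitions * cpu_time_per_transition_s / cpu_share


def mean_service_time(request_mix):
    """Average service time over all request types.

    request_mix: list of (relative_frequency, service_time_s) pairs
    """
    total_frequency = sum(freq for freq, _ in request_mix)
    return sum(freq * time for freq, time in request_mix) / total_frequency


def max_throughput(service_time_s):
    """Upper bound on the number of requests handled per second."""
    return 1.0 / service_time_s


def mm1_response_time(arrival_rate, service_time_s):
    """Mean response time of an M/M/1 queue (an assumed model).

    Returns infinity when the arrival rate reaches or exceeds the capacity.
    """
    utilization = arrival_rate * service_time_s
    if utilization >= 1:
        return float("inf")
    return service_time_s / (1 - utilization)


if __name__ == "__main__":
    # Hypothetical workload: two request types, equally frequent.
    t_validate = cpu_time_per_request(20, 100e-6, 0.5)  # 20 transitions
    t_open = cpu_time_per_request(5, 100e-6, 0.5)       # 5 transitions
    service = mean_service_time([(1, t_validate), (1, t_open)])
    print("mean service time [s]:", service)
    print("max throughput [req/s]:", max_throughput(service))
    for rate in (100.0, 200.0, 360.0, 500.0):
        print(rate, "req/s ->", mm1_response_time(rate, service), "s")
```

The printed response times increase slowly at low load and sharply as the arrival rate approaches the maximum throughput, which is the overload effect described above.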
Note: The performance evaluation for the preliminary system architecture may lead to the design of a revised architecture in order to satisfy the performance requirements. An example is discussed in the book by Braek et al.: initial hardware architecture, revised hardware architecture including Cluster units, revised implementation description in SDL (top-level view, detailed block structure including functional blocks for communication protocols and validation (DB) within clusters).
Hard real-time systems: In the case of a system with hard real-time constraints (a so-called "hard real-time system"), one has to be more precise: instead of making estimations, one has to demonstrate that the execution time is within a given bound. This is much more difficult to do; for example, the Java Virtual Machine is no longer appropriate for the execution of such tasks since, for instance, the automatic memory recovery procedure (garbage collection) may interrupt the normal execution of a thread and will therefore lengthen the execution time by an undetermined amount. Note: Special virtual machines suitable for hard real-time systems have been developed.

2.2 Imperfections of hardware and software: errors, failures, background noise, etc.

The functional specification normally does not consider read or write errors in primary or secondary storage, communication errors, or failures of components due to accidents or aging. For instance, what happens if a response does not arrive (because of a message loss or a failure of the server)? Here are examples of requirements concerning the impact of errors (from the book by Braek et al.).
If such errors or failures must be considered, it is important to determine how such failures can be detected and what exceptional error handling is to be foreseen. In this context, one distinguishes between the following three actions:
- fault detection (often the word "error" is used; the expression "an error is detected" means that a deviation from the requirements has been detected)
- fault localization (diagnostics), sometimes called fault isolation: this means finding the cause of the error
- fault recovery (if possible), that is, reorganizing the system so that it performs the required task in spite of the fault. In the case of a hardware fault, this involves either the repair of the faulty component or its replacement (which takes time), or redundant hardware which can take over.
  - Note: In the case of highly reliable systems, one often uses triple redundancy, that is, three identical components that perform the task in parallel. At the end of an operation, a comparison unit compares the three results obtained and if they are not identical one can identify the faulty component (under the assumption that only a single unit fails at a time). Then one uses the result of the other two components (the system is fault-tolerant for a single fault) and tries to replace the faulty component as fast as possible (before the next component may fail).
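The comparison unit used for triple redundancy can be sketched as follows. This is a minimal illustration, not a description of any particular system: the function name `vote` and its return convention are assumptions, and the results are assumed to be hashable values that can be compared for equality.

```python
from collections import Counter


def vote(results):
    """Compare the results of three redundant units.

    Returns (agreed_value, faulty_index), where faulty_index is None if all
    three units agree. Raises RuntimeError if no two units agree, which
    violates the single-fault assumption.
    """
    if len(results) != 3:
        raise ValueError("exactly three results are expected")
    value, count = Counter(results).most_common(1)[0]
    if count == 3:
        return value, None
    if count == 2:
        # The unit whose result differs from the majority is the faulty one.
        faulty = next(i for i, r in enumerate(results) if r != value)
        return value, faulty
    raise RuntimeError("no majority: more than one unit seems to be faulty")


if __name__ == "__main__":
    print(vote([7, 7, 7]))
    print(vote([7, 9, 7]))
```

The second call identifies unit 1 as faulty while still delivering the majority result, which corresponds to tolerating a single fault.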
Here are some other important concepts:
- Fault tolerance (see above)
- Fault resilience: A fault may have an impact on the performance of the system, but the important functions will continue to be offered.
- Fail-safeness: A system is fail-safe if it remains safe in the case of a fault. Safeness means that no catastrophic error will occur. A "catastrophe" is a situation that should never occur, such as the explosion of a nuclear reactor or the opening of the doors of a train in motion.

2.3 Physical distribution

Advantages of an implementation distributed over several computers:
- The failure of one computer does not necessarily stop the system as a whole. The fault can easily be isolated to a given computer.
- Increase of the throughput (by performing operations in parallel).
Disadvantages:
- The communication between the computers implies additional delays.
- Communication protocols are necessary to recover from transmission errors and other communication problems.
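As an illustration of why a protocol is needed, here is a minimal sketch of retransmission after a missing acknowledgement. Everything in it is hypothetical: the `LossyChannel` class simulates losses from a fixed script instead of using a real network, and a returned value of None stands for "no acknowledgement arrived before the timeout". A real protocol would also need sequence numbers so that the receiver can discard duplicates.

```python
class LossyChannel:
    """Simulated channel (hypothetical) that loses messages according to a script.

    drop_script: iterable of booleans; True means the next transfer is lost.
    Once the script is exhausted, every transfer succeeds.
    """

    def __init__(self, drop_script):
        self._drops = iter(drop_script)

    def transfer(self, message):
        """Return an acknowledgement for the message, or None if it was lost."""
        lost = next(self._drops, False)
        return None if lost else ("ack", message)


def send_reliably(channel, message, max_attempts=3):
    """Send a message, retransmitting when no acknowledgement arrives.

    Returns the number of attempts used. Raises TimeoutError when all
    attempts fail, so that the caller can start its error handling.
    """
    for attempt in range(1, max_attempts + 1):
        ack = channel.transfer(message)  # None models an expired timer
        if ack is not None:
            return attempt
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


if __name__ == "__main__":
    channel = LossyChannel([True, False])  # first transfer is lost
    print("attempts needed:", send_reliably(channel, "open door"))
```

Each retransmission adds to the response time, which is one source of the additional delays mentioned above.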

2.4 Limited resources

The resources used by the system are limited. For example: the number of processors on a computer, memory space, maximum length of buffers for the reception of messages.
Note: Limited-size buffers may easily lead to deadlocks (when a producer cannot send a message because the reception buffer is full); one must be very careful during the design of systems that communicate by messages.
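The deadlock risk can be made concrete with a small single-threaded simulation, given here only as a sketch. It assumes two hypothetical peers A and B, each with a reception buffer of capacity 1, that both want to send before they receive. Non-blocking sends are used so that the situation can be observed; with blocking sends both peers would wait forever.

```python
import queue


def try_send(inbox, message):
    """Non-blocking send; returns False if the reception buffer is full."""
    try:
        inbox.put_nowait(message)
    except queue.Full:
        return False
    return True


def demo():
    inbox_a = queue.Queue(maxsize=1)  # reception buffer of peer A
    inbox_b = queue.Queue(maxsize=1)  # reception buffer of peer B

    # Each buffer already holds one message that has not been read yet.
    inbox_a.put_nowait("b1")
    inbox_b.put_nowait("a1")

    # Both peers now try to send again before receiving anything.
    a_sent = try_send(inbox_b, "a2")
    b_sent = try_send(inbox_a, "b2")
    would_deadlock = not a_sent and not b_sent

    # One possible way out: a peer whose send fails receives first.
    received_by_a = inbox_a.get_nowait()
    b_sent = try_send(inbox_a, "b2")
    received_by_b = inbox_b.get_nowait()
    a_sent = try_send(inbox_b, "a2")

    return would_deadlock, a_sent and b_sent, received_by_a, received_by_b


if __name__ == "__main__":
    print(demo())
```

The design choice illustrated here (receive when a send cannot proceed) is only one of several possible remedies; larger buffers reduce the probability of the situation but do not remove it.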

2.5 Security

2.6 Requirements concerning system operations and maintenance

2.7 Development costs and hardware production cost

Note: The development cost is incurred only once; the hardware production cost depends on the number of units to be constructed.
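The effect of the number of units on the choice between two designs can be sketched with simple arithmetic. The two designs and all cost figures below are invented for illustration: design A is assumed to have a lower development cost but a higher production cost per unit than design B.

```python
def total_cost(development_cost, unit_cost, units):
    """One-time development cost plus production cost for all units."""
    return development_cost + unit_cost * units


def break_even_units(dev_a, unit_a, dev_b, unit_b):
    """Smallest number of units at which design B is no more expensive than A.

    Assumes integer costs, with design B having the higher development cost
    and the lower unit cost.
    """
    if not (dev_b > dev_a and unit_b < unit_a):
        raise ValueError("design B must cost more to develop and less per unit")
    extra_development = dev_b - dev_a
    saving_per_unit = unit_a - unit_b
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-extra_development // saving_per_unit)


if __name__ == "__main__":
    n = break_even_units(50_000, 40, 80_000, 25)
    print("break-even at", n, "units")
    print("cost of A:", total_cost(50_000, 40, n))
    print("cost of B:", total_cost(80_000, 25, n))
```

Below the break-even quantity the design with the lower development cost is cheaper overall; above it, the lower unit cost dominates.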

2.8 Reuse of existing components

This also means that the functionality of the existing component should not be duplicated by the new parts to be constructed.
Clearly, the new parts should foresee appropriate interfaces for the communication with the existing components.

The application of these design considerations to the example Access Control System is discussed in the book by Braek et al. in Section 9.4.

Note: The book by Braek et al. uses certain graphical notations for describing hardware and software architectures. These notations are not generally accepted in the community; you do not need to learn them.

Initially written: March 22, 2003; revised; translated into English: March 2008; last revision: 23 March 2015