The requirements of the Fault Tolerant CORBA specification are stated below.
CORBA Object Model
For object groups with the infrastructure-controlled (CONS_INF_CTRL) Consistency Style (Section 23.3.2.3, “ConsistencyStyle,”
on page 23-34), the specification requires
that the CORBA object model is preserved. Even though an object is replicated to provide protection against faults, at all
times its behavior shall appear to be the behavior of a single object. In particular, a replicated object can act as a client
or a server or both, and can invoke another replicated object, regardless of the fault tolerance properties of the two object
groups.
CORBA Object Reference Model
The specification introduces three new special tagged components into the CORBA object reference model. The object group references
that are used for fault tolerance contain multiple profiles that contain these components. Even though an object group reference
contains such components in its profiles, an unreplicated object, hosted by an ORB that does not support fault tolerance,
can still use the reference to invoke the methods of the replicated object. Similarly, a replicated object can use the object
reference of an unreplicated object to invoke the methods of the unreplicated object.
Transparency to Replication and to Faults
Creating or deleting an object using a Generic Factory, and invoking a method of an object, appear the same for replicated
objects as for unreplicated objects. Similarly, the behavior of a replicated server object when invoked by a client object
appears the same whether or not faults occur, except perhaps for a transient delay if the primary member of a passively replicated
object becomes faulty.
No Single Point of Failure
The specification supports applications that need robust fault tolerance, including applications that require higher reliability
than can be provided by a single backup. The specification requires that there shall be no single point of failure.
Client Redirection
For a client and a replicated server, the specification defines an interoperable object group reference that allows the client
to connect to the server replicas, by connecting to an alternative server or through an alternative network, when a fault
in a server replica occurs. It defines an additional service context, in request messages, that allows a server to determine
if the object group reference for the server used by a client is obsolete. Transparency to the client application program
is provided, with minimal modifications to the client ORB and simple mechanisms in the server ORB. Typical applications include
desktop client access to enterprise servers.
Transparent Reinvocation
The specification introduces an additional service context in Request messages that ensures that, in the presence of faults,
a client can reinvoke a request on a replicated server and receive a reply to that request, without risk that the operation
will be performed more than once. Typical applications include desktop client access to e-commerce applications.
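One common way to obtain this at-most-once behavior is for the server side to remember the reply for each request identifier and return the stored reply when the same identifier arrives again. The sketch below assumes a request is identified by a (client identifier, request identifier) pair; the class `ReinvocationFilter` and its fields are illustrative only and are not the layout of the service context defined by the specification.

```python
from typing import Callable, Dict, Tuple


class ReinvocationFilter:
    """Caches replies so that a retried request is not executed twice."""

    def __init__(self, operation: Callable[[int], int]) -> None:
        self._operation = operation
        self._replies: Dict[Tuple[str, int], int] = {}

    def handle(self, client_id: str, request_id: int, argument: int) -> int:
        key = (client_id, request_id)
        if key not in self._replies:
            # First time this request is seen: execute it and remember the reply.
            self._replies[key] = self._operation(argument)
        # A retry returns the stored reply without re-executing the operation.
        return self._replies[key]


class Account:
    """Toy server object with a non-idempotent operation."""

    def __init__(self) -> None:
        self.balance = 0

    def deposit(self, amount: int) -> int:
        self.balance += amount
        return self.balance


account = Account()
server = ReinvocationFilter(account.deposit)
first = server.handle("client-1", 7, 100)
retry = server.handle("client-1", 7, 100)   # same identifier: a reinvocation
second = server.handle("client-1", 8, 50)   # new identifier: a new request
```

Here the retried deposit returns the original reply and the balance is changed only by the two distinct requests.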
Infrastructure-Controlled Membership
The infrastructure-controlled (MEMB_INF_CTRL) Membership Style allows the application to direct the Replication Manager to
create an object group. The Replication Manager then invokes the factories at the different locations to create the object
replicas, and then adds them to the group. The Replication Manager is responsible for creating the initial number of replicas
and for maintaining the minimum number of replicas, as specified by the fault tolerance properties for the group. Typical
applications include enterprise server applications, such as supply chain applications, and large-scale critical systems,
such as defense applications.
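The division of responsibility described above can be sketched as follows, assuming one factory per location and two properties, an initial and a minimum number of replicas. `SimpleReplicationManager` and `report_fault` are hypothetical names chosen for this example; they are not the Replication Manager interface defined by the specification.

```python
from typing import Callable, Dict, Set


class SimpleReplicationManager:
    """Toy stand-in for a Replication Manager; not the specification's interface."""

    def __init__(self, factories: Dict[str, Callable[[], object]],
                 initial: int, minimum: int) -> None:
        if not 0 < minimum <= initial <= len(factories):
            raise ValueError("need 0 < minimum <= initial <= number of locations")
        self._factories = factories
        self._minimum = minimum
        self._faulty: Set[str] = set()
        self.members: Dict[str, object] = {}
        # Create the initial number of replicas, one per location.
        for location in list(factories)[:initial]:
            self.members[location] = factories[location]()

    def report_fault(self, location: str) -> None:
        """Remove a faulty member, then restore the minimum number of replicas."""
        self.members.pop(location, None)
        self._faulty.add(location)
        for candidate, factory in self._factories.items():
            if len(self.members) >= self._minimum:
                break
            if candidate not in self.members and candidate not in self._faulty:
                self.members[candidate] = factory()


factories = {name: object for name in ("a", "b", "c", "d")}
manager = SimpleReplicationManager(factories, initial=3, minimum=2)
initial_members = sorted(manager.members)
manager.report_fault("a")   # two members remain, which still meets the minimum
manager.report_fault("b")   # one member remains, so a replica is created at "d"
final_members = sorted(manager.members)
```

The application only supplies the properties and the factories; choosing where to create a replacement replica is left to the manager.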
Application-Controlled Membership
The application-controlled (MEMB_APP_CTRL) Membership Style allows the application to create the members of an object group
and to direct the Replication Manager to add them to the group, or to direct the Replication Manager to create the members
of an object group and add them to the group. The application is responsible for maintaining the initial and minimum number
of replicas and the locations of the replicas, both initially and after faults. Application-controlled membership is particularly
important for applications whose different hosts have different capabilities, such as communication network applications.
Infrastructure-Controlled Consistency
The infrastructure-controlled (CONS_INF_CTRL) Consistency Style provides Strong Replica Consistency between the states of
the members of an object group. Strong Replica Consistency requires that, even in the presence of faults, as members of an
object group execute a sequence of methods invoked on the object group, the behavior is logically equivalent to that of a
single fault-free object processing the same sequence of method invocations. The Fault Tolerance Infrastructure provides logging,
checkpointing, activation, and recovery mechanisms to achieve Strong Replica Consistency. Strong Replica Consistency is particularly
important for financial applications and safety-critical applications, such as industrial process control and aircraft instrumentation.
Application-Controlled Consistency
The application-controlled (CONS_APP_CTRL) Consistency Style depends on application-specific mechanisms to ensure whatever
consistency is required for the members of an object group. Application-controlled consistency does not depend on the Fault
Tolerance Infrastructure to provide logging, checkpointing or recovery, and does not guarantee Strong Replica Consistency.
Typical applications might include telecommunications applications, and some embedded and real-time applications.
Passive Replication
The COLD_PASSIVE or WARM_PASSIVE Replication Styles require that, during fault-free operation, only one member of the object
group, the primary member, executes the methods invoked on the group. Periodically, the state of the primary member is recorded
in a log, together with the sequence of method invocations. In the presence of a fault, a backup member is promoted to be
the new primary member of the group. The state of the new primary is restored to the state of the old primary by reloading
its state from the log, followed by reapplying request messages recorded in the log. Passive replication is useful when the
cost of executing a method invocation is larger than the cost of transferring a state, and the time for recovery after a fault
is not constrained. Typical examples include enterprise inventory, logistics applications, and hospital record keeping.
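The recovery sequence described above (reload the last recorded state, then reapply the logged requests) can be sketched with a toy object. The classes `Counter` and `Log` and the functions `invoke` and `promote` are assumptions made for this example, and the `get_state` and `set_state` methods are simplified placeholders rather than the signatures of the specification's interfaces.

```python
from typing import List


class Counter:
    """A toy application object whose state is a single integer."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, amount: int) -> int:
        self.value += amount
        return self.value

    def get_state(self) -> int:
        return self.value

    def set_state(self, state: int) -> None:
        self.value = state


class Log:
    """Holds the latest checkpoint and the requests received since then."""

    def __init__(self) -> None:
        self.checkpoint = 0
        self.requests: List[int] = []

    def record_request(self, amount: int) -> None:
        self.requests.append(amount)

    def record_checkpoint(self, state: int) -> None:
        self.checkpoint = state
        self.requests.clear()  # earlier requests are covered by the checkpoint


def invoke(primary: Counter, log: Log, amount: int) -> int:
    """Log the request, then let only the primary execute it."""
    log.record_request(amount)
    return primary.add(amount)


def promote(backup: Counter, log: Log) -> None:
    """Restore the checkpointed state, then replay the logged requests in order."""
    backup.set_state(log.checkpoint)
    for amount in log.requests:
        backup.add(amount)


primary, backup, log = Counter(), Counter(), Log()
invoke(primary, log, 5)
invoke(primary, log, 3)
log.record_checkpoint(primary.get_state())  # checkpoint the state, drop older requests
invoke(primary, log, 4)
invoke(primary, log, 1)
# Suppose the primary now fails: the backup is promoted and rebuilt from the log.
promote(backup, log)
```

After promotion the backup holds the same state the old primary had reached, at the cost of the replay time that passive replication accepts.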
Active Replication
The ACTIVE Replication Style requires that all of the members of an object group execute each invocation independently but
in the same order, so that they maintain exactly the same state and, in the event of a fault in one member, the application
can continue with results from another member without waiting for fault detection and recovery. Even though each of the members
of the object group generates each request and each reply, the Message Handling Mechanism detects and suppresses duplicate
requests and replies, and delivers a single request or reply to the destination object(s). Active replication is useful when
the cost of transferring a state is larger than the cost of executing a method invocation, or when the time available for
recovery after a fault is tightly constrained. Typical examples include enterprise electronic trading applications and safety-critical
applications, such as hospital patient monitoring.
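A minimal sketch of duplicate suppression, assuming every invocation carries an identifier that is the same at all members: each replica executes each invocation, and a filter forwards only the first reply per identifier. `Replica` and `DuplicateSuppressor` are hypothetical names; the specification's Message Handling Mechanism is not defined in these terms.

```python
from typing import List, Optional, Set


class Replica:
    """One member of an actively replicated group."""

    def __init__(self) -> None:
        self.total = 0

    def add(self, amount: int) -> int:
        self.total += amount
        return self.total


class DuplicateSuppressor:
    """Forwards the first reply for each invocation and drops the rest."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()

    def deliver(self, invocation_id: int, reply: int) -> Optional[int]:
        if invocation_id in self._seen:
            return None  # a duplicate from another member: suppress it
        self._seen.add(invocation_id)
        return reply


replicas = [Replica(), Replica(), Replica()]
suppressor = DuplicateSuppressor()
delivered: List[int] = []

# Every replica executes every invocation, in the same order.
for invocation_id, amount in enumerate([5, 3]):
    for replica in replicas:
        reply = suppressor.deliver(invocation_id, replica.add(amount))
        if reply is not None:
            delivered.append(reply)
```

Three replicas produce six replies in total, but the destination sees one reply per invocation, and all replicas end in the same state.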
Fault Detection and Notification
The Fault Management interfaces allow detection of object crash faults, and provide fault notifications to the entities that
have registered for such notifications. Perfectly accurate fault detection is impossible in an asynchronous distributed
system, because a crashed object cannot be distinguished from a very slow one, but occasional false suspicions cause no harm
in a robust fault-tolerant system. If a host crashes or an object hangs,
the Fault Detectors are required to detect the fault in a timely manner. However, a Fault Detector must not continuously suspect
all members of an object group, unless all of them are indeed faulty. Most fault-tolerant applications will use the Fault
Management interfaces, but they are particularly important for telecommunications, electric power distribution and other safety-critical
applications.
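To make the detection and notification requirements concrete, the sketch below uses heartbeat timeouts with an explicit clock value. Heartbeats are an assumption chosen for brevity, not the monitoring scheme of the specification, and `FaultDetector`, `register`, `heartbeat`, and `check` are hypothetical names. A member that is heard from again is no longer suspected, so a false suspicion is withdrawn rather than held indefinitely.

```python
from typing import Callable, Dict, List, Set


class FaultDetector:
    """Suspects a member when no heartbeat arrives within the timeout."""

    def __init__(self, timeout: float) -> None:
        self._timeout = timeout
        self._last_seen: Dict[str, float] = {}
        self._suspected: Set[str] = set()
        self._consumers: List[Callable[[str], None]] = []

    def register(self, consumer: Callable[[str], None]) -> None:
        """Register an entity that wants fault notifications."""
        self._consumers.append(consumer)

    def heartbeat(self, member: str, now: float) -> None:
        self._last_seen[member] = now
        self._suspected.discard(member)  # withdraw any earlier suspicion

    def check(self, now: float) -> None:
        """Notify consumers once for each member that has newly timed out."""
        for member, seen in self._last_seen.items():
            if now - seen > self._timeout and member not in self._suspected:
                self._suspected.add(member)
                for consumer in self._consumers:
                    consumer(member)


reports: List[str] = []
detector = FaultDetector(timeout=2.0)
detector.register(reports.append)

detector.heartbeat("a", now=0.0)
detector.heartbeat("b", now=0.0)
detector.heartbeat("a", now=2.0)   # "a" stays alive, "b" goes silent
detector.check(now=3.0)            # "b" has been silent for 3.0 > 2.0
detector.check(now=3.5)            # "b" is already suspected: no second report
```

The timeout trades detection latency against the rate of false suspicions, which is why the text above tolerates occasional false suspicions but forbids suspecting every member continuously.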
Logging and Recovery
The Logging and Recovery Mechanisms and Checkpointable and Updateable interfaces allow an application object to record its
state, for use in recovery after a fault or to initialize another replica. Following a fault that damages one or more, but
not all, of the members of an object group, recovery is required to ensure that the continued behavior of the replicated object
after recovery is the same as it would have been in the absence of the fault. A recovering member executes the same requests
in the same order, generates the same replies, invokes the same methods of other objects, and reaches the same internal state,
as if no fault had occurred. If a request is partially executed when a fault occurs, that request is fully executed, at the
same position in the sequence of messages, during recovery. If an object invokes a method of another object and then becomes
faulty, that method invocation must not be duplicated during recovery. Because some objects may be unreplicated, or may be
supported by ORBs that do not provide fault tolerance, or may use different Replication Styles, the recovery of each object
must be self-contained and must not depend on the cooperation of any other object. Applications that employ the infrastructure-controlled
Consistency Style will use these mechanisms and interfaces.