Value types are built from OMG IDL’s value type definitions. Their representation and encoding is defined in this section.
Value types may be used to transmit and encode complex state. The general approach is to support the transmission of the data
(state) and type information encoded as RepositoryIDs.
The loading (and possible transmission) of code is outside of the scope of the GIOP definition, but enough information is
carried to support it, via the CodeBase object.
The format makes a provision for the support of custom marshaling (i.e., the encoding and transmission of state using application-defined
code). Consistency between custom encoders and decoders is not ensured by the protocol.
The encoding supports all of the features of value types, including the “chunking” of value types, and does so in
a compact way.
At a high level the format can be described as the linearization of a graph. The graph is the depth-first exploration of the
transitive closure that starts at the top-level value object and follows its “reference to value objects” fields (an ordinary
remote reference is just written as an IOR). It is a recursive encoding similar to the one used for TypeCodes. An indirection
is used to point to a value that has already been encoded.
The data members are written beginning with the highest possible base type to the most derived type in the order of their
declaration.
2. Accordingly, in cases where encapsulated data holds data with natural alignment of greater than four octets, some processors
may need to copy the octet data before removing it from the encapsulation. For example, an appropriate way to deal with long
long discriminator type in an encapsulation for a union TypeCode is to encode the body of the encapsulation as if it was aligned
at the 8 byte boundary, and then copy the encoded value into the encapsulation. This may result in long long data values
inside the encapsulation being aligned on only a 4 byte boundary when viewed from outside the encapsulation.
15.3.4.1 Partial Type Information and Versioning
The format provides support for partial type information and versioning issues in the receiving context. However, the encoding
has been designed so that this information is only required when “advanced features” such as truncation are used.
The presence (or absence) of type information and codebase URL information is indicated by flags within the <value_tag>, which
is a long in the range between 0x7fffff00 and 0x7fffffff inclusive. The last octet of this tag is interpreted as follows:
• The least significant bit (<value_tag> & 0x00000001) is the value 1 if a <codebase_URL> is present. If this bit is 0, no <codebase_URL> follows in the encoding. The <codebase_URL> is a blank-separated list of one or more URLs.
• The second and third least significant bits (<value_tag> & 0x00000006) are:
  • the value 0 if no type information is present in the encoding. This indicates the actual parameter is the same type as the formal argument.
  • the value 2 if only a single repository id is present in the encoding, which indicates the most derived type of the actual parameter (which may be either the same type as the formal argument or one of its derived types).
  • the value 6 if partial type information is present in the encoding as a list of repository ids.
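The flag layout above can be illustrated with a small decoder. This is a minimal sketch, not part of the protocol definition: the function name `decode_value_tag` and the returned dictionary keys are hypothetical, and the `chunked` entry reflects the 0x00000008 bit that is described in the Fragmentation subsection of this section.

```python
VALUE_TAG_MIN = 0x7FFFFF00
VALUE_TAG_MAX = 0x7FFFFFFF


def decode_value_tag(tag):
    """Split a <value_tag> into its flags (hypothetical helper)."""
    if not VALUE_TAG_MIN <= tag <= VALUE_TAG_MAX:
        raise ValueError(f"not a value tag: {tag:#x}")

    # Second and third least significant bits select the type information.
    type_bits = tag & 0x00000006
    if type_bits == 0:
        type_info = "none"      # actual type equals the formal argument type
    elif type_bits == 2:
        type_info = "single"    # one repository id follows
    elif type_bits == 6:
        type_info = "list"      # a list of repository ids follows
    else:
        # The value 4 is not one of the three values the text defines.
        raise ValueError(f"undefined type information bits: {type_bits}")

    return {
        "codebase_url": bool(tag & 0x00000001),
        "type_info": type_info,
        "chunked": bool(tag & 0x00000008),
    }
```

For example, `decode_value_tag(0x7fffff02)` reports a single repository id, no codebase URL, and no chunking.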
When a list of RepositoryIDs is present, the encoding is a long specifying the number of RepositoryIDs, followed by the RepositoryIDs.
The first RepositoryID is the id for the most derived type of the value. If this type has any base types, the sending context
is responsible for listing the RepositoryIDs for all the base types to which it is safe to truncate the value passed. These
truncatable base types are listed in order, going up the derivation hierarchy. The sending context may choose to (but need
not) terminate the list at any point after it has sent a RepositoryID for a type well-known to the receiving context. A well-known
type is any of the following:
• a type that is a formal parameter, result of the method call, or exception, for which this GIOP message is being marshaled
• a base type of a well-known type
• a member type of a well-known type
• an element type of a well known type
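One way a sending context might assemble the RepositoryID list is sketched below. The function `repository_id_list` and its parameters are hypothetical; the sketch assumes the caller already knows the derivation chain (most derived type first) and, for each step up the chain, whether the derived type was declared truncatable to its base.

```python
def repository_id_list(chain, truncatable, well_known=(), stop_early=False):
    """Build the list of RepositoryIDs to send (illustrative sketch).

    chain       -- ids from the most derived type upward
    truncatable -- truncatable[i] is True if chain[i] may be truncated
                   to chain[i + 1]
    well_known  -- ids the sender believes are well-known to the receiver
    stop_early  -- if True, end the list once a well-known id has been sent
    """
    ids = [chain[0]]  # the most derived type always comes first
    for i in range(len(chain) - 1):
        if stop_early and ids[-1] in well_known:
            break  # optional early termination permitted by the text
        if not truncatable[i]:
            break  # not safe to truncate any further up the hierarchy
        ids.append(chain[i + 1])
    return ids
```

On the wire, the list would be preceded by a long holding `len(ids)`.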
For value types that have an RMI: RepositoryId, ORBs must include at least the most derived RepositoryId, in the value type
encoding.
For value types marshaled as abstract interfaces (see Section 15.3.7, “Abstract Interfaces,” on page 15-30), RepositoryId information must be included in the value type encoding.
If the receiving context needs more typing information than is contained in a GIOP message that contains codebase URL information,
it can go back to the sending context and perform a lookup based on that RepositoryID to retrieve more typing information
(e.g., the type graph).
CORBA RepositoryIDs may contain standard version identification (major and minor version numbers or hash code information).
The ORB run time may use this information to check whether the version of the value being transmitted is compatible with the
version expected. In the event of a version mismatch, the ORB may apply product-specific truncation/conversion rules (with
the help of a local interface repository or the SendingContext::RunTime service). For example, the Java serialization model
of truncation/conversion across versions can be supported. See the JDK 1.1 documentation for a detailed specification of this
model.
15.3.4.2 Example
The following examples demonstrate legal combinations of truncatability, actual parameter types and GIOP encodings. This is
not intended to be an exhaustive list of legal possibilities.
The following example uses valuetypes animal and horse, where horse is derived from animal. The actual parameters passed to
the specified operations are an_animal of runtime type animal and a_horse of runtime type horse.
The following combinations of truncatability, actual parameter types and GIOP encodings are legal.
1. If there is a single operation:
op1(in animal a);
a) If the type horse cannot be truncated to animal (i.e., horse is declared):
valuetype horse: animal ...
then the encoding is as shown below:
Actual Invocation    Legal Encoding
op1(a_horse)         2 horse
                     6 1 horse
Note that if the type horse is not available to the receiver, then the receiver throws a demarshaling exception.
b) If the type horse can be truncated to animal (i.e., horse is declared):
valuetype horse: truncatable animal ...
then the encoding is as shown below:
Actual Invocation    Legal Encoding
op1(a_horse)         6 2 horse animal
Note that if the type horse is not available to the receiver, then the receiver tries to truncate to animal.
c) Regardless of the truncation relationships, when the exact type of the formal argument is sent, then the encoding is as
shown below:
Actual Invocation    Legal Encoding
op1(an_animal)       0
                     2 animal
                     6 1 animal
2. Given the additional operation:
op2(in horse h);
(i.e., the sender knows that both types horse and animal and their derivation relationship are known to the receiver)
a) If the type horse cannot be truncated to animal (i.e., horse is declared):
valuetype horse: animal ...
then the encoding is as shown below:
Actual Invocation    Legal Encoding
op2(a_horse)         2 horse
                     6 1 horse
Note that the demarshaling exception of case 1 will not occur, since horse is available to the receiver.
b) If the type horse can be truncated to animal (i.e., horse is declared):
valuetype horse: truncatable animal ...
then the encoding is as shown below:
Actual Invocation    Legal Encoding
op2(a_horse)         2 horse
                     6 1 horse
                     6 2 horse animal
Note that truncation will not occur, since horse is available to the receiver.
15.3.4.3 Scope of the Indirections
The special value 0xffffffff introduces an indirection (i.e., it directs the decoder to go somewhere else in the marshaling
buffer to find what it is looking for). This can be codebase URL information that has already been encoded, a RepositoryID
that has already been encoded, a list of repository IDs that has already been encoded, or another value object that is shared
in a graph. 0xffffffff is always followed by a long indicating where to go in the buffer. A repositoryID or URL that is
the target of an indirection used for encoding a valuetype must have been introduced as the type or codebase information for
a valuetype.
It is not permissible for a repositoryID marshaled for some purpose other than as the type information of a valuetype to
use indirection to reference a previously marshaled value. The encoding used to indicate an indirection is the same as that
used for recursive TypeCodes (i.e., a 0xffffffff indirection marker followed by a long offset (in units of octets) from the
beginning of the long offset). As an example, this means that an offset of negative four (-4) is illegal, because it is self-indirecting
to its indirection marker. Indirections may refer to any preceding location in the GIOP message, including previous fragments
if fragmentation is used. This includes any previously marshaled parameters. Non-negative offsets are reserved for future
use. Indirections may not cross encapsulation boundaries.
Fragmentation support in GIOP versions 1.1, 1.2, and 1.3 introduces the possibility of a header for a FragmentMessage being
marshaled between the target of an indirection and the start of the encapsulation containing the indirection. The octets occupied
by any such headers are not included in the calculation of the offset value.
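The offset arithmetic can be sketched as follows. This is an illustration under simplifying assumptions: `resolve_indirection` is a hypothetical helper, the buffer is taken to be one contiguous message body from which any FragmentMessage headers have already been removed, and big-endian byte order is the default.

```python
import struct

INDIRECTION_TAG = 0xFFFFFFFF


def resolve_indirection(buf, pos, byte_order=">"):
    """Return the absolute position that the indirection at `pos` targets."""
    (tag,) = struct.unpack_from(byte_order + "L", buf, pos)
    if tag != INDIRECTION_TAG:
        raise ValueError("no indirection tag at this position")

    # The offset is measured from the beginning of the long that holds it.
    offset_pos = pos + 4
    (offset,) = struct.unpack_from(byte_order + "l", buf, offset_pos)

    if offset >= 0:
        raise ValueError("non-negative offsets are reserved for future use")
    if offset == -4:
        raise ValueError("offset -4 points back at the indirection marker")

    target = offset_pos + offset
    if target < 0:
        raise ValueError("indirection points before the start of the buffer")
    return target
```

With eight octets of earlier data followed by an indirection whose offset is -12, the offset long sits at position 12, so the target is position 0.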
15.3.4.4 Null Values
All value types have a distinguished “null.” All null values are encoded by the <null_tag> (0x0). The CDR encoding of null
values includes no type information.
15.3.4.5 Other Encoding Information
A “new” value is coded as a value header followed by the value’s state. The header contains a tag and codebase URL information
if appropriate, followed by the RepositoryID and an octet flag of bits. Because the same RepositoryID (and codebase URL information)
could be repeated many times in a single request when sending a complex graph, they are encoded as a regular string the first
time they appear, and use an indirection for later occurrences.
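The "string first, indirection afterwards" rule might be implemented along these lines. The class `RepoIdWriter` is hypothetical, and the sketch makes several assumptions: big-endian byte order, a buffer whose first octet is the alignment origin, ASCII repository IDs, and an indirection that targets the start of the earlier string encoding (its length field). The repository ID used in the usage note is likewise only an illustration.

```python
import struct


class RepoIdWriter:
    """Writes repository IDs, reusing earlier occurrences via indirection."""

    def __init__(self):
        self.buf = bytearray()
        self.seen = {}  # repository ID -> position where its string starts

    def _align(self, n):
        # Pad with zero octets up to the next multiple of n.
        self.buf.extend(b"\x00" * (-len(self.buf) % n))

    def write_repository_id(self, repo_id):
        self._align(4)
        if repo_id in self.seen:
            # Later occurrence: indirection tag followed by a negative
            # offset measured from the start of the offset long.
            self.buf += struct.pack(">L", 0xFFFFFFFF)
            offset_pos = len(self.buf)
            self.buf += struct.pack(">l", self.seen[repo_id] - offset_pos)
        else:
            # First occurrence: a regular CDR string (length including
            # the terminating NUL, then the characters and the NUL).
            self.seen[repo_id] = len(self.buf)
            data = repo_id.encode("ascii") + b"\x00"
            self.buf += struct.pack(">L", len(data)) + data
```

Writing `"IDL:horse:1.0"` twice produces an 18-octet string encoding, two padding octets, and then an 8-octet indirection whose offset is -24.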
15.3.4.6 Fragmentation
It is anticipated that value types may be rather large, particularly when a graph is being transmitted. Hence the encoding
supports the breaking up of the serialization into an arbitrary number of chunks in order to facilitate incremental processing.
Values with truncatable base types need a length indication in case the receiver needs to truncate them to a base type. Value
types that are custom marshaled also need a length indication so that the ORB run time can know exactly where they end in
the stream without relying on user-defined code. This allows the ORB to maintain consistency and ensure the integrity of the
GIOP stream when the user-written custom marshaling and demarshaling does not marshal the entire value state. For simplicity
of encoding, we use a length indication for all values whether or not they have a truncatable base type or use custom marshaling.
If limited space is available for marshaling, it may be necessary for the ORB to send the contents of a marshaling buffer
containing a partially marshaled value as a GIOP fragment. At that point in the marshaling, the length of the entire value
being marshaled may not be known. Calculating this length may require processing as costly as marshaling the entire value.
It is therefore desirable to allow the value to be encoded as multiple chunks, each with its own length. This allows the portion
of a value that occupies a marshaling buffer to be sent as a chunk of known length with no need for additional length calculation
processing.
The data may be split into multiple chunks at arbitrary points except within primitive CDR types, arrays of primitive types,
strings, and wstrings, or between the tag and offset of indirections. It is never necessary to end a chunk within one of these
types as the length of these types is known before starting to marshal them so they can be added to the length of the currently
open chunk. It is the responsibility of the CDR stream to hide the chunking from the marshaling code.
The presence (or absence) of chunking is indicated by flags within the <value_tag>. The fourth least significant bit (<value_tag>
& 0x00000008) is the value 1 if a chunked encoding is used for the value’s state. The chunked encoding is required for custom
marshaling and truncation. If this bit is 0, the state is encoded as <octets>.
Each chunk is preceded by a positive long, which specifies the number of octets in the chunk.
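A writer that honors these constraints could look roughly like the sketch below. Both function names are hypothetical. The sketch assumes each element of `items` is the already-encoded form of one indivisible unit (a primitive, a string, an array of primitives, and so on), ignores alignment padding, and uses a fixed `max_chunk` budget as a stand-in for limited marshaling buffer space. An actual CDR stream could instead simply grow the open chunk.

```python
import struct


def split_into_chunks(items, max_chunk):
    """Group indivisible encoded items into chunks of at most max_chunk octets."""
    chunks, current = [], b""
    for item in items:
        if len(item) > max_chunk:
            raise ValueError("indivisible item does not fit in one chunk")
        if current and len(current) + len(item) > max_chunk:
            chunks.append(current)  # close the chunk between items only
            current = b""
        current += item
    if current:
        chunks.append(current)
    return chunks


def encode_chunks(items, max_chunk):
    """Prefix every chunk with a positive long giving its octet count."""
    out = bytearray()
    for chunk in split_into_chunks(items, max_chunk):
        out += struct.pack(">l", len(chunk)) + chunk
    return bytes(out)
```

Three 4-octet longs with `max_chunk=8` yield one 8-octet chunk and one 4-octet chunk, each preceded by its length.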
A chunked value is terminated by an end tag that is a non-positive long so the start of the next value can be differentiated
from the start of another chunk. In the case of values that contain other values (e.g., a linked list) the “nested” value
is started without there being an end tag. The absolute value of an end tag (when it finally appears) indicates the nesting
level of the value being terminated. A single end tag can be used to terminate multiple nested values. The detailed rules
are as follows:
• The end tag is a negative long whose value is the negation of the absolute nesting depth of the value type ending at this point in the CDR stream. Any value types that have not already been ended and whose nesting depth is greater than the depth indicated by the end tag are also implicitly ended. The end tag value 0 is reserved for future use (e.g., supporting a nesting depth of more than 2^31). The outermost value type will always be terminated by an end tag with a value of -1. Enclosing non-chunked valuetypes are not considered when determining the nesting depth.
• End tags, chunk size tags, and value tags are encoded using non-overlapping ranges so that the unmarshaling code can tell after reading each chunk whether:
  • another chunk follows (positive tag).
  • one or multiple value types are ending at a given point in the stream (negative tag).
  • a nested value follows (special large positive tag).
The following example describes how end tags may be used. Consider a valuetype declaration that contains two member values:
// IDL
valuetype simpleNode { ... };
valuetype node : truncatable simpleNode {
    public node node1;
    public node node2;
};
When an instance of type ‘node’ is passed as a parameter of type ‘simpleNode’ a chunked encoding is used. In all cases, the
outermost value is terminated with an end tag with a value of -1. The nested value ‘node1’ is terminated with an end tag with
a value of -2 since only the second-level value ‘node1’ ends at that point. Since the nested value ‘node2’ coterminates with
the outermost value, either of the following end tag layouts is legal:
• A single end tag with a value of -1 marks the termination of the outermost value, implying the termination of the nested value, ‘node2’ as well. This is the most compact marshaling.
• An end tag with a value of -2 marks the termination of the nested value, ‘node2.’ This is then followed by an end tag with a value of -1 to mark the termination of the outermost value.
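The effect of an end tag on the set of open values can be modeled with a depth counter. This is only a sketch: `close_values` is a hypothetical helper, and `open_depth` is assumed to be the nesting depth of the innermost chunked value that is still open (1 for the outermost value).

```python
def close_values(open_depth, end_tag):
    """Return (depths closed by end_tag, depth still open afterwards)."""
    if end_tag >= 0:
        raise ValueError("end tags are negative")
    depth = -end_tag
    if depth > open_depth:
        raise ValueError("end tag refers to a value that is not open")
    # The value at `depth` ends, and so does every deeper open value.
    closed = list(range(open_depth, depth - 1, -1))
    return closed, depth - 1
```

Applied to the ‘node’ example: after ‘node1’, `close_values(2, -2)` closes only depth 2 and leaves the outer value open. After ‘node2’, either `close_values(2, -1)` closes depths 2 and 1 at once, or `close_values(2, -2)` followed by `close_values(1, -1)` closes them one at a time.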
Because data members are encoded in their declaration order, declaring a value type data member of a value type last is likely
to result in more compact encoding on the wire because it maximizes the number of values ending at the same place and so allows
a single end tag to be used for multiple values. The canonical example for that is a linked list.
• For the purposes of chunking, values encoded as indirections or null are treated as non-value data.
• Chunks are never nested. When a value is nested within another value, the outer value’s chunk ends at the place in the stream where the inner value starts. If the outer value has non-value data to be marshaled following the inner value, the end tag for the inner value is followed by a continuation chunk for the remainder of the outer value.
• Regardless of the above rules, any value nested within a chunked value is always chunked. Furthermore, any such nested value that is truncatable must encode its type information as a list of RepositoryIDs (see Section 15.3.4.1, “Partial Type Information and Versioning,” on page 15-16).
Truncating a value type in the receiving context may require keeping track of unused nested values (only during unmarshaling)
in case further indirection tags point back to them. These values can be held in their “raw” GIOP form, as fully unmarshaled
value objects, or in any other product-specific form.
Value types that are custom marshaled are encoded as chunks in order to let the ORB run-time know exactly where they end in
the stream without relying on user-defined code.
15.3.4.7 Notation
The on-the-wire format is described by a BNF grammar with conventions similar to the ones used to define IDL syntax. The terminals
of the grammar are to be interpreted differently. We are describing a protocol format. Although the terminals have the same
names as IDL tokens they represent either:
• constant tags, or
• the GIOP CDR encoding of the corresponding IDL construct.
For example, long is a shorthand for the GIOP encoding of the IDL long data type with all the GIOP alignment rules. Similarly
struct is a shorthand for the GIOP CDR encoding of a struct.
A (type) constant means that an instance of the given type having the given value is encoded according to the rules for that
type. So that (long) 0 means that a CDR encoding for a long having the value 0 appears at that location.
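As an illustration of what a terminal such as (long) 0 stands for, the sketch below appends a CDR long to a buffer. The helper `write_long` is hypothetical; it assumes that the first octet of `buf` is the alignment origin and that big-endian byte order is the default.

```python
import struct


def write_long(buf, value, byte_order=">"):
    """Append a 4-octet signed long to bytearray `buf`, aligned on 4 octets."""
    buf.extend(b"\x00" * (-len(buf) % 4))  # alignment padding
    buf.extend(struct.pack(byte_order + "l", value))
```

Calling `write_long(buf, 0)` on a buffer that already holds one octet adds three padding octets and then the four zero octets of the long.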
15.3.4.8 The Format
(1) <value> ::= <value_tag> [ <codebase_URL> ]
                [ <type_info> ] <state> | <value_ref>
(2) <value_ref> ::= <indirection_tag> <indirection> | <null_tag>
(3) <value_tag> ::= long // 0x7fffff00 <= value_tag <= 0x7fffffff
(4) <type_info> ::= <rep_ids> | <repository_id>
(5) <state> ::= <octets> | <value_data>* [ <end_tag> ]
(6) <value_data> ::= <value_chunk> | <value>
(7) <rep_ids> ::= long <repository_id>+ | <indirection_tag> <indirection>
(8) <repository_id> ::= string | <indirection_tag> <indirection>
(9) <value_chunk> ::= <chunk_size_tag> <octets>
(10) <null_tag> ::= (long) 0
(11) <indirection_tag> ::= (long) 0xffffffff
(12) <codebase_URL> ::= string | <indirection_tag> <indirection>
(13) <chunk_size_tag> ::= long // 0 < chunk_size_tag < 2^31-256 (0x7fffff00)
(14) <end_tag> ::= long // -2^31 < end_tag < 0
(15) <indirection> ::= long // -2^31 < indirection < 0
(16) <octets> ::= octet | octet <octets>
The concatenated octets of consecutive value chunks within a value encode state members for the value according to the following
grammar:
(1) <state members> ::= <state_member>
                      | <state_member> <state members>
(2) <state_member> ::= <value_ref>
                     // All legal IDL types should be here
                     | octet
                     | boolean
                     | char
                     | short
                     | unsigned short
                     | long
                     | unsigned long
                     | float
                     | wchar
                     | wstring
                     | string
                     | struct
                     | union
                     | sequence
                     | array
                     | Object
                     | any
                     | long long
                     | unsigned long long
                     | double
                     | long double
                     | fixed