Primitive data types are specified for both big-endian and little-endian orderings. The
message formats (see Section 15.4, “GIOP Message Formats,” on page 15-30) include
tags in message headers that indicate the byte ordering in the message. Encapsulations include an initial flag that indicates
the byte ordering within the encapsulation,
described in Section 15.3.3, “Encapsulation,” on page 15-14. The byte ordering of any
encapsulation may be different from the message or encapsulation within which it is nested. It is the responsibility of the
message recipient to translate byte ordering if necessary. Primitive data types are encoded in multiples of octets. An octet
is an 8-bit value.
15.3.1.1 Alignment
In order to allow primitive data to be moved into and out of octet streams with instructions specifically designed for those
primitive data types, in CDR all primitive data types must be aligned on their natural boundaries (i.e., the alignment boundary
of a primitive datum is equal to the size of the datum in octets). Any primitive of size n octets must start at an octet stream
index that is a multiple of n. In CDR, n is one of 1, 2, 4, or 8.
Where necessary, an alignment gap precedes the representation of a primitive datum. The value of octets in alignment gaps
is undefined. A gap must be the minimum size
necessary to align the following primitive. Table 15-1 gives alignment boundaries for
CDR/OMG-IDL primitive types.
Table 15-1 Alignment requirements for OMG IDL primitive data types
TYPE | OCTET ALIGNMENT
char | 1
wchar | 1, 2, or 4 for GIOP 1.1; 1 for GIOP 1.2 and 1.3
octet | 1
short | 2
unsigned short | 2
long | 4
unsigned long | 4
long long | 8
unsigned long long | 8
float | 4
double | 8
long double | 8
boolean | 1
enum | 4
Alignment is defined above as being relative to the beginning of an octet stream. The first octet of the stream is octet index
zero (0); any data type may be stored starting at this index. Such octet streams begin at the start of a GIOP message header
(see Section 15.4.1, “GIOP Message Header,” on page 15-31) and at the beginning of an
encapsulation, even if the encapsulation itself is nested in another encapsulation (see
Section 15.3.3, “Encapsulation,” on page 15-14).
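The alignment rule can be illustrated with a minimal Python sketch. It assumes the octet stream starts at index 0 (the start of a GIOP message header or of an encapsulation); the names `ALIGNMENT`, `gap_size`, and `append_aligned` are hypothetical helpers, not part of any ORB API. Gap octets are written as zero purely for convenience, since their value is undefined.

```python
import struct

# Alignment boundaries in octets, copied from Table 15-1. wchar is omitted
# because its alignment depends on the GIOP version in use.
ALIGNMENT = {
    "char": 1, "octet": 1, "boolean": 1,
    "short": 2, "unsigned short": 2,
    "long": 4, "unsigned long": 4, "float": 4, "enum": 4,
    "long long": 8, "unsigned long long": 8, "double": 8, "long double": 8,
}

def gap_size(index, boundary):
    """Minimum number of gap octets so that index becomes a multiple of boundary."""
    return (-index) % boundary

def append_aligned(stream, idl_type, encoded):
    """Append an already-encoded primitive to stream (a bytearray), padding first."""
    stream.extend(b"\x00" * gap_size(len(stream), ALIGNMENT[idl_type]))
    stream.extend(encoded)
    return stream

buf = bytearray()
append_aligned(buf, "octet", b"\x01")                  # index 0
append_aligned(buf, "long", struct.pack(">i", 7))      # 3 gap octets, then indices 4-7
append_aligned(buf, "short", struct.pack(">h", -2))    # indices 8-9, no gap needed
append_aligned(buf, "double", struct.pack(">d", 1.0))  # 6 gap octets, then indices 16-23
```

Because the gap is computed from the index relative to the start of the stream, a nested encapsulation would use its own fresh `bytearray` and restart the index at zero.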
15.3.1.2 Integer Data Types
Figure 15-1 on page 15-7 illustrates the representations for OMG IDL integer data
types, including the following data types:
• short
• unsigned short
• long
• unsigned long
• long long
• unsigned long long
The figure illustrates bit ordering and size. Signed types (short, long, and long long) are represented as two’s complement
numbers; unsigned versions of these types are represented as unsigned binary numbers.
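As a rough illustration of these representations, the following sketch uses Python's standard `struct` module to produce both byte orderings. The `FORMATS` table and the function names are assumptions made for this example only; alignment gaps are not handled here.

```python
import struct

# struct format characters for the CDR integer types. With an explicit
# "<" or ">" prefix, struct uses fixed sizes of 2, 4, and 8 octets.
FORMATS = {
    "short": "h", "unsigned short": "H",
    "long": "i", "unsigned long": "I",
    "long long": "q", "unsigned long long": "Q",
}

def encode_integer(idl_type, value, little_endian=False):
    """Encode value as two's complement (signed) or unsigned binary octets."""
    prefix = "<" if little_endian else ">"
    return struct.pack(prefix + FORMATS[idl_type], value)

def decode_integer(idl_type, octets, little_endian=False):
    """Inverse of encode_integer for the same type and byte ordering."""
    prefix = "<" if little_endian else ">"
    return struct.unpack(prefix + FORMATS[idl_type], octets)[0]

big = encode_integer("short", -2)                         # b"\xff\xfe"
little = encode_integer("short", -2, little_endian=True)  # b"\xfe\xff"
```

The same two octets, `FF FE`, decode to -2 as a big-endian `short` and to 65534 as a big-endian `unsigned short`, which is the two's complement relationship described above.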
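[Figure: octet layouts of octet (index 0), short (indices 0-1), long (indices 0-3), and long long (indices 0-7), shown for the Big-Endian and Little-Endian encodings with MSB and LSB marked. In the big-endian encoding octet 0 holds the most significant bits; in the little-endian encoding octet 0 holds the least significant bits.]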
Figure 15-1 Sizes and bit ordering in big-endian and little-endian encodings of OMG IDL integer data types, both signed and
unsigned.
15.3.1.3 Floating Point Data Types
Figure 15-2 on page 15-9 illustrates the representation of floating point numbers.
These exactly follow the IEEE standard formats for floating point numbers (see footnote 1), selected parts of which are abstracted here for
explanatory purposes. The diagram shows three different components of floating point numbers: the sign bit (s), the exponent
(e), and the fractional part (f) of the mantissa. The sign bit has values of 0 or 1, representing positive and negative numbers,
respectively.
1. “IEEE Standard for Binary Floating-Point Arithmetic,” ANSI/IEEE Standard 754-1985, Institute of Electrical and Electronics
Engineers, August 1985.
For single-precision float values the exponent is 8 bits long, comprising e1 and e2 in the figure, where the 7 bits in e1
are most significant. The exponent is represented as excess 127. The fractional mantissa (f1 through f3) is a 23-bit value f,
f1 being most significant and f3 being least significant; the full mantissa 1 + f satisfies 1.0 <= 1 + f < 2.0. The value of a
normalized number is described by:
$(-1)^{\text{sign}} \times 2^{(\text{exponent} - 127)} \times (1 + \text{fraction})$
For double-precision values the exponent is 11 bits long, comprising e1 and e2 in the figure, where the 7 bits in e1 are most
significant. The exponent is represented as excess 1023. The fractional mantissa (f1 through f7) is a 52-bit value f, f1 being
most significant and f7 being least significant; the full mantissa 1 + f satisfies 1.0 <= 1 + f < 2.0. The value of a normalized
number is described by:
$(-1)^{\text{sign}} \times 2^{(\text{exponent} - 1023)} \times (1 + \text{fraction})$
For double-extended floating-point values the exponent is 15 bits long, comprising e1 and e2 in the figure, where the 7 bits
in e1 are the most significant. The exponent is represented as excess 16383. The fractional mantissa (f1 through f14) is 112 bits
long, with f1 being the most significant. The value of a long double is determined by:
$(-1)^{\text{sign}} \times 2^{(\text{exponent} - 16383)} \times (1 + \text{fraction})$
Octet layout of the floating point types (octet index: fields):
Big-Endian
float | 0: s, e1 | 1: e2, f1 | 2: f2 | 3: f3
double | 0: s, e1 | 1: e2, f1 | 2 through 7: f2 through f7
long double | 0: s, e1 | 1: e2 | 2 through 15: f1 through f14
Little-Endian
float | 0: f3 | 1: f2 | 2: e2, f1 | 3: s, e1
double | 0 through 5: f7 down to f2 | 6: e2, f1 | 7: s, e1
long double | 0 through 13: f14 down to f1 | 14: e2 | 15: s, e1
Figure 15-2 Sizes and bit ordering in big-endian and little-endian representations of OMG IDL single, double precision, and
double extended floating point numbers.
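The normalized-value formulas in this section can be checked against a real encoding with a short Python sketch. The function name `decode_normalized` and its parameters are hypothetical. The sketch covers only normalized numbers, so zero, denormalized values, infinities, and NaNs are deliberately not handled, and the 128-bit long double is left out because `struct` has no portable format for it.

```python
import struct

def decode_normalized(octets, exponent_bits, fraction_bits, little_endian=False):
    """Evaluate (-1)^sign * 2^(exponent - bias) * (1 + fraction).

    Valid for normalized values only. The bias is 127 for an 8-bit exponent
    and 1023 for an 11-bit exponent.
    """
    raw = int.from_bytes(octets, "little" if little_endian else "big")
    bias = (1 << (exponent_bits - 1)) - 1
    sign = raw >> (exponent_bits + fraction_bits)
    exponent = (raw >> fraction_bits) & ((1 << exponent_bits) - 1)
    fraction = (raw & ((1 << fraction_bits) - 1)) / (1 << fraction_bits)
    return (-1) ** sign * 2.0 ** (exponent - bias) * (1 + fraction)

# Single precision, big-endian: 8 exponent bits, 23 fraction bits.
single = decode_normalized(struct.pack(">f", -6.5), 8, 23)

# Double precision, little-endian: 11 exponent bits, 52 fraction bits.
double = decode_normalized(struct.pack("<d", 0.15625), 11, 52, little_endian=True)
```

Here -6.5 is stored with sign 1, exponent 129, and fraction 0.625, so the formula gives -1 × 2^2 × 1.625.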
15.3.1.4 Octet
Octets are uninterpreted 8-bit values whose contents are guaranteed not to undergo any conversion during transmission. For
the purposes of describing possible octet values in this specification, octets may be considered as unsigned 8-bit integer
values.
15.3.1.5 Boolean
Boolean values are encoded as single octets, where TRUE is the value 1 and FALSE is the value 0.
15.3.1.6 Character Types
An IDL character is represented as a single octet; the code set used for transmission of character data (e.g., TCS-C) between
a particular client ORB and server ORB is determined via the process described in Section 13.10, “Code Set Conversion,” on
page 13-37. In the case of multi-byte encodings of characters, a single instance of the char type may only hold one octet of
any multi-byte character encoding.
Note – Full representation of multi-byte characters will require the use of an array of IDL char variables.
For GIOP version 1.1, the transfer syntax for an IDL wide character depends on whether the transmission code set (TCS-W, which
is determined via the process described in Section 13.10, “Code Set Conversion,” on page 13-37) is byte-oriented or
non-byte-oriented:
• Byte-oriented (e.g., SJIS). Each wide character is represented as one or more octets, as defined by the selected TCS-W.
• Non-byte-oriented (e.g., Unicode UTF-16). Each wide character is represented as one or more codepoints. A codepoint is the same as “Coded-Character data element,” or “CC data element” in ISO terminology. Each codepoint is encoded using a fixed number of bits as determined by the selected TCS-W. The OSF Character and Code Set Registry may be examined using the interfaces in Section 13.10.5, “Relevant OSFM Registry Interfaces,” on page 13-50 to determine the maximum length (max_bytes) of any character codepoint. For example, if the TCS-W is ISO 10646 UCS-2 (Universal Character Set containing 2 bytes), then wide characters are represented as unsigned shorts. For ISO 10646 UCS-4, they are represented as unsigned longs.
For GIOP versions 1.2 and 1.3, wchar is encoded as an unsigned binary octet value, followed by the elements of the octet sequence
representing the encoded value of the wchar. The initial octet contains a count of the number of elements in the sequence,
and the elements of the sequence of octets represent the wchar, using the negotiated wide character encoding.
Note – The GIOP 1.2 and 1.3 encoding of wchar is similar to the encoding of an octet sequence, except for its use of a single
octet to encode the value of the length.
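A minimal Python sketch of the GIOP 1.2 and 1.3 wchar layout follows. The function names are hypothetical, and the `codec` argument is an assumption of this example: it uses a Python codec name as a stand-in for whatever wide character encoding the two ORBs actually negotiated.

```python
def encode_wchar_12(ch, codec="utf-16-be"):
    """Encode one wchar as a count octet followed by that many encoded octets."""
    encoded = ch.encode(codec)
    if len(encoded) > 255:
        raise ValueError("encoded wchar does not fit a single-octet count")
    return bytes([len(encoded)]) + encoded

def decode_wchar_12(stream, offset=0, codec="utf-16-be"):
    """Return (character, next_offset) for a wchar that starts at offset."""
    count = stream[offset]  # the initial octet is the element count
    start = offset + 1
    return stream[start:start + count].decode(codec), start + count

wire = encode_wchar_12("A")             # b"\x02\x00A": count 2, then 00 41
ch, next_offset = decode_wchar_12(wire)
```

Since the count is a single octet with alignment 1, no alignment gap is needed before a wchar in these GIOP versions, which matches the wchar row of Table 15-1.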
For GIOP versions prior to 1.2, interoperability for wchar is limited to the use of two-octet fixed-length encoding.
wchar values in encapsulations are assumed to be encoded using GIOP version 1.2 and 1.3 CDR.
If UTF-16 is selected as the TCS-W, the CDR encoding can be big-endian or little-endian, but defaults to big-endian.
By placing a BOM (byte order marker) at the front of the wstring or wchar encoding, it can be sent either big-endian or little-endian.
In particular, the CDR rules for endian-ness of UTF-16 encoded wstring or wchar values are as follows:
• If the first two bytes (after the length indication) are FE FF, it’s big-endian.
• If the first two bytes (after the length indication) are FF FE, it’s little-endian.
• If the first two bytes (after the length indication) are neither, it’s big-endian.
If an ORB decides to use a BOM to indicate endianness, it shall add the BOM to the beginning of wchar or wstring values when
encoding the value, since it is not present in wchar or wstring values passed by the user.
If a BOM is present at the beginning of a wchar or wstring received in a GIOP message, the ORB shall remove the BOM before
passing the value to the user.
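The three BOM rules and the add/remove requirements can be sketched in Python as follows. Both function names are hypothetical, and the sketch assumes the length indication has already been stripped from (or will be prepended to) the payload by the caller.

```python
BOM_BIG = b"\xfe\xff"
BOM_LITTLE = b"\xff\xfe"

def decode_utf16_payload(payload):
    """Decode UTF-16 wchar or wstring octets, removing any leading BOM."""
    if payload[:2] == BOM_BIG:
        return payload[2:].decode("utf-16-be")
    if payload[:2] == BOM_LITTLE:
        return payload[2:].decode("utf-16-le")
    # Neither marker present: big-endian is the default.
    return payload.decode("utf-16-be")

def encode_utf16_payload(text, little_endian=False, with_bom=True):
    """Encode text as UTF-16, adding the BOM that the user's value lacks."""
    if little_endian:
        # Little-endian data can only be signalled by a BOM, so it is always added.
        return BOM_LITTLE + text.encode("utf-16-le")
    return (BOM_BIG if with_bom else b"") + text.encode("utf-16-be")

sent = encode_utf16_payload("Hi", little_endian=True)
received = decode_utf16_payload(sent)
```

The explicit codec names `utf-16-be` and `utf-16-le` are used instead of Python's generic `utf-16` codec so that the BOM handling stays visible and follows the rules above rather than the codec's own defaults.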
If a client ORB erroneously sends wchar or wstring data in a GIOP 1.0 message, the server shall generate a MARSHAL standard
system exception, with standard minor code 5.
If a server erroneously sends wchar data in a GIOP 1.0 response, the client ORB shall raise a MARSHAL exception to the client
application with standard minor code 6.