15.3.1 Primitive Types

Primitive data types are specified for both big-endian and little-endian orderings. The message formats (see Section 15.4, “GIOP Message Formats,? on page 15-30) include tags in message headers that indicate the byte ordering in the message. Encapsulations include an initial flag that indicates the byte ordering within the encapsulation, described in Section 15.3.3, “Encapsulation,? on page 15-14. The byte ordering of any encapsulation may be different from the message or encapsulation within which it is nested. It is the responsibility of the message recipient to translate byte ordering if necessary. Primitive data types are encoded in multiples of octets. An octet is an 8-bit value.

15.3.1.1 Alignment

In order to allow primitive data to be moved into and out of octet streams with instructions specifically designed for those primitive data types, in CDR all primitive data types must be aligned on their natural boundaries (i.e., the alignment boundary of a primitive datum is equal to the size of the datum in octets). Any primitive of size n octets must start at an octet stream index that is a multiple of n. In CDR, n is one of 1, 2, 4, or 8.

Where necessary, an alignment gap precedes the representation of a primitive datum. The value of octets in alignment gaps is undefined. A gap must be the minimum size necessary to align the following primitive. Table 15-1 gives alignment boundaries for CDR/OMG-IDL primitive types.

Table 15-1 Alignment requirements for OMG IDL primitive data types


TYPE	OCTET
	ALIGNMENT
char	1

Table 15-1 Alignment requirements for OMG IDL primitive data types


TYPE	OCTET
	ALIGNMENT
wchar	1, 2 or 4 for GIOP 1.1 \|
	1 for GIOP 1.2 and 1.3
octet	1
short	2
unsigned short	2
long	4
unsigned long	4
long long	8
unsigned long long	8
float	4
double	8
long double	8
boolean	1
enum	4

Alignment is defined above as being relative to the beginning of an octet stream. The first octet of the stream is octet index zero (0); any data type may be stored starting at this index. Such octet streams begin at the start of a GIOP message header (see Section 15.4.1, “GIOP Message Header,? on page 15-31) and at the beginning of an encapsulation, even if the encapsulation itself is nested in another encapsulation. (See Section 15.3.3, “Encapsulation,? on page 15-14).

15.3.1.2 Integer Data Types

Figure 15-1 on page 15-7 illustrates the representations for OMG IDL integer data types, including the following data types:

• short

• unsigned short

• long

• unsigned long

• long long

• unsigned long long

The figure illustrates bit ordering and size. Signed types (short, long, and long long) are represented as two’s complement numbers; unsigned versions of these types are represented as unsigned binary numbers.

Big-Endian octet short


MSB	LSB	0 1

0 long

1 2 3

0 1 2 3long long 4 5 6 7

Little-Endian

MSB	LSB	0 1 octet

0 1 2 3

0 1 2 3 4 5 6 7

Figure 15-1 Sizes and bit ordering in big-endian and little-endian encodings of OMG IDL integer data types, both signed and unsigned.

15.3.1.3 Floating Point Data Types

Figure 15-2 on page 15-9 illustrates the representation of floating point numbers. These exactly follow the IEEE standard formats for floating point numbers1, selected parts of which are abstracted here for explanatory purposes. The diagram shows three different components for floating points numbers, the sign bit (s), the exponent (e) and the fractional part (f) of the mantissa. The sign bit has values of 0 or 1, representing positive and negative numbers, respectively.

1. “IEEE Standard for Binary Floating-Point Arithmetic,? ANSI/IEEE Standard 754-1985, Institute of Electrical and Electronics Engineers, August 1985.

For single-precision float values the exponent is 8 bits long, comprising e1 and e2 in the figure, where the 7 bits in e1 are most significant. The exponent is represented as excess 127. The fractional mantissa (f1 - f3) is a 23-bit value f where 1.0 <= f < 2.0, f1 being most significant and f3 being least significant. The value of a normalized number is described by:

–1sign ×2(exponent – 127 )×(1+ fraction )

For double-precision values the exponent is 11 bits long, comprising e1 and e2 in the figure, where the 7 bits in e1 are most significant. The exponent is represented as excess 1023. The fractional mantissa (f1 - f7) is a 52-bit value m where 1.0 <= m < 2.0, f1 being most significant and f7 being least significant. The value of a normalized number is described by:

–1sign ×2(exponent – 1023 )×(1+ fraction )

For double-extended floating-point values the exponent is 15 bits long, comprising e1 and e2 in the figure, where the 7 bits in e1 are the most significant. The fractional mantissa (f1 through f14) is 112 bits long, with f1 being the most significant. The value of a long double is determined by:

–1sign ×2(exponent – 16383 )×(1+ fraction )

float

double

long double

Big-Endian 0 1 2 3


s	e1
e2	f1
	f2
	f3

s		e1

	e2		f1
		f2
		f3
		f4
		f5
		f6
		f7


s	e1
	e2
	f1
	f2
	f3
	f4
	f5
	f6
	f7
	f8
	f9
	f10
	f11
	f12
	f13
	f14

0 1 2 3 4 5 6 7

9 10 11 12 13 14 15

Little-Endian

	f3

	f2
e2	f1
s	e1

		f7

		f6
		f5
		f4
		f3
		f2
	e2		f1
s		e1

	f14

	f13
	f12
	f11
	f10
	f9
	f8
	f7
	f6
	f5
	f4
	f3
	f2
	f1
	e2
s	e1

0 1 2 3

0 1 2 3 4 5 6 7

9 10 11 12 13 14 15

Figure 15-2 Sizes and bit ordering in big-endian and little-endian representations of OMG IDL single, double precision, and double extended floating point numbers.

15.3.1.4 Octet

Octets are uninterpreted 8-bit values whose contents are guaranteed not to undergo any conversion during transmission. For the purposes of describing possible octet values in this specification, octets may be considered as unsigned 8-bit integer values.

15.3.1.5 Boolean

Boolean values are encoded as single octets, where TRUE is the value 1, and FALSE as 0.

15.3.1.6 Character Types

An IDL character is represented as a single octet; the code set used for transmission of character data (e.g., TCS-C) between a particular client and server ORBs is determined via the process described in Section 13.10, “Code Set Conversion,? on page 13-37. In the case of multi-byte encodings of characters, a single instance of the char type may only hold one octet of any multi-byte character encoding.

Note – Full representation of multi-byte characters will require the use of an array of IDL char variables.

For GIOP version 1.1, the transfer syntax for an IDL wide character depends on whether the transmission code set (TCS-W, which is determined via the process described in Section 13.10, “Code Set Conversion,? on page 13-37) is byte-oriented or non-byte-oriented:

• Byte-oriented (e.g., SJIS). Each wide character is represented as one or more octets, as defined by the selected TCS-W.

• Non-byte-oriented (e.g., Unicode UTF-16). Each wide character is represented as one or more codepoints. A codepoint is the same as “Coded-Character data element,? or “CC data element? in ISO terminology. Each codepoint is encoded using a fixed number of bits as determined by the selected TCS-W. The OSF Character and Code Set Registry may be examined using the interfaces in Section 13.10.5, “Relevant OSFM Registry Interfaces,? on page 13-50 to determine the maximum length (max_bytes) of any character codepoint. For example, if the TCS-W is ISO 10646 UCS-2 (Universal Character Set containing 2 bytes), then wide characters are represented as unsigned shorts. For ISO 10646 UCS-4, they are represented as unsigned longs.

For GIOP version 1.2, and 1.3 wchar is encoded as an unsigned binary octet value, followed by the elements of the octet sequence representing the encoded value of the wchar. The initial octet contains a count of the number of elements in the sequence, and the elements of the sequence of octets represent the wchar, using the negotiated wide character encoding.

Note – The GIOP 1.2 and 1.3 encoding of wchar is similar to the encoding of an octet sequence, except for its use of a single octet to encode the value of the length.

For GIOP versions prior to 1.2 and 1.3, interoperability for wchar is limited to the use of two- octet fixed-length encoding.

wchar values in encapsulations are assumed to be encoded using GIOP version 1.2 and 1.3 CDR.

If UTF-16 is selected as the TCS-W the CDR encoding purposes can be big endian or little endian, but defaults to big endian. By placing a BOM (byte order marker) at the front of the wstring or wchar encoding, it can be sent either big-endian or little-endian. In particular, the CDR rules for endian-ness of UTF-16 encoded wstring or wchar values are as follows:

• If the first two bytes (after the length indication) are FE FF, it’s big-endian.

• If the first two bytes (after the length indication) are FF FE, it’s little-endian.

• If the first two bytes (after the length indication) are neither, it’s big-endian.

If an ORB decides to use BOM to indicate endianness, it shall add the BOM to the beginning of wchar or wstring values when encoding the value, since it is not present in wchar or wstring values passed by the user.

If a BOM is present at the beginning of a wchar or wstring received in a GIOP message, the ORB shall remove the BOM before passing the value to the user.

If a client orb erroneously sends wchar or wstring data in a GIOP 1.0 message, the server shall generate a MARSHAL standard system exception, with standard minor code 5.

If a server erroneously sends wchar data in a GIOP 1.0 response, the client ORB shall raise a MARSHAL exception to the client application with standard minor code 6.