Certain language environments do not distinguish between byte-oriented and wide characters. In such environments both char
and wchar are mapped to the same “generic” character representation of the language. Likewise, string and wstring are mapped
to generic strings in such environments. Examples of language environments that provide generic character support are Smalltalk
and Ada.
Even in languages that do distinguish between wide and byte-oriented characters (e.g., C and C++), it is possible
to mimic some generic behavior by using suitable macros and support libraries. For example, developers of Windows NT
and Windows 95 applications can write code that is portable between NT (which uses Unicode strings) and Windows 95 (which uses
byte-oriented character strings) by declaring and manipulating characters and character strings through a set of generic macros.
Appendix A in this chapter shows how to map wide and byte-oriented characters to these generic macros.
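The macro-based approach can be sketched in portable C++ as follows. The names GCHAR, GTEXT, gstrlen, gstrcmp, and GENERIC_WIDE are hypothetical, invented for this illustration in the spirit of the Windows <tchar.h> header (which provides TCHAR, _T, and _tcslen for the same purpose). The source is written once against the generic names, and the character width is fixed when the code is compiled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical generic-character macros (not a real library).
// Compiling with -DGENERIC_WIDE selects wide characters; otherwise
// byte-oriented characters are used. The choice is made at compile time.
#ifdef GENERIC_WIDE
typedef wchar_t GCHAR;
#define GTEXT(s) L##s          // "abc" becomes L"abc"
#define gstrlen std::wcslen
#define gstrcmp std::wcscmp
#else
typedef char GCHAR;
#define GTEXT(s) s             // "abc" stays "abc"
#define gstrlen std::strlen
#define gstrcmp std::strcmp
#endif

// Written once against the generic names; compiles in either mode.
std::size_t greeting_length() {
    const GCHAR* greeting = GTEXT("hello");
    return gstrlen(greeting);
}

bool is_hello(const GCHAR* s) {
    return gstrcmp(s, GTEXT("hello")) == 0;
}
```

Switching between the byte-oriented and wide builds requires recompiling with a different macro definition, which is the compile-time flexibility the next paragraph contrasts with the ADT approach.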
Another way to achieve generic manipulation of characters and strings is to treat them as abstract data types (ADTs). For
example, if strings are treated as abstract data types and programmers are required to create, destroy, and manipulate
strings only through the operations in the ADT interface, then it becomes possible to write code that is representation-independent.
This approach has an advantage over the macro-based approach in that it provides portability between byte-oriented and wide
character environments even without recompilation (at runtime the string function calls are bound to the appropriate byte-oriented
or wide library). In other words, macro-based genericity gives compile-time flexibility, while ADT-based genericity gives
runtime flexibility.
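A minimal sketch of ADT-based genericity, assuming C++ virtual dispatch as a stand-in for binding the calls to a byte-oriented or wide library at runtime. The class and function names (GString, NarrowGString, WideGString, make_gstring, count_char) are hypothetical. Client code such as count_char uses only the ADT operations, so the same compiled code works with whichever representation the factory selects.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical string ADT: clients see only these operations,
// never the underlying representation.
class GString {
public:
    virtual ~GString() = default;
    virtual std::size_t length() const = 0;
    virtual void append_ascii(const char* s) = 0;  // ASCII input only, for simplicity
    virtual char32_t at(std::size_t i) const = 0;  // code unit widened to char32_t
};

// Byte-oriented representation.
class NarrowGString : public GString {
    std::string data_;
public:
    std::size_t length() const override { return data_.size(); }
    void append_ascii(const char* s) override { data_ += s; }
    char32_t at(std::size_t i) const override {
        return static_cast<unsigned char>(data_.at(i));
    }
};

// Wide representation.
class WideGString : public GString {
    std::wstring data_;
public:
    std::size_t length() const override { return data_.size(); }
    void append_ascii(const char* s) override {
        // Assumes ASCII values are preserved in wchar_t, as on common platforms.
        for (; *s != '\0'; ++s)
            data_.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*s)));
    }
    char32_t at(std::size_t i) const override {
        return static_cast<char32_t>(data_.at(i));
    }
};

// The representation is chosen at runtime (e.g., from configuration),
// without recompiling any client code.
std::unique_ptr<GString> make_gstring(bool wide) {
    if (wide) return std::make_unique<WideGString>();
    return std::make_unique<NarrowGString>();
}

// Representation-independent client code: uses only the ADT interface.
std::size_t count_char(const GString& s, char32_t c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.length(); ++i)
        if (s.at(i) == c) ++n;
    return n;
}
```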
Yet another way to achieve generic manipulation of character data is the ANSI C++ Strings library, whose string class is
defined as a template (basic_string) that can be parameterized by char, wchar_t, or other integer types.
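As a small illustration (the helper names count_of and replace_all are hypothetical), a single function template written against std::basic_string serves std::string, std::wstring, and, in C++11 and later, std::u16string and std::u32string:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Counts occurrences of c in s for any character type CharT.
template <typename CharT>
std::size_t count_of(const std::basic_string<CharT>& s, CharT c) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), c));
}

// Returns a copy of s with every `from` replaced by `to`, for any CharT.
template <typename CharT>
std::basic_string<CharT> replace_all(std::basic_string<CharT> s, CharT from, CharT to) {
    std::replace(s.begin(), s.end(), from, to);
    return s;
}
```

The character type is fixed when each instantiation is compiled, so this is closer to the macro approach's compile-time flexibility than to the ADT approach's runtime flexibility.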
Given that there are several ways of treating characters and character strings generically, this standard cannot,
and therefore does not, specify the mapping of char, wchar, string, and wstring to all of them. It specifies only the following
normative requirements, which apply to generic character environments:
• wchar must be mapped to the generic character type in a generic character environment.
• wstring must be mapped to a string of such generic characters in a generic character environment.
• The language binding files (i.e., stubs) generated for these generic environments must ensure that the generic type representation is converted to the appropriate code sets (i.e., CNCS on the client side and SNCS on the server side) before character data is given to the ORB runtime for transmission.
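The third requirement can be sketched as follows, under loudly stated assumptions: the generic environment is assumed to hold characters as Unicode code points (std::u32string), the client's CNCS is assumed to be ISO 8859-1 (whose byte values equal the first 256 Unicode code points), and FakeOrb, to_latin1_cncs, and stub_send_string_arg are hypothetical names, not part of any real ORB or language binding. The point shown is only the ordering: the stub converts to the native code set before the ORB runtime ever sees the data.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Converts the assumed generic representation (code points) to the
// assumed CNCS (ISO 8859-1).
std::string to_latin1_cncs(const std::u32string& generic) {
    std::string out;
    out.reserve(generic.size());
    for (char32_t cp : generic) {
        if (cp > 0xFF) {
            // A real binding would report a conversion error here
            // (for example, CORBA's DATA_CONVERSION system exception).
            throw std::runtime_error("character not representable in the CNCS");
        }
        out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    return out;
}

// Hypothetical stand-in for the ORB runtime: it only ever receives
// char data that is already in the CNCS.
struct FakeOrb {
    std::string last_char_payload;
    void marshal_char_data(const std::string& cncs_bytes) { last_char_payload = cncs_bytes; }
};

// Hypothetical generated stub for an operation with an IDL string parameter:
// it converts from the generic representation before calling into the ORB.
void stub_send_string_arg(FakeOrb& orb, const std::u32string& arg) {
    orb.marshal_char_data(to_latin1_cncs(arg));
}
```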
13.10.3.1 Describing Generic Interfaces
To describe generic interfaces in IDL we recommend using wchar and wstring. These can be mapped to generic character types
in environments where such types exist and to wide characters where they do not. In either case, the code set conversion framework
ensures interoperation between generic and non-generic character type environments.
13.10.3.2 Interoperation
Let us consider an example to see how a generic environment can interoperate with a non-generic environment. Suppose there
is an IDL interface whose operations have both char and wchar parameters, and suppose the client of the interface is in
a generic environment while the server is in a non-generic environment (for example, the client is written in Smalltalk and
the server in C++).
Assume that the server’s (byte-oriented) native char code set (SNCS) is eucJP and the client’s native char code set (CNCS)
is SJIS. Further assume that the code set negotiation led to the decision to use eucJP as the char TCS-C and Unicode as the
wchar TCS-W.
Per the normative requirements above for mapping to a generic environment, the client’s Smalltalk stubs are responsible
for converting all char data (however it is represented inside Smalltalk) to SJIS and all wchar data to the client’s wchar
code set before passing the data to the client-side ORB. Note that this conversion could be an identity mapping if the internal
representation of narrow and wide characters is the same as that of the native code set(s). The client-side ORB then converts
all char data from SJIS to eucJP and all wchar data from the client’s wchar code set to Unicode, and transmits the data to the
server.
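In this example the client-side ORB must turn SJIS char data (the CNCS) into eucJP (the negotiated TCS-C). The sketch below shows one way that step could be done, using the well-known arithmetic mapping from Shift_JIS byte pairs to JIS X 0208 row/cell values, then setting the high bit of each byte to obtain eucJP. It is a simplification, not how any particular ORB implements conversion: the function name sjis_to_eucjp is hypothetical, bytes below 0x80 pass through unchanged (ignoring JIS-Roman differences such as the yen sign at 0x5C), lead bytes 0xF0 and above are rejected, it does not check that a byte pair is actually assigned in JIS X 0208, and errors are reported with std::runtime_error where a real ORB would raise CORBA::DATA_CONVERSION.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical converter: CNCS = Shift_JIS, TCS-C = eucJP
// (JIS X 0201 katakana and JIS X 0208 only).
std::string sjis_to_eucjp(const std::string& sjis) {
    std::string out;
    out.reserve(sjis.size() * 2);
    for (std::size_t i = 0; i < sjis.size(); ++i) {
        unsigned s1 = static_cast<unsigned char>(sjis[i]);
        if (s1 < 0x80) {
            // ASCII range: copied unchanged.
            out.push_back(static_cast<char>(s1));
        } else if (s1 >= 0xA1 && s1 <= 0xDF) {
            // Half-width katakana: eucJP prefixes the byte with SS2 (0x8E).
            out.push_back(static_cast<char>(0x8E));
            out.push_back(static_cast<char>(s1));
        } else if ((s1 >= 0x81 && s1 <= 0x9F) || (s1 >= 0xE0 && s1 <= 0xEF)) {
            if (i + 1 >= sjis.size())
                throw std::runtime_error("truncated double-byte character");
            unsigned s2 = static_cast<unsigned char>(sjis[++i]);
            if (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC)
                throw std::runtime_error("invalid trail byte");
            // Shift_JIS pair -> JIS X 0208 row (j1) and cell (j2).
            unsigned j1 = (s1 - (s1 < 0xA0 ? 0x70 : 0xB0)) << 1;
            unsigned j2;
            if (s2 < 0x9F) {
                j1 -= 1;                                 // odd-numbered row
                j2 = s2 - (s2 > 0x7F ? 0x20 : 0x1F);     // skip the 0x7F gap
            } else {
                j2 = s2 - 0x7E;                          // even-numbered row
            }
            // JIS X 0208 -> eucJP: set the high bit of both bytes.
            out.push_back(static_cast<char>(j1 | 0x80));
            out.push_back(static_cast<char>(j2 | 0x80));
        } else {
            throw std::runtime_error("byte not convertible to eucJP");
        }
    }
    return out;
}
```

For instance, the hiragana character あ is 0x82 0xA0 in SJIS; the mapping gives JIS row/cell 0x24 0x22, which becomes 0xA4 0xA2 in eucJP.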
The server-side ORB and stubs convert the eucJP data and Unicode data into C++’s internal representation for chars and wchars,
as dictated by the IDL operation signatures. Notice that when the data arrives at the server side, it looks no different
from data arriving from a non-generic environment (i.e., an environment just like the server’s own). In other words, the mappings
to generic character environments do not affect the code set conversion framework.