13.10.5.1 Character and Code Set Registry
The OSF character and code set registry is defined in OSF Character and Code Set Registry (see References in the Preface),
and current registry contents may be obtained directly from the Open Software Foundation (via anonymous FTP from ftp.opengroup.org:/pub/code_set_registry).
This registry contains two parts: character sets and code sets. For each listed code set, the registry shows the character sets
that the code set encodes.
Each 32-bit code set value consists of a high-order 16-bit organization number and a 16-bit identification of the code set
within that organization. As the numbering of organizations starts with 0x0001, a code set null value (0x00000000) may be
used to indicate an unknown code set.
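As a minimal sketch of the value layout just described, the following C helpers build a code set value from its two 16-bit halves and split it apart again. The helper names are hypothetical (they are not part of the registry access routines); 0x00010001 is the value CORBA code set negotiation uses for ISO 8859-1, and 0x05010001 the value used for UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers illustrating the 32-bit code set value layout:
 * high-order 16 bits = organization number, low-order 16 bits = the
 * code set's identification within that organization. */
static uint32_t cs_make(uint16_t organization, uint16_t local_id)
{
    /* Organization numbers start at 0x0001, so 0 is never valid here. */
    assert(organization != 0);
    return ((uint32_t)organization << 16) | local_id;
}

static uint16_t cs_organization(uint32_t code_set)
{
    return (uint16_t)(code_set >> 16);
}

static uint16_t cs_local_id(uint32_t code_set)
{
    return (uint16_t)(code_set & 0xFFFFu);
}

/* Because no organization is numbered 0, the all-zero value can be
 * reserved to mean "unknown code set". */
static bool cs_is_unknown(uint32_t code_set)
{
    return code_set == 0x00000000u;
}
```

For example, cs_make(0x0001, 0x0001) yields 0x00010001, and cs_organization(0x05010001) yields 0x0501.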
When associating character sets and code sets, OSF uses the concept of “fuzzy equality,” meaning that a code set is shown
as encoding a particular character set if the code set can encode “most” of the characters.
“Compatibility” between two code sets is determined by examining their entries in the registry, paying special attention
to the character sets encoded by each code set. If the two code sets have at least one (fuzzy-defined) character set in common,
the code sets are assumed to be “compatible.” Consequently, applications that use parts of a character set not fully encoded
by both code sets will suffer information loss when communicating with another application under this “fuzzy” scheme.
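The fuzzy compatibility test above reduces to checking whether two lists of registered character set values intersect. A minimal sketch in C, assuming the lists have already been fetched from the registry; the function name is hypothetical, and the 16-bit element type matches the unsigned16 arrays returned by the registry routines described later in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the two registry character-set
 * lists share at least one value, i.e. the code sets are "compatible"
 * under the fuzzy rule. */
static bool cs_fuzzy_compatible(const uint16_t *a_sets, size_t a_count,
                                const uint16_t *b_sets, size_t b_count)
{
    assert(a_sets != NULL || a_count == 0);
    assert(b_sets != NULL || b_count == 0);
    for (size_t i = 0; i < a_count; i++)
        for (size_t j = 0; j < b_count; j++)
            if (a_sets[i] == b_sets[j])
                return true;   /* one shared character set suffices */
    return false;
}
```

Note that a single shared entry is enough, which is exactly why characters outside the shared character set can be lost.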
The ORB is responsible for accessing the OSF registry and determining “compatibility” based on the information returned.
OSF members and other organizations can request additions to both the character set and code set registries by email to cs-registry@opengroup.org.
In addition, one range of the code set registry (0xf5000000 through 0xffffffff) is reserved for organizations to use in
identifying code sets that are not registered with the OSF (although such use would not facilitate interoperability without registration).
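Because 0xffffffff is the largest 32-bit value, membership in the reserved range needs only a lower-bound test; equivalently, the range covers organization numbers 0xf500 through 0xffff in the high-order half. A small C sketch with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range reserved for code sets that are not registered with the OSF. */
#define CS_PRIVATE_FIRST 0xf5000000u
#define CS_PRIVATE_LAST  0xffffffffu   /* largest 32-bit value */

/* True if the value falls in the reserved, unregistered range. Since
 * CS_PRIVATE_LAST is the 32-bit maximum, only the lower bound is tested. */
static bool cs_is_private(uint32_t code_set)
{
    return code_set >= CS_PRIVATE_FIRST;
}

/* Hypothetical helper: build a private value from an organization number
 * in 0xf500..0xffff and any 16-bit local ID. */
static uint32_t cs_make_private(uint16_t organization, uint16_t local_id)
{
    assert(organization >= 0xf500u);
    return ((uint32_t)organization << 16) | local_id;
}
```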
13.10.5.2 Access Routines
The following routines are for accessing the OSF character and code set registry. These routines map a code set string name
to code set id and vice versa. They also help in determining character set compatibility. These routine interfaces, their
semantics and their actual implementation are not normative (i.e., ORB vendors do not have to bundle the OSF registry implementation
with their products for compliance).
The following routines are adopted from RPC Runtime Support For I18N Characters Functional Specification (see References in
the Preface).
dce_cs_loc_to_rgy
Maps a local system-specific string name for a code set to a numeric code set value specified in the code set registry.
Synopsis
void dce_cs_loc_to_rgy(idl_char *local_code_set_name, unsigned32 *rgy_code_set_value, unsigned16 *rgy_char_sets_number, unsigned16 **rgy_char_sets_value, error_status_t *status);
Parameters
Input
local_code_set_name - A string that specifies the name that the local host’s locale environment uses to refer to the code set.
The string is a maximum of 32 bytes: 31 data bytes plus a terminating null character.
Output
rgy_code_set_value - The registered integer value that uniquely identifies the code set specified by local_code_set_name.
rgy_char_sets_number - The number of character sets that the specified code set encodes. Specifying NULL prevents this routine
from returning this parameter.
rgy_char_sets_value - A pointer to an array of registered integer values that uniquely identify the character set(s) that
the specified code set encodes. Specifying NULL prevents this routine from returning this parameter. The routine dynamically
allocates this array.
status - Returns the status code from this routine. This status code indicates whether the routine completed successfully
or, if not, why not.
The possible status codes and their meanings are as follows:
• dce_cs_c_ok – Code set registry access operation succeeded.
• dce_cs_c_cannot_allocate_memory – Cannot allocate memory for code set info.
• dce_cs_c_unknown – No code set value was found in the registry that corresponds to the specified code set name.
• dce_cs_c_notfound – No local code set name was found in the registry which corresponds to the name specified.
Description
The dce_cs_loc_to_rgy() routine maps operating system-specific names for character/code set encodings to their unique identifiers
in the code set registry.
The dce_cs_loc_to_rgy() routine takes as input a string that holds the host-specific “local name” of a code set and returns
the corresponding integer value that uniquely identifies that code set, as registered in the host's code set registry. If
the integer value does not exist in the registry, the routine returns the status dce_cs_c_unknown.
The routine also returns the number of character sets that the code set encodes and the registered integer values that uniquely
identify those character sets. Specifying NULL in the rgy_char_sets_number and rgy_char_sets_value[] parameters prevents the
routine from performing the additional search for these values. Applications that want only to obtain a code set value from
the code set registry can specify NULL for these parameters in order to improve the routine's performance. If the array is
returned from the routine, application developers should free it after use, since it is dynamically allocated.
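To make the calling conventions concrete (optional NULL outputs, caller-freed array), here is a self-contained C sketch. It is not the OSF implementation: the DCE types and status constants are assumed stand-ins with made-up numeric values, and the stub looks names up in a toy table whose local names and character set IDs are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed stand-ins for the DCE base types and status codes; real code
 * takes these from the ORB's or DCE's headers. Numeric values are made up. */
typedef unsigned char idl_char;
typedef uint32_t unsigned32;
typedef uint16_t unsigned16;
typedef uint32_t error_status_t;
enum { dce_cs_c_ok, dce_cs_c_cannot_allocate_memory,
       dce_cs_c_unknown, dce_cs_c_notfound };

/* Toy registry: local names and character set IDs are illustrative only. */
struct rgy_entry {
    const char *local_name;
    unsigned32  code_set;
    unsigned16  n_sets;
    unsigned16  sets[2];
};
static const struct rgy_entry demo_rgy[] = {
    { "ISO8859-1", 0x00010001u, 1, { 0x0011 } },
    { "UTF-8",     0x05010001u, 2, { 0x1000, 0x0011 } },
};

/* Stub with the documented signature, backed by demo_rgy. */
void dce_cs_loc_to_rgy(idl_char *local_code_set_name,
                       unsigned32 *rgy_code_set_value,
                       unsigned16 *rgy_char_sets_number,
                       unsigned16 **rgy_char_sets_value,
                       error_status_t *status)
{
    assert(local_code_set_name != NULL && rgy_code_set_value != NULL &&
           status != NULL);
    for (size_t i = 0; i < sizeof demo_rgy / sizeof demo_rgy[0]; i++) {
        const struct rgy_entry *e = &demo_rgy[i];
        if (strcmp((const char *)local_code_set_name, e->local_name) != 0)
            continue;
        *rgy_code_set_value = e->code_set;
        if (rgy_char_sets_number != NULL)
            *rgy_char_sets_number = e->n_sets;
        if (rgy_char_sets_value != NULL) {
            /* Dynamically allocated: the caller must free() it. */
            size_t bytes = e->n_sets * sizeof(unsigned16);
            *rgy_char_sets_value = malloc(bytes);
            if (*rgy_char_sets_value == NULL) {
                *status = dce_cs_c_cannot_allocate_memory;
                return;
            }
            memcpy(*rgy_char_sets_value, e->sets, bytes);
        }
        *status = dce_cs_c_ok;
        return;
    }
    *status = dce_cs_c_notfound;   /* name not in this toy table */
}

/* Caller that wants only the code set value: passing NULL for both
 * character set outputs skips the extra lookup and the allocation. */
static unsigned32 code_set_for_name(const char *name, error_status_t *status)
{
    unsigned32 value = 0;
    dce_cs_loc_to_rgy((idl_char *)name, &value, NULL, NULL, status);
    return value;
}
```

A caller that does request the character sets receives a heap array and releases it with free() once done. The toy table only models the dce_cs_c_notfound case; it has no way to represent a local name that lacks a registered value.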
dce_cs_rgy_to_loc
Maps a numeric code set value contained in the code set registry to the local system-specific name for a code set.
Synopsis
void dce_cs_rgy_to_loc(unsigned32 *rgy_code_set_value, idl_char **local_code_set_name, unsigned16 *rgy_char_sets_number,
unsigned16 **rgy_char_sets_value, error_status_t *status);
Parameters
Input
rgy_code_set_value - The registered hexadecimal value that uniquely identifies the code set.
Output
local_code_set_name - A string that specifies the name that the local host's locale environment uses to refer to the code
set. The string is a maximum of 32 bytes: 31 data bytes and a terminating null character.
rgy_char_sets_number - The number of character sets that the specified code set encodes. Specifying NULL in this parameter
prevents the routine from returning this value.
rgy_char_sets_value - A pointer to an array of registered integer values that uniquely identify the character set(s) that
the specified code set encodes. Specifying NULL in this parameter prevents the routine from returning this value. The routine
dynamically allocates this array.
status - Returns the status code from this routine. This status code indicates whether the routine completed successfully
or, if not, why not.
The possible status codes and their meanings are as follows:
• dce_cs_c_ok – Code set registry access operation succeeded.
• dce_cs_c_cannot_allocate_memory – Cannot allocate memory for code set info.
• dce_cs_c_unknown – The requested code set value was not found in the code set registry.
• dce_cs_c_notfound – No local code set name was found in the registry which corresponds to the specific code set registry ID value. This implies that the code set is not supported in the local system environment.
Description
The dce_cs_rgy_to_loc() routine maps a unique identifier for a code set in the code set registry to the operating system-specific
string name for the code set, if it exists in the code set registry.
The dce_cs_rgy_to_loc() routine takes as input a registered integer value of a code set and returns a string that holds the
operating system-specific (local) name of the code set.
If the code set identifier does not exist in the registry, the routine returns the status dce_cs_c_unknown and returns an
undefined string.
The routine also returns the number of character sets that the code set encodes and the registered integer values that uniquely
identify those character sets. Specifying NULL in the rgy_char_sets_number and rgy_char_sets_value[] parameters prevents the
routine from performing the additional search for these values. Applications that want only to obtain a local code set name
from the code set registry can specify NULL for these parameters in order to improve the routine's performance. If the value
is returned from the routine, application developers should free the rgy_char_sets_value array after it is used.
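The following C sketch illustrates the two failure modes listed above: dce_cs_c_unknown for a value absent from the registry, and dce_cs_c_notfound for a registered code set with no local name (that is, one not supported locally). It follows the synopsis as printed here, including the pointer input. The types, status values, tables, and the helper code_set_supported_locally() are assumptions made for illustration, and the stub does not model the optional character set outputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for DCE types and status codes (values made up). */
typedef unsigned char idl_char;
typedef uint32_t unsigned32;
typedef uint16_t unsigned16;
typedef uint32_t error_status_t;
enum { dce_cs_c_ok, dce_cs_c_cannot_allocate_memory,
       dce_cs_c_unknown, dce_cs_c_notfound };

/* Toy data: three registered code sets, only two of which this
 * hypothetical host has a local name for. */
static const unsigned32 registered[] = { 0x00010001u, 0x00010109u, 0x05010001u };
static const struct { unsigned32 code_set; const char *local_name; } local_names[] = {
    { 0x00010001u, "ISO8859-1" },
    { 0x05010001u, "UTF-8" },
};

void dce_cs_rgy_to_loc(unsigned32 *rgy_code_set_value,
                       idl_char **local_code_set_name,
                       unsigned16 *rgy_char_sets_number,
                       unsigned16 **rgy_char_sets_value,
                       error_status_t *status)
{
    assert(rgy_code_set_value != NULL && local_code_set_name != NULL &&
           status != NULL);
    /* This stub does not model the optional character set outputs. */
    assert(rgy_char_sets_number == NULL && rgy_char_sets_value == NULL);

    bool in_registry = false;
    for (size_t i = 0; i < sizeof registered / sizeof registered[0]; i++)
        if (registered[i] == *rgy_code_set_value)
            in_registry = true;
    if (!in_registry) {
        *status = dce_cs_c_unknown;    /* value not in the registry */
        return;
    }
    for (size_t i = 0; i < sizeof local_names / sizeof local_names[0]; i++) {
        if (local_names[i].code_set == *rgy_code_set_value) {
            /* Points into static storage in this stub. */
            *local_code_set_name = (idl_char *)local_names[i].local_name;
            *status = dce_cs_c_ok;
            return;
        }
    }
    *status = dce_cs_c_notfound;       /* registered, but no local name */
}

/* Hypothetical helper: a registered code set without a local name is not
 * supported in the local environment. */
static bool code_set_supported_locally(unsigned32 value)
{
    idl_char *name = NULL;
    error_status_t st;
    dce_cs_rgy_to_loc(&value, &name, NULL, NULL, &st);
    return st == dce_cs_c_ok;
}
```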
rpc_cs_char_set_compat_check
Evaluates character set compatibility between a client and a server.
Synopsis
void rpc_cs_char_set_compat_check(unsigned32 client_rgy_code_set_value, unsigned32 server_rgy_code_set_value, error_status_t *status);
Parameters
Input
client_rgy_code_set_value - The registered hexadecimal value that uniquely identifies the code set that the client is using
as its local code set.
server_rgy_code_set_value - The registered hexadecimal value that uniquely identifies the code set that the server is using
as its local code set.
Output
status - Returns the status code from this routine. This status code indicates whether the routine completed successfully
or, if not, why not.
The possible status codes and their meanings are as follows:
• rpc_s_ok – Successful status.
• rpc_s_ss_no_compat_charsets – No compatible code set found. The client and server do not have a common encoding that both could recognize and convert.
• The routine can also return status codes from the dce_cs_rgy_to_loc() routine.
Description
The rpc_cs_char_set_compat_check() routine provides a method for determining character set compatibility between a client
and a server; if the server's character set is incompatible with that of the client, then connecting to that server is most
likely not acceptable, since massive data loss would result from such a connection.
The routine takes the registered integer values that represent the code sets that the client and server are currently using
and calls the code set registry to obtain the registered values that represent the character set(s) that the specified code
sets support. If both client and server support just one character set, the routine compares client and server registered
character set values to determine whether the sets are compatible. If they are not, the routine returns the status code
rpc_s_ss_no_compat_charsets.
If the client and server support multiple character sets, the routine determines whether at least two of those character sets match.
If two or more match, the routine considers the client and server character sets compatible and returns a success status code to the caller.
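The decision rule above can be sketched in C as follows, assuming the character set lists for both code sets have already been retrieved from the registry. The function names are hypothetical. The text does not say what happens when one side supports a single character set and the other supports several; this sketch assumes that a single match suffices in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count character set IDs present in both lists (IDs are assumed to be
 * unique within each list). */
static size_t common_char_sets(const uint16_t *c, size_t nc,
                               const uint16_t *s, size_t ns)
{
    size_t matches = 0;
    for (size_t i = 0; i < nc; i++)
        for (size_t j = 0; j < ns; j++)
            if (c[i] == s[j]) { matches++; break; }
    return matches;
}

/* One-vs-one: the single IDs must be equal. Many-vs-many: at least two
 * IDs must match. One-vs-many: ASSUMED to need one match (unspecified). */
static bool char_sets_compatible(const uint16_t *c, size_t nc,
                                 const uint16_t *s, size_t ns)
{
    assert(c != NULL && s != NULL && nc > 0 && ns > 0);
    size_t needed = (nc > 1 && ns > 1) ? 2 : 1;
    return common_char_sets(c, nc, s, ns) >= needed;
}
```

A real implementation would map a false result to the status rpc_s_ss_no_compat_charsets.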
rpc_rgy_get_max_bytes
Gets the maximum number of bytes that a code set uses to encode one character from the code set registry on a host.
Synopsis
void rpc_rgy_get_max_bytes(unsigned32 rgy_code_set_value, unsigned16 *rgy_max_bytes, error_status_t *status);
Parameters
Input
rgy_code_set_value - The registered hexadecimal value that uniquely identifies the code set.
Output
rgy_max_bytes - The registered decimal value that indicates the number of bytes this code set uses to encode one character.
status - Returns the status code from this routine. This status code indicates whether the routine completed successfully
or, if not, why not.
The possible status codes and their meanings are as follows:
• rpc_s_ok – Operation succeeded.
• dce_cs_c_cannot_allocate_memory – Cannot allocate memory for code set info.
• dce_cs_c_unknown – The specified code set value was not found in the registry.
• dce_cs_c_notfound – No local code set name was found in the registry which corresponds to the value specified.
Description
The rpc_rgy_get_max_bytes() routine reads the code set registry on the local host. It takes the specified registered code
set value, uses it as an index into the registry, and returns the decimal value that indicates the number of bytes that the
code set uses to encode one character.
This information can be used for buffer sizing as part of the procedure to determine whether additional storage needs to be
allocated for conversion between local and network code sets.
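As a sketch of that buffer-sizing step, suppose rpc_rgy_get_max_bytes() reports 1 for the local code set and 3 for the network code set. A 10-character string may then need up to 30 bytes after conversion, so a buffer sized for the local data is not enough. The helper names below are hypothetical, and the max_bytes inputs stand for values obtained from rpc_rgy_get_max_bytes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Worst-case size, in bytes, of n_chars characters in a code set whose
 * registered maximum is max_bytes bytes per character. The caller must
 * keep n_chars small enough that the product does not overflow. */
static size_t worst_case_bytes(size_t n_chars, uint16_t max_bytes)
{
    assert(max_bytes > 0);
    return n_chars * max_bytes;
}

/* A buffer sized for the local code set may be too small after
 * conversion if the network code set can use more bytes per character. */
static bool needs_larger_buffer(uint16_t local_max_bytes,
                                uint16_t network_max_bytes)
{
    return network_max_bytes > local_max_bytes;
}
```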