Oracle7 Server Reference Manual

Background Information

This section provides background information on the issues involved in multi-lingual applications, and shows how they are resolved by the National Language Support (NLS) features of the Oracle7 Server. The remaining sections of this chapter discuss the specific parameters that control NLS operation.

Character Encoding Schemes

To understand how Oracle7 Server deals with character data, it is important to understand the general features of character representation on computers. The appearance of a character on a terminal depends on the convention for character representation used by that terminal. When you press a character key on the keyboard, the terminal generates a numeric code specified by the character encoding scheme in use on that device. When the terminal receives a number representing a character, it displays the character shape specified by that encoding scheme.

Encoding schemes define the representation of alphabetic characters, numerals, and punctuation characters, together with codes that control terminal display and communication. A character encoding scheme (also known as a character set or code page) specifies the number that corresponds to each character the terminal can display. Examples are 7-bit ASCII, EBCDIC Code Page 500, and Japanese Extended UNIX Code.
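The mapping between characters and numbers can be sketched in Python, whose standard codecs include counterparts of the three schemes named above. The codec names (ascii, cp500, euc_jp) and the helper functions codes and char_from_code are assumptions made for illustration; they are not part of the Oracle7 Server.

```python
# Illustrative only: Python codecs standing in for the encoding schemes
# named in the text. None of these names come from Oracle7.
SCHEMES = {
    "7-bit ASCII": "ascii",
    "EBCDIC Code Page 500": "cp500",
    "Japanese Extended UNIX Code": "euc_jp",
}


def codes(ch):
    """Return the numeric code(s) each scheme assigns to one character.

    A value of None means the scheme has no code for that character.
    """
    result = {}
    for label, codec in SCHEMES.items():
        try:
            result[label] = list(ch.encode(codec))
        except UnicodeEncodeError:
            result[label] = None
    return result


def char_from_code(code_bytes, codec):
    """Return the character that a scheme displays for the given code(s)."""
    return bytes(code_bytes).decode(codec)


if __name__ == "__main__":
    for ch in ("A", "あ"):
        print(repr(ch), codes(ch))
```

The same letter A maps to 65 under 7-bit ASCII and to 193 under EBCDIC Code Page 500, so a number is meaningful only together with the scheme that produced it.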

Many encoding schemes are used by hardware manufacturers to support different languages. All support the 26 letters of the Latin alphabet, A to Z. In general, single-byte encoding schemes are used for European languages and multi-byte encoding schemes for Asian languages.
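As a rough illustration of the single-byte and multi-byte distinction, the following Python sketch counts the bytes that a character occupies. ISO 8859-1 (latin_1) and Shift-JIS (shift_jis) are not named in this manual; they are chosen here only as familiar examples of a European single-byte scheme and an Asian multi-byte scheme.

```python
def encoded_width(ch, codec):
    """Return how many bytes one character occupies in the given scheme."""
    return len(ch.encode(codec))


# (character, codec) pairs chosen for illustration only.
SAMPLES = [
    ("A", "latin_1"),     # Latin letter, single-byte scheme
    ("é", "latin_1"),     # accented European letter, still one byte
    ("A", "shift_jis"),   # Latin letter in a multi-byte scheme
    ("日", "shift_jis"),  # Japanese character, two bytes
    ("日", "euc_jp"),     # the same character in Japanese EUC
]

if __name__ == "__main__":
    for ch, codec in SAMPLES:
        print(ch, codec, encoded_width(ch, codec))
```

Note that a multi-byte scheme still stores the Latin letters in one byte; only the additional characters need more.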

There are two general groups of encoding schemes: those based on 7-bit ASCII and those based on IBM EBCDIC. Within each group, all schemes normally use the same encoding for the 26 Latin characters (A to Z), but use different encodings for the additional characters needed by languages other than English. ASCII and EBCDIC differ from each other even in their encoding of the Latin characters.
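The grouping can be checked with a short Python sketch. The codecs listed (latin_1 and cp437 on the ASCII side, cp037 and cp500 on the EBCDIC side) are assumptions picked for illustration and are not a list of character sets supported by Oracle7.

```python
import string

# Illustrative members of each group; not an Oracle7 character set list.
ASCII_BASED = ["ascii", "latin_1", "cp437"]
EBCDIC_BASED = ["cp037", "cp500"]


def latin_codes(codec):
    """Return the codes a scheme assigns to the letters A to Z."""
    return list(string.ascii_uppercase.encode(codec))


def same_latin_codes(codecs):
    """True if every scheme in the list encodes A to Z identically."""
    first = latin_codes(codecs[0])
    return all(latin_codes(c) == first for c in codecs[1:])


if __name__ == "__main__":
    print("ASCII group agrees on A-Z: ", same_latin_codes(ASCII_BASED))
    print("EBCDIC group agrees on A-Z:", same_latin_codes(EBCDIC_BASED))
    print("Groups agree with each other:",
          latin_codes("ascii") == latin_codes("cp500"))
    # A non-English character differs even inside the ASCII-based group.
    print("é in latin_1:", "é".encode("latin_1").hex(),
          "| é in cp437:", "é".encode("cp437").hex())
```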

National Language Support Enhancements

Oracle7 Server release 7.3 supports certain national language parameters as environment variables, which can be altered by issuing the appropriate operating-system commands. Because each NLS parameter can be specified individually, multi-lingual applications gain flexibility. The environment variables include NLS_DATE_FORMAT, NLS_DATE_LANGUAGE, and NLS_SORT, among others whose features are discussed in this chapter.
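One way to set such variables for a single client process, rather than for the whole session, is sketched below in Python. The helper names nls_environment and run_with_nls are hypothetical, the parameter values are illustrative only, and the child command is a stand-in that merely echoes a variable; a real client program would take its place.

```python
import os
import subprocess
import sys

# Illustrative values only; consult the parameter descriptions in this
# chapter for the values that are valid on a given installation.
NLS_SETTINGS = {
    "NLS_DATE_FORMAT": "DD-MON-YYYY",
    "NLS_DATE_LANGUAGE": "FRENCH",
    "NLS_SORT": "BINARY",
}


def nls_environment(overrides):
    """Return a copy of the current environment with NLS overrides applied.

    The parent's own environment (os.environ) is left unchanged.
    """
    env = dict(os.environ)
    env.update(overrides)
    return env


def run_with_nls(command, settings):
    """Run a client command with the given NLS environment variables."""
    return subprocess.run(command, env=nls_environment(settings),
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child process: prints the variable it inherited.
    child = [sys.executable, "-c",
             "import os; print(os.environ['NLS_DATE_FORMAT'])"]
    print(run_with_nls(child, NLS_SETTINGS).stdout.strip())
```

Passing a copied environment keeps the settings local to one client, so two applications started from the same session could use different date formats or sort orders.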

UTF2 Encoding

Oracle7 Server release 7.3 supports UTF2, a variable-width, multi-byte encoding scheme for Unicode. Because the width of each character varies, UTF2 can represent the characters of both single-byte and multi-byte character sets.
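The idea of a variable-width encoding can be illustrated with a short Python sketch. Python provides no codec named UTF2, so the sketch uses UTF-8, a closely related variable-width Unicode encoding, as a stand-in. This substitution is an assumption: the byte sequences that the Oracle7 Server produces for UTF2 are not verified here.

```python
def utf8_width(ch):
    """Return the number of bytes one character occupies in UTF-8.

    UTF-8 is used as a stand-in for UTF2 (an assumption, see the text).
    """
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    for ch in ("A", "é", "あ"):
        print(repr(ch), utf8_width(ch), ch.encode("utf-8").hex())
```

In this stand-in, a 7-bit ASCII character keeps its one-byte code, while accented European and Asian characters occupy two or three bytes.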

Arabic/Hebrew Display Character Set Support

The scripts of Semitic languages such as Arabic and Hebrew use ligatures and, typically, two sets of digits (Arabic and Hindi numbers) in addition to their alphabetic characters. A display character set allows front-end input and output of ligatures and of both Arabic and Hindi numbers. Some display character sets also contain several shapes of the same character, for characters whose form depends on their position in a word. A display character set should not be used to store data, however; a separate storage character set is defined for that purpose. Oracle7 Server release 7.3 supports conversion between display and storage character sets. The environment variable NLS_LANG defines the storage character set, while NLS_DISPLAY sets the display character set. It is the client's responsibility to ensure that no display character set is defined as a storage character set, and vice versa.
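The distinction between display shapes and stored characters can be illustrated by analogy with Unicode, which also separates positional presentation forms from the underlying letters. The Python sketch below is not the conversion that the Oracle7 Server performs; the function name to_storage_form and the use of Unicode compatibility normalization (NFKC) are assumptions made purely for illustration.

```python
import unicodedata

# Four positional shapes of the Arabic letter BEH, as a display
# character set might hold them (Unicode presentation forms).
BEH_SHAPES = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}
BEH_STORED = "\u0628"         # the single letter that would be stored
LAM_ALEF_LIGATURE = "\uFEFB"  # one display shape for two letters
HINDI_DIGIT_THREE = "\u0663"  # Arabic-Indic digit three


def to_storage_form(display_text):
    """Fold display shapes and ligatures back to their base letters."""
    return unicodedata.normalize("NFKC", display_text)


if __name__ == "__main__":
    for position, shape in BEH_SHAPES.items():
        print(position, to_storage_form(shape) == BEH_STORED)
    print("ligature stores as",
          len(to_storage_form(LAM_ALEF_LIGATURE)), "letters")
    print("digit value:", unicodedata.digit(HINDI_DIGIT_THREE))
```

Storing only the base letters means that a search or sort sees one character regardless of how it was shaped on screen, which is one reason to keep display shapes out of stored data.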
