Oracle7 Server Reference Manual
Background Information
This section provides background information on the issues involved in multi-lingual applications, and shows how they are resolved by the National Language Support (NLS) features of the Oracle7 Server. The remaining sections of this chapter discuss the specific parameters that control NLS operation.
Character Encoding Schemes
To understand how Oracle7 Server deals with character data, it is important to understand the general features of character representation on computers. The appearance of a character on a terminal depends on the convention for character representation used by that terminal. When you press a character key on the keyboard, the terminal generates a numeric code specified by the character encoding scheme in use on that device. When the terminal receives a number representing a character, it displays the character shape specified by that encoding scheme.
Encoding schemes define the representation of alphabetic characters, numerals, and punctuation characters, together with codes that control terminal display and communication. A character encoding scheme (also known as a character set or code page) specifies numbers corresponding to each character that the terminal can display. Examples are 7-bit ASCII, EBCDIC Code Page 500, and Japanese Extended UNIX Code.
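As a rough illustration of how one character maps to different numbers under different schemes, the sketch below encodes the same text with three of Python's built-in codecs. The codec names (`ascii`, `cp500`, `euc_jp`) are Python's own and are assumed here to stand in for the schemes named above; the helper `codes_for` is hypothetical and not part of any Oracle interface.

```python
# Same characters, different numeric codes under different encoding schemes.
# The codec names are Python's; they stand in for the schemes named in the text.
SCHEMES = {
    "7-bit ASCII": "ascii",
    "EBCDIC Code Page 500": "cp500",
    "Japanese Extended UNIX Code": "euc_jp",
}


def codes_for(text):
    """Return {scheme name: list of byte values} for the given text."""
    return {name: list(text.encode(codec)) for name, codec in SCHEMES.items()}


if __name__ == "__main__":
    for name, codes in codes_for("A1").items():
        print(f"{name:28} {[hex(c) for c in codes]}")
```

The letter A comes out as 0x41 under the ASCII-based schemes and as 0xC1 under EBCDIC Code Page 500, which is why a terminal and a server must agree on the scheme in use.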
Many encoding schemes are used by hardware manufacturers to support different languages. All support the 26 letters of the Latin alphabet, A to Z. In general, single-byte encoding schemes are used for European languages and multi-byte encoding schemes for Asian languages.
Single-Byte 7-Bit Encoding Schemes
Single-byte 7-bit encoding schemes can define up to 128 characters, and normally support just one language. The only alphabetic characters defined in 7-bit ASCII are the 26 letters of the Latin alphabet, A to Z. Various other 7-bit schemes replace certain characters of 7-bit ASCII (normally punctuation) with the additional alphanumeric characters required for a specific language.
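The limits of a 7-bit scheme can be checked directly. The following is a minimal sketch using Python's `ascii` codec; the helper name `fits_in_7bit_ascii` is an assumption made for illustration. It counts the alphabetic characters among the 128 code values (upper and lower case counted separately) and tests whether a piece of text can be represented at all.

```python
import string

# 7 bits give 2**7 = 128 code values (0 through 127).
ASCII_CODES = range(2 ** 7)

# The alphabetic characters among them are exactly A-Z and a-z.
ascii_letters = [chr(c) for c in ASCII_CODES if chr(c).isalpha()]


def fits_in_7bit_ascii(text):
    """Return True if every character of text has a 7-bit ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```

Text containing accented letters, such as a German or French word, fails the check, which is the gap the national 7-bit variants fill by giving up some punctuation codes.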
Single-Byte 8-Bit Encoding Schemes
Single-byte 8-bit encoding schemes can define up to 256 characters, and normally support a group of languages. For example, ISO 8859/1 supports many West European languages.
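A short sketch of the 8-bit case, assuming Python's `latin-1` codec as an implementation of ISO 8859/1. The function name `latin1_codes` is hypothetical. It shows accented letters from French, German, and Spanish each occupying a single byte above 127.

```python
# Python's "latin-1" codec implements ISO 8859-1 (written ISO 8859/1 in the text).
def latin1_codes(text):
    """Return the single-byte code of each character under ISO 8859-1."""
    return list(text.encode("latin-1"))


# Each of the 2**8 = 256 byte values decodes to exactly one character.
all_chars = bytes(range(256)).decode("latin-1")

# e-acute, u-diaeresis, n-tilde: one byte each, all above 127.
sample = latin1_codes("\u00e9\u00fc\u00f1")
```

Because the lower 128 values match 7-bit ASCII, the extra 128 positions are what let one scheme serve a group of languages.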
Multi-Byte Encoding Schemes
Multi-byte encoding schemes are needed for Asian languages because these languages use thousands of characters. A double-byte encoding scheme can support up to 65536 characters. Some multi-byte encoding schemes use the value of the most significant bit to indicate whether a byte represents a single-byte character or is the first or second byte of a double-byte character. In other schemes, control codes differentiate single-byte from double-byte characters: a shift-out code indicates that the bytes that follow are double-byte characters, until a shift-in code is encountered.
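The two approaches to mixing single-byte and double-byte characters can be sketched as follows. This is a simplified illustration, not the decoding logic of any particular character set. It assumes every character is one or two bytes wide, that the shift codes are the conventional control values 0x0E (shift-out) and 0x0F (shift-in), and that those values never occur inside a double-byte character. Both function names are hypothetical.

```python
SHIFT_OUT = 0x0E  # assumed control code: switch to double-byte mode
SHIFT_IN = 0x0F   # assumed control code: switch back to single-byte mode


def split_by_high_bit(data):
    """Split bytes into characters, treating a byte with the most
    significant bit set as the first byte of a double-byte character."""
    chars, i = [], 0
    while i < len(data):
        width = 2 if data[i] & 0x80 else 1
        chars.append(data[i:i + width])
        i += width
    return chars


def split_by_shift_codes(data):
    """Split bytes into characters, using shift-out / shift-in control
    codes to switch between single-byte and double-byte mode."""
    chars, i, width = [], 0, 1
    while i < len(data):
        if data[i] == SHIFT_OUT and width == 1:
            width, i = 2, i + 1      # enter double-byte mode, skip the code
        elif data[i] == SHIFT_IN and width == 2:
            width, i = 1, i + 1      # return to single-byte mode, skip the code
        else:
            chars.append(data[i:i + width])
            i += width
    return chars
```

For example, `split_by_high_bit("A日本".encode("euc_jp"))` separates the one-byte letter from the two double-byte characters. The high-bit approach needs no state, whereas the shift-code approach must track the current mode from the start of the data.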
There are two general groups of encoding schemes: those based on 7-bit ASCII and those based on IBM EBCDIC. Within each group, all schemes normally use the same encoding for the 26 Latin characters (A to Z), but use different encodings for the other characters needed by languages other than English. ASCII and EBCDIC use different encodings, even for the Latin characters.
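The grouping can be observed with a few of Python's built-in codecs. The choice of codecs below is an assumption made for illustration: `ascii`, `latin-1`, and `cp1252` represent the ASCII-based group, and `cp037` and `cp500` the EBCDIC-based group. The helper `encode_in` is hypothetical.

```python
def encode_in(codecs, ch):
    """Return {codec: byte value} for one character in each single-byte codec."""
    return {codec: ch.encode(codec)[0] for codec in codecs}


ASCII_BASED = ["ascii", "latin-1", "cp1252"]  # assumed sample of ASCII-based schemes
EBCDIC_BASED = ["cp037", "cp500"]             # assumed sample of EBCDIC-based schemes

# A Latin letter has one code within each group, but the groups disagree.
letter_ascii = encode_in(ASCII_BASED, "A")
letter_ebcdic = encode_in(EBCDIC_BASED, "A")

# A non-letter can differ even between two schemes of the same group.
bracket_ebcdic = encode_in(EBCDIC_BASED, "[")
```

Here the letter A is 0x41 in every ASCII-based codec and 0xC1 in both EBCDIC codecs, while the left square bracket has a different code in each of the two EBCDIC code pages.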
National Language Support Enhancements
Oracle7 Server release 7.3 supports certain national language parameters as environment variables, which can be altered by issuing the appropriate operating-system commands. This allows NLS parameters to be specified at a finer granularity, giving multi-lingual applications greater flexibility. The environment variables include NLS_DATE_FORMAT, NLS_DATE_LANGUAGE, and NLS_SORT, among others whose features are discussed in this chapter.
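One way a client program might prepare such variables before starting an Oracle tool is sketched below. The helper `nls_environment` is hypothetical, and the values shown are illustrative only; valid settings for each parameter are given in the parameter descriptions, not here.

```python
import os


def nls_environment(base=None, **nls_settings):
    """Return a copy of an environment mapping with NLS_* variables added.

    Keyword names are upper-cased and prefixed with NLS_, so
    date_format="DD-MON-YYYY" becomes NLS_DATE_FORMAT=DD-MON-YYYY.
    """
    env = dict(os.environ if base is None else base)
    for name, value in nls_settings.items():
        env["NLS_" + name.upper()] = value
    return env


# Illustrative values only; they are assumptions, not recommended settings.
session_env = nls_environment(
    base={},
    date_format="DD-MON-YYYY",
    date_language="GERMAN",
    sort="GERMAN",
)
```

The resulting mapping could then be handed to a child process, for example through the `env` argument of `subprocess.run`, so that each client session carries its own NLS settings.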
UTF2 Encoding
Oracle7 Server release 7.3 supports UTF2, the variable-width, multi-byte Unicode encoding scheme, to accommodate both multi-byte and single-byte character sets.
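What "variable-width" means in practice can be sketched with Python's `utf-8` codec. Python has no codec named UTF2, so the example assumes that UTF2 behaves like present-day UTF-8 for these characters; the text above does not confirm this. The helper `encoded_width` is hypothetical.

```python
def encoded_width(ch, codec="utf-8"):
    """Return the number of bytes one character occupies in the given codec."""
    return len(ch.encode(codec))


# A Latin letter, an accented West European letter, and a Japanese character.
# Assumption: the utf-8 codec stands in for UTF2.
widths = {ch: encoded_width(ch) for ch in ("A", "\u00e9", "\u65e5")}
```

Under that assumption the three characters take one, two, and three bytes respectively, so text that is mostly Latin stays compact while Asian characters remain representable in the same character set.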
Arabic/Hebrew Display Character Set Support
The scripts of Semitic languages such as Arabic and Hebrew use ligatures and typically two sets of digits (that is, Arabic and Hindi numbers), in addition to their alphabetic characters. Using a display character set allows front-end input and output of ligatures and Arabic/Hindi numbers. Some display character sets even contain several shapes for a character whose form depends on its position in a word. However, a display character set should not be used for data storage; a storage character set is defined for that purpose. Oracle7 Server release 7.3 supports conversion between display and storage character sets. The environment variable NLS_LANG defines the storage character set, while NLS_DISPLAY sets the display character set. It is the client's responsibility to ensure that no display character set is defined as a storage character set, and vice versa.
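The distinction between display shapes and stored characters can be illustrated by analogy with Unicode, which also keeps position-dependent "presentation forms" separate from the base letters. The sketch below is that analogy only; it does not use or reproduce Oracle's display character sets or its conversion routines, and the function names are hypothetical.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH: one stored character
# Its four display shapes: isolated, final, initial, medial.
BEH_DISPLAY_FORMS = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]
LAM_ALEF_LIGATURE = "\ufefb"  # one display shape for the pair LAM + ALEF


def to_storage_form(text):
    """Fold display-oriented presentation forms back to base characters."""
    return unicodedata.normalize("NFKC", text)


def digit_values(text):
    """Return the numeric value of each digit character in text."""
    return [unicodedata.digit(ch) for ch in text]
```

All four shapes of BEH fold to the same stored letter, and the ligature folds to its two component letters, which is why comparing or sorting stored data is only reliable when the display shapes are kept out of storage. Likewise, `digit_values` returns the same values for the digits 123 written in either of the two digit sets.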