NLS(7)              NetBSD Miscellaneous Information Manual             NLS(7)

NAME
     NLS -- Native Language Support Overview

DESCRIPTION
     Native Language Support (NLS) provides commands for a single worldwide
     operating system base.  An internationalized system has no built-in
     assumptions or dependencies on language-specific or cultural-specific
     conventions such as:

           +   Character classifications
           +   Character comparison rules
           +   Character collation order
           +   Numeric and monetary formatting
           +   Date and time formatting
           +   Message-text language
           +   Character sets

     All information pertaining to cultural conventions and language is
     obtained at program run time.

     ``Internationalization'' (often abbreviated ``i18n'') refers to the
     process by which system software is developed to support multiple
     culture-specific and language-specific conventions.  This is a
     generalization process by which the system is freed from relying only on
     English strings or other English-specific conventions.  ``Localization''
     (often abbreviated ``l10n'') refers to the process by which the user
     environment is customized to handle its input and output appropriately
     for specific language and cultural conventions.  This is a specialization
     process, by which generic methods already implemented in an
     internationalized system are used in specific ways.  The formal
     description of cultural conventions for some country, together with all
     associated translations targeted to the native language, is called the
     ``locale''.

     NetBSD provides extensive support to programmers and system developers to
     enable internationalized software to be developed.  NetBSD also supplies
     a large variety of locales for system localization.

   Localization of Information
     All locale information is accessible to programs at run time so that data
     is processed and displayed correctly for specific cultural conventions
     and language.

     A locale is divided into categories.  A category is a group of language-
     specific and culture-specific conventions as outlined in the list above.
     ISO C and POSIX together specify the following six standard categories
     supported by NetBSD (LC_MESSAGES comes from POSIX, the rest from ISO C):

     LC_COLLATE     string-collation order information
     LC_CTYPE       character classification, case conversion, and other char-
                    acter attributes
     LC_MESSAGES    the format for affirmative and negative responses
     LC_MONETARY    rules and symbols for formatting monetary numeric informa-
                    tion
     LC_NUMERIC     rules and symbols for formatting nonmonetary numeric
                    information
     LC_TIME        rules and symbols for formatting time and date information

     Localization of the system is achieved by setting appropriate values in
     environment variables to identify which locale should be used.  The envi-
     ronment variables have the same names as their respective locale cate-
     gories.  Additionally, the LANG, LC_ALL, and NLSPATH environment vari-
     ables are used.  The NLSPATH environment variable specifies a colon-sepa-
     rated list of directory names where the message catalog files of the NLS
     database are located.  The LC_ALL and LANG environment variables also
     determine the current locale.

     The values of the locale environment variables are strings of the form:

             language[_territory][.codeset][@modifier]

     Valid values for the language field come from the ISO 639 standard, which
     defines two-character codes for many languages.  Some common language
     codes are:

     Language Name   Code   Language Family
     ABKHAZIAN       AB     IBERO-CAUCASIAN
     AFAN (OROMO)    OM     HAMITIC
     AFAR            AA     HAMITIC
     AFRIKAANS       AF     GERMANIC
     ALBANIAN        SQ     INDO-EUROPEAN (OTHER)
     AMHARIC         AM     SEMITIC
     ARABIC          AR     SEMITIC
     ARMENIAN        HY     INDO-EUROPEAN (OTHER)
     ASSAMESE        AS     INDIAN
     AYMARA          AY     AMERINDIAN
     AZERBAIJANI     AZ     TURKIC/ALTAIC
     BASHKIR         BA     TURKIC/ALTAIC
     BASQUE          EU     BASQUE
     BENGALI         BN     INDIAN
     BHUTANI         DZ     ASIAN
     BIHARI          BH     INDIAN
     BISLAMA         BI
     BRETON          BR     CELTIC
     BULGARIAN       BG     SLAVIC
     BURMESE         MY     ASIAN
     BYELORUSSIAN    BE     SLAVIC
     CAMBODIAN       KM     ASIAN
     CATALAN         CA     ROMANCE
     CHINESE         ZH     ASIAN
     CORSICAN        CO     ROMANCE
     CROATIAN        HR     SLAVIC
     CZECH           CS     SLAVIC
     DANISH          DA     GERMANIC
     DUTCH           NL     GERMANIC
     ENGLISH         EN     GERMANIC
     ESPERANTO       EO     INTERNATIONAL AUX.
     ESTONIAN        ET     FINNO-UGRIC
     FAROESE         FO     GERMANIC
     FIJI            FJ     OCEANIC/INDONESIAN
     FINNISH         FI     FINNO-UGRIC
     FRENCH          FR     ROMANCE
     FRISIAN         FY     GERMANIC
     GALICIAN        GL     ROMANCE
     GEORGIAN        KA     IBERO-CAUCASIAN
     GERMAN          DE     GERMANIC
     GREEK           EL     LATIN/GREEK
     GREENLANDIC     KL     ESKIMO
     GUARANI         GN     AMERINDIAN
     GUJARATI        GU     INDIAN
     HAUSA           HA     NEGRO-AFRICAN
     HEBREW          HE     SEMITIC
     HINDI           HI     INDIAN
     HUNGARIAN       HU     FINNO-UGRIC
     ICELANDIC       IS     GERMANIC
     INDONESIAN      ID     OCEANIC/INDONESIAN
     INTERLINGUA     IA     INTERNATIONAL AUX.
     INTERLINGUE     IE     INTERNATIONAL AUX.
     INUKTITUT       IU
     INUPIAK         IK     ESKIMO
     IRISH           GA     CELTIC
     ITALIAN         IT     ROMANCE
     JAPANESE        JA     ASIAN
     JAVANESE        JV     OCEANIC/INDONESIAN
     KANNADA         KN     DRAVIDIAN
     KASHMIRI        KS     INDIAN
     KAZAKH          KK     TURKIC/ALTAIC
     KINYARWANDA     RW     NEGRO-AFRICAN
     KIRGHIZ         KY     TURKIC/ALTAIC
     KURUNDI         RN     NEGRO-AFRICAN
     KOREAN          KO     ASIAN
     KURDISH         KU     IRANIAN
     LAOTHIAN        LO     ASIAN
     LATIN           LA     LATIN/GREEK
     LATVIAN         LV     BALTIC
     LINGALA         LN     NEGRO-AFRICAN
     LITHUANIAN      LT     BALTIC
     MACEDONIAN      MK     SLAVIC
     MALAGASY        MG     OCEANIC/INDONESIAN
     MALAY           MS     OCEANIC/INDONESIAN
     MALAYALAM       ML     DRAVIDIAN
     MALTESE         MT     SEMITIC
     MAORI           MI     OCEANIC/INDONESIAN
     MARATHI         MR     INDIAN
     MOLDAVIAN       MO     ROMANCE
     MONGOLIAN       MN
     NAURU           NA
     NEPALI          NE     INDIAN
     NORWEGIAN       NO     GERMANIC
     OCCITAN         OC     ROMANCE
     ORIYA           OR     INDIAN
     PASHTO          PS     IRANIAN
     PERSIAN (farsi) FA     IRANIAN
     POLISH          PL     SLAVIC
     PORTUGUESE      PT     ROMANCE
     PUNJABI         PA     INDIAN
     QUECHUA         QU     AMERINDIAN
     RHAETO-ROMANCE  RM     ROMANCE
     ROMANIAN        RO     ROMANCE
     RUSSIAN         RU     SLAVIC
     SAMOAN          SM     OCEANIC/INDONESIAN
     SANGHO          SG     NEGRO-AFRICAN
     SANSKRIT        SA     INDIAN
     SCOTS GAELIC    GD     CELTIC
     SERBIAN         SR     SLAVIC
     SERBO-CROATIAN  SH     SLAVIC
     SESOTHO         ST     NEGRO-AFRICAN
     SETSWANA        TN     NEGRO-AFRICAN
     SHONA           SN     NEGRO-AFRICAN
     SINDHI          SD     INDIAN
     SINGHALESE      SI     INDIAN
     SISWATI         SS     NEGRO-AFRICAN
     SLOVAK          SK     SLAVIC
     SLOVENIAN       SL     SLAVIC
     SOMALI          SO     HAMITIC
     SPANISH         ES     ROMANCE
     SUNDANESE       SU     OCEANIC/INDONESIAN
     SWAHILI         SW     NEGRO-AFRICAN
     SWEDISH         SV     GERMANIC
     TAGALOG         TL     OCEANIC/INDONESIAN
     TAJIK           TG     IRANIAN
     TAMIL           TA     DRAVIDIAN
     TATAR           TT     TURKIC/ALTAIC
     TELUGU          TE     DRAVIDIAN
     THAI            TH     ASIAN
     TIBETAN         BO     ASIAN
     TIGRINYA        TI     SEMITIC
     TONGA           TO     OCEANIC/INDONESIAN
     TSONGA          TS     NEGRO-AFRICAN
     TURKISH         TR     TURKIC/ALTAIC
     TURKMEN         TK     TURKIC/ALTAIC
     TWI             TW     NEGRO-AFRICAN
     UIGUR           UG
     UKRAINIAN       UK     SLAVIC
     URDU            UR     INDIAN
     UZBEK           UZ     TURKIC/ALTAIC
     VIETNAMESE      VI     ASIAN
     VOLAPUK         VO     INTERNATIONAL AUX.
     WELSH           CY     CELTIC
     WOLOF           WO     NEGRO-AFRICAN
     XHOSA           XH     NEGRO-AFRICAN
     YIDDISH         YI     GERMANIC
     YORUBA          YO     NEGRO-AFRICAN
     ZHUANG          ZA
     ZULU            ZU     NEGRO-AFRICAN

     For example, the locale for the Danish language spoken in Denmark using
     the ISO 8859-1 character set is da_DK.ISO8859-1.  The da stands for the
     Danish language and the DK stands for Denmark.  The short form of da_DK
     is sufficient to indicate this locale.
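
     As a rough illustration of the name format described above, the
     following sketch splits a locale name into its four fields.  The type
     locale_parts and the functions copy_field() and parse_locale_name() are
     hypothetical names chosen for this example and are not part of any
     NetBSD library; the fixed field sizes are likewise an assumption.

```c
#include <string.h>

/* Hypothetical helper type: the four fields of a locale name. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsize-1 bytes of [src, src+len) into dst and terminate. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Optional fields that are absent are left as empty strings.
 */
void
parse_locale_name(const char *name, struct locale_parts *p)
{
	size_t total = strlen(name);
	size_t at = strcspn(name, "@");		/* index of '@', or end */
	size_t dot = strcspn(name, ".@");	/* index of '.', else same as at */
	size_t us = strcspn(name, "_.@");	/* index of '_', else same as dot */

	memset(p, 0, sizeof(*p));
	copy_field(p->language, sizeof(p->language), name, us);
	if (name[us] == '_')
		copy_field(p->territory, sizeof(p->territory),
		    name + us + 1, dot - us - 1);
	if (name[dot] == '.')
		copy_field(p->codeset, sizeof(p->codeset),
		    name + dot + 1, at - dot - 1);
	if (name[at] == '@')
		copy_field(p->modifier, sizeof(p->modifier),
		    name + at + 1, total - at - 1);
}
```

     Given da_DK.ISO8859-1, this fills in da, DK, and ISO8859-1 and leaves
     the modifier empty.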

     The environment variable settings are queried by their priority level in
     the following manner:

     +   If the LC_ALL environment variable is set, all six categories use the
         locale it specifies.

     +   If the LC_ALL environment variable is not set, each individual cate-
         gory uses the locale specified by its corresponding environment vari-
         able.

     +   If the LC_ALL environment variable is not set, and a value for a par-
         ticular LC_* environment variable is not set, the value of the LANG
         environment variable specifies the default locale for all categories.
         Only the LANG environment variable should be set in /etc/profile,
         since this makes it easiest for the user to override the system
         default using the individual LC_* variables.

     +   If the LC_ALL environment variable is not set, a value for a particu-
         lar LC_* environment variable is not set, and the value of the LANG
         environment variable is not set, the locale for that specific cate-
         gory defaults to the C locale.  The C or POSIX locale assumes the
         ASCII character set and defines information for the six categories.
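
     The lookup order above can be sketched as a small function.  This is
     only an illustration of the priority rules: a real program normally
     does not read these variables itself but calls setlocale(3) with an
     empty locale string.  The name locale_from_env() is hypothetical, and
     treating an empty value as unset is an assumption borrowed from POSIX
     rather than something stated in this page.

```c
#include <stdlib.h>

/*
 * Return the locale name that the environment selects for one category.
 * category_var is the name of the category's own variable, for example
 * "LC_TIME".  The order is LC_ALL, then the category variable, then LANG,
 * and finally the "C" locale.  Empty values are skipped (assumption).
 */
const char *
locale_from_env(const char *category_var)
{
	const char *names[3];
	const char *value;
	int i;

	names[0] = "LC_ALL";
	names[1] = category_var;
	names[2] = "LANG";
	for (i = 0; i < 3; i++) {
		value = getenv(names[i]);
		if (value != NULL && value[0] != '\0')
			return value;
	}
	return "C";
}
```

     With LANG set to da_DK and LC_TIME set to de_DE, the function returns
     de_DE for LC_TIME and da_DK for every other category; setting LC_ALL
     overrides both.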

   Character Sets
     A character is any symbol used for the organization, control, or repre-
     sentation of data.  A group of such symbols used to describe a particular
     language make up a character set.  It is the encoding values in a charac-
     ter set that provide the interface between the system and its input and
     output devices.

     The following character sets are supported in NetBSD:

     ASCII            The American Standard Code for Information Interchange
                      (ASCII) standard specifies 128 Roman characters and
                      control codes, encoded in a 7-bit character encoding
                      scheme.

     ISO 8859 family  Industry-standard character sets specified by the
                      ISO/IEC 8859 standard.  The standard is divided into 15
                      numbered parts, with each part specifying broad script
                      similarities.  Examples include Western European, Cen-
                      tral European, Arabic, Cyrillic, Hebrew, Greek, and
                      Turkish.  The character sets use an 8-bit character
                      encoding scheme which is compatible with the ASCII char-
                      acter set.

     Unicode          The Unicode character set is the full set of known
                      abstract characters of all real-world scripts.  It can
                      be used in environments where multiple scripts must be
                      processed simultaneously.  Unicode is compatible with
                      ISO 8859-1 (Western European) and ASCII.  Many character
                      encoding schemes are available for Unicode, including
                      UTF-8, UTF-16 and UTF-32.  These encoding schemes are
                      multi-byte encodings.  The UTF-8 encoding scheme uses
                      8-bit units in a variable-width encoding that is
                      compatible with ASCII.  The UTF-16 encoding scheme uses
                      16-bit units in a variable-width encoding.  The UTF-32
                      encoding scheme uses 32-bit units in a fixed-width
                      encoding.
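
     To make the variable-width property of UTF-8 concrete, the following
     sketch encodes a single Unicode code point as one to four bytes.  The
     function name utf8_encode() is hypothetical; the bit layout follows the
     standard UTF-8 definition, in which code points below 0x80 are encoded
     as the identical single ASCII byte.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate (U+D800..U+DFFF) or lies beyond U+10FFFF.
 */
size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x80) {
		/* ASCII range: one byte, unchanged. */
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

     For example, U+0041 (A) occupies one byte, U+00E6 two bytes, U+20AC
     three bytes, and U+1F600 four bytes.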

   Font Sets
     A font set contains the glyphs to be displayed on the screen for the
     corresponding characters in a character set.  A display must have a
     suitable font available to display a character set.  If suitable fonts
     are available to the X server, then X clients can include support for
     different character sets.  xterm(1) includes support for Unicode with
     UTF-8 encoding.  xfd(1) is useful for displaying all the characters in
     an X font.

     The NetBSD wscons(4) console provides support for loading fonts using the
     wsfontload(8) utility.  Currently, only fonts for the ISO8859-1 family of
     character sets are supported.

   Internationalization for Programmers
     To facilitate translations of messages into various languages and to make
     the translated messages available to the program based on a user's
     locale, it is necessary to keep messages separate from the programs and
     provide them in the form of message catalogs that a program can access at
     run time.

     Access to locale information is provided through the setlocale(3) and
     nl_langinfo(3) interfaces.  See their respective man pages for further
     information.
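
     A minimal sketch of these two interfaces follows.  It adopts the locale
     named by the environment and asks for the name of its character
     encoding.  The function name current_codeset() is hypothetical; the
     calls themselves are the standard setlocale(3) and nl_langinfo(3).

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Adopt the locale selected by the environment for all categories and
 * return the name of its character encoding.  If the environment names a
 * locale that is not installed, setlocale() returns NULL and the program
 * stays in its previous locale, which is the "C" locale at startup.
 */
const char *
current_codeset(void)
{
	(void)setlocale(LC_ALL, "");
	return nl_langinfo(CODESET);
}
```

     The returned string may be overwritten by later calls to nl_langinfo(3)
     or setlocale(3), so a caller that needs it for longer should copy it.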

     Message source files containing application messages are created by the
     programmer and converted to message catalogs.  These catalogs are used by
     the application to retrieve and display messages, as needed.

     NetBSD supports two message catalog interfaces: the X/Open catgets(3)
     interface and the Uniforum gettext(3) interface.  The catgets(3)
     interface has the advantage that it belongs to a standard which is well
     supported.  Unfortunately, the interface is complicated to use and
     maintenance of the catalogs is difficult.  The implementation also does
     not support different character sets.  The gettext(3) interface has not
     been standardized yet; however, it is supported by an increasing number
     of systems.  It also provides many additional tools which make
     programming and catalog maintenance much easier.
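
     As a hedged sketch of the catgets(3) interface, the function below
     looks up one message and falls back to a built-in string when the
     catalog or the message cannot be found.  The name lookup_message() and
     its parameters are hypothetical.  The text is copied into a caller
     supplied buffer because the pointer returned by catgets(3) should not
     be relied upon after the catalog is closed.

```c
#include <nl_types.h>
#include <stdio.h>

/*
 * Copy message msg_id of set set_id from the named catalog into buf.
 * If the catalog cannot be opened or the message does not exist, the
 * fallback text is copied instead.  The result is always terminated
 * (bufsize must be at least 1) and is truncated if buf is too small.
 */
void
lookup_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *buf, size_t bufsize)
{
	nl_catd catd = catopen(catalog, NL_CAT_LOCALE);

	if (catd == (nl_catd)-1) {
		/* No catalog for this locale: use the built-in text. */
		snprintf(buf, bufsize, "%s", fallback);
		return;
	}
	/* catgets() itself returns fallback if the message is missing. */
	snprintf(buf, bufsize, "%s",
	    catgets(catd, set_id, msg_id, fallback));
	(void)catclose(catd);
}
```

     A program would typically call setlocale(3) first so that
     NL_CAT_LOCALE selects the catalog according to LC_MESSAGES, and would
     keep the catalog open across many lookups instead of reopening it for
     each message as this short example does.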

   Support for Multi-byte Encodings
     Some character sets with multi-byte encodings may be difficult to decode,
     or may contain state (i.e., adjacent characters are dependent).  ISO C
     specifies a set of functions using 'wide characters' which can handle
     multi-byte encodings properly.  The behaviour of these functions is
     affected by the LC_CTYPE category of the current locale.

     A wide character is specified in ISO C as being a fixed number of bits
     wide and is stateless.  There are two types for wide characters: wchar_t
     and wint_t.  wchar_t is a type which can contain one wide character and
     plays the same role that the 'char' type plays for a single-byte
     character.  wint_t can contain one wide character or WEOF (wide EOF).

     There are functions that operate on wchar_t, and substitute for functions
     operating on 'char'.  See wmemchr(3) and towlower(3) for details.  There
     are some additional functions that operate on wchar_t.  See wctype(3) and
     wctrans(3) for details.
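
     The correspondence between the 'char' functions and their wide
     counterparts can be seen in a short sketch.  The helper name
     wcs_tolower() is hypothetical; towlower(3) and wmemchr(3) are the
     standard functions mentioned above.

```c
#include <wchar.h>
#include <wctype.h>

/*
 * Convert a wide string to lower case in place.  towlower() is the wide
 * counterpart of tolower(), and its mapping follows the LC_CTYPE
 * category of the current locale.
 */
void
wcs_tolower(wchar_t *s)
{
	for (; *s != L'\0'; s++)
		*s = (wchar_t)towlower((wint_t)*s);
}
```

     In the same way, wmemchr(s, L'b', n) searches the first n wide
     characters of s just as memchr(3) searches bytes.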

     Wide characters should be used for all I/O processing which may rely on
     locale-specific strings.  The two primary issues requiring special use of
     wide characters are:

           +   All I/O is performed using multibyte characters.  Input data is
               converted into wide characters immediately after reading and
               data for output is converted from wide characters to multi-byte
               encoding immediately before writing.  Conversion is handled by
               the mbstowcs(3), mbsrtowcs(3), wcstombs(3), wcsrtombs(3),
               mblen(3), mbrlen(3), and mbsinit(3) functions.

           +   Wide characters are used directly for I/O, using getwchar(3),
               fgetwc(3), getwc(3), ungetwc(3), fgetws(3), putwchar(3),
               fputwc(3), putwc(3), and fputws(3).  They are also used by the
               formatted I/O functions for wide characters, such as
               fwscanf(3), wscanf(3), swscanf(3), fwprintf(3), wprintf(3),
               swprintf(3), vfwprintf(3), vwprintf(3), and vswprintf(3), and
               by the wide-character conversion specifiers %lc, %C, %ls, and
               %S of the conventional formatted I/O functions.
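
     The first approach, converting to wide characters right after input
     and back to a multi-byte encoding right before output, might look like
     the sketch below.  The function name mb_toupper() and the buffer limit
     WBUF_LEN are assumptions made for this example.  The conversions depend
     on the LC_CTYPE category, so a program would normally call setlocale(3)
     beforehand.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

#define WBUF_LEN 256	/* arbitrary limit chosen for this sketch */

/*
 * Upper-case a multi-byte string: decode to wide characters, process
 * them one character at a time, then encode the result into out.
 * Returns 0 on success, or -1 if the input is invalid in the current
 * encoding, is too long for the internal buffer, or does not fit in out.
 */
int
mb_toupper(const char *in, char *out, size_t outsize)
{
	wchar_t wbuf[WBUF_LEN];
	size_t n, i;

	n = mbstowcs(wbuf, in, WBUF_LEN);
	if (n == (size_t)-1 || n == WBUF_LEN)
		return -1;	/* invalid sequence, or no room for L'\0' */
	for (i = 0; i < n; i++)
		wbuf[i] = (wchar_t)towupper((wint_t)wbuf[i]);
	n = wcstombs(out, wbuf, outsize);
	if (n == (size_t)-1 || n == outsize)
		return -1;	/* unconvertible, or no room for '\0' */
	return 0;
}
```

     Working on wchar_t values in the middle step means that each loop
     iteration handles one whole character, regardless of how many bytes
     that character occupies in the external encoding.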

SEE ALSO
     gencat(1), xfd(1), xterm(1), catgets(3), gettext(3), nl_langinfo(3),
     setlocale(3), wsfontload(8)

BUGS
     This man page is incomplete.

NetBSD 5.1                     February 21, 2007                    NetBSD 5.1
