From 0b8e38dd8bc9b4760c0667509d1eb5ef46c09ff3 Mon Sep 17 00:00:00 2001
From: Corinna Vinschen
Date: Sun, 17 Jan 2010 14:55:57 +0000
Subject: [PATCH] * setup2.sgml (setup-locale): Mention three character codes per ISO 639-3.

* setup2.sgml (setup-locale): Adapt description to the C using ASCII change in 1.7.2.
---
 winsup/doc/ChangeLog | 11 +++++++++++
 winsup/doc/setup2.sgml | 42 +++++++++++++++++++++++++++---------------
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/winsup/doc/ChangeLog b/winsup/doc/ChangeLog
index 4c5854579..7e831e68c 100644
--- a/winsup/doc/ChangeLog
+++ b/winsup/doc/ChangeLog
@@ -1,3 +1,14 @@
+2010-01-17 Corinna Vinschen
+
+ * setup2.sgml (setup-locale): Mention three character codes per
+ ISO 639-3.
+
+2010-01-17 Corinna Vinschen
+ Andy Koppe
+
+ * setup2.sgml (setup-locale): Adapt description to the C using ASCII
+ change in 1.7.2.
+
 2010-01-16 Christopher Faylor

 * setup-net.sgml: Remove obsolete assertion.
diff --git a/winsup/doc/setup2.sgml b/winsup/doc/setup2.sgml
index 51fc0d3c2..8c03babae 100644
--- a/winsup/doc/setup2.sgml
+++ b/winsup/doc/setup2.sgml
@@ -183,8 +183,11 @@ specifier is
 language[[_TERRITORY][.charset][@modifier]]

-"language" is a lowercase two character string per ISO 639-1,
-"TERRITORY" is an uppercase two character string per ISO 3166, charset is
+"language" is a lowercase two character string per ISO 639-1, or,
+if there is no ISO 639-1 code for the language (for instance, "Lower Sorbian"),
+a three character string per ISO 639-3.
+
+"TERRITORY" is an uppercase two character string per ISO 3166, charset is
 one of a list of supported character sets, and the modifier doesn't matter here (though it might for some applications).
 If you're interested in the exact description, you can find it in the online publication of the POSIX
@@ -197,21 +200,23 @@ manual pages on the homepage of the
 "de_CH" language = German, territory = Switzerland, default charset
 "fr_FR.UTF-8" language = french, territory = France, charset = UTF-8
 "ko_KR.eucKR" language = korean, territory = South Korea, charset = eucKR
+ "syr_SY" language = Syriac, territory = Syria, default charset

 At application startup, the application's locale is set to the default
-"C" or "POSIX" locale. Under Cygwin, this locale defaults to the UTF-8
-character set. If you want to stick to the "C" locale and only change to
-another charset, you can define this by setting one of the locale environment
-variables to "C.charset". For instance
+"C" or "POSIX" locale. Under Cygwin 1.7.2 and later, this locale defaults
+to the ASCII character set on the application level. If you want to stick
+to the "C" locale and only change to another charset, you can define this
+by setting one of the locale environment variables to "C.charset". For
+instance

 "C.ISO-8859-1"

-The default locale in the absence of the aforementioned locale
-environment variables is "C.UTF-8".
+The default locale in the absence of the aforementioned locale
+environment variables is "C.UTF-8".

 Windows uses the UTF-16 charset exclusively to store the names of any object used by the Operating System. This is especially important
@@ -232,8 +237,8 @@ process.
 However, even if one of the locale environment variables is set to some other value than "C", this does only affect how Cygwin itself converts filenames.
 As the POSIX standard requires,
-it's the applications responsibility to activate that locale for its
-own purpose, typically by using the call
+it's the application's responsibility to activate that locale for its
+own purposes, typically by using the call

 setlocale (LC_ALL, "");

@@ -244,6 +249,18 @@ lost:
 If the application calls setlocale as above, and there is none of the important locale variables set in the environment, the locale is set to the default locale, which is "C.UTF-8".

+But what about applications which are not locale-aware? Per POSIX,
+they are running in the "C" or "POSIX" locale, which implies the ASCII
+charset. The Cygwin DLL itself, however, will nevertheless use the locale
+set in the environment (or the "C.UTF-8" default locale) for converting
+filenames etc.
+
+When the locale set in the environment specifies an ASCII charset,
+for example "C" or "en_US.ASCII", Cygwin will still use UTF-8
+under the hood to translate filenames. This allows for easier
+interoperability with applications running in the default "C.UTF-8" locale.
+
+
 Right now the language and territory, as well as the modifier, are not important to Cygwin, except to fix a single problem. There's a class of
@@ -274,11 +291,6 @@ How does that work?

-
-The default locale is the "C" or "POSIX" locale. Under Cygwin this locale
-defaults to the UTF-8 character set.
-
-
 Assume that you've set one of the aforementioned environment variables to some valid POSIX locale value, other than "C" and "POSIX". Assume further that