* new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2.
* setup2.sgml (setup-locale-ov): Describe how valid locales are determined by Windows locale support. Change description for modifiers in locale environment variables. (setup-locale-how): Describe new charset behaviour. Mention new getlocale tool to fetch valid locale information from Windows. (setup-locale-missing): Drop now implemented LC_foo options. Explain missing LC_MESSAGES in more detail.
parent be822de2a1
commit ff0056d45e
@ -1,3 +1,14 @@
+2010-01-22 Corinna Vinschen <corinna@vinschen.de>
+
+* new-features.sgml (ov-new1.7.2): Add chapter for news in 1.7.2.
+* setup2.sgml (setup-locale-ov): Describe how valid locales are
+determined by Windows locale support. Change description for modifiers
+in locale environment variables.
+(setup-locale-how): Describe new charset behaviour. Mention new
+getlocale tool to fetch valid locale information from Windows.
+(setup-locale-missing): Drop now implemented LC_foo options.
+Explain missing LC_MESSAGES in more detail.
+
 2010-01-17 Corinna Vinschen <corinna@vinschen.de>

 * setup2.sgml (setup-locale): Mention three character codes per
@ -1,5 +1,43 @@
 <sect1 id="ov-new1.7"><title>What's new and what changed in Cygwin 1.7</title>

+<sect2 id="ov-new1.7.2"><title>What's new and what changed from 1.7.1 to 1.7.2</title>
+
+<screen>
+- Localization support has been much improved.
+
+- Cygwin now handles locales using the underlying Windows locale support.
+The locale must exist in Windows to be recognized.
+
+- New tool "getlocale" to fetch valid locale values from Windows.
+
+- Default charset for locales without explicit charset is now chosen
+from a list of Linux-compatible charsets. For instance en_US -> ISO-8859-1,
+ja_JP -> EUC-JP.
+
+- Support for the @euro locale modifier to switch to the ISO-8859-15
+charset.
+
+- Default charset in the "C" or "POSIX" locale has been changed back from
+UTF-8 to ASCII, to circumvent problems with applications expecting a
+single-byte charset in the "C"/"POSIX" locale. Still use UTF-8 internally
+for filename conversion in this case.
+
+- LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME localization is enabled
+via Windows locale support.
+
+- New strfmon(3) call.
+
+- Support open(2) flags O_CLOEXEC and O_TTY_INIT. Support
+fcntl flag F_DUPFD_CLOEXEC. Support socket flags SOCK_CLOEXEC and
+SOCK_NONBLOCK.
+
+- Add new Linux-compatible API calls accept4(2), dup3(2), and pipe2(2).
+
+- fnmatch(3) call is now multibyte-aware.
+</screen>
+
+</sect2>
+
 <sect2 id="ov-new1.7-os"><title>OS related changes</title>

 <screen>
@ -255,35 +255,41 @@ charset. The Cygwin DLL itself, however, will nevertheless use the locale
 set in the environment (or the "C.UTF-8" default locale) for converting
 filenames etc.</para>

-<para>When the locale set in the environment specifies an ASCII charset,
+<para>When the locale in the environment specifies an ASCII charset,
 for example "C" or "en_US.ASCII", Cygwin will still use UTF-8
 under the hood to translate filenames. This allows for easier
 interoperability with applications running in the default "C.UTF-8" locale.
 </para>

 <para>
-Right now the language and territory, as well as the modifier, are not
-important to Cygwin, except to fix a single problem. There's a class of
-characters in the Unicode character set, called the "CJK Ambiguous Width
-Character set". For these characters the width returned by the
-wcwidth/wcswidth function is usually 1. This is often a problem in
-East-Asian languages, which historically use character sets in which
-these characters have a width of 2. Kind of explains why they are
-called "ambiguous"...</para>
+Starting with Cygwin 1.7.2, the language and territory are used to
+fetch locale-dependent information from Windows. If the language and
+territory are not known to Windows, the <function>setlocale</function>
+function fails.</para>

-<para>
-The problem has been fixed like this. wcwidth/wcswidth usually
-return 1 as the width of these characters. However, if the language is
-specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese), wcwidth
-returns 2 for these characters. Unfortunately this isn't correct in
-all circumstances, so the user can specify the modifier "@cjknarrow",
-which modifies the behaviour of wcwidth/wcswidth to return 1 for the
-ambiguous width characters even in those languages.</para>
+<para>The modifier is used for two cases.</para>

-<para>
-Other than that, the only important part so far is the character set.
+<itemizedlist mark="bullet">

-How does that work?</para>
+<listitem><para>For languages which default to one of the ISO-8859 character
+sets, the modifier "@euro" can be added to enforce usage of the ISO-8859-15
+character set, which includes a character for the "Euro" currency sign.</para>
+</listitem>
+
+<listitem><para>There's a class of characters in the Unicode character set,
+called the "CJK Ambiguous Width Character set". For these characters the width
+returned by the wcwidth/wcswidth function is usually 1. This is often a
+problem in East-Asian languages, which historically use character sets in
+which these characters have a width of 2. By default, the wcwidth/wcswidth
+functions return 1 as the width of these characters, except if the language is
+specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese). In these
+languages wcwidth and wcswidth return 2 for these characters. This is not
+correct in all circumstances, so the user of one of these languages can specify
+the modifier "@cjknarrow", which modifies the behaviour of wcwidth/wcswidth to
+return 1 for the ambiguous width characters.</para>
+</listitem>
+
+</itemizedlist>

 </sect2>

@ -296,32 +302,47 @@ Assume that you've set one of the aforementioned environment variables to some
 valid POSIX locale value, other than "C" and "POSIX". Assume further that
 you're living in Japan. You might want to use the language code "ja" and the
 territory "JP", thus setting, say, <envar>LANG</envar> to "ja_JP". You didn't
-set a character set, so what will Cygwin use now? Easy! It will use the
-default Windows ANSI codepage of your system, if it's supported by Cygwin.
-Hopefully Cygwin supports all relevant default ANSI codepages...</para>
+set a character set, so what will Cygwin use now? Starting with Cygwin 1.7.2,
+the default character set is determined by the default Windows ANSI codepage
+for this language and territory. Cygwin uses a character set which is the
+typical Unix-equivalent to the Windows ANSI codepage. For instance:</para>

-<note><para>For a list of supported character sets, see
-<xref linkend="setup-locale-charsetlist"></xref>
-</para></note>
+<screen>
+"en_US"       ISO-8859-1
+"el_GR"       ISO-8859-7
+"pl_PL"       ISO-8859-2
+"pl_PL@euro"  ISO-8859-15
+"ja_JP"       EUCJP
+"ko_KR"       EUCKR
+"te_IN"       UTF-8
+</screen>
 </listitem>

 <listitem><para>
-You don't want to use the default Windows codepage as character set?
-In that case you have to specify the charset explicitly. For instance,
-assume you're from Italy and don't want to use the Italian default Windows
-ANSI codepage 1252, but the more portable ISO-8859-15 character set.
-What you can do, for instance, is to set the <envar>LANG</envar> variable
-in the <filename>C:\cygwin\Cygwin.bat</filename> file which is the batch file
-to start a Cygwin session from the "Cygwin" desktop shortcut.</para>
+You don't want to use the default character set? In that case you have to
+specify the charset explicitly. For instance, assume you're from Japan and
+don't want to use the Japanese default charset EUC-JP, but the Windows
+default charset SJIS. What you can do, for instance, is to set the
+<envar>LANG</envar> variable in the <filename>C:\cygwin\Cygwin.bat</filename>
+file which is the batch file to start a Cygwin session from the "Cygwin"
+desktop shortcut.</para>

 <screen>
 @echo off

 C:
 chdir C:\cygwin\bin
-set LANG=it_IT.ISO-8859-15
+set LANG=ja_JP.SJIS
 bash --login -i
 </screen>
+
+<note><para>For a list of locales supported by your Windows machine, use the new
+<command>getlocale -a</command> command, which is part of the Cygwin package.
+For a description see <xref linkend="getlocale"></xref></para></note>
+
+<note><para>For a list of supported character sets, see
+<xref linkend="setup-locale-charsetlist"></xref>
+</para></note>
 </listitem>

 <listitem><para>
@ -435,19 +456,18 @@ entries are useful to cygwin: 932/SJIS, 936/GBK, 949/EUC-KR, 950/Big5,
 <sect2 id="setup-locale-missing"><title>What does not work?</title>

 <para>
-Except for <envar>LC_ALL</envar>, <envar>LC_CTYPE</envar>,
-and <envar>LANG</envar>, all other LC_xxx environment variables,
-<envar>LC_COLLATE</envar>, <envar>LC_MESSAGES</envar>,
-<envar>LC_MONETARY</envar>, <envar>LC_NUMERIC</envar>,
-and <envar>LC_TIME</envar>, are ignored right now. This means, while Cygwin
-supports different character sets, it does <emphasis>not</emphasis> support
-real localization so far. There's no support for locale-specific monetary
-symbols, for a decimal point other than '.', no support for native time
-formats, and no support for native language sorting orders.
-</para>
+The environment variable and locale setting <envar>LC_MESSAGES</envar>
+is ignored right now. There's no known Windows function to fetch the
+regular expressions to recognize user input with the meaning of "yes"
+or "no". Therefore,
+<function>nl_langinfo(YESEXPR)</function> and
+<function>nl_langinfo(NOEXPR)</function> always return a string
+suitable only for the English language.</para>

-<para>Cygwin's internationalization support is work in progress and we would
-be glad for coding help in this area.</para>
+<para>If somebody knows a simple solution to this problem, feel free
+to notify us on the
+<ulink url="mailto:cygwin@cygwin.com">Cygwin mailing list</ulink>.
+</para>

 </sect2>
