https://cygwin.com/pipermail/cygwin/2022-December/252571.html
Cygwin's default locale is "C.UTF-8" as far as LC_CTYPE settings
are concerned. However, while __global_locale contains fixed
mbtowc and wctomb pointers, the lc_ctype_T pointer is still pointing
to _C_ctype_locale, representing the standard "C" locale.
The problem with this is that the codeset name as well as MB_CUR_MAX
is wrong.
Fix this by introducing a new lc_ctype_T structure called
_C_utf8_ctype_locale, setting the default codeset to "UTF-8" and
MB_CUR_MAX to 6. Use this as lc_ctype_T pointer in __global_locale
by default on Cygwin.
Fixes: a6a477fa8190 ("POSIX-1.2008 per-thread locales, groundwork part 1")
Co-Authored-By: Takashi Yano <takashi.yano@nifty.ne.jp>
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
By default, Newlib uses a huge object of type struct _reent to store
thread-specific data. This object is returned by __getreent() if the
__DYNAMIC_REENT__ Newlib configuration option is defined.
The reentrancy structure contains for example errno and the standard input,
output, and error file streams. This means that if an application only uses
errno it has a dependency on the file stream support even if it does not use
it. This is an issue for lower end targets and applications which need to
qualify the software according to safety standards (for example ECSS-E-ST-40C,
ECSS-Q-ST-80C, IEC 61508, ISO 26262, DO-178, DO-330, DO-333).
If the new _REENT_THREAD_LOCAL configuration option is enabled, then struct
_reent is replaced by dedicated thread-local objects for each struct _reent
member. The thread-local objects are defined in translation units which use
the corresponding object.
When __HAVE_LOCALE_INFO__ is not selected, directly access the
existing _ctype_ variable from __locale_ctype_ptr() and
__locale_ctype_ptr_l(), eliminating the need for any locale or reent
structure
Signed-off-by: Keith Packard <keithp@keithp.com>
v2:
locale: fix conflict with __locale_ctype_ptr macro
If we are building without __HAVE_LOCALE_INFO__, there is a
macro providing __locale_ctype_ptr which in turn fouls up this
declaration.
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Locale modifier @cjkwide makes Unicode "ambiguous width" characters
wide. So ambiguous width characters can be enforced to have width 2
even in non-CJK locales. This gives e.g. users of "Powerline symbols"
the opportunity to adjust their width to the desired behaviour (and the
behaviour apparently expected by some tools) without having to set a CJK
locale and without losing consistence of terminal character width with
wcwidth/wcswidth locale width.
Problem:
After passing locales created by 'duplocale' to 'uselocale',
referencing 'MB_CUR_MAX', which is actually expanded to
'__locale_mb_cur_max()' by preprocessors, causes segmentation faults.
Direct use of locales from 'newlocale' does not cause the problem.
This is the problem of 'duplocale'.
$ echo $LANG
ja_JP.UTF-8
$ cat test.c
#include <stdlib.h>
#include <locale.h>
volatile int var;
int main(void) {
locale_t const loc = newlocale(LC_ALL_MASK, "", NULL);
locale_t const dup = duplocale(loc);
locale_t const old = uselocale(dup);
var = MB_CUR_MAX; /* <-- crashes here */
uselocale(old);
freelocale(dup);
freelocale(loc);
return 0;
}
$ gcc test.c
$ ./a
Segmentation fault (core dumped)
# Note: "core dumped" in the above message was actually written in
# Japanese, but I translated the part to post a mail in English.
Bug:
In the beginning of '__loadlocale' (newlib/libc/locale/locale.c:501),
there is a code which checks if the operations can be skipped:
> /* Avoid doing everything twice if nothing has changed. */
> if (!strcmp (new_locale, loc->categories[category]))
> return loc->categories[category];
While, in the function '_duplocale_r' (newlib/libc/locale/
duplocale.c), '__loadlocale' is called as in the quoted codes:
> /* If the object is not a "C" locale category, copy it. Just call
> __loadlocale. It knows what to do to replicate the category. */
> tmp_locale.lc_cat[i].ptr = NULL;
> tmp_locale.lc_cat[i].buf = NULL;
> if (!__loadlocale (&tmp_locale, i, tmp_locale.categories[i]))
> goto error;
This call of '__loadlocale' results in the skip check being
!strcmp(tmp_locale.categories[i], tmp_locale.categories[i]),
which is always true. This means that the actual operations of
'__loadLocale' will never be performed for 'duplocale'.
Fix:
The call of '__loadlocale' in '_duplocale_r' is modified.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Keep __ctype_ptr__ available on Cygwin only, for backward compatibility
with existing apps referencing it via the ctype macros.
Otherwise initialize __global_locale.ctype_ptr and __C_locale.ctype_ptr
and use them throughout.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Only access "C" locale using the new __get_C_locale inline function.
Enable __global_locale for !_MB_CAPABLE targets. Accommodate !_MB_CAPABLE
targets in new locale code.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Add global const __C_locale for reference purposes.
Bump Cygwin API minor number and DLL major version number to 2.6.0.
Signed-off by: Corinna Vinschen <corinna@vinschen.de>
This allows looping through the structs and buffers. Also
rearrange definitions to follow order of LC_xxx values.
Signed-off by: Corinna Vinschen <corinna@vinschen.de>
Don't use global variables. This allows to call loadlocale from
the yet to be created newlocale().
Rename _thr_locale_t to __locale_t (these locales are not restricted
to threads so the name is misleading).
Along these lines, fix _set_ctype to take a __locale_t as parameter.
Signed-off by: Corinna Vinschen <corinna@vinschen.de>
- Remove charset parameter from low level __foo_wctomb/__foo_mbtowc calls.
- Instead, create array of function for ISO and Windows codepages to point
to function which does not require to evaluate the charset string on
each call. Create matching helper functions. I.e., __iso_wctomb,
__iso_mbtowc, __cp_wctomb and __cp_mbtowc are functions returning the
right function pointer now.
- Create __WCTOMB/__MBTOWC macros utilizing per-reent locale and replace
calls to __wctomb/__mbtowc with calls to __WCTOMB/__MBTOWC.
- Drop global __wctomb/__mbtowc vars.
- Utilize aforementioned changes in Cygwin to get rid of charset in other,
calling functions and simplify the code.
- In Cygwin restrict global cygheap locale info to the job performed
by internal_setlocale. Use UTF-8 instead of ASCII on the fly in
internal conversion functions.
- In Cygwin dll_entry, make sure to initialize a TLS area with a NULL
_REENT->_locale pointer. Add comment to explain why.
Signed-off by: Corinna Vinschen <corinna@vinschen.de>
Move all locale category structure definitions into setlocale.h and remove
other headers in locale subdir. Create inline accessor functions for
current category struct pointers and use throughout. Use pointers to
"C" locale category structs by default in __global_locale.
Signed-off by: Corinna Vinschen <corinna@vinschen.de>
Introduce first cut of struct _thr_locale_t used for the locale_t definition.
Introduce global instance called __global_locale used by default.
Introduce internal inline functions __get_global_locale, __get_locale_r,
__get_current_locale.
Remove usage of global variables in favor of accessor functions pointing to
__global_locale for now. Include all local headers in locale subdir from
setlocale.h to get single include for internal locale access.
Introduce __CTYPE_PTR macro to replace direct access to __ctype_ptr__
and use throughout in isxxx functions.
Signed-off by: Corinna Vinschen <corinna@vinschen.de>
accidental declaration of __numeric_load_locale.
* libc/locale/locale.c: Include timelocal.h to get declaration of
__time_load_locale.
(__set_locale_from_locale_alias): Fix return type.
(__locale_msgcharset): Avoid compiler warnings.
(_localeconv_r): Ditto.
* libc/locale/locale.c (current_categories): On Cygwin, set LC_CTYPE
to C.UTF-8 to match initial __wctomb and __mbtowc settings.
(lc_ctype_charset): On Cygwin, initialize to "UTF-8".
(loadlocale): Remove unused Cygwin-specifc code.
if __HAVE_LOCALE_INFO_EXTENDED__ is defined.
* libc/include/langinfo.h (enum __nl_item): New type. Define all
native values accessible through nl_langinfo. Define previously
existing POSIX-compatible values as macros as well.
* libc/include/stdlib.h (__mb_cur_max): Drop declaration.
(__locale_mb_cur_max): Declare.
(MB_CUR_MAX): Re-define calling __locale_mb_cur_max.
* libc/locale/Makefile.am (ELIX_SOURCES): Add lctype.c.
* libc/locale/Makefile.in: Regenerate.
* libc/locale/lctype.c: New file to define and load LC_CTYPE category.
* libc/locale/lctype.h: New file, matching header.
* libc/locale/lmessages.c (_C_messages_locale): Add default values for
wide char members.
(__messages_load_locale): Add _C_messages_locale in call to
__set_lc_messages_from_win.
* libc/locale/lmessages.h (struct lc_messages_T): Add wide char members.
* libc/locale/lmonetary.c (_C_monetary_locale): Add default values for
wide char members.
(__monetary_load_locale): Add _C_monetary_locale in call to
__set_lc_monetary_from_win.
* libc/locale/lmonetary.h (struct lc_monetary_T): Add wide char members.
Add numerical values for international currency formatting per
POSIX-1.2008, if __HAVE_LOCALE_INFO_EXTENDED__ is defined.
* libc/locale/lnumeric.c (_C_numeric_locale): Add default values for
wide char members.
(__numeric_load_locale): Add _C_numeric_locale in call to
__set_lc_numeric_from_win.
* libc/locale/lnumeric.h (struct lc_numeric_T): Add wide char members.
* libc/locale/locale.c (loadlocale): Return doing nothing if category
locale didn't change. Convert category if chain to switch statement.
Call __ctype_load_locale in LC_CTYPE case.
(__locale_charset): Add (but disable for now) returning codeset from
__get_current_ctype_locale.
(__locale_mb_cur_max): Add (but disable for now) returning mb_cur_max
from __get_current_ctype_locale.
(__locale_msgcharset): Add returning codeset from
__get_current_messages_locale.
(_localeconv_r): Accommodate int_XXX values.
* libc/locale/nl_langinfo.c (nl_ext): New array to define what is to
be returned for non-POSIX values.
(nl_Langinfo): Return correct codeset for each locale category. Return
extended values if __HAVE_LOCALE_INFO_EXTENDED__ is defined.
* libc/locale/timelocal.c (_C_time_locale): Add default values for
wide char members.
(__time_load_locale): Add _C_time_locale in call to
__set_lc_time_from_win.
* libc/locale/timelocal.h (struct lc_time_T): Add wide char members.
* libc/stdio/vfwprintf.c (_VFWPRINTF_R): Use wide char decimal point
and thousands_sep if __HAVE_LOCALE_INFO_EXTENDED__ is defined.
* libc/time/strftime.c: Rework to accommodate availability of wide char
strings in LC_TIME category if __HAVE_LOCALE_INFO_EXTENDED__ is defined.
Cygwin only: Allow GB2312 and EUC-CN as alternative codeset names
for GBK. Add to documentation.
* libc/locale/nl_langinfo.c (nl_langinfo): On Cygwin, translate EUCCN
to GB2312.
reason for using __CYGWIN__.
(lconv): Remove _CONST entirely.
(loadlocale): Guard calls to function loading locale-specific
category data with __HAVE_LOCALE_INFO__ rather than __CYGWIN__.
* libc/sys/config.h (__HAVE_LOCALE_INFO__): Define for Cygwin.
parameters for wide char to multibyte conversion. Call
__set_lc_messages_from_win on Cygwin.
* libc/locale/lmessages.h: Make C++-safe.
(__messages_load_locale): Change declaration.
* libc/locale/lmonetary.c (__monetary_load_locale): Use
_monetary_locale_buf as buffer pointer.
* libc/locale/lnumeric.c (__numeric_load_locale): Use
_numeric_locale_buf as buffer pointer.
* libc/locale/timelocal.c (__time_load_locale): Use time_locale_buf
as buffer pointer.
* libc/locale/locale.c (loadlocale): Enable loading LC_MESSAGES data
on Cygwin.
support to documentation.
(__set_locale_from_locale_alias): Declare when build for Cygwin.
(loadlocale): On Cygwin, if locale can't be recognized, call
__set_locale_from_locale_alias to check for locale alias.
Define FAIL macro to replace `return NULL' statements. Replace
throughout.
(_CTYPE_GEORGIAN_PS_255): Define.
(_CTYPE_PT154_128_254): Define.
(_CTYPE_PT154_255): Define.
(__ctype_cp): Add array members for above ctype definitions.
* libc/locale/locale.c (loadlocale): Make TIS-620 charset name
available for all targets. Add guards for setting the conversion
function pointers. Add support for GEORGIAN-PS and PT154 charsets.
Change documentation to reflect current behaviour more closely.
* libc/locale/nl_langinfo.c (nl_langinfo): On Cygwin, translate
"CP101" to "GEORGIAN-PS" and "CP102" to "PT154".
* libc/stdlib/sb_charsets.c (__cp_conv): Add conversion arrays
for GEORGIAN-PS and PT154.
(__cp_index): Map invalid Windows codepage number 101 to
GEORGIAN-PS conversion array, 102 to PT154 conversion array.