newlib-cygwin

Commit Graph

Author	SHA1	Message	Date
Corinna Vinschen	2d5492453a	Cygwin: locales: fix behaviour for @euro locales Latest Windows supports more EU locales than GLibc, so some of the @euro locales are not covered by checking the GLibc locale defaults. Those locales have no long history, they are all UTF-8. So just check for @euro in the UTF-8 case and set them to ISO-8859-15. Fixes: `2483e54be8` ("Cygwin: locale: Set default charset from Linux locale -> codeset mapping") Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-03-26 13:06:38 +02:00
Corinna Vinschen	c3e7f7609e	Cygwin: locales: fix behaviour for @cjk* and @euro locales @cjknarrow and @cjkwide modifiers are newlib only, so they need some tweaking in __set_charset_from_locale. Fixes: `2483e54be8` ("Cygwin: locale: Set default charset from Linux locale -> codeset mapping") Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-03-26 13:01:52 +02:00
Corinna Vinschen	a97fbb58e2	Cygwin: locales: fix return value check of ResolveLocaleName ResolveLocaleName does not simply return an error value if it can't resolve a locale. Rather, it returns an empty string and the length of this string: 1. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-03-25 22:48:35 +01:00
Corinna Vinschen	7002f7f7c7	Revert "Cygwin: locales: drop supporting iso639 strings as valid locales" This reverts commit `15898b9588`. The idea behind this patch was wrong. Systems are supposed to support iso639-only strings as settings for the locale environment variables, and they are not necessarily available in the /usr/share/locale/locale.alias file. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-03-25 22:30:15 +01:00
Corinna Vinschen	b5b67a65f8	Cygwin: locales: implement own method to check locale validity The Windows function ResolveLocaleName is next to useless to convert a partial locale identifier into a full, supported locale identifier. It converts anything which vaguely resembles a locale into some other locale it supports. Bad examples are: "en-XY" gets converted to "en-US", and worse, "ff-BF" gets converted to "ff-Latn-SN", even though "ff-Adlm-BF" exists! To check if a locale is supported, we have to enumerate all valid Windows locales, and return the match, even if the locale in Windows requires a script. Implement resolve_locale_name() as replacement function for ResolveLocaleName. Fixes: `e95a7a7955` ("Cygwin: convert Windows locale handling from LCID to ISO5646 strings") Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-03-24 12:50:59 +01:00
Corinna Vinschen	15898b9588	Cygwin: locales: drop supporting iso639 strings as valid locales This was incorrect behaviour. The only valid way to support those is by adding them to /usr/share/locale/locale.alias. Fixes: `e95a7a7955` ("Cygwin: convert Windows locale handling from LCID to ISO5646 strings") Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-03-24 12:50:59 +01:00
Corinna Vinschen	c53d0910e6	Cygwin: locales: set errno to ENOENT if locale is invalid This allows newlocale to return with a valid errno if the locale is invalid. Fixes: `e95a7a7955` ("Cygwin: convert Windows locale handling from LCID to ISO5646 strings") Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-03-24 12:50:59 +01:00
Corinna Vinschen	5da71b6059	Cygwin: add support for GB18030 codeset Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-03-16 18:25:09 +01:00
Corinna Vinschen	5ca1c57a82	Cygwin: is_unicode_equiv: fix comment Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-03-03 18:19:18 +01:00
Corinna Vinschen	6e75277b12	Cygwin: __wscollate_range_cmp: fix incorrect comment The comment that the first arg must be the pattern was added during development, before it turned out that __wscollate_range_cmp can be implemented in an order independent way. Better explain why this function uses pointers to strings. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-28 12:24:29 +01:00
Corinna Vinschen	abd81bc01f	Cygwin: locale: fix devanagari modifier Effectively revert commit `57bac33359`. The fact that the devanagari modifier was called devanagar (missing the trailing 'i') is a result of `locale -av' shortening the locale name to a maximum of 15 characters. D'oh. I guess we need a better way to do this... Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-26 17:17:33 +01:00
Corinna Vinschen	2483e54be8	Cygwin: locale: Set default charset from Linux locale -> codeset mapping Generate lc_def_codeset.h header containing the default mapping from locale to codeset on Linux. Use this mapping in __set_charset_from_locale in the first place. For every locale not covered by this table, just map Windows codepages to equivalent codesets used on Linux/Unix, getting rid of LCIDs entirely. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-25 16:12:51 +01:00
Corinna Vinschen	57bac33359	Cygwin: locale: fix devanagar modifier Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-25 16:12:51 +01:00
Corinna Vinschen	e95a7a7955	Cygwin: convert Windows locale handling from LCID to ISO5646 strings Since Windows Vista, locale handling is converted from using numeric locale identifiers (LCID) to using ISO5646 locale strings. In the meantime Windows introduced new locales which don't even have a LCID attached. Those were unusable in Cygwin because locale information for these locales required to call the new locale functions taking a locale string. Convert Cygwin to drop LCIDs and use Windows ISO5646 locales instead. The last place using LCIDs is the __set_charset_from_locale function. Checking numerically is easier and usually faster than checking strings. However, this function is clearly a TODO	2023-02-24 16:40:58 +01:00
Corinna Vinschen	89eb4bce15	Cygwin: support KOI8-T codeset Used on Linux as default codeset for Tajik. There's no matching Windows codepage, so fake it as CP103. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-24 16:40:58 +01:00
Corinna Vinschen	2229f42400	Cygwin: __wscollate_range_cmp: workaround wcscoll's case-insensitivity Most locales using latin characters ignore case while sorting. This is what wcscoll does (correctly so). However, there's an internal order of collating sequences compared to the base character, which is case-sensitive, at least in GLibc. There's no way to express this in Windows, because CompareString and LCMapString always use case-insensitivity in those locales, even if none of the *IGNORECASE sorting flags are used. We want to follow glibc's behaviour more closely, so we add an extra check for the case and make sure upper and lower cased letters don't compare as identical. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-22 12:20:32 +01:00
Corinna Vinschen	ce5aa09807	Cygwin: glob: implement collating symbol support Allow the [.<sym>.] expression. This requires a string comparison rather than a character comparison. Introduce and use __wscollate_range_cmp. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-20 22:50:17 +01:00
Corinna Vinschen	1eadb23887	Cygwin: nlsfuncs.cc: introduce collating elements and helper functions lc_collelem.h: autogenerated table of collating element, taken from glibc is_unicode_coll_elem: Check if a UTF-32 string is a collating element next_unicode_char: return length of prefix from a string constituting a complete character in the current locale, taking collating elements into account.	2023-02-20 22:38:52 +01:00
Corinna Vinschen	064e4bb8bb	Cygwin: convert __collate_range_cmp to __wcollate_range_cmp https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=179721 After FreeBSD eventually picked up the bugreport from within only 5 years, rename __collate_range_cmp to __wcollate_range_cmp as suggested all along, and make it type safe (wint_t instead of wchar_t for hopefully obvious reasons...) While at it, drop __collate_load_error and fix the checks for it in glob and fnmatch. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-19 14:40:29 +01:00
Corinna Vinschen	f0417a6201	Cygwin: is_unicode_equiv: fix normalization Change normalization to form KD and make room for longer decomposed sequences.	2023-02-18 23:14:11 +01:00
Corinna Vinschen	e4cc9e4846	Cygwin: is_unicode_equiv: fix comment Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-16 20:52:20 +01:00
Corinna Vinschen	b5f9b0241a	Cygwin: is_unicode_equiv: implement Unicode equivalence class check is_unicode_equiv compares two UTF-32 values and returns 1 if both are member of the same Unicode equivalence class, 0 otherwise. Note that this function only works with precomposed characters per Unicode normalization form C. It doesn't handle decomposed characters, just like its counterpart in glibc. I.e., equivalence class comparison using decomposed chars won't work. Example: fnmatch("[=n=]", "ñ") == 0 fnmatch("[=ñ=]", "n") == 0 but fnmatch("[=n=]", "n\x0303") == 1 fnmatch("[=n\x0303=]", "n") == 1 fnmatch("[=n\x0303=]", "n\x0303") == 1 Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-15 22:00:39 +01:00
Corinna Vinschen	eac830e0fe	Cygwin: __collate_range_cmp: handle Unicode values >= 0x10000 So far the input to __collate_range_cmp was handled as a wchar_t. Change that to handle it as wint_t holding a UTF-32 value and add creating surrogate pairs for the call to wcscoll. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2023-02-14 12:48:26 +01:00
Corinna Vinschen	0819679a7a	Cygwin: cwd: use SRWLOCK instead of muto To reduce thread contention, use reader/writer locks as required. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2022-08-23 12:09:44 +02:00
Corinna Vinschen	b794f2c603	Cygwin: drop support for systems not supporting RFC 4646 locales i. e. Vista/2008. This drops support for the sr_CS locale. Regenerate LC_MESSAGES and LC_TIME ERA data from more recent Linux Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2021-10-29 18:19:45 +02:00
Corinna Vinschen	eaed594d73	Cygwin: pty: move codepage evaluation to nlsfuncs.cc The new function __eval_codepage_from_internal_charset is a simplified version of the former code in fhandler_tty.cc. It probably needs some extension, but the gist is to use knowledge of internals to be as quick as possible. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2020-09-08 10:36:04 +02:00
Corinna Vinschen	462fcdb67f	Cygwin: convert sys_wcstombs/sys_mbstowcs wrapper to inline functions This should slightly speed up especially path conversions, given there's one less function call rearranging all function arguments in registers/stack (and less stack pressure). For clarity, rename overloaded sys_wcstombs to _sys_wcstombs and sys_cp_mbstowcs to _sys_mbstowcs. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2020-07-10 10:29:33 +02:00
Corinna Vinschen	d2ef2331f9	Cygwin: fix formatting: drop spaces leading tabs Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2020-03-11 13:45:58 +01:00
Corinna Vinschen	10900b98d1	Cygwin: wcsxfrm_l: Only byte swap if dest size is > 0 commit `c0d7d3e1a2` removed the usage of the LCMAP_BYTEREV flag in the call to LCMapStringW to workaround a strange bug in LCMapStringW. This patch didn't take a userspace call of wcsxfrm{_l} with NULL buffer and 0 size to evaluate the required buffer size into account. This introduced a crash trying to byte swap the NULL buffer. This patch fixes that problem. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2019-03-19 21:03:06 +01:00
Corinna Vinschen	c0d7d3e1a2	cygwin wcsxfrm: byte swap result ourselves Workaround a bug (or undocumented behaviour) in LCMapStringW: It's documented(*) that the cchDest parameter is a byte count with LCMAP_SORTKEY, but a character count otherwise. But the docs don't state what happens if you combine LCMAP_SORTKEY with LCMAP_BYTEREV. Tests indicate that LCMAP_SORTKEY treats cchDest as byte count, but then LCMAP_BYTEREV treats it as char count in the same call. So the latter swaps twice as much bytes in the destination buffer than the byte count it returns, which potentially results in writing past the end of the given output buffer. Solution: Don't specify LCMAP_BYTEREV in the LCMapStringW(LCMAP_SORTKEY) call, rather byte swap afterwards. (*) https://msdn.microsoft.com/en-us/library/windows/desktop/dd318702(v=vs.85).aspx Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2017-06-06 18:27:47 +02:00
Corinna Vinschen	2fb5e3dfb2	Reference __global_locale only via __get_global_locale. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2016-08-23 12:38:28 +02:00
Corinna Vinschen	7630e38462	Introduce __current_locale_charset/__locale_charset The former __locale_charset always fetched the current locale's charset. We need the per-locale charset, too, in future. Rename __locale_charset to __current_locale_charset and change __locale_charset to take a locale_t as parameter. Accommodate throughout. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2016-08-20 17:14:56 +02:00
Corinna Vinschen	542b970d4e	Rename __get_locale_XXX to __get_XXX_locale to use unified naming scheme Signed-off by: Corinna Vinschen <corinna@vinschen.de>	2016-08-15 10:56:58 +02:00
Corinna Vinschen	c1b7d9d93d	Implement per-locale string functions strcasecmp_l, strcoll_l, strncasecmp_l, strxfrm_l, wcscasecmp_l, wcscoll_l, wcstrncasecmp_l, wcstrxfrm_l, strftime_l. Add missing CHEWOUT_FILES from previous patch. TODO: strfmon_l. Signed-off by: Corinna Vinschen <corinna@vinschen.de>	2016-08-15 10:56:58 +02:00
Corinna Vinschen	1afa0fe4b3	Fix memory handling in functions called from loadlocale Signed-off by: Corinna Vinschen <corinna@vinschen.de>	2016-08-15 10:56:57 +02:00
Corinna Vinschen	53f84bb5ac	Rearrange struct __locale_t pointers into an array This allows looping through the structs and buffers. Also rearrange definitions to follow order of LC_xxx values. Signed-off by: Corinna Vinschen <corinna@vinschen.de>	2016-08-15 10:56:57 +02:00
Corinna Vinschen	1498c79db8	Change loadlocale to fill a __locale_t given as parameter Don't use global variables. This allows to call loadlocale from the yet to be created newlocale(). Rename _thr_locale_t to __locale_t (these locales are not restricted to threads so the name is misleading). Along these lines, fix _set_ctype to take a __locale_t as parameter. Signed-off by: Corinna Vinschen <corinna@vinschen.de>	2016-08-15 10:56:57 +02:00
Corinna Vinschen	d16a56306d	Consolidate wctomb/mbtowc calls for POSIX-1.2008 - Remove charset parameter from low level __foo_wctomb/__foo_mbtowc calls. - Instead, create array of function for ISO and Windows codepages to point to function which does not require to evaluate the charset string on each call. Create matching helper functions. I.e., __iso_wctomb, __iso_mbtowc, __cp_wctomb and __cp_mbtowc are functions returning the right function pointer now. - Create __WCTOMB/__MBTOWC macros utilizing per-reent locale and replace calls to __wctomb/__mbtowc with calls to __WCTOMB/__MBTOWC. - Drop global __wctomb/__mbtowc vars. - Utilize aforementioned changes in Cygwin to get rid of charset in other, calling functions and simplify the code. - In Cygwin restrict global cygheap locale info to the job performed by internal_setlocale. Use UTF-8 instead of ASCII on the fly in internal conversion functions. - In Cygwin dll_entry, make sure to initialize a TLS area with a NULL _REENT->_locale pointer. Add comment to explain why. Signed-off by: Corinna Vinschen <corinna@vinschen.de>	2016-08-15 10:56:57 +02:00
Corinna Vinschen	88208d3735	POSIX-1.2008 per-thread locales, groundwork part 2 Move all locale category structure definitions into setlocale.h and remove other headers in locale subdir. Create inline accessor functions for current category struct pointers and use throughout. Use pointers to "C" locale category structs by default in __global_locale. Signed-off by: Corinna Vinschen <corinna@vinschen.de>	2016-08-15 10:56:56 +02:00
Corinna Vinschen	a6a477fa81	POSIX-1.2008 per-thread locales, groundwork part 1 Introduce first cut of struct _thr_locale_t used for the locale_t definition. Introduce global instance called __global_locale used by default. Introduce internal inline functions __get_global_locale, __get_locale_r, __get_current_locale. Remove usage of global variables in favor of accessor functions pointing to __global_locale for now. Include all local headers in locale subdir from setlocale.h to get single include for internal locale access. Introduce __CTYPE_PTR macro to replace direct access to __ctype_ptr__ and use throughout in isxxx functions. Signed-off by: Corinna Vinschen <corinna@vinschen.de>	2016-08-15 10:56:56 +02:00
Corinna Vinschen	288df6f818	Add support for certain newer locales only available with Script Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2016-06-24 12:11:42 +02:00
Corinna Vinschen	94f98f18db	Drop has_localenames flag	2016-06-23 22:21:23 +02:00
Corinna Vinschen	ed0ff4b940	Drop has_always_all_codepages flag	2016-06-23 22:21:23 +02:00
Corinna Vinschen	6e623e9320	Switching the Cygwin DLL to LGPLv3+, dropping commercial buyout option Bump GPLv2+ to GPLv3+ for some files, clarify BSD 2-clause. Everything else stays under GPLv3+. New Linking Exception exempts resulting executables from LGPLv3 section 4. Add CONTRIBUTORS file to keep track of licensing. Remove 'Copyright Red Hat Inc' comments. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2016-06-23 10:09:17 +02:00
Corinna Vinschen	e185421106	strxfrm/wcsxfrm: Always return length of the transformed string Cygwin's strxfrm/wcsxfrm treated a too short output buffer as an error condition and always returned the size value provided as third parameter. This is not as it's documented in POSIX.1-2008. Rather, the only error condition is an invalid input string(*). Other than that, the functions are supposed to return the length of the resulting sort key, even if the output buffer is too small. In the latter case the content of the output array is unspecified, but it's the job of the application to check that the return value is greater or equal to the provided buffer size. (*) We have to make an exception in Cygwin: strxfrm has to call the UNICODE function LCMapStringW for reasons outlined in a source comment. If the incoming multibyte string is so large that we fail to malloc the space required to convert it to a wchar_t string, we have to set errno as well since we have nothing to call LCMapStringW with. * nlsfuncs.cc (wcsxfrm): Fix expression computing offset of trailing wchar_t NUL. Compute correct return value even if output buffer is too small. (strxfrm): Handle failing malloc. Compute correct return value even if output buffer is too small. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2016-04-12 15:06:05 +02:00
Corinna Vinschen	26a8b62e9a	Fix numeric and monetary decimal point and thousands separator in fa_IR and ps_AF locales * nlsfuncs.cc (setlocaleinfo): New macro calling __setlocaleinfo. (__setlocaleinfo): New function to set a locale-specific character to an explicit wchar_t value. (__set_lc_numeric_from_win): Handle fa_IR and ps_AF locales to return same decimal point and thousands separator characters as on Linux. (__set_lc_monetary_from_win): Ditto for monetary characters. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2015-11-21 16:51:12 +01:00
Corinna Vinschen	677eea00a6	Workaround bug in LocaleNameToLCID on Windows 10 * nlsfuncs.cc (__get_lcid_from_locale): Handle LocaleNameToLCID returning LOCALE_CUSTOM_UNSPECIFIED instead of failing in case of an unsupported locale on Windows 10. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2015-10-30 20:13:26 +01:00
Corinna Vinschen	a50f8f5973	* nlsfuncs.cc (wcscoll): Add "__restrict" to definition. (wcsxfrm): Ditto.	2013-11-26 17:27:25 +00:00
Corinna Vinschen	d12e1c0670	* nlsfuncs.cc (strcoll): Add "__restrict" to definition. (strxfrm): Ditto.	2013-11-26 17:08:56 +00:00
Corinna Vinschen	d2a88d9792	Throughout, drop unnecessary explicit includes of windows header files included by default. * winlean.h: Add long comment to explain why we have to define certain symbols. (_NORMALIZE_): Define. (_WINNLS_): Drop definition and subsequent undef. (_WINNETWK_): Ditto. (_WINSVC_): Ditto. 2013-11-23 Eric Blake <eblake@redhat.com>	2013-11-24 12:13:36 +00:00

90 Commits