This reverts commit 2b77087a48ea56e77fca5aeab478c922f6473d7c.
For some reason lost in time, commit 2b77087a48ea5 introduced
Cygwin-specific code treating single byte characters outside the
portable character set as illegal chars. However, Cygwin was
always alone with this over-correct behaviour and it leads to
stuff like gnulib replacing functions defined in Cygwin with
their own implementation just due to that.
Revert this change, sans the changes to ChangeLog.
Fixes: 2b77087a48ea ("* libc/stdlib/mbtowc_r.c (__ascii_mbtowc): Disallow conversion of")
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Commit 89eb4bce152f was pretty half-hearted, missing
the codepage character type tables and wctomb/mbtowc
mappings.
Fixes: 89eb4bce152f ("Cygwin: support KOI8-T codeset")
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Add a _REENT_ERRNO() macro to encapsulate the access to the
_errno member of struct reent. This will help to replace the
structure member with a thread-local storage object in a follow
up patch.
Replace uses of __errno_r() with _REENT_ERRNO(). Keep __errno_r() macro for
potential users outside of Newlib.
Some architectures like ARM encode the short enum option state in the
object file and the linker checks that this option is consistent for all
objects of an executable. In case applications use -fno-short-enums,
then this leads to linker warnings. Use the enum __packed attribute for
the relevent enums to avoid the -fshort-enums compiler option. This
attribute is at least available on GCC, LLVM/clang and the Intel
compiler.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
- Remove charset parameter from low level __foo_wctomb/__foo_mbtowc calls.
- Instead, create array of function for ISO and Windows codepages to point
to function which does not require to evaluate the charset string on
each call. Create matching helper functions. I.e., __iso_wctomb,
__iso_mbtowc, __cp_wctomb and __cp_mbtowc are functions returning the
right function pointer now.
- Create __WCTOMB/__MBTOWC macros utilizing per-reent locale and replace
calls to __wctomb/__mbtowc with calls to __WCTOMB/__MBTOWC.
- Drop global __wctomb/__mbtowc vars.
- Utilize aforementioned changes in Cygwin to get rid of charset in other,
calling functions and simplify the code.
- In Cygwin restrict global cygheap locale info to the job performed
by internal_setlocale. Use UTF-8 instead of ASCII on the fly in
internal conversion functions.
- In Cygwin dll_entry, make sure to initialize a TLS area with a NULL
_REENT->_locale pointer. Add comment to explain why.
Signed-off by: Corinna Vinschen <corinna@vinschen.de>
compiler warnings found this way.
* libc/stdio/freopen.c (_freopen_r): Fix bug setting _flags.
* libc/include/stdio.h (_rename): Define when building newlib.
* libc/include/sys/signal.h (_kill): Ditto.
* libc/include/sys/stat.h (_mkdir): Ditto.
* libc/include/sys/time.h (_gettimeofday): Ditto.
* libc/include/sys/times.h (_times): Ditto.
* libc/include/sys/wait.h (_wait): Ditto.
* libc/locale/lmessages.c (empty): Don't define for Cygwin.
* libc/locale/lmonetary.c (cnv): Ditto.
* libc/locale/nl_langinfo.c (nl_langinfo): Ditto for variable s.
* libc/posix/collate.c: Throughout cast to avoid compiler warning.
* libc/posix/engine.c (matcher): Initialize dp to avoid compiler
warning.
* libc/posix/glob.c: Disable on Cygwin. Explain why.
* libc/posix/regcomp.c: Fix "uninitialized" compiler warnings.
(dissect): Deliberately silence gcc compiler warning. Add comment to
explain why.
* libc/posix/wordexp.c (wordexp): Remove num_bytes variable since result
is never used.
* libc/posix/popen.c (popen): Ditto for variable last.
* libc/reent/mkdirr.c: Include sys/stat.h.
* libc/reent/renamer.c: Include stdio.h.
* libc/search/hash.c: Throughout use underscored variants of the stat
function family.
(init_hash): Add missing definition for the __USE_INTERNAL_STAT64 case.
* libc/search/hash_bigkey.c (__big_insert): Add parenthesis to avoid
compiler warning.
* libc/search/hash_page.c (overflow_page): Initalize freep to NULL to
avoid compiler warning.
* libc/stdio/asiprintf.c (_asiprintf_r): Cast unsigned char * to char *
to avoid compiler warning.
(asiprintf): Ditto.
* libc/stdio/asprintf.c (_asprintf_r): Ditto.
(asprintf): Ditto.
* libc/stdio/vasiprintf.c (_vasiprintf_r): Ditto.
* libc/stdio/vasprintf.c (_vasprintf_r): Ditto.
* libc/stdio/mktemp.c (_gettemp): Cast to unsigned char in call to
isdigit to avoid compiler warning.
* libc/stdio/vfprintf.c (_VFPRINTF_R): Initialize variables used for
grouping to avoid compiler warning. Only define and set nseps and
nrepeats if they are really used.
* libc/stdio/vfwprintf.c (_VFWPRINTF_R): Ditto. Only define state if
it is really used.
* libc/stdio/vfscanf.c (u_char): Revert to be defined as unsigned char.
(__SVFSCANF_R): Cast fmt in call to __mbtowc.
* libc/stdlib/mbtowc_r.c (JIS_state_table): Disable when building
Cygwin.
(JIS_action_table): Ditto.
* libc/stdlib/wctomb_r.c (__utf8_wctomb): Add parenthesis to avoid
compiler warning.
* libc/string/strcasestr.c: Deliberately silence gcc compiler warning.
Add comment to explain why.
* libc/time/strptime.c (strptime): Cast to unsigned char in calls to
isspace to avoid compiler warning.
* libm/math/e_atan2.c (__ieee754_atan2): Add parenthesis to avoid
compiler warning.
* libm/math/e_exp.c (__ieee754_exp): Initialize k to 0 to avoid
compiler warning. Drop setting it to 0 later.
* libm/math/ef_exp.c (__ieee754_expf): Ditto.
* libm/math/e_pow.c (__ieee754_pow): Add braces to avoid compiler
warning.
* libm/math/ef_pow.c (__ieee754_powf): Ditto.
* libm/math/er_lgamma.c (__ieee754_lgamma_r): Initialize nadj to 0 to
avoid compiler warning.
* libm/math/erf_lgamma.c (__ieee754_lgammaf_r): Ditto.
* libm/math/e_rem_pio2.c (__ieee754_rem_pio2): Ditto for variable z.
* libm/common/sf_round.c (roundf): Remove signbit variable since result
is never used.
_mbtowc_r with direct call to __mbtowc.
* libc/stdio/vfscanf.c: Ditto.
* libc/stdlib/btowc.c: Include local.h. Replace call to _mbtowc_r
with direct call to __mbtowc.
* libc/stdlib/mblen.c: Ditto.
* libc/stdlib/mblen_r.c: Ditto.
* libc/stdlib/mbrtowc.c: Ditto.
* libc/stdlib/mbstowcs_r.c: Ditto.
* libc/stdlib/mbtowc.c: Ditto.
* libc/stdlib/wcrtomb.c: Include local.h. Replace call to _wctomb_r
with direct call to __wctomb.
* libc/stdlib/wcsnrtombs.c: Ditto.
(_wcsnrtombs_r): Ditto.
* libc/stdlib/wcstombs_r.c: Ditto.
* libc/stdlib/wctob.c: Ditto.
* libc/stdlib/wctomb.c: Ditto.
* libc/stdlib/mbrtowc.c (mbrtowc): Implement independently from
_mbrtowc_r, unless PREFER_SIZE_OVER_SPEED or __OPTIMIZE_SIZE__ are
defined.
* libc/stdlib/wcrtomb.c (wcrtomb): Implement independently from
_wcrtomb_r, unless PREFER_SIZE_OVER_SPEED or __OPTIMIZE_SIZE__ are
defined.
* libc/stdlib/mbtowc_r.c (__utf8_mbtowc): Drop unnecessary test for
ch >= 0.
(lc_message_charset): Ditto.
(loadlocale): Set charset of the "C" locale to "UTF-8" on Cygwin.
* libc/stdlib/mbtowc_r.c (__mbtowc): Default to __utf8_mbtowc on
Cygwin.
* libc/stdlib/wctomb_r.c (__wctomb): Default to __utf8_wctomb on
Cygwin.
pair handling to be more bullet-proof even with incomplete UTF-8
sequences. Add check for 4 byte sequences resulting in values
outside the valid Unicode range. Add a comment to clarify checking
for invalid CESU-8 sequences.
recognizes 0x8e and 0x8f lead bytes.
(_iseucjp2): Rename from _iseucjp.
* libc/stdlib/mbtowc_r.c (__eucjp_mbtowc): Convert JIS-X-0212
triplebyte sequences as well.
* libc/stdlib/wctomb_r.c (__eucjp_wctomb): Convert to JIS-X-0212
triplebyte sequences as well.
_MB_CAPABLE systems.
* libc/ctype/iswblank.c: Ditto.
* libc/ctype/iswcntrl.c: Ditto.
* libc/ctype/iswprint.c: Ditto.
* libc/ctype/iswpunct.c: Ditto.
* libc/ctype/iswspace.c: Ditto.
* libc/ctype/jp2uc.c (__jp2uc): On Cygwin, just return c.
Explain why.
* libc/ctype/towlower.c: Ditto.
* libc/ctype/towupper.c: Ditto.
* libc/include/sys/config.h: Define _MB_EXTENDED_CHARSETS_ISO
and _MB_EXTENDED_CHARSETS_WINDOWS if _MB_EXTENDED_CHARSETS_ALL is
defined. Define _MB_EXTENDED_CHARSETS_ALL on Cygwin only for now.
* libc/include/sys/reent.h (struct _reent): Mark _current_category
and _current_locale as unused.
* libc/locale/locale.c: Add new charset support to documentation.
Include ../stdio/local.h from here.
(lc_ctype_charset): Set to "ASCII" by default.
(lc_message_charset): Ditto.
(_setlocale_r): Don't set _current_category and _current_locale.
(loadlocale): Add Cygwin codepage support. On _MB_CAPABLE
systems, set __mbtowc and __wctomb function pointers to function
corresponding with current charset. Don't allow non-existant
ISO-8859-12 charset. Add support for Windows singlebyte codepages.
On Cygwin, add support for GBK, CP949, and BIG5. On Cygwin,
call __set_ctype() in case the catorgy is LC_CTYPE. Don't set
_current_category and _current_locale.
* libc/stdlib/Makefile.am (GENERAL_SOURCES): Add sb_charsets.c.
* libc/stdlib/Makefile.in: Regenerate.
* libc/stdlib/local.h: Add prototype for __locale_charset.
Add prototypes for __mbtowc and __wctomb pointers.
Add prototypes for charset-specific _wctomb_r and _mbtowc_r
functions.
Declare tables and functions from sb_charsets.c.
* libc/stdlib/mbtowc_r.c (__mbtowc): Define. Set to __ascii_mbtowc
by default.
(_mbtowc_r): Just call __mbtowc from here.
(__ascii_mbtowc): New function.
(__iso_mbtowc): New function.
(__cp_mbtowc): New function.
(__utf8_mbtowc): New function.
(__sjis_mbtowc): New function. Disable on Cygwin.
(__eucjp_mbtowc): New function. Disable on Cygwin.
(__jis_mbtowc): New function. Disable on Cygwin.
* libc/stdlib/sb_charsets.c: New file, adding singlebyte to UTF
conversion tables for all ISO and CP charsets.
(__iso_8859_index): New function.
(__cp_index): New function.
* libc/stdlib/wctomb_r.c (__wctomb): Define. Set to __ascii_wctomb
by default.
(_wctomb_r): Just call __wctomb from here.
(__ascii_wctomb): New function.
(__utf8_wctomb): New function.
(__sjis_wctomb): New function. Disable on Cygwin.
(__eucjp_wctomb): New function. Disable on Cygwin.
(__jis_wctomb): New function. Disable on Cygwin.
(__iso_wctomb): New function.
(__cp_wctomb): New function.
invalid character sequence.
* libc/stdlib/mbtowc_r.c (_mbtowc_r): Fix compiler warning due to
missing declaration of __locale_charset.
* libc/stdlib/wctomb_r.c (_wctomb_r): Ditto.
sequences since they are invalid in the Unicode standard.
Handle surrogate pairs in case of wchar_t == UTF-16.
* wctomb_r.c (_wctomb_r): Don't convert invalid Unicode wchar_t
values beyond 0x10ffff into UTF-8 chars. Handle surrogate pairs in
case of wchar_t == UTF-16.
* libc/stdio/open_memstream.c (internal_open_memstream_r): Fix max
buffer size to be in wchar_t units if wide == 1 is passed in. In
this case, also initialize the first character of the buffer to be
wide char null.
(_open_wmemstream_r): Cast buf to be (char **) to avoid warning.
* libc/stdlib/mbtowc_r.c (_mbtowc_r): Change all occurences of
incrementing the size_t value n to first check that n is not already
size_t -1. Fix some compiler warnings.
* libc/stdlib/wcstod.c: Add includes for <wctype.h> and <math.h>.
* libc/include/sys/_types.h (_mbstate_t): Changed to use
unsigned char internally.
* libc/sys/linux/sys/_types.h: Ditto.
* libc/include/sys/reent.h
* libc/stdlib/mblen.c (mblen): Use function-specific state
value from default reentrancy structure.
* libc/stdlib/mblen_r.c (_mblen_r): If return code from
_mbtowc_r is less than 0, reset state __count value and
return -1.
* libc/stdlib/mbrlen.c (mbrlen): If the input state pointer
is NULL, use the function-specific pointer provided in the
default reentrancy structure.
* libc/stdlib/mbrtowc.c: Add reentrant form of function.
If input state pointer is NULL, use function-specific area
provided in reentrancy structure.
* libc/stdlib/mbsrtowcs.c: Ditto.
* libc/stdlib/wcrtomb.c: Ditto.
* libc/stdlib/wcsrtombs.c: Ditto.
* libc/stdlib/mbstowcs.c: Reformat.
* libc/stdlib/wcstombs.c: Ditto.
* libc/stdlib/mbstowcs_r.c (_mbstowcs_r): If an error occurs,
reset the state's __count value and return -1.
* libc/stdlib/mbtowc.c: Ditto.
* libc/stdlib/mbtowc_r.c (_mbtowc_r): Add restartable functionality.
If number of bytes is used up before completing a valid multibyte
character, return -2 and save the state.
* libc/stdlib/wctomb_r.c (_wctomb_r): Define __state as __count
and change some __count references to __state for clarity.