g_Ctoc, converting the UTF-32 filenames to multibyte, still
used UTF-16 to multibyte conversion. Introduce a wirtomb
helper and fix that.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
wcintowcs: convert UTF-16 to UTF-32 string
wcilen: return number of characters in a UTF-32 string
wcincmp: compare two fixed-size UTF-32 strings
Used in followup patches introducing collating symbols
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Given how UTF-16 isn't capable to hold all Unicode chars in a single
wchar_t, we need a function returning a wint_t value representing
a UTF-32 value for comparison functions. Fortunately the important
wide character functions like towupper/towlower, isw<class>, iswctype,
etc, already take wint_t values and newlib handles them as UTF-32.
If only we had switched wchar_t to 32 bit way back when... sigh.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Add a _REENT_ERRNO() macro to encapsulate the access to the
_errno member of struct reent. This will help to replace the
structure member with a thread-local storage object in a follow
up patch.
Replace uses of __errno_r() with _REENT_ERRNO(). Keep __errno_r() macro for
potential users outside of Newlib.
These have no effect on x86_64. Retain a few occurrences of __cdecl
in files imported from other sources.
Also retain all occurrences of WINAPI, even though the latter is
simply a macro that expands to __stdcall. Most of these occurrences
are associated with Windows API functions, and removing them might
make the code confusing instead of simpler.
The register keyword was already deprecated with C++11, but
with C++17 it has been entirely removed.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
This should slightly speed up especially path conversions,
given there's one less function call rearranging all function
arguments in registers/stack (and less stack pressure).
For clarity, rename overloaded sys_wcstombs to _sys_wcstombs
and sys_cp_mbstowcs to _sys_mbstowcs.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
sys_mbstowcs is called with the destination buffer length
set to MaximumLength from the receiving UNICODE_STRING buffer.
This is twice as much as the actual size of the buffer in
wchar_t units, which is the unit expected by sys_mbstowcs.
sys_mbstowcs always attaches a NUL, within the destination
buffersize given. But if the string is exactly one wchar_t
less than the actual buffer, and the buffersize is given too
large, sys_mbstowcs writes a NUL one wchar_t beyond the buffer.
This has only been exposed with Cygwin 3.1.5 because alloca
on newer gcc 9 apparently allocates more tightly. The alloca
buffer here is requested with 16 bytes, which is exactly the
number of bytes required for the string L"cmd.exe". Older gcc
apparently allocated a few more bytes on the stack, while gcc 9
allocates in 16 byte granularity...
Fix this by giving the correct destination buffer size to
sys_mbstowcs.
Fixes: https://cygwin.com/pipermail/cygwin/2020-June/245226.html
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
This function is going to be used for transposing sun_path of
abstract sockets. This also adds a transposition of the NUL
character to tfx_chars since NUL-bytes in abstract socket names
are perfectly valid.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
- Remove charset parameter from low level __foo_wctomb/__foo_mbtowc calls.
- Instead, create array of function for ISO and Windows codepages to point
to function which does not require to evaluate the charset string on
each call. Create matching helper functions. I.e., __iso_wctomb,
__iso_mbtowc, __cp_wctomb and __cp_mbtowc are functions returning the
right function pointer now.
- Create __WCTOMB/__MBTOWC macros utilizing per-reent locale and replace
calls to __wctomb/__mbtowc with calls to __WCTOMB/__MBTOWC.
- Drop global __wctomb/__mbtowc vars.
- Utilize aforementioned changes in Cygwin to get rid of charset in other,
calling functions and simplify the code.
- In Cygwin restrict global cygheap locale info to the job performed
by internal_setlocale. Use UTF-8 instead of ASCII on the fly in
internal conversion functions.
- In Cygwin dll_entry, make sure to initialize a TLS area with a NULL
_REENT->_locale pointer. Add comment to explain why.
Signed-off by: Corinna Vinschen <corinna@vinschen.de>
Bump GPLv2+ to GPLv3+ for some files, clarify BSD 2-clause.
Everything else stays under GPLv3+.
New Linking Exception exempts resulting executables from LGPLv3 section 4.
Add CONTRIBUTORS file to keep track of licensing.
Remove 'Copyright Red Hat Inc' comments.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* dcrt0.cc (dll_crt0_1), environ.cc (environ_init, getwinenveq,
build_env), strfuncs.cc (sys_wcstombs, sys_wcstombs_alloc),
wchar.c (sys_wcstombs, sys_wcstombs_alloc): avoid mis-conversions
of text that does not, actually, refer to a path or file name
Detailed explanation:
Our WCS -> UTF conversion handles the private Unicode page specially
to allow for otherwise invalid file names. However, this handling makes
no sense for command-lines, nor environment variables, which we would
rather convert verbatim.
As a stop-gap solution, let's just introduce a version of the
sys_wcstombs() function that specifically excludes that file name
conversion magic.
The proper solution is to change sys_wcstombs() to assume that it is not
a path that wants to be converted, and introduce sys_wcstombs_path()
that does, but that is a bigger task which we leave for another patch.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* strfuncs.cc (sys_cp_wcstombs): Always return number of multibytes
without trailing NUL as the documentation implies. Throughout Cygwin,
fix usage to align to this pattern.
* fhandler_process.cc (format_process_winexename): Drop trailing NUL
and LF from output.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
included by default.
* winlean.h: Add long comment to explain why we have to define certain
symbols.
(_NORMALIZE_): Define.
(_WINNLS_): Drop definition and subsequent undef.
(_WINNETWK_): Ditto.
(_WINSVC_): Ditto.
2013-11-23 Eric Blake <eblake@redhat.com>
* kernel32.cc: Add includes needed for GetCommandLine functions.
(ucmd): New function.
(cygwin_GetCommandLineW): Ditto.
(cygwin_GetCommandLineA): Ditto.
* spawn.cc (child_info_spawn::worker): Rename one_line -> cmd. Use lb_wcs
macro to generate a wide character version of the line buffer. Remove
duplicate printing of command line. Don't access members of linebuf directly.
* winf.h: Use pragma once.
(linebuf): Make storage private.
(linebuf::operator size_t): New operator. Return size of buf.
(linebuf::operator wchar_t): New operator.
(linebuf::wcs): New function.
(lb_wcs): New macro.
* include/cygwin/version.h: Bump API minor number to 268.
* strfuncs.cc: Clarify descriptive file comment.
warnings between regparm definitions and declarations.
* smallprint.cc (__small_vswprintf): Conditionalize declaration and
setting of l_opt for only x86_64.
* spawn.cc (child_info_spawn::worker): Remove unused 'pid' variable.
* thread.cc (verifyable_object_isvalid): Temporarily define as
non-inline with gcc 4.7+, regardless of target.
* fhandler_clipboard.cc: Throughout remove handling of eof member.
(fhandler_dev_clipboard::write): Handle EOF condition immediately,
rather than pushing it erroneously to the next read call. Rearrange
code. Fix bug in CF_UNICODETEXT case which potentially dropped single
bytes at the end of the buffer. Add comment.
* strfuncs.cc (sys_cp_wcstombs): Allow returning non-NUL-terminated
buffer if dst != NULL and len == (size_t) -1. Extend leading comment
to explain what's returned in more detail.
declare with regparms.
* path.cc (get_nt_native_path): Add third parameter to allow conversion
of leading and trailing dots and spaces on filesystems only supporting
filenames following DOS rules.
(path_conv::get_nt_native_path): Call get_nt_native_path according to
fs.has_dos_filenames_only flag.
(getfileattr): Accommodate new parameter to get_nt_native_path.
(symlink_info::check): Revamp fs_update_called handling to call
fs.update only once per call. Call get_nt_native_path according to
fs.has_dos_filenames_only flag. Streamline filesystem dependent code
not to be called more than once unnecessarily. Drop code tweaking
incoming path for broken filesystems only allowing DOS pathnames.
Rely on changed get_nt_native_path instead.
* mount.cc (fillout_mntent): Accommodate new parameter to
get_nt_native_path.
* strfuncs.cc (tfx_rev_chars): New conversion table with comment.
(sys_cp_wcstombs): Use tfx_rev_chars rather than tfx_chars.
* autoload.cc (LocaleNameToLCID): Define.
* cygwin.din (strfmon): Export.
* nlsfuncs.cc: New file. Define a lot of internal functions called
from setlocale.
(wcscoll): Implement locale-aware here, using CompareStringW function.
(strcoll): Ditto.
(wcsxfrm): Implement locale-aware here, usingLCMapStringW function.
(strxfrm): Ditto.
(__set_charset_from_locale): Replace __set_charset_from_codepage.
Return Linux-compatible charset.
* strfuncs.cc (__set_charset_from_codepage): Remove.
* wchar.h (__set_charset_from_codepage): Drop definition.
* wincap.h (wincaps::has_localenames): New element.
* wincap.cc: Implement above element throughout.
* libc/strfmon.c: New file.
* libc/strptime.cc: Remove locale constant strings in favor of
access to locale-specifc data.
(strptime): Point _CurrentTimeLocale to locale-specific data.
Throughout use correct locale-specific format fields for all
locale-specific formats.
* include/monetary.h: New file.
* include/cygwin/version.h (CYGWIN_VERSION_API_MINOR): Bump.
* mount.cc (mount_info::from_fstab): Remove extern declaration of
transform_chars.
* path.cc (tfx_chars): Move to strfuncs.cc.
(transform_chars): Ditto.
* strfunc.cc (tfx_chars): Moved here from path.cc.
(transform_chars): Ditto.
(sys_cp_wcstombs): Make UNICODE private use area conversion roundtrip
save for all characters.
(sys_cp_mbstowcs): Ditto, by removing special case for UTF-8 sequences
representing U+f0XX UNICODE chars. Fix typo in comment.
str_to_con.
* fhandler_console.cc (dev_console::con_to_str): Simplify. Always
default to the current internal locale.
(dev_console::get_console_cp): Always use codepage 437 for alternate
charset.
(dev_console::str_to_con): Constify charset parameter.
(fhandler_console::write_normal): Always use codepage 437 for alternate
charset. Otherwise always default to the current internal locale.
Replace ASCII SO with ASCII CAN.
* strfuncs.cc: Tweka comments according to below changes.
(sys_cp_wcstombs): Constify charset parameter. Convert all wchar_t
values in the Unicode private use area U+F0xx to the singlebyte
counterpart. Drop special handling creating ASCII SO sequence from
U+DCxx value. Rearrange for performance. Replace ASCII SO with
ASCII CAN.
(sys_cp_mbstowcs): Constify charset parameter. Replace ASCII SO with
ASCII CAN. Drop special case for U+DCxx ASCII SO sequences. Always
create a replacement from the Unicode private use area U+F0xx for
invalid byte values in a multibyte sequence. Do the same for wchar_t
values from the U+F0xx range to make them roundtrip safe.
* wchar.h (sys_cp_wcstombs): Constify charset parameter.
(sys_cp_mbstowcs): Ditto.
second halves to unambiguous ASCII SO sequence. When converting
chars invalid in current codepage to ASCII SO sequence, make
sure to check for surrogate pair second half only if ct least
one wide characters is left. Decrement nwc if valid second half has
been converted.
(sys_cp_mbstowcs): Improve ASCII SO handling. Never break from loop
if invalid character has been found. Recognize ASCII SO sequence
representing originally invalid mulitbyte char converted into a
lone surrogate pair second half. Convert accordingly.
(select_record::select_record): Define do-nothing constructor for "new" to
avoid gratuitous zeroing.
(select_info): New base class.
(select_pipe_info): New class with methods for dealing with pipes.
(select_socket_info): New class with methods for dealing with sockets.
(select_serial_info): Dummy class for serial.
(select_mailslot_info): Dummy class for mailslots.
(select_stuff): Define device_specific_* as actual classes rather than void *.
* dtable.h (dtable::select_read): Accommodate return value change to 'bool' and
argument change to "select_stuff".
(dtable::select_write): Ditto.
(dtable::select_except): Ditto.
* dtable.cc (dtable::select_read): Accommodate return value change to 'bool'
and argument change to "select_stuff".
(dtable::select_write): Ditto.
(dtable::select_except): Ditto.
* fhandler.h: Excise select-related classes.
(fhandler_*::select_read): Change argument to select_stuff.
(fhandler_*::select_write): Ditto.
(fhandler_*::select_except): Ditto.
* select.cc (UNIX_FD_ZERO): Use memset rather than bzero.
(select_stuff::test_and_set): Change return type to bool. Allocate
select_record on entry and let fhandler_*::select_* operate on the start.next
field of select_stuff.
(pipeinf): Delete.
(select_pipe_info::select_pipe_info): New constructor. Allocates event for
controlling pipe waits.
(select_pipe_info::~select_pipe_info): New destructor. Destroy event. Stop
thread.
(select_pipe_info::add_watch_handle): New function.
(thread_pipe): Wait for the hEvent part of any overlapped pipes before peeking.
(start_thread_pipe): Don't allocate device_specific_pipe stuff here. Assume
that it has been allocated earlier.
(pipe_cleanup): Rely on select_pipe_info destructor to clean up pipe
paraphenalia.
(fhandler_*::select_*): Derive select_record from new select_stuff argument.
(fhandler_pipe::select_*): Ditto. Allocate pipe-specific field if not already
allocated.
(serialinf): Delete.
(thread_serial): serialinf -> select_serial_info.
(fhandler_base::ready_for_read): Rewrite to accommodate change in argument to
fhandler_*::select_*.
(socketinf): Delete.
(thread_socket): socketinf -> select_socket_info.
(mailslotinf): Delete.
(thread_mailslot): mailslotinf -> select_mailslot_info.
of the change to sys_cp_mbstowcs from 2009-05-30.
(sys_cp_mbstowcs): Slightly reformat. Fix comment to accommodate
change to sys_cp_wcstombs. Don't write to *ptr if dst is NULL.
* cygheap.h (struct cygheap_locale): New structure.
(struct user_heap_info): Add cygheap_locale member locale.
* dcrt0.cc (dll_crt0_1): Revert to calling _setlocale_r so that only
the applications locale is reverted to "C".
* environ.cc (environ_init): Remove unused got_lc variable.
* fhandler.h (class dev_console): Remove now unsed locale variables.
* fhandler_console.cc (fhandler_console::get_tty_stuff): Remove
setting dev_console's locale members.
(dev_console::con_to_str): Use internal locale settings. Default to
__ascii_wctomb if charset is "ASCII".
(fhandler_console::write_normal): Ditto.
* strfuncs.cc (__ascii_wctomb): Drop declaration.
(__db_wctomb): Use fixed value 2 instead of not
necessarily matching MB_CUR_MAX.
(__eucjp_wctomb): Use 3 instead of MB_CUR_MAX.
(sys_cp_wcstombs): Remove special case for "C" locale.
(sys_wcstombs): Implement here. Use internal locale data stored on
cygheap.
(sys_cp_mbstowcs): Remove special case for "C" locale.
(sys_mbstowcs): Implement here. Use internal locale data stored on
cygheap.
* syscalls.cc (internal_setlocale): New function to set cygheap locale
data and to reset CWD posix path.
(setlocale): Just call internal_setlocale from here if necessary.
* wchar.h (__ascii_wctomb): Declare.
(sys_wcstombs): Don't define inline, just declare.
(sys_mbstowcs): Ditto.
and con_charset.
(dev_console::str_to_con): Take mbtowc function pointer and charset
as additional parameters.
* fhandler_console.cc (fhandler_console::get_tty_stuff): Initialize
aforementioned new members. Explain why.
(dev_console::con_to_str): Remove useless comment. Call new
sys_cp_wcstombs function rather than sys_wcstombs.
(dev_console::str_to_con): Take mbtowc function pointer and charset
as additional parameters. Call sys_cp_mbstowcs accordingly.
(fhandler_console::write_normal): Only initialize f_mbtowc and charset
once. Accommodate changed str_to_con.
* strfuncs.cc (sys_cp_wcstombs): Renamed from sys_wcstombs. Take
wctomb function pointer and charset as parameters. Use throughout.
(sys_cp_mbstowcs): Take wctomb function pointer and charset as
parameters instead of codepage. Remove matching local variables and
their initialization.
* wchar.h (ENCODING_LEN): Define as in newlib.
(__mbtowc): Use mbtowc_p typedef for declaration.
(wctomb_f): New type.
(wctomb_p): New type.
(__wctomb): Declare.
(__utf8_wctomb): Use wctomb_f typedef for declaration.
(sys_cp_wcstombs): Move declaration from winsup.h here.
(sys_wcstombs): Ditto.
(sys_wcstombs_alloc): Ditto.
(sys_cp_mbstowcs): Ditto.
(sys_mbstowcs): Ditto.
(sys_mbstowcs_alloc): Ditto.
* winsup.h: Move declaration of sys_FOO functions to wchar.h. Include
wchar.h instead.
for now.
(__db_wctomb): Alwaus use WC_NO_BEST_FIT_CHARS.
(__jis_wctomb): Just call __ascii_wctomb from here.
(__eucjp_wctomb): Convert to standalone implementation to fix up the
difference between eucJP and CP 20932 affecting JIS-X-0212 characters.
Explain.
(__kr_wctomb): Use codepage 949.
(__db_mbtowc): Reorder code slightly. Always use MB_ERR_INVALID_CHARS
in call to MultiByteToWideChar. Fix a problem with singlebyte
sequences. Fix a bug in '\0' handling. Reset state->__count on
successful return from non-zero state.
(__jis_mbtowc): Just call __ascii_mbtowc from here.
(__eucjp_mbtowc): Convert to standalone implementation to fix up the
difference between eucJP and CP 20932 affecting JIS-X-0212 characters.
(__kr_mbtowc): Use codepage 949.
(__set_charset_from_codepage): Handle codepage 20932 as eucJP.