15 Commits

Author SHA1 Message Date
Johannes Schindelin e0d4e3fec7 Do not treat the command line or environment like paths
* dcrt0.cc (dll_crt0_1), environ.cc (environ_init, getwinenveq,
	build_env), strfuncs.cc (sys_wcstombs, sys_wcstombs_alloc),
	wchar.c (sys_wcstombs, sys_wcstombs_alloc): avoid mis-conversions
	of text that does not, actually, refer to a path or file name

Detailed explanation:

Our WCS -> UTF conversion handles the private Unicode page specially
to allow for otherwise invalid file names. However, this handling makes
no sense for command-lines, nor environment variables, which we would
rather convert verbatim.

As a stop-gap solution, let's just introduce a version of the
sys_wcstombs() function that specifically excludes that file name
conversion magic.

The proper solution is to change sys_wcstombs() to assume that it is not
a path that wants to be converted, and introduce sys_wcstombs_path()
that does, but that is a bigger task which we leave for another patch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2016-01-08 15:17:52 +01:00
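The distinction this commit draws can be illustrated with a minimal sketch. It is not the Cygwin implementation: the names sketch_wcstombs and sketch_utf8_encode and the is_path flag are hypothetical, the target charset is assumed to be UTF-8, and the U+F0xx range is taken from the 2009-09-28 entry further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: encode one code point as UTF-8 and return the
   number of bytes written (1 to 4).  Surrogate pairs are not combined. */
static size_t sketch_utf8_encode(unsigned long cp, char *out)
{
  if (cp < 0x80) {
    out[0] = (char) cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (char) (0xC0 | (cp >> 6));
    out[1] = (char) (0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (char) (0xE0 | (cp >> 12));
    out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char) (0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (char) (0xF0 | ((cp >> 18) & 0x07));
  out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Hypothetical converter.  With is_path set, U+F0xx is folded back to the
   single byte xx (the file name special case); without it, every
   character is encoded verbatim.  Returns the number of bytes written,
   not counting the terminating NUL. */
size_t sketch_wcstombs(char *dst, size_t len, const wchar_t *src, bool is_path)
{
  size_t n = 0;

  assert(dst != NULL && src != NULL && len > 0);
  for (; *src != L'\0'; ++src) {
    unsigned long cp = (unsigned long) *src;
    char buf[4];
    size_t k, i;

    if (is_path && cp >= 0xF000 && cp <= 0xF0FF) {
      buf[0] = (char) (cp & 0xFF);
      k = 1;
    } else {
      k = sketch_utf8_encode(cp, buf);
    }
    if (n + k >= len)          /* keep room for the terminating NUL */
      break;
    for (i = 0; i < k; ++i)
      dst[n++] = buf[i];
  }
  dst[n] = '\0';
  return n;
}
```

For the wide string 'a', U+F03A, 'b', the path mode produces the three bytes "a:b", while the verbatim mode produces the five bytes 61 EF 80 BA 62, which is the behaviour wanted for command lines and environment variables.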
Corinna Vinschen ac39f7b4e8 Drop sys_cp_wcstombs and save two arguments per call
* strfuncs.cc (sys_cp_wcstombs): Delete and move functionality into
        sys_wcstombs.
        * wchar.h (sys_cp_wcstombs): Drop declaration.
        * fhandler_console.cc (dev_console::con_to_str): Call sys_wcstombs.

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2015-12-18 12:42:40 +01:00
Christopher Faylor 6e75c72b89 Throughout, change __attribute__ ((regparm (N))) to just __regN. Throughout,
(mainly in fhandler*) start fixing gcc 4.7.2 mismatch between regparm
definitions and declarations.
* gendef: Define some functions to take @ declaration to accommodate _regN
defines which use __stdcall.
* gentls_offsets: Define __regN macros as empty.
* autoload.cc (wsock_init): Remove unneeded regparm attribute.
* winsup.h (__reg1): Define.
(__reg2): Define.
(__reg3): Define.
* advapi32.cc (DuplicateTokenEx): Coerce some initializers to avoid warnings
from gcc 4.7.2.
* exceptions.cc (status_info): Declare struct to use NTSTATUS.
(cygwin_exception::dump_exception): Coerce e->ExceptionCode to NTSTATUS.
* fhandler_clipboard.cc (cygnativeformat): Redefine as UINT to avoid gcc 4.7.2
warnings.
(fhandler_dev_clipboard::read): Ditto.
2013-01-21 04:34:52 +00:00
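A rough sketch of the macro approach described in this commit follows. The SKETCH_REGn names and sketch_add are hypothetical stand-ins (the commit's own macros are __reg1, __reg2 and __reg3 in winsup.h), and the exact expansion shown here is an assumption based on the gendef note that the defines use __stdcall. The attributes only apply to 32-bit x86, so the macros expand to nothing elsewhere, much as gentls_offsets defines them as empty.

```c
#include <assert.h>

/* Hypothetical equivalents of the __regN shorthands.  regparm and stdcall
   are GCC attributes for 32-bit x86 only; on any other target the macros
   are empty so the same declarations still compile. */
#if defined(__GNUC__) && defined(__i386__)
# define SKETCH_REG1 __attribute__ ((stdcall, regparm (1)))
# define SKETCH_REG2 __attribute__ ((stdcall, regparm (2)))
# define SKETCH_REG3 __attribute__ ((stdcall, regparm (3)))
#else
# define SKETCH_REG1
# define SKETCH_REG2
# define SKETCH_REG3
#endif

/* Declaration and definition carry the same macro, which avoids the kind
   of mismatch between the two that the commit mentions for gcc 4.7.2. */
int SKETCH_REG2 sketch_add(int a, int b);

int SKETCH_REG2 sketch_add(int a, int b)
{
  assert(a >= 0 && b >= 0);   /* the sketch only deals with small non-negative values */
  return a + b;
}
```

Using one macro in both places means the calling convention is spelled once, so a declaration and its definition cannot drift apart.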
Corinna Vinschen e1e595a649 Replace regex files with multibyte-aware version from FreeBSD.
* Makefile.in (install-headers): Remove extra command to install
	regex.h.
	(uninstall-headers): Remove extra command to uninstall regex.h.
	* nlsfuncs.cc (collate_lcid): Make externally available to allow
	access to collation internals from regex functions.
	(collate_charset): Ditto.
	* wchar.h: Add __cplusplus guards to make C-clean.
	* include/regex.h: New file, replacing regex/regex.h.  Remove UCB
	advertising clause.
	* regex/COPYRIGHT: Accommodate BSD license.  Remove UCB advertising
	clause.
	* regex/cclass.h: Remove.
	* regex/cname.h: New file from FreeBSD.
	* regex/engine.c: Ditto.
	(NONCHAR): Tweak for Cygwin.
	* regex/engine.ih: Remove.
	* regex/mkh: Remove.
	* regex/regcomp.c: New file from FreeBSD.  Tweak slightly for Cygwin.
	Import required collate internals from nlsfuncs.cc.
	(p_ere_exp): Add GNU-specific \< and \> handling for word boundaries.
	(p_simp_re): Ditto.
	(__collate_range_cmp): Define.
	(p_b_term): Use Cygwin-specific collate internals.
	(findmust): Ditto.
	* regex/regcomp.ih: Remove.
	* regex/regerror.c: New file from FreeBSD.  Fix a few compiler warnings.
	* regex/regerror.ih: Remove.
	* regex/regex.7: New file from FreeBSD.  Remove UCB advertising clause.
	* regex/regex.h: Remove.  Replaced by include/regex.h.
	* regex/regexec.c: New file from FreeBSD.  Fix a few compiler warnings.
	* regex/regfree.c: New file from FreeBSD.
	* regex/tests: Remove.
	* regex/utils.h: New file from FreeBSD.
2010-02-04 12:35:49 +00:00
Corinna Vinschen 326fb376dd * Makefile.in (DLL_OFILES): Add nlsfuncs.o and strfmon.o.
* autoload.cc (LocaleNameToLCID): Define.
	* cygwin.din (strfmon): Export.
	* nlsfuncs.cc: New file.  Define a lot of internal functions called
	from setlocale.
	(wcscoll): Implement locale-aware here, using CompareStringW function.
	(strcoll): Ditto.
	(wcsxfrm): Implement locale-aware here, using LCMapStringW function.
	(strxfrm): Ditto.
	(__set_charset_from_locale): Replace __set_charset_from_codepage.
	Return Linux-compatible charset.
	* strfuncs.cc (__set_charset_from_codepage): Remove.
	* wchar.h (__set_charset_from_codepage): Drop definition.
	* wincap.h (wincaps::has_localenames): New element.
	* wincap.cc: Implement above element throughout.
	* libc/strfmon.c: New file.
	* libc/strptime.cc: Remove locale constant strings in favor of
	access to locale-specific data.
	(strptime): Point _CurrentTimeLocale to locale-specific data.
	Throughout use correct locale-specific format fields for all
	locale-specific formats.
	* include/monetary.h: New file.
	* include/cygwin/version.h (CYGWIN_VERSION_API_MINOR): Bump.
2010-01-22 22:31:31 +00:00
Corinna Vinschen 587b75f7bd * fhandler.h (class dev_console): Constify charset parameter of
str_to_con.
	* fhandler_console.cc (dev_console::con_to_str): Simplify.  Always
	default to the current internal locale.
	(dev_console::get_console_cp): Always use codepage 437 for alternate
	charset.
	(dev_console::str_to_con): Constify charset parameter.
	(fhandler_console::write_normal): Always use codepage 437 for alternate
	charset.  Otherwise always default to the current internal locale.
	Replace ASCII SO with ASCII CAN.
	* strfuncs.cc: Tweak comments according to below changes.
	(sys_cp_wcstombs): Constify charset parameter.  Convert all wchar_t
	values in the Unicode private use area U+F0xx to the singlebyte
	counterpart.  Drop special handling creating ASCII SO sequence from
	U+DCxx value.  Rearrange for performance.  Replace ASCII SO with
	ASCII CAN.
	(sys_cp_mbstowcs): Constify charset parameter.  Replace ASCII SO with
	ASCII CAN.  Drop special case for U+DCxx ASCII SO sequences.  Always
	create a replacement from the Unicode private use area U+F0xx for
	invalid byte values in a multibyte sequence.  Do the same for wchar_t
	values from the U+F0xx range to make them roundtrip safe.
	* wchar.h (sys_cp_wcstombs): Constify charset parameter.
	(sys_cp_mbstowcs): Ditto.
2009-09-28 12:10:32 +00:00
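The round trip described for sys_cp_mbstowcs in this commit can be sketched as follows. This is one reading of the entry, not the Cygwin code: sketch_mbstowcs is a hypothetical name, the input charset is assumed to be UTF-8, and only sequences of up to three bytes are decoded (longer ones are treated as invalid to keep the sketch short).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical converter: decode UTF-8 into wide characters.  A byte that
   does not start a valid sequence becomes U+F000 | byte.  A valid sequence
   that decodes into U+F000..U+F0FF is also converted byte by byte, so that
   folding U+F0xx back to the byte xx restores the original input.
   Returns the number of wide characters written, not counting the NUL. */
size_t sketch_mbstowcs(wchar_t *dst, size_t len, const char *src)
{
  const unsigned char *p = (const unsigned char *) src;
  size_t n = 0;

  assert(dst != NULL && src != NULL && len > 0);
  while (*p != 0 && n + 1 < len) {
    unsigned long cp = 0;
    size_t k = 0;              /* bytes consumed; 0 means invalid */

    if (p[0] < 0x80) {
      cp = p[0];
      k = 1;
    } else if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
      cp = ((unsigned long) (p[0] & 0x1F) << 6) | (p[1] & 0x3F);
      k = cp < 0x80 ? 0 : 2;   /* reject overlong forms */
    } else if ((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80
               && (p[2] & 0xC0) == 0x80) {
      cp = ((unsigned long) (p[0] & 0x0F) << 12)
           | ((unsigned long) (p[1] & 0x3F) << 6) | (p[2] & 0x3F);
      k = cp < 0x800 ? 0 : 3;  /* reject overlong forms */
    }
    if (k == 0 || (cp >= 0xF000 && cp <= 0xF0FF)) {
      cp = 0xF000UL | p[0];    /* private use replacement for this byte */
      k = 1;
    }
    dst[n++] = (wchar_t) cp;
    p += k;
  }
  dst[n] = L'\0';
  return n;
}
```

The stray byte FF therefore becomes U+F0FF, and the valid sequence EF 80 BA (which encodes U+F03A) becomes U+F0EF, U+F080, U+F0BA rather than U+F03A, so the reverse conversion yields the same three bytes again.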
Corinna Vinschen 20fc2f4936 * wincap.h (wincaps::has_always_all_codepages): New element.
* wincap.cc: Implement above element throughout.
	* wchar.h (__sjis_mbtowc): Declare.
	(__eucjp_mbtowc): Ditto.
	(__gbk_mbtowc): Ditto.
	(__kr_mbtowc): Ditto.
	(__big5_mbtowc): Ditto.
	* syscalls.cc (internal_setlocale): Convert to char * function.
	Return parameter by default.  Return NULL if request to use a
	charset can't be satisfied due to missing codepage support in the
	underlying OS.  Fix comment.
	(setlocale): Store original locale.  Restore to original locale if
	internal_setlocale returns NULL.
2009-07-20 15:44:55 +00:00
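The store-and-restore behaviour added to setlocale in this commit can be sketched with the standard C library alone. The names sketch_setlocale, sketch_apply_fn, sketch_accept and sketch_reject are hypothetical; the callback stands in for internal_setlocale, whose real signature and checks are not shown in this log.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for internal_setlocale: returns its argument on
   success, NULL if the requested charset cannot be supported. */
typedef const char *(*sketch_apply_fn)(const char *locale);

const char *sketch_accept(const char *locale) { return locale; }
const char *sketch_reject(const char *locale) { (void) locale; return NULL; }

/* Set the locale, then let the callback veto it.  On a veto the original
   locale is restored and NULL is returned. */
char *sketch_setlocale(int category, const char *locale, sketch_apply_fn apply)
{
  char saved[256];
  const char *cur;
  char *ret;

  assert(apply != NULL);
  if (locale == NULL)
    return setlocale(category, NULL);      /* pure query, nothing to undo */

  cur = setlocale(category, NULL);
  if (cur == NULL || strlen(cur) >= sizeof saved)
    return NULL;
  strcpy(saved, cur);   /* copy: a later setlocale call may overwrite cur */

  ret = setlocale(category, locale);
  if (ret != NULL && apply(ret) == NULL) {
    setlocale(category, saved);            /* undo the change */
    return NULL;
  }
  return ret;
}
```

Copying the current locale name before the change matters because the string returned by setlocale may be overwritten by the next call.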
Corinna Vinschen a972ea99d5 * wchar.h (sys_mbstowcs): Add missing __stdcall attribute. 2009-05-15 11:27:41 +00:00
Corinna Vinschen 6f401eccfb * cygheap.cc (cygheap_init): Set Cygwin default locale values.
* cygheap.h (struct cygheap_locale): New structure.
	(struct user_heap_info): Add cygheap_locale member locale.
	* dcrt0.cc (dll_crt0_1): Revert to calling _setlocale_r so that only
	the application's locale is reverted to "C".
	* environ.cc (environ_init): Remove unused got_lc variable.
	* fhandler.h (class dev_console): Remove now unused locale variables.
	* fhandler_console.cc (fhandler_console::get_tty_stuff): Remove
	setting dev_console's locale members.
	(dev_console::con_to_str): Use internal locale settings.  Default to
	__ascii_wctomb if charset is "ASCII".
	(fhandler_console::write_normal): Ditto.
	* strfuncs.cc (__ascii_wctomb): Drop declaration.
	(__db_wctomb): Use fixed value 2 instead of not
	necessarily matching MB_CUR_MAX.
	(__eucjp_wctomb): Use 3 instead of MB_CUR_MAX.
	(sys_cp_wcstombs): Remove special case for "C" locale.
	(sys_wcstombs): Implement here.  Use internal locale data stored on
	cygheap.
	(sys_cp_mbstowcs): Remove special case for "C" locale.
	(sys_mbstowcs): Implement here.  Use internal locale data stored on
	cygheap.
	* syscalls.cc (internal_setlocale): New function to set cygheap locale
	data and to reset CWD posix path.
	(setlocale): Just call internal_setlocale from here if necessary.
	* wchar.h (__ascii_wctomb): Declare.
	(sys_wcstombs): Don't define inline, just declare.
	(sys_mbstowcs): Ditto.
2009-05-14 19:49:37 +00:00
Corinna Vinschen 21c7d001dc * strfuncs.cc: Change WCHAR to wchar_t in multibyte<->widechar
conversion functions throughout.
	* wchar.h: Ditto in declarations.  Guard them __INSIDE_CYGWIN__.
2009-04-07 16:22:55 +00:00
Corinna Vinschen 62755474e5 * fhandler.h (class dev_console): Add members con_mbtowc, con_wctomb,
and con_charset.
	(dev_console::str_to_con): Take mbtowc function pointer and charset
	as additional parameters.
	* fhandler_console.cc (fhandler_console::get_tty_stuff): Initialize
	aforementioned new members.  Explain why.
	(dev_console::con_to_str): Remove useless comment.  Call new
	sys_cp_wcstombs function rather than sys_wcstombs.
	(dev_console::str_to_con): Take mbtowc function pointer and charset
	as additional parameters.  Call sys_cp_mbstowcs accordingly.
	(fhandler_console::write_normal): Only initialize f_mbtowc and charset
	once.  Accommodate changed str_to_con.
	* strfuncs.cc (sys_cp_wcstombs): Renamed from sys_wcstombs.  Take
	wctomb function pointer and charset as parameters.  Use throughout.
	(sys_cp_mbstowcs): Take wctomb function pointer and charset as
	parameters instead of codepage.  Remove matching local variables and
	their initialization.
	* wchar.h (ENCODING_LEN): Define as in newlib.
	(__mbtowc): Use mbtowc_p typedef for declaration.
	(wctomb_f): New type.
	(wctomb_p): New type.
	(__wctomb): Declare.
	(__utf8_wctomb): Use wctomb_f typedef for declaration.
	(sys_cp_wcstombs): Move declaration from winsup.h here.
	(sys_wcstombs): Ditto.
	(sys_wcstombs_alloc): Ditto.
	(sys_cp_mbstowcs): Ditto.
	(sys_mbstowcs): Ditto.
	(sys_mbstowcs_alloc): Ditto.
	* winsup.h: Move declaration of sys_FOO functions to wchar.h.  Include
	wchar.h instead.
2009-04-07 12:13:37 +00:00
Corinna Vinschen db917b216e * wchar.h: Replace UINT with unsigned int. 2009-03-24 13:33:57 +00:00
Corinna Vinschen 10558efdef * wchar.h: Remove erroneous "C" specifier from extern declaration. 2009-03-24 13:21:23 +00:00
Corinna Vinschen 161211d186 * ctype.cc (_CTYPE_DATA_0_127): Add _B class to TAB character.
(__ctype_default): New character class array for default ASCII
	character set.
	(__ctype_iso): New array of character class array for ISO charsets.
	(__ctype_cp): Ditto for singlebyte Windows codepages.
	(tolower): Implement as distinct function to support any singlebyte
	charset.
	(toupper): Ditto.
	(__set_ctype): New function to copy singlebyte character classes
	corresponding to current charset to ctype_b array.
	Align copyright text to upstream.
	* dcrt0.cc (dll_crt0_1): Reset current locale to "C" per POSIX.
	* environ.cc (set_file_api_mode): Remove.
	(codepage_init): Remove.
	(parse_thing): Remove "codepage" setting.
	(environ_init): Set locale according to environment settings, or
	to current codepage, before converting environment to multibyte.
	* fhandler.h (fhandler_console::write_replacement_char): Drop argument.
	* fhandler_console.cc (dev_console::str_to_con): Call sys_cp_mbstowcs
	rather than MultiByteToWideChar.
	(fhandler_console::write_replacement_char): Always print a funny
	half filled square if a character isn't in the current charset.
	(fhandler_console::write_normal): Convert to using __mbtowc
	rather than next_char.
	* fork.cc (frok::child): Drop call to set_file_api_mode.
	* globals.cc (enum codepage_type): Remove.
	(current_codepage): Remove.
	* miscfuncs.cc (cygwin_wcslwr): Unused, dangerous.  Remove.
	(cygwin_wcsupr): Ditto.
	(is_cp_multibyte): Remove.
	(next_char): Remove.
	* miscfuncs.h (is_cp_multibyte): Drop declaration.
	(next_char): Ditto.
	* strfuncs.cc (get_cp): Remove.
	(__db_wctomb): New function to implement _wctomb_r functionality for
	doublebyte charsets using WideCharToMultiByte.
	(__sjis_wctomb): New function to replace unusable newlib function.
	(__jis_wctomb): Ditto.
	(__eucjp_wctomb): Ditto.
	(__gbk_wctomb): New function.
	(__kr_wctomb): Ditto.
	(__big5_wctomb): Ditto.
	(__db_mbtowc): New function to implement _mbtowc_r functionality for
	doublebyte charsets using MultiByteToWideChar.
	(__sjis_mbtowc): New function to replace unusable newlib function.
	(__jis_mbtowc): Ditto.
	(__eucjp_mbtowc): Ditto.
	(__gbk_mbtowc): New function.
	(__kr_mbtowc): New function.
	(__big5_mbtowc): New function.
	(__set_charset_from_codepage): New function.
	(sys_wcstombs): Reimplement, basically using same wide char to multibyte
	conversion as newlib's application level functions.  Plus extras.
	Add lengthy comment to explain.  Change return type to size_t.
	(sys_wcstombs_alloc): Just use sys_wcstombs.  Change return type to
	size_t.
	(sys_cp_mbstowcs): Replace sys_mbstowcs, take additional codepage
	argument.  Explain why.  Change return type to size_t.
	(sys_mbstowcs_alloc): Just use sys_mbstowcs.  Change return type to
	size_t.
	* wchar.h: Declare internal functions implemented in strfuncs.cc.
	(wcscasecmp): Remove.
	(wcsncasecmp): Remove.
	(wcslwr): Remove.
	(wcsupr): Remove.
	* winsup.h (codepage_init): Remove declaration.
	(get_cp): Ditto.
	(sys_wcstombs): Align declaration to new implementation.
	(sys_wcstombs_alloc): Ditto.
	(sys_cp_mbstowcs): Add declaration.
	(sys_mbstowcs): Define as inline function.
	(sys_mbstowcs_alloc): Align declaration to new implementation.
	(set_file_api_mode): Remove declaration.
	* include/ctype.h (isblank): Redefine to use _B character class.
	(toupper): Remove ASCII-only definition.
	(tolower): Ditto.
2009-03-24 12:18:34 +00:00
Corinna Vinschen 1feea0bfd7 * dcrt0.cc: Include string.h.
(initial_env): Use small_printf's %P specifier.
	* dll_init.cc (dll_list::alloc): Use PATH_MAX instead of CYG_MAX_PATH
	for path name buffer size.
	* dll_init.h (struct dll): Ditto.
	* environ.cc: Include string.h.
	(win_env::add_cache): Use temporary local buffer for path conversion.
	(posify): Ditto.
	* exceptions.cc (try_to_debug): Use CreateProcessW to allow long path
	names.
	* miscfuncs.cc: Drop unused implementations of strcasematch and
	strncasematch.
	(ch_case_eq): Drop.
	(strcasestr): Drop.
	(cygwin_wcscasecmp): New function.
	(cygwin_wcsncasecmp): New function.
	(cygwin_strcasecmp): New function.
	(cygwin_strncasecmp): New function.
	(cygwin_wcslwr): New function.
	(cygwin_wcsupr): New function.
	(cygwin_strlwr): New function.
	(cygwin_strupr): New function.
	* ntdll.h (RtlDowncaseUnicodeString): Declare.
	(RtlUpcaseUnicodeString): Declare.
	(RtlInt64ToHexUnicodeString): Fix typo in comment.
	* string.h: Disable not NLS aware implementations of strcasematch
	and strncasematch.
	(cygwin_strcasecmp): Declare.
	(strcasecmp): Define as cygwin_strcasecmp.
	(cygwin_strncasecmp): Declare.
	(strncasecmp): Define as cygwin_strncasecmp.
	(strcasematch): Define using cygwin_strcasecmp.
	(strncasematch): Define using cygwin_strncasecmp.
	(cygwin_strlwr): Declare.
	(strlwr): Define as cygwin_strlwr.
	(cygwin_strupr): Declare.
	(strupr): Define as cygwin_strupr.
	* wchar.h: New file.
	* wincap.cc (wincapc::init): Use "NT" as fixed OS string.
	* winsup.h (strcasematch): Drop declaration.
	(strncasematch): Ditto.
	(strcasestr): Ditto.
2007-12-12 12:12:24 +00:00