https://cygwin.com/pipermail/cygwin/2022-December/252571.html
Cygwin's default locale is "C.UTF-8" as far as LC_CTYPE settings
are concerned. However, while __global_locale contains fixed
mbtowc and wctomb pointers, the lc_ctype_T pointer is still pointing
to _C_ctype_locale, representing the standard "C" locale.
The problem with this is that the codeset name as well as MB_CUR_MAX
is wrong.
Fix this by introducing a new lc_ctype_T structure called
_C_utf8_ctype_locale, setting the default codeset to "UTF-8" and
MB_CUR_MAX to 6. Use this as lc_ctype_T pointer in __global_locale
by default on Cygwin.
Fixes: a6a477fa81 ("POSIX-1.2008 per-thread locales, groundwork part 1")
Co-Authored-By: Takashi Yano <takashi.yano@nifty.ne.jp>
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
This simple testcase:
locale_t st = newlocale(LC_ALL_MASK, "C", (locale_t)0);
locale_t st2 = newlocale(LC_CTYPE_MASK, "en_US.UTF-8", st);
is sufficient to reproduce a crash in _newlocale_r. After the first call
to newlocale, `st' points to __C_locale, which is const. When using `st'
as locale base in the second call, _newlocale_r tries to set pointers
inside base to NULL. This is bad if base is __C_locale, obviously.
Add a test to avoid trying to overwrite pointer values inside base if
base is __C_locale.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
By default, Newlib uses a huge object of type struct _reent to store
thread-specific data. This object is returned by __getreent() if the
__DYNAMIC_REENT__ Newlib configuration option is defined.
The reentrancy structure contains for example errno and the standard input,
output, and error file streams. This means that if an application only uses
errno it has a dependency on the file stream support even if it does not use
it. This is an issue for lower end targets and applications which need to
qualify the software according to safety standards (for example ECSS-E-ST-40C,
ECSS-Q-ST-80C, IEC 61508, ISO 26262, DO-178, DO-330, DO-333).
If the new _REENT_THREAD_LOCAL configuration option is enabled, then struct
_reent is replaced by dedicated thread-local objects for each struct _reent
member. The thread-local objects are defined in translation units which use
the corresponding object.
Add a _REENT_LOCALE() macro to encapsulate access to the _locale
member of struct reent. This will help to replace the struct
member with a thread-local storage object in a follow up patch.
Convert all the libc/ subdir makes into the top-level Makefile. This
allows us to build all of libc from the top Makefile without using any
recursive make calls. This is faster and avoids the funky lib.a logic
where we unpack subdir archives to repack into a single libc.a. The
machine override logic is maintained though by way of Makefile include
ordering, and source file accumulation in libc_a_SOURCES.
There's a few dummy.c files that are no longer necessary since we aren't
doing the lib.a accumulating, so punt them.
The winsup code has been pulling the internal newlib ssp library out,
but that doesn't exist anymore, so change that to pull the objects.
This kills off the last configure script under libc/ and folds it
into the top newlib configure script. The a lot of the logic was
already in the top configure script, so move what's left into a
libc/acinclude.m4 file.
The crt0.o was handled in a subdir-by-subdir basis: it would be compiled
in one (e.g. libc/sys/$arch/), then copied up one level (libc/sys/), then
copied up another (libc/) before finally being copied & installed in the
top newlib dir. The libc/sys/ copy was cleaned up, and then the top dir
was changed to copy it directly out of the libc/sys/$arch/ dir. But the
libc/sys/ copy to libc/ was left behind. Clean that up now too.
This is used in a bunch of places, but nowhere is it ever set, and
nowhere can I find any documentation, nor can I find any other project
using it. So delete the flags to simplify.
This was only ever used for i?86-pc-linux-gnu targets, but that's been
broken for years, and has since been dropped. So clean this up too.
This also deletes the funky objectlist logic since it only existed for
the libtool libraries. Since it was the only thing left in the small
Makefile.shared file, we can punt that too.
Now that we use AC_NO_EXECUTABLES, and we require a recent version of
autoconf, we don't need to define our own copies of these macros. So
switch to the standard AC_PROG_CC.
This allows building the libc & libm pages in parallel, and drops
the duplication in the subdirs with the chew/chapter settings.
The unused rules in Makefile.shared are left in place to minimize
noise in the change.
This doesn't migrate all the docs, just the libc's manual (pdf/info).
This is to show the basic form of migrating the chew files.
For subdirs that didn't have any docs, I've stripped their settings
for clarity. If someone wanted to suddenly add docs, they can add
the corresponding Makefile.inc files easily.
THe stdio subdir is actually required by the documentation. The
stdio/def is handled dynamically, but libc.texi always expects it
to be included, and fails if it isn't. So making it required when
building docs is safe.
The xdr subdir is handled dynamically, but it doesn't include any
docs, so the dynamic logic isn't (currently) adding any value. So
making it required when building docs is safe.
That leaves: iconv, stdio64, posix, and signal subdirs. The chapters
have a little disclaimer saying they are system-dependent, but even
then, imo having stable manuals regardless of the target is preferable,
and we can add more disclaimer language to these chapters if we want.
This doesn't touch the man page codepaths, just the info/pdf.
When using the top-level configure script but subdir Makefiles, the
newlib_basedir value gets a bit out of sync: it's relative to where
configure lives, not where the Makefile lives. Move the abs setting
from the top-level configure script into acinclude.m4 so we can rely
on it being available everywhere. Although this commit doesn't use
it anywhere, just lays the groundwork.
The machine configure scripts are all effectively stub scripts that
pass the higher level options to its own makefile. There were only
three doing custom tests. The rest were all effectively the same as
the libc/ configure script.
So instead of recursively running configure in all of these subdirs,
generate their makefiles from the top-level configure. For the few
unique ones, deploy a pattern of including subdir logic via m4:
m4_include([machine/nds32/acinclude.m4])
Some of the generated machine makefiles have a bunch of extra stuff
added to them, but that's because they were inconsistent in their
configure libtool calls. The top-level has it, so it exports some
new vars to the ones that weren't already.
The sys/{configure,Makefile} files exist to fan out to the specific
sys/$arch/ subdir, and to possibly generate a crt0. We already have
all that same info in the libc/ dir itself, so by moving the recursive
configure and make calls into it, we can cut off some of this logic
entirely and save the overhead.
For arches that don't have a sys subdir, it means they can skip the
logic entirely.
The sys subdir itself is kept for the crt0 logic, for now. We'll try
and clean that up next.
The machine/{configure,Makefile} files exist only to fan out to the
specific machine/$arch/ subdir. We already have all that same info
in the libc/ dir itself, so by moving the recursive configure and
make calls into it, we can cut off this logic entirely and save the
overhead.
For arches that don't have a machine subdir, it means they can skip
the logic entirely. Although there's prob not too many of those.
This was added decades ago, but the commit message lacks any
explanation, and it was unused when it was merged. It's still
unused today. So punt it all.
This matches what the other GNU toolchain projects have done already.
The generated diff in practice isn't terribly large. This will allow
more use of subdir local.mk includes due to fixes & improvements that
came after the 1.11 release series.
The newlib & libgloss dirs are already generated using autoconf-2.69.
To avoid merging new code and/or accidental regeneration using diff
versions, leverage config/override.m4 to pin to 2.69 exactly. This
matches what gcc/binutils/gdb are already doing.
The README file already says to use autoconf-2.69.
To accomplish this, it's just as simple as adding -I flags to the
top-level config/ dir when running aclocal. This is because the
override.m4 file overrides AC_INIT to first require the specific
autoconf version before calling the real AC_INIT.
Since automake deprecated the INCLUDES name in favor of AM_CPPFLAGS,
change all existing users over. The generated code is the same since
the two variables have been used in the same exact places by design.
There are other cleanups to be done, but lets focus on just renaming
here so we can upgrade to a newer automake version w/out triggering
new warnings.
The 'cygnus' option was removed from automake 1.13 in 2012, so the
presence of this option prevents that or a later version of automake
being used.
A check-list of the effects of '--cygnus' from the automake 1.12
documentation, and steps taken (where possible) to preserve those
effects (See also this thread [1] for discussion on that):
[1] https://lists.gnu.org/archive/html/bug-automake/2012-03/msg00048.html
1. The foreign strictness is implied.
Already present in AM_INIT_AUTOMAKE in newlib/acinclude.m4
2. The options no-installinfo, no-dependencies and no-dist are implied.
Already present in AM_INIT_AUTOMAKE in newlib/acinclude.m4
Future work: Remove no-dependencies and any explicit header dependencies,
and use automatic dependency tracking instead. Are there explicit rules
which are now redundant to removing no-installinfo and no-dist?
3. The macro AM_MAINTAINER_MODE is required.
Already present in newlib/acinclude.m4
Note that maintainer-mode is still disabled by default.
4. Info files are always created in the build directory, and not in the
source directory.
This appears to be an error in the automake documentation describing
'--cygnus' [2]. newlib's info files are generated in the source
directory, and no special steps are needed to keep doing that.
[2] https://lists.gnu.org/archive/html/bug-automake/2012-04/msg00028.html
5. texinfo.tex is not required if a Texinfo source file is specified.
(The assumption is that the file will be supplied, but in a place that
automake cannot find.)
This effect is overriden by an explicit setting of the TEXINFO_TEX
variable (the directory part of which is fed into texi2X via the
TEXINPUTS environment variable).
6. Certain tools will be searched for in the build tree as well as in the
user's PATH. These tools are runtest, expect, makeinfo and texi2dvi.
For obscure automake reasons, this effect of '--cygnus' is not active
for makeinfo in newlib's configury.
However, there appears to be top-level configury which selects in-tree
runtest, expect and makeinfo, if present. So, if that works as it
appears, this effect is preserved. If not, this may cause problem if
anyone is building those tools in-tree.
This effect is not preserved for texi2dvi. This may cause problems if
anyone is building texinfo in-tree.
If needed, explicit checks for those tools looking in places relative to
$(top_srcdir)/../ as well as in PATH could be added.
7. The check target doesn't depend on all.
This effect is not preseved. The check target now depends on the all
target.
This concern seems somewhat academic given the current state of the
testsuite.
Also note that this doesn't touch libgloss.
So far the build mechanism in newlib only allowed to either define
machine-specific headers, or headers shared between all machines.
In some cases, architectures are sufficiently alike to share header
files between them, but not with other architectures. A good example
is ix86 vs. x86_64, which share certain traits with each other, but
not with other architectures.
Introduce a new configure variable called "shared_machine_dir". This
dir can then be used for headers shared between architectures.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
When __HAVE_LOCALE_INFO__ is not selected, directly access the
existing _ctype_ variable from __locale_ctype_ptr() and
__locale_ctype_ptr_l(), eliminating the need for any locale or reent
structure
Signed-off-by: Keith Packard <keithp@keithp.com>
v2:
locale: fix conflict with __locale_ctype_ptr macro
If we are building without __HAVE_LOCALE_INFO__, there is a
macro providing __locale_ctype_ptr which in turn fouls up this
declaration.
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Locale modifier @cjkwide makes Unicode "ambiguous width" characters
wide. So ambiguous width characters can be enforced to have width 2
even in non-CJK locales. This gives e.g. users of "Powerline symbols"
the opportunity to adjust their width to the desired behaviour (and the
behaviour apparently expected by some tools) without having to set a CJK
locale and without losing consistence of terminal character width with
wcwidth/wcswidth locale width.
Discard QUICKREF sections, rather than writing them to stderr
Discard MATHREF sections, rather than discarding as an error
Pass NOTES sections through to texinfo, rather than discarding as an error
Don't redirect makedoc stderr to .ref file
Remove makedoc output on error
Remove .ref files from CLEANFILES
Regenerate Makefile.ins
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Problem:
After passing locales created by 'duplocale' to 'uselocale',
referencing 'MB_CUR_MAX', which is actually expanded to
'__locale_mb_cur_max()' by preprocessors, causes segmentation faults.
Direct use of locales from 'newlocale' does not cause the problem.
This is the problem of 'duplocale'.
$ echo $LANG
ja_JP.UTF-8
$ cat test.c
#include <stdlib.h>
#include <locale.h>
volatile int var;
int main(void) {
locale_t const loc = newlocale(LC_ALL_MASK, "", NULL);
locale_t const dup = duplocale(loc);
locale_t const old = uselocale(dup);
var = MB_CUR_MAX; /* <-- crashes here */
uselocale(old);
freelocale(dup);
freelocale(loc);
return 0;
}
$ gcc test.c
$ ./a
Segmentation fault (core dumped)
# Note: "core dumped" in the above message was actually written in
# Japanese, but I translated the part to post a mail in English.
Bug:
In the beginning of '__loadlocale' (newlib/libc/locale/locale.c:501),
there is a code which checks if the operations can be skipped:
> /* Avoid doing everything twice if nothing has changed. */
> if (!strcmp (new_locale, loc->categories[category]))
> return loc->categories[category];
While, in the function '_duplocale_r' (newlib/libc/locale/
duplocale.c), '__loadlocale' is called as in the quoted codes:
> /* If the object is not a "C" locale category, copy it. Just call
> __loadlocale. It knows what to do to replicate the category. */
> tmp_locale.lc_cat[i].ptr = NULL;
> tmp_locale.lc_cat[i].buf = NULL;
> if (!__loadlocale (&tmp_locale, i, tmp_locale.categories[i]))
> goto error;
This call of '__loadlocale' results in the skip check being
!strcmp(tmp_locale.categories[i], tmp_locale.categories[i]),
which is always true. This means that the actual operations of
'__loadLocale' will never be performed for 'duplocale'.
Fix:
The call of '__loadlocale' in '_duplocale_r' is modified.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Provide an extension NL_LOCALE_NAME() macro, with semantics
matching glibc, which can be used as:
nl_langinfo_l(NL_LOCALE_NAME(LC_MESSAGES), locale);
to get back the locale string that locale was originally
created with during newlocale(). This in turn allows a library
(such as gettext) to determine what thread-local locale settings
it has inherited from the main program without having to be told
what parameters were passed to newlocale(), for less overall
coupling between parts of the program.
gnulib is set up to use the extension:
https://lists.gnu.org/archive/html/bug-gnulib/2017-01/msg00129.html
* libc/include/langinfo.h (NL_LOCALE_NAME): New macro
* libc/locale/nl_langinfo.c (nl_langinfo_l): Expose locale names
of a locale_t's category components.
Signed-off-by: Eric Blake <eblake@redhat.com>