Add __nl_item to <sys/_types.h> for FreeBSD compatibility. Use it in
<langinfo.h> and the Cygwin <nl_types.h>. Make the enum __nl_item in
<langinfo.h> anonymous.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Add __tls_get_addr() for all targets to crt0. This is not only used on
ARM. In particular, it is used on RISC-V. This helps to adequately
support the GCC libgomp.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de
strtof ("-nan") returned positive NaN instead of negative NaN.
strtod ("-nan") and strtold ("-nan") return negative NaN.
Linux glibc has been fixed
that strto{f|d|ld} ("-nan") returns negative NaN.
https://sourceware.org/bugzilla/show_bug.cgi?id=23007
This commit makes strtof preserves the negative sign bit
when parsing "-nan" like glibc.
By previous commit, strto{d|ld} ("nan")
does not use the definition of NaN.
There is no other function that uses the definitions.
This commit remove the definitions.
The definition of qNaN for x86_64 and i386 was wrong.
strto{d|ld} ("nan") returned wrong negative NaN
instead of correct positive NaN
since it used the wrong definition.
On the other hand, strtof ("nan") returns correct positive NaN
since it uses nanf ("") instead of the wrong definition.
This commit makes strto{d|ld} ("nan") uses {nan|nanl} ("")
like strtof ("nan") using.
So strto{d|ld} ("nan") returns positive NaN.
Improve comments in sincosf implementation to make the code easier
to understand. Rename the constant pi64 to pi63 since it's actually
PI * 2^-63. Add comments for fields of sincos_t structure. Add comments
describing implementation details to reduce_fast.
wordexp uses fprintf in a dangerous way. It uses an unchecked
input string as format string, rather than as parameter to a %s.
Replace fprintf with fputs.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Introduce new host configuration variable "have_init_fini" which is set
to "yes" by default. Override it for RISC-V to "no".
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Restore FreeBSD compatibility for __alloc_size() and __alloc_align().
This is a follow-up to commit e494b560350cabef94126a4478096aae89ae35a0.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
At least on GCC7 calling __alloc_size(x) twice is not equivalent to
calling using the attribute once with two arguments. The later is the
documented use in GCC documentation so add a new alloc_size(n, x)
alternative to cover for the few places where it is used: basically:
calloc(3), reallocarray(3) and mallocarray(9).
Submitted by: Mark Millard
MFC after: 3 days
Reference:
http://docs.freebsd.org/cgi/mid.cgi?F227842D-6BE2-4680-82E7-07906AF61CD7
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
GCC only activates C11 keywords in C mode, not C++ mode. This means
that when targeting an older C++ standard, we cannot fall back to using
_Static_assert(). In this case, do define _Static_assert() as a macro
that uses a typedef'ed array.
Discussed in: r322875 commit thread
Reported by: Mark MIllard
MFC after: 1 month
The previous version genenerated the following GCC note:
towctrans_l.c:44:1: note: offset of packed bit-field 'diff' has changed in GCC 4.4
caseconv_table [] = {
^~~~~~~~~~~~~~
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
The commit 46ba1675c457324b0eeef4670a09101ef3f34c50 accidently changed a
bit-field from signed to unsigned. The caseconv_entry::delta must be a
signed integer, see also "newlib/libc/ctype/caseconv.t".
Unfortunately, a standard GCC/Newlib build is done without
-Wsign-conversion. Using this warning option would have helped to avoid
this bug:
caseconv.t:2:22: warning: unsigned conversion from 'int' to 'unsigned int:17' changes value from '-32' to '131040' [-Wsign-conversion]
{0x0061, 25, TOUP, -32},
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This prevents errors like this:
newlib/libc/ctype/categories.c:6:3: error: width of 'first' exceeds its type
unsigned int first: 24;
^
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Exotic RTEMS targets can define this back to int32_t as an exception if
there are good reasons.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place. For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.
This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp in glibc.
This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.
The mutually misaligned inputs on aarch64 are compared with a simple
byte copy, which is not very efficient. Enhance the comparison
similar to strcmp by loading a double-word at a time. The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark in glibc is as follows:
falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)
All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.
PREFER_FLOAT_COMPARISON setting was not correct as it could raise
spurious exceptions. Fixing it is easy: just use ISLESS(x, y) instead
of abstop12(x) < abstop12(y) with appropriate non-signaling definition
for ISLESS. However it seems this setting is not very useful (there is
only minor performance difference on various architectures), so remove
this option for now.
The !HAVE_FAST_FMA code path split r = z/c - 1 into r = rhi + rlo such
that when z = 1-tiny and c = 1 then rlo and rhi could have much larger
magnitude than r which later caused large rounding errors.
So do a nearest rounding instead of truncation at the split.
In newlib with default settings this was observable on some arm targets
that enable the new math code but has no fma.
The roundtoint and converttoint internal functions are only called with small
values, so 32 bit result is enough for converttoint and it is a signed int
conversion so the natural return type is int32_t.
The original idea was to help the compiler keeping the result in uint64_t,
then it's clear that no sign extension is needed and there is no accidental
undefined or implementation defined signed int arithmetics.
But it turns out gcc does a good job with inlining so changing the type has
no overhead and the semantics of the conversion is less surprising this way.
Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top
bits were never usable and the existing code ensures that only the bottom
32 bits of the conversion result are used.
In newlib with default settings only aarch64 is affected and there is no
significant code generation change with gcc after the patch.
Synchronize code style and comments with Arm Optimized Routines, there
are no code changes in this patch. This ensures different projects using
the same code have consistent code style so bug fix patches can be applied
more easily.
This fix is for some platforms which do not have writev().
*perror.c: Use _write_r() instead of writev().
*psignal.c: Use write() insetad of writev().
Revise commit: d4f4e7ae1be1bcf8c021f2b0865aafc16b338aa3
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.54 ULP with inlined fma and fma contraction. It uses
a 4 KB lookup table in addition to the table in exp_data.c, on aarch64
.text+.rodata size of libm.a is increased by 2295 bytes.
Improvements on Cortex-A72:
latency: 3.3x
thruput: 4.9x
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.547 ULP with inlined fma and fma contraction. It uses
a 1 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1584 bytes.
Note that the math.h header defines log2(x) to be log(x)/Ln2, this is
not changed, so the new code is only used if that macro is suppressed.
Improvements on Cortex-A72:
latency: 2.0x
thruput: 2.2x
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes. The w_log.c wrapper is disabled since error handling is
inline in the new code.
New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).
Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
The new implementations are provided under !__OBSOLETE_MATH, they use
ISO C99 code. There are several settings, with the default one the
worst case error in nearest rounding mode is 0.509 ULP for exp and
0.507 ULP for exp2 when a multiply and add is contracted into an fma.
They use a shared 2 KB lookup table, on aarch64 .text+.rodata size
of libm.a is increased by 1868 bytes. The w_*.c wrappers are disabled
for the new code as it takes care of error handling inline.
The old exp2(x) code used to be just pow(2,x) so the speedup there
is more significant.
The file name has no special prefix to avoid any name collision with
existing files.
Improvements on Cortex-A72:
exp latency: 3.2x
exp thruput: 4.1x
exp2 latency: 7.8x
exp2 thruput: 18.8x