This prevents errors like this:
newlib/libc/ctype/categories.c:6:3: error: width of 'first' exceeds its type
unsigned int first: 24;
^
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Exotic RTEMS targets can define this back to int32_t as an exception if
there are good reasons.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place. For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.
This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp in glibc.
This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.
The mutually misaligned inputs on aarch64 are compared with a simple
byte copy, which is not very efficient. Enhance the comparison
similar to strcmp by loading a double-word at a time. The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark in glibc is as follows:
falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)
All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.
This fix is for some platforms which do not have writev().
*perror.c: Use _write_r() instead of writev().
*psignal.c: Use write() insetad of writev().
Revise commit: d4f4e7ae1b
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes. The w_log.c wrapper is disabled since error handling is
inline in the new code.
New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).
Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
* (mkcategories): Fix a bug that outputs incorrect Unicode category
table for code point ranges.
* (categories.t): Rebuild it using the bug-fixed mkcategories.
This fixes the problem reported in the following post.
https://cygwin.com/ml/cygwin/2018-06/msg00248.html
Previously, "test 1 2 3 -a -b -c" was permuted to "test -a -b -c 1 2 3",
but "test 1 2 3 -abc" was left as "test 1 2 3 -abc".
Signed-off-by: Thomas Kindler <mail+newlib@t-kindler.de>
- when calculating a correction to align next brk to page boundary,
ensure that the correction is less than a page size
- if allocating the correction fails, ensure that the top size is
set to brk + sbrk_size (minus any front alignment made)
Signed-off-by: Jeff Johnston <jjohnstn@redhat.com>
When converting number of days since epoch (32-bits) to seconds,
calculations using 32-bit `long` overflow for years above 2038. Solve
this by casting number of days to `time_t` just before final
multiplication.
Signed-off-by: Freddie Chopin <freddie.chopin@gmail.com>
- From: Cesar Philippidis <cesar@codesourcery.com>
Date: Tue, 10 Apr 2018 14:43:42 -0700
Subject: [PATCH] nvptx port
This port adds support for Nvidia GPU's, which are primarily used as
offload accelerators in OpenACC and OpenMP.
The gdtoa implementation uses the type long, defined as Long, in lots
of code. For historical reason newlib defines Long as int32_t instead.
This works fine, as long as floating point exceptions are not enabled.
The conversion to 32 bit int can lead to a FE_INVALID situation.
Example:
const char *str = "121645100408832000.0";
char *ptr;
feenableexcept (FE_INVALID);
strtod (str, &ptr);
This leads to the following situation in strtod
double aadj;
Long L;
[...]
L = (Long)aadj;
For instance, on x86_64 the code here is
cvttsd2si %xmm0,%eax
At this point, aadj is 2529648000.0 in our example. The conversion to
32 bit %eax results in a negative int value, thus the conversion is
invalid. With feenableexcept (FE_INVALID), a SIGFPE is raised.
Fix this by always using 64 bit ints here if double is not a 32 bit type
to avoid this type of FP exceptions.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Classical function call recursion wastes a lot of stack space.
Each recursion level requires a full stack frame comprising all
local variables and additional space as dictated by the
processor calling convention.
This implementation instead stores the variables that are unique
for each recursion level in a parameter stack array, and uses
iteration to emulate recursion. Function call recursion is not
used until the array is full.
To ensure the stack consumption isn't worsened by this design, the
size of the parameter stack array is chosen to be similar to the
stack frame excluding the array. Each function call recursion level
can handle 8 iterative recursion levels.
Stack consumption will worsen when sorting tiny arrays that do not
need recursion (of 6 elements or less). It will be about equal for
up to 15 elements, and be an improvement for larger arrays. The best
case improvement is a stack size reduction down to about one quarter
of the stack consumption before the change.
A design where the parameter stack array is large enough for the
worst case recursion level was rejected because it would worsen
the stack consumption when sorting arrays smaller than about 1500
elements. The worst case is 31 levels on a 32-bit system.
A design with a dynamic parameter array size was rejected because
of limitations in some compilers.
The qsort algorithm splits the input array in three parts. The
left and right parts may need further sorting. One of them is
sorted by recursion, the other by iteration. This update ensures
that it is the smaller part that is chosen for recursion.
By choosing the smaller part, each recursion level will handle
less than half the array of the previous recursion level. Hence
the recursion depth is bounded to be less than log2(n) i.e. 1
level per significant bit in the array size n.
The update also includes code comments explaining the algorithm.
Newlib has a build configuration where syscalls can be directly
embedded in the newlib library rather than relying on libgloss.
This configuration was broken recently by an update to the libgloss
support for Arm that was not propagated to the syscalls interface in
newlib itself. This patch restores the build. It's essentially a
copy of https://sourceware.org/ml/newlib/2018/msg00128.html but there
are some other minor cleanups and changes that I've made at the same
time. None of those cleanups affect functionality.
The prototypes of the following functions have been updated: _link,
_sbrk, _getpid, _write, _swiwrite, _lseek, _swilseek, _read and
_swiread.
Signed-off-by: Richard Earnshaw <Richard.Earnshaw@arm.com>
E.g. arm ABI requires -fshort-enums for bare-metal toolchains.
Given there are only 29 category enums, the compiler chooses an
8 bit enum type, so a size of 11 bits for the bitfield leads to
a compile time error:
error: width of 'cat' exceeds its type
enum category cat: 11;
^~~
Fix this by aligning the size of the category members to byte
borders.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Scripts do not try to acquire Unicode data by best-effort magic anymore.
Options supported:
-h for help
-i to copy Unicode data from /usr/share/unicode/ucd first
-u to download Unicode data from unicode.org first
If (despite of -i or -u if given) the necessary Unicode files are not
available locally, table generation is skipped, but no error code is
returned, so not to obstruct the build process if called from a Makefile.
E.g. arm ABI requires -fshort-enums for bare-metal toolchains.
Given there are only 29 category enums, the compiler chooses an
8 bit enum type, so a size of 11 bits for the bitfield leads to
a compile time error:
error: width of 'cat' exceeds its type
enum category cat: 11;
^~~
Fix this by aligning the size of the category members to byte
borders.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
touupper and toulower didn't return a value in all cases. Worse,
this only broke Cygwin when building without optimization for debug
purposes.
Why GCC neglects to notice this is a mystery.
While at it, fix formatting.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
The tow* functions use an included case conversion table which can be
generated from Unicode data.
The isw* functions use a character categories table (provided by
categories.c) which can be generated from Unicode data.
Delegation between current-locale and specific-locale-dependent functions
was reverted towards the generic locale-dependent functions (*_l.c);
this is however only relevant on systems with non-Unicode wide character
locales, thus not on Cygwin.
Table categories.t and tag enumeration categories.cat provide
character class data for most of the isw* functions.
These data are generated from Unicode data.
Linux and FreeBSD use int as well. In addition, this fixes an Ada
incompatiblity problem on 64-bit targets. See also GCC:
gcc/ada/libgnarl/s-osinte__rtems.ads
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Locale modifier @cjkwide makes Unicode "ambiguous width" characters
wide. So ambiguous width characters can be enforced to have width 2
even in non-CJK locales. This gives e.g. users of "Powerline symbols"
the opportunity to adjust their width to the desired behaviour (and the
behaviour apparently expected by some tools) without having to set a CJK
locale and without losing consistence of terminal character width with
wcwidth/wcswidth locale width.
At least with Binutils 2.30 and GCC 7.3 we need symbol definitions
without the leading underscore.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This is a NetBSD-specific detail which does not apply to Newlib, causing
linking issues in certain scenarios:
https://cygwin.com/ml/cygwin/2018-01/msg00189.html
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
New optimized powf, logf, log2f, expf and exp2f yield worse performance
on Arm targets with only single precision instructions because the
double precision arithmetic is then implemented via softfloat routines.
This patch uses the old implementation when double precision
instructions are not available on Arm targets.
Testing: Built newlib with GCC's rmprofile Arm multilibs and compared
before/after -> only the above functions are changed and calls to them
(name change from logf to __ieee754_logf and similar). Testing the
changed function on a panel of values yields the same result before the
original patches to improve them and after this one. Double checking the
performance by looping the same panel of values being tested on Arm
Cortex-M4 does show the performance regression is fixed.
This patch fixes a syntax error in exit.c that was introduced during the
ANSI-fication of newlib. The patch fixes a compile-time issue that arises when
newlib is configured with the --enable-lite-exit feature.
Code path for _MB_CAPABLE scans for the '%' character and advances
'fmt' pointer past '%'. Code path for !_MB_CAPABLE leaved fmt pointing
to '%', which caused the state machine to go from START to DONE state
immediately.
Neither upstream FreeBSD nor glibc ever call fflush from ftell
and friends. In border cases it has the tendency to return
wrong or unexpected values, for instance on block devices.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Make prototype of _kill() always visible when _COMPILING_NEWLIB is
defined. This makes <sys/signal.h> consistent with the use of
_COMPILING_NEWLIB in <sys/unistd.h>, <sys/times.h>, etc.
Discard QUICKREF sections, rather than writing them to stderr
Discard MATHREF sections, rather than discarding as an error
Pass NOTES sections through to texinfo, rather than discarding as an error
Don't redirect makedoc stderr to .ref file
Remove makedoc output on error
Remove .ref files from CLEANFILES
Regenerate Makefile.ins
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Old BSD bug: While ^ is recognized and the set of matching characters
is negated, the code neglects to increment the pointer pointing to the
matching characters. Thus, on a negation expression like %[^xyz], the
matching doesn't only stop at x, y, or z, but incorrectly also on ^.
Fix this by setting the start pointer after recognizing the ^.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
The following functions are also guarded in glibc:
fwprintf, swprintf, wprintf, vfwprintf, vswprintf, vwprintf.
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
* vfscanf: per POSIX, if the target type is wchar_t, the width is
counted in (multibyte) characters, not in bytes.
* vfscanf: Handle UTF-8 multibyte sequences converted to surrogate
pairs on UTF-16 systems.
* vfwscanf: Don't count high surrogates in input against field width
counting. Per POSIX, input is
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
The width value keeps the maximum field width. This is the maximum
field width of the *input*. It's *never* to be used in conjunction
with the number of bytes or characters written to the output argument.
However, especially in vfwscanf, the code is partially taken from
NetBSD which erroneously subtracts the number of multibyte chars
written to the argument from the width variable, thus potentially
subtracting up to MB_CUR_MAX from width for a single character in
the input stream.
To make matters worse, the previous patch adding %m added basically
the same mistake for 'c' type input.
Fix it.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* The new code is guarded with _WANT_IO_POSIX_EXTENSIONS, but
this is automatically enabled with _WANT_IO_C99_FORMATS for now.
* vfscanf neglects to implement %l[, so %ml[ is not implemented yet
either.
* Sidenote: vfwscanf doesn't allow ranges in %[ yet. Strictly this
is allowed per POSIX, but it differes from vfscanf as well as from
glibc.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
The implementation is from NetBSD, with the addition of feature test macros
for readlink. glibc also wraps the following functions:
confstr, getdomainname, getgroups, gethostname, getlogin_r, getwd, pread,
readlinkat, ttyname_r.
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
The implementation is mostly from NetBSD, except for switching fgets to
pure inline, and the addition of fgets_unlocked, fread, and fread_unlocked
for parity with glibc. The following functions are also guarded in glibc:
asprintf, dprintf, fprintf, printf, vasprintf, vdprintf, vfprintf, vprintf.
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
The implementation is from NetBSD, with the addition of mempcpy (a GNU
extension) for parity with glibc and libssp.
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
The Object Size Checking (-D_FORTIFY_SOURCE=*) functionality provides
wrappers around functions suspectible to buffer overflows. While
independent from Stack Smashing Protection (-fstack-protector*), they
are often used and implemented together.
While GCC also provides an implementation in libssp, it is completely
broken (CVE-2016-4973, RHBZ#1324759) and seemingly unfixable, as there
is no reliable way for a preprocessor macro to trigger a link flag.
Therefore, adding this here is necessary to make it work.
Note that this does require building gcc with --disable-libssp and
gcc_cv_libc_provides_ssp=yes.
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
Compiling with any of the -fstack-protector* flags requires the
__stack_chk_guard data import (which needs to be initialized) and the
__stack_chk_fail{,_local} functions. While GCC's own libssp can provide
these, it is better that we provide these ourselves. The implementation
is custom due to being OS-specific.
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
The special handling of %\0 in [w]scanf is flawed. It's just a
matching failure and should be handled as such. scanf also
fakes an int input value on %X with X being an invalid conversion
char. This is also just a matching failure and should be handled
the same way as %\0.
There's no indication of the reason for this "disgusting
backwards compatibility hacks" in the logs, given this
code made it into newlib before setting up the CVS repo.
Just handle these cases identically as matching failures.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Since commit 8128f5482f, we have all the
non-tracing functions listed in posixoptions(7). The tracing functions
are gated by their own option, and are obsolecent anyway.
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
Difference to Linux: We can't create files which don't show up
in the filesystem due to OS restrictions. As a kludge, make a
(half-hearted) attempt to hide the file in the filesystem.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
The variable doesn't follow the convention of having the same name as
the function it's bundled with. Furthermore, it clashes with the
variable of the same name in newlib/libc/stdlib/calloc.c.
Signed-off-by: Florian Schmidt <florian.schmidt@neclab.eu>
RTEMS provides the option to have a global or per-thread reentrancy
as part of application configuration. As part of this, RTEMS provides
the implementation of __getreent() as appropriate. Allow the target
to determine if this method is present in libc.a.
Based on code from https://github.com/ARM-software/optimized-routines/
This patch adds a highly optimized generic implementation of expf,
exp2f, logf, log2f and powf. The new functions are not only
faster (6x for powf!), but are also smaller and more accurate.
In order to achieve this, the algorithm uses double precision
arithmetic for accuracy, avoids divisions and uses small table
lookups to minimize the polynomials. Special cases are handled
inline to avoid the unnecessary overhead of wrapper functions and
set errno to POSIX requirements.
The new functions are added under newlib/libm/common, but the old
implementations are kept (in newlib/libm/math) for non-IEEE or
pre-C99 systems. Targets can enable the new math code by defining
__OBSOLETE_MATH_DEFAULT to 0 in newlib/libc/include/machine/ieeefp.h,
users can override the default by defining __OBSOLETE_MATH.
Currently the new code is enabled for AArch64 and AArch32 with VFP.
Targets with a single precision FPU may still prefer the old
implementation.
libm.a size changes:
arm: -1692
arm/thumb/v7-a/nofp: -878
arm/thumb/v7-a+fp/hard: -864
arm/thumb/v7-a+fp/softfp: -908
aarch64: -1476
In order to avoid the year 2038 problem, define time_t to a signed
integer with at least 64-bits. The type for time_t can be forced to
long with the --enable-newlib-long-time_t configure option or with the
_USE_LONG_TIME_T system configuration define.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
In case time_t is long, then the cast to long is a nop. In case time_t
is __int_least64_t, then the cast to long may truncate the value before
the division.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
In C++, the usage of static inline functions for getchar_unlocked and
putchar_unlocked may result in error messages like
error: ‘_putchar_unlocked’ was not declared in this scope
Fix this by not using the _getchar_unlocked and _putchar_unlocked
macros in C++.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Remove local strnstr() implementation to fix compile error:
newlib/libc/iconv/lib/aliasesi.c:53:8: error: conflicting types for 'strnstr'
_DEFUN(strnstr, (haystack, needle, length),
^
In file included from newlib/libc/iconv/lib/aliasesi.c:29:0:
newlib/libc/include/string.h:125:10:
note: previous declaration of 'strnstr' was here
char *strnstr(const char *, const char *, size_t) __pure;
^~~~~~~
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This reverts most of commit 979d467ff6.
We cannot avoid some bareword attributes until clang is fixed to
properly support __-decorated attributes; see this bug:
https://bugs.llvm.org/show_bug.cgi?id=34319
The macros in question expand to the empty string under gcc, so
only compilation under clang is affected, and since clang has the
bug, the obvious solution is to roll back the changes, and document
the issue.
Signed-off-by: Eric Blake <eblake@redhat.com>
- For prevent confuse about what BSD license variant we used, 2- or
3-clause license, we change the license to FreeBSD license to make
it unambiguously refers to the 2-clause license.
Always use the __-decorated form of an attribute name in public
headers, as the bareword form is in the user's namespace, and we
don't want compilation to break just because the user defines the
bareword to mean something else.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add internal inline functions _getchar_unlocked() and
_putchar_unlocked() if __CUSTOM_FILE_IO__ is not defined. These
functions get _REENT only once. Use them for getchar_unlocked() and
putchar_unlocked(). Define getchar() and putchar() to these unlocked
internal functions if __SINGLE_THREAD__ is defined, otherwise use the
external functions to use proper locking of the FILE object.
Assumes that __SINGLE_THREAD__ is not defined if __CYGWIN__ is defined.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This is copied from musl (MIT license). This is newer and more thorough
than that of FreeBSD currently shipped only on Cygwin.
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
With this change the arm platform can now be fully compiled with Clang.
Tested by comparing the output with GCC 4.8.2, and Clang 4.0, using a
variety of arches, big/little endianness, and arm/thumb mode to verify
the generated assembly output matches between GCC vs Clang with UAL, and
also GCC with UAL vs GCC with non-UAL, for all preprocessor code blocks.
The only difference found is an extra nop at the end of the function
when compiled with GCC using armv7-a/thumb/little-endian/-O2 compared to
Clang. The nop is not emitted when compiled in big-endian mode.
Include <strings.h> in <string.h> if __BSD_VISIBLE like on FreeBSD.
Remove redundant declarations from <string.h>. Make ffsl(), ffsll(),
strncasecmp(), strcasecmp_l(), and strncasecmp_l() visible via
__BSD_VISIBLE instead of __GNU_VISIBLE. Add fls(), flsl(), and flsll()
to <strings.h> if __BSD_VISIBLE.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Use memset() to implement bzero() to profit from machine-specific
memset() optimizations.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
In Newlib, the stdio streams are defined to thread-specific pointers
_reent::_stdin, _reent::_stdout and _reent::_stderr. In case
_REENT_SMALL is not defined, then these pointers are initialized via
_REENT_INIT_PTR() or _REENT_INIT_PTR_ZEROED() to thread-specific FILE
objects provided via _reent::__sf[3]. There are two problems with this
(at least in case of RTEMS).
(1) The thread-specific FILE objects are closed by _reclaim_reent().
This leads to problems with language run-time libraries that provide
wrappers to the C/POSIX stdio streams (e.g. C++ and Ada), since they
use the thread-specific FILE objects of the initialization thread. In
case the initialization thread is deleted, then they use freed memory.
(2) Since thread-specific FILE objects are used with a common output
device via file descriptors 0, 1 and 2, the locking at FILE object level
cannot ensure atomicity of the output, e.g. a call to printf().
Introduce a new Newlib configuration option _REENT_GLOBAL_STDIO_STREAMS
to enable the use of global stdio FILE objects.
As a side-effect this reduces the size of struct _reent by more than
50%.
The _REENT_GLOBAL_STDIO_STREAMS should not be used without
_STDIO_CLOSE_PER_REENT_STD_STREAMS.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>