Update the offsets used to save registers into the stejmp jmp_buf
structure in order to:
* Avoid writing the supervision register outside the buffer and thus
clobbering something on the stack. Previously the supervision register
was written at offset 124 while the buffer was of length 124.
* Shrink the jmp_buf down to the size actually needed, by avoiding holes
at the locations of omitted registers.
Add support for the AMD GCN GPU architecture. This is primarily intended for
use with OpenMP and OpenACC offloading. It can also be used for stand-alone
programs, but this is intended mostly for testing the compiler and is not
expected to be useful in general.
The GPU architecture is highly parallel, and therefore Newlib must be
configured to use dynamic re-entrancy, and thread-safe malloc.
The only I/O available is a via a shared-memory interface provided by libgomp
and the gcn-run tool included with GCC. At this time this is limited to
stdout, argc/argv, and the return code.
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes. The w_log.c wrapper is disabled since error handling is
inline in the new code.
New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).
Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
- From: Cesar Philippidis <cesar@codesourcery.com>
Date: Tue, 10 Apr 2018 14:43:42 -0700
Subject: [PATCH] nvptx port
This port adds support for Nvidia GPU's, which are primarily used as
offload accelerators in OpenACC and OpenMP.
New optimized powf, logf, log2f, expf and exp2f yield worse performance
on Arm targets with only single precision instructions because the
double precision arithmetic is then implemented via softfloat routines.
This patch uses the old implementation when double precision
instructions are not available on Arm targets.
Testing: Built newlib with GCC's rmprofile Arm multilibs and compared
before/after -> only the above functions are changed and calls to them
(name change from logf to __ieee754_logf and similar). Testing the
changed function on a panel of values yields the same result before the
original patches to improve them and after this one. Double checking the
performance by looping the same panel of values being tested on Arm
Cortex-M4 does show the performance regression is fixed.
Based on code from https://github.com/ARM-software/optimized-routines/
This patch adds a highly optimized generic implementation of expf,
exp2f, logf, log2f and powf. The new functions are not only
faster (6x for powf!), but are also smaller and more accurate.
In order to achieve this, the algorithm uses double precision
arithmetic for accuracy, avoids divisions and uses small table
lookups to minimize the polynomials. Special cases are handled
inline to avoid the unnecessary overhead of wrapper functions and
set errno to POSIX requirements.
The new functions are added under newlib/libm/common, but the old
implementations are kept (in newlib/libm/math) for non-IEEE or
pre-C99 systems. Targets can enable the new math code by defining
__OBSOLETE_MATH_DEFAULT to 0 in newlib/libc/include/machine/ieeefp.h,
users can override the default by defining __OBSOLETE_MATH.
Currently the new code is enabled for AArch64 and AArch32 with VFP.
Targets with a single precision FPU may still prefer the old
implementation.
libm.a size changes:
arm: -1692
arm/thumb/v7-a/nofp: -878
arm/thumb/v7-a+fp/hard: -864
arm/thumb/v7-a+fp/softfp: -908
aarch64: -1476
Provide __intmax_t and __uintmax_t via <machine/_default_types.h> and
define intmax_t and uintmax_t in <sys/_stdint.h> for FreeBSD
compatibility.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Resurrect <machine/_user_types.h> for use in <sys/types.h>. Newlib
targets may provide an own version of <machine/types.h> in their machine
directory to add custom user types for <sys/types.h>. Check the
_SYS_TYPES_H header guard to prevent a direct include of
<machine/types.h>, since the <machine/types.h> file is a Newlib
speciality.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Introduce <machine/_endian.h> to let target based customization of
<machine/endian.h> via
* _LITTLE_ENDIAN,
* _BIG_ENDIAN,
* _PDP_ENDIAN, and
* _BYTE_ORDER.
defines. Add definitions expected by FreeBSD to
<machine/endian.h> like
* _QUAD_HIGHWORD,
* _QUAD_LOWWORD,
* __bswap16(),
* __bswap32(),
* __bswap64(),
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs().
Also, if __BSD_VISIBLE
* LITTLE_ENDIAN,
* BIG_ENDIAN,
* PDP_ENDIAN, and
* BYTE_ORDER.
Targets that define __machine_host_to_from_network_defined in
<machine/_endian.h> must provide their own implementation of
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs(),
otherwise a default implementation is provided by <machine/endian.h>.
In case of GCC defines to builtins are used.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This change solves a glibc/BSD compatibility problem.
glibc and BSD use double underscore types for internal types. The Linux
port of Newlib uses some glibc provided internal type definitions which
are not protected by guard defines, e.g. __off_t. To avoid a conflict
Newlib uses single underscore types for some internal types, e.g.
_off_t. However, for BSD compatibility we have to define the internal
types with double underscore names in <sys/_types.h>.
The header file <machine/types.h> is Newlib-specific. It was used
instead of <sys/_types.h> to provide the internal type definitions
_CLOCK_T, _TIME_T_, _CLOCKID_T_, _TIMER_T_, and __suseconds_t. Move
these definitions to <sys/_types.h> (there exist two instances of this
file, one for Linux and one for all other targets). This makes the
_HAVE_SYSTYPES configuration define obsolete (could possibly break the
__RDOS__ target). Use the standard <sys/_types.h> include throughout.
Move __loff_t defintion to default (non-Linux) <sys/_types.h>. Define
it via _off64_t to avoid a dependency on the compiler.
Provide the __off_t definition via default (non-Linux) <sys/_types.h>
based on _off_t for all systems except Cygwin. For Cygwin use _off64_t.
Define off_t via __off_t.
Provide the __pid_t definition via default (non-Linux) <sys/_types.h>.
This prevents a potential __pid_t and pid_t incompatibility. Add BSD
guard defines for pid_t.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Move the kernel dependent parts of <sys/time.h> to new system-specific
header file <machine/_time.h>. Provide an empty default implementation.
Add a specialized implementation for RTEMS.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Intel MCU System V ABI are incompartible with i386 System V ABI:
o Minimum instruction set is Intel Pentium ISA minus x87 instructions
o No x87 or vector registers
o First three args are passed in %eax, %edx and %ecx
o Full specification available here:
https://github.com/hjl-tools/x86-psABI/wiki/iamcu-psABI-0.7.pdf
newlib/
* configure.host: Add new ix86-*-elfiamcu target
newlib/libc/include/
* setjmp.h: Change _JBLEN for Intel MCU target
newlib/libc/machine/i386/
* memchr.S: (memchr) Target-specific size-optimized version
* memcmp.S: (memcmp) Likewise
* memcpy.S: (memcpy) Likewise
* memmove.S: (memmove) Likewise
* memset.S: (memset) Likewise
* setjmp.S: (setjmp) Likewise
* strchr.S: (strchr) Likewise
* strlen.S: (strlen) Likewise
newlib/libc/stdlib/
* srtold.c: (__flt_rounds) Disable for Intel MCU
According to the OpenBSD man page, "A Replacement Call for Random". It
offers high quality random numbers derived from input data obtained by
the OpenBSD specific getentropy() system call which is declared in
<unistd.h> and must be implemented for each Newlib port externally. The
arc4random() functions are used for example in LibreSSL and OpenSSH.
Cygwin provides currently its own implementation of the arc4random
family. Maybe it makes sense to use this getentropy() implementation:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libcrypto/crypto/getentropy_win.c?rev=1.4&content-type=text/x-cvsweb-markup
* libc/include/stdlib.h (arc4random): Declare if __BSD_VISIBLE.
(arc4random_buf): Likewise.
(arc4random_uniform): Likewise.
* libc/include/sys/unistd.h (getentropy): Likewise.
* libc/include/machine/_arc4random.h: New file.
* libc/stdlib/arc4random.c: Likewise.
* libc/stdlib/arc4random.h: Likewise.
* libc/stdlib/arc4random_uniform.c: Likewise.
* libc/stdlib/chacha_private.h: Likewise.
* libc/sys/rtems/include/machine/_arc4random.h: Likewise.
* libc/stdlib/Makefile.am (EXTENDED_SOURCES): Add arc4random.c
and arc4random_uniform.c.
* libc/stdlib/Makefile.in: Regenerate.
* libc/include/machine/setjmp.h (siglongjmp): Declare as function on
Cygwin.
(sigsetjmp): Ditto.
(_longjmp): Mark as noreturn function on Cygwin.
* common.din (siglongjmp): Export.
(sigsetjmp): Export.
* gendef: Change formatting of some comments.
(sigsetjmp): Implement.
(siglongjmp): Implement.
(__setjmpex): x86_64 only: Drop entry point.
(setjmp): x86_64 only: Store tls stackptr in Frame now, store MXCSR
and FPUCW registers in Spare, as MSVCRT does.
(longjmp): x86_64 only: Restore tls stackptr from Frame now, restore
MXCSR and FPUCW registers from Spare.
* include/cygwin/version.h (CYGWIN_VERSION_API_MINOR): Bump.
* new-features.xml (ov-new2.2): Document sigsetjmp, siglongjmp.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
from the 64-bit _JBTYPE definition.
* libc/machine/mips/setjmp.S: Re-work the o32 FP64 support to match
the now one-and-only supported o32 FP64 ABI extension. Also
support o32 FPXX.
header includes. Include <sys/features.h> for
__GNUC_PREREQ__().
(__u?int.*_t): Define via GCC provided __U?INT.*_TYPE__ if
available.
(__intptr_t): Define.
(__uintptr_t): Likewise.
* libc/include/stdint.h: Include <machine/_default_types.h>
instead of <_ansi.h>.
(u?int.*_t): Define via __u?int.*_t provided by
<machine/_default_types.h>.
(u?int_fast.*_t): Define via GCC provided
__U?INT_FAST.*_TYPE__ if available.
(U?INT.*(MIN|MAX)): Define via GCC provided __U?INT.*(MIN|MAX)__
if available.
(U?INT.*_C): Define via GCC provided __U?INT.*_C if available.
* libc/include/sys/cdefs.h: Use <machine/_default_types.h>
instead of <stdint.h>.
* libc/sys/rtems/sys/cpuset.h: Likewise.
* libc/sys/rtems/machine/_types.h: Include <stdint.h> for
FreeBSD compatibility.
* libc/include/machine/setjmp.h: Add support for __mips_fpr being
64 and treat it the same as if __mips64 is set.
* libc/machine/mips/setjmp.S: Ditto, plus add checks for _MIPS_SIM
being _ABIN32 and _ABI64.
* libc/include/sys/features.h: Redefine compilation environment
definitions for Cygwin to cover 64 bit Cygwin.
* libc/ctype/ctype_.c (_ctype_): Fix definition for 64 bit Cygwin.
* libc/include/machine/setjmp.h: Change definition of _JBLEN to allow
different values for 32 bit and 64 bit Cygwin.
* libc/include/reent.h (stat64): Define as stat under Cygwin, instead
of as __stat64. Undef stat64 if not building Newlib.
* libc/include/sys/stat.h (stat64): Define as stat under Cygwin.
throughout in place of explicit GNUC version checks.
* libc/include/_ansi.h (_NOINLINE): Define.
(_NOINLINE_STATIC): Define.
* libc/stdio/vfprintf.c (__sbprintf): Define _NOINLINE_STATIC.