- From: Cesar Philippidis <cesar@codesourcery.com>
Date: Tue, 10 Apr 2018 14:43:42 -0700
Subject: [PATCH] nvptx port
This port adds support for Nvidia GPU's, which are primarily used as
offload accelerators in OpenACC and OpenMP.
New optimized powf, logf, log2f, expf and exp2f yield worse performance
on Arm targets with only single precision instructions because the
double precision arithmetic is then implemented via softfloat routines.
This patch uses the old implementation when double precision
instructions are not available on Arm targets.
Testing: Built newlib with GCC's rmprofile Arm multilibs and compared
before/after -> only the above functions are changed and calls to them
(name change from logf to __ieee754_logf and similar). Testing the
changed function on a panel of values yields the same result before the
original patches to improve them and after this one. Double checking the
performance by looping the same panel of values being tested on Arm
Cortex-M4 does show the performance regression is fixed.
Based on code from https://github.com/ARM-software/optimized-routines/
This patch adds a highly optimized generic implementation of expf,
exp2f, logf, log2f and powf. The new functions are not only
faster (6x for powf!), but are also smaller and more accurate.
In order to achieve this, the algorithm uses double precision
arithmetic for accuracy, avoids divisions and uses small table
lookups to minimize the polynomials. Special cases are handled
inline to avoid the unnecessary overhead of wrapper functions and
set errno to POSIX requirements.
The new functions are added under newlib/libm/common, but the old
implementations are kept (in newlib/libm/math) for non-IEEE or
pre-C99 systems. Targets can enable the new math code by defining
__OBSOLETE_MATH_DEFAULT to 0 in newlib/libc/include/machine/ieeefp.h,
users can override the default by defining __OBSOLETE_MATH.
Currently the new code is enabled for AArch64 and AArch32 with VFP.
Targets with a single precision FPU may still prefer the old
implementation.
libm.a size changes:
arm: -1692
arm/thumb/v7-a/nofp: -878
arm/thumb/v7-a+fp/hard: -864
arm/thumb/v7-a+fp/softfp: -908
aarch64: -1476
Provide __intmax_t and __uintmax_t via <machine/_default_types.h> and
define intmax_t and uintmax_t in <sys/_stdint.h> for FreeBSD
compatibility.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Resurrect <machine/_user_types.h> for use in <sys/types.h>. Newlib
targets may provide an own version of <machine/types.h> in their machine
directory to add custom user types for <sys/types.h>. Check the
_SYS_TYPES_H header guard to prevent a direct include of
<machine/types.h>, since the <machine/types.h> file is a Newlib
speciality.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Introduce <machine/_endian.h> to let target based customization of
<machine/endian.h> via
* _LITTLE_ENDIAN,
* _BIG_ENDIAN,
* _PDP_ENDIAN, and
* _BYTE_ORDER.
defines. Add definitions expected by FreeBSD to
<machine/endian.h> like
* _QUAD_HIGHWORD,
* _QUAD_LOWWORD,
* __bswap16(),
* __bswap32(),
* __bswap64(),
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs().
Also, if __BSD_VISIBLE
* LITTLE_ENDIAN,
* BIG_ENDIAN,
* PDP_ENDIAN, and
* BYTE_ORDER.
Targets that define __machine_host_to_from_network_defined in
<machine/_endian.h> must provide their own implementation of
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs(),
otherwise a default implementation is provided by <machine/endian.h>.
In case of GCC defines to builtins are used.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This change solves a glibc/BSD compatibility problem.
glibc and BSD use double underscore types for internal types. The Linux
port of Newlib uses some glibc provided internal type definitions which
are not protected by guard defines, e.g. __off_t. To avoid a conflict
Newlib uses single underscore types for some internal types, e.g.
_off_t. However, for BSD compatibility we have to define the internal
types with double underscore names in <sys/_types.h>.
The header file <machine/types.h> is Newlib-specific. It was used
instead of <sys/_types.h> to provide the internal type definitions
_CLOCK_T, _TIME_T_, _CLOCKID_T_, _TIMER_T_, and __suseconds_t. Move
these definitions to <sys/_types.h> (there exist two instances of this
file, one for Linux and one for all other targets). This makes the
_HAVE_SYSTYPES configuration define obsolete (could possibly break the
__RDOS__ target). Use the standard <sys/_types.h> include throughout.
Move __loff_t defintion to default (non-Linux) <sys/_types.h>. Define
it via _off64_t to avoid a dependency on the compiler.
Provide the __off_t definition via default (non-Linux) <sys/_types.h>
based on _off_t for all systems except Cygwin. For Cygwin use _off64_t.
Define off_t via __off_t.
Provide the __pid_t definition via default (non-Linux) <sys/_types.h>.
This prevents a potential __pid_t and pid_t incompatibility. Add BSD
guard defines for pid_t.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Move the kernel dependent parts of <sys/time.h> to new system-specific
header file <machine/_time.h>. Provide an empty default implementation.
Add a specialized implementation for RTEMS.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Intel MCU System V ABI are incompartible with i386 System V ABI:
o Minimum instruction set is Intel Pentium ISA minus x87 instructions
o No x87 or vector registers
o First three args are passed in %eax, %edx and %ecx
o Full specification available here:
https://github.com/hjl-tools/x86-psABI/wiki/iamcu-psABI-0.7.pdf
newlib/
* configure.host: Add new ix86-*-elfiamcu target
newlib/libc/include/
* setjmp.h: Change _JBLEN for Intel MCU target
newlib/libc/machine/i386/
* memchr.S: (memchr) Target-specific size-optimized version
* memcmp.S: (memcmp) Likewise
* memcpy.S: (memcpy) Likewise
* memmove.S: (memmove) Likewise
* memset.S: (memset) Likewise
* setjmp.S: (setjmp) Likewise
* strchr.S: (strchr) Likewise
* strlen.S: (strlen) Likewise
newlib/libc/stdlib/
* srtold.c: (__flt_rounds) Disable for Intel MCU
According to the OpenBSD man page, "A Replacement Call for Random". It
offers high quality random numbers derived from input data obtained by
the OpenBSD specific getentropy() system call which is declared in
<unistd.h> and must be implemented for each Newlib port externally. The
arc4random() functions are used for example in LibreSSL and OpenSSH.
Cygwin provides currently its own implementation of the arc4random
family. Maybe it makes sense to use this getentropy() implementation:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libcrypto/crypto/getentropy_win.c?rev=1.4&content-type=text/x-cvsweb-markup
* libc/include/stdlib.h (arc4random): Declare if __BSD_VISIBLE.
(arc4random_buf): Likewise.
(arc4random_uniform): Likewise.
* libc/include/sys/unistd.h (getentropy): Likewise.
* libc/include/machine/_arc4random.h: New file.
* libc/stdlib/arc4random.c: Likewise.
* libc/stdlib/arc4random.h: Likewise.
* libc/stdlib/arc4random_uniform.c: Likewise.
* libc/stdlib/chacha_private.h: Likewise.
* libc/sys/rtems/include/machine/_arc4random.h: Likewise.
* libc/stdlib/Makefile.am (EXTENDED_SOURCES): Add arc4random.c
and arc4random_uniform.c.
* libc/stdlib/Makefile.in: Regenerate.
* libc/include/machine/setjmp.h (siglongjmp): Declare as function on
Cygwin.
(sigsetjmp): Ditto.
(_longjmp): Mark as noreturn function on Cygwin.
* common.din (siglongjmp): Export.
(sigsetjmp): Export.
* gendef: Change formatting of some comments.
(sigsetjmp): Implement.
(siglongjmp): Implement.
(__setjmpex): x86_64 only: Drop entry point.
(setjmp): x86_64 only: Store tls stackptr in Frame now, store MXCSR
and FPUCW registers in Spare, as MSVCRT does.
(longjmp): x86_64 only: Restore tls stackptr from Frame now, restore
MXCSR and FPUCW registers from Spare.
* include/cygwin/version.h (CYGWIN_VERSION_API_MINOR): Bump.
* new-features.xml (ov-new2.2): Document sigsetjmp, siglongjmp.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
from the 64-bit _JBTYPE definition.
* libc/machine/mips/setjmp.S: Re-work the o32 FP64 support to match
the now one-and-only supported o32 FP64 ABI extension. Also
support o32 FPXX.
header includes. Include <sys/features.h> for
__GNUC_PREREQ__().
(__u?int.*_t): Define via GCC provided __U?INT.*_TYPE__ if
available.
(__intptr_t): Define.
(__uintptr_t): Likewise.
* libc/include/stdint.h: Include <machine/_default_types.h>
instead of <_ansi.h>.
(u?int.*_t): Define via __u?int.*_t provided by
<machine/_default_types.h>.
(u?int_fast.*_t): Define via GCC provided
__U?INT_FAST.*_TYPE__ if available.
(U?INT.*(MIN|MAX)): Define via GCC provided __U?INT.*(MIN|MAX)__
if available.
(U?INT.*_C): Define via GCC provided __U?INT.*_C if available.
* libc/include/sys/cdefs.h: Use <machine/_default_types.h>
instead of <stdint.h>.
* libc/sys/rtems/sys/cpuset.h: Likewise.
* libc/sys/rtems/machine/_types.h: Include <stdint.h> for
FreeBSD compatibility.
* libc/include/machine/setjmp.h: Add support for __mips_fpr being
64 and treat it the same as if __mips64 is set.
* libc/machine/mips/setjmp.S: Ditto, plus add checks for _MIPS_SIM
being _ABIN32 and _ABI64.
* libc/include/sys/features.h: Redefine compilation environment
definitions for Cygwin to cover 64 bit Cygwin.
* libc/ctype/ctype_.c (_ctype_): Fix definition for 64 bit Cygwin.
* libc/include/machine/setjmp.h: Change definition of _JBLEN to allow
different values for 32 bit and 64 bit Cygwin.
* libc/include/reent.h (stat64): Define as stat under Cygwin, instead
of as __stat64. Undef stat64 if not building Newlib.
* libc/include/sys/stat.h (stat64): Define as stat under Cygwin.
throughout in place of explicit GNUC version checks.
* libc/include/_ansi.h (_NOINLINE): Define.
(_NOINLINE_STATIC): Define.
* libc/stdio/vfprintf.c (__sbprintf): Define _NOINLINE_STATIC.
* libc/include/machine/ieeefp.h[__arm__][!__VFP_FP__]: Set to
__IEEE_BIG_ENDIAN and set __IEEE_BYTES_LITTLE_ENDIAN appropriately
based on __ARMEL flag.
* libc/include/machine/endian.h: To set byte order to LITTLE_ENDIAN,
check for __IEEE_LITTLE_ENDIAN or __IEEE_BYTES_LITTLE_ENDIAN.