Zfinx/Zdinx are new extensions ratified in 2022, it similar to F/D extensions,
support hard float operation for single/double precision, but the difference
between Zfinx/Zdinx and F/D is Zfinx/Zdinx is operating under general purpose
registers rather than dedicated floating-point registers.
This patch improve the hard float support detection for RISC-V port, so
that Zfinx/Zdinx can have better/right performance.
Co-authored-by: Jesse Huang <jesse.huang@sifive.com>
Set __OBSOLETE_MATH_DEFAULT to 0 if 'd' extension is supported (i.e.
__riscv_flen == 64).
Base on the comment for __OBSOLETE_MATH_DEFAULT:
> ... it assumes that the toolchain has ISO C99 support (hexfloat
> literals, standard fenv semantics), the target has IEEE-754 conforming
> binary32 float and binary64 double (not mixed endian) representation,
> standard SNaN representation, double and single precision arithmetics
> has similar latency and it has no legacy SVID matherr support, only
> POSIX errno and fenv exception based error handling.
Signed-off-by: Hau Hsu <hau.hsu@sifive.com>
As per the arm Procedure Call Standard for the Arm Architecture
section 6.1.2 [1], VFP registers s16-s31 (d8-d15, q4-q7) must be
preserved across subroutine calls.
The current setjmp/longjmp implementations preserve only the core
registers, with the jump buffer size too small to store the required
co-processor registers.
In accordance with the C Library ABI for the Arm Architecture
section 6.11 [2], this patch sets _JBTYPE to long long adjusting
_JBLEN to 20.
It also emits vfp load/store instructions depending on architectural
support, predicated at compile time on ACLE feature-test macros.
[1] https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst
[2] https://github.com/ARM-software/abi-aa/blob/main/clibabi32/clibabi32.rst
By including sys/_stdint.h, all types from stdint.h are
exposed even if stdint.h isn't pulled in explicitely. Include
<machine/_default_types.h instead. Fix up newlib and Cygwin
files which rely on stdint.h types, too.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
- This patch uses gdtoa imported from OpenBSD if newlib configure
option "--enable-newlib-use-gdtoa=no" is NOT specified. gdtoa
provides more accurate output and faster conversion than legacy
ldtoa, while it requires more heap memory.
math_errhandling is specified to contain two bits of information:
1. MATH_ERRNO -- Set when the library sets errno
2. MATH_ERREXCEPT -- Set when math operations report exceptions
MATH_ERRNO should match whether the original math code is compiled in
_IEEE_LIBM mode and the new math code has WANT_ERRNO == 1.
MATH_ERREXCEPT should match whether the underlying hardware has
exception support. This patch adds configurations of this value for
RISC-V, ARM, Aarch64, x86 and x86_64 when using HW float.
Signed-off-by: Keith Packard <keithp@keithp.com>
Update the offsets used to save registers into the stejmp jmp_buf
structure in order to:
* Avoid writing the supervision register outside the buffer and thus
clobbering something on the stack. Previously the supervision register
was written at offset 124 while the buffer was of length 124.
* Shrink the jmp_buf down to the size actually needed, by avoiding holes
at the locations of omitted registers.
Add support for the AMD GCN GPU architecture. This is primarily intended for
use with OpenMP and OpenACC offloading. It can also be used for stand-alone
programs, but this is intended mostly for testing the compiler and is not
expected to be useful in general.
The GPU architecture is highly parallel, and therefore Newlib must be
configured to use dynamic re-entrancy, and thread-safe malloc.
The only I/O available is a via a shared-memory interface provided by libgomp
and the gcn-run tool included with GCC. At this time this is limited to
stdout, argc/argv, and the return code.
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes. The w_log.c wrapper is disabled since error handling is
inline in the new code.
New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).
Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
- From: Cesar Philippidis <cesar@codesourcery.com>
Date: Tue, 10 Apr 2018 14:43:42 -0700
Subject: [PATCH] nvptx port
This port adds support for Nvidia GPU's, which are primarily used as
offload accelerators in OpenACC and OpenMP.
New optimized powf, logf, log2f, expf and exp2f yield worse performance
on Arm targets with only single precision instructions because the
double precision arithmetic is then implemented via softfloat routines.
This patch uses the old implementation when double precision
instructions are not available on Arm targets.
Testing: Built newlib with GCC's rmprofile Arm multilibs and compared
before/after -> only the above functions are changed and calls to them
(name change from logf to __ieee754_logf and similar). Testing the
changed function on a panel of values yields the same result before the
original patches to improve them and after this one. Double checking the
performance by looping the same panel of values being tested on Arm
Cortex-M4 does show the performance regression is fixed.
Based on code from https://github.com/ARM-software/optimized-routines/
This patch adds a highly optimized generic implementation of expf,
exp2f, logf, log2f and powf. The new functions are not only
faster (6x for powf!), but are also smaller and more accurate.
In order to achieve this, the algorithm uses double precision
arithmetic for accuracy, avoids divisions and uses small table
lookups to minimize the polynomials. Special cases are handled
inline to avoid the unnecessary overhead of wrapper functions and
set errno to POSIX requirements.
The new functions are added under newlib/libm/common, but the old
implementations are kept (in newlib/libm/math) for non-IEEE or
pre-C99 systems. Targets can enable the new math code by defining
__OBSOLETE_MATH_DEFAULT to 0 in newlib/libc/include/machine/ieeefp.h,
users can override the default by defining __OBSOLETE_MATH.
Currently the new code is enabled for AArch64 and AArch32 with VFP.
Targets with a single precision FPU may still prefer the old
implementation.
libm.a size changes:
arm: -1692
arm/thumb/v7-a/nofp: -878
arm/thumb/v7-a+fp/hard: -864
arm/thumb/v7-a+fp/softfp: -908
aarch64: -1476
Provide __intmax_t and __uintmax_t via <machine/_default_types.h> and
define intmax_t and uintmax_t in <sys/_stdint.h> for FreeBSD
compatibility.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Resurrect <machine/_user_types.h> for use in <sys/types.h>. Newlib
targets may provide an own version of <machine/types.h> in their machine
directory to add custom user types for <sys/types.h>. Check the
_SYS_TYPES_H header guard to prevent a direct include of
<machine/types.h>, since the <machine/types.h> file is a Newlib
speciality.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Introduce <machine/_endian.h> to let target based customization of
<machine/endian.h> via
* _LITTLE_ENDIAN,
* _BIG_ENDIAN,
* _PDP_ENDIAN, and
* _BYTE_ORDER.
defines. Add definitions expected by FreeBSD to
<machine/endian.h> like
* _QUAD_HIGHWORD,
* _QUAD_LOWWORD,
* __bswap16(),
* __bswap32(),
* __bswap64(),
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs().
Also, if __BSD_VISIBLE
* LITTLE_ENDIAN,
* BIG_ENDIAN,
* PDP_ENDIAN, and
* BYTE_ORDER.
Targets that define __machine_host_to_from_network_defined in
<machine/_endian.h> must provide their own implementation of
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs(),
otherwise a default implementation is provided by <machine/endian.h>.
In case of GCC defines to builtins are used.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This change solves a glibc/BSD compatibility problem.
glibc and BSD use double underscore types for internal types. The Linux
port of Newlib uses some glibc provided internal type definitions which
are not protected by guard defines, e.g. __off_t. To avoid a conflict
Newlib uses single underscore types for some internal types, e.g.
_off_t. However, for BSD compatibility we have to define the internal
types with double underscore names in <sys/_types.h>.
The header file <machine/types.h> is Newlib-specific. It was used
instead of <sys/_types.h> to provide the internal type definitions
_CLOCK_T, _TIME_T_, _CLOCKID_T_, _TIMER_T_, and __suseconds_t. Move
these definitions to <sys/_types.h> (there exist two instances of this
file, one for Linux and one for all other targets). This makes the
_HAVE_SYSTYPES configuration define obsolete (could possibly break the
__RDOS__ target). Use the standard <sys/_types.h> include throughout.
Move __loff_t defintion to default (non-Linux) <sys/_types.h>. Define
it via _off64_t to avoid a dependency on the compiler.
Provide the __off_t definition via default (non-Linux) <sys/_types.h>
based on _off_t for all systems except Cygwin. For Cygwin use _off64_t.
Define off_t via __off_t.
Provide the __pid_t definition via default (non-Linux) <sys/_types.h>.
This prevents a potential __pid_t and pid_t incompatibility. Add BSD
guard defines for pid_t.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Move the kernel dependent parts of <sys/time.h> to new system-specific
header file <machine/_time.h>. Provide an empty default implementation.
Add a specialized implementation for RTEMS.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Intel MCU System V ABI are incompartible with i386 System V ABI:
o Minimum instruction set is Intel Pentium ISA minus x87 instructions
o No x87 or vector registers
o First three args are passed in %eax, %edx and %ecx
o Full specification available here:
https://github.com/hjl-tools/x86-psABI/wiki/iamcu-psABI-0.7.pdf
newlib/
* configure.host: Add new ix86-*-elfiamcu target
newlib/libc/include/
* setjmp.h: Change _JBLEN for Intel MCU target
newlib/libc/machine/i386/
* memchr.S: (memchr) Target-specific size-optimized version
* memcmp.S: (memcmp) Likewise
* memcpy.S: (memcpy) Likewise
* memmove.S: (memmove) Likewise
* memset.S: (memset) Likewise
* setjmp.S: (setjmp) Likewise
* strchr.S: (strchr) Likewise
* strlen.S: (strlen) Likewise
newlib/libc/stdlib/
* srtold.c: (__flt_rounds) Disable for Intel MCU
According to the OpenBSD man page, "A Replacement Call for Random". It
offers high quality random numbers derived from input data obtained by
the OpenBSD specific getentropy() system call which is declared in
<unistd.h> and must be implemented for each Newlib port externally. The
arc4random() functions are used for example in LibreSSL and OpenSSH.
Cygwin provides currently its own implementation of the arc4random
family. Maybe it makes sense to use this getentropy() implementation:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libcrypto/crypto/getentropy_win.c?rev=1.4&content-type=text/x-cvsweb-markup
* libc/include/stdlib.h (arc4random): Declare if __BSD_VISIBLE.
(arc4random_buf): Likewise.
(arc4random_uniform): Likewise.
* libc/include/sys/unistd.h (getentropy): Likewise.
* libc/include/machine/_arc4random.h: New file.
* libc/stdlib/arc4random.c: Likewise.
* libc/stdlib/arc4random.h: Likewise.
* libc/stdlib/arc4random_uniform.c: Likewise.
* libc/stdlib/chacha_private.h: Likewise.
* libc/sys/rtems/include/machine/_arc4random.h: Likewise.
* libc/stdlib/Makefile.am (EXTENDED_SOURCES): Add arc4random.c
and arc4random_uniform.c.
* libc/stdlib/Makefile.in: Regenerate.
* libc/include/machine/setjmp.h (siglongjmp): Declare as function on
Cygwin.
(sigsetjmp): Ditto.
(_longjmp): Mark as noreturn function on Cygwin.
* common.din (siglongjmp): Export.
(sigsetjmp): Export.
* gendef: Change formatting of some comments.
(sigsetjmp): Implement.
(siglongjmp): Implement.
(__setjmpex): x86_64 only: Drop entry point.
(setjmp): x86_64 only: Store tls stackptr in Frame now, store MXCSR
and FPUCW registers in Spare, as MSVCRT does.
(longjmp): x86_64 only: Restore tls stackptr from Frame now, restore
MXCSR and FPUCW registers from Spare.
* include/cygwin/version.h (CYGWIN_VERSION_API_MINOR): Bump.
* new-features.xml (ov-new2.2): Document sigsetjmp, siglongjmp.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
from the 64-bit _JBTYPE definition.
* libc/machine/mips/setjmp.S: Re-work the o32 FP64 support to match
the now one-and-only supported o32 FP64 ABI extension. Also
support o32 FPXX.