This removes the run-time configuration of errno support present in
portions of the math library and unifies all of the compile-time errno
configuration under a single parameter so that the whole library
is consistent.
The run-time support provided by _LIB_VERSION is no longer present in
the public API, although it is still used internally to disable errno
setting in some functions. Now that it is a constant, the compiler should
remove that code when errno is not supported.
This removes s_lib_ver.c as _LIB_VERSION is no longer variable.
Signed-off-by: Keith Packard <keithp@keithp.com>
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.54 ULP with inlined fma and fma contraction. It uses
a 4 KB lookup table in addition to the table in exp_data.c, on aarch64
.text+.rodata size of libm.a is increased by 2295 bytes.
Improvements on Cortex-A72:
latency: 3.3x
thruput: 4.9x
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes. The w_log.c wrapper is disabled since error handling is
inline in the new code.
New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).
Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
The new implementations are provided under !__OBSOLETE_MATH, they use
ISO C99 code. There are several settings, with the default one the
worst case error in nearest rounding mode is 0.509 ULP for exp and
0.507 ULP for exp2 when a multiply and add is contracted into an fma.
They use a shared 2 KB lookup table, on aarch64 .text+.rodata size
of libm.a is increased by 1868 bytes. The w_*.c wrappers are disabled
for the new code as it takes care of error handling inline.
The old exp2(x) code used to be just pow(2,x) so the speedup there
is more significant.
The file name has no special prefix to avoid any name collision with
existing files.
Improvements on Cortex-A72:
exp latency: 3.2x
exp thruput: 4.1x
exp2 latency: 7.8x
exp2 thruput: 18.8x
The recently added new math code inlines error handling instead of using
error handling wrappers around __ieee754* internal symbols, and thus the
__ieee754* symbols are no longer provided.
However __ieee754_expf and __ieee754_logf are used in the implementation
of a number of other math functions. These symbols are safe to redirect
to the external expf and logf symbols, because those names are always
reserved when single precision math functions are reserved and the
additional error handling code is either not reached or there will be
an error in the final result that will override an internal spurious
errno setting.
For consistency all of __ieee754_expf, __ieee754_logf and __ieee754_powf
are redirected using a macro.
(SAFE_RIGHT_SHIFT): Likewise.
* libm/common/s_llround.c (llround): Annotate shift operations with
possible shift amount ranges, and use SAFE_RIGHT_SHIFT to avoid
undefined behaviour.
* libm/common/s_lround.c (lround): Likewise.
* libc/include/sys/_types.h: Define _ssize_t as int if int is
32-bits, otherwise define it as long.
* libc/include/sys/types.h: Include <_ansi.h> and <sys/_types.h>
and define ssize_t as _ssize_t.
* libc/reent/readr.c: Change return type to _ssize_t.
* libc/reent/writer.c: Ditto.
* libc/sys/linux/Makefile.am: Add aio.c.
* libc/sys/linux/Makefile.in: Regenerated.
* libc/sys/linux/aio.c: New file.
* libc/sys/linux/sys/cdefs.h: Add __restrict_arr definition.
* libm/common/fdlibm.h: Undef __P before defining it.
* libc/include/math.h: Remove <sys/types.h>.
(__dmath): Use __ULong instead of _uint32_t.
* libc/include/sys/reent.h: If long or int is not 32-bits,
include <sys/types.h> to get definitions for _int32_t and _uint32_t.
* libc/stdlib/mprec.h: Include <sys/types.h> to get integer defs.
* libm/common/fdlibm.h: Ditto.