newlib-cygwin

Commit Graph

Author	SHA1	Message	Date
Szabolcs Nagy	e5791079c6	New log implementation The new implementations are provided under !__OBSOLETE_MATH, it uses ISO C99 code. With default settings the worst case error in nearest rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased by 1703 bytes. The w_log.c wrapper is disabled since error handling is inline in the new code. New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were added to enable selecting between the code path that uses fma and the one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT if they have single instruction fma and the compiler can actually inline it (gcc has __FP_FAST_FMA macro but that does not guarantee inlining with -fno-builtin-fma). Improvements on Cortex-A72: latency: 1.9x thruput: 2.3x	2018-06-27 15:40:49 +02:00
Szabolcs Nagy	fb929067db	New exp and exp2 implementations The new implementations are provided under !__OBSOLETE_MATH, they use ISO C99 code. There are several settings, with the default one the worst case error in nearest rounding mode is 0.509 ULP for exp and 0.507 ULP for exp2 when a multiply and add is contracted into an fma. They use a shared 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased by 1868 bytes. The w_*.c wrappers are disabled for the new code as it takes care of error handling inline. The old exp2(x) code used to be just pow(2,x) so the speedup there is more significant. The file name has no special prefix to avoid any name collision with existing files. Improvements on Cortex-A72: exp latency: 3.2x exp thruput: 4.1x exp2 latency: 7.8x exp2 thruput: 18.8x	2018-06-27 15:40:49 +02:00
Szabolcs Nagy	cfbcbd1c95	Use uint32_t sign argument to math error functions This change is equivalent to the commit `c65db17340` and only affects code that is from the Arm optimized-routines project. It does not affect the observable behaviour, but the code generation can be different on 64bit targets. The intention is to make the portable semantics of the code obvious by using a fixed size type.	2018-06-27 15:40:49 +02:00
Wilco Dijkstra	3baadb9912	Improve performance of sinf/cosf/sincosf Here is the correct patch with both filenames and int cast fixed: This patch is a complete rewrite of sinf, cosf and sincosf. The new version is significantly faster, as well as simple and accurate. The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all 4 billion inputs. In non-nearest rounding modes the error is 1ULP. The algorithm uses 3 main cases: small inputs which don't need argument reduction, small inputs which need a simple range reduction and large inputs requiring complex range reduction. The code uses approximate integer comparisons to quickly decide between these cases - on some targets this may be slow, so this can be configured to use floating point comparisons. The small range reducer uses a single reduction step to handle values up to 120.0. It is fastest on targets which support inlined round instructions. The large range reducer uses integer arithmetic for simplicity. It does a 32x96 bit multiply to compute a 64-bit modulo result. This is more than accurate enough to handle the worst-case cancellation for values close to an integer multiple of PI/4. It could be further optimized, however it is already much faster than necessary. Simple benchmark showing speedup factor on AArch64 for various ranges: range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8 range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7 range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5 range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2 range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1 range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2 ChangeLog: 2018-05-18 Wilco Dijkstra <wdijkstr@arm.com> * newlib/libm/common/Makefile.in: Regenerated. * newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS. * newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND, roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float, eval_as_double, likely, unlikely. * newlib/libm/common/cosf.c: New file. * newlib/libm/common/sinf.c: Likewise. * newlib/libm/common/sincosf.h: Likewise. * newlib/libm/common/sincosf.c: Likewise. * newlib/libm/common/sincosf_data.c: Likewise. * newlib/libm/math/sf_cos.c: Add #if to build conditionally. * newlib/libm/math/sf_sin.c: Likewise. * newlib/libm/math/wf_sincos.c: Likewise. --	2018-06-21 09:37:04 +02:00
Corinna Vinschen	cfe8c6c504	Revert "Improve performance of sinf/cosf/sincosf" This reverts commit `fca80a9d1b`. Accidentally pushed a preliminary version	2018-06-21 09:36:39 +02:00
Wilco Dijkstra	fca80a9d1b	Improve performance of sinf/cosf/sincosf This patch is a complete rewrite of sinf, cosf and sincosf. The new version is significantly faster, as well as simple and accurate. The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all 4 billion inputs. In non-nearest rounding modes the error is 1ULP. The algorithm uses 3 main cases: small inputs which don't need argument reduction, small inputs which need a simple range reduction and large inputs requiring complex range reduction. The code uses approximate integer comparisons to quickly decide between these cases - on some targets this may be slow, so this can be configured to use floating point comparisons. The small range reducer uses a single reduction step to handle values up to 120.0. It is fastest on targets which support inlined round instructions. The large range reducer uses integer arithmetic for simplicity. It does a 32x96 bit multiply to compute a 64-bit modulo result. This is more than accurate enough to handle the worst-case cancellation for values close to an integer multiple of PI/4. It could be further optimized, however it is already much faster than necessary. Simple benchmark showing speedup factor on AArch64 for various ranges: range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8 range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7 range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5 range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2 range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1 range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2 ChangeLog: 2018-06-18 Wilco Dijkstra <wdijkstr@arm.com> * newlib/libm/common/Makefile.in: Regenerated. * newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS. * newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND, roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float, eval_as_double, likely, unlikely. * newlib/libm/common/cosf.c: New file. * newlib/libm/common/sinf.c: Likewise. * newlib/libm/common/sincosf.h: Likewise. * newlib/libm/common/sincosf.c: Likewise. * newlib/libm/common/sincosf_data.c: Likewise. * newlib/libm/math/sf_cos.c: Add #if to build conditionally. * newlib/libm/math/sf_sin.c: Likewise. * newlib/libm/math/wf_sincos.c: Likewise. --	2018-06-19 09:44:28 +02:00
Szabolcs Nagy	c156098271	New expf, exp2f, logf, log2f and powf implementations Based on code from https://github.com/ARM-software/optimized-routines/ This patch adds a highly optimized generic implementation of expf, exp2f, logf, log2f and powf. The new functions are not only faster (6x for powf!), but are also smaller and more accurate. In order to achieve this, the algorithm uses double precision arithmetic for accuracy, avoids divisions and uses small table lookups to minimize the polynomials. Special cases are handled inline to avoid the unnecessary overhead of wrapper functions and set errno to POSIX requirements. The new functions are added under newlib/libm/common, but the old implementations are kept (in newlib/libm/math) for non-IEEE or pre-C99 systems. Targets can enable the new math code by defining __OBSOLETE_MATH_DEFAULT to 0 in newlib/libc/include/machine/ieeefp.h, users can override the default by defining __OBSOLETE_MATH. Currently the new code is enabled for AArch64 and AArch32 with VFP. Targets with a single precision FPU may still prefer the old implementation. libm.a size changes: arm: -1692 arm/thumb/v7-a/nofp: -878 arm/thumb/v7-a+fp/hard: -864 arm/thumb/v7-a+fp/softfp: -908 aarch64: -1476	2017-10-13 10:58:00 +02:00

7 Commits