The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.54 ULP with inlined fma and fma contraction. It uses
a 4 KB lookup table in addition to the table in exp_data.c, on aarch64
.text+.rodata size of libm.a is increased by 2295 bytes.
Improvements on Cortex-A72:
latency: 3.3x
thruput: 4.9x
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.547 ULP with inlined fma and fma contraction. It uses
a 1 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1584 bytes.
Note that the math.h header defines log2(x) to be log(x)/Ln2, this is
not changed, so the new code is only used if that macro is suppressed.
Improvements on Cortex-A72:
latency: 2.0x
thruput: 2.2x
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes. The w_log.c wrapper is disabled since error handling is
inline in the new code.
New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).
Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
The new implementations are provided under !__OBSOLETE_MATH, they use
ISO C99 code. There are several settings, with the default one the
worst case error in nearest rounding mode is 0.509 ULP for exp and
0.507 ULP for exp2 when a multiply and add is contracted into an fma.
They use a shared 2 KB lookup table, on aarch64 .text+.rodata size
of libm.a is increased by 1868 bytes. The w_*.c wrappers are disabled
for the new code as it takes care of error handling inline.
The old exp2(x) code used to be just pow(2,x) so the speedup there
is more significant.
The file name has no special prefix to avoid any name collision with
existing files.
Improvements on Cortex-A72:
exp latency: 3.2x
exp thruput: 4.1x
exp2 latency: 7.8x
exp2 thruput: 18.8x
Here is the correct patch with both filenames and int cast fixed:
This patch is a complete rewrite of sinf, cosf and sincosf. The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs. In non-nearest rounding modes the error is 1ULP.
The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction. The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.
The small range reducer uses a single reduction step to handle values up to
120.0. It is fastest on targets which support inlined round instructions.
The large range reducer uses integer arithmetic for simplicity. It does a
32x96 bit multiply to compute a 64-bit modulo result. This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4. It could be further optimized, however it is
already much faster than necessary.
Simple benchmark showing speedup factor on AArch64 for various ranges:
range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8
range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7
range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5
range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2
range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1
range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2
ChangeLog:
2018-05-18 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libm/common/Makefile.in: Regenerated.
* newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
* newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
eval_as_double, likely, unlikely.
* newlib/libm/common/cosf.c: New file.
* newlib/libm/common/sinf.c: Likewise.
* newlib/libm/common/sincosf.h: Likewise.
* newlib/libm/common/sincosf.c: Likewise.
* newlib/libm/common/sincosf_data.c: Likewise.
* newlib/libm/math/sf_cos.c: Add #if to build conditionally.
* newlib/libm/math/sf_sin.c: Likewise.
* newlib/libm/math/wf_sincos.c: Likewise.
--
This patch is a complete rewrite of sinf, cosf and sincosf. The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs. In non-nearest rounding modes the error is 1ULP.
The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction. The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.
The small range reducer uses a single reduction step to handle values up to
120.0. It is fastest on targets which support inlined round instructions.
The large range reducer uses integer arithmetic for simplicity. It does a
32x96 bit multiply to compute a 64-bit modulo result. This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4. It could be further optimized, however it is
already much faster than necessary.
Simple benchmark showing speedup factor on AArch64 for various ranges:
range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8
range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7
range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5
range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2
range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1
range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2
ChangeLog:
2018-06-18 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libm/common/Makefile.in: Regenerated.
* newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
* newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
eval_as_double, likely, unlikely.
* newlib/libm/common/cosf.c: New file.
* newlib/libm/common/sinf.c: Likewise.
* newlib/libm/common/sincosf.h: Likewise.
* newlib/libm/common/sincosf.c: Likewise.
* newlib/libm/common/sincosf_data.c: Likewise.
* newlib/libm/math/sf_cos.c: Add #if to build conditionally.
* newlib/libm/math/sf_sin.c: Likewise.
* newlib/libm/math/wf_sincos.c: Likewise.
--
Based on code from https://github.com/ARM-software/optimized-routines/
This patch adds a highly optimized generic implementation of expf,
exp2f, logf, log2f and powf. The new functions are not only
faster (6x for powf!), but are also smaller and more accurate.
In order to achieve this, the algorithm uses double precision
arithmetic for accuracy, avoids divisions and uses small table
lookups to minimize the polynomials. Special cases are handled
inline to avoid the unnecessary overhead of wrapper functions and
set errno to POSIX requirements.
The new functions are added under newlib/libm/common, but the old
implementations are kept (in newlib/libm/math) for non-IEEE or
pre-C99 systems. Targets can enable the new math code by defining
__OBSOLETE_MATH_DEFAULT to 0 in newlib/libc/include/machine/ieeefp.h,
users can override the default by defining __OBSOLETE_MATH.
Currently the new code is enabled for AArch64 and AArch32 with VFP.
Targets with a single precision FPU may still prefer the old
implementation.
libm.a size changes:
arm: -1692
arm/thumb/v7-a/nofp: -878
arm/thumb/v7-a+fp/hard: -864
arm/thumb/v7-a+fp/softfp: -908
aarch64: -1476
previous commit 4c90db7bc89e7fa1077025fefdd58269dc71a6ac introduced
a compile time error because libm/common/s_infconst.c used the remove
__fmath, __dmath, and __ldmath union types.
Since this is very old, and unused for a very long time, just drop the
file and thus the __infinity constants entirely.
Exception: Cygwin exports __infinity from the beginning. There's a very,
VERY low probability that any existing executable or lib still uses this
constant, but we just keep it in for backward compat, nevertheless.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* libc/include/math.h: (llround, llroundf): Declare.
* libm/common/s_llround.c: New file, implementing llround().
* libm/common/sf_llround.c: New file, implementing llroundf().
* libm/common/sf_lround.c: Remove spurious cast in _DOUBLE_IS_32BITS
version of function.
* libm/common/sf_lrint.c: Ditto.
* libm/common/sf_logb.c: Corrected return for subnormal argument
by replacing existing function with a version created from sf_ilogb.c.
* libm/common/s_logb.c: Ditto, except starting point s_ilogb.c. Also
added documentation for logb() and logbf().
* libm/common/s_signbit.c: Add signbit() documentation.
* libm/common/s_log2.c: Update return values to match what w_log2.c has,
since log2 uses log(); add note about being derived instead of direct.
* libm/common/sf_fma.c: Add casts to attempt to get correct results,
as well as comments pointing out problems with the implementation.
* libm/common/s_fma.c: Add fma() and fmaf() documentation.
* libm/common/sf_remquo.c: Incorrect quotient returns for large values
corrected by discarding existing function and replacing with Sun
verion, with some enhancements.
* libm/common/s_remquo.c: Ditto. Add remquo() and remquof()
documentation.
* libm/common/s_fmax.c: Add fmax() and fmaxf() documentation.
* libm/common/s_fmin.c: Add fmin() and fminf() documentation.
* libm/common/s_fdim.c: Return NAN for NAN arg, add fdim() and fdimf()
documentation.
* libm/common/sf_fdim.c: Return NAN for NAN arg, HUGE_VALF for inf arg.
* libm/common/s_trunc.c: Add trunc() and truncf() documentation.
* libm/common/s_rint.c: Add rint() and rintf() documentation.
* libm/common/s_round.c: Add round() and roundf() documentation.
* libm/common/s_scalbn.c: Add scalbln() and scalblnf() documentation.
* libm/common/s_infinity.c: Add infinity() and infinityf()
documentation.
* libm/common/s_lround.c: Add lround(), lroundf(), llround(), and
llroundf() documentation.
* libm/common/s_lrint.c: Add lrint(), lrintf(), llrint(), and llrintf()
documentation.
* libm/common/isgreater.c: New file for documenting math.h function-like
macros isgreater(), isgreaterequal(), isless(), islessequal(),
islessgreater(), and isunordered().
* libm/common/s_isnan.c: Add documentation for function-like macros
fpclassify(), isfinite(), isinf(), isnan(), and isnormal().
* libm/common/s_nearbyint.c: Add nearbyint() and nearbyintf()
documentation.
* libm/common/Makefile.am: Add s_llround.c (src); sf_llround.c (fsrc);
s_fdim.def, s_fma.def, s_fmax.def, s_fmin.def,
s_logb.def, s_lrint.def, s_lround.def, s_nearbyint.def, s_remquo.def,
s_rint.def, s_round.def, s_signbit.def, s_trunc.def, and
isgreater.def (chobj);
re-name all existing chew files (chobj) to match source file base
names (put in underscores), delete all special targets for chew files
(leaving all to be generated by rule).
* libm/common/Makefile.in: regenerate.
* libm/math/w_exp2.c: Add "base 2" to documentation description (and
delete TRAD_SYNOPSIS).
* libm/math/w_gamma.c: Add tgamma() and tgammaf() documentation, along
with some history behind the function names.
* libm/math/math.tex: Add includes for newly-added documentation (see
.def additions to common/Makefile.am and math/Makefile.am in this
ChangeLog list), adjusted existing .def file names to match source file
base names (added underscores); add mention of HUGE_VALF; rename
"Version of library" section to "Error Handling" and add some text
about floating-point exception; added section "Standards Compliance And
Portability".
* libm/math/Makefile.am: Add w_exp2.def (chobj);
re-name all existing chew files (chobj) to match source file base
names, delete all special targets for chew files (leaving all to be
generated by rule).
* libm/math/Makefile.in: regenerated
* doc/makedoc.c: Change silent ignoring of commands < 5 characters
to a failure when reading macro file for commands < 4 characters;
add -v (verbose) option for printing some debugging information;
get rid of spurious translation of "@*" to "*" (no source files used @*,
so no existing doc pages were affected); clean up some compiler
warnings.
* doc/doc.str: add BUGS and SEEALSO sections (to match texi2pod.pl
which has them); Remove ITEM command (redundant with makedoc built-in
"o", not used in any present source file so nothing is lost, anyway).
* HOWTO: New file to hold information for maintainers regarding how
to do things. Initial sections on documentation and ELIX levels.
* Makefile.am (MATHOBJS_IN_LIBC): Add s_isinfd, sf_isinff,
s_isnand, and sf_isnanf object files.
* Makefile.in: Regenerated.
* libc/include/ieeefp.h: Undef isnan and isinf to avoid
conflict if <math.h> has previously been included.
* libc/include/math.h
* libm/common/Makefile.am: Add new s_isinfd, s_isnand, sf_isinff,
and sf_isnanf files. Also support s_isnan, sf_isnan, s_isinf, and
sf_isinf files which have been moved from math/mathfp directories.
* libm/common/Makefile.in: Regenerated.
* libm/common/s_isinfd.c: New file.
* libm/common/s_isnand.c: Ditto.
* libm/common/sf_isinff.c: Ditto.
* libm/common/sf_isnanf.c: Ditto.
* libm/common/s_isinf.c: Moved from libm/math directory.
* libm/common/s_isnan.c: Ditto.
* libm/common/sf_isinf.c: Ditto.
* libm/common/sf_isnan.c: Ditto.
* libm/math/Makefile.am: Remove isinf and isnan family functions
which have been moved into common directory.
* libm/mathfp/Makefile.am: Ditto.
* libm/math/Makefile.in: Regenerated.
* libm/mathfp/Makefile.in: Ditto.
* libm/math/s_isinf.c: Removed.
* libm/math/s_isnan.c: Ditto.
* libm/math/sf_isinf.c: Ditto.
* libm/math/sf_isnan.c: Ditto.
* libm/mathfp/s_isinf.c: Ditto.
* libm/mathfp/s_isnan.c: Ditto.
* libm/mathfp/sf_isinf.c: Ditto.
* libm/mathfp/sf_isnan.c: Ditto.
* libc/include/math.h (HUGE_VALF, HUGE_VALL): New.
* libm/common/Makefile.am: Add s_infconst.c support.
* libm/common/Makefile.in: Regenerated.
* libm/common/s_infconst.c: New file with float and
long double infinity support added.
* libm/math/Makefile.am: Remove s_infconst.c support.
* libm/math/Makefile.in: Regenerated.
* libm/math/s_infconst.c: Moved to common directory.
* libm/mathfp/Makefile.am: Remove s_infconst.c support.
* libm/mathfp/Makefile.in: Regenerated.
* libm/mathfp/s_infconst.c: Moved to common directory.
* libm/common/Makefile.am (doc): Do not append to $(TARGETDOC).
* libm/common/Makefile.in: Regenerate.
* libm/common/common.tex: Delete file.
* libm/math/math.tex: Include .def files from common/.
* libm/mathfp/mathfp.tex: Likewise.