While testing the exp function we noticed some errors at the specified
magnitude. Within this range the exp function returns the input value +1
as an output. We chose to run a test of 1m exponentially spaced values
in the ranges [-2^-27,-2^-32] and [2^-32,2^-27] which showed 7603 and
3912 results with an error of >=0.5 ULP (compared with MPFR in 128 bit)
with the highest being 0.56 ULP and 0.53 ULP.
It's easy to fix by changing the magnitude at which the input value +1
is returned from <2^-28 to <2^-32 and using the polynomial instead. This
reduces the number of results with an error of >=0.5 ULP to 485 and 479
in above tests, all of which are exactly 0.5 ULP.
As we were already checking on exp we also took a look at expf. For expf
the magnitude where the input value +1 is returned can be increased from
<2^-28 to <2^-23 without accuracy loss for a slight performance
improvement. To ensure this was the correct value we tested all values
in the ranges [-2^-17,-2^-28] and [2^-28,2^-17] (~92.3m values each).
The single-precision trigonometric functions show rather high errors in
specific ranges starting at about 30000 radians. For example the sinf
procedure produces an error of 7626.55 ULP with the input
5.195880078125e+04 (0x474AF6CD) (compared with MPFR in 128bit
precision). For the test we used 100k values evenly spaced in the range
of [30k, 70k]. The issues are periodic at higher ranges.
This error was introduced when the double precision range reduction was
first converted to float. The shift by 8 bits always returns 0 as iq is
never higher than 255.
The fix reduces the error of the example above to 0.45 ULP, highest
error within the test set fell to 1.31 ULP, which is not perfect, but
still a significant improvement. Testing other previously erroneous
ranges no longer show particularly large accuracy errors.
I think I may have encountered a bug in the implementation of pow:
pow(-1.0, NaN) returns 1.0 when it should return NaN.
Because ix is used to check input vs 1.0 rather than hx, -1.0 is
mistaken for 1.0
This patch removes the definitions of HUGE_VAL from some of the float math
functions. HUGE_VAL is defined in newlib/libc/include/math.h, so it is not
necessary to have a further definition in the math functions.
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.54 ULP with inlined fma and fma contraction. It uses
a 4 KB lookup table in addition to the table in exp_data.c, on aarch64
.text+.rodata size of libm.a is increased by 2295 bytes.
Improvements on Cortex-A72:
latency: 3.3x
thruput: 4.9x
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code. With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction. It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes. The w_log.c wrapper is disabled since error handling is
inline in the new code.
New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not. Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).
Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
The new implementations are provided under !__OBSOLETE_MATH, they use
ISO C99 code. There are several settings, with the default one the
worst case error in nearest rounding mode is 0.509 ULP for exp and
0.507 ULP for exp2 when a multiply and add is contracted into an fma.
They use a shared 2 KB lookup table, on aarch64 .text+.rodata size
of libm.a is increased by 1868 bytes. The w_*.c wrappers are disabled
for the new code as it takes care of error handling inline.
The old exp2(x) code used to be just pow(2,x) so the speedup there
is more significant.
The file name has no special prefix to avoid any name collision with
existing files.
Improvements on Cortex-A72:
exp latency: 3.2x
exp thruput: 4.1x
exp2 latency: 7.8x
exp2 thruput: 18.8x
Here is the correct patch with both filenames and int cast fixed:
This patch is a complete rewrite of sinf, cosf and sincosf. The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs. In non-nearest rounding modes the error is 1ULP.
The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction. The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.
The small range reducer uses a single reduction step to handle values up to
120.0. It is fastest on targets which support inlined round instructions.
The large range reducer uses integer arithmetic for simplicity. It does a
32x96 bit multiply to compute a 64-bit modulo result. This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4. It could be further optimized, however it is
already much faster than necessary.
Simple benchmark showing speedup factor on AArch64 for various ranges:
range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8
range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7
range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5
range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2
range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1
range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2
ChangeLog:
2018-05-18 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libm/common/Makefile.in: Regenerated.
* newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
* newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
eval_as_double, likely, unlikely.
* newlib/libm/common/cosf.c: New file.
* newlib/libm/common/sinf.c: Likewise.
* newlib/libm/common/sincosf.h: Likewise.
* newlib/libm/common/sincosf.c: Likewise.
* newlib/libm/common/sincosf_data.c: Likewise.
* newlib/libm/math/sf_cos.c: Add #if to build conditionally.
* newlib/libm/math/sf_sin.c: Likewise.
* newlib/libm/math/wf_sincos.c: Likewise.
--
This patch is a complete rewrite of sinf, cosf and sincosf. The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs. In non-nearest rounding modes the error is 1ULP.
The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction. The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.
The small range reducer uses a single reduction step to handle values up to
120.0. It is fastest on targets which support inlined round instructions.
The large range reducer uses integer arithmetic for simplicity. It does a
32x96 bit multiply to compute a 64-bit modulo result. This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4. It could be further optimized, however it is
already much faster than necessary.
Simple benchmark showing speedup factor on AArch64 for various ranges:
range 0.7853982 sinf 1.7 cosf 2.2 sincosf 2.8
range 1.570796 sinf 1.9 cosf 1.9 sincosf 2.7
range 3.141593 sinf 2.0 cosf 2.0 sincosf 3.5
range 6.283185 sinf 2.3 cosf 2.3 sincosf 4.2
range 125.6637 sinf 2.9 cosf 3.0 sincosf 5.1
range 1.1259e15 sinf 26.8 cosf 26.8 sincosf 45.2
ChangeLog:
2018-06-18 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libm/common/Makefile.in: Regenerated.
* newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
* newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
eval_as_double, likely, unlikely.
* newlib/libm/common/cosf.c: New file.
* newlib/libm/common/sinf.c: Likewise.
* newlib/libm/common/sincosf.h: Likewise.
* newlib/libm/common/sincosf.c: Likewise.
* newlib/libm/common/sincosf_data.c: Likewise.
* newlib/libm/math/sf_cos.c: Add #if to build conditionally.
* newlib/libm/math/sf_sin.c: Likewise.
* newlib/libm/math/wf_sincos.c: Likewise.
--
Updated patch to use 0.0f in addition to calling rintf.
Tested same way as before, with a testcase that triggers the code and
make check.
OK?
newlib/
* libm/math/wf_pow.c (powf): Call rintf instead of rint. Use 0.0f
for compare.
Discard QUICKREF sections, rather than writing them to stderr
Discard MATHREF sections, rather than discarding as an error
Pass NOTES sections through to texinfo, rather than discarding as an error
Don't redirect makedoc stderr to .ref file
Remove makedoc output on error
Remove .ref files from CLEANFILES
Regenerate Makefile.ins
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Based on code from https://github.com/ARM-software/optimized-routines/
This patch adds a highly optimized generic implementation of expf,
exp2f, logf, log2f and powf. The new functions are not only
faster (6x for powf!), but are also smaller and more accurate.
In order to achieve this, the algorithm uses double precision
arithmetic for accuracy, avoids divisions and uses small table
lookups to minimize the polynomials. Special cases are handled
inline to avoid the unnecessary overhead of wrapper functions and
set errno to POSIX requirements.
The new functions are added under newlib/libm/common, but the old
implementations are kept (in newlib/libm/math) for non-IEEE or
pre-C99 systems. Targets can enable the new math code by defining
__OBSOLETE_MATH_DEFAULT to 0 in newlib/libc/include/machine/ieeefp.h,
users can override the default by defining __OBSOLETE_MATH.
Currently the new code is enabled for AArch64 and AArch32 with VFP.
Targets with a single precision FPU may still prefer the old
implementation.
libm.a size changes:
arm: -1692
arm/thumb/v7-a/nofp: -878
arm/thumb/v7-a+fp/hard: -864
arm/thumb/v7-a+fp/softfp: -908
aarch64: -1476
makedoc defines a command as 'all upper case, and alone on a line'.
A few QUICKREF lines currently violate this by having some additional text after
the QUICKREF.
So, currently, these lines are treated as an unknown command.
This is benign as QUICKREF currently does nothing but produce some ignored
output on stderr. I'm not sure what the intent of QUICKREF is.
2015-11-06 Jon Turney <jon.turney@dronecode.org.uk>
* libm/mathfp/s_acos.c: Fix QUICKREF.
* libm/mathfp/e_acosh.c: Ditto.
* libm/math/w_asin.c: Ditto.
* libm/mathfp/e_acosh.c: Ditto.
* libm/mathfp/s_acos.c: Ditto.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
I think these are accidental omissions, as these source files are listed to be
chewed by makedoc, but the result is not included by any texinfo source file.
Future work: Nothing in libc/reent/ which is processed by makedoc is included by
reent.tex
2015-06-23 Jon Turney <jon.turney@dronecode.org.uk>
* libc/stdlib/stdlib.tex: Include itoa and utoa, and add to menu.
* libc/string/strings.tex: Include memrchr and rawmemchr, and add
to menu.
* libm/math/math.tex: Include exp10 and pow10, and add to menu.
* libm/common/s_exp10.c: Improve one-line description.
* libm/common/s_exp10.c: Ditto.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
compiler warnings found this way.
* libc/stdio/freopen.c (_freopen_r): Fix bug setting _flags.
* libc/include/stdio.h (_rename): Define when building newlib.
* libc/include/sys/signal.h (_kill): Ditto.
* libc/include/sys/stat.h (_mkdir): Ditto.
* libc/include/sys/time.h (_gettimeofday): Ditto.
* libc/include/sys/times.h (_times): Ditto.
* libc/include/sys/wait.h (_wait): Ditto.
* libc/locale/lmessages.c (empty): Don't define for Cygwin.
* libc/locale/lmonetary.c (cnv): Ditto.
* libc/locale/nl_langinfo.c (nl_langinfo): Ditto for variable s.
* libc/posix/collate.c: Throughout cast to avoid compiler warning.
* libc/posix/engine.c (matcher): Initialize dp to avoid compiler
warning.
* libc/posix/glob.c: Disable on Cygwin. Explain why.
* libc/posix/regcomp.c: Fix "uninitialized" compiler warnings.
(dissect): Deliberately silence gcc compiler warning. Add comment to
explain why.
* libc/posix/wordexp.c (wordexp): Remove num_bytes variable since result
is never used.
* libc/posix/popen.c (popen): Ditto for variable last.
* libc/reent/mkdirr.c: Include sys/stat.h.
* libc/reent/renamer.c: Include stdio.h.
* libc/search/hash.c: Throughout use underscored variants of the stat
function family.
(init_hash): Add missing definition for the __USE_INTERNAL_STAT64 case.
* libc/search/hash_bigkey.c (__big_insert): Add parenthesis to avoid
compiler warning.
* libc/search/hash_page.c (overflow_page): Initalize freep to NULL to
avoid compiler warning.
* libc/stdio/asiprintf.c (_asiprintf_r): Cast unsigned char * to char *
to avoid compiler warning.
(asiprintf): Ditto.
* libc/stdio/asprintf.c (_asprintf_r): Ditto.
(asprintf): Ditto.
* libc/stdio/vasiprintf.c (_vasiprintf_r): Ditto.
* libc/stdio/vasprintf.c (_vasprintf_r): Ditto.
* libc/stdio/mktemp.c (_gettemp): Cast to unsigned char in call to
isdigit to avoid compiler warning.
* libc/stdio/vfprintf.c (_VFPRINTF_R): Initialize variables used for
grouping to avoid compiler warning. Only define and set nseps and
nrepeats if they are really used.
* libc/stdio/vfwprintf.c (_VFWPRINTF_R): Ditto. Only define state if
it is really used.
* libc/stdio/vfscanf.c (u_char): Revert to be defined as unsigned char.
(__SVFSCANF_R): Cast fmt in call to __mbtowc.
* libc/stdlib/mbtowc_r.c (JIS_state_table): Disable when building
Cygwin.
(JIS_action_table): Ditto.
* libc/stdlib/wctomb_r.c (__utf8_wctomb): Add parenthesis to avoid
compiler warning.
* libc/string/strcasestr.c: Deliberately silence gcc compiler warning.
Add comment to explain why.
* libc/time/strptime.c (strptime): Cast to unsigned char in calls to
isspace to avoid compiler warning.
* libm/math/e_atan2.c (__ieee754_atan2): Add parenthesis to avoid
compiler warning.
* libm/math/e_exp.c (__ieee754_exp): Initialize k to 0 to avoid
compiler warning. Drop setting it to 0 later.
* libm/math/ef_exp.c (__ieee754_expf): Ditto.
* libm/math/e_pow.c (__ieee754_pow): Add braces to avoid compiler
warning.
* libm/math/ef_pow.c (__ieee754_powf): Ditto.
* libm/math/er_lgamma.c (__ieee754_lgamma_r): Initialize nadj to 0 to
avoid compiler warning.
* libm/math/erf_lgamma.c (__ieee754_lgammaf_r): Ditto.
* libm/math/e_rem_pio2.c (__ieee754_rem_pio2): Ditto for variable z.
* libm/common/sf_round.c (roundf): Remove signbit variable since result
is never used.
* libc/stdlib/getenv.c: Delete "char *_findenv_r ();", as is not a
proper prototype, and is properly prototyped in stdlib.h, anyway.
* libc/stdlib/getenv_r.c: Ditto.
* libc/search/hash.c: Add _DEFUN to __hash_open() declaration; add
#define __DBINTERFACE_PRIVATE to activate prototypes from db_local.h.
* libc/search/db_local.h: Correct __hash_open() prototype.
* libc/sys/linux/cmath/math_private.h: Eliminate compiler warnings:
Remove #define INFINITY (redefines from math.h); remove #define __isnanf
and #define __isinff isinff.
* libc/include/math.h: (llround, llroundf): Declare.
* libm/common/s_llround.c: New file, implementing llround().
* libm/common/sf_llround.c: New file, implementing llroundf().
* libm/common/sf_lround.c: Remove spurious cast in _DOUBLE_IS_32BITS
version of function.
* libm/common/sf_lrint.c: Ditto.
* libm/common/sf_logb.c: Corrected return for subnormal argument
by replacing existing function with a version created from sf_ilogb.c.
* libm/common/s_logb.c: Ditto, except starting point s_ilogb.c. Also
added documentation for logb() and logbf().
* libm/common/s_signbit.c: Add signbit() documentation.
* libm/common/s_log2.c: Update return values to match what w_log2.c has,
since log2 uses log(); add note about being derived instead of direct.
* libm/common/sf_fma.c: Add casts to attempt to get correct results,
as well as comments pointing out problems with the implementation.
* libm/common/s_fma.c: Add fma() and fmaf() documentation.
* libm/common/sf_remquo.c: Incorrect quotient returns for large values
corrected by discarding existing function and replacing with Sun
verion, with some enhancements.
* libm/common/s_remquo.c: Ditto. Add remquo() and remquof()
documentation.
* libm/common/s_fmax.c: Add fmax() and fmaxf() documentation.
* libm/common/s_fmin.c: Add fmin() and fminf() documentation.
* libm/common/s_fdim.c: Return NAN for NAN arg, add fdim() and fdimf()
documentation.
* libm/common/sf_fdim.c: Return NAN for NAN arg, HUGE_VALF for inf arg.
* libm/common/s_trunc.c: Add trunc() and truncf() documentation.
* libm/common/s_rint.c: Add rint() and rintf() documentation.
* libm/common/s_round.c: Add round() and roundf() documentation.
* libm/common/s_scalbn.c: Add scalbln() and scalblnf() documentation.
* libm/common/s_infinity.c: Add infinity() and infinityf()
documentation.
* libm/common/s_lround.c: Add lround(), lroundf(), llround(), and
llroundf() documentation.
* libm/common/s_lrint.c: Add lrint(), lrintf(), llrint(), and llrintf()
documentation.
* libm/common/isgreater.c: New file for documenting math.h function-like
macros isgreater(), isgreaterequal(), isless(), islessequal(),
islessgreater(), and isunordered().
* libm/common/s_isnan.c: Add documentation for function-like macros
fpclassify(), isfinite(), isinf(), isnan(), and isnormal().
* libm/common/s_nearbyint.c: Add nearbyint() and nearbyintf()
documentation.
* libm/common/Makefile.am: Add s_llround.c (src); sf_llround.c (fsrc);
s_fdim.def, s_fma.def, s_fmax.def, s_fmin.def,
s_logb.def, s_lrint.def, s_lround.def, s_nearbyint.def, s_remquo.def,
s_rint.def, s_round.def, s_signbit.def, s_trunc.def, and
isgreater.def (chobj);
re-name all existing chew files (chobj) to match source file base
names (put in underscores), delete all special targets for chew files
(leaving all to be generated by rule).
* libm/common/Makefile.in: regenerate.
* libm/math/w_exp2.c: Add "base 2" to documentation description (and
delete TRAD_SYNOPSIS).
* libm/math/w_gamma.c: Add tgamma() and tgammaf() documentation, along
with some history behind the function names.
* libm/math/math.tex: Add includes for newly-added documentation (see
.def additions to common/Makefile.am and math/Makefile.am in this
ChangeLog list), adjusted existing .def file names to match source file
base names (added underscores); add mention of HUGE_VALF; rename
"Version of library" section to "Error Handling" and add some text
about floating-point exception; added section "Standards Compliance And
Portability".
* libm/math/Makefile.am: Add w_exp2.def (chobj);
re-name all existing chew files (chobj) to match source file base
names, delete all special targets for chew files (leaving all to be
generated by rule).
* libm/math/Makefile.in: regenerated
* doc/makedoc.c: Change silent ignoring of commands < 5 characters
to a failure when reading macro file for commands < 4 characters;
add -v (verbose) option for printing some debugging information;
get rid of spurious translation of "@*" to "*" (no source files used @*,
so no existing doc pages were affected); clean up some compiler
warnings.
* doc/doc.str: add BUGS and SEEALSO sections (to match texi2pod.pl
which has them); Remove ITEM command (redundant with makedoc built-in
"o", not used in any present source file so nothing is lost, anyway).
* HOWTO: New file to hold information for maintainers regarding how
to do things. Initial sections on documentation and ELIX levels.
* libm/mathfp/s_logarithm.c: Fix error introduced by previous
fix when handling negative input values. Make function
consistent with math directory and glibc version such that
inf and nan values return inf and nan respectively with no
errno setting.
* libm/mathfp/sf_logarithm.c: Ditto.
* libm/math/w_log.c: Set errno to ERANGE when input is 0.0.
* libm/math/wf_log.c: Ditto.
* libm/math/w_log10.c: Ditto.
* libm/math/wf_log10.c: Ditto.
* libm/math/e_pow.c: Fix to be consistent with glibc with regards
to treatment of NaN and +-inf arguments.
* libm/math/ef_pow.c: Ditto.
* libm/math/w_pow.c: Ditto.
* libm/math/wf_pow.c: Ditto.
* libm/math/w_acos.c: Fix domain errors to return NaN.
* libm/math/w_asin.c: Ditto.
* libm/math/wf_acos.c: Ditto.
* libm/math/wf_asin.c: Ditto.
* libm/math/w_log.c: Fix to return NaN for negative number inputs.
* libm/math/wf_log.c: Ditto.
* libm/math/wf_log10.c: Ditto.
* libm/math/w_log10.c: Ditto.