Commit Graph

273 Commits

Author SHA1 Message Date
Szabolcs Nagy 81dc535bb9 Remove float compare option from sincosf
PREFER_FLOAT_COMPARISON setting was not correct as it could raise
spurious exceptions.  Fixing it is easy: just use ISLESS(x, y) instead
of abstop12(x) < abstop12(y) with appropriate non-signaling definition
for ISLESS.  However it seems this setting is not very useful (there is
only minor performance difference on various architectures), so remove
this option for now.
2018-07-11 17:16:04 +02:00
Szabolcs Nagy 358f3c61d6 Fix the documentation comments for log_inline in pow
There was a typo and the arguments were not explained clearly.
2018-07-11 17:16:04 +02:00
Szabolcs Nagy 138575c9b9 Fix namespace issues in sinf, cosf and sincosf
Use const sincos_t for clarity instead of making the typedef const.
Use __inv_pi4 and __sincosf_table to avoid namespace issues with
static linking.
2018-07-06 10:29:01 +02:00
Szabolcs Nagy 2805b07fa1 Fix large ulp error in pow without fma very near 1.0
The !HAVE_FAST_FMA code path split r = z/c - 1 into r = rhi + rlo such
that when z = 1-tiny and c = 1 then rlo and rhi could have much larger
magnitude than r which later caused large rounding errors.

So do a nearest rounding instead of truncation at the split.

In newlib with default settings this was observable on some arm targets
that enable the new math code but has no fma.
2018-07-06 10:29:01 +02:00
Szabolcs Nagy 6a85e1a4e5 Change the return type of converttoint and document the semantics
The roundtoint and converttoint internal functions are only called with small
values, so 32 bit result is enough for converttoint and it is a signed int
conversion so the natural return type is int32_t.

The original idea was to help the compiler keeping the result in uint64_t,
then it's clear that no sign extension is needed and there is no accidental
undefined or implementation defined signed int arithmetics.

But it turns out gcc does a good job with inlining so changing the type has
no overhead and the semantics of the conversion is less surprising this way.
Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top
bits were never usable and the existing code ensures that only the bottom
32 bits of the conversion result are used.

In newlib with default settings only aarch64 is affected and there is no
significant code generation change with gcc after the patch.
2018-07-06 10:29:01 +02:00
Szabolcs Nagy 73a3e95ff2 Remove unused TOINT_RINT and TOINT_SHIFT macros
Only have separate code paths for TOINT_INTRINSICS and !TOINT_INTRINSICS.
2018-07-06 10:29:01 +02:00
Szabolcs Nagy 393a1cb4ea Move __HAVE_FAST_FMA to math_config.h
Define it consistently with other HAVE_* macros that only affect code
using math_config.h.  This is also closer to the Arm Optimized Routines
code.
2018-07-06 10:29:01 +02:00
Szabolcs Nagy cbe50607fb Fix code style and comments of new math code
Synchronize code style and comments with Arm Optimized Routines, there
are no code changes in this patch.  This ensures different projects using
the same code have consistent code style so bug fix patches can be applied
more easily.
2018-07-06 10:29:01 +02:00
Szabolcs Nagy b99d49e506 New pow implementation
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code.  With default settings the worst case error in nearest
rounding mode is 0.54 ULP with inlined fma and fma contraction.  It uses
a 4 KB lookup table in addition to the table in exp_data.c, on aarch64
.text+.rodata size of libm.a is increased by 2295 bytes.

Improvements on Cortex-A72:
latency: 3.3x
thruput: 4.9x
2018-06-27 15:40:49 +02:00
Szabolcs Nagy 07e2c32828 New log2 implementation
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code.  With default settings the worst case error in nearest
rounding mode is 0.547 ULP with inlined fma and fma contraction.  It uses
a 1 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1584 bytes.

Note that the math.h header defines log2(x) to be log(x)/Ln2, this is
not changed, so the new code is only used if that macro is suppressed.

Improvements on Cortex-A72:
latency: 2.0x
thruput: 2.2x
2018-06-27 15:40:49 +02:00
Szabolcs Nagy e5791079c6 New log implementation
The new implementations are provided under !__OBSOLETE_MATH, it uses
ISO C99 code.  With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction.  It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes.  The w_log.c wrapper is disabled since error handling is
inline in the new code.

New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not.  Targets supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).

Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
2018-06-27 15:40:49 +02:00
Szabolcs Nagy fb929067db New exp and exp2 implementations
The new implementations are provided under !__OBSOLETE_MATH, they use
ISO C99 code.  There are several settings, with the default one the
worst case error in nearest rounding mode is 0.509 ULP for exp and
0.507 ULP for exp2 when a multiply and add is contracted into an fma.
They use a shared 2 KB lookup table, on aarch64 .text+.rodata size
of libm.a is increased by 1868 bytes.  The w_*.c wrappers are disabled
for the new code as it takes care of error handling inline.

The old exp2(x) code used to be just pow(2,x) so the speedup there
is more significant.

The file name has no special prefix to avoid any name collision with
existing files.

Improvements on Cortex-A72:
exp latency: 3.2x
exp thruput: 4.1x
exp2 latency: 7.8x
exp2 thruput: 18.8x
2018-06-27 15:40:49 +02:00
Szabolcs Nagy cfbcbd1c95 Use uint32_t sign argument to math error functions
This change is equivalent to the commit
c65db17340
and only affects code that is from the Arm optimized-routines project.

It does not affect the observable behaviour, but the code generation
can be different on 64bit targets.  The intention is to make the
portable semantics of the code obvious by using a fixed size type.
2018-06-27 15:40:49 +02:00
Corinna Vinschen b14daac482 Revert "Remove -fno-builtin to allow gcc to inline functions such as fabs, floor, creal, imag."
This reverts commit c077b9de99.

Yet another accidental commit...
2018-06-26 10:17:04 +02:00
Jon Beniston c077b9de99 Remove -fno-builtin to allow gcc to inline functions such as fabs, floor, creal, imag. 2018-06-25 13:31:51 +02:00
Wilco Dijkstra 3baadb9912 Improve performance of sinf/cosf/sincosf
Here is the correct patch with both filenames and int cast fixed:

This patch is a complete rewrite of sinf, cosf and sincosf.  The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs.  In non-nearest rounding modes the error is 1ULP.

The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction.  The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.

The small range reducer uses a single reduction step to handle values up to
120.0.  It is fastest on targets which support inlined round instructions.

The large range reducer uses integer arithmetic for simplicity.  It does a
32x96 bit multiply to compute a 64-bit modulo result.  This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4.  It could be further optimized, however it is
already much faster than necessary.

Simple benchmark showing speedup factor on AArch64 for various ranges:

range	0.7853982	sinf	1.7	cosf	2.2	sincosf	2.8
range	1.570796	sinf	1.9	cosf	1.9	sincosf	2.7
range	3.141593	sinf	2.0	cosf	2.0	sincosf	3.5
range	6.283185	sinf	2.3	cosf	2.3	sincosf	4.2
range	125.6637	sinf	2.9	cosf	3.0	sincosf	5.1
range	1.1259e15	sinf	26.8	cosf	26.8	sincosf	45.2

ChangeLog:
2018-05-18  Wilco Dijkstra  <wdijkstr@arm.com>

        * newlib/libm/common/Makefile.in: Regenerated.
        * newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
        sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
        * newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
        roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
        eval_as_double, likely, unlikely.
        * newlib/libm/common/cosf.c: New file.
        * newlib/libm/common/sinf.c: Likewise.
        * newlib/libm/common/sincosf.h: Likewise.
        * newlib/libm/common/sincosf.c: Likewise.
        * newlib/libm/common/sincosf_data.c: Likewise.
        * newlib/libm/math/sf_cos.c: Add #if to build conditionally.
        * newlib/libm/math/sf_sin.c: Likewise.
        * newlib/libm/math/wf_sincos.c: Likewise.

--
2018-06-21 09:37:04 +02:00
Corinna Vinschen cfe8c6c504 Revert "Improve performance of sinf/cosf/sincosf"
This reverts commit fca80a9d1b.

Accidentally pushed a preliminary version
2018-06-21 09:36:39 +02:00
Jon Beniston b7d9d27b0e libm/common/s_round.c (round): Add cast for 16-bit CPUs 2018-06-21 09:31:13 +02:00
Wilco Dijkstra fca80a9d1b Improve performance of sinf/cosf/sincosf
This patch is a complete rewrite of sinf, cosf and sincosf.  The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs.  In non-nearest rounding modes the error is 1ULP.

The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction.  The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.

The small range reducer uses a single reduction step to handle values up to
120.0.  It is fastest on targets which support inlined round instructions.

The large range reducer uses integer arithmetic for simplicity.  It does a
32x96 bit multiply to compute a 64-bit modulo result.  This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4.  It could be further optimized, however it is
already much faster than necessary.

Simple benchmark showing speedup factor on AArch64 for various ranges:

range	0.7853982	sinf	1.7	cosf	2.2	sincosf	2.8
range	1.570796	sinf	1.9	cosf	1.9	sincosf	2.7
range	3.141593	sinf	2.0	cosf	2.0	sincosf	3.5
range	6.283185	sinf	2.3	cosf	2.3	sincosf	4.2
range	125.6637	sinf	2.9	cosf	3.0	sincosf	5.1
range	1.1259e15	sinf	26.8	cosf	26.8	sincosf	45.2

ChangeLog:
2018-06-18  Wilco Dijkstra  <wdijkstr@arm.com>

        * newlib/libm/common/Makefile.in: Regenerated.
        * newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
        sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
        * newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
        roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
        eval_as_double, likely, unlikely.
        * newlib/libm/common/cosf.c: New file.
        * newlib/libm/common/sinf.c: Likewise.
        * newlib/libm/common/sincosf.h: Likewise.
        * newlib/libm/common/sincosf.c: Likewise.
        * newlib/libm/common/sincosf_data.c: Likewise.
        * newlib/libm/math/sf_cos.c: Add #if to build conditionally.
        * newlib/libm/math/sf_sin.c: Likewise.
        * newlib/libm/math/wf_sincos.c: Likewise.

--
2018-06-19 09:44:28 +02:00
Matthias Kannwischer fcfea0ae2d fix llrint and lrint for 52 <= exponent <= 62 2018-05-29 15:59:48 +02:00
Jeff Johnston e928275566 Use _LDBL_EQ_DBL in nexttowardf.c
2018-05-07  Tom de Vries  <tom@codesourcery.com>

	* libm/common/nexttowardf.c: Use _LDBL_EQ_DBL instead of
	_LDBL_EQ_DOUBLE.
2018-05-07 12:22:12 -04:00
Jeff Johnston cd31fbb2ae Add nvptx port.
- From: Cesar Philippidis <cesar@codesourcery.com>
  Date: Tue, 10 Apr 2018 14:43:42 -0700
  Subject: [PATCH] nvptx port

  This port adds support for Nvidia GPU's, which are primarily used as
  offload accelerators in OpenACC and OpenMP.
2018-04-13 15:42:37 -04:00
Jeff Johnston fffd2770db Bump release to 3.0.0 for yearly snapshot
- major release required due to removal of K&R support
2018-01-18 13:07:45 -05:00
Yaakov Selkowitz 7192f84096 ansification: remove _HAVE_STDC
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:30 -06:00
Yaakov Selkowitz 70ee6b17df ansification: remove _EXFUN, _EXFUN_NOTHROW
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:29 -06:00
Yaakov Selkowitz 9087163804 ansification: remove _DEFUN
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:26 -06:00
Yaakov Selkowitz fff27f8429 ansification: remove _DEFUN_VOID
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:19 -06:00
Yaakov Selkowitz 0bda30e1ff ansification: remove _CONST
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:08 -06:00
Yaakov Selkowitz 6783860a2e ansification: remove _AND
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:05 -06:00
Jim Wilson c874f1145f newlib: Don't do double divide in powf.
* Use 0.0f instead of 0.0 in divide.
2017-12-13 11:33:19 +01:00
Jim Wilson c338bc2255 Don't call double rint from float powf.
Updated patch to use 0.0f in addition to calling rintf.

Tested same way as before, with a testcase that triggers the code and
make check.

OK?

	newlib/
	* libm/math/wf_pow.c (powf): Call rintf instead of rint.  Use 0.0f
	for compare.
2017-12-13 11:03:10 +01:00
Jon Turney c006fd459f makedoc: make errors visible
Discard QUICKREF sections, rather than writing them to stderr
Discard MATHREF sections, rather than discarding as an error
Pass NOTES sections through to texinfo, rather than discarding as an error
Don't redirect makedoc stderr to .ref file
Remove makedoc output on error
Remove .ref files from CLEANFILES
Regenerate Makefile.ins

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
2017-12-07 11:54:11 +00:00
Yaakov Selkowitz 9f369d3c8d mathfp: remove TRAD_SYNOPSIS
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-12-01 03:41:54 -06:00
Yaakov Selkowitz ec4c079f4b math: remove TRAD_SYNOPSIS
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-12-01 03:41:53 -06:00
Yaakov Selkowitz 59822e777f libm/machine: remove TRAD_SYNOPSIS
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-12-01 03:41:53 -06:00
Yaakov Selkowitz ac8b60bdd1 complex: remove TRAD_SYNOPSIS
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-12-01 03:41:53 -06:00
Yaakov Selkowitz 3312f960a7 libm/common: remove TRAD_SYNOPSIS
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-12-01 03:41:53 -06:00
Szabolcs Nagy 56e494c074 fix internal __ieee754_expf and __ieee754_logf calls
The recently added new math code inlines error handling instead of using
error handling wrappers around __ieee754* internal symbols, and thus the
__ieee754* symbols are no longer provided.

However __ieee754_expf and __ieee754_logf are used in the implementation
of a number of other math functions.  These symbols are safe to redirect
to the external expf and logf symbols, because those names are always
reserved when single precision math functions are reserved and the
additional error handling code is either not reached or there will be
an error in the final result that will override an internal spurious
errno setting.

For consistency all of __ieee754_expf, __ieee754_logf and __ieee754_powf
are redirected using a macro.
2017-10-20 11:19:02 +02:00
Szabolcs Nagy c156098271 New expf, exp2f, logf, log2f and powf implementations
Based on code from https://github.com/ARM-software/optimized-routines/

This patch adds a highly optimized generic implementation of expf,
exp2f, logf, log2f and powf.  The new functions are not only
faster (6x for powf!), but are also smaller and more accurate.
In order to achieve this, the algorithm uses double precision
arithmetic for accuracy, avoids divisions and uses small table
lookups to minimize the polynomials.  Special cases are handled
inline to avoid the unnecessary overhead of wrapper functions and
set errno to POSIX requirements.

The new functions are added under newlib/libm/common, but the old
implementations are kept (in newlib/libm/math) for non-IEEE or
pre-C99 systems.  Targets can enable the new math code by defining
__OBSOLETE_MATH_DEFAULT to 0 in newlib/libc/include/machine/ieeefp.h,
users can override the default by defining __OBSOLETE_MATH.
Currently the new code is enabled for AArch64 and AArch32 with VFP.
Targets with a single precision FPU may still prefer the old
implementation.

libm.a size changes:
arm: -1692
arm/thumb/v7-a/nofp: -878
arm/thumb/v7-a+fp/hard: -864
arm/thumb/v7-a+fp/softfp: -908
aarch64: -1476
2017-10-13 10:58:00 +02:00
Brian Inglis f9b24fad7c newlib/libm/complex/cargl.c change imag() real() to cimagl() creall() 2017-09-19 15:36:12 -05:00
Kito Cheng 7040b2de08 Add RISC-V port for libm
Contributor list:
    - Michael Neilly  <mneilly@yahoo.com>
    - Kito Cheng  <kito.cheng@gmail.com>
2017-08-17 12:54:56 -04:00
Aditya Upadhyay 0e0900cb40 Importing catanl long double complex method from NetBSD. 2017-07-28 20:36:09 +02:00
Aditya Upadhyay 124ccc500e Fixing HUGE_VALF to HUGE_VALL. 2017-07-28 20:30:30 +02:00
Corinna Vinschen 181d8393ae newlib: fix file mode of newly added complex sources
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2017-06-29 15:30:35 +02:00
Corinna Vinschen 074ca98595 newlib: libm/complex/Makefile.in: regenerate 2017-06-29 13:55:10 +02:00
Aditya Upadhyay 45ae81fc91 Adding csinl.c in Makefile.am
Signed-off-by: Aditya Upadhyay <aadit0402@gmail.com>
2017-06-29 13:54:34 +02:00
Aditya Upadhyay 5bc320d3b5 Importing csinl.c from NetBSD. 2017-06-29 13:54:31 +02:00
Aditya Upadhyay 72b051888e Importing csinhl.c from NetBSD. 2017-06-29 13:44:32 +02:00
Aditya Upadhyay 0d924f0e02 Importing casinhl.c from NetBSD. 2017-06-29 13:44:32 +02:00
Aditya Upadhyay f834c77e7d Importing ctanl.c from NetBSD. 2017-06-29 13:44:32 +02:00