mirror of git://sourceware.org/git/newlib-cygwin.git synced 2025-01-25 08:37:33 +08:00

109 Commits

Author SHA1 Message Date
Mike Frysinger
2339979934 newlib: libm: merge machine/ configure scripts up a level
The machine configure scripts are all effectively stub scripts that
pass the higher level options to their own makefiles.  The only one doing
any custom tests was nds32.  The rest were all effectively the same as
the libm/ configure script.

So instead of recursively running configure in all of these subdirs,
generate their makefiles from the top-level configure.  For nds32,
deploy a pattern of including subdir logic via m4:
	m4_include([machine/nds32/acinclude.m4])

Even its set of checks is very small -- it does 2 preprocessor tests
and sets up 2 makefile conditionals.

Some of the generated machine makefiles have a bunch of extra stuff
added to them, but that's because they were inconsistent in their
configure libtool calls.  The top-level has it, so it exports some
new vars to the ones that weren't already.
2022-01-26 03:11:20 -05:00
Mike Frysinger
6ac043b192 newlib: libm: merge machine/ trampoline up a level
The machine/{configure,Makefile} files exist only to fan out to the
specific machine/$arch/ subdir.  We already have all that same info
in the libm/ dir itself, so by moving the recursive configure and
make calls into it, we can cut off this logic entirely and save the
overhead.

For arches that don't have a machine subdir, it means they can skip
the logic entirely.
2022-01-26 03:11:20 -05:00
Mike Frysinger
20e3103471 newlib: update to automake-1.15
This matches what the other GNU toolchain projects have done already.
The generated diff in practice isn't terribly large.  This will allow
more use of subdir local.mk includes due to fixes & improvements that
came after the 1.11 release series.
2022-01-14 19:10:38 -05:00
Mike Frysinger
a100e80fc9 require autoconf-2.69 exactly
The newlib & libgloss dirs are already generated using autoconf-2.69.
To avoid merging new code and/or accidental regeneration using different
versions, leverage config/override.m4 to pin to 2.69 exactly.  This
matches what gcc/binutils/gdb are already doing.

The README file already says to use autoconf-2.69.

To accomplish this, it's just as simple as adding -I flags to the
top-level config/ dir when running aclocal.  This is because the
override.m4 file overrides AC_INIT to first require the specific
autoconf version before calling the real AC_INIT.
2022-01-14 15:24:33 -05:00
Mike Frysinger
ed20821a40 newlib: migrate from INCLUDES to AM_CPPFLAGS
Since automake deprecated the INCLUDES name in favor of AM_CPPFLAGS,
change all existing users over.  The generated code is the same since
the two variables have been used in the same exact places by design.

There are other cleanups to be done, but let's focus on just renaming
here so we can upgrade to a newer automake version w/out triggering
new warnings.
2022-01-05 20:29:53 -05:00
Jon Turney
bfcabeb876
newlib: Regenerate autotools files 2021-12-29 22:45:06 +00:00
Jon Turney
a4e734fcdb
newlib: Remove automake option 'cygnus'
The 'cygnus' option was removed from automake 1.13 in 2012, so the
presence of this option prevents that or a later version of automake
being used.

A check-list of the effects of '--cygnus' from the automake 1.12
documentation, and steps taken (where possible) to preserve those
effects (See also this thread [1] for discussion on that):

[1] https://lists.gnu.org/archive/html/bug-automake/2012-03/msg00048.html

1. The foreign strictness is implied.

Already present in AM_INIT_AUTOMAKE in newlib/acinclude.m4

2. The options no-installinfo, no-dependencies and no-dist are implied.

Already present in AM_INIT_AUTOMAKE in newlib/acinclude.m4

Future work: Remove no-dependencies and any explicit header dependencies,
and use automatic dependency tracking instead.  Are there explicit rules
which are now redundant to removing no-installinfo and no-dist?

3. The macro AM_MAINTAINER_MODE is required.

Already present in newlib/acinclude.m4

Note that maintainer-mode is still disabled by default.

4. Info files are always created in the build directory, and not in the
source directory.

This appears to be an error in the automake documentation describing
'--cygnus' [2]. newlib's info files are generated in the source
directory, and no special steps are needed to keep doing that.

[2] https://lists.gnu.org/archive/html/bug-automake/2012-04/msg00028.html

5. texinfo.tex is not required if a Texinfo source file is specified.
(The assumption is that the file will be supplied, but in a place that
automake cannot find.)

This effect is overridden by an explicit setting of the TEXINFO_TEX
variable (the directory part of which is fed into texi2X via the
TEXINPUTS environment variable).

6. Certain tools will be searched for in the build tree as well as in the
user's PATH. These tools are runtest, expect, makeinfo and texi2dvi.

For obscure automake reasons, this effect of '--cygnus' is not active
for makeinfo in newlib's configury.

However, there appears to be top-level configury which selects in-tree
runtest, expect and makeinfo, if present. So, if that works as it
appears, this effect is preserved. If not, this may cause problems if
anyone is building those tools in-tree.

This effect is not preserved for texi2dvi. This may cause problems if
anyone is building texinfo in-tree.

If needed, explicit checks for those tools looking in places relative to
$(top_srcdir)/../ as well as in PATH could be added.

7. The check target doesn't depend on all.

This effect is not preserved. The check target now depends on the all
target.

This concern seems somewhat academic given the current state of the
testsuite.

Also note that this doesn't touch libgloss.
2021-12-29 22:45:04 +00:00
Jon Turney
8e166351b3
newlib: Regenerate autotools files 2021-12-29 22:45:03 +00:00
Jon Turney
639cb7ec1a
newlib: Regenerate all autotools files
Regenerate all aclocal.m4, configure and Makefile.in files.
2021-12-09 21:41:35 +00:00
Mike Frysinger
59e83de0b1 libgloss/newlib: update configure.ac in Makefile.in files
The maintainer rules refer to configure.in directly, so update that
after renaming all the configure.ac files.
2021-11-06 14:14:49 -04:00
Jeff Johnston
a9165ea07c Fix rounding issues with sqrt/sqrtf
- compiler is sometimes optimizing out the rounding check in
  e_sqrt.c and ef_sqrt.c which uses two constants to create
  an inexact operation
- there is a similar constant operation in s_tanh.c/sf_tanh.c
- make the one and tiny constants volatile to stop this
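
A minimal sketch of the kind of rounding check described above, assuming the default floating-point environment. The helper name rounding_probe and the value chosen for tiny are assumptions for illustration; this is not the code from e_sqrt.c. Without volatile, a compiler is free to evaluate one - tiny at compile time, and the run-time inexact operation disappears.

```c
/* Sketch only: rounding_probe() is a hypothetical helper, not newlib code. */

/* volatile forces the loads and the arithmetic to happen at run time,
   so the compiler cannot fold "one - tiny" into a constant. */
static volatile double one = 1.0;
static volatile double tiny = 1.0e-300;   /* assumed value */

/* Returns 0 for round-to-nearest, 1 for round-upward, and -1 for
   round-downward or round-toward-zero. */
int rounding_probe(void)
{
    double z = one - tiny;          /* inexact operation */
    if (z >= one) {
        z = one + tiny;             /* inexact again */
        return (z > one) ? 1 : 0;
    }
    return -1;
}
```

In the default round-to-nearest mode both sums round back to 1.0, so the probe reports 0.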
2021-06-04 14:42:58 -04:00
Corinna Vinschen
80bd01ef83 Add build mechanism to share common header files between machines
So far the build mechanism in newlib only allowed to either define
machine-specific headers, or headers shared between all machines.
In some cases, architectures are sufficiently alike to share header
files between them, but not with other architectures.  A good example
is ix86 vs. x86_64, which share certain traits with each other, but
not with other architectures.

Introduce a new configure variable called "shared_machine_dir".  This
dir can then be used for headers shared between architectures.

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2021-04-13 12:55:33 +02:00
Paul Zimmermann
4bb6581aa8 fixes to make compilation succeed 2020-12-18 10:06:31 +01:00
Jeff Johnston
b2f3d593ff Update gamma functions from code in picolibc
- fixes issue with inf sign when x is -0
2020-12-17 16:23:43 -05:00
Fabian Schriever
cf1ef2dc5b Fix error in powf for x close to 1 and large y
This patch fixes the error found by Paul Zimmermann (see
https://homepages.loria.fr/PZimmermann/papers/#accuracy) regarding x
close to 1 and rather large y (specifically he found the case
powf(0x1.ffffeep-1,-0x1.000002p+27) which returns +Inf instead of the
correct value). We found 2 more values for x which show the same faulty
behaviour, and all 3 are fixed with this patch. We have tested all
combinations for x in [+1.fffdfp-1, +1.00020p+0] and y in
[-1.000007p+27, -1.000002p+27] and [1.000002p+27,1.000007p+27].
2020-12-11 14:38:19 -05:00
Keith Packard
4641693796 libm: Make tgamma(-small) = -INFINITY
Need to copy the argument sign to the output for tgamma(finite)
overflow case.

Signed-off-by: Keith Packard <keithp@keithp.com>
2020-09-18 17:20:27 -04:00
Keith Packard via Newlib
1f8e5847df libm: Fix 'gamma' and 'gammaf' functions. Clean up other gamma code. [v2]
The current gamma, gamma_r, gammaf and gammaf_r functions return
|gamma(x)| instead of ln(|gamma(x)|) due to a change made back in 2002
to the __ieee754_gamma_r implementation. This patch fixes that, making
all of these functions map to their lgamma equivalents.

To fix the underlying bug, the __ieee754_gamma functions have been
changed to return gamma(x), removing the _r variants as those are no
longer necessary. Their names have been changed to __ieee754_tgamma to
avoid potential confusion from users.

Now that the __ieee754_tgamma functions return the correctly signed
value, the tgamma functions have been modified to use them.

libm.a now exposes the following gamma functions:

    ln(|gamma(x)|):

	__ieee754_lgamma_r
	__ieee754_lgammaf_r

	lgamma
	lgamma_r
	gamma
	gamma_r

	lgammaf
	lgammaf_r
	gammaf
	gammaf_r

	lgammal	(on machines where long double is double)

    gamma(x):

	__ieee754_tgamma
	__ieee754_tgammaf
	tgamma
	tgammaf
	tgammal (on machines where long double is double)

Additional aliases for any of the above functions can be added if
necessary; in particular, I'm not sure if we need to include
__ieee754_gamma*_r functions (which would return ln(|gamma(x)|)).

Signed-off-by: Keith Packard <keithp@keithp.com>

----

v2:
	Switch commit message to ASCII
2020-09-04 21:27:11 +02:00
Keith Packard via Newlib
73b02710ec libm/math: ensure that expf(-huge) sets FE_UNDERFLOW exception
It was calling __math_uflow(0) instead of __math_uflowf(0), which
resulted in no exception being set on machines with exception support
for float but not double.
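
The distinction can be sketched as follows. The helpers uflowf_sketch and uflow_sketch are hypothetical stand-ins for __math_uflowf and __math_uflow, and the constants are assumptions; the point is only that the first performs its underflowing multiplication in float and the second in double, so on a target with hardware exceptions for float only, just the float version would raise the flag.

```c
#include <errno.h>
#include <stdint.h>

/* Sketch only: hypothetical stand-ins, constants are assumptions. */

/* Underflow produced by a float multiplication. */
float uflowf_sketch(uint32_t sign)
{
    volatile float y = 0x1p-95f;    /* volatile blocks constant folding */
    y = (sign ? -y : y) * y;        /* 2^-190 is below the float range */
    errno = ERANGE;
    return y;
}

/* Underflow produced by a double multiplication. */
double uflow_sketch(uint32_t sign)
{
    volatile double y = 0x1p-767;
    y = (sign ? -y : y) * y;        /* 2^-1534 is below the double range */
    errno = ERANGE;
    return y;
}
```

Both helpers return a zero of the requested sign; they differ only in which floating-point type carries out the operation.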

Signed-off-by: Keith Packard <keithp@keithp.com>
2020-08-10 10:31:36 +02:00
Keith Packard via Newlib
e108d04432 libm/math: Don't modify __ieee754_pow return values in pow
The __ieee754 functions already return the right value in exception
cases, so don't modify those. Setting the library to _POSIX_/_IEEE_
mode now only affects whether errno is modified.

Signed-off-by: Keith Packard <keithp@keithp.com>
2020-08-05 22:16:31 +02:00
Keith Packard via Newlib
98a4f8de47 libm/math: Set errno to ERANGE for pow(0, -y)
POSIX says that the errno for pow(0, -y) should be ERANGE instead of
EDOM.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/pow.html

Signed-off-by: Keith Packard <keithp@keithp.com>
2020-08-05 22:16:31 +02:00
Keith Packard via Newlib
2eafcc78df libm/math: Make yx functions set errno=ERANGE for x=0
The y0, y1 and yn functions need separate conditions when x is zero as
that returns ERANGE instead of EDOM.

Also stop adjusting the return value from the __ieee754_y* functions
as that is already correct and we were just breaking it.
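
A minimal sketch of the separate conditions, assuming the wrapper only has to pick an errno value from the argument. The helper yx_errno_sketch is hypothetical and is not the code from the w_j*.c wrappers.

```c
#include <errno.h>

/* Hypothetical helper: the errno value a y0/y1/yn-style wrapper
   would choose for argument x (0 means "leave errno alone"). */
int yx_errno_sketch(double x)
{
    if (x < 0.0)
        return EDOM;    /* negative argument: domain error */
    if (x == 0.0)
        return ERANGE;  /* zero argument: pole error */
    return 0;           /* positive or NaN argument: no error from this check */
}
```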

Signed-off-by: Keith Packard <keithp@keithp.com>
2020-08-05 22:16:31 +02:00
Keith Packard via Newlib
905aa4c013 libm/math: set errno to ERANGE at gamma poles
For POSIX, gamma(i) (i non-positive integer) should set errno to
ERANGE instead of EDOM.

Signed-off-by: Keith Packard <keithp@keithp.com>
2020-08-05 22:16:31 +02:00
Keith Packard
12ad9a46df libm/math: Use __math_xflow in obsolete math code [v2]
C compilers may fold const values at compile time, so expressions
which try to elicit underflow/overflow by performing simple
arithmetic on suitable values will not generate the required
exceptions.

Work around this by replacing code which does these arithmetic
operations with calls to the existing __math_xflow functions that are
designed to do this correctly.

Signed-off-by: Keith Packard <keithp@keithp.com>

----

v2:
	libm/math: Pass sign to __math_xflow instead of multiplying result
2020-08-03 13:29:27 +02:00
Keith Packard via Newlib
6295d75913 newlib/libm/math: Make pow/powf return qnan for snan arg
The IEEE spec for pow only has special case for x**0 and 1**y when x/y
are quiet NaN. For signaling NaN, the general case applies and these functions
should signal the invalid exception and return a quiet NaN.

Signed-off-by: Keith Packard <keithp@keithp.com>
2020-03-26 12:21:33 +01:00
Joseph S. Myers
5e24839658 Fix spurious underflow exceptions for Bessel functions for double (from glibc bug 14155)
This fix comes from glibc, from files which originated from
the same place as the newlib files. Those files in glibc carry
the same license as the newlib files.

Bug 14155 is spurious underflow exceptions from Bessel functions for
large arguments.  (The correct results for large x are roughly
constant * sin or cos (x + constant) / sqrt (x), so no underflow
exceptions should occur based on the final result.)

There are various places underflows may occur in the intermediate
calculations that cause the failures listed in that bug.  This patch
fixes problems for the double version where underflows occur in
calculating the intermediate functions P and Q (in particular, x**-12
gets computed while calculating Q).  Appropriate approximations are
used for P and Q for arguments at least 0x1p28 and above to avoid the
underflows.

For sufficiently large x - 0x1p129 and above - the code already has a
cut-off to avoid calculating P and Q at all, which means the
approximations -0.125 / x and 0.375 / x can't themselves cause
underflows calculating Q.  This cut-off is heuristically reasonable
for the point beyond which Q can be neglected (based on expecting
around 0x1p-64 to be the least absolute value of sin or cos for large
arguments representable in double).

The float versions use a cut-off 0x1p17, which is less heuristically
justifiable but should still only affect values near zeroes of the
Bessel functions where these implementations are intrinsically
inaccurate anyway (bugs 14469-14472), and should serve to avoid
underflows (the float underflow for jn in bug 14155 probably comes
from the recurrence to compute jn).  ldbl-96 uses 0x1p129, which may
not really be enough heuristically (0x1p143 or so might be safer - 143
= 64 + 79, number of mantissa bits plus total number of significant
bits in representation) but again should avoid underflows and only
affect values where the code is substantially inaccurate anyway.
ldbl-128 and ldbl-128ibm share a completely different implementation
with no such cut-off, which I propose to fix separately.

Signed-off-by: Keith Packard <keithp@keithp.com>
2020-03-26 12:21:33 +01:00
Fabian Schriever
6b0c1e7cc8 Fix hypotf missing mask in hi+lo decomposition
Add the missing mask for the decomposition of hi+lo which caused some
errors of 1-2 ULP.
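
A hi+lo decomposition with a mask can be sketched as below. The helper split_hi_lo is hypothetical, and the 0xfffff000 mask is an assumption modelled on fdlibm-style float code rather than a quote from the patch. Clearing the low 12 bits leaves hi with a 12-bit significand, so hi * hi is exact in float; without the mask that product would be rounded.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: hypothetical helper, mask value is an assumption.
   Intended for normal, finite inputs. */
void split_hi_lo(float x, float *hi, float *lo)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* read the float's bit pattern */
    bits &= 0xfffff000u;              /* clear the low 12 significand bits */
    memcpy(hi, &bits, sizeof bits);
    *lo = x - *hi;                    /* exact: the remainder fits in a float */
}
```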

This change is taken over from FreeBSD:
95436ce20d

Additionally I've removed some variable assignments which were never
read before being overwritten again in the next 2 lines.
2020-03-19 16:46:17 +01:00
Fabian Schriever
9e8da7bd21 Fix for k_tan.c specific inputs
This fix for k_tan.c is a copy from fdlibm version 5.3 (see also
http://www.netlib.org/fdlibm/readme), adjusted to use the macros
available in newlib (SET_LOW_WORD).

This fix reduces the ULP error of the value shown in the fdlibm readme
(tan(1.7765241907548024E+269)) to 0.45 (thereby reducing the error by
1).

This issue only happens for large numbers that get reduced by the range
reduction to a value smaller in magnitude than 2^-28, that is also
reduced an uneven number of times. This seems rather unlikely given that
one ULP is (much) larger than 2^-28 for the values that may cause an
issue.  Still, given the sheer number of values a double can
represent, it is possible that there are more affected values;
finding them, however, will be quite hard, if not impossible.

We also took a look at how another library (libm in FreeBSD) handles the
issue: In FreeBSD the complete if branch which checks for values smaller
than 2^-28 (or rather 2^-27, another change done by FreeBSD) is moved
out of the kernel function and into the external function. This means
that the value that gets checked for this condition is the unreduced
value. Therefore the input value which caused a problem in the
fdlibm/newlib kernel tan will run through the full polynomial, including
the careful calculation of -1/(x+r). So the difference is really whether
r or y is used. r = y + p with p being the result of the polynomial with
1/3*x^3 being the largest (and magnitude defining) value. With x being
<2^-27 we therefore know that p is smaller than y (y has to be at least
the size of the value of x last mantissa bit divided by 2, which is at
least x*2^-51 for doubles) by enough to warrant saying that r ~ y.  So
we can conclude that the general implementation of this special case is
the same, FreeBSD simply has a different philosophy on when to handle
especially small numbers.
2020-03-18 10:05:11 +01:00
Fabian Schriever
a8a40ee575 Fix error in exp in magnitude [2e-32,2e-28]
While testing the exp function we noticed some errors at the specified
magnitude. Within this range the exp function returns the input value +1
as an output. We chose to run a test of 1m exponentially spaced values
in the ranges [-2^-27,-2^-32] and [2^-32,2^-27] which showed 7603 and
3912 results with an error of >=0.5 ULP (compared with MPFR in 128 bit)
with the highest being 0.56 ULP and 0.53 ULP.

It's easy to fix by changing the magnitude at which the input value +1
is returned from <2^-28 to <2^-32 and using the polynomial instead. This
reduces the number of results with an error of >=0.5 ULP to 485 and 479
in above tests, all of which are exactly 0.5 ULP.

As we were already checking on exp we also took a look at expf. For expf
the magnitude where the input value +1 is returned can be increased from
<2^-28 to <2^-23 without accuracy loss for a slight performance
improvement. To ensure this was the correct value we tested all values
in the ranges [-2^-17,-2^-28] and [2^-28,2^-17] (~92.3m values each).
2020-03-09 10:12:25 +01:00
Fabian Schriever
d4bcecb3e9 Fix error in float trig. function range reduction
The single-precision trigonometric functions show rather high errors in
specific ranges starting at about 30000 radians. For example the sinf
procedure produces an error of 7626.55 ULP with the input
5.195880078125e+04 (0x474AF6CD) (compared with MPFR in 128bit
precision). For the test we used 100k values evenly spaced in the range
of [30k, 70k]. The issues are periodic at higher ranges.

This error was introduced when the double precision range reduction was
first converted to float. The shift by 8 bits always returns 0 as iq is
never higher than 255.
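
The effect of that shift can be illustrated in isolation. This sketch assumes the float range reduction keeps its intermediate result in 8-bit chunks (values 0..255), as the message implies; the helper names are hypothetical, and the shift by 7 is an assumption made by analogy with a 24-bit chunk shifted by 23, not something the message states.

```c
#include <stdint.h>

/* Sketch only: hypothetical helpers, assuming 8-bit chunks. */

/* Shifting an 8-bit chunk right by 8 discards every bit. */
int32_t top_bit_shift8(int32_t chunk) { return chunk >> 8; }

/* Shifting by 7 keeps the chunk's most significant bit. */
int32_t top_bit_shift7(int32_t chunk) { return chunk >> 7; }

/* Counts the chunk values in 0..255 for which the 8-bit shift is non-zero. */
int count_nonzero_shift8(void)
{
    int count = 0;
    for (int32_t chunk = 0; chunk <= 255; chunk++)
        if (top_bit_shift8(chunk) != 0)
            count++;
    return count;
}
```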

The fix reduces the error of the example above to 0.45 ULP, highest
error within the test set fell to 1.31 ULP, which is not perfect, but
still a significant improvement. Testing other previously erroneous
ranges no longer shows particularly large accuracy errors.
2020-03-03 16:45:22 +01:00
Fabian Schriever
cef36220f2 Fix error in powf for (-1.0, NaN) input
Prevent confusion between -1.0 and 1.0 in powf. The corresponding
similar error was previously fixed for pow (see commit bb25dd1b)
2020-03-02 16:46:03 +01:00
Nicolas Brunie
bb25dd1b0f pow: fix pow(-1.0, NaN)
I think I may have encountered a bug in the implementation of pow:
pow(-1.0, NaN) returns 1.0 when it should return NaN.
Because ix is used to check input vs 1.0 rather than hx, -1.0 is
mistaken for 1.0
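
The confusion between ix and hx can be reproduced with a small sketch. The helper names are hypothetical and this is not the code from e_pow.c; it only assumes the usual convention that hx is the high 32-bit word of the double and ix is hx with the sign bit masked off.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: hypothetical helpers, not the code from e_pow.c. */

/* High 32 bits of a double: sign, exponent, top of the significand. */
static uint32_t high_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits >> 32);
}

/* Low 32 bits of a double: the rest of the significand. */
static uint32_t low_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)bits;
}

/* Buggy test: ix has the sign bit masked off, so -1.0 also matches. */
int is_one_using_ix(double x)
{
    uint32_t ix = high_word(x) & 0x7fffffffu;
    return ix == 0x3ff00000u && low_word(x) == 0;
}

/* Fixed test: hx keeps the sign bit, so only +1.0 matches. */
int is_one_using_hx(double x)
{
    uint32_t hx = high_word(x);
    return hx == 0x3ff00000u && low_word(x) == 0;
}
```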
2020-02-14 10:12:25 +01:00
Jozef Lawrynowicz
b644774b8f Use nanf() instead of nan() in single-precision float libm math functions
This patch reduces code size for a few single-precision float math
functions, by using nanf() instead of nan() where required.
2019-01-23 10:46:30 +01:00
Jozef Lawrynowicz
d451d9ec78 Use HUGE_VALF instead of HUGE_VAL in single-precision float libm math functions
This patch replaces instances of "(float).*HUGE_VAL" with a direct usage of
HUGE_VALF, which is also defined in math.h.
2019-01-23 10:46:30 +01:00
Jozef Lawrynowicz
7db203304e Remove HUGE_VAL definition from libm math functions
This patch removes the definitions of HUGE_VAL from some of the float math
functions. HUGE_VAL is defined in newlib/libc/include/math.h, so it is not
necessary to have a further definition in the math functions.
2019-01-23 10:46:30 +01:00
Jozef Lawrynowicz
b14a879d85 Remove matherr, and SVID and X/Open math library configurations
Default math library configuration is now IEEE
2019-01-23 10:46:24 +01:00
Jon Beniston
fcc1e7039f e_scalb.c: Call scalbln instead of scalbn on 16-bit targets to ensure constant fits in an int. 2018-09-03 09:41:23 +02:00
Keith Packard
088a45cdf6 Remove unused variable 'one' from sf_cos.c
Defined, never mentioned.

Signed-off-by: Keith Packard <keithp@keithp.com>
2018-08-29 15:57:27 +02:00
Corinna Vinschen
2d87d95f12 newlib: fix various gcc warnings
* unused variables
* potentially used uninitialized
* suggested bracketing
* misleading indentation

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2018-08-08 10:50:19 +02:00
Szabolcs Nagy
b99d49e506 New pow implementation
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code.  With default settings the worst case error in nearest
rounding mode is 0.54 ULP with inlined fma and fma contraction.  It uses
a 4 KB lookup table in addition to the table in exp_data.c, on aarch64
.text+.rodata size of libm.a is increased by 2295 bytes.

Improvements on Cortex-A72:
latency: 3.3x
thruput: 4.9x
2018-06-27 15:40:49 +02:00
Szabolcs Nagy
e5791079c6 New log implementation
The new implementation is provided under !__OBSOLETE_MATH, it uses
ISO C99 code.  With default settings the worst case error in nearest
rounding mode is 0.519 ULP with inlined fma and fma contraction.  It uses
a 2 KB lookup table, on aarch64 .text+.rodata size of libm.a is increased
by 1703 bytes.  The w_log.c wrapper is disabled since error handling is
inline in the new code.

New __HAVE_FAST_FMA and __HAVE_FAST_FMA_DEFAULT feature macros were
added to enable selecting between the code path that uses fma and the
one that does not.  Targets are supposed to set __HAVE_FAST_FMA_DEFAULT
if they have single instruction fma and the compiler can actually
inline it (gcc has __FP_FAST_FMA macro but that does not guarantee
inlining with -fno-builtin-fma).

Improvements on Cortex-A72:
latency: 1.9x
thruput: 2.3x
2018-06-27 15:40:49 +02:00
Szabolcs Nagy
fb929067db New exp and exp2 implementations
The new implementations are provided under !__OBSOLETE_MATH, they use
ISO C99 code.  There are several settings, with the default one the
worst case error in nearest rounding mode is 0.509 ULP for exp and
0.507 ULP for exp2 when a multiply and add is contracted into an fma.
They use a shared 2 KB lookup table, on aarch64 .text+.rodata size
of libm.a is increased by 1868 bytes.  The w_*.c wrappers are disabled
for the new code as it takes care of error handling inline.

The old exp2(x) code used to be just pow(2,x) so the speedup there
is more significant.

The file name has no special prefix to avoid any name collision with
existing files.

Improvements on Cortex-A72:
exp latency: 3.2x
exp thruput: 4.1x
exp2 latency: 7.8x
exp2 thruput: 18.8x
2018-06-27 15:40:49 +02:00
Wilco Dijkstra
3baadb9912 Improve performance of sinf/cosf/sincosf
Here is the correct patch with both filenames and int cast fixed:

This patch is a complete rewrite of sinf, cosf and sincosf.  The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs.  In non-nearest rounding modes the error is 1 ULP.

The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction.  The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.
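
The three-way split with approximate integer comparisons might look like the sketch below. classify_input is a hypothetical helper; the 120.0 bound comes from the description, while using pi/4 as the first bound and comparing only the top 12 bits of the representation are assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: classify_input() is hypothetical; thresholds partly assumed. */

/* Bit pattern of a float. */
static uint32_t asuint(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Top 12 bits of the representation with the sign bit masked off. */
static uint32_t abstop12(float x)
{
    return (asuint(x) >> 20) & 0x7ff;
}

/* 0: no reduction needed, 1: simple reduction, 2: complex reduction. */
int classify_input(float x)
{
    if (abstop12(x) < abstop12(0x1.921fb6p-1f))   /* roughly |x| < pi/4 */
        return 0;
    if (abstop12(x) < abstop12(120.0f))           /* roughly |x| < 120 */
        return 1;
    return 2;
}
```

Because only the top bits are compared, the boundaries are approximate: an input such as 0.78f is below pi/4 yet lands in the second case, which is harmless as long as the second case handles it correctly.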

The small range reducer uses a single reduction step to handle values up to
120.0.  It is fastest on targets which support inlined round instructions.

The large range reducer uses integer arithmetic for simplicity.  It does a
32x96 bit multiply to compute a 64-bit modulo result.  This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4.  It could be further optimized, however it is
already much faster than necessary.
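
A 32x96 bit multiply that keeps a 64-bit window of the product can be sketched with plain 32-bit and 64-bit arithmetic. The helper mul_32x96_mid64 is hypothetical and takes the 96-bit operand as three 32-bit words; the real reducer reads such words from a table that is not reproduced here. The result is bits 32..95 of the full product, with anything above bit 95 wrapping away, which is what gives the modulo behaviour.

```c
#include <stdint.h>

/* Sketch only: hypothetical helper. The 96-bit operand is
   hi * 2^64 + mid * 2^32 + lo. Returns bits 32..95 of x * operand. */
uint64_t mul_32x96_mid64(uint32_t x, uint32_t hi, uint32_t mid, uint32_t lo)
{
    uint64_t part_hi = (uint32_t)(x * hi);   /* only the low 32 bits survive */
    uint64_t part_mid = (uint64_t)x * mid;   /* full 64-bit product */
    uint64_t part_lo = (uint64_t)x * lo;     /* full 64-bit product */

    /* Unsigned addition wraps modulo 2^64, discarding higher bits. */
    return (part_hi << 32) + part_mid + (part_lo >> 32);
}
```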

Simple benchmark showing speedup factor on AArch64 for various ranges:

range	0.7853982	sinf	1.7	cosf	2.2	sincosf	2.8
range	1.570796	sinf	1.9	cosf	1.9	sincosf	2.7
range	3.141593	sinf	2.0	cosf	2.0	sincosf	3.5
range	6.283185	sinf	2.3	cosf	2.3	sincosf	4.2
range	125.6637	sinf	2.9	cosf	3.0	sincosf	5.1
range	1.1259e15	sinf	26.8	cosf	26.8	sincosf	45.2

ChangeLog:
2018-05-18  Wilco Dijkstra  <wdijkstr@arm.com>

        * newlib/libm/common/Makefile.in: Regenerated.
        * newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
        sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
        * newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
        roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
        eval_as_double, likely, unlikely.
        * newlib/libm/common/cosf.c: New file.
        * newlib/libm/common/sinf.c: Likewise.
        * newlib/libm/common/sincosf.h: Likewise.
        * newlib/libm/common/sincosf.c: Likewise.
        * newlib/libm/common/sincosf_data.c: Likewise.
        * newlib/libm/math/sf_cos.c: Add #if to build conditionally.
        * newlib/libm/math/sf_sin.c: Likewise.
        * newlib/libm/math/wf_sincos.c: Likewise.

--
2018-06-21 09:37:04 +02:00
Corinna Vinschen
cfe8c6c504 Revert "Improve performance of sinf/cosf/sincosf"
This reverts commit fca80a9d1b3fa6620cdaccec6b726eef1a6530a1.

Accidentally pushed a preliminary version
2018-06-21 09:36:39 +02:00
Wilco Dijkstra
fca80a9d1b Improve performance of sinf/cosf/sincosf
This patch is a complete rewrite of sinf, cosf and sincosf.  The new version
is significantly faster, as well as simple and accurate.
The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all
4 billion inputs.  In non-nearest rounding modes the error is 1 ULP.

The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction.  The code uses approximate integer
comparisons to quickly decide between these cases - on some targets this may
be slow, so this can be configured to use floating point comparisons.

The small range reducer uses a single reduction step to handle values up to
120.0.  It is fastest on targets which support inlined round instructions.

The large range reducer uses integer arithmetic for simplicity.  It does a
32x96 bit multiply to compute a 64-bit modulo result.  This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4.  It could be further optimized, however it is
already much faster than necessary.

Simple benchmark showing speedup factor on AArch64 for various ranges:

range	0.7853982	sinf	1.7	cosf	2.2	sincosf	2.8
range	1.570796	sinf	1.9	cosf	1.9	sincosf	2.7
range	3.141593	sinf	2.0	cosf	2.0	sincosf	3.5
range	6.283185	sinf	2.3	cosf	2.3	sincosf	4.2
range	125.6637	sinf	2.9	cosf	3.0	sincosf	5.1
range	1.1259e15	sinf	26.8	cosf	26.8	sincosf	45.2

ChangeLog:
2018-06-18  Wilco Dijkstra  <wdijkstr@arm.com>

        * newlib/libm/common/Makefile.in: Regenerated.
        * newlib/libm/common/Makefile.am: Add sinf.c, cosf.c, sincosf.c
        sincosf.h, sincosf_data.c. Add -fbuiltin -fno-math-errno to CFLAGS.
        * newlib/libm/common/math_config.h: Add HAVE_FAST_ROUND, HAVE_FAST_LROUND,
        roundtoint, converttoint, force_eval_float, force_eval_double, eval_as_float,
        eval_as_double, likely, unlikely.
        * newlib/libm/common/cosf.c: New file.
        * newlib/libm/common/sinf.c: Likewise.
        * newlib/libm/common/sincosf.h: Likewise.
        * newlib/libm/common/sincosf.c: Likewise.
        * newlib/libm/common/sincosf_data.c: Likewise.
        * newlib/libm/math/sf_cos.c: Add #if to build conditionally.
        * newlib/libm/math/sf_sin.c: Likewise.
        * newlib/libm/math/wf_sincos.c: Likewise.

--
2018-06-19 09:44:28 +02:00
Yaakov Selkowitz
7192f84096 ansification: remove _HAVE_STDC
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:30 -06:00
Jim Wilson
c874f1145f newlib: Don't do double divide in powf.
* Use 0.0f instead of 0.0 in divide.
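
Why the suffix matters can be shown with sizeof, which reports the type of an expression without evaluating it. The helper names below are hypothetical. Dividing a float by the double constant 0.0 promotes the whole expression to double, whereas 0.0f keeps it in float.

```c
#include <stddef.h>

/* Sketch only: hypothetical helpers. sizeof does not evaluate its
   operand, so no division is actually performed. */

/* float / double constant: the expression has type double. */
size_t quotient_size_double_zero(float x)
{
    return sizeof(x / 0.0);
}

/* float / float constant: the expression stays in float. */
size_t quotient_size_float_zero(float x)
{
    return sizeof(x / 0.0f);
}
```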
2017-12-13 11:33:19 +01:00
Jim Wilson
c338bc2255 Don't call double rint from float powf.
Updated patch to use 0.0f in addition to calling rintf.

Tested same way as before, with a testcase that triggers the code and
make check.

OK?

	newlib/
	* libm/math/wf_pow.c (powf): Call rintf instead of rint.  Use 0.0f
	for compare.
2017-12-13 11:03:10 +01:00
Jon Turney
c006fd459f makedoc: make errors visible
Discard QUICKREF sections, rather than writing them to stderr
Discard MATHREF sections, rather than discarding as an error
Pass NOTES sections through to texinfo, rather than discarding as an error
Don't redirect makedoc stderr to .ref file
Remove makedoc output on error
Remove .ref files from CLEANFILES
Regenerate Makefile.ins

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
2017-12-07 11:54:11 +00:00
Yaakov Selkowitz
ec4c079f4b math: remove TRAD_SYNOPSIS
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-12-01 03:41:53 -06:00
Szabolcs Nagy
c156098271 New expf, exp2f, logf, log2f and powf implementations
Based on code from https://github.com/ARM-software/optimized-routines/

This patch adds a highly optimized generic implementation of expf,
exp2f, logf, log2f and powf.  The new functions are not only
faster (6x for powf!), but are also smaller and more accurate.
In order to achieve this, the algorithm uses double precision
arithmetic for accuracy, avoids divisions and uses small table
lookups to minimize the polynomials.  Special cases are handled
inline to avoid the unnecessary overhead of wrapper functions and
set errno to POSIX requirements.

The new functions are added under newlib/libm/common, but the old
implementations are kept (in newlib/libm/math) for non-IEEE or
pre-C99 systems.  Targets can enable the new math code by defining
__OBSOLETE_MATH_DEFAULT to 0 in newlib/libc/include/machine/ieeefp.h,
users can override the default by defining __OBSOLETE_MATH.
Currently the new code is enabled for AArch64 and AArch32 with VFP.
Targets with a single precision FPU may still prefer the old
implementation.
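
The selection between the two code paths can be sketched with the preprocessor. The macro names come from the description above; the fallback value of 1 and the helper uses_new_math_code are assumptions for illustration, not a copy of the newlib headers.

```c
/* Sketch only: fallback value and helper name are assumptions. */

/* A target header may define this as 0 to enable the new code. */
#ifndef __OBSOLETE_MATH_DEFAULT
#define __OBSOLETE_MATH_DEFAULT 1
#endif

/* A user may define __OBSOLETE_MATH to override the target default. */
#ifndef __OBSOLETE_MATH
#define __OBSOLETE_MATH __OBSOLETE_MATH_DEFAULT
#endif

/* Hypothetical helper reporting which code path was selected. */
int uses_new_math_code(void)
{
#if __OBSOLETE_MATH
    return 0;   /* old implementation from libm/math */
#else
    return 1;   /* new implementation from libm/common */
#endif
}
```

Building with -D__OBSOLETE_MATH=0 would flip the helper's result regardless of the target default.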

libm.a size changes:
arm: -1692
arm/thumb/v7-a/nofp: -878
arm/thumb/v7-a+fp/hard: -864
arm/thumb/v7-a+fp/softfp: -908
aarch64: -1476
2017-10-13 10:58:00 +02:00