A comment in the implementation of expf() explains how the approximation in the range [0, 0.34] is done. The comment names the "Reme" algorithm for constructing the polynomial; this is a typo for the "Remez" algorithm. The Remez (or minimax) algorithm is also used to calculate the coefficients of polynomials in other implementations of exp() and log().
See more:
https://en.wikipedia.org/wiki/Remez_algorithm
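For illustration, this is how such a polynomial is typically evaluated once
the Remez coefficients are known. This is only a sketch: the coefficients
below are plain Taylor coefficients standing in for real minimax output,
not the actual constants used by newlib's expf():

  #include <stdio.h>

  /* Horner evaluation of a degree-4 polynomial approximating exp(x)
     on a small interval such as [0, 0.34]. */
  static double poly_exp(double x)
  {
      const double c2 = 1.0 / 2.0, c3 = 1.0 / 6.0, c4 = 1.0 / 24.0;
      return 1.0 + x * (1.0 + x * (c2 + x * (c3 + x * c4)));
  }

  int main(void)
  {
      /* prints ~1.2840169; true exp(0.25) = 1.2840254... */
      printf("%.9f\n", poly_exp(0.25));
      return 0;
  }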
By default, Newlib uses a huge object of type struct _reent to store
thread-specific data. This object is returned by __getreent() if the
__DYNAMIC_REENT__ Newlib configuration option is defined.
The reentrancy structure contains, for example, errno and the standard
input, output, and error file streams. This means that an application
which uses only errno still carries a dependency on the file stream
support. This is an issue for lower-end targets and for applications
which need to
qualify the software according to safety standards (for example ECSS-E-ST-40C,
ECSS-Q-ST-80C, IEC 61508, ISO 26262, DO-178, DO-330, DO-333).
If the new _REENT_THREAD_LOCAL configuration option is enabled, then struct
_reent is replaced by dedicated thread-local objects for each struct _reent
member. The thread-local objects are defined in translation units which use
the corresponding object.
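A minimal sketch of the difference, assuming C11 _Thread_local; the names
here are hypothetical stand-ins, not the actual newlib declarations:

  /* The old model: one big per-thread object, so touching any member
     links in the whole structure. */
  struct big_reent {                        /* stand-in for struct _reent */
      int err;                              /* errno */
      void *stdin_f, *stdout_f, *stderr_f;  /* drags in stream support */
      /* ... many more members ... */
  };
  extern struct big_reent *get_reent(void); /* cf. __getreent() */
  #define old_errno (get_reent()->err)

  /* The _REENT_THREAD_LOCAL model: one thread-local object per member,
     each defined only in the translation units that use it. */
  _Thread_local int tls_errno;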
Convert all the libm/ subdir Makefiles into the top-level Makefile. This
allows us to build all of libm from the top Makefile without using any
recursive make calls. This is faster and avoids the funky lib.a logic
where we unpack subdir archives to repack into a single libm.a. The
machine override logic is maintained though by way of Makefile include
ordering, and source file accumulation in libm_a_SOURCES.
One thing to note is that this will require GNU Make because of:
libm_a_CFLAGS = ... $(libm_a_CFLAGS_$(subst /,_,$(@D)))
This was the only way I could find to support per-dir compiler
settings, and I couldn't find a POSIX-compatible way of transforming
the variable content. I don't think this is a big deal as other
Makefiles in the tree are using GNU Make-specific syntax, but I call
this out as it's the only one so far in the new automake code that
I've been writing.
Automake doesn't provide precise control over the output object names
(by design). This is fine by default as we get consistent names in all
the subdirs: libm_a-<source>.o. But this relies on using the same set
of compiler flags for all objects. We currently compile libm/common/
with different optimizations than the rest.
If we want to compile objects differently, we can create an intermediate
archive with the subset of objects with unique flags, and then add those
objects to the main archive. But Automake will use a different prefix
for the objects, and thus we can't rely on ordering to override.
But if we leverage $@, we can turn Automake's CFLAGS into a multiplex
on a per-dir (and even per-file if we wanted) basis. Unfortunately,
since $@ contains /, Automake complains it's an invalid name. While
GNU Make supports this, it's a POSIX extension, so Automake flags it.
Using $(subst) to transform the name avoids the Automake warning and
yields a POSIX-compliant name, albeit by way of a GNU Make extension.
This is used in a bunch of places, but nowhere is it ever set, and
nowhere can I find any documentation, nor can I find any other project
using it. So delete the flags to simplify.
The original cut-off for small arguments at |x|<2**-70 (copied from the
double version) means that when computing nadj, t*x becomes a subnormal
number; the division pi/subnormal then overflows to INF, and so does its
logarithm, which is a wrong result for lgammaf in this range.
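A distilled reproduction of the failure mode (not the newlib source;
sinf stands in for the internal sin(pi*x) helper):

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      float x = 0x1p-65f;            /* above the old 2**-70 cut */
      float pi = 3.14159265f;
      float t = sinf(pi * x);        /* ~ pi * 2**-65 */
      float prod = t * x;            /* ~ 2**-128: subnormal in float */
      float q = pi / fabsf(prod);    /* overflows to +Inf */
      printf("%g %g\n", q, logf(q)); /* inf inf -> wrong nadj */
      return 0;
  }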
The proposed new limit seems to be safe and has been tested to
produce accurate results.
(Courtesy of Andreas Jung, ESA)
Correct the overflow limit in the variable o_threshold to be consistent
with the FLT_UWORD_LOG_MAX variable used by the internal implementation
of the expf algorithm itself.
The u_threshold variable has also been modified to be written in the
same format.
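For context, a sketch of the wrapper-style check involved (constants and
names are illustrative, not wf_exp.c verbatim):

  #include <errno.h>
  #include <math.h>

  /* The fix makes this constant agree with the limit ef_exp.c derives
     from FLT_UWORD_LOG_MAX; the values below are only illustrative. */
  static const float o_threshold =  88.72283f;  /* ~ ln(FLT_MAX)          */
  static const float u_threshold = -103.97208f; /* below ln(min denormal) */

  float expf_with_errno(float x)
  {
      float z = expf(x);          /* stands in for __ieee754_expf() */
      if (x > o_threshold || x < u_threshold)
          errno = ERANGE;         /* overflow or underflow */
      return z;
  }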
Note that this fix improves the situation but does not completely
correct the inconsistencies regarding the overflow and underflow limits
between the expf wrapper (wf_exp.c) and the expf algorithm itself
(ef_exp.c).
Currently these limits are different for the
_FLT_LARGEST_EXPONENT_IS_NORMAL and _FLT_NO_DENORMALS cases as well as
for the case where __OBSOLETE_MATH is not defined (only for the
underflow limit in this case).
This kills off the last configure script under libm/ and folds it
into the top newlib configure script. The vast majority of logic
was already in the top configure script, so move the little that
is left into a libm/acinclude.m4 file.
This was only ever used for i?86-pc-linux-gnu targets, but that's been
broken for years, and has since been dropped. So clean this up too.
This also deletes the funky objectlist logic since it only existed for
the libtool libraries. Since it was the only thing left in the small
Makefile.shared file, we can punt that too.
Now that we use AC_NO_EXECUTABLES, and we require a recent version of
autoconf, we don't need to define our own copies of these macros. So
switch to the standard AC_PROG_CC.
This allows building the libc & libm pages in parallel, and drops
the duplication in the subdirs with the chew/chapter settings.
The unused rules in Makefile.shared are left in place to minimize
noise in the change.
When using the top-level configure script but subdir Makefiles, the
newlib_basedir value gets a bit out of sync: it's relative to where
configure lives, not where the Makefile lives. Move the abs setting
from the top-level configure script into acinclude.m4 so we can rely
on it being available everywhere. This commit doesn't use it anywhere
yet; it just lays the groundwork.
The machine configure scripts are all effectively stub scripts that
pass the higher-level options to their own makefiles. The only one doing
any custom tests was nds32. The rest were all effectively the same as
the libm/ configure script.
So instead of recursively running configure in all of these subdirs,
generate their makefiles from the top-level configure. For nds32,
deploy a pattern of including subdir logic via m4:
m4_include([machine/nds32/acinclude.m4])
Even its set of checks is very small -- it does 2 preprocessor tests
and sets up 2 makefile conditionals.
Some of the generated machine makefiles have a bunch of extra stuff
added to them, but that's because they were inconsistent in their
configure libtool calls. The top-level has it, so it exports some
new vars to the ones that weren't already getting them.
The machine/{configure,Makefile} files exist only to fan out to the
specific machine/$arch/ subdir. We already have all that same info
in the libm/ dir itself, so by moving the recursive configure and
make calls into it, we can cut off this logic entirely and save the
overhead.
For arches that don't have a machine subdir, it means they can skip
the logic entirely.
This matches what the other GNU toolchain projects have done already.
The generated diff in practice isn't terribly large. This will allow
more use of subdir local.mk includes due to fixes & improvements that
came after the 1.11 release series.
The newlib & libgloss dirs are already generated using autoconf-2.69.
To avoid merging new code and/or accidental regeneration using diff
versions, leverage config/override.m4 to pin to 2.69 exactly. This
matches what gcc/binutils/gdb are already doing.
The README file already says to use autoconf-2.69.
To accomplish this, it's just as simple as adding -I flags to the
top-level config/ dir when running aclocal. This is because the
override.m4 file overrides AC_INIT to first require the specific
autoconf version before calling the real AC_INIT.
Since automake deprecated the INCLUDES name in favor of AM_CPPFLAGS,
change all existing users over. The generated code is the same since
the two variables have been used in the same exact places by design.
There are other cleanups to be done, but let's focus on just renaming
here so we can upgrade to a newer automake version w/out triggering
new warnings.
The 'cygnus' option was removed from automake 1.13 in 2012, so the
presence of this option prevents that or any later version of automake
from being used.
A check-list of the effects of '--cygnus' from the automake 1.12
documentation, and steps taken (where possible) to preserve those
effects (See also this thread [1] for discussion on that):
[1] https://lists.gnu.org/archive/html/bug-automake/2012-03/msg00048.html
1. The foreign strictness is implied.
Already present in AM_INIT_AUTOMAKE in newlib/acinclude.m4
2. The options no-installinfo, no-dependencies and no-dist are implied.
Already present in AM_INIT_AUTOMAKE in newlib/acinclude.m4
Future work: Remove no-dependencies and any explicit header dependencies,
and use automatic dependency tracking instead. Are there explicit rules
which are now redundant due to removing no-installinfo and no-dist?
3. The macro AM_MAINTAINER_MODE is required.
Already present in newlib/acinclude.m4
Note that maintainer-mode is still disabled by default.
4. Info files are always created in the build directory, and not in the
source directory.
This appears to be an error in the automake documentation describing
'--cygnus' [2]. newlib's info files are generated in the source
directory, and no special steps are needed to keep doing that.
[2] https://lists.gnu.org/archive/html/bug-automake/2012-04/msg00028.html
5. texinfo.tex is not required if a Texinfo source file is specified.
(The assumption is that the file will be supplied, but in a place that
automake cannot find.)
This effect is overridden by an explicit setting of the TEXINFO_TEX
variable (the directory part of which is fed into texi2X via the
TEXINPUTS environment variable).
6. Certain tools will be searched for in the build tree as well as in the
user's PATH. These tools are runtest, expect, makeinfo and texi2dvi.
For obscure automake reasons, this effect of '--cygnus' is not active
for makeinfo in newlib's configury.
However, there appears to be top-level configury which selects in-tree
runtest, expect and makeinfo, if present. So, if that works as it
appears, this effect is preserved. If not, this may cause problems if
anyone is building those tools in-tree.
This effect is not preserved for texi2dvi. This may cause problems if
anyone is building texinfo in-tree.
If needed, explicit checks for those tools looking in places relative to
$(top_srcdir)/../ as well as in PATH could be added.
7. The check target doesn't depend on all.
This effect is not preserved. The check target now depends on the all
target.
This concern seems somewhat academic given the current state of the
testsuite.
Also note that this doesn't touch libgloss.
- the compiler sometimes optimizes out the rounding check in
  e_sqrt.c and ef_sqrt.c, which uses two constants to create
  an inexact operation
- there is a similar constant operation in s_tanh.c/sf_tanh.c
- make the one and tiny constants volatile to stop this (sketched below)
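A condensed sketch of the pattern (fdlibm style, not the exact sources):

  /* Without volatile, a compiler may fold one - tiny to a constant and
     drop the run-time subtraction, losing the inexact exception and the
     rounding probe that e_sqrt.c relies on. */
  static volatile double one = 1.0, tiny = 1.0e-300;

  double probe_rounding(double z)
  {
      double t = one - tiny;   /* inexact: must happen at run time */
      if (t >= one)            /* true unless rounding toward zero/down */
          z += tiny;           /* nudge the last bit accordingly */
      return z;
  }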
So far the build mechanism in newlib only allowed defining either
machine-specific headers or headers shared between all machines.
In some cases, architectures are sufficiently alike to share header
files between them, but not with other architectures. A good example
is ix86 vs. x86_64, which share certain traits with each other, but
not with other architectures.
Introduce a new configure variable called "shared_machine_dir". This
dir can then be used for headers shared between architectures.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
This patch fixes the error found by Paul Zimmermann (see
https://homepages.loria.fr/PZimmermann/papers/#accuracy) regarding x
close to 1 and rather large y (specifically he found the case
powf(0x1.ffffeep-1,-0x1.000002p+27) which returns +Inf instead of the
correct value). We found 2 more values for x which show the same faulty
behaviour, and all 3 are fixed with this patch. We have tested all
combinations for x in [+1.fffdfp-1, +1.00020p+0] and y in
[-1.000007p+27, -1.000002p+27] and [1.000002p+27,1.000007p+27].
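A spot check for the reported case; the expected value follows from
ln(x) ~ -2**-24 and y ~ -2**27, so x**y ~ e**8, a perfectly finite number:

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      float x = 0x1.ffffeep-1f;     /* just below 1.0 */
      float y = -0x1.000002p+27f;   /* large negative exponent */
      /* Returned +Inf before the fix; should be ~2981 (about e**8). */
      printf("%g\n", (double)powf(x, y));
      return 0;
  }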
The current gamma, gamma_r, gammaf and gammaf_r functions return
|gamma(x)| instead of ln(|gamma(x)|) due to a change made back in 2002
to the __ieee754_gamma_r implementation. This patch fixes that, making
all of these functions map to their lgamma equivalents.
To fix the underlying bug, the __ieee754_gamma functions have been
changed to return gamma(x), removing the _r variants as those are no
longer necessary. Their names have been changed to __ieee754_tgamma to
avoid potential confusion from users.
Now that the __ieee754_tgamma functions return the correctly signed
value, the tgamma functions have been modified to use them.
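For reference, the mathematical relationship between these entry points
(a sketch, not the patched sources; lgamma_r appears in the list below
and may require a feature-test macro on some systems):

  #include <math.h>

  /* gamma(x) can be recovered from ln(|gamma(x)|) plus the sign that
     lgamma_r reports through its out-parameter. */
  double tgamma_via_lgamma(double x)
  {
      int sign;
      double lg = lgamma_r(x, &sign); /* ln(|gamma(x)|), sign of gamma(x) */
      return sign * exp(lg);          /* gamma(x), up to rounding */
  }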
libm.a now exposes the following gamma functions:
ln(|gamma(x)|):
__ieee754_lgamma_r
__ieee754_lgammaf_r
lgamma
lgamma_r
gamma
gamma_r
lgammaf
lgammaf_r
gammaf
gammaf_r
lgammal (on machines where long double is double)
gamma(x):
__ieee754_tgamma
__ieee754_tgammaf
tgamma
tgammaf
tgammal (on machines where long double is double)
Additional aliases for any of the above functions can be added if
necessary; in particular, I'm not sure if we need to include
__ieee754_gamma*_r functions (which would return ln(|gamma(x)|)).
Signed-off-by: Keith Packard <keithp@keithp.com>
----
v2:
Switch commit message to ASCII
It was calling __math_uflow(0) instead of __math_uflowf(0), which
resulted in no exception being set on machines with exception support
for float but not double.
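An illustrative re-implementation of the idea (the real helpers live in
newlib's math_config.h; the bodies here are sketches):

  #include <stdint.h>

  static double force_uflow(uint32_t sign)    /* cf. __math_uflow */
  {
      volatile double tiny = 0x1p-767;
      double y = tiny * tiny;                 /* underflows in double */
      return sign ? -y : y;
  }

  static float force_uflowf(uint32_t sign)    /* cf. __math_uflowf */
  {
      volatile float tiny = 0x1p-95f;
      float y = tiny * tiny;                  /* underflows in float */
      return sign ? -y : y;
  }

On a target with exception support for float only, just the float version
raises the underflow flag, which is why the f-suffixed call matters.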
Signed-off-by: Keith Packard <keithp@keithp.com>
The __ieee754 functions already return the right value in exception
cases, so don't modify those. Setting the library to _POSIX_/_IEEE_
mode now only affects whether errno is modified.
Signed-off-by: Keith Packard <keithp@keithp.com>
The y0, y1 and yn functions need separate conditions when x is zero, as
that case yields ERANGE instead of EDOM.
Also stop adjusting the return value from the __ieee754_y* functions
as that is already correct and we were just breaking it.
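A sketch of the distinction (behavior per POSIX; not the wrapper source):

  #include <errno.h>
  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      errno = 0;
      printf("y0(0)  = %g errno=%d\n", y0(0.0), errno);  /* -Inf, ERANGE */
      errno = 0;
      printf("y0(-1) = %g errno=%d\n", y0(-1.0), errno); /* NaN, EDOM */
      return 0;
  }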
Signed-off-by: Keith Packard <keithp@keithp.com>
C compilers may fold const values at compile time, so expressions
which try to elicit underflow/overflow by performing simple
arithmetic on suitable values will not generate the required
exceptions.
Work around this by replacing code which does these arithmetic
operations with calls to the existing __math_xflow functions that are
designed to do this correctly.
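A sketch of the hazard and the fix; __math_oflow is an internal newlib
helper declared in math_config.h, and the first function shows the
pattern being replaced:

  #include <stdint.h>

  double overflow_folded(void)
  {
      const double huge = 0x1p900;
      return huge * huge;      /* may be folded to +Inf at compile time,
                                  raising no run-time overflow exception */
  }

  extern double __math_oflow(uint32_t sign);  /* forces the exception at
                                                 run time behind an
                                                 optimization barrier */
  double overflow_fixed(uint32_t sign)
  {
      return __math_oflow(sign);
  }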
Signed-off-by: Keith Packard <keithp@keithp.com>
----
v2:
libm/math: Pass sign to __math_xflow instead of multiplying result
The IEEE spec for pow only has special cases for x**0 and 1**y when x/y
are quiet NaNs. For signaling NaNs the general case applies, and these functions
should signal the invalid exception and return a quiet NaN.
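A sketch of the intended logic; the is_snan test is hand-rolled for IEEE
binary64 and is not newlib code:

  #include <math.h>
  #include <stdint.h>
  #include <string.h>

  static int is_snan(double x)    /* signaling NaN: quiet bit clear */
  {
      uint64_t u;
      memcpy(&u, &x, sizeof u);
      return ((u >> 52) & 0x7ff) == 0x7ff &&    /* NaN/Inf exponent */
             !((u >> 51) & 1) &&                /* quiet bit clear  */
             (u & 0xfffffffffffffULL) != 0;     /* nonzero mantissa */
  }

  double pow_special_cases(double x, double y)
  {
      if (y == 0.0 && !is_snan(x))
          return 1.0;    /* x**0 == 1 even for quiet-NaN x */
      if (x == 1.0 && !is_snan(y))
          return 1.0;    /* 1**y == 1 even for quiet-NaN y */
      /* sNaN inputs fall through to the general path, which raises
         the invalid exception and returns a quiet NaN. */
      return pow(x, y);  /* placeholder for the general algorithm */
  }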
Signed-off-by: Keith Packard <keithp@keithp.com>
This fix comes from glibc, from files which originated from
the same place as the newlib files. Those files in glibc carry
the same license as the newlib files.
Bug 14155 is spurious underflow exceptions from Bessel functions for
large arguments. (The correct results for large x are roughly
constant * sin or cos (x + constant) / sqrt (x), so no underflow
exceptions should occur based on the final result.)
There are various places underflows may occur in the intermediate
calculations that cause the failures listed in that bug. This patch
fixes problems for the double version where underflows occur in
calculating the intermediate functions P and Q (in particular, x**-12
gets computed while calculating Q). Appropriate approximations are
used for P and Q for arguments at least 0x1p28 and above to avoid the
underflows.
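For reference, the standard large-argument form being approximated here
(textbook Bessel asymptotics, not the patched source):

  #include <math.h>

  /* j0(x) ~ sqrt(2/(pi*x)) * (P(x)*cos(x - pi/4) - Q(x)*sin(x - pi/4))
     with P(x) -> 1 and Q(x) -> -0.125/x as x -> infinity.  (For the
     order-1 functions, Q(x) -> 0.375/x.) */
  static const double pi = 3.141592653589793;

  double j0_large(double x)       /* only sensible for very large x */
  {
      double p = 1.0;
      double q = -0.125 / x;
      return sqrt(2.0 / (pi * x)) *
             (p * cos(x - pi / 4.0) - q * sin(x - pi / 4.0));
  }

At such magnitudes the accuracy of cos/sin argument reduction dominates,
which is part of why the real code caps where P and Q are evaluated at all.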
For sufficiently large x - 0x1p129 and above - the code already has a
cut-off to avoid calculating P and Q at all, which means the
approximations -0.125 / x and 0.375 / x can't themselves cause
underflows calculating Q. This cut-off is heuristically reasonable
for the point beyond which Q can be neglected (based on expecting
around 0x1p-64 to be the least absolute value of sin or cos for large
arguments representable in double).
The float versions use a cut-off 0x1p17, which is less heuristically
justifiable but should still only affect values near zeroes of the
Bessel functions where these implementations are intrinsically
inaccurate anyway (bugs 14469-14472), and should serve to avoid
underflows (the float underflow for jn in bug 14155 probably comes
from the recurrence to compute jn). ldbl-96 uses 0x1p129, which may
not really be enough heuristically (0x1p143 or so might be safer - 143
= 64 + 79, number of mantissa bits plus total number of significant
bits in representation) but again should avoid underflows and only
affect values where the code is substantially inaccurate anyway.
ldbl-128 and ldbl-128ibm share a completely different implementation
with no such cut-off, which I propose to fix separately.
Signed-off-by: Keith Packard <keithp@keithp.com>
Add the missing mask for the decomposition of hi+lo, the lack of which
caused some errors of 1-2 ULP.
This change is taken over from FreeBSD:
95436ce20d
Additionally I've removed some variable assignments which were never
read before being overwritten again in the next 2 lines.
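The idiom in question, sketched generically (fdlibm-style splitting, not
the patched file verbatim):

  #include <stdint.h>
  #include <string.h>

  /* Clearing the low word leaves hi with a short mantissa, so products
     involving hi are exact; lo = x - hi carries the rest exactly.
     Omitting the mask breaks this exactness, costing 1-2 ULP. */
  static void split_hi_lo(double x, double *hi, double *lo)
  {
      uint64_t u;
      memcpy(&u, &x, sizeof u);
      u &= 0xffffffff00000000ULL;   /* the mask being added */
      memcpy(hi, &u, sizeof *hi);
      *lo = x - *hi;
  }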
This fix for k_tan.c is a copy from fdlibm version 5.3 (see also
http://www.netlib.org/fdlibm/readme), adjusted to use the macros
available in newlib (SET_LOW_WORD).
This fix reduces the ULP error of the value shown in the fdlibm readme
(tan(1.7765241907548024E+269)) to 0.45 (thereby reducing the error by
1).
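A condensed sketch of the branch involved (fdlibm-style, not a verbatim
copy; the real code computes -1/(x+r) with extra care):

  /* Kernel contract: iy == 1 means return tan, iy == -1 means return
     -1/tan, which is what an odd number of pi/2 reductions requires. */
  double kernel_tan_tiny(double x, int iy)
  {
      /* reached only when |x| < 2**-28 after range reduction */
      if (iy == 1)
          return x;        /* tan(x) ~ x to double precision */
      return -1.0 / x;     /* -cot(x); the fdlibm 5.3 fix computes
                              -1/(x+r) carefully to keep the last bits */
  }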
This issue only happens for large numbers that get reduced by the range
reduction to a value smaller in magnitude than 2^-28 that is also
reduced an odd number of times. This seems rather unlikely given that
one ULP is (much) larger than 2^-28 for the values that may cause an
issue. Although given the sheer number of values a double can
represent, it is still possible that there are more affected values,
finding them however will be quite hard, if not impossible.
We also took a look at how another library (libm in FreeBSD) handles the
issue: In FreeBSD the complete if branch which checks for values smaller
than 2^-28 (or rather 2^-27, another change done by FreeBSD) is moved
out of the kernel function and into the external function. This means
that the value that gets checked for this condition is the unreduced
value. Therefore the input value which caused a problem in the
fdlibm/newlib kernel tan will run through the full polynomial, including
the careful calculation of -1/(x+r). So the difference is really whether
r or y is used. r = y + p, with p being the result of the polynomial,
whose largest (and magnitude-defining) term is 1/3*x^3. With x being
<2^-27 we therefore know that p is smaller than y (y has to be at least
the value of x's last mantissa bit divided by 2, which is at least
x*2^-51 for doubles) by enough to warrant saying that r ~ y. So
we can conclude that the general implementation of this special case is
the same, FreeBSD simply has a different philosophy on when to handle
especially small numbers.
While testing the exp function we noticed some errors at the specified
magnitude. Within this range the exp function returns 1 plus the input
value as its output. We chose to run a test of 1m exponentially spaced values
in the ranges [-2^-27,-2^-32] and [2^-32,2^-27] which showed 7603 and
3912 results with an error of >=0.5 ULP (compared with MPFR in 128 bit)
with the highest being 0.56 ULP and 0.53 ULP.
It's easy to fix by lowering the magnitude threshold below which 1 plus
the input value is returned from 2^-28 to 2^-32 and using the polynomial
instead. This
reduces the number of results with an error of >=0.5 ULP to 485 and 479
in above tests, all of which are exactly 0.5 ULP.
As we were already examining exp, we also took a look at expf. For expf
the threshold below which 1 plus the input value is returned can be
increased from 2^-28 to 2^-23 without accuracy loss, for a slight performance
improvement. To ensure this was the correct value we tested all values
in the ranges [-2^-17,-2^-28] and [2^-28,2^-17] (~92.3m values each).
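A sketch of the threshold change for exp (illustrative, not e_exp.c):

  #include <math.h>

  double exp_small_cut(double x, double (*poly_path)(double))
  {
      /* Fix: shortcut only below 2**-32 (was 2**-28); the polynomial
         now covers [2**-32, 2**-28), where returning 1+x alone was
         off by up to ~0.56 ULP. */
      if (fabs(x) < 0x1p-32)
          return 1.0 + x;
      return poly_path(x);    /* full polynomial evaluation */
  }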
The single-precision trigonometric functions show rather high errors in
specific ranges starting at about 30000 radians. For example the sinf
procedure produces an error of 7626.55 ULP with the input
5.195880078125e+04 (0x474AF6CD) (compared with MPFR in 128bit
precision). For the test we used 100k values evenly spaced in the range
of [30k, 70k]. The issues are periodic at higher ranges.
This error was introduced when the double precision range reduction was
first converted to float. The shift by 8 bits always returns 0 as iq is
never higher than 255.
The fix reduces the error of the example above to 0.45 ULP, highest
error within the test set fell to 1.31 ULP, which is not perfect, but
still a significant improvement. Tests of other previously erroneous
ranges no longer show particularly large accuracy errors.
I think I may have encountered a bug in the implementation of pow:
pow(-1.0, NaN) returns 1.0 when it should return NaN.
Because ix (the high word with the sign bit stripped) is used to check
the input against 1.0 rather than hx, -1.0 is mistaken for 1.0.
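A distilled illustration of the bug (not e_pow.c verbatim; the low word
is assumed zero for brevity):

  #include <stdint.h>
  #include <string.h>

  static int32_t high_word(double x)
  {
      uint64_t u;
      memcpy(&u, &x, sizeof u);
      return (int32_t)(u >> 32);
  }

  int x_is_one_buggy(double x)
  {
      int32_t hx = high_word(x);
      int32_t ix = hx & 0x7fffffff;  /* sign bit stripped */
      return ix == 0x3ff00000;       /* matches +1.0 AND -1.0: the bug */
  }

  int x_is_one_fixed(double x)
  {
      return high_word(x) == 0x3ff00000;  /* sign-sensitive: +1.0 only */
  }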