Commit Graph

84 Commits

Author SHA1 Message Date
Sebastian Huber 3cacedbbac aarch64: Remove duplicated optimized memmove()
The optimized aarch64/memcpy.S already provides a memmove() implementation.
2023-11-21 10:48:43 +01:00
Sebastian Huber fe5886a500 aarch64: Import memrchr.S
Import memrchr.S for AArch64 from:

https://github.com/ARM-software/optimized-routines

commit 0cf84f26b6b8dcad8287fe30a4dcc1fdabd06560
Author: Sebastian Huber <sebastian.huber@embedded-brains.de>
Date:   Thu Jul 27 17:14:57 2023 +0200

    string: Fix corrupt GNU_PROPERTY_TYPE (5) size

    For ELF32 the notes alignment is 4 and not 8.
2023-10-05 14:16:59 +02:00
Sebastian Huber 96ec8f868e aarch64: Sync with ARM-software/optimized-routines
Update AArch64 assembly string routines from:

https://github.com/ARM-software/optimized-routines

commit 0cf84f26b6b8dcad8287fe30a4dcc1fdabd06560
Author: Sebastian Huber <sebastian.huber@embedded-brains.de>
Date:   Thu Jul 27 17:14:57 2023 +0200

    string: Fix corrupt GNU_PROPERTY_TYPE (5) size

    For ELF32 the notes alignment is 4 and not 8.

Add license and copyright information to COPYING.NEWLIB as entry (56).
2023-10-05 14:16:57 +02:00
Jennifer Averett 048ebea981 newlib: Add non LDBL_EQ_DBL math support for aarch64, i386, and x86_64
Rename s_nearbyint.c, s_fdim.c and s_scalbln.c to remove conflicts
    Remove functions that are not needed from above files
    Modify include paths
    Add includes missing in cygwin build
    Add missing types
    Create Makefiles
    Create header files to resolve dependencies between directories
    Modify some instances of unsigned long to uint64_t for 32 bit platforms
    Add HAVE_FPMATH_H
2023-05-16 09:05:36 -05:00
Jennifer Averett c630a6a837 newlib: Add FreeBSD files for non LDBL_EQ_DBL support
FreeBSD files to add long double support for i386,
aarch64 and x86_64.
2023-05-16 09:05:36 -05:00
Mike Frysinger 96bc16f6b2 newlib: libc: merge build up a directory
Convert all the libc/ subdir makes into the top-level Makefile.  This
allows us to build all of libc from the top Makefile without using any
recursive make calls.  This is faster and avoids the funky lib.a logic
where we unpack subdir archives to repack into a single libc.a.  The
machine override logic is maintained though by way of Makefile include
ordering, and source file accumulation in libc_a_SOURCES.

There's a few dummy.c files that are no longer necessary since we aren't
doing the lib.a accumulating, so punt them.

The winsup code has been pulling the internal newlib ssp library out,
but that doesn't exist anymore, so change that to pull the objects.
2022-03-16 21:18:25 -04:00
Mike Frysinger 8343db918f newlib: libc: move configure into top-level
This kills off the last configure script under libc/ and folds it
into the top newlib configure script.  The a lot of the logic was
already in the top configure script, so move what's left into a
libc/acinclude.m4 file.
2022-02-25 13:52:48 -05:00
Mike Frysinger 416792d59a newlib: libc: delete crt0.o duplication
The crt0.o was handled in a subdir-by-subdir basis: it would be compiled
in one (e.g. libc/sys/$arch/), then copied up one level (libc/sys/), then
copied up another (libc/) before finally being copied & installed in the
top newlib dir.  The libc/sys/ copy was cleaned up, and then the top dir
was changed to copy it directly out of the libc/sys/$arch/ dir.  But the
libc/sys/ copy to libc/ was left behind.  Clean that up now too.
2022-02-18 21:25:32 -05:00
Mike Frysinger 907764ebec newlib/libgloss: drop unused $(CROSS_CFLAGS)
This is used in a bunch of places, but nowhere is it ever set, and
nowhere can I find any documentation, nor can I find any other project
using it.  So delete the flags to simplify.
2022-02-15 20:02:51 -05:00
Mike Frysinger ac90a6590b newlib: phoenix: merge configure up to top-level
Merge sys/phoenix/ configure logic into libc/ itself.  This kills
off the last lingering script in this tree (other than libc itself).
2022-02-15 19:59:08 -05:00
Mike Frysinger 5b9c4cf23e newlib: drop support for $oext
This was needed only to support libtool in case objects ended in .lo
instead of .o, but we dropped libtool, so drop this too.
2022-02-09 23:35:23 -05:00
Mike Frysinger f034d8ad19 newlib: drop support for $aext
This was needed only to support libtool in case the library ended in
.la instead of .a, but we dropped libtool, so drop this too.
2022-02-09 23:34:17 -05:00
Mike Frysinger 006da84337 newlib: drop libtool support
This was only ever used for i?86-pc-linux-gnu targets, but that's been
broken for years, and has since been dropped.  So clean this up too.

This also deletes the funky objectlist logic since it only existed for
the libtool libraries.  Since it was the only thing left in the small
Makefile.shared file, we can punt that too.
2022-02-09 20:27:37 -05:00
Mike Frysinger e7ad3f5aa8 newlib: switch to AM_PROG_AR
Now that we require automake-1.15, we can use this macro rather than
do the tool search ourselves.
2022-02-08 21:24:59 -05:00
Mike Frysinger b9346cee1a newlib: switch to standard AC_PROG_CC
Now that we use AC_NO_EXECUTABLES, and we require a recent version of
autoconf, we don't need to define our own copies of these macros.  So
switch to the standard AC_PROG_CC.
2022-02-08 19:09:26 -05:00
Mike Frysinger 44f6310bf9 newlib: libc: include all chapters all the time in the manual
THe stdio subdir is actually required by the documentation.  The
stdio/def is handled dynamically, but libc.texi always expects it
to be included, and fails if it isn't.  So making it required when
building docs is safe.

The xdr subdir is handled dynamically, but it doesn't include any
docs, so the dynamic logic isn't (currently) adding any value.  So
making it required when building docs is safe.

That leaves: iconv, stdio64, posix, and signal subdirs.  The chapters
have a little disclaimer saying they are system-dependent, but even
then, imo having stable manuals regardless of the target is preferable,
and we can add more disclaimer language to these chapters if we want.

This doesn't touch the man page codepaths, just the info/pdf.
2022-02-04 19:39:09 -05:00
Mike Frysinger 6444f108d9 newlib: export abs_newlib_basedir for all subdirs
When using the top-level configure script but subdir Makefiles, the
newlib_basedir value gets a bit out of sync: it's relative to where
configure lives, not where the Makefile lives.  Move the abs setting
from the top-level configure script into acinclude.m4 so we can rely
on it being available everywhere.  Although this commit doesn't use
it anywhere, just lays the groundwork.
2022-01-29 01:35:30 -05:00
Mike Frysinger 08a55a233d newlib: libc: merge machine/ configure scripts up a level
The machine configure scripts are all effectively stub scripts that
pass the higher level options to its own makefile.  There were only
three doing custom tests.  The rest were all effectively the same as
the libc/ configure script.

So instead of recursively running configure in all of these subdirs,
generate their makefiles from the top-level configure.  For the few
unique ones, deploy a pattern of including subdir logic via m4:
	m4_include([machine/nds32/acinclude.m4])

Some of the generated machine makefiles have a bunch of extra stuff
added to them, but that's because they were inconsistent in their
configure libtool calls.  The top-level has it, so it exports some
new vars to the ones that weren't already.
2022-01-26 03:11:21 -05:00
Mike Frysinger f159663b08 newlib: stop clobbering LDFLAGS with non-standard $ldflags
It's unclear why this was added originally, but assuming it was needed
20 years ago, it shouldn't be explicitly required nowadays.  Current
versions of autotools already take care of exporting LDFLAGS to the
Makefile as needed (things are actually getting linked).  That's why
the configure diffs show LDFLAGS still here, but shifted to a diff
place in the output list.  A few dirs stop exporting LDFLAGS, but
that's because they don't do any linking, only compiling, so it's
correct.

As for the use of $ldflags instead of the standard $LDFLAGS, I can't
really explain that at all.  Just use the right name so users don't
have to dig into why their setting isn't respected, and then use a
non-standard name instead.  Adjust the testsuite to match.
2022-01-21 17:10:10 -05:00
Mike Frysinger 4317e0676a newlib: stop checking --enable-multilib in subdirs
None of the subdirs actually use the multilib arg, so include the
logic only in the top-level configure.
2022-01-21 17:10:10 -05:00
Mike Frysinger 6746e06043 newlib: avoid duplicate awk checks
Since AM_INIT_AUTOMAKE calls AC_PROG_AWK, and some configure.ac
scripts call it too, we end up testing for awk multiple times.  If
we change NEWLIB_CONFIGURE to require the macro instead, then it
makes sure it's always expanded, but only once.

While we're here, do the same thing with AC_PROG_INSTALL since it
is also called by AM_INIT_AUTOMAKE, although it doesn't currently
result in duplicate configure checks.
2022-01-18 19:25:18 -05:00
Mike Frysinger 71086e8b2d newlib: delete (most) redundant lib_a_CCASFLAGS=$(AM_CCASFLAGS)
Since automake already sets per-library CCASFLAGS to $(AM_CCASFLAGS)
by default, there's no need to explicitly set it here.

Many of these dirs don't have .S files in the first place, so the rule
doesn't even do anything.  That can easily be seen when Makefile.in has
no changes as a result.

For the dirs with .S files, the custom rules are the same as the pattern
.S.o rules, so this is a nice cleanup.

The only dir that was adding extra flags (newlib/libc/machine/mn10300/)
to the per-library setting can have it moved to the global AM_CCASFLAGS
since the subdir only has one target.  Although the setting just adds
extra debugging flags, so maybe it should be deleted in general.

There are a few dirs that we leave the redundant setting in place.  This
is to workaround an automake limitation in subdirs that support building
with & w/out libtool:
https://www.gnu.org/software/automake/manual/html_node/Objects-created-both-with-libtool-and-without.html
2022-01-18 19:12:02 -05:00
Mike Frysinger 20e3103471 newlib: update to automake-1.15
This matches what the other GNU toolchain projects have done already.
The generated diff in practice isn't terribly large.  This will allow
more use of subdir local.mk includes due to fixes & improvements that
came after the 1.11 release series.
2022-01-14 19:10:38 -05:00
Mike Frysinger a100e80fc9 require autoconf-2.69 exactly
The newlib & libgloss dirs are already generated using autoconf-2.69.
To avoid merging new code and/or accidental regeneration using diff
versions, leverage config/override.m4 to pin to 2.69 exactly.  This
matches what gcc/binutils/gdb are already doing.

The README file already says to use autoconf-2.69.

To accomplish this, it's just as simple as adding -I flags to the
top-level config/ dir when running aclocal.  This is because the
override.m4 file overrides AC_INIT to first require the specific
autoconf version before calling the real AC_INIT.
2022-01-14 15:24:33 -05:00
Mike Frysinger 8fc6b4b30e newlib: regen aclocal.m4 after autoconf update
The configure scripts were regenerated with 2.69 for the newlib-4.2.0
release in 484d2ebf8d, but the aclocal
files were not.  Do that now to avoid confusion between the two as to
which version of autoconf was used.
2022-01-12 07:01:18 -05:00
Mike Frysinger ed20821a40 newlib: migrate from INCLUDES to AM_CPPFLAGS
Since automake deprecated the INCLUDES name in favor of AM_CPPFLAGS,
change all existing users over.  The generated code is the same since
the two variables have been used in the same exact places by design.

There are other cleanups to be done, but lets focus on just renaming
here so we can upgrade to a newer automake version w/out triggering
new warnings.
2022-01-05 20:29:53 -05:00
Jeff Johnston 484d2ebf8d Update newlib to 4.2.0 2021-12-31 12:46:13 -05:00
Jon Turney bfcabeb876
newlib: Regenerate autotools files 2021-12-29 22:45:06 +00:00
Jon Turney a4e734fcdb
newlib: Remove automake option 'cygnus'
The 'cygnus' option was removed from automake 1.13 in 2012, so the
presence of this option prevents that or a later version of automake
being used.

A check-list of the effects of '--cygnus' from the automake 1.12
documentation, and steps taken (where possible) to preserve those
effects (See also this thread [1] for discussion on that):

[1] https://lists.gnu.org/archive/html/bug-automake/2012-03/msg00048.html

1. The foreign strictness is implied.

Already present in AM_INIT_AUTOMAKE in newlib/acinclude.m4

2. The options no-installinfo, no-dependencies and no-dist are implied.

Already present in AM_INIT_AUTOMAKE in newlib/acinclude.m4

Future work: Remove no-dependencies and any explicit header dependencies,
and use automatic dependency tracking instead.  Are there explicit rules
which are now redundant to removing no-installinfo and no-dist?

3. The macro AM_MAINTAINER_MODE is required.

Already present in newlib/acinclude.m4

Note that maintainer-mode is still disabled by default.

4. Info files are always created in the build directory, and not in the
source directory.

This appears to be an error in the automake documentation describing
'--cygnus' [2]. newlib's info files are generated in the source
directory, and no special steps are needed to keep doing that.

[2] https://lists.gnu.org/archive/html/bug-automake/2012-04/msg00028.html

5. texinfo.tex is not required if a Texinfo source file is specified.
(The assumption is that the file will be supplied, but in a place that
automake cannot find.)

This effect is overriden by an explicit setting of the TEXINFO_TEX
variable (the directory part of which is fed into texi2X via the
TEXINPUTS environment variable).

6. Certain tools will be searched for in the build tree as well as in the
user's PATH. These tools are runtest, expect, makeinfo and texi2dvi.

For obscure automake reasons, this effect of '--cygnus' is not active
for makeinfo in newlib's configury.

However, there appears to be top-level configury which selects in-tree
runtest, expect and makeinfo, if present. So, if that works as it
appears, this effect is preserved. If not, this may cause problem if
anyone is building those tools in-tree.

This effect is not preserved for texi2dvi. This may cause problems if
anyone is building texinfo in-tree.

If needed, explicit checks for those tools looking in places relative to
$(top_srcdir)/../ as well as in PATH could be added.

7. The check target doesn't depend on all.

This effect is not preseved. The check target now depends on the all
target.

This concern seems somewhat academic given the current state of the
testsuite.

Also note that this doesn't touch libgloss.
2021-12-29 22:45:04 +00:00
Jon Turney 8e166351b3
newlib: Regenerate autotools files 2021-12-29 22:45:03 +00:00
Jon Turney 639cb7ec1a
newlib: Regenerate all autotools files
Regenerate all aclocal.m4, configure and Makefile.in files.
2021-12-09 21:41:35 +00:00
Mike Frysinger 59e83de0b1 libgloss/newlib: update configure.ac in Makefile.in files
The maintainer rules refer to configure.in directly, so update that
after renaming all the configure.ac files.
2021-11-06 14:14:49 -04:00
Mike Frysinger 920617998e libgloss/newlib: rename configure.in to configure.ac
The .in name has been deprecated for a long time in favor of .ac.
2021-09-13 10:14:37 -04:00
Richard Earnshaw 2a3a03972b aarch64: support binary mode for opening files
Newlib for aarch64 uses libgloss for the backend.  One common libgloss
implementation is the 'rdimon' implementation, which uses the Arm
Semihosting protocol.  In order to support a remote host that runs on
Windows we need to know whether a file is to be opened in binary or
text mode.  That means that we need to preserve this information via
O_BINARY until we know what the libgloss binding will be.

This patch simply copies the arm implementation from sys/arm/sys and
puts it in machine/aarch64/sys, because we don't have a 'sys' subtree
on aarch64.
2021-05-26 15:17:11 +01:00
Jeff Johnston 415fdd4279 Bump up newlib version to 4.1.0 2020-12-18 18:50:49 -05:00
Jeff Johnston 14123c991b Bump newlib release to 4.0.0 2020-12-11 14:37:12 -05:00
Eshan dhawan via Newlib fd5e27d362 fenv aarch64 support
Signed-off-by: Eshan dhawan <eshandhawan51@gmail.com>
2020-07-02 12:12:39 +02:00
Jeff Johnston 4e78f8ea16 Bump up newlib release to 3.3.0 2020-01-21 15:17:43 -05:00
Jeff Johnston 1afb22a120 Bump up release to 3.2.0 for yearly snapshot 2020-01-02 14:56:24 -05:00
Jeff Johnston 5726873100 Bump release to 3.1.0 for yearly snapshot 2018-12-31 23:40:11 -05:00
Wilco Dijkstra df7824d1a4 Fix issue with dst bias in memset
This patch fixes an issue in the previous memset loop change. If the
zva size is >= 256 and there are more than 64 bytes left in the
tail, we could enter the loop and thus need to rebias dst by 32 as
well.

Since no known CPUs use this size this can't be tested natively, so I've
tested it on a simulator initialized with a large zva size.

--
2018-11-08 16:45:19 +00:00
Wilco Dijkstra d80db60066 Adjust writeback in non-zero memset
This fixes an ineffiency in the non-zero memset.  Delaying the writeback
until the end of the loop is slightly faster on some cores - this shows
~5% performance gain on Cortex-A53 when doing large non-zero memsets.

Tested against the GLIBC testsuite.
2018-11-06 14:59:51 +00:00
Jon Beniston a9cfb33b6c Add --disable-newlib-fno-builtin to allow compilation without -fno-builtin for smaller and faster code. 2018-08-31 15:40:42 -04:00
Siddhesh Poyarekar d02cc7a09d strcmp.S: Improve performance for misaligned strings
Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place.  For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.

This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp in glibc.
2018-07-13 13:27:54 +02:00
Siddhesh Poyarekar 2d9f35c2cc memcmp.S: optimize for medium to large sizes
This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources.  The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang.  On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.
2018-07-13 13:27:54 +02:00
Siddhesh Poyarekar f44eee8f1b Improve strncmp for mutually misaligned inputs
The mutually misaligned inputs on aarch64 are compared with a simple
byte copy, which is not very efficient.  Enhance the comparison
similar to strcmp by loading a double-word at a time.  The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark in glibc is as follows:

falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)

All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.
2018-07-13 13:27:54 +02:00
Jeff Johnston cd31fbb2ae Add nvptx port.
- From: Cesar Philippidis <cesar@codesourcery.com>
  Date: Tue, 10 Apr 2018 14:43:42 -0700
  Subject: [PATCH] nvptx port

  This port adds support for Nvidia GPU's, which are primarily used as
  offload accelerators in OpenACC and OpenMP.
2018-04-13 15:42:37 -04:00
Jeff Johnston fffd2770db Bump release to 3.0.0 for yearly snapshot
- major release required due to removal of K&R support
2018-01-18 13:07:45 -05:00
Wilco Dijkstra c86063bdc0 Optimized memcmp
This is an optimized memcmp for AArch64.  This is a complete rewrite
using a different algorithm.  The previous version split into cases
where both inputs were aligned, the inputs were mutually aligned and
unaligned using a byte loop.  The new version combines all these cases,
while small inputs of less than 8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

ChangeLog:
2017-06-28  Wilco Dijkstra  <wdijkstr@arm.com>

        * newlib/libc/machine/aarch64/memcmp.S (memcmp):
        Rewrite of optimized memcmp.

GLIBC benchtests/bench-memcmp.c performance comparison for Cortex-A53:

Length    1, alignment  1/ 1:		153%
Length    1, alignment  1/ 1:		119%
Length    1, alignment  1/ 1:		154%
Length    2, alignment  2/ 2:		121%
Length    2, alignment  2/ 2:		140%
Length    2, alignment  2/ 2:		121%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    4, alignment  4/ 4:		155%
Length    4, alignment  4/ 4:		154%
Length    4, alignment  4/ 4:		161%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    8, alignment  8/ 8:		111%
Length    8, alignment  8/ 8:		130%
Length    8, alignment  8/ 8:		124%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		150%
Length   10, alignment 10/10:		170%
Length   10, alignment 10/10:		137%
Length   10, alignment 10/10:		150%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   12, alignment 12/12:		146%
Length   12, alignment 12/12:		168%
Length   12, alignment 12/12:		156%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		173%
Length   14, alignment 14/14:		167%
Length   14, alignment 14/14:		168%
Length   14, alignment 14/14:		168%
Length   15, alignment 15/15:		168%
Length   15, alignment 15/15:		173%
Length   15, alignment 15/15:		173%
Length    1, alignment  0/ 0:		134%
Length    1, alignment  0/ 0:		127%
Length    1, alignment  0/ 0:		119%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		106%
Length    3, alignment  0/ 0:		82%
Length    3, alignment  0/ 0:		87%
Length    3, alignment  0/ 0:		82%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		122%
Length    5, alignment  0/ 0:		127%
Length    5, alignment  0/ 0:		119%
Length    5, alignment  0/ 0:		127%
Length    6, alignment  0/ 0:		103%
Length    6, alignment  0/ 0:		100%
Length    6, alignment  0/ 0:		100%
Length    7, alignment  0/ 0:		82%
Length    7, alignment  0/ 0:		91%
Length    7, alignment  0/ 0:		87%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		135%
Length   10, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		135%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		135%
Length   13, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  7/ 2:		395%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  0/ 0:		127%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length   64, alignment  0/ 0:		128%
Length   64, alignment  6/ 4:		475%
Length   64, alignment  0/ 0:		131%
Length   64, alignment  0/ 0:		134%
Length   16, alignment  0/ 0:		128%
Length   16, alignment  0/ 0:		119%
Length   16, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  5/ 6:		475%
Length  128, alignment  0/ 0:		130%
Length  128, alignment  0/ 0:		129%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  4/ 8:		545%
Length  256, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		174%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  3/10:		585%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length 1024, alignment  0/ 0:		125%
Length 1024, alignment  2/12:		611%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  0/ 0:		128%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  1/14:		625%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  0/ 0:		125%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  512, alignment  0/ 0:		127%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/16:		125%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/ 0:		125%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment 63/18:		636%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment  0/ 0:		125%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		398%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		477%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  256, alignment  5/10:		543%
Length  256, alignment  5/10:		539%
Length  256, alignment  5/10:		543%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
2017-06-29 20:36:35 +02:00
Sebastian Pop 9938a64ca9 aarch64: optimize the unaligned case of memcmp
This brings to newlib a performance improvement that we developed in Bionic
libc.  That change has been submitted for review to Bionic libc:
https://android-review.googlesource.com/418279

A similar patch has been submitted for review in glibc:
https://sourceware.org/ml/libc-alpha/2017-06/msg01143.html

Patch written by Vikas Sinha and Sebastian Pop.

The performance was measured on the bionic-benchmarks on a hikey (aarch64 8xA53)
board. There was no performance change to the existing benchmark
and a performance improvement on the new benchmark for memcmp
on the unaligned side. The new benchmark has been submitted for
review at https://android-review.googlesource.com/414860

The overall performance improves by 18% for the small data set 8
and the performance improves by 450% for the large data set 64k.

The base is with the libc from /system/lib64. The bionic libc
with this patch is in /data.

hikey:/data # export LD_LIBRARY_PATH=/system/lib64
hikey:/data # ./bionic-benchmarks --benchmark_filter='BM_string_memcmp*'
Run on (8 X 2.4 MHz CPU s)
Benchmark                                Time           CPU Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                      30 ns         30 ns   22955680    251.07MB/s
BM_string_memcmp/64                     57 ns         57 ns   12349184   1076.99MB/s
BM_string_memcmp/512                   305 ns        305 ns    2297163   1.56496GB/s
BM_string_memcmp/1024                  571 ns        571 ns    1225211   1.66912GB/s
BM_string_memcmp/8k                   4307 ns       4306 ns     162562   1.77177GB/s
BM_string_memcmp/16k                  8676 ns       8675 ns      80676   1.75887GB/s
BM_string_memcmp/32k                 19233 ns      19230 ns      36394   1.58695GB/s
BM_string_memcmp/64k                 36986 ns      36984 ns      18952   1.65029GB/s
BM_string_memcmp_aligned/8             199 ns        199 ns    3519166   38.3336MB/s
BM_string_memcmp_aligned/64            386 ns        386 ns    1810734   158.073MB/s
BM_string_memcmp_aligned/512          1735 ns       1734 ns     403981   281.525MB/s
BM_string_memcmp_aligned/1024         3200 ns       3200 ns     218838   305.151MB/s
BM_string_memcmp_aligned/8k          25084 ns      25080 ns      28180   311.507MB/s
BM_string_memcmp_aligned/16k         51730 ns      51729 ns      13521   302.057MB/s
BM_string_memcmp_aligned/32k        103228 ns     103228 ns       6782   302.727MB/s
BM_string_memcmp_aligned/64k        207117 ns     207087 ns       3450   301.806MB/s
BM_string_memcmp_unaligned/8           339 ns        339 ns    2070998   22.5302MB/s
BM_string_memcmp_unaligned/64         1392 ns       1392 ns     502796   43.8454MB/s
BM_string_memcmp_unaligned/512        9194 ns       9194 ns      76133   53.1104MB/s
BM_string_memcmp_unaligned/1024      18325 ns      18323 ns      38206   53.2963MB/s
BM_string_memcmp_unaligned/8k       148579 ns     148574 ns       4713   52.5831MB/s
BM_string_memcmp_unaligned/16k      298169 ns     298120 ns       2344   52.4118MB/s
BM_string_memcmp_unaligned/32k      598813 ns     598797 ns       1085    52.188MB/s
BM_string_memcmp_unaligned/64k     1196079 ns    1196083 ns        540   52.2539MB/s

hikey:/data # export LD_LIBRARY_PATH=/data
hikey:/data # ./bionic-benchmarks --benchmark_filter='BM_string_memcmp*'
Run on (8 X 2.4 MHz CPU s)
Benchmark                                Time           CPU Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                      30 ns         30 ns   23209918   252.802MB/s
BM_string_memcmp/64                     57 ns         57 ns   12348447   1076.95MB/s
BM_string_memcmp/512                   305 ns        305 ns    2296878   1.56471GB/s
BM_string_memcmp/1024                  572 ns        571 ns    1224426    1.6689GB/s
BM_string_memcmp/8k                   4309 ns       4308 ns     162491   1.77109GB/s
BM_string_memcmp/16k                  9348 ns       9345 ns      74894   1.63285GB/s
BM_string_memcmp/32k                 18329 ns      18322 ns      38249    1.6656GB/s
BM_string_memcmp/64k                 36992 ns      36981 ns      18952   1.65045GB/s
BM_string_memcmp_aligned/8             199 ns        199 ns    3513925   38.3162MB/s
BM_string_memcmp_aligned/64            386 ns        386 ns    1814038   158.192MB/s
BM_string_memcmp_aligned/512          1735 ns       1735 ns     402279   281.502MB/s
BM_string_memcmp_aligned/1024         3204 ns       3202 ns     218761   304.941MB/s
BM_string_memcmp_aligned/8k          25577 ns      25569 ns      27406   305.548MB/s
BM_string_memcmp_aligned/16k         52143 ns      52123 ns      13522   299.769MB/s
BM_string_memcmp_aligned/32k        105169 ns     105127 ns       6637    297.26MB/s
BM_string_memcmp_aligned/64k        206508 ns     206383 ns       3417   302.835MB/s
BM_string_memcmp_unaligned/8           282 ns        282 ns    2482953    27.062MB/s
BM_string_memcmp_unaligned/64          542 ns        541 ns    1298317    112.77MB/s
BM_string_memcmp_unaligned/512        2152 ns       2152 ns     325267   226.915MB/s
BM_string_memcmp_unaligned/1024       4025 ns       4025 ns     173904   242.622MB/s
BM_string_memcmp_unaligned/8k        32276 ns      32271 ns      21818    242.09MB/s
BM_string_memcmp_unaligned/16k       65970 ns      65970 ns      10554   236.851MB/s
BM_string_memcmp_unaligned/32k      131241 ns     131242 ns       5129    238.11MB/s
BM_string_memcmp_unaligned/64k      266159 ns     266160 ns       2661   234.821MB/s
2017-06-26 10:22:40 +02:00