Go to file
Wilco Dijkstra c86063bdc0 Optimized memcmp
This is an optimized memcmp for AArch64.  This is a complete rewrite
using a different algorithm.  The previous version split into cases
where both inputs were aligned, the inputs were mutually aligned and
unaligned using a byte loop.  The new version combines all these cases,
while small inputs of less than 8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

ChangeLog:
2017-06-28  Wilco Dijkstra  <wdijkstr@arm.com>

        * newlib/libc/machine/aarch64/memcmp.S (memcmp):
        Rewrite of optimized memcmp.

GLIBC benchtests/bench-memcmp.c performance comparison for Cortex-A53:

Length    1, alignment  1/ 1:		153%
Length    1, alignment  1/ 1:		119%
Length    1, alignment  1/ 1:		154%
Length    2, alignment  2/ 2:		121%
Length    2, alignment  2/ 2:		140%
Length    2, alignment  2/ 2:		121%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    4, alignment  4/ 4:		155%
Length    4, alignment  4/ 4:		154%
Length    4, alignment  4/ 4:		161%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    8, alignment  8/ 8:		111%
Length    8, alignment  8/ 8:		130%
Length    8, alignment  8/ 8:		124%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		150%
Length   10, alignment 10/10:		170%
Length   10, alignment 10/10:		137%
Length   10, alignment 10/10:		150%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   12, alignment 12/12:		146%
Length   12, alignment 12/12:		168%
Length   12, alignment 12/12:		156%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		173%
Length   14, alignment 14/14:		167%
Length   14, alignment 14/14:		168%
Length   14, alignment 14/14:		168%
Length   15, alignment 15/15:		168%
Length   15, alignment 15/15:		173%
Length   15, alignment 15/15:		173%
Length    1, alignment  0/ 0:		134%
Length    1, alignment  0/ 0:		127%
Length    1, alignment  0/ 0:		119%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		106%
Length    3, alignment  0/ 0:		82%
Length    3, alignment  0/ 0:		87%
Length    3, alignment  0/ 0:		82%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		122%
Length    5, alignment  0/ 0:		127%
Length    5, alignment  0/ 0:		119%
Length    5, alignment  0/ 0:		127%
Length    6, alignment  0/ 0:		103%
Length    6, alignment  0/ 0:		100%
Length    6, alignment  0/ 0:		100%
Length    7, alignment  0/ 0:		82%
Length    7, alignment  0/ 0:		91%
Length    7, alignment  0/ 0:		87%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		135%
Length   10, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		135%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		135%
Length   13, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  7/ 2:		395%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  0/ 0:		127%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length   64, alignment  0/ 0:		128%
Length   64, alignment  6/ 4:		475%
Length   64, alignment  0/ 0:		131%
Length   64, alignment  0/ 0:		134%
Length   16, alignment  0/ 0:		128%
Length   16, alignment  0/ 0:		119%
Length   16, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  5/ 6:		475%
Length  128, alignment  0/ 0:		130%
Length  128, alignment  0/ 0:		129%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  4/ 8:		545%
Length  256, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		174%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  3/10:		585%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length 1024, alignment  0/ 0:		125%
Length 1024, alignment  2/12:		611%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  0/ 0:		128%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  1/14:		625%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  0/ 0:		125%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  512, alignment  0/ 0:		127%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/16:		125%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/ 0:		125%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment 63/18:		636%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment  0/ 0:		125%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		398%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		477%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  256, alignment  5/10:		543%
Length  256, alignment  5/10:		539%
Length  256, alignment  5/10:		543%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
2017-06-29 20:36:35 +02:00
config Sync with upstream gcc. 2016-06-23 15:54:55 -04:00
etc Remove spurious empty line in changelog entry. 2016-03-22 10:29:22 +01:00
include Sync with upstream gcc. 2016-06-23 15:54:55 -04:00
libgloss Add JLI support. 2017-06-14 14:51:22 +02:00
newlib Optimized memcmp 2017-06-29 20:36:35 +02:00
texinfo * texinfo/texinfo.tex: Update to version 2009-03-28.05. 2009-04-21 12:36:46 +00:00
winsup replace shortcut parameter assignments with read loops, run with sh 2017-06-27 09:37:28 +02:00
.drone.yml Continuous Integration: Add Tea CI build configuration. 2016-06-28 13:42:59 +02:00
.gitattributes Add .gitattributes 2015-03-09 20:53:11 +01:00
.gitignore Add *.swp (Vim swap files) to .gitignore 2015-03-17 12:03:30 +01:00
COPYING
COPYING.LIB Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00
COPYING.LIBGLOSS Add Synopsys license for newlib and libgloss 2015-11-12 14:16:32 +01:00
COPYING.NEWLIB Update COPYING.NEWLIB appropriately. 2017-05-25 12:52:18 -04:00
COPYING3 * COPYING3: New file. Contains version 3 of the GNU General Public License. 2007-07-17 13:50:23 +00:00
COPYING3.LIB * COPYING3: New file. Contains version 3 of the GNU General Public License. 2007-07-17 13:50:23 +00:00
ChangeLog Sync with upstream gcc. 2016-06-23 15:54:55 -04:00
MAINTAINERS MAINTAINERS: clarify policy with config/ (and other top level files) 2012-05-12 03:10:17 +00:00
Makefile.def Sync with upstream gcc. 2016-06-23 15:54:55 -04:00
Makefile.in Add missing OBJCOPY variable to Makefile.in 2016-06-26 17:27:03 +01:00
Makefile.tpl Sync with upstream gcc. 2016-06-23 15:54:55 -04:00
README
README-maintainer-mode Cleanups after the update to Autoconf 2.64, Automake 1.11. 2009-08-22 17:08:06 +00:00
compile Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00
config-ml.in Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00
config.guess Sync with upstream gcc. 2016-06-23 15:54:55 -04:00
config.rpath Remove freebsd1 from libtool.m4 macros and config.rpath. 2011-02-13 21:00:08 +00:00
config.sub Sync with upstream gcc. 2016-06-23 15:54:55 -04:00
configure Sync with upstream gcc. 2016-06-23 15:54:55 -04:00
configure.ac Sync with upstream gcc. 2016-06-23 15:54:55 -04:00
depcomp Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00
djunpack.bat * djunpack.bat: Use ".." quoting in Sed command, for the sake of 2009-03-27 13:37:09 +00:00
install-sh Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00
libtool.m4 Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00
ltgcc.m4 * libtool.m4: Update to libtool 2.2.6. 2008-09-29 15:28:14 +00:00
ltmain.sh PR target/59788 2014-02-05 13:17:47 +00:00
ltoptions.m4 Sync Libtool from GCC. 2010-01-09 21:11:32 +00:00
ltsugar.m4 * libtool.m4: Update to libtool 2.2.6. 2008-09-29 15:28:14 +00:00
ltversion.m4 Sync Libtool from GCC. 2010-01-09 21:11:32 +00:00
lt~obsolete.m4 Sync Libtool from GCC. 2010-01-09 21:11:32 +00:00
makefile.vms
missing Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00
mkdep
mkinstalldirs Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00
move-if-change Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00
setup.com 2009-09-01 Tristan Gingold <gingold@adacore.com> 2009-09-01 13:38:26 +00:00
src-release * src-release (do-proto-toplevel): Support subdir-path-prefixed 2013-10-15 20:45:52 +00:00
symlink-tree
ylwrap Sync toplevel with upstream GCC. 2016-03-22 10:25:20 +01:00

README

		   README for GNU development tools

This directory contains various GNU compilers, assemblers, linkers, 
debuggers, etc., plus their support routines, definitions, and documentation.

If you are receiving this as part of a GDB release, see the file gdb/README.
If with a binutils release, see binutils/README;  if with a libg++ release,
see libg++/README, etc.  That'll give you info about this
package -- supported targets, how to use it, how to report bugs, etc.

It is now possible to automatically configure and build a variety of
tools with one command.  To build all of the tools contained herein,
run the ``configure'' script here, e.g.:

	./configure 
	make

To install them (by default in /usr/local/bin, /usr/local/lib, etc),
then do:
	make install

(If the configure script can't determine your type of computer, give it
the name as an argument, for instance ``./configure sun4''.  You can
use the script ``config.sub'' to test whether a name is recognized; if
it is, config.sub translates it to a triplet specifying CPU, vendor,
and OS.)

If you have more than one compiler on your system, it is often best to
explicitly set CC in the environment before running configure, and to
also set CC when running make.  For example (assuming sh/bash/ksh):

	CC=gcc ./configure
	make

A similar example using csh:

	setenv CC gcc
	./configure
	make

Much of the code and documentation enclosed is copyright by
the Free Software Foundation, Inc.  See the file COPYING or
COPYING.LIB in the various directories, for a description of the
GNU General Public License terms under which you can copy the files.

REPORTING BUGS: Again, see gdb/README, binutils/README, etc., for info
on where and how to report problems.