4
0
mirror of git://sourceware.org/git/newlib-cygwin.git synced 2025-01-18 20:39:33 +08:00
Wilco Dijkstra c86063bdc0 Optimized memcmp
This is an optimized memcmp for AArch64.  This is a complete rewrite
using a different algorithm.  The previous version split into cases
where both inputs were aligned, the inputs were mutually aligned and
unaligned using a byte loop.  The new version combines all these cases,
while small inputs of less than 8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

ChangeLog:
2017-06-28  Wilco Dijkstra  <wdijkstr@arm.com>

        * newlib/libc/machine/aarch64/memcmp.S (memcmp):
        Rewrite of optimized memcmp.

GLIBC benchtests/bench-memcmp.c performance comparison for Cortex-A53:

Length    1, alignment  1/ 1:		153%
Length    1, alignment  1/ 1:		119%
Length    1, alignment  1/ 1:		154%
Length    2, alignment  2/ 2:		121%
Length    2, alignment  2/ 2:		140%
Length    2, alignment  2/ 2:		121%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    4, alignment  4/ 4:		155%
Length    4, alignment  4/ 4:		154%
Length    4, alignment  4/ 4:		161%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    8, alignment  8/ 8:		111%
Length    8, alignment  8/ 8:		130%
Length    8, alignment  8/ 8:		124%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		150%
Length   10, alignment 10/10:		170%
Length   10, alignment 10/10:		137%
Length   10, alignment 10/10:		150%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   12, alignment 12/12:		146%
Length   12, alignment 12/12:		168%
Length   12, alignment 12/12:		156%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		173%
Length   14, alignment 14/14:		167%
Length   14, alignment 14/14:		168%
Length   14, alignment 14/14:		168%
Length   15, alignment 15/15:		168%
Length   15, alignment 15/15:		173%
Length   15, alignment 15/15:		173%
Length    1, alignment  0/ 0:		134%
Length    1, alignment  0/ 0:		127%
Length    1, alignment  0/ 0:		119%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		106%
Length    3, alignment  0/ 0:		82%
Length    3, alignment  0/ 0:		87%
Length    3, alignment  0/ 0:		82%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		122%
Length    5, alignment  0/ 0:		127%
Length    5, alignment  0/ 0:		119%
Length    5, alignment  0/ 0:		127%
Length    6, alignment  0/ 0:		103%
Length    6, alignment  0/ 0:		100%
Length    6, alignment  0/ 0:		100%
Length    7, alignment  0/ 0:		82%
Length    7, alignment  0/ 0:		91%
Length    7, alignment  0/ 0:		87%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		135%
Length   10, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		135%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		135%
Length   13, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  7/ 2:		395%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  0/ 0:		127%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length   64, alignment  0/ 0:		128%
Length   64, alignment  6/ 4:		475%
Length   64, alignment  0/ 0:		131%
Length   64, alignment  0/ 0:		134%
Length   16, alignment  0/ 0:		128%
Length   16, alignment  0/ 0:		119%
Length   16, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  5/ 6:		475%
Length  128, alignment  0/ 0:		130%
Length  128, alignment  0/ 0:		129%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  4/ 8:		545%
Length  256, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		174%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  3/10:		585%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length 1024, alignment  0/ 0:		125%
Length 1024, alignment  2/12:		611%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  0/ 0:		128%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  1/14:		625%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  0/ 0:		125%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  512, alignment  0/ 0:		127%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/16:		125%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/ 0:		125%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment 63/18:		636%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment  0/ 0:		125%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		398%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		477%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  256, alignment  5/10:		543%
Length  256, alignment  5/10:		539%
Length  256, alignment  5/10:		543%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
2017-06-29 20:36:35 +02:00
2016-06-23 15:54:55 -04:00
2016-06-23 15:54:55 -04:00
2017-06-14 14:51:22 +02:00
2017-06-29 20:36:35 +02:00
2015-03-09 20:53:11 +01:00
2016-06-23 15:54:55 -04:00
2016-03-22 10:25:20 +01:00
2016-03-22 10:25:20 +01:00
2016-06-23 15:54:55 -04:00
2016-06-23 15:54:55 -04:00
2016-06-23 15:54:55 -04:00
2016-06-23 15:54:55 -04:00
2016-03-22 10:25:20 +01:00
2016-03-22 10:25:20 +01:00
2016-03-22 10:25:20 +01:00
2016-03-22 10:25:20 +01:00
2010-01-09 21:11:32 +00:00
2014-02-05 13:17:47 +00:00
2010-01-09 21:11:32 +00:00
2010-01-09 21:11:32 +00:00
2016-06-23 15:54:55 -04:00
2016-06-23 15:54:55 -04:00
2016-03-22 10:25:20 +01:00
2016-03-22 10:25:20 +01:00
2016-03-22 10:25:20 +01:00

		   README for GNU development tools

This directory contains various GNU compilers, assemblers, linkers, 
debuggers, etc., plus their support routines, definitions, and documentation.

If you are receiving this as part of a GDB release, see the file gdb/README.
If with a binutils release, see binutils/README;  if with a libg++ release,
see libg++/README, etc.  That'll give you info about this
package -- supported targets, how to use it, how to report bugs, etc.

It is now possible to automatically configure and build a variety of
tools with one command.  To build all of the tools contained herein,
run the ``configure'' script here, e.g.:

	./configure 
	make

To install them (by default in /usr/local/bin, /usr/local/lib, etc),
then do:
	make install

(If the configure script can't determine your type of computer, give it
the name as an argument, for instance ``./configure sun4''.  You can
use the script ``config.sub'' to test whether a name is recognized; if
it is, config.sub translates it to a triplet specifying CPU, vendor,
and OS.)

If you have more than one compiler on your system, it is often best to
explicitly set CC in the environment before running configure, and to
also set CC when running make.  For example (assuming sh/bash/ksh):

	CC=gcc ./configure
	make

A similar example using csh:

	setenv CC gcc
	./configure
	make

Much of the code and documentation enclosed is copyright by
the Free Software Foundation, Inc.  See the file COPYING or
COPYING.LIB in the various directories, for a description of the
GNU General Public License terms under which you can copy the files.

REPORTING BUGS: Again, see gdb/README, binutils/README, etc., for info
on where and how to report problems.
Description
No description provided
Readme 153 MiB
Languages
C 61.5%
Makefile 19.6%
C++ 10.4%
Assembly 4.9%
M4 1%
Other 2.4%