newlib-cygwin/newlib/libc/machine
Wilco Dijkstra c86063bdc0 Optimized memcmp
This is an optimized memcmp for AArch64, a complete rewrite using a
different algorithm.  The previous version split into three cases: both
inputs aligned, inputs mutually aligned, and unaligned inputs, with the
last handled by a byte loop.  The new version combines all these cases,
while small inputs of less than 8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.
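
For illustration only, the control flow described above can be sketched in
portable C.  This is not the code in memcmp.S: the names sketch_memcmp,
load64 and diff8 are hypothetical, unaligned loads are expressed through
memcpy, and a mismatch is resolved with a simple byte scan rather than
whatever the assembly does.  Only the sign of the result is meaningful.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 8-byte load from any address.  memcpy is the
   portable way to express an unaligned load in C.  */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: return the sign of the first differing byte in
   an 8-byte window (0 if the windows are equal).  */
static int diff8(const unsigned char *a, const unsigned char *b)
{
    for (size_t i = 0; i < 8; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    /* Small inputs of less than 8 bytes are handled separately.  */
    if (n < 8) {
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i])
                return a[i] < b[i] ? -1 : 1;
        return 0;
    }

    /* First 8 bytes: unaligned loads on both inputs.  */
    if (load64(a) != load64(b))
        return diff8(a, b);

    /* Align the first input.  The offset is 1..8 bytes, all of which
       were already compared above, so nothing is skipped.  */
    size_t i = 8 - ((uintptr_t)a & 7);

    /* Main loop: a + i is 8-byte aligned, so at most one of the two
       loads per iteration is unaligned.  */
    while (n - i >= 8) {
        if (load64(a + i) != load64(b + i))
            return diff8(a + i, b + i);
        i += 8;
    }

    /* Last 8 bytes: unaligned loads that may overlap bytes already
       known to be equal.  */
    if (load64(a + n - 8) != load64(b + n - 8))
        return diff8(a + n - 8, b + n - 8);
    return 0;
}
```

Because the final window overlaps data already compared, no separate
byte loop is needed for the remainder once the input is at least 8
bytes long.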

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

ChangeLog:
2017-06-28  Wilco Dijkstra  <wdijkstr@arm.com>

        * newlib/libc/machine/aarch64/memcmp.S (memcmp):
        Rewrite of optimized memcmp.

GLIBC benchtests/bench-memcmp.c performance comparison for Cortex-A53:

Length    1, alignment  1/ 1:		153%
Length    1, alignment  1/ 1:		119%
Length    1, alignment  1/ 1:		154%
Length    2, alignment  2/ 2:		121%
Length    2, alignment  2/ 2:		140%
Length    2, alignment  2/ 2:		121%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    4, alignment  4/ 4:		155%
Length    4, alignment  4/ 4:		154%
Length    4, alignment  4/ 4:		161%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    8, alignment  8/ 8:		111%
Length    8, alignment  8/ 8:		130%
Length    8, alignment  8/ 8:		124%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		150%
Length   10, alignment 10/10:		170%
Length   10, alignment 10/10:		137%
Length   10, alignment 10/10:		150%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   12, alignment 12/12:		146%
Length   12, alignment 12/12:		168%
Length   12, alignment 12/12:		156%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		173%
Length   14, alignment 14/14:		167%
Length   14, alignment 14/14:		168%
Length   14, alignment 14/14:		168%
Length   15, alignment 15/15:		168%
Length   15, alignment 15/15:		173%
Length   15, alignment 15/15:		173%
Length    1, alignment  0/ 0:		134%
Length    1, alignment  0/ 0:		127%
Length    1, alignment  0/ 0:		119%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		106%
Length    3, alignment  0/ 0:		82%
Length    3, alignment  0/ 0:		87%
Length    3, alignment  0/ 0:		82%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		122%
Length    5, alignment  0/ 0:		127%
Length    5, alignment  0/ 0:		119%
Length    5, alignment  0/ 0:		127%
Length    6, alignment  0/ 0:		103%
Length    6, alignment  0/ 0:		100%
Length    6, alignment  0/ 0:		100%
Length    7, alignment  0/ 0:		82%
Length    7, alignment  0/ 0:		91%
Length    7, alignment  0/ 0:		87%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		135%
Length   10, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		135%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		135%
Length   13, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  7/ 2:		395%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  0/ 0:		127%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length   64, alignment  0/ 0:		128%
Length   64, alignment  6/ 4:		475%
Length   64, alignment  0/ 0:		131%
Length   64, alignment  0/ 0:		134%
Length   16, alignment  0/ 0:		128%
Length   16, alignment  0/ 0:		119%
Length   16, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  5/ 6:		475%
Length  128, alignment  0/ 0:		130%
Length  128, alignment  0/ 0:		129%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  4/ 8:		545%
Length  256, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		174%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  3/10:		585%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length 1024, alignment  0/ 0:		125%
Length 1024, alignment  2/12:		611%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  0/ 0:		128%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  1/14:		625%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  0/ 0:		125%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  512, alignment  0/ 0:		127%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/16:		125%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/ 0:		125%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment 63/18:		636%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment  0/ 0:		125%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		398%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		477%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  256, alignment  5/10:		543%
Length  256, alignment  5/10:		539%
Length  256, alignment  5/10:		543%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
2017-06-29 20:36:35 +02:00
a29k Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
aarch64 Optimized memcmp 2017-06-29 20:36:35 +02:00
arc Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
arm Fix minor issues in memchr NEON implementation 2017-06-07 12:16:15 +02:00
bfin Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
cr16 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
cris Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
crx Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
d10v Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
d30v Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
epiphany Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
fr30 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
frv Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
ft32 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
h8300 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
h8500 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
hppa Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
i386 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
i960 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
iq2000 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
lm32 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
m32c Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
m32r Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
m68hc11 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
m68k Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
m88k Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
mep Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
microblaze Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
mips Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
mn10200 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
mn10300 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
moxie Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
msp430 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
mt Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
nds32 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
necv70 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
nios2 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
or1k Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
powerpc Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
rl78 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
rx Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
sh Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
sparc Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
spu Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
tic4x Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
tic6x Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
tic80 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
v850 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
visium Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
w65 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
x86_64 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
xc16x Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
xscale Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
xstormy16 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
z8k Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
Makefile.am Make newlib manpages (v3) 2016-07-04 14:17:10 +01:00
Makefile.in Regenerate newlib Makefiles 2016-07-04 17:13:55 +01:00
aclocal.m4 2012-12-20 Jeff Johnston <jjohnstn@redhat.com> 2012-12-20 21:10:27 +00:00
configure Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
configure.in Add support for ARC to newlib 2015-11-12 14:14:17 +01:00