newlib-cygwin/newlib/libc/machine/aarch64
Wilco Dijkstra c86063bdc0 Optimized memcmp
This is an optimized memcmp for AArch64.  This is a complete rewrite
using a different algorithm.  The previous version split into three
cases: both inputs aligned, inputs mutually aligned, and unaligned
inputs handled with a byte loop.  The new version combines all these
cases into one path, while small inputs of less than 8 bytes are
handled separately.

This allows the main code to be sped up using unaligned loads, since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
the first input is aligned.  This ensures each iteration does at most one
unaligned access, and mutually aligned inputs behave as aligned ones.
After the main loop, the last 8 bytes are processed using unaligned accesses.
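The strategy described above can be sketched in portable C. This is a hypothetical illustration, not the code in memcmp.S: the names `memcmp_sketch` and `load_be64` are invented here, inputs shorter than 8 bytes fall back to a plain byte loop (the commit does not say how the real code handles them), and only the sign of the result is meaningful.

```c
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes at p (any alignment) as a big-endian value.  Comparing two
   such values orders them by their first differing byte, as memcmp must.
   Written byte-wise for portability; an optimized version would use one
   unaligned 8-byte load, byte-reversed on a little-endian target. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical sketch of the strategy in the commit message; returns
   <0, 0 or >0 like memcmp (only the sign is meaningful). */
int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1, *p2 = s2;
    uint64_t a, b;

    /* Inputs shorter than 8 bytes take a separate path; a byte loop is
       assumed here. */
    if (n < 8) {
        for (size_t i = 0; i < n; i++)
            if (p1[i] != p2[i])
                return p1[i] < p2[i] ? -1 : 1;
        return 0;
    }

    /* First 8 bytes: unaligned loads on both inputs. */
    a = load_be64(p1);
    b = load_be64(p2);
    if (a != b)
        return a < b ? -1 : 1;

    /* Advance to the next 8-byte boundary of the first input (by 1..8
       bytes).  Both pointers move together, so any bytes re-read below
       were already found equal. */
    size_t skip = 8 - ((uintptr_t)p1 & 7);
    p1 += skip;
    p2 += skip;
    n -= skip;

    /* Main loop: p1 is now aligned, so each iteration does at most one
       unaligned access (on p2); if the inputs were mutually aligned, p2
       is aligned too. */
    while (n > 8) {
        a = load_be64(p1);
        b = load_be64(p2);
        if (a != b)
            return a < b ? -1 : 1;
        p1 += 8;
        p2 += 8;
        n -= 8;
    }

    /* Tail: at most 8 bytes remain.  Compare the last 8 bytes of both
       inputs with unaligned loads; this window may overlap bytes that
       were already compared. */
    a = load_be64(p1 + n - 8);
    b = load_be64(p2 + n - 8);
    if (a != b)
        return a < b ? -1 : 1;
    return 0;
}
```

Because every byte before the tail window has already been found equal, re-comparing up to 7 overlapping bytes in the final load is harmless, which is what removes the need for a trailing byte loop once the input is at least 8 bytes long.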

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

ChangeLog:
2017-06-28  Wilco Dijkstra  <wdijkstr@arm.com>

        * newlib/libc/machine/aarch64/memcmp.S (memcmp):
        Rewrite of optimized memcmp.

GLIBC benchtests/bench-memcmp.c performance comparison for Cortex-A53:

Length    1, alignment  1/ 1:		153%
Length    1, alignment  1/ 1:		119%
Length    1, alignment  1/ 1:		154%
Length    2, alignment  2/ 2:		121%
Length    2, alignment  2/ 2:		140%
Length    2, alignment  2/ 2:		121%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    3, alignment  3/ 3:		105%
Length    4, alignment  4/ 4:		155%
Length    4, alignment  4/ 4:		154%
Length    4, alignment  4/ 4:		161%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    5, alignment  5/ 5:		173%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    6, alignment  6/ 6:		145%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    7, alignment  7/ 7:		125%
Length    8, alignment  8/ 8:		111%
Length    8, alignment  8/ 8:		130%
Length    8, alignment  8/ 8:		124%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		160%
Length    9, alignment  9/ 9:		150%
Length   10, alignment 10/10:		170%
Length   10, alignment 10/10:		137%
Length   10, alignment 10/10:		150%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   11, alignment 11/11:		160%
Length   12, alignment 12/12:		146%
Length   12, alignment 12/12:		168%
Length   12, alignment 12/12:		156%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		167%
Length   13, alignment 13/13:		173%
Length   14, alignment 14/14:		167%
Length   14, alignment 14/14:		168%
Length   14, alignment 14/14:		168%
Length   15, alignment 15/15:		168%
Length   15, alignment 15/15:		173%
Length   15, alignment 15/15:		173%
Length    1, alignment  0/ 0:		134%
Length    1, alignment  0/ 0:		127%
Length    1, alignment  0/ 0:		119%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		94%
Length    2, alignment  0/ 0:		106%
Length    3, alignment  0/ 0:		82%
Length    3, alignment  0/ 0:		87%
Length    3, alignment  0/ 0:		82%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		122%
Length    5, alignment  0/ 0:		127%
Length    5, alignment  0/ 0:		119%
Length    5, alignment  0/ 0:		127%
Length    6, alignment  0/ 0:		103%
Length    6, alignment  0/ 0:		100%
Length    6, alignment  0/ 0:		100%
Length    7, alignment  0/ 0:		82%
Length    7, alignment  0/ 0:		91%
Length    7, alignment  0/ 0:		87%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length    9, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		136%
Length   10, alignment  0/ 0:		135%
Length   10, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		136%
Length   11, alignment  0/ 0:		135%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   12, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		135%
Length   13, alignment  0/ 0:		136%
Length   13, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   14, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length   15, alignment  0/ 0:		136%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length    4, alignment  0/ 0:		115%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  7/ 2:		395%
Length   32, alignment  0/ 0:		127%
Length   32, alignment  0/ 0:		127%
Length    8, alignment  0/ 0:		111%
Length    8, alignment  0/ 0:		124%
Length    8, alignment  0/ 0:		124%
Length   64, alignment  0/ 0:		128%
Length   64, alignment  6/ 4:		475%
Length   64, alignment  0/ 0:		131%
Length   64, alignment  0/ 0:		134%
Length   16, alignment  0/ 0:		128%
Length   16, alignment  0/ 0:		119%
Length   16, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  5/ 6:		475%
Length  128, alignment  0/ 0:		130%
Length  128, alignment  0/ 0:		129%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length   32, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  4/ 8:		545%
Length  256, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		171%
Length   64, alignment  0/ 0:		174%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  3/10:		585%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  128, alignment  0/ 0:		129%
Length  128, alignment  0/ 0:		128%
Length  128, alignment  0/ 0:		129%
Length 1024, alignment  0/ 0:		125%
Length 1024, alignment  2/12:		611%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length  256, alignment  0/ 0:		128%
Length  256, alignment  0/ 0:		127%
Length  256, alignment  0/ 0:		128%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  1/14:		625%
Length 2048, alignment  0/ 0:		125%
Length 2048, alignment  0/ 0:		125%
Length  512, alignment  0/ 0:		126%
Length  512, alignment  0/ 0:		127%
Length  512, alignment  0/ 0:		127%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/16:		125%
Length 4096, alignment  0/ 0:		125%
Length 4096, alignment  0/ 0:		125%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 1024, alignment  0/ 0:		126%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment 63/18:		636%
Length 8192, alignment  0/ 0:		125%
Length 8192, alignment  0/ 0:		125%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   16, alignment  1/ 2:		317%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		395%
Length   32, alignment  2/ 4:		398%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		475%
Length   64, alignment  3/ 6:		477%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  128, alignment  4/ 8:		479%
Length  256, alignment  5/10:		543%
Length  256, alignment  5/10:		539%
Length  256, alignment  5/10:		543%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length  512, alignment  6/12:		585%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
Length 1024, alignment  7/14:		611%
2017-06-29 20:36:35 +02:00
machine Use __machine_*_t_defined for internal types 2016-04-15 14:51:39 +02:00
Makefile.am Add rawmemchr 2016-05-20 10:47:02 +02:00
Makefile.in Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
aclocal.m4 2012-12-20 Jeff Johnston <jjohnstn@redhat.com> 2012-12-20 21:10:27 +00:00
configure Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
configure.in 2012-09-26 Ian Bolton <ian.bolton@arm.com> 2012-09-26 20:06:50 +00:00
memchr-stub.c [aarch64] Add memchr. 2014-07-11 09:10:50 +00:00
memchr.S * libc/machine/aarch64/memchr.S: Add check for zero-sized buffer. 2014-08-19 10:44:44 +00:00
memcmp-stub.c 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-10 13:02:19 +00:00
memcmp.S Optimized memcmp 2017-06-29 20:36:35 +02:00
memcpy-stub.c 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-10 12:44:50 +00:00
memcpy.S AArch64: Tune memcpy 2015-11-12 13:38:39 +01:00
memmove-stub.c 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-10 12:54:39 +00:00
memmove.S [AArch64] Optimized memmove. 2015-07-13 13:03:02 +01:00
memset-stub.c 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-10 12:44:50 +00:00
memset.S [AArch64] Rewrite optimized memset. 2015-07-30 12:51:34 +01:00
rawmemchr-stub.c Add rawmemchr 2016-05-20 10:47:02 +02:00
rawmemchr.S Add rawmemchr 2016-05-20 10:47:02 +02:00
setjmp.S 2012-09-26 Ian Bolton <ian.bolton@arm.com> 2012-09-26 20:06:50 +00:00
stpcpy-stub.c * libc/machine/aarch64/strcpy.S (strcpy): Further performance 2015-01-06 09:57:55 +00:00
stpcpy.S * libc/machine/aarch64/strcpy.S (strcpy): Further performance 2015-01-06 09:57:55 +00:00
strchr-stub.c * libc/machine/aarch64/strchr.S: New file 2014-06-10 14:04:31 +00:00
strchr.S * libc/machine/aarch64/strchr.S: New file 2014-06-10 14:04:31 +00:00
strchrnul-stub.c * libc/machine/aarch64/strchrnul.S: New file. 2014-06-11 10:42:54 +00:00
strchrnul.S * libc/machine/aarch64/strchrnul.S (vrepmask): Use a call-clobbered 2014-12-10 09:35:10 +00:00
strcmp-stub.c 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-10 12:44:50 +00:00
strcmp.S 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-10 12:44:50 +00:00
strcpy-stub.c * libc/machine/aarch64/strcpy.S: New file. 2014-11-10 14:57:37 +00:00
strcpy.S * libc/machine/aarch64/strcpy.S (strcpy): Further performance 2015-01-06 09:57:55 +00:00
strlen-stub.c 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-10 12:57:11 +00:00
strlen.S * libc/machine/aarch64/strlen.S (strlen): Improve performance. 2015-01-20 10:11:56 +00:00
strncmp-stub.c 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-10 12:51:13 +00:00
strncmp.S 2013-01-17 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-17 14:53:32 +00:00
strnlen-stub.c 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-10 13:00:40 +00:00
strnlen.S 2013-01-17 Marcus Shawcroft <marcus.shawcroft@linaro.org> 2013-01-17 14:52:37 +00:00
strrchr-stub.c * libc/machine/aarch64/strrchr.S: New file. 2014-12-08 15:21:42 +00:00
strrchr.S * libc/machine/aarch64/strrchr.S: New file. 2014-12-08 15:21:42 +00:00