Sebastian Pop
9938a64ca9
aarch64: optimize the unaligned case of memcmp
...
This brings to newlib a performance improvement that we developed in Bionic
libc. That change has been submitted for review to Bionic libc:
https://android-review.googlesource.com/418279
A similar patch has been submitted for review in glibc:
https://sourceware.org/ml/libc-alpha/2017-06/msg01143.html
Patch written by Vikas Sinha and Sebastian Pop.
The performance was measured with bionic-benchmarks on a hikey (aarch64 8xA53)
board. The existing benchmarks showed no performance change, and the
new benchmark for memcmp on unaligned data showed an improvement.
The new benchmark has been submitted for
review at https://android-review.googlesource.com/414860
On the unaligned benchmark, performance improves by 18% for the small
data set (8 bytes), and throughput rises to about 450% of the baseline
(roughly 4.5x) for the large data set (64k).
The base is with the libc from /system/lib64. The bionic libc
with this patch is in /data.
hikey:/data # export LD_LIBRARY_PATH=/system/lib64
hikey:/data # ./bionic-benchmarks --benchmark_filter='BM_string_memcmp*'
Run on (8 X 2.4 MHz CPU s)
Benchmark Time CPU Iterations
----------------------------------------------------------------------
BM_string_memcmp/8 30 ns 30 ns 22955680 251.07MB/s
BM_string_memcmp/64 57 ns 57 ns 12349184 1076.99MB/s
BM_string_memcmp/512 305 ns 305 ns 2297163 1.56496GB/s
BM_string_memcmp/1024 571 ns 571 ns 1225211 1.66912GB/s
BM_string_memcmp/8k 4307 ns 4306 ns 162562 1.77177GB/s
BM_string_memcmp/16k 8676 ns 8675 ns 80676 1.75887GB/s
BM_string_memcmp/32k 19233 ns 19230 ns 36394 1.58695GB/s
BM_string_memcmp/64k 36986 ns 36984 ns 18952 1.65029GB/s
BM_string_memcmp_aligned/8 199 ns 199 ns 3519166 38.3336MB/s
BM_string_memcmp_aligned/64 386 ns 386 ns 1810734 158.073MB/s
BM_string_memcmp_aligned/512 1735 ns 1734 ns 403981 281.525MB/s
BM_string_memcmp_aligned/1024 3200 ns 3200 ns 218838 305.151MB/s
BM_string_memcmp_aligned/8k 25084 ns 25080 ns 28180 311.507MB/s
BM_string_memcmp_aligned/16k 51730 ns 51729 ns 13521 302.057MB/s
BM_string_memcmp_aligned/32k 103228 ns 103228 ns 6782 302.727MB/s
BM_string_memcmp_aligned/64k 207117 ns 207087 ns 3450 301.806MB/s
BM_string_memcmp_unaligned/8 339 ns 339 ns 2070998 22.5302MB/s
BM_string_memcmp_unaligned/64 1392 ns 1392 ns 502796 43.8454MB/s
BM_string_memcmp_unaligned/512 9194 ns 9194 ns 76133 53.1104MB/s
BM_string_memcmp_unaligned/1024 18325 ns 18323 ns 38206 53.2963MB/s
BM_string_memcmp_unaligned/8k 148579 ns 148574 ns 4713 52.5831MB/s
BM_string_memcmp_unaligned/16k 298169 ns 298120 ns 2344 52.4118MB/s
BM_string_memcmp_unaligned/32k 598813 ns 598797 ns 1085 52.188MB/s
BM_string_memcmp_unaligned/64k 1196079 ns 1196083 ns 540 52.2539MB/s
hikey:/data # export LD_LIBRARY_PATH=/data
hikey:/data # ./bionic-benchmarks --benchmark_filter='BM_string_memcmp*'
Run on (8 X 2.4 MHz CPU s)
Benchmark Time CPU Iterations
----------------------------------------------------------------------
BM_string_memcmp/8 30 ns 30 ns 23209918 252.802MB/s
BM_string_memcmp/64 57 ns 57 ns 12348447 1076.95MB/s
BM_string_memcmp/512 305 ns 305 ns 2296878 1.56471GB/s
BM_string_memcmp/1024 572 ns 571 ns 1224426 1.6689GB/s
BM_string_memcmp/8k 4309 ns 4308 ns 162491 1.77109GB/s
BM_string_memcmp/16k 9348 ns 9345 ns 74894 1.63285GB/s
BM_string_memcmp/32k 18329 ns 18322 ns 38249 1.6656GB/s
BM_string_memcmp/64k 36992 ns 36981 ns 18952 1.65045GB/s
BM_string_memcmp_aligned/8 199 ns 199 ns 3513925 38.3162MB/s
BM_string_memcmp_aligned/64 386 ns 386 ns 1814038 158.192MB/s
BM_string_memcmp_aligned/512 1735 ns 1735 ns 402279 281.502MB/s
BM_string_memcmp_aligned/1024 3204 ns 3202 ns 218761 304.941MB/s
BM_string_memcmp_aligned/8k 25577 ns 25569 ns 27406 305.548MB/s
BM_string_memcmp_aligned/16k 52143 ns 52123 ns 13522 299.769MB/s
BM_string_memcmp_aligned/32k 105169 ns 105127 ns 6637 297.26MB/s
BM_string_memcmp_aligned/64k 206508 ns 206383 ns 3417 302.835MB/s
BM_string_memcmp_unaligned/8 282 ns 282 ns 2482953 27.062MB/s
BM_string_memcmp_unaligned/64 542 ns 541 ns 1298317 112.77MB/s
BM_string_memcmp_unaligned/512 2152 ns 2152 ns 325267 226.915MB/s
BM_string_memcmp_unaligned/1024 4025 ns 4025 ns 173904 242.622MB/s
BM_string_memcmp_unaligned/8k 32276 ns 32271 ns 21818 242.09MB/s
BM_string_memcmp_unaligned/16k 65970 ns 65970 ns 10554 236.851MB/s
BM_string_memcmp_unaligned/32k 131241 ns 131242 ns 5129 238.11MB/s
BM_string_memcmp_unaligned/64k 266159 ns 266160 ns 2661 234.821MB/s
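The ratios quoted in the message can be re-derived from the two tables above. The following is a minimal sketch in C; the helper names `speedup` and `time_reduction_percent` are illustrative only and are not part of bionic-benchmarks or newlib.

```c
#include <assert.h>

/* How many times faster the patched run is than the base run,
 * given the per-iteration times reported by the benchmark. */
double speedup(double base_ns, double patched_ns)
{
    assert(base_ns > 0.0 && patched_ns > 0.0);
    return base_ns / patched_ns;
}

/* Percentage by which the per-iteration time shrinks. */
double time_reduction_percent(double base_ns, double patched_ns)
{
    assert(base_ns > 0.0 && patched_ns > 0.0);
    return 100.0 * (base_ns - patched_ns) / base_ns;
}
```

For example, `speedup(1196079, 266159)` uses the two BM_string_memcmp_unaligned/64k times, and `speedup(339, 282)` uses the two BM_string_memcmp_unaligned/8 times.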
2017-06-26 10:22:40 +02:00
Jeff Johnston
61f181d6b8
Bump release to 2.5.0 for yearly snapshot.
2016-12-22 21:33:54 -05:00
Wilco Dijkstra
e7b1ee2ea6
Add rawmemchr
...
Add a simple rawmemchr implementation. Use strlen for rawmemchr(s, '\0') as it is the
fastest way to search for '\0', and use memchr with an infinite size for other cases.
This is 3x faster for large sizes.
ChangeLog:
2016-04-22 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libc/machine/aarch64/Makefile.in: Add rawmemchr.S and
rawmemchr-stub.c.
* newlib/libc/machine/aarch64/Makefile.am: Likewise.
* newlib/libc/machine/aarch64/rawmemchr.S (rawmemchr): Add rawmemchr.
* newlib/libc/machine/aarch64/rawmemchr-stub.c (rawmemchr): Likewise.
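The dispatch described above can be sketched in portable C. This is only an illustration of the idea, not the rawmemchr.S added by this commit: the name `rawmemchr_sketch` is hypothetical, and PTRDIFF_MAX is used as a stand-in for the "infinite" size on the assumption that no object can be larger than that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return a pointer to the first byte equal to c, assuming one exists.
 * As with rawmemchr, the behavior is undefined if c never occurs. */
void *rawmemchr_sketch(const void *s, int c)
{
    assert(s != NULL);

    /* Searching for the terminator is exactly what strlen does. */
    if ((unsigned char)c == '\0')
        return (char *)s + strlen((const char *)s);

    /* Otherwise search with an effectively unbounded length. */
    return memchr(s, c, (size_t)PTRDIFF_MAX);
}
```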
2016-05-20 10:47:02 +02:00
Sebastian Huber
8a5af1a184
Use __machine_*_t_defined for internal types
...
Newlib defines defaults for internal types via <sys/_types.h> and uses
<machine/_types.h> to let targets define their own type if necessary.
Previously, for example:
#ifndef __dev_t_defined
typedef short __dev_t;
#endif
However, the __*_t_defined pattern conflicts with the glibc type guard
pattern for user types, e.g. dev_t in this example. Introduce a
__machine_*_t_defined pattern for internal types (defined by
<machine/_types.h>, used by <sys/_types.h>). For example:
#ifndef __machine_dev_t_defined
typedef short __dev_t;
#endif
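A minimal single-file sketch of how the two guard families stay independent follows. The names `__example_t`, `example_t` and `__machine_example_t_defined` are hypothetical stand-ins for `__dev_t`, `dev_t` and `__machine_dev_t_defined`, chosen so the fragment does not clash with a host libc's real headers. The target override of `long` is likewise an assumption for illustration.

```c
#include <assert.h>

/* Stand-in for a target's <machine/_types.h>: the target overrides
 * the internal type and announces it with the __machine_* guard. */
#define __machine_example_t_defined
typedef long __example_t;

/* Stand-in for <sys/_types.h>: supply the default only when the
 * target did not define its own internal type. */
#ifndef __machine_example_t_defined
typedef short __example_t;
#endif

/* Stand-in for a user-facing header: the glibc-style guard now
 * protects only the user type and no longer affects the internal one. */
#ifndef __example_t_defined
typedef __example_t example_t;
#define __example_t_defined
#endif

/* The target's choice wins over the default. */
static_assert(sizeof(example_t) == sizeof(long),
              "target override must be used");
```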
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
2016-04-15 14:51:39 +02:00
Jeff Johnston
fbc4a0827b
Bump up newlib version to 2.4.0 due to feature test refactoring
2016-03-29 17:33:42 -04:00
Jeff Johnston
ad7b3cde9c
Regenerate files for newlib 2.3.0.
2015-12-21 21:32:11 -05:00
Wilco Dijkstra
3c8636acf6
AArch64: Tune memcpy
...
* newlib/libc/machine/aarch64/memcpy.S (memcpy):
Further tuning for performance.
2015-11-12 13:38:39 +01:00
Wilco Dijkstra
127c38bd44
[AArch64] Rewrite optimized memset.
...
This is an optimized memset for AArch64. Memset is split into 4 main
cases: small sets of up to 16 bytes; medium sets of 16..96 bytes, which
are fully unrolled; large sets of more than 96 bytes, which align the
destination and use an unrolled loop processing 64 bytes per
iteration; and zeroing sets of more than 256 bytes, which use the
dc zva instruction, with faster versions for the common ZVA sizes of
64 or 128 bytes. STP of Q registers is used to reduce code size
without loss of performance.
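The size classes in this message can be written down as a small classifier. This is a sketch in C of the dispatch only, not the assembly in memset.S. The enum and function names are hypothetical, and since the message lists 16 bytes in both the small and the medium range, the sketch assumes it belongs to the small case.

```c
#include <assert.h>
#include <stddef.h>

enum { SMALL_MAX = 16, MEDIUM_MAX = 96, ZVA_THRESHOLD = 256 };
static_assert(SMALL_MAX < MEDIUM_MAX && MEDIUM_MAX < ZVA_THRESHOLD,
              "size classes must be ordered");

enum memset_path {
    SET_SMALL,       /* up to 16 bytes */
    SET_MEDIUM,      /* up to 96 bytes, fully unrolled */
    SET_LARGE_LOOP,  /* aligned destination, 64 bytes per iteration */
    SET_LARGE_ZVA    /* zero fill of more than 256 bytes via dc zva */
};

/* Pick the path the message describes for a given fill value and size. */
enum memset_path memset_path_for(int value, size_t count)
{
    if (count <= SMALL_MAX)
        return SET_SMALL;
    if (count <= MEDIUM_MAX)
        return SET_MEDIUM;
    if ((unsigned char)value == 0 && count > ZVA_THRESHOLD)
        return SET_LARGE_ZVA;
    return SET_LARGE_LOOP;
}
```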
2015-07-30 12:51:34 +01:00
Marcus Shawcroft
c7806ef76a
[AArch64] Reverting recent optimized memset().
2015-07-15 13:34:58 +01:00
Wilco Dijkstra
3263f90ef7
[AArch64] Optimized memset.
...
This is an optimized memset for AArch64. Memset is split into 4 main
cases: small sets of up to 16 bytes; medium sets of 16..96 bytes, which
are fully unrolled; large sets of more than 96 bytes, which align the
destination and use an unrolled loop processing 64 bytes per
iteration; and zeroing sets of more than 256 bytes, which use the
dc zva instruction, with faster versions for the common ZVA sizes of
64 or 128 bytes. STP of Q registers is used to reduce code size
without loss of performance.
2015-07-13 13:17:16 +01:00
Wilco Dijkstra
b295f6ba44
[AArch64] Optimized memcpy.
...
This is an optimized memcpy for AArch64. Copies are split into 3 main
cases: small copies of up to 16 bytes; medium copies of 17..96 bytes,
which are fully unrolled; and large copies of more than 96 bytes, which
align the destination and use an unrolled loop processing 64 bytes per
iteration. In order to share code with memmove, small and medium
copies read all data before writing, allowing any kind of overlap. On
a random copy test, memcpy is 40.8% faster on A57 and 28.4% on A53.
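The "read all data before writing" idea can be illustrated for one size class. The sketch below is in C and is not the code in memcpy.S. The function name `copy_8_to_16` is hypothetical, and the choice of two overlapping 8-byte accesses for 8..16 bytes is an assumption about how such a case is typically handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, 8 <= n <= 16, using two 8-byte loads followed by two
 * 8-byte stores. Both loads complete before the first store, so the
 * result is correct even when the source and destination overlap. */
void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    assert(n >= 8 && n <= 16);

    /* Load phase: first 8 bytes and last 8 bytes (they may overlap). */
    memcpy(&head, src, 8);
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);

    /* Store phase: nothing is read from src after this point. */
    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);
}
```

The fixed-size memcpy calls on local variables stand in for unaligned 8-byte loads and stores, which compilers normally lower to single instructions.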
2015-07-13 13:09:02 +01:00
Wilco Dijkstra
9503c7f275
[AArch64] Optimized memmove.
...
This is an optimized memmove for AArch64. All copies of up to 96
bytes and all backward copies are done by the new memcpy. The only
remaining case is large forward copies which are done in the same way
as the memcpy loop, but copying from the end rather than the start.
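The split between the shared memcpy paths and the one memmove-specific path can be sketched in C. This is an illustration under assumptions, not the code in memmove.S: `memmove_sketch` is a hypothetical name, a temporary buffer stands in for "read everything into registers first", and byte loops stand in for the unrolled 64-byte loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SMALL_LIMIT = 96 };

void *memmove_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if (n <= SMALL_LIMIT) {
        /* Small and medium copies: read all data, then write it. */
        unsigned char tmp[SMALL_LIMIT];
        memcpy(tmp, s, n);
        memcpy(d, tmp, n);
        return dst;
    }

    /* The unsigned difference is below n only when dst lies inside
     * [src, src + n); otherwise copying from the start is safe. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        return dst;
    }

    /* Large copy to a higher, overlapping address: go from the end. */
    for (size_t i = n; i-- > 0; )
        d[i] = s[i];
    return dst;
}
```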
2015-07-13 13:03:02 +01:00
Corinna Vinschen
086cd00d24
* libc/machine/aarch64/strlen.S (strlen): Improve performance.
2015-01-20 10:11:56 +00:00
Richard Earnshaw
6a35dbf342
* libc/machine/aarch64/strcpy.S (strcpy): Further performance
...
improvements. Adjust to allow building as stpcpy.
* libc/machine/aarch64/stpcpy.S: New file.
* libc/machine/aarch64/stpcpy-stub.c: New file.
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Build stpcpy.
* libc/machine/aarch64/Makefile.in: Regenerated.
2015-01-06 09:57:55 +00:00
Jeff Johnston
0615b4bb5f
2014-12-18 Jeff Johnston <jjohnstn@redhat.com>
...
* NEWS: Update with 2.2.0 info.
* README: Ditto.
* acinclude.m4: Change version number to 2.2.0.
* libc/libc.texinfo: Ditto.
* libm/libm.texinfo: Ditto.
* configure: Regenerated.
* Makefile.in: Regenerated.
* doc/configure: Ditto.
* libc/*/configure: Ditto.
* libm/*/configure: Ditto.
* libc/sys/linux/shared.ld: Add VERS_2.2
2014-12-18 20:30:11 +00:00
Richard Earnshaw
52edca9f86
* libc/machine/aarch64/strcpy.S: Improve handling of short strings.
2014-12-16 15:48:58 +00:00
Richard Earnshaw
8608e14a3b
* libc/machine/aarch64/strchrnul.S (vrepmask): Use a call-clobbered
...
register.
2014-12-10 09:35:10 +00:00
Richard Earnshaw
c53c2915a7
* libc/machine/aarch64/strrchr.S: New file.
...
* libc/machine/aarch64/strrchr-stub.c: New file.
* libc/machine/aarch64/Makefile.am: Add them to build list.
* libc/machine/aarch64/Makefile.in: Regenerated.
2014-12-08 15:21:42 +00:00
Richard Earnshaw
fbb8f1a2c7
* libc/machine/aarch64/strcpy.S: New file.
...
* libc/machine/aarch64/strcpy-stub.S: New file.
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Add new files.
* libc/machine/aarch64/Makefile.in: Regenerate.
2014-11-10 14:57:37 +00:00
Richard Earnshaw
59c3d5a1a4
* libc/machine/aarch64/memchr.S: Add check for zero-sized buffer.
2014-08-19 10:44:44 +00:00
Richard Earnshaw
87375c75b3
[aarch64] Add memchr.
...
2014-07-11 Kévin Petit <kevin.petit@arm.com>
* libc/machine/aarch64/memchr.S: New file.
* libc/machine/aarch64/memchr-stub.c: New file.
* libc/machine/aarch64/Makefile.am: Add the new files.
* libc/machine/aarch64/Makefile.in: Regenerated.
2014-07-11 09:10:50 +00:00
Richard Earnshaw
deda48a9fb
* libc/machine/aarch64/strchrnul.S: New file.
...
* libc/machine/aarch64/strchrnul-stub.c: New file.
* libc/machine/aarch64/Makefile.am: Add them to build list.
* libc/machine/aarch64/Makefile.in: Regenerated.
2014-06-11 10:42:54 +00:00
Richard Earnshaw
5efd066df2
* libc/machine/aarch64/strchr.S: New file
...
* libc/machine/aarch64/strchr-stub.c: New file
* libc/machine/aarch64/Makefile.am: Add them to build list.
* libc/machine/aarch64/Makefile.in: Regenerated.
2014-06-10 14:04:31 +00:00
Jeff Johnston
5ac847c629
2013-12-23 Jeff Johnston <jjohnstn@redhat.com>
...
* NEWS: Update with 2.1.0 info.
* README: Ditto.
* acinclude.m4: Change version number to 2.1.0.
* aclocal.m4: Regenerated.
* configure: Ditto.
* Makefile.in: Regenerated.
* doc/aclocal.m4: Ditto.
* doc/configure: Ditto.
* libc/*/aclocal.m4: Ditto.
* libc/*/configure: Ditto.
* libc/libc.texinfo: Ditto.
* libm/*/aclocal.m4: Ditto.
* libm/*/configure: Ditto.
* libm/libm.texinfo: Ditto.
* libc/sys/linux/shared.ld: Add VERS_2.1
2013-12-23 20:45:10 +00:00
Marcus Shawcroft
450fe1bfa3
2013-01-17 Marcus Shawcroft <marcus.shawcroft@linaro.org>
...
* libc/machine/aarch64/strncmp.S: Correct arithmetic for
argument N values close to the maximum representable
value in an unsigned 64 bit value.
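The message does not say which computation was at fault. One common way such a bug arises, shown here purely as an assumption, is rounding a byte limit up to a count of 8-byte words. The sketch below is in C, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Naive round-up: n + 7 wraps around when n is within 7 of UINT64_MAX,
 * so a huge limit turns into a tiny word count. */
uint64_t words_naive(uint64_t n)
{
    return (n + 7) >> 3;
}

/* Wrap-free round-up: subtract before shifting, then add one. */
uint64_t words_safe(uint64_t n)
{
    return n == 0 ? 0 : ((n - 1) >> 3) + 1;
}

/* Shows the difference at the top of the 64-bit range. */
void check_word_counts(void)
{
    assert(words_naive(UINT64_MAX) == 0);
    assert(words_safe(UINT64_MAX) == (UINT64_C(1) << 61));
}
```

Both functions agree for ordinary limits. They differ only when n is within 7 of the maximum, which is the range the ChangeLog entry refers to.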
2013-01-17 14:53:32 +00:00
Marcus Shawcroft
78f66de6ce
2013-01-17 Marcus Shawcroft <marcus.shawcroft@linaro.org>
...
* libc/machine/aarch64/strnlen.S: Correct arithmetic for
argument N values close to the maximum representable
value in an unsigned 64 bit value.
2013-01-17 14:52:37 +00:00
Marcus Shawcroft
211f1ec717
2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org>
...
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Add
memcmp-stub.c and memcmp.S
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/memcmp-stub.c: New file.
* libc/machine/aarch64/memcmp.S: New file.
2013-01-10 13:02:19 +00:00
Marcus Shawcroft
ba8f05bcf5
2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org>
...
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Add
strnlen-stub.c and strnlen.S
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/strnlen-stub.c: New file.
* libc/machine/aarch64/strnlen.S: New file.
2013-01-10 13:00:40 +00:00
Marcus Shawcroft
82c3d37d07
2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org>
...
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES):
Add strlen.S and strlen-stub.c.
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/strlen-stub.c: New file.
* libc/machine/aarch64/strlen.S: New file.
2013-01-10 12:57:11 +00:00
Marcus Shawcroft
a8907bda23
2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org>
...
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES):
Add memmove.S and memmove-stub.c.
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/memmove-stub.c: New file.
* libc/machine/aarch64/memmove.S: New file.
2013-01-10 12:54:39 +00:00
Marcus Shawcroft
2edd103558
2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org>
...
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Re-ordered.
Add strncmp.S and strncmp-stub.c.
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/strncmp-stub.c: New file.
* libc/machine/aarch64/strncmp.S: New file.
2013-01-10 12:51:13 +00:00
Marcus Shawcroft
080e96f57c
2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org>
...
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Add
memcpy.c memcpy-stub.c memset.S memset-stub.c strcmp.S
strcmp-stub.c.
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/memcpy-stub.c: New file.
* libc/machine/aarch64/memcpy.S: New file.
* libc/machine/aarch64/memset-stub.c: New file.
* libc/machine/aarch64/memset.S: New file.
* libc/machine/aarch64/strcmp.S: New file.
* libc/machine/aarch64/strcmp-stub.c: New file.
2013-01-10 12:44:50 +00:00
Jeff Johnston
f2d223bd0f
2012-12-20 Jeff Johnston <jjohnstn@redhat.com>
...
* NEWS: Update with 2.0.0 info.
* README: Ditto.
* acinclude.m4: Change version number to 2.0.0.
* aclocal.m4: Regenerated.
* configure: Ditto.
* Makefile.in: Regenerated.
* doc/aclocal.m4: Ditto.
* doc/configure: Ditto.
* libc/*/aclocal.m4: Ditto.
* libc/*/configure: Ditto.
* libc/libc.texinfo: Ditto.
* libm/*/aclocal.m4: Ditto.
* libm/*/configure: Ditto.
* libm/libm.texinfo: Ditto.
* libc/sys/linux/shared.ld: Add VERS_2.0
2012-12-20 21:10:27 +00:00
Jeff Johnston
d7281d547f
2012-12-14 Yufeng Zhang <yufeng.zhang@arm.com>
...
* libc/machine/aarch64/machine/_types.h: New file; define _ssize_t
as long.
2012-12-14 20:45:51 +00:00
Jeff Johnston
c3fe5bf771
2012-09-26 Ian Bolton <ian.bolton@arm.com>
...
Jim MacArthur <jim.macarthur@arm.com>
Marcus Shawcroft <marcus.shawcroft@arm.com>
Nigel Stephens <nigel.stephens@arm.com>
Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Richard Earnshaw <rearnsha@arm.com>
Sofiane Naci <sofiane.naci@arm.com>
Tejas Belagod <tejas.belagod@arm.com>
Yufeng Zhang <yufeng.zhang@arm.com>
* configure.host: Add AArch64.
* libc/include/machine/ieeefp.h: Add AArch64.
* libc/include/machine/setjmp.h: Add AArch64.
* libc/include/machine/time.h: Add AArch64.
* libc/include/sys/config.h: Add AArch64.
* libc/machine/aarch64/Makefile.am: New file.
* libc/machine/aarch64/Makefile.in: Generated.
* libc/machine/aarch64/aclocal.m4: Generated.
* libc/machine/aarch64/configure: Generated.
* libc/machine/aarch64/configure.in: New file.
* libc/machine/aarch64/setjmp.S: New file.
* libc/machine/configure.in: Add AArch64.
* libc/machine/configure: Re-generated.
* libm/machine/aarch64/Makefile.am: New file.
* libm/machine/aarch64/Makefile.in: Generated.
* libm/machine/aarch64/aclocal.m4: Generated.
* libm/machine/aarch64/configure: Generated.
* libm/machine/aarch64/configure.in: New file.
* libm/machine/aarch64/s_ceil.c: New file.
* libm/machine/aarch64/s_floor.c: New file.
* libm/machine/aarch64/s_fma.c: New file.
* libm/machine/aarch64/s_fmax.c: New file.
* libm/machine/aarch64/s_fmin.c: New file.
* libm/machine/aarch64/s_llrint.c: New file.
* libm/machine/aarch64/s_llround.c: New file.
* libm/machine/aarch64/s_lrint.c: New file.
* libm/machine/aarch64/s_lround.c: New file.
* libm/machine/aarch64/s_nearbyint.c: New file.
* libm/machine/aarch64/s_rint.c: New file.
* libm/machine/aarch64/s_round.c: New file.
* libm/machine/aarch64/s_trunc.c: New file.
* libm/machine/aarch64/sf_ceil.c: New file.
* libm/machine/aarch64/sf_floor.c: New file.
* libm/machine/aarch64/sf_fma.c: New file.
* libm/machine/aarch64/sf_fmax.c: New file.
* libm/machine/aarch64/sf_fmin.c: New file.
* libm/machine/aarch64/sf_llrint.c: New file.
* libm/machine/aarch64/sf_llround.c: New file.
* libm/machine/aarch64/sf_lrint.c: New file.
* libm/machine/aarch64/sf_lround.c: New file.
* libm/machine/aarch64/sf_nearbyint.c: New file.
* libm/machine/aarch64/sf_rint.c: New file.
* libm/machine/aarch64/sf_round.c: New file.
* libm/machine/aarch64/sf_trunc.c: New file.
* libm/machine/configure.in: Add AArch64.
* libm/machine/configure: Re-generated.
2012-09-26 20:06:50 +00:00