Commit Graph

7 Commits

Author SHA1 Message Date
Sebastian Huber 96ec8f868e aarch64: Sync with ARM-software/optimized-routines
Update AArch64 assembly string routines from:

https://github.com/ARM-software/optimized-routines

commit 0cf84f26b6b8dcad8287fe30a4dcc1fdabd06560
Author: Sebastian Huber <sebastian.huber@embedded-brains.de>
Date:   Thu Jul 27 17:14:57 2023 +0200

    string: Fix corrupt GNU_PROPERTY_TYPE (5) size

    For ELF32 the notes alignment is 4 and not 8.

Add license and copyright information to COPYING.NEWLIB as entry (56).
2023-10-05 14:16:57 +02:00
Wilco Dijkstra df7824d1a4 Fix issue with dst bias in memset
This patch fixes an issue in the previous memset loop change. If the
zva size is >= 256 and there are more than 64 bytes left in the
tail, we could enter the loop and thus need to rebias dst by 32 as
well.

Since no known CPUs use this size this can't be tested natively, so I've
tested it on a simulator initialized with a large zva size.

--
2018-11-08 16:45:19 +00:00
Wilco Dijkstra d80db60066 Adjust writeback in non-zero memset
This fixes an ineffiency in the non-zero memset.  Delaying the writeback
until the end of the loop is slightly faster on some cores - this shows
~5% performance gain on Cortex-A53 when doing large non-zero memsets.

Tested against the GLIBC testsuite.
2018-11-06 14:59:51 +00:00
Wilco Dijkstra 127c38bd44 [AArch64] Rewrite optimized memset.
This is an optimized memset for AArch64.  Memset is split into 4 main
cases: small sets of up to 16 bytes, medium of 16..96 bytes which are
fully unrolled.  Large memsets of more than 96 bytes align the
destination and use an unrolled loop processing 64 bytes per
iteration.  Memsets of zero of more than 256 use the dc zva
instruction, and there are faster versions for the common ZVA sizes 64
or 128.  STP of Q registers is used to reduce codesize without loss of
performance.
2015-07-30 12:51:34 +01:00
Marcus Shawcroft c7806ef76a [AArch64] Reverting recent optimized memset(). 2015-07-15 13:34:58 +01:00
Wilco Dijkstra 3263f90ef7 [AArch64] Optimized memset.
This is an optimized memset for AArch64.  Memset is split into 4 main
cases: small sets of up to 16 bytes, medium of 16..96 bytes which are
fully unrolled.  Large memsets of more than 96 bytes align the
destination and use an unrolled loop processing 64 bytes per
iteration.  Memsets of zero of more than 256 use the dc zva
instruction, and there are faster versions for the common ZVA sizes 64
or 128.  STP of Q registers is used to reduce codesize without loss of
performance.
2015-07-13 13:17:16 +01:00
Marcus Shawcroft 080e96f57c 2013-01-10 Marcus Shawcroft <marcus.shawcroft@linaro.org>
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Add
	    memcpy.c memcpy-stub.c memset.S memset-stub.c strcmp.S
	    strcmp-stub.c.
	    * libc/machine/aarch64/Makefile.in: Regenerated.
	    * libc/machine/aarch64/memcpy-stub.c: New file.
	    * libc/machine/aarch64/memcpy.S: New file.
	    * libc/machine/aarch64/memset-stub.c: New file.
	    * libc/machine/aarch64/memset.S: New file.
	    * libc/machine/aarch64/strcmp.S: New file.
	    * libc/machine/aarch64/strcmp-stub.c: New file.
2013-01-10 12:44:50 +00:00