GCC for ARC has been updated to provide consistent naming of preprocessor
definitions for different optional architecture features:
* __ARC_BARREL_SHIFTER__ instead of __Xbarrel_shifter for
-mbarrel-shifter
* __ARC_LL64__ instead of __LL64__ for -mll64
* __ARCEM__ instead of __EM__ for -mcpu=arcem
* __ARCHS__ instead of __HS__ for -mcpu=archs
* etc (not used in newlib)
This patch updates assembly routines for ARC to use the new definitions
instead of the deprecated ones. To ensure compatibility with older
compilers, the new definitions are also defined in asm.h if needed, based
on the deprecated preprocessor definitions.
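A compatibility shim of this kind could look roughly like the sketch below.
Only the four old/new macro pairs come from the list above; the guard form
and the value 1 are assumptions, and the actual contents of asm.h may differ.

```c
/* Sketch only: map the deprecated GCC-for-ARC predefines to the new
   names when an older compiler provides just the old ones.  The guard
   style and the value 1 are assumptions, not copied from asm.h.  */
#if defined (__Xbarrel_shifter) && !defined (__ARC_BARREL_SHIFTER__)
#define __ARC_BARREL_SHIFTER__ 1
#endif

#if defined (__LL64__) && !defined (__ARC_LL64__)
#define __ARC_LL64__ 1
#endif

#if defined (__EM__) && !defined (__ARCEM__)
#define __ARCEM__ 1
#endif

#if defined (__HS__) && !defined (__ARCHS__)
#define __ARCHS__ 1
#endif
```

With a shim like this in place, the assembly routines can test only the new
names, whichever compiler version is in use.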
*** newlib/ChangeLog ***
2015-12-15 Anton Kolesov <Anton.Kolesov@synopsys.com>
* libc/machine/arc/asm.h: Define new GCC definition for old compiler.
* libc/machine/arc/memcmp-bs-norm.S: Use new GCC defines to detect
processor features.
* libc/machine/arc/memcmp.S: Likewise.
* libc/machine/arc/memcpy-archs.S: Likewise.
* libc/machine/arc/memcpy-bs.S: Likewise.
* libc/machine/arc/memcpy.S: Likewise.
* libc/machine/arc/memset-archs.S: Likewise.
* libc/machine/arc/memset-bs.S: Likewise.
* libc/machine/arc/memset.S: Likewise.
* libc/machine/arc/setjmp.S: Likewise.
* libc/machine/arc/strchr-bs-norm.S: Likewise.
* libc/machine/arc/strchr-bs.S: Likewise.
* libc/machine/arc/strchr.S: Likewise.
* libc/machine/arc/strcmp-archs.S: Likewise.
* libc/machine/arc/strcmp.S: Likewise.
* libc/machine/arc/strcpy-bs-arc600.S: Likewise.
* libc/machine/arc/strcpy-bs.S: Likewise.
* libc/machine/arc/strcpy.S: Likewise.
* libc/machine/arc/strlen-bs-norm.S: Likewise.
* libc/machine/arc/strlen-bs.S: Likewise.
* libc/machine/arc/strlen.S: Likewise.
* libc/machine/arc/strncpy-bs.S: Likewise.
* libc/machine/arc/strncpy.S: Likewise.
Signed-off-by: Anton Kolesov <Anton.Kolesov@synopsys.com>
Reformulate the strcmp-armv7.S selection logic around the architecture
features required by the implementation code rather than (some) version
of the architecture that exposes those features.
The patch moves the inline ASM thumb2 -Os implementation out into its
own .S file.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
The patch moves the inline ASM thumb1 -O2 implementation out into its
own .S file.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
The patch adds strlen.S to contain the complementary preprocessor
logic to strlen-stub.c intended to provide #inclusion of alternative
.S implementations.
Initially we just include the existing strlen-armv7.S implementation.
We rewrite _ISA_ARMV7 in both strlen.S and strlen-stub.c to use the
existing underlying definition from arm_asm.h in order to avoid
including that file. This is in effect the first step towards a
move to ACLE predefines only.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
In order to maintain consistency both within machine/arm and between
machine/arm and machine/aarch64, rename the 'c' stub to -stub.c.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
This patch flattens the condition code selection used in strlen in an
attempt to make the guarding condition for each alternative
implementation clearer and to structure the logic in a manner that
makes it easier to maintain complementary logic between the
alternative 'C' and assembler implementations.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
ARM newlib has various strcmp implementations that use .cfi_*
directives to generate unwind information.
The effect of this is that the generated objects contain .eh_frame
sections. However, ARM uses its own unwind info format, not
.eh_frame, which is generated by ARM-specific directives, not .cfi_*.
The .eh_frame sections are useless, but also not removed by strip and
may be loaded into memory at runtime.
This patch fixes this by using .cfi_sections .debug_frame (as in
glibc) so that the directives generate .debug_frame instead.
.debug_frame is useful for the debugger, can be removed by strip, and
is not loaded into memory at runtime.
* libc/machine/arm/strcmp-arm-tiny.S: Use .cfi_sections
.debug_frame.
* libc/machine/arm/strcmp-armv4.S: Likewise.
* libc/machine/arm/strcmp-armv4t.S: Likewise.
* libc/machine/arm/strcmp-armv6.S: Likewise.
* libc/machine/arm/strcmp-armv6m.S: Likewise.
* libc/machine/arm/strcmp-armv7.S: Likewise.
* libc/machine/arm/strcmp-armv7m.S: Likewise.
The patch cleans up the auto configury mechanism used to select
different implementations of memchr for various architecture versions.
The approach here is to remove the selection of memchr within automake
and instead use complementary logic in memchr-stub.c and memchr.S to
choose between the generic memchr.c implementation or one of the
architecture specific implementations.
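The complementary logic can be pictured with the following sketch. The
macro SKETCH_MEMCHR_GENERIC, both function names, and the particular ACLE
tests are hypothetical and chosen only for illustration; the real criteria
in memchr-stub.c and memchr.S may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared condition: the same test would appear in both
   memchr-stub.c and memchr.S so that exactly one of them emits memchr.
   The specific ACLE tests below are assumptions, not newlib's criteria.  */
#if defined (__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED)
# define SKETCH_MEMCHR_GENERIC 1
#elif defined (__ARM_ARCH_ISA_THUMB) && __ARM_ARCH_ISA_THUMB >= 2 \
      && defined (__ARM_FEATURE_DSP)
# define SKETCH_MEMCHR_GENERIC 0   /* memchr.S would supply the code */
#else
# define SKETCH_MEMCHR_GENERIC 1
#endif

/* Reports which side of the complementary logic is active.  */
int sketch_memchr_uses_generic (void)
{
  return SKETCH_MEMCHR_GENERIC;
}

/* Stand-in for the generic memchr.c that the stub would pull in when
   SKETCH_MEMCHR_GENERIC is 1.  It is compiled unconditionally here so
   that the sketch builds on any host.  */
void *sketch_memchr (const void *s, int c, size_t n)
{
  const unsigned char *p = s;

  assert (s != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (p[i] == (unsigned char) c)
      return (void *) (p + i);
  return NULL;
}
```

In a real stub, the `.S` file would guard its code with the negation of the
same condition, so the two files can never both define the symbol.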
This patch also changes the selection criteria in line with the
previous proposal here:
https://sourceware.org/ml/newlib/2015/msg00752.html
but using the ACLE predefines.
Regression tested for armv7-a, armv5 and armv8-a; correct selection of
the memcpy implementation was checked by manual inspection of a test
program built for these three architectures.
This patch cleans up the auto configury mechanism used to select
different implementations of memcpy for various architecture versions.
The approach here is to remove the selection of memcpy within automake
and instead use complementary logic in memcpy-stub.c and memcpy.S to
choose between the generic memcpy.c implementation or one of the
architecture specific memcpy*.S implementations.
Regression tested for armv7-a, armv5 and armv8-a; correct selection of
the memcpy implementation was checked by manual inspection of a test
program built for these three architectures.
This revised patch flips the remaining preprocessor logic in
memcpy-stub.c to use ACLE defines as requested in the previous review
and removes the now disused HAVE_ARMV7A and HAVE_ARMV8A configure.in
support.
The newlib configury logic that detects architecture version and
chooses an appropriate memcpy implementation does not consider
ARMv8-a.
This patch adds configury logic to detect ARMv8-a along with the
associated changes in Makefile.am and memcpy.
Hi!
I've run into a situation where the function strlen() occurs twice in libc.a
(building newlib for ARMv7-A, optimized for size).
In newlib/libc/machine/arm/strlen.c there are the preprocessor statements ...
#if defined (__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED) || \
(defined (__thumb__) && !defined (__thumb2__))
/*...*/
#else
#if !(defined(_ISA_ARM_7) || defined(__ARM_ARCH_6T2__))
/*...*/
#endif
and in newlib/libc/machine/arm/strlen-armv7.S the "exclude" begins with
/* NOTE: This ifdef MUST match the ones in arm/strlen.c
We fallback to the one in arm/strlen.c for size optimised or
for older architectures. */
#if defined(_ISA_ARM_7) || defined(__ARM_ARCH_6T2__) && \
!(defined (__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED) || \
(defined (__thumb__) && !defined (__thumb2__)))
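But this is not the exact complement of the condition in arm/strlen.c (see
above)! Because && binds tighter than ||, the size check applies only to the
__ARM_ARCH_6T2__ test and not to _ISA_ARM_7.
To fix the logical statement in arm/strlen-armv7.S, parentheses are needed.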
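The mismatch can be checked exhaustively with a small C model of the two
conditions. The function names are hypothetical; `isa7`, `v6t2` and `size`
stand for _ISA_ARM_7, __ARM_ARCH_6T2__ and the combined size/Thumb-1 test
from the excerpts above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*built_fn) (bool isa7, bool v6t2, bool size);

/* strlen.c provides the C strlen when this holds.  */
bool c_strlen_built (bool isa7, bool v6t2, bool size)
{
  return size || !(isa7 || v6t2);
}

/* strlen-armv7.S as written: the preprocessor parses
   A || B && !S as A || (B && !S).  */
bool asm_built_as_written (bool isa7, bool v6t2, bool size)
{
  return isa7 || (v6t2 && !size);
}

/* strlen-armv7.S with the parentheses added.  */
bool asm_built_parenthesized (bool isa7, bool v6t2, bool size)
{
  return (isa7 || v6t2) && !size;
}

/* Count the configurations in which both files, or neither file,
   would define strlen.  */
int count_conflicts (built_fn asm_built)
{
  int conflicts = 0;

  assert (asm_built != NULL);
  for (int mask = 0; mask < 8; mask++)
    {
      bool isa7 = (mask & 1) != 0;
      bool v6t2 = (mask & 2) != 0;
      bool size = (mask & 4) != 0;

      if (c_strlen_built (isa7, v6t2, size) == asm_built (isa7, v6t2, size))
        conflicts++;
    }
  return conflicts;
}
```

In this model, `count_conflicts (asm_built_as_written)` finds the two
configurations where _ISA_ARM_7 and the size condition both hold, which is
the duplicate strlen reported here; the parenthesized form has no conflicts.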
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
This is an optimized memset for AArch64. Memset is split into 4 main
cases: small sets of up to 16 bytes, medium of 16..96 bytes which are
fully unrolled. Large memsets of more than 96 bytes align the
destination and use an unrolled loop processing 64 bytes per
iteration. Memsets of zero of more than 256 bytes use the dc zva
instruction, and there are faster versions for the common ZVA sizes 64
or 128. STP of Q registers is used to reduce codesize without loss of
performance.
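The case split described above can be sketched as a classification function.
This is a model only, not the assembly: the enum, the function names, and the
choice to treat exactly 16 bytes as "small" are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

enum memset_path { SET_SMALL, SET_MEDIUM, SET_LARGE, SET_ZVA };

/* Pick the path a memset of COUNT bytes with fill byte VALUE would take,
   following the thresholds quoted in the description.  */
enum memset_path classify_memset (int value, size_t count)
{
  if (count <= 16)
    return SET_SMALL;           /* up to 16 bytes */
  if (count <= 96)
    return SET_MEDIUM;          /* fully unrolled */
  if ((unsigned char) value == 0 && count > 256)
    return SET_ZVA;             /* zeroing via dc zva */
  return SET_LARGE;             /* aligned 64-byte loop */
}

/* Human-readable name for a path, for logging or tests.  */
const char *memset_path_name (enum memset_path path)
{
  switch (path)
    {
    case SET_SMALL:  return "small";
    case SET_MEDIUM: return "medium";
    case SET_LARGE:  return "large";
    case SET_ZVA:    return "zva";
    }
  assert (0 && "unknown memset path");
  return "?";
}
```

For example, `classify_memset (0, 4096)` selects the dc zva path, while the
same size with a non-zero fill byte falls back to the 64-byte loop.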
This is an optimized memcpy for AArch64. Copies are split into 3 main
cases: small copies of up to 16 bytes, medium copies of 17..96 bytes
which are fully unrolled. Large copies of more than 96 bytes align
the destination and use an unrolled loop processing 64 bytes per
iteration. In order to share code with memmove, small and medium
copies read all data before writing, allowing any kind of overlap. On
a random copy test memcpy is 40.8% faster on A57 and 28.4% on A53.
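The "read all data before writing" idea can be illustrated in C for one size
class. This is a sketch of the technique, not the AArch64 code: the function
name and the 8..16 byte range are chosen for illustration, and the fixed-size
memcpy calls stand in for 64-bit loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy N bytes, 8 <= N <= 16, using two possibly overlapping 8-byte
   accesses.  Both loads complete before either store, so the result is
   correct even when the source and destination ranges overlap.  */
void copy_8_to_16 (unsigned char *dst, const unsigned char *src, size_t n)
{
  uint64_t head, tail;

  assert (n >= 8 && n <= 16);
  memcpy (&head, src, 8);          /* first 8 bytes */
  memcpy (&tail, src + n - 8, 8);  /* last 8 bytes */
  memcpy (dst, &head, 8);
  memcpy (dst + n - 8, &tail, 8);
}
```

Because nothing is stored until the whole source range has been loaded, the
same routine can serve memmove for these sizes.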
This is an optimized memmove for AArch64. All copies of up to 96
bytes and all backward copies are done by the new memcpy. The only
remaining case is large forward copies which are done in the same way
as the memcpy loop, but copying from the end rather than the start.
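The direction choice can be modelled in a few lines of C. This is a sketch
under simplifying assumptions: the name sketch_memmove is hypothetical, byte
loops stand in for the unrolled 64-byte loops, and the 96-byte threshold is
left out because a byte loop does not read all data before writing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

void *sketch_memmove (void *dstv, const void *srcv, size_t n)
{
  unsigned char *dst = dstv;
  const unsigned char *src = srcv;

  assert (n == 0 || (dstv != NULL && srcv != NULL));

  /* Unsigned difference: true when DST is below SRC or the ranges do
     not overlap, i.e. when an ascending copy is safe.  */
  if ((uintptr_t) dst - (uintptr_t) src >= n)
    {
      /* Stands in for the path handled by memcpy.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    }
  else
    {
      /* DST lies inside the source range: copy from the end.  */
      while (n-- > 0)
        dst[n] = src[n];
    }
  return dstv;
}
```

The single unsigned comparison covers both the "destination below source"
and the "no overlap" cases, leaving only the overlapping forward move for
the descending loop.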
improvements. Adjust to allow building as stpcpy.
* libc/machine/aarch64/stpcpy.S: New file.
* libc/machine/aarch64/stpcpy-stub.c: New file.
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Build stpcpy.
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/strrchr-stub.c: New file.
* libc/machine/aarch64/Makefile.am: Add them to build list.
* libc/machine/aarch64/Makefile.in: Regenerated.
from the 64-bit _JBTYPE definition.
* libc/machine/mips/setjmp.S: Re-work the o32 FP64 support to match
the now one-and-only supported o32 FP64 ABI extension. Also
support o32 FPXX.
Found by:
find -name '*.h' |xargs grep -i 'attribute.*(([a-z]'
For an example of the type of bugs this causes, try compiling this valid
C11 program (it's valid because 'noreturn' is reserved for use in the
user namespace unless you include <stdnoreturn.h>):
$ cat foo.c
#define noreturn __attribute__((noreturn))
#include <stdlib.h>
$ gcc -c -o foo.o -Wall foo.c
In file included from /usr/include/stdlib.h:11:0,
from foo.c:2:
foo.c:1:18: error: expected ')' before '__attribute__'
#define noreturn __attribute__((noreturn))
^
/usr/include/stdlib.h:66:28: error: expected ',' or ';' before ')' token
_VOID _EXFUN(abort,(_VOID) _ATTRIBUTE ((noreturn)));
^
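For contrast, a declaration that uses the decorated attribute name keeps
compiling in the presence of such a user macro. The sketch below assumes GCC
or a compatible compiler and system headers that already use the decorated
form; sketch_fatal and checked_div are hypothetical names.

```c
/* A user macro like the one in foo.c above.  */
#define noreturn __attribute__((noreturn))

/* System headers included after the macro must tolerate it.  */
#include <assert.h>
#include <stdlib.h>

/* Header-style declaration using the decorated name.  The undecorated
   spelling would be macro-expanded here into
   __attribute__((__attribute__((noreturn)))) and fail to compile.  */
void sketch_fatal (void) __attribute__ ((__noreturn__));

void sketch_fatal (void)
{
  abort ();
}

/* Divide NUM by DEN, terminating the program on division by zero.  */
int checked_div (int num, int den)
{
  if (den == 0)
    sketch_fatal ();
  return num / den;
}
```

The `__noreturn__` spelling is in the implementation's reserved namespace, so
no conforming user macro can collide with it.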
* libc/machine/spu/spu_timer_internal.h: Decorate attribute names
with __, for namespace safety.
* libc/machine/xscale/machine/profile.h: Likewise.
* libc/include/stdlib.h: Likewise.
* libc/include/_ansi.h: Likewise.
* libc/include/sys/unistd.h: Likewise.
* libc/sys/linux/linuxthreads/libc-symbols.h: Likewise.
* libc/sys/linux/linuxthreads/internals.h: Likewise.
* libc/sys/linux/machine/i386/weakalias.h: Likewise.
* libc/sys/linux/machine/i386/dl-procinfo.h: Likewise.
* libc/sys/linux/machine/i386/dl-machine.h: Likewise.
* libc/sys/linux/libc-symbols.h: Likewise.
* libc/sys/linux/iconv/gconv_charset.h: Likewise.
* libc/sys/linux/include/resolv.h: Likewise.
* libc/sys/linux/sys/unistd.h: Likewise.
* libc/sys/linux/dl/atomicity.h: Likewise.
* libc/sys/linux/dl/dynamic-link.h: Likewise.
* libc/sys/linux/dl/ldsodefs.h: Likewise.
2014-07-11 Kévin Petit <kevin.petit@arm.com>
* libc/machine/aarch64/memchr.S: New file.
* libc/machine/aarch64/memchr-stub.c: New file.
* libc/machine/aarch64/Makefile.am: Add the new files.
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/strchrnul-stub.c: New file.
* libc/machine/aarch64/Makefile.am: Add them to build list.
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/strchr-stub.c: New file.
* libc/machine/aarch64/Makefile.am: Add them to build list.
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/arm/strcmp-armv4.S: New file.
* libc/machine/arm/strcmp-armv4t.S: New file.
* libc/machine/arm/strcmp-armv6.S: New file.
* libc/machine/arm/strcmp-armv7.S: New file.
* libc/machine/arm/strcmp-armv7m.S: New file.
* libc/machine/arm/strcmp.S: Replace with wrapper for various
implementations.
* libc/machine/arm/Makefile.am (strcmp.o, strcmp.obj): Add
dependencies.
* libc/machine/arm/Makefile.in: Regenerated.
* libc/include/machine/setjmp.h: Add support for __mips_fpr being
64 and treat it the same as if __mips64 is set.
* libc/machine/mips/setjmp.S: Ditto, plus add checks for _MIPS_SIM
being _ABIN32 and _ABI64.
Adjust the conditions for entering the aligned copy loop to
improve performance on mutually misaligned buffer copies.
2013-07-01 Will Newton <will.newton@linaro.org>
* libc/machine/arm/memcpy-armv7a.S: Adjust entry to
aligned loop to improve misaligned copy performance.
Import the latest version of strlen from the Linaro cortex-strings
package. This version is faster across a variety of block sizes and
alignments on ARMv7.
newlib/ChangeLog:
2013-06-21 Will Newton <will.newton@linaro.org>
* libc/machine/arm/strlen-armv7.S: Import latest strlen
code from Linaro cortex-strings.
* libc/machine/arm/memcpy-stub.c: Use generic memcpy if unaligned
access is not enabled.
* libc/machine/arm/memcpy.S: Faster memcpy implementation for
Cortex A15 cores using NEON and VFP if available.