Marcus Shawcroft wrote:
> This patch appears to have been munged by the mail system, can you
> repost as an attachment please.
Sure, I've attached the patch.
Wilco
Add a simple rawmemchr implementation. Use strlen for rawmemchr(s, '\0') as it is the
fastest way to search for '\0', and use memchr with an infinite size for other cases.
This is 3x faster for large sizes.
ChangeLog:
2016-04-22 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libc/machine/aarch64/Makefile.in: Add rawmemchr.S and
rawmemchr-stub.c.
* newlib/libc/machine/aarch64/Makefile.am: Likewise.
* newlib/libc/machine/aarch64/rawmemchr.S (rawmemchr): Add rawmemchr.
* newlib/libc/machine/aarch64/rawmemchr-stub.c (rawmemchr): Likewise.
2016-04-18 Thomas Preud'homme <thomas.preudhomme@arm.com>
* libc/machine/arm/strlen-stub.c: Check capabilities of architecture
to decide which Thumb implementation to use and fall back to C
implementation for architecture not supporting Thumb mode.
* libc/machine/arm/strlen.S: Likewise.
Introduce <machine/_endian.h> to let target based customization of
<machine/endian.h> via
* _LITTLE_ENDIAN,
* _BIG_ENDIAN,
* _PDP_ENDIAN, and
* _BYTE_ORDER.
defines. Add definitions expected by FreeBSD to
<machine/endian.h> like
* _QUAD_HIGHWORD,
* _QUAD_LOWWORD,
* __bswap16(),
* __bswap32(),
* __bswap64(),
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs().
Also, if __BSD_VISIBLE
* LITTLE_ENDIAN,
* BIG_ENDIAN,
* PDP_ENDIAN, and
* BYTE_ORDER.
Targets that define __machine_host_to_from_network_defined in
<machine/_endian.h> must provide their own implementation of
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs(),
otherwise a default implementation is provided by <machine/endian.h>.
In case of GCC defines to builtins are used.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Newlib defines defaults for internal types via <sys/_types.h> and uses
<machine/_types.h> to let targets define their own type if necessary.
Previously for example
#ifndef __dev_t_defined
typedef short __dev_t;
#endif
However, the __*_t_defined pattern conflicts with the glibc type guard
pattern for user types, e.g. dev_t in this example. Introduce a
__machine_*_t_defined pattern for internal types (defined by
<machine/_types.h>, used by <sys/_types.h>). For example
#ifndef __machine_dev_t_defined
typedef short __dev_t;
#endif
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Intel MCU System V ABI are incompartible with i386 System V ABI:
o Minimum instruction set is Intel Pentium ISA minus x87 instructions
o No x87 or vector registers
o First three args are passed in %eax, %edx and %ecx
o Full specification available here:
https://github.com/hjl-tools/x86-psABI/wiki/iamcu-psABI-0.7.pdf
newlib/
* configure.host: Add new ix86-*-elfiamcu target
newlib/libc/include/
* setjmp.h: Change _JBLEN for Intel MCU target
newlib/libc/machine/i386/
* memchr.S: (memchr) Target-specific size-optimized version
* memcmp.S: (memcmp) Likewise
* memcpy.S: (memcpy) Likewise
* memmove.S: (memmove) Likewise
* memset.S: (memset) Likewise
* setjmp.S: (setjmp) Likewise
* strchr.S: (strchr) Likewise
* strlen.S: (strlen) Likewise
newlib/libc/stdlib/
* srtold.c: (__flt_rounds) Disable for Intel MCU
Prototypes also added for initstate() and setstate() but they
were not implemented in the shared newlib code.
* newlib/libc/include/cygwin/stdlib.h: Prototypes added.
* winsup/cygwin/include/cygwin/stdlib.h: Prototypes removed.
* newlib/libc/stdlib/random.c: New file.
* newlib/libc/machine/epiphany/machine/stdlib.h: Removed
* newlib/libc/stdlib/Makefile.am: Added random.c.
* newlib/libc/stdlib/stdlib.tex: Added random.def.
* newlib/libc/stdlib/Makefile.in: Regenerated.
libgloss:
* arm/Makefile.in: Add newlib/libc/machine/arm to the include path if
newlib is present.
* arm/arm.h: Include acle-compat.h.
(THUMB_V7_V6M): Rename to ...
(PREFER_THUMB): This. Use ACLE macros __ARM_ARCH_ISA_ARM instead of
__ARM_ARCH_6M__ to decide whether to define it.
(THUMB1_ONLY): Define for Thumb-1 only targets.
(THUMB_V7M_V6M): Rename to ...
(THUMB_VXM): This. Defined based on __ARM_ARCH_ISA_ARM, excluding
ARMv7.
* arm/crt0.S: Use THUMB1_ONLY rather than __ARM_ARCH_6M__,
!__ARM_ARCH_ISA_ARM rather than THUMB_V7M_V6M for fp enabling, and
PREFER_THUMB rather than THUMB_V7_V6M. Rename other occurences of
THUMB_V7M_V6M to THUMB_VXM.
* arm/linux-crt0.c: Likewise.
* arm/redboot-crt0.S: Likewise.
* arm/swi.h: Likewise.
* arm/trap.S: Likewise.
newlib:
* libc/machine/arm/memcpy-stub.c: Use ACLE macros __ARM_ARCH_ISA_THUMB
and __ARM_ARCH_ISA_ARM to check for Thumb-2 only targets rather than
__ARM_ARCH and __ARM_ARCH_PROFILE.
* libc/machine/arm/memcpy.S: Likewise.
* libc/machine/arm/setjmp.S: Likewise for Thumb-1 only target and
include acle-compat.h.
* libc/machine/arm/strcmp.S: Likewise for Thumb-1 and Thumb-2 only
target and include acle-compat.h.
* libc/sys/arm/arm.h: Include acle-compat.h.
(THUMB_V7_V6M): Rename to ...
(PREFER_THUMB): This. Use ACLE macro __ARM_ARCH_ISA_ARM instead of
__ARM_ARCH_6M__ to decide whether to define it.
(THUMB1_ONLY): Define for Thumb-1 only targets.
(THUMB_V7M_V6M): Rename to ...
(THUMB_VXM): This. Defined based on __ARM_ARCH_ISA_ARM, excluding
ARMv7.
* libc/sys/arm/crt0.S: Use PREFER_THUMB rather than THUMB_V7_V6M and
rename THUMB_V7M_V6M into THUMB_VXM.
* libc/sys/arm/swi.h: Likewise.
GCC for ARC has been updated to provide consistent naming of preprocessor
definitions for different optional architecture features:
* __ARC_BARREL_SHIFTER__ instead of __Xbarrel_shifter for
-mbarrel-shifter
* __ARC_LL64__ instead of __LL64__ for -mll64
* __ARCEM__ instead of __EM__ for -mcpu=arcem
* __ARCHS__ instead of __HS__ for -mcpu=archs
* etc (not used in newlib)
This patch updates assembly routines for ARC to use new definitions instead
of a deprecated ones. To ensure compatibility with older compiler new
definitions are also defined in asm.h if needed, based on deprecated
preprocessor definitions.
*** newlib/ChangeLog ***
2015-12-15 Anton Kolesov <Anton.Kolesov@synopsys.com>
* libc/machine/arc/asm.h: Define new GCC definition for old compiler.
* libc/machine/arc/memcmp-bs-norm.S: Use new GCC defines to detect
processor features.
* libc/machine/arc/memcmp.S: Likewise.
* libc/machine/arc/memcpy-archs.S: Likewise.
* libc/machine/arc/memcpy-bs.S: Likewise.
* libc/machine/arc/memcpy.S: Likewise.
* libc/machine/arc/memset-archs.S: Likewise.
* libc/machine/arc/memset-bs.S: Likewise.
* libc/machine/arc/memset.S: Likewise.
* libc/machine/arc/setjmp.S: Likewise.
* libc/machine/arc/strchr-bs-norm.S: Likewise.
* libc/machine/arc/strchr-bs.S: Likewise.
* libc/machine/arc/strchr.S: Likewise.
* libc/machine/arc/strcmp-archs.S: Likewise.
* libc/machine/arc/strcmp.S: Likewise.
* libc/machine/arc/strcpy-bs-arc600.S: Likewise.
* libc/machine/arc/strcpy-bs.S: Likewise.
* libc/machine/arc/strcpy.S: Likewise.
* libc/machine/arc/strlen-bs-norm.S: Likewise.
* libc/machine/arc/strlen-bs.S: Likewise.
* libc/machine/arc/strlen.S: Likewise.
* libc/machine/arc/strncpy-bs.S: Likewise.
* libc/machine/arc/strncpy.S: Likewise.
Signed-off-by: Anton Kolesov <Anton.Kolesov@synopsys.com>
Reformulate the strcmp-armv7.S selection logic around the architecture
features required by the implementation code rather (some) version of
the architecture that expose those features.
The patch moves the inline ASM thumb2 -Os implementation out into its
own .S file.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
The patch moves the inline ASM thumb1 -O2 implementation out into its
own .S file.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
The patch adds strlen.S to contain the complementary preprocessor
logic to strlen-stub.c intended to provide #inclusion of alternative
.S implementations.
Initially we just include the existing strlen-armv7.S implementation.
We rewrite _ISA_ARMV7 in both strlen.S and strlen-stub.c to use the
underlying existing underlying defintion from arm_asm.h in order to
avoide including that file, this is in effect the first step towards a
move to ACLE predefines only.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
In order to maintain consistency both within machine/arm and between
machine/arm and machine/aarch64, rename the 'c' stub to -stub.c.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
This patch flattens the condition code selection used in strlen in an
attempt to make the guarding condition for each alternative
implementation clearer and to structure the logic in a manner that
makes it easier to maintain complementary logic between the
alternative 'C' and assembler implementations.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
ARM newlib has various strcmp implementations that use .cfi_*
directives to generate unwind information.
The effect of this is that the generated objects contain .eh_frame
sections. However, ARM uses its own unwind info format, not
.eh_frame, which is generated by ARM-specific directives, not .cfi_*.
The .eh_frame sections are useless, but also not removed by strip and
may be loaded into memory at runtime.
This patch fixes this by using .cfi_sections .debug_frame (as in
glibc) so that the directives generate .debug_frame instead.
.debug_frame is useful for the debugger, can be removed by strip, and
is not loaded into memory at runtime.
* libc/machine/arm/strcmp-arm-tiny.S: Use .cfi_sections
.debug_frame.
* libc/machine/arm/strcmp-armv4.S: Likewise.
* libc/machine/arm/strcmp-armv4t.S: Likewise.
* libc/machine/arm/strcmp-armv6.S: Likewise.
* libc/machine/arm/strcmp-armv6m.S: Likewise.
* libc/machine/arm/strcmp-armv7.S: Likewise.
* libc/machine/arm/strcmp-armv7m.S: Likewise.
The patch cleans up the auto configury mechanism used to select
different implementations of memchr for various architecture versions.
The approach here is to remove the selection of memchr within automake
and instead use complimentary logic in memchr-stub.c and memchr.S to
choose between the gerneric memchr.c implementation or one of the
architecture specific implementations.
This patch also changes the selection criteria inline with the
previous proposal here:
https://sourceware.org/ml/newlib/2015/msg00752.html
but using the ACLE predefines.
Regressed for armv7-a armv5 armv8-a, correct selection of memcpy
implementation by manual inspection of a test program built for these
three architectures.
This patch cleans up the auto configury mechanism used to select
different implementations of memcpy for various architecture versions.
The approach here is to remove the selection of memcpy within automake
and instead use complimentary logic in memcpy-stub.c and memcpy.S to
choose between the generic memcpy.c implemenation or one of the
architecture specific memcpy*.S implemenations.
Regressed for armv7-a armv5 armv8-a, correct selection of memcpy
implementation by manual inspection of a test program built for these
three architectures.
This revised patch flips the remaining preprocessor logic in
memcpy-stub.c to use ACLE defines as requested in the previous review
and removes the now disused HAVE_ARMV7A and HAVE_ARMV8A configure.in
support.
The newlib configury logic that detects architecture version and
chooses an appropriate memcpy implementation does not consider
ARMv8-a.
This patch adds configury logic to detect ARMv8-a along with the
associated changes in Makefile.am and memcpy.
Hi!
I've got the situation, that the function strlen() occurs twice in libc.a
(building newlib for ARM-V7a and Size-Optimized).
In newlib/libc/machine/arm/strlen.c there are the pre-processor stetements ...
#if defined (__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED) || \
(defined (__thumb__) && !defined (__thumb2__))
/*...*/
#else
#if !(defined(_ISA_ARM_7) || defined(__ARM_ARCH_6T2__))
/*...*/
#endif
and in newlib/libc/machine/arm/strlen-armv7.S the "exclude" begins with
/* NOTE: This ifdef MUST match the ones in arm/strlen.c
We fallback to the one in arm/strlen.c for size optimised or
for older architectures. */
#if defined(_ISA_ARM_7) || defined(__ARM_ARCH_6T2__) && \
!(defined (__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED) || \
(defined (__thumb__) && !defined (__thumb2__)))
But this is not completely contrary to arm/strlen.c (see above)!
To fix the logical statement in arm/strlen-armv7.S there are parentheses needed
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
This is an optimized memset for AArch64. Memset is split into 4 main
cases: small sets of up to 16 bytes, medium of 16..96 bytes which are
fully unrolled. Large memsets of more than 96 bytes align the
destination and use an unrolled loop processing 64 bytes per
iteration. Memsets of zero of more than 256 use the dc zva
instruction, and there are faster versions for the common ZVA sizes 64
or 128. STP of Q registers is used to reduce codesize without loss of
performance.
This is an optimized memset for AArch64. Memset is split into 4 main
cases: small sets of up to 16 bytes, medium of 16..96 bytes which are
fully unrolled. Large memsets of more than 96 bytes align the
destination and use an unrolled loop processing 64 bytes per
iteration. Memsets of zero of more than 256 use the dc zva
instruction, and there are faster versions for the common ZVA sizes 64
or 128. STP of Q registers is used to reduce codesize without loss of
performance.
This is an optimized memcpy for AArch64. Copies are split into 3 main
cases: small copies of up to 16 bytes, medium copies of 17..96 bytes
which are fully unrolled. Large copies of more than 96 bytes align
the destination and use an unrolled loop processing 64 bytes per
iteration. In order to share code with memmove, small and medium
copies read all data before writing, allowing any kind of overlap. On
a random copy test memcpy is 40.8% faster on A57 and 28.4% on A53.
This is an optimized memmove for AArch64. All copies of up to 96
bytes and all backward copies are done by the new memcpy. The only
remaining case is large forward copies which are done in the same way
as the memcpy loop, but copying from the end rather than the start.
improvements. Adjust to allow building as stpcpy.
* libc/machine/aarch64/stpcpy.S: New file.
* libc/machine/aarch64/stpcpy-stub.c: New file.
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Build stpcpy.
* libc/machine/aarch64/Makefile.in: Regenerated.