- From: Cesar Philippidis <cesar@codesourcery.com>
Date: Tue, 10 Apr 2018 14:43:42 -0700
Subject: [PATCH] nvptx port
This port adds support for Nvidia GPU's, which are primarily used as
offload accelerators in OpenACC and OpenMP.
With this change the arm platform can now be fully compiled with Clang.
Tested by comparing the output with GCC 4.8.2, and Clang 4.0, using a
variety of arches, big/little endianness, and arm/thumb mode to verify
the generated assembly output matches between GCC vs Clang with UAL, and
also GCC with UAL vs GCC with non-UAL, for all preprocessor code blocks.
The only difference found is an extra nop at the end of the function
when compiled with GCC using armv7-a/thumb/little-endian/-O2 compared to
Clang. The nop is not emitted when compiled in big-endian mode.
The implementation of the POSIX access() function is nothing machine
specific like memcpy(), etc. Move it back to the system domain. This
avoids problems due to the include search order of the Newlib/GCC build
which picks up machine includes before system includes.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
In patch b219285f873cc79361355938bd2a994957b4a6ef you have a syntax
error in the PLD instruction. The syntax for the pld argument should be
in square brackets as it's a memory address like so: pld [r1]. With
your patch the newlib build fails for armv7-a targets. This patch fixes
the build failures.
Tested by making sure the newlib build completes successfully.
2016-01-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* libc/machine/arm/strcpy.c (strcpy): Fix PLD assembly syntax.
* libc/machine/arm/strlen-stub.c (strlen): Likewise.
LTO can re-order top-level assembly blocks, which can cause this
macro definition to appear after its use (or not at all), causing
compilation failures. On modern toolchains (armv4t+), assembly
should write `bx lr` in all cases, and linkers will transparently
convert them to `mov pc, lr`, allowing us to simply remove the
macro.
(source: https://groups.google.com/forum/#!topic/comp.sys.arm/3l7fVGX-Wug
and verified empirically)
For the armv4.S file, preserve this macro to maximize backwards
compatibility.
LTO can re-order top-level assembly blocks, which can cause this
macro definition to appear after its use (or not at all), causing
compilation failures. As the macro has very few uses, simply removing
it by inlining is a simple fix.
n.b. one of the macro invocations in strlen-stub.c was already
guarded by the relevant #define, so it is simply converted directly
to a pld
In the case of memcpy-armv7m.S being built for a big-endian multilib
(including armv7 without a specific profile), realignment code made
assumptions about the byte ordering being little-endian.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2016-04-18 Thomas Preud'homme <thomas.preudhomme@arm.com>
* libc/machine/arm/strlen-stub.c: Check capabilities of architecture
to decide which Thumb implementation to use and fall back to C
implementation for architecture not supporting Thumb mode.
* libc/machine/arm/strlen.S: Likewise.
Introduce <machine/_endian.h> to let target based customization of
<machine/endian.h> via
* _LITTLE_ENDIAN,
* _BIG_ENDIAN,
* _PDP_ENDIAN, and
* _BYTE_ORDER.
defines. Add definitions expected by FreeBSD to
<machine/endian.h> like
* _QUAD_HIGHWORD,
* _QUAD_LOWWORD,
* __bswap16(),
* __bswap32(),
* __bswap64(),
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs().
Also, if __BSD_VISIBLE
* LITTLE_ENDIAN,
* BIG_ENDIAN,
* PDP_ENDIAN, and
* BYTE_ORDER.
Targets that define __machine_host_to_from_network_defined in
<machine/_endian.h> must provide their own implementation of
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs(),
otherwise a default implementation is provided by <machine/endian.h>.
In case of GCC defines to builtins are used.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
libgloss:
* arm/Makefile.in: Add newlib/libc/machine/arm to the include path if
newlib is present.
* arm/arm.h: Include acle-compat.h.
(THUMB_V7_V6M): Rename to ...
(PREFER_THUMB): This. Use ACLE macros __ARM_ARCH_ISA_ARM instead of
__ARM_ARCH_6M__ to decide whether to define it.
(THUMB1_ONLY): Define for Thumb-1 only targets.
(THUMB_V7M_V6M): Rename to ...
(THUMB_VXM): This. Defined based on __ARM_ARCH_ISA_ARM, excluding
ARMv7.
* arm/crt0.S: Use THUMB1_ONLY rather than __ARM_ARCH_6M__,
!__ARM_ARCH_ISA_ARM rather than THUMB_V7M_V6M for fp enabling, and
PREFER_THUMB rather than THUMB_V7_V6M. Rename other occurences of
THUMB_V7M_V6M to THUMB_VXM.
* arm/linux-crt0.c: Likewise.
* arm/redboot-crt0.S: Likewise.
* arm/swi.h: Likewise.
* arm/trap.S: Likewise.
newlib:
* libc/machine/arm/memcpy-stub.c: Use ACLE macros __ARM_ARCH_ISA_THUMB
and __ARM_ARCH_ISA_ARM to check for Thumb-2 only targets rather than
__ARM_ARCH and __ARM_ARCH_PROFILE.
* libc/machine/arm/memcpy.S: Likewise.
* libc/machine/arm/setjmp.S: Likewise for Thumb-1 only target and
include acle-compat.h.
* libc/machine/arm/strcmp.S: Likewise for Thumb-1 and Thumb-2 only
target and include acle-compat.h.
* libc/sys/arm/arm.h: Include acle-compat.h.
(THUMB_V7_V6M): Rename to ...
(PREFER_THUMB): This. Use ACLE macro __ARM_ARCH_ISA_ARM instead of
__ARM_ARCH_6M__ to decide whether to define it.
(THUMB1_ONLY): Define for Thumb-1 only targets.
(THUMB_V7M_V6M): Rename to ...
(THUMB_VXM): This. Defined based on __ARM_ARCH_ISA_ARM, excluding
ARMv7.
* libc/sys/arm/crt0.S: Use PREFER_THUMB rather than THUMB_V7_V6M and
rename THUMB_V7M_V6M into THUMB_VXM.
* libc/sys/arm/swi.h: Likewise.
Reformulate the strcmp-armv7.S selection logic around the architecture
features required by the implementation code rather (some) version of
the architecture that expose those features.
The patch moves the inline ASM thumb2 -Os implementation out into its
own .S file.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
The patch moves the inline ASM thumb1 -O2 implementation out into its
own .S file.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
The patch adds strlen.S to contain the complementary preprocessor
logic to strlen-stub.c intended to provide #inclusion of alternative
.S implementations.
Initially we just include the existing strlen-armv7.S implementation.
We rewrite _ISA_ARMV7 in both strlen.S and strlen-stub.c to use the
underlying existing underlying defintion from arm_asm.h in order to
avoide including that file, this is in effect the first step towards a
move to ACLE predefines only.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
In order to maintain consistency both within machine/arm and between
machine/arm and machine/aarch64, rename the 'c' stub to -stub.c.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
This patch flattens the condition code selection used in strlen in an
attempt to make the guarding condition for each alternative
implementation clearer and to structure the logic in a manner that
makes it easier to maintain complementary logic between the
alternative 'C' and assembler implementations.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
ARM newlib has various strcmp implementations that use .cfi_*
directives to generate unwind information.
The effect of this is that the generated objects contain .eh_frame
sections. However, ARM uses its own unwind info format, not
.eh_frame, which is generated by ARM-specific directives, not .cfi_*.
The .eh_frame sections are useless, but also not removed by strip and
may be loaded into memory at runtime.
This patch fixes this by using .cfi_sections .debug_frame (as in
glibc) so that the directives generate .debug_frame instead.
.debug_frame is useful for the debugger, can be removed by strip, and
is not loaded into memory at runtime.
* libc/machine/arm/strcmp-arm-tiny.S: Use .cfi_sections
.debug_frame.
* libc/machine/arm/strcmp-armv4.S: Likewise.
* libc/machine/arm/strcmp-armv4t.S: Likewise.
* libc/machine/arm/strcmp-armv6.S: Likewise.
* libc/machine/arm/strcmp-armv6m.S: Likewise.
* libc/machine/arm/strcmp-armv7.S: Likewise.
* libc/machine/arm/strcmp-armv7m.S: Likewise.
The patch cleans up the auto configury mechanism used to select
different implementations of memchr for various architecture versions.
The approach here is to remove the selection of memchr within automake
and instead use complimentary logic in memchr-stub.c and memchr.S to
choose between the gerneric memchr.c implementation or one of the
architecture specific implementations.
This patch also changes the selection criteria inline with the
previous proposal here:
https://sourceware.org/ml/newlib/2015/msg00752.html
but using the ACLE predefines.
Regressed for armv7-a armv5 armv8-a, correct selection of memcpy
implementation by manual inspection of a test program built for these
three architectures.
This patch cleans up the auto configury mechanism used to select
different implementations of memcpy for various architecture versions.
The approach here is to remove the selection of memcpy within automake
and instead use complimentary logic in memcpy-stub.c and memcpy.S to
choose between the generic memcpy.c implemenation or one of the
architecture specific memcpy*.S implemenations.
Regressed for armv7-a armv5 armv8-a, correct selection of memcpy
implementation by manual inspection of a test program built for these
three architectures.
This revised patch flips the remaining preprocessor logic in
memcpy-stub.c to use ACLE defines as requested in the previous review
and removes the now disused HAVE_ARMV7A and HAVE_ARMV8A configure.in
support.
The newlib configury logic that detects architecture version and
chooses an appropriate memcpy implementation does not consider
ARMv8-a.
This patch adds configury logic to detect ARMv8-a along with the
associated changes in Makefile.am and memcpy.
Hi!
I've got the situation, that the function strlen() occurs twice in libc.a
(building newlib for ARM-V7a and Size-Optimized).
In newlib/libc/machine/arm/strlen.c there are the pre-processor stetements ...
#if defined (__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED) || \
(defined (__thumb__) && !defined (__thumb2__))
/*...*/
#else
#if !(defined(_ISA_ARM_7) || defined(__ARM_ARCH_6T2__))
/*...*/
#endif
and in newlib/libc/machine/arm/strlen-armv7.S the "exclude" begins with
/* NOTE: This ifdef MUST match the ones in arm/strlen.c
We fallback to the one in arm/strlen.c for size optimised or
for older architectures. */
#if defined(_ISA_ARM_7) || defined(__ARM_ARCH_6T2__) && \
!(defined (__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED) || \
(defined (__thumb__) && !defined (__thumb2__)))
But this is not completely contrary to arm/strlen.c (see above)!
To fix the logical statement in arm/strlen-armv7.S there are parentheses needed
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
* libc/machine/arm/strcmp-armv4.S: New file.
* libc/machine/arm/strcmp-armv4t.S: New file.
* libc/machine/arm/strcmp-armv6.S: New file.
* libc/machine/arm/strcmp-armv7.S: New file.
* libc/machine/arm/strcmp-armv7m.S: New file.
* libc/machine/arm/strcmp.S: Replace with wrapper for various
implementations.
* libc/machine/arm/Makefile.am (strcmp.o, strcmp.obj): Add
dependencies.
* libc/machine/arm/Makefile.in: Regenerated.
Adjust the conditions for entering the aligned copy loop to
improve performance on mutually misaligned buffer copies.
2013-07-01 Will Newton <will.newton@linaro.org>
* libc/machine/arm/memcpy-armv7a.S: Adjust entry to
aligned loop to improve misaligned copy performance.
Import the latest version of strlen from the Linaro cortex-strings
package. This version is faster across a variety of block size and
alignments on ARMv7.
newlib/ChangeLog:
2013-06-21 Will Newton <will.newton@linaro.org>
* libc/machine/arm/strlen-armv7.S: Import latest strlen
code from Linaro cortex-strings.