The previous fenv support for ARM used the soft-float implementation of
FreeBSD. Newlib uses the one from libgcc by default. They are not
compatible. Having an GCC incompatible soft-float fenv support in
Newlib makes no sense. A long-term solution could be to provide a
libgcc compatible soft-float support. This likely requires changes in
the GCC configuration. For now, provide a stub implementation for
soft-float multilibs similar to RISC-V.
Move implementation to one file and delete now unused files. Hide
implementation details. Remove function parameter names from header
file to avoid name conflicts.
Provide VFP support if __SOFTFP__ is not defined like glibc.
Reviewed-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Signed-off-by: Eshan dhawan <eshandhawan51@gmail.com>
This patch fixes a bug in RISC-V's memcpy implementation where an
integer wraparound occurs when src + size < 8 * sizeof(long), causing
the word-sized copy loop to be incorrectly entered.
Signed-off-by: Chih-Mao Chen <cmchen@andestech.com>
Most code in newlib already uses unified syntax, but just a couple of
laggards remain. This patch removes these and means the the entire
code base has now been converted.
This edits licenses held by Berkeley and NetBSD, both of which
have removed the advertising requirement from their licenses.
Signed-off-by: Keith Packard <keithp@keithp.com>
If we had architecture-specific exception bits, we could just set them
to match the processor, but instead ieeefp.h is shared by all targets
so we need to map between the public values and the register contents.
Signed-off-by: Keith Packard <keithp@keithp.com>
This makes the fpsetround function actually do something rather than
just return -1 due to the default 'fall-through' behavior of the switch
statement.
Signed-off-by: Keith Packard <keithp@keithp.com>
s[0:3] contain a descriptor used to set up the initial value of the
stack, but only the lower 48 bits of s[0:1] are currently used.
The reent marker is currently set in s3, but by stashing it in the
upper 16 bits of s[0:1] instead, s3 can be freed up for other purposes.
Update the offsets used to save registers into the stejmp jmp_buf
structure in order to:
* Avoid writing the supervision register outside the buffer and thus
clobbering something on the stack. Previously the supervision register
was written at offset 124 while the buffer was of length 124.
* Shrink the jmp_buf down to the size actually needed, by avoiding holes
at the locations of omitted registers.
Invert equality check instruction to correct the return value handling
in longjmp.
The return value should be the value of the second argument to longjmp,
unless the argument value was 0 in which case it should be 1.
Previously, longjmp would set return value 1 if the second argument was
non-zero, and 0 if it was 0, which was incorrect.
From: Andrew Stubbs <ams@codesourcery.com>
Fix a bug in which the high-part of 64-bit values are being corrupted, leading
to erroneous stack overflow errors. The problem was only that the mixed-size
calculations are being treated as signed when they should be unsigned.
This patch adds implementations of memcpy, memmove, memset and strcmp
optimized for size. The changes have been tested in
riscv/riscv-gnu-toolchain by riscv-dejagnu with
riscv-sim.exp/riscv-sim-nano.exp.
"tiny" printf is derived from _vfprintf_r in libc/stdio/nano-vfprintf.c.
"tiny" puts has been implemented so that it just calls write, without
any other processing.
Support for buffering, reentrancy and streams has been removed from
these functions to achieve reduced code size.
This reduced code size implementation of printf and puts can be enabled
in an application by passing "--wrap printf" and "--wrap puts" to the
GNU linker. This will replace references to "printf" and "puts" in user
code with "__wrap_printf" and "__wrap_puts" respectively.
If there is no implementation of these __wrap* functions in user code,
these "tiny" printf and puts implementations will be linked into the
final executable.
The wrapping mechanism is supposed to be invisible to the user:
- A GCC wrapper option such as "-mtiny-printf" will be added to alias
these wrap commands.
- If the user is unaware of the "tiny" implementation, and chooses to
implement their own __wrap_printf and __wrap_puts, their own
implementation will be automatically chosen over the "tiny" printf and
puts from the library.
Newlib must be configured with --enable-newlib-nano-formatted-io for
the "tiny" printf and puts functions to be built into the library.
Code size reduction examples:
printf("Hello World\n")
baseline - msp430-elf-gcc gcc-8_3_0-release
text data bss
5638 214 26
"tiny" puts enabled
text data bss
714 90 20
printf("Hello %d\n", a)
baseline - msp430-elf-gcc gcc-8_3_0-release
text data bss
10916 614 28
"tiny" printf enabled
text data bss
4632 280 20
These missing includes were causing build warnings, but also a real bug in
which the "size" parameter to "write" was being passed in 32-bit, whereas it
ought to be 64-bit. This led to intermittent bad behaviour.
Add support for the AMD GCN GPU architecture. This is primarily intended for
use with OpenMP and OpenACC offloading. It can also be used for stand-alone
programs, but this is intended mostly for testing the compiler and is not
expected to be useful in general.
The GPU architecture is highly parallel, and therefore Newlib must be
configured to use dynamic re-entrancy, and thread-safe malloc.
The only I/O available is a via a shared-memory interface provided by libgomp
and the gcn-run tool included with GCC. At this time this is limited to
stdout, argc/argv, and the return code.
This patch fixes an issue in the previous memset loop change. If the
zva size is >= 256 and there are more than 64 bytes left in the
tail, we could enter the loop and thus need to rebias dst by 32 as
well.
Since no known CPUs use this size this can't be tested natively, so I've
tested it on a simulator initialized with a large zva size.
--
This fixes an ineffiency in the non-zero memset. Delaying the writeback
until the end of the loop is slightly faster on some cores - this shows
~5% performance gain on Cortex-A53 when doing large non-zero memsets.
Tested against the GLIBC testsuite.
Move common content of the various <sys/dirent.h> and the latest FreeBSD
<dirent.h> to <dirent.h>.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This macro selects a compiler option that disables recognition of
common memset/memcpy patterns and converting those to direct
memset/memcpy calls.
Signed-off-by: Keith Packard <keithp@keithp.com>
Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place. For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.
This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp in glibc.
This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.
The mutually misaligned inputs on aarch64 are compared with a simple
byte copy, which is not very efficient. Enhance the comparison
similar to strcmp by loading a double-word at a time. The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark in glibc is as follows:
falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)
All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.
- From: Cesar Philippidis <cesar@codesourcery.com>
Date: Tue, 10 Apr 2018 14:43:42 -0700
Subject: [PATCH] nvptx port
This port adds support for Nvidia GPU's, which are primarily used as
offload accelerators in OpenACC and OpenMP.
At least with Binutils 2.30 and GCC 7.3 we need symbol definitions
without the leading underscore.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Discard QUICKREF sections, rather than writing them to stderr
Discard MATHREF sections, rather than discarding as an error
Pass NOTES sections through to texinfo, rather than discarding as an error
Don't redirect makedoc stderr to .ref file
Remove makedoc output on error
Remove .ref files from CLEANFILES
Regenerate Makefile.ins
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
- For prevent confuse about what BSD license variant we used, 2- or
3-clause license, we change the license to FreeBSD license to make
it unambiguously refers to the 2-clause license.
With this change the arm platform can now be fully compiled with Clang.
Tested by comparing the output with GCC 4.8.2, and Clang 4.0, using a
variety of arches, big/little endianness, and arm/thumb mode to verify
the generated assembly output matches between GCC vs Clang with UAL, and
also GCC with UAL vs GCC with non-UAL, for all preprocessor code blocks.
The only difference found is an extra nop at the end of the function
when compiled with GCC using armv7-a/thumb/little-endian/-O2 compared to
Clang. The nop is not emitted when compiled in big-endian mode.
The implementation of the POSIX access() function is nothing machine
specific like memcpy(), etc. Move it back to the system domain. This
avoids problems due to the include search order of the Newlib/GCC build
which picks up machine includes before system includes.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
In patch b219285f87 you have a syntax
error in the PLD instruction. The syntax for the pld argument should be
in square brackets as it's a memory address like so: pld [r1]. With
your patch the newlib build fails for armv7-a targets. This patch fixes
the build failures.
Tested by making sure the newlib build completes successfully.
2016-01-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* libc/machine/arm/strcpy.c (strcpy): Fix PLD assembly syntax.
* libc/machine/arm/strlen-stub.c (strlen): Likewise.
LTO can re-order top-level assembly blocks, which can cause this
macro definition to appear after its use (or not at all), causing
compilation failures. On modern toolchains (armv4t+), assembly
should write `bx lr` in all cases, and linkers will transparently
convert them to `mov pc, lr`, allowing us to simply remove the
macro.
(source: https://groups.google.com/forum/#!topic/comp.sys.arm/3l7fVGX-Wug
and verified empirically)
For the armv4.S file, preserve this macro to maximize backwards
compatibility.
LTO can re-order top-level assembly blocks, which can cause this
macro definition to appear after its use (or not at all), causing
compilation failures. As the macro has very few uses, simply removing
it by inlining is a simple fix.
n.b. one of the macro invocations in strlen-stub.c was already
guarded by the relevant #define, so it is simply converted directly
to a pld
In the case of memcpy-armv7m.S being built for a big-endian multilib
(including armv7 without a specific profile), realignment code made
assumptions about the byte ordering being little-endian.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
strcmp.S contained invalid guard for code that used barrel-shifter optional
instruction - it was checking for !ARC601 instead of whether barrel shifter
is present. While it is true that ARC601 doesn't have barrel shifter, so
does some other ARC EM configurations.
2016-07-21 Anton Kolesov <Anton.Kolesov@synopsys.com>
* libc/machine/arc/strcmp.S: Fix big endian without barrel shifter.
Signed-off-by: Anton Kolesov <Anton.Kolesov@synopsys.com>
Prealloc instruction may not be present in all HS variants. Hence, use
prefetch instead of prealloc.
newlib/
2016-04-26 Claudiu Zissulescu <claziss@synopsys.com>
* libc/machine/arc/memset-archs.S: Use prefetch.
Add makedocbook, a tool to process makedoc markup and output DocBook XML
refentries.
Process all the source files which are processed with makedoc with
makedocbook as well
Add chapter-texi2docbook, a tool to automatically generate DocBook XML
chapter files from the chapter .texi files. For generating man pages all we
care about is the content of the refentries, so all this needs to do is
convert the @include of the makedoc generated .def files to xi:include of
the makedocbook generated .xml files.
Add skeleton Docbook XML book files, lib[cm].in.xml which include these
generated chapters, which in turn include the generated files containing
refentries, which is processed with xsltproc to generate the lib[cm].xml
Add new make targets to generate and install man pages from lib[cm].xml
Add makedocbook, a tool to process makedoc markup and output DocBook XML
refentries.
Process all the source files which are processed with makedoc with
makedocbook as well
Add chapter-texi2docbook, a tool to automatically generate DocBook XML
chapter files from the chapter .texi files. For generating man pages all we
care about is the content of the refentries, so all this needs to do is
convert the @include of the makedoc generated .def files to xi:include of
the makedocbook generated .xml files.
Add skeleton Docbook XML book files, lib[cm].in.xml which include these
generated chapters, which in turn include the generated files containing
refentries, which is processed with xsltproc to generate the lib[cm].xml
Add new make targets to generate and install man pages from lib[cm].xml
Marcus Shawcroft wrote:
> This patch appears to have been munged by the mail system, can you
> repost as an attachment please.
Sure, I've attached the patch.
Wilco
Add a simple rawmemchr implementation. Use strlen for rawmemchr(s, '\0') as it is the
fastest way to search for '\0', and use memchr with an infinite size for other cases.
This is 3x faster for large sizes.
ChangeLog:
2016-04-22 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libc/machine/aarch64/Makefile.in: Add rawmemchr.S and
rawmemchr-stub.c.
* newlib/libc/machine/aarch64/Makefile.am: Likewise.
* newlib/libc/machine/aarch64/rawmemchr.S (rawmemchr): Add rawmemchr.
* newlib/libc/machine/aarch64/rawmemchr-stub.c (rawmemchr): Likewise.
2016-04-18 Thomas Preud'homme <thomas.preudhomme@arm.com>
* libc/machine/arm/strlen-stub.c: Check capabilities of architecture
to decide which Thumb implementation to use and fall back to C
implementation for architecture not supporting Thumb mode.
* libc/machine/arm/strlen.S: Likewise.
Introduce <machine/_endian.h> to let target based customization of
<machine/endian.h> via
* _LITTLE_ENDIAN,
* _BIG_ENDIAN,
* _PDP_ENDIAN, and
* _BYTE_ORDER.
defines. Add definitions expected by FreeBSD to
<machine/endian.h> like
* _QUAD_HIGHWORD,
* _QUAD_LOWWORD,
* __bswap16(),
* __bswap32(),
* __bswap64(),
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs().
Also, if __BSD_VISIBLE
* LITTLE_ENDIAN,
* BIG_ENDIAN,
* PDP_ENDIAN, and
* BYTE_ORDER.
Targets that define __machine_host_to_from_network_defined in
<machine/_endian.h> must provide their own implementation of
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs(),
otherwise a default implementation is provided by <machine/endian.h>.
In case of GCC defines to builtins are used.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Newlib defines defaults for internal types via <sys/_types.h> and uses
<machine/_types.h> to let targets define their own type if necessary.
Previously for example
#ifndef __dev_t_defined
typedef short __dev_t;
#endif
However, the __*_t_defined pattern conflicts with the glibc type guard
pattern for user types, e.g. dev_t in this example. Introduce a
__machine_*_t_defined pattern for internal types (defined by
<machine/_types.h>, used by <sys/_types.h>). For example
#ifndef __machine_dev_t_defined
typedef short __dev_t;
#endif
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Intel MCU System V ABI are incompartible with i386 System V ABI:
o Minimum instruction set is Intel Pentium ISA minus x87 instructions
o No x87 or vector registers
o First three args are passed in %eax, %edx and %ecx
o Full specification available here:
https://github.com/hjl-tools/x86-psABI/wiki/iamcu-psABI-0.7.pdf
newlib/
* configure.host: Add new ix86-*-elfiamcu target
newlib/libc/include/
* setjmp.h: Change _JBLEN for Intel MCU target
newlib/libc/machine/i386/
* memchr.S: (memchr) Target-specific size-optimized version
* memcmp.S: (memcmp) Likewise
* memcpy.S: (memcpy) Likewise
* memmove.S: (memmove) Likewise
* memset.S: (memset) Likewise
* setjmp.S: (setjmp) Likewise
* strchr.S: (strchr) Likewise
* strlen.S: (strlen) Likewise
newlib/libc/stdlib/
* srtold.c: (__flt_rounds) Disable for Intel MCU
Prototypes also added for initstate() and setstate() but they
were not implemented in the shared newlib code.
* newlib/libc/include/cygwin/stdlib.h: Prototypes added.
* winsup/cygwin/include/cygwin/stdlib.h: Prototypes removed.
* newlib/libc/stdlib/random.c: New file.
* newlib/libc/machine/epiphany/machine/stdlib.h: Removed
* newlib/libc/stdlib/Makefile.am: Added random.c.
* newlib/libc/stdlib/stdlib.tex: Added random.def.
* newlib/libc/stdlib/Makefile.in: Regenerated.
libgloss:
* arm/Makefile.in: Add newlib/libc/machine/arm to the include path if
newlib is present.
* arm/arm.h: Include acle-compat.h.
(THUMB_V7_V6M): Rename to ...
(PREFER_THUMB): This. Use ACLE macros __ARM_ARCH_ISA_ARM instead of
__ARM_ARCH_6M__ to decide whether to define it.
(THUMB1_ONLY): Define for Thumb-1 only targets.
(THUMB_V7M_V6M): Rename to ...
(THUMB_VXM): This. Defined based on __ARM_ARCH_ISA_ARM, excluding
ARMv7.
* arm/crt0.S: Use THUMB1_ONLY rather than __ARM_ARCH_6M__,
!__ARM_ARCH_ISA_ARM rather than THUMB_V7M_V6M for fp enabling, and
PREFER_THUMB rather than THUMB_V7_V6M. Rename other occurences of
THUMB_V7M_V6M to THUMB_VXM.
* arm/linux-crt0.c: Likewise.
* arm/redboot-crt0.S: Likewise.
* arm/swi.h: Likewise.
* arm/trap.S: Likewise.
newlib:
* libc/machine/arm/memcpy-stub.c: Use ACLE macros __ARM_ARCH_ISA_THUMB
and __ARM_ARCH_ISA_ARM to check for Thumb-2 only targets rather than
__ARM_ARCH and __ARM_ARCH_PROFILE.
* libc/machine/arm/memcpy.S: Likewise.
* libc/machine/arm/setjmp.S: Likewise for Thumb-1 only target and
include acle-compat.h.
* libc/machine/arm/strcmp.S: Likewise for Thumb-1 and Thumb-2 only
target and include acle-compat.h.
* libc/sys/arm/arm.h: Include acle-compat.h.
(THUMB_V7_V6M): Rename to ...
(PREFER_THUMB): This. Use ACLE macro __ARM_ARCH_ISA_ARM instead of
__ARM_ARCH_6M__ to decide whether to define it.
(THUMB1_ONLY): Define for Thumb-1 only targets.
(THUMB_V7M_V6M): Rename to ...
(THUMB_VXM): This. Defined based on __ARM_ARCH_ISA_ARM, excluding
ARMv7.
* libc/sys/arm/crt0.S: Use PREFER_THUMB rather than THUMB_V7_V6M and
rename THUMB_V7M_V6M into THUMB_VXM.
* libc/sys/arm/swi.h: Likewise.
GCC for ARC has been updated to provide consistent naming of preprocessor
definitions for different optional architecture features:
* __ARC_BARREL_SHIFTER__ instead of __Xbarrel_shifter for
-mbarrel-shifter
* __ARC_LL64__ instead of __LL64__ for -mll64
* __ARCEM__ instead of __EM__ for -mcpu=arcem
* __ARCHS__ instead of __HS__ for -mcpu=archs
* etc (not used in newlib)
This patch updates assembly routines for ARC to use new definitions instead
of a deprecated ones. To ensure compatibility with older compiler new
definitions are also defined in asm.h if needed, based on deprecated
preprocessor definitions.
*** newlib/ChangeLog ***
2015-12-15 Anton Kolesov <Anton.Kolesov@synopsys.com>
* libc/machine/arc/asm.h: Define new GCC definition for old compiler.
* libc/machine/arc/memcmp-bs-norm.S: Use new GCC defines to detect
processor features.
* libc/machine/arc/memcmp.S: Likewise.
* libc/machine/arc/memcpy-archs.S: Likewise.
* libc/machine/arc/memcpy-bs.S: Likewise.
* libc/machine/arc/memcpy.S: Likewise.
* libc/machine/arc/memset-archs.S: Likewise.
* libc/machine/arc/memset-bs.S: Likewise.
* libc/machine/arc/memset.S: Likewise.
* libc/machine/arc/setjmp.S: Likewise.
* libc/machine/arc/strchr-bs-norm.S: Likewise.
* libc/machine/arc/strchr-bs.S: Likewise.
* libc/machine/arc/strchr.S: Likewise.
* libc/machine/arc/strcmp-archs.S: Likewise.
* libc/machine/arc/strcmp.S: Likewise.
* libc/machine/arc/strcpy-bs-arc600.S: Likewise.
* libc/machine/arc/strcpy-bs.S: Likewise.
* libc/machine/arc/strcpy.S: Likewise.
* libc/machine/arc/strlen-bs-norm.S: Likewise.
* libc/machine/arc/strlen-bs.S: Likewise.
* libc/machine/arc/strlen.S: Likewise.
* libc/machine/arc/strncpy-bs.S: Likewise.
* libc/machine/arc/strncpy.S: Likewise.
Signed-off-by: Anton Kolesov <Anton.Kolesov@synopsys.com>
Reformulate the strcmp-armv7.S selection logic around the architecture
features required by the implementation code rather (some) version of
the architecture that expose those features.
The patch moves the inline ASM thumb2 -Os implementation out into its
own .S file.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
The patch moves the inline ASM thumb1 -O2 implementation out into its
own .S file.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
The patch adds strlen.S to contain the complementary preprocessor
logic to strlen-stub.c intended to provide #inclusion of alternative
.S implementations.
Initially we just include the existing strlen-armv7.S implementation.
We rewrite _ISA_ARMV7 in both strlen.S and strlen-stub.c to use the
underlying existing underlying defintion from arm_asm.h in order to
avoide including that file, this is in effect the first step towards a
move to ACLE predefines only.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
In order to maintain consistency both within machine/arm and between
machine/arm and machine/aarch64, rename the 'c' stub to -stub.c.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
This patch flattens the condition code selection used in strlen in an
attempt to make the guarding condition for each alternative
implementation clearer and to structure the logic in a manner that
makes it easier to maintain complementary logic between the
alternative 'C' and assembler implementations.
Tested by building newlib and comparing libc.a binaries before and
after for all permutations of:
Architectures:
armv4 armv4t armv5 armv5t armv5te armv6 armv6j armv6k
armv6z armv6kz armv6t2 armv6-m armv6s-m armv7 armv7-a
armv7ve armv7-r armv7-m armv7e-m armv8-a iwmmxt iwmmxt2
ISAs:
thumb arm
Optimization Levels:
Os O2
Excluding:
armv6s-m -mthumb
armv6-m -mthumb
armv6zk -mthumb
armv6z -mthumb
armv6k -mthumb
armv6j -mthumb
ARM newlib has various strcmp implementations that use .cfi_*
directives to generate unwind information.
The effect of this is that the generated objects contain .eh_frame
sections. However, ARM uses its own unwind info format, not
.eh_frame, which is generated by ARM-specific directives, not .cfi_*.
The .eh_frame sections are useless, but also not removed by strip and
may be loaded into memory at runtime.
This patch fixes this by using .cfi_sections .debug_frame (as in
glibc) so that the directives generate .debug_frame instead.
.debug_frame is useful for the debugger, can be removed by strip, and
is not loaded into memory at runtime.
* libc/machine/arm/strcmp-arm-tiny.S: Use .cfi_sections
.debug_frame.
* libc/machine/arm/strcmp-armv4.S: Likewise.
* libc/machine/arm/strcmp-armv4t.S: Likewise.
* libc/machine/arm/strcmp-armv6.S: Likewise.
* libc/machine/arm/strcmp-armv6m.S: Likewise.
* libc/machine/arm/strcmp-armv7.S: Likewise.
* libc/machine/arm/strcmp-armv7m.S: Likewise.
The patch cleans up the auto configury mechanism used to select
different implementations of memchr for various architecture versions.
The approach here is to remove the selection of memchr within automake
and instead use complimentary logic in memchr-stub.c and memchr.S to
choose between the gerneric memchr.c implementation or one of the
architecture specific implementations.
This patch also changes the selection criteria inline with the
previous proposal here:
https://sourceware.org/ml/newlib/2015/msg00752.html
but using the ACLE predefines.
Regressed for armv7-a armv5 armv8-a, correct selection of memcpy
implementation by manual inspection of a test program built for these
three architectures.
This patch cleans up the auto configury mechanism used to select
different implementations of memcpy for various architecture versions.
The approach here is to remove the selection of memcpy within automake
and instead use complimentary logic in memcpy-stub.c and memcpy.S to
choose between the generic memcpy.c implemenation or one of the
architecture specific memcpy*.S implemenations.
Regressed for armv7-a armv5 armv8-a, correct selection of memcpy
implementation by manual inspection of a test program built for these
three architectures.
This revised patch flips the remaining preprocessor logic in
memcpy-stub.c to use ACLE defines as requested in the previous review
and removes the now disused HAVE_ARMV7A and HAVE_ARMV8A configure.in
support.
The newlib configury logic that detects architecture version and
chooses an appropriate memcpy implementation does not consider
ARMv8-a.
This patch adds configury logic to detect ARMv8-a along with the
associated changes in Makefile.am and memcpy.
Hi!
I've got the situation, that the function strlen() occurs twice in libc.a
(building newlib for ARM-V7a and Size-Optimized).
In newlib/libc/machine/arm/strlen.c there are the pre-processor stetements ...
#if defined (__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED) || \
(defined (__thumb__) && !defined (__thumb2__))
/*...*/
#else
#if !(defined(_ISA_ARM_7) || defined(__ARM_ARCH_6T2__))
/*...*/
#endif
and in newlib/libc/machine/arm/strlen-armv7.S the "exclude" begins with
/* NOTE: This ifdef MUST match the ones in arm/strlen.c
We fallback to the one in arm/strlen.c for size optimised or
for older architectures. */
#if defined(_ISA_ARM_7) || defined(__ARM_ARCH_6T2__) && \
!(defined (__OPTIMIZE_SIZE__) || defined (PREFER_SIZE_OVER_SPEED) || \
(defined (__thumb__) && !defined (__thumb2__)))
But this is not completely contrary to arm/strlen.c (see above)!
To fix the logical statement in arm/strlen-armv7.S there are parentheses needed
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
This is an optimized memset for AArch64. Memset is split into 4 main
cases: small sets of up to 16 bytes, medium of 16..96 bytes which are
fully unrolled. Large memsets of more than 96 bytes align the
destination and use an unrolled loop processing 64 bytes per
iteration. Memsets of zero of more than 256 use the dc zva
instruction, and there are faster versions for the common ZVA sizes 64
or 128. STP of Q registers is used to reduce codesize without loss of
performance.
This is an optimized memset for AArch64. Memset is split into 4 main
cases: small sets of up to 16 bytes, medium of 16..96 bytes which are
fully unrolled. Large memsets of more than 96 bytes align the
destination and use an unrolled loop processing 64 bytes per
iteration. Memsets of zero of more than 256 use the dc zva
instruction, and there are faster versions for the common ZVA sizes 64
or 128. STP of Q registers is used to reduce codesize without loss of
performance.
This is an optimized memcpy for AArch64. Copies are split into 3 main
cases: small copies of up to 16 bytes, medium copies of 17..96 bytes
which are fully unrolled. Large copies of more than 96 bytes align
the destination and use an unrolled loop processing 64 bytes per
iteration. In order to share code with memmove, small and medium
copies read all data before writing, allowing any kind of overlap. On
a random copy test memcpy is 40.8% faster on A57 and 28.4% on A53.
This is an optimized memmove for AArch64. All copies of up to 96
bytes and all backward copies are done by the new memcpy. The only
remaining case is large forward copies which are done in the same way
as the memcpy loop, but copying from the end rather than the start.
improvements. Adjust to allow building as stpcpy.
* libc/machine/aarch64/stpcpy.S: New file.
* libc/machine/aarch64/stpcpy-stub.c: New file.
* libc/machine/aarch64/Makefile.am (lib_a_SOURCES): Build stpcpy.
* libc/machine/aarch64/Makefile.in: Regenerated.
* libc/machine/aarch64/strrchr-stub.c: New file.
* libc/machine/aarch64/Makefile.am: Add them to build list.
* libc/machine/aarch64/Makefile.in: Regenerated.
from the 64-bit _JBTYPE definition.
* libc/machine/mips/setjmp.S: Re-work the o32 FP64 support to match
the now one-and-only supported o32 FP64 ABI extension. Also
support o32 FPXX.