Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place. For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.
This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp in glibc.
This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.
The mutually misaligned inputs on aarch64 are compared with a simple
byte copy, which is not very efficient. Enhance the comparison
similar to strcmp by loading a double-word at a time. The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark in glibc is as follows:
falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)
All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.
- From: Cesar Philippidis <cesar@codesourcery.com>
Date: Tue, 10 Apr 2018 14:43:42 -0700
Subject: [PATCH] nvptx port
This port adds support for Nvidia GPU's, which are primarily used as
offload accelerators in OpenACC and OpenMP.
At least with Binutils 2.30 and GCC 7.3 we need symbol definitions
without the leading underscore.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Discard QUICKREF sections, rather than writing them to stderr
Discard MATHREF sections, rather than discarding as an error
Pass NOTES sections through to texinfo, rather than discarding as an error
Don't redirect makedoc stderr to .ref file
Remove makedoc output on error
Remove .ref files from CLEANFILES
Regenerate Makefile.ins
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
- For prevent confuse about what BSD license variant we used, 2- or
3-clause license, we change the license to FreeBSD license to make
it unambiguously refers to the 2-clause license.
With this change the arm platform can now be fully compiled with Clang.
Tested by comparing the output with GCC 4.8.2, and Clang 4.0, using a
variety of arches, big/little endianness, and arm/thumb mode to verify
the generated assembly output matches between GCC vs Clang with UAL, and
also GCC with UAL vs GCC with non-UAL, for all preprocessor code blocks.
The only difference found is an extra nop at the end of the function
when compiled with GCC using armv7-a/thumb/little-endian/-O2 compared to
Clang. The nop is not emitted when compiled in big-endian mode.
The implementation of the POSIX access() function is nothing machine
specific like memcpy(), etc. Move it back to the system domain. This
avoids problems due to the include search order of the Newlib/GCC build
which picks up machine includes before system includes.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
In patch b219285f87 you have a syntax
error in the PLD instruction. The syntax for the pld argument should be
in square brackets as it's a memory address like so: pld [r1]. With
your patch the newlib build fails for armv7-a targets. This patch fixes
the build failures.
Tested by making sure the newlib build completes successfully.
2016-01-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* libc/machine/arm/strcpy.c (strcpy): Fix PLD assembly syntax.
* libc/machine/arm/strlen-stub.c (strlen): Likewise.
LTO can re-order top-level assembly blocks, which can cause this
macro definition to appear after its use (or not at all), causing
compilation failures. On modern toolchains (armv4t+), assembly
should write `bx lr` in all cases, and linkers will transparently
convert them to `mov pc, lr`, allowing us to simply remove the
macro.
(source: https://groups.google.com/forum/#!topic/comp.sys.arm/3l7fVGX-Wug
and verified empirically)
For the armv4.S file, preserve this macro to maximize backwards
compatibility.
LTO can re-order top-level assembly blocks, which can cause this
macro definition to appear after its use (or not at all), causing
compilation failures. As the macro has very few uses, simply removing
it by inlining is a simple fix.
n.b. one of the macro invocations in strlen-stub.c was already
guarded by the relevant #define, so it is simply converted directly
to a pld
In the case of memcpy-armv7m.S being built for a big-endian multilib
(including armv7 without a specific profile), realignment code made
assumptions about the byte ordering being little-endian.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
strcmp.S contained invalid guard for code that used barrel-shifter optional
instruction - it was checking for !ARC601 instead of whether barrel shifter
is present. While it is true that ARC601 doesn't have barrel shifter, so
does some other ARC EM configurations.
2016-07-21 Anton Kolesov <Anton.Kolesov@synopsys.com>
* libc/machine/arc/strcmp.S: Fix big endian without barrel shifter.
Signed-off-by: Anton Kolesov <Anton.Kolesov@synopsys.com>
Prealloc instruction may not be present in all HS variants. Hence, use
prefetch instead of prealloc.
newlib/
2016-04-26 Claudiu Zissulescu <claziss@synopsys.com>
* libc/machine/arc/memset-archs.S: Use prefetch.
Add makedocbook, a tool to process makedoc markup and output DocBook XML
refentries.
Process all the source files which are processed with makedoc with
makedocbook as well
Add chapter-texi2docbook, a tool to automatically generate DocBook XML
chapter files from the chapter .texi files. For generating man pages all we
care about is the content of the refentries, so all this needs to do is
convert the @include of the makedoc generated .def files to xi:include of
the makedocbook generated .xml files.
Add skeleton Docbook XML book files, lib[cm].in.xml which include these
generated chapters, which in turn include the generated files containing
refentries, which is processed with xsltproc to generate the lib[cm].xml
Add new make targets to generate and install man pages from lib[cm].xml
Add makedocbook, a tool to process makedoc markup and output DocBook XML
refentries.
Process all the source files which are processed with makedoc with
makedocbook as well
Add chapter-texi2docbook, a tool to automatically generate DocBook XML
chapter files from the chapter .texi files. For generating man pages all we
care about is the content of the refentries, so all this needs to do is
convert the @include of the makedoc generated .def files to xi:include of
the makedocbook generated .xml files.
Add skeleton Docbook XML book files, lib[cm].in.xml which include these
generated chapters, which in turn include the generated files containing
refentries, which is processed with xsltproc to generate the lib[cm].xml
Add new make targets to generate and install man pages from lib[cm].xml
Marcus Shawcroft wrote:
> This patch appears to have been munged by the mail system, can you
> repost as an attachment please.
Sure, I've attached the patch.
Wilco
Add a simple rawmemchr implementation. Use strlen for rawmemchr(s, '\0') as it is the
fastest way to search for '\0', and use memchr with an infinite size for other cases.
This is 3x faster for large sizes.
ChangeLog:
2016-04-22 Wilco Dijkstra <wdijkstr@arm.com>
* newlib/libc/machine/aarch64/Makefile.in: Add rawmemchr.S and
rawmemchr-stub.c.
* newlib/libc/machine/aarch64/Makefile.am: Likewise.
* newlib/libc/machine/aarch64/rawmemchr.S (rawmemchr): Add rawmemchr.
* newlib/libc/machine/aarch64/rawmemchr-stub.c (rawmemchr): Likewise.
2016-04-18 Thomas Preud'homme <thomas.preudhomme@arm.com>
* libc/machine/arm/strlen-stub.c: Check capabilities of architecture
to decide which Thumb implementation to use and fall back to C
implementation for architecture not supporting Thumb mode.
* libc/machine/arm/strlen.S: Likewise.
Introduce <machine/_endian.h> to let target based customization of
<machine/endian.h> via
* _LITTLE_ENDIAN,
* _BIG_ENDIAN,
* _PDP_ENDIAN, and
* _BYTE_ORDER.
defines. Add definitions expected by FreeBSD to
<machine/endian.h> like
* _QUAD_HIGHWORD,
* _QUAD_LOWWORD,
* __bswap16(),
* __bswap32(),
* __bswap64(),
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs().
Also, if __BSD_VISIBLE
* LITTLE_ENDIAN,
* BIG_ENDIAN,
* PDP_ENDIAN, and
* BYTE_ORDER.
Targets that define __machine_host_to_from_network_defined in
<machine/_endian.h> must provide their own implementation of
* __htonl(),
* __htons(),
* __ntohl(), and
* __ntohs(),
otherwise a default implementation is provided by <machine/endian.h>.
In case of GCC defines to builtins are used.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Newlib defines defaults for internal types via <sys/_types.h> and uses
<machine/_types.h> to let targets define their own type if necessary.
Previously for example
#ifndef __dev_t_defined
typedef short __dev_t;
#endif
However, the __*_t_defined pattern conflicts with the glibc type guard
pattern for user types, e.g. dev_t in this example. Introduce a
__machine_*_t_defined pattern for internal types (defined by
<machine/_types.h>, used by <sys/_types.h>). For example
#ifndef __machine_dev_t_defined
typedef short __dev_t;
#endif
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Intel MCU System V ABI are incompartible with i386 System V ABI:
o Minimum instruction set is Intel Pentium ISA minus x87 instructions
o No x87 or vector registers
o First three args are passed in %eax, %edx and %ecx
o Full specification available here:
https://github.com/hjl-tools/x86-psABI/wiki/iamcu-psABI-0.7.pdf
newlib/
* configure.host: Add new ix86-*-elfiamcu target
newlib/libc/include/
* setjmp.h: Change _JBLEN for Intel MCU target
newlib/libc/machine/i386/
* memchr.S: (memchr) Target-specific size-optimized version
* memcmp.S: (memcmp) Likewise
* memcpy.S: (memcpy) Likewise
* memmove.S: (memmove) Likewise
* memset.S: (memset) Likewise
* setjmp.S: (setjmp) Likewise
* strchr.S: (strchr) Likewise
* strlen.S: (strlen) Likewise
newlib/libc/stdlib/
* srtold.c: (__flt_rounds) Disable for Intel MCU
Prototypes also added for initstate() and setstate() but they
were not implemented in the shared newlib code.
* newlib/libc/include/cygwin/stdlib.h: Prototypes added.
* winsup/cygwin/include/cygwin/stdlib.h: Prototypes removed.
* newlib/libc/stdlib/random.c: New file.
* newlib/libc/machine/epiphany/machine/stdlib.h: Removed
* newlib/libc/stdlib/Makefile.am: Added random.c.
* newlib/libc/stdlib/stdlib.tex: Added random.def.
* newlib/libc/stdlib/Makefile.in: Regenerated.