175 Commits

Author SHA1 Message Date
Thomas Wolff 204ee3cf6a fix and amend scripts and makefile rules to generate Unicode data 2021-07-06 15:35:37 +02:00
Thomas Wolff 11fdae24b7 update to Unicode 13.0 2021-07-06 15:35:37 +02:00
Corinna Vinschen 80bd01ef83 Add build mechanism to share common header files between machines
So far the build mechanism in newlib only allowed to either define
machine-specific headers, or headers shared between all machines.
In some cases, architectures are sufficiently alike to share header
files between them, but not with other architectures.  A good example
is ix86 vs. x86_64, which share certain traits with each other, but
not with other architectures.

Introduce a new configure variable called "shared_machine_dir".  This
dir can then be used for headers shared between architectures.

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2021-04-13 12:55:33 +02:00
Thomas Wolff c8204b1069 Locale modifier "@cjksingle" to enforce single-width CJK width.
This option follows a proposal in the Terminals Working Group Specifications
(https://gitlab.freedesktop.org/terminal-wg/specifications/issues/9#note_406682).
It makes locale width consistent with the corresponding mintty feature.
2020-02-18 11:35:42 +01:00
Giacomo Tesio 6aaaa2e768 memmem.c and strstr.c: do not require -std=c99 2019-08-14 10:39:37 +02:00
Thomas Wolff 41397e13ce update to Unicode 11.0 2019-01-13 23:33:51 +01:00
Wilco Dijkstra 353ebae304 Improve performance of memmem
This patch significantly improves performance of memmem using a novel
modified Horspool algorithm.  Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.

By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.

Small needles up to size 2 use a dedicated linear search.  Very long needles
use the Two-Way algorithm (to avoid increasing stack size, inlining is now disabled).

The performance gain is 6.6 times on English text on AArch64 using random
needles with average size 8 (this is even faster than the recently improved strstr
algorithm, so I'll update that in the near future).

The size-optimized memmem has also been rewritten from scratch to get a
2.7x performance gain.

Tested against GLIBC testsuite and randomized tests.

Message-Id: <DB5PR08MB1030649D051FA8532A4512C883B20@DB5PR08MB1030.eurprd08.prod.outlook.com>
2019-01-01 09:44:59 -06:00
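
The bad-character skipping described in the memmem commit above can be illustrated with a minimal sketch. It implements the classic single-byte Horspool search, not the hashed-pair variant of the patch, and the name horspool_memmem and the naive fallback for long needles are assumptions made for illustration only. It does show why capping the needle length lets each shift-table entry fit in 8 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: classic Horspool search, NOT newlib's memmem. */
void *horspool_memmem(const void *haystack, size_t hlen,
                      const void *needle, size_t nlen)
{
    const unsigned char *h = haystack;
    const unsigned char *n = needle;
    uint8_t shift[256];   /* 8-bit entries suffice because nlen <= 255 here */
    size_t i;

    if (nlen == 0)
        return (void *)h;
    if (nlen > hlen)
        return NULL;

    if (nlen > 255) {
        /* Assumption: a plain scan stands in for the long-needle path. */
        for (i = 0; i + nlen <= hlen; i++)
            if (memcmp(h + i, n, nlen) == 0)
                return (void *)(h + i);
        return NULL;
    }

    /* A byte that does not occur in the needle allows a full-length skip. */
    memset(shift, (int)nlen, sizeof shift);
    /* Bytes in the needle (except its last byte) allow a shorter skip. */
    for (i = 0; i + 1 < nlen; i++)
        shift[n[i]] = (uint8_t)(nlen - 1 - i);

    /* Skip according to the haystack byte aligned with the needle's end. */
    for (i = 0; i + nlen <= hlen; i += shift[h[i + nlen - 1]])
        if (memcmp(h + i, n, nlen) == 0)
            return (void *)(h + i);

    return NULL;
}
```

Every shift is at least 1, so the loop always makes progress.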
Wilco Dijkstra 473f1a3a5d Improve performance of strstr
v3: Add support for read ahead using strnlen, giving an additional 25% speedup
on large inputs (both short and long needles).

This patch significantly improves performance of strstr by using Sunday's
Quick-Search algorithm.  Due to its simplicity it has the best average
performance of string matching algorithms on almost all inputs.  It uses a
bad-character shift table to skip past mismatches.

The needle length is limited to 254 - this reduces the shift table memory
by a factor of 4 to 8, lowering preprocessing overhead and minimizing cache effects.
The limit also implies its worst-case performance is linear.

Larger needles are processed by the Two-Way algorithm.  The macro AVAILABLE
has been improved to use strnlen to read the input in chunks.  This results
in a 2.5 times speedup for large needles, reducing the performance drop when
the Quick-Search algorithm can't be used.

The code for 1-4 byte needles has been simplified and now uses unsigned
char.  Since the optimized code relies on 8-bit chars, we defer to the
size-optimized implementation if CHAR_BIT > 8.

The performance gain of finding a set of randomly chosen words of size 8 in
256 bytes of English text is 14 times on AArch64. For longer haystacks the
gain is well over 20 times.

The size-optimized strstr has also been rewritten from scratch to improve
performance.  On the same test the performance gain is 69%.

Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).

2018-10-18 19:51:39 +02:00
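
As a rough illustration of the Quick-Search idea in the strstr commit above, the following sketch shifts by the character just past the current window. It is a simplified sketch under stated assumptions, not the committed code: the name quicksearch_strstr is hypothetical, it calls strlen on the whole haystack up front instead of reading ahead in chunks with strnlen, and it hands needles longer than 254 bytes to the C library's strstr rather than to a Two-Way implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: Sunday's Quick-Search, simplified for illustration. */
char *quicksearch_strstr(const char *haystack, const char *needle)
{
    const unsigned char *h = (const unsigned char *)haystack;
    const unsigned char *n = (const unsigned char *)needle;
    size_t nlen = strlen(needle);
    size_t hlen = strlen(haystack);
    uint8_t shift[256];   /* nlen <= 254 keeps every shift within 8 bits */
    size_t i;

    if (nlen == 0)
        return (char *)haystack;
    if (nlen > hlen)
        return NULL;
    if (nlen > 254)
        return strstr(haystack, needle);   /* assumption: defer long needles */

    /* A character absent from the needle lets the window jump past it. */
    memset(shift, (int)(nlen + 1), sizeof shift);
    /* Otherwise align the rightmost occurrence in the needle with it. */
    for (i = 0; i < nlen; i++)
        shift[n[i]] = (uint8_t)(nlen - i);

    /* h[i + nlen] is at worst the terminating NUL, so the read is in bounds. */
    for (i = 0; i + nlen <= hlen; i += shift[h[i + nlen]])
        if (memcmp(h + i, n, nlen) == 0)
            return (char *)(h + i);

    return NULL;
}
```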
Wilco Dijkstra 6dbb20dfc7 Improve strstr performance of short needles
Improve strstr performance for the common case of short needles.  For a single
character, strchr is best; for 2-4 characters, a small loop is fastest.  For these,
the speedup over the Two-Way algorithm is ~10 times on large strings.

Newlib builds, the new code passes GLIBC testsuite. OK for commit?
2018-09-05 10:09:31 +02:00
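
The short-needle dispatch described above might look roughly like the sketch below. The names short_strstr and strstr2 are hypothetical, only the 1- and 2-character cases are written out, and longer needles are simply passed to the C library's strstr. It is meant to show the shape of the approach, not the committed code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: match a 2-byte needle with a rolling 16-bit window. */
static char *strstr2(const unsigned char *hs, const unsigned char *ne)
{
    unsigned int want = (unsigned int)ne[0] << 8 | ne[1];
    unsigned int have = 0;

    for (; *hs != '\0'; hs++) {
        /* Keep only the last two bytes seen so far. */
        have = (have << 8 | *hs) & 0xffffu;
        if (have == want)
            return (char *)(hs - 1);   /* hs points at the second byte */
    }
    return NULL;
}

/* Hypothetical dispatcher for short needles. */
char *short_strstr(const char *hs, const char *ne)
{
    if (ne[0] == '\0')
        return (char *)hs;                 /* empty needle matches at start */
    if (ne[1] == '\0')
        return strchr(hs, ne[0]);          /* single character: use strchr */
    if (ne[2] == '\0')
        return strstr2((const unsigned char *)hs,
                       (const unsigned char *)ne);
    return strstr(hs, ne);                 /* assumption: defer longer needles */
}
```

Because a 2-byte needle never starts with NUL, the zero-initialized window cannot produce a false match on the first haystack byte.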
Keith Packard 82dfae9ab0 Use __inhibit_loop_to_libcall in all memset/memcpy implementations
This macro selects a compiler option that disables recognition of
common memset/memcpy patterns and converting those to direct
memset/memcpy calls.

Signed-off-by: Keith Packard <keithp@keithp.com>
2018-08-29 16:05:37 +02:00
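
A minimal sketch of how such a macro can be applied, assuming the option in question is GCC's -fno-tree-loop-distribute-patterns. The macro name NO_LOOP_TO_LIBCALL and the function my_memset are hypothetical stand-ins, and on compilers other than GCC the macro expands to nothing here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a macro like __inhibit_loop_to_libcall. */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_LOOP_TO_LIBCALL \
    __attribute__((__optimize__("-fno-tree-loop-distribute-patterns")))
#else
# define NO_LOOP_TO_LIBCALL
#endif

/* Without the attribute, an optimizer may recognize the loop below as a
   memset pattern and replace it with a call to memset, which would make a
   memset implementation written this way call itself. */
NO_LOOP_TO_LIBCALL
void *my_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;

    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}
```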
Thomas Wolff 44d90834fb fix/enhance Unicode table generation scripts
Scripts do not try to acquire Unicode data by best-effort magic anymore.
Options supported:
-h for help
-i to copy Unicode data from /usr/share/unicode/ucd first
-u to download Unicode data from unicode.org first
If (despite -i or -u, if given) the necessary Unicode files are not
available locally, table generation is skipped, but no error code is
returned, so as not to obstruct the build process if called from a Makefile.
2018-03-14 10:44:32 +01:00
Thomas Wolff 37132125bc width data generation 2018-03-12 10:17:20 +01:00
Thomas Wolff 8e8fd6c849 use generated width data 2018-03-12 10:17:20 +01:00
Thomas Wolff 71291047e2 generated width data, Unicode 10.0
These tables provide character width properties for use by the
wcwidth/wcswidth functions. They are generated from Unicode.
2018-03-12 10:17:20 +01:00
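
To illustrate how a width function can consume such tables, here is a small sketch that binary-searches sorted code point intervals. The table contents are a tiny hand-picked excerpt (three well-known Unicode blocks), not the generated data, and the names interval, in_table and sketch_wcwidth are hypothetical. Control characters and other special cases are ignored.

```c
#include <stddef.h>

/* Hypothetical table layout: sorted, non-overlapping code point ranges. */
struct interval { unsigned long first, last; };

static const struct interval combining[] = {
    { 0x0300, 0x036F },   /* Combining Diacritical Marks */
};

static const struct interval wide[] = {
    { 0x4E00, 0x9FFF },   /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3 },   /* Hangul Syllables */
};

/* Binary search: return 1 if c falls inside any interval of the table. */
static int in_table(unsigned long c, const struct interval *t, size_t n)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (c > t[mid].last)
            lo = mid + 1;
        else if (c < t[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

/* Simplified width lookup: 0 for combining, 2 for wide, 1 otherwise. */
int sketch_wcwidth(unsigned long c)
{
    if (in_table(c, combining, sizeof combining / sizeof combining[0]))
        return 0;
    if (in_table(c, wide, sizeof wide / sizeof wide[0]))
        return 2;
    return 1;
}
```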
Yaakov Selkowitz 70ee6b17df ansification: remove _EXFUN, _EXFUN_NOTHROW
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:29 -06:00
Yaakov Selkowitz 9087163804 ansification: remove _DEFUN
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:26 -06:00
Yaakov Selkowitz 670b01da7f ansification: remove _CAST_VOID
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:17 -06:00
Yaakov Selkowitz e6321aa6a6 ansification: remove _PTR
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:16 -06:00
Yaakov Selkowitz eea249da3b ansification: remove _PARAMS
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:13 -06:00
Yaakov Selkowitz 0bda30e1ff ansification: remove _CONST
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:08 -06:00
Yaakov Selkowitz 6783860a2e ansification: remove _AND
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:05 -06:00
Jon Turney c006fd459f makedoc: make errors visible
Discard QUICKREF sections, rather than writing them to stderr
Discard MATHREF sections, rather than discarding as an error
Pass NOTES sections through to texinfo, rather than discarding as an error
Don't redirect makedoc stderr to .ref file
Remove makedoc output on error
Remove .ref files from CLEANFILES
Regenerate Makefile.ins

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
2017-12-07 11:54:11 +00:00
Yaakov Selkowitz 352c8f2f0d string: remove TRAD_SYNOPSIS
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-12-01 03:41:52 -06:00
Yaakov Selkowitz 8a94bca694 string: add wmempcpy
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-11-30 04:06:49 -06:00
Sebastian Huber 1592a0be0c Fix warnings and documentation in strnstr.c
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
2017-09-19 15:35:09 -05:00
Corinna Vinschen 192986ab03 newlib: string/Makefile.am (CHEWOUT_FILES): Add strnstr.def
Regenerate string/Makefile.in

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2017-08-30 16:48:55 +02:00
Corinna Vinschen 5fc315b597 newlib: strnstr: drop traditional synopsis
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2017-08-30 16:45:36 +02:00
Sichen Zhao 42885ea4b8 Add man page entry for strnstr.c. 2017-08-30 15:10:07 +02:00
Sichen Zhao f22054c94d Modify strnstr.c. 2017-08-30 15:08:58 +02:00
Corinna Vinschen c070326d31 newlib: rebuild string/Makefile.in
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2017-08-25 18:00:46 +02:00
Sichen Zhao c206d04422 Port strnstr.c to newlib. 2017-08-25 18:00:46 +02:00
Sichen Zhao 3437665ac8 Import strnstr.c from FreeBSD. 2017-08-25 18:00:46 +02:00
Sebastian Huber 461152e4eb Add ffsl(), ffsll(), fls(), flsl(), flsll()
Use compiler builtin for ffs().  Remove duplicate implementation from
Cygwin.

Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
2017-07-05 13:49:48 +02:00
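
A sketch of how the find-first-set and find-last-set functions can be built on GCC/Clang builtins. The sketch_ prefix marks these as illustrative stand-ins, not the newlib sources, and the use of __builtin_clz for fls is an assumption, since the commit only states that ffs() uses a compiler builtin.

```c
#include <limits.h>

/* 1-based index of the least significant set bit, or 0 if no bit is set. */
int sketch_ffs(int i)
{
    return __builtin_ffs(i);
}

int sketch_ffsl(long i)
{
    return __builtin_ffsl(i);
}

/* 1-based index of the most significant set bit, or 0 if no bit is set.
   __builtin_clz is undefined for 0, so that case is handled explicitly. */
int sketch_fls(int i)
{
    if (i == 0)
        return 0;
    return (int)(sizeof(i) * CHAR_BIT) - __builtin_clz((unsigned int)i);
}
```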
Sebastian Huber d736941a51 Implement bzero() via memset()
Use memset() to implement bzero() to profit from machine-specific
memset() optimizations.

Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
2017-07-05 13:49:48 +02:00
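
The idea of the bzero() commit above fits in one line. The name sketch_bzero is hypothetical, chosen to avoid clashing with the library's own bzero().

```c
#include <stddef.h>
#include <string.h>

/* Zero a buffer by delegating to memset(), so that any machine-specific
   memset() optimization is reused automatically. */
void sketch_bzero(void *b, size_t length)
{
    memset(b, 0, length);
}
```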
Yaakov Selkowitz ec86124748 string: fix strverscmp doc inclusion
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-06-19 11:52:02 -05:00
Yaakov Selkowitz 59e09b6419 string: add strverscmp
The actual implementation is from musl (MIT license).

Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-06-19 08:16:42 -05:00
Joel Sherrill 33c7b2b544 libc/string/strsignal.c: Use of || instead of && led to dead code.
Coverity Id: 175333
2017-03-15 12:04:34 -05:00
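
The class of bug named in the strsignal.c commit above can be shown with a hypothetical range check. This is not the actual strsignal.c code, and the function names and bounds are made up for illustration. With lo <= hi, the || form is true for every input, so whatever branch handles out-of-range values can never run.

```c
/* Buggy: for lo <= hi, every value satisfies at least one of the two
   comparisons, so this always returns 1. */
int in_range_buggy(int sig, int lo, int hi)
{
    return sig >= lo || sig <= hi;
}

/* Fixed: both bounds must hold at the same time. */
int in_range_fixed(int sig, int lo, int hi)
{
    return sig >= lo && sig <= hi;
}
```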
Pat Pannuto e02866a1b4 Add missing headers to fix implicit function defns
A few files were missing headers for memset/malloc, likely missed
because the files don't directly call the functions, rather they
come in via macros in libc/include/sys/reent.h:

    #define _REENT_CHECK(var, what, type, size, init) do { \
      struct _reent *_r = (var); \
      if (_r->what == NULL) { \
        _r->what = (type)malloc(size); \

    #define _REENT_CHECK_ASCTIME_BUF(var) \
      _REENT_CHECK(var, _asctime_buf, char *, _REENT_ASCTIME_SIZE, \
        memset((var)->_asctime_buf, 0, _REENT_ASCTIME_SIZE))

Without these fixes, implicit function signatures are provided,
which gcc warns could cause aliasing issues down the line:

    ../../../../../../../newlib-2.5.0/newlib/libc/time/asctime.c:62:3: warning: type of 'memset' does not match original declaration [-Wlto-type-mismatch]
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/string.h:29:7: note: return value type mismatch
     _PTR  _EXFUN(memset,(_PTR, int, size_t));
           ^
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/string.h:29:7: note: 'memset' was previously declared here
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/string.h:29:7: note: code may be misoptimized unless -fno-strict-aliasing is used
    ../../../../../../../newlib-2.5.0/newlib/libc/time/asctime.c:62:3: warning: type of 'malloc' does not match original declaration [-Wlto-type-mismatch]
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: return value type mismatch
     extern _PTR malloc _PARAMS ((size_t));
                 ^
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: 'malloc' was previously declared here
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: code may be misoptimized unless -fno-strict-aliasing is used

    ../../../../../../../newlib-2.5.0/newlib/libc/time/lcltime.c:58:3: warning: type of 'malloc' does not match original declaration [-Wlto-type-mismatch]
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: return value type mismatch
     extern _PTR malloc _PARAMS ((size_t));
                 ^
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: 'malloc' was previously declared here
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: code may be misoptimized unless -fno-strict-aliasing is used

    ../../../../../../../newlib-2.5.0/newlib/libc/string/strsignal.c:70:3: warning: type of 'malloc' does not match original declaration [-Wlto-type-mismatch]
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: return value type mismatch
     extern _PTR malloc _PARAMS ((size_t));
                 ^
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: 'malloc' was previously declared here
    /Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: code may be misoptimized unless -fno-strict-aliasing is used

Including the proper headers eliminates the implicit function
signatures and these warnings.
2017-01-16 10:14:28 +01:00
Jeff Johnston 61f181d6b8 Bump release to 2.5.0 for yearly snapshot. 2016-12-22 21:33:54 -05:00
Corinna Vinschen 091a0ac120 Fix typo in strerror doc
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2016-08-31 13:43:19 +02:00
Eric Blake ec0117b6e1 strerror_l: Fix copy-and-paste typo
Signed-off-by: Eric Blake <eblake@redhat.com>
2016-08-23 14:36:06 -05:00
Corinna Vinschen 7501249d5e Mention strerror_l in libc/string/strings.tex
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2016-08-23 18:15:55 +02:00
Corinna Vinschen 463a8afaa5 Implement missing POSIX-1.2008 function strerror_l
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2016-08-23 17:51:14 +02:00
Corinna Vinschen 5085ce2106 Remove extern declaration of __locale_cjk_lang in string/local.h
Now that __locale_cjk_lang is an inline function in setlocale.h and
setlocale.h is included, the declaration doesn't make sense.

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2016-08-21 12:14:32 +02:00
Corinna Vinschen c1b7d9d93d Implement per-locale string functions
strcasecmp_l, strcoll_l, strncasecmp_l, strxfrm_l,
wcscasecmp_l, wcscoll_l, wcsncasecmp_l, wcsxfrm_l,
strftime_l.

Add missing CHEWOUT_FILES from previous patch.

TODO: strfmon_l.

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
2016-08-15 10:56:58 +02:00
Jon Turney d7e47a557e Regenerate newlib Makefiles 2016-07-04 17:13:55 +01:00
Sebastian Huber 55c239d834 Add timingsafe_memcmp()
This function is used by LibreSSL and OpenSSH and is provided by the
OpenBSD libc.

	* libc/include/string.h (timingsafe_memcmp): Declare.
	* libc/string/timingsafe_memcmp.c: New file.
	* libc/string/Makefile.am: Add new file.
	* libc/string/Makefile.in: Regenerate.
2016-03-18 12:33:40 +01:00
Sebastian Huber 559fd77dda Add timingsafe_bcmp()
This function is used by LibreSSL and OpenSSH and is provided by the
OpenBSD libc.

	* libc/include/string.h (timingsafe_bcmp): Declare.
	* libc/string/timingsafe_bcmp.c: New file.
	* libc/string/Makefile.am: Add new file.
	* libc/string/Makefile.in: Regenerate.
2016-03-18 12:33:40 +01:00
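
A timing-safe comparison avoids returning early at the first differing byte, so its running time depends only on the length. The sketch below shows the usual approach for the equality-only variant. The name sketch_timingsafe_bcmp is hypothetical, and this is an illustration of the technique, not a copy of the imported source.

```c
#include <stddef.h>

/* Returns 0 if the two buffers are equal and 1 otherwise.  Every byte is
   always examined; differences are accumulated with OR instead of
   branching on them. */
int sketch_timingsafe_bcmp(const void *b1, const void *b2, size_t n)
{
    const unsigned char *p1 = b1;
    const unsigned char *p2 = b2;
    unsigned int diff = 0;

    while (n--)
        diff |= (unsigned int)(*p1++ ^ *p2++);

    return diff != 0;
}
```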
Sebastian Huber 8740fa7fd0 Add explicit_bzero()
This function is used by LibreSSL and OpenSSH and is provided by the
OpenBSD libc.

	* libc/include/string.h (explicit_bzero): Declare.
	* libc/string/explicit_bzero.c: New file.
	* libc/string/Makefile.am: Add new file.
	* libc/string/Makefile.in: Regenerate.
2016-03-18 12:33:40 +01:00
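
The point of explicit_bzero() is that the compiler must not drop the zeroing as a dead store when the buffer is not read afterwards. One common way to get that effect is to write through a volatile pointer, as sketched below. This is an assumption about technique, not necessarily how the committed file does it, and the name sketch_explicit_bzero is hypothetical.

```c
#include <stddef.h>

/* Zero a buffer with volatile stores, which the compiler has to perform
   even if the buffer is never read again. */
void sketch_explicit_bzero(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    while (len--)
        *p++ = 0;
}
```

A typical use is wiping a key or password buffer just before it goes out of scope.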
Yaakov Selkowitz 8b8952064c Fix compile with GCC 5 -Werror
newlib/libc/
	* stdio64/freopen64.c: Include <string.h> for memset().
	* stdlib/quick_exit.c: Include <unistd.h> for _exit().
	* string/gnu_basename.c (__gnu_basename): Fix discarded const
	qualifier warning.
	* stdlib/strtold.c: Include "mprec.h" for _strtorx_r().

Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2016-02-12 10:16:06 -06:00