This patch significantly improves performance of memmem using a novel
modified Horspool algorithm. Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.
By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.
Small needles up to size 2 use a dedicated linear search. Very long needles
use the Two-Way algorithm (to avoid increasing stack size inlining is now disabled).
The performance gain is 6.6 times on English text on AArch64 using random
needles with average size 8 (this is even faster than the recently improved strstr
algorithm, so I'll update that in the near future).
The size-optimized memmem has also been rewritten from scratch to get a
2.7x performance gain.
Tested against GLIBC testsuite and randomized tests.
Message-Id: <DB5PR08MB1030649D051FA8532A4512C883B20@DB5PR08MB1030.eurprd08.prod.outlook.com>
v3: Add support for read ahead using strnlen, giving an additional 25% speedup
on large inputs (both short and long needles).
This patch significantly improves performance of strstr by using Sunday's
Quick-Search algorithm. Due to its simplicity it has the best average
performance of string matching algorithms on almost all inputs. It uses a
bad-character shift table to skip past mismatches.
The needle length is limited to 254 - this reduces the shift table memory
4 to 8 times, lowering preprocessing overhead and minimizing cache effects.
The limit also implies its worst-case performance is linear.
Larger needles are processed by the Two-Way algorithm. The macro AVAILABLE
has been improved to use strnlen to read the input in chunks. This results
in a 2.5 times speedup for large needles, reducing the performance drop when
the Quick-Search algorithm can't be used.
The code for 1-4 byte needles has been simplified and now uses unsigned
char. Since the optimized code relies on 8-bit chars, we defer to the
size-optimized implementation if CHAR_BIT > 8.
The performance gain of finding a set of randomly chosen words of size 8 in
256 bytes of English text is 14 times on AArch64. For longer haystacks the
gain is well over 20 times.
The size-optimized strstr has also been rewritten from scratch to improve
performance. On the same test the performance gain is 69%.
Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
--
Improve strstr performance for the common case of short needles. For a single
character strchr is best, for 2-4 characters a small loop is fastest. For these
the speedup over the Two-Way algorithm is ~10 times on large strings.
Newlib builds, the new code passes GLIBC testsuite. OK for commit?
This macro selects a compiler option that disables recognition of
common memset/memcpy patterns and converting those to direct
memset/memcpy calls.
Signed-off-by: Keith Packard <keithp@keithp.com>
Scripts do not try to acquire Unicode data by best-effort magic anymore.
Options supported:
-h for help
-i to copy Unicode data from /usr/share/unicode/ucd first
-u to download Unicode data from unicode.org first
If (despite of -i or -u if given) the necessary Unicode files are not
available locally, table generation is skipped, but no error code is
returned, so not to obstruct the build process if called from a Makefile.
Discard QUICKREF sections, rather than writing them to stderr
Discard MATHREF sections, rather than discarding as an error
Pass NOTES sections through to texinfo, rather than discarding as an error
Don't redirect makedoc stderr to .ref file
Remove makedoc output on error
Remove .ref files from CLEANFILES
Regenerate Makefile.ins
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Use memset() to implement bzero() to profit from machine-specific
memset() optimizations.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
A few files were missing headers for memset/malloc, likely missed
because the files don't directly call the functions, rather they
come in via macros in libc/include/sys/reent.h:
#define _REENT_CHECK(var, what, type, size, init) do { \
struct _reent *_r = (var); \
if (_r->what == NULL) { \
_r->what = (type)malloc(size); \
#define _REENT_CHECK_ASCTIME_BUF(var) \
_REENT_CHECK(var, _asctime_buf, char *, _REENT_ASCTIME_SIZE, \
memset((var)->_asctime_buf, 0, _REENT_ASCTIME_SIZE))
Without these fixes, implicit function signatures are provided,
which gcc warns could cause aliasing issues down the line:
../../../../../../../newlib-2.5.0/newlib/libc/time/asctime.c:62:3: warning: type of 'memset' does not match original declaration [-Wlto-type-mismatch]
/Volumes/code/external/newlib-cygwin/newlib/libc/include/string.h:29:7: note: return value type mismatch
_PTR _EXFUN(memset,(_PTR, int, size_t));
^
/Volumes/code/external/newlib-cygwin/newlib/libc/include/string.h:29:7: note: 'memset' was previously declared here
/Volumes/code/external/newlib-cygwin/newlib/libc/include/string.h:29:7: note: code may be misoptimized unless -fno-strict-aliasing is used
../../../../../../../newlib-2.5.0/newlib/libc/time/asctime.c:62:3: warning: type of 'malloc' does not match original declaration [-Wlto-type-mismatch]
/Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: return value type mismatch
extern _PTR malloc _PARAMS ((size_t));
^
/Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: 'malloc' was previously declared here
/Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: code may be misoptimized unless -fno-strict-aliasing is used
../../../../../../../newlib-2.5.0/newlib/libc/time/lcltime.c:58:3: warning: type of 'malloc' does not match original declaration [-Wlto-type-mismatch]
/Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: return value type mismatch
extern _PTR malloc _PARAMS ((size_t));
^
/Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: 'malloc' was previously declared here
/Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: code may be misoptimized unless -fno-strict-aliasing is used
../../../../../../../newlib-2.5.0/newlib/libc/string/strsignal.c:70:3: warning: type of 'malloc' does not match original declaration [-Wlto-type-mismatch]
/Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: return value type mismatch
extern _PTR malloc _PARAMS ((size_t));
^
/Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: 'malloc' was previously declared here
/Volumes/code/external/newlib-cygwin/newlib/libc/include/malloc.h:37:13: note: code may be misoptimized unless -fno-strict-aliasing is used
Including the proper headers elminates the implicit function
signatures and these warnings.
Now that __locale_cjk_lang is an inline function in setlocale.h and
setlocale.h is included, the declaration doesn't make sense.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
This function is used by LibreSSL and OpenSSH and is provided by the
OpenBSD libc.
* libc/include/string.h (timingsafe_memcmp): Declare.
* libc/string/timingsafe_memcmp.c: New file.
* libc/string/Makefile.am: Add new file.
* libc/string/Makefile.in: Regenerate.
This function is used by LibreSSL and OpenSSH and is provided by the
OpenBSD libc.
* libc/include/string.h (timingsafe_bcmp): Declare.
* libc/string/timingsafe_bcmp.c: New file.
* libc/string/Makefile.am: Add new file.
* libc/string/Makefile.in: Regenerate.
This function is used by LibreSSL and OpenSSH and is provided by the
OpenBSD libc.
* libc/include/string.h (explicit_bzero): Declare.
* libc/string/explicit_bzero.c: New file.
* libc/string/Makefile.am: Add new file.
* libc/string/Makefile.in: Regenerate.
These source files have makedoc markup, but aren't listed to be chewed by
makedoc. I am assuming that is accidental.
Future work: Note that stdio/fseeko.c, stdio/ftello.c and common/s_isnand.c have
makedoc markup, but duplicate stdio/fseek.c, stdio/ftell.c and common/s_isnan.c
respectively.
2015-06-23 Jon Turney <jon.turney@dronecode.org.uk>
* libc/ctype/Makefile.am (CHEWOUT_FILES): Add isblank.def.
* libc/ctype/ctype.tex: Include isblank and add to menu.
* libc/posix/Makefile.am (CHEWOUT_FILES): Add posix_spawn.def.
* libc/posix/posix.tex: Include posix_spawn and add to menu.
* libc/stdio64/Makefile.am (CHEWOUT_FILES): Add fdopen.def.
* libc/stdio64/stdio64.tex: Include fdopen64 and add to menu.
* libc/stdio64/fdopen64.c: Improve one-line description.
* libc/string/Makefile.am (CHEWOUT_FILES): Add strchrnul.def.
* libc/string/strings.tex: Include strchrnul and add to menu.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>