Commit Graph

9 Commits

Author SHA1 Message Date
Giacomo Tesio 8d81fcc18e memmem.c and strstr.c: do not require -std=c99 2020-10-14 10:53:58 -04:00
Wilco Dijkstra 473f1a3a5d Improve performance of strstr
v3: Add support for read ahead using strnlen, giving an additional 25% speedup
on large inputs (both short and long needles).

This patch significantly improves performance of strstr by using Sunday's
Quick-Search algorithm.  Due to its simplicity it has the best average
performance of string matching algorithms on almost all inputs.  It uses a
bad-character shift table to skip past mismatches.

The needle length is limited to 254 - this reduces the shift table memory
4 to 8 times, lowering preprocessing overhead and minimizing cache effects.
The limit also implies its worst-case performance is linear.

Larger needles are processed by the Two-Way algorithm.  The macro AVAILABLE
has been improved to use strnlen to read the input in chunks.  This results
in a 2.5 times speedup for large needles, reducing the performance drop when
the Quick-Search algorithm can't be used.

The code for 1-4 byte needles has been simplified and now uses unsigned
char.  Since the optimized code relies on 8-bit chars, we defer to the
size-optimized implementation if CHAR_BIT > 8.

The performance gain of finding a set of randomly chosen words of size 8 in
256 bytes of English text is 14 times on AArch64. For longer haystacks the
gain is well over 20 times.

The size-optimized strstr has also been rewritten from scratch to improve
performance.  On the same test the performance gain is 69%.

Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).

--
2018-10-18 19:51:39 +02:00
Wilco Dijkstra 6dbb20dfc7 Improve strstr performance of short needles
Improve strstr performance for the common case of short needles.  For a single
character strchr is best, for 2-4 characters a small loop is fastest.  For these
the speedup over the Two-Way algorithm is ~10 times on large strings.

Newlib builds, the new code passes GLIBC testsuite. OK for commit?
2018-09-05 10:09:31 +02:00
Yaakov Selkowitz 9087163804 ansification: remove _DEFUN
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:26 -06:00
Yaakov Selkowitz 0bda30e1ff ansification: remove _CONST
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:08 -06:00
Yaakov Selkowitz 6783860a2e ansification: remove _AND
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2018-01-17 11:47:05 -06:00
Yaakov Selkowitz 352c8f2f0d string: remove TRAD_SYNOPSIS
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
2017-12-01 03:41:52 -06:00
Eric Blake 40617efc8b Make strstr and strcasestr O(n), not O(n^2); add memmem.
* libc/string/str-two-way.h: New file.
* libc/string/memmem.c (memmem): New file.
* libc/include/string.h (memmem): Declare for all platforms.
* libc/string/strstr.c (strstr): Provide O(n) implementation when
not optimizing for space.
* libc/string/strcasestr.c (strcasestr): Likewise.
* libc/string/Makefile.am (ELIX_SOURCES): Rename to...
(ELIX_2_SOURCES): ...this.
(ELIX_4_SOURCES): New category, for memmem.
(lib_a_SOURCES, libstring_la_SOURCES): Build new file.
(CHEWOUT_FILES): Build documentation for memmem.
* libc/string/strings.tex: Include new docs.
2008-01-12 04:25:55 +00:00
Christopher Faylor 8a0efa53e4 import newlib-2000-02-17 snapshot 2000-02-17 19:39:52 +00:00