newlib-cygwin/newlib/libc/string
Wilco Dijkstra 473f1a3a5d Improve performance of strstr
v3: Add support for read ahead using strnlen, giving an additional 25% speedup
on large inputs (both short and long needles).

This patch significantly improves performance of strstr by using Sunday's
Quick-Search algorithm.  Due to its simplicity it has the best average
performance of string matching algorithms on almost all inputs.  It uses a
bad-character shift table to skip past mismatches.

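The needle length is limited to 254 so that every shift value fits in a
single byte.  This makes the shift table 4 to 8 times smaller, lowering
preprocessing overhead and minimizing cache effects.  Because the needle
length is bounded by a constant, worst-case performance is also linear in
the length of the haystack.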
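
As a rough illustration of the Quick-Search idea with a byte-sized shift
table, here is a minimal sketch.  It is not the code from this patch: the
name quick_search is hypothetical, it calls strlen on the haystack up front
instead of reading ahead, and it simply hands needles longer than 254 bytes
to the library strstr as a stand-in for Two-Way.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only.  */
static char *
quick_search (const char *haystack, const char *needle)
{
  const unsigned char *hs = (const unsigned char *) haystack;
  const unsigned char *ne = (const unsigned char *) needle;
  size_t ne_len = strlen (needle);
  size_t hs_len;
  uint8_t shift[256];
  const unsigned char *end;
  size_t i;

  if (ne_len == 0)
    return (char *) haystack;
  /* Shift values go up to ne_len + 1, which must fit in a uint8_t.
     Stand-in for the Two-Way fallback (assumption of this sketch).  */
  if (ne_len > 254)
    return strstr (haystack, needle);

  hs_len = strlen (haystack);
  if (hs_len < ne_len)
    return NULL;

  /* A byte that does not occur in the needle lets the window jump
     completely past it.  */
  memset (shift, (int) (ne_len + 1), sizeof shift);
  /* Otherwise align the last occurrence of that byte in the needle.  */
  for (i = 0; i < ne_len; i++)
    shift[ne[i]] = (uint8_t) (ne_len - i);

  end = hs + hs_len - ne_len;
  while (hs <= end)
    {
      if (memcmp (hs, ne, ne_len) == 0)
        return (char *) hs;
      /* hs[ne_len] is the byte just past the window; at the last
         position it is the terminating NUL, which is safe to read.  */
      hs += shift[hs[ne_len]];
    }
  return NULL;
}
```

The table is indexed by the byte immediately after the current window, so a
single lookup decides how far to move regardless of where the mismatch was.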

Larger needles are processed by the Two-Way algorithm.  The macro AVAILABLE
has been improved to use strnlen to read the input in chunks.  This results
in a 2.5 times speedup for large needles, reducing the performance drop when
the Quick-Search algorithm can't be used.
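
The read-ahead idea can be sketched as follows.  This is an illustration
under assumptions, not the AVAILABLE macro itself: the function names, the
512-byte chunk size and the naive search loop around it are all
hypothetical.  The point is that the known haystack length grows in chunks
via strnlen instead of one position at a time.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for strnlen */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define READ_AHEAD 512   /* assumed chunk size for this sketch */

/* Return true if hs[pos .. pos + ne_len) lies inside the string.
   *hs_len is the number of leading bytes already known to be non-NUL;
   it is extended by scanning at most ne_len + READ_AHEAD further bytes.
   Assumes pos <= *hs_len.  */
static bool
available (const char *hs, size_t *hs_len, size_t pos, size_t ne_len)
{
  if (pos + ne_len <= *hs_len)
    return true;
  /* strnlen stops at the terminating NUL, so it never reads past it.  */
  *hs_len += strnlen (hs + *hs_len, ne_len + READ_AHEAD);
  return pos + ne_len <= *hs_len;
}

/* Naive search that uses available () to bound the haystack lazily.  */
static char *
find_read_ahead (const char *hs, const char *ne)
{
  size_t ne_len = strlen (ne);
  size_t hs_len = 0;
  size_t pos;

  for (pos = 0; available (hs, &hs_len, pos, ne_len); pos++)
    if (memcmp (hs + pos, ne, ne_len) == 0)
      return (char *) (hs + pos);
  return NULL;
}
```

Most calls to available () take the cheap first branch; the scan for the
terminator only runs once per chunk.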

The code for 1-4 byte needles has been simplified and now uses unsigned
char.  Since the optimized code relies on 8-bit chars, we defer to the
size-optimized implementation if CHAR_BIT > 8.
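
To show why the short-needle code depends on unsigned, 8-bit chars, here is
a minimal sketch for 2-byte needles.  It is an assumption-laden
illustration rather than the routine from this patch: the name find2 is
hypothetical, and the needle must be exactly two non-NUL bytes.  Each byte
is packed into an 8-bit field of an integer, so a signed char (which would
sign-extend) or a wider char would corrupt the packed value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: needle must be exactly 2 non-NUL bytes.  */
static char *
find2 (const char *haystack, const char *needle)
{
  const unsigned char *hs = (const unsigned char *) haystack;
  const unsigned char *ne = (const unsigned char *) needle;
  /* Pack the two needle bytes into the low 16 bits.  */
  uint32_t want = ((uint32_t) ne[0] << 8) | ne[1];
  uint32_t have;

  if (hs[0] == 0)
    return NULL;
  have = hs[0];
  for (hs++; *hs != 0; hs++)
    {
      /* Slide the 2-byte window forward by one haystack byte.  */
      have = ((have << 8) | *hs) & 0xffff;
      if (have == want)
        return (char *) (hs - 1);
    }
  return NULL;
}
```

Each step reads one haystack byte and does a single integer comparison, so
no inner comparison loop is needed.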

On AArch64, finding a set of randomly chosen 8-byte words in 256 bytes of
English text is 14 times faster.  For longer haystacks the gain is well
over 20 times.

The size-optimized strstr has also been rewritten from scratch to improve
performance.  On the same test the performance gain is 69%.

Tested against the GLIBC testsuite, randomized tests and the GNULIB strstr
test (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).

--
2018-10-18 19:51:39 +02:00
Makefile.am string: add wmempcpy 2017-11-30 04:06:49 -06:00
Makefile.in makedoc: make errors visible 2017-12-07 11:54:11 +00:00
WIDTH-A width data generation 2018-03-12 10:17:20 +01:00
ambiguous.t generated width data, Unicode 10.0 2018-03-12 10:17:20 +01:00
bcmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
bcopy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
bzero.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
combining.t generated width data, Unicode 10.0 2018-03-12 10:17:20 +01:00
explicit_bzero.c Add explicit_bzero() 2016-03-18 12:33:40 +01:00
ffsl.c Add ffsl(), ffsll(), fls(), flsl(), flsll() 2017-07-05 13:49:48 +02:00
ffsll.c Add ffsl(), ffsll(), fls(), flsl(), flsll() 2017-07-05 13:49:48 +02:00
fls.c Add ffsl(), ffsll(), fls(), flsl(), flsll() 2017-07-05 13:49:48 +02:00
flsl.c Add ffsl(), ffsll(), fls(), flsl(), flsll() 2017-07-05 13:49:48 +02:00
flsll.c Add ffsl(), ffsll(), fls(), flsl(), flsll() 2017-07-05 13:49:48 +02:00
gnu_basename.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
index.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
local.h ansification: remove _EXFUN, _EXFUN_NOTHROW 2018-01-17 11:47:29 -06:00
memccpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
memchr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
memcmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
memcpy.c Use __inhibit_loop_to_libcall in all memset/memcpy implementations 2018-08-29 16:05:37 +02:00
memmem.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
memmove.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
mempcpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
memrchr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
memset.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
mkunidata fix/enhance Unicode table generation scripts 2018-03-14 10:44:32 +01:00
mkwide width data generation 2018-03-12 10:17:20 +01:00
mkwidthA width data generation 2018-03-12 10:17:20 +01:00
rawmemchr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
rindex.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
stpcpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
stpncpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
str-two-way.h * lib/str-two-way.h (two_way_long_needle): Avoid bug with long 2010-10-06 09:29:35 +00:00
strcasecmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strcasecmp_l.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
strcasestr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strcat.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strchr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strchrnul.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strcmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strcoll.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strcoll_l.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
strcpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strcspn.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strdup.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strdup_r.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strerror.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strerror_r.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strings.tex Add man page entry for strnstr.c. 2017-08-30 15:10:07 +02:00
strlcat.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strlcpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strlen.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strlwr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strncasecmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strncasecmp_l.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
strncat.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strncmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strncpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strndup.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strndup_r.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strnlen.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strnstr.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
strpbrk.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strrchr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strsep.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strsignal.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strspn.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strstr.c Improve performance of strstr 2018-10-18 19:51:39 +02:00
strtok.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strtok_r.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strupr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strverscmp.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
strxfrm.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
strxfrm_l.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
swab.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
timingsafe_bcmp.c Add timingsafe_bcmp() 2016-03-18 12:33:40 +01:00
timingsafe_memcmp.c Add timingsafe_memcmp() 2016-03-18 12:33:40 +01:00
u_strerr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
uniset width data generation 2018-03-12 10:17:20 +01:00
wcpcpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcpncpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcscasecmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcscasecmp_l.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
wcscat.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcschr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcscmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcscoll.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcscoll_l.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
wcscpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcscspn.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsdup.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
wcslcat.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcslcpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcslen.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsncasecmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsncasecmp_l.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
wcsncat.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsncmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsncpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsnlen.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcspbrk.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsrchr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsspn.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsstr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcstok.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcstrings.tex string: add wmempcpy 2017-11-30 04:06:49 -06:00
wcswidth.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsxfrm.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wcsxfrm_l.c string: remove TRAD_SYNOPSIS 2017-12-01 03:41:52 -06:00
wcwidth.c use generated width data 2018-03-12 10:17:20 +01:00
wide.t generated width data, Unicode 10.0 2018-03-12 10:17:20 +01:00
wmemchr.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wmemcmp.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wmemcpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wmemmove.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wmempcpy.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
wmemset.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00
xpg_strerror_r.c ansification: remove _DEFUN 2018-01-17 11:47:26 -06:00