v3: Add support for read ahead using strnlen, giving an additional 25% speedup
on large inputs (both short and long needles).
This patch significantly improves performance of strstr by using Sunday's
Quick-Search algorithm. Due to its simplicity it has the best average
performance of string matching algorithms on almost all inputs. It uses a
bad-character shift table to skip past mismatches.
The needle length is limited to 254 - this reduces the shift table memory
4 to 8 times, lowering preprocessing overhead and minimizing cache effects.
The limit also implies its worst-case performance is linear.
Larger needles are processed by the Two-Way algorithm. The macro AVAILABLE
has been improved to use strnlen to read the input in chunks. This results
in a 2.5 times speedup for large needles, reducing the performance drop when
the Quick-Search algorithm can't be used.
The code for 1-4 byte needles has been simplified and now uses unsigned
char. Since the optimized code relies on 8-bit chars, we defer to the
size-optimized implementation if CHAR_BIT > 8.
The performance gain of finding a set of randomly chosen words of size 8 in
256 bytes of English text is 14 times on AArch64. For longer haystacks the
gain is well over 20 times.
The size-optimized strstr has also been rewritten from scratch to improve
performance. On the same test the performance gain is 69%.
Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
--