Does someone on this list know how to implement and tune substring
indexing, or know where to ask? It looks to me like OpenLDAP's
substring filtering algorithm, and partly the indexing algorithm, can be
significantly improved. I'd appreciate advice on how to do this - or
advice not to bother:-)
Currently it works like this (thanks to Howard for the explanation):
A subany index will store a hash of each 4-byte substring of an
indexed string, e.g. "abcdef" gets index-values for *abcd*, *bcde* and
*cdef*. A subinitial index gets initial substrings of 2-4 characters:
ab*, abc* and abcd*. Similar for subfinal. A search for
(cn=*123456789*) is treated as (&(cn=*1234*)(cn=*3456*)(cn=*5678*))
for indexing purposes, stepping 2 characters at a time.
The numbers 4 and 2 can now be configured.
I can see several things to improve with that, but I'd rather not
reinvent an outdated approach: