[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
ITS#3611 Index clustering patch for fast slapadd
- To: openldap-devel@OpenLDAP.org
- Subject: ITS#3611 Index clustering patch for fast slapadd
- From: Howard Chu <hyc@symas.com>
- Date: Mon, 26 Sep 2005 08:21:17 -0700
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20050925 SeaMonkey/1.1a
Some more thoughts on implementing this feature...
Again, rather than change the current IDL format, I think we can just
use some level of IDL caching for slapadd -q. In this case, we would
always allocate IDLs of BDB_IDL_DB_MAX size and use e.g.
bdb_idl_append_one() on the cached IDLs.
I'm not sure if the regular idlcachesize parameter is appropriate here,
but it makes sense to start with that for experimenting and introduce a
new config parameter later if it's needed. Once the LRU limit is
reached, flush the oldest cache blocks out to the database before
freeing/re-using them.
As an added refinement, add a "previous size" field to the IDL cache
entry, set to the length of the IDL the last time it was read from disk.
Then on subsequent flushes only the IDs after prev_size need to be
written to disk.
On a side note, I wasn't too happy with the bit vector results I was
getting; the in-memory structure I proposed before was too slow. I'm
stopping that investigation for now. Also I was only using it in
hdb_dn2idl, and needed to convert back to regular IDL format before
returning, which was expensive. I suppose a future attempt that uses
vectors throughout the backend may be less of a problem. However, the
biggest problem seemed to be that I had very long vectors that were
mostly empty, and no fast way to get to only the set bits.
It occurs to me now that we can use lookup tables to reduce the
enumeration time. E.g., we can enumerate 3 bits at a time with an 8 row
table:
0: 0
1: 1
2: 2
3: 1,2
4: 3
5: 1,3
6: 2,3
7: 1,2,3
(For the actual implementation 4 or 8 bits / 16 or 256 rows would make
more sense.)
The original code I used simply shifted through 32 bit positions,
breaking if the shifted value was zero, which was definitely too slow.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/