[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
RE: indexing
> -----Original Message-----
> From: Kurt D. Zeilenga [mailto:Kurt@OpenLDAP.org]
> Sent: Wednesday, December 05, 2001 7:11 PM
> At 07:02 PM 2001-12-05, Howard Chu wrote:
> >> >Good point. I have another approach in mind, but I haven't
> >> tested it yet to
> >> >see if it actually gains anything - use BDB's DB_MULTIPLE
> >> support. Instead
> >> >of storing IDLs as a single Key/Data pair, store individual IDs
> >> as multiple
> >> >data items under a single key. This way individual IDs can be
> >> added/deleted
> >> >to an index without having to rewrite the entire IDL each time.
> >> That ought
> >> >to cut down on a lot of the thrashing and overflow pages that
> >> are currently
> >> >being generated.
> >>
> >> I considered doing this, but thought that might easily cause
> >> the database to have too many keys.
> I'm curious as to the key count in databases containing substrings
> indices.
As I understand it, the number of keys in the database shouldn't change,
although the number of data items increases (with multiple data items
per key). Examining a database file with a hex dump confirms that keys are
only stored once. For my 10011 entry sample, with
index objectclass eq
index cn,sn,uid pres,eq,sub
I get database files:
dn2id objectclass cn sn uid
(keys/data)
(DB size)
back-ldbm 20039/20039 26/26 48103/48103 34168/34168 48215/48215
2.2MB 253KB 3.2MB 2.2MB 3.2MB
back-bdb 20032/40031 6/40022 48098/189327 34163/118602 48210/189330
2.1MB 442KB 4.6MB 2.9MB 4.6MB
As you can see, the number of keys is comparable for all except objectclass.
I believe in this case, back-ldbm has been forced to split off into indirect
ID_BLOCKs because there are too many IDs to fit into a single block.
With one base node, 10 ou's under that, and 10000 users under the ou's:
The number of keys for dn2id should include
10011 entries
10010 subtrees (10011 minus 1 for skipping the base)
11 onelevels (base, 10 ou's)
The number of data items for dn2id should include
10011 entries
20010 subtrees (skipped the base)
10010 onelevels (10 under base, 10000 under ou's)
Again, for back-ldbm the number of keys is probably larger due to ID_BLOCK
splits. For back-hdb, the attribute index databases are identical to
back-bdb. (back-hdb doesn't use a dn2id database though.)
Using the DB_MULTIPLE approach uses a bit more space in the attribute
indices. I'm a bit puzzled about why the dn2id index didn't grow by a
similar factor.
Runtimes for this scenario:
back-ldbm
ldadd 6.630u 1.120s 3:08.51 4.1% 0+0k 0+0io 2049pf+0w
slapd 110.300u 63.490s 3:26.79 84.0% 0+0k 0+0io 7132pf+0w
back-bdb
ldadd 6.150u 0.960s 3:45.96 3.1% 0+0k 0+0io 609pf+0w
slapd 94.650u 67.720s 4:15.82 63.4% 0+0k 0+0io 6586pf+0w
back-hdb
ldadd 6.400u 0.920s 3:26.11 3.5% 0+0k 0+0io 1935pf+0w
slapd 86.290u 57.310s 3:41.46 64.8% 0+0k 0+0io 6178pf+0w
There is a noticable delay from when ldapadd exits and when slapd exits.
That time consists entirely of disk activity, flushing whatever buffers/logs
to the disk. Looking at the CPU time figures, I would guess that most of the
logging overhead can be hidden by storing the log on a separate physical
volume. (This run generates just about 110MB of logs in back-bdb, and 102MB
in back-hdb.)
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support