
RE: indexing



> -----Original Message-----
> From: Kurt D. Zeilenga [mailto:Kurt@OpenLDAP.org]
> Sent: Wednesday, December 05, 2001 7:11 PM

> At 07:02 PM 2001-12-05, Howard Chu wrote:
> >> >Good point. I have another approach in mind, but I haven't tested it
> >> >yet to see if it actually gains anything - use BDB's DB_MULTIPLE
> >> >support. Instead of storing IDLs as a single Key/Data pair, store
> >> >individual IDs as multiple data items under a single key. This way
> >> >individual IDs can be added/deleted to an index without having to
> >> >rewrite the entire IDL each time. That ought to cut down on a lot of
> >> >the thrashing and overflow pages that are currently being generated.
> >>
> >> I considered doing this, but thought that might easily cause
> >> the database to have too many keys.

> I'm curious as to the key count in databases containing substrings
> indices.

As I understand it, the number of keys in the database shouldn't change,
although the number of data items increases (with multiple data items
per key). Examining a database file with a hex dump confirms that keys are
only stored once. For my 10011 entry sample, with

	index objectclass eq
	index cn,sn,uid pres,eq,sub

I get database files:

              dn2id        objectclass  cn            sn            uid
back-ldbm
  keys/data   20039/20039  26/26        48103/48103   34168/34168   48215/48215
  DB size     2.2MB        253KB        3.2MB         2.2MB         3.2MB

back-bdb
  keys/data   20032/40031  6/40022      48098/189327  34163/118602  48210/189330
  DB size     2.1MB        442KB        4.6MB         2.9MB         4.6MB

As you can see, the number of keys is comparable for all except objectclass.
I believe in this case, back-ldbm has been forced to split off into indirect
ID_BLOCKs because there are too many IDs to fit into a single block.
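
A minimal sketch of the two storage layouts, written as a pure-Python model rather than against the Berkeley DB API. The class names, the `ids_written` counter, and the two sample keys are assumptions made for illustration. The model also ignores back-ldbm's ID_BLOCK splitting. It only shows why the key count stays the same while the data item count grows, and how much less gets rewritten per update.

```python
from collections import defaultdict


class SingleRecordIndex:
    """Model: each key maps to ONE data item holding the whole ID list."""

    def __init__(self):
        self.store = {}        # key -> sorted tuple of entry IDs
        self.ids_written = 0   # IDs written to storage across all updates

    def add(self, key, entry_id):
        ids = set(self.store.get(key, ()))
        ids.add(entry_id)
        self.store[key] = tuple(sorted(ids))
        self.ids_written += len(ids)   # the full list is rewritten each time

    def counts(self):
        # One data item per key, however many IDs it holds.
        return len(self.store), len(self.store)


class MultiItemIndex:
    """Model: each key maps to MANY data items, one per entry ID."""

    def __init__(self):
        self.store = defaultdict(set)  # key -> set of entry IDs
        self.ids_written = 0

    def add(self, key, entry_id):
        if entry_id not in self.store[key]:
            self.store[key].add(entry_id)
            self.ids_written += 1      # only the new ID is written

    def counts(self):
        keys = len(self.store)
        items = sum(len(ids) for ids in self.store.values())
        return keys, items


single, multi = SingleRecordIndex(), MultiItemIndex()
for entry_id in range(1, 1001):
    for key in ("objectclass=top", "objectclass=person"):
        single.add(key, entry_id)
        multi.add(key, entry_id)

print("single record:", single.counts(), "IDs written:", single.ids_written)
print("multi item:   ", multi.counts(), "IDs written:", multi.ids_written)
```

Both models end up with 2 keys. The single-record model holds 2 data items and the multi-item model holds 2000, which is the same pattern as the keys/data columns above.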

The test tree has one base node, 10 ou's under that, and 10000 users under
the ou's. The number of keys for dn2id should include
	10011 entries
	10010 subtrees (10011 minus 1 for skipping the base)
	   11 onelevels (base, 10 ou's)

The number of data items for dn2id should include
	10011 entries
	20010 subtrees (skipped the base)
	10010 onelevels (10 under base, 10000 under ou's)
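
The counts above can be reproduced with a short sketch. It is a model inferred from the figures in this message, not a reading of the back-bdb source: it assumes each entry's ID is stored under its own subtree key and under every ancestor's subtree key except the base, and that onelevel keys are keyed by parent. The base DN and the even split of 1000 users per ou are hypothetical.

```python
from collections import defaultdict


def build_tree(n_ous=10, users_per_ou=1000, base="dc=example,dc=com"):
    """Return a {dn: parent_dn} map; the base entry has parent None."""
    parent = {base: None}
    for o in range(n_ous):
        ou = f"ou=ou{o},{base}"
        parent[ou] = base
        for u in range(users_per_ou):
            parent[f"uid=user{u},{ou}"] = ou
    return parent


def dn2id_counts(parent, base="dc=example,dc=com"):
    """Count keys and data items for the entry, subtree and onelevel records."""
    entry = {dn: {dn} for dn in parent}   # one key and one ID per entry
    subtree = defaultdict(set)
    onelevel = defaultdict(set)
    for dn, p in parent.items():
        if p is not None:
            onelevel[p].add(dn)           # keyed by the parent DN
        node = dn
        while node is not None and node != base:   # the base is skipped
            subtree[node].add(dn)
            node = parent[node]
    tables = {"entries": entry, "subtrees": subtree, "onelevels": onelevel}
    return {
        name: (len(t), sum(len(ids) for ids in t.values()))
        for name, t in tables.items()
    }


counts = dn2id_counts(build_tree())
total_keys = sum(k for k, _ in counts.values())
total_data = sum(d for _, d in counts.values())
print(counts)
print("total keys/data:", total_keys, total_data)
```

Under these assumptions the totals come to 20032 keys and 40031 data items, matching the back-bdb dn2id figures in the table.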

Again, for back-ldbm the number of keys is probably larger due to ID_BLOCK
splits. For back-hdb, the attribute index databases are identical to
back-bdb. (back-hdb doesn't use a dn2id database though.)

The DB_MULTIPLE approach uses a bit more space in the attribute indices.
I'm a bit puzzled about why the dn2id index didn't grow by a similar
factor.
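
To put numbers on "a bit more space", the sketch below computes the back-bdb/back-ldbm size ratio and the average number of data items per key for each database, using only the figures from the table. The dictionary names are made up for illustration, and the arithmetic describes the difference without explaining it.

```python
# (back-ldbm size, back-bdb size); units match within each pair.
db_sizes = {
    "dn2id": (2.2, 2.1),         # MB
    "objectclass": (253, 442),   # KB
    "cn": (3.2, 4.6),            # MB
    "sn": (2.2, 2.9),            # MB
    "uid": (3.2, 4.6),           # MB
}

# back-bdb (keys, data items) per database.
bdb_items = {
    "dn2id": (20032, 40031),
    "objectclass": (6, 40022),
    "cn": (48098, 189327),
    "sn": (34163, 118602),
    "uid": (48210, 189330),
}

growth = {name: bdb / ldbm for name, (ldbm, bdb) in db_sizes.items()}
items_per_key = {name: data / keys for name, (keys, data) in bdb_items.items()}

for name in db_sizes:
    print(f"{name:12s} size x{growth[name]:.2f}  "
          f"{items_per_key[name]:.1f} data items per key")
```

The attribute indices come out between roughly 1.3x and 1.75x the back-ldbm size, while dn2id is slightly smaller at about 0.95x.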

Runtimes for this scenario:
back-ldbm
ldapadd   6.630u  1.120s 3:08.51  4.1%  0+0k 0+0io 2049pf+0w
slapd   110.300u 63.490s 3:26.79 84.0%  0+0k 0+0io 7132pf+0w

back-bdb
ldapadd   6.150u  0.960s 3:45.96  3.1%  0+0k 0+0io 609pf+0w
slapd    94.650u 67.720s 4:15.82 63.4%  0+0k 0+0io 6586pf+0w

back-hdb
ldapadd   6.400u  0.920s 3:26.11  3.5%  0+0k 0+0io 1935pf+0w
slapd    86.290u 57.310s 3:41.46 64.8%  0+0k 0+0io 6178pf+0w

There is a noticeable delay between when ldapadd exits and when slapd exits.
That time consists entirely of disk activity, flushing buffers and logs to
disk. Looking at the CPU time figures, I would guess that most of the
logging overhead can be hidden by storing the log on a separate physical
volume. (This run generates just about 110MB of logs in back-bdb, and 102MB
in back-hdb.)
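
As a rough way to read the CPU time figures, the sketch below splits each slapd run into CPU seconds (user plus system) and the rest of the wall-clock time. It assumes the csh-style `time` layout shown above (`user`u `system`s `min:sec` `percent`), and the function and variable names are made up for illustration. The non-CPU remainder covers everything slapd waited on, so it is an upper bound on log and buffer flushing rather than a direct measurement of it.

```python
def parse_csh_time(line):
    """Parse '110.300u 63.490s 3:26.79 84.0%' into (user, system, elapsed) seconds."""
    fields = line.split()
    user = float(fields[0].rstrip("u"))
    system = float(fields[1].rstrip("s"))
    minutes, seconds = fields[2].split(":")
    elapsed = int(minutes) * 60 + float(seconds)
    return user, system, elapsed


# slapd lines copied from the runtimes above.
slapd_runs = {
    "back-ldbm": "110.300u 63.490s 3:26.79 84.0%",
    "back-bdb": "94.650u 67.720s 4:15.82 63.4%",
    "back-hdb": "86.290u 57.310s 3:41.46 64.8%",
}

summary = {}
for backend, line in slapd_runs.items():
    user, system, elapsed = parse_csh_time(line)
    cpu = user + system
    summary[backend] = {
        "cpu": cpu,
        "elapsed": elapsed,
        "non_cpu": elapsed - cpu,
        "cpu_pct": 100.0 * cpu / elapsed,
    }

for backend, s in summary.items():
    print(f"{backend:10s} cpu {s['cpu']:7.2f}s  elapsed {s['elapsed']:7.2f}s  "
          f"non-cpu {s['non_cpu']:6.2f}s  ({s['cpu_pct']:.1f}% cpu)")
```

By this split, back-bdb and back-hdb use fewer CPU seconds than back-ldbm but spend more wall-clock time outside the CPU.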

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support