
RE: Indexing performance



> -----Original Message-----
> From: owner-openldap-software@OpenLDAP.org
> [mailto:owner-openldap-software@OpenLDAP.org]On Behalf Of Leone, Todd

> Currently we're using version 2.17. When I index ~400,000 entries with
> 10 eq indexes it takes 17 minutes --- but when I add 1 sub to the index
> it takes 4 hrs... Is the sub indexing process improved in 2.19? If not,
> is it something that's being addressed?

Substring indexing writes a large amount of data. There's no real way to
reduce the data volume involved. The only way to speed this up is with
careful tuning of the BDB configuration. Use "db_stat -m" to see how your BDB
cache is performing; if you see non-zero values for "pages forced from cache",
then the cache is probably too small. The slapindex process is extremely
cache-, memory-, and I/O-intensive because it reads the entire database and
writes to every index. You'll also get the best speed with a large log buffer
size and with NoSync.
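
For example, something along these lines in the DB_CONFIG file in the
database directory (the cache size and path here are only placeholders; size
the cache to your data and RAM, and note that NoSync can also be enabled with
the dbnosync directive in slapd.conf):

    # DB_CONFIG in the back-bdb database directory -- example sizes only
    set_cachesize   0 268435456 1      # 256MB BDB cache, one segment
    set_flags       DB_TXN_NOSYNC      # don't flush the log on every commit

    # check cache behavior while slapindex runs (path is an example)
    db_stat -h /var/lib/ldap -m | grep -i forced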

In the FAQ http://www.openldap.org/faq/index.cgi?file=893 I recommend a log
buffer size of 2MB to go with the default log file size of 10MB. If you're
indexing a lot of attributes, the log volume generated by indexing a single
entry may actually exceed 2MB, so this can be a factor in performance as
well.
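
In DB_CONFIG terms that works out to roughly the following (a sketch; tune
the log buffer to your actual indexing volume):

    set_lg_bsize    2097152       # 2MB log buffer, per the FAQ
    set_lg_max      10485760      # 10MB log files (the BDB default)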

You should use top or iostat to monitor I/O load on the system while
slapindex runs. If you see a large percentage of time being spent in I/O
wait, then either something isn't configured right or the database is simply
too large for the available memory. Because the BDB cache is stored on disk,
the disk where the cache resides can become a major bottleneck. (The
development code has added an option to store the cache in shared memory
instead. This can be better if your system can accommodate a large enough
shared memory region.) The FAQ recommends storing the BDB log files on a
separate disk from the database files. It turns out that you also want the
BDB cache file, and thus the entire BDB environment home directory, to live
on a separate disk from the database files, because so much of the I/O
consists of transferring data pages between the database files and the BDB
cache. When both are on the same disk, the seek overhead kills throughput.
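
A rough sketch of that layout, with made-up device and directory names;
set_data_dir and set_lg_dir go in DB_CONFIG, and iostat shows whether any one
disk is saturating:

    # DB_CONFIG -- environment home (slapd.conf "directory") stays on disk 1,
    # database files go to disk 2, transaction logs to disk 3
    set_data_dir    /disk2/openldap-data
    set_lg_dir      /disk3/openldap-logs

    # watch per-device utilization while slapindex runs
    iostat -x 5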

On one database that consumes about 1GB in id2entry and dn2id, with 50-some
attributes indexed, slapindex took over 12 hours to run. By moving the BDB
cache onto a memory-based filesystem (tmpfs, RAMdisk, whatever you want to
call it), the time dropped to 1 hour and 15 minutes. In a few instances I saw
a single entry generate over 6MB of index updates in the transaction log
(multivalued attributes with lots of values, all substring indexed), but
these were pretty rare.
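
A minimal sketch of that setup, with assumed mount points and sizes; the
slapd.conf "directory" points at the tmpfs so the cache (region) files land
in RAM, while set_data_dir and set_lg_dir keep the actual database and log
files on real disks:

    mkdir -p /mnt/bdb-home
    mount -t tmpfs -o size=512m tmpfs /mnt/bdb-home

    # slapd.conf (back-bdb database section)
    directory       /mnt/bdb-home

    # DB_CONFIG in /mnt/bdb-home
    set_data_dir    /var/lib/ldap     # *.bdb files stay on disk
    set_lg_dir      /var/log/bdb      # transaction logs on persistent storage

Size the tmpfs to hold the entire BDB cache plus a little region overhead,
otherwise the environment won't open.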

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support 
