> -----Original Message-----
> From: owner-openldap-software@OpenLDAP.org
> [mailto:owner-openldap-software@OpenLDAP.org] On Behalf Of Leone, Todd
>
> Currently we're using version 2.17. When I index ~400,000 entries with
> 10 eq indexes it takes 17 minutes --- but when I add 1 sub to the index
> it takes 4 hrs... Is the sub indexing process improved in 2.19? If not,
> is it something that's being addressed?

Substring indexing writes a large amount of data, and there's no real way to reduce the data volume involved. The only way to speed this up is careful tuning of the BDB configuration.

Use "db_stat -m" to see how your BDB cache is performing; if you see non-zero values for "pages forced from cache" then the cache is probably too small. The slapindex process is extremely cache-, memory-, and I/O-intensive because it reads the entire database and writes to every index.

You'll also get the best speed with a large log buffer size and with NoSync. In the FAQ (http://www.openldap.org/faq/index.cgi?file=893) I recommend a log buffer size of 2MB to go with the default log file size of 10MB. If you're indexing a lot of attributes, the log volume generated by indexing a single entry may actually exceed 2MB, so this can be a factor in performance as well.

You should use top or iostat to monitor I/O load on the system while slapindex runs. If you see a large percentage of time being spent in I/O wait, then something isn't configured right, or the database is simply too large for the available memory.

Because the BDB cache is stored on disk, the disk where the cache resides can become a major bottleneck. (The development code has added an option to store the cache in shared memory instead; this can be better if your system can accommodate a large enough shared memory region.) The FAQ recommends storing the BDB log files on a separate disk from the database files.
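The tuning above goes in a DB_CONFIG file in the BDB environment home directory. A minimal sketch follows; the cache size and log directory are illustrative assumptions, not recommendations -- size the cache to your own data and db_stat -m output:

```
# DB_CONFIG -- a sketch; sizes and paths here are assumptions,
# adjust them to your own database

# BDB cache: 0 GB + 512 MB, in one contiguous region
set_cachesize   0 536870912 1

# 2MB log buffer, to go with the default 10MB log file size
set_lg_bsize    2097152
set_lg_max      10485760

# keep the transaction logs on a separate disk from the database files
set_lg_dir      /var/log/bdb

# NoSync: don't flush the log at every commit -- acceptable for a
# rebuildable slapindex run, not for normal production writes
set_flags       DB_TXN_NOSYNC
```

Changes to set_cachesize only take effect when the environment is recreated, so remove the environment region files (or run a recovery) after editing.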
It turns out that you also want the BDB cache file, and thus the entire BDB environment home directory, to live on a separate disk from the database files, because so much of the I/O consists of transferring data pages between the database files and the BDB cache. When both are on the same disk, the seek overhead kills throughput.

On one database that consumes about 1GB in id2entry and dn2id, with 50-some attributes indexed, it took over 12 hours to run slapindex. By moving the BDB cache onto a memory-based filesystem (tmpfs, RAMdisk, whatever you want to call it), the time dropped to 1 hour and 15 minutes.

In a few instances I saw a single entry generate over 6MB of index updates in the transaction log (multivalued attributes with lots of values, all substring-indexed), but these were pretty rare.

--
   Howard Chu
   Chief Architect, Symas Corp.       Director, Highland Sun
   http://www.symas.com               http://highlandsun.com/hyc
   Symas: Premier OpenSource Development and Support
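One way to arrange that split -- a sketch only, with assumed directory names -- is to put the environment home itself on a tmpfs and use DB_CONFIG to point the actual database files back at disk:

```
# DB_CONFIG in the environment home, which itself lives on tmpfs
# (e.g. mounted with: mount -t tmpfs -o size=1g tmpfs /tmp/bdb-env);
# all paths below are assumptions for illustration

# database files stay on the regular disk
set_data_dir    /var/lib/ldap

# transaction logs on a separate disk again
set_lg_dir      /var/log/bdb
```

The cache region files (__db.*) are then created in RAM while the data pages they shuttle to and from remain on disk. Remember that anything on tmpfs disappears at reboot, so this arrangement suits an offline slapindex run rather than a permanent deployment.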