
MDB slapadd potential improvements

To: "OpenLDAP-devel@openldap.org" <OpenLDAP-devel@openldap.org>
Subject: MDB slapadd potential improvements
From: Howard Chu <hyc@symas.com>
Date: Sat, 14 Apr 2012 06:13:40 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0a1) Gecko/20111224 Firefox/12.0a1 SeaMonkey/2.9a1

I have a "thread" branch on ssh://ada.openldap.org/~hyc/OD/head/.git with some preliminary patches to improve multi-threaded indexing in back-mdb slapadd. So far the scale of improvement is small, but there may be ways to enhance it further from here.

Currently back-mdb's multi-threaded indexing code is largely copied from back-bdb/hdb, and it is always slower than regular single-threaded slapadd. The slowdown has a number of causes.

1) Thread synchronization overhead is huge, much greater than the actual cost of index processing.

2) Index processing is awkward since MDB doesn't support multi-threaded writes (while BDB does).

1) is because we're using cond_wait/cond_broadcast to perform synchronization, and that requires every waiting thread to successfully acquire the condition mutex before progressing any further. So it is inherently serial, even using cond_broadcast. Also, we're doing this twice per entry: once to start all of the index processing for the entry, and once at the end of index processing. The latter is required so that we know it's safe to dispose of the current entry and move on to the next.

The new code uses a single pthread_barrier for synchronization, and it is only used at the start of index processing. Index cleanup is deferred, and all of the relevant data is double-buffered. This means we only need to worry about the start of processing; the processing can take as long as it needs. This restructuring requires a small change in slapadd as well, to maintain two Entry pointers instead of just one, and free them in a staggered fashion. The back-bdb/hdb threaded indexer can also be restructured along these lines for a similar benefit.
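As a rough illustration of the single-barrier, double-buffered scheme described above (this is not the code on the "thread" branch), here is a minimal sketch. All names (slot, index_worker, run_indexer, NWORKERS) are hypothetical, an int stands in for the Entry pointer, and a multiplication stands in for index key generation. It assumes a platform that provides pthread_barrier.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>

#define NWORKERS 4              /* illustrative; plays the role of tool-threads */

/* One buffer per in-flight entry. entry_id stands in for an Entry pointer. */
typedef struct slot {
    int  entry_id;              /* -1 tells the workers to exit */
    long keys[NWORKERS];        /* one output cell per worker, so no locking */
} slot;

static slot slots[2];           /* the double buffer */
static pthread_barrier_t start_barrier;

static void *index_worker(void *arg)
{
    int me = (int)(intptr_t)arg;
    unsigned i;

    for (i = 0; ; i++) {
        slot *s;
        /* The only synchronization point: wait until entry i is published. */
        pthread_barrier_wait(&start_barrier);
        s = &slots[i & 1];
        if (s->entry_id < 0)
            break;
        /* Stand-in for generating index keys for this worker's attributes. */
        s->keys[me] = (long)s->entry_id * (me + 1);
    }
    return NULL;
}

/* Feeds entries 0..nentries-1 through the workers; returns the sum of all keys. */
long run_indexer(int nentries)
{
    pthread_t tid[NWORKERS];
    long total = 0;
    int i, w;

    pthread_barrier_init(&start_barrier, NULL, NWORKERS + 1);
    for (w = 0; w < NWORKERS; w++)
        pthread_create(&tid[w], NULL, index_worker, (void *)(intptr_t)w);

    for (i = 0; i <= nentries; i++) {
        /* This slot last held entry i-2, whose results were consumed already. */
        slots[i & 1].entry_id = (i < nentries) ? i : -1;
        pthread_barrier_wait(&start_barrier);

        /* A worker only reaches barrier i after finishing entry i-1, so the
         * previous slot is complete. This is the deferred cleanup step. */
        if (i > 0) {
            slot *prev = &slots[(i - 1) & 1];
            for (w = 0; w < NWORKERS; w++)
                total += prev->keys[w];
        }
    }

    for (w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], NULL);
    pthread_barrier_destroy(&start_barrier);
    return total;
}
```

The main thread never waits for the end of processing explicitly. The next entry's barrier doubles as the completion signal for the previous one, which is why two buffers (and two live entries) are needed.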

2) Index results were being gathered in several malloc'd structures and then individually freed after being written to the DB. Now I'm using sl_malloc, and simply resetting the memctx between entries, thus eliminating all cost of free() operations.
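The general idea of allocating from a per-thread region and resetting it between entries can be sketched as follows. This is a generic bump allocator written for illustration, not slapd's sl_malloc implementation; the arena_* names and the 16-byte alignment are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump allocator; not the slapd sl_malloc API. */
typedef struct arena {
    char  *base;   /* start of the backing block */
    size_t size;   /* total bytes available */
    size_t used;   /* bytes handed out since the last reset */
} arena;

int arena_init(arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

void *arena_alloc(arena *a, size_t n)
{
    void *p;

    if (n > a->size)
        return NULL;
    n = (n + 15) & ~(size_t)15;        /* keep allocations 16-byte aligned */
    if (n > a->size - a->used)
        return NULL;                   /* out of space in this arena */
    p = a->base + a->used;
    a->used += n;
    return p;
}

/* Releases everything allocated since the last reset in O(1). */
void arena_reset(arena *a)
{
    a->used = 0;
}

void arena_destroy(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

An indexer thread would call arena_alloc() for each key structure while processing an entry, then arena_reset() once the keys have been written, instead of freeing each structure individually.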

These changes are sufficient to bring multithreaded slapadd time down to just on par with single-threaded slapadd. In fact there's very little to gain here when all of the DB writes are still single-threaded.

One further tweak: most of our index keys are 4-byte hash values. Using the MDB_INTEGERKEY flag allows them to be compared word-at-a-time instead of byte-at-a-time, which gives a further speedup. At present this is the most significant improvement.
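To make the difference concrete, the two comparison styles look roughly like this. These are illustrative functions (cmp_bytes and cmp_uint32 are hypothetical names), not MDB's internal comparators.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time: up to len loop iterations, each with its own branch. */
int cmp_bytes(const void *a, const void *b, size_t len)
{
    const unsigned char *x = a, *y = b;
    size_t i;

    for (i = 0; i < len; i++) {
        if (x[i] != y[i])
            return (int)x[i] - (int)y[i];
    }
    return 0;
}

/* Word-at-a-time: a 4-byte key is loaded and compared as one integer. */
int cmp_uint32(const void *a, const void *b)
{
    uint32_t x, y;

    memcpy(&x, a, sizeof x);   /* memcpy avoids unaligned-access problems */
    memcpy(&y, b, sizeof y);
    return (x > y) - (x < y);
}
```

On a little-endian machine the two functions can order the same pair of keys differently. That does not matter for hash-valued index keys as long as one ordering is used consistently for a given database.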

With the original back-mdb code currently in git master, slapadd of our test LDIF (4.9GB, 6326513 entries, 31 indexed attributes) was taking 1h50m on ada.

(Using -q. Without -q, 2h46m.)

With no indexing, it takes only 9m11s. Even though sizewise indices don't account for much, they cost a lot in numbers of keys. 6M entries means 6M keys in the id2entry DB and 12M keys in the dn2id DB. Indexing on 31 attributes with multiple values, substrings, and other such parameters thrown in, amounts to hundreds of millions of keys, and each of these needs multiple comparisons to be inserted into its proper index.

I tried a further hack on MDB_APPEND to eliminate some more of this overhead. Currently, MDB_APPEND only affects the behavior of mdb_page_split, causing the newly added key to simply be added as the first node of a new page rather than splitting the existing page in half and copying half of the keys to the new page. (This in itself is a pretty major speedup for slapadd, but obviously already factored in.) The new code also simply appends new keys to the end of the DB's last page (rather than searching for its insertion point), which again is useful for eliminating unnecessary comparisons. This is only effective when adding an entryID to an already existing index key. This brought the slapadd -q time down to 1h46m.
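The saving from appending instead of searching can be shown on a simplified sorted array. This is only an analogy: the page type, its fixed capacity, and the page_insert/page_append names are hypothetical and do not reflect libmdb's page layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_CAP 1024            /* illustrative capacity */

typedef struct page {
    uint32_t keys[PAGE_CAP];     /* kept in ascending order */
    size_t   n;
} page;

/* General insert: O(log n) comparisons to find the slot, then a shift. */
int page_insert(page *p, uint32_t key)
{
    size_t lo = 0, hi = p->n;

    if (p->n == PAGE_CAP)
        return -1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    memmove(&p->keys[lo + 1], &p->keys[lo], (p->n - lo) * sizeof p->keys[0]);
    p->keys[lo] = key;
    p->n++;
    return 0;
}

/* Append: one comparison against the last key, no search and no shift.
 * Only valid when the caller supplies keys in ascending order. */
int page_append(page *p, uint32_t key)
{
    if (p->n == PAGE_CAP)
        return -1;
    if (p->n > 0 && key <= p->keys[p->n - 1])
        return -1;               /* out of order: caller must use page_insert */
    p->keys[p->n++] = key;
    return 0;
}
```

Entry IDs handed out by slapadd increase monotonically, which is why appending an entryID under an existing index key can skip the search entirely.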


Turning on the MDB_INTEGERKEY flag brought the slapadd -q time down to 1h32m.

Adding the threading rewrite to that, the time came down to 1h29m with tool-threads set to 4. Using more threads actually slows things down, again because thread synchronization becomes too expensive.

Only the index key generation is being done in the extra threads, and the reality is that index key generation is not a very significant cost in the overall workload.

But anyway, some of the work done here can be ported back into back-bdb/hdb to improve things there. In particular, the use of two synchronization events per entry can be reduced to one by adding double-buffering there too. And some malloc/free overhead in the index threads can be removed by using sl_malloc in each thread. Of course we first need to add pthread_barrier detection to the configure script, as well as wrapper functions in libldap_r. (Anyone interested in handling this? And writing the compatibility code for Windows?)
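For platforms where configure finds no pthread_barrier, one possible fallback is a barrier built from a mutex and a condition variable. The sketch below is an assumption about what such compatibility code could look like, not an existing libldap_r wrapper; the compat_barrier_* names are hypothetical.

```c
#include <pthread.h>

/* Hypothetical fallback barrier for platforms without pthread_barrier. */
typedef struct compat_barrier {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    unsigned count;        /* threads required to complete a round */
    unsigned waiting;      /* threads that have arrived in this round */
    unsigned generation;   /* bumped each time a round completes */
} compat_barrier;

int compat_barrier_init(compat_barrier *b, unsigned count)
{
    if (count == 0)
        return -1;
    if (pthread_mutex_init(&b->mu, NULL) != 0)
        return -1;
    if (pthread_cond_init(&b->cv, NULL) != 0) {
        pthread_mutex_destroy(&b->mu);
        return -1;
    }
    b->count = count;
    b->waiting = 0;
    b->generation = 0;
    return 0;
}

/* Returns 1 in the thread that completes the round, 0 in all others. */
int compat_barrier_wait(compat_barrier *b)
{
    unsigned gen;
    int last = 0;

    pthread_mutex_lock(&b->mu);
    gen = b->generation;
    if (++b->waiting == b->count) {
        b->generation++;           /* release everyone waiting on this round */
        b->waiting = 0;
        last = 1;
        pthread_cond_broadcast(&b->cv);
    } else {
        /* The generation check guards against spurious wakeups. */
        while (gen == b->generation)
            pthread_cond_wait(&b->cv, &b->mu);
    }
    pthread_mutex_unlock(&b->mu);
    return last;
}

void compat_barrier_destroy(compat_barrier *b)
{
    pthread_cond_destroy(&b->cv);
    pthread_mutex_destroy(&b->mu);
}
```

A fallback like this still wakes its waiters through the condition mutex, so it brings portability rather than speed. It would at least keep the one-synchronization-point-per-entry structure intact.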

In the meantime, I'm considering an MDB environment option to support multiple threads in a single write TXN, as long as each thread is operating on separate databases. This would hopefully allow us to further distribute the load of indexing without adding too much new complexity to libmdb.

If you have an account on ada.openldap.org I encourage you to check out this code and think about what's suitable to merge back into master.


--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Follow-Ups:
- Re: MDB slapadd potential improvements
  - From: Howard Chu <hyc@symas.com>
