[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
MDB slapadd potential improvements
I have a "thread" branch on ssh://ada.openldap.org/~hyc/OD/head/.git with some
preliminary patches to improve multi-threaded indexing in back-mdb slapadd. So
far the scale of improvement is small, but there may be ways to enhance it
further from here.
Currently back-mdb's multi-threaded indexing code is largely copied from
back-bdb/hdb, and it is always slower than regular single-threaded slapadd.
The slowdown comes from a number of reasons.
1) Thread synchronization overhead is huge. Much greater than the actual
cost of index processing.
2) Index processing is awkward since MDB doesn't support multi-threaded
writes (while BDB does).
1) is because we're using cond_wait/cond_broadcast to perform synchronization,
and that requires every waiting thread to successfully acquire the condition
mutex before progressing any further. So it is inherently serial, even using
cond_broadcast. Also, we're doing this twice, once to start all of the index
processing for an entry, and once at the end of index processing. The latter
one is required so that we will know it's safe to dispose of the current entry
and move on to the next.
The new code uses a single pthread_barrier for synchronization, and it is only
used at the start of index processing. Index cleanup is deferred, and all of
the relevant data is double-buffered. This allows us to only need to worry
about the start of processing; the processing can take as long as it needs.
This restructure requires a small change in slapadd as well, to maintain two
Entry pointers instead of just one, and free them in a staggered fashion. The
back-bdb/hdb threaded indexer can also be restructured along these lines for a
similar benefit.
2) Index results were being gathered in several malloc'd structures and then
individually freed after being written to the DB. Now I'm using sl_malloc, and
simply resetting the memctx between entries, thus eliminating all cost of
free() operations.
These changes are sufficient to bring multithreaded performance down to just
on par with single-threaded slapadd. In fact there's very little to gain here
when all of the DB writes are still single-threaded.
One further tweak: most of our index keys are 4-byte hash values. Using the
MDB_INTEGERKEY flag allows them to be compared word-at-a-time instead of
byte-at-a-time, which gives a further speedup. At present this is the most
significant improvement.
With the original back-mdb code currently in git master, slapadd of our test
LDIF (4.9GB, 6326513 entries, 31 indexed attributes) was taking 1h50m on ada.
(Using -q. Without -q, 2h46m.)
With no indexing, it takes only 9m11s. Even though sizewise indices don't
account for much, they cost a lot in numbers of keys. 6M entries means 6M keys
in the id2entry DB and 12M keys in the dn2id DB. Indexing on 31 attributes
with multiple values, substrings, and other such parameters thrown in, amounts
to hundreds of millions of keys, and each of these needs multiple comparisons
to be inserted into their proper index.
I tried a further hack on MDB_APPEND to eliminate some more of this overhead.
Currently, MDB_APPEND only affects the behavior of mdb_page_split, causing the
newly added key to simply be added as the first node of a new page rather than
splitting the existing page in half and copying half of the keys to the new
page. (This in itself is a pretty major speedup for slapadd, but obviously
already factored in.) The new code also simply appends new keys to the end of
the DB's last page (rather than searching for its insertion point), which
again is useful for eliminating unnecessary comparisons. This is only
effective when adding an entryID to an already existing index key. This
brought the slapadd -q time down to 1h46m.
Turning on the MDB_INTEGERKEY flag brought the slapadd -q time down to 1h32m.
Adding the threading rewrite to that, the time came down to 1h29m with
tool-threads set to 4. Using more threads actually slows things down, again
because thread synchronization becomes too expensive.
Only the index key generation is being done in the extra threads, and the
reality is that index key generation is not a very significant cost in the
overall workload.
But anyway, some of the work done here can be ported back into back-bdb/hdb to
improve things there. In particular, the use of two synchronization events per
entry can be reduced to one by adding double-buffering there too. And some
malloc/free overhead in the index threads can be removed by using sl_malloc in
each thread. Of course we first need to add pthread_barrier detection to the
configure script, as well as wrapper functions in libldap_r.
(Anyone interested in handling this? And writing the compatibility code for
Windows?)
In the meantime, I'm considering an MDB environment option to support multiple
threads in a single write TXN, as long as each thread is operating on separate
databases. This would hopefully allow us to further distribute the load of
indexing without adding too much new complexity to libmdb.
If you have an account on ada.openldap.org I encourage you to checkout this
code and think about what's suitable to merge back into master.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/