indexed search performance
I compared the back-bdb backend with the back-ldbm backend
using the DirectoryMark messaging scenario.
1) back-bdb with a transactional data store
2) back-ldbm with a full entry cache (entries in cache = entries in directory)
We ran the benchmark for 5 minutes after an initial ALLID search for cache warm-up.
- Test Environment
  directory data
    64011 entries generated by DirectoryMark dbgen
  indexing
    objectClass : eq
    sn, description, seeAlso : eq
    cn : eq, sub
  server
    Pentium III 1GHz 1CPU, 512MB RAM
  client
    simulation of 8 clients, each having 2 search threads
  test scenario
    messaging scenario : mimics MTAs searching for user IDs
  slapd backends under comparison : HEAD as of Dec 20, 2001
    back-bdb : with txn support
    back-ldbm : with a full entry cache
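For reference, the indexing above corresponds to a slapd.conf fragment along
these lines (a sketch only; the suffix and database directory are placeholder
assumptions, not from the actual test setup):

```
# hypothetical slapd.conf fragment matching the indexing used in this test
database        bdb
suffix          "o=directorymark"
directory       /var/lib/ldap

index   objectClass                eq
index   sn,description,seeAlso     eq
index   cn                         eq,sub
```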
The DirectoryMark messaging scenario assumes that MTAs such as sendmail
search for user IDs.
The scenario consists of exact-match searches over the ID fields
(description fields) of inetOrgPerson (organizationalPerson) entries.
A total of 8 clients were set up in this test, and each of them has
2 concurrent threads. The directory has 64011 entries.
During the test, slapd automatically spawned 32 threads for search tasks.
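The client model above can be sketched in a few lines. This illustrates only
the load shape (8 clients x 2 threads issuing exact-match lookups); the
stand-in dictionary and the user-ID scheme are made-up assumptions, not
DirectoryMark itself:

```python
# Sketch (assumption): shape of the DirectoryMark messaging load --
# 8 simulated clients, each running 2 concurrent search threads,
# doing exact-match lookups keyed by the user-ID (description) field.
import threading

N_CLIENTS, THREADS_PER_CLIENT, SEARCHES_PER_THREAD = 8, 2, 100

# Stand-in directory: user-ID value -> entry (a real run uses ldapsearch).
directory = {"uid%05d" % i: {"cn": "user %d" % i} for i in range(64011)}

results = []
lock = threading.Lock()

def search_worker(client, thread):
    hits = 0
    for i in range(SEARCHES_PER_THREAD):
        # Spread lookups over the ID space deterministically.
        key = "uid%05d" % ((client * 7919 + thread * 104729 + i) % 64011)
        if key in directory:          # exact-match search
            hits += 1
    with lock:
        results.append(hits)

threads = [threading.Thread(target=search_worker, args=(c, t))
           for c in range(N_CLIENTS) for t in range(THREADS_PER_CLIENT)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(len(threads), sum(results))  # -> 16 1600
```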
The following table shows a breakdown of the average execution time
(real time) of one of the 32 threads for search operations.
                                 back-bdb-txn   back-ldbm-cache
Entire Search                        9887             986
 +ASN.1 Decoding                      364             118
  -dn_normalize                       148              45
  -get_filter                         128              21
  -ber_scanf                           34              22
  -select_backend                      20               1
  -misc                                36              29
 +Backend Search                     9423             847
  +base entry retrieval              3024              24
   -dn2id(_matched)                  2716              11
   -id2entry                          283              10
  +candidate selection               1073             267
   +filter_candidate                 1063             260
    +list_candidate                  1051             254
     +list_candidate                                  230
      +equality_candidate            1014             105
       -key_read / intersection       922              69
  +id2entry retrieval                 939              15
   -db read                           879
   -cache_find_entry_id                                10
  -test_filter                        144              91
  -entry transmission                 855             270
  -thread_yield                      3255             105
  -status transmission                 32              23
The above data represents the average time to perform search operations.
Each number is the time taken by the corresponding sub-operation, in
microseconds. The first column lists the sub-operations in a hierarchical
manner.
From the numbers, we can see that the effect of caches is significant:
the average search time of back-ldbm-cache is 10 times lower than that of
back-bdb-txn.
The low base entry retrieval time of back-ldbm comes from the entry cache
keyed by DN, while the low id2entry retrieval time comes from the entry
cache keyed by ID.
The performance advantage of back-ldbm in key_read seems to come from the
dbcache.
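The key_read / intersection row corresponds, roughly, to reading sorted
entry-ID lists from the equality index and intersecting them. A minimal
sketch of the idea, with made-up IDs (this is not slapd's actual code):

```python
# Sketch (assumption): equality candidate selection as the intersection of
# sorted entry-ID lists read from index keys -- not slapd's implementation.

def intersect_ids(a, b):
    """Intersect two sorted entry-ID lists in O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical ID lists for two index keys, e.g.
# (objectClass=inetOrgPerson) and (description=uid00042):
ids_objectclass = [3, 7, 12, 25, 42, 99]
ids_description = [12, 42, 100]
print(intersect_ids(ids_objectclass, ids_description))  # -> [12, 42]
```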
From the above observations, it seems worthwhile to implement entry caches
for back-bdb, even when entries are appropriately indexed.
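Such an entry cache could be keyed both ways, as back-ldbm's is: by DN for
base entry retrieval and by ID for id2entry. A minimal dict-based sketch
under that assumption (hypothetical, not the eventual implementation):

```python
# Sketch (assumption): an entry cache indexed both by DN (for base entry
# lookup, avoiding dn2id + id2entry reads) and by entry ID (for id2entry),
# as back-ldbm's cache is.
class EntryCache:
    def __init__(self):
        self.by_dn = {}   # normalized DN -> entry
        self.by_id = {}   # entry ID     -> entry

    def add(self, entry):
        self.by_dn[entry["dn"]] = entry
        self.by_id[entry["id"]] = entry

    def find_by_dn(self, dn):
        # Avoids a dn2id database read followed by id2entry.
        return self.by_dn.get(dn)

    def find_by_id(self, eid):
        # Avoids an id2entry database read.
        return self.by_id.get(eid)

cache = EntryCache()
cache.add({"id": 42, "dn": "cn=user 42,o=directorymark", "cn": "user 42"})
print(cache.find_by_dn("cn=user 42,o=directorymark")["id"])  # -> 42
print(cache.find_by_id(42)["cn"])  # -> user 42
```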
I'll get back soon with more data from the DirectoryMark addressing
scenario.
------------------------
Jong Hyuk Choi
IBM Thomas J. Watson Research Center - Enterprise Linux Group
P. O. Box 218, Yorktown Heights, NY 10598
email: jongchoi@us.ibm.com
(phone) 914-945-3979 (fax) 914-945-4425 TL: 862-3979