[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
back-mdb optimization
back-mdb was first feature-complete a week ago. Between then and now I've
spent a bit of time profiling its behavior and eliminating hot spots. For this
work I used a test database of 250,000 synthetic entries, running under
valgrind's callgrind tool and my FunctionCheck profiler. (Occasionally the two
tools would disagree on where hot spots were, so I wound up targeting both.)
The callgrind output is available on http://highlandsun.com/hyc/mdb_search/
for reference.
With the initial back-mdb code, which was basically back-bdb with all of its
caching logic removed, an ldapsearch that scanned the entire DB ran in
8,875,046,744 instructions. The biggest hot spot was memnrcmp(), which is the
libmdb function for comparing two strings in reverse byte order. The top 10
functions were:
1,828,659,111 memnrcmp
1,151,097,812 strncasecmp
911,117,166 mdb_search_node
628,388,775 avl_find
612,549,560 ad_keystring
600,502,442 entry_decode
494,045,070 slap_bv2ad
376,177,636 mdb_search_page
285,882,273 attr_index_name_cmp
199,751,242 mdb_cursor_set
(That basically corresponded to commit e5b1dce6a7904e0eb31029959865730fc813ce57)
-=-=-=-
The next step was to eliminate slap_bv2ad() from the entry_decode() path,
using numeric IDs for attributeDescriptions in the database. Rewriting slapd
entry_decode as mdb_entry_decode() with this change brought the total search
execution down to 5,410,551,759 instructions. The top 10 functions were:
1,823,974,563 memnrcmp
796,099,511 mdb_search_node
528,502,136 mdb_entry_decode
376,251,165 mdb_search_page
199,751,329 mdb_cursor_set
167,097,364 strncasecmp
151,524,069 attr_clean
143,741,422 mdb_get_page
87,494,764 cursor_push_page
83,500,204 is_ad_subtype
(commit 1e32fcf099ba8c15333365fe68aefa5217ae3d8c)
-=-=-=-
Next was to eliminate some redundant navigation of the dn2id index. It was
doing essentially the same traversal twice on each search candidate - once to
determine if the candidate belonged to the search scope, and once to assemble
the entryDN. With this change the total search came down to 3,767,565,531
instructions. The top functions were:
1,018,812,205 memnrcmp
528,501,424 mdb_entry_decode
463,442,142 mdb_search_node
212,601,386 mdb_search_page
167,097,240 strncasecmp
151,523,868 attr_clean
93,001,026 mdb_cursor_set
83,500,204 is_ad_subtype
80,889,277 avl_find
80,496,022 mdb_get_page
(commit 6c8e4f2671b6aed41cd5098725048dbe2513612c)
-=-=-=-
The next step was a minor libmdb cleanup, restructuring it so that key/data
pairs were always guaranteed to start on a 2-byte aligned address. (While x86
didn't seem to care, CPUs like SPARC would SIGBUS otherwise.) This
restructuring brought execution down to 3,441,377,693 instructions - making
code more portable is always a good thing, even if the impact is minor. The
top functions were
1,018,873,702 memnrcmp
537,251,494 mdb_entry_decode
463,465,251 mdb_search_node
212,597,818 mdb_search_page
151,523,868 attr_clean
93,001,026 mdb_cursor_set
83,500,204 is_ad_subtype
80,496,022 mdb_get_page
63,750,282 attrs_alloc
62,500,669 mdb_search
(commit 293df78b2be77d6d153fd7052cc62d3377dc5501)
-=-=-=-
Next, finally start doing something about memnrcmp. First is simply writing a
more integer-oriented function cintcmp, which operates on unaligned integers a
byte at a time. This had only a small impact as well, getting us down only a
bit to 3,342,205,373 instructions.
919,700,412 cintcmp
(commit f9c8796d0b3ed9bc0f51c76bb28609121b1e2eec)
The rest of the trace profile is basically identical to the previous one.
-=-=-=-
Next was a bit of libmdb code cleanup and restructuring. The performance
change was minimal, bringing total execution down to 3,310,510,255
instructions. The trace profile is mostly the same as the previous.
commit dac3fae3b540841ae753bea16f3b353e2124c43d)
-=-=-=-
Next was a further speedup of cintcmp, changing it to operate on unsigned
shorts instead of just chars, now that we had guaranteed 2-byte alignment.
This brought execution down to 2,832,817,987 instructions, and shook up the
profile a little bit:
537,251,494 mdb_entry_decode
475,922,450 cintcmp
457,377,978 mdb_search_node
176,828,204 mdb_search_page_root
151,523,868 attr_clean
83,500,204 is_ad_subtype
(commit 1b69295a48cca409ed0c2f3fe655325e00f55ce2)
-=-=-=-
Next was a further rewrite of mdb_entry_decode, using tmpmem allocs instead of
the slapd central entry_alloc/attrs_alloc functions. This brought execution
down to 2,483,077,294 instructions. The profile is much like the previous, but
attr_clean and its associated functions disappear. The top functions were:
535,751,482 mdb_entry_decode
475,922,450 cintcmp
457,377,978 mdb_search_node
176,828,204 mdb_search_page_root
83,500,204 is_ad_subtype
(commit f72d65b77aa6cd4439ee0ad80b498f4ed707a278)
-=-=-=-
Next was another tweak for mdb_search(), keeping the cursor on the id2entry
database for the duration of the search. This eliminated a lot of
mdb_search_page overhead since usually the cursor was already on the right
page when the next entry was being fetched. This change brought execution down
to 2,241,832,009 instructions. The top functions were:
535,751,482 mdb_entry_decode
391,064,166 cintcmp
350,350,674 mdb_search_node
139,905,148 mdb_search_page_root
93,256,473 mdb_cursor_set
83,500,204 is_ad_subtype
(commit 54ced52c047425b432075946dd2997c52f020de0)
-=-=-=-
Finally (as of this morning) a bit of cleanup and restructuring in libmdb, to
eliminate a bunch of cruft in the previous data structure layout. This change
was more for esthetic reasons than for performance, but it still offered a
small gain, with execution at 2,232,560,312 instructions. The top functions:
535,751,482 mdb_entry_decode
391,064,166 cintcmp
342,681,498 mdb_search_node
129,900,173 mdb_search_page_root
90,006,339 mdb_cursor_set
83,500,204 is_ad_subtype
(commit 25529a4c36903d0456b1251712de32f665850029)
-=-=-=-
At the outset libmdb had its clumsy parts. Now the libmdb/mdb.o text+data is
only 31255 bytes - it's tight and very efficient. The entire DB engine can
execute entirely within a CPU's L1 instruction cache, with room to spare.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/