
2.1 & 2.2 statistics, and some odd behavior that needs to be examined.



As I'm sure most (if not all) of you are aware, I've been performing a number of tests on OpenLDAP 2.1 and 2.2 to see how the two releases compare with each other, and how different tuning options in 2.2 affect the results.
The results can be seen at:


<http://www.stanford.edu/~quanah/directories/statistics/>
The above page should always work, but I restructure the pages underneath it periodically, so don't bookmark them. ;)


For the most part, 2.2 is a clear winner over 2.1.

Some general conclusions:

Btree is a definite win when it comes to running slapadd and slapindex (I think this should at least be a configure option in 2.2.6)
Memory cache is pretty much essential
HDB is generally better than BDB (but there are some odd issues with the idlcache I've noted to Howard)
syncRepl as it currently behaves is not ready for production use (although I am corresponding with Jong about this regularly, so this may change in the near future).


However, there is a serious threading issue in 2.2 when it is used with SASL and a disk-based database cache.

You can see this by looking at Tests on Solaris Servers->Performance Tests on Replica Servers. All of my servers have the same underlying software packages, so OpenLDAP is the *only* variation between them. This tells me that the issue I am seeing must be in OpenLDAP 2.2, or in how OpenLDAP 2.2 interfaces with those packages compared to how 2.1 interfaces with them. The DB_CONFIG parameters are the same across all the systems.

In 2.1 (using 2.1.24), the system is set up with BDB and a disk-based cache. The performance test shows an average rate of 74.856 answers/second using a SASL/GSSAPI authenticated bind with a filter of (uid=<whatever>), returning sumaildrop. This uses a mixed set of accounts (some uids exist and have maildrop, some exist and don't have maildrop, and some don't exist at all). At the worst, I see a response rate of 6 answers/second, and at the best 94 answers/second, during the time this test runs. The test has 30 hosts querying the server for this information, and all of the querying hosts keep querying throughout the test.
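The average, worst, and best figures above are summaries of answer counts taken over the run. As a minimal sketch, assuming the test records one answer count per one-second interval (the post does not say how sampling is done) and using made-up sample counts rather than real data, the three figures could be derived like this:

```python
from statistics import mean


def summarize_rates(per_second_counts):
    """Return (average, worst, best) answers/second for one test run.

    per_second_counts is a hypothetical list holding the number of
    answers received in each one-second sampling interval.
    """
    if not per_second_counts:
        raise ValueError("no samples collected")
    return (mean(per_second_counts),
            min(per_second_counts),
            max(per_second_counts))


# Illustrative samples only; these are NOT from the actual test runs.
samples = [6, 70, 94, 81, 75]
avg, worst, best = summarize_rates(samples)
print(f"average {avg:.3f}/s, worst {worst}/s, best {best}/s")
```

Note that a single slow interval (like the 6 here) pulls the average down without showing up in the best figure, which is why worst/best ranges are reported alongside the averages.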

In 2.2 (using 2.2.5 with the btree patch), the system is set up with BDB and a disk-based cache (I see the same results using btree or hash indices). The performance test shows an average rate of 28.5958 answers/second (btree) or 32.7432 answers/second (hash). This is less than half of the performance in 2.1! At the worst, I see 0 answers/second (22-90 instances), and at the best, 116 answers/second (1 instance). What this also doesn't show is that it is *impossible* to keep all 30 hosts querying the server. They get GSSAPI errors or "Can't contact LDAP server" errors, and drop off. Once about 6 hosts drop off, the rest keep querying the server, with only occasional dropoffs.

However, if I use this same setup with a memory-based cache instead of a disk-based cache, performance shoots up to 126 answers/second (BDB) and 177 answers/second (HDB). No hosts drop off, and the range is from 17 answers/second (BDB low) to 189 answers/second (HDB high).

I suspect that if whatever is causing the problem in the disk cache scenario can be resolved, the memory cache numbers could climb even higher.

Another thing to note: I ran the same disk cache test using anonymous simple binds instead of SASL/GSSAPI binds. Only 1 host of 30 dropped off (Can't contact LDAP server). The remaining 29 drove the server at a whopping 222 answers/second (178 low, 266 high). That is why I point to threading and the disk cache as being part of the issue.
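To put the reported averages side by side, the short calculation below divides each average quoted in this message by the 2.1.24 disk-cache baseline of 74.856 answers/second. Nothing here is newly measured; the labels are just shorthand for the configurations described above, and the function name is hypothetical.

```python
# Average answers/second as reported in this message (labels are shorthand).
BASELINE = 74.856  # 2.1.24, BDB, disk cache, SASL/GSSAPI binds

reported = {
    "2.2.5 BDB disk cache, btree, SASL/GSSAPI": 28.5958,
    "2.2.5 BDB disk cache, hash, SASL/GSSAPI": 32.7432,
    "2.2.5 BDB memory cache, SASL/GSSAPI": 126.0,
    "2.2.5 HDB memory cache, SASL/GSSAPI": 177.0,
    "2.2.5 disk cache, anonymous simple bind": 222.0,
}


def relative_to_baseline(rate, baseline=BASELINE):
    """Return rate as a multiple of the 2.1 baseline (1.0 means equal)."""
    return rate / baseline


for label, rate in reported.items():
    print(f"{label}: {relative_to_baseline(rate):.2f}x the 2.1 rate")
```

Both disk-cache SASL/GSSAPI runs on 2.2 come out below 0.5 (roughly 0.38 and 0.44), while the memory-cache and anonymous simple bind runs land well above 1.0, which is the contrast the argument above rests on.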

Any ideas on how I can proceed from here to help identify where the issue(s) are occurring?

--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/TSS/Computing Systems
ITSS/TSS/Infrastructure Operations
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html