Re: thread pools, performance
Howard Chu wrote:
Yep, that helped. Raising rx-usecs from the default of 20 to 1000, and rx-frames
from the default of 5 to 100, I'm getting 43k auths/sec with back-null (in 4 separate
thread pools) and the core fielding the interrupts is only about 80% busy now
instead of 100%. I'm afraid my load generators may be maxed out now, because I
can't seem to drive up the load on the server any higher even though there's
more idle CPU.
The current code in HEAD (with only 1 thread pool) is reaching 36k auths/sec
with back-null, so it's actually not far off from my experimental peak rate.
Considering that HEAD was at 25k/sec last week (and now in 2.4.6) that's
pretty decent.
With back-bdb and 1 million users I'm getting 26.1k/sec with plaintext
passwords (up from 19.3k/sec last week). With {SSHA} passwords that drops to
25.7k/sec (~1.5% difference).
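For anyone who wants to check the relative changes behind these figures, here is a minimal sketch. The rates are the rounded numbers quoted above; the helper name pct_change and the labels are only illustrative, not part of any OpenLDAP tool.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# Rounded rates quoted in this message, in auths/sec.
rates = {
    "HEAD back-null, last week -> now": (25_000, 36_000),
    "back-bdb plaintext, last week -> now": (19_300, 26_100),
    "back-bdb plaintext -> {SSHA}": (26_100, 25_700),
}

for label, (old, new) in rates.items():
    print(f"{label}: {pct_change(old, new):+.1f}%")
```

Since the inputs are rounded to the nearest 0.1k (or 1k for the back-null numbers), the results are only good to about one decimal place.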
I have to put this tinkering on hold for a bit, to run some authrate tests
against ActiveDirectory on this machine (using W2K3sp2 X64). Later on we'll do
a W2K3 OpenLDAP build for comparison as well. Should be entertaining...
Just for reference, using slapadd with tool-threads set to 4, it took 7:05.17
(min:sec) to load an LDIF with 1 million user objects. These user objects had
plaintext passwords. When I later decided to change them to {SSHA} passwords
it took 10:12.38 to ldapmodify all of them.
This machine came with a pair of Maxtor 36GB 10k RPM SCSI drives. We added a
pair of IBM 146GB 10k RPM SCSI drives. One of the 36GB drives has FedoraCore6
on it. We installed Windows 2003 SP2 Enterprise Edition for x86_64 on the
other 36GB drive.
We split the 146GB drives into two partitions each, with each partition
occupying half of the drive. The partitions are assigned such that both
Windows and Linux get equivalent layouts:
/dev/sdc1 - NTFS, AD logs
/dev/sdc2 - XFS, OpenLDAP data
/dev/sdd1 - XFS, OpenLDAP logs
/dev/sdd2 - NTFS, AD data
My assumption here is that the transaction log partition will get more
frequent activity, and the data partition will just get the occasional flush.
So, I chose to place the log partitions on the outer tracks of the drives
where they should have higher throughput and lower latency.
Anyway, using Microsoft's ldifde tool to import the same 1 million user LDIF,
using 8 threads, took 4:23:46.85 (yes, that's over 4 hours for MS AD vs about
7 minutes for OpenLDAP). By the way, we configured the server as noted in this
Microsoft document
http://www.microsoft.com/downloads/details.aspx?FamilyID=52e7c3bd-570a-475c-96e0-316dc821e3e7&DisplayLang=en
in Appendix B: Setup Instructions Step 1. That allowed us to import regular
inetOrgPerson entries with userPassword attributes and have AD treat them as
actual user accounts. (Otherwise we would have had to convert all the entries
to use the Microsoft unicodePwd attribute instead.)
Unfortunately, the accounts imported this way were all initially disabled. So
we had to ldapmodify their userAccountControl attribute to enable them all
before we could proceed with the authentication tests. It took 20:57.017
(min:sec) to ldapmodify all 1 million user records.
Finally we got to running the actual authrate tests, which yielded a peak rate
of 4526 auths/second with 40 client threads. The rate declined from there as
more clients were added; AD clearly isn't capable of handling very many
concurrent sessions. It also appears that most of the CPUs were idle: perhaps
3 out of 8 cores were actually doing any work. I.e., AD doesn't scale well
across multiple CPUs.
Unfortunately the native AD server runs as a privileged process and Windows
doesn't allow you to alter its processor affinity settings, so there's no way
to directly measure how it scales from one core up to eight. But I guess
there's really nothing interesting to see here anyway. (For reference, even
when restricted to only a single core on this machine, OpenLDAP 2.4.5 handled
about 8800 auths/second, coming from even more client threads. And that was
before any other tweaks.)
The numbers speak for themselves.
It's enlightening to look at the actual CPU time used during the import tasks.
For ldifde on W2K3 we got:
time ldifde.exe -i -f examp3.ldif -h -q 8
261.10u 140.73s 4:23:46.85 2.5%
For slapadd on FC6 we got:
time .slapadd -f slapd.conf.slam -q -l example.ldif.1mil
260.75u 80.86s 7:05.17 80%
One interesting part here is that the amount of user CPU time is nearly
identical in both cases. That implies that both slapadd and ldifde are doing
about the same amount of work to parse the input LDIF. (For all we know they
could be doing *exactly* the same work, using our own code. Or it could just
be an interesting coincidence.)
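As a rough cross-check of the two `time` lines above, the sketch below recomputes the CPU utilization as (user + system) / elapsed and derives an import rate in entries per second. It assumes the csh-style output format shown above (user seconds, system seconds, elapsed as [h:]m:s) and an entry count of exactly 1 million; the function names are made up for this example.

```python
def parse_elapsed(text):
    """Convert an [h:]m:s.ss elapsed-time string into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def summarize(time_line, entries):
    """Return (cpu_percent, entries_per_second) for one csh-style `time` line."""
    user, system, elapsed = time_line.split()[:3]
    cpu = float(user.rstrip("u")) + float(system.rstrip("s"))
    wall = parse_elapsed(elapsed)
    return cpu / wall * 100.0, entries / wall

# The two result lines quoted above; 1 million entries is assumed for both.
for name, line in [
    ("ldifde", "261.10u 140.73s 4:23:46.85 2.5%"),
    ("slapadd", "260.75u 80.86s 7:05.17 80%"),
]:
    cpu_pct, rate = summarize(line, 1_000_000)
    print(f"{name}: {cpu_pct:.1f}% CPU, {rate:.0f} entries/sec")
```

The recomputed utilization should land close to the percentages that `time` printed, and the entries/sec figure makes the gap in wall-clock time easier to see than the raw durations do.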
Comparing the rest of the time isn't really fair since it seems that ldifde
just feeds data into a running server using LDAP, while slapadd simply writes
to the DB directly. I guess for the sake of fairness we'll have to time an
OpenLDAP import using ldapadd next.
We'll remove AD and test ADAM next. At least, running as a normal user
process, we should be able to tweak its processor affinity so we can plot how
it scales with number of cores. Later we'll build a 64 bit OpenLDAP on Windows
and see how it fares. My experience with 32 bit Windows has been that slapd
runs about as fast on Windows as it does on Linux. But with the silly limits
that Windows places on how many sockets a process can have open (64, IIRC), you
really can't subject it to as heavy a load in production use.
At this point I'd have a few choice things to say about Microsoft in general
and AD in particular, but I think the numbers speak for themselves.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/