Howard Chu wrote:
> Based on some experimental changes I've already made, I see a difference
> between 25K auths/sec with the current code, vs 39K auths/sec using separate
> thread pools.
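(For illustration, here's a minimal sketch of the separate-pools idea; this
is not slapd's actual code, and the pool sizes, queue shape, and task types
are all invented for the example. The point is just that network I/O and
operation processing each get their own queue, mutex, and workers, so the
two task classes never contend with each other:)

/* Two independent thread pools: one for I/O tasks, one for operations.
 * Each pool has its own mutex/condvar/queue, so submitting or dequeuing
 * an I/O task never touches the operations pool's lock, and vice versa. */
#include <pthread.h>
#include <stdio.h>

#define QCAP 1024

typedef void (*task_fn)(void);

typedef struct pool {
	pthread_mutex_t mu;
	pthread_cond_t  cv;
	task_fn tasks[QCAP];	/* fixed-size ring buffer of tasks */
	int head, tail, count, shutdown;
} pool_t;

static void pool_init(pool_t *p)
{
	pthread_mutex_init(&p->mu, NULL);
	pthread_cond_init(&p->cv, NULL);
	p->head = p->tail = p->count = p->shutdown = 0;
}

/* Drops the task if the queue is full; a real pool would block or grow. */
static void pool_submit(pool_t *p, task_fn fn)
{
	pthread_mutex_lock(&p->mu);
	if (p->count < QCAP) {
		p->tasks[p->tail] = fn;
		p->tail = (p->tail + 1) % QCAP;
		p->count++;
		pthread_cond_signal(&p->cv);
	}
	pthread_mutex_unlock(&p->mu);
}

static void *pool_worker(void *arg)
{
	pool_t *p = arg;
	for (;;) {
		pthread_mutex_lock(&p->mu);
		while (p->count == 0 && !p->shutdown)
			pthread_cond_wait(&p->cv, &p->mu);
		if (p->count == 0) {	/* shutdown requested and queue drained */
			pthread_mutex_unlock(&p->mu);
			return NULL;
		}
		task_fn fn = p->tasks[p->head];
		p->head = (p->head + 1) % QCAP;
		p->count--;
		pthread_mutex_unlock(&p->mu);
		fn();	/* run the task with no lock held */
	}
}

static void pool_stop(pool_t *p, pthread_t *thr, int n)
{
	int i;
	pthread_mutex_lock(&p->mu);
	p->shutdown = 1;
	pthread_cond_broadcast(&p->cv);
	pthread_mutex_unlock(&p->mu);
	for (i = 0; i < n; i++)
		pthread_join(thr[i], NULL);
}

/* Stand-ins for the two task types; trivially cheap here. */
static void do_io(void) { }	/* read/write a connection */
static void do_op(void) { }	/* process a bind or search */

int main(void)
{
	pool_t io_pool, op_pool;
	pthread_t io_thr[2], op_thr[6];	/* pool sizes are illustrative */
	int i;

	pool_init(&io_pool);
	pool_init(&op_pool);
	for (i = 0; i < 2; i++)
		pthread_create(&io_thr[i], NULL, pool_worker, &io_pool);
	for (i = 0; i < 6; i++)
		pthread_create(&op_thr[i], NULL, pool_worker, &op_pool);

	/* I/O tasks never queue behind, or contend with, operation tasks. */
	for (i = 0; i < 100000; i++) {
		pool_submit(&io_pool, do_io);
		pool_submit(&op_pool, do_op);
	}
	pool_stop(&io_pool, io_thr, 2);
	pool_stop(&op_pool, op_thr, 6);
	puts("done");
	return 0;
}

(With a single shared pool, every submit and dequeue serializes on one
mutex; splitting the pools splits that contention, which is consistent
with the 25K-vs-39K difference quoted above.)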
With some more tweaking, and adding a faster client load generator to the
mix, I've coaxed 42,000 auths/sec out of the box. (42,004, actually.) (That
was a peak over a 30-second interval; 41,800 is more typical over a
sustained run.) Analyzing the profile traces is interesting: the ethernet
driver is the biggest CPU consumer at around 8.6%, followed by strval2str
at 3.8%, then pthread_mutex_lock at 2.8%. As a practical matter, we're
already doing pretty well when the kernel/network overhead is greater than
that of any of our own code. At these levels our own code gets only about
690% of the CPU; a further 100% is completely consumed by interrupt
handling, and the remaining 10% is idle time (which I believe in this case
is really time a CPU spends blocked waiting for an already-taken mutex).
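To spell out the arithmetic (the totals imply an 8-way box):

  690%  our own code (slapd and libraries)
  100%  interrupt handling (one entire CPU)
   10%  idle / blocked on mutexes
  ----
  800%  total, i.e. 8 CPUs x 100% each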
It's pretty amazing to watch the processor status in top and see an entire CPU
consumed by interrupt processing. That kind of points to some major walls down
the road; while any 1GHz or faster processor today can saturate 100Mbps
ethernet, it takes much faster processors to fully utilize 1Gbps ethernet. And
unlike bulk data transfer protocols like ftp or http, we won't get any benefit
from using jumbo frames in typical LDAP deployments, since the individual
requests and responses are small and come nowhere near filling even a
standard 1500-byte frame.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/