Howard Chu writes:
> The -O2 build is faster from about 4 to 24 client threads. From 28 on
> up, the nonoptimized code is faster at every load level. I was
> originally using gcc 4.1.2 but I'm seeing the same result now using
> gcc 4.2.2. Also, slapd is only configured with 8 worker threads in all
> of these tests. Strange that whatever optimizations the compiler has
> generated speeds things up for lighter load, but works against it
> under heavier load.

Not really. Many optimizations are trade-offs that rest on guesses the
compiler cannot verify: cache usage, branch prediction, and so on.
Maybe some small piece of code got optimized unluckily and dominates
the rest under heavy load. With a bit of luck, the difference between
the light and heavy runs will stand out under some sort of profiling or
analysis tool (gprof, cachegrind, helgrind, whatever).
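
For what it's worth, here is a rough sketch of how one might compare two
profiles once per-function costs have been pulled out of whatever tool
is used. Everything in it is an assumption for illustration: the helper
name share_shift, the placeholder function names, and the numbers are
made up and are not slapd measurements. It ranks functions by how much
their share of the total cost grows between the light and heavy runs.

```python
def share_shift(light, heavy):
    """Rank functions by how much their share of total cost grows
    from the light-load profile to the heavy-load profile.

    Both arguments map a function name to a cost (ticks, instructions,
    cache misses, ...) as reported by some profiler.
    """
    light_total = sum(light.values())
    heavy_total = sum(heavy.values())
    if light_total == 0 or heavy_total == 0:
        raise ValueError("each profile needs a non-zero total cost")

    shifts = []
    for name in set(light) | set(heavy):
        light_share = light.get(name, 0) / light_total
        heavy_share = heavy.get(name, 0) / heavy_total
        shifts.append((name, heavy_share - light_share))

    # Largest growth in share first: these are the suspects.
    return sorted(shifts, key=lambda item: item[1], reverse=True)


# Hypothetical costs, invented purely to exercise the function.
light_run = {"func_a": 500, "func_b": 300, "func_c": 200}
heavy_run = {"func_a": 400, "func_b": 200, "func_c": 1400}

for name, shift in share_shift(light_run, heavy_run):
    print(f"{name:10s} {shift:+.2%}")
```

Comparing shares rather than raw costs keeps the heavy run's larger
total from drowning out the comparison; a function whose share jumps is
a candidate for the "unluckily optimized" piece of code.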