[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
slapd frontend contention
Over the past couple years I've noticed that slapd's frontend was a bottleneck
in our benchmarks on larger systems. But I hadn't quantified the extent of the
problem. This week I stumbled across mutrace, which is a very handy tool that
measures how much time is spent in mutexes.
Running against back-mdb with the test008 setup, using -l 100 on the tester
showed some pretty interesting data. This is a run where slapd was configured
with listener-threads 2 on my quadcore laptop:
The #1 contended mutex is in the thread pool, pool->ltp_mutex.
#2 and #4 are the socket-table mutexes. (With the default of 1 listener thread
they're correspondingly higher in the chart.) #3 is slap_op_mutex which is
used to ensure unique timestamps in each Operation structure. #5 is the LMDB
writer mutex. #6 is the attr_alloc mutex, which we should probably consider
eliminating but nothing after that is very significant. 7,8,9 are connection
writer mutexes, and #10 is the pending CSN list mutex.
#####
mutrace: 0.2 successfully initialized for process slapd (PID: 10868).
^C
mutrace: Showing statistics for process slapd (PID: 10868).
mutrace: 717 mutexes used.
Mutex #471 (0x0x10bae48) first referenced by:
mutrace.c:1448 pthread_mutex_init()
tpool.c:217 ldap_pvt_thread_pool_init()
Mutex #488 (0x0x9100f0) first referenced by:
mutrace.c:1448 pthread_mutex_init()
daemon.c:2906 slapd_daemon()
Mutex #497 (0x0x9107e0) first referenced by:
mutrace.c:1448 pthread_mutex_init()
init.c:92 slap_init()
main.c:787 main()
libc-start.c:258 __libc_start_main()
Mutex #487 (0x0x9100a0) first referenced by:
mutrace.c:1448 pthread_mutex_init()
daemon.c:1629 slapd_daemon_init()
Mutex #635 (0x0x7ffe2d78c040) first referenced by:
mutrace.c:1448 pthread_mutex_init()
mdb.c:3795 mdb_env_setup_locks()
mdb.c:3906 mdb_env_open()
init.c:174 mdb_db_open()
backend.c:224 backend_startup_one()
backend.c:330 backend_startup()
init.c:219 slap_startup()
main.c:992 main()
libc-start.c:258 __libc_start_main()
Mutex #495 (0x0x910720) first referenced by:
mutrace.c:1448 pthread_mutex_init()
attr.c:708 attr_init()
entry.c:95 entry_init()
init.c:117 slap_init()
main.c:787 main()
libc-start.c:258 __libc_start_main()
Mutex #113 (0x0x7ffe2d59c248) first referenced by:
mutrace.c:1448 pthread_mutex_init()
connection.c:421 connection_init()
Mutex #98 (0x0x7ffe2d59b6c8) first referenced by:
mutrace.c:1448 pthread_mutex_init()
connection.c:421 connection_init()
Mutex #105 (0x0x7ffe2d59bc88) first referenced by:
mutrace.c:1448 pthread_mutex_init()
connection.c:421 connection_init()
Mutex #4 (0x0x116cf20) first referenced by:
mutrace.c:1448 pthread_mutex_init()
backend.c:624 backend_db_init()
bconfig.c:1661 config_generic()
config.c:345 config_set_vals()
config.c:419 config_add_vals()
config.c:783 read_config_file()
bconfig.c:4439 read_config()
main.c:793 main()
libc-start.c:258 __libc_start_main()
mutrace: Showing 10 mutexes in order of (write) contention count:
Mutex # Locked Changed Cont. cont.Time[ms] tot.Time[ms] avg.Time[ms]
471 231744 178161 41572 14511.248 634.339 0.003
488 108544 83353 9455 2449.662 356.085 0.003
497 102537 97750 8429 5156.696 245.420 0.002
487 89990 73426 7462 2477.159 318.022 0.004
635 2401 2263 1562 5838.290 905.338 0.377
495 23079 2596 205 65.932 74.804 0.003
113 44029 7438 165 95.214 1260.856 0.029
98 30427 5647 146 79.943 1100.263 0.036
105 18433 4504 119 66.397 965.679 0.052
4 4800 3540 117 30.774 12.467 0.003
... ... ... ... ... ... ...
mutrace: Note that rwlocks are shown as two lines: write locks then read locks.
mutrace: 45 condition variables used.
Condvar #43 (0x0x10bae70) first referenced by:
mutrace.c:1744 pthread_cond_init()
tpool.c:220 ldap_pvt_thread_pool_init()
mutrace: Showing 1 condition variables in order of wait contention count:
Cond # Waits Signals Cont. tot.Time[ms] cont.Time[ms] avg.Time[ms]
43 18389 106667 17090 64222.152 10010.642 0.544
... ... ... ... ... ... ...
mutrace: Total runtime is 10582.976 ms.
mutrace: Results for SMP with 4 processors.
mutrace: WARNING: 2 internal hash collisions detected. Results might not be as
reliable as they could be.
mutrace: Try to increase --hash-size=, which is currently at 67137.
#####
The total runtime is relatively arbitrary, based on how quickly I typed ^C
after the slapd-tester program finished. But notice that the condition
variable is contended for through almost the entire runtime. This is also in
the threadpool, pool->ltp_cond.
I've now written a patch to allow a configurable number of workqueues in a
threadpool. Here's the same slapd-tester workload against slapd with queues=4.
#####
mutrace -d --hash-size=67137 ../servers/slapd/slapd -f /tmp/testr/slapd.1.conf
-h ldap://:9011 -s0 -d0
mutrace: 0.2 successfully initialized for process slapd (PID: 10191).
^C
mutrace: Showing statistics for process slapd (PID: 10191).
mutrace: 724 mutexes used.
Mutex #535 (0x0x90f7e0) first referenced by:
mutrace.c:1448 pthread_mutex_init()
init.c:92 slap_init()
[0x65742f706d742f00] ??() ??:0
Mutex #528 (0x0x90f0f0) first referenced by:
mutrace.c:1448 pthread_mutex_init()
daemon.c:2906 slapd_daemon()
Mutex #527 (0x0x90f0a0) first referenced by:
mutrace.c:1448 pthread_mutex_init()
daemon.c:1629 slapd_daemon_init()
Mutex #691 (0x0xfd8eb0) first referenced by:
mutrace.c:1448 pthread_mutex_init()
tpool.c:247 ldap_pvt_thread_pool_init0()
Mutex #692 (0x0xfd8f78) first referenced by:
mutrace.c:1448 pthread_mutex_init()
tpool.c:247 ldap_pvt_thread_pool_init0()
Mutex #694 (0x0xfd9108) first referenced by:
mutrace.c:1448 pthread_mutex_init()
tpool.c:247 ldap_pvt_thread_pool_init0()
Mutex #693 (0x0xfd9040) first referenced by:
mutrace.c:1448 pthread_mutex_init()
tpool.c:247 ldap_pvt_thread_pool_init0()
Mutex #530 (0x0x7fa641167040) first referenced by:
mutrace.c:1448 pthread_mutex_init()
mdb.c:3795 mdb_env_setup_locks()
mdb.c:3906 mdb_env_open()
init.c:174 mdb_db_open()
backend.c:224 backend_startup_one()
backend.c:330 backend_startup()
init.c:220 slap_startup()
libc-start.c:258 __libc_start_main()
Mutex #533 (0x0x90f720) first referenced by:
mutrace.c:1448 pthread_mutex_init()
attr.c:708 attr_init()
entry.c:95 entry_init()
init.c:117 slap_init()
[0x65742f706d742f00] ??() ??:0
Mutex #592 (0x0x7fa640f76108) first referenced by:
mutrace.c:1448 pthread_mutex_init()
connection.c:421 connection_init()
mutrace: Showing 10 mutexes in order of (write) contention count:
Mutex # Locked Changed Cont. cont.Time[ms] tot.Time[ms] avg.Time[ms]
535 102537 97558 11075 6417.798 247.221 0.002
528 119100 89722 8905 2224.464 373.760 0.003
527 105147 81593 7472 2074.638 339.454 0.003
691 63260 47195 2924 680.201 190.002 0.003
692 60573 47065 2537 504.543 180.968 0.003
694 55821 42992 2296 548.873 171.009 0.003
693 54335 39828 2234 644.864 188.289 0.003
530 2401 2068 1437 6222.018 865.117 0.360
533 23079 2464 246 87.519 66.836 0.003
592 34036 5594 236 91.546 1306.007 0.038
... ... ... ... ... ... ...
mutrace: Note that rwlocks are shown as two lines: write locks then read locks.
mutrace: 51 condition variables used.
Condvar #45 (0x0xfd8fa0) first referenced by:
mutrace.c:1744 pthread_cond_init()
tpool.c:250 ldap_pvt_thread_pool_init0()
Condvar #49 (0x0xfd9130) first referenced by:
mutrace.c:1744 pthread_cond_init()
tpool.c:250 ldap_pvt_thread_pool_init0()
Condvar #47 (0x0xfd9068) first referenced by:
mutrace.c:1744 pthread_cond_init()
tpool.c:250 ldap_pvt_thread_pool_init0()
Condvar #43 (0x0xfd8ed8) first referenced by:
mutrace.c:1744 pthread_cond_init()
tpool.c:250 ldap_pvt_thread_pool_init0()
mutrace: Showing 4 condition variables in order of wait contention count:
Cond # Waits Signals Cont. tot.Time[ms] cont.Time[ms] avg.Time[ms]
45 5580 27496 3692 13306.017 792.845 0.142
49 3798 26011 2333 13349.744 658.738 0.173
47 4364 24985 2204 14182.993 1001.988 0.230
43 3929 29665 2102 12901.160 1186.375 0.302
... ... ... ... ... ... ...
mutrace: Total runtime is 23920.267 ms.
mutrace: Results for SMP with 4 processors.
#####
Now the #1 contended mutex is the slap_op_time mutex. #2 and #3 are the socket
table mutexes. #4,5,6,7 are the threadpool workqueue mutexes. The important
thing to note is that while the total time all 4 of these mutexes are held
adds up to about the same as the single mutex in the previous run, the total
contended time is less than 1/6th of the previous run. I.e., splitting 1 mutex
into 4 reduced our overhead by a factor of 6.
For the condvar, the total contended time of the 4 cvs is 2.7x less than the
previous run.
The code is currently available on my mutex branch on ada.openldap.org. The
pool-pause feature hasn't yet been tested. When I've finished that I'll merge
to master for RE25.
I believe this patch finally gets the frontend out of LMDB's way. I've also
looked at restructuring the writer mutexes, but that may prove to be a
deadend, based on the signalling requirements we have.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/