[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: (ITS#4108) LWL causing core dumps in HEAD
hyc@symas.com wrote:
> Running slapd on a single processor machine, I no longer see any
> asserts. Running on my dual machine, if syslog is enabled, I
> occasionally see an assert in slapd_set_read() called from
> connection_read(). With syslog disabled, this problem doesn't occur. No
> bright ideas come to mind at the moment...
Some context:
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87548 op=1 UNBIND
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87548 fd=15 closed
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 fd=15 ACCEPT from
IP=192.168.1.22:23726 (IP=0.0.0.0:9000)
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=0 BIND
dn="cn=Sandi Couto,ou=peons,o=foo" method=128
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=0 BIND
dn="cn=Sandi Couto,ou=Peons,o=foo" mech=SIMPLE ssf=0
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=0 RESULT
tag=97 err=0 text=
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=1 UNBIND
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 fd=15 closed
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87550 fd=15 ACCEPT from
IP=192.168.1.22:23727 (IP=0.0.0.0:9000)
daemon: listen=11, new connection on 15
daemon: added 15r
daemon: activity on 2 descriptors
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 1 descriptor
daemon: activity on: 15r
daemon: read active on 15
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 2 descriptors
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 1 descriptor
daemon: activity on: 15r
daemon: read active on 15
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 1 descriptor
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 2 descriptors
daemon: activity on: 15r
daemon: read active on 15
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 busy
daemon: removing 15
daemon: listen=11, new connection on 15
slapd.head: ../../../head/servers/slapd/daemon.c:567: slapd_set_read:
Assertion `((slap_daemon.sd_index[(s)]) != -1)' failed.
daemon: activity on 1 descriptor
Program received signal SIGABRT, Aborted.
[Switching to Thread 1090517344 (LWP 10355)]
0x00002aaaab6e76cd in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x00002aaaab6e76cd in raise () from /lib64/tls/libc.so.6
#1 0x00002aaaab6e8c7e in abort () from /lib64/tls/libc.so.6
#2 0x00002aaaab6e1420 in __assert_fail () from /lib64/tls/libc.so.6
#3 0x000000000042e7a3 in slapd_set_read (s=15, wake=1)
at ../../../head/servers/slapd/daemon.c:567
#4 0x00000000004369bf in connection_read (s=15, op=0x40fff480)
at ../../../head/servers/slapd/connection.c:1665
#5 0x0000000000435efe in connection_read_thread (ctx=0x40fff4c0, argv=0xf)
at ../../../head/servers/slapd/connection.c:1419
#6 0x0000000000556184 in ldap_int_thread_pool_wrapper (xpool=0x788710)
at ../../../head/libraries/libldap_r/tpool.c:615
#7 0x00002aaaab5ab649 in start_thread () from /lib64/tls/libpthread.so.0
#8 0x00002aaaab7716c3 in thread_start () from /lib64/tls/libc.so.6
(gdb) info thr
10 Thread 1149237600 (LWP 10362) 0x00002aaaab5ad55a in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
9 Thread 1140848992 (LWP 10361) 0x00002aaaab5ad55a in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
8 Thread 1132460384 (LWP 10360) 0x00002aaaab5ad55a in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
7 Thread 1124071776 (LWP 10359) 0x00002aaaab5ad55a in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
6 Thread 1115683168 (LWP 10358) 0x00002aaaab5af8eb in
__lll_mutex_lock_wait
() from /lib64/tls/libpthread.so.0
5 Thread 1107294560 (LWP 10357) 0x00002aaaab5af8eb in
__lll_mutex_lock_wait
() from /lib64/tls/libpthread.so.0
4 Thread 1098905952 (LWP 10356) 0x00002aaaab5ad55a in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
* 3 Thread 1090517344 (LWP 10355) 0x00002aaaab6e76cd in raise ()
from /lib64/tls/libc.so.6
2 Thread 1082128736 (LWP 10337) 0x00002aaaab5af8eb in
__lll_mutex_lock_wait
() from /lib64/tls/libpthread.so.0
1 Thread 46912511001792 (LWP 10328) 0x00002aaaab5abd2f in pthread_join ()
from /lib64/tls/libpthread.so.0
(gdb) thr 6
[Switching to thread 6 (Thread 1115683168 (LWP 10358))]#5
0x000000000043108a in slap_listener (sl=0x2aaaab71c520) at
../../../head/servers/slapd/daemon.c:1519
1519 slapd_add( s, 1, NULL );
(gdb) bt
#0 0x00002aaaab5af8eb in __lll_mutex_lock_wait ()
from /lib64/tls/libpthread.so.0
#1 0x0000000000000000 in ?? ()
#2 0x0000000000000008 in ?? ()
#3 0x00002aaaab5ac877 in pthread_mutex_lock () from
/lib64/tls/libpthread.so.0
#4 0x00002aaaab76ddc8 in syslog () from /lib64/tls/libc.so.6
#5 0x000000000043108a in slap_listener (sl=0x2aaaab71c520)
at ../../../head/servers/slapd/daemon.c:1519
(gdb) thr 5
[Switching to thread 5 (Thread 1107294560 (LWP 10357))]#0
0x00002aaaab5af8eb in __lll_mutex_lock_wait () from
/lib64/tls/libpthread.so.0
(gdb) bt 20
#0 0x00002aaaab5af8eb in __lll_mutex_lock_wait ()
from /lib64/tls/libpthread.so.0
#1 0x0000000000766fe8 in slap_counters ()
#2 0x0000000000000000 in ?? ()
#3 0x00002aaaab5ac877 in pthread_mutex_lock () from
/lib64/tls/libpthread.so.0
#4 0x000000000082e7c0 in ?? ()
#5 0x0000000041fff290 in ?? ()
#6 0x000000000058a3e9 in ber_memcalloc (n=46912527370128, s=1107293072)
at ../../../head/libraries/liblber/memory.c:306
#7 0x00000000004593b9 in do_bind (op=0x2aaaadf02f20, rs=0x41fff3e0)
at ../../../head/servers/slapd/bind.c:54
#8 0x0000000000435a98 in connection_operation (ctx=0x41fff4c0,
arg_v=0x2aaaadf02f20) at ../../../head/servers/slapd/connection.c:1292
#9 0x0000000000556184 in ldap_int_thread_pool_wrapper (xpool=0x788710)
at ../../../head/libraries/libldap_r/tpool.c:615
#10 0x00002aaaab5ab649 in start_thread () from /lib64/tls/libpthread.so.0
#11 0x00002aaaab7716c3 in thread_start () from /lib64/tls/libc.so.6
(gdb) thr 2
[Switching to thread 2 (Thread 1082128736 (LWP 10337))]#0
0x00002aaaab5af8eb in __lll_mutex_lock_wait () from
/lib64/tls/libpthread.so.0
(gdb) bt 20
#0 0x00002aaaab5af8eb in __lll_mutex_lock_wait ()
from /lib64/tls/libpthread.so.0
#1 0x0000000000000000 in ?? ()
#2 0x0000000000000000 in ?? ()
#3 0x00002aaaab5ac877 in pthread_mutex_lock () from
/lib64/tls/libpthread.so.0
#4 0x0000000000000001 in ?? ()
#5 0x00000000005962ce in __PRETTY_FUNCTION__.11610 ()
#6 0x0000000000000000 in ?? ()
#7 0x0000000000435ed3 in connection_client_stop (s=0)
at ../../../head/servers/slapd/connection.c:1404
#8 0x00000000004319a1 in slapd_daemon_task (ptr=0x0)
at ../../../head/servers/slapd/daemon.c:1734
#9 0x00002aaaab5ab649 in start_thread () from /lib64/tls/libpthread.so.0
#10 0x00002aaaab7716c3 in thread_start () from /lib64/tls/libc.so.6
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
(More stack frames follow...)
(gdb) frame 8
#8 0x00000000004319a1 in slapd_daemon_task (ptr=0x0)
at ../../../head/servers/slapd/daemon.c:1734
1734 ldap_pvt_thread_mutex_lock( &slap_daemon.sd_mutex );
(gdb) thr 3
[Switching to thread 3 (Thread 1090517344 (LWP 10355))]#0
0x00002aaaab6e76cd in raise () from /lib64/tls/libc.so.6
(gdb) frame 4
#4 0x00000000004369bf in connection_read (s=15, op=0x40fff480)
at ../../../head/servers/slapd/connection.c:1665
1665 slapd_set_read( s, 1 );
(gdb) p *c
$13 = {c_struct_state = 2, c_conn_state = 3, c_conn_idx = 3,
c_close_reason = 0x596d1b "?", c_mutex = {__m_reserved = 2, __m_count = 0,
__m_owner = 0x100002873, __m_kind = 0, __m_lock = {__status = 0,
__spinlock = 0}}, c_sb = 0x2aaaadc02d90, c_starttime = 1130586974,
c_activitytime = 1130586974, c_connid = 87550, c_peer_domain = {bv_len
= 7,
bv_val = 0x2aaaadf03460 "unknown"}, c_peer_name = {bv_len = 21,
bv_val = 0x2aaaadf034e0 "IP=192.168.1.22:23727"}, c_listener = 0x76a260,
c_sasl_bind_in_progress = 0, c_sasl_bind_mech = {bv_len = 0, bv_val =
0x0},
c_sasl_dn = {bv_len = 0, bv_val = 0x0}, c_sasl_authz_dn = {bv_len = 0,
bv_val = 0x0}, c_authz_backend = 0x0, c_authz_cookie = 0x0, c_authz = {
sai_method = 128, sai_mech = {bv_len = 0, bv_val = 0x0}, sai_dn = {
bv_len = 0, bv_val = 0x0}, sai_ndn = {bv_len = 0, bv_val = 0x0},
sai_ssf = 0, sai_transport_ssf = 0, sai_tls_ssf = 0, sai_sasl_ssf = 0},
c_protocol = 0, c_ops = {stqh_first = 0x2aaaadf02f20,
stqh_last = 0x2aaaadf03078}, c_pending_ops = {stqh_first = 0x0,
stqh_last = 0x2aaaac8788a8}, c_write_mutex = {__m_reserved = 0,
__m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0,
__spinlock = 0}}, c_write_cv = {__c_lock = {__status = 8589934592,
__spinlock = 1}, __c_waiting = 0x1,
__padding = "\001\000\000\000\000\000\000\000\220\207\207»*\000",
__align = 0}, c_currentber = 0x2aaaadc00930, c_writewaiter = 0,
c_is_tls = 0, c_needs_tls_accept = 0, c_sasl_layers = 0, c_sasl_done = 0,
c_sasl_authctx = 0x2aaaadf00b40, c_sasl_sockctx = 0x0,
c_sasl_extra = 0x2aaaadf03630, c_sasl_bindop = 0x0,
c_pagedresults_state = {
ps_be = 0x0, ps_size = 0, ps_cookie = 0, ps_count = 0},
c_n_ops_received = 1, c_n_ops_executing = 1, c_n_ops_pending = 0,
c_n_ops_completed = 0, c_n_get = 1, c_n_read = 1, c_n_write = 0, c_pb
= 0x0,
c_extensions = 0x0, c_clientfunc = 0, c_clientarg = 0x0,
c_send_ldap_result = 0x4490d9 <slap_send_ldap_result>,
c_send_search_entry = 0x449e34 <slap_send_search_entry>,
c_send_search_reference = 0x44c0ac <slap_send_search_reference>,
c_send_ldap_extended = 0x449973 <slap_send_ldap_extended>,
c_send_ldap_intermediate = 0x449c15 <slap_send_ldap_intermediate>}
(gdb)
What really doesn't make sense here is that the reader thread that
crashed shouldn't even be running yet. I.e., the debug output shows that
fd 15 was just removed from slapd control. Then a new connection was
accepted, using that same fd. But the listener thread hasn't added 15
back into slapd control yet - that's the slapd_add() call that happens
right after the StatsLog invocation. So, fd 15 really is still invalid
at this point. But since it is invalid, there should not be any active
reader threads using that descriptor yet. Notice that thread 5 has
already read a Bind operation from that fd, and the conn->c_conn_state
is SLAP_C_BINDING. Again, it should not be possible for a reader thread
to have already gotten this far, since the accepting thread hasn't
enabled the descriptor yet.
The only possibility that comes to mind is that this reader thread is
leftover from the previous invocation. I.e., it was submitted into the
pool while the previous connection was active, and the connection was
closed before the thread got to run. Since the listener thread has just
called connection_init(), the conn structure is valid when the reader
thread runs. But since the listener hasn't called slapd_add() yet, the
socket is still invalid.
Moving the slapd_add() so that it occurs while the conn structure is
locked appears to fix this problem.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/