Howard Chu wrote:
And unfortunately I had no time to do any more debugging until now; with
St. Patrick's Day this Tuesday I had gigs all weekend. I also see that
the test050 run I left overnight eventually crashed, and the symptoms
are the same as in Quanah's. So, there's still more to track down.
Looks as if I might have hit the same; see the stack trace at the end.
For reference:
violino:~/OD/hobj/tests/testrun> grep rid=003 !$
grep rid=003 slapd.1.log
=>do_syncrepl rid=003
do_syncrepl: rid=003 retrying (9 retries left)
=>do_syncrepl rid=003
=>do_syncrep2 rid=003
=>do_syncrepl rid=003
=>do_syncrep2 rid=003
olcSyncrepl: {2}rid=003 provider=ldap://localhost:9013/ binddn="cn=config" bin
=>do_syncrepl rid=003
=>do_syncrep2 rid=003
olcSyncrepl: {2}rid=003 provider=ldap://localhost:9013/ binddn="cn=config" bin
=>do_syncrepl rid=003
=>do_syncrepl rid=003
=>do_syncrep2 rid=003
do_syncrepl: rid=003 quitting
The odd thing here, of course, is that it should never jump from '9
retries left' to 'quitting'; there should be at least 9 failure/retry
messages first. It seems we have a wild memory overwrite occurring.
I assume it is quitting due to a config update. It looks to me as if
syncinfo structures are being released while still active.
Rein
(gdb) where
#0 0x0000002a968d2540 in strlen () from /lib64/tls/libc.so.6
#1 0x0000002a968a4a0b in vfprintf () from /lib64/tls/libc.so.6
#2 0x0000002a968c4434 in vsnprintf () from /lib64/tls/libc.so.6
#3 0x0000002a958c3181 in lutil_debug (debug=<value optimized out>,
level=<value optimized out>, fmt=0x448076c8 "$") at debug.c:66
#4 0x00000000004957d1 in do_syncrepl (ctx=0x44807e90, arg=0x858150)
at syncrepl.c:1261
#5 0x0000002a9567e415 in ldap_int_thread_pool_wrapper (
xpool=<value optimized out>) at tpool.c:663
#6 0x0000002a9675310a in start_thread () from /lib64/tls/libpthread.so.0
#7 0x0000002a969288b3 in clone () from /lib64/tls/libc.so.6
#8 0x0000000000000000 in ?? ()
(gdb) print si
$1 = (syncinfo_t *) 0x0
(gdb) print *rtask
$2 = {next_sched = {tv_sec = 7598733802573148208,
tv_usec = 14422794207978861}, interval = {tv_sec = 384, tv_usec = 64},
tnext = {stqe_next = 0x84bc30}, rnext = {stqe_next = 0x858870},
routine = 0,
arg = 0x0, tname = 0x505cc0 "do_syncrepl", tspec = 0x857d94 "rid=004"}
(gdb) thr 8
[Switching to thread 8 (process 23265)]#0 0x0000002a968d2540 in strlen ()
from /lib64/tls/libc.so.6
(gdb) frame 4
#4 0x00000000004957d1 in do_syncrepl (ctx=0x41801e90, arg=0x858a30)
at syncrepl.c:1261
1261 Debug( LDAP_DEBUG_TRACE, "=>do_syncrepl %s\n", si->si_ridtxt, 0, 0 );
(gdb) print si
$3 = (syncinfo_t *) 0x20
(gdb) print *rtask
$4 = {next_sched = {tv_sec = 7526470944284832317,
tv_usec = 7598542775770181185}, interval = {tv_sec = 8751185004989543539,
tv_usec = 3683997482740818493}, tnext = {stqe_next = 0x6974202235203030},
rnext = {stqe_next = 0x333d74756f656d}, routine = 0xc0, arg = 0x20,
tname = 0x84bc10 "\220\004",
tspec = 0x69666e6f43657361 <Address 0x69666e6f43657361 out of bounds>}
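
The pointer fields in the second *rtask dump do not look like addresses;
they look like text. A small sketch for checking that, assuming the
machine is little-endian x86-64 (the /lib64 paths and address widths
suggest so, but that is an assumption): it decodes each 64-bit value into
the 8 bytes it occupies in memory. The helper names decode_le_word and
dump_rtask_words are made up for this illustration and are not part of
slapd.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode a 64-bit value into the 8 bytes it occupies in little-endian
 * memory (lowest-order byte first). Non-printable bytes are shown as '.'.
 * out must have room for 8 characters plus the terminating NUL. */
void decode_le_word(uint64_t word, char out[9])
{
    assert(out != NULL);
    memset(out, 0, 9);
    for (int i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(word >> (8 * i));
        out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
    }
}

/* Print the three pointer-typed fields copied from the gdb output of
 * "print *rtask" in thread 8, together with their byte-wise decoding. */
void dump_rtask_words(void)
{
    static const struct {
        const char *field;
        uint64_t value;
    } words[] = {
        { "tnext.stqe_next", 0x6974202235203030ULL },
        { "rnext.stqe_next", 0x00333d74756f656dULL },
        { "tspec",           0x69666e6f43657361ULL },
    };
    char text[9];

    for (size_t i = 0; i < sizeof words / sizeof words[0]; i++) {
        decode_le_word(words[i].value, text);
        printf("%-16s 0x%016llx  \"%s\"\n", words[i].field,
               (unsigned long long)words[i].value, text);
    }
}
```

Calling dump_rtask_words() from a main() decodes tnext and rnext, which
are adjacent in the struct, to `00 5" ti` and `meout=3.`, so together
they read `00 5" timeout=3` followed by a NUL. tspec decodes to
`aseConfi`. That looks like configuration text written over the task
structure, which would fit the overwrite / freed-while-active theory,
though it does not say who did the writing.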