Re: syncrepl consumer locks up (ITS#3263)
On Monday 02 August 2004 19:41, Jong-Hyuk wrote:
> A stack trace when it is locked up and/or when a search to syncrepl<rid>
> entry is performed will help to identify the case. It'll also be of great
> help if you send me the syncrepl part of the slapd.conf file.
> - Jong-Hyuk
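For reference, the syncrepl stanza on the consumer follows the stock 2.2
pattern. Apart from rid=123, every value below is a placeholder for the real
one (and the exact type is beside the point here):

syncrepl rid=123
        provider=ldap://provider.example.com:389
        type=refreshAndPersist
        searchbase="dc=example,dc=com"
        filter="(objectClass=*)"
        scope=sub
        attrs="*"
        schemachecking=off
        updatedn="cn=replica,dc=example,dc=com"
        bindmethod=simple
        binddn="cn=replica,dc=example,dc=com"
        credentials=secret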
I was finally able to reproduce the problem. These are the steps I took:
- stop provider
- after some time (1 or 2 minutes) restart provider
Now I waited some time (about 5 minutes or more), but the consumer didn't
reestablish the connection.
- stop consumer
- wait a couple of seconds
- restart consumer
The consumer apparently did something and then vanished. I couldn't find a
core file or anything, and the log isn't very helpful either:
Aug 3 13:52:27 panther slapd[6500]: [ID 542995 local4.debug] slapd shutdown:
waiting for 0 threads to terminate
Aug 3 13:52:28 panther slapd[6500]: [ID 486161 local4.debug] slapd stopped.
Aug 3 13:52:39 panther slapd[6897]: [ID 702911 local4.debug] @(#) $OpenLDAP:
slapd 2.2.15 (Aug 2 2004 17:32:49) $
Aug 3 13:52:39 panther
kuenne@gazelle:/usr/local/src/ldap/openldap-2.2.15/servers/slapd
Aug 3 13:52:39 panther slapd[6897]: [ID 527854 local4.debug] bdb_initialize:
Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)
Aug 3 13:52:39 panther last message repeated 1 time
Aug 3 13:52:39 panther slapd[6897]: [ID 294927 local4.debug] bdb_db_init:
Initializing bdb database
Aug 3 13:52:40 panther slapd[6898]: [ID 100111 local4.debug] slapd starting
That's all I could find.
- start consumer again
Now it runs, but if I try to access the syncrepl123 entry it hangs. I was
reading the entry with a base-scope search of roughly this shape (host and
suffix are placeholders for mine):
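ldapsearch -x -H ldap://consumer.example.com -s base \
    -b "cn=syncrepl123,dc=example,dc=com" "(objectClass=*)"
The command simply never returns.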
Following are stack traces for all the threads:
(dbx) where -v
current thread: t@1
=>[1] __lwp_wait(0x4, 0xffbff994, 0x39a94, 0xfefb2cb0, 0x5, 0xffbff92c), at
0xfef1e748
[2] lwp_wait(0x4, 0xffbff994, 0x44be4, 0x0, 0x52000, 0xa400), at 0xfefbdd7c
[3] _thrp_join(0x4, 0x0, 0x0, 0x1, 0x273bcc, 0xffbff994), at 0xfefb9900
[4] slapd_daemon(0x0, 0x1e1e44, 0x0, 0x0, 0x0, 0x0), at 0x46adc
[5] main(0x5, 0x2735a8, 0xffbffae8, 0x2cd2f8, 0x1e1c00, 0x0), at 0x3ac94
(dbx) threads
> t@1 a l@1 ?() running in __lwp_wait()
t@2 a l@2 reg_thread() sleep on 0x2740d8 in __lwp_park()
t@4 a l@4 ?() running in _libc_poll()
t@5 a l@5 ?() running in ___lwp_cond_wait()
t@6 a l@6 ?() running in ___lwp_cond_wait()
t@7 a l@7 ?() sleep on 0x283950 in __lwp_park()
t@8 a l@8 ?() sleep on 0x283950 in __lwp_park()
t@9 a l@9 ?() sleep on 0x283950 in __lwp_park()
t@10 a l@10 ?() sleep on 0x283950 in __lwp_park()
t@11 a l@11 ?() sleep on 0x283950 in __lwp_park()
(dbx) where -v t@2
current thread: t@2
=>[1] __lwp_park(0x0, 0xfedfbe60, 0x0, 0x1, 0x0, 0x0), at 0xfefc5f88
[2] cond_wait_queue(0x2740d8, 0xfefd8b88, 0x0, 0x0, 0xfef60200, 0xfefd8000),
at 0xfefc3230
[3] cond_wait_common(0x0, 0x275d68, 0xfedfbe60, 0x0, 0x0, 0x410fd148), at
0xfefc37a8
[4] _ti_cond_timedwait(0x2740d8, 0x275d68, 0xfedfbf98, 0x0, 0x0, 0x0), at
0xfefc3c38
[5] _cond_timedwait_cancel(0x2740d8, 0x275d68, 0xfedfbf98, 0xe10, 0x0, 0x0),
at 0xfefc3c6c
[6] slp_dequeue_timed(0x275d88, 0xfedfbf98, 0xfedfbf94, 0x0, 0x0, 0x0), at
0xff387ec8
[7] reg_thread(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff3860b0
(dbx) where -v t@3
current thread: t@3
dbx: read of registers from (0xfef60c00) failed -- debugger service failed
(dbx) where -v t@4
current thread: t@4
=>[1] _libc_poll(0xfa3ff2b0, 0x7, 0x1d4c, 0x0, 0x1b58, 0x0), at 0xfef1ca1c
[2] _libc_select(0x1d4c, 0xfef3f2b0, 0x0, 0xfa3ffee0, 0xfa3ffe60, 0x0), at
0xfeece980
[3] select(0x1b, 0xfa3ffe60, 0xfa3ffee0, 0x0, 0xfa3fff60, 0x248f44), at
0xfefbed74
[4] 0x456a4(0x23a3fc, 0xfa3fff60, 0x1e73f8, 0x1, 0x8, 0x1b), at 0x456a3
(dbx) where -v t@5
current thread: t@5
=>[1] ___lwp_cond_wait(0x1fe6d49e0, 0x1fe6d49c8, 0x0, 0xffffffffffffffff, 0x0,
0x0), at 0xfef1e830
[2] _lwp_cond_wait(0xfe6d49e0, 0xfe6d49c8, 0xffffefd8, 0xfffbd2bc,
0xfffbd448, 0x0), at 0xfef158ac
[3] __db_pthread_mutex_lock(0x2e8ea0, 0xfe6d49c8, 0xfe6d49e0, 0x2e8ea0, 0x1,
0x0), at 0xff24efb8
[4] __lock_get_internal(0x2e9218, 0x80005abd, 0x0, 0x0, 0x2, 0xfe6d49c8), at
0xff2f2da4
[5] __lock_vec(0xf9bff490, 0xfe6f3f40, 0xffffffff, 0xff2f0f04, 0x1, 0x0), at
0xff2f1058
[6] bdb_cache_entry_db_relock(0x2e8ea0, 0x80005abd, 0x4cd700, 0x1, 0x0,
0xf9bff650), at 0xb8a14
[7] bdb_cache_find_id(0xf9bffb1c, 0x0, 0x2286, 0xf9bff5ec, 0x0, 0xcd), at
0xb94d4
[8] bdb_dn2entry(0xf9bffb1c, 0x0, 0x4cd700, 0xf9bff660, 0x0, 0xcd), at
0xbd4c4
[9] bdb_entry_get(0xf9bffb1c, 0xf9bffb44, 0x0, 0xf9bff650, 0xf9bff660,
0xf9bff6c8), at 0xc0a8c
[10] backend_attribute(0xf9bffb1c, 0x0, 0xf9bffb44, 0x27ddc8, 0xf9bffa84,
0x0), at 0x55984
[11] 0x8e95c(0xf9bffb1c, 0x28c590, 0x7b, 0x23bc00, 0x0, 0x3cb1ec), at
0x8e95b
[12] do_syncrepl(0x1a, 0x2ebcd0, 0x0, 0x1d9800, 0x273b38, 0x0), at 0x8f958
[13] 0xec3b0(0x26ebc0, 0xf9bffe20, 0x5, 0x283930, 0x28, 0x283938), at
0xec3af
(dbx) where -v t@6
current thread: t@6
=>[1] ___lwp_cond_wait(0x1fe6d4eb0, 0x1fe6d4e98, 0x0, 0xffffffffffffffff, 0x0,
0x0), at 0xfef1e830
[2] _lwp_cond_wait(0xfe6d4eb0, 0xfe6d4e98, 0x0, 0x0, 0x0, 0x0), at
0xfef158ac
[3] __db_pthread_mutex_lock(0x2e8ea0, 0xfe6d4e98, 0xfe6d4eb0, 0x2e8ea0, 0x1,
0x0), at 0xff24efb8
[4] __lock_get_internal(0x2e9218, 0xc7, 0x0, 0x0, 0x1, 0xfe6d4e98), at
0xff2f2da4
[5] __lock_get_pp(0x2e8ea0, 0xc7, 0x0, 0xf933f6b8, 0x1, 0xf93ff9a4), at
0xff2f1de8
[6] 0xb8b28(0x2e8ea0, 0xc7, 0x4cd700, 0x0, 0x0, 0xf93ff9a4), at 0xb8b27
[7] bdb_cache_find_id(0x4cf068, 0x0, 0x2286, 0xf933f804, 0x0, 0xc7), at
0xb93f8
[8] bdb_dn2entry(0x4cf068, 0x0, 0x4cd700, 0xf93ff9b4, 0x1, 0xc7), at 0xbd4c4
[9] bdb_do_search(0x4cf068, 0xf93ffd84, 0x4cf068, 0x1f, 0x0, 0x0), at
0x9dc74
[10] do_search(0x4cf068, 0xf93ffd84, 0x9d744, 0x4cf098, 0x4cf098, 0x0), at
0x4b2b8
[11] 0x48da8(0xf93ffe20, 0x4cf068, 0x1d84a8, 0xf93ffd88, 0x37a8f0, 0x63), at
0x48da7
[12] 0xec3b0(0x26ebc0, 0xf93ffe20, 0x6, 0x283930, 0x378d48, 0x283938), at
0xec3af
(dbx) where -v t@7
current thread: t@7
=>[1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0xfefd8000, 0x0), at 0xfefc5f88
[2] cond_wait_queue(0x283950, 0xfefd8b88, 0x0, 0x0, 0xfef60400, 0xfefd8000),
at 0xfefc3230
[3] _cond_wait_cancel(0x283950, 0x283938, 0x0, 0x0, 0x0, 0x0), at 0xfefc39ec
[4] pthread_cond_wait(0x283950, 0x283938, 0x10, 0x0, 0x0, 0x42), at
0xfefc3a28
[5] 0xec494(0x26ebc0, 0xf8bffe20, 0x7, 0x283930, 0x4cda60, 0x283938), at
0xec493
(dbx) where -v t@8
current thread: t@8
=>[1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0xfefd8000, 0x0), at 0xfefc5f88
[2] cond_wait_queue(0x283950, 0xfefd8b88, 0x0, 0x0, 0xfef60e00, 0xfefd8000),
at 0xfefc3230
[3] _cond_wait_cancel(0x283950, 0x283938, 0x0, 0x0, 0x0, 0x0), at 0xfefc39ec
[4] pthread_cond_wait(0x283950, 0x283938, 0x10, 0x0, 0x0, 0x60), at
0xfefc3a28
[5] 0xec494(0x26ebc0, 0xf83ffe20, 0x8, 0x283930, 0xdd5728, 0x283938), at
0xec493
(dbx) where -v t@9
current thread: t@9
=>[1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0xfefd8000, 0x0), at 0xfefc5f88
[2] cond_wait_queue(0x283950, 0xfefd8b88, 0x0, 0x0, 0xfef61000, 0xfefd8000),
at 0xfefc3230
[3] _cond_wait_cancel(0x283950, 0x283938, 0x0, 0x0, 0x0, 0x0), at 0xfefc39ec
[4] pthread_cond_wait(0x283950, 0x283938, 0x10, 0x0, 0x0, 0x60), at
0xfefc3a28
[5] 0xec494(0x26ebc0, 0xf7bffe20, 0x9, 0x283930, 0x4cda60, 0x283938), at
0xec493
(dbx) where -v t@10
current thread: t@10
=>[1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0xfefd8000, 0x0), at 0xfefc5f88
[2] cond_wait_queue(0x283950, 0xfefd8b88, 0x0, 0x0, 0xfef61200, 0xfefd8000),
at 0xfefc3230
[3] _cond_wait_cancel(0x283950, 0x283938, 0x0, 0x0, 0x0, 0x0), at 0xfefc39ec
[4] pthread_cond_wait(0x283950, 0x283938, 0x10, 0x0, 0x0, 0x60), at
0xfefc3a28
[5] 0xec494(0x26ebc0, 0xf73ffe20, 0xa, 0x283930, 0x4cda60, 0x283938), at
0xec493
(dbx) where -v t@11
current thread: t@11
=>[1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0xfefd8000, 0x0), at 0xfefc5f88
[2] cond_wait_queue(0x283950, 0xfefd8b88, 0x0, 0x0, 0xfef61400, 0xfefd8000),
at 0xfefc3230
[3] _cond_wait_cancel(0x283950, 0x283938, 0x0, 0x0, 0x0, 0x0), at 0xfefc39ec
[4] pthread_cond_wait(0x283950, 0x283938, 0x10, 0x0, 0x0, 0x60), at
0xfefc3a28
[5] 0xec494(0x26ebc0, 0xf6bffe20, 0xb, 0x283930, 0x4cda60, 0x283938), at
0xec493
Apparently t@5 is the syncrepl thread and t@6 is my search thread: locker
80005abd from t@5's trace is the WRITE waiter below, blocked by the READ lock
that locker c0 still holds, and locker c7 from t@6's trace queues behind that
WRITE request. The database locks (dumped with db_stat's locks-grouped-by-
object view, roughly the command below; the path stands in for my database
directory) look like this:
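db_stat -h /usr/local/var/openldap-data -Co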
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object
Locker Mode Count Status ----------------- Object ---------------
c0 READ 1 HELD 0x33158 len: 5 data: 0x000x00"0x860x00
80005abd WRITE 1 WAIT 0x33158 len: 5 data: 0x000x00"0x860x00
c7 READ 1 WAIT 0x33158 len: 5 data: 0x000x00"0x860x00
ba READ 1 HELD id2entry.bdb handle 0
c1 READ 1 HELD id2entry.bdb handle 0
bc READ 1 HELD dn2id.bdb handle 0
c3 READ 1 HELD dn2id.bdb handle 0
c8 READ 1 HELD objectClass.bdb handle 0
d4 READ 1 HELD uid.bdb handle 0
e0 READ 1 HELD uidNumber.bdb handle 0
cc READ 1 HELD gidNumber.bdb handle 0
da READ 1 HELD memberUid.bdb handle 0
dd READ 1 HELD automountKey.bdb handle 0
Other threads are still working; at least the server is still responding to
searches. It's just the syncrepl machinery, and any search that touches the
syncrepl123 entry, that hangs.
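For what it's worth, that blocking pattern is easy to demonstrate against the
BDB 4.2 lock API directly. Here is a minimal standalone sketch of mine (not
slapd code; it assumes an existing scratch directory /tmp/lock-demo and omits
error handling):

#include <db.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DB_ENV *env;
    DB_LOCK rlock, wlock;
    DBT obj;
    u_int32_t holder, waiter;
    int rc;

    /* scratch environment with just the lock subsystem */
    db_env_create(&env, 0);
    env->open(env, "/tmp/lock-demo", DB_CREATE | DB_INIT_LOCK, 0);

    env->lock_id(env, &holder);   /* plays locker c0 above       */
    env->lock_id(env, &waiter);   /* plays locker 80005abd above */

    memset(&obj, 0, sizeof obj);  /* the object both lockers want */
    obj.data = "entry";
    obj.size = 5;

    /* holder takes a READ lock and never releases it ... */
    env->lock_get(env, holder, 0, &obj, DB_LOCK_READ, &rlock);

    /* ... so a WRITE request on the same object cannot be granted.
     * Without DB_LOCK_NOWAIT this call would park in
     * __lock_get_internal, exactly where t@5 and t@6 sit in the
     * traces above; with it, BDB returns DB_LOCK_NOTGRANTED. */
    rc = env->lock_get(env, waiter, DB_LOCK_NOWAIT, &obj,
                       DB_LOCK_WRITE, &wlock);
    printf("write lock: %s\n", db_strerror(rc));

    env->lock_put(env, &rlock);
    env->close(env, 0);
    return 0;
}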
Karsten.
--
At the source of every error which is blamed on the computer you will
find at least two human errors, including the error of blaming it on
the computer.