Re: (ITS#5391) hdb deadlock
richton@nbcs.rutgers.edu wrote:
> Full_Name: Aaron Richton
> Version: 2.3.40
> OS: Solaris 9
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (128.6.31.135)
>
>
> One hdb backend on one slave died ~21:58 yesterday...
>
> current thread: t@5
> [1] _libc_poll(0xffffffff4f3ff430, 0x0, 0x3e8, 0x0, 0x0, 0x0), at
> 0xffffffff7f0a741c
> [2] _select(0x3e8, 0xffffffff7f1bc728, 0xffffffff7f1bc728, 0x0,
> 0xffffffff7f1bc728, 0x0), at 0xffffffff7f05a74c
> [3] select(0x0, 0x0, 0x0, 0x0, 0xffffffff4f3ff5b0, 0x0), at
> 0xffffffff7e0108e8
> =>[4] __os_sleep(dbenv = 0x1005b2610, secs = 1U, usecs = 0), line 84 in
> "os_sleep.c"
> [5] __memp_sync_int(dbenv = 0x1005b2610, dbmfp = (nil), trickle_max = 0, op =
> DB_SYNC_CACHE, wrotep = (nil)), line 362 in "mp_sync.c"
> [6] __memp_sync(dbenv = 0x1005b2610, lsnp = (nil)), line 99 in "mp_sync.c"
> [7] __txn_checkpoint(dbenv = 0x1005b2610, kbytes = 100000U, minutes = 10U,
> flags = 0), line 1389 in "txn.c"
> [8] __txn_checkpoint_pp(dbenv = 0x1005b2610, kbytes = 100000U, minutes = 10U,
> flags = 0), line 1288 in "txn.c"
> [9] hdb_checkpoint(ctx = 0xffffffff4f3ffc30, arg = 0x1004b4c60), line 165 in
> "config.c"
> [10] ldap_int_thread_pool_wrapper(xpool = 0x10041e500), line 478 in "tpool.c"
>
> (dbx) where
> current thread: t@16
> [1] _libc_poll(0xffffffff46ffe3e0, 0x0, 0x3e8, 0x0, 0x0, 0x0), at
> 0xffffffff7f0a741c
> [2] _select(0x3e8, 0xffffffff7f1bc728, 0xffffffff7f1bc728, 0x0,
> 0xffffffff7f1bc728, 0x0), at 0xffffffff7f05a74c
> [3] select(0x0, 0x0, 0x0, 0x0, 0xffffffff46ffe560, 0x0), at
> 0xffffffff7e0108e8
> =>[4] __os_sleep(dbenv = 0x1005b2610, secs = 1U, usecs = 0), line 84 in
> "os_sleep.c"
> [5] __memp_sync_int(dbenv = 0x1005b2610, dbmfp = (nil), trickle_max = 0, op =
> DB_SYNC_CACHE, wrotep = (nil)), line 439 in "mp_sync.c"
> [6] __memp_sync(dbenv = 0x1005b2610, lsnp = (nil)), line 99 in "mp_sync.c"
> [7] __txn_checkpoint(dbenv = 0x1005b2610, kbytes = 100000U, minutes = 10U,
> flags = 0), line 1389 in "txn.c"
> [8] __txn_checkpoint_pp(dbenv = 0x1005b2610, kbytes = 100000U, minutes = 10U,
> flags = 0), line 1288 in "txn.c"
> [9] hdb_delete(op = 0xffffffff46fff618, rs = 0xffffffff46fff088), line 537 in
> "delete.c"
> [10] syncrepl_entry(si = 0x1004b4e50, op = 0xffffffff46fff618, entry = (nil),
> modlist = 0xffffffff46fff320, syncstate = 3, syncUUID = 0xffffffff46fff3c0,
> syncCookie_req = 0xffffffff46fff360, syncCSN =
> 0xffffffff46fff390), line 2006 in "syncrepl.c"
> [11] do_syncrep2(op = 0xffffffff46fff618, si = 0x1004b4e50), line 731 in
> "syncrepl.c"
> [12] do_syncrepl(ctx = 0xffffffff46fffc30, arg = 0x1004b5030), line 1095 in
> "syncrepl.c"
> [13] ldap_int_thread_pool_wrapper(xpool = 0x10041e500), line 478 in "tpool.c"
>
>
> I can't get db_stat to join the environment. If there's anything else that can
> be gleaned from slapd itself, I'd be glad to poke around the core; otherwise,
> I'm off to rm/slapadd...
>
> "This makes sense and shouldn't happen in 2.3.41" would be fine too, but none of
> the changes (to my eye) looked locking related.
Unfortunately no, nothing familiar here. There's nothing in the BDB
documentation that says two threads are not allowed to call txn_checkpoint
concurrently, but I suppose it may be excessive to make multiple calls in
rapid succession.
One thing that I've started doing recently in my configs is to leave the
kbyte argument of the checkpoint directive at zero, so that only time-based
checkpoints occur. Since those are done in a dedicated task, only one thread
at a time can trigger a checkpoint.
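
As a rough sketch of that change: the kbytes = 100000 and minutes = 10
arguments in the traces above suggest the slave is currently running with
"checkpoint 100000 10". That is an assumption, since the actual slapd.conf
wasn't posted. If so, the hdb database section would become something like:

```
database        hdb
# was (assumed from the traces): checkpoint 100000 10
# kbyte = 0 disables the size-based trigger; only the 10-minute timer remains
checkpoint      0 10
```

The 10-minute interval is just carried over from the trace; use whatever
interval suits the write load.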
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/