Re: (ITS#5707) HEAD/RE24 and BDB 4.7.25p1 hanging
hyc@symas.com wrote:
> I was unable to reproduce the problem on my multi-core machines, but I do see
> it on a single-core machine. I've sent a backtrace and other debug info to the
> Oracle folks, will see what they have to say.
I see the problem; it's a bug in BDB's multi-partition lock manager. When
using multiple lock table partitions, it takes both the system-wide lock
mutex and the per-region mutex. On a single-core system it defaults to a
single lock table. In that case, the macro that obtains the system-wide lock
behaves identically to the one that obtains the per-region lock, i.e., both
try to acquire the exact same mutex. Since that mutex is already held by the
calling thread, the second acquisition never returns and the process
deadlocks.
(gdb) bt
#0 0xb7f37424 in __kernel_vsyscall ()
#1 0xb7b36c4e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2 0xb7b32a3c in _L_mutex_lock_88 () from /lib/libpthread.so.0
#3 0xb7b3242d in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0xb7d00819 in __db_pthread_mutex_lock (env=0x8a84550, mutex=104)
at ../dist/../mutex/mut_pthread.c:207
#5 0xb7daad19 in __lock_getobj (lt=0x8a84848, obj=0xbfd492ec, ndx=492,
create=1, retp=0xbfd491e4) at ../dist/../lock/lock.c:1470
#6 0xb7da7f53 in __lock_get_internal (lt=0x8a84848, sh_locker=0xb776d508,
flags=1, obj=0xbfd492ec, lock_mode=DB_LOCK_READ, timeout=0,
lock=0xbfd493cc) at ../dist/../lock/lock.c:588
#7 0xb7da77d6 in __lock_get_api (env=0x8a84550, locker=2147483659, flags=1,
obj=0xbfd492ec, lock_mode=DB_LOCK_READ, lock=0xbfd493cc)
at ../dist/../lock/lock.c:423
#8 0xb7da765b in __lock_get_pp (dbenv=0x8a841c0, locker=2147483659, flags=1,
obj=0xbfd492ec, lock_mode=DB_LOCK_READ, lock=0xbfd493cc)
at ../dist/../lock/lock.c:395
#9 0x08124fb8 in bdb_dn2id_lock (bdb=0x8a68620, dn=0xbfd493f0, rw=0,
txn=0x8a890b8, lock=0xbfd493cc)
at ../../../../head/servers/slapd/back-bdb/dn2id.c:47
#10 0x08125d7d in bdb_dn2id (op=0xbfd49640, dn=0xbfd493f0, ei=0xbfd493e0,
txn=0x8a890b8, lock=0xbfd493cc)
at ../../../../head/servers/slapd/back-bdb/dn2id.c:307
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) frame 4
#4 0xb7d00819 in __db_pthread_mutex_lock (env=0x8a84550, mutex=104)
at ../dist/../mutex/mut_pthread.c:207
207 RET_SET((pthread_mutex_lock(&mutexp->mutex)), ret);
(gdb) p *mutexp
$1 = {mutex = {__data = {__lock = 2, __count = 0, __owner = 29470, __kind = 0,
__nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
__size =
"\002\000\000\000\000\000\000\000\036s\000\000\000\000\000\000\001\000\000\000\000\000\000",
__align = 2}, cond = {__data = {__lock = 0,
__futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0,
__mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
__size = '\0' <repeats 47 times>, __align = 0}, pid = 29470,
tid = 3080046272, mutex_next_link = 0, alloc_id = 6, mutex_set_wait = 1,
mutex_set_nowait = 129, flags = 3}
(gdb)
The mutex being acquired in frame 4 is the same one that was already acquired
in frame 7, __lock_get_api line 418.
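For anyone who wants to see the failure mode in isolation, here is a minimal
sketch of it. This is not BDB's code: struct region, system_mutex(),
partition_mutex() and nested_acquire() are hypothetical names made up for
illustration. It assumes only that, with a single partition, both selectors
resolve to the same mutex, as described above. It uses an error-checking
mutex so the second acquisition reports EDEADLK instead of hanging; with a
default non-recursive mutex (which is what __kind = 0 in the dump above
appears to be on glibc) the same call sequence would block forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a lock region; not BDB's real structures. */
struct region {
    pthread_mutex_t system_mtx;  /* system-wide lock mutex */
    pthread_mutex_t part_mtx;    /* per-partition mutex */
    int npartitions;
};

/* Selector for the system-wide mutex. */
static pthread_mutex_t *system_mutex(struct region *r)
{
    return &r->system_mtx;
}

/* Selector for the partition mutex. With one partition it collapses
 * to the system-wide mutex, which is the assumption being illustrated. */
static pthread_mutex_t *partition_mutex(struct region *r)
{
    return r->npartitions > 1 ? &r->part_mtx : &r->system_mtx;
}

/* Initialize both mutexes as error-checking so a relock by the owning
 * thread returns EDEADLK rather than blocking. */
static void region_init(struct region *r, int npartitions)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&r->system_mtx, &attr);
    assert(rc == 0);
    rc = pthread_mutex_init(&r->part_mtx, &attr);
    assert(rc == 0);
    (void)rc;
    pthread_mutexattr_destroy(&attr);
    r->npartitions = npartitions;
}

/* Take the system-wide mutex, then the partition mutex while still
 * holding it. Returns 0 on success, or the pthread error code from
 * whichever acquisition failed. */
static int nested_acquire(struct region *r)
{
    int rc = pthread_mutex_lock(system_mutex(r));
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(partition_mutex(r));
    if (rc == 0)
        pthread_mutex_unlock(partition_mutex(r));

    pthread_mutex_unlock(system_mutex(r));
    return rc;
}
```

Calling nested_acquire() on a region initialized with one partition returns
EDEADLK; with more than one partition the two mutexes are distinct and it
returns 0. Build with -pthread.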
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/