2.3.11/BDB bdb_cache_find_id deadlock
Hey,
Server: OpenLDAP 2.3.11
Backend: BDB 4.2.52 + patches
The server is a replica fed from a master, and is otherwise used for
read operations only.
I'm looking at a deadlock we're currently suffering from. Some threads
are still serving, but the majority are stuck, with this backtrace:
Thread 47 (Thread 1643199408 (LWP 2240)):
#0 0x400007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x4026fa86 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/tls/libpthread.so.0
#2 0x40039b7f in __db_pthread_mutex_lock_openldap_slapd_rhl_42 () from
/usr/lib/tls/i686/libslapd_db-4.2.so
#3 0x400b6f7c in __lock_get_openldap_slapd_rhl_42 () from
/usr/lib/tls/i686/libslapd_db-4.2.so
#4 0x400b6864 in __lock_get_openldap_slapd_rhl_42 () from
/usr/lib/tls/i686/libslapd_db-4.2.so
#5 0x400b67a2 in __lock_get_pp_openldap_slapd_rhl_42 () from
/usr/lib/tls/i686/libslapd_db-4.2.so
#6 0x08102f6d in bdb_cache_entry_db_relock ()
#7 0x08103836 in bdb_cache_find_id ()
#8 0x080d7047 in bdb_search ()
#9 0x0807644c in fe_op_search ()
#10 0x08075bd3 in do_search ()
#11 0x08073d5a in connection_done ()
#12 0x08175848 in ldap_pvt_thread_pool_destroy ()
#13 0x4026d341 in start_thread () from /lib/tls/libpthread.so.0
#14 0x4034efee in clone () from /lib/tls/libc.so.6
In back-bdb/cache.c bdb_cache_find_id(), we have:
    if ( locker2 != locker ) {
        /* If we're using the per-thread txn, release all
         * of its page locks now.
         */
        DB_LOCKREQ list;
        list.op = DB_LOCK_PUT_ALL;
        list.obj = NULL;
        bdb->bi_dbenv->lock_vec( bdb->bi_dbenv, locker2,
            0, &list, 1, NULL );
        /* If this txn was deadlocked, we must abort it
         * and invalidate this per-thread txn.
         */
        if ( rc == DB_LOCK_DEADLOCK ) {
            bdb_txn_get( op, bdb->bi_dbenv, &ltid, 1 );
        }
    }
Shouldn't the call to 'lock_vec' be setting 'rc' here?
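To illustrate the concern, here is a minimal sketch of the pattern, assuming
the deadlock test is meant to look at the result of the lock release.
Everything in it is a hypothetical stand-in: fake_lock_vec() is a stub, not
the Berkeley DB lock_vec method, and FAKE_LOCK_DEADLOCK is a made-up value,
not the real DB_LOCK_DEADLOCK. If 'rc' is instead meant to carry the result
of an earlier call, the first variant is the intended behaviour.

```c
/* Hypothetical stand-in for a deadlock return code (made-up value). */
#define FAKE_LOCK_DEADLOCK (-1)

/* Stub playing the role of the lock release call; it always reports
 * a deadlock so that the two variants below can be told apart. */
static int fake_lock_vec(void)
{
    return FAKE_LOCK_DEADLOCK;
}

/* Same shape as the quoted code: the return value is discarded, so the
 * test only ever sees whatever rc held before the call. */
static int deadlock_seen_ignoring_result(int rc)
{
    fake_lock_vec();
    return rc == FAKE_LOCK_DEADLOCK;
}

/* Variant that stores the result of the release before testing it. */
static int deadlock_seen_capturing_result(int rc)
{
    rc = fake_lock_vec();
    return rc == FAKE_LOCK_DEADLOCK;
}
```

With rc starting at 0, the first variant never notices the deadlock that the
stub reports, while the second one does.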
The only other thing I can think of is that we are downgrading the wrong
lock in the preceding code:
that would have mattered due to the DB_LOCK_PUT_ALL applying to the
transaction.
    if ( rc == 0 ) {
        /* If we succeeded, downgrade back to a readlock. */
        rc = bdb_cache_entry_db_relock( bdb->bi_dbenv, locker,
            *eip, 0, 0, lock );
    } else {
I would have thought this call was redundant in the case where locker !=
locker2, since DB_LOCK_PUT_ALL would clear up the write-lock we claim
for the transaction.
I notice that this code has disappeared with revision 1.106 of cache.c
though, so perhaps that clears the issue I'm seeing as well.
Any thoughts?
Regards,
Nick.