
Re: back-bdb deadlocks



I assume you aren't using ACLs which might access other entries?

Looks like be_group() and be_attribute() are using the transaction
id passed in via o_private....
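
A minimal sketch of that hand-off, assuming a hypothetical bdb_fetch_entry()
helper and treating o_private as holding a bare DB_TXN * (the real slapd
structures differ; this is illustrative only):

        #include <db.h>

        /* stand-in for the real slapd Operation structure */
        typedef struct Operation {
                void *o_private;        /* caller's DB_TXN *, if any */
        } Operation;

        static int
        bdb_fetch_entry( DB *db, Operation *op, DBT *key, DBT *data )
        {
                /* Reuse the enclosing update's transaction so this read
                 * takes its locks inside that transaction, instead of as
                 * a separate locker that can deadlock against it. */
                DB_TXN *txn = op ? (DB_TXN *) op->o_private : NULL;

                return db->get( db, txn, key, data, 0 );
        }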

Kurt

At 04:40 PM 2002-01-16, Howard Chu wrote:
>I believe a similar solution to this is needed in back-bdb, but I haven't
>yet spent much time thinking about how it can work. The current situation is that
>back-bdb gets deadlocked very easily by more than 2 simultaneous updaters.
>There doesn't seem to be any easy solution to this; my debug session with three
>deadlocked threads showed that two of them were deadlocked in txn_abort - i.e.,
>deadlock detection already did its job, but new locks were still needed to
>unravel the transaction. Pretty rude. (I suspect this situation arose because I
>was testing bdb_add where the parent entry was obtained outside of the
>transaction. With the parent entry inside the transaction, the deadlock still
>occurs, but doesn't get detected!)
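>
>For reference, the usual Berkeley DB answer is the retry-on-deadlock loop
>sketched below (not the back-bdb code; do_update() stands in for the actual
>operation body):
>
>        #include <db.h>
>
>        static int
>        with_retry( DB_ENV *dbenv, int (*do_update)( DB_TXN * ) )
>        {
>                DB_TXN *txn;
>                int rc;
>
>                for ( ;; ) {
>                        rc = txn_begin( dbenv, NULL, &txn, 0 );
>                        if ( rc != 0 ) return rc;
>
>                        rc = do_update( txn ); /* every read/write uses txn */
>
>                        if ( rc == DB_LOCK_DEADLOCK ) {
>                                /* picked as the deadlock victim: drop all of
>                                 * this transaction's locks and start over */
>                                txn_abort( txn );
>                                continue;
>                        }
>                        if ( rc != 0 ) {
>                                txn_abort( txn );
>                                return rc;
>                        }
>                        return txn_commit( txn, 0 );
>                }
>        }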
>
>Someone needs to take a careful look at what records we touch and in what
>order. My suspicion is that we cannot guarantee that we can keep transactions
>out of each other's way. Considering that the deeper a tree gets, the more
>objects a single transaction must lock, the number of locks being held gets huge.
>I briefly experimented with nested transactions before, with no better success.
>I guess this is because even after the nested transaction is committed, none of
>its resources/locks are released until the parent transaction commits. Perhaps
>I've gone in the wrong direction; maybe all operations (even read-only) need
>to be wrapped in transactions just to keep locks consistent.
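>
>For reference, a nested transaction in Berkeley DB is just txn_begin() with a
>parent handle; the lock behaviour described above looks like this (a sketch
>only, error handling omitted, dbenv being the open environment):
>
>        DB_TXN *parent, *child;
>
>        txn_begin( dbenv, NULL, &parent, 0 );
>        txn_begin( dbenv, parent, &child, 0 );
>
>        /* ... entry and index updates under child ... */
>
>        /* Committing the child hands its locks to the parent; nothing
>         * is actually released until the parent commits or aborts. */
>        txn_commit( child, 0 );
>        txn_commit( parent, 0 );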
>
>  -- Howard Chu
>  Chief Architect, Symas Corp.       Director, Highland Sun
>  http://www.symas.com               http://highlandsun.com/hyc
>  Symas: Premier OpenSource Development and Support
>
>-----Original Message-----
>From: owner-openldap-bugs@OpenLDAP.org
>[mailto:owner-openldap-bugs@OpenLDAP.org] On Behalf Of
>adamson@andrew.cmu.edu
>Sent: Thursday, October 11, 2001 8:04 AM
>To: openldap-its@OpenLDAP.org
>Subject: Re: index corruption (1164) still present in 2.0.15 with db
>3.1.17 (ITS#1359)
>
>
>> I think you've hit on the problem.  This likely affects
>> DN parent/subtree indices as well.  We can either try to
>> add some fine grained r/w locks or one giant r/w lock.
>> I suggest the latter and that it initially be quite
>> giant.  That is, every "database" have one giant lock
>> which is acquired for read for LDAP read operations and
>> acquired for write for LDAP write operations.  This
>> will serialize all writes within a given "database".
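>
>A minimal sketch of that giant per-database lock, using the portable rdwr
>wrappers from <ldap_pvt_thread.h> (illustrative only; "giant_rdwr" is a
>made-up name):
>
>        #include <ldap_pvt_thread.h>
>
>        static ldap_pvt_thread_rdwr_t giant_rdwr;
>
>        /* once, when the "database" is opened */
>        ldap_pvt_thread_rdwr_init( &giant_rdwr );
>
>        /* around an LDAP read operation */
>        ldap_pvt_thread_rdwr_rlock( &giant_rdwr );
>        /* ... search / compare ... */
>        ldap_pvt_thread_rdwr_runlock( &giant_rdwr );
>
>        /* around an LDAP write operation; serializes all writes */
>        ldap_pvt_thread_rdwr_wlock( &giant_rdwr );
>        /* ... add / modify / delete ... */
>        ldap_pvt_thread_rdwr_wunlock( &giant_rdwr );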
>
>I have a program that spawns a lot of threads and does a lot of
>add & delete operations on several attributes in many entries. (My days at
>Transarc IBM testing AFS are coming back to me.) The idea is to have many
>threads in the slapd server accessing the same keys in the same index
>files at the same time.  It does a pretty efficient job of causing and
>detecting index corruption.
>
>I did some experimental work on locking by adding a mutex to each dbcache
>structure in back-ldbm. It is only invoked in key_change() in key.c,
>around the two calls to change an index item:
>
>        ldap_pvt_thread_mutex_lock( &db->dbc_write_mutex );
>        if (op == SLAP_INDEX_ADD_OP) {
>            /* Add values */
>            rc = idl_insert_key( be, db, key, id );
>
>        } else {
>            /* Delete values */
>            rc = idl_delete_key( be, db, key, id );
>        }
>        ldap_pvt_thread_mutex_unlock( &db->dbc_write_mutex );
>
>If I placed the lock higher in the stack, such as in indexer(), it also
>works, but that means the lock is held while CPU-intensive indexing
>functions for matching rules are being called, which is inefficient.
>The lock cannot be placed any lower in the stack, for greater granularity,
>since idl_insert_key and idl_delete_key are the ones doing the
>modifications.  A greater level of granularity could be achieved by
>locking on a dbcache-key basis, but this would require a lot more
>complexity (lists of keys that are being modified).
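>
>One cheap approximation of per-key locking that avoids tracking key lists is
>lock striping: hash the key onto a small fixed array of mutexes. A sketch
>(not something in the tree; names are made up, and each mutex in the array
>would be initialized once at startup):
>
>        #include <lber.h>
>        #include <ldap_pvt_thread.h>
>
>        #define KEY_LOCK_BUCKETS        64
>
>        static ldap_pvt_thread_mutex_t key_locks[KEY_LOCK_BUCKETS];
>
>        static ldap_pvt_thread_mutex_t *
>        key_lock( struct berval *key )
>        {
>                unsigned char *p = (unsigned char *) key->bv_val;
>                unsigned long h = 0;
>                ber_len_t i;
>
>                for ( i = 0; i < key->bv_len; i++ )
>                        h = h * 31 + p[i];
>
>                return &key_locks[ h % KEY_LOCK_BUCKETS ];
>        }
>
>Two keys that hash to the same bucket still serialize, but unrelated keys
>usually proceed in parallel.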
>
>I initialize the mutex when the dbcache is being initialized (e.g. the
>cache name is being copied in) in ldbm_cache_open() and I destroy the
>mutex in ldbm_cache_destroy().
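>
>The lifecycle calls for that look roughly like this (a sketch, using the same
>field name as in the snippet above):
>
>        /* in ldbm_cache_open(), while the dbcache entry is set up */
>        ldap_pvt_thread_mutex_init( &db->dbc_write_mutex );
>
>        /* in ldbm_cache_destroy() */
>        ldap_pvt_thread_mutex_destroy( &db->dbc_write_mutex );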
>
>
>My particular test program is no longer able to corrupt the index files
>with this "medium grain" locking.
>
>
>-Mark Adamson
> Carnegie Mellon