[Date Prev][Date Next] [Chronological] [Thread] [Top]
RE: question on back-bdb locking issue.

To: "Howard Chu" <hyc@highlandsun.com>
Subject: RE: question on back-bdb locking issue.
From: "Jonghyuk Choi" <jongchoi@us.ibm.com>
Date: Tue, 9 Apr 2002 17:42:05 -0400
Cc: <kurt@OpenLDAP.org>, openldap-devel@OpenLDAP.org
Importance: Normal
Sensitivity:

>The lock_detect_task was removed because it was too slow to detect and
>recover from deadlocks. Instead the BDB library is configured to perform a
>deadlock check on every lock conflict.

Are these two mechanism (1. periodic check; 2. perform check upon conflict)
able to coexist ? BTW, what was the benefits of having lock_detect_task() ?

>In my read of the notes below, I see some problems with using a timeout to
>detect BDB/cache deadlocks:
  >1) we already eliminated the lock_detect_task, which works the same way,
>because it gave extremely poor performance in the presence of multiple
>writer threads.
  >2) this approach forces a particular BDB transaction to abort when it's
>deadlocked with a cache lock. It interferes with the BDB library's
deadlock
>resolution algorithm. I don't know for sure that this is a huge problem,
>but it seems to me that the deadlock recovery will be less effective in
>this scenario. It seems that you could wind up aborting several BDB
>transactions with no beneficial effect due to a single cache lock, and
>it will require a full timeout to occur before each abort happens. Again,
>there could be a noticable performance loss.

  When we set timeout, it can actually co-exist with the deadlock detection
mechanism
  of BDB. However, as you pointed out, the BDB mechanism cannot detect a
deadlock
  which involves a cache entry lock with it. The timeout works in this case
only.
  Deadlocks among only BDB locks continue to be resolved by the BDB
detection mechanism.
  If the possibility of such a deadlock (between BDB locks and cache entry
locks) is quite low,
  the loss would not be noticeable.

>Of the three alternatives listed above, I still lean toward #2 but I guess
>we need to see some timing profiles to determine if it's actually worth
>doing or if the BDB library overhead makes it too expensive.

What would be the granularity of BDB locks in #2 case ?
Does it make sense to have per cache entry BDB lock ?

- Jong

=================================================================================
> -----Original Message-----
> From: Jonghyuk Choi [mailto:jongchoi@us.ibm.com]

> It seems that there are three possible approaches to the deadlock problem
> of back-bdb.
>
> 1) Use of single reader / writer lock such as giant_rwlock in back-ldbm.
> 2) Use of BerkeleyDB locking feature for non-BerkeleyDB operations (cache
> mutex, entry)
> 3) Use of timeout by calling DB_ENV->set_timeout() function.
>     Even when cache entry lock and BerkeleyDB lock interfere, BerkeleyDB
> lock will get
>     DEADLOCK error after a preset timeout time later (precision of lock
> detector runs, though).
>
> 1) does not look good because of poor performance especially with
frequent
> add/update/modify.

agreed

> 2) may result in a major redesign and BerkeleyDB locking can be
> more costly
> in performance than posix thread locking.

don't know but I'll take your word for it since you seem to have already
examined this
>
> Need comments on 3).
> When BerkeleyDB lock can be timed-out to make transaction rollback with
> releasing other types of locks,
> then deadlock does not seem to occur.  (please refer to the note at the
> end)
>
> Because the time out value can be set per-transaction as well as at
DB_ENV
> level,
> we may dynamically change the value to make the deadlock detection time
> short and
> to reduce the number of deadlock detection (due to false detection) at
the
> same time.

Don't have any ideas at present about how we would tune this timeout.
>
> BTW, does lock_detect_task() of back-bdb/init.c need more work to be
> included ? (currently excluded by #if 0 ... #endif)
> This task seems to perform lock detection every
> bdb->bi_lock_detect_seconds
> interval, which is specified in slapd.conf
> along with the type of detector.

The lock_detect_task was removed because it was too slow to detect and
recover from deadlocks. Instead the BDB library is configured to perform a
deadlock check on every lock conflict.


In my read of the notes below, I see some problems with using a timeout to
detect BDB/cache deadlocks:
  1) we already eliminated the lock_detect_task, which works the same way,
because it gave extremely poor performance in the presence of multiple
writer threads.
  2) this approach forces a particular BDB transaction to abort when it's
deadlocked with a cache lock. It interferes with the BDB library's deadlock
resolution algorithm. I don't know for sure that this is a huge problem,
but it seems to me that the deadlock recovery will be less effective in
this scenario. It seems that you could wind up aborting several BDB
transactions with no beneficial effect due to a single cache lock, and
it will require a full timeout to occur before each abort happens. Again,
there could be a noticable performance loss.

Of the three alternatives listed above, I still lean toward #2 but I guess
we need to see some timing profiles to determine if it's actually worth
doing or if the BDB library overhead makes it too expensive.

  -- Howard

> ==================================================================
> =======================
>
> There are currently three types of locking in slapd with back-bdb.
> 1) cache mutex
> 2) cache entry reader / writer lock
> 3) BerkeleyDB transactional store's page lock
>
> The interaction between cache mutex and cache entry lock has already been
> taken care of. Attempts to lock a cache entry occurs at two points.
> One is trylock() in bdb_cache_find_entry_id() and the other is lock() in
> bdb_cache_entry_rw(). Because the lock is not acquired and mutex lock is
> released
> after trylock() fails, deadlock condition doesn't exist with trylock().
> bdb_cache_entry_rw() is invoked from two points : bdb_add() and
> id2entry_rw
> ().
> bdb_add() tries to add a newly created entry to the entry cache, so that
> there's no
> other threads that try to add the same entry, and accordingly no
> possibility of deadlock.
> id2entry_rw() tries to return the entry having the specified id.
> It's possible for two threads to attempt to retreive the same entry at
the
> same time.
> After the first thread invokes bdb_cache_add_entry_rw(), the second
thread
> will be notified that the entry already exists in the AVL tree of
> the entry
> cache.
> The second thread, then tries to retrieve the entry from the cache using
> bdb_cache_find_entry_id().
> If the entry is deleted between the add and retrieval attempts of the
> second thread,
> it repeats the process of adding and retrieval attempts.
> When the add and retrieval attempts return fail, they do not acquire
cache
> entry locks
> and they release the cache mutex lock before exit, there's no risk of
> deadlock.
>
> The page lock of BerkeleyDB transactional store is managed by BerkeleyDB
> database
> by a deadlock recovery mechanism. When a DB operation is given a deadlock
> return code,
> we can abort the transaction and restart from the start. All the
> locks held
> by a transaction
> are released when we rollback the transaction, thereby breaking the
> deadlock.
>
> The deadlock recovery scheme of BerkeleyDB can also make it possible to
> recover from
> a possible deadlock condition in which different types of locks, such as
> the cache mutex
> and the cache entry lock are involved, when it is configure to be
> timed-out.
> Suppose after a thread has locked a DB page lock and another thread has
> locked a cache entry lock,
> they attempt to acquire locks already held by each other. BerkeleyDB
> detects the long delay
> in the DB operation and gives it the DEADLOCK return code. When the
thread
> rolls back
> it will release the cache entry lock. Hence the other thread
> which has been
> blocked on the
> entry lock can acquire it and can continue.
>
>
> ------------------------
> Jong Hyuk Choi
> IBM Thomas J. Watson Research Center - Enterprise Linux Group
> P. O. Box 218, Yorktown Heights, NY 10598
> email: jongchoi@us.ibm.com
> (phone) 914-945-3979    (fax) 914-945-4425   TL: 862-3979

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support
Prev by Date: Re: api question
Next by Date: Should slurpd as a separate process?
Index(es):
- Chronological
- Thread