
Re: LMDB killed process and LOCK_MUTEX_W()

To: Dimitrios Apostolou <jimis@gmx.net>
Subject: Re: LMDB killed process and LOCK_MUTEX_W()
From: Howard Chu <hyc@symas.com>
Date: Wed, 16 Jul 2014 06:03:02 -0700
Cc: openldap-technical@openldap.org
In-reply-to: <alpine.DEB.2.10.1407161337380.3268@localhost>
References: <alpine.DEB.2.10.1407161337380.3268@localhost>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:29.0) Gecko/20100101 Firefox/29.0 SeaMonkey/2.26a1

Dimitrios Apostolou wrote:

Hello,

in my program using LMDB, I've experienced rare deadlocks in highly
concurrent mixed (read/write/cursor iteration) workloads. The end result
is that hundreds of threads are hanging waiting on LOCK_MUTEX_W().
Unfortunately I'm not quite sure why this happens.

If my understanding is correct, this mutex is locked from the beginning of
the transaction, until the commit/abort, effectively serialising writers.
So I assume that a writer somehow dies or is violently killed before it can
run its atexit() cleanups, and this shared mutex remains locked forever.

What would you suggest for such a situation? I'm thinking of patching LMDB
to lock with pthread_mutex_timedlock() and periodically check whether the
PID that took the mutex is still alive. Is the writer PID stored somewhere,
or would a change of format be needed? Any other ideas are welcome!

We have a patch to use robust mutexes. They're a few percent slower but will
allow recovery from this situation.

But aside from that, either your software has a bug, or someone is messing
with your system, and you need to find out what's going on and stop that.
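
For readers unfamiliar with robust mutexes, here is a minimal sketch of the
general POSIX mechanism. It is not the LMDB patch itself. It assumes Linux
with glibc, and the helper names robust_mutex_init(), robust_mutex_lock()
and demo_owner_death() are hypothetical, made up for illustration only. On a
robust mutex, the lock call returns EOWNERDEAD when the previous owner died
while holding it; the new owner is then expected to repair the protected
state and call pthread_mutex_consistent().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* assumption: glibc; exposes MAP_ANONYMOUS */
#endif
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Initialise a process-shared, robust mutex. For use across processes the
 * mutex itself must live in shared memory. Returns 0 or an errno value. */
static int robust_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock the mutex. If the previous owner died while holding it, mark the
 * mutex consistent again and set *recovered to 1. A real application must
 * repair the protected data before calling pthread_mutex_consistent().
 * Other errors (e.g. ENOTRECOVERABLE) are returned unchanged. */
static int robust_mutex_lock(pthread_mutex_t *m, int *recovered)
{
    int rc = pthread_mutex_lock(m);
    *recovered = 0;
    if (rc == EOWNERDEAD) {
        rc = pthread_mutex_consistent(m);
        *recovered = 1;
    }
    return rc;
}

/* Demo: a child process takes the lock and exits without unlocking.
 * Returns 1 if the parent detected and recovered from the dead owner,
 * 0 if no owner death was reported, -1 on a setup error. */
static int demo_owner_death(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED || robust_mutex_init(m) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* child: simulate a killed writer */
        pthread_mutex_lock(m);
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid)
        return -1;

    int recovered = 0;
    if (robust_mutex_lock(m, &recovered) != 0)
        return -1;
    pthread_mutex_unlock(m);
    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return recovered;
}
```

On a system that supports robust process-shared mutexes, demo_owner_death()
should return 1. In a real database, the recovery step would also have to
roll back or validate whatever the dead writer left half-done before the
mutex is marked consistent.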


Thanks in advance,
Dimitris



--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
