LMDB dead process detection
There's been a long-running discussion about the need to have APIs in liblmdb
for displaying the reader table and clearing out stale slots. Quite a few open
questions on the topic:
1) What should the API look like for examining the table?
My initial instinct is to provide an iterator function that returns info
about the next slot each time it's called. I'm not sure that's necessary, or
the most convenient interface, though.
Another possibility is just a one-shot function that walks the table itself
and dumps the output as a formatted string to stdout, stderr, or a custom
output callback.
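For concreteness, one possible shape for the callback-style walker; the names
and signatures here are just a sketch, not a committed API:

/* Caller-supplied output sink: could write to stdout, stderr, a log, etc. */
typedef int (MDB_msg_func)(const char *msg, void *ctx);

/* Walk the reader lock table, format one line of text per slot, and
 * pass each line to func along with the caller's ctx pointer.
 * Returns the number of slots reported, or <0 on error. */
int mdb_reader_list(MDB_env *env, MDB_msg_func *func, void *ctx);

A callback keeps the formatting logic in one place while still letting
callers redirect the output wherever they like.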
2) What should APIs look like for clearing out a stale slot?
Should it just be implicit inside the library, with no externally visible
API? I.e., should the library periodically check on its own, with no outside
intervention? Or should there be an API that lets a user explicitly request a
particular slot to be freed? The latter sounds pretty dangerous, since
freeing a slot that's actually still in use would allow a reader's view of the
DB to be corrupted.
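For the record, the two shapes might look roughly like this (names are
illustrative; the second is the dangerous variant just described):

/* Check every reader slot and clear only those whose owning process
 * can be shown to be dead. *dead receives the number of slots cleared. */
int mdb_reader_check(MDB_env *env, int *dead);

/* Unconditionally free a caller-chosen slot. Dangerous: clearing a
 * slot that is still in use can corrupt a live reader's view. */
int mdb_reader_clear(MDB_env *env, unsigned int slot);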
3) What approach should be used for automatic detection of stale slots?
Currently we record the process ID and thread ID of a reader in the table.
It's not clear to me that the thread ID has anything more than informational
value. Since we register a per-thread destructor for slots, exiting threads
should never be leaving stale slots in the first place. I'm also not sure that
there are good APIs for an outside caller to determine the liveness of a given
thread ID.
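For reference, the per-thread destructor is just the standard pthreads
mechanism; a minimal sketch, with the reader record reduced to a stand-in
struct:

#include <pthread.h>

/* Stand-in for the real reader-table record. */
typedef struct MDB_reader_sketch {
	volatile int mr_pid;	/* 0 == slot is free */
} MDB_reader_sketch;

static pthread_key_t mdb_rkey;

/* Called automatically when a thread that set the key exits, so a
 * cleanly exiting thread always releases its own slot. */
static void mdb_reader_dtor(void *ptr)
{
	((MDB_reader_sketch *)ptr)->mr_pid = 0;
}

static void mdb_rkey_init(void)
{
	pthread_key_create(&mdb_rkey, mdb_reader_dtor);
}

A thread that acquires a slot calls pthread_setspecific(mdb_rkey, slot); the
destructor then fires on thread exit, which is why the thread ID really only
has display value.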
The process ID is also prone to wraparound; it's still very common for
Linux systems to use 15-bit process IDs. So just checking that a pid is still
alive doesn't guarantee that it's the same process that originally registered
the reader slot. We have two main approaches to working around this:
A) set a byte range lock for every process attached to the environment.
This is what slapd's alock.c already does, which is used with BDB- and LDBM-
based backends. This is fairly portable code, and has the desirable property
that file locks automatically go away when a process exits. But:
a) On Windows, the OS can take several minutes to clean up the locks of
an exited process. So just checking for presence of a lock could erroneously
consider a process to be alive long after it had actually died.
b) file lock syscalls are fairly slow to execute. If we are checking
liveness frequently, there will be a noticeable performance hit. Their
performance also degrades sharply as the number of processes locking
concurrently grows, and degrades further still if networked filesystems are
involved.
c) This approach won't tell us if a process is in Zombie state.
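A minimal sketch of (A) using POSIX fcntl() locks, assuming one lock byte per
pid in the lock file; the offset scheme is an illustrative choice, not a
settled design:

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Each process holds an exclusive lock on the byte at offset == pid,
 * for the life of the process; the kernel drops it on exit. */
static int pid_lock_set(int lfd, pid_t pid)
{
	struct flock lk = {0};
	lk.l_type   = F_WRLCK;
	lk.l_whence = SEEK_SET;
	lk.l_start  = pid;
	lk.l_len    = 1;
	return fcntl(lfd, F_SETLK, &lk);
}

/* Probe another process's byte with F_GETLK: if no lock is held
 * there, the registered process is gone. Note that a process's own
 * locks never conflict with itself, so it must not probe its own pid. */
static int pid_is_alive(int lfd, pid_t pid)
{
	struct flock lk = {0};
	lk.l_type   = F_WRLCK;
	lk.l_whence = SEEK_SET;
	lk.l_start  = pid;
	lk.l_len    = 1;
	if (fcntl(lfd, F_GETLK, &lk) < 0)
		return -1;
	return lk.l_type != F_UNLCK;
}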
B) check process ID and process start time.
This appears to be a fairly reliable approach, and reasonably fast, but there
is no POSIX standard API for obtaining this process information. Methods for
obtaining the info are fairly well documented across a variety of platforms
(AIX, HPUX, multiple BSDs, Linux, Solaris, etc.) but they are all different.
It appears that we can implement this compactly for each of the systems, but
it means carrying around a dozen or so different implementations.
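For illustration, here is roughly what the Linux variant would look like
(field 22 of /proc/<pid>/stat is the start time in clock ticks since boot);
every other platform needs a similarly small but entirely different snippet:

#include <stdio.h>
#include <sys/types.h>

/* Return the process start time in clock ticks since boot, or -1 if
 * the pid doesn't exist. Linux-only sketch. */
static long long proc_start_time(pid_t pid)
{
	char path[64];
	unsigned long long start = 0;
	FILE *f;
	int c, i;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	if ((f = fopen(path, "r")) == NULL)
		return -1;
	/* Field 2 (comm) may contain spaces; skip past its closing ')'. */
	while ((c = fgetc(f)) != EOF && c != ')')
		;
	/* Skip fields 3..21; field 22 is starttime. */
	for (i = 3; i < 22; i++)
		if (fscanf(f, "%*s") == EOF)
			break;
	if (fscanf(f, "%llu", &start) != 1)
		start = 0;
	fclose(f);
	return start ? (long long)start : -1;
}

A slot's owner is then considered alive only if both the pid and the start
time match, which reduces the wraparound problem to the negligible chance of
a recycled pid starting on the same clock tick.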
Also, assuming we want to support shared LMDB access across NFS (as discussed
in an earlier thread), it seems we're going to have to use a lock-based
solution anyway, since process IDs won't be meaningful across host boundaries.
We can implement approach (A) fairly easily, with no major repercussions. For
(B) we would need to add a field to the reader table records to store the
process start time. (Thus a lockfile format change.)
(Note: the performance of fcntl locks vs. checking process start time was
measured with some simple code on my laptop running Linux. These functions are
all highly OS-dependent, so the perf ratios may vary quite a lot from system
to system.)
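The measurement loop can be as simple as the following, using the
pid_is_alive() and proc_start_time() sketches above; only the ratio between
the two timings is meaningful:

#include <stdio.h>
#include <sys/types.h>
#include <time.h>

static double since(const struct timespec *t0)
{
	struct timespec t1;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0->tv_sec) + (t1.tv_nsec - t0->tv_nsec) * 1e-9;
}

/* Time n liveness probes through each mechanism. */
void bench_probes(int lfd, pid_t pid, int n)
{
	struct timespec t0;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < n; i++)
		pid_is_alive(lfd, pid);
	printf("fcntl F_GETLK:   %.3fs\n", since(&t0));

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < n; i++)
		proc_start_time(pid);
	printf("/proc starttime: %.3fs\n", since(&t0));
}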
The relative performance may not even be an issue in general, since we would
only need to trigger a scan if a writer actually finds that some reader txn is
preventing it from using free pages from the freeDB. Most of the time this
wouldn't be happening. But if there were a legitimate long-running read txn
(e.g., for mdb_env_copy) we might find ourselves checking fairly often.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/