On Thu, 2005-06-02 at 12:42 +0200, Steffen Hansen wrote:
> Hi.
>
> We use OpenLDAP in the Kolab project, but after switching to the bdb
> backend there have been several reports about stability problems. Slapd
> sometimes seems to hang when someone tries to write to the database
> (for example with ldapadd).
>
> The complete description is available at
> https://intevation.de/roundup/kolab/issue707
>
> Currently we use openldap-2.2.23 and db-4.2.52.2. Do you have any
> suggestions on how I can get to the bottom of this problem? Anyone else
> having similar problems? I'm out of ideas here, so any kind of help or
> suggestion is greatly appreciated.

I've been seeing odd bdb hangs: if you strace -f -p the slapd (or stuck) process, you see it sitting in futex lock(). There have been odd mutterings on the list but no definite example. A repeated ps listing shows its CPU% tending to zero, so as far as I can see it really isn't doing anything. I could well be wrong, as my analysis isn't great.

What I find you have to do is:

1. kill -9 the hung process (no other kill is strong enough).
2. Run db_verify on each db file (I have to supply -o). One of these will hang; you'll have to kill -9 that too.
3. Run db_recover, which should work.
4. Run db_verify again just to make sure; it should now pass.
5. Restart your slapd process.

As for why, I do not know. Our imports take so long (even on sunfire z20s) that pulling and pushing data takes ages, and the hangs occur at some point later, usually when I chop out a portion of the tree and ldapadd a new one in. The chop will lock, I'll kill it, and that leaves a broken db, fixable by the process above.

This might be because of our data: it's a huge mess, with some directories containing lots of entries (>9000). I wouldn't have thought this to be a problem, though. We're also using OpenLDAP with a rootless tree, something many others may not be doing. And all this is not even in a live environment.
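If you end up repeating the recovery steps above often, they can be scripted. The following is only a rough, untested Python sketch under some assumptions of mine: the hung slapd has already been stopped with kill -9, the back-bdb files sit in one directory and carry a .bdb suffix, and the 300-second timeout is arbitrary. It relies on subprocess.run killing a timed-out child with SIGKILL on POSIX, which stands in for the manual kill -9 of a hung db_verify.

```python
import subprocess
from pathlib import Path

# Assumptions (not from the original post): the back-bdb files live in one
# directory and carry the ".bdb" suffix; the timeout value is arbitrary.
DEFAULT_TIMEOUT = 300  # seconds to wait before treating a tool as hung


def verify_commands(home):
    """One 'db_verify -o FILE' per database file, as in the steps above."""
    files = sorted(p.name for p in Path(home).glob("*.bdb"))
    return [["db_verify", "-o", name] for name in files]


def recover_command(home):
    """'db_recover -h HOME' runs normal recovery on the environment."""
    return ["db_recover", "-h", str(home)]


def run(cmd, home, timeout=DEFAULT_TIMEOUT):
    """Run cmd inside home; True only if it exits 0 before the timeout.

    On timeout, subprocess.run kills the child (SIGKILL on POSIX, i.e.
    the equivalent of kill -9) before raising TimeoutExpired.
    """
    try:
        return subprocess.run(cmd, cwd=home, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False


def check_and_recover(home, timeout=DEFAULT_TIMEOUT):
    """Verify every file; if any fails or hangs, recover and verify again.

    Assumes the hung slapd has already been killed with kill -9.
    Returns True when every file passes verification.
    """
    checks = verify_commands(home)
    # A list (not a generator) so every file is checked even after a failure.
    if all([run(c, home, timeout) for c in checks]):
        return True
    run(recover_command(home), home, timeout)
    return all([run(c, home, timeout) for c in checks])
```

You would call it as check_and_recover("/var/lib/ldap"), where the path is only an example; use whatever your slapd.conf "directory" line points at, and restart slapd only once it returns True.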
I'm winging it right now, because once we're live I won't be doing this brutal surgery on the tree, and I don't have any other option right now as 2.3.3 isn't ready yet. We have a 2.2.24 on stock 4.2.52 on RH9 in production which performs faultlessly, but we aren't touching it. I would love to be able to spend time investigating, but I'm being pulled in several different directions right now; such is life.

We're using OpenLDAP 2.2.26 with DB 4.2.53 with 3 patches (lock, lock2 and db_transactions). This is on Fedora Core 3 x86_64 on Opteron.

-- 
Rob Fielding
rob@dsvr.net
www.dsvr.co.uk
Development
Designer Servers
Business Serve Plc