Re: ldap deadlock?
OK, I think my eyes just popped open a bit here; I was assuming
something could not or would not happen.
Let's say I have a pristine db with no outstanding locks. I start up
slapd; it runs fine, never dies, is never stopped and restarted, and the
db is accessed only by slapd, slapcat, and db_checkpoint (each always
successfully) while it is running. Can a lock go stale in that environment?
My gut feeling here is that the answer is going to be yes, and that this
is the root cause of these occurrences. In my ideal world I would never
expect that to happen...
Furthermore, I see these on occasion:
connection_read(62): no connection!
Sometimes this occurs right after the ACCEPT without a corresponding op=
ENTRY for that fd. Other times I see it after one or more op= ENTRY
operations. It appears to me that the client is not disconnecting
gracefully. The "no connection" message can be timestamped in the same
second as the ACCEPT, so I know it's not due to an idle timeout. I'm
pretty sure the culprit in most of these "no connection" messages is
sendmail on our MTA doing lookups.
Could instances of these be causing stale locks??
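To get a feel for how often each pattern occurs (ACCEPT followed directly by "no connection!", versus ACCEPT, some op= entries, then "no connection!"), the log can be tallied with a short script. This is only a rough sketch: the ACCEPT and op= line layouts in the comments are assumptions about what the slapd stats log looks like here, since the thread only shows the connection_read line, and the function name is made up for illustration.

```python
import re

# Assumed slapd log fragments (only the third is quoted in this thread):
#   conn=1234 fd=62 ACCEPT from IP=10.0.0.5:41234 (IP=0.0.0.0:389)
#   conn=1234 op=0 SRCH base="..." scope=2 ...
#   connection_read(62): no connection!
ACCEPT_RE = re.compile(r"conn=(\d+) fd=(\d+) ACCEPT from (\S+)")
OP_RE = re.compile(r"conn=(\d+) op=(\d+)")
NOCONN_RE = re.compile(r"connection_read\((\d+)\): no connection!")


def summarize_no_connection(lines):
    """Return one record per 'no connection!' line.

    Each record names the fd, the connection most recently ACCEPTed on
    that fd, its peer address, and how many distinct op= numbers were
    logged for that connection before the message appeared.
    """
    conn_by_fd = {}   # fd -> conn id of the latest ACCEPT on that fd
    info = {}         # conn id -> {"peer": str, "ops": set of op numbers}
    results = []
    for line in lines:
        m = ACCEPT_RE.search(line)
        if m:
            conn, fd, peer = m.groups()
            conn_by_fd[fd] = conn
            info[conn] = {"peer": peer, "ops": set()}
            continue
        m = OP_RE.search(line)
        if m:
            if m.group(1) in info:
                info[m.group(1)]["ops"].add(m.group(2))
            continue
        m = NOCONN_RE.search(line)
        if m:
            fd = m.group(1)
            conn = conn_by_fd.get(fd)
            rec = info.get(conn, {"peer": None, "ops": set()})
            results.append({
                "fd": int(fd),
                "conn": conn,
                "peer": rec["peer"],
                "ops": len(rec["ops"]),
            })
    return results
```

Feeding it the lines of the slapd log file and grouping the records by peer would show whether the MTA really accounts for most of these messages, and how many of them arrive with zero operations.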
Curt Blank wrote:
Howard Chu wrote:
Curt Blank wrote:
I'm looking for ideas here. ldap seems to deadlock once in a while
whereby it continues to accept connections as noted in the log file
but it does not return anything to the query, the query just hangs.
It's OpenLDAP 2.2.28 using Berkeley DB 4.2.52 as the backend on a
SuSE 9.3 platform. All patches are up to snuff on the OS side.
I'm hoping for pointers to help see what might be going on.
As of today I started running db_deadlock in the background with the
-a y option to see if that helps.
This deadlocking is getting people up in arms here because it is
disrupting authentication for the whole campus and I guess I can't
blame them.
There have been no deadlocks reported in OpenLDAP 2.2 after 2.2.20.
More likely you had an unclean shutdown and restarted without running
db_recover, so you have stale locks in the environment. You should
upgrade to 2.3 which does recovery automatically.
No, I know that wasn't the case. I manually ran db_recover with
the -v option ~16 hours before the last occurrence of this, and the
server was not shut down in between, nor did slapd die, and
it wasn't stopped/started. This last time (last Friday) our backup
started 12 minutes after slapd began only accepting connections and not
responding with data, and that really compounded the problem. The
backup does a db_checkpoint, and it hung, and stopping the slapd daemon
did not correct the problem. slapd stopped cleanly, but when restarted
it just sat there and would not even accept connections. The
db_checkpoint would not complete and after about 10 minutes was
killed. I know, I know, not the best thing to do, but when you have
people on campus pissed because they can't log in, time is one luxury
that we do not have. And yes, db_recover was successfully run again
before slapd was started. But I'm a bit leery of it right now....
One thing I failed to mention is that it appeared that a slurpd
replication to this slave server started at the time slapd began
only accepting connections and not responding with data. So that's a
write and that is what got me to start thinking about a deadlock
situation.
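Since the symptom throughout this thread is that slapd keeps accepting connections but never answers, one way to catch it early is a probe that runs a query under a hard timeout and reports a hang when the query does not finish. The following is a minimal sketch, not something taken from the thread: the function name, the timeout value, and the server URI and search base in the example command are all placeholders.

```python
import subprocess


def probe(cmd, timeout_s=5.0):
    """Run cmd and classify the outcome.

    Returns 'ok' if it exits with status 0, 'error' if it exits with a
    non-zero status or cannot be started, and 'hung' if it has not
    exited after timeout_s seconds (the child is killed in that case).
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except OSError:
        # e.g. the probe command is not installed on this host
        return "error"
    return "ok" if done.returncode == 0 else "error"


# Placeholder probe command: an anonymous base-scope search of an entry
# that lives in the database backend. URI and base DN are hypothetical.
LDAP_PROBE = [
    "ldapsearch", "-x", "-H", "ldap://localhost",
    "-s", "base", "-b", "dc=example,dc=edu", "-LLL",
]
```

Calling probe(LDAP_PROBE) from cron every minute or so and alerting on 'hung' would at least shorten the time between the hang starting and someone noticing. The base DN should point at an entry stored in the backend database; a search of the root DSE is probably answered without touching the backend and so may keep succeeding during this kind of hang.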