All I can say is that I have a similar issue and logged an ITS (7013),
which is still under investigation.
David Engeset wrote:
I upgraded four of our OpenLDAP servers back in May to run the latest
stable version of OpenLDAP (2.4.23) along with BDB (4.8.30). Everything
ran with no issues until, a little over a month later, the slapd
process on one of the servers hung. The only way I could restart it was
with kill -9; all other kill options failed. Over the next month and a
half the issue recurred on the same server and occurred on two of the
other servers. There was nothing in the logs to indicate a problem with
running out of file descriptors, deadlocks or anything else.

I set out to see if I could recreate the issue, and I found that I
could. With around 20000 entries loaded (our database has roughly
21000), I ran one script that randomly queries the entries in the
database, one at a time, and a second script that adds 1000 entries,
one at a time, then deletes them in reverse order, one at a time, and
repeats this indefinitely. When I ran the two scripts simultaneously
they would hang after 3 to 16 deletes were completed.

I tried the latest version of OpenLDAP (2.4.26) to see if any of its
bug fixes would help, and I still get the same results. I even tried it
with all of the supported versions of BDB (4.4, 4.5, 4.6, 4.7, 5.0 and
5.1), with the same results. I ran it with full logging on and was not
able to find anything that pointed to the problem.

We have been running OpenLDAP 2.2 and 2.3 for years (many servers
without any restart of slapd for over a year) without any lockups, so I
decided to test with OpenLDAP 2.3.43 and BDB 4.2.52 (with patches). I
loaded the exact same database and ran the exact same tests, and it
runs literally for hours with no issues. When I upgraded BDB to 4.4, I
started to experience the hanging again, so it appears to be a BDB
issue. I searched for related issues with no success, and considering
that others have been running 2.4 with newer versions of BDB for a
couple of years now, I find it odd that I am running into this issue on
my first use of 2.4.

I tested all of this on CentOS 5.4, 5.6 and Fedora 17 with the same
results. Does anyone have any ideas or suggestions on what I can try
to do to fix this issue?

Below are some of the configs I am using in my latest attempts to
resolve the issue:

DB_CONFIG:
set_cachesize 0 536870912 1
set_lg_regionmax 10485760
set_lg_max 104857600
set_lg_bsize 2097152
set_lg_dir /var/log/bdb
set_tmp_dir /var/log/bdb
# This one I added recently to see if it might help.
set_lk_detect DB_LOCK_DEFAULT
slapd.conf:
include /usr/local/etc/openldap/schema/cosine.schema
include /usr/local/etc/openldap/schema/nis.schema
include /usr/local/etc/openldap/schema/misc.schema
include /usr/local/etc/openldap/schema/inetorgperson.schema
pidfile /usr/local/var/run/slapd.pid
argsfile /usr/local/var/run/slapd.args
conn_max_pending 1000
database bdb
cachesize 20000
suffix "dc=example,dc=net"
checkpoint 5120 30
rootdn "cn=Manager,dc=example,dc=net"
rootpw secrect
directory /usr/local/var/openldap-data
# Indices to maintain
index default pres,eq
index cn,uid
#index WhidNetCustID,CustID,ID
index sn pres,eq,sub
index objectClass eq
index uidNumber eq
index gidNumber eq
index memberUid eq
# database access control definitions
access to attrs=userPassword
        by self write
        by anonymous auth
        by dn="cn=Admin,dc=example,dc=net" write
        by * none
access to *
        by self write
        by dn="cn=Admin,dc=example,dc=net" write
        by * read

I can send out the LDIF I am using and the Perl scripts that I run to
break it to anyone who is interested.

Thank you,