Re: slapd stability problems with add/change operations
Quanah Gibson-Mount wrote:
> If you are running stock BDB 4.2.52 without the required patches from
> sleepycat, I am not surprised by your problems.
Thanks for all the feedback so far. My comments:
- We have all patches in FreeBSD's BDB; I confirmed that by comparing
the MD5 sums in the port with the files I found on the sleepycat.com
homepage.
- Following the feedback from Howard Chu, I read some pages about
DB_CONFIG, namely:
http://www.sleepycat.com/docs/ref/am_conf/cachesize.html
http://www.openldap.org/faq/data/cache/1074.html
http://www.openldap.org/faq/data/cache/1075.html
http://www.openldap.org/faq/data/cache/893.html
I ran the calculation from the sample for my DB. We have only around
3000 entries and DB files of around 5MB at most, so as expected I
didn't get more than 256K (in fact I didn't even come close to it).
Anyway, I set it to 2MB because this machine has 1GB of RAM and doesn't
do much besides OpenLDAP.
Then I tried to find some docs about the locks:
http://www.sleepycat.com/docs/api_c/env_set_lk_max_objects.html
http://www.sleepycat.com/docs/ref/lock/max.html
I couldn't really find much about locks and OpenLDAP, except the config
file in Debian:
--
[...]
# Sven Hartge reported that he had to set this value incredibly high
# to get slapd running at all. See http://bugs.debian.org/303057
# for more information.
# Number of objects that can be locked at the same time.
set_lk_max_objects 5000
# Number of locks (both requested and granted)
set_lk_max_locks 5000
# Number of lockers
set_lk_max_lockers 5000
--
His bug report is interesting as well, also because they write that one
has to *redo* (slapcat/slapadd) the complete database for the changes to
take effect. I didn't know that at first.
So I rebuilt my DB as well with the following DB_CONFIG file:
--
# set cachesize to 2MB for now
set_cachesize 0 2097152 0
# Number of objects that can be locked at the same time.
set_lk_max_objects 10000
# Number of locks (both requested and granted)
set_lk_max_locks 10000
# Number of lockers
set_lk_max_lockers 10000
--
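
For reference, the three arguments of set_cachesize are gigabytes, bytes,
and the number of cache regions, so the line above asks for 0GB plus
2097152 bytes in a single region. A minimal Python sketch that reads such
a file and reports the configured values; the helper names
parse_db_config and cache_bytes are made up for this illustration, and
the parser only handles simple "directive arg arg ..." lines:

```python
def parse_db_config(text):
    """Map each DB_CONFIG directive to its list of (string) arguments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, *args = line.split()
        settings[name] = args
    return settings


def cache_bytes(settings):
    """Total cache size in bytes from 'set_cachesize <gbytes> <bytes> <ncache>'."""
    gbytes, nbytes, _ncache = (int(a) for a in settings["set_cachesize"])
    return gbytes * 1024**3 + nbytes


# The DB_CONFIG from this mail, inlined so the sketch is self-contained.
DB_CONFIG = """\
# set cachesize to 2MB for now
set_cachesize 0 2097152 0
set_lk_max_objects 10000
set_lk_max_locks 10000
set_lk_max_lockers 10000
"""

settings = parse_db_config(DB_CONFIG)
print("cache bytes:", cache_bytes(settings))
for key in ("set_lk_max_objects", "set_lk_max_locks", "set_lk_max_lockers"):
    print(key, "=", settings[key][0])
```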
Note that I have no clue how to choose the lock limits. There are some
docs, but as I don't know how many locks a read operation requires, it's
a bit hard for me to judge. I thought 10'000 should be fine, but I have
already found config files with 100'000 (probably their DB is much
bigger than mine).
As a stress test we did this:
- launch a set of add operations from the metadb; we had about 110 add
operations. This will do quite a few reads first to see what is
missing.
- launch two syncs, one for each of our two LDAP slaves
This way we can reproducibly hang slapd within seconds. Note that I
don't get any hints in the logfile about why it hangs. The last entry I
see with loglevel 256 is the add operation, then it hangs.
I do have the impression that it takes a bit longer to hang since I
changed the locks to 10'000. But as I said, I don't know whether this is
sufficient for my databases.
Some stats:
--
# db_stat-4.2 -d id2entry.bdb
53162 Btree magic number.
9 Btree version number.
Flags: little-endian
2 Minimum keys per-page.
16384 Underlying database page size.
2 Number of levels in the tree.
2613 Number of unique keys in the tree.
2613 Number of data items in the tree.
1 Number of tree internal pages.
...
--
--
# db_stat-4.2 -d dn2id.bdb
53162 Btree magic number.
9 Btree version number.
Flags: duplicates, little-endian
2 Minimum keys per-page.
4096 Underlying database page size.
3 Number of levels in the tree.
5300 Number of unique keys in the tree.
15537 Number of data items in the tree.
6 Number of tree internal pages.
...
--
-> not a size issue I hope ;)
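
A rough check of the cache arithmetic in Python, using the numbers from
the db_stat output above. It assumes the FAQ's rule of thumb is
"(internal pages + 1) * page size" per database file; that reading of
the FAQ is an assumption, and the function name min_cache_bytes is made
up for this sketch.

```python
def min_cache_bytes(internal_pages, page_size):
    """Assumed rule of thumb: room for all internal pages plus one more page."""
    return (internal_pages + 1) * page_size


# Values taken from the db_stat -d output above.
id2entry = min_cache_bytes(internal_pages=1, page_size=16384)
dn2id = min_cache_bytes(internal_pages=6, page_size=4096)
total = id2entry + dn2id

configured = 2097152  # bytes, from set_cachesize in DB_CONFIG

print("id2entry:", id2entry, "bytes")
print("dn2id:   ", dn2id, "bytes")
print("total:   ", total, "bytes =", total // 1024, "KB")
print("configured cache is about", configured // total, "times the estimate")
```

Under that assumption the estimate is 60KB, well below both 256K and the
2MB now configured.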
Ah, and what I haven't mentioned so far: this machine is an SMP box
(2 CPUs). Not sure whether this matters.
So, once again I'm running out of ideas here. Comments are welcome.
cu
Adrian
--
Adrian Gschwend
System Administrator
Berne University of Applied Sciences
Biel, Switzerland