[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: Too many executing vs syncrepl
Do you have different versions of OpenLDAP on the master vs the
replicas? Or did you upgrade everything at once? How large is your
database? How many entries does your database have? Are you using a disk
cache or a memory cache for BDB?
Unfortunately, yes. in this case, ldapmaster is 2.4.23, and so is ldapslave01;
2.4.23, but pop06 has loopback ldap of 2.3.41.
This is not supported and clearly undesirable, but with so many hosts it takes a
long time to schedule maintenances and upgrade. Plus we admins need to sleep
occasionally.
The database is:
-rw------- 1 root root 2.4G Sep 30 08:55 id2entry.bdb
with total of about 7GB. (including bdb environment files, but no transaction
files).
We already upgraded machines to 8GB RAM from a previous conversation. DB stats
report (ldapmaster):
4GB Total cache size
8 Number of caches
8 Maximum number of caches
512MB Pool individual cache size
0 Maximum memory-mapped file size
0 Maximum open file descriptors
0 Maximum sequential buffer writes
0 Sleep after writing maximum sequential buffers
0 Requested pages mapped into the process' address space
102M Requested pages found in the cache (99%)
204394 Requested pages not found in the cache
4282 Pages created in the cache
204394 Pages read into the cache
402689 Pages written from the cache to the backing file
0 Clean pages forced from the cache
0 Dirty pages forced from the cache
0 Dirty pages written by trickle-sync thread
208575 Current total page count
208559 Current clean page count
16 Current dirty page count
524296 Number of hash buckets used for page location
4096 Assumed page size used
102M Total number of times hash chains searched for a page (102254300)
18 The longest hash chain searched for a page
125M Total number of hash chain entries checked for page (125288450)
0 The number of hash bucket locks that required waiting (0%)
0 The maximum number of times any hash bucket lock was waited for (0%)
50 The number of region locks that required waiting (0%)
0 The number of buffers frozen
0 The number of buffers thawed
0 The number of frozen buffers freed
208752 The number of page allocations
and on pop loopback ldap:
80MB 2KB 912B Total cache size.
1 Number of caches.
80MB 8KB Pool individual cache size.
0 Requested pages mapped into the process' address space.
1439M Requested pages found in the cache (99%).
18M Requested pages not found in the cache.
1954 Pages created in the cache.
18M Pages read into the cache.
255279 Pages written from the cache to the backing file.
18M Clean pages forced from the cache.
40207 Dirty pages forced from the cache.
0 Dirty pages written by trickle-sync thread.
9069 Current total page count.
9055 Current clean page count.
14 Current dirty page count.
8191 Number of hash buckets used for page location.
1476M Total number of times hash chains searched for a page.
9 The longest hash chain searched for a page.
3383M Total number of hash buckets examined for page location.
2992M The number of hash bucket locks granted without waiting.
44448 The number of hash bucket locks granted after waiting.
3233 The maximum number of times any hash bucket lock was waited for.
63M The number of region locks granted without waiting.
76892 The number of region locks granted after waiting.
18M The number of page allocations.
36M The number of hash buckets examined during allocations
728 The max number of hash buckets examined for an allocation
18M The number of pages examined during allocations
360 The max number of pages examined for an allocation
(much smaller as it shares resources. Also only syncrepls the mail tree from LDAP).
> Are you using a disk cache or a memory cache for BDB?
You can do that now? I'm afraid that the only BDB specific work I have done, are
the DB_CONFIG entries shown in the slapd.conf previous. Repeated here for your
convenience:
set_lk_detect DB_LOCK_DEFAULT
set_lg_max 52428800
set_cachesize 4 0 8
set_flags db_log_autoremove
set_lk_max_objects 1500
set_lk_max_locks 1500
set_lk_max_lockers 1500
Now, looking at "too many executing", we do need to do something about it. It
happens on both old and new versions, so it could just be that we are at
capacity already.
If I grep for the message on ldapmaster, we get 0 for all days. Of course,
master only sycnrepls to the 4 slaves. The 4 slaves in turn, syncrepls to all
loopback ldaps, but also get the occasional direct request.
# gzgrep "too many executing" /var/log/slaplog-201009$day.gz | wc -l
day master slave01 slave02 pop06
29 0 16 3689 66
28 0 21 2916 65
27 0 0 582 27
26 0 0 839 2
The difference between slave01 and slave02 here is the "idlcachesize 15000"
line. We added it to slave01 last week, and I did the 6 am LiveUP to add it to
slave02 today. Looks like that is a step in the right direction.
--
Jorgen Lundman | <lundman@lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)