RE: slapd eating up resource on 2.3.14
>>>>> "aej" == Allan E Johannesen <aej@WPI.EDU> writes:
aej> 2.3.15 also runs fine for the first thousand queries, but then bogs down.
aej> There were several changes to ...bdb/cache.c in 2.3.14, and that file is
aej> identical in 2.3.15. I sort of suspect the problem is there, but I
aej> haven't the knowledge of the internals, nor the knowledge of Berkeley db
aej> locking, to figure it out.
I added a number of Debug() statements to servers/slapd/back-bdb/cache.c in
both 2.3.13 and 2.3.14.
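The trace lines quoted further down have the shape "function: call text". A minimal, hypothetical way to produce lines of that shape in C is to stringify each call with the preprocessor's # operator and print it with __func__ right after the call returns. This is not the Debug() macro used above and not OpenLDAP code; TRACED_CALL, trace_out, trace_count, fake_db_lock(), fake_db_unlock() and demo_lru_step() are invented names for illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* Destination for trace lines; NULL means stderr. Hypothetical, not slapd's. */
static FILE *trace_out = NULL;
/* Number of traced calls so far, handy for counting loop iterations. */
static unsigned long trace_count = 0;

/* Hypothetical tracing macro (NOT OpenLDAP's Debug()): store the result of
 * `call` in `rc`, then print "<enclosing function>: <call source text>".
 * #call turns the argument into a string literal; each run of whitespace
 * between tokens becomes a single space. */
#define TRACED_CALL(rc, call)                                   \
    do {                                                        \
        (rc) = (call);                                          \
        trace_count++;                                          \
        fprintf(trace_out != NULL ? trace_out : stderr,         \
                "%s: %s\n", __func__, #call);                   \
    } while (0)

/* Hypothetical stand-ins for the real lock and unlock routines. */
static int fake_db_lock(int id)   { (void)id; return 0; }
static int fake_db_unlock(int id) { (void)id; return 0; }

/* One traced lock/unlock pair, shaped like the pairs quoted in the report. */
int demo_lru_step(int id)
{
    int rc;

    assert(id >= 0);                /* the stand-ins model only valid ids */
    TRACED_CALL(rc, fake_db_lock( id ));
    if (rc != 0)
        return rc;
    TRACED_CALL(rc, fake_db_unlock( id ));
    return rc;
}
```

Calling demo_lru_step(7) prints "demo_lru_step: fake_db_lock( id )" followed by "demo_lru_step: fake_db_unlock( id )": the stringified source text is printed, not the argument's value. The counter is a cheap way to get a figure like "935 repetitions" without counting log lines by hand.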
At about a thousand client queries, 2.3.14 shows a loop in
bdb_cache_lru_purge(), a new routine whose code was mostly taken out of
bdb_cache_lru_add() in 2.3.13.
Since I only put Debug() calls after the lock() and unlock() calls, that is the
only activity I see. At this point in 2.3.14, my test shows 935 repetitions
of
bdb_cache_lru_purge: bdb_cache_entry_db_lock( bdb->bi_dbenv, bdb->bi_cache.c_locker, elru, 1, 1, lockp )
bdb_cache_lru_purge: bdb_cache_entry_db_unlock( bdb->bi_dbenv, lockp )
No comparably long loop appears in the 2.3.13 test run. There are
occasional single appearances of the pair
bdb_cache_lru_add: bdb_cache_entry_db_lock( bdb->bi_dbenv, bdb->bi_cache.c_locker, elru, 1, 1, lockp )
bdb_cache_lru_add: bdb_cache_entry_db_unlock( bdb->bi_dbenv, lockp )
in the 2.3.13 run, but never a long run of repetitions like the one in 2.3.14.
After that loop in 2.3.14, responses are slow and slapd consumes a lot of CPU.
In the overall debug output (slapd -d1), the first call to the routine appears
at about line 101,000 in both cases, but 2.3.14 shows a long loop of
lock/unlock pairs, while 2.3.13 has only single instances sprinkled throughout.
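Comparing the two runs comes down to finding the longest stretch of back-to-back lock/unlock pairs from one function. A minimal sketch of a helper that does this on saved debug output follows; longest_lock_run() is a hypothetical name, not part of OpenLDAP, and it assumes each pair appears on two consecutive lines in the "function: bdb_cache_entry_db_lock( ... )" / "function: bdb_cache_entry_db_unlock( ... )" form quoted above. Any other line, including output from another thread interleaved in the trace, ends the current run, so the result is a lower bound when threads interleave.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of OpenLDAP: return the longest run of
 * consecutive lock/unlock pairs traced from `func` in the text read from fp.
 * Assumes the "func: bdb_cache_entry_db_lock(" and
 * "func: bdb_cache_entry_db_unlock(" line shapes; pass the full function name. */
long longest_lock_run(FILE *fp, const char *func)
{
    char line[4096];
    char lock_tag[256], unlock_tag[256];
    long run = 0, best = 0;
    int expect_unlock = 0;

    assert(fp != NULL && func != NULL);
    snprintf(lock_tag, sizeof lock_tag, "%s: bdb_cache_entry_db_lock(", func);
    snprintf(unlock_tag, sizeof unlock_tag, "%s: bdb_cache_entry_db_unlock(", func);

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, lock_tag) != NULL) {
            if (expect_unlock)      /* two locks in a row: pattern broken */
                run = 0;
            expect_unlock = 1;
        } else if (expect_unlock && strstr(line, unlock_tag) != NULL) {
            expect_unlock = 0;      /* one complete pair */
            if (++run > best)
                best = run;
        } else {                    /* unrelated line ends the run */
            run = 0;
            expect_unlock = 0;
        }
    }
    return best;
}
```

A small main() calling longest_lock_run(stdin, "bdb_cache_lru_purge") on a saved copy of each version's debug output would give one number per run; the observations above would correspond to a value in the hundreds for 2.3.14 and to runs of at most a few pairs for "bdb_cache_lru_add" in 2.3.13.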
Unless the behavior was deliberately changed in a way that would produce this
different activity, I think some problem was introduced when
bdb_cache_lru_purge() was developed from the former 2.3.13 source...