RE: OpenLDAP high CPU usage when performing mass changes
Hi Howard,
>> At a guess, based on the minimal amount of information here, you've run into the glibc malloc fragmentation issue,
>> and switching to tcmalloc might avoid the problem.
What's the quickest way to validate this on the slapd that is running at 99% CPU, before falling back on tcmalloc?
Can the process's smaps reveal this, for example if we see a large number of 64MB regions?
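A rough way to run that check, sketched in Python. The assumptions: on 64-bit Linux, glibc reserves 64MB per non-main malloc arena heap, and a partly used arena shows up in smaps as an anonymous rw-p mapping followed by a contiguous anonymous ---p remainder. The function names are illustrative only. A high count suggests many arenas; it does not by itself prove fragmentation.

```python
import re

# Assumption: 64MB per arena heap (glibc on 64-bit Linux).
ARENA_BYTES = 64 * 1024 * 1024

# Mapping header line: start-end perms offset dev inode [path]
HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def parse_mappings(text):
    """Return (start, end, perms, path) for each mapping header in smaps text."""
    maps = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            maps.append((int(m.group(1), 16), int(m.group(2), 16),
                         m.group(3), m.group(4).strip()))
    return maps


def count_arena_like(maps):
    """Count anonymous spans of exactly 64MB (rw-p, optionally plus ---p tail)."""
    count = 0
    i = 0
    while i < len(maps):
        start, end, perms, path = maps[i]
        step = 1
        if path == "" and perms.startswith("rw"):
            span_end = end
            if i + 1 < len(maps):
                nstart, nend, nperms, npath = maps[i + 1]
                # Unused remainder of the reservation is mapped without access.
                if npath == "" and nperms.startswith("---") and nstart == end:
                    span_end = nend
                    step = 2
            if span_end - start == ARENA_BYTES:
                count += 1
        i += step
    return count


def count_for_pid(pid):
    """Read /proc/<pid>/smaps (needs permission to read that process)."""
    with open(f"/proc/{pid}/smaps") as f:
        return count_arena_like(parse_mappings(f.read()))
```

Calling count_for_pid() with the slapd PID before and after a mass update would show whether the number of such regions grows.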
Thanks
++Cyrille
-----Original Message-----
From: openldap-technical-bounces@OpenLDAP.org [mailto:openldap-technical-bounces@OpenLDAP.org] On Behalf Of Howard Chu
Sent: Friday, March 16, 2012 8:32 AM
To: Jeffrey Crawford
Cc: OpenLDAP technical list
Subject: Re: OpenLDAP high CPU usage when performing mass changes
Jeffrey Crawford wrote:
> We are using openldap 2.4.26 with BDB 4.8 and have replication set up
> in mirror mode for our main ldap database. There are a couple of other
> replicas that have a subset of the data that the main cluster has but
> we are seeing the following behavior on all of them.
>
> When performing mass updates via LDAP, let's say on the order of 30,000
> entries being added to existing entries, we've noticed that the CPU
> use of the slapd instances goes through the roof (between 65% and 95%
> continuously), and seems to stay there until they are restarted.
When the CPU usage goes high like that it should be pretty easy to see where it's going, by getting a gdb stack trace of the running process.
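A minimal sketch of collecting such a trace from Python, assuming gdb is installed and you have permission to attach to slapd. The helper names are illustrative. Note that attaching pauses the process while the trace is taken, so on a production server keep it brief and take a few samples rather than one.

```python
import re
import subprocess
from collections import Counter


def gdb_backtrace_command(pid):
    """Build a gdb invocation that dumps all thread stacks and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtraces(pid, timeout=60):
    """Attach to the process, print every thread's backtrace, then detach."""
    result = subprocess.run(gdb_backtrace_command(pid),
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout


def top_frames(bt_text):
    """Tally the innermost (#0) function of each thread in gdb output."""
    pattern = re.compile(r"^#0\s+(?:0x[0-9a-f]+ in )?(\S+) \(", re.MULTILINE)
    return Counter(pattern.findall(bt_text))
```

If the same function dominates the #0 frames across several samples, that is likely where the CPU time is going.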
At a guess, based on the minimal amount of information here, you've run into the glibc malloc fragmentation issue, and switching to tcmalloc might avoid the problem.
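One common way to try tcmalloc without rebuilding is to preload it when starting slapd. The sketch below only builds the environment for that. The library path is a placeholder assumption (it varies by distribution), and the helper name is illustrative.

```python
import os


def env_with_preload(lib_path, base_env=None):
    """Return a copy of the environment with lib_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


# Hypothetical usage; adjust the path and use your usual slapd arguments:
# import subprocess
# env = env_with_preload("/usr/lib/libtcmalloc.so")
# subprocess.Popen(["slapd"], env=env)
```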
> The problem is that this system has to be highly available, even for
> writing, and when these updates "shock" the system, responsiveness
> drops sharply while the processes are churning like that. I don't think
> they are trying to catch up to the data changes, because if I let them
> run a while after the updates are done (talking like 1hr) and then
> restart the instances, they go back to their normal state.
If you have the SYNC loglevel enabled, it should be obvious whether update traffic is the cause or not.
> So far the only way I've been able to mitigate the issues is to
> reconfigure our ldap proxy instances to a machine that is having less
> trouble, restart the instances that are chugging along, then repoint
> the proxies back to the one just started, and start the others. Not exactly a quick operation.
>
> I've played with cache settings for both OpenLDAP and BDB and have
> gotten the frequency of this issue reduced but I can't seem to get rid
> of it completely and it shows up quite often after large data
> manipulations. I'm at a loss as to how to debug this since nothing is
> crashing. Any suggestions on how to find out what's causing this would
> be very helpful. The logs are not throwing any warnings or posting
> messages that would seem out of the ordinary, and I have played with
> the log settings but nothing seems to relate to anything that might explain why we are seeing CPU usage go so high.
I would suggest you try out back-mdb in RE24. MDB uses 1/4 the total memory of BDB and it performs far fewer mallocs, so glibc malloc fragmentation should not be a problem. (I would have suggested 2.4.30, but the ITS#7190 fix is rather important if you have large volumes of delete operations. The other MDB-related ITSs, #7191 and #7196, are only crucial for non-X86 and non-Linux platforms.)
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/