RE: Slapd High CPU usage on Solaris 9

Statistically, the samples should be relevant. What I usually run is:

i=0; while [ $i -lt 100 ]; do pstack <MYPID> > pstack.$i; i=`expr $i + 1`; done

Yes, no sleep, just a burst of pstacks. Statistically, that is as accurate as what any sampling-based profiler would tell you, without the complexity of installing such a tool (kernel prerequisites, etc.), and you can collect it in less than a minute.

Sometimes, though, the output can be hard to read for people not used to it.
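
To make the burst easier to read, the 100 files can be boiled down to a count of which function each thread was in at the top of its stack. The following is only a minimal sketch: it assumes the usual Solaris pstack layout (an "lwp#" separator line per thread, then one frame per line as "hex-address function (args)"), and the names top_frames and summarize are made up for this example.

```python
import glob
import re
from collections import Counter

# Assumed frame line: optional spaces, hex address, symbol, then "(" args ")".
FRAME_RE = re.compile(r"^\s*[0-9a-fA-F]{8,16}\s+(\S+)\s*\(")
# Assumed per-thread separator: "-----  lwp# 3 / thread# 3  -----".
THREAD_RE = re.compile(r"^-+\s+lwp#")


def top_frames(text):
    """Return the innermost (first listed) function of each thread in one dump."""
    tops = []
    expect_top = True  # a single-threaded dump has no separator line
    for line in text.splitlines():
        if THREAD_RE.match(line):
            expect_top = True
            continue
        match = FRAME_RE.match(line)
        if match and expect_top:
            tops.append(match.group(1))
            expect_top = False
    return tops


def summarize(dumps):
    """Count top-of-stack functions over all dumps, most frequent first."""
    counts = Counter()
    for text in dumps:
        counts.update(top_frames(text))
    return counts.most_common()


if __name__ == "__main__":
    dumps = []
    for path in sorted(glob.glob("pstack.*")):
        with open(path, errors="replace") as handle:
            dumps.append(handle.read())
    for name, hits in summarize(dumps):
        print(f"{hits:6d}  {name}")
```

Run it in the directory holding the pstack.N files. Threads that are merely parked waiting for work will normally dominate the counts, so the entries worth a closer look are the ones that are not wait or sleep primitives.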

If you send me your output, I can try to help.

Best Regards

++Cyrille

From: Luca Polidoro [mailto:luca.polidoro@gmail.com]
Sent: Friday, September 06, 2013 3:08 PM
To: Maucci, Cyrille
Cc: openldap-technical@openldap.org
Subject: Re: Slapd High CPU usage on Solaris 9

Hi, I have already done these tests, but the result provides little information, none of which is useful for directing the analysis.

2013/9/6 Maucci, Cyrille <cyrille.maucci@hp.com>

When I myself face such a problem, I usually pstack the process a few times to very quickly know what the guy is doing.

And that usually gives me a good clue.

++Cyrille

From: openldap-technical-bounces@OpenLDAP.org [mailto:openldap-technical-bounces@OpenLDAP.org] On Behalf Of Luca Polidoro
Sent: Monday, August 12, 2013 3:31 PM
To: openldap-technical@openldap.org
Subject: Slapd High CPU usage on Solaris 9

Hello,

I am writing to submit a case that has been occurring in our infrastructure for the last 2 weeks. The infrastructure is structured as follows:

1 provider: Solaris 9 SPARC - Sun Fire V490 - latest OS patch level
CPU: 4 x 1500 MHz
RAM: 32 GB

OpenLDAP version used: 2.4.23 with Berkeley DB 4.8.30 (database bdb), all 64-bit

18 consumers: Solaris 9 SPARC - latest OS patch level, with varying hardware (CPU, RAM)

On the following consumers:

Consumer 1: Solaris 9 SPARC - Sun Fire 480R - latest OS patch level
CPU: 4 x 900 MHz
RAM: 8 GB

Consumer 2: Solaris 9 SPARC - Sun Fire 480R - latest OS patch level
CPU: 4 x 1050 MHz
RAM: 8 GB

Consumer 3: Solaris 9 SPARC - Sun Fire 480R - latest OS patch level
CPU: 4 x 1050 MHz
RAM: 8 GB

Consumer 4: Solaris 9 SPARC - Sun Fire V210 - latest OS patch level
CPU: 2 x 1336 MHz
RAM: 8 GB

we are noticing an increase in the CPU used by the slapd process. The process sits constantly between 85% and 95% CPU, becomes completely unusable, and we are then forced to restart it.

The directory contains 1,000,000 objects.

This is the consumer's slapd.conf (I have omitted some parts: ACLs, includes, etc.):

# See slapd.conf(5) for details on configuration options.
# This file should NOT be world readable.
#

#
# VERSION v2 - Digital Tru64
#
allow bind_v2

Some include
...

#
# tuning parameters - START
# ------------------------------
#
conn_max_pending 1000
conn_max_pending_auth 1000

idletimeout    500
sizelimit        unlimited
threads          8
timelimit        500
disallow bind_anon

#
# tuning parameters - END
# ----------------------------
#

...

#######################################################################
# bdb database definitions
#######################################################################

database        bdb
suffix          "xxxxxxxxxxxx"
rootdn          "cn=root,ou=ldapusers,xxxxx"

directory       /var/openldap-2.4.23_64/var/openldap-data
#####disallow limit for syncuser
limits dn.children="ou=syncusers,xxxx" size=unlimited
index   objectClass,entryCSN,entryUUID eq
index   ou eq,sub,subinitial,subany,subfinal
index   uidOwner eq
index    uid eq
index    memberUid eq

#shm_key 1100
cachesize 1000000
cachefree 10000
dncachesize 1000000
idlcachesize 1000000
searchstack 16
checkpoint 1024 10

overlay ppolicy
ppolicy_default "cn=Standard,ou=Policies,xxxx"
ppolicy_use_lockout

############################SYNCREPL CONF
syncrepl   rid=011
           provider=ldap://xxxxxx
           type=refreshAndPersist
           interval=00:00:15:00
           retry="15 10 120 +"
           searchbase="xxxxx"
           filter="(objectClass=*)"
           attrs="*,+"
           scope=sub
           schemachecking=on
           bindmethod=simple
           binddn="xxxxxx"
           credentials=xxxx
############################SYNCREPL CONF

These are the bdb files:

420M    dn2id.bdb
30M    entryCSN.bdb
32M    entryUUID.bdb
1,4G    id2entry.bdb
18M    memberUid.bdb
4,9M    objectClass.bdb
5,3M    ou.bdb
17M    uid.bdb
17M    uidOwner.bdb

This is the DB_CONFIG:

-----------------------------------------------------------

##########################################
###########################################
#set_cachesize 0 300000000 10
#set_lg_regionmax 262144
#set_lg_bsize 2097152
###########################################
###########################################
# replaces lockdetect directive
#set_lk_detect DB_LOCK_EXPIRE
set_lk_detect DB_LOCK_DEFAULT

# uncomment if dbnosync required
#ADDED ALL
#set_flags DB_TXN_WRITE_NOSYNC
####ADDED
set_flags DB_LOG_AUTOREMOVE
# multiple set_flags directives allowed

# sets max log size = 5M (BDB default=10M)
set_lg_max 25242880
set_lg_dir /var/openldap-2.4.23_64/logs

set_cachesize 2 274726912 1
# sets a database cache of 5M and
# allows fragmentation
# does NOT replace slapd.conf cachesize
# this is a database parameter

#txn_checkpoint 128 15 0
# replaces checkpoint in slap.conf
# writes checkpoint if 128K written or every 15 mins
# 0 = no writes - no update
set_lk_max_locks 2500
set_lk_max_lockers 2500
set_lk_max_objects 2500

---------------------------------------------------
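
As a quick sanity check on the sizes above (only a sketch, not part of the running configuration): set_cachesize takes gigabytes, additional bytes, and a number of cache segments, and set_lg_max takes bytes. The helper name cachesize_bytes is made up for this example. Note that the "5M" comments in the file do not match the values actually set.

```python
def cachesize_bytes(gbytes, nbytes, ncache):
    """Decode `set_cachesize <gbytes> <bytes> <ncache>`.

    Returns (total cache in bytes, bytes per cache segment).
    """
    total = gbytes * 1024**3 + nbytes
    segments = max(ncache, 1)  # 0 or 1 both mean a single contiguous cache
    return total, total // segments


# Values copied from the DB_CONFIG above.
total, per_segment = cachesize_bytes(2, 274726912, 1)
log_max = 25242880

print(f"set_cachesize: {total} bytes = {total / 1024**3:.2f} GiB in one segment")
print(f"set_lg_max:    {log_max} bytes = {log_max / 1024**2:.1f} MiB")
```

So the active setting asks for a little over 2 GiB of BDB cache, and log files of about 24 MiB each.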

We have tried raising the number of threads to 16 and lowering the idletimeout and timelimit parameters, but without result.

Appreciate your feedback.

Thanks,

Luca