[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
RE: slapd hangs at 100% cpu in sched_yield (ITS#2030)
I've added some debug msgs to see what locker IDs we're getting and freeing.
We dutifully free all the locker IDs we allocate, there are no mismatches.
The locker IDs go onto a free list, but don't ever get re-used. I think this
is a simple consequence of the fact that locker IDs and transaction IDs must
be kept unique over the life of the database. We may need to re-work our
approach, to allocate one per thread and keep it for the life of the process
so we don't run out so quickly.
There are a couple of known bugs in BDB 4.0's lock management that are
relevant here. You may have to wait for Sleepycat to release BDB 4.1 to get
complete relief from these problems.
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support
> -----Original Message-----
> From: Steven Wilton [mailto:steven.wilton@team.eftel.com]
> Sent: Thursday, August 22, 2002 11:04 PM
> To: Howard Chu
> Subject: Re: slapd hangs at 100% cpu in sched_yield (ITS#2030)
>
>
>
> I would have thought that the number of lock requests/releases is related
> to database operations. The number of locks and the number of lockers are
> related to objects used to kep track of locks in the software using the
> bdb library (ie slapd) (I am just guessing here though.
>
> The number of locks in use increses constantly for each request,
> regardless of what the filter is, except for a search on a non-indexed
> field. The number of lockers increses only when objectclass is included in
> the filter.
>
> Steven
> On Fri, 23 Aug 2002 13:28:40 Howard Chu wrote:
> > Still not sure how to interpret this info. The number of lock requests
> > and
> > lock releases
> > are equal, so we're not leaking anything there. So I think perhaps BDB 4
> > has
> > a bug if the
> > current number of locks is still non-zero.
> >
> > -- Howard Chu
> > Chief Architect, Symas Corp. Director, Highland Sun
> > http://www.symas.com http://highlandsun.com/hyc
> > Symas: Premier OpenSource Development and Support
> >
> > > -----Original Message-----
> > > From: Steven Wilton [mailto:steven.wilton@team.eftel.com]
> > > Sent: Thursday, August 22, 2002 7:41 PM
> > > To: Howard Chu
> > > Subject: Re: slapd hangs at 100% cpu in sched_yield (ITS#2030)
> > >
> > >
> > > > No, this is not an unbounded problem. The maximum number of locks
> > needed
> > > > is
> > > > tied to the maximum number of slapd threads. It will also depend on
> > the
> > > > number of attribute indices you have configured. Each index is stored
> > in
> > > > its
> > > > own database, and some number of locks are needed to navigate those
> > > > databases
> > > > as well. You can use db_stat to help arrive at the ideal number of
> > locks
> > > > for
> > > > your configuration. Eventually you will reach a high-water mark and
> > no
> > > > more
> > > > locks will be needed.
> > > >
> > >
> > > I have upgraded to openldap 2.1.4m, and I have looked at the actual
> > search
> > > queries being sent to the ldap server (using ethereal on the loopback
> > > interface). It looks like different types of ldap queries will result
> > in
> > > a different number of bdb resources being lost in the process. Look at
> > > the following sequence of commands, and the number of current locks,
> > and
> > > current lockers used at the end of each:
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > >
> > > 101 Last allocated locker ID.
> > > 9 Number of lock modes.
> > > 1000 Maximum number of locks possible.
> > > 1000 Maximum number of lockers possible.
> > > 1000 Maximum number of objects possible.
> > > 24 Current locks.
> > > 27 Maximum number of locks so far.
> > > 61 Current number of lockers.
> > > 62 Maximum number lockers so far.
> > > 0 Current number lock objects.
> > > 6 Maximum number of lock objects so far.
> > > 306 Number of lock requests.
> > > 306 Number of lock releases.
> > > 0 Number of lock requests that would have waited.
> > > 0 Number of lock conflicts.
> > > 0 Number of deadlocks.
> > > 0 Number of transaction timeouts.
> > > 0 Number of lock timeouts.
> > > 352KB Lock region size (360448 bytes).
> > > 0 The number of region locks granted after waiting.
> > > 945 The number of region locks granted without waiting.
> > >
> > > sv1:~# ldapsearch -b o=EFTEL
> > > '(&(objectclass=posixAccount)(uidNumber=20678))'
> > >
> > > # extended LDIF
> > > #
> > > # LDAPv3
> > > # filter: (&(objectclass=posixAccount)(uidNumber=20678))
> > > # requesting: ALL
> > > #
> > >
> > > # prolfe, People, q-net, net, au, EFTEL
> > > dn: uid=prolfe,ou=People,dc=q-net,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: shadowAccount
> > > objectClass: top
> > > objectClass: eftelradiusSimpleProfile
> > > cn: prolfe
> > > uid: prolfe
> > > homeDirectory: /home/au/net/q-net/p/prolfe
> > > shadowWarning: 7
> > > shadowLastChange: 10000
> > > gecos: Name Withheld
> > > mail: prolfe@q-net.net.au
> > > uidNumber: 20678
> > > gidNumber: 200
> > > loginShell: /bin/nologin
> > >
> > > # search result
> > > search: 2
> > > result: 0 Success
> > >
> > > # numResponses: 2
> > > # numEntries: 1
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > >
> > > 107 Last allocated locker ID.
> > > 9 Number of lock modes.
> > > 1000 Maximum number of locks possible.
> > > 1000 Maximum number of lockers possible.
> > > 1000 Maximum number of objects possible.
> > > 25 Current locks.
> > > 28 Maximum number of locks so far.
> > > 66 Current number of lockers.
> > > 67 Maximum number lockers so far.
> > > 0 Current number lock objects.
> > > 6 Maximum number of lock objects so far.
> > > 318 Number of lock requests.
> > > 318 Number of lock releases.
> > > 0 Number of lock requests that would have waited.
> > > 0 Number of lock conflicts.
> > > 0 Number of deadlocks.
> > > 0 Number of transaction timeouts.
> > > 0 Number of lock timeouts.
> > > 352KB Lock region size (360448 bytes).
> > > 0 The number of region locks granted after waiting.
> > > 979 The number of region locks granted without waiting.
> > >
> > > sv1:~# ldapsearch -b o=EFTEL uid=swilton
> > >
> > > # extended LDIF
> > > #
> > > # LDAPv3
> > > # filter: uid=swilton
> > > # requesting: ALL
> > > #
> > >
> > > # swilton, People, vision, net, au, EFTEL
> > > dn: uid=swilton,ou=People,dc=vision,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: top
> > > objectClass: shadowAccount
> > > objectClass: eftelradiusSimpleProfile
> > > uid: swilton
> > > cn: swilton
> > > mail: swilton@vision.net.au
> > > shadowLastChange: 11529
> > > shadowWarning: 7
> > > uidNumber: 43056
> > > gidNumber: 118
> > > homeDirectory: /home/au/net/vision/s/swilton
> > > gecos: Internal User
> > > loginShell: /bin/nologin
> > >
> > > # swilton, People, q-net, net, au, EFTEL
> > > dn: uid=swilton,ou=People,dc=q-net,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: shadowAccount
> > > objectClass: top
> > > objectClass: eftelradiusSimpleProfile
> > > cn: swilton
> > > uid: swilton
> > > homeDirectory: /home/au/net/q-net/s/swilton
> > > shadowWarning: 7
> > > shadowLastChange: 10000
> > > gecos: Steven Wilton
> > > mail: swilton@q-net.net.au
> > > uidNumber: 16273
> > > gidNumber: 200
> > > loginShell: /bin/nologin
> > >
> > > # swilton, People, eftel, net, au, EFTEL
> > > dn: uid=swilton,ou=People,dc=eftel,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: shadowAccount
> > > objectClass: top
> > > objectClass: eftelradiusSimpleProfile
> > > cn: swilton
> > > uid: swilton
> > > homeDirectory: /home/au/net/eftel/s/swilton
> > > shadowWarning: 7
> > > shadowLastChange: 10000
> > > gecos: Steven Wilton
> > > mail: swilton@eftel.net.au
> > > uidNumber: 1908
> > > gidNumber: 100
> > > loginShell: /bin/nologin
> > > eftelradiusCheckItem: Simultaneous-Use=0
> > >
> > > # search result
> > > search: 2
> > > result: 0 Success
> > >
> > > # numResponses: 4
> > > # numEntries: 3
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > >
> > > 108 Last allocated locker ID.
> > > 9 Number of lock modes.
> > > 1000 Maximum number of locks possible.
> > > 1000 Maximum number of lockers possible.
> > > 1000 Maximum number of objects possible.
> > > 26 Current locks.
> > > 29 Maximum number of locks so far.
> > > 66 Current number of lockers.
> > > 67 Maximum number lockers so far.
> > > 0 Current number lock objects.
> > > 6 Maximum number of lock objects so far.
> > > 327 Number of lock requests.
> > > 327 Number of lock releases.
> > > 0 Number of lock requests that would have waited.
> > > 0 Number of lock conflicts.
> > > 0 Number of deadlocks.
> > > 0 Number of transaction timeouts.
> > > 0 Number of lock timeouts.
> > > 352KB Lock region size (360448 bytes).
> > > 0 The number of region locks granted after waiting.
> > > 1002 The number of region locks granted without waiting.
> > >
> > >
> > >
> > > From what I can tell on a variety of other queries, using a filter of
> > > uid=x or uidnumber=x result in an increase of 1x lock being used but
> > not
> > > freed. Using the (&(objectclass=x)(uidnumber=y)) filer, 5x lockers and
> > 1x
> > > lock are used but not freed. Interestingly enough, searching on an
> > > unindexed field uses no extra resources (but takes over 1 minute to
> > > complete) as follows:
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > > 109 Last allocated locker ID.
> > > 9 Number of lock modes.
> > > 1000 Maximum number of locks possible.
> > > 1000 Maximum number of lockers possible.
> > > 1000 Maximum number of objects possible.
> > > 27 Current locks.
> > > 30 Maximum number of locks so far.
> > > 66 Current number of lockers.
> > > 67 Maximum number lockers so far.
> > > 0 Current number lock objects.
> > > 6 Maximum number of lock objects so far.
> > > 334 Number of lock requests.
> > > 334 Number of lock releases.
> > > 0 Number of lock requests that would have waited.
> > > 0 Number of lock conflicts.
> > > 0 Number of deadlocks.
> > > 0 Number of transaction timeouts.
> > > 0 Number of lock timeouts.
> > > 352KB Lock region size (360448 bytes).
> > > 0 The number of region locks granted after waiting.
> > > 1021 The number of region locks granted without waiting.
> > >
> > > sv1:~# ldapsearch -b o=EFTEL mail=prolfe@q-net.net.au
> > >
> > > # extended LDIF
> > > #
> > > # LDAPv3
> > > # filter: mail=prolfe@q-net.net.au
> > > # requesting: ALL
> > > #
> > >
> > > # prolfe, People, q-net, net, au, EFTEL
> > > dn: uid=prolfe,ou=People,dc=q-net,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: shadowAccount
> > > objectClass: top
> > > objectClass: eftelradiusSimpleProfile
> > > cn: prolfe
> > > uid: prolfe
> > > homeDirectory: /home/au/net/q-net/p/prolfe
> > > shadowWarning: 7
> > > shadowLastChange: 10000
> > > gecos: Name Withheld
> > > mail: prolfe@q-net.net.au
> > > uidNumber: 20678
> > > gidNumber: 200
> > > loginShell: /bin/nologin
> > >
> > > # search result
> > > search: 2
> > > result: 0 Success
> > >
> > > # numResponses: 2
> > > # numEntries: 1
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > >
> > > 110 Last allocated locker ID.
> > > 9 Number of lock modes.
> > > 1000 Maximum number of locks possible.
> > > 1000 Maximum number of lockers possible.
> > > 1000 Maximum number of objects possible.
> > > 27 Current locks.
> > > 30 Maximum number of locks so far.
> > > 66 Current number of lockers.
> > > 67 Maximum number lockers so far.
> > > 0 Current number lock objects.
> > > 6 Maximum number of lock objects so far.
> > > 68181 Number of lock requests.
> > > 68181 Number of lock releases.
> > > 0 Number of lock requests that would have waited.
> > > 0 Number of lock conflicts.
> > > 0 Number of deadlocks.
> > > 0 Number of transaction timeouts.
> > > 0 Number of lock timeouts.
> > > 352KB Lock region size (360448 bytes).
> > > 0 The number of region locks granted after waiting.
> > > 114101 The number of region locks granted without waiting.
> > >
> > >
> > >
> > > Indexes are as follows:
> > >
> > > index objectClass pres,eq
> > > index cn,sn,uid eq
> > > index uidNumber,gidNumber,memberUid eq
> > >
> > >
> > > during the above test queries, no other queries were sent to the ldap
> > > server from any other program.
> > >
> > > Should I open a new its with the above info?
> > >
> > > thanks
> > >
> > > Steven
> > >
> > > ps.once we run out of lock_id's, the new LOCK_ID result checking code
> > > causes failed ldap lookups as I expected, but we do not go into an
> > endless
> > > loop (which is good). If we fix the above resource leak, we should
> > never
> > > reach a point where we run out of lock_id's which is the result we are
> > > really after.
> > >
> > >
> >
>
>