[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: slapd core dumping and locking up.



Yesterday slapd core dumped on both of our pop servers. Here is some more
information, any help would be great.

This is what I did when our server zoot core dumped:

With -d 256 -l 256:

/var/log/slapd.log
Jan 27 07:58:50 zoot slapd[10693]: daemon: conn=3363750 fd=73 connection
from IP=127.0.0.1:1587 (IP=127.0.0.1:389) accepted.
Jan 27 07:58:57 zoot slapd[73834]: slapd starting

/var/log/messages
Jan 27 07:58:57 zoot /kernel: pid 10693 (slapd), uid 0: exited on signal
6 (core dumped)

zoot: {17} % gdb -q -core slapd.zoot.012703.core
Core was generated by `slapd'.
Program terminated with signal 6, Abort trap.
#0  0x1021cad8 in ?? ()
(gdb)

I did see an error in the logs saying it was unable to open/read
gidNumber.dbb. When in installed openldap from ports on mobius i had to
convert the dbb files to the new dbb formation using this:

mv gidNumber.dbb gidNumber.old
db3_dump gidNumber.old | db3_load gidNumber.dbb

I tried that on zoot and dingo but it didn't work, gave me this error:
db_dump: gidNumber.old: unexpected file type or format
db_dump: open: gidNumber.old: Invalid argument

While I did that dingo got and update and createda new gidNumber.dbb file.
There is one other file (objectClass.dbb) that hasn't been updated since
Apr 19 2002.

None of what I did prevented dingo from core dumping, it did around 1PM
EST. One thing I noticed is in the logs the conn=# is always mid 3 mil.
Not sure if that makes a difference or not. So I set -d 1 -l 1 and this is
something an entry:

(NOTE the ber_get_next errorno=35, twice in each entry)

Jan 28 08:09:35 dingo slapd[58079]: connection_get(50): got connid=241340
Jan 28 08:09:35 dingo slapd[58079]: connection_read(50): checking for
input on id=241340
Jan 28 08:09:35 dingo slapd[58079]: ber_get_next on fd 50 failed errno=35
(Resource temporarily unavailable)
Jan 28 08:09:35 dingo slapd[58079]: do_bind
Jan 28 08:09:35 dingo slapd[58079]: do_bind: version=2
dn="uid=bluedawgy,ou=Users,o=gwi.net,dc=gwi,dc=net" method=128
Jan 28 08:09:35 dingo slapd[58079]: dn2entry_r: dn:
"UID=BLUEDAWGY,OU=USERS,O=GWI.NET,DC=GWI,DC=NET"
Jan 28 08:09:35 dingo slapd[58079]: => dn2id(
"UID=BLUEDAWGY,OU=USERS,O=GWI.NET,DC=GWI,DC=NET" )
Jan 28 08:09:35 dingo slapd[58079]: ====>
cache_find_entry_dn2id("UID=BLUEDAWGY,OU=USERS,O=GWI.NET,DC=GWI,DC=NET"):
33932 (1 tries)
Jan 28 08:09:35 dingo slapd[58079]: <= dn2id 33932 (in cache)
Jan 28 08:09:35 dingo slapd[58079]: => id2entry_r( 33932 )
Jan 28 08:09:35 dingo slapd[58079]: ====> cache_find_entry_id( 33932 )
"uid=bluedawgy,ou=Users,o=gwi.net,dc=gwi,dc=net" (found) (1 tries)
Jan 28 08:09:35 dingo slapd[58079]: <= id2entry_r( 33932 ) 0x9ec6920
(cache)
Jan 28 08:09:35 dingo slapd[58079]: ====> cache_return_entry_r( 33932 ):
returned (0)
Jan 28 08:09:35 dingo slapd[58079]: do_bind: v2 bind:
"uid=bluedawgy,ou=Users,o=gwi.net,dc=gwi,dc=net" to
"uid=bluedawgy,ou=Users,o=gwi.net,dc=gwi,dc=net"
Jan 28 08:09:35 dingo slapd[58079]: send_ldap_result: conn=241340 op=0 p=2
Jan 28 08:09:35 dingo slapd[58079]: send_ldap_response: msgid=1 tag=97
err=0
Jan 28 08:09:35 dingo slapd[58079]: connection_get(50): got connid=241340
Jan 28 08:09:35 dingo slapd[58079]: connection_read(50): checking for
input on id=241340
Jan 28 08:09:35 dingo slapd[58079]: ber_get_next on fd 50 failed errno=35
(Resource temporarily unavailable)
Jan 28 08:09:35 dingo slapd[58079]: do_search
Jan 28 08:09:35 dingo slapd[58079]: => ldbm_back_search
Jan 28 08:09:35 dingo slapd[58079]: dn2entry_r: dn:
"UID=BLUEDAWGY,OU=USERS,O=GWI.NET,DC=GWI,DC=NET"
Jan 28 08:09:35 dingo slapd[58079]: => dn2id(
"UID=BLUEDAWGY,OU=USERS,O=GWI.NET,DC=GWI,DC=NET" )
Jan 28 08:09:35 dingo slapd[58079]: ====>
cache_find_entry_dn2id("UID=BLUEDAWGY,OU=USERS,O=GWI.NET,DC=GWI,DC=NET"):
33932 (1 tries)
Jan 28 08:09:35 dingo slapd[58079]: <= dn2id 33932 (in cache)
Jan 28 08:09:35 dingo slapd[58079]: => id2entry_r( 33932 )
Jan 28 08:09:35 dingo slapd[58079]: ====> cache_find_entry_id( 33932 )
"uid=bluedawgy,ou=Users,o=gwi.net,dc=gwi,dc=net" (found) (1 tries)
Jan 28 08:09:35 dingo slapd[58079]: <= id2entry_r( 33932 ) 0x9ec6920
(cache)
Jan 28 08:09:35 dingo slapd[58079]: base_candidates: base:
"uid=bluedawgy,ou=Users,o=gwi.net,dc=gwi,dc=net"
Jan 28 08:09:35 dingo slapd[58079]: ====> cache_return_entry_r( 33932 ):
returned (0)
Jan 28 08:09:35 dingo slapd[58079]: => id2entry_r( 33932 )
Jan 28 08:09:35 dingo slapd[58079]: ====> cache_find_entry_id( 33932 )
"uid=bluedawgy,ou=Users,o=gwi.net,dc=gwi,dc=net" (found) (1 tries)
Jan 28 08:09:35 dingo slapd[58079]: <= id2entry_r( 33932 ) 0x9ec6920
(cache)
Jan 28 08:09:35 dingo slapd[58079]: => send_search_entry:
"uid=bluedawgy,ou=Users,o=gwi.net,dc=gwi,dc=net"
Jan 28 08:09:35 dingo slapd[58079]: <= send_search_entry
Jan 28 08:09:35 dingo slapd[58079]: ====> cache_return_entry_r( 33932 ):
returned (0)
Jan 28 08:09:35 dingo slapd[58079]: send_ldap_search_result 0::
Jan 28 08:09:35 dingo slapd[58079]: send_ldap_response: msgid=2 tag=101
err=0
Jan 28 08:09:35 dingo slapd[58079]: connection_get(50): got connid=241340
Jan 28 08:09:35 dingo slapd[58079]: connection_read(50): checking for
input on id=241340
Jan 28 08:09:35 dingo slapd[58079]: ber_get_next on fd 50 failed errno=5
(Input/output error)
Jan 28 08:09:35 dingo slapd[58079]: connection_read(50): input error=-2
id=241340, closing.
Jan 28 08:09:35 dingo slapd[58079]: connection_closing: readying
conn=241340 sd=50 for close
Jan 28 08:09:35 dingo slapd[58079]: connection_close: deferring
conn=241340 sd=50
Jan 28 08:09:35 dingo slapd[58079]: do_unbind
Jan 28 08:09:35 dingo slapd[58079]: connection_resched: attempting closing
conn=241340 sd=50
Jan 28 08:09:35 dingo slapd[58079]: connection_close: conn=241340 sd=50

Again if you can provide me with any help that would be great.

Thank you,
Lew A
GWI Operations

---------------------------------
  A tiger can smile
  A snake will say it loves you
  Lies make us evil
---------------------------------

On Fri, 17 Jan 2003, Lew A wrote:

> Hello,
>
> We are running to pop servers with slapd running on both. We are having a
> problem with slapd locking up and we have to force restarting it in order
> to get to respond again. (I have enabled debugging and logging so we can
> get some more information about this).
>
> Another problem we're having is slapd core dumps sometimes. There is
> nothing useful in the logs for this (debugging set to 256). Last night
> slapd core dumped on one server and about 14 hours later it dumped on
> another server. They both have only dumped 1 time before and it was at the
> exact same time (odd that it happens so close to each other). Here are the
> logs from one of the servers (pretty useless).
>
> Jan 17 09:04:15 dingo slapd[97658]: daemon: conn=3543943 fd=65 connection
> from IP=127.0.0.1:3596 (IP=127.0.0.1:389) accepted.
> Jan 17 09:04:15 dingo slapd[97658]: conn=3543943 op=0 BIND
> dn="UID=SHBOBALEK,OU=USERS,O=GWI.NET,DC=GWI,DC=NET" method=128
> Jan 17 09:04:15 dingo slapd[97658]: conn=3543943 op=0 RESULT tag=97 err=0
> text=
> Jan 17 09:04:15 dingo slapd[97658]: conn=3543943 op=1 SRCH
> base="uid=shbobalek,ou=Users,o=gwi.net,dc=gwi,dc=net" scope=0 filter="(objectClass=*)"
> Jan 17 09:04:15 dingo slapd[97658]: conn=3543943 op=1 SEARCH RESULT
> tag=101 err=0 text=
> Jan 17 09:04:15 dingo slapd[97658]: daemon: conn=3543944 fd=66 connection
> from IP=127.0.0.1:3597 (IP=127.0.0.1:389) accepted.
> Jan 17 09:04:15 dingo slapd[97658]: conn=3543944 op=0 BIND
> dn="UID=POPIG,OU=USERS,O=GWI.NET,DC=GWI,DC=NET" method=128
> Jan 17 09:04:15 dingo slapd[97658]: conn=3543944 op=0 RESULT tag=97 err=0
> text=
> Jan 17 09:04:15 dingo slapd[97658]: conn=3543944 op=1 SRCH
> base="uid=popig,ou=Users,o=gwi.net,dc=gwi,dc=net" scope=0 filter="(objectClass=*)"
> Jan 17 09:04:15 dingo slapd[97658]: conn=3543944 op=1 SEARCH RESULT
> tag=101 err=0 text=
> Jan 17 09:04:15 dingo slapd[97658]: daemon: conn=3543945 fd=67 connection
> from IP=127.0.0.1:3598 (IP=127.0.0.1:389) accepted.
> Jan 17 09:04:24 dingo slapd[57855]: slapd starting
>
> I debuged the core file and got this information:
>
> dingo: {20} % gdb -q -core /var/qmail/supervise/slapd/slapd.core
> Core was generated by `slapd'.
> Program terminated with signal 6, Abort trap.
> #0  0x1021cad8 in ?? ()
> (gdb) quit
> dingo: {21} %
>
> I have no idea what this error code is. If anyone could help me that'd be
> great. We have been having the lock-up issue alot more frequently but
> haven't had the time to look into it until now.
>
> We are running FreeBSD 4.5 p-23, OpenLDAP 2.0.25, Configured with
> '--with-tls' option only. If you need anymore information that could help
> please feel free. Any help you can give us would be great.
>
> Thank you,
> Lew A
> GWI Operations
>
> ---------------------------------
>   A tiger can smile
>   A snake will say it loves you
>   Lies make us evil
> ---------------------------------
>