
RE: slapd suddenly shuts down



> -----Original Message-----
> From: Quanah Gibson-Mount [mailto:quanah@stanford.edu]
> Sent: Wednesday, July 21, 2004 11:49 AM
> To: Héctor Miranda; openldap-software@OpenLDAP.org
> Subject: Re: slapd suddenly shuts down
> 
>  
> --On Wednesday, July 21, 2004 11:27 AM -0500 Héctor Miranda
> <hmiranda@insys-corp.com.mx> wrote:
> 
> > Hi,
> >
> > Is there any reason that could make slapd terminate at a specific
> > time?
> >
> > I ask because I had two standalone OpenLDAP servers up and running in
> > a load-balanced setup, but recently I had to separate them (site
> > migration). Since then there have been two unexpected slapd shutdowns,
> > at different times and with no apparent cause, on the machine that is
> > carrying my corp's load.
> >
> > This is what the log shows:
> >
> > Jul 20 10:41:46 solaris slapd[899]: [ID 347666 local4.debug] conn=236179
> > op=0 BIND dn="cn=Manager,o=corp" method=128
> > Jul 20 10:41:46 solaris slapd[899]: [ID 338319 local4.debug] conn=236177
> > op=3 UNBIND
> > Jul 20 10:41:46 solaris slapd[899]: [ID 237057 local4.debug] conn=236179
> > op=0 BIND dn="cn=Manager,o=corp" mech=simple ssf=0
> > Jul 20 10:41:46 solaris slapd[899]: [ID 952275 local4.debug] conn=236177
> > fd=42 closed
> > Jul 20 10:41:46 solaris slapd[899]: [ID 217296 local4.debug] conn=236179
> > op=0 RESULT tag=97 err=0 text=
> > Jul 20 10:41:46 solaris slapd[899]: [ID 952275 local4.debug] conn=201650
> > fd=11 closed
> > Jul 20 10:41:46 solaris slapd[899]: [ID 952275 local4.debug] conn=235457
> > fd=15 closed
> > Jul 20 10:41:46 solaris slapd[899]: [ID 952275 local4.debug] conn=201651
> > fd=16 closed
> > .
> > .
> > Jul 20 10:41:46 solaris slapd[899]: [ID 952275 local4.debug] conn=236176
> > fd=45 closed
> > Jul 20 10:41:46 solaris slapd[899]: [ID 952275 local4.debug] conn=236178
> > fd=46 closed
> > Jul 20 10:41:46 solaris slapd[899]: [ID 952275 local4.debug] conn=236179
> > fd=47 closed
> > Jul 20 10:41:46 solaris slapd[899]: [ID 952275 local4.debug] conn=236025
> > fd=50 closed
> > Jul 20 10:41:46 solaris slapd[899]: [ID 542995 local4.debug] slapd
> > shutdown: waiting for 0 threads to terminate
> > Jul 20 10:41:47 solaris slapd[899]: [ID 486161 local4.debug] slapd
> > stopped.
> >
> > It looks like a normal shutdown, but it shouldn't have happened, since
> > nobody stopped slapd manually.
> >
> > The machine is a well-sized Solaris 8 box; with 8 GB of RAM we think
> > it can handle a 4000-user directory.
> >
> > Has anyone run into the same situation?
> 
> We run 10 OpenLDAP servers on Solaris 8, and I've never seen this.  Does
> the shutdown always happen at the same time of day? It looks to me like
> someone or something is sending slapd a kill signal.  Maybe a Sun process
> thinks slapd is its own directory service?
> 
> What version of OpenLDAP are you running?

Quanah,

There have been two shutdowns: one at 10:41 AM, and the other on a
different day and server at 5:00 PM, right after a run of deferred
operations that started at 4:44 PM (my peak time is daily from 10 AM to
2 PM). Here is the log for the second one:

Jul 14 16:44:09 solaris slapd[10998]: [ID 641214 local4.debug] deferring
operation
Jul 14 16:48:31 solaris slapd[10998]: [ID 641214 local4.debug] deferring
operation
Jul 14 16:54:28 solaris slapd[10998]: [ID 641214 local4.debug] deferring
operation
Jul 14 16:59:24 solaris slapd[10998]: [ID 641214 local4.debug] deferring
operation
Jul 14 17:00:03 solaris slapd[10998]: [ID 542995 local4.debug] slapd
shutdown: waiting for 1 threads to terminate
Jul 14 17:00:03 solaris slapd[10998]: [ID 486161 local4.debug] slapd
stopped.

This one is a little different, because the log shows the server deferring
a lot of operations just before it shut down.
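
From what I've read, slapd logs "deferring operation" when it can't hand a
request to a worker thread right away, for example when the thread pool is
saturated. One quick thing to check (just a sketch; the config path is an
assumption for my install):

# Show the worker-thread setting, if any; when the directive is absent,
# slapd falls back to its compile-time default (commonly 16 in this era
# of OpenLDAP).  Adjust the path to the actual slapd.conf location.
grep -i '^threads' /usr/local/etc/openldap/slapd.conf ||
    echo "threads not set, using the compile-time default"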

The version is OpenLDAP 2.1.21, with a patched BDB 4.1.25.

Right now I'm writing a cron script that logs the CPU usage every minute
and revives slapd if it dies, in order to rule out a performance problem.
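
Something along these lines (just a rough sketch; all the paths and the
pidfile location are assumptions for my setup):

#!/bin/sh
# Watchdog sketch, meant to run every minute from root's crontab:
#   * * * * * /usr/local/sbin/slapd-watchdog.sh
# The paths below are assumptions; adjust for the real install.
SLAPD=/usr/local/libexec/slapd
PIDFILE=/usr/local/var/slapd.pid
LOG=/var/log/slapd-watchdog.log

PID=`cat "$PIDFILE" 2>/dev/null`
if [ -n "$PID" ] && kill -0 "$PID" 2>/dev/null; then
    # slapd is alive: record its CPU usage for later correlation
    # with any future crash.
    echo "`date` pcpu=`ps -o pcpu= -p $PID`" >> "$LOG"
else
    # slapd is gone: note it and bring it back up (started with its
    # default options here; add -f/-h arguments if the setup needs them).
    echo "`date` slapd not running, restarting" >> "$LOG"
    "$SLAPD"
fi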

I don't think a Sun process could be mistaking slapd for its own directory
service, because this problem is recent (July 14 and 21) and these machines
have been in production for more than a year without anything like this
ever happening.
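
If it does happen again, I suppose one way to catch the sender in the act
would be to attach truss to the running slapd and trace signals only (a
sketch below; note that truss slows the traced process somewhat):

# Attach to the running slapd (this assumes a single slapd process)
# with syscall tracing turned off, so only signals are reported.
# For a signal sent with kill(2), truss's siginfo line should show
# the sender's pid and uid, which would identify the culprit.
truss -f -t '!all' -p `pgrep -x slapd`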

Thanks!