Slapd database goes corrupt repeatedly after recovery
Dear list,
We use an OpenLDAP (slapd) server to manage the user accounts of our Samba server. Recently we have run into the problem that, after slapd has been running for some time, users can no longer connect to Samba drives. Samba complains that it cannot connect to the LDAP server (see the Samba log below), and the slapd log shows:
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (uid) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (sambaSID) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (sambaSID) not indexed
Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): file id2entry.bdb has LSN 1/382892, past end of log at 1/283666
Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): Commonly caused by moving a database from one database environment
Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): to another without clearing the database LSNs, or by removing all of
Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): the log files from a database environment
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/382892 past current end-of-log of 1/283666
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 5
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/378772 past current end-of-log of 1/283666
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 7
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/373647 past current end-of-log of 1/283666
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 8
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): txn_checkpoint: failed to flush the buffer cache: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 11:38:51 office-server slapd[3433]: conn=62 op=29 do_search: invalid dn (sambaDomainName=,sambaDomainName=foo,dc=foo,dc=org)
Mar 25 11:38:51 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:39:01 office-server slapd[3433]: last message repeated 26 times
Mar 25 11:39:01 office-server CRON[3657]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
Mar 25 11:39:14 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:39:47 office-server slapd[3433]: last message repeated 35 times
Mar 25 11:39:48 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:39:49 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:39:50 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:40:51 office-server slapd[3433]: last message repeated 164 times
Mar 25 11:40:51 office-server slapd[3433]: last message repeated 3 times
Mar 25 11:40:52 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:41:53 office-server slapd[3433]: last message repeated 294 times
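To make the LSN mismatch in the log above easier to see, each LSN can be read as a (file, offset) pair and compared with the reported end of log. The following is only a minimal sketch in Python: the regular expression is an assumption based on the two message shapes quoted above, and the names `LSN_MSG` and `lsn_gaps` are illustrative, not part of any OpenLDAP or Berkeley DB tool.

```python
import re

# Matches both message shapes seen in the log above (an assumption based on
# those lines only, not on any documented Berkeley DB message format):
#   "... has LSN 1/382892, past end of log at 1/283666"
#   "... LSN of 1/382892 past current end-of-log of 1/283666"
LSN_MSG = re.compile(
    r"LSN (?:of )?(\d+)/(\d+),? past (?:current )?"
    r"end[ -]of[ -]log (?:at|of) (\d+)/(\d+)"
)


def lsn_gaps(log_lines):
    """Yield (page_lsn, end_of_log, gap) for each matching log line.

    LSNs are returned as (file, offset) tuples. gap is the offset difference
    when both LSNs are in the same log file, otherwise None.
    """
    for line in log_lines:
        m = LSN_MSG.search(line)
        if m is None:
            continue
        page = (int(m.group(1)), int(m.group(2)))
        end = (int(m.group(3)), int(m.group(4)))
        gap = page[1] - end[1] if page[0] == end[0] else None
        yield page, end, gap


# Two lines copied from the slapd log above, plus one unrelated line.
sample = [
    "bdb(dc=foo,dc=org): file id2entry.bdb has LSN 1/382892, past end of log at 1/283666",
    "bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/378772 past current end-of-log of 1/283666",
    "bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 5",
]

for page, end, gap in lsn_gaps(sample):
    print(page, end, gap)
```

In every such message in the log above, the pages of id2entry.bdb carry LSNs that lie beyond the end of the only log file, log.0000000001.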
Strangely, restarting slapd helps: users can use Samba again for a limited and arbitrary period of time, until the problem reappears. I tried repairing the database with
db4.7_recover -v -h /var/lib/ldap
but the problem comes back later.
I also noticed that when I shut down slapd with "/etc/init.d/slapd stop", it complains about the database being corrupt, even if no problems have appeared up to that point:
Mar 25 10:12:35 office-server slapd[16880]: slapd shutdown: waiting for 0 operations/tasks to finish
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/382892 past current end-of-log of 1/278482
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 5
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/378772 past current end-of-log of 1/278482
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 7
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/373647 past current end-of-log of 1/278482
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 8
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb_db_close: database "dc=foo,dc=org": txn_checkpoint failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30974).
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): File handles still open at environment close
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Open file handle: /var/lib/ldap/log.0000000001
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb_db_close: database "dc=foo,dc=org": close failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30974)
Mar 25 10:12:35 office-server slapd[16880]: slapd stopped.
Mar 25 10:12:46 office-server slapd[19194]: @(#) $OpenLDAP: slapd 2.4.18 (Sep 8 2009 17:47:22) $#012#011buildd@crested:/build/buildd/openldap-2.4.18/debian/build/servers/slapd
Does anybody have an idea what the problem might be?
Many thanks for any hints or pointers!
Kaspar
--
Samba Log File:
[2008/09/23 11:22:22, 0] lib/smbldap.c:smbldap_connect_system(982)
failed to bind to server ldap://localhost/ with dn="cn=admin,dc=foo,dc=org" Error: Can't contact LDAP server (unknown)
[2008/09/23 11:22:22, 1] lib/smbldap.c:another_ldap_try(1153)
Connection to LDAP server failed for the 1 try!
[2008/09/23 11:22:23, 2] lib/smbldap.c:smbldap_open_connection(786)
smbldap_open_connection: connection opened
[2008/09/23 11:22:23, 2] lib/smbldap.c:smbldap_connect_system(982)
failed to bind to server ldap://localhost/ with dn="cn=admin,dc=foo,dc=org" Error: Can't contact LDAP server (unknown)
[2008/09/23 11:22:23, 1] lib/smbldap.c:another_ldap_try(1153)
Connection to LDAP server failed for the 2 try!
Server details: Ubuntu 9.10, slapd 2.4.18
Slapd configuration file (slapd.conf):
# This is the main slapd configuration file. See slapd.conf(5) for more
# info on the configuration options.
#######################################################################
# Global Directives:
# Features to permit
#allow bind_v2
# Schema and objectClass definitions
include /etc/ldap/schema/core.schema
include /etc/ldap/schema/cosine.schema
include /etc/ldap/schema/nis.schema
include /etc/ldap/schema/inetorgperson.schema
include /etc/ldap/schema/samba.schema
include /etc/ldap/schema/misc.schema
# Where the pid file is put. The init.d script
# will not stop the server if you change this.
pidfile /var/run/slapd/slapd.pid
# List of arguments that were passed to the server
argsfile /var/run/slapd/slapd.args
# Read slapd.conf(5) for possible values
loglevel 392
# Where the dynamically loaded modules are stored
modulepath /usr/lib/ldap
moduleload back_bdb
# The maximum number of entries that is returned for a search operation
sizelimit 500
# The tool-threads parameter sets the actual number of CPUs that are used
# for indexing.
tool-threads 1
#######################################################################
# Specific Backend Directives for bdb:
# Backend specific directives apply to this backend until another
# 'backend' directive occurs
backend bdb
#######################################################################
# Specific Backend Directives for 'other':
# Backend specific directives apply to this backend until another
# 'backend' directive occurs
#backend <other>
#######################################################################
# Specific Directives for database #1, of type bdb:
# Database specific directives apply to this database until another
# 'database' directive occurs
database bdb
# The base of your directory in database #1
suffix "dc=baselgovernance,dc=org"
# rootdn directive for specifying a superuser on the database. This is needed
# for syncrepl.
# rootdn "cn=admin,dc=baselgovernance,dc=org"
# Where the database files are physically stored for database #1
directory "/var/lib/ldap"
# The dbconfig settings are used to generate a DB_CONFIG file the first
# time slapd starts. They do NOT override an existing DB_CONFIG
# file. You should therefore change these settings in DB_CONFIG directly
# or remove DB_CONFIG and restart slapd for changes to take effect.
# For the Debian package we use 2MB as default but be sure to update this
# value if you have plenty of RAM
dbconfig set_cachesize 0 2097152 0
# Sven Hartge reported that he had to set this value incredibly high
# to get slapd running at all. See http://bugs.debian.org/303057 for more
# information.
# Number of objects that can be locked at the same time.
dbconfig set_lk_max_objects 1500
# Number of locks (both requested and granted)
dbconfig set_lk_max_locks 1500
# Number of lockers
dbconfig set_lk_max_lockers 1500
# Indexing options for database #1
index objectClass eq
# Save the time that the entry gets modified, for database #1
lastmod on
# Checkpoint the BerkeleyDB database periodically in case of system
# failure and to speed slapd shutdown.
checkpoint 512 30
# Where to store the replica logs for database #1
# replogfile /var/lib/ldap/replog
# The userPassword by default can be changed
# by the entry owning it if they are authenticated.
# Others should not be able to see it, except the
# admin entry below
# These access lines apply to database #1 only
access to attrs=userPassword,sambaNTPassword,sambaLMPassword
        by dn="cn=admin,dc=baselgovernance,dc=org" write
        by anonymous auth
        by self write
        by * none
# Ensure read access to the base for things like
# supportedSASLMechanisms. Without this you may
# have problems with SASL not knowing what
# mechanisms are available and the like.
# Note that this is covered by the 'access to *'
# ACL below too but if you change that as people
# are wont to do you'll still need this if you
# want SASL (and possible other things) to work
# happily.
access to dn.base="" by * read
# The admin dn has full write access, everyone else
# can read everything.
access to *
        by dn="cn=admin,dc=baselgovernance,dc=org" write
        by * read
# For Netscape Roaming support, each user gets a roaming
# profile for which they have write access to
#access to dn=".*,ou=Roaming,o=morsnet"
# by dn="cn=admin,dc=baselgovernance,dc=org" write
# by dnattr=owner write
#######################################################################
# Specific Directives for database #2, of type 'other' (can be bdb too):
# Database specific directives apply to this database until another
# 'database' directive occurs
#database <other>
# The base of your directory for database #2
#suffix "dc=debian,dc=org"
# Indices to maintain
## required by OpenLDAP
#index objectclass eq
index cn pres,sub,eq
index sn pres,sub,eq
## required to support pdb_getsampwnam
index uid pres,sub,eq
## required to support pdb_getsambapwrid()
index displayName pres,sub,eq
## uncomment these if you are storing posixAccount and
## posixGroup entries in the directory as well
##index uidNumber eq
##index gidNumber eq
##index memberUid eq
index sambaSID pres,sub,eq
index sambaPrimaryGroupSID eq
index sambaDomainName eq
index default sub