Re: 8 hours tests ends with inconsistent DB.
Hmmm.... Well, I think OpenLDAP is definitely rock solid for most of the
cases it is used in. But there are cases where problems arise. I've seen
DB corruption, attributes that suddenly could not be found anymore
although they had been present in the schema definition for weeks,
transaction logs suddenly owned by root although OpenLDAP was running as
the "openldap" user, and things like that. I haven't opened a case for
these issues because until now I believed that these problems were simply
caused by misconfiguration for which I'm responsible. That's why people
seek help on a mailing list. Maybe someone else has had such problems in
the past and so he or she could help quickly.
For the problems I mentioned above it now really seems to be my own
fault. For the case of the DB corruption I can now reproduce it. In
this case I'm loading 500,000 entries with ldapadd into the directory. An
entry consists of about 23 attributes. After about 440,000 entries I
get the following messages:
....
conn=1 op=442515 ADD dn="uid=442515,ou=icpuser,l=root"
conn=1 op=442515 RESULT tag=105 err=0 text=
conn=1 op=442516 ADD dn="uid=442516,ou=icpuser,l=root"
bdb(l=root): malloc: Cannot allocate memory: 1147
free(): invalid pointer 0x925b44c8!
conn=1 op=442516 RESULT tag=105 err=80 text=entry store failed
conn=1 op=442517 ADD dn="uid=442517,ou=icpuser,l=root"
bdb(l=root): malloc: Cannot allocate memory: 32768
free(): invalid pointer 0x925b48c8!
bdb(l=root): PANIC: Cannot allocate memory
free(): invalid pointer 0x925b4978!
slapd shutdown: waiting for 1 threads to terminate
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
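To find the point where a long ldapadd run starts to fail, the slapd log can be scanned for the first RESULT line with a non-zero err code. The following is only a minimal sketch, assuming the log lines have exactly the shape shown above; the names first_failure and SAMPLE_LOG are made up for this illustration and are not part of OpenLDAP.

```python
import re

# Matches slapd RESULT lines of the shape shown in the log above.
# Assumption: the fields always appear in this order on one line.
RESULT_RE = re.compile(
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) RESULT tag=(?P<tag>\d+) "
    r"err=(?P<err>\d+) text=(?P<text>.*)"
)


def first_failure(lines):
    """Return the first RESULT line with a non-zero err code, or None."""
    for line in lines:
        m = RESULT_RE.search(line)
        if m and int(m.group("err")) != 0:
            return {
                "conn": int(m.group("conn")),
                "op": int(m.group("op")),
                "err": int(m.group("err")),
                "text": m.group("text").strip(),
            }
    return None


# A few lines copied from the log above (hypothetical variable name).
SAMPLE_LOG = """\
conn=1 op=442515 ADD dn="uid=442515,ou=icpuser,l=root"
conn=1 op=442515 RESULT tag=105 err=0 text=
conn=1 op=442516 ADD dn="uid=442516,ou=icpuser,l=root"
bdb(l=root): malloc: Cannot allocate memory: 1147
conn=1 op=442516 RESULT tag=105 err=80 text=entry store failed
""".splitlines()

failure = first_failure(SAMPLE_LOG)
print(failure)
```

On the sample lines this reports op 442516 with err=80, which is the generic LDAP "other" result code, so the real cause has to be read from the bdb messages around it.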
Now I'm running "db_recover -c -v" (I also tried a "normal" "db_recover -v").
This command comes back with the message that DB recovery completed
successfully:
db_recover: Finding last valid log LSN: file: 101 offset 36674186
db_recover: Recovery starting from [1][28]
db_recover: Recovery complete at Sun Jun 13 14:57:52 2004
db_recover: Maximum transaction ID 800e158c Recovery checkpoint [101][36674186]
If I now start OpenLDAP again I get the following messages:
....
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb_db_open: dbenv_open failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30978)
backend_startup: bi_db_open(0) failed! (-30978)
bdb(l=root): txn_checkpoint interface requires an environment configured for the transaction subsystem
bdb_db_destroy: txn_checkpoint failed: Invalid argument (22)
slapd stopped.
connections_destroy: nothing to destroy.
Hmmm... According to db_recover the database should now be in a consistent
state, but OpenLDAP doesn't share this opinion. Well, according to the
first messages we are running out of some needed resource. That can
happen. But as I already mentioned above, a db_recover should bring the
database back into a consistent state, and according to db_recover this is
the case. But I still couldn't start OpenLDAP; it quit with the messages
shown above. This was yesterday. Today I ran "db_recover -c -v" again and
then started OpenLDAP again. And OpenLDAP started without problems! What
happened? Something must have changed since yesterday. But I haven't
changed anything... Well, I think that OpenLDAP still held resources for a
while after it crashed, and that the kernel has freed them by today. So,
like Quanah mentioned, it seems that I have to increase the kernel
resources again. Maybe if I had just rebooted the server yesterday I could
have continued with ldapadd. That wouldn't have solved the problem, of
course, it would just have been a quick "hack" ;-)
To make everything complete, here is the configuration I used (I hope I
have included everything):
DB_CONFIG:
set_cachesize 0 524288000 0
set_shm_key 1
set_lg_regionmax 1048576
set_lg_max 52428800
set_lg_bsize 2097152
set_lk_max_lockers 1000 # default
set_lk_max_locks 1000 # default
set_lk_max_objects 1000 # default
set_tx_max 100
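
The byte values above are easier to check in MiB. Here is a small sketch that parses the DB_CONFIG shown above and converts the sizes. It assumes that everything after a "#" is a comment (that is how the "# default" notes are treated here) and that set_cachesize takes its arguments as gbytes, bytes, ncache. The helper names parse_db_config and cache_bytes are invented for this example.

```python
MIB = 1024 * 1024

DB_CONFIG = """\
set_cachesize 0 524288000 0
set_shm_key 1
set_lg_regionmax 1048576
set_lg_max 52428800
set_lg_bsize 2097152
set_lk_max_lockers 1000 # default
set_lk_max_locks 1000 # default
set_lk_max_objects 1000 # default
set_tx_max 100
"""


def parse_db_config(text):
    """Map each directive name to its list of integer arguments."""
    settings = {}
    for raw in text.splitlines():
        # Assumption: anything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, *args = line.split()
        settings[name] = [int(a) for a in args]
    return settings


def cache_bytes(settings):
    """Total cache size: gbytes * 1 GiB + bytes (ncache is ignored here)."""
    gbytes, nbytes, _ncache = settings["set_cachesize"]
    return gbytes * 1024 ** 3 + nbytes


settings = parse_db_config(DB_CONFIG)
print("cache     :", cache_bytes(settings) // MIB, "MiB")
print("lg_max    :", settings["set_lg_max"][0] // MIB, "MiB")
print("lg_bsize  :", settings["set_lg_bsize"][0] // MIB, "MiB")
print("lg_region :", settings["set_lg_regionmax"][0] // MIB, "MiB")
```

So the configuration asks for a 500 MiB cache, 50 MiB log files, a 2 MiB log buffer and a 1 MiB log region.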
slapd.conf (extract of relevant settings):
loglevel 96
idletimeout 10
sizelimit unlimited
threads 16
cachesize 10000
checkpoint 1024 1
(Note: A checkpoint every 1 min. or 1024 kByte should really be
no problem. The I/O subsystem is happy with these settings. I set
it this low because I don't want to lose too much information in case
of a crash.)
Kernel resources:
cat /proc/sys/kernel/sem
250 32000 32 128
cat /proc/sys/kernel/shmall
2147483648
cat /proc/sys/kernel/shmmax
2147483648
cat /proc/sys/kernel/shmmni
4096
cat /proc/sys/kernel/msgmax
8192
cat /proc/sys/kernel/msgmnb
16384
cat /proc/sys/kernel/msgmni
1024
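
As a rough sanity check, the kernel values above can be compared with the cache size from DB_CONFIG. This is only a sketch under some assumptions: the four fields of /proc/sys/kernel/sem are taken to be SEMMSL, SEMMNS, SEMOPM and SEMMNI in that order, shmmax is taken to be in bytes, and with set_shm_key the cache is assumed to live in one System V shared memory segment. The names parse_sem and cache_fits are made up for the example.

```python
# Field order of /proc/sys/kernel/sem on Linux.
SEM_FIELDS = ("semmsl", "semmns", "semopm", "semmni")


def parse_sem(text):
    """Turn the single line from /proc/sys/kernel/sem into a dict."""
    values = [int(v) for v in text.split()]
    return dict(zip(SEM_FIELDS, values))


# Values copied from the listing above.
sem = parse_sem("250 32000 32 128")
shmmax = 2147483648          # /proc/sys/kernel/shmmax, in bytes
cache = 524288000            # set_cachesize from DB_CONFIG, in bytes

# Assumption: the whole cache has to fit into one segment of at most shmmax.
cache_fits = cache <= shmmax

print(sem)
print("shmmax:", shmmax // (1024 * 1024), "MiB")
print("cache fits in one segment:", cache_fits)
```

With these numbers the 500 MiB cache is well below the 2048 MiB shmmax, so at least this one limit does not look like the one being hit. It says nothing about the other limits or about the memory slapd allocates with malloc.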
Hardware:
Fujitsu Siemens RX300
2x Intel Xeon CPU 3.06GHz
2 GByte RAM
I/O Compaq EVA SAN (this is NOT a remote filesystem like NFS! It's SCSI
over FibreChannel)
OS:
Redhat ES 3 (Update 1), Kernel 2.4.21
libc 2.3.2
For the next test I will now increase the kernel resources. I will take
the recommendations Oracle suggests for SHM/SEM because I haven't found
such (good) information for OpenLDAP until now.
Cheers,
Robert
Trevor Warren wrote:
--- Wesley D Craig <wes@umich.edu> wrote:
On 12 Jun 2004, at 07:48, Trevor Warren wrote:
think this tells
you anything about scale. This *might* tell you
something about how
easy it is to give the application to someone who
doesn't know
anything. The test you're performing might be
[snip]
With about half a decade with Floss i may not be a
guru at it but i surely know a thing or two about
/proc configurations and appropriate hdparms that
could set your config vroooming.
Thanks for all the criticism wes.
Trevor
appropriate if you're
hoping to repackage and distribute OpenLDAP to 50
million customers.
:wes
=====
( >- -< )
/~\ ______________________________________ /~\
| \) / Scaling FLOSS in the Enterprise \ (/ |
|_|_ \ trevorwarren@yahoo.com / _|_|
\____________________________________/