malloc: Cannot allocate memory
Hey all,
With OpenLDAP 2.3 in production for several weeks now, I experienced my
first slapd crash today. My loglevel is set to 256, and I was able to
find this in the logs.
It started with a bunch of these:
Nov 2 18:00:41 ldap1 slapd[1525]: bdb(dc=fuse,dc=net): malloc: Cannot
allocate memory: 377
Finally ended with:
Nov 2 18:00:41 ldap1 slapd[1525]: ch_calloc of 1 elems of 392 bytes
failed
I was then paged, restarted slapd, and it was able to recover the database
automatically (a nice feature in 2.3, by the way). It's been running
smoothly since then.
Now, I had another LDAP server go down yesterday; the system completely
died and just beeps when you power it on. Dell said that means it has no
memory, so either all 4 memory modules died or the motherboard died. They
are sending someone over tomorrow. That makes me suspect they may have
shipped those servers with a batch of bad memory. But in case that isn't
the cause, I was hoping someone had some suggestions for me.
Here is some relevant info.
The DB has about 400,000 DNs, each with about 5 attributes. I'm running
OpenLDAP 2.3.7 with a syncprov.c patch, and BDB 4.2 with 4 patches from
Sleepycat and one from the OpenLDAP distribution. This is on a FreeBSD
5.4 machine with two 2.8 GHz CPUs and 2 GB of RAM; I built the
distribution from source. The machine that died is a syncrepl slave using
refreshAndPersist.
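For context, the consumer stanza on that slave looks roughly like this
(provider host, binddn, and credentials are placeholders, and I'm quoting
from memory, so minor details may differ):

syncrepl rid=001
        provider=ldap://master.fuse.net
        type=refreshAndPersist
        searchbase="dc=fuse,dc=net"
        scope=sub
        bindmethod=simple
        binddn="cn=replicator,dc=fuse,dc=net"
        credentials=secret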
From 12:00 AM to 8:00 PM today, I've had 48,612 connections to that
machine. They are mostly just simple binds, followed by an equality search
on an indexed attribute.
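A typical client operation is essentially this (DN, password, and values
are made up for illustration):

ldapsearch -x -H ldap://ldap1.fuse.net \
        -D "uid=jdoe,dc=fuse,dc=net" -w secret \
        -b "dc=fuse,dc=net" "(accountNumber=12345)"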
In slapd.conf I have the following. The backend is bdb, and I'm indexing
six attributes with eq:
index objectClass eq
index uid eq
index radiusGroupName eq
index accountNumber eq
index entryUUID eq
index entryCSN eq
cachesize 100000
idlcachesize 300000
checkpoint 1024 5
Now that I think about it, that 100,000/300,000 might be a bit high. What
do you think? Could that cause the memory error? I'm not sure whether this
cache is related to the cachesize in DB_CONFIG; I'm assuming it uses extra
memory on top of the DB_CONFIG size. Is that correct?
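My rough math on slapd's side of it, assuming a cached entry costs
somewhere around 2 KB in memory (just a guess, given our small
5-attribute entries):

  cachesize    100,000 entries x ~2 KB  ~= 200 MB entry cache
  plus the 512 MB BDB cache from DB_CONFIG
  plus whatever the idlcache and connection overhead add

That's already in the 700 MB+ range, which lines up roughly with the 756M
SIZE that top shows below.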
My DB_CONFIG file has the following:
set_cachesize 0 536870912 1
set_lg_regionmax 1048576
set_lg_bsize 2097152
set_lg_max 10485760
set_flags DB_LOG_AUTOREMOVE
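For anyone checking my math, my understanding is that set_cachesize takes
gbytes, bytes, and ncache, so the first line works out to:

  set_cachesize 0 536870912 1  ->  0 GB + 536870912 bytes (512 MB) in one cache region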
Since I've got 2 GB of RAM, I could raise that 512 MB cachesize if needed
and maybe lower the cachesize in slapd.conf. Do you think that would help?
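Before raising it, I figure I can at least check how the existing BDB
cache is doing by running db_stat against the database directory (the
path is just wherever your directory directive points):

  db_stat -m -h /var/db/openldap-data | head

which should report requested pages found in the cache versus pages read
in from disk.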
Here is top, sorted by res about an hour after the restart:
Mem: 217M Active, 1051M Inact, 164M Wired, 80M Cache, 112M Buf, 492M Free
Swap: 1024M Total, 1024M Free
  PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME  WCPU   CPU COMMAND
33669 ldap      20    0  756M   395M kserel 0   0:25 0.00% 0.00% slapd
Any suggestions?
Unfortunately, this is a production machine and one of my other slaves is
in the shop, so there isn't much room for experimentation.
I was hoping someone had some insight into the slapd.conf
cachesize/idlcachesize settings and the DB_CONFIG settings. Is there
anything I could try that would help with an issue like this? Any ideas
why this could have happened if it's not bad memory in the machine?
Any help/advice/suggestions/etc... is appreciated.
-Dusty Doris