
Re: (ITS#5171) hdb txn_checkpoint failures



Aaron Richton wrote:
>> No. The BDB transaction log files don't know (or care) anything about IP 
>> addresses. Nothing at the slapd layer could have any direct effect on the BDB 
>> transaction logs. How exactly did you reconfigure the servers, did you stop 
>> them and restart them or did you use cn=config?
> 
> echo 192.blahblah master.r.e >> /etc/hosts
> 
> The master changed from 128.blahblah to 192.blahblah. Same physical 
> machine, just different interface. On slave4 and 6, I didn't touch slapd.

(Of course, if you only appended to /etc/hosts, then the old address is still 
in there and is the one that gets used first.)
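
To illustrate the point, here is a rough sketch of a first-match lookup over 
hosts-file text. It assumes the resolver consults the hosts file and stops at 
the first line naming the host. That is typical, but the actual behavior 
depends on the platform's name-service configuration. The function name and the 
sample file contents are made up for illustration; the addresses are the same 
placeholders used above.

```python
def first_match(hosts_text, name):
    """Return the address on the first line that lists `name`, or None.

    Assumption: the resolver stops at the first matching line. Real
    resolvers may behave differently depending on configuration.
    """
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name in names:
            return address
    return None


# Hypothetical file contents: the old entry, then the appended one.
hosts = """\
127.0.0.1 localhost
128.blahblah master.r.e
192.blahblah master.r.e
"""

# The appended 192 entry is shadowed by the older 128 entry.
result = first_match(hosts, "master.r.e")
print(result)
```

Under that assumption the new entry has to replace the old line (or be placed 
ahead of it) before it takes effect.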

>> Might as well get the db_stat -l output for a few of them to compare.
> 
> This isn't going well at all; they just can't join the environment. I 
> tried on slave1, it hung. I tried on slave4 under truss, it hung. (We're 
> talking >30 minutes here.) Although I swear I've run db_stat hot, I killed 
> db_stat (ungracefully, sadly) and stopped slapd (gracefully) on slave1, 
> ran db_stat again, and it hung there...and corrupted the environment to 
> the point where I couldn't get db_recover/slapd to run. (I ended up 
> blowing the slave1 database away; it's refreshing from syncrepl now.)

> I've got a few more slaves that I haven't shot in the foot yet, and I only 
> tried this on one of the suffixes on slave{1,4}. Plenty more 
> opportunities to screw this up yet if there's anything to try...I suppose 
> I could go for -N, or if the command line is going to be a pain, I could 
> join the slapd process with dbx and print ->log_stat myself (although I 
> might need a bit of hand holding on that)...
> 
> [the hang on slave4]
> db_stat         ->    libdb-4.2.so:*db_env_create(0xffbffaec, 0x0, 0x17154)
> lwp_mutex_lock(0xFF0D0000)      (sleeping...)
>          mutex type: USYNC_PROCESS
> 
>   ff307248 __db_des_get (29ac0, 29d78, 29d78, ffbff9d0, 0, ffbff9d9) + c0
>   ff305780 __db_e_attach (29ac0, ffbffa94, 40400, 40000, 33e021, 29d71) + 6e0
>   ff2ff434 __dbenv_open (29ac0, 0, 40400, 0, 0, 0) + 664
>   00016514 db_init  (29ac0, 0, 4, 100000, ffbffba0, ff3deb54) + 64
>   00011e3c main     (2, ffbffc44, ffbffc50, 29800, 0, 0) + 9a4
>   00011470 _start   (0, 0, 0, 0, 0, 0) + 108

If this is happening even with slapd cleanly shut down then it should also 
prevent slapd from restarting, since slapd first attempts to join an existing 
environment before trying to create a new one. And that really implies that 
the rest of the environment is shot.

-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/