Re: (ITS#5171) hdb txn_checkpoint failures
Aaron Richton wrote:
>> itself. Again, we can't really tell without single-stepping thru the BDB
>> library code. It may not be worth the effort, but that's your call.
>
> The lock was
>
> env_region.c:290 MUTEX_LOCK(dbenv, &renv->mutex);
>
> but that wasn't making much sense....and after a couple minutes in dbx I
> realized that I've been killing myself with the attempts at db_stat.
> Yesterday's attempts were running db_* binaries with a wrong (but
> compatible) ABI. It'd be nice if Sleepycat had some more/earlier checks
> for that, but oh well...

Kinda figured that that's what happened.

> So anyway, I corrupted base2/slave4 by running the wrong db_stat, but that
> left three other bases on slave4 and all three bases on slave6. I ran
> db_stat -l on them, the output is:
>
> https://www.nbcs.rutgers.edu/~richton/its5171_dbstatl
> BTW, this ABI screwup shouldn't be the root cause of the failures...I
> haven't tried any db tools until the course of debugging this. These are
> AUTOREMOVE, so db_archive is unlikely, for instance.

It's still rather suspicious that slave4 and slave6 both had identical log
status for base1 (1/188113) but different requested locations (1/8730339 vs
1/8730401). If they're identically configured slaves then they ought to be in
lock-step. Then again, obviously they're not identical since slave6 doesn't
show base4 in your log.
Do you have the db_stat output from an uncorrupted slave? What about the master?
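
For anyone repeating the comparison above by hand, here is a minimal sketch of the lock-step check. It assumes the file/offset pairs have already been copied out of the db_stat -l output as strings such as "1/188113"; the names LSN, parse_lsn, in_lock_step and offset_gap are hypothetical helpers for illustration, not part of any BDB tool, and nothing here parses db_stat output itself.

```python
from typing import NamedTuple, Optional


class LSN(NamedTuple):
    """A log position written as file/offset (hypothetical helper type)."""
    file: int
    offset: int


def parse_lsn(text: str) -> LSN:
    # Assumes the value was copied by hand in the "file/offset" form
    # used in this thread, e.g. "1/8730339".
    file_part, offset_part = text.strip().split("/")
    return LSN(int(file_part), int(offset_part))


def in_lock_step(a: str, b: str) -> bool:
    # Two slaves agree only if both the file number and the offset match.
    return parse_lsn(a) == parse_lsn(b)


def offset_gap(a: str, b: str) -> Optional[int]:
    # Byte distance between two positions in the same log file;
    # None when the positions are in different files.
    first, second = parse_lsn(a), parse_lsn(b)
    if first.file != second.file:
        return None
    return abs(first.offset - second.offset)


if __name__ == "__main__":
    # Values quoted in this message for base1 on slave4 and slave6.
    print(in_lock_step("1/188113", "1/188113"))
    print(in_lock_step("1/8730339", "1/8730401"))
    print(offset_gap("1/8730339", "1/8730401"))
```

With the numbers quoted here, the first pair matches while the second pair sits in the same log file at different offsets, which is the discrepancy in question.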
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/