Re: (ITS#5171) hdb txn_checkpoint failures
> One more thing to check is just using "ls -l" to see if the actual size of
> the log files corresponds with the db_stat offsets. E.g. if slave6 base1's
> log.0000001 is really 8MB but the LSN is only 233KB, then we have to look for
> a weird in-memory corruption. If not, then somebody reset your logs.
No, it looks like those sizes all match. Actually, the "reset logs" theory
may well be the case. I still can't imagine how, but I'm willing to just
chalk this whole thing up to user error (of course, the logs show that the
user was me, which is a shame :). It is also hard to disprove where only one
log file is active, which is everywhere except base2. base2 has multiple log
files going back:
[slave4]
-rw------- 1 root root 9999986 Sep 6 18:03 log.0000000001
-rw------- 1 root root 9999967 Sep 10 14:03 log.0000000002
-rw------- 1 root root 9999983 Sep 18 16:33 log.0000000003
-rw------- 1 root root 9429761 Oct 8 05:33 log.0000000004
[slave6]
-rw------- 1 root root 9999986 Sep 6 18:03 log.0000000001
-rw------- 1 root root 9999967 Sep 10 14:03 log.0000000002
-rw------- 1 root root 9999983 Sep 18 16:33 log.0000000003
-rw------- 1 root root 9429761 Oct 8 05:33 log.0000000004
These of course match the db_stat -l output, but they also extend back prior
to September 24 according to the filesystem timestamps. I guess the argument
could be made that log 4 was truncated on September 24. Would db_stat detect
that, or would the log come up looking sane (or bad) there?
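
For anyone repeating the size-versus-LSN comparison suggested in the quoted text, here is a rough Python sketch of the check. It assumes the ten-digit log.NNNNNNNNNN file names shown in the listings above, and that the current LSN file number and offset are copied by hand from the db_stat -l output; the function names log_sizes and check_lsn are made up for illustration and are not part of any Berkeley DB tool.

```python
import os
import re

# Assumption: log files are named log.NNNNNNNNNN (ten digits), as in the
# listings above.
LOG_NAME = re.compile(r"^log\.(\d{10})$")


def log_sizes(env_dir):
    """Return {file_number: size_in_bytes} for the log files in env_dir."""
    sizes = {}
    for name in os.listdir(env_dir):
        match = LOG_NAME.match(name)
        if match:
            path = os.path.join(env_dir, name)
            sizes[int(match.group(1))] = os.path.getsize(path)
    return sizes


def check_lsn(sizes, lsn_file, lsn_offset, slack=0):
    """Compare the on-disk size of the current log file with the LSN offset.

    lsn_file and lsn_offset are typed in from db_stat -l (hypothetical
    inputs here). slack is an optional tolerance in bytes.
    """
    if lsn_file not in sizes:
        return "missing"
    if sizes[lsn_file] > lsn_offset + slack:
        # The file on disk is bigger than the LSN says it should be,
        # which is the suspicious case described in the quoted text.
        return "file-larger-than-lsn"
    return "consistent"
```

Calling check_lsn(log_sizes("/path/to/base2"), 4, 9429761) would be the equivalent of eyeballing the ls -l output for base2 against db_stat -l. The path is a placeholder, and the offset is only the file size from the listing above, used here as a stand-in for the real LSN offset.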