Re: (ITS#5342) DN and naming attribute mismatch



OK. I can reproduce in test, but I'm starting to not care due to the 
circumstances.


My January 29 production database returns err=80 (LDAP result code "other") when hit. The important 
open question is what caused the *initial* corruption: the hardware (which 
I note has been reliable since downgrading to 2.3.39), 2.3.37 (or possibly 
even some earlier version), or 2.3.40. I can think of no way to answer 
this question at this time, short of additional experimentation/data 
collection, and it may be near-impossible to find out. (Unfortunate, 
because it is the important question.)

A perverse side effect is that the same corrupted production database, 
loaded against 2.3.39, happily accepts all changes. Now, .39 and .40 would 
be expected to produce identical results. Which of the two is wrong in 
this case? I find myself saying, again, that expectations during 
"impossible situations" are a difficult subject.


Now, why don't I care about figuring out which one of them is right? 
Because I can only instigate failure when something is already in an 
"impossible situation" at t=0. In the test environment, I gained the luxury of a 
debugging procedure I couldn't afford in production: a full rm/slapadd of 
the entire database. If I rm/slapadd using entirely 2.3.39, things work. 
And if I rm/slapadd using entirely 2.3.40, things work: so far I cannot 
create an initial corruption with 2.3.40, although it can worsen corruption 
that is already present at t=0.


Given that I observed database issues like ITS#5262 in my own test 
environment, I find it plausible that 2.3.37 corrupted my database on the 
way out the door. I've also observed, in test, that I can work around this 
with slapcat/rm/slapadd.

While all this has been going on, I've been stressing 2.3.40 with test008, 
and it seems to be fine. I will likely try 2.3.40 again next week, but 
plan on a slightly abnormal upgrade procedure that includes a 
slapcat/rm/slapadd. This way, at least the sins of the past will be fully 
purged prior to the first 2.3.40 start, so if any future issues arise 
we'll have end-to-end 2.3.40 accountability.
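
For reference, the slapcat/rm/slapadd cycle described above might look roughly like the sketch below. It only builds and prints the commands and does not run them. The config path, database directory and LDIF path are placeholders, not taken from this thread. Stopping slapd, installing the new binaries between the dump and the reload, and preserving any DB_CONFIG file in the database directory are assumed to be handled by the operator.

```python
import shlex


def reload_plan(conf, db_dir, ldif):
    """Return the dump/wipe/reload steps as argv lists, without running them.

    All three arguments are hypothetical paths chosen by the operator.
    """
    return [
        # Dump the existing database to LDIF (run with the old binaries).
        ["slapcat", "-f", conf, "-l", ldif],
        # Remove the old database files so nothing carries over.
        # Note: this also removes DB_CONFIG if it lives in db_dir; save it first.
        ["rm", "-rf", db_dir],
        ["mkdir", "-p", db_dir],
        # Rebuild the database from the LDIF (run with the new binaries).
        ["slapadd", "-f", conf, "-l", ldif],
    ]


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    plan = reload_plan("/etc/openldap/slapd.conf", "/var/lib/ldap", "/tmp/dump.ldif")
    for argv in plan:
        print(shlex.join(argv))
```

Printing the plan first makes it easy to review the exact commands before running anything destructive against a production directory.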