Re: Fwd: RE24 testing
Rein Tollevik wrote:
Pierangelo Masarati wrote:
Rein Tollevik wrote:
Howard Chu wrote:
Howard Chu wrote:
There's not much difference between HEAD and RE24 really. And I've seen
test050 fail on both, at random times. Still haven't tracked it down.
Hm. I just saw test050 lock up on me, a repeat of the discussion in
ITS#5454. Seems like we need to go back to a trylock here.
This reopens the race condition window that ITS closed. If a trylock
is required then we must find some other way of asserting that the
socket is either not enabled while the lock is held, or that any
messages received while it is locked are acted upon.
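One way to meet the second condition above (messages that arrive while the lock is held still get acted upon) is to record the arrival in a flag before attempting the trylock, and have whoever holds the lock re-check that flag after releasing it. The following is only a minimal sketch of that general pattern using POSIX threads and C11 atomics. It is not the syncrepl.c code and not the fix discussed in this thread; `struct conn`, `on_message`, `drain` and `release_lock` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for per-connection state; not an OpenLDAP type. */
struct conn {
    pthread_mutex_t mutex;
    atomic_int pending;  /* nonzero: a message arrived and is not yet handled */
    int processed;       /* number of drain passes; protected by mutex */
};

void conn_init(struct conn *c) {
    int rc = pthread_mutex_init(&c->mutex, NULL);
    assert(rc == 0);
    (void)rc;
    atomic_init(&c->pending, 0);
    c->processed = 0;
}

/* Handle pending work if the lock is free. If the trylock fails, the
 * current holder is responsible for calling drain() after it unlocks. */
static void drain(struct conn *c) {
    while (atomic_load(&c->pending) &&
           pthread_mutex_trylock(&c->mutex) == 0) {
        /* Several arrivals may coalesce into a single pass here. */
        while (atomic_exchange(&c->pending, 0))
            c->processed++;  /* placeholder for the real message handling */
        pthread_mutex_unlock(&c->mutex);
        /* Loop again: a message may have arrived just before the unlock. */
    }
}

/* Called when the socket becomes readable: never blocks on the mutex. */
void on_message(struct conn *c) {
    atomic_store(&c->pending, 1);  /* record the arrival before trying */
    drain(c);
}

/* Called by a thread that took the mutex for other work (pthread_mutex_lock). */
void release_lock(struct conn *c) {
    pthread_mutex_unlock(&c->mutex);
    drain(c);  /* pick up anything that arrived while the lock was held */
}
```

The ordering is what matters in this sketch: the flag is set before the trylock, and it is re-checked after every unlock, so a failed trylock cannot silently drop a notification. Whether this maps cleanly onto the socket enable/disable logic in the real code is an open question.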
I have run test050 with this patch, and see the same failures as in
http://www.openldap.org/lists/openldap-devel/200809/msg00051.html
I thought the errors were due to some recent changes in the code, but
it appears to be the populating of consumer2 added in revision 1.11
of test050 itself that triggers it. I.e., the actual bug may have
been there for some time :-(
AFAIR, writing to consumer2 was added precisely to trigger an issue that
test050 was hiding in its initial form.
Do you remember what issue? After applying the patch to ITS#5719 this
test has switched from failing almost half of the time I run it to now
managing several hundred iterations before deadlocking (i.e., not the
same symptoms as I used to see). I have had two cases where it was
hung, and they seem to be related to the pool pausing that takes place
when the updates to the config are replicated to the second consumer.
The test hangs in ldapadd populating consumer2, where at least one
thread (on the second consumer) replicating the config is hanging in
handle_pause(). Note, this is without the lock -> trylock change in
syncrepl.c, as I'm convinced that would reintroduce the syncrepl hangs
we have seen earlier. None of the threads seem to be waiting on this
lock, but I haven't had time to investigate it enough to conclude
anything yet.
I felt the need to add a write to consumer2 after someone complained
that writes to it were not replicated. That modification to the test
confirmed that writes to consumer2 were not replicated consistently (at
least, most of the time, AFAIR). In fact, prior to that modification,
only 2-way MMR was tested, and N-way (at least, 3-way) did not work.
Then, the fix we're discussing resulted in writes to consumer2 being
consistently replicated. I confirm that test050 is succeeding here for
re24 (single core i386).
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Fax: +39 0382 476497
Email: ando@sys-net.it
-----------------------------------