Re: (ITS#7085) mutex lockup issue
Quanah,
We have compiled OpenLDAP 2.4.26 with BDB 5.2.36. OpenLDAP locked up about 4
hours into our testing, in a similar manner to what I reported earlier, so I
believe this issue still occurs in the latest version.
However, when I used gdb, I didn't see the mutex-locked threads that I saw
with OpenLDAP 2.4.22.
The following is from the locked-up 2.4.26 slapd server:
(gdb) info thread
14 Thread 0x418dd940 (LWP 13814) 0x00000037aa4d48a8 in epoll_wait () from /lib64/libc.so.6
13 Thread 0x420de940 (LWP 13815) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x428df940 (LWP 13816) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x430e0940 (LWP 13843) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x438e1940 (LWP 13855) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x440e2940 (LWP 13856) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x448e3940 (LWP 13857) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x450e4940 (LWP 13858) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x458e5940 (LWP 13859) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x460e6940 (LWP 13860) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x468e7940 (LWP 2007) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x470e8940 (LWP 2008) 0x00000037aa4cd722 in select () from /lib64/libc.so.6
2 Thread 0x478e9940 (LWP 2009) 0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x2ac6ccfdc930 (LWP 13805) 0x00000037aac07b35 in pthread_join () from /lib64/libpthread.so.0
(gdb) thread 3
[Switching to thread 3 (Thread 0x470e8940 (LWP 2008))]#0 0x00000037aa4cd722 in select () from /lib64/libc.so.6
(gdb) bt
#0 0x00000037aa4cd722 in select () from /lib64/libc.so.6
#1 0x000000000054ece5 in ?? ()
#2 0x000000000054aa15 in ?? ()
#3 0x0000000000557637 in ?? ()
#4 0x0000000000557c11 in ?? ()
#5 0x00000000004b2d93 in ?? ()
#6 0x00000000004e9d7c in ?? ()
#7 0x00000037aac0673d in start_thread () from /lib64/libpthread.so.0
#8 0x00000037aa4d44bd in clone () from /lib64/libc.so.6
It looks like thread 3 is waiting in select(), which never fires when I
access the server with the ldapsearch command.
I ran strace on ldapsearch (on a client machine), and this is what I see at
the end of the log:
$ strace ldapsearch -h 129.79.14.152 -p 2180 -l 3 -x -b mds-vo-name=WT2,o=grid "(&(objectClass=GlueLocation)(GlueLocationName=TIMESTAMP))"
....
write(1, "\n", 1) = 1
write(3, "0l\2\1\2cg\4\26mds-vo-name=WT2,o=grid\n"..., 110) = 110
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, -1
I'm not sure whether this strace is useful, but after this point ldapsearch
never returned.
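The last strace line shows ldapsearch blocked in poll() on its LDAP connection (fd 3, the socket the 110-byte search request was just written to) with a timeout of -1, which tells poll() to wait with no time limit for a reply that never arrives. The minimal sketch below shows the difference between an infinite and a finite timeout. It assumes a Unix-like system with Python's select.poll. The helper wait_readable is hypothetical, and an unwritten pipe stands in for the unresponsive slapd.

```python
import os
import select


def wait_readable(fd: int, timeout_ms: int) -> bool:
    """Return True if fd becomes readable within timeout_ms, else False.

    Hypothetical helper for illustration. A negative timeout_ms blocks
    indefinitely, like the poll(..., -1) call in the strace output.
    """
    poller = select.poll()
    # POLLERR and POLLHUP are always reported by poll(), so only the
    # "data available" events need to be requested explicitly.
    poller.register(fd, select.POLLIN | select.POLLPRI)
    events = poller.poll(timeout_ms)  # list of (fd, eventmask); empty on timeout
    return bool(events)


# Stand-in for an unresponsive server: a pipe that nobody writes to yet.
read_end, write_end = os.pipe()
print(wait_readable(read_end, 200))  # → False (gives up after ~200 ms)
os.write(write_end, b"x")            # simulate a reply arriving
print(wait_readable(read_end, 200))  # → True
os.close(read_end)
os.close(write_end)
```

With a timeout of -1 instead of 200, the first call would never return, which matches what ldapsearch showed.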
Thanks,
Soichi
On Wed, Nov 9, 2011 at 1:13 PM, Quanah Gibson-Mount <quanah@zimbra.com> wrote:
> --On Wednesday, November 09, 2011 2:01 PM +0000 hayashis@indiana.edu wrote:
>
>> Full_Name: Soichi Hayashi
>> Version: 2.4.22
>>
>
> OpenLDAP 2.4.22 is quite old, and had various known issues. Please use a
> current release (2.4.26). This report will not be investigated unless you
> can reproduce it with a current release of OpenLDAP. You also fail to note
> what BDB release you are using, and whether or not it has all the relevant
> patches applied to it. If you have a broken policy of only using vendor
> provided packages, then you will need to send a bug report to RedHat, as it
> is their job to maintain their vendor packages.
>
>
> Thanks!
>
> --Quanah
>
> --
>
> Quanah Gibson-Mount
> Sr. Member of Technical Staff
> Zimbra, Inc
> A Division of VMware, Inc.
> --------------------
> Zimbra :: the leader in open source messaging and collaboration
>