[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
(ITS#3384) slapd stops forking after the forked processes are hung on read(stdin)
Full_Name: Tho Tran
Version: 2.2.17
OS: FC2
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (192.128.134.68)
Issues:
The shell-database programs which is written in Python are hung on
sys.stdin.readlines() after they are forked by slapd. This happens after the
slapd server is stressed with multiple ldapsearch client applications.
When the shell-database application are hung on read, slapd continues to process
ldap requests but it stops forking the shell-database program.
Environment:
LINUX Fedora-core-2
openLDAP 2.2.17 and Python 2.3.3.
slapd.conf is configured to have up to 128 threads.
The LDAP client can run on the same or another machine. The client is a shell
program like this.
desthost=fllinux1
cnt=0
nloop=100
echo "Authenticating test1@att.com" > test1.log
while [ $cnt -lt $nloop ]
do
sleep 0
cnt=`expr $cnt + 1`
ldapsearch -h $desthost -x -v -b 'dc=openldap,dc=com'
'email=test1@att.com'
rc=`ldapsearch -h $desthost -x -D
'cn=test1@att.com,ou=people,dc=openldap,dc=com' -w 'password'`
if [ ! "$rc" ]
then
echo "`date`: loop $cnt: Bind test1@att.com FAIL" >> test1.log
else
echo "`date`: loop $cnt: Bind test1@att.com OK" >> test1.log
fi
done
The LDAP server is configured to use shell for database. The shell program is
written in Python, search.py. For each LDAP search or bind request, slapd forks
the search.py program. The search.py performs read using sys.std.readlines()
and parses the LDAP request. For the search request, search.py simply
constructs a proper LDAP response and sends the reply via stdout using Python
print statement. For the bind request, the application contacts another server
via HTTP to valid the entity, constructs a proper LDAP response, and sends the
reply via stdout. (For the bind request in the test, I ran both cases, one
contacting another server via HTTP to validate the request and one without
contacting another server. In both cases, I encounter the same issues.
However, based on my observation, the issues surface sooner and with less
clients when the code that processes the bind request does not contact the
remote server for validating data. I doubt this has anything to do with the
root cause of the issues.)
Test environments:
After running 3 to 4 client applications for a short while, two search.py
processes are hung on read(0, ?) and slapd stops forking new search.py
processes. However, slapd continues to process new LDAP requests according to
/var/log/ldap.log.
When search.py processes are hung, lsof i :389 shows that there are only 11
processes having access to port 389.
Here is the output from ?ps ef | grep search? showing the hung processes.
[root@fllinux1 openldap]#
[root@fllinux1 openldap]# ps -ef | grep search
ldap 23044 21440 0 13:55 ? 00:00:00 python
/usr/local/etc/openldap/search.py
ldap 23045 21732 0 13:55 ? 00:00:00 python
/usr/local/etc/openldap/search.py
root 23174 21236 0 14:17 pts/15 00:00:00 grep search
Here is the output from ?lsof? output.
[root@fllinux1 openldap]# lsof -i :389
BAD CASES:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
slapd 21422 ldap 7u IPv4 93404 TCP *:ldap (LISTEN)
slapd 21422 ldap 8u IPv4 101998 TCP
fllinux1.web.att.com:ldap->ibm-linux-1:45050 (ESTABLISHED)
slapd 21422 ldap 9u IPv4 102005 TCP
fllinux1.web.att.com:ldap->ibm-linux-1:45052 (ESTABLISHED)
slapd 21422 ldap 10u IPv4 102083 TCP
fllinux1.web.att.com:ldap->ibm-linux-1:45053 (ESTABLISHED)
python 23044 ldap 7u IPv4 93404 TCP *:ldap (LISTEN)
python 23044 ldap 8u IPv4 101998 TCP
fllinux1.web.att.com:ldap->ibm-linux-1:45050 (ESTABLISHED)
python 23044 ldap 9u IPv4 102005 TCP
fllinux1.web.att.com:ldap->ibm-linux-1:45052 (ESTABLISHED)
python 23045 ldap 7u IPv4 93404 TCP *:ldap (LISTEN)
python 23045 ldap 8u IPv4 101998 TCP
fllinux1.web.att.com:ldap->ibm-linux-1:45050 (ESTABLISHED)
python 23045 ldap 9u IPv4 102005 TCP
fllinux1.web.att.com:ldap->ibm-linux-1:45052 (ESTABLISHED)
The lsof shows that each search.py process (pid 23044 and pid 23045) has two fds
opened on port 389. Is this right? There should only be one fd on port 389.
In the normal case, there should be only one fd opened on port 389. See lsof
data below.
GOOD CASES:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
slapd 21422 ldap 7u IPv4 93404 TCP *:ldap (LISTEN)
slapd 21422 ldap 8u IPv4 104357 TCP
fllinux1.web.att.com:ldap->ibm-linux-1:45109 (ESTABLISHED)
python 23426 ldap 7u IPv4 93404 TCP *:ldap (LISTEN)
python 23426 ldap 8u IPv4 104357 TCP
fllinux1.web.att.com:ldap->ibm-linux-1:45109 (ESTABLISHED)
Here is the output from strace of two hung processes
[root@fllinux1 openldap]#
[root@fllinux1 openldap]# strace -p 23044
Process 23044 attached - interrupt to quit
read(0, <unfinished ...>
Process 23044 detached
[root@fllinux1 openldap]# strace -p 23045
Process 23045 attached - interrupt to quit
read(0, <unfinished ...>
Process 23045 detached
[root@fllinux1 openldap]#
Here is the output from /var/log/ldap.log showing that slapd continues to
process incoming request but it does not want to fork more search.py
application. The ldap.log did not show forking activity. Because I did not see
any new search.py processes and my search.py application log did not show new
activities, I inferred that slapd did not fork new application to handle the new
ldap request.
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: activity on 1 descriptors
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: new connection on 16
Nov 4 15:20:10 fllinux1 slapd[21422]: conn=927 fd=16 ACCEPT from
IP=135.16.76.143:45059 (IP=0.0.0.0:389)
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: added 16r
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: activity on:
Nov 4 15:20:10 fllinux1 slapd[21422]:
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: select: listen=6 active_threads=0
tvp=NULL
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: select: listen=7 active_threads=0
tvp=NULL
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: activity on 1 descriptors
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: activity on:
Nov 4 15:20:10 fllinux1 slapd[21422]: 16r
Nov 4 15:20:10 fllinux1 slapd[21422]:
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: read activity on 16
Nov 4 15:20:10 fllinux1 slapd[21422]: connection_get(16)
Nov 4 15:20:10 fllinux1 slapd[21422]: connection_get(16): got connid=927
Nov 4 15:20:10 fllinux1 slapd[21422]: connection_read(16): checking for input
on id=927
Nov 4 15:20:10 fllinux1 slapd[21422]: ber_get_next on fd 16 failed errno=11
(Resource temporarily unavailable)
Nov 4 15:20:10 fllinux1 slapd[21422]: do_bind
Nov 4 15:20:10 fllinux1 slapd[21422]: >>> dnPrettyNormal:
<cn=admin,ou=people,dc=openldap,dc=com>
Nov 4 15:20:10 fllinux1 slapd[21422]: <<< dnPrettyNormal:
<cn=admin,ou=people,dc=openldap,dc=com>,
<cn=admin,ou=people,dc=openldap,dc=com>
Nov 4 15:20:10 fllinux1 slapd[21422]: do_bind: version=3
dn="cn=admin,ou=people,dc=openldap,dc=com" method=128
Nov 4 15:20:10 fllinux1 slapd[21422]: conn=927 op=0 BIND
dn="cn=admin,ou=people,dc=openldap,dc=com" method=128
Nov 4 15:20:10 fllinux1 slapd[21422]: => access_allowed: auth access to
"cn=admin,ou=people,dc=openldap,dc=com" "entry" requested
Nov 4 15:20:10 fllinux1 slapd[21422]: => access_allowed: backend default auth
access granted to "(anonymous)"
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: select: listen=6 active_threads=0
tvp=NULL
Nov 4 15:20:10 fllinux1 slapd[21422]: daemon: select: listen=7 active_threads=0
tvp=NULL