[Date Prev][Date Next] [Chronological] [Thread] [Top]

Problem with new writetimeout



Hi everyone,

I have noticed a problem regarding the new writetimeout feature in OpenLDAP 2.4.17. I have a setup with one master and four replicas running on Solaris 10, using the hdb-backend with BDB 4.4 from the opencsw repository. Clients are Sun, Linux and Mac boxes in every possible variation.

I updated my servers last Friday from version 2.4.15 and now some cronjobs accessing the directory were sporadically failing over the weekend.

I monitored the server logs (log level 'stats') while running the involved scripts repeatedly and got something like the following results after fail-runs on all servers:
(IPs and DNs altered)

Jul 20 11:42:43 ldapserver slapd[9053]: [ID 848112 local4.debug] conn=12479 fd=79 ACCEPT from IP=192.168.1.1:50210 (IP=0.0.0.0:389)
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 270379 local4.debug] conn=12479 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 560212 local4.debug] conn=12479 op=0 STARTTLS
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 875301 local4.debug] conn=12479 op=0 RESULT oid= err=0 text=
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 105384 local4.debug] conn=12479 fd=79 TLS established tls_ssf=256 ssf=256
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 215403 local4.debug] conn=12479 op=1 BIND dn="uid=dummyuser,ou=System,dc=example,dc=com" method=128
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 600343 local4.debug] conn=12479 op=1 BIND dn="uid=dummyuser,ou=System,dc=example,dc=com" mech=SIMPLE ssf=0
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 588225 local4.debug] conn=12479 op=1 RESULT tag=97 err=0 text=
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 469902 local4.debug] conn=12479 op=2 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=2 filter="(objectClass=posixAccount)"
Jul 20 11:42:43 ldapserver slapd[9053]: [ID 744844 local4.debug] conn=12479 op=2 SRCH attr=uid userpassword uidnumber gidnumber gecos homedirectory loginshell
Jul 20 11:43:00 ldapserver slapd[9053]: [ID 485650 local4.debug] conn=12479 fd=79 closed (writetimeout)


I don't have the writetimeout keyword configured on any of the boxes and the affected script doesn't do any writes anyway. Also, the problem only seems to arise if the client takes a while to process a search result. The failing scripts are Net::LDAP based Perl scripts running on some old SPARC boxes, so it took them up to half a minute and more to complete.

Setting writetimeout to a high enough value seems solve the problem, but referring to the docs, this shouldn't happen with the keyword unset or
set to 0.

Is this a bug or did I miss something? Did anyone else encounter this so far?


Regards,
Christian Manal