Re: Real idletimeout more than configured idletimeout
On Monday, 7 July 2008 at 02:32 -0700, Howard Chu wrote:
> Eric Déchaux wrote:
> > Dear openldap gurus,
> >
> > I am hitting some strange behavior with the idle sessions timeout
> > feature. In my configuration this timeout is set to 60 seconds on 4
> > slaves that are behind a load balancer. This load balancer times out
> > idle sessions after 90 seconds, which should be fine. The OpenLDAP version
> > is the stable one from Debian Etch r3.
>
> I have no idea what Debian or any other distro packages. You should quote
> specific version numbers for all relevant pieces of software.
Sorry about that. The version is 2.3.30.
I also forgot to mention that I am running the whole thing inside a VMware
ESX 301 virtual machine. I don't know whether this has any impact.
>
> > However, I encounter random connection issues that have been traced to
> > the load balancer timing out an idle session *before* the LDAP slave.
>
> > I have straced the slapd process and found that the applied idletimeout
> > was well above the configured one. Please check the following two strace
> > outputs:
> >
> >
> > Output 1
>
> > [ some uninteresting ldap stuff ]
> >
> > futex(0x603428, FUTEX_WAKE, 1) = 1
> > read(12, 0x6f30ff, 8) = -1 EAGAIN (Resource temporarily unavailable)
> > futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > write(5, "0", 1) = 1
> > shutdown(12, 2 /* send and receive */) = 0
> > close(12) = 0
> >
> > Here, we can see 5 select system calls, for a real idletimeout of 75
> > seconds instead of 60.
>
> This doesn't really surprise me.
>
Me neither.
> > Output 2
>
> > [ some uninteresting ldap stuff ]
> >
> > futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > write(5, "0", 1) = 1
> > shutdown(12, 2 /* send and receive */) = 0
> > close(12) = 0
> >
> > Here we have 6 select system calls for a real idletimeout of 90 seconds
> > which is enough for the session to expire on the load balancer.
>
> This is rather surprising.
>
> > I have checked the source code, and the logic that chooses whether to
> > time out idle sessions or go into a "SLAP_EVENT_WAIT" (select) call is
> > the following:
> >
> > from servers/slapd/daemon.c
> >
> >
> > now = slap_get_time();
> >
> > if ( ( global_idletimeout > 0 ) &&
> >     difftime( last_idle_check +
> >     global_idletimeout/SLAPD_IDLE_CHECK_LIMIT, now ) < 0 )
> > {
> >     connections_timeout_idle( now );
> >     last_idle_check = now;
> > }
> >
> >
> > As I understand this, no connection should be tested against the
> > idletimeout before any "event wait loop" takes more time than the
> > idletimeout parameter / 4.
>
> Right, on an otherwise idle server, we don't want to wake up too frequently to
> check for idle connections. It's OK to check a little late, but we don't want
> to wake up much too late, which would often occur if the IDLE_CHECK_LIMIT was
> smaller.
>
> > In my case, I need the "event wait loop" to last more than 15 seconds
> > for connections to be checked against aging.
>
> Basically, yes.
>
> > If I am not mistaken, as the difftime call compares whole seconds, I need
> > the loop to last at least 16 seconds for the connections_timeout_idle
> > procedure to be called.
>
> > Am I understanding everything correctly?
>
> Sounds like it.
>
> > If that is the case, shouldn't the difftime result be tested with <= 0 so
> > that idle sessions are cleaned up sooner?
>
> I don't think it makes much difference in the long run. Whenever you choose an
> idletimeout that is not evenly divisible by 4 (IDLE_CHECK_LIMIT) it's going to
> have extra slop anyway. And none of this explains how your 60 second
> idletimeout allowed an idle connection to continue for 90 seconds. Frankly I
> have no idea why that would be.
>
I believe it is possible when one pass of the main event loop takes less than
1 second, not counting the select timeout, and an idle check was done on the
previous pass. In that case,
difftime(last_idle_check + global_idletimeout/SLAPD_IDLE_CHECK_LIMIT, now)
returns 0, the < 0 test fails, and no connection aging is checked on that pass.
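
To check that reasoning, here is a minimal sketch in C that replays the loop
with whole-second timestamps. It is not slapd code: the function
simulate_idle_close and its parameters are made up for illustration, and it
assumes that connections_timeout_idle() closes a connection only once it has
been idle for strictly more than the idletimeout. It also assumes that every
select times out after exactly idletimeout/4 seconds and that the rest of the
loop takes no measurable time. Only SLAPD_IDLE_CHECK_LIMIT (4) and the < 0
test come from the snippet quoted earlier.

```c
#include <assert.h>

#define SLAPD_IDLE_CHECK_LIMIT 4

/*
 * Hypothetical helper, not part of slapd.
 * Replays the event loop for one connection whose last activity was at t = 0
 * and returns the second at which that connection would be closed.
 *
 *   idletimeout     configured idletimeout, in seconds
 *   last_idle_check time of the previous idle check (may be negative)
 *   inclusive       0: test "< 0" as in the quoted code, 1: test "<= 0"
 */
static long simulate_idle_close(long idletimeout, long last_idle_check,
                                int inclusive)
{
    /* Assumed select() timeout: idletimeout / SLAPD_IDLE_CHECK_LIMIT. */
    long tick = idletimeout / SLAPD_IDLE_CHECK_LIMIT;
    long now = 0;

    assert(tick > 0);

    for (;;) {
        /* Same quantity as difftime(last_idle_check + tick, now). */
        long delta = last_idle_check + tick - now;
        int check_due = inclusive ? (delta <= 0) : (delta < 0);

        if (check_due) {
            /* Assumed per-connection test: idle strictly longer than
             * idletimeout (last activity is at t = 0). */
            if (0 + idletimeout - now < 0)
                return now;
            last_idle_check = now;
        }

        /* select() times out; the loop body itself takes no time. */
        now += tick;
    }
}
```

Under these assumptions, simulate_idle_close(60, 0, 0) gives 90 and
simulate_idle_close(60, -15, 0) gives 75, which matches the two strace
outputs: with the strict test the idle check only runs on every second
wakeup, so the result depends on when the previous check happened. With the
<= variant both cases give 75.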
> In the meantime, on an idle server, I don't see any urgency in closing idle
> connections, because in this case there's no danger of resource starvation. On
> the other hand, for an active server, the event loop is going to be waking up
> more frequently anyway due to real activity, in which case the idle checks
> will happen more frequently. So as the server gets busier, the actual
> idletimeouts will get much closer to the configured value.
>
Got it.
It seems there is no simple workaround on the LDAP side for my issue.
I will look for other options.
Many thanks for your help.
--
Eric Déchaux
Ingénieur Kébabiste
Sun Microsystems Services France