[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
slapd stops processing thread jobs (ITS#2774)
Full_Name: John Matthews
Version: 2.1.22
OS: Linux 2.4.20
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (203.12.97.112)
I have been investigating a slapd problem where after a few days it will stop
responding to new client requests. Rather oddly the problem always occurs at
about 1am on one of our LDAP servers. Using a debug level that includes
LDAP_DEBUG_CONNS and with some extra debug output in connection.c (see below) I
have tracked the problem down into the threading code tpool.c.
static int connection_op_activate( Operation *op )
{
...
status = ldap_pvt_thread_pool_submit( &connection_pool,
connection_operation, (void *) op );
Debug( LDAP_DEBUG_CONNS, "connection_activate %d %s %d\n", op->o_conn->c_connid,
connection_state2str(op->o_conn->c_conn_state), status );
static void *
connection_operation( void *ctx, void *arg_v )
{
...
Debug( LDAP_DEBUG_CONNS, "connection(%d) op start %d %d\n", conn->c_connid,
conn->c_n_ops_executing, conn->c_n_ops_completed );
ldap_pvt_thread_mutex_lock( &num_ops_mutex );
static int
connection_input(
Connection *conn
)
{
...
/* Don't process requests when the conn is in the middle of a
* Bind, or if it's closing. Also, don't let any single conn
* use up all the available threads, and don't execute if we're
* currently blocked on output. And don't execute if there are
* already pending ops, let them go first.
*/
if ( conn->c_conn_state == SLAP_C_BINDING
|| conn->c_conn_state == SLAP_C_CLOSING
|| conn->c_n_ops_executing >= connection_pool_max/2
|| conn->c_n_ops_pending
|| conn->c_writewaiter)
{
int max = conn->c_dn.bv_len ? slap_conn_max_pending_auth
: slap_conn_max_pending;
#ifdef NEW_LOGGING
LDAP_LOG( CONNECTION, INFO,
"connection_input: conn %lu deferring operation\n",
conn->c_connid, 0, 0 );
#else
Debug( LDAP_DEBUG_CONNS, "deferring operation %d %d %d\n", conn->c_conn_state
== SLAP_C_BINDING, conn->c_conn_state == SLAP_C_CLOSING, conn->c_writewaiter );
#endif
I get the following debug output when the problem ocurrs:
Oct 17 00:07:47 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:07:47 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:07:47 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:07:47 bizserve slapd[14564]: 14r
Oct 17 00:07:47 bizserve slapd[14564]:
Oct 17 00:07:47 bizserve slapd[14564]: daemon: read activity on 14
Oct 17 00:07:47 bizserve slapd[14564]: connection_get(14): got connid=1641
Oct 17 00:07:47 bizserve slapd[14564]: connection_read(14): checking for input
on id=1641
Oct 17 00:07:47 bizserve slapd[14564]: connection_activate 1641 0
Oct 17 00:07:47 bizserve slapd[14564]: ber_get_next on fd 14 failed errno=11
(Resource temporarily unavailable)
Oct 17 00:07:47 bizserve slapd[14564]: connection(1641) op start 1 8
Oct 17 00:07:47 bizserve slapd[14564]: connection(1641): do_search
Oct 17 00:07:47 bizserve slapd[14564]: <= bdb_equality_candidates: (objectClass)
index_param failed (18)
Oct 17 00:07:47 bizserve slapd[14564]: <= bdb_equality_candidates: (objectClass)
index_param failed (18)
Oct 17 00:07:47 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:07:47 bizserve slapd[14564]: connection(1641) op finish 0 9
Oct 17 00:09:28 bizserve slapd[14564]: daemon: activity on 2 descriptors
Oct 17 00:09:28 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:09:28 bizserve slapd[14564]: 10r
Oct 17 00:09:28 bizserve slapd[14564]: 14r
Oct 17 00:09:28 bizserve slapd[14564]:
Oct 17 00:09:28 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:09:28 bizserve slapd[14564]: connection_get(10): got connid=1640
Oct 17 00:09:28 bizserve slapd[14564]: connection_read(10): checking for input
on id=1640
Oct 17 00:09:29 bizserve slapd[14564]: ber_get_next on fd 10 failed errno=0
(Success)
Oct 17 00:09:29 bizserve slapd[14564]: connection_read(10): input error=-2
id=1640, closing.
Oct 17 00:09:29 bizserve slapd[14564]: connection_closing: readying conn=1640
sd=10 for close
Oct 17 00:09:29 bizserve slapd[14564]: connection_closing 1640
Oct 17 00:09:29 bizserve slapd[14564]: connection_close: conn=1640 sd=10
Oct 17 00:09:29 bizserve slapd[14564]: daemon: removing 10
Oct 17 00:09:29 bizserve slapd[14564]: connection_destroy -1
Oct 17 00:09:29 bizserve slapd[14564]: daemon: read activity on 14
Oct 17 00:09:29 bizserve slapd[14564]: connection_get(14): got connid=1641
Oct 17 00:09:29 bizserve slapd[14564]: connection_read(14): checking for input
on id=1641
Oct 17 00:09:29 bizserve slapd[14564]: ber_get_next on fd 14 failed errno=0
(Success)
Oct 17 00:09:29 bizserve slapd[14564]: connection_read(14): input error=-2
id=1641, closing.
Oct 17 00:09:29 bizserve slapd[14564]: connection_closing: readying conn=1641
sd=14 for close
Oct 17 00:09:29 bizserve slapd[14564]: connection_closing 1641
Oct 17 00:09:29 bizserve slapd[14564]: connection_close: conn=1641 sd=14
Oct 17 00:09:29 bizserve slapd[14564]: daemon: removing 14
Oct 17 00:09:29 bizserve slapd[14564]: connection_destroy -1
Oct 17 00:09:29 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:09:29 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:09:29 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:11:08 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:11:08 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:11:08 bizserve slapd[14564]: 15r
Oct 17 00:11:08 bizserve slapd[14564]: daemon: read activity on 15
Oct 17 00:11:08 bizserve slapd[14564]: connection_get(15): got connid=1642
Oct 17 00:11:08 bizserve slapd[14564]: connection_read(15): checking for input
on id=1642
Oct 17 00:11:08 bizserve slapd[14564]: ber_get_next on fd 15 failed errno=0
(Success)
Oct 17 00:11:08 bizserve slapd[14564]: connection_read(15): input error=-2
id=1642, closing.
Oct 17 00:11:08 bizserve slapd[14564]: connection_closing: readying conn=1642
sd=15 for close
Oct 17 00:11:08 bizserve slapd[14564]: connection_closing 1642
Oct 17 00:11:08 bizserve slapd[14564]: connection_close: conn=1642 sd=15
Oct 17 00:11:08 bizserve slapd[14564]: daemon: removing 15
Oct 17 00:11:08 bizserve slapd[14564]: connection_destroy -1
Oct 17 00:11:08 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:11:08 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:11:08 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:25:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:25:20 bizserve slapd[14564]: daemon: new connection on 10
Oct 17 00:25:20 bizserve slapd[14564]: connection_init 1643
Oct 17 00:25:20 bizserve slapd[14564]: daemon: added 10r
Oct 17 00:25:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:25:20 bizserve slapd[14564]:
Oct 17 00:25:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:25:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:25:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:25:20 bizserve slapd[14564]: 10r
Oct 17 00:25:20 bizserve slapd[14564]:
Oct 17 00:25:20 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:25:20 bizserve slapd[14564]: connection_get(10): got connid=1643
Oct 17 00:25:20 bizserve slapd[14564]: connection_read(10): checking for input
on id=1643
Oct 17 00:25:20 bizserve slapd[14564]: connection_activate 1643 B 0
Oct 17 00:25:20 bizserve slapd[14564]: ber_get_next on fd 10 failed errno=11
(Resource temporarily unavailable)
Oct 17 00:25:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:26:20 bizserve slapd[14564]: 10r
Oct 17 00:26:20 bizserve slapd[14564]:
Oct 17 00:26:20 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:26:20 bizserve slapd[14564]: connection_get(10): got connid=1643
Oct 17 00:26:20 bizserve slapd[14564]: connection_read(10): checking for input
on id=1643
Oct 17 00:26:20 bizserve slapd[14564]: deferring operation 1 0 0
Oct 17 00:26:20 bizserve slapd[14564]: Pending 1 1
Oct 17 00:26:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:26:20 bizserve slapd[14564]: 10r
Oct 17 00:26:20 bizserve slapd[14564]:
Oct 17 00:26:20 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:26:20 bizserve slapd[14564]: connection_get(10): got connid=1643
Oct 17 00:26:20 bizserve slapd[14564]: connection_read(10): checking for input
on id=1643
Oct 17 00:26:20 bizserve slapd[14564]: deferring operation 1 0 0
Oct 17 00:26:20 bizserve slapd[14564]: Pending 2 1
Oct 17 00:26:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:26:20 bizserve slapd[14564]: 10r
Oct 17 00:26:20 bizserve slapd[14564]:
Oct 17 00:26:20 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:26:20 bizserve slapd[14564]: connection_get(10): got connid=1643
Oct 17 00:26:20 bizserve slapd[14564]: connection_read(10): checking for input
on id=1643
Oct 17 00:26:20 bizserve slapd[14564]: ber_get_next on fd 10 failed errno=0
(Success)
Oct 17 00:26:20 bizserve slapd[14564]: connection_read(10): input error=-2
id=1643, closing.
Oct 17 00:26:20 bizserve slapd[14564]: connection_closing: readying conn=1643
sd=10 for close
Oct 17 00:26:20 bizserve slapd[14564]: connection_closing 1643
Oct 17 00:26:20 bizserve slapd[14564]: connection_close: deferring conn=1643
sd=10
Oct 17 00:26:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:26:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:27:24 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:27:24 bizserve slapd[14564]: daemon: new connection on 14
Oct 17 00:27:24 bizserve slapd[14564]: connection_init 1644
Oct 17 00:27:24 bizserve slapd[14564]: daemon: added 14r
Oct 17 00:27:24 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:27:24 bizserve slapd[14564]:
Oct 17 00:27:24 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:27:24 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:27:24 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:27:24 bizserve slapd[14564]: 14r
Oct 17 00:27:24 bizserve slapd[14564]: daemon: read activity on 14
Oct 17 00:27:24 bizserve slapd[14564]: connection_get(14): got connid=1644
Oct 17 00:27:24 bizserve slapd[14564]: connection_read(14): checking for input
on id=1644
Oct 17 00:27:24 bizserve slapd[14564]: connection_activate 1644 B 0
Oct 17 00:27:24 bizserve slapd[14564]: ber_get_next on fd 14 failed errno=11
(Resource temporarily unavailable)
Oct 17 00:27:24 bizserve slapd[14564]: daemon: select: listen=6 active_threads=2
tvp=NULL
Oct 17 00:28:24 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:28:24 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:28:24 bizserve slapd[14564]: 14r
Oct 17 00:28:24 bizserve slapd[14564]:
Oct 17 00:28:24 bizserve slapd[14564]: daemon: read activity on 14
Oct 17 00:28:24 bizserve slapd[14564]: connection_get(14): got connid=1644
Oct 17 00:28:24 bizserve slapd[14564]: connection_read(14): checking for input
on id=1644
Oct 17 00:28:24 bizserve slapd[14564]: deferring operation 1 0 0
Oct 17 00:28:24 bizserve slapd[14564]: Pending 1 1
Oct 17 00:28:24 bizserve slapd[14564]: daemon: select: listen=6 active_threads=2
tvp=NULL
Oct 17 00:28:24 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:28:24 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:28:24 bizserve slapd[14564]: 14r
**************
Notice how each new operation is deferred in connection_input() because
conn->c_conn_state is "stuck" in a SLAP_C_BINDING state. The number of
"active_threads" in the debug ouput increases for each new request,
connection_op_activate() never gets called again and of course the LDAP clients
never get any further response from the ldap server .. until slapd is
restarted.