Re: slapd crash in back-bdb/ctxcsn.c (ITS#3301)
rhafer@suse.de wrote:
>Full_Name: Ralf Haferkamp
>Version: 2.2.15
>OS: Linux (Kernel 2.6)
>URL: ftp://ftp.openldap.org/incoming/
>Submission from: (NULL) (212.95.102.25)
>
>
>I ran a slightly modified version of "test008-concurrency" on a test server
>with around 10000 entries. The test runs many add, read, and modify operations
>in parallel (I adapted slapd-modrdn to do modifies instead of modrdn). After a
>short while the server crashed. I was able to produce the following backtrace:
>
>#0 0x080c8a09 in bdb_csn_commit (op=0x44219800, rs=0x44088870, tid=0x4623df90,
> ei=0x81842a8, suffix_ei=0x440884d0, ctxcsn_e=0x440884cc, ctxcsn_added=0x440884c8,
> locker=2147502168) at ctxcsn.c:62
> bdb = (struct bdb_info *) 0x816aad0
> ctxcsn_ei = (EntryInfo *) 0x0
> ctxcsn_lock = {off = 0, ndx = 938, gen = 135781712, mode = 1075070032}
> max_committed_csn = {bv_len = 135421424, bv_val = 0x4620e4d0 "\017"}
> suffix_lock = {off = 1176550456, ndx = 0, gen = 1141408840, mode = 135075791}
> rc = -30995
> ret = 10427
> ctxcsn_id = 1176560848
> e = (Entry *) 0x46237f18
> textbuf = "....."
> textlen = 256
> eip = (EntryInfo *) 0x0
>#1 0x080c4d7e in bdb_add (op=0x44219800, rs=0x44088870) at add.c:441
> bdb = (struct bdb_info *) 0x816aad0
> pdn = {bv_len = 11, bv_val = 0x46243feb "o=customers"}
> p = (Entry *) 0x0
> ei = (EntryInfo *) 0x81842a8
> textbuf = "....."
> textlen = 256
> children = (AttributeDescription *) 0x81298b0
> entry = (AttributeDescription *) 0x8129720
> ltid = (DB_TXN *) 0x4623df90
> lt2 = (DB_TXN *) 0x462573a0
> opinfo = {boi_bdb = 0x816a9d0, boi_txn = 0x4623df90, boi_lock = {off = 16,
> ndx = 1077478705, gen = 1176560872, mode = 1074201072}, boi_err = 0,
> boi_locker = 2147502168, boi_acl_cache = 0}
> subentry = 0
> locker = 2147502168
> lock = {off = 298840, ndx = 386, gen = 3273, mode = DB_LOCK_READ}
> num_retries = 0
> ps_list = (Operation *) 0x10
> rc = 1176502288
> suffix_ei = (EntryInfo *) 0x0
> ctxcsn_e = (Entry *) 0x440884e8
> ctxcsn_added = 0
> postread_ctrl = (LDAPControl **) 0x0
> ctrls = {0x0, 0x4043f4c0, 0x4043f4c0, 0x400, 0x18, 0x4620e4e0}
> num_ctrls = 0
>#2 0x0806aad2 in do_add (op=0x44219800, rs=0x44088870) at add.c:318
> update = 0
> textbuf = "....."
> textlen = 256
> cb = {sc_next = 0x0, sc_response = 0x807106a <slap_replog_cb>, sc_cleanup = 0,
> sc_private = 0x0}
> repl_user = 0
> ber = (BerElement *) 0x4623df10
> last = 0x46230d6f ""
> dn = {bv_len = 30, bv_val = 0x46230b32 "cn=James A Jones 5,o=customers"}
> len = 36
> tag = 4294967295
> e = (Entry *) 0x4620bc38
> modlist = (Modifications *) 0x46265498
> modtail = (Modifications **) 0x4625cbc0
> tmp = {sml_mod = {sm_op = 1141409752, sm_desc = 0x40324eb0, sm_type = {bv_len = 15,
> bv_val = 0x46230d4d "telephoneNumber"}, sm_values = 0x46225a90, sm_nvalues = 0x0},
> sml_next = 0x0}
> manageDSAit = 0
>#3 0x0806445e in connection_operation (ctx=0x44088900, arg_v=0x44219800)
> at connection.c:1048
> rc = 80
> op = (Operation *) 0x44219800
> rs = {sr_type = REP_RESULT, sr_tag = 0, sr_msgid = 0, sr_err = 0, sr_matched = 0x0,
> sr_text = 0x0, sr_ref = 0x0, sr_ctrls = 0x0, sr_un = {sru_sasl = {r_sasldata = 0x0},
> sru_extended = {r_rspoid = 0x0, r_rspdata = 0x0}, sru_search = {r_entry = 0x0,
> r_attrs = 0x0, r_nentries = 0, r_v2ref = 0x0}}, sr_flags = 0}
> tag = 104
> oldtag = 104
> conn = (Connection *) 0x42d274bc
> memctx = (void *) 0x819c1e0
> memctx_null = (void *) 0x0
> memsiz = 1048576
>#4 0x4003166d in ldap_int_thread_pool_wrapper (xpool=0x812b520) at tpool.c:467
> pool = (struct ldap_int_thread_pool_s *) 0x812b520
> ctx = (ldap_int_thread_ctx_t *) 0x8199c70
> ltc_key = {{ltk_key = 0x80a3de8, ltk_data = 0x819c1e0,
> ltk_free = 0x80a3db8 <sl_mem_destroy>}, {ltk_key = 0x817d210, ltk_data = 0xe,
> ltk_free = 0x80c786f <bdb_locker_id_free>}, {ltk_key = 0x817d211, ltk_data = 0x81a1df0,
> ltk_free = 0x80c76db <bdb_txn_free>}, {ltk_key = 0x0, ltk_data = 0x0,
> ltk_free = 0} <repeats 29 times>}
> tid = 1141410736
> i = 391
> keyslot = 391
> hash = 391
>#5 0x403239ed in start_thread () from /lib/tls/libpthread.so.0
>No symbol table info available.
>#6 0x403e59ca in clone () from /lib/tls/libc.so.6
>No symbol table info available.
>
>So it looks like dn2entry in back-bdb/ctxcsn.c:62 is returning
>DB_LOCK_DEADLOCK (rc = -30995 in the backtrace), and therefore ctxcsn_ei is
>still NULL.
>
>Unfortunately I am not very familiar with this code, so I don't know how to
>fix it correctly, but returning BDB_CSN_RETRY directly after the dn2entry call
>if rc == DB_LOCK_DEADLOCK seems to fix the problem.
>
This is now patched in HEAD, please test.
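
For anyone following along, the workaround suggested in the report (return a
retry code as soon as the lookup reports a deadlock, before the NULL entry
pointer is used) can be sketched roughly as below. This is only an
illustration, not the change committed to HEAD: lookup_entry, csn_commit,
add_with_retry and the CSN_* codes are hypothetical stand-ins for dn2entry,
bdb_csn_commit, the retry loop in the caller and BDB_CSN_RETRY. The
DB_LOCK_DEADLOCK value is the one shown in the backtrace.

```c
#include <assert.h>
#include <stddef.h>

/* Value seen in the backtrace (rc = -30995). */
#define DB_LOCK_DEADLOCK (-30995)

/* Hypothetical return codes for this sketch, not slapd's actual values. */
enum { CSN_SUCCESS = 0, CSN_RETRY = 1, CSN_FAILURE = 2 };

typedef struct EntryInfo { int id; } EntryInfo;

/* Test knob: how many upcoming lookups should report a deadlock. */
static int deadlocks_left = 0;
static EntryInfo ctxcsn_info = { 42 };

/* Stand-in for dn2entry: on deadlock it leaves *eip untouched (NULL). */
static int lookup_entry(EntryInfo **eip)
{
    if (deadlocks_left > 0) {
        deadlocks_left--;
        return DB_LOCK_DEADLOCK;
    }
    *eip = &ctxcsn_info;
    return 0;
}

/* Stand-in for bdb_csn_commit: check rc before dereferencing the entry. */
static int csn_commit(int *id_out)
{
    EntryInfo *ei = NULL;
    int rc = lookup_entry(&ei);

    if (rc == DB_LOCK_DEADLOCK)
        return CSN_RETRY;       /* bail out before ei (still NULL) is used */
    if (rc != 0 || ei == NULL)
        return CSN_FAILURE;

    *id_out = ei->id;           /* safe: the lookup succeeded */
    return CSN_SUCCESS;
}

/* Stand-in for the caller: retry a bounded number of times on CSN_RETRY.
 * A real server would presumably abort the transaction before retrying. */
static int add_with_retry(int max_retries, int *id_out, int *retries_out)
{
    int retries = 0;

    for (;;) {
        int rc = csn_commit(id_out);
        if (rc != CSN_RETRY) {
            *retries_out = retries;
            return rc;
        }
        if (retries >= max_retries) {
            *retries_out = retries;
            return CSN_FAILURE;
        }
        retries++;
    }
}
```

With deadlocks_left set to 2, add_with_retry(5, ...) succeeds on the third
attempt; without the early return in csn_commit, the first attempt would
dereference a NULL pointer, which matches the crash in frame #0.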
--
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support