hi,

we have a mailing system running, backed by 4 OpenLDAP 2.1.27 servers hosting all user data. one is running as master, the other three as slaves. every now and then we see strange errors on our MTAs when they query the slaves. most of the queries work as expected, but queries for 2 or 3 particular entries return LDAP_BUSY.

if i manually issue an ldapsearch for one of these entries, the server sends me the entry's data, but instead of finishing the search with LDAP_SUCCESS, it returns result code 51 (LDAP_BUSY). every client regards this as a failure and doesn't even bother to parse the returned entry.

i turned on debugging to see what's going on. here's the shortened output (i wrapped some lines):

[...]
Nov 3 16:29:10 galen slapd[31869]: bdb_idl_fetch_key: [e2b5f963]
Nov 3 16:29:10 galen slapd[31869]: <= bdb_index_read 2 candidates
Nov 3 16:29:10 galen slapd[31869]: bdb_search_candidates: id=2 first=76073 last=76075
Nov 3 16:29:10 galen slapd[31869]: entry_decode:
    "uid=someone,dc=somewhere,dc=at,dc=."
Nov 3 16:29:10 galen slapd[31869]: <=
    entry_decode(uid=someone,dc=somewhere,dc=at,dc=.)
Nov 3 16:29:10 galen slapd[31869]: => send_search_entry:
    dn="uid=someone,dc=somewhere,dc=at,dc=."
Nov 3 16:29:10 galen slapd[31869]: <= send_search_entry
Nov 3 16:29:10 galen slapd[31869]: ====> bdb_cache_return_entry_r( 76073 ): created (0)
Nov 3 16:29:10 galen slapd[31869]: entry_decode:
    "uid=someone,dc=somwhere,dc=at,dc=."
Nov 3 16:29:10 galen slapd[31869]: <=
    entry_decode(uid=someone,dc=somwhere,dc=at,dc=.)
Nov 3 16:29:10 galen slapd[31869]: ====> bdb_cache_add_entry( 76075 ):
    "uid=someone,dc=somewhere,dc=at,dc=.": already in dn cache
Nov 3 16:29:10 galen slapd[31869]: send_ldap_result: conn=0 op=1 p=3
Nov 3 16:29:10 galen slapd[31869]: send_ldap_result: err=51 matched=""
    text="ldap server busy"

apparently bdb_index_read() found 2 candidates matching the query (ids 76073 and 76075), both having the same dn. the first one is sent to the client, but adding the second one to the cache fails with "already in dn cache". i guess slapd returns LDAP_BUSY because the call to bdb_cache_add_entry( 76075 ) fails.
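for anyone who wants to check their own slaves for the same condition: duplicate dns can be spotted offline by scanning an LDIF dump of the database for dns that occur more than once. below is a minimal sketch in python, not a tested tool. the function names (iter_dns, duplicate_dns) are made up for this example, and lowercasing the dn is only a rough stand-in for the server's real dn matching rules, so treat any hits as candidates to inspect by hand.

```python
import base64
from collections import Counter


def iter_dns(ldif_text):
    """Yield the dn of every entry found in an LDIF dump (hypothetical helper)."""
    # LDIF folds long lines: a line starting with a single space
    # continues the previous line, so unfold those first.
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    for line in lines:
        if line.startswith("dn::"):
            # "dn::" marks a base64-encoded value
            yield base64.b64decode(line[4:].strip()).decode("utf-8")
        elif line.startswith("dn:"):
            yield line[3:].strip()


def duplicate_dns(ldif_text):
    """Return {normalized dn: count} for every dn seen more than once."""
    # Assumption: case-insensitive comparison is a good enough approximation
    # of dn equality for spotting candidates.
    counts = Counter(dn.lower() for dn in iter_dns(ldif_text))
    return {dn: n for dn, n in counts.items() if n > 1}
```

to use it, dump the database to a file (for example with slapcat -l dump.ldif), read the file into a string and pass it to duplicate_dns(). an empty result means no dn appears twice in the dump.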
i did a slapcat of the database, examined the ldif file and found the two entries with the same dn. the entries are identical, except that one of them has a more recent modifyTimestamp and an additional attribute (that one is an exact copy of the entry stored in the master's database). so i guess that something went wrong when the master replicated the updates for this entry to the slave.

any hints as to what's gone wrong here? is this a known issue in 2.1 that has maybe been fixed in 2.2?

unfortunately, the system is in production use, so i can't deploy the latest OpenLDAP releases. but i have a copy of a corrupt database here on my laptop, so if you need more debugging information, i can provide it.

tia,
tom.

--
Thomas "Duke" Hager {duke,hager}@sigsegv.at
GPG: 1024D/D27F858C http://www.sigsegv.at/gpg/duke.gpg
=================================================================
"Never Underestimate the Power of Stupid People in Large Groups."