Problems with replicas
I have one master server, and two slave servers.
Master config file:
----- s n i p -----
replogfile /var/lib/openldap/replication.log
replica host=_SLAVE1_:389 binddn="_REPLICA-DN_"
        bindmethod=simple credentials=_SECRET_
replica host=_SLAVE2_:389 binddn="_REPLICA-DN_"
        bindmethod=simple credentials=_SECRET_
----- s n i p -----
Slave config file (EXACTLY the same on both slaves!):
----- s n i p -----
loglevel 256 # Good longterm logging/debugging...
include /etc/openldap/slapd.includes
schemacheck on
pidfile /var/run/slapd.pid
database ldbm
directory "/var/lib/openldap"
cachesize 10000
dbcachesize 50000
updatedn "_REPLICA-DN_"
lastmod on
sizelimit 1500
index uid,mail,mailalternateaddress,mailforwardingaddress eq
suffix "c=SE"
include /etc/openldap/slapd.access
----- s n i p -----
Censoring explanation:
_SLAVE[12]_ are the IP addresses of the slave servers.
_REPLICA-DN_ is the same (exactly! Double-checked) on both slaves.
_SECRET_ is the same (exactly! Double-checked) on both slaves, in cleartext.
A write/modify/delete on the master propagates to _SLAVE2_ as it
should, but NOT to _SLAVE1_!!!
Sometimes, if I wait long enough (~ 10-15 minutes), it finally goes
through, but not always...
To try to track this problem down, I shut down all of the slurpd
processes on the master, added an object and then ran a little (well,
not so little :) one-liner to search all three hosts.
----- s n i p -----
_MASTERIP_: _ADDED-OBJECT_
_SLAVE1_:
_SLAVE2_:
----- s n i p -----
Censoring explanation:
_MASTERIP_ is the IP address of the master server.
_ADDED-OBJECT_ is the DN of the added object.
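The one-liner itself is not reproduced here. A minimal sketch of such a check, assuming the stock `ldapsearch` client with its `-h`, `-p`, `-b` and `-s` options and anonymous read access to the entry, could look like the following. The names `check_entry` and `LDAPSEARCH` are made up for illustration, and newer client releases may need different options.

```shell
#!/bin/sh
# Hypothetical helper, not the original one-liner.
# LDAPSEARCH may be overridden to point at a different client binary.
LDAPSEARCH=${LDAPSEARCH:-ldapsearch}

# check_entry DN HOST...
# Prints "HOST: DN" if a base-scope search for DN returns anything on
# that host, and just "HOST:" otherwise (same layout as the output above).
check_entry() {
    dn=$1
    shift
    for host in "$@"; do
        # A base-scope search returns only the entry named by -b, if it exists.
        out=$("$LDAPSEARCH" -h "$host" -p 389 -b "$dn" -s base 'objectclass=*' 2>/dev/null) || true
        if [ -n "$out" ]; then
            echo "$host: $dn"
        else
            echo "$host:"
        fi
    done
}

# Example call (placeholders as in the censoring explanation):
#   check_entry "_ADDED-OBJECT_" _MASTERIP_ _SLAVE1_ _SLAVE2_
```

The check relies on the search printing nothing to stdout when the entry is missing, rather than on the client's exit status.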
Executing
time slurpd -d 255 -o -r /var/lib/openldap/replication.log 2>&1 | tee /tmp/out
will give this after about 15 seconds (not before):
----- s n i p -----
_MASTERIP_: _ADDED-OBJECT_
_SLAVE1_:
_SLAVE2_: _ADDED-OBJECT_
----- s n i p -----
After about 2 minutes, I shut slurpd down and ran it again, so that it
would only propagate the changes to _SLAVE1_...
It seems to hang at ldap_send_server_request().
This is what ends up in the file '/tmp/out' while waiting for the
propagation (censored, of course :):
----- s n i p -----
Config: opening config file "/etc/openldap/slapd.conf"
Config: (loglevel 2048)
Config: (include /etc/openldap/slapd.includes)
Config: (include /etc/openldap/slapd.access)
Config: (schemacheck on)
Config: (pidfile /var/run/slapd.pid)
Config: (database ldbm)
Config: (suffix "c=SE")
Config: (directory "/var/lib/openldap")
Config: (cachesize 10000)
Config: (dbcachesize 1000000)
Config: (dbcachenowsync )
Config: (lastmod on)
Config: (sizelimit 1500)
Config: (index uid,mail,mailalternateaddress,mailforwardingaddress eq)
Config: (replogfile /var/lib/openldap/replication.log)
Config: (replica host=_SLAVE1_:389 binddn="cn=admin,ou=Users,o=Air2Net,c=se" bindmethod=simple credentials=_SECRET_)
Config: ** successfully added replica "_SLAVE1_:389"
Config: (replica host=_SLAVE2_:389 binddn="cn=admin,ou=Users,o=Air2Net,c=se" bindmethod=simple credentials=_SECRET_)
Config: ** successfully added replica "_SLAVE2_:389"
Config: ** configuration file successfully read and parsed
Retrieved state information for _SLAVE1_:389 (timestamp 974384051.0)
Retrieved state information for _SLAVE2_:389 (timestamp 974384367.0)
begin replication thread for _SLAVE1_:389
Replica _SLAVE1_:389, skip repl record for _ADDED-OBJECT_ (old)
Open connection to _SLAVE1_:389
ldap_open
begin replication thread for _SLAVE2_:389
Replica _SLAVE2_:389, skip repl record for _ADDED-OBJECT_ (old)
Replica _SLAVE2_:389, skip repl record for _ADDED-OBJECT_ (old)
end replication thread for _SLAVE2_:389
ldap_init
ldap_delayed_open
open_ldap_connection
ldap_connect_to_host: _SLAVE1_:389
sd 6 connected to: _SLAVE1_
ldap_open successful, ld_host is (null)
bind to _SLAVE1_:389 as _REPLICA-DN_ (simple)
ldap_simple_bind_s
ldap_simple_bind
ldap_send_initial_request
ldap_delayed_open
ldap_send_server_request
ldap_result
wait4msg (infinite timeout)
** Connections:
* host: _SLAVE1_ port: 389 (default)
refcnt: 2 status: Connected
last used: Thu Nov 16 15:23:25 2000
** Outstanding Requests:
* msgid 1, origid 1, status InProgress
outstanding referrals 0, parent count 0
** Response Queue:
Empty
do_ldap_select
read1msg
got result msgid 1, original id 1
read1msg: 0 new referrals
request 1 done
res_errno: 0, res_error: <>, res_matched: <>
ldap_free_request (origid 1, msgid 1)
ldap_free_connection
ldap_free_connection: refcnt 1
ldap_result2error
ldap_msgfree
replica _SLAVE1_:389 - add dn "_ADDED-OBJECT_"
ldap_add
ldap_send_initial_request
ldap_delayed_open
ldap_send_server_request
----- s n i p -----
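The "Retrieved state information" lines in the log above carry Unix timestamps. As far as I understand slurpd, replication records older than a replica's state timestamp are the ones reported as skipped "(old)". A small sketch for making those timestamps readable, assuming GNU `date` (the `-d @SECONDS` form is not portable) and a made-up function name `epoch_to_utc`:

```shell
#!/bin/sh
# epoch_to_utc: print a Unix timestamp as a UTC date.
# Assumes GNU date; the function name is hypothetical.
epoch_to_utc() {
    # Strip a trailing fractional part such as ".0" before converting.
    date -u -d "@${1%.*}" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_utc 974384051.0   # state timestamp recorded for _SLAVE1_
epoch_to_utc 974384367.0   # state timestamp recorded for _SLAVE2_
```

Both values fall on 16 November 2000, a few minutes apart, with the _SLAVE1_ timestamp being the earlier of the two.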
After 13m10.520s, slurpd succeeded. This is the follow-up debug
output; this part takes 'only' about 20-30 seconds:
----- s n i p -----
ldap_result
wait4msg (infinite timeout)
** Connections:
* host: _SLAVE1_ port: 389 (default)
refcnt: 2 status: Connected
last used: Thu Nov 16 15:23:25 2000
** Outstanding Requests:
* msgid 2, origid 2, status InProgress
outstanding referrals 0, parent count 0
** Response Queue:
Empty
do_ldap_select
read1msg
got result msgid 2, original id 2
read1msg: 0 new referrals
request 2 done
res_errno: 0, res_error: <>, res_matched: <>
ldap_free_request (origid 2, msgid 2)
ldap_free_connection
ldap_free_connection: refcnt 1
ldap_result2error
ldap_msgfree
end replication thread for _SLAVE1_:389
slurpd: terminating normally
Processing in one-shot mode:
2 total replication records in file,
2 replication records to process.
real 13m10.520s
user 0m0.540s
sys 0m0.160s
----- s n i p -----
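The debug output carries no per-line timestamps, so the place where the 13 minutes go has to be judged by watching the output scroll. One way to pin it down is to prefix every line with the wall-clock time as it arrives. This is a sketch only: `stamp_lines` is a made-up name, and it assumes slurpd writes its debug output unbuffered, so that lines reach the pipe as they are produced.

```shell
#!/bin/sh
# stamp_lines: prefix every line read from stdin with the current time.
# Hypothetical helper; the timestamp is when the line was read, which
# matches when it was written only if the producer does not buffer.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done
}

# Usage with the same slurpd invocation as before:
#   slurpd -d 255 -o -r /var/lib/openldap/replication.log 2>&1 | stamp_lines | tee /tmp/out
```

A long gap between the timestamps of two consecutive lines then marks the call that was blocking.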
Since the master and both slaves are using the same home-built
Debian package of OpenLDAP 1.2.11 (to use SleepyCAT db instead of the
default), I'm more inclined to believe that this somehow has to do with
the OS (same version of Debian GNU/Linux on all three machines) or the
network (the master and _SLAVE1_ are on the same local network, while
_SLAVE2_ is some distance away).
Does anyone have any idea where to look further for the problem I'm having?