Re: multi / standby master: incomplete replication after downtime (?) [SOLVED]
On 18.08.2010 17:16, Rein Tollevik wrote:
> On 08/18/2010 04:28 PM, Elmar Marschke wrote:
>> Here's the logfile of MASTER:
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ===_BEGIN_CHANGES_WHILE_BOTH_UP_===
>> Aug 18 15:30:04 ldapmaster slapd[8017]: slap_queue_csn: queing
>> 0x7f00f317b580 20100818133004.663851Z#000000#000#000000
>
> Your ServerID setting is incorrect, and you are using the default
> ServerID=0 on both systems. The ServerID is included in the csn value,
> the second to last number (#000# here). Ensure that the ServerID URL is
> the exact hostname of the systems it runs on, or that slapd is able to
> select the correct sid based on its -h listener argument.
> Start slapd with -d config and verify that both log lines with
> differing SID= values.
>
> Rein
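
For anyone who wants to check their own logs: a minimal Python sketch that pulls the ServerID out of a csn value such as the one quoted above. It assumes the four '#'-separated fields are timestamp, change count, ServerID and modification number, with the last three written in hexadecimal. The names Csn and parse_csn are made up for this illustration and are not part of OpenLDAP.

```python
from typing import NamedTuple


class Csn(NamedTuple):
    timestamp: str  # e.g. "20100818133004.663851Z"
    count: int      # change counter for the same timestamp
    sid: int        # ServerID of the server that generated the csn
    mod: int        # modification number


def parse_csn(csn: str) -> Csn:
    """Split a csn of the form time#count#sid#mod into its four fields."""
    parts = csn.strip().split("#")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '#'-separated fields: {csn!r}")
    timestamp, count, sid, mod = parts
    # Assumption: the three numeric fields are fixed-width hexadecimal.
    return Csn(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


csn = parse_csn("20100818133004.663851Z#000000#000#000000")
print(csn.sid)
```

If every csn on both servers yields the same sid (here the default 0), the servers cannot tell each other's changes apart.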
Hi all,
Thanks to your hints, I think I got it working now :)
Several details had to be changed. Perhaps someone else has similar
problems, so below I give a detailed description of what had to be done
in my case. Everything was done on two freshly installed, out-of-the-box
openSuSE 11.3 x86_64 machines with the SuSE-shipped openldap 2.4.21.
First, regarding the inaccurate time on my test machines: I deleted the
openSuSE ntp.conf and now use other NTP servers as the time sync
source.
The complete ntp.conf on ldapmaster and ldapslave is now:
-----------------------------------------------------
server ptbtime1.ptb.de prefer
server ptbtime2.ptb.de
tinker panic 0
driftfile /var/lib/ntp/drift/ntp.drift
This results in much closer offset values:
---------------------------------------------
ldapmaster:/etc/openldap # ntpq -p; ssh ldapslave ntpq -p
 remote            refid    offset
========================================
*ptbtime1.ptb.de   .PTB.    -1.218
+ptbtime2.ptb.de   .PTB.    -1.305
 remote            refid    offset
==========================================
*ptbtime1.ptb.de   .PTB.    -0.381
+ptbtime2.ptb.de   .PTB.     0.203
ldapmaster:/etc/openldap #
Second problem: the missing or wrong serverID in the csn values. To make
clear what apparently helped, I will describe how I create(d) my slapd.d/
online configuration. (My original slapd.conf was taken from the
"openldap 2.4" book by Oliver Liebel & John Martin Ungar. Thanks,
Oliver :) )
------------------------------------------------------------
ldapmaster:/etc/openldap # slaptest -f /etc/openldap/slapd.conf -F /etc/openldap/slapd.d/
hdb_db_open: database "dc=local,dc=site":
db_open(/var/lib/ldap//id2entry.bdb) failed: No such file or directory (2).
backend_startup_one (type=hdb, suffix="dc=local,dc=site"): bi_db_open
failed! (2)
slap_startup failed (test would succeed using the -u switch)
------------------------------------------------------------
The subsequent "rcldap start" then failed every time with:
--------------------------------------------------
ldapmaster:/etc/openldap # rcldap start
Starting ldap-serverstartproc: exit status of parent of
/usr/lib/openldap/slapd: 1
failed
-------------------------------------
and the following was written to /var/log/messages:
--------------------------------------
Aug 26 15:47:22 ldapmaster slapd[7805]: @(#) $OpenLDAP: slapd 2.4.21
(Jul 5 2010 13:35:22)
$#012#011abuild@build16:/usr/src/packages/BUILD/openldap-2.4.21/servers/slapd
Aug 26 15:47:22 ldapmaster slapd[7805]: olcSyncrepl: value #0:
<olcSyncrepl> invalid URL
Aug 26 15:47:22 ldapmaster slapd[7805]: config error processing
olcDatabase={0}config,cn=config: <olcSyncrepl> invalid URL
Aug 26 15:47:22 ldapmaster slapd[7805]: slapd stopped.
Aug 26 15:47:22 ldapmaster slapd[7805]: connections_destroy: nothing to
destroy.
I grepped for "olcSyncrepl" in slapd.d:
---------------------------------------
ldapmaster:/etc/openldap # grep -r olcSyncrepl slapd.d/
slapd.d/cn=config/cn=schema.ldif:olcAttributeTypes: ( OLcfgDbAt:0.11
NAME 'olcSyncrepl' EQUALITY caseIgnoreMatc
slapd.d/cn=config/cn=schema.ldif: olcSizeLimit $ olcSyncUseSubentry $
olcSyncrepl $ olcTimeLimit $ olcUpdateDN
slapd.d/cn=config/olcDatabase={0}config.ldif:olcSyncrepl: rid=003
provider=ldap://ldapmaster.local.site uri="" bindmethod=s
slapd.d/cn=config/olcDatabase={0}config.ldif:olcSyncrepl: rid=004
provider=ldap://ldapslave.local.site uri="" bindmethod=si
slapd.d/cn=config/olcDatabase={1}hdb.ldif:olcSyncrepl: rid=001
provider=ldap://ldapmaster.local.site uri="" bindmethod=s
slapd.d/cn=config/olcDatabase={1}hdb.ldif:olcSyncrepl: rid=002
provider=ldap://ldapslave.local.site uri="" bindmethod=si
ldapmaster:/etc/openldap #
Some internet research told me that the empty uri="" was the problem,
and that removing it would help. PREVIOUSLY I always removed it, and
indeed, openldap then started. BUT apparently this also led to the
missing serverID in my csn values (or rather, they all had the default
"000"). THIS TIME I instead edited the config.ldif and hdb.ldif files in
slapd.d (the ones in which grep found the string "olcSyncrepl") to give
uri an appropriate value.
Open each file and search for "uri". It is found twice in every file.
Before each occurrence there is a variable "provider", which is set to
some value. For example, in slapd.d/cn=config/olcDatabase={0}config.ldif:
provider=ldap://ldapmaster.local.site uri=""
and
provider=ldap://ldapslave.local.site uri=""
The value of provider ALSO has to be put into uri, so that afterwards
it looks like:
provider=ldap://ldapmaster.local.site uri="ldap://ldapmaster.local.site"
and
provider=ldap://ldapslave.local.site uri="ldap://ldapslave.local.site"
I did this on every machine (no copying of files from one to another).
After that, "rcldap start" (also) works without problems.
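
The manual edit described above can be sketched in Python roughly like this. It is only an illustration, assuming provider=... and uri="" sit on the same logical line as in the excerpts shown (LDIF files may fold long lines, which this sketch does not handle), and assuming slapd is stopped while the files are changed. The name fill_empty_uri is made up for this example.

```python
import re

# provider=<url>, some whitespace, then an empty uri=""
_EMPTY_URI = re.compile(r'(provider=(\S+)\s+)uri=""')


def fill_empty_uri(line: str) -> str:
    """Copy the provider URL into an empty uri="" on the same line."""
    return _EMPTY_URI.sub(
        lambda m: f'{m.group(1)}uri="{m.group(2)}"', line
    )


before = 'provider=ldap://ldapmaster.local.site uri=""'
print(fill_empty_uri(before))
```

Lines whose uri already has a value, or that contain no provider at all, are returned unchanged.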
But that still was not enough...
Third thing to do: on each machine, additionally make slapd start with
the "-h" parameter set correctly, as Rein and Jonathan wrote. With the
SuSE way of configuration (like it or not, but in this case I don't
have a choice ;)) this can be done in /etc/sysconfig/openldap:
set OPENLDAP_SLAPD_PARAMS correctly on every machine, e.g. on the master:
OPENLDAP_SLAPD_PARAMS="-h ldap://ldapmaster.local.site"
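
Why the "-h" value matters can be illustrated with a simplified Python model of what Rein described: slapd picks its sid by matching the ServerID URLs against its own listener URL, and otherwise ends up with the default 0. This is not slapd's actual code. The function select_sid and the ServerID numbers 1 and 2 are assumptions for this sketch (my real serverID lines are not shown in this mail); only the hostnames are taken from above.

```python
from urllib.parse import urlsplit

# Hypothetical ServerID list: (sid, URL) pairs, same on both machines.
SERVER_IDS = [
    (1, "ldap://ldapmaster.local.site"),
    (2, "ldap://ldapslave.local.site"),
]


def _normalize(url: str):
    """Reduce a URL to (scheme, host, port) so equal URLs compare equal."""
    parts = urlsplit(url)
    default_port = {"ldap": 389, "ldaps": 636}.get(parts.scheme)
    return (parts.scheme, parts.hostname or "", parts.port or default_port)


def select_sid(server_ids, listeners) -> int:
    """Return the sid whose URL matches one of the -h listener URLs."""
    listening = {_normalize(url) for url in listeners}
    for sid, url in server_ids:
        if _normalize(url) in listening:
            return sid
    return 0  # no match: the default ServerID seen in the csn values


print(select_sid(SERVER_IDS, ["ldap://ldapmaster.local.site"]))
print(select_sid(SERVER_IDS, ["ldap:///"]))
```

In this model, a listener URL without the exact hostname matches no ServerID entry, so both machines fall back to the same default sid.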
And, by the way, to make sure that ONLY the online configuration style
(slapd.d) is used, one can change OPENLDAP_CONFIG_BACKEND from "" to:
OPENLDAP_CONFIG_BACKEND="ldap"
All of this together seems to solve the problem. Now LDAP objects can be
altered, removed and added on either machine while the other one is
down, and all changes are replicated as soon as the "downed" machine
comes back up. (At least in my tests so far ;) )
Thanks for your time, and best regards..
elmar