[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#4589) syncrepl DELs sometimes don't propagate



richton@nbcs.rutgers.edu wrote:
> Full_Name: Aaron Richton
> Version: 2.3.24
> OS: Solaris 9
> URL: 
> Submission from: (NULL) (128.6.31.135)
>
>
> Something not quite right with session log or nonpresent list or.......
>
> Pretty much the same as the last time I played the syncrepl game, except this
> time instead of entries missing from the slave, there are excessive entries on
> the slave. Here's an example, although it's hardly illuminating:
>
> [...]
> Jun  8 15:18:09 master.rutgers.edu slapd[29239]: [ID 379477 local4.debug]
> conn=361133 op=2 DEL dn="uid=VICTIM,ou=People,dc=rci,dc=rutgers,dc=edu"
> [...]
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 848112 local4.debug]
> conn=361315 fd=100 ACCEPT from IP=192.168.4.67:53492 (IP=0.0.0.0:389)
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 560212 local4.debug]
> conn=361315 op=0 STARTTLS
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 875301 local4.debug]
> conn=361315 op=0 RESULT oid= err=0 text=
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 105384 local4.debug]
> conn=361315 fd=100 TLS established tls_ssf=256 ssf=256
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 215403 local4.debug]
> conn=361315 op=1 BIND dn="cn=readEverything,dc=rci,dc=rutgers,dc=edu"
> method=128
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 600343 local4.debug]
> conn=361315 op=1 BIND dn="cn=readEverything,dc=rci,dc=rutgers,dc=edu"
> mech=SIMPLE ssf=0
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 588225 local4.debug]
> conn=361315 op=1 RESULT tag=97 err=0 text=
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 469902 local4.debug]
> conn=361315 op=2 SRCH base="dc=rci,dc=rutgers,dc=edu" scope=2 deref=0
> filter="(objectClass=*)"
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 744844 local4.debug]
> conn=361315 op=2 SRCH attr=* +
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 545199 local4.debug] Entry
> uid=BYSTANDER,ou=People,dc=rci,dc=rutgers,dc=edu CSN
> 20060608190309Z#000000#00#000000 matches ctx 20060608190309Z#000000#00#000000
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 167594 local4.debug]
> conn=361315 op=2 SEARCH RESULT tag=101 err=0 nentries=0 text=
> Jun  8 15:22:44 slave5.rutgers.edu slapd[12966]: [ID 496601 local4.debug]
> do_syncrep2: LDAP_RES_SEARCH_RESULT
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 218904 local4.debug]
> conn=361315 op=3 UNBIND
> Jun  8 15:22:44 master.rutgers.edu slapd[29239]: [ID 952275 local4.debug]
> conn=361315 fd=100 closed
>
> and at the end of this, slave5 still has uid=VICTIM, which is undesirable.
> slave5 is on "stats sync" debug. I think I should see a del_nonpresent in there,
> no? So I'd assume the master is telling it something wrong. Not sure what else I
> can do. I have:
>
> -d -1 from slave5 (HUGE, running for weeks. I can try to slim it down.)
> "stats sync" syslogs from all slaves
> "stats sync" syslogs from the master
>
> Everything's still running as it was/is, so if you want to dbx some syncprov
> struct or something like that, I can do that. Of course it might not be in scope
> still, but I can always delete more! (Although I'm fairly certain this isn't
> 100% of the time. I'll look harder. Might be like last time, need a certain
> number of ops per sync or similar.)
>
>
>
>
>   
It would probably help to just look at the logs for all of the write 
operations that occurred since the last refresh, in the refresh pass 
that is missing the Delete. If re-issuing the same sequence of 
operations always reproduces the bug, then we'll have something to go on.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/