[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: Delta-syncrepl and latency



On 5/9/07, Quanah Gibson-Mount <quanah@zimbra.com> wrote:
[snip]

Well, things seem fine from the accesslog DB side. On my servers @ Stanford, the largest latency I've seen is about 7 seconds. I wonder if something is slow on your network.

--Quanah


I set the loglevel to -1 on the consumer and ran further test:

The provider was idle for about 1 hour 45 min.
At 20:29:50 I ran a batch of 100 sequential writes on the provider.
Nothing happened on the consumer until 20:45:03. Here are the logs
from the consumer:

May  9 20:40:04 info-ldap-006 slapd[32288]: daemon: epoll: listen=7
active_threads=0 tvp=zero
May  9 20:40:04 info-ldap-006 slapd[32288]: daemon: epoll: listen=8
active_threads=0 tvp=zero
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on:
May  9 20:45:03 info-ldap-006 slapd[32288]:  13r
May  9 20:45:03 info-ldap-006 slapd[32288]:
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: read active on 13
May  9 20:45:03 info-ldap-006 slapd[32288]: connection_get(13)
May  9 20:45:03 info-ldap-006 slapd[32288]: connection_get(13): got connid=0
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=7
active_threads=0 tvp=zero
May  9 20:45:03 info-ldap-006 slapd[32288]: =>do_syncrepl rid 006
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=8
active_threads=0 tvp=zero
May  9 20:45:03 info-ldap-006 slapd[32288]: =>do_syncrep2 rid 006
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on:
May  9 20:45:03 info-ldap-006 slapd[32288]: do_syncrep2: rid 006 Can't
contact LDAP server
May  9 20:45:03 info-ldap-006 slapd[32288]:
May  9 20:45:03 info-ldap-006 slapd[32288]: connection_get(13)
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: hangup on 13
May  9 20:45:03 info-ldap-006 slapd[32288]: connection_get(13): got connid=0
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=7
active_threads=0 tvp=zero
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: removing 13
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=8
active_threads=0 tvp=zero
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on:
May  9 20:45:03 info-ldap-006 slapd[32288]:
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=7
active_threads=0 tvp=zero
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=8
active_threads=0 tvp=zero
May  9 20:45:03 info-ldap-006 slapd[32288]: do_syncrepl: rid 006 retrying
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: activity on:
May  9 20:45:03 info-ldap-006 slapd[32288]:
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=7
active_threads=0 tvp=zero
May  9 20:45:03 info-ldap-006 slapd[32288]: daemon: epoll: listen=8
active_threads=0 tvp=zero
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=7
active_threads=0 tvp=zero
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=8
active_threads=0 tvp=zero
May  9 20:45:08 info-ldap-006 slapd[32288]: =>do_syncrepl rid 006
May  9 20:45:08 info-ldap-006 slapd[32288]: =>do_syncrep2 rid 006
May  9 20:45:08 info-ldap-006 slapd[32288]: connection_get(13)
May  9 20:45:08 info-ldap-006 slapd[32288]: connection_get(13): got connid=0
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: added 13r listener=(nil)
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: activity on:
May  9 20:45:08 info-ldap-006 slapd[32288]:
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=7
active_threads=0 tvp=zero
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=8
active_threads=0 tvp=zero
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: activity on 1 descriptor
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: activity on:
May  9 20:45:08 info-ldap-006 slapd[32288]:  13r
May  9 20:45:08 info-ldap-006 slapd[32288]:
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: read active on 13
May  9 20:45:08 info-ldap-006 slapd[32288]: connection_get(13)
May  9 20:45:08 info-ldap-006 slapd[32288]: connection_get(13): got connid=0
May  9 20:45:08 info-ldap-006 slapd[32288]: =>do_syncrepl rid 006
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=7
active_threads=0 tvp=zero
May  9 20:45:08 info-ldap-006 slapd[32288]: daemon: epoll: listen=8
active_threads=0 tvp=zero
May  9 20:45:08 info-ldap-006 slapd[32288]: =>do_syncrep2 rid 006
May  9 20:45:08 info-ldap-006 slapd[32288]: >>> dnPrettyNormal:
<uid=chill,ou=People,dc=example,dc=com>

It looks like the consumer didn't detect that its previous connection
to the provider had closed. It seems like the FW between the consumer
and the provider is killing the connection without sending a TCP Reset
to both sides. I definitely need to talk to our network guys about
that.

Now my question is what triggered the 'do_syncrepl ' at 20:45:03? Why
didn't the consumer perform the 'do_syncrepl' earlier and realize that
the connection to the provider was lost?

Thanks,
Sam