[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Antw: Re: Experience with LDAP monitoring (cn=monitor)
>>> Michael Ströder <michael@stroeder.com> schrieb am 10.03.2016 um 13:50 in
Nachricht <56E16D7A.2070707@stroeder.com>:
> Ulrich Windl wrote:
>> Issues with LDAP Monitoring
>
> Aren't you mentioning issues with monitoring in general?
>
>> "Uptime" is in whole seconds only (minor issue). SNMP uptime has a finer
>> resolution (but limited range, unfortunately).
>>
>> Detailed data per peer can only be retrieved through the "Connections",
but
>> that's a moment's view only: So if a client opens a connection, does a few
>> operations, then closes the connection, a polling client of the monitor
will
>> never see those client operations. Also when needing a cumulative count
of
>> operations per peer (or just the number of connections per peer (for a
> rate)), a monitor
>> client will have to accumulate the numbers from all peer connections. If
a
>> connection (with significant operations being done) was closed since the
> last
>> poll, the total number will look negative. So the monitor client will have
> to
>> store accumulated numbers for closed connections per peer also (Keeping
>> numbers for all closed connections seems inefficient).
>>
>> "Current Connections" is returned as monitor _counter_ object
> (monitorCounter),
>> where in fact it's of type "gauge", opposed to "Total Connections" (which
is
> also
>> returned as monitor counter) which is actually a counter. This makes the
> code harder
>> than necessary.
>
> Of course Shannon's sampling theorem also applies to IT monitoring.
Sorry, I'm not impressed: It's easy for the server to count the numbers, and I
just wonder why it woudln't give it out.
>
> And of course if your scripts calculate rates, it has to deal with counter
> reset
> etc. BTDT.
Of course they do. The debug message would read like "UNKNOWN: [3 searches
with 12 entries in 0.066s], 95, [Operations.Bind.i: restart detected (95 -
1228173 == -1228078)], 22, [Operations.Unbind.i: restart detected (95 - 1228173
== -1228078)], 9, [Operations.Search.i: restart detected (95 - 1228173 ==
-1228078)], 92, [Operations.Compare.i: restart detected (95 - 1228173 ==
-1228078)], 0, [Operations.Modify.i: restart detected (95 - 1228173 ==
-1228078)], 0, [Operations.Modrdn.i: restart detected (95 - 1228173 ==
-1228078)], 0, [Operations.Add.i: restart detected (95 - 1228173 == -1228078)],
0, [Operations.Delete.i: restart detected (95 - 1228173 == -1228078)], 0,
[Operations.Abandon.i: restart detected (95 - 1228173 == -1228078)], 0,
[Operations.Extended.i: restart detected (95 - 1228173 == -1228078)], 19" (".i"
is a shortcut for ".initiated")
>
> In general polling based monitoring system like Nagios, check_mk etc. are
> pretty
> poor regarding fine-grained performance monitoring. You will always loose
> information about peak loads happening in those pretty wide time slots of
> 30+ secs.
But still ist exactly infinite times better than nothing ;-)
>
> If you really need it you can send the logged events to the usual ELK stack
> (or
> similar) and analyze whatever you want there [1]. Of course, depending on
> your
> OpenLDAP load, you need big and fast log stores.
>
> [1] https://github.com/coudot/openldap-elk
>
>> What I'm missing are some database (BDB/HDB) runtime statistics.
>
> Forget about BDB/HDB. MDB is the way to go. ;-)
There aren't statistics either, and I should be allowed to have an opinion.
>
> https://www.openldap.org/its/index.cgi?findid=7770
>
> Ciao, Michael.