
Re: OpenLDAP system architecture?

To: Brad Knowles <b.knowles@its.utexas.edu>
Subject: Re: OpenLDAP system architecture?
From: Howard Chu <hyc@symas.com>
Date: Thu, 24 Jan 2008 15:08:50 -0800
Cc: openldap-software@openldap.org
In-reply-to: <1201198359.29813.31.camel@valen.cc.utexas.edu>
References: <1201198359.29813.31.camel@valen.cc.utexas.edu>
User-agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.9b3pre) Gecko/2008011510 SeaMonkey/2.0a1pre

Brad Knowles wrote:

Folks,

I'm going through the documentation at
<http://www.openldap.org/doc/admin24/>, the OpenLDAP FAQ-o-Matic at
<http://www.openldap.org/faq/data/cache/1.html>, and the archives of the
various OpenLDAP mailing lists, but I have not yet found anything that
discusses how one might want to architect a large-scale OpenLDAP system
with multiple masters, multiple slaves, etc... for best performance and
low latency.

In our case, we have OpenLDAP 2.3.something (a few versions behind the
official latest stable release), and we've recently hit our four
millionth object (at a large University with something like 48,000
students, 2700 faculty, and 19,000 employees), and we're running into
some performance issues that are going to keep us from rolling out some
other large projects, at least until we can get the problems resolved.

In my experience, 4 million objects (at around 3KB per entry) is near the limit of what will fit into 16GB of RAM. Sounds like you need a server with more than 16GB if you want to keep growing and not be waiting on disks.
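The arithmetic behind that estimate can be sketched roughly as follows. The entry count and the 3KB average come from the message above; the overhead factor is purely an illustrative assumption (indexes, cache bookkeeping, allocator slack), not a measured OpenLDAP figure, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate of the RAM needed to keep a directory in memory.
ENTRY_COUNT = 4_000_000       # from the thread: "four millionth object"
AVG_ENTRY_BYTES = 3 * 1024    # from the thread: "around 3KB per entry"
OVERHEAD_FACTOR = 1.3         # ASSUMPTION: indexes, cache bookkeeping, allocator slack
GIB = 1024 ** 3


def estimate_cache_bytes(entries, avg_entry_bytes, overhead_factor=1.0):
    """Return an approximate byte count for caching `entries` entries."""
    return int(entries * avg_entry_bytes * overhead_factor)


raw = estimate_cache_bytes(ENTRY_COUNT, AVG_ENTRY_BYTES)
padded = estimate_cache_bytes(ENTRY_COUNT, AVG_ENTRY_BYTES, OVERHEAD_FACTOR)

print(f"raw entry data:        {raw / GIB:.1f} GiB")
print(f"with assumed overhead: {padded / GIB:.1f} GiB")
```

The raw entry data alone comes to a bit over 11 GiB, so any realistic overhead leaves little headroom on a 16GB machine that also has to run the OS and slapd itself.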

I do not yet understand a great deal about how our existing OpenLDAP
systems are designed, but I am curious to learn what kinds of
recommendations you folks would have for a large scale system like this.

In the far, dark, distant past, I know that OpenLDAP did not handle
situations well when you had both updates and reads occurring on the
same system, so the recommendation at the time was to make all updates
on the master server, then replicate that out to the slaves where all
the read operations would occur.  You could even go so far as to set up
slaves on pretty much every single major client machine, for maximum
distribution and replication of the data, and maximum scalability of the
overall LDAP system.

The single-master constraints on OpenLDAP were never about performance. Even with OpenLDAP 2.2 the concurrent read/write rates for back-bdb are faster than any other directory server. It's always been about data consistency, and the fact that it's so easy to lose it in a multi-master setup.

I know that modern versions of OpenLDAP are able to handle a mix of both
updates and reads much better, so that the old style architecture is not
so necessary.  But for a large-scale system like we have, would it not
be wise to use the old-style architecture for maximum performance and
scalability?

If you did use a multi-master cluster pair environment that handled all
the updates and all the LDAP queries that were generated, what kind of
performance do you think you should reasonably be able to get with the
latest version of 2.4.whatever on high-end hardware,

You've been brainwashed by all the marketing lies other LDAP vendors tell about multi-master replication. Multi-master has no relation to performance. It's only about fault tolerance and high availability. No matter whether you choose a single-master or a multi-master setup, with the same number of machines, the same number of writes must be propagated to all servers, so the overall performance will be the same.
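A toy model makes the write-propagation point concrete. This is only a sketch under simplifying assumptions: every write is eventually applied on every server, client writes are spread round-robin across the masters, and conflict resolution, network cost, and replication lag are ignored. The function name and the round-robin policy are hypothetical, not anything OpenLDAP does.

```python
def applied_writes(total_writes, servers, masters):
    """Count writes each server applies, split into (local, replicated).

    Servers 0..masters-1 accept client writes; all servers receive every write.
    """
    if not 1 <= masters <= servers:
        raise ValueError("need 1 <= masters <= servers")
    local = [0] * servers
    replicated = [0] * servers
    for w in range(total_writes):
        origin = w % masters              # ASSUMPTION: round-robin across masters
        for s in range(servers):
            if s == origin:
                local[s] += 1             # accepted directly from a client
            else:
                replicated[s] += 1        # received via replication
    return list(zip(local, replicated))


for masters in (1, 2, 4):
    per_server = applied_writes(1000, servers=4, masters=masters)
    totals = [loc + rep for loc, rep in per_server]
    print(f"{masters} master(s): per-server totals = {totals}")
```

Whatever the number of masters, each server ends up applying the same total number of writes; only the split between locally accepted and replicated writes changes.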

and what kind of
hardware would you consider to be "high-end" for that environment?

That's a pointless question. The right question is: how fast do you need it to be? What load are you experiencing now, what constitutes a noticeable delay, and how often do you see such delays?

Is CPU more important, or RAM, or disk space/latency?

If you have enough RAM, disk latency shouldn't be a problem. Disk space is so cheap today that it should never be a problem. CPU, well, that depends on your performance target.

Alternatively, if you went to a three-level master(s)->proxies->slaves
architecture [0], what kind of performance would you expect to be able
to get, and how many machines would you expect that to be able to scale
to?  Are there any other major issues to be concerned about with that
kind of architecture, like latency of updates getting pushed out to the
leaf-node slaves?


Yes.

How about the ultimate maximum distribution scenario, where you put an
LDAP slave on virtually every major LDAP client machine?

Generally I like the idea of having compact/simple slapd configs spread all over. With the old slapd.conf that would have been rather painful to administer though. Also in general, more moving parts means more things that can break.

Any and all advice you can provide would be appreciated, and in
particular I would greatly appreciate it if you can provide any
references to documentation, FAQs, mailing list archives where I can
read more.


--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP     http://www.openldap.org/project/
