We were simply evaluating our options for an OpenLDAP redesign (read: upgrades). Here's a reply from one of our engineers:
----------
Thank you Howard for your reply. I would like to know how many BDB
files you had to achieve this great performance with 150 million
entries.
My request is to collapse the DIT structure for our customers. I need
a single "ou=people" container that can accommodate all customer
entries in one logical context. This container has to be able to
grow to many, many millions of entries. Currently we are thinking of
somewhere between 20MM and 50MM. However, because of scale limitations
in the past, we had to split the people container into 14 sub-OUs,
which are named after cities.
A commercial LDAP vendor is highlighting one of its features that would allow us to have an arbitrary number of physical files belonging to the same OU. We would be able to reduce the physical file size and get performance gains through shorter searches, smaller indices, and whatever else benefits from small files. It would greatly reduce our administrative cost/burden and eliminate some costly moving of people/entries in our environment.
Also, a lot of our LDAP-dependent applications could be simplified. There is no need for our business to know where a customer is coming from. That information has no value to us, but we are unable to get rid of it.
There's nothing in OpenLDAP requiring you to operate this way.
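To make the requested collapse concrete, here is a minimal Python sketch of how entry DNs might be rewritten when the city sub-OUs are folded into a single "ou=people" container. The DN layout (uid=...,ou=<city>,ou=people,dc=example,dc=com) and the function name collapse_dn are assumptions for illustration only; the thread does not show the actual DIT. The split on commas is naive and assumes no escaped commas in any RDN value.

```python
def collapse_dn(dn: str, people_rdn: str = "ou=people") -> str:
    """Drop the sub-OU RDN that sits directly under the people container.

    Hypothetical helper: assumes entries live at
    <entry RDN>,ou=<city>,ou=people,<suffix> and that no RDN value
    contains an escaped comma.
    """
    rdns = [r.strip() for r in dn.split(",")]
    for i in range(2, len(rdns)):
        # rdns[i] is the people container, rdns[i-1] the city sub-OU,
        # and at least one entry RDN sits below the city sub-OU.
        if rdns[i].lower() == people_rdn and rdns[i - 1].lower().startswith("ou="):
            return ",".join(rdns[:i - 1] + rdns[i:])
    # Not under a city sub-OU (or it is the sub-OU entry itself): unchanged.
    return dn


if __name__ == "__main__":
    print(collapse_dn("uid=jdoe,ou=Boston,ou=people,dc=example,dc=com"))
```

Entries already directly under "ou=people", and the sub-OU entries themselves, are returned unchanged, so the function could be run over a whole export without special-casing them.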
---------
To expand a little bit, splitting the tree allowed us to keep database
files small and recoverable from another source, and we could split
the database across different disks for I/O gains, although more
intelligent indexing could probably help a lot in these respects.
Thanks, _Matt
On 1/4/06, Howard Chu <hyc@symas.com> wrote:
matthew sporleder wrote:
I'm trying to figure out if I can abstract a database's logical layout (DIT) from being bound to specific files per 'database' definition, and I'm not seeing any good tips in the Berkeley DB tuning docs.
For example:
I have ou=region1,dc=example,dc=com and ou=region2,dc=example,dc=com. Right now the only option I see for separating these is to define them in different 'database' sections. I would, however, like to have them both defined in one database, but allow the actual database files (dn2id, etc.) to be split by size or some other definable criterion (usage stats, whatever).
Am I missing something obvious in DB_CONFIG like "max_file_size"?

No, there's no such feature. Nor does it sound like it would be useful,
given what little you've described so far. Even if you allowed a
particular DB file to be split, all of the files would still occupy
space in the single BDB environment cache. In fact, since each DB handle
also consumes cache space, splitting files would consume more resources
than otherwise. Given that we've benchmarked a directory with 150
million entries consuming about a terabyte of disk space, using the
current back-bdb code, getting tens of thousands of operations per
second throughput, I don't see any particular reason to bother with
splitting the files. Perhaps if you explained what real problem you're
trying to solve, it might make a bit more sense.
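As a rough back-of-the-envelope check on those figures, the short Python sketch below scales the benchmark's disk footprint down to the 20MM to 50MM entries mentioned earlier in the thread. It assumes "about a terabyte" means 10^12 bytes and that disk usage scales linearly with entry count at a similar entry size and index configuration; neither assumption is confirmed here, so treat the output as an order-of-magnitude estimate only.

```python
# Figures quoted in the thread: 150 million entries, about a terabyte on disk.
BENCH_ENTRIES = 150_000_000
BENCH_BYTES = 10**12  # assumption: "about a terabyte" taken as 10^12 bytes

# Average on-disk footprint per entry, indexes included (about 6.7 KB).
BYTES_PER_ENTRY = BENCH_BYTES / BENCH_ENTRIES


def estimate_gb(entries: int) -> float:
    """Linear extrapolation of disk usage in GB; an assumption, not a measurement."""
    return entries * BYTES_PER_ENTRY / 10**9


if __name__ == "__main__":
    for n in (20_000_000, 50_000_000):
        print(f"{n:>11,} entries: roughly {estimate_gb(n):.0f} GB")
```

Under these assumptions the target range works out to a few hundred gigabytes at most, well below the size of the benchmarked directory.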
--
Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/