back-ndb design notes
- To: OpenLDAP Devel <openldap-devel@openldap.org>
- Subject: back-ndb design notes
- From: Howard Chu <hyc@symas.com>
- Date: Sun, 19 Oct 2008 02:14:41 -0700
- User-agent: Mozilla/5.0 (X11; U; Linux x86_64; rv:1.9.1b1pre) Gecko/20081015 SeaMonkey/2.0a1pre
While looking over the Admin Guide I was re-reading section 1.8 (LDAP vs
RDBMS) and thought it would be worth describing the approach we took and
tradeoffs we made. (Disclaimer: some of these decisions need to be changed.)
(Also for more background: NDB is a cluster-based storage engine for MySQL. It
runs as its own set of processes across the nodes of a cluster; all accesses
to it are through remote procedure calls.)
The basic approach starts with a rough adaptation of back-bdb to NDB. That is,
we use one table to map DNs to numeric entryIDs, and use this numeric ID as
the primary key to access the main entry data. But instead of using a
monolithic id2entry table (and treating all the LDAP data as blobs) we start
from the notion of a table per objectclass.
For objectclasses that inherit from others, only the unique attributes of the
subordinate class reside in that class's table. This is one of the underlying
requirements - an LDAP attribute must only appear in one table. For attributes
that appear in unrelated LDAP objectclasses, you must configure them into
"attribute sets" (which are yet another set of tables) to ensure that their
column definition is used uniformly by all the objectclasses.
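To make that concrete, here's a rough SQL sketch of the layout. The table and
column definitions are invented for illustration, and this reflects only the
single-valued scheme described so far (the refinement for multivalued
attributes comes later):

-- One table per objectclass, keyed by the numeric entryID.
CREATE TABLE person (
  eid BIGINT UNSIGNED NOT NULL PRIMARY KEY,  -- entryID from dn2id
  cn  VARCHAR(128),
  sn  VARCHAR(128)
) ENGINE=NDBCLUSTER;

-- inetOrgPerson inherits from person, so only its own attributes live here.
CREATE TABLE inetOrgPerson (
  eid         BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  mail        VARCHAR(128),
  displayName VARCHAR(128)
) ENGINE=NDBCLUSTER;

-- An "attribute set" table for an attribute shared by unrelated classes,
-- so the attribute still has exactly one column definition.
CREATE TABLE attrset_telephone (
  eid             BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  telephoneNumber VARCHAR(32)
) ENGINE=NDBCLUSTER;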
Note that this means to retrieve the complete information for a particular
entry, we have to pull from many tables at once, and each requires a separate
NDB operation. The NDB API allows many operations to be batched together and
executed in parallel, so we can do all of these fetches in a single roundtrip.
(Otherwise, performance for this approach would be pretty abominable.) The
Admin Guide notes that this approach can be very poor because accessing many
tables at once requires a lot of disk seeks. This consideration doesn't apply
to NDB because each node of the cluster always caches everything completely in
RAM.
However, we first need to know which tables to fetch from, i.e., we need to
consult a table to get a list of objectclasses, before we can spit out the
batched request to all the relevant tables. To avoid having to visit so many
different tables sequentially, we store the objectclass attribute in the dn2id
table. (We obviously have to visit the dn2id table anyway. Just as with
back-bdb, where accessing any entry by name requires first a dn2id lookup
followed by an id2entry lookup, here we must do the dn2id lookup followed by
the parallel lookups across all of the class/set tables.)
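In SQL terms (the real lookups go through batched NDB API operations, not SQL,
and the column names and RDN ordering shown here are invented), reading one
entry by name looks roughly like this:

-- Roundtrip 1: the dn2id lookup yields the entryID plus the objectclass list.
SELECT eid, ocs FROM dn2id
 WHERE rdn1 = 'dc=com' AND rdn2 = 'dc=example' AND rdn3 = 'uid=jdoe';

-- Roundtrip 2: knowing the classes, fetch from every relevant class/set
-- table; the NDB API batches these reads and runs them in parallel.
SELECT * FROM person        WHERE eid = 42;
SELECT * FROM inetOrgPerson WHERE eid = 42;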
For as much functionality as that covers, things are pretty straightforward.
The dn2id table has some interesting issues as well. I think it's a poor
design, but it was the best we could do under our initial constraints.
Basically it has a 16 column primary key, where each rdn occupies a column.
This approach makes it fairly easy to obtain base, subtree, and children
scoping, but it suffers from the same problems as back-bdb and back-ldbm.
(Bloated, no subtree rename... Plus, it imposes a hard limit of 16 levels on
tree depth. It's probably an acceptable limit, but worth noting that none of
our other backends have such a limit.)
The entryID is just an integer column, and the objectclass attribute is a
single column containing all of the classes concatenated as a space-delimited
string. E.g., for an entry with
objectClass: person
objectClass: inetOrgPerson
the column will contain " person inetOrgPerson "
This format allows us to perform equality searches on objectclass using an SQL
LIKE filter. (objectclass=foo) -> objectclass like "% foo %"
(Filters aren't implemented using actual SQL, but the NDB API filter mechanism
provides the equivalent filtering functionality.)
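As a sketch (again with invented names and sizes, and assuming the unused rdn
columns are simply left empty for shallower entries), the dn2id layout amounts
to something like this:

CREATE TABLE dn2id (
  rdn1  VARCHAR(128) NOT NULL,
  rdn2  VARCHAR(128) NOT NULL,
  -- ... rdn3 through rdn15 elided ...
  rdn16 VARCHAR(128) NOT NULL,
  eid   BIGINT UNSIGNED NOT NULL,  -- numeric entryID
  ocs   VARCHAR(1024),             -- e.g. " person inetOrgPerson "
  PRIMARY KEY (rdn1, rdn2, /* ..., */ rdn16)
) ENGINE=NDBCLUSTER;

-- (objectclass=inetOrgPerson) becomes a LIKE test on the ocs column:
SELECT eid FROM dn2id WHERE ocs LIKE '% inetOrgPerson %';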
As mentioned, the numeric entryID is used as the key of the main data tables.
Actually, using the entryID alone as the primary key was just our initial
approach, and it only supported single-valued attributes. Now we support
multivalued attributes by using a
two-column primary key, <entryID,value#>. So there's a "value" column with
values from 0 to N-1 for an attribute with N values, and each value occupies a
separate row. The NDB API allows us to do partially-specified primary key
lookups, so we can do a lookup on just the entryID and automatically get all
of the values/rows associated with that entryID in a single operation.
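The class tables sketched earlier then look more like this (still with made-up
names), and the SQL equivalent of the partially-specified key lookup is just a
query on eid alone:

-- One row per attribute value, numbered 0..N-1 in the vid column.
CREATE TABLE person (
  eid BIGINT UNSIGNED NOT NULL,
  vid INT UNSIGNED NOT NULL,   -- value number within the entry
  cn  VARCHAR(128),
  sn  VARCHAR(128),
  PRIMARY KEY (eid, vid)
) ENGINE=NDBCLUSTER;

-- All the value rows for one entry come back from a single lookup on eid.
SELECT vid, cn, sn FROM person WHERE eid = 42;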
Search indexing is currently a mess, and will need to be rewritten. We arrived
at the current approach based on a couple of main constraints:
1) this backend will do no local caching
2) we must always retrieve the desired data in only 2 API roundtrips
#1 was based on the requirement that multiple slapds be able to operate
concurrently on the database, along with multiple mysqlds. We didn't have time
to design a distributed cache synchronization mechanism, therefore, no local
caching.
#2 was based on the simple fact that taking more than 2 roundtrips would
destroy performance.
The current solution uses the regular SQL table indexing features, but the
indices are stored in the dn2id table, not in the main data tables. This means
we can do an index lookup and obtain an entry's DN and objectclasses in a
single roundtrip, then a single roundtrip for the main data, so it meets our
#2 constraint. One problem with this is that a single table can only
accommodate rows up to 8192 bytes wide, and we can easily max that out when
multiple indices are needed. Another problem is that updating indexed
attribute values requires writes to both the main data tables and the dn2id
table. While back-ndb handles this itself, mysql daemons writing to the data
tables won't know about this requirement. (We need to use a trigger/stored
procedure to make sure the SQL side keeps the indices up to date.)
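Something along these lines on the SQL side, for example (a minimal sketch; the
mail_ix column is hypothetical, it assumes dn2id can be looked up by eid, and a
complete setup would need triggers for INSERT and DELETE as well):

-- Keep a hypothetical mail index column in dn2id in sync when a mysqld
-- updates the data table directly.
CREATE TRIGGER inetOrgPerson_mail_ix_upd
AFTER UPDATE ON inetOrgPerson
FOR EACH ROW
  UPDATE dn2id SET mail_ix = NEW.mail WHERE eid = NEW.eid;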
The big problem here is that if you want to be able to match multiple columns
in an indexed filter lookup, they must all belong to a single index. A 3
column index <a,b,c> is not the same as 3 individual indices <a>, <b>, and
<c>... With the multi-column index, you can only match filters that use those
columns in that specific order. E.g., (&(a=foo)(b=bar)(c=baz)) will work but
not (&(c=baz)(b=bar)(a=foo)). You can omit the trailing columns
(&(a=foo)(b=bar)) but you can't omit the leading columns (&(b=bar)(c=baz)).
And, while the SQL indexers support equality and inequality indexing, they
don't support substring indexing at all.
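To make the prefix rule concrete (the table and index names here are purely
illustrative):

-- A single composite index:
CREATE INDEX idx_abc ON t (a, b, c);
-- serves (&(a=foo)(b=bar)(c=baz)) and (&(a=foo)(b=bar)),
-- but not (&(c=baz)(b=bar)(a=foo)) or (&(b=bar)(c=baz)).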
Moving forward, I think the only feasible approach is to stop using
multi-column indices. We'll use individual indices, and have to perform
boolean evaluations within slapd, instead of letting the NDB engine evaluate
them. This is unfortunate because it will use a lot more network bandwidth,
but we don't have much choice. Likewise, we can't do anything about substring
indexing; it's up to MySQL to provide a solution for that.
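On the indexing side, the replacement would look something like this, with
slapd combining the per-index results itself:

-- Individual indices, one per indexed column:
CREATE INDEX idx_a ON t (a);
CREATE INDEX idx_b ON t (b);
CREATE INDEX idx_c ON t (c);
-- Each serves its own simple filter; the boolean AND/OR combinations get
-- evaluated in slapd rather than in the NDB engine.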
Other big changes going forward include replacing the current dn2id scheme
with a back-hdb style layout so that we don't waste so much space, and so we
can support subtree rename. We'll also take a stab at local caching of the
dn2id table. (That may be rather tricky; NDB has an event API that slapd can
use to listen for add/delete/modify events, but it's totally asynchronous. If
we can't work out a way to keep the cache transactionally consistent then this
just isn't going to happen.)
Mapping LDAP onto an RDBMS is still a hard problem. back-ndb implements some
pretty good solutions to a lot of the problems, but there are still things that
can be improved. The current backend works for a limited set of LDAP use
cases, and operates well even with concurrent SQL access. With the changes I
described, it will support a broader range of LDAP use cases...
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/