back-ndb design notes
- To: OpenLDAP Devel <openldap-devel@openldap.org>
- Subject: back-ndb design notes
- From: Howard Chu <hyc@symas.com>
- Date: Sun, 19 Oct 2008 02:14:41 -0700
- User-agent: Mozilla/5.0 (X11; U; Linux x86_64; rv:1.9.1b1pre) Gecko/20081015 SeaMonkey/2.0a1pre
While looking over the Admin Guide I was re-reading section 1.8 (LDAP vs
RDBMS) and thought it would be worth describing the approach we took and
tradeoffs we made. (Disclaimer: some of these decisions need to be changed.)
(Also for more background: NDB is a cluster-based storage engine for MySQL. It
runs as its own set of processes across the nodes of a cluster; all accesses
to it are through remote procedure calls.)
The basic approach starts with a rough adaptation of back-bdb to NDB. That is,
we use one table to map DNs to numeric entryIDs, and use this numeric ID as
the primary key to access the main entry data. But instead of using a
monolithic id2entry table (and treating all the LDAP data as blobs) we start
from the notion of a table per objectclass.
For objectclasses that inherit from others, only the unique attributes of the
subordinate class reside in that class's table. This is one of the underlying
requirements - an LDAP attribute must only appear in one table. For attributes
that appear in unrelated LDAP objectclasses, you must configure them into
"attribute sets" (which are yet another set of tables) to ensure that their
column definition is used uniformly by all the objectclasses.
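To make that concrete, here's a rough SQL sketch of the layout. The table and
column definitions are invented for illustration, and this reflects only the
single-valued scheme described so far (the refinement for multivalued
attributes comes later):

-- One table per objectclass, keyed by the numeric entryID.
CREATE TABLE person (
  eid BIGINT UNSIGNED NOT NULL PRIMARY KEY,  -- entryID from dn2id
  cn  VARCHAR(128),
  sn  VARCHAR(128)
) ENGINE=NDBCLUSTER;

-- inetOrgPerson inherits from person, so only its own attributes live here.
CREATE TABLE inetOrgPerson (
  eid         BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  mail        VARCHAR(128),
  displayName VARCHAR(128)
) ENGINE=NDBCLUSTER;

-- An "attribute set" table for an attribute shared by unrelated classes,
-- so the attribute still has exactly one column definition.
CREATE TABLE attrset_telephone (
  eid             BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  telephoneNumber VARCHAR(32)
) ENGINE=NDBCLUSTER;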
Note that this means to retrieve the complete information for a particular
entry, we have to pull from many tables at once, and each requires a separate
NDB operation. The NDB API allows many operations to be batched together and
executed in parallel, so we can do all of these fetches in a single roundtrip.
(Otherwise, performance for this approach would be pretty abominable.) The
Admin Guide notes that this approach can be very poor because accessing many
tables at once requires a lot of disk seeks. This consideration doesn't apply
to NDB because each node of the cluster always caches everything completely in
RAM.
However, we first need to know which tables to fetch from, i.e., we need to
consult a table to get a list of objectclasses, before we can spit out the
batched request to all the relevant tables. To avoid having to visit so many
different tables sequentially, we store the objectclass attribute in the dn2id
table. (We obviously have to visit the dn2id table anyway. Just as with
back-bdb, where accessing any entry by name requires first a dn2id lookup
followed by an id2entry lookup, here we must do the dn2id lookup followed by
the parallel lookups across all of the class/set tables.)
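In SQL terms (the real lookups go through batched NDB API operations, not SQL,
and the column names and RDN ordering shown here are invented), reading one
entry by name looks roughly like this:

-- Roundtrip 1: the dn2id lookup yields the entryID plus the objectclass list.
SELECT eid, ocs FROM dn2id
 WHERE rdn1 = 'dc=com' AND rdn2 = 'dc=example' AND rdn3 = 'uid=jdoe';

-- Roundtrip 2: knowing the classes, fetch from every relevant class/set
-- table; the NDB API batches these reads and runs them in parallel.
SELECT * FROM person        WHERE eid = 42;
SELECT * FROM inetOrgPerson WHERE eid = 42;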
For as much functionality as that covers, things are pretty straightforward.
The dn2id table has some interesting issues as well. I think it's a poor
design, but it was the best we could do under our initial constraints.
Basically it has a 16 column primary key, where each rdn occupies a column.
This approach makes it fairly easy to obtain base, subtree, and children
scoping, but it suffers from the same problems as back-bdb and back-ldbm.
(Bloated, no subtree rename... Plus, it imposes a hard limit of 16 levels on
tree depth. It's probably an acceptable limit, but worth noting that none of
our other backends have such a limit.)
The entryID is just an integer column, and the objectclass attribute is a
single column containing all of the classes concatenated as a space-delimited
string. E.g., for an entry with
objectClass: person
objectClass: inetOrgPerson
the column will contain " person inetOrgPerson "
This format allows us to perform equality searches on objectclass using an SQL
LIKE filter. (objectclass=foo) -> objectclass like "% foo %"
(Filters aren't implemented using actual SQL, but the NDB API filter mechanism
provides the equivalent filtering functionality.)
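As a sketch (again with invented names and sizes, and assuming the unused rdn
columns are simply left empty for shallower entries), the dn2id layout amounts
to something like this:

CREATE TABLE dn2id (
  rdn1  VARCHAR(128) NOT NULL,
  rdn2  VARCHAR(128) NOT NULL,
  -- ... rdn3 through rdn15 elided ...
  rdn16 VARCHAR(128) NOT NULL,
  eid   BIGINT UNSIGNED NOT NULL,  -- numeric entryID
  ocs   VARCHAR(1024),             -- e.g. " person inetOrgPerson "
  PRIMARY KEY (rdn1, rdn2, /* ..., */ rdn16)
) ENGINE=NDBCLUSTER;

-- (objectclass=inetOrgPerson) becomes a LIKE test on the ocs column:
SELECT eid FROM dn2id WHERE ocs LIKE '% inetOrgPerson %';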
As mentioned, the numeric entryID is used as the key of the main data tables.
Actually, using the entryID alone as the primary key was just our initial
approach, and it only supported single-valued attributes. Now we support
multivalued attributes by using a
two-column primary key, <entryID,value#>. So there's a "value" column with
values from 0 to N-1 for an attribute with N values, and each value occupies a
separate row. The NDB API allows us to do partially-specified primary key
lookups, so we can do a lookup on just the entryID and automatically get all
of the values/rows associated with that entryID in a single operation.
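The class tables sketched earlier then look more like this (still with made-up
names), and the SQL equivalent of the partially-specified key lookup is just a
query on eid alone:

-- One row per attribute value, numbered 0..N-1 in the vid column.
CREATE TABLE person (
  eid BIGINT UNSIGNED NOT NULL,
  vid INT UNSIGNED NOT NULL,   -- value number within the entry
  cn  VARCHAR(128),
  sn  VARCHAR(128),
  PRIMARY KEY (eid, vid)
) ENGINE=NDBCLUSTER;

-- All the value rows for one entry come back from a single lookup on eid.
SELECT vid, cn, sn FROM person WHERE eid = 42;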
Search indexing is currently a mess, and will need to be rewritten. We arrived
at the current approach based on a couple of main constraints:
1) this backend will do no local caching
2) we must always retrieve the desired data in only 2 API roundtrips
#1 was based on the requirement that multiple slapds be able to operate
concurrently on the database, along with multiple mysqlds. We didn't have time
to design a distributed cache synchronization mechanism, therefore, no local
caching.
#2 was based on the simple fact that taking more than 2 roundtrips would
destroy performance.
The current solution uses the regular SQL table indexing features, but the
indices are stored in the dn2id table, not in the main data tables. This means
we can do an index lookup and obtain an entry's DN and objectclasses in a
single roundtrip, then a single roundtrip for the main data, so it meets our
#2 constraint. One problem with this is that a single table can only
accommodate rows up to 8192 bytes wide, and we can easily max that out when
multiple indices are needed. Another problem is that updating indexed
attribute values requires writes to both the main data tables and the dn2id
table. While back-ndb handles this itself, mysql daemons writing to the data
tables won't know about this requirement. (We need to use a trigger/stored
procedure to make sure the SQL side keeps the indices up to date.)
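Something along these lines on the SQL side, for example (a minimal sketch; the
mail_ix column is hypothetical, it assumes dn2id can be looked up by eid, and a
complete setup would need triggers for INSERT and DELETE as well):

-- Keep a hypothetical mail index column in dn2id in sync when a mysqld
-- updates the data table directly.
CREATE TRIGGER inetOrgPerson_mail_ix_upd
AFTER UPDATE ON inetOrgPerson
FOR EACH ROW
  UPDATE dn2id SET mail_ix = NEW.mail WHERE eid = NEW.eid;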
The big problem here is that if you want to be able to match multiple columns
in an indexed filter lookup, they must all belong to a single index. A 3
column index <a,b,c> is not the same as 3 individual indices <a>, <b>, and
<c>... With the multi-column index, you can only match filters that use those
columns in that specific order. E.g., (&(a=foo)(b=bar)(c=baz)) will work but
not (&(c=baz)(b=bar)(a=foo)). You can omit the trailing columns
(&(a=foo)(b=bar)) but you can't omit the leading columns (&(b=bar)(c=baz)).
And, while the SQL indexers support equality and inequality indexing, they
don't support substring indexing at all.
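To make the prefix rule concrete (the table and index names here are purely
illustrative):

-- A single composite index:
CREATE INDEX idx_abc ON t (a, b, c);
-- serves (&(a=foo)(b=bar)(c=baz)) and (&(a=foo)(b=bar)),
-- but not (&(c=baz)(b=bar)(a=foo)) or (&(b=bar)(c=baz)).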
Moving forward, I think the only feasible approach is to stop using
multi-column indices. We'll use individual indices, and have to perform
boolean evaluations within slapd, instead of letting the NDB engine evaluate
them. This is unfortunate because it will use a lot more network bandwidth,
but we don't have much choice. Likewise, we can't do anything about substring
indexing; it's up to MySQL to provide a solution for that.
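On the indexing side, the replacement would look something like this, with
slapd combining the per-index results itself:

-- Individual indices, one per indexed column:
CREATE INDEX idx_a ON t (a);
CREATE INDEX idx_b ON t (b);
CREATE INDEX idx_c ON t (c);
-- Each serves its own simple filter; the boolean AND/OR combinations get
-- evaluated in slapd rather than in the NDB engine.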
Other big changes going forward include replacing the current dn2id scheme
with a back-hdb style layout so that we don't waste so much space, and so we
can support subtree rename. We'll also take a stab at local caching of the
dn2id table. (That may be rather tricky; NDB has an event API that slapd can
use to listen for add/delete/modify events, but it's totally asynchronous. If
we can't work out a way to keep the cache transactionally consistent then this
just isn't going to happen.)
Mapping LDAP onto an RDBMS is still a hard problem. back-ndb implements some
pretty good solutions to a lot of the problems, but there are still things that
can be improved. The current backend works for a limited set of LDAP use
cases, and operates well even with concurrent SQL access. With the changes I
described, it will support a broader range of LDAP use cases...
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/