
Re: How Indexes work?

To: Bjørn Ruberg <bjorn@ruberg.no>
Subject: Re: How Indexes work?
From: Steeg Carson <steeg.carson@googlemail.com>
Date: Thu, 30 Dec 2010 19:53:52 +0100
Cc: openldap-technical@openldap.org
In-reply-to: <4D0BE2E8.2040002@ruberg.no>
References: <AANLkTikhdKJ5s5JhshOiC8erF3xc58_O+RMKbh=nJsgU@mail.gmail.com> <4D0B4756.4020303@ruberg.no> <AANLkTimvEsUBfqASbPU3_Pz0O7bmQsHGHT5b_og05yKV@mail.gmail.com> <4D0BE2E8.2040002@ruberg.no>

Hello,

I spent some more time investigating the problem.

First, I set up a 64-bit test machine with 16 GB RAM and 2 CPUs under
VMware ESX, with dedicated SAS storage (RAID 10) used only by this machine.
I configured slapd.conf as follows:


#######################################################################
include         /etc/openldap/schema/core.schema
include         /etc/openldap/schema/cosine.schema
include         /etc/openldap/schema/inetorgperson.schema
include         /etc/openldap/schema/rfc2307bis.schema
include         /etc/openldap/schema/own.schema

pidfile         /var/run/slapd/slapd.pid
argsfile        /var/run/slapd/slapd.args

modulepath      /usr/lib/ldap
moduleload      back_hdb

sizelimit               -1
timelimit               300
disallow bind_anon

gentlehup on
tool-threads 2

# hdb database definitions

database        hdb
suffix          "ou=root"
rootdn          "uid=admin,ou=root"
checkpoint  4096 15
# loglevel only for test, not during time measuring
loglevel       33
rootpw   password
directory       /var/lib/ldap_hdb
logfile         /var/log/openldap.log
cachesize 1000000
dncachesize 1000000
idlcachesize 3000000
dbnosync

index objectClass,entryUUID,entryCSN     eq
index subEngine eq
index cn eq,sub
#######################################################################

The database uses the hdb backend.

In DB_CONFIG I set a 2 GB BDB page cache (set_cachesize 2 0 1).
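For reference, Berkeley DB reads set_cachesize as <gbytes> <bytes> <ncache>, so "2 0 1" asks for 2 GB in a single cache region. Below is a minimal Python sketch of that arithmetic. The helper name parse_set_cachesize is hypothetical, and it only decodes the numbers. It does not model what BDB actually allocates.

```python
# Hypothetical decoder for a DB_CONFIG "set_cachesize" line.
# Assumed format: set_cachesize <gbytes> <bytes> <ncache>
def parse_set_cachesize(line):
    keyword, gbytes, nbytes, ncache = line.split()
    if keyword != "set_cachesize":
        raise ValueError("not a set_cachesize line: %r" % line)
    total_bytes = int(gbytes) * 1024**3 + int(nbytes)  # GB part plus byte part
    return total_bytes, int(ncache)                     # (size, cache regions)

total, regions = parse_set_cachesize("set_cachesize 2 0 1")
print(total, regions)  # 2147483648 1
```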


The entire directory holds 470812 entries.

(=> ldapsearch -x -h localhost -wpassword -D"uid=admin,ou=root"
-b"ou=root" "(objectClass=*)" dn | grep "^dn:" | wc -l)


Task:

I search for objects of a specific objectClass (subEngine), but only
within one dedicated container (set via the base DN).

The objectClass "subEngine" occurs 104384 times in the entire directory:
(=> ldapsearch -x -h localhost -D"uid=admin,ou=root" -b"ou=root"
"(ObjectClass=subEngine)" dn | grep "^dn:" | wc -l)

But only one entry with objectClass "subEngine" exists in the dedicated container.

When I run the search:
ldapsearch -x -h localhost -wpassword  -D"uid=admin,ou=root"
-b"cn=ownPath,ou=root" "(ObjectClass=subEngine)"

I see the following in the log file:

=> bdb_equality_candidates (objectClass)
=> key_read
<= bdb_index_read 470601 candidates
<= bdb_equality_candidates: id=-1, first=228, last=470828
<= bdb_filter_candidates: id=-1 first=228 last=470828
<= bdb_list_candidates: id=-1 first=228 last=470828
<= bdb_filter_candidates: id=-1 first=228 last=470828
<= bdb_list_candidates: id=-1 first=40595 last=470828
<= bdb_filter_candidates: id=-1 first=40595 last=470828
bdb_search_candidates: id=-1 first=40595 last=470828

What do these messages mean?

I can't see how they relate to the directory.
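Here is one way to read the trace. In back-bdb/back-hdb, an ID list (IDL) whose first slot is NOID (printed as id=-1) appears to be a range rather than an explicit list. Such a range stands for every ID from first to last inclusive. Under that reading, the "470601 candidates" reported by bdb_index_read is just the width of 228..470828 (470828 - 228 + 1). It is not the number of subEngine entries. The small Python sketch below decodes the trace lines under this assumption. The helper idl_size is hypothetical.

```python
import re

# Matches the "id=... first=... last=..." tail of the trace lines,
# with or without the commas that bdb_equality_candidates prints.
IDL_RE = re.compile(r"id=(-?\d+),?\s+first=(\d+),?\s+last=(\d+)")

def idl_size(line):
    """Number of IDs a logged IDL covers.

    Assumption: id=-1 (NOID) marks a range IDL covering first..last
    inclusive; any other id value is the length of an explicit list.
    """
    m = IDL_RE.search(line)
    if m is None:
        raise ValueError("no IDL summary in: %r" % line)
    n, first, last = (int(g) for g in m.groups())
    if n == -1:
        return last - first + 1  # range: every ID between first and last
    return n                     # explicit list of n IDs

print(idl_size("<= bdb_equality_candidates: id=-1, first=228, last=470828"))  # 470601
```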

Searching the log file further, I find 430233 messages like:
"hdb_search: <candidate> <message: does not match filter | scope not
okay>"

So the 430233 comes from 470828 - 40595 = 430233. Why so many candidate checks?

Shouldn't the index for objectClass=subEngine hold only 104384 entries?
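A plausible explanation depends on back-bdb's default limit of 2^16 - 1 IDs per index slot (BDB_IDL_DB_MAX in back-bdb's idl.h, assumed here). 104384 IDs do not fit under that limit. The slot for objectClass=subEngine would then be stored only as a range from the lowest to the highest matching ID, and the individual IDs would be lost. The Python model below is a hypothetical illustration of that overflow. The ID distribution is invented; only the count (104384) and the endpoints (228, 470828) come from the figures above.

```python
# Hypothetical model of one index slot. The 65535 limit mirrors the
# assumed back-bdb default BDB_IDL_DB_MAX (2**16 - 1).
IDL_DB_MAX = 2**16 - 1

def store_slot(ids, max_ids=IDL_DB_MAX):
    """Return ("list", sorted_ids) or, on overflow, ("range", (first, last))."""
    ids = sorted(set(ids))
    if len(ids) > max_ids:
        return ("range", (ids[0], ids[-1]))  # only the endpoints survive
    return ("list", ids)

# 104384 invented IDs whose lowest is 228 and highest is 470828.
sub_engine_ids = list(range(228, 228 + 104383)) + [470828]
print(len(sub_engine_ids), store_slot(sub_engine_ids))  # 104384 ('range', (228, 470828))
```

A slot that stays under the limit, e.g. store_slot([5, 3, 9]), keeps its explicit list [3, 5, 9].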

What are these values, and how is the search carried out?
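In back-bdb, the filter candidates appear to be intersected with the scope candidates (for hdb, the IDs below the base DN). The Python sketch below shows these intersection rules, under the following assumptions:

- two ranges combine into a narrower range;
- a range and a list keep the listed IDs that fall inside the range;
- two lists keep their common IDs.

The scope range used below is hypothetical. The log only shows the result, 40595..470828.

```python
def intersect(a, b):
    """Intersect two IDLs; each is ("range", (first, last)) or ("list", ids)."""
    kind_a, val_a = a
    kind_b, val_b = b
    if kind_a == "range" and kind_b == "range":
        first, last = max(val_a[0], val_b[0]), min(val_a[1], val_b[1])
        return ("range", (first, last)) if first <= last else ("list", [])
    if kind_a == "range":
        return intersect(b, a)          # put the explicit list first
    if kind_b == "range":
        lo, hi = val_b                  # list vs range: keep IDs inside it
        return ("list", [i for i in val_a if lo <= i <= hi])
    return ("list", sorted(set(val_a) & set(val_b)))  # list vs list

def size(idl):
    kind, val = idl
    return val[1] - val[0] + 1 if kind == "range" else len(val)

filter_idl = ("range", (228, 470828))    # objectClass=subEngine, from the log
scope_idl = ("range", (40595, 471000))   # hypothetical subtree range
cands = intersect(filter_idl, scope_idl)
print(cands, size(cands))  # ('range', (40595, 470828)) 430234
```

The inclusive range 40595..470828 holds 430234 IDs. That is one more than the 430233 "does not match" messages, which would be consistent with the single subEngine entry in the container being the one candidate that matched.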

I guess the first step is the index lookup. But the index holds only the
entry IDs and knows nothing about the DN of each entry.
So in the next step, are all the IDs from the index used to query
id2entry.bdb and check the DN?
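Below is a toy model of that last step. It assumes each candidate ID is fetched from id2entry (or the entry cache) and then tested against both scope and filter. The following are all hypothetical:

- the dict standing in for id2entry;
- the DNs;
- the simplified suffix-based scope check.

Real slapd normalizes DNs and values before comparing.

```python
# Hypothetical stand-in for id2entry: ID -> (dn, set of objectClass values).
id2entry = {
    1: ("cn=ownPath,ou=root", {"container"}),
    2: ("cn=a,cn=ownPath,ou=root", {"subEngine"}),
    3: ("cn=b,cn=other,ou=root", {"subEngine"}),
    4: ("cn=c,cn=ownPath,ou=root", {"person"}),
}

def in_subtree(dn, base):
    # Simplified scope check: DN equals the base or ends with ",<base>".
    return dn == base or dn.endswith("," + base)

def search(candidates, base, oc):
    matches, rejected = [], 0
    for eid in candidates:          # every candidate ID is visited
        entry = id2entry.get(eid)
        if entry is None:           # ID inside a range with no entry: skipped here
            continue
        dn, classes = entry
        if in_subtree(dn, base) and oc in classes:
            matches.append(dn)
        else:
            rejected += 1           # "does not match filter | scope not okay"
    return matches, rejected

print(search(range(1, 5), "cn=ownPath,ou=root", "subEngine"))
# (['cn=a,cn=ownPath,ou=root'], 3)
```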

The first time, this search takes about 40 seconds (with logging turned
off!). During this time, iotop shows a heavy write load from slapd
(about 25 MB/s).

Once the cache is filled, the same search takes about 2 seconds (again
with logging turned off!).

The only difference in the logs between the first and the second search
is that, in the first search's log, each hdb_search line is accompanied by

 entry_decode: ""
 <= entry_decode()
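The difference between the first and second runs fits a simple cache model. On a miss, the entry is read from id2entry.bdb and decoded (entry_decode). With cachesize 1000000, the entry then stays in the entry cache. The second run therefore skips decoding but still tests every candidate. Below is a hypothetical LRU sketch in Python; ToyEntryCache is not slapd's actual cache implementation.

```python
from collections import OrderedDict

class ToyEntryCache:
    """Hypothetical LRU model of an entry cache; not slapd's real code."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.decodes = 0                      # simulated entry_decode calls

    def get(self, eid):
        if eid in self.entries:
            self.entries.move_to_end(eid)     # mark as most recently used
            return self.entries[eid]
        self.decodes += 1                     # miss: read and decode from disk
        self.entries[eid] = eid               # placeholder for a decoded entry
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return self.entries[eid]

def run_search(cache, candidates):
    tested = 0
    for eid in candidates:
        cache.get(eid)
        tested += 1                           # the filter test runs either way
    return tested

cache = ToyEntryCache(capacity=1_000_000)
candidates = range(40595, 470829)             # 40595..470828 inclusive
first = (run_search(cache, candidates), cache.decodes)
second = (run_search(cache, candidates), cache.decodes - first[1])
print(first, second)  # (430234, 430234) (430234, 0)
```

With a capacity smaller than the candidate set, a repeated sequential scan evicts each entry before it is reused, so every lookup decodes again. For example, ToyEntryCache(2) run twice over IDs [1, 2, 3] decodes six times.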

But in the log of the second search, hdb_search still appears 430233 times!

Is this correct?


Thanks in advance

Steeg Carson

2010/12/17 Bjørn Ruberg <bjorn@ruberg.no>:
> Steeg Carson:
>>
>> 2010/12/17 Bjørn Ruberg<bjorn@ruberg.no>:
>>>
>>> Steeg Carson:
>>> [...]
>>>
>>>> I have a database, and my search is like shown above. The search takes
>>>> long.
>>>
>>> Did you run slapindex after adding the index? Is the index file owned by
>>> the proper user account?
>
> You didn't answer the above question...
>
> [...]
>
>>>> But what I see is that the write I/O from LDAP is enormous (seen
>>>> with iotop). During the whole search, the write I/O is higher than the
>>>> read I/O.
>>>> Why?
>>>
>>> What is slapd's current loglevel?
>>
>> loglevel is 0
>> I know I should rather use 256, but for this reason I switched off
>> logging :-( for testing.
>
> I asked because you said there's much *write* activity. If there's no
> logging, something else must be writing and you should find out what it is.
> This is probably the reason why the search is slow.
>
> [...]
>
>> But what about the first part of my question in this posting: how
>> is an ldapsearch processed?
>> Does slapd search the whole database despite the indexes?
>
> I'm no authority on this, but generally the main purpose of using indexes is
> -not- having to do a full scan. This will of course require that the index
> has been properly built (see above).
>
> If your original statement is still correct - that is, you've built an "eq"
> index (equality, exact match) and you search for the exact value - the index
> should have made a difference.
>
> However, if you've built an equality index and then search for a substring,
> the index will not speed it up.
>
> As a side note, you should be aware that while most attributes can be
> indexed with "eq", some attributes won't allow substring indexing.
>
> Hope this helps.
>
> --
> Bjørn
>

Follow-Ups:
- Re: How Indexes work?
  - From: Dieter Klünter <dieter@dkluenter.de>

References:
- How Indexes work?
  - From: Steeg Carson <steeg.carson@googlemail.com>
- Re: How Indexes work?
  - From: Bjørn Ruberg <bjorn@ruberg.no>
- Re: How Indexes work?
  - From: Steeg Carson <steeg.carson@googlemail.com>
- Re: How Indexes work?
  - From: Bjørn Ruberg <bjorn@ruberg.no>
