[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: (ITS#7667) performance degradation when using MDB_INTEGERKEY
--f46d043c7d22cec95604e479b9f8
Content-Type: text/plain; charset=UTF-8
Thanks! I was aware of little endian transformation. I did not know that
the change of insertion order affects write performance of the database
that much.
On Wed, Aug 21, 2013 at 12:54 AM, Howard Chu <hyc@symas.com> wrote:
> romange@gmail.com wrote:
>
>> --001a11c1e98008372804e46726c2
>> Content-Type: text/plain; charset=UTF-8
>>
>>
>> Hi, I extracted a small dataset that shows the problem.
>> you can download it from here:
>> https://docs.google.com/file/**d/**0B6o29pwkWoERdnFSaUtMNDljemc/**
>> edit?usp=sharing<https://docs.google.com/file/d/0B6o29pwkWoERdnFSaUtMNDljemc/edit?usp=sharing>
>>
>> I modified mdb_copy.c to demonstrate the difference. copy it to source dir
>> from here
>> https://docs.google.com/file/**d/**0B6o29pwkWoERd3VuUm1DN0FpcUU/**
>> edit?usp=sharing<https://docs.google.com/file/d/0B6o29pwkWoERd3VuUm1DN0FpcUU/edit?usp=sharing>
>>
>> build and
>> run "time ./mdb_copy foo foo2"
>> after this change the flag at line 64 and run it again.
>> at my computer the difference is 17s vs 1.7s for 3 million items.
>>
>
> This test doesn't prove the existence of a bug. You're running on a
> Little-Endian machine, therefore data that is in sorted order as a string
> is in hashed order when used as an integer. Your data insert turns into a
> worst-case insert order in this case, causing the worst possible random
> access strides through memory. Assuming the two orders to be equivalent is
> a pretty common mistake for DB programmers. Microsoft has done the same
> thing in ActiveDirectory, I mentioned it here a few years ago
> http://www.openldap.org/lists/**openldap-devel/200711/**msg00002.html<http://www.openldap.org/lists/openldap-devel/200711/msg00002.html>
>
> If you had run this test on a Big-Endian machine, like SPARC, the insert
> order would be identical either way, and INTEGERKEY result would have been
> faster.
>
> Closing this ITS, no bug.
>
>
>>
>>
>> On Tue, Aug 20, 2013 at 9:04 PM, Quanah Gibson-Mount <quanah@zimbra.com
>> >wrote:
>>
>> --On Sunday, August 18, 2013 11:46 AM +0000 romange@gmail.com wrote:
>>>
>>> Full_Name: Roman Gershman
>>>
>>>> Version:
>>>> OS: linux 3.8.0-25-generic
>>>> URL:
>>>> Submission from: (NULL) (212.150.97.210)
>>>>
>>>>
>>> Please provide further information, specifically:
>>>
>>> The size of values
>>> Insert order
>>> Sample code if possible
>>>
>>> Thanks,
>>> Quanah
>>>
>>>
>>> --
>>>
>>> Quanah Gibson-Mount
>>> Lead Engineer
>>> Zimbra, Inc
>>> --------------------
>>> Zimbra :: the leader in open source messaging and collaboration
>>>
>>>
>>
>>
>>
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/**project/<http://www.openldap.org/project/>
>
--
Best regards,
Roman
--f46d043c7d22cec95604e479b9f8
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr"><div class=3D"gmail_extra">Thanks! I was aware of little e=
ndian transformation. I did not know that the change of insertion order aff=
ects write performance of the database that much.=C2=A0</div><div class=3D"=
gmail_extra">
<br></div><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra"><=
br><div class=3D"gmail_quote">On Wed, Aug 21, <a href=3D"tel:2013" value=3D=
"+9722013" target=3D"_blank">2013</a> at 12:54 AM, Howard Chu <span dir=3D"=
ltr"><<a href=3D"mailto:hyc@symas.com" target=3D"_blank">hyc@symas.com</=
a>></span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><a href=3D"mailto:romange@gmail.com" target=
=3D"_blank">romange@gmail.com</a> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">
--001a11c1e98008372804e46726c2<br>
Content-Type: text/plain; charset=3DUTF-8<div><br>
<br>
Hi, I extracted a small dataset that shows the problem.<br>
you can download it from here:<br>
<a href=3D"https://docs.google.com/file/d/0B6o29pwkWoERdnFSaUtMNDljemc/edit=
?usp=3Dsharing" target=3D"_blank">https://docs.google.com/file/<u></u>d/<u>=
</u>0B6o29pwkWoERdnFSaUtMNDljemc/<u></u>edit?usp=3Dsharing</a><br>
<br>
I modified mdb_copy.c to demonstrate the difference. copy it to source dir<=
br>
from here<br>
<a href=3D"https://docs.google.com/file/d/0B6o29pwkWoERd3VuUm1DN0FpcUU/edit=
?usp=3Dsharing" target=3D"_blank">https://docs.google.com/file/<u></u>d/<u>=
</u>0B6o29pwkWoERd3VuUm1DN0FpcUU/<u></u>edit?usp=3Dsharing</a><br>
<br>
build and<br>
run "time ./mdb_copy foo foo2"<br>
after this change the flag at line 64 and run it again.<br>
at my computer the difference is 17s vs 1.7s for 3 million items.<br>
</div></blockquote>
<br>
This test doesn't prove the existence of a bug. You're running on a=
Little-Endian machine, therefore data that is in sorted order as a string =
is in hashed order when used as an integer. Your data insert turns into a w=
orst-case insert order in this case, causing the worst possible random acce=
ss strides through memory. Assuming the two orders to be equivalent is a pr=
etty common mistake for DB programmers. Microsoft has done the same thing i=
n ActiveDirectory, I mentioned it here a few years ago <a href=3D"http://ww=
w.openldap.org/lists/openldap-devel/200711/msg00002.html" target=3D"_blank"=
>http://www.openldap.org/lists/<u></u>openldap-devel/200711/<u></u>msg00002=
.html</a><br>
<br>
If you had run this test on a Big-Endian machine, like SPARC, the insert or=
der would be identical either way, and INTEGERKEY result would have been fa=
ster.<br>
<br>
Closing this ITS, no bug.<div><div><br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">
<br>
<br>
<br>
On Tue, Aug 20, <a href=3D"tel:2013" value=3D"+9722013" target=3D"_blank">2=
013</a> at 9:04 PM, Quanah Gibson-Mount <<a href=3D"mailto:quanah@zimbra=
.com" target=3D"_blank">quanah@zimbra.com</a>>wrote:<br>
<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">
--On Sunday, August 18, <a href=3D"tel:2013" value=3D"+9722013" target=3D"_=
blank">2013</a> 11:46 AM +0000 <a href=3D"mailto:romange@gmail.com" target=
=3D"_blank">romange@gmail.com</a> wrote:<br>
<br>
=C2=A0 Full_Name: Roman Gershman<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">
Version:<br>
OS: linux 3.8.0-25-generic<br>
URL:<br>
Submission from: (NULL) (212.150.97.210)<br>
<br>
</blockquote>
<br>
Please provide further information, specifically:<br>
<br>
The size of values<br>
Insert order<br>
Sample code if possible<br>
<br>
Thanks,<br>
Quanah<br>
<br>
<br>
--<br>
<br>
Quanah Gibson-Mount<br>
Lead Engineer<br>
Zimbra, Inc<br>
--------------------<br>
Zimbra :: =C2=A0the leader in open source messaging and collaboration<br>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
<br>
-- <br></div></div><span><font color=3D"#888888">
=C2=A0 -- Howard Chu<br>
=C2=A0 CTO, Symas Corp. =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 <a href=3D"http:=
//www.symas.com" target=3D"_blank">http://www.symas.com</a><br>
=C2=A0 Director, Highland Sun =C2=A0 =C2=A0 <a href=3D"http://highlandsun.c=
om/hyc/" target=3D"_blank">http://highlandsun.com/hyc/</a><br>
=C2=A0 Chief Architect, OpenLDAP =C2=A0<a href=3D"http://www.openldap.org/p=
roject/" target=3D"_blank">http://www.openldap.org/<u></u>project/</a><br>
</font></span></blockquote></div><br><br clear=3D"all"><div><br></div>-- <b=
r>Best regards,<br>=C2=A0 =C2=A0=C2=A0 Roman
</div></div>
--f46d043c7d22cec95604e479b9f8--