Re: py-lmdb
Luke Kenneth Casson Leighton wrote:
We fell for the fantasy of parallel writes with BerkeleyDB, but after a
dozen+ years of poking, profiling, and benchmarking, it all became clear:
all of that locking overhead and deadlock detection/recovery is just a waste of
resources.
... which is why tdb went to the other extreme, to show it could be done.
But even tdb only allows one write transaction at a time. I looked into
writing a back-tdb for OpenLDAP back in 2009, before I started writing LMDB. I
know pretty well how tdb works...
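LMDB keeps the same restriction: only one write transaction at a time, while
readers run without locks against a snapshot. A minimal py-lmdb sketch (the
path and map_size here are made up purely for illustration):

    import lmdb

    # path and map_size are arbitrary, for illustration only
    env = lmdb.open('/tmp/example-db', map_size=1 << 30)

    # only one write transaction may be open at a time; a second writer
    # blocks until this one commits or aborts
    with env.begin(write=True) as txn:
        txn.put(b'key1', b'value1')

    # readers see a consistent snapshot and never block the writer
    with env.begin() as txn:
        print(txn.get(b'key1'))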
https://twitter.com/hyc_symas/status/451763166985613312
quote:
"The new code is faster at indexing and searching, but not so much faster it
would blow you away, even using LMDB. Turns out the slowness of Python looping
trumps the speed of a fast datastore :(. The difference might be bigger on a
big index; I'm going to run experiments on the Enron dataset and see."
Interesting. So why is the read rate up at 5,000,000 per second under Python
(in a Python loop, obviously) while the write rate isn't? Something odd there.
Good question. I'd guess there's some memory allocation overhead involved in
writes. The Whoosh guys have some more perf stats here
https://bitbucket.org/mchaput/whoosh/wiki/Whoosh3
(Their test.Tokyo / All Keys result is highly suspect, though: the timing is
the same for 100,000 keys as for 1M keys. Probably a bug in their test code.)
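
To make the guess concrete, here's roughly the kind of loop I'd expect is being
timed, sketched with py-lmdb (the path, key count, and value size are
invented). With buffers=True the reads come back as views into the mapped
file, while every put has to copy the key and value in, so the write side
carries the extra allocation work:

    import time
    import lmdb

    env = lmdb.open('/tmp/bench-db', map_size=1 << 30)
    N = 1000000

    # write loop: each put copies key+value into the map inside one big txn
    with env.begin(write=True) as txn:
        t0 = time.time()
        for i in range(N):
            txn.put(b'%09d' % i, b'x' * 16)
        print('puts/sec ', int(N / (time.time() - t0)))

    # read loop: buffers=True returns memoryviews into the mapped file,
    # so the per-key work is mostly just the Python loop itself
    with env.begin(buffers=True) as txn:
        t0 = time.time()
        n = sum(1 for _ in txn.cursor())
        print('reads/sec', int(n / (time.time() - t0)))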
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/