Hi Sir/Madam,
Before trying LMDB, I simply stacked all the features
together into one huge binary file and used the seek function in C++ to
access each feature. Since the feature size is fixed, I can easily
compute the offset of each feature in the file.
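
A minimal sketch of this offset-based access, for reference. The names
feature_offset, read_feature and kFeatureSize are hypothetical and only
illustrate the idea; the 16 kB feature size is taken from the description
below, and the real code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Assumed for illustration: every feature is exactly 16 kB.
constexpr std::size_t kFeatureSize = 16 * 1024;

// Byte offset of the feature with the given zero-based index.
constexpr std::uint64_t feature_offset(std::uint64_t index,
                                       std::size_t feature_size = kFeatureSize) {
    return index * feature_size;
}

// Reads one feature from the flat binary file.
// Returns an empty vector if the file cannot be opened or the read is short.
std::vector<char> read_feature(const std::string& path, std::uint64_t index,
                               std::size_t feature_size = kFeatureSize) {
    assert(feature_size > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    in.seekg(static_cast<std::streamoff>(feature_offset(index, feature_size)));
    std::vector<char> buf(feature_size);
    in.read(buf.data(), static_cast<std::streamsize>(feature_size));
    if (in.gcount() != static_cast<std::streamsize>(feature_size)) return {};
    return buf;
}
```

Each lookup here costs one seek plus one contiguous read of feature_size
bytes, with no index structure to traverse.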
Then I tried LMDB. The value is the feature as it
is. The keys are "1", "2", "3", .... Since each feature is 16 kB, exactly 4 x
page_size, adding the key and header makes each feature occupy 5 x
page_size, so the db file on disk is about 1.25 times the size of the
previous binary file. This is already a disadvantage for LMDB, but
I still hoped for some efficiency gain in return. I use the LMDB++
C++ wrapper to access the features.
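
The size arithmetic above can be checked with a small sketch. It assumes a
4096-byte page (implied by 16 kB = 4 x page_size) and a per-value overhead
of 16 bytes; the exact overhead is an assumption, and any value between 1
byte and one page gives the same result. The helper names pages_needed and
size_ratio are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of whole pages needed to hold a value plus its per-value overhead.
constexpr std::size_t pages_needed(std::size_t value_size, std::size_t overhead,
                                   std::size_t page_size) {
    return (value_size + overhead + page_size - 1) / page_size;
}

// On-disk bytes used per value, relative to the raw value size.
double size_ratio(std::size_t value_size, std::size_t overhead,
                  std::size_t page_size) {
    assert(value_size > 0 && page_size > 0);
    const std::size_t bytes = pages_needed(value_size, overhead, page_size) * page_size;
    return static_cast<double>(bytes) / static_cast<double>(value_size);
}
```

With value_size = 16384, overhead = 16 and page_size = 4096, pages_needed
gives 5 and size_ratio gives 1.25, which matches the observed file size.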
Next, I compared the two approaches by accessing the
same random 1% of the features out of about 300k features. Before the test,
I used vmtouch to evict both files from the memory cache. The result is
surprising: the LMDB version takes 1.5 times as long as the raw
binary file (30s vs 20s).
Is this because of the feature size (exactly 4
pages)? Am I misunderstanding how LMDB should be used?
Thank you for your time!
Best Regards, Tao Chen