Re: Accessing random rows from LMDB
On 09/02/15 20:16, Sravan Kumar Reddy Javaji wrote:
1) Is there any way that I can find the total number of records in LMDB?
mdb_stat -a <database>.
2) Can I access all the rows from LMDB randomly instead of sequentially?
(...)
No.
I know that it is better to read sequentially from LMDB and then later
randomize the records. But I have around 1 million records in LMDB, and I
can't load the entire data set into memory at once. I am planning to read
the data into memory batch-wise and perform some operation on it. So I am
wondering: is there any way that I can read the data randomly from LMDB
directly?
Make a random permutation of the integers [1..number of records].
Walk the DB with mdb_cursor_get() (MDB_FIRST, then MDB_NEXT) and
associate each record with an ID from the permutation. Or something
like that.
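
A minimal sketch of the permutation step, assuming the record count
fits in 32 bits. The names make_permutation, random_below and
splitmix64 are illustrative helpers, not part of LMDB, and any decent
random number generator could replace the one used here.

```c
#include <stdint.h>
#include <stdlib.h>

/* splitmix64: a small 64-bit generator, used only to keep this self-contained. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform integer in [0, bound), bound > 0. Rejection avoids modulo bias. */
static uint64_t random_below(uint64_t *state, uint64_t bound)
{
    uint64_t limit = UINT64_MAX - (UINT64_MAX % bound);
    uint64_t r;
    do {
        r = splitmix64(state);
    } while (r >= limit);
    return r % bound;
}

/*
 * Returns a malloc'd array holding the integers 1..n in random order
 * (Fisher-Yates shuffle), or NULL if n == 0 or allocation fails.
 * The caller frees the array.
 */
uint32_t *make_permutation(uint32_t n, uint64_t seed)
{
    if (n == 0)
        return NULL;
    uint32_t *ids = malloc((size_t)n * sizeof *ids);
    if (ids == NULL)
        return NULL;
    for (uint32_t i = 0; i < n; i++)
        ids[i] = i + 1;
    for (uint32_t i = n; i > 1; i--) {
        uint32_t j = (uint32_t)random_below(&seed, i);  /* 0 <= j < i */
        uint32_t tmp = ids[i - 1];
        ids[i - 1] = ids[j];
        ids[j] = tmp;
    }
    return ids;
}
```

During the cursor walk, the k-th record visited (counting from 0)
would then get ids[k] as its ID.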
To avoid massacring your cache, avoid following the data.mv_data
pointer at this stage. (Only relevant when nodes are > 1/2 OS page
so the data items are stored in overflow pages rather than next to
the keys.) The exception is if you preprocess your entries and write
them to a file at this stage; in that case just record
(file position, size) for each one.
Now process your records ordered by ID, that'll be your random walk.
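
As a rough sketch of that last step, assuming the preprocessing
variant where each record was written to a file and tagged with its
permutation ID: sort the small (ID, file position, size) entries by
ID and read the file in that order. The struct record_ref and the
function sort_by_id are hypothetical names chosen for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* One entry per record: its permutation ID and where it sits in the file. */
struct record_ref {
    uint32_t id;        /* ID taken from the random permutation */
    uint64_t file_pos;  /* byte offset of the preprocessed record */
    uint32_t size;      /* length of the preprocessed record in bytes */
};

/* qsort comparator: ascending by ID, without overflow-prone subtraction. */
static int compare_by_id(const void *a, const void *b)
{
    const struct record_ref *x = a;
    const struct record_ref *y = b;
    return (x->id > y->id) - (x->id < y->id);
}

/* Orders the entries by ID; visiting them in this order is the random walk. */
void sort_by_id(struct record_ref *refs, size_t count)
{
    qsort(refs, count, sizeof *refs, compare_by_id);
}
```

After sorting, a batch is simply a run of consecutive entries: seek
to file_pos, read size bytes, and move on to the next entry.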
I don't know what "associate a record with an ID" will mean for you.
If you have a read-only copy of your database, maybe just build
a 32 Mbyte array of (offset of key, size, offset of data, size)
for each record, save that to a file, and bypass LMDB. The offsets
would be relative to MDB_envinfo.me_mapaddr. Otherwise, maybe build a
named database with {key = record ID, data = original key}.
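
One possible layout for that array, assuming each of the four values
is stored as a 64-bit integer, which gives 32 bytes per record and
so 32 Mbyte for 1 million records. The names index_entry,
make_index_entry and index_bytes are made up for this sketch; base
stands for whatever map address the offsets are taken relative to.

```c
#include <stddef.h>
#include <stdint.h>

/* (offset of key, size, offset of data, size) for one record. */
struct index_entry {
    uint64_t key_off;    /* key position, in bytes from the map base */
    uint64_t key_size;
    uint64_t data_off;   /* data position, in bytes from the map base */
    uint64_t data_size;
};

/*
 * Builds one entry from the pointers a cursor walk hands back.
 * key and data must point inside the mapping that starts at base.
 */
struct index_entry make_index_entry(const void *base,
                                    const void *key, size_t key_size,
                                    const void *data, size_t data_size)
{
    struct index_entry e;
    e.key_off = (uint64_t)((const char *)key - (const char *)base);
    e.key_size = (uint64_t)key_size;
    e.data_off = (uint64_t)((const char *)data - (const char *)base);
    e.data_size = (uint64_t)data_size;
    return e;
}

/* Size in bytes of an index covering the given number of records. */
uint64_t index_bytes(uint64_t records)
{
    return records * (uint64_t)sizeof(struct index_entry);
}
```

Only the pointer values are used here, so building an entry does not
touch the data pages themselves.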