Hello,
Working with overflow pages directly via pointers outside write transactions works great, and it helps that they do not move "by design" in current versions, as discussed earlier in this thread.
I have two related scenarios that would give a substantial performance boost in my case.
The first one is updating a value in place via a pointer from an aborted write transaction. If I
1) use MDB_WRITEMAP,
2) from a **write** transaction find a record (which is small and not in an overflow page),
3) modify a part of its value (for duplicates this part is not used in the compare function) directly via the MDB_val data pointer (e.g. an interlocked increment or compare-and-swap),
4) and **abort** the transaction,
then readers see the updated value via normal read transactions later. Since I do the direct updates from inside a write transaction, all other writers should be locked out until I exit the transaction (abort, in this case), and no pages should move since the transaction is aborted. Is this correct? Does this work "by design" or "by accident" currently?
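To make this concrete, here is a minimal sketch of the first scenario (error handling trimmed; bump_counter is a made-up name, and the assumption that the value starts with a 64-bit counter is just an illustration of my layout):

    #include <lmdb.h>
    #include <stdint.h>

    /* Env is assumed to be opened with MDB_WRITEMAP; dbi already open. */
    int bump_counter(MDB_env *env, MDB_dbi dbi, MDB_val *key)
    {
        MDB_txn *txn;
        MDB_val data;
        int rc = mdb_txn_begin(env, NULL, 0, &txn); /* write txn: serializes all writers */
        if (rc) return rc;
        rc = mdb_get(txn, dbi, key, &data);         /* data.mv_data points into the map */
        if (rc == 0) {
            uint64_t *ctr = (uint64_t *)data.mv_data;     /* illustrative layout: leading counter */
            __atomic_fetch_add(ctr, 1, __ATOMIC_SEQ_CST); /* direct write, bypassing mdb_put */
        }
        mdb_txn_abort(txn); /* abort: nothing was dirtied through the API, so no pages are COWed */
        return rc;
    }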
The second one is about updating values in-place from read transactions. If I
1) use MDB_WRITEMAP,
2) open a database with MDB_INTEGERKEY (and could use a dedicated environment with a single DB if that changes the answer),
3) add values to the DB *only* using MDB_APPEND | MDB_NOOVERWRITE,
4) and, from a **read** transaction, modify a part of a value directly via the MDB_val data pointer,
is it possible that the page seen by the read transaction is replaced with a new one if there is a parallel write transaction?
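Again a minimal sketch of what I mean, with the same caveats (touch_stage and the 32-bit stage field are made-up illustrations):

    #include <lmdb.h>
    #include <stdint.h>

    /* Same assumptions as above: MDB_WRITEMAP env, dbi already open.
     * The question is whether data.mv_data can end up pointing at a page
     * that a parallel write transaction has superseded. */
    int touch_stage(MDB_env *env, MDB_dbi dbi, MDB_val *key, uint32_t new_stage)
    {
        MDB_txn *txn;
        MDB_val data;
        int rc = mdb_txn_begin(env, NULL, MDB_RDONLY, &txn); /* read-only snapshot */
        if (rc) return rc;
        rc = mdb_get(txn, dbi, key, &data);
        if (rc == 0) {
            uint32_t *stage = (uint32_t *)data.mv_data;           /* illustrative layout */
            __atomic_store_n(stage, new_stage, __ATOMIC_RELEASE); /* direct write from a read txn */
        }
        mdb_txn_abort(txn);
        return rc;
    }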
There is a quote from Howard on GitHub (https://github.com/lmdbjava/benchmarks/issues/9#issuecomment-354184989): "When we do sequential inserts using MDB_APPEND, there is no page split at all - we just allocate a new page and fill it, sequentially." Does this mean that if there are no page splits then existing pages do not move either, and that it is "safe" to use pointers outside of write transactions, as is the case with the overflow pages?
In both cases I update fields of a struct that indicate, e.g., some lifecycle stage of an object the LMDB record refers to, and the stage transitions are idempotent. If a direct pointer write doesn't make it to disk due to a system failure, subsequent readers (workers) will see an older stage and repeat the stage transition.
Therefore missed direct writes do not break the application logic; I only care about physical corruption of the entire DB. If I update values in place inside read transactions and a page becomes stale, this should not corrupt the DB, since the old page will go to the free list only after the read transaction is finished, so this "hack" should not break the DB. Missed writes would then be the norm rather than a special case on OS failure. But if pages do not move, all these "soft" updates could be done in parallel and be very fast.
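For clarity, this is what I mean by an idempotent stage transition (a hypothetical helper; try_advance and the stage encoding are made up):

    #include <stdint.h>

    /* A worker advances the stage only if it still observes the previous one,
     * so re-running a transition after a lost write is harmless. */
    int try_advance(uint32_t *stage_in_map, uint32_t from, uint32_t to)
    {
        uint32_t expected = from;
        /* Returns nonzero iff this call performed the transition; if a direct
         * write was lost, some worker simply retries with the same 'from'. */
        return __atomic_compare_exchange_n(stage_in_map, &expected, to,
                                           0 /* strong */, __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST);
    }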
Unfortunately I cannot answer this myself by trying to read the mdb.c file. In the second scenario I'm specifically concerned about what happens when the DB becomes large and the tree needs rebalancing. At least in that case some pages need to move, but does the rebalancing replace/split existing pages?
Thanks & best regards,
Victor