I wonder if an intermediate variation on this idea might be easier to
implement:
We can describe a subset of MDB_RDONLY transactions as 'long-running',
e.g. one held open while copying the database. Normal read-only
transactions might then be called 'ephemeral'. We might identify
long-running transactions heuristically, e.g.
((currentTxnId - rdOnlyTxnId) > 100), and/or we might allow developers
to mark them explicitly.
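To make that concrete, here is a minimal sketch in C. The threshold,
the txnid_t type, and the explicit-marking flag are illustrative
assumptions on my part, not existing LMDB API:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t txnid_t;     /* stand-in for LMDB's internal txn ID */

    #define LONG_RUNNING_AGE 100  /* heuristic threshold from above */

    /* A read-only transaction counts as long-running once it falls more
     * than LONG_RUNNING_AGE transactions behind the current write
     * transaction, or when the developer marked it explicitly. */
    static bool
    is_long_running(txnid_t currentTxnId, txnid_t rdOnlyTxnId, bool marked)
    {
        return marked || (currentTxnId - rdOnlyTxnId) > LONG_RUNNING_AGE;
    }
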
When deciding which transactions' pages can be freed, we compute three
numbers with a one-pass scan of the reader lock table:
txnO = the oldest active transaction
txnE = the oldest active ephemeral transaction
txnL = the newest active long-running transaction
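A sketch of that one-pass scan, reusing is_long_running() from above.
The reader_slot layout here is a simplification of my own; LMDB's real
reader table lives in the lock file and looks different. Note that if
no long-running reader exists, txnL stays 0, and if no ephemeral reader
exists, txnE stays at the current transaction, which keeps the range
logic below well-defined:

    #include <stddef.h>

    struct reader_slot {
        txnid_t txnid;   /* snapshot this reader has pinned */
        bool    marked;  /* developer marked it long-running */
        bool    active;  /* slot in use */
    };

    struct gc_points {
        txnid_t txnO;    /* oldest active transaction */
        txnid_t txnE;    /* oldest active ephemeral transaction */
        txnid_t txnL;    /* newest active long-running transaction */
    };

    static struct gc_points
    scan_readers(const struct reader_slot *slots, size_t nslots,
                 txnid_t currentTxnId)
    {
        struct gc_points p = { currentTxnId, currentTxnId, 0 };
        for (size_t i = 0; i < nslots; i++) {
            if (!slots[i].active)
                continue;
            txnid_t t = slots[i].txnid;
            if (t < p.txnO)
                p.txnO = t;          /* oldest overall */
            if (is_long_running(currentTxnId, t, slots[i].marked)) {
                if (t > p.txnL)
                    p.txnL = t;      /* newest long-running */
            } else if (t < p.txnE) {
                p.txnE = t;          /* oldest ephemeral */
            }
        }
        return p;
    }
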
Normally, LMDB just computes txnO, and conservatively assumes that
readers might be operating anywhere between txnO and the current
transaction. Emmanuel's proposal for precise analysis is somewhat
daunting. But I think we can use this simple three-point analysis to
find some sweet spots between precise and conservative.
We know by definition that txnO <= txnL, and txnO <= txnE.
Under normal conditions, i.e. assuming that long-running transactions
are relatively infrequent, we also know that txnL < txnE will be true
more frequently than not. Conservatively, we must protect the ranges
[txnO, txnL] and [txnE, currentTxnId]. But in the common case, there
will be a non-empty range of transactions (txnL, txnE) that can be GC'd.
And of course anything below txnO can still be GC'd.
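Given those three points, the reclaimability test for the pages freed
by some old write transaction t is just a pair of comparisons. When the
gap happens to be empty (txnL >= txnE), the second test never fires and
we fall back to today's conservative behavior:

    /* Pages freed at snapshot t can be reused if no active reader can
     * still see them: either t predates every reader, or t falls in the
     * gap between the newest long-running reader and the oldest
     * ephemeral one. */
    static bool
    snapshot_reclaimable(txnid_t t, struct gc_points p)
    {
        if (t < p.txnO)
            return true;                   /* below the oldest reader */
        return t > p.txnL && t < p.txnE;   /* inside the (txnL, txnE) gap */
    }
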
I'm not sure whether we can easily identify which pages are used only
within that gap and may thus be collected. I haven't studied that
aspect of LMDB yet. But supposing that aspect is addressed, this scheme
should greatly mitigate the impact of infrequent long-running read-only
transactions. A lot more space from recent write transactions would be
reusable.