Re: openldap.git branch mdb.master updated. 0ce6bb4be0034120c850917bc4f59b4d4efc1432
openldap-commit2devel@OpenLDAP.org writes:
> commit aff2693fc0721df4ccb6ceb357f80501c413ed38
> Author: Howard Chu <hyc@symas.com>
> Date: Mon Dec 10 12:16:50 2012 -0800
>
> ITS#7455 simplify
>
> Don't try to reclaim overflow pages while operating on
> the freelist (for now). The circular dependencies are much like
> the single-page case, but worse. Maybe look into this in the
> future, but it's not absolutely necessary now.

Suggestions to reduce freelist changes during commit:
Let a freelist entry steal page numbers listed in the next entries.
Then mdb_page_alloc can grab more old pages without deleting/updating
their entries and producing new dirty pages. Next txn does the updates.
Preallocate the final MDB_oldpages with MDB_RESERVE in mdb_txn_commit()
and leave some room to spare. Then use page numbers from it and/or
steal new ones at need.
BTW, could MDB offer an MDB_RESERVE2 which says "give me data->mv_size
bytes plus as much more as will fit without growing the page"?
And MDB_RESERVE2_SHRINK which shrinks the size to the final size.
Stolen pages -- one way would be to search for particular pages to steal,
and list the stolen ones at the end of the freelist entry.
Or: Stealing only from the end of the previous entry/entries should be
simpler, but doesn't let us choose some specific pages to steal in order
to gain a big enough contiguous page range:
    typedef struct MDB_freelist_entry { /* freelist entry in the DB */
        short mf_len;             /* saved length */
        short mf_stolen_entries;  /* #fully stolen entries */
        short mf_nextlen;         /* 0 or remaining length of next entry */
        MDB_ID mf_pages[];        /* length mf_len. */
    } MDB_freelist_entry;
Thus, if the free DB contains
    (txnid_t)123 => { .mf_stolen_entries = 1, .mf_nextlen = 7 }
    (txnid_t)124 => { ... }
    (txnid_t)125 => { .mf_len = 20 }
then mdb should henceforth skip entry#124 and entry#125.mf_pages[7..19].
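
To make the skipping rule concrete, here is a rough sketch of how a reader of the free DB might work out how many pages of each entry are still available. None of this is existing mdb code: fl_header, fl_usable_lengths() and the typedef stand-ins are made up for illustration, only the header fields of the proposed struct are modelled, and it assumes a header is honored even when its own entry has been stolen.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t txnid_t;         /* stand-in for mdb's txnid_t (assumption) */

/* Header fields of the proposed MDB_freelist_entry. The page list itself
 * is left out because only the lengths matter for this sketch. */
typedef struct fl_header {
    txnid_t id;                 /* key of the entry in the free DB */
    short mf_len;               /* saved length */
    short mf_stolen_entries;    /* #fully stolen entries */
    short mf_nextlen;           /* 0 or remaining length of next entry */
} fl_header;

/* Hypothetical helper: for entries e[0..n-1] sorted by key, store in
 * usable[i] how many leading mf_pages[] slots of entry i may still be
 * handed out. */
static void fl_usable_lengths(const fl_header *e, size_t n, short *usable)
{
    size_t i, j;

    for (i = 0; i < n; i++)
        usable[i] = e[i].mf_len;

    for (i = 0; i < n; i++) {
        size_t next;

        assert(e[i].mf_stolen_entries >= 0);
        next = i + 1 + (size_t)e[i].mf_stolen_entries;

        /* Entries i+1 .. next-1 were stolen completely. */
        for (j = i + 1; j < next && j < n; j++)
            usable[j] = 0;

        /* Entry 'next' keeps only its first mf_nextlen pages;
         * mf_nextlen == 0 means it was not touched. */
        if (e[i].mf_nextlen != 0 && next < n && e[i].mf_nextlen < usable[next])
            usable[next] = e[i].mf_nextlen;
    }
}
```

Fed the three entries above, this leaves entry#124 with nothing usable and entry#125 with mf_pages[0..6], which is the same as skipping mf_pages[7..19].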
A simple variant of page ranges, to save space and simplify range handling:
/* Page range: (pagecount << MDB_PGNO_BITS) | (pageno + pagecount) */
typedef pgno_t mdb_pages_t;
Lone pages get pagecount=1. With MDB_PGCOUNT_BITS = (64bit ? 19 : 12)
and page size 4096, that limits MDB to a 128 petabyte DB and 2G entry
size. Or 4G database and 16M entry size on 32-bit machines. (I'd call
limiting the entry size a bonus compared to today's mdb: The current
freelist doesn't exactly handle 2 billion freed pages gracefully.)
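
A minimal sketch of that encoding for the 64-bit case, mostly to check the arithmetic. The helper names (mdb_range_pack() and friends) and MDB_PGNO_MASK are invented here, pgno_t is assumed to be a 64-bit unsigned type, and MDB_PGNO_BITS is taken to be whatever is left after MDB_PGCOUNT_BITS.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pgno_t;        /* assumption: 64-bit build */
typedef pgno_t mdb_pages_t;

#define MDB_PGCOUNT_BITS 19
#define MDB_PGNO_BITS    (64 - MDB_PGCOUNT_BITS)            /* 45 */
#define MDB_PGNO_MASK    (((pgno_t)1 << MDB_PGNO_BITS) - 1)

/* Page range: (pagecount << MDB_PGNO_BITS) | (pageno + pagecount) */
static mdb_pages_t mdb_range_pack(pgno_t pageno, pgno_t pagecount)
{
    /* A lone page is a range with pagecount 1; 0 is not a valid range. */
    assert(pagecount >= 1 && pagecount < ((pgno_t)1 << MDB_PGCOUNT_BITS));
    /* The end of the range must still fit in the low MDB_PGNO_BITS bits. */
    assert(pageno + pagecount <= MDB_PGNO_MASK);
    return (pagecount << MDB_PGNO_BITS) | (pageno + pagecount);
}

/* Number of pages in the range. */
static pgno_t mdb_range_count(mdb_pages_t r)
{
    return r >> MDB_PGNO_BITS;
}

/* One past the last page of the range. */
static pgno_t mdb_range_end(mdb_pages_t r)
{
    return r & MDB_PGNO_MASK;
}

/* First page of the range. */
static pgno_t mdb_range_first(mdb_pages_t r)
{
    return mdb_range_end(r) - mdb_range_count(r);
}
```

With these widths there are 2^45 page numbers, and 2^45 * 4096 bytes is 2^57 bytes or 128 petabytes; the count field spans 2^19 pages, and 2^19 * 4096 bytes is 2^31 bytes or 2G. The 32-bit numbers follow the same way from 20 and 12 bits.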
--
Hallvard