Re: mdb meta pages
Hallvard Breien Furuseth wrote:
> Could writing a word/byte to the current meta page break something?
> While I'm asking, why are metas separate pages, instead of simply
> a fixed 256 or so bytes apart to keep them in separate cachelines?
> The only reason I can think of is if a write gets garbled, the
> other meta page is safe - but mdb assumes correct filesystem
> operation anyway.

Because the fundamental unit of storage is a page. Writing anything smaller
than a page requires the OS to read the full page and then update part of
it. Doing so from multiple processes would require file locking to prevent
corruption. Writes to separate pages are guaranteed not to interfere with each
other.
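
As a rough illustration of the point above, the sketch below writes each meta as one whole page at a page-aligned offset, alternating between two slots by transaction id. The names (`demo_meta`, `demo_write_meta`, `demo_read_newest`), the record layout, and the 4096-byte page size are assumptions for the example, not mdb's actual structures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed page size; a real environment would query it */

/* Hypothetical meta record; not the actual mdb layout. */
typedef struct {
    uint64_t txnid;      /* id of the transaction that wrote this meta */
    uint64_t root_pgno;  /* page number of the tree root for that transaction */
} demo_meta;

/* Open (or create) the demo file read/write. Returns the fd, or -1 on error. */
static int demo_open(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0644);
}

/* Slot (page 0 or page 1) that a transaction writes: alternate by txnid. */
static unsigned demo_meta_slot(uint64_t txnid)
{
    return (unsigned)(txnid & 1);
}

/* Write one whole page holding the meta at its page-aligned offset.
 * Returns 0 on success, -1 on a failed or short write. */
static int demo_write_meta(int fd, const demo_meta *m)
{
    unsigned char page[DEMO_PAGE_SIZE];
    assert(sizeof *m <= sizeof page);
    memset(page, 0, sizeof page);
    memcpy(page, m, sizeof *m);
    off_t off = (off_t)demo_meta_slot(m->txnid) * DEMO_PAGE_SIZE;
    ssize_t n = pwrite(fd, page, sizeof page, off);
    return n == (ssize_t)sizeof page ? 0 : -1;
}

/* Read both meta slots and return the one with the larger txnid.
 * Returns -1 if either slot cannot be read in full. */
static int demo_read_newest(int fd, demo_meta *out)
{
    demo_meta m[2];
    for (unsigned i = 0; i < 2; i++) {
        off_t off = (off_t)i * DEMO_PAGE_SIZE;
        if (pread(fd, &m[i], sizeof m[i], off) != (ssize_t)sizeof m[i])
            return -1;
    }
    *out = m[0].txnid > m[1].txnid ? m[0] : m[1];
    return 0;
}
```

Because each writer only ever issues full-page writes to its own slot, the previous meta is never part of a read-modify-write of the same page.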

> This is for a "syncdelay <count>" feature to replace "dbnosync".
> The latter can break DB consistency after a system crash: Without
> fdatasync(), the OS can reorder writes, leaving meta pages that
> refer to trees whose data pages were never written or have been
> overwritten.

This should not be a new keyword. Just implement the <size> feature of the
checkpoint keyword.
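
To make the reordering hazard in the quoted text concrete, here is a minimal sketch of an ordered commit: the data page is written and flushed with `fdatasync()` before the meta page that points at it is written. The helper names (`sk_write_page`, `sk_commit`), the two-word meta layout, and the page size are hypothetical and only stand in for the general technique; this is not mdb's commit path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SK_PAGE 4096u  /* assumed page size */

/* Hypothetical helper: write `len` bytes as one zero-padded full page
 * at page number `pgno`. Returns 0 on success, -1 on error. */
static int sk_write_page(int fd, uint64_t pgno, const void *data, size_t len)
{
    unsigned char page[SK_PAGE];
    if (len > sizeof page)
        return -1;
    memset(page, 0, sizeof page);
    memcpy(page, data, len);
    ssize_t n = pwrite(fd, page, sizeof page, (off_t)(pgno * SK_PAGE));
    return n == (ssize_t)sizeof page ? 0 : -1;
}

/* Ordered commit: data page first, a sync as a barrier, then the meta
 * page (slot txnid & 1) that refers to the data page. Skipping the first
 * fdatasync() would let the OS persist the meta before the data. */
static int sk_commit(int fd, uint64_t txnid, uint64_t data_pgno,
                     const char *payload)
{
    assert(payload != NULL);
    assert(data_pgno >= 2);  /* pages 0 and 1 are reserved for metas here */
    uint64_t meta[2] = { txnid, data_pgno };

    if (sk_write_page(fd, data_pgno, payload, strlen(payload) + 1) != 0)
        return -1;
    if (fdatasync(fd) != 0)          /* data is durable before the meta */
        return -1;
    if (sk_write_page(fd, txnid & 1, meta, sizeof meta) != 0)
        return -1;
    return fdatasync(fd);            /* meta is durable before returning */
}
```

Running without any sync, as "dbnosync" does, removes both barriers, so after a crash a meta page can be on disk while the pages it refers to are not.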

> syncdelay <count> will only sync every <count> or maybe every
> <count>/2 commits. It'll need 4 meta pages, of which 2 may refer to
> unsynced data pages. mdb_env_sync() may then need to write a "synced"
> flag to the current meta page, or do a dummy write transaction which
> sets a "synced data pages" flag in its meta page. The latter
> would have to wait out any existing/pending write transactions.

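
One possible reading of the quoted four-meta-page proposal is that, after a crash, recovery would pick the newest meta page whose "synced" flag is set and ignore newer ones that may refer to unsynced data. The sketch below shows only that selection step; the struct, the flag, and the function name are hypothetical, and nothing here reflects an existing mdb feature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SD_NUM_METAS 4  /* the proposal mentions 4 meta pages */

/* Hypothetical in-memory view of one meta page. */
typedef struct {
    uint64_t txnid;  /* transaction that wrote this meta */
    int synced;      /* nonzero if its data pages are known to be on disk */
} sd_meta;

/* Return the index of the newest meta flagged as synced,
 * or -1 if no meta carries the flag. */
static int sd_pick_recovery_meta(const sd_meta metas[SD_NUM_METAS])
{
    assert(metas != NULL);
    int best = -1;
    for (int i = 0; i < SD_NUM_METAS; i++) {
        if (!metas[i].synced)
            continue;  /* may refer to unsynced data pages; skip it */
        if (best < 0 || metas[i].txnid > metas[best].txnid)
            best = i;
    }
    return best;
}
```

With metas for transactions 1 to 4 where only 1 and 2 are flagged, this would select the meta for transaction 2, discarding the two unsynced commits.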
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
- References:
- mdb meta pages
- From: Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>