Re: Is putting slapd into read-only mode sufficient for backups?
On Fri, 10 Feb 2012, Buchan Milne wrote:
> On Friday, 10 February 2012 01:48:45 Quanah Gibson-Mount wrote:
...
> > I thought I was very clear on that in my last email. It is not
> > sufficient. You need to stop slapd and run *db_recover*, which is more
> > exhaustive than db_checkpoint, if you want to go the route of backing
> > up the BDB db.
>
> If you checkpoint, and you backup all the database files (including
> transaction log files) in the correct order, you should not need to
> db_recover
If that's all that you require at backup time, then in order to guarantee
correctness *at restore time* you have to perform "catastrophic" recovery
(a la db_recover -c) on the restored database before trying to use it.
That's necessary if a checkpoint occurs between when you start copying .db
files and when you copy the last transaction log file.
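
To make the restore side concrete, here is a minimal sketch. It assumes a flat backup directory that holds the .db files and the transaction log files, and that db_recover is on the PATH. The function name restore_env and its two arguments are hypothetical, not part of Berkeley DB.

```shell
#!/bin/sh
# Sketch only: restore a copied environment and force catastrophic recovery.
# restore_env BACKUP_DIR ENV_HOME   (both names are hypothetical)
restore_env() {
    backup_dir=$1
    env_home=$2

    mkdir -p "$env_home" || return 1

    # Copy the database files and every log file from the backup.
    cp -p "$backup_dir"/* "$env_home"/ || return 1

    # -c asks for catastrophic recovery, which replays all of the log
    # files present instead of starting at the last checkpoint.
    db_recover -c -h "$env_home"
}
```

Usage would look like `restore_env /backups/ldap-20120210 /var/lib/ldap`, with slapd stopped and both paths being placeholders.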
The optimized procedure that I worked out with Sleepycat's help (for a
completely different program, but using the "transaction data store") was
this:
** Backing up the database environment is done with the following
** steps:
** 0) all txn log files except the current one are copied to
** the backup
** 1) a checkpoint is taken
** 2) the list of txn log files that are no longer needed for
** recovery or txn_abort is obtained
** 3) the LSN of the most recent checkpoint is noted
** 4) all the database table files, including queue extents,
** are copied to the backup
** 5) all the txn log files that were not copied in step (0)
** are copied to the backup
** 6) if a checkpoint has *not* taken place since step (3),
** then the database is marked as not needing catastrophic
** recovery when restored
** 7) if the list from step (2) is not empty, then those txn
** log files are removed from the active database environment
** and are marked in the backup as unnecessary for normal
** restoration
**
** Note that the ordering of this is almost completely inflexible.
** In particular:
** (0) must precede (5)
** (1) must precede (2) and (3)
** (2) and (3) must precede (4)
** (4) must precede (5)
** (5) must precede (6) and (7)
**
** Minimizing the time between (3) and (6) is highly desirable,
** as that minimizes the window in which a checkpoint could
** occur that would result in a backup that would require
** catastrophic recovery when restored. Restoring such a
** backup is *much* slower than restoring one that only requires
** normal recovery. That's why (0) and (7) are pushed forward
** and backward to where they are.
For those trying to script this, you can get the LSN of the most recent
checkpoint with
db_stat -t | awk '$2 ~ /^File\/offset/{print $1; exit}'
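
Putting steps (0) through (7) together with that db_stat filter, a rough sketch in shell might look like the following. It is not the original program. It assumes the Berkeley DB utilities db_archive, db_checkpoint and db_stat are on the PATH, that log files live in the environment home, that the backup directory starts out empty, and that no queue extents are in use. The function names and the marker files NORMAL_RECOVERY_OK and UNNEEDED_LOGS are made up for illustration.

```shell
#!/bin/sh
# Sketch only; see the assumptions stated above.

# Print the LSN ("file/offset") of the most recent checkpoint.
checkpoint_lsn() {
    db_stat -t -h "$1" | awk '$2 ~ /^File\/offset/{print $1; exit}'
}

# Copy the files named on stdin (relative to $1) into directory $2.
copy_listed() {
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        cp -p "$1/$f" "$2/$f" || return 1
    done
}

# backup_env ENV_HOME BACKUP_DIR
backup_env() {
    home=$1
    bk=$2
    mkdir -p "$bk" || return 1

    # (0) all log files except the current (highest-numbered) one
    early=$(db_archive -l -h "$home" | sort | sed '$d')
    printf '%s\n' "$early" | copy_listed "$home" "$bk" || return 1

    # (1) take a checkpoint now
    db_checkpoint -1 -h "$home" || return 1

    # (2) log files no longer needed for recovery
    unneeded=$(db_archive -h "$home")

    # (3) note the LSN of the most recent checkpoint
    lsn=$(checkpoint_lsn "$home")

    # (4) the database files
    db_archive -s -h "$home" | copy_listed "$home" "$bk" || return 1

    # (5) log files that step (0) did not copy
    db_archive -l -h "$home" | while IFS= read -r f; do
        [ -e "$bk/$f" ] || printf '%s\n' "$f"
    done | copy_listed "$home" "$bk" || return 1

    # (6) no checkpoint since (3): normal recovery suffices on restore
    if [ -n "$lsn" ] && [ "$lsn" = "$(checkpoint_lsn "$home")" ]; then
        : > "$bk/NORMAL_RECOVERY_OK"
    fi

    # (7) record the unneeded logs in the backup, then remove them
    #     from the live environment
    if [ -n "$unneeded" ]; then
        printf '%s\n' "$unneeded" > "$bk/UNNEEDED_LOGS"
        printf '%s\n' "$unneeded" | while IFS= read -r f; do
            rm -f "$home/$f"
        done
    fi
}
```

A restore script could then run plain db_recover when NORMAL_RECOVERY_OK is present in the backup and fall back to db_recover -c when it is not.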
Philip Guenther