Although the introduction of physical backups in Percona Backup for MongoDB (PBM) made it possible to significantly cut the restore time for big datasets, it’s hard to beat snapshots in speed and efficiency. That’s why we introduced external backups (aka the snapshot-based backup API) in PBM 2.2.0.
The idea came from requests to bring EBS snapshots into PBM. But instead of focusing on one particular snapshot type, we decided to give users the ability to leverage their own backup strategies to meet their needs and environment.
How Percona’s snapshot-based backup API works
What PBM does during a physical backup can be naturally split into three phases.
First, it prepares the database and ensures the data files can be safely copied. Then, it copies files to the storage. And lastly, it returns the database to the non-backup state — closes backup cursors, etc. Something similar happens with the restore: prepare the cluster, copy files, and prepare data so the cluster can start in a consistent state. For more details, you can see the Tech Peek section in our blog post on physical backups, but for the purposes of this blog, those details don’t matter.
The new API literally breaks the backup and restore processes into these three stages, giving the user full control over the data copy. That copy can be an EBS snapshot or any other kind of snapshot, a plain ‘cp -Rp’, or whatever fits your needs.
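To make the split concrete, here is a minimal Python sketch that models the three phases as hooks. It is an illustration of the idea, not PBM’s code: the prepare and finish callables stand in for the work PBM does for you, the body of the with block stands in for your own data copy, and every name in it (copy_window, prepare, finish, abort) is hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def copy_window(
    prepare: Callable[[], None],
    finish: Callable[[], None],
    abort: Callable[[], None],
) -> Iterator[None]:
    """Hypothetical model of the three-phase split.

    Phase 1 (prepare) runs on entry. Phase 2 is the body of the `with`
    block: the user-owned data copy (snapshot, cp -Rp, ...). Phase 3
    (finish) runs only if the copy completed without raising; if the
    copy fails, abort is called instead and the error is re-raised.
    """
    prepare()
    try:
        yield
    except BaseException:
        abort()
        raise
    finish()


if __name__ == "__main__":
    calls = []
    with copy_window(
        prepare=lambda: calls.append("prepare"),
        finish=lambda: calls.append("finish"),
        abort=lambda: calls.append("abort"),
    ):
        # Phase 2: take your snapshot or copy the files here.
        calls.append("copy")
    print(calls)  # ['prepare', 'copy', 'finish']
```

Calling finish only after a successful copy reflects the ordering in the text: the finishing step should happen only once the data copy is actually complete.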
To start, just run pbm backup -t external. When the data is ready for copying, PBM will notify you with a prompt stating exactly which node on each shard to copy from (the backup needs to be taken from only one node per replica set). Then, when the snapshot(s) (data copy) are done, tell PBM to finish the backup with pbm backup-finish.
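If you want to script this flow, a rough Python sketch might look like the one below. Several parts are assumptions: wait_for_ready is a hypothetical hook that returns the backup name and the one node per replica set that PBM’s prompt names (for example, typed in by an operator after reading the prompt); snapshot is your own copy routine; and we assume pbm backup-finish accepts the backup name as an argument, so check pbm backup-finish --help for your version. The runner is injectable so the flow can be dry-run without a cluster.

```python
import subprocess
from typing import Callable, Dict, List, Tuple

Runner = Callable[[List[str]], None]
# Hypothetical: returns (backup_name, {replica_set: node}) from PBM's prompt.
ReadyHook = Callable[[], Tuple[str, Dict[str, str]]]
# Hypothetical: your own copy routine (EBS snapshot, cp -Rp, ...).
SnapshotFn = Callable[[str, str], None]


def run_pbm(args: List[str]) -> None:
    """Run a pbm CLI command, raising CalledProcessError on failure."""
    subprocess.run(["pbm", *args], check=True)


def external_backup(
    wait_for_ready: ReadyHook,
    snapshot: SnapshotFn,
    run: Runner = run_pbm,
) -> str:
    # Phase 1: PBM prepares the cluster for the copy.
    run(["backup", "-t", "external"])
    backup_name, nodes = wait_for_ready()
    # Phase 2: copy data from exactly one node per replica set.
    # A dict keyed by replica set makes "one node each" structural.
    for replica_set, node in nodes.items():
        snapshot(replica_set, node)
    # Phase 3: tell PBM the copy is done (assumed argument form).
    run(["backup-finish", backup_name])
    return backup_name
```

A dry run with a fake runner just records the commands instead of executing them, which is a cheap way to review the sequence before pointing it at a real cluster.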
And that’s it. Restore follows the same pattern. Start the restore with pbm restore [backup_name] --external; once PBM has prepared everything, copy the data files to every node of the corresponding replica set in the cluster; then finish the restore with pbm restore-finish.
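Note the asymmetry: the backup copies data from one node per replica set, but the restore needs that replica set’s data on every member. The small Python sketch below builds such a copy plan. The function name and the input shapes (a copy identifier per replica set, a node list per replica set) are hypothetical; how you actually move the data to each node is up to you.

```python
from typing import Dict, List, Tuple


def restore_copy_plan(
    copies: Dict[str, str],
    topology: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Map every node of the target cluster to the data copy it should get.

    copies:   replica set name -> copy identifier (snapshot ID, path, ...)
    topology: replica set name -> list of member nodes in the target cluster
    """
    missing = sorted(set(topology) - set(copies))
    if missing:
        raise ValueError(f"no data copy for replica set(s): {', '.join(missing)}")
    # Each member of a replica set receives the same copy of that set's data.
    return [(node, copies[rs]) for rs, nodes in topology.items() for node in nodes]
```

For a two-shard cluster with three members each, the plan has six entries, one per node, while the backup itself only needed two copies.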
Restore your existing snapshots with PBM
The great thing is that you can restore snapshots taken without PBM. During an external backup, PBM creates backup metadata; if [backup_name] is provided for the restore, PBM uses that metadata to check the backup’s compatibility with the target cluster in terms of topology and PSMDB version, and to define the “restore-to-time.” But the restore runs perfectly fine with no [backup_name]. The checks are simply skipped (that’s on you), and the “restore-to-time” is derived from the provided data: during the restore, PBM looks into the actual data and determines the most recent cluster time common to all shards. Just be mindful that there is not much we can check or guarantee for non-PBM snapshots. PBM may also need the MongoDB storage options used by the backed-up data. These are preserved in the backup metadata as well, but for an existing snapshot you can pass them via the --config flag.
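The idea behind “the most recent common cluster time across all shards” can be shown in a few lines of Python. This is a simplified illustration, not PBM’s actual implementation: it assumes you already know each shard’s last applied cluster time (for example, from its newest oplog entry), and the function name is hypothetical. Since each shard’s data is consistent only up to its own last operation, the newest point that every shard has reached is the minimum of those per-shard values.

```python
from typing import Dict, Tuple

# A MongoDB cluster time is a BSON Timestamp: (seconds since epoch, increment).
# Python tuples compare lexicographically, which matches Timestamp ordering.
ClusterTime = Tuple[int, int]


def common_restore_time(last_applied: Dict[str, ClusterTime]) -> ClusterTime:
    """Return the most recent cluster time that every shard has reached.

    last_applied maps a shard (replica set) name to the cluster time of
    the last operation present in that shard's data copy.
    """
    if not last_applied:
        raise ValueError("need at least one shard")
    # The common point is the minimum of the per-shard maxima.
    return min(last_applied.values())


if __name__ == "__main__":
    times = {
        "rs0": (1700000100, 3),
        "rs1": (1700000090, 7),
        "rs2": (1700000100, 1),
    }
    print(common_restore_time(times))  # (1700000090, 7)
```

Any shard whose data goes past that point would have to be rolled back to it for the cluster as a whole to be consistent, which is why the slowest shard determines the restore time.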
This feature was released as a Technical Preview so that we can adjust it in future iterations based on your feedback. So give it a try, and either leave your thoughts on the forum or file a Jira ticket.