Creating a Backup Strategy

Stefan van der Berg

When I started working at Telogis (now Verizon Connect), the database backup system was rudimentary: because each database was so large, only the most important tables were backed up. I was tasked with developing a proper backup strategy that we could rely on to keep our data safe and give us peace of mind.
What I Achieved:
I updated my Python programming and AWS EC2 skills, and wrote a simple snapshotting script using the boto3 library. I also added new volumes to each database server to store enough transaction logs, so I could restore databases to an exact point in time.
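As a rough illustration of what such a snapshotting script might look like, here is a minimal sketch. It is not the original script: the function name, the "Database" and "Role" tag keys, and the "data"/"txlog" labels are assumptions made for this example. The only AWS call used is the boto3 EC2 client's create_snapshot, and the client is passed in as a parameter so the function can be exercised with a stub.

```python
from datetime import datetime, timezone


def snapshot_volumes(ec2, volume_ids, database, role):
    """Snapshot each EBS volume and tag it so a restore script can find it later.

    `ec2` is any object exposing the boto3 EC2 client's `create_snapshot` method.
    `role` is a hypothetical label for this sketch, e.g. "data" or "txlog".
    """
    taken_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"{database} {role} backup {taken_at}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [
                    # Tag keys are assumptions; any consistent scheme would do.
                    {"Key": "Database", "Value": database},
                    {"Key": "Role", "Value": role},
                ],
            }],
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids
```

With boto3 installed and credentials configured, this could be called as snapshot_volumes(boto3.client("ec2"), ["vol-..."], "exampledb", "data"), where the volume ID and database name are placeholders.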
As I started practicing restoring these databases, I got the idea to develop a Python script to do the database restores for me. The script would search for the correct data snapshot and transaction log snapshot based on the restore time and database name provided. This was a very complex task to achieve, and I leveraged many years of Linux experience to be able to mount any filesystem type, and to get the database environment ready before starting the database restore, with support for tablespaces.
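The snapshot search could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual restore script: the Snapshot record, the "data" and "txlog" role labels, and the selection rule are all hypothetical. The assumed rule is to take the newest data snapshot at or before the restore time as the base, and the earliest transaction log snapshot at or after the restore time so that the logs reach the target.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    database: str
    role: str          # "data" or "txlog" (hypothetical labels)
    started: datetime  # when the snapshot was taken


def pick_snapshots(snapshots, database, restore_time):
    """Return (data_snapshot, log_snapshot) for a point-in-time restore.

    Either element is None when no suitable snapshot exists.
    """
    mine = [s for s in snapshots if s.database == database]
    # Base: newest data snapshot that does not postdate the target time.
    data = [s for s in mine if s.role == "data" and s.started <= restore_time]
    # Logs: earliest log snapshot taken at or after the target time.
    logs = [s for s in mine if s.role == "txlog" and s.started >= restore_time]
    base = max(data, key=lambda s: s.started, default=None)
    log = min(logs, key=lambda s: s.started, default=None)
    return base, log
```

This rule only works if the log volume retains enough transaction logs to reach back to the chosen data snapshot; a real script would need to verify that before starting the restore.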
Today, that script can automatically create new AWS server instances and restore a database to any point in time in the past week (depending on snapshot availability). It can also simply bring the database up as quickly as possible, or create a read-only replica streaming from the live database. It can restore multiple databases on one server and offers options for any AWS instance size and storage type, all while maintaining network isolation and security to protect the production instances.
I then updated my Node.js skills and developed a front-end to manage this process, using the aws-sdk toolkit. The application takes care of the complex security requirements by using SSO tokens and the AWS PassRole feature to temporarily grant a server the permissions it needs to do the restore, and then revoke them.
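The grant-then-revoke pattern can be illustrated with a short sketch. The front-end itself is written in Node.js and the write-up does not say how the permissions are attached, so this Python version is only one plausible approach: it assumes the permissions live in an IAM instance profile that is associated with the restore server for the duration of the job. The function and parameter names are hypothetical; the two EC2 client calls are real boto3 methods, and the caller needs the iam:PassRole permission on the role for the association to succeed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_instance_profile(ec2, instance_id, profile_name):
    """Attach an instance profile for the duration of a block, then detach it.

    `ec2` is any object exposing the boto3 EC2 client's
    `associate_iam_instance_profile` and `disassociate_iam_instance_profile`.
    """
    association = ec2.associate_iam_instance_profile(
        IamInstanceProfile={"Name": profile_name},
        InstanceId=instance_id,
    )["IamInstanceProfileAssociation"]
    try:
        yield association["AssociationId"]
    finally:
        # Revoke even if the restore raised, so the grant never outlives the job.
        ec2.disassociate_iam_instance_profile(
            AssociationId=association["AssociationId"]
        )
```

The restore would then run inside a with block, so the permissions are removed whether it succeeds or fails.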
Today, my whole team uses this front-end to restore databases, and we can bring any database up within 5 minutes. This has really enhanced the database team's ability to rapidly restore accidentally deleted or corrupted data, and to safely test database upgrades or SQL fixes on production data without affecting the live databases.
I also developed another Python script that uses functions from the restore script to automatically test database restores daily. I used my SaltStack automation skills to ensure we get alerted if any database known to the automation system is not being backed up, or if a restore fails.
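The "not being backed up" check can be sketched as a simple comparison between the databases the automation system knows about and the most recent backup seen for each. This is a minimal illustration, not the SaltStack integration itself: the function name, the dictionary input, and the 24-hour threshold are assumptions for this example.

```python
from datetime import timedelta


def find_unprotected(known_databases, latest_backup, now, max_age=timedelta(hours=24)):
    """Return sorted names of databases with no backup, or one older than max_age.

    `known_databases` is an iterable of names; `latest_backup` maps a name to the
    datetime of its newest snapshot (both are hypothetical inputs for this sketch).
    """
    stale = []
    for name in known_databases:
        taken = latest_backup.get(name)
        if taken is None or now - taken > max_age:
            stale.append(name)
    return sorted(stale)
```

Any name returned here would be a candidate for an alert; the same list could be extended with databases whose daily test restore did not complete.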

Posted Oct 11, 2021
