Matt Segal Dev

How to backup and restore a Postgres database

Thu 04 June 2020, by Matthew Segal
Category: DevOps

You've deployed your Django web app to to the internet. Grats! Now you have a fun new problem: your app's database is full of precious "live" data, and if you lose that data, it's gone forever. If your database gets blown away or corrupted, then you will need backups to restore your data. This post will go over how to backup and restore PostgreSQL, which is the database most commonly deployed with Django.

Not everyone needs backups. If your Django app is just a hobby project then losing all your data might not be such a big deal. That said, if your app is a critical part of a business, then losing your app's data could literally mean the end of the business - people losing their jobs and going bankrupt. So, at least some of time, you don't want to lose all your data.

The good news is that backing up and restoring Postgres is pretty easy, you only need two commands: pg_dump and pg_restore. If you're using MySQL instead of Postgres, then you can do something very similar to the instructions in this post using mysqldump.

Taking database backups

I'm going to assume that you've already got a Postgres database running somewhere. You'll need to run the following code from a bash shell on a Linux machine that can access the database. In this example, let's say you're logged into the database server with ssh.

The first thing to do is set some Postgres-specifc environment variables to specify your target database and login credentials. This is mostly for our convenience later on.

# The server Postgres is running on
export PGHOST=localhost
# The port Postgres is listening on
export PGPORT=5432
# The database you want to back up
export PGDATABASE=mydatabase
# The database user you are logging in as
export PGUSER=myusername
# The database user's password
export PGPASSWORD=mypassw0rd

You can test these environment variables by running a psql command to list all the tables in your app's database.

psql -c "\dt"

# Output:
# List of relations
# Schema | Name          | Type  | Owner
#--------+---------------+-------+--------
# public | auth_group    | table | myusername
# public | auth_group... | table | myusername
# public | auth_permi... | table | myusername
# public | django_adm... | table | myusername
# .. etc ..

If psql is missing you can install it on Ubuntu or Debian using apt:

sudo apt install postgresql-client

Now we're ready to create a database dump with pg_dump. It's pretty simple to use because we set up those environment variables earlier. When you run pg_dump, it just spits out a bunch of SQL statements as hundreds, or even thousands of lines of text. You can take a look at the output using head to view the first 10 lines of text:

pg_dump | head

# Output:
# --
# -- PostgreSQL database dump
# --
# -- Dumped from database version 9.5.19
# -- Dumped by pg_dump version 9.5.19
# SET statement_timeout = 0;
# SET lock_timeout = 0;
# SET client_encoding = 'UTF8';

The SQL statements produced by pg_dump are instructions on how to re-create your database. You can turn this output into a backup by writing all this SQL text into a file:

pg_dump > mybackup.sql

That's it! You now have a database backup. You might have noticed that storing all your data as SQL statements is rather inefficient. You can compress this data by using the "custom" dump format:

pg_dump --format=custom > mybackup.pgdump

This "custom" format is ~3x smaller in terms of file size, but it's not as pretty for humans to read because it's now in some funky non-text binary format:

pg_dump --format=custom | head

# Output:
# xtshirt9.5.199.5.19k0ENCODINENCODING

# SET client_encoding = 'UTF8';
# false00
# ... etc ...

Finally, mybackup.pgdump is a crappy file name. It's not clear what is inside the file. Are we going to remember which database this is for? How do we know that this is the freshest copy? Let's add a timestamp plus a descriptive name to help us remember:

# Get Unix epoch timestamp
# Eg. 1591255548
TIME=$(date "+%s")
# Descriptive file name
# Eg. postgres_mydatabase_1591255548.pgdump
BACKUP_FILE="postgres_${PGDATABASE}_${TIME}.pgdump"
pg_dump --format=custom > $BACKUP_FILE

Now you can run these commands every month, week, or day to get a snapshot of your data. If you wanted, you could write this whole thing into a bash script called backup.sh:

#!/bin/bash
# Backs up mydatabase to a file.
export PGHOST=localhost
export PGPORT=5432
export PGDATABASE=mydatabase
export PGUSER=myusername
export PGPASSWORD=mypassw0rd
TIME=$(date "+%s")
BACKUP_FILE="postgres_${PGDATABASE}_${TIME}.pgdump"
echo "Backing up $PGDATABASE to $BACKUP_FILE"
pg_dump --format=custom > $BACKUP_FILE
echo "Backup completed"

You should avoid hardcoding passwords like I just did above, it's better to pass credentials in as a script argument or environment variable. The file /etc/environment is a nice place to store these kinds of credentials on a secure server.

Restoring your database from backups

It's pointless creating backups if you don't know how to use them to restore your data. There are three scenarios that I can think of where you want to run a restore:

I'll go over these scenarios one at a time.

Restoring from scratch

Sometimes you can lose the database server and there is nothing left. Maybe you deleted it by accident, thinking it was a different server. Luckily you have your database backup file, and hopefully some automated configuration management to help you quickly set the server up again.

Once you've got the new server provisioned and PostgreSQL installed, you'll need to recreate the database and the user who owns it:

sudo -u postgres psql <<-EOF
    CREATE USER $PGUSER WITH PASSWORD '$PGPASSWORD';
    CREATE DATABASE $PGDATABASE WITH OWNER $PGUSER;
EOF

Then you can set up the same environment variables that we did earlier (PGHOST, etc.) and then use pg_restore to restore your data. You'll probably see some warning errors, which is normal.

BACKUP_FILE=postgres_mydatabase_1591255548.pgdump
pg_restore --dbname $PGDATABASE $BACKUP_FILE

# Output:
# ... lots of errors ...
# pg_restore: WARNING:  no privileges were granted for "public"
# WARNING: errors ignored on restore: 1

I'm not 100% on what all these errors mean, but I believe they're mostly related to the restore script trying to modify Postgres objects that your user does not have permission to modify. If you're using a standard Django app this shouldn't be an issue. You can check that the restore actually worked by checking your tables with psql:

# Check the tables
psql -c "\dt"

# Output:
# List of relations
# Schema | Name          | Type  | Owner
#--------+---------------+-------+--------
# public | auth_group    | table | myusername
# public | auth_group... | table | myusername
# public | auth_permi... | table | myusername
# public | django_adm... | table | myusername
# .. etc ..

# Check the last migration
psql -c "SELECT * FROM django_migrations ORDER BY id DESC LIMIT 1"

# Output:
#  id |  app   | name      | applied
# ----+--------+-----------+---------------
#  20 | tshirt | 0003_a... | 2019-08-26...

There you go! Your database has been restored. Crisis averted.

Rolling back an existing database

If you want to roll your existing database back to an previous point in time, deleting all new data, then you will need to use the --clean flag, which drops your restored database tables before re-creating them (docs here):

BACKUP_FILE=postgres_mydatabase_1591255548.pgdump
pg_restore --clean --dbname $PGDATABASE $BACKUP_FILE

Restoring a dev environment

It's often beneficial to restore a testing or development database from a known backup. When you do this, you're not so worried about setting up the right user permissions. In this case you want to completely destroy and re-create the database to get a completely fresh start, and you want to use the --no-owner flag to ignore any database-user related stuff in the restore script:

sudo -u postgres psql -c "DROP DATABASE $PGDATABASE"
sudo -u postgres psql -c "CREATE DATABASE $PGDATABASE"
BACKUP_FILE=postgres_mydatabase_1591255548.pgdump
pg_restore --no-owner --dbname $PGDATABASE $BACKUP_FILE

I use this method quite often to pull non-sensitive data down from production environments to try and reproduce bugs that have occured in prod. It's much easier to fix mysterious bugs when you have regular database backups, error reporting and centralized logging.

Next steps

I hope you now have the tools you need to backups and restore your Django app's Postgres database. If you want to read more the Postgres docs have a good section on database backups.

Once you've got your head around database backups, you should automate the process to make it more reliable. I will show you how to do this in this follow-up post.

If you have any feedback or questions email me at [email protected]