Database Backups

It is frequently desirable to back up the structure and contents of databases, and the core utility that handles backing up information from a Postgres database is pg_dump. Remember, if you delete or change things in a database, there is no undo button! And so making regular backups of important databases should be a routine that you adopt. Moreover, backing up a database can also double as an effective way to duplicate the information to another user or database, and can thus often be useful for information transfer. The content below will highlight some parameters and options that pg_dump can use, but you can also check out the official documentation here.

Warning

pg_dump is a command-line or terminal command, and thus your terminal needs to be configured to know where the program resides. It is included as part of the normal PostgreSQL server install, so you already have it residing on your computer. If you have already configured your path earlier this semester so that you can run psql, then you are good to go. If you can not run psql on your system, then neither will you be able to run pg_dump. You’ll have to fix your system PATH as described earlier in the semester for it to recognize the pg_dump command.

Like most command-line applications, pg_dump takes a variety of arguments after the initial command. Some of these are mandatory, but most are optional. In general, the structure looks like:

pg_dump [connection options][backup options] dbname

Connection Options

The connections options set the specific ways in which you connect to your database of choice. There are defaults for all of these, but you may need to be specific. In general, you have all of these options already specified in whatever client you have been using to connect to your database, so you could always check there. Some of the most common to be aware of are:

Name	Flag	Description
Hostname	`-h host`	Provides the hostname or IP address of where the database server resides
Username	`-U username`	Provides the username of who is connecting. Defaults to OS user, but most should probably set it to `postgres`
Port	`-p port`	Provides the system port to connect to. Defaults to 5432, and should only need to be set if you changed the server port

Backup Options

The backup options finetune how you want the backup to proceed. Most of these are optional, but there are some I would suggest setting, and some that may be more appropriate to however you stored your tables.

Name	Flag	Description
Output file	`-f filename`	The filename you want to dump the database contents to. Should end in a `.sql` suffix.
Schema	`-n pattern`	If you need to just dump 1 or more specific schema of tables from an otherwise heavily populated database, this gives you a method to do so. All schema names that match the pattern will be dumped.
Ownership	`-O`	By default, the backup recreates all user permissions. If you want any user (not a superuser) to be able to restore the data, this is a good option to set.
Table	`-t pattern`	If you only want to backup specific table(s) for whatever reason, you can just use this flag. Only table names matching the pattern will be dumped.
Inserts	`--inserts`	This will force the backup to use insert statements instead of `COPY`. This is much slower to restore, and should only be used if you need the backup to be portable to another database variant.

Examples

Important

For all the following examples, suppose that the database mydb is running on the local computer and can be connected to with the postgres user.

Dump the entire mydb database into a file called backup.sql:

pg_dump -h localhost -U postgres -f backup.sql mydb

Dump only the schema named project from within the mydb database into a file called project.sql:

pg_dump -h localhost -U postgres -f project.sql -n project mydb

Dump only the table named best_table from within the mydb database into a file called backup.sql, ignoring user permissions:

pg_dump -h localhost -U postgres -f backup.sql -t best_table -O mydb

Storage

Once you have created your database dump, depending on the amount of data in your database, it might be larger than the 25MB limit that GitHub enforces. Note that because the dump is just a text file, you can often times zip it for significant compression savings. If it is still too large to upload to GitHub even after compressing, I would recommend uploading it to your Google Drive and then sharing it with me in that way. If that is still problematic, reach out to me directly and we’ll find another way for you to get the data to me.