Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database dump format #96

Closed
zerebubuth opened this issue Jul 28, 2016 · 5 comments
Closed

Database dump format #96

zerebubuth opened this issue Jul 28, 2016 · 5 comments

Comments

@zerebubuth
Copy link
Collaborator

Continued from #78:

@zerebubuth:

Do you think it would be worth considering the "directory" dump format? That would allow parallel dumps as an improvement over the custom format, and might allow individual tables to be compressed using a seekable compression for parallel access. Although I'm not sure about that last bit...

@pnorman:

It would improve dump parallelism, but restoring from either custom or directory can do tables in parallel and neither can do a single table in parallel.

If I'm doing a dump I normally do the custom format because its slightly easier to send around, and z 9 if space or bandwidth matters.

If we did move to directory we'd need to make sure not to put too much load on the DB at the start of the backup.

For the one-time backup and restore directory might be better.

Although pg_dump and pg_restore don't support any single-table parallelism, could we reduce dump time (which runs every week) by using directory format?

Planet generation which, as always, is slipping ever later in the week, is also limited by table parallelism, but we could code around that as long as the (compressed) table is seekable. Which might be possible while retaining backwards compatibility with pg_restore by using a "seekable" gzip variant.

It looks like pg_dump / pg_restore use libz directly, so would require some software changes to make them use anything seekable. 😞

But if we thought that might be a good idea, then it could be worth trying to get a patch upstream to handle external compression commands.

@tomhughes
Copy link
Member

As far as I know custom format is far and away the best format for just about everything except if you want to dump tables in parallel, or need to get SQL out for some reason. I never use anything else and would be extremely reluctant to do so here.

Dumping tables in parallel is just going to impact on "real" use of the database to start with.

@zerebubuth
Copy link
Collaborator Author

I thought the "directory" format was the same as the "custom" format, but with one "custom" file per table in a directory?

What are the ways in which "custom" format is better than "directory"? Is it a significantly smaller file than a tar of the directory? Does it support things which "directory" doesn't?

From the pg_dump documentation:

The most flexible output file formats are the "custom" format (-Fc) and the "directory" format(-Fd). They allow for selection and reordering of all archived items, support parallel restoration, and are compressed by default. The "directory" format is the only format that supports parallel dumps.

It would be more disruptive to dump in parallel, but the amount of available paralellism (i.e: large tables) is fairly limited, so the majority of the time would be spent at 2-3x more load than the serial dump. On the other hand, the total quantity dumped is the same, so by increasing the load, it can shorten the total dump time and that's less time it's impacting on "real" use.

Alternatively, once we have karm and karm+1, it may be possible to do a dump from the replica quickly enough to avoid getting killed by WAL overwriting.

@Komzpa
Copy link

Komzpa commented Aug 18, 2016

pg_basebackup can make a dump of running db quickly ensuring database is prepared for backup and doing a copy of database files.
https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html

@pnorman
Copy link
Collaborator

pnorman commented Mar 3, 2019

I was working on some pg_dump work, and all the postgres people I talk to are currently recommending directory format as the default.

I'm using custom for a lot of my stuff because the ease of having a single rather than a tarball or directory is important, but I don't think those reasons apply here.

@Firefishy
Copy link
Member

Since the pg_dump native format backup is a requirement for https://github.com/zerebubuth/planet-dump-ng (generates weekly planet dumps) I think this is decided.

There is an open question on if we should also have pg_basebackups when combined with WAL backups to allow point-in-time recovery. Likely using https://pgbackrest.org/ sending the backups to S3. This likely best handled via an upcoming ticket on pitr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants