Database dump format #96

zerebubuth · 2016-07-28T13:40:59Z

Continued from #78:

Do you think it would be worth considering the "directory" dump format? That would allow parallel dumps as an improvement over the custom format, and might allow individual tables to be compressed using a seekable compression for parallel access. Although I'm not sure about that last bit...

@pnorman:

It would improve dump parallelism, but restoring from either custom or directory can do tables in parallel and neither can do a single table in parallel.

If I'm doing a dump I normally do the custom format because its slightly easier to send around, and z 9 if space or bandwidth matters.

If we did move to directory we'd need to make sure not to put too much load on the DB at the start of the backup.

For the one-time backup and restore directory might be better.

Although pg_dump and pg_restore don't support any single-table parallelism, could we reduce dump time (which runs every week) by using directory format?

Planet generation which, as always, is slipping ever later in the week, is also limited by table parallelism, but we could code around that as long as the (compressed) table is seekable. Which might be possible while retaining backwards compatibility with pg_restore by using a "seekable" gzip variant.

It looks like pg_dump / pg_restore use libz directly, so would require some software changes to make them use anything seekable. 😞

But if we thought that might be a good idea, then it could be worth trying to get a patch upstream to handle external compression commands.

The text was updated successfully, but these errors were encountered:

tomhughes · 2016-07-28T13:45:12Z

As far as I know custom format is far and away the best format for just about everything except if you want to dump tables in parallel, or need to get SQL out for some reason. I never use anything else and would be extremely reluctant to do so here.

Dumping tables in parallel is just going to impact on "real" use of the database to start with.

zerebubuth · 2016-07-28T14:31:37Z

I thought the "directory" format was the same as the "custom" format, but with one "custom" file per table in a directory?

What are the ways in which "custom" format is better than "directory"? Is it a significantly smaller file than a tar of the directory? Does it support things which "directory" doesn't?

From the pg_dump documentation:

The most flexible output file formats are the "custom" format (-Fc) and the "directory" format(-Fd). They allow for selection and reordering of all archived items, support parallel restoration, and are compressed by default. The "directory" format is the only format that supports parallel dumps.

It would be more disruptive to dump in parallel, but the amount of available paralellism (i.e: large tables) is fairly limited, so the majority of the time would be spent at 2-3x more load than the serial dump. On the other hand, the total quantity dumped is the same, so by increasing the load, it can shorten the total dump time and that's less time it's impacting on "real" use.

Alternatively, once we have karm and karm+1, it may be possible to do a dump from the replica quickly enough to avoid getting killed by WAL overwriting.

Komzpa · 2016-08-18T22:08:18Z

pg_basebackup can make a dump of running db quickly ensuring database is prepared for backup and doing a copy of database files.
https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html

pnorman · 2019-03-03T18:13:49Z

I was working on some pg_dump work, and all the postgres people I talk to are currently recommending directory format as the default.

I'm using custom for a lot of my stuff because the ease of having a single rather than a tarball or directory is important, but I don't think those reasons apply here.

Firefishy · 2023-12-06T10:58:25Z

Since the pg_dump native format backup is a requirement for https://github.com/zerebubuth/planet-dump-ng (generates weekly planet dumps) I think this is decided.

There is an open question on if we should also have pg_basebackups when combined with WAL backups to allow point-in-time recovery. Likely using https://pgbackrest.org/ sending the backups to S3. This likely best handled via an upcoming ticket on pitr.

Firefishy closed this as completed Dec 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database dump format #96

Database dump format #96

zerebubuth commented Jul 28, 2016

tomhughes commented Jul 28, 2016

zerebubuth commented Jul 28, 2016

Komzpa commented Aug 18, 2016

pnorman commented Mar 3, 2019

Firefishy commented Dec 6, 2023

Database dump format #96

Database dump format #96

Comments

zerebubuth commented Jul 28, 2016

tomhughes commented Jul 28, 2016

zerebubuth commented Jul 28, 2016

Komzpa commented Aug 18, 2016

pnorman commented Mar 3, 2019

Firefishy commented Dec 6, 2023