Database dump format #96
Comments
As far as I know, custom format is far and away the best format for just about everything, except if you want to dump tables in parallel or need to get SQL out for some reason. I never use anything else and would be extremely reluctant to do so here. Dumping tables in parallel is just going to impact on "real" use of the database to start with.
I thought the "directory" format was the same as the "custom" format, but with one "custom" file per table in a directory? What are the ways in which "custom" format is better than "directory"? Is it a significantly smaller file than a From the
It would be more disruptive to dump in parallel, but the amount of available parallelism (i.e. large tables) is fairly limited, so for the majority of the time the load would only be 2-3x that of the serial dump. On the other hand, the total quantity dumped is the same, so by increasing the load we can shorten the total dump time, and that's less time it's impacting on "real" use. Alternatively, once we have karm and karm+1, it may be possible to do a dump from the replica quickly enough to avoid getting killed by WAL overwriting.
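Whether a dump from the replica survives depends as much on the standby's conflict settings as on dump speed. A sketch of the relevant knobs, with illustrative values only:

```conf
# postgresql.conf on the standby: let long-running queries (such as a
# dump) pause WAL replay instead of being cancelled by it.
max_standby_streaming_delay = -1

# Or ask the primary not to vacuum away rows the standby still needs;
# avoids most cancellations at the cost of some bloat on the primary.
hot_standby_feedback = on
```

While replay is held off, the standby keeps receiving and storing WAL, so it needs enough disk for the backlog until it catches up again.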
pg_basebackup can make a dump of a running database quickly, by ensuring the database is prepared for backup and then copying the database files.
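A sketch of what that looks like (backup target and options are illustrative):

```sh
# Base backup of the running cluster over the replication protocol;
# -X stream also copies the WAL needed to make the copy consistent,
# -Ft -z writes gzipped tar files, -P prints progress.
pg_basebackup -D /backups/base -Ft -z -X stream -P
```

Unlike pg_dump, this is a physical copy: it can only be restored onto the same major version and architecture, and individual tables can't be pulled out of it.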
I was doing some pg_dump work, and all the Postgres people I talk to are currently recommending directory format as the default. I'm using custom for a lot of my stuff because the ease of having a single file, rather than a tarball or directory, is important, but I don't think those reasons apply here.
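If packaging is the only concern, a directory dump can still be shipped as a single file after the fact (paths are placeholders):

```sh
# Pack a directory-format dump into one file for distribution;
# it has to be unpacked again before pg_restore can read it.
tar -czf osm.dir.tar.gz osm.dir
```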
There is an open question about whether we should also take pg_basebackup backups, combined with WAL archiving, to allow point-in-time recovery, likely using https://pgbackrest.org/ and sending the backups to S3. This is probably best handled via an upcoming ticket on PITR.
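If that route is taken, a minimal /etc/pgbackrest.conf sketch might look like the following; the stanza name, paths, bucket and region are placeholders, and the S3 credentials are omitted:

```ini
[global]
repo1-type=s3
repo1-s3-bucket=example-osm-db-backups
repo1-s3-endpoint=s3.amazonaws.com
repo1-s3-region=eu-west-1
repo1-path=/pgbackrest
repo1-retention-full=2

[main]
pg1-path=/var/lib/postgresql/9.5/main
```

The cluster then ships WAL continuously via `archive_mode = on` and `archive_command = 'pgbackrest --stanza=main archive-push %p'`, which is what makes point-in-time recovery between base backups possible.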
Continued from #78:
@zerebubuth:

> Although `pg_dump` and `pg_restore` don't support any single-table parallelism, could we reduce dump time (which runs every week) by using directory format?
>
> Planet generation, which, as always, is slipping ever later in the week, is also limited by table parallelism, but we could code around that as long as the (compressed) table is seekable. That might be possible, while retaining backwards compatibility with `pg_restore`, by using a "seekable" gzip variant.

@pnorman:

> It looks like `pg_dump`/`pg_restore` use `libz` directly, so it would require some software changes to make them use anything seekable. 😞
>
> But if we thought that might be a good idea, then it could be worth trying to get a patch upstream to handle external compression commands.
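As an illustration of the external-compression idea (not something pg_dump supports natively): for a plain-format dump, a seekable but gzip-compatible compressor such as bgzip can already be bolted on from outside, at the cost of losing the custom-format TOC and pg_restore's selective restore. The database name below is a placeholder:

```sh
# Plain SQL piped through bgzip: the output is BGZF, which ordinary
# gunzip can still read, but it is block-compressed and, with an
# index, can be decompressed from arbitrary offsets.
pg_dump -Fp openstreetmap | bgzip --threads 4 > osm.sql.gz

# Build the random-access index (osm.sql.gz.gzi) next to the archive.
bgzip --reindex osm.sql.gz
```

For the custom format itself, compression happens inside pg_dump via zlib, which is why the quoted comment points at an upstream patch rather than an external wrapper.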