-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ideas for how to do an optional backup? #6
Comments
I think it makes sense to have a backupdir option and maybe an auto recover option, so if the upgrade fails it restores the backup into the original location. |
Interesting idea. Yeah, it probably wouldn't be too hard automatically recover the original data back to the starting location. It'll take some more detailed thinking about when coming to implement it. 😄 |
Hey, is there any chance some backup option as proposed here will be added in the near future? We are considering to use this, but we are a little worried that something will go wrong eventually and we'd like to have some very quick rollback option available just in case 😃 |
@j-hellenberg What's the approach you're currently using for backing up the database? 😄 |
@j-hellenberg Oh, as another relevant data point, how big are your databases on disk generally, and do they generally have enough free space on their disk partitions for 2 full copies of the database? |
@justinclift Currently, we are performing full backups of our Kubernetes Persistent Volumes once per week. Losing up to a week's worth of data is manageable, though, obviously, far from ideal. So far, we got lucky and did not have to use one of these backups. Our database sizes are still pretty manageable, I think the biggest is about 1GB in size on disk. The data volumes are all large enough to support two full database copies. Still, I think it makes sense for Side note: You mentioned that you only wanted to use a directory for backups if it is empty. There are certainly use cases where it makes sense to do it like this, but I just thought of another option, just as an idea: We could allow overwriting an old backup if the current version Y of the database is newer than the database version X of the backup. In this case, we are essentially already one upgrade step ahead (upgrading from Y to Z now), meaning we can be fairly certain by now that the last upgrade must have succeeded. In that case, we can safely remove the old backup of version X. Doing it like that would remove the manual effort of having to manually delete the old backup before each new upgrade. |
@justinclift Just a friendly reminder, is there any update on this? :) |
Not yet. 😬 |
Does recommending a multi-step approach like this make sense? https://github.com/pgautoupgrade/docker-pgautoupgrade/wiki/Automating-The-Upgrade The user can simply add a backup stage before the upgrade as part of the startup order (whether docker compose, Kubernetes, etc). This way they have the flexibility to do a pgdump, directory copy, upload to object storage, copy to bind mount dir, whatever they want (using whatever container they wish too). It's a tradeoff between creating and maintaining backup options as part of this container, or having pgautoupgrade only do one function: upgrade postgres.
It reality it is quite helpful to have at least some basic backup and rollback capability built in. So perhaps a minor suggestion to your original proposal: why not create the backup in the PG_DATA dir to save the additional bind mount? As you say:
|
This seems like a decent idea. We'd want to do a disk space check first too, to ensure there's enough free space to copy the directory. In theory that shouldn't be too hard to check, probably some |
i dont really like the empty folder idea. i would create a folder in that backup dir based on the upgrade step. so something like 14_15 if we upgrade from 14 to 15. Space shouldnt really be a concern in my opnion, you need a really huge db on a very small system ← should be a very rare combo. also checking available space should be very easy to add before doing an backup, just make sure to not do an upgrade if the user wants an backup without enough space. |
Yeah, something along those lines seems like a decent idea. That should also help to not overwrite things if the database automatically upgrades several releases over time too. We could probably make that folder name even more descriptive too. Maybe a timestamp and extra user friendly text?
Nah, I've literally seen this kind of problem happen on production systems before. For example, on a long running database host it's starting to get worryingly close to the upper safety thresholds of disk usage before needing to figure out a path for upgrading the host. Also, things like simple mistakes (running a backup accidentally goes to the wrong file system) can cause the same kind of failure mode too. That being said, we only need to check that stuff when the user has indicated they want a backup to happen. 😄 |
I like the idea of a very descriptive directory 👍 Implementation so far based on discussion:
Let me know if I missed anything or if there are additional ideas 😃 I might get some time to take a stab at this in the coming months, but otherwise anyone feel free to tackle it in the meantime 🙏 |
Yep, that sounds like the right kind of thing. Unexpected edge gotchas may turn up while doing the actual implementation (it happens), but as a reasonable plan that looks fine. With the |
also we should make it atomic, im thinking about a cluster when a node gets offline the cluster starts the container on another node, if the first node comes back online it would stil have the container running until the cluster manges it. so we should make sure that never a second container would write or modify the dir another potential container does things. timestamps could couse an edge case where a container gets constantly restartet because of a failure and would create endless dirs and potentially fill up disk space, so we should have such edge cases in mind too |
Isn't that something it's up to the container management framework to ensure? ie that if our container is mounted and running, it's not going to be messed around with anything / or mess anything else up Doesn't really seem like the job of our container to ensure there aren't other cluster members still running (etc). Unless I'm misunderstanding the example scenario? 😄 |
I think it makes sense to have this as a
Also, with the proposed implementation, I'm still not sure whether it is ideal that successive upgrades will continue to fill up the disk space with multiple parallel copies of the database. For example, when we upgrade from 12 to 16 in steps, we end up with the main data and four full copies on top ( |
It's probably a better idea to do that rather than overwrite a previous backup. If we go with the assumption that If someone using the container sets the option but doesn't ensure they have enough space before hand, that's up to them to resolve. I mean, we should still play nice with things (ie detect and warn/error/etc), but lets not over think it? 😄 |
in theory yes, but its some kind of edge case. and on the other hand my healthcheck implements this already (need to be modified a bit i think for the backup thing). But you are right that the orchestrator should not allow this but i mean i had this case with my swarm cluster. |
sorry, I am looking through tickets and though I could maybe work on this someday. I like the implementation suggested @spwoodcock. The only thing that worries me is successive failures: Likely we exit the container with some non-zero exit code, so any orchestration software will usually try to restart it (if configured). This will quickly fill up the disk if not detected. But again if the whole thing is opt-in, then likely the cluster admin observes the automatic upgrade and would note when multiple containers have failed before. Another possibility would be that |
We could probably set some flag/file/something at the start of upgrades, and clear it when upgrades complete. Then we could check for the presence of the flag (before the code which sets it), which would imply that the previous run crashed before completing. This would let us exit early with some kind of output message for the admins "Previous upgrade seems to have crashed, please have an admin manually review before removing the $FOO file". |
At present, the docker container does an in-place upgrade, with no capability for backing things up (just in case).
We should probably support some kind of useful, optional, backup capability.
So far, my initial thinking is to have some environment variable to define a new mount point (
PGAUTO_BACKUPDIR
or similar), and if that's set to a valid value pointing at an empty directory then we use it.For the backup type, I reckon we should just go with a "full data copy" approach rather than a pgdump. At least for now.
It should (in theory) be reasonably simple. We'd use that "backupdir" as part of the pg_upgrade file moving process, running the
pg_upgrade
without the--link
option so things aren't upgraded in place.That way, the backup gets done, and the new database ends up in the correct directory. The downsides are the (potentially significant) extra space that gets used, and the extra time the whole upgrade process will take.
Optimally, we'd probably recommend people use a 2nd bind mounted directory for that "backupdir", so it's not a backup that disappears when the container stops.
Thoughts? 😄
The text was updated successfully, but these errors were encountered: