Skip to content
This repository has been archived by the owner on Aug 3, 2022. It is now read-only.

Add backup and restore for bootstrapping #75

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

theganyo
Copy link
Member

@theganyo theganyo commented Apr 22, 2017

Submitted for feedback
Untested. Don't merge.

Copy link
Contributor

@gbrail gbrail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good approach -- do a backup on a schedule and POST it to an endpoint, which then would take care of the rest. Other than one thing I saw in the code I have a few ideas.

First, I see that the changeservers do the backup on a schedule. What if the backup attempt fails? Can each one log the error and try again in less time using an exponential backoff? That way we might survive a temporary glitch better.

Same on a boot of a new changeserver -- if it can't get the backup, what does it do?

I also see that the backup server is trying to manage storage by having a bunch of separate files. GCP isn't like a regular filesystem -- you can overwrite the same "file" and the whole thing will either change atomically or fail. So maybe we only just need one GCS "file" that we keep overwriting.
(Also maybe we should checksum the file first, so we can skip the GCS update if nothing changed since the last backup. Otherwise we keep paying money to overwrite the file over and over. I think that GCS has a way to do this easily.)

Finally, I'm not sure I like the idea of generating a "secret" and putting it in a parameter. At the very least we'll need a way to rotate it. I'm not sure what else to do though -- we'll have to think about it. (One thing we can do is use GCP IAM for this.)

I think this is a clever usage of Cloud Functions. Is there some way that we can unit-test that code using regular node?

if err != nil {
return err, false
}
defer req.Body.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "res.body.Close()" here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. :) Thanks.

@theganyo
Copy link
Member Author

Yeah, I intentionally just allowed the startup to fail if was told to restore, but can't. It could to a retry... or some external process could do a retry. As for the backup, yes, as backoff would definitely make sense.

What you see with the backup server isn't it using different files for different versions... it will actually overwrite the same file as you suggest. The ability to have multiple file IDs is to allow more than one cluster of change server (ie. with different tenant information) to backup to the same bucket. But, yes, a checksum could potentially be used to avoid rewriting the file... or perhaps the change servers could realize that they have nothing new to backup and not even try.

Finally, I agree the "secret" thing isn't a fantastic solution. It was just a hack to enable some kind of basic level of authz. I'm not sure of the correct solution here.

@gbrail
Copy link
Contributor

gbrail commented Apr 27, 2017

Still playing with an automated presubmit check. It failed because this branch is based on an older master -- not a real failure yet.

@gbrail
Copy link
Contributor

gbrail commented May 5, 2017

Scott -- I haven't merged this because it says "not ready yet." Do we have a plan to get it ready?

@theganyo
Copy link
Member Author

theganyo commented May 5, 2017

It still needs unit tests at a minimum. Also probably needs a more thoughtful approach to authz. There is no plan to finish this at present.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants