Support different reset modes during import #26
f41f27a to 6b5d600
…port Currently the only mode is the existing behaviour, however this opens the door to other approaches.
This is unlikely to be common, however it provides an escape hatch for advanced users who want to do their own thing.
This is slightly wasteful, but should mean we cheaply get good coverage over all the interesting model relations we come up with.
Sequences don't nicely fit into one of just schema or data, they're somewhat inherently both. Given that Django's "loaddata" over-writes existing rows in tables, it seems reasonable to do something similar for sequences -- even if that means we actually drop the sequence and fully re-create it.
6b5d600 to b94d9df
@lirsacc this may be of interest as I think you've wanted this before too.
Thanks for this Peter! I've not done a line by line review yet, I need to let this one stew for a few days if that's ok? The problem makes sense, having a solution in this codebase also makes sense, and this looks like a reasonable option. A few questions/alternatives to discuss....
Yup, sure.
Good thoughts 👍, thanks for the prompt review :)
👍 will have a think.
I think the main one is that it assumes that the existing database is close enough to what the current codebase defines that it can use the current models to empty the database. There are various ways that might not be true (even just being on a different branch might hit it) which could either fail loudly or not clear things out that the user might expect would be removed. One thing which has just occurred to me and I suspect isn't covered by tests here (and is almost certainly a 🐞) -- the django migrations table isn't being cleared out, so that'll need addressing.
Aside from the limitations of the alternatives, I think there's a couple of reasons to keep the
Note that neither the
Interesting, I hadn't thought about pushing this down to that layer. This might fit alongside the
Thinking it through, another source of complexity would be handling the case of a single model with several strategies. For me this feels like it'd be getting too complex and having the reset separate is going to be quite a bit simpler. I guess it depends on how advanced you want this tool to get in terms of what it offers around working with an existing database.
Perhaps. Feels like this might need a bit more thinking -- currently there's no relationship between these modes and the later processing. I'll admit sequences weren't something I'd thought about until actually building this, so they ended up getting a simple tweak rather than being designed for explicitly.
Hadn't thought about making it a setting. While my use-case is one where I mostly want a different mode in different environments, it's not clear to me that that would always be the case nor that that's necessarily a good motivation for a setting. My immediate thought is that this feels like a thing you want to control when you run the command (or write the wrapper script) rather than something that's a property of the codebase.
Re: pushing down into strategies...
Thinking it through, another source of complexity would be handling the case of a single model with several strategies. For me this feels like it'd be getting too complex and having the reset separate is going to be quite a bit simpler.
Very good point, I think this answers this one entirely for me, thanks!
Re: command line flag...
While my use-case is one where I mostly want a different mode in different environments, it's not clear to me that that would always be the case nor that that's necessarily a good motivation for a setting. My immediate thought is that this feels like a thing you want to control when you run the command (or write the wrapper script) rather than something that's a property of the codebase.
Sounds good. Let's start with a command-line flag. I think you're right that this is a property of the particular development environment and the current aim of the developer at a point in time, rather than a configuration around the runtime environment that might be configured with something like Configurations. We can always add a setting later if there's any desire for it.
Thanks for all your time on this and on replying in detail to the comments! Let's go with the approach in this PR. I've done a line by line review now.
Don't forget the potential bug about the handling of the migrations table, if it is necessary.
This happens in self.dump_data_for_import, likely I missed this in 78f06a3.
This introduces a util to simplify this and adds usage to a couple of places which had been missed.
Essentially we get a merging of the two sources of data.
beef078 to 219e810
Importing, even in the 'none' case, is potentially destructive so we want to give the user a chance to bail.
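For illustration only, here is a minimal sketch of what "a chance to bail" could look like. It is not the code from this PR: the function name `confirm_import`, its parameters, and the prompt wording are all assumptions made up for this example.

```python
def confirm_import(database_name, reset_mode, ask=input):
    """Return True only if the user explicitly confirms the import.

    `ask` defaults to the built-in input() and is injectable so the
    prompt can be exercised without a terminal. All names here are
    hypothetical, not taken from the project.
    """
    prompt = (
        f"About to import into {database_name!r} using reset mode "
        f"{reset_mode!r}. Existing data may be overwritten. Continue? [y/N] "
    )
    answer = ask(prompt).strip().lower()
    # Anything other than an explicit yes (including just pressing
    # Enter) is treated as "no", since the operation is destructive.
    return answer in ("y", "yes")
```

Defaulting to "no" means an accidental Enter key press leaves the database untouched, which seems the safer choice for a destructive command.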
Also includes docs for the Postgres sequences extra.
@danpalmer thanks for the review. I think I've addressed everything now -- both the TODOs and the review comments, so would be great if you had time for another pass. Hopefully review by commit of the new changes is clear enough, but happy to answer any further questions you have.
Ah, just for clarity: I still haven't done any manual testing here and probably won't have time this side of the new year. The updates since the first review do include better testing though so I'm a little more confident in how this will behave now. Happy to wait & do some manual testing though if that's desired.
@danpalmer hope you've had a good break, do you know when you might have a chance to have another look at this PR?
Thanks so much for this @PeterJCLaw. Go ahead and merge when you're ready!
with connection.cursor() as cursor:
    table_names = connection.introspection.table_names(cursor)

models = connection.introspection.installed_models(table_names)

if MigrationRecorder(connection).has_table():
    models.add(MigrationRecorder.Migration)

with connection.schema_editor() as editor:
    for model in models:
        editor.delete_model(model)
Hrm, should this be transactional? That way we'd either drop all the tables or none, which is probably better than maybe dropping some?
A failure mode here is if the user has permission for some of the tables but not all.
Given that we're not dropping the database it might even be nice to have the whole import be transactional, but that likely requires deeper changes.
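To make the "all tables or none" idea concrete, here is a small self-contained sketch. It deliberately uses the standard library's `sqlite3` rather than Django's schema editor, so it is a stand-in for the behaviour being suggested and not a proposed patch; the helper names are invented for the example. Whether table drops can be rolled back at all depends on the database backend.

```python
import sqlite3


def drop_tables_atomically(conn, table_names):
    """Drop every table in table_names, or none of them if any drop fails.

    Assumes `conn` was opened with isolation_level=None so that the
    transaction is controlled explicitly with BEGIN/COMMIT/ROLLBACK.
    """
    conn.execute("BEGIN")
    try:
        for name in table_names:
            # Identifiers cannot be bound as query parameters, so quote them.
            quoted = '"' + name.replace('"', '""') + '"'
            conn.execute(f"DROP TABLE {quoted}")
    except sqlite3.Error:
        # Undo any drops that already succeeded, then re-raise.
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def list_tables(conn):
    """Return the names of all tables currently in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER)")
```

If one of the drops fails part way through (here, a table that does not exist; in the permissions case above, a table the user cannot drop), the rollback restores the tables that had already been dropped.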
I'm going to leave this as-is for the moment -- following the pattern of the drop-database mode which can also leave things part complete (even though that's perhaps less likely).
for model in models:
    editor.delete_model(model)
A progress bar here might be nice. And/or some generic feedback that the "reset" portion of the import has completed.
I'm going to leave this as-is. I still think that a progress report here would be good, however making it clear that this is on the reset side feels non-trivial without some more general status reporting during import.
Have now tested this in the use-case I originally had in mind, it seems to work :)
This introduces support for "reset modes" [1] -- different approaches for ensuring a clean database before importing the fresh data. Initial support is for three modes:

- drop-database: the default; drops the database & re-creates it.
- drop-tables: drops the tables the Django codebase is aware of, useful if the Django database user doesn't have access to drop the entire database.
- none: no attempt to reset the database, useful if the user has already manually configured the database or otherwise wants more control over setup.

Using the second of these in a staging-type environment is the main motivation for this change.
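As a rough sketch of how the three modes could be exposed as a command-line option: the flag name `--reset-mode`, the `ResetMode` enum, and the helper functions below are assumptions for illustration and may not match what this PR actually implements. Only the three mode names and the default come from the description above.

```python
import argparse
import enum


class ResetMode(enum.Enum):
    # Values mirror the three mode names listed in the description.
    DROP_DATABASE = "drop-database"
    DROP_TABLES = "drop-tables"
    NONE = "none"


def build_parser():
    """Build a parser with a single (hypothetical) --reset-mode option."""
    parser = argparse.ArgumentParser(prog="import-sketch")
    parser.add_argument(
        "--reset-mode",  # hypothetical flag name
        choices=[mode.value for mode in ResetMode],
        default=ResetMode.DROP_DATABASE.value,
        help="How to ensure a clean database before importing.",
    )
    return parser


def parse_reset_mode(argv):
    """Parse argv (a list of strings) and return the chosen ResetMode."""
    args = build_parser().parse_args(argv)
    return ResetMode(args.reset_mode)
```

Keeping the choice on the command line rather than in settings matches the conclusion of the discussion: the mode is something decided each time the import is run, not a property of the codebase.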
This PR also updates the existing PostgresSequences extra to cope with a sequence of a given name already being present by overwriting it. This doesn't feel like the best solution (the result if the import somehow lists the same sequence twice may be unexpected), but seemed probably the simplest for now given what devdata is typically used for.

Review by commit may be useful, they're in a hopefully sensible order for that.
Testing is achieved by adding reset modes to all existing import tests as well as creating a few more. I've also tested this against the use-case I originally had (importing against an existing database part of a staging environment).
Fixes #23
TODO:

- drop-tables mode -> tested & fixed in 4227139
- none when importing over the top of an existing matching schema -> test demonstrating result in 84a2b03

Footnotes

[1] This doesn't feel like the best name, however "strategies" is already taken in this project and I didn't want to confuse that. Open to changing this if we can come up with something better!