-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trashless Wagtail #8310
base: main
Are you sure you want to change the base?
Trashless Wagtail #8310
Conversation
a8f3c46
to
b710a17
Compare
e392897
to
14ce4e9
Compare
85b1b44
to
6b6248e
Compare
6b6248e
to
0906030
Compare
e5921f0
to
189314f
Compare
cfgov/archival/wagtail_hooks.py
Outdated
def add_archival_signal_reciever(request, page): | ||
from wagtail.models import Page | ||
|
||
pre_delete.connect(archive_page_data_receiver, sender=Page) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these signal handlers only hooked up and disconnected at the moment the deletions happen, as opposed to on startup (in app ready()
, for example)? Is there a downside to having them always hooked up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern was to control for deletion outside of an action taken in Wagtail. The practicality is, deletion during test runs triggers the signal handlers — which, could be fine, or there are other ways around it.
cfgov/archival/utils.py
Outdated
|
||
# Compare the last migration | ||
last_migration = get_last_migration(app_label) | ||
if last_migration != page_last_migration: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another potential issue with using migrations as a compatibility check is that we may introduce migrations to a different app that nonetheless impact the page model being imported. For example, if we modify v1.CFGOVPage
, that'll generate a migration in the v1
app but not in the ask_cfpb
app, even though the ask_cfpb.AnswerLandingPage
model will be impacted.
Using migrations seems both not conservative enough (because it misses changes in other apps) but also potentially too conservative (a migration might be introduced in another model in the app, or to some non-breaking thing like help text). Is it too risky to remove this check entirely and just attempt the import, and (a) handle errors more nicely (as suggested above) and/or (b) let the user work out if things haven't been restored properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.
The problem this is an attempt to solve is that you can write anything to a StreamField as JSON. but that doesn't mean it matches the current schema as Wagtail understands it. So you could import data with fields that no longer exist, and then when dumped into the edit, that data is just gone.
Most of our content is in content
fields, and if a descendent of CFGOVPage
modifies its content
field (I think they all do?), it has to redefine it, and will generate a migration for it. You're correct that if CFGOVPage.sidefoot
changes, trying to import a page of a type that doesn't modify it won't have that change reflected.
I think there's a balance here between being feature-complete and usable enough to test the hypothesis that we'll need to restore pages this way. It's probably worth talking to @csebianlander and @sonnakim to discuss what the editor experience should be (we had talked about it, but I don't see it reflected in the partial GHE search results ATM).
My view was this was a good-enough-to-start-with way to do some validation that a page import would be okay.
We could also track migrations all the way down the inheritance chain, but I think at that point my preference would be to remove the migration check entirely and let users fend for themselves. Because
Is it too risky to remove this check entirely
I think the risk is very low either way. Ultimately this is to enable the hypothesis-testing: do we need the ability to restore deleted content?
d17c997
to
4da7f30
Compare
f1a6b3d
to
aff1a96
Compare
Co-authored-by: Andy Chosak <[email protected]>
aff1a96
to
41e462f
Compare
These code changes propose using Django storage to store and serve the Wagtail deletion archive from PR 8310 [0], instead of using Apache to serve those files. This simplifies the configuration and makes it easier to test locally and easier to port in future to other webserver approaches. With these changes, the deletion archive is now only available behind the Wagtail admin login, at the URL /admin/__deleted/. This URL provides a nicely formatted Wagtail report view where archives can be downloaded (instead of the current PR approach that uses Apache directory serving). With this change, Django storage configuration has been migrated to settings.STORAGES from settings.STATICFILES_STORAGE, which was deprecated in Django 4.2 and removed in Django 5.1. This commit also includes a minor bugfix to the archive import template which doesn't currently properly display the destination page title. [0] #8310
These code changes propose using Django storage to store and serve the Wagtail deletion archive from PR 8310 [0], instead of using Apache to serve those files. This simplifies the configuration and makes it easier to test locally and easier to port in future to other webserver approaches. With these changes, the deletion archive is now only available behind the Wagtail admin login, at the URL /admin/__deleted/. This URL provides a nicely formatted Wagtail report view where archives can be downloaded (instead of the current PR approach that uses Apache directory serving). With this change, Django storage configuration has been migrated to settings.STORAGES from settings.STATICFILES_STORAGE, which was deprecated in Django 4.2 and removed in Django 5.1. Additionally, with this change, cf.gov deployments that have not defined the WAGTAIL_DELETION_ARCHIVE_PATH environment variable still support importing archives; the archive-on-delete and archive download functionality is still disabled in those cases. This commit also includes a minor bugfix to the archive import template which doesn't currently properly display the destination page title. [0] #8310
Use Django storage for Wagtail deletion archive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do it! A couple minor cleanup things.
STORAGES["default"] = { | ||
"BACKEND": "storages.backends.s3boto3.S3Boto3Storage", | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
# Deletion archive | ||
# If this is set then when Wagtail pages are deleted a JSON archive file will | ||
# be written to this path containing the deleted page's data. | ||
WAGTAIL_DELETION_ARCHIVE_PATH = os.environ.get("WAGTAIL_DELETION_ARCHIVE_PATH") | ||
if WAGTAIL_DELETION_ARCHIVE_PATH is not None: | ||
WAGTAIL_DELETION_ARCHIVE_FILESYSTEM = f"osfs://{WAGTAIL_DELETION_ARCHIVE_PATH}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Deletion archive | |
# If this is set then when Wagtail pages are deleted a JSON archive file will | |
# be written to this path containing the deleted page's data. | |
WAGTAIL_DELETION_ARCHIVE_PATH = os.environ.get("WAGTAIL_DELETION_ARCHIVE_PATH") | |
if WAGTAIL_DELETION_ARCHIVE_PATH is not None: | |
WAGTAIL_DELETION_ARCHIVE_FILESYSTEM = f"osfs://{WAGTAIL_DELETION_ARCHIVE_PATH}" |
I missed deleting this in my PR.
|
||
WAGTAIL_DELETION_ARCHIVE_FILESYSTEM = "mem:///archive" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAGTAIL_DELETION_ARCHIVE_FILESYSTEM = "mem:///archive" |
This can also be deleted.
We disabled Wagtail's built-in deletion back in 2016, for good reason at the time. In its place we created a "Trash" site in Wagtail and our approach was to move content to that site rather than delete it.
This PR does four things:
Restores the ability to delete pages in Wagtail, in 6ba1b74.
We had disabled deletion in
before_delete_page
andbefore_bulk_action
Wagtail hooks, and this commit removes those.Prevents deletion of locked pages in Wagtail, in 87f6ec6.
Bizarrely, locked pages are still deletable, an issue that's been open in Wagtail since 2015. This commit adds in
before_delete_page
andbefore_bulk_action
Wagtail hooks that check whether a page is locked before proceeding with the deletion.Exports a JSON dump of page data (limited to the data directly belonging to a page, not related data) to a configured location before deletion and enables importing of that data via the admin, in 6f0fe69.
On the off chance that a page is accidentally deleted, want to be able to recover that page with the ease that restoring the entire database from backup is not.
This commit uses the
before_delete_page
andbefore_bulk_action
Wagtail hooks to connect a Djangopre_delete
signal handler to thePage
model that will export thespecific
page'sserializable_data
. This JSON is intended to be reasonably human-readable so that an editor could reconstruct the page from it if needed.This commit also adds an
Import
button to Wagtail via standard hooks that consists of a view, template, and import function that will create a new page from exported page data with the click of a couple of buttons.There are significant limitations to the data that gets exported and imported: our recently added Wagtail footnotes is an example. Footnotes are stored as references in a separate table to the page. Because they're not stored on the page itself, they will not be exported, but the references on the page will be. This will break on import without editing the JSON data.
The JSON export and import is hypothesized to be useful. If we find we are not using it, and that deletion is happening without the need to restore content with any significant frequency, we should revisit and remove this functionality.
Modifies the
EmailSignUp
snippet's privacy disclosure page field toSET_NULL
when the destination page is deleted, in 189314f. This means if we delete a disclosure page that an email signup is using, it will remove that page from the email signup. An editor will need to set a new one.How to test this PR
There are a few things to test:
Deletion and JSON export
ARCHIVE_FILESYSTEM
in the environment, i.e.export ARCHIVE_FILESYSTEM=/tmp/json_archives
. Run cf.gov.ARCHIVE_FILESYSTEM
and observe that there are JSON files for all pages deleted, in their site slug paths.Import a deleted page
Import
action in the Wagtail page explorer "…" menu to import it.Locked pages
Checklist