Implementation Plan: Bulk moderation actions #3719
@sarayourfriend I'm very happy to extend the deadline for review given your schedule; please just let me know! 🙂
Full-stack documentation: https://docs.openverse.org/_preview/3719

Please note that GitHub Pages takes a little time to deploy newly pushed code. If the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub Pages deployment action list to see the current status of the deployments.
5 Feb should be fine, but I will let you know by Thursday if I am going to be late on this. At worst I would review this on my 5 Feb, which is before your 5 Feb, and you would never suspect that I'd done it last minute 🤡 But I'll let you know if that isn't going to work for whatever reason.
This looks really great, and I anticipate approving without issue, provided the plan clarifies whether Elasticsearch or Postgres will be used to search for the records to bulk action against, and the rationale for that decision. My hunch is that Elasticsearch is the safer option, and may even be the necessary one, considering `creator` is not indexed in the API Postgres database. However, I haven't spent a great deal of time thinking about this, so I am really just asking for clarification on which we will use and why, considering the limitations of each. Postgres would be a lot less work overall, I imagine, considering how easily Django admin already interacts with it. But is the performance hit worth that? That probably has to be balanced against how much effort it takes to use Elasticsearch instead of Postgres for that view.
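To make the Elasticsearch option concrete, here is a minimal sketch of the kind of query body a bulk-moderation filter might send when selecting every record by one creator at one provider. The field names (`provider`, `creator.keyword`, `identifier`) and the helper name `build_bulk_filter_query` are assumptions for illustration only, not the actual Openverse index mapping or code.

```python
from typing import Any


def build_bulk_filter_query(
    provider: str, creator: str, page_size: int = 100
) -> dict[str, Any]:
    """Build a hypothetical Elasticsearch query body for one creator at one provider.

    The field names are assumptions; check them against the real index mapping.
    """
    return {
        "size": page_size,
        # Only identifiers are needed to record a bulk moderation decision.
        "_source": ["identifier"],
        "query": {
            "bool": {
                # `filter` clauses are non-scoring exact matches, which suits
                # selecting a precise set of moderation targets.
                "filter": [
                    {"term": {"provider": provider}},
                    {"term": {"creator.keyword": creator}},
                ]
            }
        },
    }


query = build_bulk_filter_query("flickr", "Jane Doe")
```

An exact `term` filter on a keyword field skips the full-text analysis a `match` query would apply, so two different creators whose names merely share words are not swept into the same bulk action.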
...and_safety/content_report_moderation/20240122-implementation_plan_bulk_moderation_actions.md
I will also approve this IP :)
My main 2 questions are:
- Do we delete the item from the main media table, or only Elasticsearch? (see more details in the inline comment)
- How will we proceed with updating the related IP? Will we open another issue for it?
- Capitalize Elasticsearch
- Clarify language
- Specify that new media filters should query Elasticsearch rather than Postgres
I am going to leave this in the Revision round for a few more days before moving to the final round, to give @sarayourfriend and @obulat another chance to respond to continue the discussion on a few comments, in particular:
In response to @obulat's other question,
To clarify my answer from the comment thread, my vote is a strong "no": it is not worth making a change for this. My reasons are in the thread.
Re-requesting reviews as this has entered the Decision Round :)
LGTM! Nice work 👍
While I'm not an assigned reviewer, I am still sharing some comments that came up during working on #3760. I'll add more comments if something comes up later, but these are the thoughts I have so far.
`media_identifier` (previously a one-to-one relationship to the media the
decision is related to), will be updated to `media_identifiers` _plural_: a
many-to-many relationship to potentially multiple media records the decision is
related to. Under the covers, this will be implemented by a join table. As
Since we are using PostgreSQL, `media_identifiers` can be implemented as an `ArrayField` in Django, which stores the array in-table rather than in a separate join table. It also allows us to keep the weak constraints, because the identifiers will just be strings in the `ArrayField`.
I prefer the join table due to the assumption that bulk reports may affect large numbers of records. The size of the rows in the `ModerationDecision` table could get quite large, and querying on the field could be very slow.
Ah yes, that makes sense. I had not thought about the impact this would have on querying. But wouldn't DB writes be incredibly slow if we have to write individual records for each media item in a bulk moderation event?
I don't believe so -- we can use RelatedManager.set to bulk set the list of related records, if I'm understanding your question!
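The join-table design and the bulk-write concern can be illustrated without Django. Below is a minimal sketch using the standard-library `sqlite3` module; the table, column, and action names (`moderation_decision`, `decision_media`, `deindexed_sensitive`, and so on) are hypothetical stand-ins, not the schema Django would actually generate for `ModerationDecision`. In Django, `RelatedManager.set` (via `add`) writes the through-table rows with a batched insert rather than one query per row, which is the behaviour `executemany` approximates here.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a decision table plus a join table keyed by media identifier (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE moderation_decision (
            id INTEGER PRIMARY KEY,
            action TEXT NOT NULL
        );
        -- One row per (decision, media) pair. media_identifier is plain text with
        -- no foreign key to a media table, i.e. a weak constraint.
        CREATE TABLE decision_media (
            decision_id INTEGER NOT NULL REFERENCES moderation_decision(id),
            media_identifier TEXT NOT NULL,
            PRIMARY KEY (decision_id, media_identifier)
        );
        -- Index for the "which decisions affected this record?" lookup.
        CREATE INDEX decision_media_identifier_idx ON decision_media (media_identifier);
        """
    )


def record_bulk_decision(
    conn: sqlite3.Connection, action: str, identifiers: list[str]
) -> int:
    """Insert one decision and link it to every identifier in a single transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO moderation_decision (action) VALUES (?)", (action,)
        )
        decision_id = cur.lastrowid
        # One executemany call for all join rows instead of a loop of single inserts.
        conn.executemany(
            "INSERT INTO decision_media (decision_id, media_identifier) VALUES (?, ?)",
            [(decision_id, ident) for ident in identifiers],
        )
    return decision_id


def decisions_for_media(conn: sqlite3.Connection, identifier: str) -> list[str]:
    """Return the actions of all decisions that touched a given media identifier, oldest first."""
    rows = conn.execute(
        """
        SELECT d.action FROM moderation_decision AS d
        JOIN decision_media AS dm ON dm.decision_id = d.id
        WHERE dm.media_identifier = ?
        ORDER BY d.id
        """,
        (identifier,),
    )
    return [action for (action,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    record_bulk_decision(conn, "deindexed_sensitive", [f"uuid-{i}" for i in range(1000)])
    record_bulk_decision(conn, "reversed_deindex", ["uuid-7"])
    print(decisions_for_media(conn, "uuid-7"))
```

With an `ArrayField` instead, each decision row would carry the full identifier list, and the reverse lookup would need a GIN index over the array column to use PostgreSQL's `@>` containment operator efficiently; the join table keeps decision rows small and turns that lookup into an ordinary indexed equality match.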
@obulat, are you able to review the implementation plan soon? It would be great to get this approved (with any changes needed, of course, if you have blockers to share) so that we can update the issues from the previous implementation plan with the new ModerationDecision model and other modifications to the existing plan.
Co-authored-by: sarayourfriend
Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR: @obulat

Excluding weekend days, this PR was ready for review 5 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s).

@stacimc, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.
I've just realized (prompted by @dhruvkb's comment above) that I may need to clarify the specifics of the implementation for the many-to-many relationship. I likely do not have time to do so today, so I'm going to draft this briefly.
@obulat Can you confirm you've seen the request to review this? Will you confirm whether you will approve with the change Staci is making or if you'll require other changes?
I was briefly concerned that there would be an issue with using the
The plan is solid and very detailed as it considers all of the edge cases, and also updates the related plan.
Sorry for the delay
Co-authored-by: Olga Bulat
I've created issues under this milestone and updated the relevant issue from the related IP.
Fixes
Fixes #1967 by @sarayourfriend
Description
I've chosen @obulat and @sarayourfriend as reviewers. In particular, I'd like to call out to @sarayourfriend (as I do in the document itself) that this plan builds on and makes some slight changes to the work in #3494, which should be carefully reviewed.
This discussion is following the Openverse decision-making process. Information about this process can be found on the Openverse documentation site.
Requested reviewers or participants will be following this process. If you are being asked to give input on a specific detail, you do not need to familiarise yourself with the process and follow it.
Current round
This discussion is currently in the Decision round.
The deadline for review of this round is 19 February 2024.
Checklist
Update index.md
… (`main`) or a parent feature branch.
Developer Certificate of Origin