Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4826 Replicate RECAP PDF uploads to subdockets #4857

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

albertisfu
Copy link
Contributor

@albertisfu albertisfu commented Dec 28, 2024

This PR adds support for replicating PDF uploads to sub-dockets, following a similar approach to attachment pages. A step was added in find_subdocket_pdf_rds before process_recap_pdf to look for subdockets where documents should be merged.

The process flow is:

  • The process aborts if the PDF upload belongs to an appellate court, since doppel cases cannot exist in appellate courts.
  • Similar to attachment pages, a RECAPDocument queryset finds unique RDs with the same pacer_doc_id in the same court. This query was moved to a helper method since it's common to both find_subdocket_pdf_rds and find_subdocket_att_page_rds.
  • Additional ProcessingQueue entries are created for each additional RECAPDocument where the PDF needs replication.
  • If the original PQ lacks a pacer_case_id (optional in PDF uploads), one is assigned during the first iteration so the PQ can succeed when processed by process_recap_pdf. Otherwise, the lookup will fail with RECAPDocument.MultipleObjectsReturned.
  • PQ creation is wrapped in transaction.atomic to roll back any objects if errors occur. This change was also applied to find_subdocket_att_page_rds.
  • Removed redundant code block in process_recap_attachment.

When working on this I noticed that within process_recap_pdf there is the fallback query:

rd = await RECAPDocument.objects.aget(pacer_doc_id=pq.pacer_doc_id)

Is it correct that this query only use pacer_doc_id ? Not sure if pacer_doc_ids are unique across all courts in PACER. If they're not, I think it would be safer to change it to:

rd = await RECAPDocument.objects.aget(pacer_doc_id=pq.pacer_doc_id, court_id=pq.court_id)?

Let me know what do you think.

@albertisfu albertisfu marked this pull request as ready for review December 30, 2024 18:00
@albertisfu albertisfu requested a review from mlissner December 30, 2024 18:00
@albertisfu albertisfu linked an issue Dec 30, 2024 that may be closed by this pull request
@mlissner
Copy link
Member

Is it correct that this query only use pacer_doc_id ? Not sure if pacer_doc_ids are unique across all courts in PACER. If they're not, I think it would be safer to change it to:

Yeah, that's definitely a bug, and your fix of adding the court to it should help a lot, thank you.

@johnhawkinson
Copy link
Contributor

Is it correct that this query only use pacer_doc_id ? Not sure if pacer_doc_ids are unique across all courts in PACER.

They are unique! The first 3 digits identify the court, see the doc1 URLs section of https://github.com/freelawproject/juriscraper/blob/main/juriscraper/pacer/notes.md

Yeah, that's definitely a bug, and your fix of adding the court to it should help a lot, thank you.

Why "definitely"?

@mlissner
Copy link
Member

I forgot about that, you're right, John!

Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks! Onward to a proper review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: To Do
Development

Successfully merging this pull request may close these issues.

Replicate RECAP PDF uploads to subdockets
4 participants