4826 Replicate RECAP PDF uploads to subdockets #4857

albertisfu · 2024-12-28T01:38:25Z

This PR adds support for replicating PDF uploads to subdockets, following an approach similar to the one used for attachment pages. A new step, find_subdocket_pdf_rds, runs before process_recap_pdf and looks for subdockets where the document should be merged.

The process flow is:

- The process aborts if the PDF upload belongs to an appellate court, since doppel cases cannot exist in appellate courts.
- Similar to attachment pages, a RECAPDocument queryset finds unique RDs with the same pacer_doc_id in the same court. This query was moved to a helper method since it's common to both find_subdocket_pdf_rds and find_subdocket_att_page_rds.
- Additional ProcessingQueue entries are created for each additional RECAPDocument where the PDF needs replication.
- If the original PQ lacks a pacer_case_id (optional in PDF uploads), one is assigned during the first iteration so the PQ can succeed when processed by process_recap_pdf. Otherwise, the lookup will fail with RECAPDocument.MultipleObjectsReturned.
- PQ creation is wrapped in transaction.atomic to roll back any objects if errors occur. This change was also applied to find_subdocket_att_page_rds.
- Removed redundant code block in process_recap_attachment.
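
The flow above can be sketched in plain Python, without Django, to make the order of the steps concrete. This is only an illustration under stated assumptions, not the code in this PR: the `RD` and `PQ` dataclasses, the `APPELLATE_COURTS` set, and `plan_subdocket_pqs` are hypothetical stand-ins for RECAPDocument, ProcessingQueue, and the real task, an in-memory list stands in for the queryset, and "unique RDs" is assumed to mean one RD per pacer_case_id.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical sample of appellate court IDs, for this sketch only.
APPELLATE_COURTS = {"ca1", "ca9"}


@dataclass(frozen=True)
class RD:
    """Hypothetical stand-in for a RECAPDocument row."""
    pk: int
    court_id: str
    pacer_case_id: str
    pacer_doc_id: str


@dataclass(frozen=True)
class PQ:
    """Hypothetical stand-in for a ProcessingQueue row."""
    court_id: str
    pacer_doc_id: str
    pacer_case_id: Optional[str] = None


def plan_subdocket_pqs(pq: PQ, all_rds: list[RD]) -> tuple[PQ, list[PQ]]:
    """Return the (possibly updated) original PQ and one new PQ per extra RD."""
    if pq.court_id in APPELLATE_COURTS:
        return pq, []  # doppel cases cannot exist in appellate courts

    # Same pacer_doc_id in the same court, keeping one RD per case (assumed).
    seen_cases: set[str] = set()
    matches: list[RD] = []
    for rd in all_rds:
        if rd.court_id != pq.court_id or rd.pacer_doc_id != pq.pacer_doc_id:
            continue
        if rd.pacer_case_id in seen_cases:
            continue
        seen_cases.add(rd.pacer_case_id)
        matches.append(rd)

    original = pq
    new_pqs: list[PQ] = []
    for rd in matches:
        if original.pacer_case_id is None:
            # First iteration: pin the original upload to a single case so a
            # later lookup by (doc ID, case ID) matches exactly one RD.
            original = replace(original, pacer_case_id=rd.pacer_case_id)
            continue
        if rd.pacer_case_id == original.pacer_case_id:
            continue  # the original PQ already covers this RD
        new_pqs.append(PQ(pq.court_id, pq.pacer_doc_id, rd.pacer_case_id))
    return original, new_pqs


# Usage with made-up data: two cases in one court share a doc ID.
rds = [
    RD(1, "nysd", "100", "127012345678"),
    RD(2, "nysd", "101", "127012345678"),
]
original, extra = plan_subdocket_pqs(PQ("nysd", "127012345678"), rds)
print(original.pacer_case_id, [p.pacer_case_id for p in extra])  # 100 ['101']
```

In the real task the PQ creation would additionally sit inside transaction.atomic, so a failure partway through leaves no partial set of PQs behind; the sketch sidesteps that by returning the planned objects instead of saving them.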

When working on this I noticed that within process_recap_pdf there is the fallback query:

courtlistener/cl/recap/tasks.py

Line 264 in cb012e8

rd = await RECAPDocument.objects.aget(pacer_doc_id=pq.pacer_doc_id)

Is it correct that this query only uses pacer_doc_id? I'm not sure whether pacer_doc_ids are unique across all courts in PACER. If they're not, I think it would be safer to change it to:

rd = await RECAPDocument.objects.aget(pacer_doc_id=pq.pacer_doc_id, court_id=pq.court_id)

Let me know what you think.

Fixes: #4826

mlissner · 2024-12-30T18:21:54Z

> Is it correct that this query only uses pacer_doc_id? I'm not sure whether pacer_doc_ids are unique across all courts in PACER. If they're not, I think it would be safer to change it to:

Yeah, that's definitely a bug, and your fix of adding the court to it should help a lot, thank you.

johnhawkinson · 2024-12-30T18:27:11Z

> Is it correct that this query only uses pacer_doc_id? I'm not sure whether pacer_doc_ids are unique across all courts in PACER.

They are unique! The first 3 digits identify the court, see the doc1 URLs section of https://github.com/freelawproject/juriscraper/blob/main/juriscraper/pacer/notes.md
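
As a rough illustration of that rule, here is a minimal sketch that reads the court prefix from a doc ID. The helper name `court_prefix` and the sample IDs are made up for this example; only the "first 3 digits identify the court" rule comes from the notes linked above.

```python
def court_prefix(pacer_doc_id: str) -> str:
    """Return the leading three digits, which identify the court."""
    if len(pacer_doc_id) < 3 or not pacer_doc_id[:3].isdigit():
        raise ValueError(f"unexpected pacer_doc_id: {pacer_doc_id!r}")
    return pacer_doc_id[:3]


# Made-up IDs with identical trailing digits but different court prefixes.
doc_a = "035012345678"
doc_b = "127012345678"
same_court = court_prefix(doc_a) == court_prefix(doc_b)
```

If the prefix rule holds, two documents from different courts can never share a full pacer_doc_id, so a lookup on pacer_doc_id alone is not ambiguous across courts.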

> Yeah, that's definitely a bug, and your fix of adding the court to it should help a lot, thank you.

Why "definitely"?

mlissner · 2024-12-30T18:30:44Z

I forgot about that, you're right, John!

mlissner

LGTM thanks! Onward to a proper review.

albertisfu added 2 commits December 27, 2024 19:37

feat(recap): Replicate RECAP PDF uploads to subdockets

703f122

Fixes: #4826

fix(recap): Avoid PDF upload replication in appellate cases

65da6ed

albertisfu marked this pull request as ready for review December 30, 2024 18:00

albertisfu requested a review from mlissner December 30, 2024 18:00

albertisfu linked an issue Dec 30, 2024 that may be closed by this pull request

Replicate RECAP PDF uploads to subdockets #4826

Open

mlissner approved these changes Dec 30, 2024

mlissner assigned ERosendo Dec 30, 2024

mlissner requested a review from ERosendo December 30, 2024 18:34

albertisfu mentioned this pull request Dec 30, 2024

Backward replication of RECAP PDF uploads to subdockets #4864

Open
