feat(recap.mergers): Update PACER attachment processing #4665

Merged 11 commits on Dec 14, 2024
18 changes: 15 additions & 3 deletions cl/recap/mergers.py
@@ -1658,9 +1658,21 @@ async def merge_attachment_page_data(
.afirst()
)
else:
main_rd = await RECAPDocument.objects.select_related(
"docket_entry", "docket_entry__docket"
).aget(**params)
try:
main_rd = await RECAPDocument.objects.select_related(
"docket_entry", "docket_entry__docket"
).aget(**params)
except RECAPDocument.DoesNotExist as exc:
# In cases where we have "doppelgänger" dockets drop pacer
# case id and check if the docket exists once more.
if params.get("pacer_case_id"):
flooie marked this conversation as resolved.
@grossir (Contributor) commented on Nov 7, 2024

I think the params dict key you want to check is actually "docket_entry__docket__pacer_case_id", as on line 1641 of this same function. search_recapdocument has no pacer_case_id column, so a "pacer_case_id" key would never be found in the params dict:

                                             Table "public.search_recapdocument"
         Column          |           Type           | Collation | Nullable |                     Default                      
-------------------------+--------------------------+-----------+----------+--------------------------------------------------
 id                      | integer                  |           | not null | nextval('search_recapdocument_id_seq'::regclass)
 date_created            | timestamp with time zone |           | not null | 
 date_modified           | timestamp with time zone |           | not null | 
 date_upload             | timestamp with time zone |           |          | 
 document_type           | integer                  |           | not null | 
 document_number         | character varying(32)    |           | not null | 
 attachment_number       | smallint                 |           |          | 
 pacer_doc_id            | character varying(64)    |           | not null | 
 is_available            | boolean                  |           |          | 
 sha1                    | character varying(40)    |           | not null | 
 filepath_local          | character varying(1000)  |           | not null | 
 filepath_ia             | character varying(1000)  |           | not null | 
 docket_entry_id         | integer                  |           | not null | 
 description             | text                     |           | not null | 
 ocr_status              | smallint                 |           |          | 
 plain_text              | text                     |           | not null | 
 page_count              | integer                  |           |          | 
 is_free_on_pacer        | boolean                  |           |          | 
 ia_upload_failure_count | smallint                 |           |          | 
 file_size               | integer                  |           |          | 
 thumbnail               | character varying(100)   |           |          | 
 thumbnail_status        | smallint                 |           | not null | 
 is_sealed               | boolean                  |           |          | 
 acms_document_guid      | character varying(64)    |           | not null | 
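
To make the point concrete, here is a minimal sketch using plain dicts instead of the Django ORM. `find_document`, `lookup_with_retry`, the local `DoesNotExist` class, and the sample values are all hypothetical stand-ins for illustration, not code from `cl/recap/mergers.py`. It assumes the params dict is keyed by the full lookup path, as described above.

```python
class DoesNotExist(Exception):
    """Hypothetical stand-in for RECAPDocument.DoesNotExist."""


# Assumed key name, taken from the lookup path mentioned in this comment.
CASE_ID_KEY = "docket_entry__docket__pacer_case_id"


def find_document(rows, **params):
    """Toy replacement for `.aget(**params)`: first row matching every param."""
    matches = [r for r in rows if all(r.get(k) == v for k, v in params.items())]
    if not matches:
        raise DoesNotExist(params)
    return matches[0]


def lookup_with_retry(rows, params):
    """Retry once without the case id, but only if that key was actually set."""
    try:
        return find_document(rows, **params)
    except DoesNotExist:
        # Checking the bare "pacer_case_id" key here would always be falsy,
        # so the retry would never run. Check the full lookup path instead.
        if not params.get(CASE_ID_KEY):
            raise
        retry_params = params.copy()
        retry_params.pop(CASE_ID_KEY, None)
        return find_document(rows, **retry_params)


# Made-up sample data: same document id, mismatched case id.
rows = [{"pacer_doc_id": "doc-1", CASE_ID_KEY: "case-A"}]
params = {"pacer_doc_id": "doc-1", CASE_ID_KEY: "case-B"}

print(params.get("pacer_case_id"))  # None: the bare key is not in the dict
print(lookup_with_retry(rows, params) is rows[0])  # True: retry found the row
```

Copying the dict before popping the key leaves the original `params` intact, which matters if later code (such as a duplicate-handling branch) filters on it again.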

retry_params = params.copy()
retry_params.pop(
"docket_entry__docket__pacer_case_id", None
)
main_rd = await RECAPDocument.objects.select_related(
"docket_entry", "docket_entry__docket"
).aget(**retry_params)
except RECAPDocument.MultipleObjectsReturned as exc:
if pacer_case_id:
duplicate_rd_queryset = RECAPDocument.objects.filter(**params)