ParsingException: Got XML when expecting HTML and cannot parse it. #4803

Open
sentry-io bot opened this issue Dec 9, 2024 · 1 comment

sentry-io bot commented Dec 9, 2024

This is triggered by the get_free_docs command. It looks like the report query hits a parsing error that crashes the command.

We should fix this because when this happens we don't get free docs for that day.

Filed by @mlissner


Sentry Issue: COURTLISTENER-80M

ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
  File "src/lxml/etree.pyx", line 3306, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1990, in lxml.etree._parseMemoryDocument

ParsingException: Got XML when expecting HTML and cannot parse it.
(13 additional frame(s) were not displayed)
...
  File "cl/corpus_importer/management/commands/scrape_pacer_free_opinions.py", line 407, in do_everything
    get_and_save_free_document_reports(courts, date_start, date_end)
  File "cl/corpus_importer/management/commands/scrape_pacer_free_opinions.py", line 235, in get_and_save_free_document_reports
    exc = fetch_doc_report(
  File "cl/corpus_importer/management/commands/scrape_pacer_free_opinions.py", line 121, in fetch_doc_report
    status, rows_to_create = get_and_save_free_document_report(pacer_court_id, start, end, log.pk)  # type: ignore
  File "cl/corpus_importer/tasks.py", line 390, in get_and_save_free_document_report
    raise self.retry(exc=exc, countdown=5)
  File "cl/corpus_importer/tasks.py", line 352, in get_and_save_free_document_report
    report.query(start, end, sort="case_number")
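
The ValueError at the top of the trace is what lxml raises when it is handed a Python str that still starts with an XML declaration carrying an encoding attribute; its message suggests passing bytes or a fragment without the declaration. A minimal sketch of the second option follows. The helper name `strip_xml_declaration` is hypothetical and not taken from this codebase, and whether the real fix belongs in the command or in the report parser is an open question.

```python
import re

# Matches a leading XML declaration such as
# <?xml version="1.0" encoding="UTF-8"?>, optionally preceded by a BOM
# or whitespace. Requiring whitespace after "<?xml" avoids matching
# processing instructions like <?xml-stylesheet ...?>.
_XML_DECLARATION = re.compile(r"^\ufeff?\s*<\?xml\s[^>]*\?>")


def strip_xml_declaration(text: str) -> str:
    """Return text without a leading XML declaration (hypothetical helper).

    Text that has no leading declaration is returned unchanged, so the
    helper is safe to call on ordinary HTML responses as well.
    """
    return _XML_DECLARATION.sub("", text, count=1)
```

With the declaration removed, the remaining str could be passed to a parser that rejects declared encodings on already-decoded input. Stripping is used here rather than re-encoding to bytes because the declared encoding may not match the bytes a re-encode would produce.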

mlissner commented Dec 9, 2024

@flooie, to you for prioritization and scheduling.
