IncompleteRead exception crashes the JobManager #601

VictorVerhaert · 2024-08-14T09:23:57Z

When running long jobs with the JobManager, it crashes while trying to download the job results:

Traceback (most recent call last):
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 748, in _error_catcher
    yield
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 894, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(1293825744 bytes read, 627790347 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/requests/models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 1060, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 977, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 872, in _raw_read
    with self._error_catcher():
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 772, in _error_catcher
    raise ProtocolError(arg, e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(1293825744 bytes read, 627790347 more expected)', IncompleteRead(1293825744 bytes read, 627790347 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/victor.verhaert/LCFM/lcfm-production/notebooks/JM-LCFM.py", line 137, in <module>
    job_manager.run_jobs(
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/extra/job_management.py", line 273, in run_jobs
    self._update_statuses(df)
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/extra/job_management.py", line 433, in _update_statuses
    self.on_job_done(the_job, df.loc[i])
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/extra/job_management.py", line 373, in on_job_done
    job.get_results().download_files(target=job_dir)
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/rest/job.py", line 502, in download_files
    downloaded = [a.download(target) for a in self.get_assets()]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/rest/job.py", line 502, in <listcomp>
    downloaded = [a.download(target) for a in self.get_assets()]
                  ^^^^^^^^^^^^^^^^^^
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/rest/job.py", line 378, in download
    for block in response.iter_content(chunk_size=chunk_size):
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/requests/models.py", line 822, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(1293825744 bytes read, 627790347 more expected)', IncompleteRead(1293825744 bytes read, 627790347 more expected))
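
The byte counts in the `IncompleteRead` message show how far the transfer got before the connection broke. The small sketch below extracts them; the helper name `parse_incomplete_read` is hypothetical. It shows that roughly 1.29 GB of a roughly 1.92 GB asset had arrived. The `Range` header printed at the end is what a client could send to resume from that offset, assuming the server supports HTTP range requests. That is not verified for the back-end involved here, and the download code in the traceback simply streams the whole response with `iter_content`.

```python
import re
from typing import Tuple


def parse_incomplete_read(message: str) -> Tuple[int, int]:
    """Extract (bytes_read, bytes_remaining) from an urllib3 IncompleteRead message."""
    m = re.search(r"IncompleteRead\((\d+) bytes read, (\d+) more expected\)", message)
    if m is None:
        raise ValueError("not an IncompleteRead message")
    return int(m.group(1)), int(m.group(2))


msg = "IncompleteRead(1293825744 bytes read, 627790347 more expected)"
read, remaining = parse_incomplete_read(msg)
total = read + remaining
# prints: 1293825744 of 1921616091 bytes received (67.3%)
print(f"{read} of {total} bytes received ({read / total:.1%})")

# Header a client could send to resume the transfer at this offset,
# ASSUMING the server honours HTTP Range requests (not verified here).
print({"Range": f"bytes={read}-"})
```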

We need to make the job manager more robust to these types of exceptions.
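
One common way to make a download step more robust against transient network errors is to retry it with exponential backoff. The sketch below is a generic helper, not part of the openeo client; `retry_call` and its parameters are hypothetical names. It re-raises the last error after `max_attempts`, so a persistent failure still surfaces instead of being swallowed.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                # Give up: re-raise the last error so the failure stays visible.
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

In the job manager it could wrap the call from the traceback, e.g. `retry_call(lambda: job.get_results().download_files(target=job_dir), retry_on=(requests.exceptions.ChunkedEncodingError,))`. Note that each retry would download every asset again from the start.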


soxofaan · 2024-08-19T09:29:44Z

Do you know whether that ChunkedEncodingError is just a temporary glitch, or can you reproduce the failure every time you try to (manually) download the result assets?

soxofaan · 2024-08-19T09:36:33Z

> We need to make the job manager more robust to these types of exceptions.

The question is what can be improved purely at the level of the Python client implementation.

Skipping the failure with a warning is tempting, but that might not be better (as a default behavior), because the end user might easily overlook the warning and get the wrong impression that everything went fine.
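
One way to keep a skipped failure from being overlooked is to make the policy explicit and collect the failed jobs so they can be reported at the end of the run. In this sketch, `download_with_policy` and its `on_error` values `"raise"` / `"warn"` are hypothetical, not existing openeo options; `"raise"` is the default, which keeps the current fail-fast behavior.

```python
import warnings
from typing import Callable, List


def download_with_policy(
    job_id: str,
    download: Callable[[], None],
    failures: List[str],
    on_error: str = "raise",  # hypothetical option: "raise" (default) or "warn"
) -> bool:
    """Run `download`; on failure either re-raise or warn and record the job id.

    Returns True if the download succeeded, False if it failed under "warn".
    """
    if on_error not in ("raise", "warn"):
        raise ValueError(f"unknown on_error policy: {on_error!r}")
    try:
        download()
        return True
    except Exception as exc:
        if on_error == "raise":
            raise
        # Warn immediately AND remember the job, so it can be summarized later.
        warnings.warn(f"Downloading results of job {job_id} failed: {exc!r}")
        failures.append(job_id)
        return False
```

After all jobs are processed, a non-empty `failures` list could be turned into a summary message or a non-zero exit code, so the problem is hard to miss.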

A simple alternative improvement that could help here is to add an option to not automatically download the results of finished jobs.
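
Such an opt-out could be as small as a flag checked in the "job done" hook. The traceback shows that `on_job_done(job, row)` is where the job manager calls `job.get_results().download_files(target=job_dir)`. The sketch below is a stand-alone stand-in, not the openeo class: `ToyJobManager` and its `download_results` flag are hypothetical, and `download` is any callable that fetches the results for a job id.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyJobManager:
    """Hypothetical stand-in for a job manager's 'job done' hook."""

    download: Callable[[str], None]
    download_results: bool = True  # hypothetical opt-out flag
    skipped: List[str] = field(default_factory=list)

    def on_job_done(self, job_id: str) -> None:
        if self.download_results:
            self.download(job_id)
        else:
            # Leave the results on the back-end; record the job so they
            # can be fetched later (e.g. manually).
            self.skipped.append(job_id)
```

With the flag off, finished jobs are only recorded, and a flaky download no longer stops the whole run.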

soxofaan added the collect feedback label Aug 19, 2024

soxofaan added the job manager label Sep 25, 2024
