
New option: ignore_incomplete_reads (#787)
* Add job option `ignore_incomplete_reads`.

Sometimes web servers return incomplete responses, triggering an
`InvalidChunkLength` exception in urlwatch (raised by urllib3 and surfaced
by requests as `ChunkedEncodingError`). Enable this job option to
ignore these errors.

#725
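For context, the incomplete reads mentioned above come from HTTP/1.1 chunked transfer encoding: the body is sent as a series of chunks, each preceded by its size in hexadecimal, and ends with a zero-size chunk. If a server sends a garbled size line or closes the connection early, the client cannot finish decoding the body. urllib3 raises `InvalidChunkLength` in that case, and requests re-raises it as `ChunkedEncodingError`, which is the exception the new check in `jobs.py` matches. The following is a simplified, stand-alone decoder meant only to illustrate that failure mode; `decode_chunked` and `IncompleteChunkedBody` are hypothetical names, and this is not urllib3's implementation.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


class IncompleteChunkedBody(Exception):
    """Hypothetical error for a malformed or truncated chunked body."""


def decode_chunked(raw: bytes) -> bytes:
    """Decode a complete HTTP/1.1 chunked body held in memory (RFC 9112, section 7.1).

    Simplified sketch: chunk extensions are dropped and trailer fields
    after the final zero-size chunk are not parsed.
    """
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with "<hex size>[;extensions]\r\n".
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            raise IncompleteChunkedBody("body ended before the terminating zero-size chunk")
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        if not size_field or any(byte not in HEX_DIGITS for byte in size_field):
            raise IncompleteChunkedBody(f"invalid chunk length: {size_field!r}")
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)  # last chunk reached
        # The chunk data must be followed by CRLF; otherwise it is truncated or malformed.
        end = pos + size
        if raw[end:end + 2] != b"\r\n":
            raise IncompleteChunkedBody(f"chunk of {size} bytes is truncated or malformed")
        body += raw[pos:end]
        pos = end + 2
```

A body that stops after `4\r\nWiki\r\n`, without the final `0\r\n\r\n`, raises `IncompleteChunkedBody` here; in urlwatch, the corresponding `ChunkedEncodingError` is what a job with `ignore_incomplete_reads: true` ignores instead of reporting.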
wfrisch authored Feb 15, 2024
1 parent 0bc4abd commit 30653a3
Showing 4 changed files with 11 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ The format mostly follows [Keep a Changelog](http://keepachangelog.com/en/1.0.0/
### Added

- New `enabled` option for all jobs. Set to false to disable a job without needing to remove it or comment it out (Requested in #625 by snowman, contributed in #785 by jamstah)
+- New option `ignore_incomplete_reads` (Requested in #725 by wschoot, contributed in #787 by wfrisch)

### Changed

6 changes: 6 additions & 0 deletions docs/source/advanced.rst
@@ -181,6 +181,12 @@ or ignore all HTTP errors if you like:
    url: https://example.com/
    ignore_http_error_codes: 4xx, 5xx

+You can also ignore incomplete reads:
+
+.. code-block:: yaml
+
+   url: "https://example.com/"
+   ignore_incomplete_reads: true

 Overriding the content encoding
 -------------------------------
1 change: 1 addition & 0 deletions docs/source/jobs.rst
@@ -60,6 +60,7 @@ Job-specific optional keys:
- ``ignore_http_error_codes``: List of HTTP errors to ignore (see :ref:`advanced_topics`)
- ``ignore_timeout_errors``: Do not report errors when the timeout is hit
- ``ignore_too_many_redirects``: Ignore redirect loops (see :ref:`advanced_topics`)
+- ``ignore_incomplete_reads``: Ignore incomplete HTTP responses (see :ref:`advanced_topics`)

(Note: ``url`` implies ``kind: url``)

4 changes: 3 additions & 1 deletion lib/urlwatch/jobs.py
@@ -263,7 +263,7 @@ class UrlJob(Job):
     __required__ = ('url',)
     __optional__ = ('cookies', 'data', 'method', 'ssl_no_verify', 'ignore_cached', 'http_proxy', 'https_proxy',
                     'headers', 'ignore_connection_errors', 'ignore_http_error_codes', 'encoding', 'timeout',
-                    'ignore_timeout_errors', 'ignore_too_many_redirects')
+                    'ignore_timeout_errors', 'ignore_too_many_redirects', 'ignore_incomplete_reads')

     CHARSET_RE = re.compile('text/(html|plain); charset=([^;]*)')
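`__required__` and `__optional__` list the keys a `url` job may carry, so the new name is appended to `__optional__`, presumably so that a job definition containing `ignore_incomplete_reads` is accepted and the attribute exists (and is falsy) on jobs that omit it. The sketch below assumes a simple validation pattern over those two tuples; `JobSketch` is a hypothetical class, and urlwatch's real base class may check keys and fill in defaults differently.

```python
class JobSketch:
    """Hypothetical stand-in for a urlwatch job class (not urlwatch's own code)."""

    __required__ = ('url',)
    __optional__ = ('ignore_timeout_errors', 'ignore_too_many_redirects', 'ignore_incomplete_reads')

    def __init__(self, **kwargs):
        allowed = self.__required__ + self.__optional__
        missing = [key for key in self.__required__ if key not in kwargs]
        if missing:
            raise ValueError(f"Required keys missing: {missing}")
        unknown = [key for key in kwargs if key not in allowed]
        if unknown:
            raise ValueError(f"Unrecognized keys: {unknown}")
        # Assumption for this sketch: keys the job does not set default to None,
        # so a plain truthiness test such as `self.ignore_incomplete_reads` works.
        for key in allowed:
            setattr(self, key, kwargs.get(key))
```

Under this model, a misspelled key such as `ignore_incomplete_read` raises `ValueError` instead of being silently dropped.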

@@ -391,6 +391,8 @@ def ignore_error(self, exception):
             return True
         if isinstance(exception, requests.exceptions.TooManyRedirects) and self.ignore_too_many_redirects:
             return True
+        if isinstance(exception, requests.exceptions.ChunkedEncodingError) and self.ignore_incomplete_reads:
+            return True
         elif isinstance(exception, requests.exceptions.HTTPError):
             status_code = exception.response.status_code
             ignored_codes = []
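The new branch in `ignore_error` pairs an exception type with its opt-in flag, so a `ChunkedEncodingError` is only ignored for jobs that set `ignore_incomplete_reads`; every other job still reports it. A stand-alone model of that pattern follows. To keep it runnable without requests installed, the exception classes are local stand-ins that mirror names in `requests.exceptions`, and `should_ignore` is a hypothetical function covering only the two branches shown in the hunk above (the HTTP status-code handling is omitted).

```python
class RequestException(Exception):
    """Local stand-in for requests.exceptions.RequestException."""


class TooManyRedirects(RequestException):
    """Local stand-in for requests.exceptions.TooManyRedirects."""


class ChunkedEncodingError(RequestException):
    """Local stand-in for requests.exceptions.ChunkedEncodingError."""


def should_ignore(exception: Exception, *, ignore_too_many_redirects: bool = False,
                  ignore_incomplete_reads: bool = False) -> bool:
    """Return True if the job's flags say this exception should not be reported."""
    # Each check requires both the matching exception type and the job's opt-in flag.
    if isinstance(exception, TooManyRedirects) and ignore_too_many_redirects:
        return True
    if isinstance(exception, ChunkedEncodingError) and ignore_incomplete_reads:
        return True
    # Anything else, including an incomplete read on a job without the flag, is reported.
    return False
```

Because the flag is off unless a job sets it, existing jobs keep reporting incomplete reads until they opt in.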
