
Releases: mborsetti/webchanges

v3.13

28 Aug 21:56

Notice

Support for Python 3.8 will be removed on or about 5 October 2023. A reminder that older Python versions are
supported for 3 years after being obsoleted by a new major release (i.e. about 4 years since their original release).

Added

  • Reports have a new separate configuration option to split reports into one-per-job.

  • url jobs without use_browser have a new retries directive to specify the number of times to retry a
    job that errors before giving up. Using retries: 1 or higher will often solve the ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')) error received from a misconfigured server at the first
    connection.

  • remove_duplicates filter has a new adjacent sub-directive to de-duplicate non-adjacent lines or items.

  • css and xpath have a new sort subfilter to sort matched elements lexicographically.

  • Command line arguments:

    • New --footnote to add a custom footnote to reports.
    • New --change-location to keep job history when the url or command changes.
    • --gc-database and --clean-database now have optional argument RETAIN-LIMIT to allow increasing
      the number of retained snapshots from the default of 1.
    • New --detailed-versions to display detailed version and system information, inclusive of the versions of
      dependencies and, in certain Linux distributions (e.g. Debian), of system libraries. It also reports available
      memory and disk space.
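
As a rough illustration of what the retries directive above implies, here is a minimal sketch; it is not the webchanges implementation, and fetch_with_retries and flaky_fetch are hypothetical names. A job that errors is attempted again up to retries more times before the error is reported:

```python
from typing import Callable, List


def fetch_with_retries(fetch: Callable[[], str], retries: int = 0) -> str:
    """Call fetch(), trying again up to `retries` more times on connection errors."""
    attempts = max(retries, 0) + 1  # retries: 1 means at most two attempts
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError:  # ConnectionResetError is a subclass
            if attempt == attempts:
                raise  # no attempts left: report the error as usual
    raise AssertionError("unreachable")


calls: List[int] = []


def flaky_fetch() -> str:
    """Hypothetical server that resets the first connection only."""
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionResetError(104, "Connection reset by peer")
    return "page body"


print(fetch_with_retries(flaky_fetch, retries=1))
```

With retries=0 the same first-connection reset would surface as an error, which is why a value of 1 is often enough for servers that only misbehave on the first connection.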

Changed

  • command jobs now have improved error reporting which includes the error text from the failed command.
  • --rollback-database now confirms the date (in ISO-8601 format) to roll back the database to and, if
    webchanges is being run in interactive mode, the user will be asked for positive confirmation before proceeding
    with the irreversible deletion.

Internals

  • Added bandit <https://github.com/PyCQA/bandit>__ testing to improve the security of code.
  • headers are now turned into strings before being passed to Playwright (addresses the error
    playwright._impl._api_types.Error: extraHTTPHeaders[13].value: expected string, got number).
  • Exclude tests from being recognized as package during build (contributed by Max <https://github.com/aragon999>__ in #54 <https://github.com/mborsetti/webchanges/pull/54>__).
  • Refactored and cleaned up some tests.
  • Initial testing with Python 3.12.0-rc1, but a reported bug in typing.TypeVar prevents the pyee dependency
    of playwright from loading, causing a failure. Awaiting a fix in Python 3.12.0-rc2 to retry.
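
The header conversion mentioned above can be pictured with a minimal sketch, which is not the actual webchanges code; stringify_headers is a hypothetical helper. A value such as 1 written without quotes in a YAML file is loaded as a number, so every value is converted to a string before it is handed to the browser:

```python
from typing import Any, Dict, Mapping


def stringify_headers(headers: Mapping[str, Any]) -> Dict[str, str]:
    """Return a copy of headers with every name and value converted to str."""
    return {str(name): str(value) for name, value in headers.items()}


# Hypothetical headers as they might come out of a YAML jobs file
headers = {"User-Agent": "example-agent", "DNT": 1, "Upgrade-Insecure-Requests": 1}
print(stringify_headers(headers))
```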

v3.12

19 Nov 00:59

Added

  • Support for Python 3.11. Please note that the dependency lxml may fail to install on Windows due to
    this <https://bugs.launchpad.net/lxml/+bug/1977998>__ bug and that therefore for now webchanges can only be
    run in Python 3.10 on Windows.

Removed

  • Support for Python 3.7. As a reminder, older Python versions are supported for 3 years after being obsoleted by a new
    major release; support for Python 3.8 will be removed on or about 5 October 2023.

Fixed

  • Job sorting for reports is now case-insensitive.
  • Documentation on how to anonymously monitor GitHub releases (due to changes in GitHub) (contributed by Luis Aranguren <https://github.com/mercurytoxic>__ upstream <https://github.com/thp/urlwatch/issues/723>__).
  • Handling of method subfilter for filter html2text (reported by kongomondo <https://github.com/kongomondo>__
    upstream <https://github.com/thp/urlwatch/issues/588>__).
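
To see why case matters when sorting job names, here is a small sketch with made-up job names; using str.casefold as the sort key is an assumption about one reasonable way to do it, not necessarily what webchanges does:

```python
job_names = ["beta site", "Alpha site", "alpha blog", "Zeta feed"]

# Default ordering compares code points, so all uppercase initials sort first
case_sensitive = sorted(job_names)

# Folding case first gives the ordering a reader would expect in a report
case_insensitive = sorted(job_names, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```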

v3.11

25 Sep 12:44

Notice

Support for Python 3.7 will be removed on or about 22 October 2022 as older Python versions are supported for 3
years after being obsoleted by a new major release.

Added

  • The new no_conditional_request directive for url jobs turns off conditional requests for those extremely rare
    websites that don't handle it (e.g. Google Flights).
  • Selecting the database engine and the maximum number of changed snapshots saved is now set through the configuration
    file, and the command line arguments --database-engine and --max-snapshots are used to override such
    settings. See documentation for more information. Suggested by jprokos <https://github.com/jprokos>__ in #43 <https://github.com/mborsetti/webchanges/issues/43>__.
  • New configuration setting empty-diff within the display configuration for backwards compatibility only:
    use the additions_only job directive instead to achieve the same result. Reported by
    bbeevvoo <https://github.com/bbeevvoo>__ in #47 <https://github.com/mborsetti/webchanges/issues/47>__.
  • Aliased the command line arguments --gc-cache with --gc-database, --clean-cache with --clean-database
    and --rollback-cache with --rollback-database for clarity.
  • The configuration file (e.g. conf.yaml) can now contain keys starting with a _ (underscore) for remarks (they
    are ignored).
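
The underscore convention for remarks can be sketched as follows. This is only an illustration: strip_remarks is a hypothetical helper, and applying the rule at every nesting level is an assumption rather than documented behavior.

```python
from typing import Any


def strip_remarks(node: Any) -> Any:
    """Recursively drop mapping keys that start with an underscore."""
    if isinstance(node, dict):
        return {
            key: strip_remarks(value)
            for key, value in node.items()
            if not (isinstance(key, str) and key.startswith("_"))
        }
    if isinstance(node, list):
        return [strip_remarks(item) for item in node]
    return node


# Hypothetical configuration, as it might look after being loaded from YAML
config = {
    "_todo": "review reporters",
    "display": {"empty-diff": False, "_remark": "kept for backwards compatibility"},
}
print(strip_remarks(config))
```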

Changed

  • Reports are now sorted alphabetically and therefore you can use the name directive to affect the order by which
    your jobs are displayed in reports.
  • Implemented measures for url jobs using browser: true to avoid being detected: webchanges now passes all
    the headless Chrome detection tests here <https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html>__.
    Brought to my attention by amammad <https://github.com/amammad>__
    in #45 <https://github.com/mborsetti/webchanges/issues/45>__.
  • Running webchanges --test (without specifying a JOB) will now check the hooks file (if any) for syntax errors in
    addition to the config and jobs file. Error reporting has also been improved.
  • No longer showing the text returned by the server when a 404 - Not Found HTTP status code is returned, for
    all url jobs (previously only for jobs with use_browser: true).

Fixed

  • Bug in command line arguments --config and --hooks. Contributed by
    Klaus Sperner <https://github.com/klaus-tux>__ in PR #46 <https://github.com/mborsetti/webchanges/pull/46>__.
  • Job directive compared_versions now works as documented and testing has been added to the test suite. Reported by
    jprokos <https://github.com/jprokos>__ in #43 <https://github.com/mborsetti/webchanges/issues/43>__.
  • The output of command line argument --test-diff now takes into consideration compared_versions.
  • Markdown containing code in a link text now converts correctly in HTML reports.

Internals

  • The job kind of shell has been renamed command to better reflect what it does and the way it's described
    in the documentation, but shell is still recognized for backward compatibility.
  • Readthedocs build upgraded to Python 3.10

v3.10.3

11 Jul 22:49

Added

  • URL jobs with use_browser: true that receive an error HTTP status code from the server will now include the text
    returned by the website in the error message (e.g. "Rate exceeded.", "upstream request timeout", etc.), except for
    HTTP status code 404 - Not Found.

Changed

  • The command line argument --jobs used to specify a jobs file will now accept a glob pattern <https://en.wikipedia.org/wiki/Glob_(programming)>__, e.g. wildcards, to specify multiple files. If more than one
    file matches the pattern, their contents will be concatenated before a job list is built. Useful e.g. if you have
    multiple jobs files that run on different schedules and you want to clean the snapshot database of URLs/commands no
    longer monitored ("garbage collect") using --gc-cache.
  • The command line argument --list will now list the full path of the jobs file(s).
  • Traceback information for Python Exceptions is suppressed by default. Use the command line argument --verbose
    (or -v) to display it.
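
The glob behavior of --jobs described above can be sketched like this. It is a simplified illustration, not the webchanges code: read_jobs_files and the file names are hypothetical, and joining the files with a YAML document separator is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List


def read_jobs_files(directory: Path, pattern: str) -> str:
    """Concatenate the text of all files in directory matching the glob pattern."""
    paths: List[Path] = sorted(directory.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No jobs file matches {pattern!r} in {directory}")
    # Join with a YAML document separator so the jobs of each file stay distinct
    return "\n---\n".join(p.read_text(encoding="utf-8").strip() for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "jobs-daily.yaml").write_text("url: https://example.com/a\n", encoding="utf-8")
    (tmp_path / "jobs-hourly.yaml").write_text("url: https://example.com/b\n", encoding="utf-8")
    combined = read_jobs_files(tmp_path, "jobs-*.yaml")

print(combined)
```

Building one job list from all matching files is what makes garbage collection safe when jobs are split across files: a job is only considered unmonitored if it appears in none of them.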

Fixed

  • Fixed Unicode strings with encoding declaration are not supported. error in the xpath filter using
    method: xml under certain conditions (MacOS only). Reported by jprokos <https://github.com/jprokos>__ in #42 <https://github.com/mborsetti/webchanges/issues/42>__.

Internals

  • The source distribution is now available on PyPI to support certain packagers like fpm.
  • Improved handling and reporting of Playwright browser errors (for URL jobs with use_browser: true).

v3.10.2

09 Jun 05:57

⚠ Breaking Changes

  • Due to a fix to the html2text filter (see below), the first time you run this new version you may get a change
    report with deletions and additions of lines that look identical. This will happen one time only and will prevent
    future such change reports.

Added

  • You can now run the command line argument --test without specifying a JOB; this will check the config
    (default: config.yaml) and job (default: job.yaml) files for syntax errors.
  • New job directive compared_versions allows change detection to be made against multiple saved snapshots;
    useful for monitoring websites that change between a set of states (e.g. they are running A/B testing).
  • New command line argument --check-new to check if a new version of webchanges is available.
  • Error messages for url jobs failing with HTTP reason codes of 400 and higher now include any text returned by the
    website (e.g. "Rate exceeded.", "upstream request timeout", etc.). Not implemented in jobs with use_browser: true
    due to limitations in Playwright.
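
The idea behind compared_versions can be illustrated with a minimal sketch. It is not the actual implementation; has_changed and the sample snapshot contents are hypothetical. New data counts as a change only if it matches none of the most recent saved snapshots:

```python
from typing import Sequence


def has_changed(new_data: str, snapshots: Sequence[str], compared_versions: int = 1) -> bool:
    """Return True only if new_data differs from each of the latest N snapshots.

    snapshots is ordered newest first.
    """
    recent = snapshots[:compared_versions]
    return all(new_data != old for old in recent)


# A site alternating between two variants, e.g. because of A/B testing
history = ["variant B", "variant A"]

print(has_changed("variant A", history, compared_versions=1))
print(has_changed("variant A", history, compared_versions=2))
print(has_changed("variant C", history, compared_versions=2))
```

With a single compared version, the flip back to variant A is reported as a change; comparing against two snapshots suppresses that report while still catching the genuinely new variant C.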

Changed

  • On Linux and macOS systems, for security reasons we now check that the hooks file and the directory it is located
    in are owned and writeable by only the user who is running the job (and not by its group or by other
    users), identical to what we do with the jobs file if any job uses the shellpipe filter. An
    explanatory ImportWarning message will be issued if the permissions are not correct and the import of the hooks module
    is skipped.
  • The command line argument -v or --verbose now shows reduced verbosity logging output while -vv (or
    --verbose --verbose) shows full verbosity.

Fixed

  • The html2text filter no longer retains spaces found in the HTML after the end of the text on a line. Such
    spaces are not displayed in HTML, so retaining them was a bug in the conversion library used. This was causing a
    change report to be issued whenever the number of such invisible spaces changed.
  • The cookies directive was not adding cookies correctly to the header for jobs with browser: true.
  • The wait_for_timeout job directive was not accepting integers (only floats). Reported by Markus Weimar <https://github.com/Markus00000>__ in #39 <https://github.com/mborsetti/webchanges/issues/39>__.
  • Improved the usefulness of the message of FileNotFoundError exceptions in filters execute and shellpipe
    and in reporter run_command.
  • Fixed an issue in the legacy parser used by the xpath filter which under specific conditions caused more html
    than expected to be returned.
  • Fixed how we determine if a new version has been released (due to an API change by PyPI).
  • When adding custom JobBase classes through the hooks file, their configuration file entries are no longer causing
    warnings to be issued as unrecognized directives.

Internals

  • Changed bootstrapping logic so that when using -vv the logs will include messages relating to the registration of
    the various classes.
  • Improved execution speed of certain informational command line arguments.
  • Updated the vendored version of packaging.version.parse() to 21.3, released on 2021-11-27.
  • Changed the import logic for the packaging.version.parse() function so that if packaging is found to be
    installed, it will be imported from there instead of from the vendored module.
  • urllib3 is now an explicit dependency due to the refactoring of the requests package (we previously used
    requests.packages.urllib3). Has no effect since urllib3 is already being installed as a dependency of
    requests.
  • Added typed.py file to implement PEP 561 <https://peps.python.org/pep-0561/>__.

v3.10.1

03 May 15:12

Fixed

  • KeyError: 'indent' error when using beautify filter. Reported by César de Tassis Filho <https://github.com/CTassisF>__ in #37 <https://github.com/mborsetti/webchanges/issues/37>__.

v3.10

03 May 00:13

⚠ Breaking changes

Pyppeteer has been replaced with Playwright

This change only affects jobs that use_browser: true (i.e. those running on a browser to run JavaScript). If none
of your jobs have use_browser: true, there's nothing new here (and nothing to do).

Must do

If *any* of your jobs have ``use_browser: true``, you **MUST**:

1) Install the new dependencies:

.. code-block:: bash

   pip install --upgrade webchanges[use_browser]

2) (Optional) ensure you have an up-to-date Google Chrome browser:

.. code-block:: bash

   webchanges --install-chrome

Additionally, if any of your ``use_browser: true`` jobs use the ``wait_for`` directive, it needs to be replaced with
one of:

* ``wait_for_function`` if you were specifying a JavaScript function (see
  `here <https://playwright.dev/python/docs/api/class-frame/#frame-wait-for-function>`__ for full function details).
* ``wait_for_selector`` if you were specifying a selector string or xpath string (see `here
  <https://playwright.dev/python/docs/api/class-frame/#frame-wait-for-selector>`__ for full function details), or
* ``wait_for_timeout`` if you were specifying a timeout; however, this function should only be used for debugging
  because it "is going to be flaky", so use one of the other two ``wait_for`` directives if you can (full details `here
  <https://playwright.dev/python/docs/api/class-frame#frame-wait-for-timeout>`__).

Optionally, the values of ``wait_for_function`` and ``wait_for_selector`` can now be dicts to take full advantage of all
the features offered by those functions in Playwright (see documentation links above).

If you are using the ``wait_for_navigation`` directive, it is now called ``wait_for_url`` and offers both glob pattern
and regex matching; ``wait_for_navigation`` will act as an alias for now, but a deprecation warning will be issued.

If you are using the ``chromium_revision`` or ``_beta_use_playwright`` directives in your configuration file, you
should delete them to prevent future errors (for now only a deprecation warning is issued).

Finally, if you are using the experimental ``block_elements`` sub-directive, it is not (yet?) implemented in Playwright
and is simply ignored.

Improvements

wait_until has additional functionality, and now takes one of:

  • load (default): Consider operation to be finished when the load event is fired.
  • domcontentloaded: Consider operation to be finished when the DOMContentLoaded event is fired.
  • networkidle (old networkidle0 and networkidle2 map into this): Consider operation to be finished when
    there are no network connections for at least 500 ms.
  • commit (new): Consider operation to be finished when network response is received and the document started
    loading.
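
The mapping of the old values onto the new ones can be summarized in a short sketch; normalize_wait_until is a hypothetical helper written for illustration, not a function of webchanges or Playwright:

```python
PLAYWRIGHT_WAIT_UNTIL = {"load", "domcontentloaded", "networkidle", "commit"}

# Old Pyppeteer-era values that now map into a single Playwright value
LEGACY_ALIASES = {"networkidle0": "networkidle", "networkidle2": "networkidle"}


def normalize_wait_until(value: str = "load") -> str:
    """Translate a legacy wait_until value and reject unknown ones."""
    value = LEGACY_ALIASES.get(value, value)
    if value not in PLAYWRIGHT_WAIT_UNTIL:
        raise ValueError(f"Unsupported wait_until value: {value!r}")
    return value


print(normalize_wait_until("networkidle2"))
```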

New directives

The following directives are new to the Playwright implementation:

* ``referer``: Referer header value (a string). If provided, it will take preference over the referer header value set
  by the ``headers`` sub-directive.
* ``initialization_url``: A url to navigate to before the ``url`` (e.g. a home page where some state gets set).
* ``initialization_js``: Only used in conjunction with ``initialization_url``, a JavaScript to execute after
  loading ``initialization_url`` and before navigating to the ``url`` (e.g. to emulate a log in). Advanced usage.
* ``ignore_default_args`` directive for ``url`` jobs with ``use_browser: true`` (using Chrome) to control how Playwright
  launches Chrome.

In addition, the new ``--no-headless`` command line argument will run the Chrome browser in "headed" mode, i.e.
displaying the website as it loads it, to facilitate with debugging and testing (e.g. ``webchanges --test 1
--no-headless --test-reporter email``).

See more details of the new directives in the updated documentation.


Freeing space by removing Pyppeteer

You can free up disk space if no other packages use Pyppeteer by, in order:

  1. Removing the downloaded Chromium images by deleting the entire directory (and its subdirectories) shown by running:

.. code-block:: bash

   python -c "import pathlib; from pyppeteer.chromium_downloader import DOWNLOADS_FOLDER; print(pathlib.Path(DOWNLOADS_FOLDER).parent)"

  2. Uninstalling the Pyppeteer package by running:

.. code-block:: bash

   pip uninstall pyppeteer

Rationale

The implementation of ``use_browser: true`` jobs (i.e. those running on a browser to run JavaScript) using Pyppeteer
and the Chromium browser it uses has been very problematic, as the library:

* is in alpha,
* is very slow,
* defaults to years-old obsolete versions of Chromium,
* can be insecure (e.g. found that TLS certificates were disabled for downloading browsers!),
* creates conflicts with imports (e.g. requires obsolete version of websockets),
* is poorly documented,
* is poorly maintained,
* may require OS-specific dependencies that need to be separately installed,
* does not work with Arm-based processors,
* is prone to crashing,
* and outright freezes with the current version of Python (3.10)!

Pyppeteer's `open issues <https://github.com/pyppeteer/pyppeteer/issues>`__ now exceed 130 and are growing almost daily.

`Playwright <https://playwright.dev/python/>`__ has none of the issues above, the core dev team apparently is the same
who wrote Puppeteer (of which Pyppeteer is a port to Python), and is supported by the deep pockets of Microsoft. The
Python version is officially supported and up-to-date, and (in our configuration) uses the latest stable version of
Google Chrome out of the box without the contortions of manually having to pick and set revisions.

Playwright has been in beta testing within **webchanges** for months and has been performing very well (significantly
more so than Pyppeteer).


Documentation
-------------
* Major updates on anything that has to do with ``use_browser``.
* Fixed two examples of the ``email`` reporter. Reported by `jprokos  <https://github.com/jprokos>`__ in
  `#34 <https://github.com/mborsetti/webchanges/issues/34>`__.


Advanced
--------
* If you subclassed JobBase in your ``hooks.py`` file, and are defining a ``retrieve`` method, please note that the
  number of arguments has been increased to 3 as follows:

.. code-block:: python

   def retrieve(self, job_state: JobState, headless: bool = True) -> Tuple[Union[str, bytes], str]:
       """Runs job to retrieve the data, and returns data and ETag.

       :param job_state: The JobState object, to keep track of the state of the retrieval.
       :param headless: For browser-based jobs, whether headless mode should be used.
       :returns: The data retrieved and the ETag.
       """

v3.9.2

13 Apr 23:54

⚠ Last release using Pyppeteer

  • This is the last release using Pyppeteer for jobs with use_browser: true, which will be replaced by Playwright
    in release 3.10, forthcoming hopefully in a few weeks. See above for more information on how to prepare -- and start
    using Playwright now!

Added

  • New ignore_dh_key_too_small directive for URL jobs to overcome the ssl.SSLError: [SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1129) error.
  • New indent sub-directive for the beautify filter (requires BeautifulSoup version 4.11.0 or later).
  • New --dump-history JOB command line argument to print all saved snapshot history for a job.
  • Playwright only: new --no-headless command line argument to help with debugging and testing (e.g. run
    webchanges --test 1 --no-headless). Not available for Pyppeteer.
  • Extracted Discord reporting from webhooks into its own discord reporter to fix it not working and to
    add embedding functionality as well as color (contributed by Michał Ciołek <https://github.com/michalciolek>__
    upstream <https://github.com/thp/urlwatch/issues/683>__). Reported by jprokos <https://github.com/jprokos>__ in #33.

Fixed

  • We are no longer rewriting to disk the entire database at every run. Now it's only rewritten if there are changes
    (and minimally) and, obviously, when running with the --gc-cache or --clean-cache command line argument.
    Reported by JsBergbau <https://github.com/JsBergbau>__ upstream <https://github.com/thp/urlwatch/issues/690>__.
    Also updated documentation suggesting to run --clean-cache or --gc-cache periodically.
  • A ValueError is no longer raised if an unknown directive is found in the configuration file, but a Warning is
    issued instead. Reported by c0deing <https://github.com/c0deing>__ in #26 <https://github.com/mborsetti/webchanges/issues/26>__.
  • The kind job directive (used for custom job classes in hooks.py) was undocumented and not fully functioning.
  • For jobs with use_browser: true and a switch directive containing --window-size, turn off Playwright's
    default fixed viewport (of 1280x720) as it overrides --window-size.
  • Email headers ("From:", "To:", etc.) now have title case per RFC 2076. Reported by fdelapena <https://github.com/fdelapena>__ in #29 <https://github.com/mborsetti/webchanges/issues/29>__.

Documentation

  • Added warnings for Windows users to run Python in UTF-8 mode. Reported by Knut Wannheden <https://github.com/knutwannheden>__ in #25 <https://github.com/mborsetti/webchanges/issues/25>__.
  • Added suggestion to run --clean-cache or --gc-cache periodically to compact the database file.
  • Continued improvements.

Internals

  • Updated licensing file to GitHub naming standards <https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository>__
    and updated its contents to more clearly state that this software redistributes source code of release 2.21
    of urlwatch (https://github.com/thp/urlwatch/tree/346b25914b0418342ffe2fb0529bed702fddc01f), retaining its license,
    which is distributed as part of the source code.
  • Pyppeteer has been removed from the test suite.
  • Deprecated webchanges.jobs.ShellError exception in favor of Python's native subprocess.SubprocessError one and
    its subclasses.

v3.9.1

28 Jan 08:42

⚠ Breaking changes in the near future (opt-in now):

Pyppeteer will be replaced with Playwright (can opt in now!)

The implementation of ``use_browser: true`` jobs (i.e. those running on a browser to run JavaScript) using Pyppeteer
has been very problematic, as the library:

* is in alpha,
* is very slow,
* defaults to years-old obsolete versions of Chromium,
* can be insecure (found that TLS certificates were disabled for downloading browsers!),
* creates conflicts with imports (e.g. requires obsolete version of websockets),
* is poorly documented,
* is poorly maintained,
* and freezes when running it in the current version of Python (3.10)!

Pyppeteer's `open issues <https://github.com/pyppeteer/pyppeteer/issues>`__ now exceed 110.

As a result, I have been investigating a substitute, and found one in `Playwright
<https://playwright.dev/python/>`__. This package has none of the issues above, the core dev team apparently is the same
who wrote Puppeteer (of which Pyppeteer is a port to Python), and is supported by the deep pockets of Microsoft. The
Python version is officially supported and up-to-date and we can easily use the latest stable version of Google Chrome
with it without mucking around with setting chromium_revisions.

You can upgrade to Playwright now!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Playwright implementation in this release of **webchanges** is extremely stable, fully tested (even on Python
3.10!), and much faster than Pyppeteer (some of my jobs are running 3x faster!). While it's probably production
quality, for the moment it is being released as an opt-in beta only.

I urge you to switch to Playwright. To do so:

Ensure that you have at least Python 3.8 (not tested in 3.7 due to testing limitations).

Install dependencies::

   pip install --upgrade webchanges[playwright]

Ensure you have an up-to-date Chrome installation::

   webchanges --install-chrome

Edit your configuration file...::

   webchanges --edit-config

...to add ``_beta_use_playwright: true`` (note the leading underscore) under the ``browser`` section of ``job_defaults``,
like this:

.. code-block:: yaml

   job_defaults:
     browser:
         _beta_use_playwright: true

That's it!

All job sub-directives work as they are, with only two minor exceptions:

* ``wait_for`` needs to be replaced with either ``wait_for_selector`` (see more `here
  <https://playwright.dev/python/docs/api/class-frame/#frame-wait-for-selector>`__) or ``wait_for_function`` (see
  more `here <https://playwright.dev/python/docs/api/class-frame/#frame-wait-for-function>`__).
  These can still be strings (in which case they will be either the selector or the expression) but also dicts with
  arguments accepted by those functions (except for timeout, which is set by the ``timeout`` sub-directive).
* The experimental ``block_elements`` sub-directive is not implemented (yet?) and is simply ignored.

The following sub-directives are new:

* ``referer``: Referer header value. If provided it will take preference over the referer header value set by the
  ``headers`` sub-directive.
* ``headless`` (true/false): Launch browser in headless mode (i.e. invisible) (defaults to true). Set it to false to see
  what's going on in the browser for debugging purposes.

Please make sure to open a GitHub `issue <https://github.com/mborsetti/webchanges/issues>`__ if you encounter
anything wrong!

If you decide to stick with Playwright, you can free up disk space (if no other package uses Pyppeteer) by removing
the downloaded Chromium by deleting the *directory* shown by running::

   webchanges --chromium-directory

and uninstalling the Pyppeteer package by running::

   pip uninstall pyppeteer

The Playwright implementation also determines the maximum number of jobs to run in parallel based on the amount of free
memory available, which seems to be the relevant constraint, and this will make **webchanges** faster on machines with
lots of memory and more stable on small ones.
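
A memory-based limit of this kind can be sketched as below. This is only an illustration of the idea: max_parallel_jobs is a hypothetical function, and the 400 MiB per-job figure is an assumed placeholder, not a number taken from webchanges.

```python
def max_parallel_jobs(free_memory_bytes: int, per_job_bytes: int = 400 * 1024**2) -> int:
    """Return how many browser jobs fit in free memory, but never fewer than one."""
    return max(1, free_memory_bytes // per_job_bytes)


print(max_parallel_jobs(8 * 1024**3))    # a machine with 8 GiB free
print(max_parallel_jobs(256 * 1024**2))  # a small machine with 256 MiB free
```

The free memory figure itself could come from a library such as psutil (``psutil.virtual_memory().available``); it is passed in as a plain number here to keep the sketch free of third-party dependencies.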

Fixed
-----
* Config file directives checker would incorrectly reject reports added through ``hooks.py``. Reported by `Knut Wannheden
  <https://github.com/knutwannheden>`__ at `#24 <https://github.com/mborsetti/webchanges/issues/24>`__.

v3.9

26 Jan 15:27

⚠ Breaking changes in the near future (opt-in now):

Pyppeteer will be replaced with Playwright (can opt in now!)

The implementation of ``use_browser: true`` jobs (i.e. those running on a browser to run JavaScript) using Pyppeteer
has been very problematic, as the library:

* is in alpha,
* is very slow,
* defaults to years-old obsolete versions of Chromium,
* can be insecure (found that TLS certificates were disabled for downloading browsers!),
* creates conflicts with imports (e.g. requires obsolete version of websockets),
* is poorly documented,
* is poorly maintained,
* and freezes when running it in the current version of Python (3.10)!

Pyppeteer's `open issues <https://github.com/pyppeteer/pyppeteer/issues>`__ now exceed 110.

As a result, I have been investigating a substitute, and found one in `Playwright
<https://playwright.dev/python/>`__. This package has none of the issues above, the core dev team apparently is the same
who wrote Puppeteer (of which Pyppeteer is a port to Python), and is supported by the deep pockets of Microsoft. The
Python version is officially supported and up-to-date and we can easily use the latest stable version of Google Chrome
with it without mucking around with setting chromium_revisions.

You can upgrade to Playwright now!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Playwright implementation in this release of **webchanges** is extremely stable, fully tested (even on Python
3.10!), and much faster than Pyppeteer (some of my jobs are running 3x faster!). While it's probably production
quality, for the moment it is being released as an opt-in beta only.

I urge you to switch to Playwright. To do so:

Ensure that you have at least Python 3.8 (not tested in 3.7 due to testing limitations).

Install dependencies::

   pip install --upgrade webchanges[playwright]

Ensure you have an up-to-date Chrome installation::

   webchanges --install-chrome

Edit your configuration file...::

   webchanges --edit-config

...to add ``_beta_use_playwright: true`` (note the leading underscore) under the ``browser`` section of ``job_defaults``,
like this:

.. code-block:: yaml

   job_defaults:
     browser:
         _beta_use_playwright: true

That's it!

All job sub-directives work as they are, with only two minor exceptions:

* ``wait_for`` needs to be replaced with either ``wait_for_selector`` (see more `here
  <https://playwright.dev/python/docs/api/class-frame/#frame-wait-for-selector>`__) or ``wait_for_function`` (see
  more `here <https://playwright.dev/python/docs/api/class-frame/#frame-wait-for-function>`__).
  These can still be strings (in which case they will be either the selector or the expression) but also dicts with
  arguments accepted by those functions (except for timeout, which is set by the ``timeout`` sub-directive).
* The experimental ``block_elements`` sub-directive is not implemented (yet?) and is simply ignored.

The following sub-directives are new:

* ``referer``: Referer header value. If provided it will take preference over the referer header value set by the
  ``headers`` sub-directive.
* ``headless`` (true/false): Launch browser in headless mode (i.e. invisible) (defaults to true). Set it to false to see
  what's going on in the browser for debugging purposes.

Please make sure to open a GitHub `issue <https://github.com/mborsetti/webchanges/issues>`__ if you encounter
anything wrong!

If you decide to stick with Playwright, you can free up disk space (if no other package uses Pyppeteer) by removing
the downloaded Chromium by deleting the *directory* shown by running::

   webchanges --chromium-directory

and uninstalling the Pyppeteer package by running::

   pip uninstall pyppeteer

The Playwright implementation also determines the maximum number of jobs to run in parallel based on the amount of free
memory available, which seems to be the relevant constraint, and this will make **webchanges** faster on machines with
lots of memory and more stable on small ones.

Changed
-------
* The method ``bs4`` of filter ``html2text`` has a new ``strip`` sub-directive which is passed to BeautifulSoup, and
  its default value has changed to false to conform to BeautifulSoup's default. This gives better output in most
  cases. To restore the previous non-standard behavior, add the ``strip: true`` sub-directive to the ``html2text``
  filter of jobs.
* Pyppeteer (used for URL jobs with ``use_browser: true``) is now crashing during certain tests with Python 3.7.
  There will be no new development to fix this as the use of Pyppeteer will soon be deprecated in favor of Playwright.
  See above to start using Playwright now (highly suggested).

Added
-----
* The method ``bs4`` of filter ``html2text`` now accepts the sub-directives ``separator`` and ``strip``.
* When using the command line argument ``--test-diff``, the output can now be sent to a specific reporter by also
  specifying the ``--test-reporter`` argument. For example, if running on a machine with a web browser, you can see
  the HTML version of the last diff(s) from job 1 with ``webchanges --test-diff 1 --test-reporter browser`` on your
  local browser.
* New filter ``remove-duplicate-lines``. Contributed by `Michael Sverdlin <https://github.com/sveder>`__ upstream `here
  <https://github.com/thp/urlwatch/pull/653>`__ (with modifications).
* New filter ``csv2text``. Contributed by `Michael Sverdlin <https://github.com/sveder>`__ upstream `here
  <https://github.com/thp/urlwatch/pull/658>`__ (with modifications).
* The ``html`` report type has a new job directive ``monospace`` which sets the output to use a monospace font.
  This can be useful e.g. for tabular text extracted by the ``pdf2text`` filter.
* The ``command_run`` report type has a new environment variable ``WEBCHANGES_CHANGED_JOBS_JSON``.
* Opt-in to use Playwright for jobs with ``use_browser: true`` instead of pyppeteer (see above).

Fixed
-----
* During conversion of Markdown to HTML,
  * Code blocks were not rendered without wrapping and in monospace font;
  * Spaces immediately after ````` (code block opening) were being dropped.
* The ``email`` reporter's ``sendmail`` sub-directive was not passing the ``from`` sub-directive (when specified) to
  the ``sendmail`` executable as an ``-f`` command line argument. Contributed by
  `Jonas Witschel <https://github.com/diabonas>`__ upstream `here <https://github.com/thp/urlwatch/pull/671>`__ (with
  modifications).
* HTML characters were not being unescaped when the job name is determined from the <title> tag of the data monitored
  (if present).
* Command line argument ``--test-diff`` was only showing the last diff instead of all saved ones.
* The ``command_run`` report type was not setting variables ``count`` and ``jobs`` (always 0). Contributed by
  `Brian Rak <https://github.com/devicenull>`__ in `#23 <https://github.com/mborsetti/webchanges/issues/23>`__.
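
The title unescaping fix can be illustrated with a small standard-library sketch. It is not the webchanges code: title_from_html is a hypothetical helper, and the regular expression is a simplification rather than a full HTML parser.

```python
import html
import re
from typing import Optional


def title_from_html(data: str) -> Optional[str]:
    """Return the unescaped text of the first <title> element, if there is one."""
    match = re.search(r"<title[^>]*>(.*?)</title>", data, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    # Without html.unescape the job name would keep entities such as &amp;
    return html.unescape(match.group(1)).strip()


page = "<html><head><title>Q&amp;A &gt; Latest</title></head><body></body></html>"
print(title_from_html(page))
```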

Documentation
-------------
* Updated the "recipe" for monitoring Facebook public posts.
* Improved documentation for filter ``pdf2text``.

Internals
---------
* Support for Python 3.10 (except for URL jobs with ``use_browser`` using pyppeteer since it does not yet support it;
  use Playwright instead).
* Improved speed of detection and handling of lines starting with spaces during conversion of Markdown to HTML.
* Logging (``--verbose``) now shows thread IDs to help with debugging.

Known issues
------------
* Pyppeteer (used for URL jobs with ``use_browser: true``) is now crashing during certain tests with Python 3.7.
  There will be no new development to fix this as the use of Pyppeteer will soon be deprecated in favor of Playwright.
  See above to start using Playwright now (highly suggested).