Releases: mborsetti/webchanges

v3.26.0

13 Oct 08:45

Added

  • Python 3.13 Support: webchanges now supports Python 3.13, but complete testing is pending because some dependencies,
    such as lxml, have not yet published installation packages ("wheels") for 3.13.

  • Glob Pattern Support for Hooks Files: The --hooks command-line argument now accepts glob patterns for flexible
    hook file selection.

  • Multiple Hook Specifications: Specify multiple hook files or glob patterns by repeating the --hooks argument.

  • Enhanced Version Information: --detailed-versions now displays the system's default value for
    --max-threads.

  • Optional zstd Compression: URL jobs without browser: true can now utilize zstd compression for
    improved efficiency (requires pip install -U webchanges[zstd]).

  • ai_google Differ Enhancements (BETA):

    • New additions_only Subdirective: When set to true, generates AI-powered summaries of only the added text. This
      is particularly helpful for monitoring pages with regularly added content (e.g., press releases).
    • New unified_diff_new Field: Added to the prompt directive.
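    The repeated --hooks arguments and glob patterns described above can be pictured with a short Python sketch. It
    assumes each pattern is expanded with the standard glob module and that a file matched by more than one pattern is
    loaded only once; expand_hook_patterns is a hypothetical name, not part of webchanges, and the actual resolution
    order may differ.

```python
import glob
import tempfile
from pathlib import Path


def expand_hook_patterns(patterns: list[str]) -> list[Path]:
    """Hypothetical helper: expand each glob pattern, in the order given,
    and drop files already matched by an earlier pattern."""
    found: dict[Path, None] = {}  # a dict keeps insertion order, so it acts as an ordered set
    for pattern in patterns:
        for match in sorted(glob.glob(pattern)):
            found.setdefault(Path(match), None)
    return list(found)


# Example: two overlapping patterns, as if passed as
# --hooks "<dir>/hooks_*.py" --hooks "<dir>/hooks_a.py"
with tempfile.TemporaryDirectory() as tmp:
    for name in ("hooks_a.py", "hooks_b.py", "notes.txt"):
        Path(tmp, name).touch()
    hooks = expand_hook_patterns([f"{tmp}/hooks_*.py", f"{tmp}/hooks_a.py"])

print([p.name for p in hooks])  # ['hooks_a.py', 'hooks_b.py']
```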

Changed

  • Relaxed Security for Job and Hook Files: The ownership requirement for files containing command jobs,
    shellpipe filters, or hook files has been expanded to include root ownership, in addition to the current user.

  • ai_google Differ Refinements (BETA):

    • Renamed Prompt Fields (⚠ BETA breaking change): For clarity, old_data and new_data fields in the
      prompt directive have been renamed to old_text and new_text, respectively.
    • Improved Output Quality: Significantly enhanced output quality by revising the default values for
      system_instructions and prompt.
    • Updated Documentation.
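    A minimal sketch of the relaxed ownership rule, assuming a POSIX system (os.getuid is not available on Windows).
    It checks only who owns the file; owner_is_trusted is a hypothetical helper, and webchanges may apply further
    checks (such as on file permissions) that are not shown here.

```python
import os
import tempfile


def owner_is_trusted(path: str) -> bool:
    """Hypothetical check: True if the file is owned by root (UID 0)
    or by the user running this process. POSIX only."""
    return os.stat(path).st_uid in (0, os.getuid())


# A file we have just created is owned by the current user, so it passes.
with tempfile.NamedTemporaryFile(suffix=".py") as f:
    trusted = owner_is_trusted(f.name)

print(trusted)  # True
```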

Fixed

  • Markdown Handling: Improved handling of links with empty text in the Markdown to HTML converter.
  • image Differ Formatting: Fixed HTML formatting issues within the image differ.

Removed

  • Python 3.9 Support: Support for Python 3.9 has been dropped. As a reminder, older Python versions are supported for 3
    years after being superseded by a new major release (i.e. approximately 4 years after their initial release).
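    As a worked example of the support policy, assuming "superseded" counts from the release date of the next Python
    version: Python 3.10 was released on 4 October 2021, so support for Python 3.9 runs until about 4 October 2024.
    The sketch below does this date arithmetic; support_ends is a hypothetical helper, not part of webchanges.

```python
from datetime import date


def support_ends(superseded_on: date, years: int = 3) -> date:
    """Hypothetical helper: end of support, counted from the next version's release."""
    try:
        return superseded_on.replace(year=superseded_on.year + years)
    except ValueError:  # 29 February with a non-leap target year
        return superseded_on.replace(year=superseded_on.year + years, day=28)


# Python 3.10.0, which superseded 3.9, was released on 4 October 2021.
print(support_ends(date(2021, 10, 4)))  # 2024-10-04
```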

v3.25.0

15 Aug 08:38

Added

  • Multiple job files or glob patterns can now be specified by repeating the --jobs argument.
  • Job list filtering using Python regular expressions <https://docs.python.org/3/library/re.html#regular-expression-syntax>. Example: webchanges --list blue lists
    jobs with 'blue' in their name (case-sensitive, so not 'Blue'), while webchanges --list (?i)blue is
    case-insensitive <https://docs.python.org/3/library/re.html#re.I>.
  • New URL job directive params for specifying URL parameters (query strings), e.g. as a dictionary.
  • New gotify reporter (upstream contribution: link <https://github.com/thp/urlwatch/pull/823/files>).
  • Improved messaging at startup when a legacy database that requires conversion is found.
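    The --list filtering can be approximated in Python as below, assuming each job name is tested with re.search (a
    match anywhere in the name, consistent with "'blue' in their name" above). filter_jobs and the sample job names
    are hypothetical.

```python
import re


def filter_jobs(names: list[str], pattern: str) -> list[str]:
    """Hypothetical helper: keep job names in which the regex matches anywhere."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


jobs = ["Blue sky forecast", "blue moon calendar", "Red sunset photos"]
print(filter_jobs(jobs, "blue"))      # ['blue moon calendar']
print(filter_jobs(jobs, "(?i)blue"))  # ['Blue sky forecast', 'blue moon calendar']
```

    Because the argument is a regular expression, characters such as (, ? and * have special meaning; most shells
    also require the pattern to be quoted, e.g. webchanges --list '(?i)blue'.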

Changed

  • Updated ai_google differ to reflect Gemini 1.5 Pro's 2M token context window.

Fixed

  • Corrected the automated handling in differs and reporters of data with a 'text/markdown' MIME type.
  • Multiple wdiff differ fixes and improvements:
    • Fixed body font issues;
    • Removed spurious ^\n insertions;
    • Corrected range_info lines;
    • Added word break opportunities (<wbr>) in HTML output for better browser handling of long lines.
  • deepdiff differ now breaks a list into its individual elements.
  • Improved URL matching for jobs by normalizing %xx escapes and plus signs (e.g. https://www.example.org/El Niño
    will now match https://www.example.org/El+Ni%C3%B1o and vice versa).
  • Improved the text-to-HTML URL parser to accurately extract URLs with multiple parameters.
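    The URL matching fix can be illustrated by decoding both URLs before comparing them. This sketch assumes
    normalization with urllib.parse.unquote_plus, which turns %xx escapes into characters (decoded as UTF-8) and + into
    a space; same_job_url is a hypothetical name, not webchanges' own function.

```python
from urllib.parse import unquote_plus


def same_job_url(a: str, b: str) -> bool:
    """Hypothetical helper: compare two URLs after decoding %xx escapes and '+'."""
    return unquote_plus(a) == unquote_plus(b)


print(same_job_url("https://www.example.org/El Niño",
                   "https://www.example.org/El+Ni%C3%B1o"))  # True
```

    Treating + as a space is strictly a query-string (form-encoding) convention; in the path of a URL, RFC 3986 treats
    + as a literal character, so this normalization is deliberately looser than the standard.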

Internals

  • Replaced requests.structures.CaseInsensitiveDict with httpx.Headers as the class that holds headers.
  • The Job.headers attribute is now initialized with an empty httpx.Headers object instead of None.

v3.24.1

14 Jun 17:31

Added

  • Command line argument --rollback-database now accepts dates in ISO-8601 format in addition to Unix timestamps.
    If the library dateutil (not a dependency of webchanges) is installed, it will also accept any string
    recognized by dateutil.parser, such as a date only, a time only, or a date and time. (Suggested
    by Markus Weimar <https://github.com/Markus00000> in issue #78 <https://github.com/mborsetti/webchanges/issues/78>.)
  • ai-google differ (BETA) now supports calls to the Gemini 1.5 Pro model with a 2M-token context window (early access required).
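    The accepted --rollback-database formats can be sketched as a chain of parsers: a Unix timestamp first, then
    ISO-8601 via datetime.fromisoformat, then dateutil.parser.parse only if dateutil happens to be installed. The order
    and the helper name parse_rollback are assumptions of this sketch; note also that datetime.fromisoformat accepts
    only a subset of ISO-8601 before Python 3.11.

```python
from datetime import datetime


def parse_rollback(value: str) -> float:
    """Hypothetical helper: turn a --rollback-database argument into a Unix timestamp."""
    try:
        return float(value)  # already a Unix timestamp
    except ValueError:
        pass
    try:
        # ISO-8601; a value without a UTC offset is interpreted as local time
        return datetime.fromisoformat(value).timestamp()
    except ValueError:
        pass
    try:
        from dateutil import parser  # optional, not a webchanges dependency
    except ImportError:
        raise ValueError(f"Unrecognized date/time: {value!r}") from None
    return parser.parse(value).timestamp()


print(parse_rollback("1718323200"))                 # 1718323200.0
print(parse_rollback("2024-06-14T00:00:00+00:00"))  # 1718323200.0
```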

v3.24.0

06 Jun 03:52

Added

  • New wdiff differ to perform word-by-word comparisons. Replaces the dependency on an outside executable and
    allows for much better formatting and integration.
  • New system_instructions directive added to the ai-google differ (BETA).
  • Added examples to the documentation on how to use the re.findall filter to extract only the first or last line
    (suggested by Marcos Alano <https://github.com/malano> in issue #81 <https://github.com/mborsetti/webchanges/issues/81>).
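    For illustration, plain Python re.findall calls that keep only the first or the last line of a text might look
    like the following. These patterns are an independent sketch, not necessarily the examples given in the webchanges
    documentation: \A.* matches from the start of the text up to the first newline, and .+(?=\n?\Z) matches a
    non-empty line followed only by an optional final newline.

```python
import re

text = "first line\nmiddle line\nlast line\n"

# \A anchors at the very start of the string; '.' stops at the first newline.
first = re.findall(r"\A.*", text)

# A non-empty line that is followed only by an optional trailing newline.
last = re.findall(r".+(?=\n?\Z)", text)

print(first)  # ['first line']
print(last)   # ['last line']
```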

Changed

  • Updated the documentation for the ai-google differ (BETA), mostly to reflect Google's billing changes; the API
    remains free for most users.

Fixed

  • Fixed a data type check that prevented URL jobs' data (for POSTs etc.) from being a list.

v3.23.0

15 May 00:51

Changed

  • The ai-google (BETA) differ now defaults to using the new gemini-1.5-flash model (see the documentation <https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-flash-expandable>), as it still supports
    1M tokens, "excels at summarization" (per Google's announcement <https://blog.google/technology/ai/google-gemini-update-flash-ai-assistant-io-2024/#gemini-model-updates:~:text=1.5%20flash%20excels%20at%20summarization%2C>),
    allows a higher number of requests per minute (in the free version, 15 vs. 2 for gemini-1.5-pro), is faster,
    and, if you're paying for it, is cheaper. To continue to use gemini-1.5-pro, which may produce more "complex"
    results, specify it in the job's differ directive.

Fixed

  • Fixed the headers of the deepdiff and image (BETA) differs to be more consistent with the default unified differ.
  • Fixed the way images are handled in the email reporter so that they now display correctly in clients such as Gmail.

Internals

  • Command line argument --test-differ now processes the new mime_type attribute correctly (mime_type is
    an internal, work-in-progress attribute to facilitate future automation of filtering, diffing, and reporting).

v3.22

25 Apr 05:03

⚠ Breaking Changes

  • Developers integrating custom Python code (hooks.py) should refer to the "Internals" section below for important
    changes.

Changed

  • Snapshot database

    • Moved the snapshot database from the "user_cache" directory (typically not backed up) to the "user_data" directory.
      The new paths are (typically):

      • Linux: ~/.local/share/webchanges or $XDG_DATA_HOME/webchanges
      • macOS: ~/Library/Application Support/webchanges
      • Windows: %LOCALAPPDATA%\webchanges\webchanges
    • Renamed the file from cache.db to snapshots.db to more clearly denote its contents.

    • Introduced a new command line option --database to specify the filename for the snapshot database, replacing
      the previous --cache option (which is deprecated but still supported).

    • Many thanks to Markus Weimar <https://github.com/Markus00000> for pointing this problem out in issue #75 <https://github.com/mborsetti/webchanges/issues/75>.

  • Modified the command line argument --test-differ to accept a second parameter, specifying the maximum number of
    diffs to generate.

  • Updated the command line argument --dump-history to display the mime_type attribute when present.

  • Enhanced differs functionality:

    • Standardized headers for deepdiff and imagediff to align more closely with those of unified.

    • Improved the ai_google differ:

      • Enhanced error handling: now, the differ will continue operation and report errors rather than failing outright
        when Google API errors occur.
      • Changed the default prompt to "Analyze this unified diff and create a summary listing only the changes:\n\n{unified_diff}" for better results.
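    The default snapshot database locations listed above can be expressed as a small function. This is a sketch under
    the assumption that the paths follow the per-platform data-directory conventions shown in the list; snapshot_dir
    and its parameters are hypothetical, and webchanges itself may compute the path differently (for example through a
    helper library).

```python
import os
import platform
from collections.abc import Mapping
from pathlib import PurePath, PurePosixPath, PureWindowsPath

APP = "webchanges"


def snapshot_dir(system: str, env: Mapping[str, str], home: str) -> PurePath:
    """Hypothetical helper: typical directory holding snapshots.db."""
    if system == "Windows":
        return PureWindowsPath(env["LOCALAPPDATA"], APP, APP)
    if system == "Darwin":  # macOS
        return PurePosixPath(home, "Library", "Application Support", APP)
    # Linux and other POSIX systems: honour $XDG_DATA_HOME when it is set
    base = env.get("XDG_DATA_HOME") or f"{home}/.local/share"
    return PurePosixPath(base, APP)


# Location on the machine running this snippet:
print(snapshot_dir(platform.system(), os.environ, os.path.expanduser("~")) / "snapshots.db")
```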

Fixed

  • Fixed an AttributeError Exception when the fallback HTTP client package requests is not installed, as reported
    by yubiuser <https://github.com/yubiuser> in issue #76 <https://github.com/mborsetti/webchanges/issues/76>.
  • Addressed a ValueError in the --test-differ command, a regression reported by Markus Weimar <https://github.com/Markus00000> in issue #79 <https://github.com/mborsetti/webchanges/issues/79>.
  • To prevent overlooking changes, webchanges now refrains from saving a new snapshot if a differ operation fails
    with an Exception.
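    The rule that a failed differ must not advance the snapshot can be sketched as follows; check_job, the callables
    it receives, and the error text are hypothetical stand-ins for webchanges' internals, not its actual code.

```python
from typing import Callable


def check_job(old: str, new: str,
              differ: Callable[[str, str], str],
              save_snapshot: Callable[[str], None]) -> str:
    """Hypothetical flow: only store the new snapshot once the diff succeeded."""
    try:
        report = differ(old, new)
    except Exception as exc:  # any differ failure
        # Keep the previous snapshot so the same change is diffed again next run.
        return f"Differ failed: {exc!r}"
    save_snapshot(new)
    return report


saved: list[str] = []


def broken_differ(old: str, new: str) -> str:
    raise ValueError("model unavailable")


result = check_job("v1", "v2", broken_differ, saved.append)
print(result)  # Differ failed: ValueError('model unavailable')
print(saved)   # [] (snapshot "v2" was not stored)
```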

Internals

  • New mime_type attribute: we are now capturing and storing the data type (as a MIME type) alongside data in the
    snapshot database to facilitate future automation of filtering, diffing, and reporting. Developers using custom
    Python code will need to update their filter and retrieval methods in classes inheriting from FilterBase and
    JobBase, respectively, to accommodate the mime_type attribute. Detailed updates are available in the hooks documentation <https://webchanges.readthedocs.io/en/stable/hooks.html#:~:text=Changed%20in%20version%203.22>.
  • Updated terminology: References to cache in object names have been replaced with ssdb (snapshot database).
  • Int

v3.21

16 Apr 04:52

Added

  • Job-selectable differs: The differ, i.e. the method by which changes are detected and summarized, can now be
    selected job by job. The previous restriction to unified diffs, HTML table diffs, or calling an outside
    executable is also gone, as differs have become modular.

    • Python programmers can write their own custom differs using the hooks.py file.
    • Backward-compatibility is preserved, so your current jobs will continue to work.
  • New differs:

    • deepdiff to report element-by-element changes in JSON or XML structured data.
    • imagediff (BETA) to report an image showing changes in an image being tracked.
    • ai_google (BETA) to use Generative AI to provide a summary of changes (free API key required). We use
      Google's Gemini 1.5 Pro since it is the first model that can ingest 1M tokens, allowing it to analyze changes in
      long documents (up to 350,000 words, or about 700 pages single-spaced) such as terms and conditions, privacy
      policies, etc., where summarization adds the most value and which other models can't handle. The differ can call
      the Gen AI model to summarize a unified diff or to find and summarize the differences itself. Gemini 1.0 is also
      supported, but it can handle fewer tokens.
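    Element-by-element reporting on structured data can be illustrated with a short, stdlib-only recursive comparison
    of parsed JSON. This hand-rolled sketch is not the library the differ actually uses; element_diff is a hypothetical
    name, and lists are compared position by position, a simplification that reports an insertion near the start of a
    list as many changes.

```python
import json
from typing import Any


def element_diff(old: Any, new: Any, path: str = "root") -> list[str]:
    """Hypothetical helper: list changed, added and removed elements."""
    changes: list[str] = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key in old:
            child = f"{path}[{key!r}]"
            if key not in new:
                changes.append(f"{child} removed: {old[key]!r}")
            else:
                changes.extend(element_diff(old[key], new[key], child))
        for key in new:
            if key not in old:
                changes.append(f"{path}[{key!r}] added: {new[key]!r}")
    elif isinstance(old, list) and isinstance(new, list):
        # Compare positionally, then report any surplus elements.
        for i, (a, b) in enumerate(zip(old, new)):
            changes.extend(element_diff(a, b, f"{path}[{i}]"))
        for i in range(len(new), len(old)):
            changes.append(f"{path}[{i}] removed: {old[i]!r}")
        for i in range(len(old), len(new)):
            changes.append(f"{path}[{i}] added: {new[i]!r}")
    elif old != new:
        changes.append(f"{path} changed: {old!r} -> {new!r}")
    return changes


before = json.loads('{"price": 10, "tags": ["a", "b"]}')
after = json.loads('{"price": 12, "tags": ["a", "c", "d"]}')
changes = element_diff(before, after)
for line in changes:
    print(line)
# root['price'] changed: 10 -> 12
# root['tags'][1] changed: 'b' -> 'c'
# root['tags'][2] added: 'd'
```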

Changed

  • Filter absolute_links now converts URLs of the action, href and src attributes in any HTML tag, as
    well as the data attribute of the <object> tag; it previously converted only the href attribute of
    <a> tags.
  • Updated explanatory text and error messages for increased clarity.
  • You can now select jobs to run by their URL/command instead of their number, e.g. webchanges https://test.com is
    just as valid as webchanges 1.
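    The new absolute_links behaviour can be approximated with urllib.parse.urljoin applied to the listed attributes.
    absolutize_attrs is a hypothetical helper that works on one tag's attribute dictionary; parsing and rewriting the
    HTML document itself is left out of this sketch.

```python
from urllib.parse import urljoin

URL_ATTRIBUTES = {"action", "href", "src"}  # converted on any tag


def absolutize_attrs(tag: str, attrs: dict[str, str], base_url: str) -> dict[str, str]:
    """Hypothetical helper: resolve URL-bearing attributes of one HTML tag
    against the page's URL, leaving other attributes untouched."""
    result = {}
    for name, value in attrs.items():
        if name in URL_ATTRIBUTES or (tag == "object" and name == "data"):
            value = urljoin(base_url, value)
        result[name] = value
    return result


page = "https://example.com/docs/index.html"
print(absolutize_attrs("img", {"src": "../logo.png", "alt": "Logo"}, page))
# {'src': 'https://example.com/logo.png', 'alt': 'Logo'}
print(absolutize_attrs("object", {"data": "movie.mp4"}, page))
# {'data': 'https://example.com/docs/movie.mp4'}
```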

Deprecated

  • Job directive diff_tool. Replaced with the command differ (see the documentation <https://webchanges.readthedocs.io/en/stable/differs.html#command_diff>).

Fixed

  • webchanges --errors will no longer check jobs that have disabled: true (thanks to yubiuser <https://github.com/yubiuser> for reporting this in issue #73 <https://github.com/mborsetti/webchanges/issues/73>).
  • Markdown links with no text were not clickable when converted to HTML; conversion now adds a 'Link without text'
    label.

Internals

  • Improved speed of creating a unified diff for an HTML report.
  • Reduced excessive logging from httpx's sub-modules hpack and httpcore when running with -vv.

v3.20.2

16 Mar 23:05

Fixed

  • Parsing of the "to" address for the sendmail email reporter.

v3.20.1

16 Mar 05:59

Fixed

  • Regression introduced by the support for sending to multiple "to" addresses.

v3.20

15 Mar 08:30

Added

  • re.findall filter to extract, delete or replace non-overlapping text using Python re.findall.

Changed

  • --test-reporter now allows testing of reporters that are not enabled; if a reporter is not enabled, a warning
    will be issued. This simplifies testing.
  • email reporter (both SMTP and sendmail) supports sending to multiple "to" addresses.
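    One way to split a "to" field holding several addresses is the standard email.utils.getaddresses function,
    sketched below. The comma-separated input format and the parse_recipients helper are assumptions for illustration,
    not necessarily how webchanges' email reporter parses its configuration.

```python
from email.utils import getaddresses


def parse_recipients(to_field: str) -> list[str]:
    """Hypothetical helper: bare addresses from a comma-separated 'to' field."""
    # getaddresses returns (display name, address) pairs
    return [addr for _name, addr in getaddresses([to_field]) if addr]


print(parse_recipients("alice@example.com, Bob <bob@example.org>"))
# ['alice@example.com', 'bob@example.org']
```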

Fixed

  • Reports from jobs with monospace: true were not being rendered correctly in Gmail.