Releases: mborsetti/webchanges
v3.26.0
Added
- Python 3.13 Support: webchanges now supports Python 3.13, but complete testing is pending because some dependencies,
  such as lxml, have not yet published installation packages ("wheels") for 3.13.
- Glob Pattern Support for Hooks Files: The --hooks command-line argument now accepts glob patterns for flexible
  hook file selection.
- Multiple Hook Specifications: Specify multiple hook files or glob patterns by repeating the --hooks argument.
- Enhanced Version Information: --detailed-versions now displays the system's default value for --max-threads.
- Optional zstd Compression: URL jobs without browser: true can now use zstd compression for improved efficiency
  (requires pip install -U webchanges[zstd]).
- ai_google Differ Enhancements (BETA):
  - New additions_only Subdirective: When set to true, generates AI-powered summaries of only the added text. This
    is particularly helpful for monitoring pages with regularly added content (e.g., press releases).
  - New unified_diff_new Field: Added to the prompt directive.
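Repeating --hooks with glob patterns amounts to expanding each pattern and merging the results. The minimal sketch below uses only the standard library glob module; the helper name expand_hook_specs and its ordering and de-duplication rules are assumptions for illustration, not webchanges' actual implementation.

```python
import glob
from pathlib import Path


def expand_hook_specs(specs: list[str]) -> list[Path]:
    """Expand repeated --hooks values (file paths or glob patterns) into a list of files.

    Hypothetical helper for illustration: keeps first-seen order and drops duplicates.
    """
    seen: set[Path] = set()
    files: list[Path] = []
    for spec in specs:
        # glob.glob() returns matching paths (an existing plain path matches itself);
        # sorted() gives a deterministic order within each pattern
        for match in sorted(glob.glob(spec)):
            path = Path(match).resolve()
            if path not in seen:
                seen.add(path)
                files.append(path)
    return files


# e.g. the equivalent of: --hooks hooks.py --hooks 'hooks.d/*.py'
print(expand_hook_specs(["hooks.py", "hooks.d/*.py"]))
```

Patterns that match nothing simply contribute no files in this sketch; how webchanges itself reports such cases is not covered here.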
Changed
- Relaxed Security for Job and Hook Files: The ownership requirement for files containing command jobs, shellpipe
  filters, or hook files has been expanded to include root ownership, in addition to the current user.
- ai_google Differ Refinements (BETA):
  - Renamed Prompt Fields (⚠ BETA breaking change): For clarity, the old_data and new_data fields in the prompt
    directive have been renamed to old_text and new_text, respectively.
  - Improved Output Quality: Significantly enhanced output quality by revising the default values for
    system_instructions and prompt.
  - Updated Documentation.
Fixed
- Markdown Handling: Improved handling of links with empty text in the Markdown to HTML converter.
- image Differ Formatting: Fixed HTML formatting issues within the image differ.
Removed
- Python 3.9 Support: Support for Python 3.9 has been dropped. As a reminder, older Python versions are supported for 3
years after being superseded by a new major release (i.e. approximately 4 years after their initial release).
v3.25.0
Added
- Multiple job files or glob patterns can now be specified by repeating the --jobs argument.
- Job list filtering using Python regular expressions <https://docs.python.org/3/library/re.html#regular-expression-syntax>.
  Example: webchanges --list blue lists jobs with 'blue' in their name (case-sensitive, so not 'Blue'), while
  webchanges --list (?i)blue is case-insensitive <https://docs.python.org/3/library/re.html#re.I>.
- New URL job directive params for specifying URL parameters (query strings), e.g. as a dictionary.
- New gotify reporter (upstream contribution: link <https://github.com/thp/urlwatch/pull/823/files>).
- Improved messaging at startup when a legacy database that requires conversion is found.
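The --list filter behaves like a Python regular-expression search over job names. The sketch below assumes re.search semantics (a match anywhere in the name) and uses a hypothetical list of job names; it reproduces the case-sensitivity example above rather than webchanges' own code.

```python
import re


def filter_job_names(names: list[str], pattern: str) -> list[str]:
    """Return the names containing a match for pattern (hypothetical helper)."""
    regex = re.compile(pattern)  # an inline (?i) flag makes the match case-insensitive
    return [name for name in names if regex.search(name)]


jobs = ["Blue Origin news", "blue whale sightings", "Red Cross updates"]
print(filter_job_names(jobs, "blue"))      # ['blue whale sightings']
print(filter_job_names(jobs, "(?i)blue"))  # ['Blue Origin news', 'blue whale sightings']
```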
Changed
- Updated ai_google differ to reflect Gemini 1.5 Pro's 2M token context window.
Fixed
- Corrected the automated handling in differs and reporters of data with a 'text/markdown' MIME type.
- Multiple wdiff differ fixes and improvements:
  - Fixed body font issues;
  - Removed spurious ^\n insertions;
  - Corrected range_info lines;
  - Added word break opportunities (<wbr>) in HTML output for better browser handling of long lines.
- deepdiff differ now breaks a list into its individual elements.
- Improved URL matching for jobs by normalizing %xx escapes and plus signs (e.g. https://www.example.org/El Niño
  will now match https://www.example.org/El+Ni%C3%B1o and vice versa).
- Improved the text-to-HTML URL parser to accurately extract URLs with multiple parameters.
Internals
- Replaced requests.structures.CaseInsensitiveDict with httpx.Headers as the class holding headers.
- The Job.headers attribute is now initialized with an empty httpx.Headers object instead of None.
v3.24.1
Added
- Command line argument --rollback-database now accepts dates in ISO-8601 format in addition to Unix timestamps.
  If the library dateutil (not a dependency of webchanges) is installed, it will also accept any string recognized
  by dateutil.parser, such as date only, time only, date and time, etc. (suggested by Markus Weimar
  <https://github.com/Markus00000> in issue #78 <https://github.com/mborsetti/webchanges/issues/78>).
- ai-google differ (BETA) now supports calls to the 2M-token version of the Gemini 1.5 Pro model (early access
  required).
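Accepting either a Unix timestamp or an ISO-8601 date for --rollback-database can be done with a simple fallback. The sketch below uses only datetime.fromisoformat from the standard library (the optional dateutil path is omitted); the function name is hypothetical, and the handling of dates without a time zone (local time) is an assumption.

```python
from datetime import datetime


def parse_rollback_point(value: str) -> float:
    """Return a Unix timestamp from either a timestamp string or an ISO-8601 date (hypothetical helper)."""
    try:
        return float(value)  # already a Unix timestamp, e.g. "1717200000"
    except ValueError:
        pass
    # fromisoformat accepts e.g. "2024-06-01" or "2024-06-01T12:00:00+00:00";
    # values without a UTC offset are interpreted in local time by .timestamp()
    return datetime.fromisoformat(value).timestamp()


print(parse_rollback_point("1717200000"))                 # 1717200000.0
print(parse_rollback_point("2024-06-01T00:00:00+00:00"))  # 1717200000.0
```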
v3.24.0
Added
- New wdiff differ to perform word-by-word comparisons. It replaces the dependency on an outside executable and
  allows for much better formatting and integration.
- New system_instructions directive added to the ai-google differ (BETA).
- Added examples to the documentation of how to use the re.findall filter to extract only the first or last line
  (suggested by Marcos Alano <https://github.com/malano> in issue #81 <https://github.com/mborsetti/webchanges/issues/81>).
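A word-by-word comparison like the one the new wdiff differ performs can be sketched with the standard library's difflib.SequenceMatcher applied to lists of words. This illustrates the technique only: the [-removed-] and {+added+} markers mimic GNU wdiff's default output and are not necessarily what webchanges emits, and splitting on whitespace discards the original spacing.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-...-] and added words as {+...+}."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        if tag in ("replace", "delete"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("replace", "insert"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("The price is 10 dollars", "The price is now 12 dollars"))
# The price is [-10-] {+now 12+} dollars
```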
Changed
- Updated the documentation for the ai-google differ (BETA), mostly to reflect Google's billing changes; the service
  is still free for most users.
Fixed
- Fixed a data type check that prevented the data directive of URL jobs (for POSTs, etc.) from being a list.
v3.23.0
Changed
- The ai-google (BETA) differ now defaults to using the new gemini-1.5-flash model (see documentation
  here <https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-flash-expandable>), as it still supports
  1M tokens, "excels at summarization" (per
  here <https://blog.google/technology/ai/google-gemini-update-flash-ai-assistant-io-2024/#gemini-model-updates:~:text=1.5%20flash%20excels%20at%20summarization%2C>),
  allows for a higher number of requests per minute (in the free version, 15 vs. 2 for gemini-1.5-pro), is faster,
  and, if you're paying for it, cheaper. To continue to use gemini-1.5-pro, which may produce more "complex"
  results, specify it in the job's differ directive.
Fixed
- Fixed the header of the deepdiff and image (BETA) differs to be more consistent with the default unified differ.
- Fixed the way images are handled in the email reporter so that they now display correctly in clients such as Gmail.
Internals
- Command line argument --test-differs now processes the new mime_type attribute correctly (mime_type is an
  internal, work-in-progress attribute to facilitate future automation of filtering, diffing, and reporting).
v3.22
⚠ Breaking Changes
- Developers integrating custom Python code (hooks.py) should refer to the "Internals" section below for important
  changes.
Changed
- Snapshot database:
  - Moved the snapshot database from the "user_cache" directory (typically not backed up) to the "user_data"
    directory. The new paths are (typically):
    - Linux: ~/.local/share/webchanges or $XDG_DATA_HOME/webchanges
    - macOS: ~/Library/Application Support/webchanges
    - Windows: %LOCALAPPDATA%\webchanges\webchanges
  - Renamed the file from cache.db to snapshots.db to more clearly denote its contents.
  - Introduced a new command line option --database to specify the filename for the snapshot database, replacing
    the previous --cache option (which is deprecated but still supported).
  - Many thanks to Markus Weimar <https://github.com/Markus00000> for pointing this problem out in
    issue #75 <https://github.com/mborsetti/webchanges/issues/75>.
- Modified the command line argument --test-differ to accept a second parameter, specifying the maximum number of
  diffs to generate.
- Updated the command line argument --dump-history to display the mime_type attribute when present.
- Enhanced differs functionality:
  - Standardized headers for deepdiff and imagediff to align more closely with those of unified.
  - Improved the ai_google differ:
    - Enhanced error handling: when Google API errors occur, the differ now continues operating and reports the
      errors rather than failing outright.
    - Improved the default prompt to "Analyze this unified diff and create a summary listing only the changes:\n\n{unified_diff}" for improved results.
Fixed
- Fixed an AttributeError exception when the fallback HTTP client package requests is not installed, as reported
  by yubiuser <https://github.com/yubiuser> in issue #76 <https://github.com/mborsetti/webchanges/issues/76>.
- Addressed a ValueError in the --test-differ command, a regression reported by Markus Weimar
  <https://github.com/Markus00000> in issue #79 <https://github.com/mborsetti/webchanges/issues/79>.
- To prevent overlooking changes, webchanges now refrains from saving a new snapshot if a differ operation fails
  with an exception.
Internals
- New mime_type attribute: we are now capturing and storing the data type (as a MIME type) alongside data in the
  snapshot database to facilitate future automation of filtering, diffing, and reporting. Developers using custom
  Python code will need to update their filter and retrieval methods in classes inheriting from FilterBase and
  JobBase, respectively, to accommodate the mime_type attribute. Detailed updates are available in the hooks
  documentation <https://webchanges.readthedocs.io/en/stable/hooks.html#:~:text=Changed%20in%20version%203.22>.
- Updated terminology: References to cache in object names have been replaced with ssdb (snapshot database).
- Int
v3.21
Added
- Job selectable differs: The differ, i.e. the method by which changes are detected and summarized, can now be
  selected job by job. Also gone is the restriction to have only unified diffs, HTML table diffs, or calls to an
  outside executable, as differs have become modular.
  - Python programmers can write their own custom differs using the hooks.py file.
  - Backward compatibility is preserved, so your current jobs will continue to work.
- New differs:
  - deepdiff to report element-by-element changes in JSON or XML structured data.
  - imagediff (BETA) to report an image showing changes in an image being tracked.
  - ai_google (BETA) to use Generative AI to provide a summary of changes (free API key required). We use Google's
    Gemini 1.5 Pro since it is the first model that can ingest 1M tokens, allowing it to analyze changes in long
    documents (up to 350,000 words, or about 700 pages single-spaced) such as terms and conditions, privacy
    policies, etc., where summarization adds the most value and which other models can't handle. The differ can
    call the Gen AI model to summarize a unified diff or to find and summarize the differences itself. Gemini 1.0
    is also supported, but it can handle fewer tokens.
Changed
- Filter absolute_links now converts URLs of the action, href and src attributes in any HTML tag, as well as the
  data attribute of the <object> tag; it previously converted only the href attribute of <a> tags.
- Updated explanatory text and error messages for increased clarity.
- You can now select jobs to run by using their url/command instead of their number, e.g.
  webchanges https://test.com is just as valid as webchanges 1.
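To see which attributes the widened absolute_links filter targets, here is a rough standard-library sketch that scans HTML with html.parser and resolves each URL against the page address with urllib.parse.urljoin. It only reports the conversions rather than rewriting the HTML, and the class name LinkCollector and the base URL are hypothetical; webchanges' own implementation is not shown in this changelog.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes converted in any tag, plus data on <object>, per the changelog entry above
URL_ATTRS = {"action", "href", "src"}


class LinkCollector(HTMLParser):
    """Collect (tag, attribute, absolute URL) tuples for URL-bearing attributes."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # html.parser lowercases tag and attribute names before calling this method
        for name, value in attrs:
            if value is None:
                continue
            if name in URL_ATTRS or (tag == "object" and name == "data"):
                self.links.append((tag, name, urljoin(self.base_url, value)))


collector = LinkCollector("https://example.com/docs/page.html")
collector.feed('<form action="/submit"><img src="img/a.png"></form><object data="movie.swf"></object>')
collector.close()
print(collector.links)
```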
Deprecated
- Job directive diff_tool. Replaced with the command differ
  (see here <https://webchanges.readthedocs.io/en/stable/differs.html#command_diff>).
Fixed
- webchanges --errors will no longer check jobs that have disabled: true (thanks to
  yubiuser <https://github.com/yubiuser> for reporting this in issue #73 <https://github.com/mborsetti/webchanges/issues/73>).
- Markdown links with no text were not clickable when converted to HTML; conversion now adds a 'Link without text'
  label.
Internals
- Improved speed of creating a unified diff for an HTML report.
- Reduced excessive logging from the httpx-related modules hpack and httpcore when running with -vv.
v3.20.2
Fixed
- Parsing of the "to" address for the sendmail option of the email reporter.
v3.20.1
Fixed
- Regression introduced in supporting sending to multiple "to" addresses.
v3.20
Added
- New re.findall filter to extract, delete or replace non-overlapping text using Python's re.findall.
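Because the filter builds on Python's re.findall, its matching behavior can be previewed directly in Python. The sketch below extracts all prices, the first line, and the last line of a sample text; the patterns and sample text are illustrative assumptions, not examples taken from the webchanges documentation.

```python
import re

text = "Widget: $10\nGadget: $25\nUpdated daily"

# All non-overlapping matches, in order of appearance
prices = re.findall(r"\$\d+", text)
# Without re.MULTILINE, ^ anchors only at the start of the string: first line
first_line = re.findall(r"^.+", text)
# \Z anchors at the very end of the string, and . does not match \n: last line
last_line = re.findall(r".+\Z", text)
# With one capturing group, findall returns the group's text instead of the whole match
names = re.findall(r"(\w+): \$\d+", text)

print(prices)      # ['$10', '$25']
print(first_line)  # ['Widget: $10']
print(last_line)   # ['Updated daily']
print(names)       # ['Widget', 'Gadget']
```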
Changed
- --test-reporter now allows testing of reporters that are not enabled; if a reporter is not enabled, a warning
  will be issued. This simplifies testing.
- email reporter (both SMTP and sendmail) supports sending to multiple "to" addresses.
Fixed
- Reports from jobs with monospace: true were not being rendered correctly in Gmail.