WikiRef is a free/libre and open source cross-platform tool that helps you verify and archive the references on a MediaWiki instance.
WikiRef analyzes MediaWiki references, identifies errors, archives webpage references through the WaybackMachine service, and can archive YouTube references locally. It helps you improve the quality of your instance's sourcing by notifying you of any errors it finds, such as dead links and duplicated or malformed references. It is designed to identify issues in your content rather than fix them: it produces a report of the identified issues, making it a useful reporting tool.
- Analyze references site-wide, for specific pages, or for categories
- Check that reference URLs are valid and accessible, so you don't have to worry about dead links
- Verify the accessibility of YouTube video references, their online status and permissions
- Generate a script to download those videos if needed
- Archive non-video sources using the WayBack Machine service
- Publish the analysis report to a MediaWiki page
Some false positives are possible: for instance, a website with a faulty SSL certificate will trigger an error, and some websites such as LinkedIn fight against bots accessing their content. A "whitelist" system is implemented to handle those cases.
If you need anything, have ideas, or find bugs or issues, let me know through the issue tracker; I'm totally open to new suggestions! Open an issue or contact me by mail.
The core of the tool is the analyze mode, which analyzes references and links, then produces a JSON file used as input by the other modes.
- On Windows, wikiref is replaced by wikiref.exe in the examples below.
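As an illustration, here is one possible end-to-end workflow chaining the modes through that JSON file (the wiki URL, file names, paths, and credentials below are placeholders, not defaults):

wikiref analyze -a https://demowiki.com/w/api.php -c Science --json output.json

wikiref script -i output.json -d ./videos/ --tool /usr/bin/yt-dlp

wikiref archive -i output.json

wikiref publish --api https://demowiki.com/w/api.php -i output.json -u SomeUser -p "password" --report-page "Report page"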
Here's the list of options that apply to all modes.
Flag | Value | Description
---|---|---
-v, --verbose | ⬤ | Output more information in the console
-s, --silent | ⬤ | No console output
-b, --color-blind | ⬤ | Disable coloring of the console output, for compatibility with certain terminals
-t, --throttle | Duration in seconds | Enable throttling between requests. Required for YouTube, which blacklists IPs making too many requests too quickly from datacenters (6 seconds seems to work well)
-l | ⬤ | Output the console to a log file named with the current date and time
--log | Filename | Same as -l but with a specific filename
-h | ⬤ | Output the console to an HTML file with rendering close to the colored console output
--html | Filename | Same as -h but with a specific filename
--subdir | ⬤ | Place output files in dedicated folders (json, html, log)
-4, -ipv4 | ⬤ | Force IPv4 DNS resolution and queries, for compatibility in certain corner cases
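These global options can be combined with any mode. For instance, a hypothetical analyze run with throttling, a log file, and an HTML report could look like this (values are only illustrative):

wikiref analyze -a https://demowiki.com/w/api.php -c Science -t 6 -l -h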
This mode analyzes references, checks the validity of the URLs used, and generates the JSON file used by the other modes.
Here's the list of options for the Analyze mode.
Flag | Required | Description
---|---|---
-a, --api | ⬤ | Url of api.php
-n, --namespace | ⬤ (mutually exclusive with page and category) | The name of the namespace to analyze
-c, --category | ⬤ (mutually exclusive with page and namespace) | The name of the category to analyze
-p, --page | ⬤ (mutually exclusive with category and namespace) | The name of the page to analyze
-j | | Output the analysis to a file with a name generated from the date
--json | | Same as -j but with a specific filename
--whitelist | | Filename of a JSON file containing domains to whitelist
Analyze all references from pages in the category Science on the wiki https://demowiki.com/
wikiref analyze -a https://demowiki.com/w/api.php -c Science
Analyze all references from the page Informatic on the wiki https://demowiki.com/
wikiref analyze -a https://demowiki.com/w/api.php -p Informatic
Analyze all references from pages in the category Science on the wiki https://demowiki.com/; put the output in a file called "output.json" and output nothing to the console
wikiref analyze -a https://demowiki.com/w/api.php -c Science --json output.json -s
A whitelisting system is available to skip domains that prevent tools like this one from checking their page status, as well as domains you trust.
You can provide a JSON file using the following format:
[ "google.com", "www.linkedin.com/", "www.odysee.com" ]
URLs starting with these domains will not be checked by the system, avoiding false positives in the report.
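For example, assuming the list above is saved as whitelist.json (the filename is arbitrary), it can be passed to the analyze mode like this:

wikiref analyze -a https://demowiki.com/w/api.php -c Science --whitelist whitelist.json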
This mode relies on the output of the analyze mode and uses yt-dlp for downloading, but other tools can be used as well.
yt-dlp is a free/libre, open-source, and cross-platform tool; check their documentation for installation instructions for your system here.
The filenames are composed of the video name followed by the YouTube VideoId.
Here's the list of options for the Script mode.
Flag | Required | Value | Description
---|---|---|---
-i, --input-json | ⬤ | Filename | Input JSON from the analyze mode used to generate the download script
-d, --directory | ⬤ | Path | The root folder where to put the videos. They are then placed in a subfolder based on the page name.
--output-script | | Filename | Change the default name of the output script
--tool | ⬤ | Filename | Path to yt-dlp
--tool-arguments | | Tool arguments | Default arguments are "-S res,ext:mp4:m4a --recode mp4" to download the 480p version of the videos
-e, --extension | | File extension without the "." | Extension of the video file
--redownload | | ⬤ | Download the video even if it already exists
--download-playlist | | ⬤ | Download videos in playlist URLs
--download-channel | | ⬤ | Download videos in channel URLs
Generate a script called download.sh to download all videos contained in the analysis file into the folder "videos" using yt-dlp
wikiref script -i output.json -d ./videos/ --tool /bin/usr/yt-dlp --output-script download.sh
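Assuming the generated download.sh is a regular shell script and yt-dlp is available at the path given above, it can then be run like any other script on Linux, for example:

bash download.sh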
Note: On Windows, wikiref is replaced by wikiref.exe
This mode archives URLs through the WaybackMachine. YouTube links are not archived through the WaybackMachine, as YouTube does not allow it.
Here's the list of options for the Archive mode.
Flag | Required | Value | Description
---|---|---|---
-a, --input-json | ⬤ | Filename | Input JSON source from analyze to get the URLs to archive
-w, --wait | | ⬤ | Wait for confirmation of archival from Archive.org. Use with caution: archive.org can take a long time to archive pages, sometimes many minutes.
wikiref archive -i output.json
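To wait for archive.org to confirm each archival (which, as noted above, can take many minutes), add the -w flag:

wikiref archive -i output.json -w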
This mode publishes the result of a JSON analysis to a MediaWiki page.
Here's the list of options for the Publish mode.
Flag | Required | Value | Description
---|---|---|---
-a, --api | ⬤ | | Url of api.php
-i, --input | ⬤ | Filename | Input JSON source from analyze to generate the report for
-u, --user | ⬤ | | Wiki username used to publish the edit
-p, --password | ⬤ | | The user's password
--report-page | ⬤ | | Name of the target page where the report is written
wikiref publish --api https://monwiki/wiki/api.php -i analyse.json -u Manu404 -p "a$sw0rd" --report-page "Report page"
The binary is available as a self-contained, or "portable", application: a single file with no dependencies other than yt-dlp for downloading YouTube videos.
Currently, supported systems are:
- Windows 64-bit & 32-bit from 10 and above.
- Linux 64-bit (Most desktop distributions like CentOS, Debian, Fedora, Ubuntu, and derivatives)
- Linux ARM (Linux distributions running on Arm like Raspbian on Raspberry Pi Model 2+)
- Linux ARM64 (Linux distributions running on 64-bit Arm like Ubuntu Server 64-bit on Raspberry Pi Model 3+)
Although the tool may theoretically support many platforms, it has only been thoroughly tested on a select few. As such, we cannot guarantee optimal functionality on all platforms without proper feedback. To err on the side of caution, any untested systems will be categorized as "not yet supported." However, if you have feedback regarding a system that is not currently listed as a supported platform, please don't hesitate to contact us. We welcome your feedback and will work to improve the tool's functionality across as many platforms as possible.
System | Supported |
---|---|
Windows 10 64 bits 22H2 | ⬤ |
Windows 11 64 bits | Presumed |
Ubuntu Server 22.04 | ⬤ |
Ubuntu Server 20.04 | ⬤ |
Ubuntu Server 18.04 | ⬤ |
Ubuntu Desktop 20.04 | ⬤ |
Debian 11.6 | ⬤ |
Debian 10.13 | ⬤ |
Fedora Workstation 37.1 | ⬤ |
Fedora Server 37.1 | ⬤ |
CentOS Server 7 | ⬤ |
CentOS Desktop 7 | ⬤ |
OpenSuse Tumbleweed | ⬤ |
Raspbian (Raspberry Pi) | ⬤ |
Linux Alpine Standard 3.17.3 | Not Supported Yet |
RHEL | Not Supported Yet |
MacOS | Not Supported Yet |
CentOS 7 users need the following environment variable:
export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1
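To make the setting permanent for your user, you can for instance append it to your shell profile (assuming bash):

echo 'export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1' >> ~/.bashrc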
- Linux Alpine Standard 3.17.3: The software runs but produces incomplete results; it can't be considered supported given the errors in its output JSON. Looking into it.
- MacOS: Support for MacOS will be made available in the near future, once it has been tested.
- RHEL systems: While CentOS and Fedora are supported systems, the likelihood of RHEL systems being supported is low.
If you need to build the solution for a system or architecture that is not officially supported, please refer to the "Build for not officially supported system" section located at the end of the page for guidance.
Retrieve the submodules when cloning the repository, as WikiRef.Commons lives in a separate repository because it is shared with other projects.
Builds are done on Windows using Git Bash (normally provided with Git) or on Linux. You don't need an IDE like Visual Studio or similar. Under Windows, you need:
- Git for Windows: https://git-scm.com/download/win
- DotNet 7 SDK: https://dotnet.microsoft.com/en-us/download/dotnet/7.0
- Zip and bzip2 need to be added to your Git Bash if you're on Windows (it's really easy to install); here's a straightforward tutorial: https://ranxing.wordpress.com/2016/12/13/add-zip-into-git-bash-on-windows/
Under Linux:
- DotNet 7 SDK: https://dotnet.microsoft.com/en-us/download/dotnet/7.0
- Zip command
Use recursive git cloning to get the application submodules.
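For example (the repository URL is a placeholder), clone with the submodules directly, or fetch them afterwards in an existing clone:

git clone --recurse-submodules <repository-url>

git submodule update --init --recursive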
Once the dependencies and submodules are installed, you're ready to compile the project yourself. The compilation process relies on two scripts:
- The root script, build.sh, contains the project variables and calls a generic build script. You can edit this script to modify the parameters passed to the real build script; it's a straightforward process.
- The actual compilation script is multiplateform_build.sh. It can take the following parameters; if a parameter is not provided, an interactive prompt will ask you to enter the information.

Parameter | Description
---|---
-t, --target | The target platform (cf. supported platforms)
-p, --project | Path to the project file
-n, --name | Project name used for the zip file
-v, --version | Version used for the zip file
-e, --embeded | Produce a self-contained ("portable") file (default false)
-a, --all | Build all available platforms
A clean is done before each build.
The build output is placed in "./output/build/<platform>".
A zip containing the build output is placed in "./output/zip/<platform>.zip".
The zip name uses the following convention:
<name>_<version>_<platform>.zip
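For example, a hypothetical 1.0.0 build targeting 64-bit Linux could produce a file such as wikiref_1.0.0_linux-x64.zip (the exact platform identifier depends on the target passed to the build script).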
- Sadly, WSL has compatibility issues with the "dotnet" command, so it can't be used.
You can build for an unofficially supported system using the -p parameter of the build script with a platform available in the list here.
Example, building for macOS 13 Ventura ARM 64: "./multiplateform_build.sh -p osx.13-arm64"
If you like this tool, let me know; it's always appreciated. You can contact me by mail.
Also, if you have any comments or requests, I'm totally open to them: open an issue or send me an email.
I would like to shout out BadMulch, BienfaitsPourTous, and the community around the PolitiWiki project, for which this tool was developed, for their support, greetz, ideas, company during development when I was streaming it on the Discord server, and overall moral support.