Ruleset testing #49
Comments
Yes, this would be cool.
This seems like a very cool idea. I played around with your example a bit, and I think we may be able to leverage GitHub Actions to run a shell script whenever a ruleset YAML is uploaded or changed, to generate and run a test. Starting from your example, I used the following bash script to generate a Playwright test:
./generate_test.sh -i rulesets/ca/_multi-metroland-media-group.yaml > tests/_multi-metroland-media-group.spec.ts
#!/bin/bash
# Command-line argument parsing
# The leading ":" puts getopts in silent mode, so the \? and : cases below receive $OPTARG
while getopts ":i:" opt; do
  case $opt in
    i)
      input_file=$OPTARG
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
    :)
      echo "Option -$OPTARG requires an argument." >&2
      exit 1
      ;;
  esac
done
# Check if the input file is provided
if [ -z "$input_file" ]; then
  echo "Usage: $0 -i <input_yaml_file>"
  exit 1
fi
# Extract information from the "tests" section
url=$(awk '/- url:/ {sub(/- url: /, ""); sub(/^[[:space:]]*/, ""); print}' "$input_file")
domain=$(echo "$url" | awk -F/ '{print $3}')
test=$(awk '/test:/ {sub(/test: /, ""); sub(/^[[:space:]]*/, ""); print}' "$input_file")
# Generate Playwright test script
echo "import { expect, test } from '@playwright/test';"
echo
echo "test('$domain has paywall by default', async ({ page }) => {"
echo " await page.goto('$url');"
echo " await expect($test).toBeVisible();"
echo "});"
echo
echo "test('$domain + Ladder does not have paywall', async ({ page }) => {"
echo " await page.goto('http://localhost:8080/$url');"
echo " await page.waitForLoadState();"
echo " await expect($test).not.toBeVisible();"
echo "});"
In the ruleset YAML, I put a Playwright locator in the test portion:
tests:
  - url: https://www.wellandtribune.ca/news/niagara-region/niagara-transit-commission-rejects-council-request-to-reduce-its-budget-increase/article_e9fb424c-8df5-58ae-a6c3-3648e2a9df66.html
    test: page.getByText("This article is exclusive to subscribers.")
At the moment, the bash script is limited to checking whether a specified element is or is not visible. If we continue this way, we may want the script to be more general so it can cover other scenarios. This may require anyone contributing a rule to be more explicit in their ruleset tests section: rather than contributing only a Playwright locator, they may need to provide the full expectation, with both a locator and an assertion:
test: expect(page.getByText("This article is exclusive to subscribers.")).toBeVisible()
Some additional parsing in the bash script could insert a
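To make the locator-versus-expectation idea concrete, here is a minimal TypeScript sketch of a generator that accepts either form and derives the proxied test by flipping the matcher. It is only one possible approach, not what the bash script does: the names `RulesetTest`, `toExpectation`, `negate`, `generateSpec`, and `LADDER_BASE` are made up for illustration, and the regex assumes a single-line expectation that ends in one `.toXxx(...)` matcher.

```typescript
// Hypothetical shape of one entry in a ruleset's `tests` section.
interface RulesetTest {
  url: string;
  // Either a bare locator, e.g. page.getByText("...")
  // or a full expectation, e.g. expect(page.getByText("...")).toBeVisible()
  test: string;
}

// Assumed address of a locally running Ladder instance (same as in the bash script).
const LADDER_BASE = "http://localhost:8080/";

// Wrap a bare locator in a visibility check; leave full expectations untouched.
function toExpectation(test: string): string {
  const trimmed = test.trim().replace(/;$/, "");
  return trimmed.startsWith("expect(") ? trimmed : `expect(${trimmed}).toBeVisible()`;
}

// Flip the final matcher: add `.not` if it is absent, remove it if it is present.
function negate(expectation: string): string {
  const m = expectation.match(/^(expect\(.*\))(\.not)?(\.to[A-Z]\w*\(.*\))$/);
  if (!m) {
    throw new Error(`Cannot negate expectation: ${expectation}`);
  }
  return `${m[1]}${m[2] ? "" : ".not"}${m[3]}`;
}

// Emit a Playwright spec with one direct test and one test through Ladder.
function generateSpec(entry: RulesetTest): string {
  // Third "/"-separated field of the URL, mirroring `awk -F/ '{print $3}'`.
  const domain = entry.url.split("/")[2];
  const direct = toExpectation(entry.test);
  const proxied = negate(direct);
  return [
    "import { expect, test } from '@playwright/test';",
    "",
    `test(${JSON.stringify(`${domain} has paywall by default`)}, async ({ page }) => {`,
    `  await page.goto(${JSON.stringify(entry.url)});`,
    `  await ${direct};`,
    "});",
    "",
    `test(${JSON.stringify(`${domain} + Ladder does not have paywall`)}, async ({ page }) => {`,
    `  await page.goto(${JSON.stringify(LADDER_BASE + entry.url)});`,
    "  await page.waitForLoadState();",
    `  await ${proxied};`,
    "});",
    "",
  ].join("\n");
}
```

With this shape, existing rulesets that only contain a locator keep working, while contributors who need a different assertion can write the full `expect(...)` themselves.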
Nice work! I've been thinking about how to generate rules for any site in an automated fashion. One of the main roadblocks is figuring out whether or not a site is paywalled, and generating a test for it. I wonder if it's as simple as extracting the visible text from a page and asking an LLM whether or not it is paywall text.
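As a rough illustration of the text-plus-LLM idea, the sketch below builds a prompt from a page's visible text and parses a one-word reply. Everything here is an assumption: the prompt wording, the `MAX_CHARS` cap, and the `AskModel` function type, which stands in for whatever model client would actually be used.

```typescript
// Assumed cap on how much page text is sent to the model.
const MAX_CHARS = 4000;

// Build a yes/no prompt from the visible text of a page.
function buildPaywallPrompt(visibleText: string): string {
  // Collapse whitespace and truncate so the prompt stays small.
  const excerpt = visibleText.replace(/\s+/g, " ").trim().slice(0, MAX_CHARS);
  return [
    "You are checking whether a news article page is blocked by a paywall.",
    "Answer with exactly one word: YES if the text below contains a paywall or subscription prompt that blocks the article, otherwise NO.",
    "",
    "Page text:",
    excerpt,
  ].join("\n");
}

// Interpret the reply; null means the model did not give a usable answer.
function parseVerdict(reply: string): boolean | null {
  const word = reply.trim().toUpperCase();
  if (/^YES\b/.test(word)) return true;
  if (/^NO\b/.test(word)) return false;
  return null;
}

// Placeholder for a model call; the real client is deliberately left unspecified.
type AskModel = (prompt: string) => Promise<string>;

async function isPaywalled(visibleText: string, ask: AskModel): Promise<boolean | null> {
  return parseVerdict(await ask(buildPaywallPrompt(visibleText)));
}
```

In a Playwright test, the visible text could come from something like `await page.locator("body").innerText()`. Whether a model's answer is reliable enough to generate tests from is exactly the open question raised above.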
@deoxykev I know LLMs are a blunt instrument, but you could even use a screenshot instead of text. Headless browsers usually support this out of the box, and visual inspection is often easier for an LLM than code logic. This could even be integrated in a Docker composition with
As per everywall/ladder-rules#3, we'll need to implement some robust testing.
The main challenge is to test after client-side JS rendering happens, which will probably mean we'll need a headless browser.
A test could look like this: https://github.com/everywall/ladder/blob/ladder_tests/tests/tests/www-wellandtribune-ca.spec.ts
And the results like this.
Perhaps we'll need some codegen in order to go from ruleset to test?
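As a starting point for such codegen, here is a minimal TypeScript sketch that pulls `url`/`test` pairs out of a ruleset file. It assumes each ruleset carries a `tests:` key followed by `- url:` and `test:` lines, as in the example discussed in the comments; the `extractTests` function and `RulesetTest` type are hypothetical, and a real YAML parser would be more robust than this line-based approach.

```typescript
// Hypothetical shape of one entry in a ruleset's `tests` section.
interface RulesetTest {
  url: string;
  test: string;
}

// Collect every url/test pair that appears after a `tests:` key.
function extractTests(yamlText: string): RulesetTest[] {
  const tests: RulesetTest[] = [];
  let inTests = false;
  let current: RulesetTest | null = null;

  for (const line of yamlText.split(/\r?\n/)) {
    if (/^\s*tests:\s*$/.test(line)) {
      inTests = true;
      continue;
    }
    if (!inTests) continue;

    const urlMatch = line.match(/^\s*-\s*url:\s*(.+?)\s*$/);
    const testMatch = line.match(/^\s*test:\s*(.+?)\s*$/);
    if (urlMatch) {
      // A new list item starts a new test entry.
      current = { url: urlMatch[1], test: "" };
      tests.push(current);
    } else if (testMatch && current) {
      current.test = testMatch[1];
    }
  }
  return tests;
}
```

Unlike the single-value `awk` extraction, this returns one entry per test, so a ruleset with several URLs could produce several generated specs.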