Reduce false positives (those not caused by WAFs or bot detection) #2068
The following targets were fixed:
Archive[.]org
CGTrader
CNET
Contently
IFTTT
Linktree
xHamster
The following targets were removed:
HexRPG (auth wall)
ModelHub (defunct)
Oracle Communities (auth wall)
ModelHub was not added to ./removed_sites.md as the platform itself is shutting down (and will therefore never return to Sherlock). The other removed targets were documented normally.
BitcoinForum is currently down and suspected to be defunct. Since this is uncertain, however, a test condition was added to suppress false positives while allowing for normal operation upon the forum's return.
The error codes module was expanded to support arrays of error codes rather than only a single code. Using this new functionality, Slides was set to error codes 404 (standard) AND 204 (non-standard) to accommodate that website's odd edge case.
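As a rough illustration of the change described above, the following is a minimal sketch of how a status check might accept either a single error code or an array of them. The function names (`normalize_error_codes`, `username_is_available`) and the non-2xx fallback are assumptions made for this example, not the code from this PR.

```python
def normalize_error_codes(error_code):
    """Return a set of status codes from an int, an iterable of ints, or None.

    Hypothetical helper: lets a site entry keep a single code (e.g. 404)
    or list several (e.g. [404, 204]).
    """
    if error_code is None:
        return set()
    if isinstance(error_code, int):
        return {error_code}
    return set(error_code)


def username_is_available(status_code, error_code=None):
    """Treat the username as available (not claimed) if the response status
    matches a configured error code, or is outside the 2xx range.

    Assumption: non-2xx responses count as "not found" by default.
    """
    codes = normalize_error_codes(error_code)
    if status_code in codes:
        return True
    return not (200 <= status_code < 300)


# Slides-style edge case: 204 is a 2xx status, so it must be listed explicitly.
slides_codes = [404, 204]
print(username_is_available(204, slides_codes))
print(username_is_available(200, slides_codes))
```

Listing 204 explicitly matters here because a bare 2xx check would otherwise read that response as a claimed username.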
Attempts were met with a Varnish error page presenting 54113 (possibly Fastly related). Change to User Agent necessary to avoid Varnish/Fastly issues. Change to Accept necessary to avoid infinite 302 redirection. Without BOTH of these changes, attempts will fail. Both changes being made also permit the use of status_code rather than message.
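A minimal sketch of what sending both headers together could look like, using only the standard library. The header values, the `DEFAULT_HEADERS` name, and the `build_probe_request` helper are hypothetical placeholders for illustration; the actual User Agent and Accept strings chosen in this PR are not reproduced here.

```python
from urllib.request import Request

# Hypothetical values: a browser-like User-Agent and an explicit Accept header.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:129.0) Gecko/20100101 Firefox/129.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def build_probe_request(url, extra_headers=None):
    """Build (but do not send) a GET request carrying both headers.

    Per-site overrides, if any, are layered on top of the defaults.
    """
    headers = dict(DEFAULT_HEADERS)
    if extra_headers:
        headers.update(extra_headers)
    return Request(url, headers=headers, method="GET")


req = build_probe_request("https://example.com/user/alice")
print(req.get_method(), req.full_url)
```

Keeping both headers in one defaults mapping makes it harder to ship one change without the other, which matters since the commit notes that either change alone still fails.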
A lil bit larger of a pr than first expected........ I think that's all of em, though... Well, I hope that's all of em
For general testing purposes, ppfeister:rc/combopoc currently reflects master with this, #2069, #2070, and #2092 all merged. (ppfeister:rc/combopoc itself has a messy merge history though -- don't use that branch for any merging upstream)
Cheers.
I've noticed that many bot-detection pages are able to be avoided by using this UA. Unless there's a reason to stay on the old old old one, we may as well update it and reduce our WAF hits.
Thank you so much @ppfeister for fixing all of this, I really appreciate it!
The following targets were fixed or partially fixed:
Archive[.]org (eh. sometimes. it has a loading problem that causes other F+s... some were fixed)
Archive of Our Own
CGTrader
CNET
Contently
Eintracht Frankfurt Forum
GeeksForGeeks
Genius[.]com Artists
Genius[.]com Users
Gumroad
HackerNews
HackerRank
IFTTT
Kongregate
Linktree
OpenStreetMap
Pinkbike
Polymart (often hits bot detection, but can be bypassed with some proxies)
Slides
Splits[.]io
Strava
Telegram
xHamster
YandexMusic
eintracht
jeuxvideo
The following targets were removed:
BitcoinForum (likely defunct)
G2G
HexRPG (auth wall)
Metacritic
ModelHub (defunct)
Oracle Communities (auth wall)
Misc changes:
Default User Agent updated
ModelHub was not added to ./removed_sites.[md|json] as the platform itself is confirmed to be shutting down (and will therefore never return to Sherlock). The other removed targets were documented normally.
Fiverr, Euw, NationStates Nations, NationStates Regions, and a couple of others remain occasionally problematic as they are behind WAFs and bot detection services. These WAF-induced false positives are addressed by sister PR #2069, and partial support via a decent proxy is being discussed in #2081.
Issues:
Fixes #904
Fixes #1966
Fixes #1999 (with sister PR #2069)
Fixes #2027 (with sister PR #2069)
Fixes #2071
Collateral (trumped or negated):
Closes #1843 // Removes jeuxvideo instead of applying a fix (a fix is being applied here).
Closes #2083 // Removes ModelHub, which is an action taken here. Would be negated by merge.
Closes #2096 // Fixes Archive[.]org, which is an action taken here. Would be negated by merge.