I have checked the repository for duplicate issues.
What feature do you want to see added?
Add poisoned pages to the site that disrupt the scraping done by AI companies
Why do you want to have this feature?
AI companies are scraping ever more aggressively, taking user information, content, etc. without the consent of users or site owners, while also eating into our bandwidth. While we are not the biggest community/site on the planet, we do have quite a number of users, and we create tons of user-generated content. This content is becoming more and more accessible via our website, which could make it a target for this kind of scraping.
By adding fake/poisoned pages to our site, we can try to accomplish two things:
Track which requests are coming from automated scrapers (if there are requests to pages which are not publicly available, we can be reasonably certain they're from a bot). This both helps with more accurate visit tracking and could allow us to block requests from these sources
Attempt to disrupt the scraping itself by feeding the bot bad data (a sketch of both ideas follows this list)
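To make the two goals concrete, here's a minimal sketch of how the poisoned pages could work, written as a Flask app purely for illustration. The `/internal-archive/` path, the word list, and the in-memory IP set are all hypothetical stand-ins, not a proposal for our actual stack:

```python
import logging
import random

from flask import Flask, abort, request

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

# Small filler vocabulary for generating plausible-looking nonsense.
WORDS = [
    "archive", "lattice", "vermilion", "quantum", "sprocket",
    "meridian", "obelisk", "turbine", "cascade", "halcyon",
]

# IPs that have requested the honeypot at least once.
# (In-memory for the sketch; a real site would persist this.)
known_scrapers = set()


def nonsense_paragraph(n_words=120):
    """Return grammar-free filler text to poison a scraper's dataset."""
    return " ".join(random.choice(WORDS) for _ in range(n_words))


@app.before_request
def block_known_scrapers():
    # Goal 1: anything that has touched the honeypot gets blocked on the
    # real pages (or could merely be excluded from visit statistics).
    if request.remote_addr in known_scrapers and \
            not request.path.startswith("/internal-archive/"):
        abort(403)


# Honeypot route: never linked from visible navigation and listed under
# Disallow: in robots.txt, so no human or well-behaved crawler should
# ever request it.
@app.route("/internal-archive/<page_id>")
def honeypot(page_id):
    known_scrapers.add(request.remote_addr)
    logging.info("honeypot hit from %s: %s", request.remote_addr, page_id)
    # Goal 2: answer with bad data rather than an error, so the scraper
    # accepts the page and keeps the garbage.
    return f"<html><body><p>{nonsense_paragraph()}</p></body></html>"


if __name__ == "__main__":
    app.run()
```

The hidden pages would also be listed under Disallow: in robots.txt and reachable only through links hidden from human visitors, so any request to them is a strong bot signal. In production the flagged-IP set would live in a shared store (or blocking would happen at the reverse proxy), but the sketch shows how the same fake pages can serve both the tracking goal and the poisoning goal.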
While this won't topple OpenAI or bring Facebook to its knees, it could at least limit the extent to which our users' data is used like this.
Any other details to share? (OPTIONAL)
The idea comes from this user on Twitter: https://twitter.com/Sync1211/status/1831825065937400253
If we decided to go through with this idea, this may be a good place to start.
In case you weren't aware, Cloudflare has a setting to block AI scrapers. While it doesn't accomplish all of the goals presented here, it's a quick one-click method and might be worth turning on if AI scraping is something you're concerned about.
I was not aware of this. I'll look into that. Though, as mentioned, it doesn't achieve all of the goals.