Anti Spam Tools
This document is being deprecated. Please see the DDOS section of our Disaster Recovery & First Responder's Guide instead.
The following was originally published by Giovanni Damiola @gdamdam via http://gio.blog.archive.org/2016/03/10/ol-anti-spam-tools. Gio writes:
- I’ve added the common words found in the recent spam to the spam words list.
- I blacklisted mail.com, as almost all of the spam was coming from that domain. This may stop some genuine people from registering and making edits.
- I blocked a lot of accounts and reverted their edits.
Other approaches:
On ol-db1, investigate volume and patterns:
select * from store where key like 'account/%/verify' order by id desc limit 50;
Check nginx access logs for common vectors on ol-www1:
sudo grep "/people" /var/log/nginx/access.log
sudo grep "/account/create" /var/log/nginx/access.log
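The path checks above can be taken one step further by counting how many matching requests each client made, which makes a burst of sign-up attempts easier to spot. The following is a minimal sketch, not part of `olsystem`: the function name `count_hits_by_client` is hypothetical, and it assumes the nginx "combined" log format, where field 1 is the (anonymized) client address and field 7 is the request path. If Open Library's log format differs, the field numbers need adjusting.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an existing Open Library tool).
# Counts requests per client address for log lines whose request path
# starts with the given prefix, and prints the ten busiest clients.
# Assumes the nginx "combined" format: $1 = client address, $7 = path.
count_hits_by_client() {
    prefix="$1"
    logfile="$2"
    awk -v p="$prefix" 'index($7, p) == 1 { print $1 }' "$logfile" \
        | sort | uniq -c | sort -nr | head -n 10
}

# Example (reading the live log will likely need sudo):
# count_hits_by_client /account/create /var/log/nginx/access.log
```

The output has the same `count address` shape as the `sort | uniq -c` pipelines used elsewhere on this page.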
Sam's magic sauce:
netstat -n | /home/samuel/work/reveal-abuse/mktable
sudo cat /var/log/nginx/access.log | cut -d ' ' -f 1 | sort | uniq -c | sort -n | tail -n 10 | /home/samuel/work/reveal-abuse/reveal | /home/samuel/work/reveal-abuse/shownames
First, ssh over to ol-www0 (which is the entry point for all traffic) and determine who the bad actor(s) are. Because we anonymize IPs, you'll first have to populate a map of anonymous IPs to IPs we can actually block:
ssh -A ol-www1
netstat -n | /home/samuel/work/reveal-abuse/mktable # XXX this should probably be added to `olsystem`, see: https://github.com/internetarchive/olsystem/issues/45
Then run:
sudo tail -n 5000 /var/log/nginx/access.log | cut -d ' ' -f 1 | sort | uniq -c | sort -n | tail -n 10 | /home/samuel/work/reveal-abuse/reveal | /home/samuel/work/reveal-abuse/shownames
Or...
sudo tail -n 250000 /1/var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 30
At this point (see nginx.conf), you can add the IPs to /olsystem/etc/nginx/deny.conf or add classes of IPs or user-agents to /etc/nginx/sites-available/openlibrary.conf, e.g.:
if ($http_user_agent ~* (Slurp|Yahoo|libwww-perl|Java)) {
    return 403;
}
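Before adding a user-agent to a rule like the one above, it can help to see which agents actually dominate recent traffic. This is a rough sketch under the assumption that the access log uses the nginx "combined" format, where the user-agent is the last double-quoted field. The function name `top_user_agents` is hypothetical and not an existing Open Library tool.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of olsystem).
# Splits each log line on double quotes; in the "combined" format the
# sixth piece is the user-agent string. Prints the ten most common.
top_user_agents() {
    logfile="$1"
    awk -F'"' '{ print $6 }' "$logfile" \
        | sort | uniq -c | sort -nr | head -n 10
}

# Example (reading the live log will likely need sudo):
# top_user_agents /var/log/nginx/access.log
```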
Or, you can block on a per-IP basis in /opt/openlibrary/olsystem/etc/nginx/deny.conf.
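As an illustration of per-IP blocking, the sketch below turns `count address` lines (the shape produced by `sort | uniq -c`) into nginx `deny` directives for addresses at or above a request threshold. Several things here are assumptions: the function name `make_deny_lines` is hypothetical, deny.conf is assumed to hold standard nginx `deny <address>;` directives, and the input is assumed to already contain real (de-anonymized) addresses in its second column. The raw access log holds anonymized addresses, and the output format of the `reveal` tools is not documented here, so their output may need reshaping first.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of olsystem).
# Reads "count address" lines on stdin and prints one nginx deny
# directive per address whose count is at or above the threshold.
make_deny_lines() {
    threshold="$1"
    awk -v t="$threshold" '$1 + 0 >= t + 0 { printf "deny %s;\n", $2 }'
}

# Example: review the generated lines before adding them to deny.conf.
# printf '  1200 203.0.113.7\n    15 203.0.113.8\n' | make_deny_lines 1000
```

On a stock nginx install, `sudo nginx -t` checks the configuration syntax before a reload. Whether that matches how Open Library's nginx is deployed is not covered on this page.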