From d38d341a8917383119d3a1a4c7402fe05f85ddda Mon Sep 17 00:00:00 2001
From: Jarell <91372088+jarelllama@users.noreply.github.com>
Date: Wed, 3 Apr 2024 19:52:08 +0800
Subject: [PATCH] Update update_readme.sh

---
 functions/update_readme.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/functions/update_readme.sh b/functions/update_readme.sh
index d1302d067..f9a516c0a 100644
--- a/functions/update_readme.sh
+++ b/functions/update_readme.sh
@@ -120,7 +120,7 @@ The domain retrieval process for all sources can be viewed in the repository's c
 ## Filtering process
 - The domains collated from all sources are filtered against a whitelist (scam reporting sites, forums, vetted stores, etc.)
 - The domains are checked against the [Tranco Top Sites Ranking](https://tranco-list.eu/) for potential false positives which are then vetted manually
-- Common subdomains like 'www' are removed to make use of wildcard matching for all other subdomains. See the list of checked subdomains here: [subdomains.txt](https://github.com/jarelllama/Scam-Blocklist/blob/main/config/subdomains.txt)
+- Common subdomains like 'www' are removed to make use of wildcard matching for all other subdomains
 - Redundant entries are removed via wildcard matching. For example, 'sub.spam.com' is a wildcard match of 'spam.com' and is, therefore, redundant and is removed. Many of these wildcard domains also happen to be malicious hosting sites
 - Only domains are included in the blocklist; IP addresses are manually checked for resolving DNS records and URLs are stripped down to their domains
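
Note on the "Redundant entries are removed via wildcard matching" bullet: the check it describes could be sketched roughly as below. This is only an illustration, not the repository's actual code. The function name `remove_redundant` is hypothetical, and the sketch assumes one domain per line on standard input.

```shell
#!/bin/bash
# Hypothetical helper (not taken from the repository): print every domain
# from stdin that is NOT a subdomain of another domain in the same input.
remove_redundant() {
    awk '
        # Remember each domain and the order in which it was read.
        { domains[$0] = 1; order[NR] = $0 }
        END {
            for (i = 1; i <= NR; i++) {
                d = order[i]
                parent = d
                redundant = 0
                # Strip the leftmost label repeatedly and check whether
                # the remaining parent domain is itself in the list.
                while ((pos = index(parent, ".")) > 0) {
                    parent = substr(parent, pos + 1)
                    if (parent in domains) { redundant = 1; break }
                }
                if (!redundant) print d
            }
        }
    '
}
```

For instance, piping the three lines `spam.com`, `sub.spam.com`, and `example.org` through `remove_redundant` should drop `sub.spam.com`, since `spam.com` already covers it as a wildcard. Exact duplicates are not removed by this sketch; a `sort -u` beforehand would handle those.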