Do not render contact form if it is requested directly by Spammers #420

Open
adminBTI opened this issue Jun 15, 2021 · 9 comments
Labels
Needs Research: Needs further research and discussion on implementation

Comments

@adminBTI

adminBTI commented Jun 15, 2021

Despite the anti-spam honeypot, I keep getting spam emails bothering me to renew my domain name or buy their "contact us" form spam services.

Legitimate users would access my website first and then go to the /contact page to submit the form. How can I display an error page instead of the "contact form" for those spammy clients who are hitting my /contact page directly?

I do not want to add filtering rules on the frontend web server, because the referrer can be spoofed and Codered sends the csrftoken cookie only for form submission. Some commercial proxy servers even remove the Referer header from HTTP requests.

Is it possible to use a different honeypot field for each client? For example, different radio and input fields?

@adminBTI adminBTI added the Type: Enhancement New feature or functionality change label Jun 15, 2021
@adminBTI adminBTI changed the title from "Do not render /contact if it is requested directly by spammers" to "Do not render contact form if it is requested directly by Spammers" Jun 15, 2021
@thenewguy
Contributor

How would you detect these spammers? It would not be unexpected for someone to come to your "Contact Us" page directly from Google.

Perhaps you should look at adding a captcha to your form if you want to make it more difficult for spammers? This one is very effective: https://www.google.com/recaptcha/about/

@adminBTI
Author

adminBTI commented Jun 18, 2021

I exclude /contact page from search engine crawling! Also, in my /contact page, I only display the actual form, no other useful content.

Google's products are meant for its own employees, not for others. If you don't agree with me, then you are still young. I don't know about you, but I often cannot solve these Google Captchas! Is a tiny corner of a traffic light box, which is not the actual light, still considered a traffic light? Is the box with just the tip of a bicycle handlebar not to be counted? I don't know.

A lot of my website visitors are not from California, so they would not agree to the jurisdiction of California, which is a requirement for using Google Captcha (read their T&C and privacy policies). A lot of websites are infected with dependent links to Google's fonts, scripts, captchas, etc.

I use the following in my nginx.conf to block contact form spam on a DjangoCMS site. (It will work for Coderedcms once I figure out how to add language middleware.)

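# 1. Make sure that /en/contact/ is excluded in robots.txt
# 2. If LANGUAGE_COOKIE_NAME is not django_language (default), change accordingly
# $cookie_django_language is empty when the request carries no such cookie,
# so $var equals the bare URI only for visitors who have not browsed the site first.
set $var "$uri$cookie_django_language";
if ($var = "/en/contact/") { return 404; }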
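For sites where the check has to live in the Python stack rather than in nginx, the same rule could be sketched as a small WSGI middleware. This is only an illustration of the logic above, not something coderedcms provides: the class name `BlockDirectContactHits` is hypothetical, and the path and cookie name are simply copied from the nginx snippet.

```python
from http.cookies import SimpleCookie

PROTECTED_PATH = "/en/contact/"      # same path as in the nginx rule
LANGUAGE_COOKIE = "django_language"  # Django's default LANGUAGE_COOKIE_NAME


class BlockDirectContactHits:
    """Hypothetical WSGI middleware mirroring the nginx rule.

    Answers 404 when the contact page is requested by a client that does
    not send the language cookie, i.e. one that has not browsed the site.
    """

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        has_language_cookie = (
            LANGUAGE_COOKIE in cookies and cookies[LANGUAGE_COOKIE].value != ""
        )
        if environ.get("PATH_INFO") == PROTECTED_PATH and not has_language_cookie:
            body = b"Not Found"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("404 Not Found", headers)
            return [body]
        # Every other request is passed through untouched.
        return self.app(environ, start_response)
```

Wrapping the project's WSGI application object with this class would apply the rule to every request. As with the nginx version, it only works if some earlier page view actually sets the language cookie.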

@thenewguy
Contributor

It sounds like you've got it figured out. Best of luck 👍

@onaralili

Some bots simply visit a website and start crawling instead of coming from a search engine. Also, this won't prevent manual spam. An alternative approach would be to integrate a spam filtering API like OOPSpam, which is GDPR compliant.

@murty2

murty2 commented Jul 9, 2021

Yes, I know some bots will access my website directly. Almost all such bots are up to no good anyway because they are looking to spam.

OOPSpam is a commercial solution and you may be trying to promote a commercial solution on this open source page.

@onaralili

I was replying to @adminBTI comment.

It is true that OOPSpam is commercial, and that is how it can afford to be privacy-friendly, unlike the privacy nightmare that is reCaptcha. Other anti-spam services like Akismet are also commercial; they tend to be, in order to keep operations going.

If privacy is a non-issue for you and you are looking for a free alternative, reCaptcha or a simple heuristic spam-word check would work.
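A "simple heuristic spam words check" could look roughly like the sketch below. The phrase list, the scoring weights and the threshold are all hypothetical values chosen for illustration (the phrases are loosely based on the spam described at the top of this thread); a real deployment would need to tune them against actual submissions.

```python
import re

# Hypothetical phrase list; extend it with whatever your own spam contains.
SPAM_PHRASES = (
    "renew your domain",
    "domain renewal",
    "seo services",
    "contact form marketing",
)
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def spam_score(message: str) -> int:
    """Score a message: 2 points per known phrase, 1 point per link."""
    text = message.lower()
    score = sum(2 for phrase in SPAM_PHRASES if phrase in text)
    score += len(URL_PATTERN.findall(message))
    return score


def looks_like_spam(message: str, threshold: int = 3) -> bool:
    """Flag the message once its score reaches the (assumed) threshold."""
    return spam_score(message) >= threshold
```

A check like this is cheap and keeps all data on your own server, but it is easy to evade and will need regular maintenance of the phrase list.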

@vsalvino
Contributor

Great discussion; chiming in on the various suggestions in this thread:

I think the only possible "true" solution is to make the forms flexible enough to integrate with a commercial spam checker such as Google reCaptcha or some of the other products mentioned in this thread. We would probably be inclined to support Google out of the box, after entering an API key in the wagtail settings, and provide a hook for others to implement their own.

The honeypot method would remain the default as it is simple and free. I would like to improve it a bit, but without seeing spambot behavior it is difficult to know how they are getting through it. We could potentially implement a rate limiter to prevent a single IP from submitting the form X number of times per minute.
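To make the rate limiter idea concrete, here is a minimal in-memory sketch of a sliding-window limiter keyed by IP address. It is not part of coderedcms; the class name and parameters are hypothetical, and because the state lives in one process it would not be shared between multiple workers without moving it to a cache or database.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per `window_seconds` for each IP."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted submissions

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Forget submissions that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A form view would call `allow(ip)` before processing a submission and reject the request when it returns `False`, for example with `SlidingWindowRateLimiter(max_requests=5, window_seconds=60)`.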

As for the suggestion about blocking direct hits to the URL with no referer, that is something we would never directly support, as it is a very common use case (e.g. sending a link to the form URL in an email). But if it works for your individual site, the nginx or django middleware methods referenced should be sufficient, without requiring any changes to coderedcms.

@murty2

murty2 commented Jul 21, 2021

Please consider implementing

  1. Simple captcha https://github.com/mbi/django-simple-captcha
  2. Two honeypot fields that change for each request. For example, one radio and another short text for one request and then two multi-select for another request
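One way the second suggestion might be approached is to derive the honeypot field names and types from a per-request nonce, so the server can recompute them on submission without storing anything. The sketch below assumes a hidden `form_nonce` field is rendered with the form; that field name, the helper names, and the `SECRET_KEY` placeholder are all hypothetical and not part of coderedcms.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: a server-side secret
FIELD_TYPES = ("text", "radio", "select-multiple")


def new_form_nonce() -> str:
    """Random value rendered into the form as the hidden `form_nonce` field."""
    return secrets.token_hex(8)


def _digest(nonce: str, label: str) -> bytes:
    return hmac.new(SECRET_KEY, f"{label}:{nonce}".encode(), hashlib.sha256).digest()


def honeypot_fields(nonce: str, count: int = 2):
    """Return [(field_name, field_type), ...] derived from the nonce."""
    fields = []
    for i in range(count):
        d = _digest(nonce, f"hp{i}")
        name = "f_" + d.hex()[:10]
        field_type = FIELD_TYPES[d[-1] % len(FIELD_TYPES)]
        fields.append((name, field_type))
    return fields


def is_bot_submission(post_data: dict) -> bool:
    """Treat the submission as a bot if any honeypot field was filled in."""
    nonce = post_data.get("form_nonce", "")
    if not nonce:
        return True  # the form was not rendered by us
    return any(post_data.get(name) for name, _ in honeypot_fields(nonce))
```

The template would render each returned field hidden from human visitors, as with the existing honeypot. Note that this sketch does not stop a bot from replaying an old nonce; expiring nonces would need extra state or a timestamp inside the signed value.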

I am not sure rate limiting in an application that caches is a good idea. Web servers and firewalls are better for that.

A lot of Ubuntu kids seem to use Fail2ban, but I am more inclined to use something like SSHGuard to block or rate limit IPs. Even with the ipset module, it takes 100 MB or so of memory to block or rate limit IPs in the firewall, but I can accept this as a decent compromise compared to a full-fledged WAF.

I wrote a firewall-level blacklist script, which stopped the spam (https://github.com/murty2/blacklist). But ideally, I would like to know how form data can be checked by a SpamAssassin or DSPAM filter process (similar to how email is checked before delivery).
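As a rough idea of how form data might be handed to SpamAssassin, the sketch below wraps a submission in an email message and pipes it to the `spamc` client in check-only mode. It assumes a running `spamd` with `spamc` on the PATH; the function names, the placeholder addresses, and the `subject`/`email`/`message` field names are hypothetical. Whether rules tuned for real email perform well on form submissions is an open question.

```python
import subprocess
from email.message import EmailMessage


def _one_line(value: str) -> str:
    """Collapse whitespace so user input cannot inject extra headers."""
    return " ".join(value.split())


def form_to_message(fields: dict) -> EmailMessage:
    """Wrap form fields in an email so a mail spam filter can score them."""
    msg = EmailMessage()
    msg["From"] = "webform@example.invalid"  # placeholder sender
    msg["To"] = "contact@example.invalid"    # placeholder recipient
    msg["Subject"] = _one_line(fields.get("subject", "Contact form submission"))
    body = f"Sender: {fields.get('email', '')}\n\n{fields.get('message', '')}"
    msg.set_content(body)
    return msg


def is_spam_according_to_spamc(msg: EmailMessage) -> bool:
    """Run `spamc -c`, which exits with status 1 when the message is spam."""
    result = subprocess.run(
        ["spamc", "-c"], input=msg.as_bytes(), capture_output=True, check=False
    )
    return result.returncode == 1
```

Because `spamc -c` also exits with status 0 on processing failures, a production version would want to distinguish "not spam" from "filter unavailable" in some other way.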

@vsalvino vsalvino added the Needs Research label (Needs further research and discussion on implementation) and removed the Type: Enhancement label (New feature or functionality change) Aug 2, 2021
@vsalvino
Contributor

Posting here for future reference: May have found a good open source captcha package we could integrate with: https://django-simple-captcha.readthedocs.io/en/latest/index.html

@pppls pppls mentioned this issue Oct 2, 2021
5 participants