
Add exclusion based on services by IP #112

Open
matomoto opened this issue Mar 9, 2024 · 15 comments
Labels
enhancement New feature or request

Comments

@matomoto

matomoto commented Mar 9, 2024

Because there are more bots than those in the five cloud IP lists, and these are also not detectable via the user agent (navigator.userAgent) or the information inside navigator.userAgentData.

My experience with this is that most bots of this kind show a "Direct view" (direct entry, no referrer) and a visit duration of only a few seconds.

To exclude these bots, a filter rule with user-configurable settings is needed, for example:

Exclude visitors where (example):
Direct View = true
Visit Duration < 10 seconds

// Plugin settings (pseudocode getters):
$tsp_filter_start_seconds = get_tsp_filter_start_seconds(); // example: 10 seconds
$tsp_filter_direct_view   = get_tsp_filter_direct_view();   // example: true (filter direct views)

// Properties of the current visit (pseudocode getters):
$view_is_direct_view = get_view_is_direct_view(); // example: true (no referrer)
$view_visit_duration = get_view_visit_duration(); // example: 5 seconds

$track_bool = true;

// Exclude direct views that stay shorter than the configured threshold.
if ($tsp_filter_direct_view === true
    && $view_is_direct_view === true
    && $view_visit_duration < $tsp_filter_start_seconds) {
    $track_bool = false;
}

if ($track_bool === true) {
    // track the visit
} else {
    // don't track the visit
    // With the example values this branch is taken, because:
    // direct view: true, threshold: 10 seconds, visit duration: 5 seconds.
}

Furthermore, this filter could be expanded:

Exclude visitors where (example):
Direct View = false / true
Referrer = Google / Bing / Wikipedia
Visit Duration < x seconds

So, if the referrer is "Google" and the visit duration is less than x seconds, don't track the visit.
This prevents tracking of speedy website hoppers (Google → Website[1] → back to Google → Website[2] → back to Google → Website[3] → back to Google ...).
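For illustration, the expanded rule could look like this, continuing the pseudocode style above; get_tsp_filter_referrers() and get_view_referrer() are hypothetical helpers, not existing TrackingSpamPrevention APIs.

// Hypothetical pseudocode sketch of the referrer-based exclusion described above.
$tsp_filter_start_seconds = get_tsp_filter_start_seconds(); // example: 10 seconds
$tsp_filter_referrers     = get_tsp_filter_referrers();     // example: ['google.', 'bing.', 'wikipedia.']

$view_referrer       = get_view_referrer();       // example: 'https://www.google.com/'
$view_visit_duration = get_view_visit_duration(); // example: 5 seconds

$referrer_matches = false;
foreach ($tsp_filter_referrers as $needle) {
    if ($needle !== '' && stripos($view_referrer, $needle) !== false) {
        $referrer_matches = true;
        break;
    }
}

// Don't track speedy website hoppers: configured referrer plus a very short visit.
$track_bool = !($referrer_matches && $view_visit_duration < $tsp_filter_start_seconds);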

@snake14
Contributor

snake14 commented Mar 10, 2024

Hi @matomoto. Thank you for this enhancement suggestion. These sound like good ideas for potential spam criteria that could be added. I am marking this to be reviewed and prioritised by our Product team.

@snake14 snake14 added the enhancement New feature or request label Mar 10, 2024
@snake14 snake14 added this to the For Prioritisation milestone Mar 10, 2024
@AltamashShaikh
Contributor

@matomoto Do you use the DeviceDetector plugin along with the TrackingSpamPrevention plugin?

@matomoto
Author

@AltamashShaikh In my Matomo instance, DevicesDetection (core) is active and the Tracking Spam Prevention (TSP) plugin is not installed. Until now TSP has not been of interest to me. I have some of my own code (PHP/JS) active to prevent website hoppers, cloud bots and headless browsers. I have been observing TSP for a long time. Until now I was missing some information about the plugin and its functions. The plugin also needs a few enhancements.

On the basis of new information:
This issue is not about preventing these visitors from being saved in the database (because that is not possible), but about blocking these visitors when the reports are created.

My current solution is to delay loading of the Matomo tracking code by x seconds on the entry page only (not on further pages). An example code snippet is published here: https://forum.matomo.org/t/erfahrungsbericht-matomo-tracking-rauschen/46151/30
The problem with this method is that the visit time lags by those x seconds on the entry pages. The method is not optimal either, but it has the advantage of preventing these visitors from being saved in the database at all. Both in one would be perfect, but that is probably not possible to include in Matomo.

@AltamashShaikh
Contributor

@matomoto Did you try using the timer trigger in MatomoTagManager? That could maybe solve your problem, but I don't see why we should add this to TrackingSpamPrevention, as it is just a short-lived visit and that data would be useful for many to determine the bounce rate.

Screenshot from 2024-03-13 05-36-45

@AltamashShaikh
Contributor

@matomoto Can we close this issue now?

@matomoto
Author

@AltamashShaikh I'm still thinking about it. I don't use the Tag Manager and I'm not going to use it either. Besides, it is not possible to combine the timer trigger rule with "Direct view" and restrict it to entry pages only.
https://help.piwik.pro/support/tag-manager/time-on-a-website-trigger/
Matomo core has been thinking about reducing the bounce rate for a long time, with no result. Matomo core is not the right place for this; the Tracking Spam Prevention plugin is.

@AltamashShaikh
Contributor

@matomoto Can you please explain how these short visits could be spam for everyone? There could be many reasons for a short visit, and just counting that as spam doesn't seem right to me.

@matomoto
Author

matomoto commented Mar 19, 2024

@AltamashShaikh Yes, it's not obvious; it comes from experience. For some time now I have been making the effort to check the IPs of these visitors (direct view, only one page, a visit time of a few seconds). They are almost always not IPs from providers (ISPs), but from hosters or similar companies (a reliable check).

Website hoppers are not included, because they have a referrer (i.e. no direct view). Some Matomo users only want real visitors and do not want to count website hoppers that stay on the website for only a few seconds. But that is a different topic. This is about direct views, and these are almost always bots.

In my experience, the "Direct view, only 1 page, few seconds visit time" viewers are mostly bots, just like the cloud bots, only from other clouds/servers. They come very regularly.

@AltamashShaikh
Contributor

@matomoto Can you share a list of IPs that belong to hosters, for our reference?

@matomoto
Author

matomoto commented Mar 20, 2024

@AltamashShaikh, yes, but it is not filtered down to only these bots, the IPs are shortened, and it is not up to date (more than a month old). It is my private collection.

23.95.251.0/24
27.115.0.0/17
34.64.0.0/10
34.133.64.0/20
34.135.0.0/20
34.172.0.0/17
35.192.0.0/11
35.184.192.0/20
35.226.80.0/20
35.238.64.0/20
35.239.128.0/20
37.19.211.0/24
42.224.0.0/12
54.190.0.0/16
54.212.0.0/16
54.218.0.0/17
65.128.0.0/11
69.160.160.0/24
79.125.0.0/18
82.165.0.0/16
83.149.64.0/18
102.165.0.0/18
136.244.80.0/20
146.148.0.0/17
156.146.49.0/24
163.116.136.0/24
174.235.48.0/20
176.9.46.0/23
180.160.0.0/13
204.101.0.0/16
205.169.39.0/24
207.102.0.0/16
209.170.64.0/18
2a03:2880:1000::/36
2a03:2880:2000::/36
2600:3c00::/32
2600:3c01::/32
2600:4040:4000::/36
2604:f440::/48
2606:54c0::/32
23.229.*.*
23.236.*.*
24.235.*.*
38.68.*.*
38.152.*.*
38.154.*.*
38.170.*.*
66.84.*.*
66.146.*.*
68.65.*.*
68.234.*.*
69.58.*.*
74.84.*.*
93.104.*.*
111.7.*.*
123.6.*.*
138.229.*.*
139.180.*.*
141.164.*.*
142.147.*.*
149.20.*.*
148.59.*.*
152.44.*.*
154.13.*.*
156.252.*.*
162.244.*.*
167.160.*.*
168.91.*.*
172.81.*.*
172.96.*.*
172.245.*.*
192.149.*.*
192.171.*.*
192.186.*.*
192.198.*.*
192.210.*.*
198.20.*.*
198.245.*.*
199.34.*.*
199.250.*.*
205.185.*.*
206.198.*.*
207.182.*.*
208.103.*.*
209.251.*.*
211.95.*.*
101.227.*.*
101.67.*.*
107.127.*.*
142.132.*.*
189.217.*.*
209.141.*.*
216.180.*.*
216.213.*.*

@AltamashShaikh
Contributor

@matomoto Thanks for sharing the IPs. It looks like some of them belong to service:datacenter. On checking quickly, however, it doesn't look like we get this information from the GeoIP DB; if we were getting it, we could have added an option to exclude visits from certain services. The delay part cannot be added in this plugin; for that you need to use a custom approach or the TagManager approach I shared in the comment above.

Screenshot from 2024-03-21 07-29-55
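For illustration, a range list like the one shared above could already be used for such an exclusion without the GeoIP DB; the following is a minimal, self-contained IPv4 sketch, not part of TrackingSpamPrevention, and IPv6 ranges are ignored for brevity.

// Illustrative sketch: match a visitor IP against datacenter ranges like the list above.
// Handles IPv4 CIDR entries ("23.95.251.0/24") and wildcard entries ("38.68.*.*") only.
function normalize_range(string $range): ?array
{
    // Wildcard form "a.b.*.*" is treated as a /16.
    if (preg_match('/^(\d+)\.(\d+)\.\*\.\*$/', $range, $m)) {
        $range = $m[1] . '.' . $m[2] . '.0.0/16';
    }
    if (!preg_match('#^(\d+\.\d+\.\d+\.\d+)/(\d+)$#', $range, $m)) {
        return null; // not an IPv4 CIDR (e.g. an IPv6 range) -> ignored in this sketch
    }
    $bits = (int) $m[2];
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return [ip2long($m[1]) & $mask, $mask];
}

function is_ip_excluded(string $ip, array $ranges): bool
{
    $long = ip2long($ip);
    if ($long === false) {
        return false; // not an IPv4 address
    }
    foreach ($ranges as $range) {
        $norm = normalize_range($range);
        if ($norm !== null && ($long & $norm[1]) === $norm[0]) {
            return true;
        }
    }
    return false;
}

// Example: true, because 23.95.251.7 falls into 23.95.251.0/24.
var_dump(is_ip_excluded('23.95.251.7', ['23.95.251.0/24', '38.68.*.*']));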

@matomoto
Author

@AltamashShaikh You forget that I already have my own solution (with JavaScript); it is other users who need this. Including it in a plugin is a better way than custom JavaScript code, because not all users can handle a JavaScript solution. And in the end, it would only be an option that plugin users can choose to use. More than "please" I cannot say here.

@matomoto
Author

matomoto commented Mar 22, 2024

I have found a free datacenter IP list: https://github.com/growlfm/ipcat
This also includes a few of the cloud IP lists/ranges.
Unfortunately, DataCamp Limited (for example) is not included, but many others are.

There are more such lists available, but only for sale.
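For reference, a list in this format could be loaded and checked roughly like the sketch below; it assumes rows of the form "start_ip,end_ip,owner,url" as in the upstream ipcat datacenters.csv (verify against the actual file before relying on it), and IPv6 rows are skipped.

// Illustrative sketch: load ipcat-style start/end IPv4 ranges and test an address against them.
function load_datacenter_ranges(string $csvPath): array
{
    $ranges = [];
    $lines  = file($csvPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
    foreach ($lines as $line) {
        if ($line === '' || $line[0] === '#') {
            continue; // skip comment lines
        }
        list($start, $end) = array_pad(explode(',', $line), 2, '');
        $startLong = ip2long($start);
        $endLong   = ip2long($end);
        if ($startLong !== false && $endLong !== false) {
            $ranges[] = [$startLong, $endLong];
        }
    }
    return $ranges;
}

function ip_in_datacenter_ranges(string $ip, array $ranges): bool
{
    $long = ip2long($ip);
    if ($long === false) {
        return false; // not an IPv4 address
    }
    foreach ($ranges as $range) {
        if ($long >= $range[0] && $long <= $range[1]) {
            return true;
        }
    }
    return false;
}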

@AltamashShaikh
Contributor

@matomoto We cannot add that delay JS code in this plugin, as it is not designed that way. I can keep this issue open and change the title to "Add exclusion based on services by IP", and in future, if the GeoIP DB starts returning this data, we can implement this feature. For now, I would recommend using your JS implementation.

@matomoto matomoto changed the title from "Add exclusion of Direct view and x seconds" to "Add exclusion based on services by IP" on Dec 21, 2024
@matomoto
Author

@AltamashShaikh, I have now changed the title of this issue. I have a new idea to solve the problem. Think in reverse: I have seen and read about the Matomo plugin "Provider". That matches what is needed: track no bots, track only humans. Bots don't come from providers (ISPs), they come from services (hosters/datacenters). Humans come from providers, not from services. That's perfect.

An option is needed to track only visitors that come from a provider, i.e. visitors that get a real provider name via the Provider plugin. If a visitor has no provider, don't track them.

Track only visitors with a provider name.

https://github.com/matomo-org/plugin-Provider
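For illustration, the reverse logic could look like this; get_visitor_ip() and get_visitor_provider_name() are hypothetical lookups (for example backed by the Provider plugin's data), not existing Matomo APIs.

// Illustrative sketch of the "track only visitors with a provider name" option.
$visitor_ip = get_visitor_ip();                       // hypothetical helper
$provider   = get_visitor_provider_name($visitor_ip); // example: 'Deutsche Telekom AG' or null

// Visitors without a resolvable provider name (hosters, datacenters, unknown networks)
// would not be tracked when this option is enabled.
$track_bool = ($provider !== null && $provider !== '' && strtolower($provider) !== 'unknown');

if ($track_bool === true) {
    // track the visit
} else {
    // don't track the visit
}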
