log import5 script results in all client-ip's to be 0.0.0.0 #319

Open
hanscees opened this issue May 24, 2023 · 4 comments

Comments

@hanscees

Hi,
I am using the Docker version of Matomo. I am loading data into it with the log import script like so:

# Import every writable log file in the current directory into Matomo.
for i in *; do
  # Only process files whose permission string contains "w".
  if [[ $(stat -c "%A" "$i") =~ "w" ]]; then
    echo "$i"
    /usr/bin/python3 /var/lib/docker/volumes/matmoto_matomo/_data/misc/log-analytics/import_logs.py --url=http://192.168.0.61:8080 --debug  --login [email protected] --password "secletvelly" --idsite=1 --recorders=4 "$i"

    echo "just uploaded this file to webstats: "
    echo "$i"
  fi
done

This all works fine and the dashboard shows all kinds of data.
However, the dashboard shows no data about where visitors come from.

As far as I can see, the default Matomo Docker image has GeoIP2 on board. I have even updated it with my MaxMind license.
I have tried things like:

docker exec -it matmoto-app-1 php ./console usercountry:attribute 2023-05-13,2023-05-14

and 
docker exec -it matmoto-app-1 php ./console core:invalidate-report-data --dates=2023-04-01,2023-05-14 --sites=1

These commands give no errors, but there is still no GeoIP data in the dashboard.

The system settings say:
`Geolocation geoip2php (continent_code, continent_name, country_code, country_name, region_code, region_name, city_name, postal_code, lat, long)`

Any help would be greatly appreciated.

@hanscees

hanscees commented May 27, 2023

So after a few hours of digging, it turns out the import_logs.py script does not understand my log files, so all client IPs are recorded as 0.0.0.0.

This page shows how to write your own log format regex:
https://github.com/matomo-org/matomo-log-analytics/#readme

It would be an improvement if --debug actually showed you some client IPs.

So now I have to figure out how to empty the whole database, because all records are wrong.

My log lines look like this:
51.159.154.15 www.bomengids.nl - - [22/May/2023:00:00:18 +0200] "GET /winter/Hollandse_iep__Ulmus_hollandica__Dutch_Elm@1@img_91 98knop_th.jpg HTTP/1.1" 200 9067 "https://www.bomengids.nl/knop.html" "newspaper/0.2.8"

The regex is:
--log-format-regex='((?P<ip>\S+) (?P<host>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "GET (?P<path>.*?) HTTP/\S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
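
One way to check a candidate regex before importing anything is to run it against a sample line with Python's `re` module and print the named groups. The sketch below uses the log line and regex quoted above; the variable names are only for illustration, and it tests the pattern in isolation, not how import_logs.py applies it.

```python
import re

# Sample log line and regex copied from this comment.
LOG_LINE = (
    '51.159.154.15 www.bomengids.nl - - [22/May/2023:00:00:18 +0200] '
    '"GET /winter/Hollandse_iep__Ulmus_hollandica__Dutch_Elm@1@img_91 98knop_th.jpg HTTP/1.1" '
    '200 9067 "https://www.bomengids.nl/knop.html" "newspaper/0.2.8"'
)
LOG_FORMAT_REGEX = (
    r'((?P<ip>\S+) (?P<host>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"GET (?P<path>.*?) HTTP/\S+" (?P<status>\S+) (?P<length>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
)

# Match from the start of the line and collect the named groups.
match = re.match(LOG_FORMAT_REGEX, LOG_LINE)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name}: {value!r}")
```

Because the pattern contains a literal `GET`, lines for other request methods (POST, HEAD, and so on) would not match it at all.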

I figured this out because the visits log shows wrong URLs, where it thinks client IPs are part of my website:
16[78.46.70.104/2022/species/Zomereik__Quercus_robur__English_oak__Sommer-Eiche--Stiel-Eiche__Chene_commun--Chene_dAngleterre.html](https://78.46.70.104/2022/species/Zomereik__Quercus_robur__English_oak__Sommer-Eiche--Stiel-Eiche__Chene_commun--Chene_dAngleterre.html)

@hanscees

If the import script in debug mode could be fed some log lines and would log what it "thinks" the client IP, host, and URL are, that would make it much easier to understand.
It might even suggest reading the documentation page linked above.
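
To illustrate the kind of debug output meant here, a rough sketch follows. The helper `preview_parsed_fields`, the simplified pattern, and the sample lines are all made up for this example; none of it is part of import_logs.py.

```python
import re

def preview_parsed_fields(lines, pattern, limit=5):
    """Return what a regex extracts from the first few log lines.

    Hypothetical helper for illustration; not part of import_logs.py.
    """
    compiled = re.compile(pattern)
    report = []
    for line in lines[:limit]:
        match = compiled.match(line)
        if match is None:
            # Flag lines the pattern cannot parse instead of guessing.
            report.append({"line": line, "matched": False})
            continue
        groups = match.groupdict()
        report.append({
            "line": line,
            "matched": True,
            "ip": groups.get("ip"),
            "host": groups.get("host"),
            "path": groups.get("path"),
        })
    return report

# Simplified pattern and lines, invented for this example.
PATTERN = r'(?P<ip>\S+) (?P<host>\S+) "GET (?P<path>\S+)'
SAMPLE = [
    '203.0.113.7 www.example.org "GET /index.html',
    'this line does not match',
]
for entry in preview_parsed_fields(SAMPLE, PATTERN):
    print(entry)
```

Seeing the extracted IP, host, and path next to each raw line would make a mismatched log format visible before any records are written.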

I spent a lot of time debugging a very long, confusing script.
Oh well, I did learn that Python has named regex groups, which is very handy.

A much more readable script is here:
https://github.com/gilbN/geoip2influx

cheers

@hanscees

And even with the new regex it does not work. All visits are from 0.0.0.0 according to the visits log.

what a drag.

hanscees changed the title from "cant get geolocation data to work" to "log import5 script results in all client-ip's to be 0.0.0.0" on May 28, 2023
@hanscees

see this analysis:
matomo-org/matomo-log-analytics#354

Using the log import script with the Docker containers results in all client IPs being 0.0.0.0.
If my analysis is correct, this of course causes the GeoIP map to be empty.

Please help me find the bug, or show me where I am doing things wrong.
