Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only display unique hits in badge #61

Open
christian-draeger opened this issue Apr 12, 2019 · 6 comments
Open

Only display unique hits in badge #61

christian-draeger opened this issue Apr 12, 2019 · 6 comments

Comments

@christian-draeger
Copy link

christian-draeger commented Apr 12, 2019

Hey,
i'm wondering if it's possible to configure the badge to only show unique hits (over a day or specific period of time).
This option would give a more meaningful hint to repo owners that are having the "hits-badge" added to their readme files.

As I saw you already storing the hit data including a hash (consisting of user-agent, IP and timestamp) which in turn means you already know if a hit is unique or not - why not adding this option to the badge? :)

for instance something like http://hits.dwyl.io/skrapeit/skrape.it.svg?uniquePerDay would be handy.

i think what bothers me the most is that every time you adjust your readme (and have the hit-badge included in the readme) IntelliJ's preview mode for md files will render the readme. This causes every keystroke to increment the hit count. Which leads to the issue that the hit-count is extremely falsified and no longer has any significance to the repo owner.
here is a screen record that describes the problem:
hit-count-issue-intellij

@nelsonic
Copy link
Member

nelsonic commented Nov 3, 2021

@SimonLab Do you think you could take a look at this?
It should just be a matter of updating the query to be DISTINCT.
Watch out for this gotcha: https://stackoverflow.com/questions/11250253/postgresql-countdistinct-very-slow
And if we can do it using idiomatic Ecto (without returning all the data to the Phoenix App and wasting bandwidth) it would be even better!

@SimonLab
Copy link
Member

SimonLab commented Nov 3, 2021

@nelsonic Looking at this now
At the moment the count value is returned when a hit is inserted.
So we'll need to update the aggregate function:

hits/lib/hits/hit.ex

Lines 24 to 30 in b60b329

Repo.aggregate(
from(h in __MODULE__,
where: h.repo_id == ^repository_id
),
:count,
:id
)

I'm thinking of adding a filter query parameters to the url (as mention on the first comment): ...svg?filter=day

@nelsonic
Copy link
Member

nelsonic commented Nov 3, 2021

@SimonLab good plan. The only thing to note is at present the Ecto Repo.aggregate query is somehow being ultra-inefficient and thus consuming a ton of bandwidth; see: dwyl/learn-devops#72 (comment)
Basically we might need to write a "Raw SQL" query to avoid sending all the data from PostgreSQL to Phoenix each time ... 💭

@nelsonic
Copy link
Member

nelsonic commented Nov 3, 2021

We really want to streamline the query to do the COUNT DISTINCT on the PostgreSQL server to only return the int value rather than all the data. 🤞

@SimonLab
Copy link
Member

SimonLab commented Nov 3, 2021

I think the aggregate Ecto query is already using the count distinct and only the count is sent back to Phoenix. This is the generated query:

SELECT count(s0."id") FROM (SELECT DISTINCT ON (sh0."useragent_id") sh0."id" AS "id" FROM "hits" AS sh0 WHERE (sh0."repo_id" = $1)) AS s0 [1]

@nelsonic
Copy link
Member

nelsonic commented Nov 3, 2021

@SimonLab cool. thanks for confirming. 👍

SimonLab added a commit that referenced this issue Nov 3, 2021
@SimonLab SimonLab removed their assignment Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants