Simple Metrics #290

Open
kamicut opened this issue Apr 12, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@kamicut
Member

kamicut commented Apr 12, 2023

We would like to create a simple metrics page:

  • Aggregation of SQL statements
  • Write out to CSV or JSON
  • Cronnable, with output uploadable to S3 keyed by date
  • Capture the current state on S3 with a "latest file" URL
  • Example metrics: number of users, number of each entity (nodes, ways, relations)
  • Dockerfile so that the script can run as a command within osm-seed
  • Nice to have: connect a Jupyter-style notebook to the CSV
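
A minimal sketch of how the items above could fit together, not the script itself. It uses Python's built-in `sqlite3` as a stand-in for the osm-seed database so that it runs anywhere. The table names, the `METRIC_QUERIES` mapping, the `metrics/` key prefix, and all function names are assumptions made for illustration. The upload step is left out: `snapshot_keys` only returns the two object keys that a dated copy and a "latest" copy could be written to.

```python
import csv
import io
import json
import sqlite3
from datetime import date

# Hypothetical metric name -> SQL statement mapping. The table names are
# assumptions for illustration, not confirmed against the osm-seed schema.
METRIC_QUERIES = {
    "users": "SELECT COUNT(*) FROM users",
    "nodes": "SELECT COUNT(*) FROM current_nodes",
    "ways": "SELECT COUNT(*) FROM current_ways",
    "relations": "SELECT COUNT(*) FROM current_relations",
}


def collect_metrics(conn, queries):
    """Run each single-value SQL statement and return {metric: value}.

    `conn` is a sqlite3 connection here; other drivers may need a cursor.
    """
    return {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}


def to_csv(metrics, snapshot_date):
    """Serialize one snapshot as CSV with one row per metric."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "metric", "value"])
    for name, value in metrics.items():
        writer.writerow([snapshot_date.isoformat(), name, value])
    return buf.getvalue()


def to_json(metrics, snapshot_date):
    """Serialize the same snapshot as a JSON document."""
    return json.dumps(
        {"date": snapshot_date.isoformat(), "metrics": metrics}, sort_keys=True
    )


def snapshot_keys(snapshot_date, prefix="metrics", ext="csv"):
    """Return the date-keyed object key and the fixed "latest" key."""
    dated = f"{prefix}/{snapshot_date.isoformat()}.{ext}"
    latest = f"{prefix}/latest.{ext}"
    return dated, latest
```

Writing the same payload to both keys on each run would keep the "latest file" URL stable while the dated keys accumulate as history.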

If we want to graph things "over time", we will need to aggregate the CSV "snapshots" of the SQL results for each metric across a time period. To keep that efficient, we'll need some kind of materialized view over those CSVs, so that a chart does not have to download every snapshot.
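
One way to read "materialized view" here is a single combined CSV that each run updates, so a chart only needs one download. The sketch below assumes every snapshot uses `date,metric,value` columns with ISO dates; that layout and the function name `append_snapshot` are illustrative assumptions, not something decided in this issue.

```python
import csv
import io


def append_snapshot(series_csv, snapshot_csv):
    """Merge one snapshot into the combined series.

    Rows are keyed by (date, metric), so re-running a snapshot for the
    same day overwrites its rows instead of duplicating them.
    """
    rows = {}
    for text in (series_csv, snapshot_csv):
        for row in csv.DictReader(io.StringIO(text)):
            rows[(row["date"], row["metric"])] = row["value"]

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "metric", "value"])
    # ISO dates sort chronologically as plain strings.
    for (day, metric), value in sorted(rows.items()):
        writer.writerow([day, metric, value])
    return buf.getvalue()
```

Passing an empty string as `series_csv` starts a new series, which makes the first run no different from later ones.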

Question / nice to have: how do we backfill the CSV snapshots?

cc @geohacker @batpad

@danrademacher

Is this underway, @kamicut?

@kamicut
Member Author

kamicut commented Apr 19, 2023

This work is currently underway, @danrademacher. I'd like to share a couple of the simple metrics that I've experimented with by querying the database dump:

Nodes/Ways/Relations over time (cumulative)

image

Nodes over time

image

NB: I've removed the entity counts from the initial import so that the charts show a sense of progression; otherwise they become unreadable, because the number of nodes in the initial import is very large compared to subsequent changes.
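
A small sketch of the adjustment described in this note, assuming the input is a list of per-day counts of new entities sorted by day and that the initial import is simply the earliest entry. Both are assumptions for illustration; the charts above may have identified the import differently. The function name and the numbers in the usage lines are made up.

```python
from itertools import accumulate


def cumulative_without_initial_import(daily_counts):
    """Return (day, running_total) pairs, skipping the first entry.

    daily_counts: list of (day, new_entities) tuples sorted by day.
    The first entry is treated as the initial import and dropped, so the
    running total starts from zero on the following day.
    """
    later = daily_counts[1:]
    totals = accumulate(count for _, count in later)
    return [(day, total) for (day, _), total in zip(later, totals)]


# Made-up numbers, for illustration only.
daily = [("2023-01-01", 1_000_000), ("2023-01-02", 5), ("2023-01-03", 7)]
print(cumulative_without_initial_import(daily))
```

Dropping the import before accumulating, rather than after, keeps the y-axis scaled to the subsequent edits.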

Other metrics / next steps

This type of chart can be done similarly for users over time.

Other metrics that I'd like to explore:

  1. Top 10 active users
  2. Changeset comments / word frequency
  3. Most active areas using changeset boundaries (over time)
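
The first two metrics in this list could be prototyped with `collections.Counter`, as in the sketch below. It assumes each changeset has already been loaded as a dict with hypothetical `user` and `comment` keys, and it treats "comment" as the free-text comment attached to a changeset; both are assumptions, as are the function names. The third metric (activity by changeset boundaries) is not covered here.

```python
import re
from collections import Counter


def top_active_users(changesets, n=10):
    """Return the n users with the most changesets as (user, count) pairs."""
    return Counter(cs["user"] for cs in changesets).most_common(n)


def comment_word_frequency(changesets, n=10):
    """Return the n most frequent lowercase words across all comments."""
    words = Counter()
    for cs in changesets:
        text = cs.get("comment", "").lower()
        words.update(re.findall(r"[a-z0-9']+", text))
    return words.most_common(n)
```

The word pattern only matches ASCII letters and digits, so comments in other scripts would need a broader tokenizer.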

My immediate next step is to create a script that generates the data for these charts, which we could potentially run on a schedule.
