OpenStreetMap contributor metrics #13

Open
dhimmel opened this issue Dec 11, 2024 · 2 comments
Labels
help wanted Extra attention is needed

Comments

Owner

dhimmel commented Dec 11, 2024

It would be great to have some stats like:

  • XX OpenStreetMap accounts have contributed to ski areas, with YY changesets within the last year
  • Top OSM contributor per ski area
  • Distribution of age of last edit for ski area nodes and ways

The best place to start is likely extracting the OSM source information that OpenSkiMap provides.
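As a rough sketch of how the three stats could be computed once per-element changeset records exist: the record layout below (`ski_area_id`, `element`, `changeset`, `user`, `date`) and the sample rows are hypothetical, not an existing openskistats table, and "top contributor" is assumed to mean the user with the most distinct changesets.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical records: one row per (ski area, OSM element, changeset).
EDITS = [
    {"ski_area_id": "area1", "element": "way/1", "changeset": 100, "user": "alice", "date": "2024-11-01T10:00:00Z"},
    {"ski_area_id": "area1", "element": "way/1", "changeset": 101, "user": "alice", "date": "2024-12-01T10:00:00Z"},
    {"ski_area_id": "area1", "element": "way/2", "changeset": 102, "user": "bob", "date": "2022-01-15T08:30:00Z"},
    {"ski_area_id": "area2", "element": "way/3", "changeset": 103, "user": "bob", "date": "2024-06-01T00:00:00Z"},
    {"ski_area_id": "area2", "element": "way/3", "changeset": 104, "user": "bob", "date": "2023-01-01T00:00:00Z"},
    {"ski_area_id": "area2", "element": "relation/4", "changeset": 105, "user": "carol", "date": "2024-12-10T00:00:00Z"},
]


def parse_ts(value: str) -> datetime:
    """Parse an OSM-style UTC timestamp such as 2024-11-01T10:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def contributor_summary(edits, now, window_days=365):
    """Return (distinct users, distinct changesets within the window)."""
    cutoff = now - timedelta(days=window_days)
    users = {e["user"] for e in edits}
    recent = {e["changeset"] for e in edits if parse_ts(e["date"]) >= cutoff}
    return len(users), len(recent)


def top_contributor_per_ski_area(edits):
    """Return the user with the most distinct changesets for each ski area."""
    changesets = defaultdict(set)
    for e in edits:
        changesets[(e["ski_area_id"], e["user"])].add(e["changeset"])
    counts = defaultdict(Counter)
    for (area, user), ids in changesets.items():
        counts[area][user] = len(ids)
    return {area: counter.most_common(1)[0][0] for area, counter in counts.items()}


def last_edit_age_days(edits, now):
    """Return whole days since the most recent edit of each element."""
    latest = {}
    for e in edits:
        ts = parse_ts(e["date"])
        if e["element"] not in latest or ts > latest[e["element"]]:
            latest[e["element"]] = ts
    return {element: (now - ts).days for element, ts in latest.items()}


now = datetime(2024, 12, 11, tzinfo=timezone.utc)
n_users, n_recent = contributor_summary(EDITS, now)
print(f"{n_users} accounts, {n_recent} changesets within the last year")
print(top_contributor_per_ski_area(EDITS))
print(last_edit_age_days(EDITS, now))
```

Ties for top contributor are broken arbitrarily here; a real report would probably want an explicit tie-break rule.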

dhimmel added the "help wanted" label Dec 23, 2024
Owner Author

dhimmel commented Dec 24, 2024

Here are some example run/ski area sources (code below):

It would be great to get the history of changesets for each way and relation containing:

  • changeset id
  • user
  • date
  • counts of nodes and ways changed (less essential)

```python
import random

import polars as pl
from openskistats.analyze import load_runs_pl, load_ski_areas_pl

# One row per (run, source URL) pair.
run_sources = (
    load_runs_pl()
    .explode("run_sources")
    .select(
        "run_id",
        "run_name",
        "ski_area_ids",
        pl.col("run_sources").alias("run_source"),
    )
    .collect()
)

# One row per (ski area, source URL) pair, keeping only OpenStreetMap sources.
ski_area_sources = (
    load_ski_areas_pl()
    .explode("ski_area_sources")
    .select(
        "ski_area_id",
        "ski_area_name",
        pl.col("ski_area_sources").alias("ski_area_source"),
    )
    .filter(pl.col("ski_area_source").str.starts_with("https://www.openstreetmap.org"))
)

# Unique source URLs across ski areas and runs.
osm_sources = sorted(set(ski_area_sources["ski_area_source"]) | set(run_sources["run_source"]))

# Print a reproducible sample of 10 sources (fixed seed).
for x in sorted(random.Random(4).sample(osm_sources, 10)):
    print(f"- {x}")
```
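For the changeset id, user, and date fields listed above, one possible route is the element history endpoint of the OSM API v0.6 (`/api/0.6/{type}/{id}/history`), which returns one XML element per version. The sketch below assumes source URLs have the form `https://www.openstreetmap.org/way/123`; the helper names and the sample XML are illustrative only, and no network request is made. Counts of nodes and ways changed are per-changeset properties and are not covered here.

```python
import re
import xml.etree.ElementTree as ET

# Assumed URL shape for OSM sources; adjust if the real data differs.
OSM_URL_PATTERN = re.compile(
    r"^https://www\.openstreetmap\.org/(node|way|relation)/(\d+)$"
)


def history_api_url(source_url: str) -> str:
    """Map an openstreetmap.org element URL to its API history URL."""
    match = OSM_URL_PATTERN.match(source_url)
    if match is None:
        raise ValueError(f"not a recognized OSM element URL: {source_url}")
    element_type, element_id = match.groups()
    return f"https://api.openstreetmap.org/api/0.6/{element_type}/{element_id}/history"


def parse_history(xml_text: str) -> list[dict]:
    """Extract version, changeset, user, and date from a history response."""
    root = ET.fromstring(xml_text)
    versions = []
    for element in root:
        if element.tag not in {"node", "way", "relation"}:
            continue
        versions.append(
            {
                "version": int(element.attrib["version"]),
                "changeset": int(element.attrib["changeset"]),
                # The user attribute can be missing, so use .get().
                "user": element.get("user"),
                "date": element.attrib["timestamp"],
            }
        )
    return versions


# Hand-written sample in the shape of a way history response.
SAMPLE_XML = """
<osm version="0.6">
  <way id="123" version="1" changeset="1000" user="alice" uid="1"
       timestamp="2020-01-01T00:00:00Z" visible="true">
    <nd ref="1"/><nd ref="2"/>
  </way>
  <way id="123" version="2" changeset="2000" user="bob" uid="2"
       timestamp="2024-03-05T12:00:00Z" visible="true">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/>
  </way>
</osm>
"""

print(history_api_url("https://www.openstreetmap.org/way/123"))
for row in parse_history(SAMPLE_XML):
    print(row)
```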

Owner Author

dhimmel commented Dec 26, 2024

Noting that there are potentially two ways to approach this:

  1. Via single-way/relation queries to api.openstreetmap.org, as proposed in "Get changsets for the osm ways and relations" #27.
  2. By downloading history-latest.osm.pbf (currently 133 GB) and using a tool such as osmium to subset and process the dataset.
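For the second approach, a small sketch of how the source URLs might be turned into an ID list for osmium. It assumes the URL shape `https://www.openstreetmap.org/way/123` and the osmium ID notation of a type letter followed by the number (`n`, `w`, `r`); the function names and file names are hypothetical, and the command in the trailing comment should be checked against the osmium-getid man page before use.

```python
import re

# Assumed URL shape for OSM sources.
OSM_URL_PATTERN = re.compile(
    r"^https://www\.openstreetmap\.org/(node|way|relation)/(\d+)$"
)
TYPE_PREFIX = {"node": "n", "way": "w", "relation": "r"}


def to_osmium_id(source_url: str) -> str:
    """Convert an element URL to osmium notation, e.g. way/123 -> w123."""
    match = OSM_URL_PATTERN.match(source_url)
    if match is None:
        raise ValueError(f"not a recognized OSM element URL: {source_url}")
    element_type, element_id = match.groups()
    return f"{TYPE_PREFIX[element_type]}{element_id}"


def id_file_lines(source_urls) -> list[str]:
    """Return de-duplicated osmium IDs, one per output line."""
    return sorted({to_osmium_id(url) for url in source_urls})


example_sources = [
    "https://www.openstreetmap.org/way/123",
    "https://www.openstreetmap.org/relation/45",
    "https://www.openstreetmap.org/way/123",
]
lines = id_file_lines(example_sources)
print("\n".join(lines))

# The lines could be written to a file (say ids.txt) and passed to something like:
#   osmium getid --with-history --id-file ids.txt history-latest.osm.pbf -o ski.osh.pbf
# (assumed invocation; verify the flags for your osmium version)
```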
