Get changesets for the OSM ways and relations #27
base: main
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,84 @@ | ||||||||
import requests | ||||||||
import xml.etree.ElementTree as ET | ||||||||
import polars as pl | ||||||||
from openskistats.analyze import load_runs_pl, load_ski_areas_pl | ||||||||
|
||||||||
import re | ||||||||
|
||||||||
def parse_osm_url(url: str): | ||||||||
return type annotation missing |
||||||||
"""Extract type and ID from an OpenStreetMap URL.""" | ||||||||
match = re.match(r"https://www\.openstreetmap\.org/(way|relation)/(\d+)", url) | ||||||||
if match: | ||||||||
osm_type, osm_id = match.groups() | ||||||||
return osm_type, int(osm_id) | ||||||||
return None, None | ||||||||
|
||||||||
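One way to address the return-type comment on `parse_osm_url`, as a minimal sketch rather than the final form: it keeps the `(None, None)` fallback from the diff so that callers which unpack the result keep working, and only adds the annotation. The `from __future__` import is an assumption, included so the `|` union syntax also works on older Python versions.

```python
from __future__ import annotations

import re


def parse_osm_url(url: str) -> tuple[str, int] | tuple[None, None]:
    """Extract type and ID from an OpenStreetMap URL."""
    match = re.match(r"https://www\.openstreetmap\.org/(way|relation)/(\d+)", url)
    if match:
        osm_type, osm_id = match.groups()
        return osm_type, int(osm_id)
    # Same fallback as the diff: both values are None when the URL does not match.
    return None, None
```

Returning a single `None` instead (annotated as `tuple[str, int] | None`) would be a little tidier, but it would require changing the unpacking in `process_batch`.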
def fetch_changeset_history(osm_type: str, osm_id: int): | ||||||||
"""Fetch the changeset history for a given OSM entity.""" | ||||||||
url = f"https://api.openstreetmap.org/api/0.6/{osm_type}/{osm_id}/history" | ||||||||
Looks like our usage is borderline according to their policies: https://operations.osmfoundation.org/policies/api/. If going this route, we should rate limit requests and set up some method of caching. For a complex decision with many ramifications, like the caching method, consider discussing multiple options, either here or in the issue, prior to implementation.
We should identify ourselves as well, similar to openskistats/openskistats/openskimap_utils.py, lines 68 to 70 in 3d0e496.
Ok, will think about this a bit more, and discuss. |
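As input to the discussion requested above, here is a minimal sketch of one caching option: a SQLite table keyed by request URL, a fixed pause between uncached requests, and an identifying `User-Agent` header. The class name `HistoryCache`, the cache file name, the one-second interval, and the `USER_AGENT` string are all hypothetical placeholders, not values taken from the project or from the OSM policy.

```python
from __future__ import annotations

import sqlite3
import time
from typing import Callable

# Hypothetical identifier; the real one should match what the project already sends.
USER_AGENT = "openskistats (placeholder contact details)"


def http_get_text(url: str) -> str:
    """Default fetcher: a single identified GET request."""
    import requests  # imported lazily so the cache logic can be exercised offline

    response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
    response.raise_for_status()
    return response.text


class HistoryCache:
    """SQLite-backed cache of raw history XML, keyed by request URL."""

    def __init__(
        self,
        path: str = "osm_history_cache.sqlite",  # hypothetical file name
        min_interval: float = 1.0,  # assumed pause in seconds, not from the policy
        fetch: Callable[[str], str] = http_get_text,
    ) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS history (url TEXT PRIMARY KEY, xml TEXT NOT NULL)"
        )
        self.min_interval = min_interval
        self.fetch = fetch
        self._last_request = float("-inf")

    def get(self, osm_type: str, osm_id: int) -> str:
        """Return the history XML, hitting the network only on a cache miss."""
        url = f"https://api.openstreetmap.org/api/0.6/{osm_type}/{osm_id}/history"
        row = self.conn.execute("SELECT xml FROM history WHERE url = ?", (url,)).fetchone()
        if row is not None:
            return row[0]
        # Rate limit: wait until min_interval has passed since the last real request.
        wait = self.min_interval - (time.monotonic() - self._last_request)
        if wait > 0:
            time.sleep(wait)
        xml = self.fetch(url)
        self._last_request = time.monotonic()
        self.conn.execute("INSERT INTO history (url, xml) VALUES (?, ?)", (url, xml))
        self.conn.commit()
        return xml
```

Because the fetch function is injectable, the cache can be tested with a stub and no network access. Other options, such as one file per element or an HTTP-level cache, may fit the project better and are worth weighing in the same discussion.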
||||||||
response = requests.get(url) | ||||||||
response.raise_for_status() | ||||||||
I'm including the response from https://api.openstreetmap.org/api/0.6/relation/6981791/history for reference:
<osm version="0.6" generator="openstreetmap-cgimap 2.0.1 (3288085 spike-08.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
<relation id="6981791" visible="true" version="1" changeset="46092405" timestamp="2017-02-14T23:01:29Z" user="BK_man" uid="242352">
<member type="way" ref="474283680" role=""/>
<member type="way" ref="474283683" role=""/>
<member type="way" ref="474283679" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="2" changeset="46385760" timestamp="2017-02-25T05:49:23Z" user="BK_man" uid="242352">
<member type="way" ref="474283680" role=""/>
<member type="way" ref="474283679" role=""/>
<member type="way" ref="474283683" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="3" changeset="94749008" timestamp="2020-11-25T06:09:29Z" user="David Sanderson" uid="7679993">
<member type="way" ref="474283680" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="4" changeset="94970016" timestamp="2020-11-29T05:36:54Z" user="David Sanderson" uid="7679993">
<member type="way" ref="878819927" role=""/>
<member type="way" ref="474283680" role=""/>
<member type="way" ref="878819926" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="5" changeset="95638024" timestamp="2020-12-10T19:40:18Z" user="David Sanderson" uid="7679993">
<member type="way" ref="878819926" role=""/>
<member type="way" ref="878819927" role=""/>
<member type="way" ref="474283680" role=""/>
<member type="way" ref="877632846" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="6" changeset="127640460" timestamp="2022-10-16T22:54:39Z" user="SMS03" uid="15395108">
<member type="way" ref="878819926" role=""/>
<member type="way" ref="1104527478" role=""/>
<member type="way" ref="878819927" role=""/>
<member type="way" ref="1104527481" role=""/>
<member type="way" ref="474283680" role=""/>
<member type="way" ref="877632846" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
</osm>
Let's capture a bit more of this response including |
||||||||
|
||||||||
# Parse the XML response | ||||||||
root = ET.fromstring(response.content) | ||||||||
changesets = [] | ||||||||
for element in root.findall(f"./{osm_type}"): | ||||||||
changesets.append({ | ||||||||
"changeset_id": element.attrib.get("changeset"), | ||||||||
"user": element.attrib.get("user"), | ||||||||
"timestamp": element.attrib.get("timestamp"), | ||||||||
}) | ||||||||
return changesets | ||||||||
|
||||||||
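The review comment on the XML response asks to capture more of it, but the list of fields is cut off in this view. The sketch below is therefore only a guess at what "more" could mean: it keeps every attribute visible in the sample response (`id`, `version`, `visible`, `uid`) along with the tags and member references. The function name `parse_history_xml` and the choice of fields are assumptions.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def parse_history_xml(xml_text: str, osm_type: str) -> list[dict]:
    """Return one record per element version found in an OSM history response."""
    root = ET.fromstring(xml_text)
    records = []
    for element in root.findall(f"./{osm_type}"):
        uid = element.attrib.get("uid")
        records.append(
            {
                "osm_id": int(element.attrib["id"]),
                "version": int(element.attrib["version"]),
                "changeset_id": int(element.attrib["changeset"]),
                "timestamp": element.attrib.get("timestamp"),
                "user": element.attrib.get("user"),
                "uid": int(uid) if uid is not None else None,
                "visible": element.attrib.get("visible") == "true",
                # Tags as a plain dict, e.g. {"piste:type": "nordic"}.
                "tags": {tag.attrib["k"]: tag.attrib["v"] for tag in element.findall("tag")},
                # Only <member> children are read here, so this list stays empty
                # for elements that have none.
                "member_refs": [int(m.attrib["ref"]) for m in element.findall("member")],
            }
        )
    return records
```

Keeping `version` makes it possible to order the records per element without relying on the timestamp format.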
def process_batch(batch: pl.DataFrame) -> pl.DataFrame: | ||||||||
"""Process a batch of OSM URLs to fetch their changeset histories.""" | ||||||||
changeset_records = [] | ||||||||
for url in batch["osm_url"]: | ||||||||
osm_type, osm_id = parse_osm_url(url) | ||||||||
if osm_type and osm_id: | ||||||||
history = fetch_changeset_history(osm_type, osm_id) | ||||||||
for record in history: | ||||||||
record["osm_url"] = url | ||||||||
record["osm_type"] = osm_type | ||||||||
record["osm_id"] = osm_id | ||||||||
changeset_records.append(record) | ||||||||
return pl.DataFrame(changeset_records) | ||||||||
|
||||||||
def get_changesets_for_runs_and_ski_areas() -> pl.DataFrame: | ||||||||
run_sources = ( | ||||||||
load_runs_pl() | ||||||||
.explode("run_sources") | ||||||||
.select( | ||||||||
"run_id", | ||||||||
"run_name", | ||||||||
"ski_area_ids", | ||||||||
pl.col("run_sources").alias("run_source"), | ||||||||
) | ||||||||
.collect() | ||||||||
) | ||||||||
|
||||||||
ski_area_sources = ( | ||||||||
load_ski_areas_pl() | ||||||||
.explode("ski_area_sources") | ||||||||
.select( | ||||||||
"ski_area_id", | ||||||||
"ski_area_name", | ||||||||
pl.col("ski_area_sources").alias("ski_area_source"), | ||||||||
) | ||||||||
.filter(pl.col("ski_area_source").str.starts_with("https://www.openstreetmap.org")) | ||||||||
) | ||||||||
|
||||||||
osm_urls = sorted(set(ski_area_sources["ski_area_source"].to_list()) | set(run_sources["run_source"].to_list())) | ||||||||
|
||||||||
osm_urls_df = pl.DataFrame({"osm_url": osm_urls}) | ||||||||
|
||||||||
# Process the OSM URLs using a map function | ||||||||
changeset_pl_df = osm_urls_df.select( | ||||||||
pl.col("osm_url").map_elements( | ||||||||
lambda url: process_batch(pl.DataFrame({"osm_url": [url]})), | ||||||||
return_dtype=pl.Object | ||||||||
).alias("changeset_data") | ||||||||
).explode("changeset_data") | ||||||||
This is incorrect, could you help me @dhimmel on the polars syntax to parallelize / collect the calls to process_batch?
Ah yes, will help. But I think first we should collect all XML responses prior to polars with some sort of persistent caching. We could proceed with a dev sample of 100 or so OSM elements. Once we have a database/file with the XML, we can then figure out how to read all records into polars, but it will make more sense to handle requests outside of the polars dataframe creation I think. |
||||||||
|
||||||||
# Display the resulting DataFrame | ||||||||
return changeset_pl_df | ||||||||
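Following the reply above about handling requests outside of the polars dataframe creation, this is a rough sketch of what that separation could look like: a plain loop collects flat records first, and the dataframe is built once at the end. The function name `collect_changeset_records`, the `fetch_xml` callable, and the `sample_size` default of 100 (taken from the "dev sample of 100 or so" suggestion) are assumptions for illustration.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

OSM_URL_PATTERN = re.compile(r"https://www\.openstreetmap\.org/(way|relation)/(\d+)")


def collect_changeset_records(
    osm_urls: Iterable[str | None],
    fetch_xml: Callable[[str, int], str],
    sample_size: int | None = 100,
) -> list[dict]:
    """Fetch histories one URL at a time and return one flat record per element version."""
    # Drop missing sources and duplicates before sorting.
    urls = sorted({url for url in osm_urls if url is not None})
    if sample_size is not None:
        urls = urls[:sample_size]
    records = []
    for url in urls:
        match = OSM_URL_PATTERN.match(url)
        if match is None:
            continue  # not an OSM way or relation URL
        osm_type, osm_id = match.group(1), int(match.group(2))
        root = ET.fromstring(fetch_xml(osm_type, osm_id))
        for element in root.findall(f"./{osm_type}"):
            records.append(
                {
                    "osm_url": url,
                    "osm_type": osm_type,
                    "osm_id": osm_id,
                    "changeset_id": element.attrib.get("changeset"),
                    "user": element.attrib.get("user"),
                    "timestamp": element.attrib.get("timestamp"),
                }
            )
    return records
```

The returned list of dictionaries can be passed to `pl.DataFrame(records)` once all requests are done, which avoids `map_elements` and the `pl.Object` dtype entirely. Any cached, rate-limited fetcher can be supplied as `fetch_xml`.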
make sure pre-commit hooks are installed and then run
Thanks, yeah the CI failures are on pre-commit. |
Let's rename this file to openstreetmap.py