Get changesets for the osm ways and relations #27
…how changeset_pl_df parallelizes ... still testing this in batch. tested the other methods on, for example: https://www.openstreetmap.org/relation/6981791
Maybe you can run it on a cloud instance or something and put the file up. Will take a while to do all the API calls.
import re


def parse_osm_url(url: str):
return type annotation missing
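For illustration, a minimal sketch of an annotated version could look like the following. The body of parse_osm_url is not shown in this diff, so the regular expression and the (type, id) tuple return shape are assumptions, not the PR's actual implementation.

```python
import re

# Assumed URL shape: https://www.openstreetmap.org/<type>/<id>
OSM_URL_PATTERN = re.compile(r"openstreetmap\.org/(node|way|relation)/(\d+)")


def parse_osm_url(url: str) -> tuple[str, int]:
    """Return (osm_type, osm_id) parsed from an OpenStreetMap element URL."""
    match = OSM_URL_PATTERN.search(url)
    if match is None:
        raise ValueError(f"not an OSM element URL: {url!r}")
    return match.group(1), int(match.group(2))
```

With the example relation mentioned in the PR description, parse_osm_url("https://www.openstreetmap.org/relation/6981791") returns ("relation", 6981791).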
Let's rename this file to openstreetmap.py
def fetch_changeset_history(osm_type: str, osm_id: int):
    """Fetch the changeset history for a given OSM entity."""
    url = f"https://api.openstreetmap.org/api/0.6/{osm_type}/{osm_id}/history"
Looks like our usage is borderline according to their policies https://operations.osmfoundation.org/policies/api/.
If we go this route, we should rate-limit requests and set up some method of caching. For a complex decision with many ramifications, like the caching method, consider discussing multiple options, either here or in the issue, prior to implementation.
We should identify ourselves as well, similar to
openskistats/openskistats/openskimap_utils.py
Lines 68 to 70 in 3d0e496
headers = {
    "From": "https://github.com/dhimmel/openskistats",
}
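To make the rate-limiting and identification suggestion concrete, here is a rough sketch, not a settled design. The Throttle class and the fetch_history_xml name are hypothetical, the 1-second interval is an arbitrary placeholder rather than a figure from the OSM policy, and urllib from the standard library stands in for requests to keep the example dependency-free.

```python
import time
import urllib.request

# Same identifying header as openskimap_utils.py
HEADERS = {"From": "https://github.com/dhimmel/openskistats"}


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=1.0)  # assumption: placeholder interval


def fetch_history_xml(osm_type: str, osm_id: int) -> str:
    """Fetch the history XML for one OSM element, waiting between requests."""
    throttle.wait()
    url = f"https://api.openstreetmap.org/api/0.6/{osm_type}/{osm_id}/history"
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

The clock and sleep parameters exist only so the throttle can be tested without real waiting. Caching is deliberately left out here, since that choice still needs discussion.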
OK, will think about this a bit more and discuss.
    """Fetch the changeset history for a given OSM entity."""
    url = f"https://api.openstreetmap.org/api/0.6/{osm_type}/{osm_id}/history"
    response = requests.get(url)
    response.raise_for_status()
I'm including the response from https://api.openstreetmap.org/api/0.6/relation/6981791/history for reference:
XML response:
<osm version="0.6" generator="openstreetmap-cgimap 2.0.1 (3288085 spike-08.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
<relation id="6981791" visible="true" version="1" changeset="46092405" timestamp="2017-02-14T23:01:29Z" user="BK_man" uid="242352">
<member type="way" ref="474283680" role=""/>
<member type="way" ref="474283683" role=""/>
<member type="way" ref="474283679" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="2" changeset="46385760" timestamp="2017-02-25T05:49:23Z" user="BK_man" uid="242352">
<member type="way" ref="474283680" role=""/>
<member type="way" ref="474283679" role=""/>
<member type="way" ref="474283683" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="3" changeset="94749008" timestamp="2020-11-25T06:09:29Z" user="David Sanderson" uid="7679993">
<member type="way" ref="474283680" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="4" changeset="94970016" timestamp="2020-11-29T05:36:54Z" user="David Sanderson" uid="7679993">
<member type="way" ref="878819927" role=""/>
<member type="way" ref="474283680" role=""/>
<member type="way" ref="878819926" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="5" changeset="95638024" timestamp="2020-12-10T19:40:18Z" user="David Sanderson" uid="7679993">
<member type="way" ref="878819926" role=""/>
<member type="way" ref="878819927" role=""/>
<member type="way" ref="474283680" role=""/>
<member type="way" ref="877632846" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="6" changeset="127640460" timestamp="2022-10-16T22:54:39Z" user="SMS03" uid="15395108">
<member type="way" ref="878819926" role=""/>
<member type="way" ref="1104527478" role=""/>
<member type="way" ref="878819927" role=""/>
<member type="way" ref="1104527481" role=""/>
<member type="way" ref="474283680" role=""/>
<member type="way" ref="877632846" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
</osm>
Let's capture a bit more of this response, including version and uid (user-id).
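As a sketch of what capturing those fields might look like, the function below reads one row per element version from the history XML using the standard library. The parse_history name and the row keys are hypothetical, and user and uid are treated as optional as a precaution, in case some versions lack them.

```python
import xml.etree.ElementTree as ET


def parse_history(xml_text: str) -> list[dict]:
    """Return one dict per element version found in an OSM history response."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root:  # each child is one version of the node/way/relation
        attrib = element.attrib
        rows.append(
            {
                "osm_type": element.tag,
                "osm_id": int(attrib["id"]),
                "version": int(attrib["version"]),
                "changeset": int(attrib["changeset"]),
                "timestamp": attrib["timestamp"],
                "user": attrib.get("user"),
                "uid": int(attrib["uid"]) if "uid" in attrib else None,
            }
        )
    return rows
```

Applied to the response above, this would give six rows, one per version of relation 6981791.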
).explode("changeset_data")

# Display the resulting DataFrame
return changeset_pl_df
Make sure pre-commit hooks are installed and then run pre-commit run --all-files. This might fix the failing CI runs.
Thanks, yeah, the CI failures are on pre-commit.
        lambda url: process_batch(pl.DataFrame({"osm_url": [url]})),
        return_dtype=pl.Object
    ).alias("changeset_data")
).explode("changeset_data")
This is incorrect, could you help me @dhimmel on the polars syntax to parallelize / collect the calls to process_batch?
Ah yes, will help. But I think we should first collect all XML responses with some sort of persistent caching, prior to polars. We could proceed with a dev sample of 100 or so OSM elements. Once we have a database/file with the XML, we can then figure out how to read all records into polars, but I think it will make more sense to handle requests outside of the polars dataframe creation.
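One possible shape for that collection step, offered only as a starting point for discussion: store the raw XML in a SQLite file keyed by element, and skip anything already stored. The collect_histories name, the table layout, and the choice of SQLite are all assumptions, and fetch stands for any callable that returns the history XML for an element.

```python
import sqlite3
from typing import Callable, Iterable


def collect_histories(
    elements: Iterable[tuple[str, int]],
    db_path: str,
    fetch: Callable[[str, int], str],
) -> int:
    """Store history XML for each element in SQLite; return how many were fetched."""
    fetched = 0
    connection = sqlite3.connect(db_path)
    try:
        connection.execute(
            "CREATE TABLE IF NOT EXISTS history ("
            "osm_type TEXT, osm_id INTEGER, xml TEXT, "
            "PRIMARY KEY (osm_type, osm_id))"
        )
        for osm_type, osm_id in elements:
            cached = connection.execute(
                "SELECT 1 FROM history WHERE osm_type = ? AND osm_id = ?",
                (osm_type, osm_id),
            ).fetchone()
            if cached:
                continue  # already stored by an earlier run
            xml_text = fetch(osm_type, osm_id)
            connection.execute(
                "INSERT INTO history VALUES (?, ?, ?)",
                (osm_type, osm_id, xml_text),
            )
            connection.commit()  # commit per element so an interrupted run keeps progress
            fetched += 1
    finally:
        connection.close()
    return fetched
```

Because the table persists between runs, rerunning over the same dev sample makes no further API calls. The stored XML can then be parsed into plain records and handed to polars in a separate step, with no network access inside the dataframe code.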
refs #13