Get changesets for the OSM ways and relations #27

Open
wants to merge 1 commit into
base: main

Contributor

@nbarbier-s2 nbarbier-s2 commented Dec 24, 2024

refs #13

…how changeset_pl_df parallelizes ... still testing this in batch. Tested the other methods on, for example: https://www.openstreetmap.org/relation/6981791
@nbarbier-s2
Contributor Author

[Image: Screenshot 2024-12-24 at 1 16 15 AM]

@nbarbier-s2
Contributor Author

Maybe you can run it on a cloud instance or something and put the file up. It will take a while to do all the API calls.


import re

def parse_osm_url(url: str):
Owner

return type annotation missing
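
For illustration, an annotated version could look like the sketch below. The body of parse_osm_url is not shown in this thread, so the regular expression, the tuple return type, and the ValueError are assumptions rather than the PR's actual implementation.

```python
import re

# Assumed pattern: matches element URLs such as
# https://www.openstreetmap.org/relation/6981791
OSM_URL_PATTERN = re.compile(
    r"^https?://(?:www\.)?openstreetmap\.org/(node|way|relation)/(\d+)"
)


def parse_osm_url(url: str) -> tuple[str, int]:
    """Return (osm_type, osm_id) parsed from an OSM element URL."""
    match = OSM_URL_PATTERN.match(url)
    if match is None:
        # Hypothetical error handling; the PR may handle bad URLs differently.
        raise ValueError(f"Not a recognized OSM element URL: {url!r}")
    osm_type, osm_id = match.groups()
    return osm_type, int(osm_id)
```

With the relation mentioned in the description, parse_osm_url("https://www.openstreetmap.org/relation/6981791") would return ("relation", 6981791).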

Owner

Let's rename this file to openstreetmap.py


def fetch_changeset_history(osm_type: str, osm_id: int):
    """Fetch the changeset history for a given OSM entity."""
    url = f"https://api.openstreetmap.org/api/0.6/{osm_type}/{osm_id}/history"
Owner

Looks like our usage is borderline according to their API usage policy: https://operations.osmfoundation.org/policies/api/.

If going this route, we should rate limit requests and set up some method of caching. For a complex decision with many ramifications, like the caching method, consider discussing multiple options, either here or in the issue, prior to implementation.

We should identify ourselves as well, similar to

headers = {
    "From": "https://github.com/dhimmel/openskistats",
}
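
As a rough sketch of the rate limiting and identification suggested here, something like the following might work. ThrottledClient, http_get, and the one-second default interval are hypothetical and not taken from the OSM policy or the PR. The sketch uses urllib.request from the standard library rather than requests to stay dependency-free, and it leaves caching aside since that choice is still open for discussion.

```python
import time
import urllib.request
from typing import Callable

# Identification header, as proposed in the comment above.
HEADERS = {"From": "https://github.com/dhimmel/openskistats"}


def http_get(url: str) -> str:
    """Fetch a URL with the identifying headers attached."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


class ThrottledClient:
    """Hypothetical helper enforcing a minimum delay between requests."""

    def __init__(
        self, min_interval: float = 1.0, get: Callable[[str], str] = http_get
    ) -> None:
        self.min_interval = min_interval  # seconds; an assumed value
        self.get = get  # injectable so tests need no network access
        self._last_request = float("-inf")

    def fetch_history(self, osm_type: str, osm_id: int) -> str:
        # Sleep only for whatever remains of the minimum interval.
        wait = self.min_interval - (time.monotonic() - self._last_request)
        if wait > 0:
            time.sleep(wait)
        self._last_request = time.monotonic()
        url = f"https://api.openstreetmap.org/api/0.6/{osm_type}/{osm_id}/history"
        return self.get(url)
```

Passing the getter as an argument keeps the throttling logic testable offline; an appropriate interval would need to be chosen after reading the policy.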

Contributor Author

Ok, will think about this a bit more, and discuss.

    """Fetch the changeset history for a given OSM entity."""
    url = f"https://api.openstreetmap.org/api/0.6/{osm_type}/{osm_id}/history"
    response = requests.get(url)
    response.raise_for_status()
Owner

I'm including the response from https://api.openstreetmap.org/api/0.6/relation/6981791/history for reference:

Expand for xml response
<osm version="0.6" generator="openstreetmap-cgimap 2.0.1 (3288085 spike-08.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
<relation id="6981791" visible="true" version="1" changeset="46092405" timestamp="2017-02-14T23:01:29Z" user="BK_man" uid="242352">
<member type="way" ref="474283680" role=""/>
<member type="way" ref="474283683" role=""/>
<member type="way" ref="474283679" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="2" changeset="46385760" timestamp="2017-02-25T05:49:23Z" user="BK_man" uid="242352">
<member type="way" ref="474283680" role=""/>
<member type="way" ref="474283679" role=""/>
<member type="way" ref="474283683" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="3" changeset="94749008" timestamp="2020-11-25T06:09:29Z" user="David Sanderson" uid="7679993">
<member type="way" ref="474283680" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="4" changeset="94970016" timestamp="2020-11-29T05:36:54Z" user="David Sanderson" uid="7679993">
<member type="way" ref="878819927" role=""/>
<member type="way" ref="474283680" role=""/>
<member type="way" ref="878819926" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="5" changeset="95638024" timestamp="2020-12-10T19:40:18Z" user="David Sanderson" uid="7679993">
<member type="way" ref="878819926" role=""/>
<member type="way" ref="878819927" role=""/>
<member type="way" ref="474283680" role=""/>
<member type="way" ref="877632846" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
<relation id="6981791" visible="true" version="6" changeset="127640460" timestamp="2022-10-16T22:54:39Z" user="SMS03" uid="15395108">
<member type="way" ref="878819926" role=""/>
<member type="way" ref="1104527478" role=""/>
<member type="way" ref="878819927" role=""/>
<member type="way" ref="1104527481" role=""/>
<member type="way" ref="474283680" role=""/>
<member type="way" ref="877632846" role=""/>
<tag k="name" v="11. Warming Hut"/>
<tag k="piste:difficulty" v="easy"/>
<tag k="piste:type" v="nordic"/>
<tag k="route" v="piste"/>
<tag k="type" v="route"/>
</relation>
</osm>

Let's capture a bit more of this response, including version and uid (user ID).
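
One possible way to pull those attributes out of the history XML is sketched below with the standard library's xml.etree.ElementTree. The function name parse_history and the record keys are hypothetical, and user and uid are read defensively on the assumption that some versions may lack them.

```python
import xml.etree.ElementTree as ET


def parse_history(xml_text: str) -> list[dict]:
    """Return one record per element version found in a history response."""
    root = ET.fromstring(xml_text)
    records = []
    for element in root:
        if element.tag not in {"node", "way", "relation"}:
            continue
        uid = element.attrib.get("uid")
        records.append(
            {
                "osm_type": element.tag,
                "osm_id": int(element.attrib["id"]),
                "version": int(element.attrib["version"]),
                "changeset": int(element.attrib["changeset"]),
                "timestamp": element.attrib["timestamp"],
                "user": element.attrib.get("user"),
                "uid": int(uid) if uid is not None else None,
            }
        )
    return records


# Abbreviated from the response above (members and tags omitted).
SAMPLE_XML = """<osm version="0.6">
<relation id="6981791" visible="true" version="1" changeset="46092405" timestamp="2017-02-14T23:01:29Z" user="BK_man" uid="242352"/>
<relation id="6981791" visible="true" version="2" changeset="46385760" timestamp="2017-02-25T05:49:23Z" user="BK_man" uid="242352"/>
</osm>"""

records = parse_history(SAMPLE_XML)
```

Each record keeps the version number and uid alongside the changeset ID, so later analysis could distinguish the creating edit from subsequent ones.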

).explode("changeset_data")

# Return the resulting DataFrame
return changeset_pl_df
Owner

Make sure pre-commit hooks are installed and then run pre-commit run --all-files. This might fix the failing CI runs.

Contributor Author

Thanks, yeah, the CI failures are from pre-commit.

lambda url: process_batch(pl.DataFrame({"osm_url": [url]})),
return_dtype=pl.Object
).alias("changeset_data")
).explode("changeset_data")
Contributor Author

This is incorrect. @dhimmel, could you help me with the polars syntax to parallelize / collect the calls to process_batch?

Owner

Ah yes, will help. But I think we should first collect all XML responses prior to polars, with some sort of persistent caching. We could proceed with a dev sample of 100 or so OSM elements. Once we have a database/file with the XML, we can then figure out how to read all records into polars, but I think it will make more sense to handle requests outside of the polars dataframe creation.
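
To make the two-phase idea concrete, here is a minimal sketch that stores raw XML in SQLite and then reads every record back as plain rows. SQLite is only one possible persistent cache, picked for illustration; collect_histories, load_records, and the table layout are hypothetical, and the get callable stands in for whatever request function the project settles on.

```python
import sqlite3
from typing import Callable, Iterable


def collect_histories(
    conn: sqlite3.Connection,
    elements: Iterable[tuple[str, int]],
    get: Callable[[str, int], str],
) -> None:
    """Fetch and persist XML for each element not already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "osm_type TEXT, osm_id INTEGER, xml TEXT, "
        "PRIMARY KEY (osm_type, osm_id))"
    )
    for osm_type, osm_id in elements:
        cached = conn.execute(
            "SELECT 1 FROM history WHERE osm_type = ? AND osm_id = ?",
            (osm_type, osm_id),
        ).fetchone()
        if cached:
            continue  # already stored, so no request is made
        xml = get(osm_type, osm_id)
        conn.execute(
            "INSERT INTO history VALUES (?, ?, ?)", (osm_type, osm_id, xml)
        )
        conn.commit()  # commit per element so an interrupted run can resume


def load_records(conn: sqlite3.Connection) -> list[dict]:
    """Read all stored responses back as a list of row dictionaries."""
    rows = conn.execute(
        "SELECT osm_type, osm_id, xml FROM history ORDER BY osm_type, osm_id"
    ).fetchall()
    return [{"osm_type": t, "osm_id": i, "xml": x} for t, i, x in rows]
```

The list returned by load_records could then be passed to pl.DataFrame(records), keeping all network requests outside of the dataframe creation.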
