Geocode during ETL #139
Did some digging, here are the results so far.

**The Situation**

The prototype we created so far uses the Nominatim API, which is ultimately powered by OpenStreetMap. This is fine, but they have a usage policy which makes it fairly clear that they don't really want full-blown ETL pipelines built against it. That said, our data volume is on the lower end of things (e.g. thousands of addresses rather than millions). This means it might be possible for us to use the OSM Nominatim API while staying within the spirit of their terms, but it would require a bit of engineering (in particular: caching and the ability to handle unexpected application of rate limits for larger batches). The Nominatim API also might dramatically slow the ETL pipeline -- their spec says they want no more than 1 request per second (though later it does say small batch jobs are OK as well).

**Some Options**

**1: Use the OSM Nominatim API**

Using the existing API, as the prototype does, can be done. There is a bit of overhead associated with being a good FOSS citizen (and of course we risk them shutting down our requests if we don't follow their rules); a rough sketch of what that could look like is at the end of this comment. Specifically, we would want to:
**2: Self-host the Nominatim API**

This can be done! It is a big undertaking and we won't want to do it, but just in case that somehow changes, the instructions are here.

**3: Use a paid API**

There are a few third-party APIs, some of which use Nominatim under the hood plus their own mix of spices / other FOSS tools. Ideally we could use something that GeoPy supports out of the box; that way it's easy to swap in something truly FOSS (e.g. Nominatim) at any point in time.
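To make option 1 concrete, here is a minimal sketch of the "good FOSS citizen" engineering mentioned above: GeoPy's `RateLimiter` keeps us at or under one request per second, and a small on-disk cache avoids re-geocoding addresses across runs. The cache path, user agent string, and function names are assumptions for illustration, not code from our repository.

```python
# Minimal sketch, assuming the pipeline is Python and uses GeoPy (as the
# prototype does). Cache path and user_agent are illustrative assumptions.
import json
import os

from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim

CACHE_PATH = "geocode_cache.json"  # hypothetical on-disk cache location

geolocator = Nominatim(user_agent="otf-etl-prototype")  # hypothetical user agent
# RateLimiter spaces calls at least one second apart, matching the usage policy.
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)


def load_cache():
    """Load previously geocoded addresses so re-runs don't re-hit the API."""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    return {}


def geocode_address(address, cache):
    """Return [lat, lng] for address, hitting Nominatim only on cache misses."""
    if address not in cache:
        location = geocode(address)
        cache[address] = (
            [location.latitude, location.longitude] if location else None
        )
        with open(CACHE_PATH, "w") as f:
            json.dump(cache, f)
    return cache[address]
```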
Good news everyone -- it looks like OpenCage is supported by GeoPy. I don't think the service itself is FOSS, though they do publish a lot of their code. Importantly, switching geocoding providers should be pretty darn simple thanks to GeoPy. Just as importantly, the data itself is open.
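Since both geocoders share GeoPy's interface, switching between Nominatim and OpenCage should be close to a one-line change. A minimal sketch, assuming an API key supplied via an environment variable (the variable name and the helper function are hypothetical):

```python
# Sketch of swapping the geocoding backend via GeoPy. The OPENCAGE_API_KEY
# variable name and the selection helper are assumptions for illustration.
import os

from geopy.geocoders import Nominatim, OpenCage


def make_geocoder(provider="opencage"):
    """Return a GeoPy geocoder; the rest of the pipeline is provider-agnostic."""
    if provider == "opencage":
        return OpenCage(api_key=os.environ["OPENCAGE_API_KEY"])
    return Nominatim(user_agent="otf-etl-prototype")  # hypothetical user agent


geocoder = make_geocoder()
location = geocoder.geocode("1600 Pennsylvania Ave NW, Washington, DC")
if location is not None:
    print(location.latitude, location.longitude)
```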
The GeocodeAdder is an InformationAdder which will generate a lat / lng pair for a given address in a proposal. The adder is able to combine several separate columns into one, since our addresses tend to be split into parts. Issue #139
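The actual GeocodeAdder/InformationAdder interfaces live in the ETL codebase and aren't reproduced in this issue; the sketch below only illustrates the idea described above (joining the split address columns and geocoding the result). Method names, column names, and the "lat,lng" return format are assumptions.

```python
# Illustrative sketch only -- not the real GeocodeAdder. Method names, column
# names, and the return format are assumptions.
from geopy.geocoders import Nominatim


class GeocodeAdder:
    """Joins split address columns into one string and geocodes it."""

    def __init__(self, address_columns, geocoder=None):
        self.address_columns = address_columns  # e.g. ["Street", "City", "State", "Zip"]
        self.geocoder = geocoder or Nominatim(user_agent="otf-etl-prototype")

    def added_value(self, proposal):
        """Return a "lat,lng" string for the proposal, or "" if geocoding fails."""
        parts = [proposal.get(col, "") for col in self.address_columns]
        address = ", ".join(part for part in parts if part)
        location = self.geocoder.geocode(address)
        if location is None:
            return ""
        return "{},{}".format(location.latitude, location.longitude)
```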
This geocodes the various addresses associated with proposals in the LLIIA2020 competition. Issue #139
SimpleMaps is a MediaWiki plugin that converts certain wiki tables to Leaflet maps. This adds a new map TOC in the SimpleMap format. Issue #139
Geocoding is an expensive process, and sometimes we want to run local ETL scripts that generate geo data without actually hitting a geocoder. The `debug` flag enables this. It is intended only for local testing, and the flag may disappear in the future. Issue #139
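A minimal sketch of what that short-circuit might look like; the placeholder coordinates and the helper's name are assumptions, and only the `debug` flag itself comes from the commit above.

```python
# Sketch of a debug short-circuit; the helper name and placeholder coordinates
# are assumptions for illustration.
def geocode_or_stub(geocode, address, debug=False):
    """In debug mode, skip the (slow, rate-limited) geocoder entirely and
    return fixed placeholder coordinates so downstream code still gets a
    well-formed lat/lng pair."""
    if debug:
        return (0.0, 0.0)
    location = geocode(address)
    if location is None:
        return None
    return (location.latitude, location.longitude)
```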
As part of our maps exploration we spent some time geocoding data; there's a desire to actually do that geocoding as part of the ETL pipeline.
This would give us a few things:
As part of this issue we should try to leverage the R&D done in that analysis repository, though it may turn out that the tools used there aren't a perfect fit for our ETL.