Skip to content

Latest commit

 

History

History
154 lines (102 loc) · 9.79 KB

README.md

File metadata and controls

154 lines (102 loc) · 9.79 KB

Tutorial: First Steps using the OSHDB-API

The OSHDB-API is a Java-API that allows one to run queries on the OSM history data.

The programming interface is based on the MapReduce programming model which divides any analysis into different steps:

  • OSM data is searched spatially, temporally and by its attributes.
    example: return all OSM ways which are tagged as “building” in a specific area in yearly steps between 2012 and 2021
  • Intermediate results are calculated for each OSM entity at each requested timestamp
    example: return the building's footprint area
  • The final result is calculated by combining the intermediate results together using one of the aggregation methods provided by the OSHDB-API.
    example: calculate the sum of the values generated in the previous step.

This first steps tutorial will guide you through each of these steps individually.

1. Download an OSHDB extract

For your first steps, we recommend that you download one of the available OSHDB data extracts from our download server. We will use this data file to run a simple analysis query on it.

2. Add maven dependency

If you already have an existing Java maven project, the OSHDB-API can be added to the project by including it in the dependencies of the project's pom.xml file:

<dependency>
  <groupId>org.heigit.ohsome</groupId>
  <artifactId>oshdb-api</artifactId>
  <version>1.2.3</version>
</dependency>

Note that the OSHDB requires Java 17, so it could sometimes be necessary to specify this additional restriction in the pom.xml as well. Take a look at this example pom.xml file that shows how all these settings should be put together.

If you're starting a new OSHDB project from scratch, it's typically a good idea to create a new maven project using the “create new project” wizard of your IDE of choice. After that you can use the steps described above to add OSHDB as a dependency to the new project.

3. Start writing the OSHDB query

Now, we're ready to go and use the OSHDB to query the OSM history data. If we're starting a new standalone OSHDB query, it makes sense to add a main class and function to the project now. There, we create a new OSHDB database connection:

OSHDBDatabase oshdb = new OSHDBH2("path/to/extract.oshdb");
🛈 Alternatively, we provide simple connection helper.

4. Select OSHDB view

The next step is to decide which kind of analysis we want to perform on the OSM history data. Two different analysis views are provided by the OSHDB:

  • The snapshot view returns the state of the OSM data at specific points in time. This view is for example useful to determine how the amount of OSM data (such as the length of the road network, or the number of buildings) changed over time.
  • The contribution view returns all modifications (e.g. creations, modifications or deletions) to OSM elements within a given time period. This view can be used to get information about the number of OSM contributors editing the OSM data.

We will use the snapshot view in this tutorial, but the contribution view can be used in a very similar manner.

OSMEntitySnapshotView.on(oshdb)

Note that this is not a complete JAVA code line yet. It will be extended in the following steps to include statements that filter, transform and aggregate the OSM history data.

5. Setting spatial and temporal extents

Every query must specify a spatial extent and a timestamp (or time range) for which the OSM data should be analyzed. This can be done by calling the methods areaOfInterest and timestamps on the object we got from the View in the previous step. The spatial extent can either be defined as a rectangular bounding box, or as an arbitrary polygonal JTS geometry object. In our example, we will use a simple bounding box. The timestamp can be defined by providing a date-time string.

    .areaOfInterest(OSHDBBoundingBox.bboxWgs84Coordinates(8.6634, 49.3965, 8.7245, 49.4268))
    .timestamps("2021-01-01")

Note that the OSHDB expects (and returns) coordinates in the cartesian XY order: longitude first, then latitude. So, the parameters for specifying a bounding box in the OSHDB are in the following order: left, bottom, right, top (or: west, south, east, north). Just like the OpenStreetMap data, the OSHDB also works directly with coordinates in WGS84 coordinates (EPSG:4326) of longitude and latitude. To quickly get the coordinates of a bounding box in this format, we recommend the following online tool: https://norbertrenner.de/osm/bbox.html

For now, we only define a single timestamp in our query. The OSHDB also supports querying the OSM history data for multiple timestamps at once. For example, one can analyze the data in yearly steps between 2012 and 2021. At the end of this tutorial we will show what (few) changes are necessary to let our query generate the results for many timestamps at once.

6. Filtering OSM data

In our example, we only want to look at OSM way objects which have the building tag. To filter these objects for our query, we add the following statements to our query:

    .filter("type:way and building=*")

There are a variety of available filter selectors which can be combined into a filter string: each one specifies a property which OSM objects can have. These selectors can be combined into a filter string using boolean operators and parentheses. If multiple filters are set, the result will contain only the OSM objects which match all given filters.

7. Calculating intermediate results

Now we can access the properties of all the OSM way objects that have the building tag. Since we specified the snapshot view, each matching OSM objects will potentially be returned multiple times: once for each timestamp we requested.

In a so called map step (of the MapReduce programming model), where these snapshot objects can be accessed and transformed to any result one desires. For example, we can calculate the area of an OSM object's geometry in the following way.

    .map(snapshot -> Geo.areaOf(snapshot.getGeometry()))

8. Filtering intermediate results

One can optionally also filter the intermediate results generated in the map step. This can be used for many queries, but here, we use this to filter out outliers in the data (i.e. buildings that are larger than a fixed value of 1000 m²).

    .filter(area -> area < 1000.0)

9. Requesting aggregated result

Now we only have to do one final operation, which is letting the OSHDB know how it should aggregate the intermediate results into a final result value. There are several predefined functions available for this, for example sum, average, count, countUniq, …. This is the reduce step of the MapReduce programming model.

     .sum();

Calling this function will instruct the OSHDB to start the processing of the OSM history data, apply all filter, transformation (map) and aggregation (reduce) functions and will return the final result.

10. Summary

To summarize, here is the code of our first query:

OSHDBDatabase oshdb = new OSHDBH2("path/to/extract.oshdb");
Number result = OSMEntitySnapshotView.on(oshdb)
    .areaOfInterest(OSHDBBoundingBox.bboxWgs84Coordinates(8.6634,49.3965,8.7245,49.4268))
    .timestamps("2021-01-01")
    .filter("type:way and building=*")
    .map(snapshot -> Geo.areaOf(snapshot.getGeometry()))
    .filter(area -> area < 1000.0)
    .sum();
System.out.println(result);

This example prints the result (approximately 1360069.5) to the terminal console. Alternatively, it could also be stored in a file, database or be used to generate graphics.

11. Multiple Timestamps

As mentioned earlier in this tutorial, the OSHDB also allows one to generate results for multiple timestamps at once. To achieve this, a few changes are necessary to our query: First, we have to actually tell the OSHDB to query more than one timestamp. One can for example specify a start and end date with regular time interval: .timestamps("2012-01-01", "2021-01-01", Interval.YEARLY). Then, we have to tell the OSHDB that we want our result for each timestamp individually. We do this by calling .aggregateByTimestamp() at some point in our query, before the final reduce operation is executed. This will change the data type of the result of our query from being a single number, to an object that contains an individual number result for each timestamp:

OSHDBDatabase oshdb = new OSHDBH2("path/to/extract.oshdb");
SortedMap<OSHDBTimestamp, Number> result = OSMEntitySnapshotView.on(oshdb)
    .areaOfInterest(OSHDBBoundingBox.bboxWgs84Coordinates(8.6634,49.3965,8.7245,49.4268))
    .timestamps("2012-01-01", "2021-01-01", Interval.YEARLY)
    .filter("type:way and building=*")
    .map(snapshot -> Geo.areaOf(snapshot.getGeometry()))
    .filter(area -> area < 1000.0)
    .aggregateByTimestamp()
    .sum();
System.out.println(result);

The result from this query is visualized in the following graph:

BuildingAreaHeidelberg

12. Next steps

That's it for our first-steps tutorial. Of course there are many more options and features to explore in the OSHDB. For example how the contribution view lets you analyze each modification to the OSM objects individually, more advanced filtering options, or other concepts like the flatMap function, custom aggregateBy and reduce operations, etc.