-
Notifications
You must be signed in to change notification settings - Fork 273
map matching 'quality' test suite #89
Comments
There are many taxi tracks from NYC http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml and there are many on OSM itself: http://wiki.openstreetmap.org/wiki/Planet.gpx or GPSies |
The performance test suite could be also used for this. E.g. counting the errors and differences of the random routes or using different randomness etc |
We should check to which degree the matched route goes through the same sequence of edge/nodes as the expected route. To this end we could compute the Levenshtein distance as a similarity measure between the expected and the actual node/edge sequence. |
Newson & Krumm suggest a different metric in Chapter 5, which is simpler but probably still sufficient:
With this definition it is possible to have an error greater than 1. But I don't think that this is a real problem. Another useful variant would be to use the number of road segments for d_minus, d_plus and d0 instead of their length. |
Agreed that, assuming 'normal' behaviour, it is probably fine. If it's not 'normal' - e.g. non-sequential - then this could be a very poor metric. A Levenshtein approach may handle these cases better. That said, we may need to write something similar to, but not, Levenshtein. For example, transpositions are allowed (with the same cost as adding/removing). This makes sense in e.g. spelling detection, but probably not in routes - edges have the additional constraint of connectivity (which words largely don't - I think). So we'd either need to not consider transitions, or treat them differently. Similar applies for changing edges. So, it's possibly that our Levenshtein could end up only really considering additions/subtractions (in the 'normal' case) which is equivalent to the @karussell's suggestion. As an aside - if we do go with Levenshtein, we should weight be edge length, not just counts. Another thought - what about the area between the curves? We'd probably have time as a dimension (so area includes differences in time), so that even being at the point isn't necessarily good (unless you're also there at the same time). (Actually, the latter is something we'd need to consider in the above approaches too.) In terms of calculating the area, we can probably come up with a suitable proxy similar to numerical integration - e.g. if both route and ground truth had 100 nodes, we'd just sum up the difference (including time) of each node pair. |
As discussed here it'd be good to have a way of testing how 'good' our map matching is by comparing it with known results. (I.e. our map-matching on a GPX track gives a certain route, and how close does that align to the actual known route taken?)
Considerations/comments (to direct a PR):
The text was updated successfully, but these errors were encountered: