Deal with misaligned triplines #13

Open
ResidentMario opened this issue Nov 10, 2019 · 2 comments

@ResidentMario
Owner

One highly visible parsing artifact is trips that, for some reason, are coded as moving forward in time and then jumping backward:

[Screenshot (2019-11-09): example of a misaligned trip line]
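One way to flag this artifact programmatically is to walk a synthesized trip line and report every point where the arrival time decreases. This is a minimal sketch, assuming a trip line is available as a list of (stop_id, arrival) pairs in visit order; the function name find_time_reversals is hypothetical and not part of gtfs_tripify.

```python
from typing import List, Tuple


def find_time_reversals(stops: List[Tuple[str, int]]) -> List[int]:
    """Return the indices at which a trip line steps backward in time.

    Hypothetical helper. `stops` is a list of (stop_id, arrival) pairs,
    where arrival is a Unix timestamp, in the order the trip line visits
    them. Index i is reported when stop i arrives earlier than stop i - 1.
    """
    return [
        i
        for i in range(1, len(stops))
        if stops[i][1] < stops[i - 1][1]
    ]
```

A well-formed trip line yields an empty list; any returned index marks where the line doubles back in time.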

@ResidentMario changed the title from "Deal with misaligned trip" to "Deal with misaligned triplines" on Nov 10, 2019
@ResidentMario
Owner Author

This issue is caused by a data error on the provider's part. In certain cases, the first message in the feed carries an incomplete stop sequence: specifically, one that is missing the first stop in the route. The follow-up feed message then corrects it. Here is a minimal failure case I found whilst looking into this issue:

```python
>>> bad_msg_pack[0]['trip_update']['trip_update']['stop_time_update'][:5]
[{'stop_id': '702S', 'arrival': 1559384850, 'departure': 1559384850},
 {'stop_id': '705S', 'arrival': 1559384970, 'departure': 1559384970},
 {'stop_id': '706S', 'arrival': 1559385030, 'departure': 1559385030},
 {'stop_id': '707S', 'arrival': 1559385090, 'departure': 1559385090},
 {'stop_id': '708S', 'arrival': 1559385180, 'departure': 1559385180}]

>>> bad_msg_pack[1]['trip_update']['trip_update']['stop_time_update'][:5]
[{'stop_id': '701S', 'arrival': 1559384834, 'departure': 1559384834},
 {'stop_id': '702S', 'arrival': 1559384964, 'departure': 1559384964},
 {'stop_id': '705S', 'arrival': 1559385084, 'departure': 1559385084},
 {'stop_id': '706S', 'arrival': 1559385144, 'departure': 1559385144},
 {'stop_id': '707S', 'arrival': 1559385204, 'departure': 1559385204}]
```
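This pattern can be detected by comparing consecutive stop_time_update lists for the same trip: if the follow-up message lists stops ahead of the earlier message's first stop, and the earlier message never mentioned them, then the earlier message was missing the head of the route. This is a minimal sketch, assuming the list-of-dicts shape printed above; prepended_stops is a hypothetical helper, not a gtfs_tripify function.

```python
from typing import Any, Dict, List


def prepended_stops(
    earlier: List[Dict[str, Any]], later: List[Dict[str, Any]]
) -> List[str]:
    """Return stop IDs that `later` lists before the first stop of `earlier`.

    Hypothetical helper. Both arguments are stop_time_update lists shaped
    like the ones above. A non-empty result means the earlier message was
    missing stops at the head of the route that the follow-up filled in.
    """
    if not earlier:
        return []
    earlier_ids = {s['stop_id'] for s in earlier}
    first_id = earlier[0]['stop_id']
    missing = []
    for s in later:
        if s['stop_id'] == first_id:
            # Reached the earlier message's first stop: everything new
            # collected so far was prepended by the follow-up message.
            return missing
        if s['stop_id'] not in earlier_ids:
            missing.append(s['stop_id'])
    # The earlier first stop never appears in `later` (for example, the
    # train has already passed it), so there is no prefix to report.
    return []
```

On the two five-stop slices above, prepended_stops(msg0, msg1) returns ['701S'], while swapping the arguments returns [] because 701S does not appear in the first message at all.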

Inserting a new stop after the fact like this breaks the station sequence logic in synthesize_route, which produces a station sequence like:

```python
['702S',
 '705S',
 '706S',
 ...,
 '725S',
 '701S',
 '726S']
```
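The shape of this output is what any merge that orders stations by first appearance would produce. The sketch below is a hypothetical, simplified model of such a merge (it is not the actual synthesize_route implementation): stops keep the order in which they are first seen across messages, so a stop that only shows up in a later message is appended to the end.

```python
from typing import List


def naive_station_order(sequences: List[List[str]]) -> List[str]:
    """Merge per-message stop sequences by first appearance.

    Hypothetical illustration only. Each inner list is the stop sequence
    from one feed message, oldest message first. A stop ID first seen in
    a later message is appended after everything seen before it, even if
    that message places it at the head of the route.
    """
    order: List[str] = []
    seen = set()
    for seq in sequences:
        for stop_id in seq:
            if stop_id not in seen:
                seen.add(stop_id)
                order.append(stop_id)
    return order
```

With shortened toy sequences, naive_station_order([['702S', '705S', '725S'], ['701S', '702S', '705S', '725S', '726S']]) returns ['702S', '705S', '725S', '701S', '726S'], stranding 701S after 725S in the same way as the output above.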

If this failure were silent, it'd be hard to know what to do about it. Luckily, in the case of the MTA 7 feed, every message with this problem also has another schema violation (lol): a timestamp of zero on the first vehicle update. By removing messages with this schema violation, we can also get rid of this style of data error.
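A minimal sketch of that filter, assuming each parsed message exposes its vehicle updates as a list of dicts with a timestamp key. The vehicle_updates key and the drop_zero_timestamp_messages name are hypothetical placeholders for however the feed is actually parsed, not gtfs_tripify APIs.

```python
from typing import Any, Dict, List


def drop_zero_timestamp_messages(
    messages: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Keep only messages whose first vehicle update has a non-zero timestamp.

    ASSUMPTION: vehicle updates live under a hypothetical 'vehicle_updates'
    key as a list of dicts carrying a 'timestamp' key. A missing timestamp
    is treated as zero. Messages with no vehicle updates are kept.
    """
    kept = []
    for msg in messages:
        updates = msg.get('vehicle_updates', [])
        if updates and updates[0].get('timestamp', 0) == 0:
            # Schema violation: the first vehicle update is timestamped 0.
            continue
        kept.append(msg)
    return kept
```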

@ResidentMario
Owner Author

This turns out to be a very difficult error to recover from. The zero timestamp is extremely common in the dataset, for whatever reason, and leads to extremely high fragmentation in the trip-line data. The additional code complexity and time cost in gtfs_tripify required to build an algorithm that heuristically detects when this is happening are not worth the effort. Fix your feed!
