Initial predictions model #119
Thanks for looking into this! Is there a plan for how predictions data would be used in the app? Since NextBus only provides an API for predictions per stop, I'm worried that NextBus would block our IP addresses if we frequently pull predictions for all ~6000 stops. The nice thing about the NextBus vehicleLocations API is that we only need to make one API request to get the data for all routes.
@youngj that's a really good point. @exxonvaldez and I chatted about this a little bit; we were thinking it would be interesting to have this data to visualize the accuracy of predictions and to rate metro lines with it as an additional data point. I think it gives us a unique metric to compare lines with: arrival data gives us an idea of the current movement of trains, the schedule lets us see how much expected and actual times differ, and predictions let us adjust expectations based on day-to-day delays (e.g. the schedule will be thrown off drastically if a bus breaks down, but predictions will show us how the schedule adjusts in this case). Terence was talking about Marey charts to help visualize this data. @exxonvaldez could you add some of your thoughts as well?
Also found this endpoint: https://gist.github.com/grantland/7cf4097dd9cdf0dfed14#predictions-for-multi-stops
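If a multi-stop endpoint is used, one way to keep the request count down is to group each route's stops into batches, so the number of requests grows with the number of routes and not the number of stops. A minimal sketch; `batch_stops` and `max_stops_per_request` are hypothetical names, and the default cap of 100 is an illustrative assumption, not a documented NextBus limit:

```python
from typing import Iterator, List


def batch_stops(stop_ids: List[str], max_stops_per_request: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of stop_ids, each at most max_stops_per_request long.

    The cap is an assumption for illustration; the real per-request limit
    (if any) would need to be checked against the API.
    """
    if max_stops_per_request < 1:
        raise ValueError("max_stops_per_request must be at least 1")
    for start in range(0, len(stop_ids), max_stops_per_request):
        yield stop_ids[start:start + max_stops_per_request]
```

Each batch would then become one request, which could also be spaced out over time to reduce the risk of being blocked.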
From an end user perspective, the question is "how much can I trust the predictions?" From my own experience, I know that predictions for stops near the start of a line are much less accurate, and I'm not sure if they can be trusted at all. Prediction accuracy includes:
Assuming we can scrape the predictions, my main concern is that the predictions are just another view on the location data. That is, they just take the vehicle locations and use the schedule to determine how long it should take between stops, doing some interpolation as needed. They might naively assume vehicles turn around immediately at the ends of lines (when in practice they might go out of service or take a lengthy break), and they might assume vehicles will just appear at the start of lines at the right intervals. If it turns out the predictions are actually interesting in their own right and more than just vehicle locations with schedules applied to them, then it should be possible to generate a bunch of stats as well as Marey chart-like plots of how predictions and vehicles linearly converge on stops over time. The plots would also show when a vehicle deviates from its prediction.
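As a rough illustration of the kind of stat mentioned above, the error of a single prediction could be measured as the gap between the predicted and the observed arrival time. This is a minimal sketch; `prediction_error_seconds` and its parameters are hypothetical names, and it assumes the observed arrival time is already available from the arrival data:

```python
from datetime import datetime, timedelta


def prediction_error_seconds(queried_time: datetime,
                             predicted_seconds: int,
                             actual_arrival: datetime) -> float:
    """Return how many seconds later than predicted the vehicle arrived.

    Positive values mean the vehicle arrived later than predicted,
    negative values mean it arrived earlier.
    """
    predicted_arrival = queried_time + timedelta(seconds=predicted_seconds)
    return (actual_arrival - predicted_arrival).total_seconds()
```

Grouping these errors by stop and by how far ahead the prediction was made would show, for example, whether stops near the start of a line are less reliable.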
Thanks for working on this, as this has been a request for quite a while (see trynmaps/orion#32, @frhino was especially interested in this). The initial idea was to add it in Orion (it's in Node.js, but you're welcome to add a new Python script that stores everything in S3 as Orion currently does), which is what's currently fetching all the vehicle locations. However, we'd probably want to deploy it separately given the risk of having the IP blocked (so as not to prevent Orion from updating vehicle locations). Also, the predictions have an
Seems worth investigating the GTFS-realtime feed on https://511.org/developers/list/apis/ mentioned by Terence in Slack. Maybe it would be easier to work with than the NextBus predictions API in the long run.
Reopening this, at least for testing purposes.
Messy stuff. Thanks for your patience & determination!
Looks pretty good, a few comments related to making the code easier to read and use.
def gen_prediction(prediction, route_id, stop_id, queried_time):
    vehicle = prediction['vehicle']
    minutes = prediction['minutes']
The NextBus API also returns a seconds field; it may be better to store that instead of minutes, just in case the extra precision is useful later on.
Could also store affectedByLayover and block in case they are useful later on.
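A minimal sketch of what keeping those extra fields might look like. It returns a plain dict, not the PR's model class, and it assumes the JSON feed delivers values as strings and only includes affectedByLayover when it applies; both assumptions should be checked against a real response:

```python
def gen_prediction(prediction: dict, route_id: str, stop_id: str, queried_time: int) -> dict:
    """Build a stored prediction record from one NextBus prediction entry.

    Assumptions (not confirmed by the PR): field values arrive as strings,
    and 'affectedByLayover' / 'block' may be absent from an entry.
    """
    return {
        'route_id': route_id,
        'stop_id': stop_id,
        'queried_time': queried_time,
        'vehicle': prediction['vehicle'],
        # Store seconds, not minutes, to keep the extra precision.
        'seconds': int(prediction['seconds']),
        'affected_by_layover': prediction.get('affectedByLayover') == 'true',
        'block': prediction.get('block'),
    }
```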
def create_prediction_request_for_route(agency_id: str, route_id: str, stops: list) -> str:
    stops = [f"{route_id}|{stop_id}" for stop_id in stops]
    stop_str = STOPS_STR + STOPS_STR.join(stops)
Perhaps this would be easier to understand as:
    stop_str = "&".join([f"stops={route_id}|{stop_id}" for stop_id in stops])
and then a={agency_id}&{stop_str} on the next line.
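Put together, the suggestion might look like the sketch below. It only builds the `a=...&stops=...` part of the query string; the base URL and the command parameter are left out because they are not shown in this thread:

```python
def create_prediction_request_for_route(agency_id: str, route_id: str, stops: list) -> str:
    """Build the agency and stops portion of a multi-stop predictions query.

    Each stop becomes one 'stops=<route_id>|<stop_id>' parameter.
    The base URL and command parameter are intentionally omitted here.
    """
    stop_str = "&".join([f"stops={route_id}|{stop_id}" for stop_id in stops])
    return f"a={agency_id}&{stop_str}"
```

Building each `stops=` parameter inside the comprehension removes the need to prepend the separator constant before joining.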
def parse_prediction_response(queried_time, resp_json: dict) -> list:
    predictions_for_route = []
    predictions_by_route = resp_json['predictions']
Having variables named predictions_for_route and predictions_by_route in the same function could be confusing; probably easiest to remove predictions_by_route by replacing it with resp_json['predictions'] on the next line.
    return os.path.join(util.get_data_dir(), f"predictions_{agency}_{datetime_str}.json")


def get_predictions_for_route(agency: str, route_id: str) -> list:
Recommend moving this function to nextbus.py, to indicate that it calls the NextBus API. This would also keep predictions.py free of NextBus-specific code, making it easier to reuse in the future for other transit data providers besides NextBus.
@@ -0,0 +1,68 @@
import re
I'd recommend moving the content of nextbus_predictions.py into nextbus.py. It doesn't seem necessary to have a separate file for this, and using nextbus.py would make it more natural to use, e.g.
import nextbus
predictions = nextbus.get_predictions_for_route('sf-muni','27')
We should merge this soon. Anyone want to address Jesse's last few comments and merge this in?
Wanted to get a Prediction(s) model down for comments - this pulls data from the NextBus predictions API. Eventually I want it to cache/store data similarly to how RouteConfig data is stored. This is a WIP but please leave comments & suggestions.