Initial predictions model #119

Closed
wants to merge 4 commits into from
Conversation

vicky11z

Wanted to get a Prediction(s) model down for comments - this pulls data from the nextbus predictions api. Eventually I want it to cache/store data similarly to how RouteConfig data is stored. This is a WIP but please leave comments & suggestions.

@vicky11z vicky11z requested review from youngj and jtanquil May 26, 2019 05:28
@youngj
Contributor

youngj commented May 29, 2019

Thanks for looking into this! Is there a plan for how predictions data would be used in the app? Since NextBus only provides an API for predictions per stop, I'm worried that Nextbus would block our IP addresses if we frequently pull predictions for all ~6000 stops. The nice thing about the NextBus vehicleLocations API is we only need to make 1 API request to get the data for all routes.

  • the code in nextbus_predictions.py could be added to nextbus.py rather than adding a new file
  • it's much easier to use NextBus's undocumented JSON API instead of their documented XML API -- just replace publicXMLFeed with publicJSONFeed
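
A minimal sketch of the swap described in the second bullet. The host, path, and stop ID in the example URL are assumptions for illustration, and `to_json_feed_url` is a hypothetical helper that is not part of this PR.

```python
XML_FEED = "publicXMLFeed"
JSON_FEED = "publicJSONFeed"


def to_json_feed_url(url: str) -> str:
    """Point a NextBus feed URL at the JSON endpoint instead of the XML one."""
    # Only the endpoint name changes; the query parameters are left untouched.
    return url.replace(XML_FEED, JSON_FEED, 1)


# Assumed example URL; the host and the stop ID are illustrative only.
xml_url = (
    "http://webservices.nextbus.com/service/publicXMLFeed"
    "?command=predictions&a=sf-muni&r=27&s=3930"
)
json_url = to_json_feed_url(xml_url)
```

Because only the endpoint name differs, existing request-building code should carry over unchanged, with the response then parsed as JSON rather than XML.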

@vicky11z
Author

@youngj that's a really good point. @exxonvaldez and I chatted about this a little bit, and we were thinking it would be interesting to have this data to visualize the accuracy of predictions and to rate metro lines with it as an additional data point. I think it gives us a unique metric to compare lines with - arrival data gives us an idea of the current movement of trains, the schedule lets us see how much expected and actual times differ, but predictions let us adjust expectations based on day-to-day delays (e.g. the schedule will be thrown off drastically if a bus breaks down, but predictions will show us how the schedule adjusts in this case). Terence was talking about Marey charts to help visualize this data.

@exxonvaldez could you add some of your thoughts as well?

@vicky11z
Author

vicky11z commented May 31, 2019

Also found this endpoint - https://gist.github.com/grantland/7cf4097dd9cdf0dfed14#predictions-for-multi-stops
that might help us minimize some of the calls since we can call all stops & trains in this way instead of each one individually.

@exxonvaldez
Contributor

exxonvaldez commented May 31, 2019

From an end user perspective, the question is "how much can I trust the predictions?" From my own experience, I know that predictions for stops near the start of a line are much less accurate, and I'm not sure if they can be trusted at all.

Prediction accuracy includes:

  • How often does the vehicle arrive as predicted or early or late?
  • How often does a predicted vehicle disappear, or a new vehicle suddenly appear?
  • How often do predictions not decrease linearly with time?
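
One way the first and third measures above could be computed, as a rough sketch. It assumes each stored prediction records when it was fetched and how many seconds until arrival were predicted at that moment; `PredictionSample`, `arrival_errors`, `non_linear_steps`, and the 30-second tolerance are all hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PredictionSample:
    queried_time: int  # Unix seconds when the prediction was fetched
    seconds: int       # predicted seconds until arrival at query time


def arrival_errors(samples: List[PredictionSample], actual_arrival: int) -> List[int]:
    """Signed error per sample: positive means the vehicle arrived later than predicted."""
    return [actual_arrival - (s.queried_time + s.seconds) for s in samples]


def non_linear_steps(samples: List[PredictionSample], tolerance: int = 30) -> int:
    """Count consecutive pairs whose countdown did not drop by roughly the elapsed time."""
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        elapsed = cur.queried_time - prev.queried_time
        drop = prev.seconds - cur.seconds
        if abs(drop - elapsed) > tolerance:
            count += 1
    return count


# Made-up samples for one vehicle approaching one stop.
samples = [
    PredictionSample(queried_time=1000, seconds=600),
    PredictionSample(queried_time=1060, seconds=540),
    PredictionSample(queried_time=1120, seconds=600),  # countdown jumped back up
]
errors = arrival_errors(samples, actual_arrival=1700)
jumps = non_linear_steps(samples)
```

Detecting vehicles that disappear or suddenly appear would additionally need the samples grouped by vehicle ID across successive queries, which this sketch leaves out.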

Assuming we can scrape the predictions, my main concern is that the predictions are just another view on the location data. That is, they just take the vehicle locations and use the schedule to determine how long it should take between stops, doing some interpolation as needed. They might naively assume vehicles turn around immediately at the ends of lines (when in practice they might go out of service or take a lengthy break), and they might assume vehicles will just appear at the start of lines at the right intervals.

If it turns out the predictions are actually interesting in their own right and more than just vehicle locations with schedules applied to them, then it should be possible to generate a bunch of stats as well as Marey chart-like plots of how predictions and vehicles linearly converge on stops over time. The plots would also show when a vehicle deviates from its prediction.

@EddyIonescu
Member

Thanks for working on this, as this has been a request for quite a while (see trynmaps/orion#32, @frhino was especially interested in this). The initial idea was to add it in Orion (it's in nodejs, but you're welcome to add a new python script that stores everything in S3 as Orion currently does), which is what's currently fetching all the vehicle locations. However, we'd probably want to deploy it separately given the risk of having the IP blocked (so as not to prevent Orion from updating vehicle locations).

Also, the predictions have an isAffectedByLayover field (unfortunately there aren't any transit apps that show this, but it'd be really useful), which should address the issue that predictions tend to only really be accurate for vehicles already on their trip (this is mostly due to the operator shortage, where supervisors shift terminal departure times on-the-fly in order to reduce service gaps, as well as operators starting early or late because they feel like it).

@vicky11z vicky11z force-pushed the vz-call-predictions-api branch from 6f4bf4d to 8794275 Compare June 13, 2019 03:25
@vicky11z
Author

@youngj or @jtanquil could you take a look at this when you have the time? it's ready for review finally 🎉

@vicky11z vicky11z force-pushed the vz-call-predictions-api branch from b6ea28e to 3942d92 Compare June 13, 2019 04:03
Contributor

@youngj youngj left a comment

Seems worth investigating the GTFS-realtime feed on https://511.org/developers/list/apis/ mentioned by Terence in Slack. Maybe it would be easier to work with than the Nextbus predictions API in the long run

@vicky11z
Author

Reopening this, at least for testing purposes.

@vicky11z vicky11z requested a review from youngj July 25, 2019 04:05
@vicky11z vicky11z force-pushed the vz-call-predictions-api branch from b75b777 to cbbde5e Compare July 25, 2019 04:06
@hathix
Member

hathix commented Jul 25, 2019

Messy stuff. Thanks for your patience & determination!

Contributor

@youngj youngj left a comment

Looks pretty good, a few comments related to making the code easier to read and use.


def gen_prediction(prediction, route_id, stop_id, queried_time):
    vehicle = prediction['vehicle']
    minutes = prediction['minutes']

The Nextbus API also returns a seconds field - it may be better to store that instead of minutes, just in case the extra precision is useful later on.

Could also store affectedByLayover and block in case they are useful later on.
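
A sketch of what storing these fields might look like. The return shape (a flat dict) and the key names are hypothetical, and the handling of `seconds` and `affectedByLayover` assumes the JSON feed delivers them as strings, with `affectedByLayover` present only when true; both assumptions should be checked against a real response.

```python
def gen_prediction(prediction: dict, route_id: str, stop_id: str, queried_time: int) -> dict:
    """Flatten one prediction entry, keeping seconds plus the optional layover and block fields."""
    return {
        "route_id": route_id,
        "stop_id": stop_id,
        "queried_time": queried_time,
        "vehicle": prediction["vehicle"],
        # Assumed to arrive as a numeric string; int() also accepts an int.
        "seconds": int(prediction["seconds"]),
        # Assumed to be omitted unless true, and sent as the string "true".
        "affected_by_layover": str(prediction.get("affectedByLayover", "false")).lower() == "true",
        # None when the feed does not include a block for this prediction.
        "block": prediction.get("block"),
    }


# Made-up values for illustration.
sample = {"vehicle": "1234", "minutes": "5", "seconds": "312",
          "affectedByLayover": "true", "block": "9703"}
row = gen_prediction(sample, "27", "3930", 1560000000)
```

Minutes can always be derived from seconds later, so dropping the `minutes` field loses nothing.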


def create_prediction_request_for_route(agency_id: str, route_id: str, stops: list) -> str:
    stops = [f"{route_id}|{stop_id}" for stop_id in stops]
    stop_str = STOPS_STR + STOPS_STR.join(stops)

Perhaps would be easier to understand as:

stop_str = "&".join([f"stops={route_id}|{stop_id}" for stop_id in stops])

and then a={agency_id}&{stop_str} on the next line
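
Put together, the suggestion could look roughly like the sketch below. `build_multi_stop_query` is a hypothetical name, the stop IDs are made up, and the `predictionsForMultiStops` command name is an assumption based on the multi-stop endpoint linked earlier in this thread.

```python
def build_multi_stop_query(agency_id: str, route_id: str, stops: list) -> str:
    """Build the query string for one multi-stop predictions request on a single route."""
    # One "stops=<route>|<stop>" parameter per stop, joined with "&".
    stop_str = "&".join([f"stops={route_id}|{stop_id}" for stop_id in stops])
    # Assumed command name for the multi-stop endpoint.
    return f"command=predictionsForMultiStops&a={agency_id}&{stop_str}"


query = build_multi_stop_query("sf-muni", "27", ["3930", "3931"])
```

Depending on the HTTP client, the `|` separator may need to be percent-encoded before the request is sent.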


def parse_prediction_response(queried_time, resp_json: dict) -> list:
    predictions_for_route = []
    predictions_by_route = resp_json['predictions']

Having variables named predictions_for_route and predictions_by_route in the same function could be confusing, probably easiest to remove predictions_by_route by replacing it with resp_json['predictions'] on the next line.
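
A sketch of the function with only `predictions_for_route` left. The nesting (`predictions`, then `direction`, then `prediction`), the `routeTag` and `stopTag` keys, and the idea that a lone item may arrive as a dict rather than a one-element list are all assumptions about the JSON feed, and `as_list` is a hypothetical helper.

```python
def as_list(value) -> list:
    """Wrap a lone item in a list; pass lists through; treat None as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def parse_prediction_response(queried_time: int, resp_json: dict) -> list:
    predictions_for_route = []
    # Iterate over resp_json['predictions'] directly instead of naming it separately.
    for stop_predictions in as_list(resp_json.get("predictions")):
        for direction in as_list(stop_predictions.get("direction")):
            for prediction in as_list(direction.get("prediction")):
                predictions_for_route.append({
                    "route_id": stop_predictions.get("routeTag"),
                    "stop_id": stop_predictions.get("stopTag"),
                    "queried_time": queried_time,
                    "vehicle": prediction.get("vehicle"),
                    "seconds": prediction.get("seconds"),
                })
    return predictions_for_route


# Made-up response with a single stop and a single direction.
resp = {"predictions": {"routeTag": "27", "stopTag": "3930", "direction": {
    "prediction": [{"vehicle": "1234", "seconds": "312"},
                   {"vehicle": "5678", "seconds": "900"}]}}}
rows = parse_prediction_response(1560000000, resp)
```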

    return os.path.join(util.get_data_dir(), f"predictions_{agency}_{datetime_str}.json")


def get_predictions_for_route(agency: str, route_id: str) -> list:

Recommend moving this function to nextbus.py, to indicate that it calls the Nextbus API. This would also keep predictions.py free of Nextbus-specific code, making it easier to reuse in the future for other transit data providers besides Nextbus.

@@ -0,0 +1,68 @@
import re

I'd recommend moving the content of nextbus_predictions.py into nextbus.py. It doesn't seem necessary to have a separate file for this, and using nextbus.py would make it more natural to use, e.g.

import nextbus
predictions = nextbus.get_predictions_for_route('sf-muni','27')

@hathix hathix added this to the Predictions milestone Aug 8, 2019
@hathix
Member

hathix commented Sep 12, 2019

We should merge this soon. Anyone want to address Jesse's last few comments and merge this in?

@hathix hathix requested a review from youngj September 12, 2019 04:25
@akgupta89
Member

This is currently abandoned. @youngj or @jtanquil do you want to finish this off?

6 participants