Initial predictions model #119

vicky11z · 2019-05-26T05:28:31Z

Wanted to get a Prediction(s) model down for comments - this pulls data from the nextbus predictions api. Eventually I want it to cache/store data similarly to how RouteConfig data is stored. This is a WIP but please leave comments & suggestions.

youngj · 2019-05-29T03:52:10Z

Thanks for looking into this! Is there a plan for how predictions data would be used in the app? Since NextBus only provides an API for predictions per stop, I'm worried that Nextbus would block our IP addresses if we frequently pull predictions for all ~6000 stops. The nice thing about the NextBus vehicleLocations API is we only need to make 1 API request to get the data for all routes.

The code in nextbus_predictions.py could be added to nextbus.py rather than adding a new file.
It's much easier to use Nextbus's undocumented JSON API instead of their documented XML API -- just replace publicXMLFeed with publicJSONFeed.

vicky11z · 2019-05-31T04:21:08Z

@youngj that's a really good point. @exxonvaldez and I chatted about this a little bit, we were thinking it would be interesting to have this data to visualize the accuracy of predictions and allow us to rate metro lines with this as an additional data point. I think it gives us a unique metric to compare lines with - arrival data gives us an idea of the current movement of trains, the schedule allows us to see how much expected vs actual times differ, but predictions allow us to adjust expectations based on day-to-day delays (e.g. the schedule will be thrown off drastically if a bus breaks down, but predictions will show us how the schedule adjusts in this case). Terence was talking about Mary (?) charts to help visualize this data.

@exxonvaldez could you add some of your thoughts as well?

vicky11z · 2019-05-31T04:21:54Z

Also found this endpoint - https://gist.github.com/grantland/7cf4097dd9cdf0dfed14#predictions-for-multi-stops
that might help us minimize some of the calls since we can call all stops & trains in this way instead of each one individually.

exxonvaldez · 2019-05-31T05:23:50Z

From an end user perspective, the question is "how much can I trust the predictions?" From my own experience, I know that predictions for stops near the start of a line are much less accurate, and I'm not sure if they can be trusted at all.

Prediction accuracy includes:

How often does the vehicle arrive as predicted or early or late?
How often does a predicted vehicle disappear, or a new vehicle suddenly appear?
How often do predictions not decrease linearly with time?
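As a rough illustration of the first question only, here is a minimal sketch that labels a single prediction as early, on time, or late by comparing the predicted arrival time with the observed one. The function names and the 60-second tolerance are hypothetical and are not part of this PR; a real threshold would need to be chosen from the data.

```python
from collections import Counter

# Hypothetical tolerance: arrivals within 60 seconds of the prediction count as on time.
ON_TIME_TOLERANCE_SECS = 60


def classify_arrival(predicted_epoch_secs: int, actual_epoch_secs: int,
                     tolerance_secs: int = ON_TIME_TOLERANCE_SECS) -> str:
    """Label one prediction as 'early', 'on_time', or 'late' relative to the observed arrival."""
    error = actual_epoch_secs - predicted_epoch_secs
    if error < -tolerance_secs:
        return "early"    # the vehicle arrived before the predicted time
    if error > tolerance_secs:
        return "late"     # the vehicle arrived after the predicted time
    return "on_time"


def summarize(pairs) -> Counter:
    """Count the labels over an iterable of (predicted, actual) epoch-second pairs."""
    return Counter(classify_arrival(predicted, actual) for predicted, actual in pairs)
```

Counting disappearing vehicles or non-linear countdowns would need a time series of predictions per vehicle, which this sketch does not attempt.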

Assuming we can scrape the predictions, my main concern is that the predictions are just another view on the location data. That is, they just take the vehicle locations and use the schedule to determine how long it should take between stops, doing some interpolation as needed. They might naively assume vehicles turn around immediately at the ends of lines (when in practice they might go out of service or take a lengthy break), and they might assume vehicles will just appear at the start of lines at the right intervals.

If it turns out the predictions are actually interesting in their own right and more than just vehicle locations with schedules applied to them, then it should be possible to generate a bunch of stats as well as Marey chart-like plots of how predictions and vehicles linearly converge on stops over time. The plots would also show when a vehicle deviates from its prediction.

EddyIonescu · 2019-06-04T22:06:05Z

Thanks for working on this, as this has been a request for quite a while (see trynmaps/orion#32; @frhino was especially interested in this). The initial idea was to add it in Orion (it's in nodejs, but you're welcome to add a new python script that stores everything in S3 as Orion currently does), which is what's currently fetching all the vehicle locations. However, we'd probably want to deploy it separately given the risk of having the IP blocked (so as not to prevent Orion from updating vehicle locations).

Also, the predictions have an isAffectedByLayover field (unfortunately there aren't any transit apps that show this, but it'd be really useful), which should address the issue that predictions tend to only really be accurate for vehicles already on their trip (this is mostly due to the operator shortage, where supervisors shift terminal departure times on-the-fly in order to reduce service gaps, as well as operators starting early or late because they feel like it).

vicky11z · 2019-06-13T04:00:23Z

@youngj or @jtanquil could you take a look at this when you have the time? it's ready for review finally 🎉

youngj

Seems worth investigating the GTFS-realtime feed on https://511.org/developers/list/apis/ mentioned by Terence in Slack. Maybe it would be easier to work with than the Nextbus predictions API in the long run.

vicky11z · 2019-07-25T04:05:24Z

Reopening this, at least for testing purposes.

hathix · 2019-07-25T04:13:50Z

Messy stuff. Thanks for your patience & determination!

youngj

Looks pretty good, a few comments related to making the code easier to read and use.

youngj · 2019-07-28T00:36:37Z

models/nextbus_predictions.py

+
+def gen_prediction(prediction, route_id, stop_id, queried_time):
+    vehicle = prediction['vehicle']
+    minutes = prediction['minutes']


The Nextbus API also returns a seconds field - it may be better to store that instead of minutes, just in case the extra precision is useful later on.

Could also store affectedByLayover and block in case they are useful later on.
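Something along these lines might work, as a sketch only. The returned dict shape is hypothetical (the PR may build a different object), and it assumes the JSON feed delivers values as strings and may omit affectedByLayover and block for some predictions.

```python
def gen_prediction(prediction: dict, route_id: str, stop_id: str, queried_time: int) -> dict:
    """Build one stored prediction record from a single Nextbus prediction entry."""
    return {
        'route_id': route_id,
        'stop_id': stop_id,
        'queried_time': queried_time,
        'vehicle': prediction['vehicle'],
        # Assumption: the feed returns numbers as strings, so convert explicitly.
        'seconds': int(prediction['seconds']),
        # Assumption: these keys can be missing, so use .get() instead of indexing.
        'affected_by_layover': prediction.get('affectedByLayover') == 'true',
        'block': prediction.get('block'),
    }
```

Storing seconds keeps the minutes value recoverable (seconds // 60), so nothing is lost by dropping the minutes field.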

youngj · 2019-07-28T00:42:24Z

models/nextbus_predictions.py

+
+def create_prediction_request_for_route(agency_id: str, route_id: str, stops: list) -> str:
+    stops = [f"{route_id}|{stop_id}" for stop_id in stops]
+    stop_str = STOPS_STR + STOPS_STR.join(stops)


Perhaps would be easier to understand as:

stop_str = "&".join([f"stops={route_id}|{stop_id}" for stop_id in stops])

and then a={agency_id}&{stop_str} on the next line
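Put together, the suggestion could look roughly like this. It is a sketch that assumes the function only needs to return the query-string portion; the base URL and any command parameter used in the PR are left out, and the stop IDs in the usage note are made up.

```python
def create_prediction_request_for_route(agency_id: str, route_id: str, stops: list) -> str:
    """Build the agency and stops part of a multi-stop predictions query string."""
    # One "stops=<route>|<stop>" pair per stop, joined with "&".
    stop_str = "&".join([f"stops={route_id}|{stop_id}" for stop_id in stops])
    return f"a={agency_id}&{stop_str}"
```

For example, create_prediction_request_for_route('sf-muni', '27', ['3001', '3002']) gives a=sf-muni&stops=27|3001&stops=27|3002.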

youngj · 2019-07-28T00:46:50Z

models/nextbus_predictions.py

+
+def parse_prediction_response(queried_time, resp_json: dict) -> list:
+    predictions_for_route = []
+    predictions_by_route = resp_json['predictions']


Having variables named predictions_for_route and predictions_by_route in the same function could be confusing, probably easiest to remove predictions_by_route by replacing it with resp_json['predictions'] on the next line.
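A sketch of what that might look like with a single predictions_* variable. The key names routeTag, stopTag, direction and prediction, and the need to wrap lone objects in a list, are assumptions about the JSON feed's response shape rather than something confirmed in this PR; as_list is a hypothetical helper.

```python
def as_list(value) -> list:
    """Return value as a list, wrapping a lone item and treating None as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def parse_prediction_response(queried_time, resp_json: dict) -> list:
    """Flatten a predictions response into one record per predicted vehicle."""
    predictions_for_route = []
    # Iterate over resp_json['predictions'] directly instead of naming it separately.
    for stop_predictions in as_list(resp_json.get('predictions')):
        route_id = stop_predictions['routeTag']
        stop_id = stop_predictions['stopTag']
        for direction in as_list(stop_predictions.get('direction')):
            for prediction in as_list(direction.get('prediction')):
                predictions_for_route.append({
                    'route_id': route_id,
                    'stop_id': stop_id,
                    'vehicle': prediction['vehicle'],
                    'queried_time': queried_time,
                })
    return predictions_for_route
```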

youngj · 2019-07-28T00:53:27Z

models/predictions.py

+    return os.path.join(util.get_data_dir(), f"predictions_{agency}_{datetime_str}.json")
+
+
+def get_predictions_for_route(agency: str, route_id: str) -> list:


Recommend moving this function to nextbus.py, to indicate that it calls the Nextbus API. This would also keep predictions.py free of Nextbus-specific code, making it easier to reuse in the future for other transit data providers besides Nextbus.

youngj · 2019-07-28T00:56:31Z

models/nextbus_predictions.py

@@ -0,0 +1,68 @@
+import re


I'd recommend moving the content of nextbus_predictions.py into nextbus.py. It doesn't seem necessary to have a separate file for this, and using nextbus.py would make it more natural to use, e.g.

import nextbus
predictions = nextbus.get_predictions_for_route('sf-muni', '27')

hathix · 2019-09-12T04:25:16Z

We should merge this soon. Anyone want to address Jesse's last few comments and merge this in?

akgupta89 · 2019-10-17T03:39:01Z

This is currently abandoned. @youngj or @jtanquil do you want to finish this off?

vicky11z requested review from youngj and jtanquil May 26, 2019 05:28

vicky11z force-pushed the vz-call-predictions-api branch from 6f4bf4d to 8794275 Compare June 13, 2019 03:25

vicky11z force-pushed the vz-call-predictions-api branch from b6ea28e to 3942d92 Compare June 13, 2019 04:03

youngj reviewed Jun 15, 2019

akgupta89 force-pushed the master branch from ac67548 to 2c118a9 Compare June 20, 2019 03:35

exxonvaldez force-pushed the master branch from 98230bd to ac67548 Compare June 21, 2019 04:48

vicky11z force-pushed the vz-call-predictions-api branch from 809d373 to a203b6a Compare July 25, 2019 04:01

vicky11z mentioned this pull request Jul 25, 2019

Use 511 api for predictions #142

Closed

vicky11z requested a review from youngj July 25, 2019 04:05

vicky11z added 4 commits July 24, 2019 21:06

initial predictions model

de16c0b

get predictions for just one route, contains all predictions logic

8c3d12d

predictions for one route at a time

13ef7af

take care of request error

cbbde5e

vicky11z force-pushed the vz-call-predictions-api branch from b75b777 to cbbde5e Compare July 25, 2019 04:06

youngj reviewed Jul 28, 2019

hathix added this to the Predictions milestone Aug 8, 2019

hathix requested a review from youngj September 12, 2019 04:25

hathix mentioned this pull request Oct 24, 2019

Pick up abandoned PR for predictions model #359

Open

vicky11z closed this Nov 6, 2019
