Baseline - Predicting Replaced (2nd choice) mode with logistic regression #1087
Comments
FYI, I think that the uprm-civic also has replaced mode ( |
Hi everyone! "public transport routing ... requires timetable data to work properly, and OSM doesn't have that." EDIT: I found that OSM does this
Realtime GTFS
Some places provide realtime data, such as Boston's MBTA: https://www.mbta.com/developers/gtfs-realtime
https://github.com/MobilityData/awesome-transit?tab=readme-ov-file |
@jpfleischer I meant that we need routing in this sense: I went from my house to the drugstore by car, and I want to be able to run a query (ideally via an API) that gives me the time and cost of the alternatives (e.g. the equivalent of this, but with cost included).
OSM has transit data, and we pull transit data from it using Overpass for mode detection (look at `emission/net/ext_services`), but it doesn't do routing. OSM-based routing services such as OSRM or GraphHopper typically do not support transit, so we cannot use them to find transit alternatives.
There is an open-source routing engine that takes transit into account (OpenTripPlanner). We are friends with the OTP folks and have tried using their software before. But for us to use this in a production system, somebody still needs to run the software, load the data, keep it updated, etc. Ideally, there would be an Overpass-like system that we could use for routing and that we could pay for if needed. But I am not sure that such a Google Maps alternative exists.
transit.land is intended to do that, at least for the US. But somebody needs to load that data |
One final comment on this: wrt the framing of this problem, we have discussed how there are people's preferences (which are related to the person) and the alternatives (which are related to the environment). So the same person may make different choices in a different environment (e.g. @jpfleischer taking transit in Boston but not in FL) even though their internalized preferences have not changed.
Just wanted to highlight the flip side of that, which is that different people can have different preferences. While @jpfleischer would never take the bus in FL, there are clearly people who do (otherwise, the bus system would have shut down).
For the replaced mode project, we want to understand individual or group preferences, specifically as a set of factors that influence their (assumed rational) choices. We can then apply those preferences to a different set of alternatives (new transit line, no e-bike available, parking restrictions...) and get a sense of how they will behave and, by extension, what the impact of the modification to the alternatives is. |
@jpfleischer Here is a PR related to the NTD data processing and integration for energy and emissions, maybe similar methods would allow us to extract transit cost? e-mission-common PR I think the notebooks in |
@Abby-Wheelis |
For frequency - NTD glossary defines "Headway" as "The time interval between vehicles moving in the same direction on a particular route. Can be found in: S-10" - now if I can just figure out where S-10 is... |
S-10 is a form that agencies fill out for reporting to NTD: the 2023 version here includes many of the fields that we saw in the data table with time periods and when they are active (AM peak, Sunday, etc) but I don't see "Headway" in the form or the data table, unfortunately |
I have not been able to find service frequency or headway, but I did find a paper (from 2011) referring to methods for evaluating performance using NTD data System for Transit Performance Analysis Using the National Transit Database, notably:
|
The way to get headway
It is true that GTFS agencies publish their stop times a lot more frequently than they publish their fares. However, as @Abby-Wheelis has found, a paper documents a way to derive headway from NTD data, and it will be more straightforward to apply that logic (after verifying its accuracy and reasoning). Getting the stop times would also be quite complicated because the GTFS data has no NTD ID, only the stop coordinates, so we would have to add logic to convert coordinates to UACE. We may compare both options if time allows, but for now, we will just do the NTD headway calculation. |
A few new notes from our meeting today:
|
Given that the pseudoformula I found in the paper is fairly complicated, I wanted to think it through as a sanity check, to make sure we agree with it before trying to implement it:
speed = revenue miles / revenue hours
headway = (directional mileage / speed) / num vehicles
I'm not sure I've wrapped my head around this formula; if anyone sees it differently, please feel free to let me know how we can interpret it. |
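To make the sanity check concrete, here is a minimal Python sketch of the pseudoformula exactly as written above. The function name, argument names, and the example numbers are hypothetical; the real NTD column names would still need to be mapped onto these inputs, and the formula itself has not been verified against the paper.

```python
def estimate_headway_hours(revenue_miles, revenue_hours,
                           directional_mileage, num_vehicles):
    """Estimate headway in hours from aggregate service statistics.

    Follows the pseudoformula from the thread:
        speed   = revenue miles / revenue hours
        headway = (directional mileage / speed) / num vehicles
    All argument names are illustrative, not actual NTD field names.
    """
    if revenue_hours <= 0 or num_vehicles <= 0 or revenue_miles <= 0:
        raise ValueError("inputs must be positive to estimate headway")
    # Average operating speed in miles per hour.
    speed = revenue_miles / revenue_hours
    # Time for one vehicle to cover the directional mileage once.
    traversal_hours = directional_mileage / speed
    # Spreading the vehicles evenly over that time gives the gap between them.
    return traversal_hours / num_vehicles


# Made-up example: 12 mph average speed, 24 directional miles, 8 vehicles.
headway = estimate_headway_hours(1200.0, 100.0, 24.0, 8)
print(f"{headway * 60:.0f} minutes between vehicles")
```

Reading it this way, the formula says: the time to traverse the route once, divided evenly among the vehicles in service.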
Abby is right: we have now taken the preexisting NTD script and built on its logic to add fares, fixing a bug along the way to get it to work. For the coordinate-to-fare return function, we are considering:
I think half of the preexisting function can be generalized, but for now, I will go with the first option. Since the fare information is only attached to the UACE, we also anticipate calculating a general average fare across the entire UACE, weighted by number of passenger trips, to return fare information for a particular coordinate. |
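As a rough illustration of the trip-weighted average described above, here is a short Python sketch. The record keys (`uace`, `average_fare`, `passenger_trips`), the UACE code, and the numbers are all made up for the example; they are not the actual NTD columns or the e-mission-common implementation.

```python
from collections import defaultdict


def weighted_average_fare_by_uace(records):
    """Return {uace: average fare weighted by passenger trips}.

    `records` is an iterable of dicts with hypothetical keys
    'uace', 'average_fare' and 'passenger_trips' (one per agency/mode).
    """
    fare_times_trips = defaultdict(float)
    total_trips = defaultdict(float)
    for rec in records:
        trips = rec["passenger_trips"]
        if trips <= 0:
            # Rows without ridership carry no weight, so skip them.
            continue
        fare_times_trips[rec["uace"]] += rec["average_fare"] * trips
        total_trips[rec["uace"]] += trips
    return {uace: fare_times_trips[uace] / total_trips[uace]
            for uace in total_trips}


# Made-up example: two agencies reporting under the same (fake) UACE code.
example = [
    {"uace": "00001", "average_fare": 2.00, "passenger_trips": 900},
    {"uace": "00001", "average_fare": 4.00, "passenger_trips": 100},
]
print(weighted_average_fare_by_uace(example))
```

With these numbers the larger agency dominates, so the result sits much closer to 2.00 than a plain mean of the two fares would.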
@Abby-Wheelis @jpfleischer have you started working with the OTP yet? We definitely need travel time as well. I wonder if the OTP API supports any generic queries related to GTFS and headways, similar to https://nycplanning.github.io/td-travelshed/mapbox/public/ |
Short-term goals:
|
We now have a mechanism to launch OpenTripPlanner within a Docker container and to build an instance for Denver's RTD. The shortcoming is that the GTFS source must be specified manually, but since I know how to pull the GTFS links from Mobility Database by state, we can combine the two projects to get an automated GTFS fetcher and an automated transit time calculator (fetched from the OTP API on our local Docker instance). JGreenlee/e-mission-common@e012774
The next step is to write the OTP API logic to get transit times for given coordinates. One issue is that transit times currently cannot be calculated for trips from more than a few months ago. A potential solution is using Mobility Database to pull historical GTFS.
What are the key findings?
GTFS is bad for fares, great for stop times. |
Mobility Database only has records for RTD Denver dating back to Feb 2024: https://mobilitydatabase.org/feeds/mdb-178
However, I downloaded the GTFS using the Wayback Machine and then successfully calculated trips for 2022. ...however, it is much more straightforward to use the older version of Mobility Database, called OpenMobilityData, |
With ~60 GTFS zip files, OTP takes many hours to fully start up. Possible workarounds are combining all the GTFS files into one using a merge tool such as gtfsmerge or, instead of using one GTFS file per month, using perhaps one every three months. |
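The second workaround could be sketched roughly as below, assuming we already have one feed date per month; the function name and the dates are illustrative only, and this does not touch OTP or the actual zip files.

```python
from datetime import date


def every_n_months(feed_dates, step_months=3):
    """Keep one feed every `step_months`, counting from the earliest feed.

    `feed_dates` is a list of datetime.date objects, assumed to contain
    at most one feed per calendar month (a simplifying assumption).
    """
    ordered = sorted(feed_dates)
    if not ordered:
        return []
    first = ordered[0]
    kept = []
    for d in ordered:
        # Whole months elapsed since the earliest feed.
        months_since_first = (d.year - first.year) * 12 + (d.month - first.month)
        if months_since_first % step_months == 0:
            kept.append(d)
    return kept


# Made-up example: twelve monthly feeds for 2022 reduce to four.
monthly = [date(2022, m, 1) for m in range(1, 13)]
print(every_n_months(monthly))
```

The trade-off is that a trip would be routed against a schedule up to a couple of months off from its actual date, which may or may not matter depending on how often the agency changes service.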
The consolidated status of this objective is in the README.md of my e-mission-common fork:
The most crucial aspect is using historical GTFS data. Right now that data lives on AWS servers belonging to OpenMobilityData. The website has a banner on its front page declaring that it is deprecated. I am hoping that this data does not disappear, because it is quite crucial. It would be ideal, upon returning to this objective, to save a JSON of all the agencies, as in:
We scrape because OpenMobilityData no longer gives out API keys. The URL value in the key-value pair (taken from above) is needed, and the existing logic in https://github.com/jpfleischer/e-mission-common/blob/master/scripts/otp/scrape.ipynb takes care of the rest. Ideally we could get more agencies for Colorado, but as of now, we only use RTD Denver. |
There are two main components to predicting mode choice with a choice model:
With these factors we can predict that Abby would choose e-bike (-25) and without the e-bike would choose car(-95) but wouldn't choose walk (-130), approximately.
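One way to read those numbers is as utilities in a multinomial logit, where the highest-utility alternative is the most likely choice. The sketch below is only an illustration under that assumption: the `scale` parameter and the function name are hypothetical, and the utilities are the approximate values quoted above.

```python
import math


def choice_probabilities(utilities, scale=1.0):
    """Multinomial logit probabilities from a {mode: utility} mapping.

    `scale` is a hypothetical sensitivity parameter; larger values make
    the highest-utility mode dominate more strongly.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    best = max(utilities.values())
    exp_u = {mode: math.exp(scale * (u - best)) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: value / total for mode, value in exp_u.items()}


# Approximate utilities from the example above.
all_modes = {"e-bike": -25, "car": -95, "walk": -130}
with_ebike = choice_probabilities(all_modes, scale=0.05)
print(max(with_ebike, key=with_ebike.get))        # most likely choice

# Remove the e-bike to ask what would have been chosen instead.
no_ebike = {m: u for m, u in all_modes.items() if m != "e-bike"}
without_ebike = choice_probabilities(no_ebike, scale=0.05)
print(max(without_ebike, key=without_ebike.get))  # replaced mode
```

Dropping the mode of interest from the alternatives and re-evaluating is what gives the replaced (2nd choice) mode.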
As a baseline, we want to build a logistic regression model, since that is what is most commonly used in research and planning to model mode choice (i.e. what would the ridership returns on this transit investment be?).
We have ground truth data about 2nd choice modes, through the replaced mode collected by programs that have a mode of interest, often e-bike. This is used to show the impact of the mode of interest, through things like emissions savings/reductions, which we map on the public dashboard.
To build up the alternatives, we'll need a few different pieces of data, which could be complex to figure out:
@shankari @jpfleischer for visibility