Analyzing the latency in Vehicle Position Messages #1157

Open
wants to merge 5 commits into
base: main


@fsalemi fsalemi commented Jun 25, 2024

GTFS-RT Vehicle Position Latency Research Project

The objective of this research project is to examine the current latency of vehicle position data. Latency is defined here as the time elapsed between a transit vehicle obtaining a GPS reading and the Cal-ITP data pipeline receiving the response that reports it.
As per the California Transit Data Guidelines, the recommended latency is as follows:
“Updates should be published to the Trip Updates and Vehicle Positions feeds at least once every 20 seconds, including updated timestamps and data for each trip and vehicle in service.”
Preliminary analysis across all feeds indicates that most fall significantly short of this expectation. This research project aims to quantify the extent of the latency issues, identify any patterns, and evaluate the industry's current capacity to improve latency.
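As an illustration of the latency definition above, here is a minimal Python sketch. It is not the code from this PR's notebooks. The field names `vehicle_timestamp` and `received_at` are hypothetical stand-ins for whatever columns hold the vehicle's GPS timestamp and the pipeline's receipt time, and the 20-second threshold is simply the figure quoted from the guidelines.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def latency_seconds(vehicle_timestamp, received_at):
    """Seconds between the vehicle's GPS reading and receipt by the pipeline."""
    return (received_at - vehicle_timestamp).total_seconds()


def summarize(messages, threshold_seconds=20.0):
    """Summarize latency over a non-empty list of messages.

    Each message is a dict with the hypothetical keys
    'vehicle_timestamp' and 'received_at' (timezone-aware datetimes).
    """
    latencies = [
        latency_seconds(m["vehicle_timestamp"], m["received_at"])
        for m in messages
    ]
    over = sum(1 for value in latencies if value > threshold_seconds)
    return {
        "count": len(latencies),
        "median_seconds": median(latencies),
        "share_over_threshold": over / len(latencies),
    }


# Made-up sample data: three messages with latencies of 5, 30 and 65 seconds.
reading = datetime(2024, 6, 25, 12, 0, 0, tzinfo=timezone.utc)
sample = [
    {"vehicle_timestamp": reading, "received_at": reading + timedelta(seconds=5)},
    {"vehicle_timestamp": reading, "received_at": reading + timedelta(seconds=30)},
    {"vehicle_timestamp": reading, "received_at": reading + timedelta(seconds=65)},
]
summary = summarize(sample)
print(summary)
```

Reporting the median alongside the share of messages over the threshold keeps a handful of very stale readings from dominating the summary.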

Implementation
Several tables from cal-itp-data-infra were used in preparing this project. However, the main table on which most of the analysis is based is fct_vehicle_positions_messages.
Due to the magnitude of this dataset, the Metabase app could not be used with this table, necessitating the execution of complex queries in BigQuery to extract the required data. We are collaborating with the Data Engineering team to develop new models and tables for the data warehouse. This will support vehicle position analysis and any future reporting needs for ongoing monitoring of vehicle position data, eliminating the need for complex queries each time. A ticket has been opened for this issue: see Vehicle Position Data Modelling #3408.
The project was coded primarily in two languages: R and Python. Most of the plots were created in R using the Tidyverse packages. However, the lack of the RStudio IDE in the Cal-ITP cloud and limitations of R coding within the Jupyter environment required us to install additional packages, which delayed the project's completion. We are currently working on installing RStudio on the Cal-ITP cloud to rectify this issue.
The second part of the project, which involved creating a geographical map of vehicle position latencies, was written in Python. This decision was made due to issues we encountered with installing the Leaflet R package in Jupyter.
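The PR text does not describe how the map was built, so the following is only a sketch of one way latencies could be aggregated spatially before plotting. The record keys (`lat`, `lon`, `latency_seconds`) and the 0.01-degree cell size are assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def grid_cell(lat, lon, cell_degrees=0.01):
    """Map a coordinate to an integer (row, column) grid cell index."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))


def mean_latency_by_cell(records, cell_degrees=0.01):
    """Average latency per grid cell.

    Each record is a dict with the hypothetical keys
    'lat', 'lon' and 'latency_seconds'.
    """
    buckets = defaultdict(list)
    for record in records:
        cell = grid_cell(record["lat"], record["lon"], cell_degrees)
        buckets[cell].append(record["latency_seconds"])
    return {cell: mean(values) for cell, values in buckets.items()}


# Made-up sample data: two nearby points and one far away.
records = [
    {"lat": 34.052, "lon": -118.243, "latency_seconds": 10.0},
    {"lat": 34.057, "lon": -118.248, "latency_seconds": 30.0},
    {"lat": 37.774, "lon": -122.419, "latency_seconds": 50.0},
]
cells = mean_latency_by_cell(records)
print(cells)
```

Each cell's mean could then be passed to a mapping library as a colored marker or tile. Aggregating first keeps the number of plotted features small when the underlying table is very large.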
The plots from this project were used to prepare a PowerPoint presentation, which can be found here.

Next step
Efforts to reduce latency are ongoing. In the second phase of this project, which is under way as this report is published, we are reaching out to transit RT vendors about further remediation.

github-actions bot commented Aug 7, 2024

nbviewer URLs for impacted notebooks:

Member

@evansiroky evansiroky left a comment

@fsalemi feel free to merge this whenever you're ready!

2 participants