Orion Trip Generator
Eddy Ionescu edited this page Jul 12, 2018
·
21 revisions
- takes in JSON files from S3 (API output) and generates trips from them
What is a trip?
- A trip is identified by the tuple (vehicle_id, route_id, direction_id), which stays the same across consecutive transit states.
- If this tuple is different in the next state, then we end the trip. If we see a tuple that we didn't see in the previous state, we start a new trip.
- The start/end times should be based on the timestamps of the raw data files being used (not on when the trip is actually being made).
- When a trip ends, we write it to a JSON file in S3 (agency_id/route_id/direction_id/start_year/start_month/start_day/start_hour/vehicle_id/start_epoch_utc/end_epoch_utc.json, where epoch is a UTC timestamp in seconds). We can worry about compressing it or loading it into a database later.
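As a rough illustration of the key layout described above, here is a minimal Python sketch. The function name `trip_s3_key`, the zero-padding of month/day/hour, and the example IDs are assumptions for illustration only; the page does not specify them.

```python
from datetime import datetime, timezone


def trip_s3_key(agency_id, route_id, direction_id, vehicle_id,
                start_epoch_utc, end_epoch_utc):
    """Build the S3 key for a finished trip (hypothetical helper).

    Both epochs are UTC timestamps in seconds. The year/month/day/hour
    path segments are derived from the trip's start time.
    """
    start = datetime.fromtimestamp(start_epoch_utc, tz=timezone.utc)
    return "/".join([
        str(agency_id),
        str(route_id),
        str(direction_id),
        f"{start.year:04d}",
        f"{start.month:02d}",   # zero-padding is an assumption
        f"{start.day:02d}",
        f"{start.hour:02d}",
        str(vehicle_id),
        str(start_epoch_utc),
        f"{end_epoch_utc}.json",
    ])


# Example with made-up IDs; 1531382405 is 2018-07-12 08:00:05 UTC.
print(trip_s3_key("sf-muni", "14", "1", "5501", 1531382405, 1531384000))
```

Deriving the date segments in UTC keeps the path consistent with the epoch values in the file name, regardless of the time zone of the machine running the generator.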
What is a state?
- It's a timestamped snapshot of where all Muni vehicles are.
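The start/end rule for trips can be sketched as a set comparison between two consecutive states. This is only a sketch: the function names and the per-vehicle field names (`vid`, `route`, `direction`, borrowed from the output format on this page) are assumptions about what the raw state looks like.

```python
def trip_key(vehicle):
    """Tuple that identifies a trip (field names are assumed)."""
    return (vehicle["vid"], vehicle["route"], vehicle["direction"])


def diff_states(previous_keys, current_vehicles):
    """Compare the previous state's trip keys with the current state.

    Returns (started, ended, continuing) as sets of trip keys:
    - started: keys seen now but not in the previous state
    - ended: keys seen in the previous state but not now
    - continuing: keys present in both states
    """
    current_keys = {trip_key(v) for v in current_vehicles}
    started = current_keys - previous_keys
    ended = previous_keys - current_keys
    continuing = previous_keys & current_keys
    return started, ended, continuing
```

Note that a vehicle changing direction shows up as one ended trip and one started trip in the same step, which matches the rule that any change in the tuple ends the trip.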
Why?
- Our goal is to get the speed & reliability of routes.
- An easy way of approaching this problem is to look at individual trips and then get data about them along route segments.
- Storing trip metrics also makes more complex analysis quicker to handle later on, since the metrics are already aggregated.
- Our goal is to make Muni's GPS data open and accessible to use.
- Making trip data accessible via S3 keeps individual file sizes small and is straightforward & logical for open-data users to retrieve.
What the output will look like:
agency:
startTime:
endTime:
route:
direction:
vid:
states: [{
vtime:
lat:
lon:
}]
(json)
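For illustration, a trip in this shape could be assembled and serialized roughly as follows. The function name `build_trip_document` is hypothetical, and the value types (strings for IDs, epoch seconds for times) are assumptions, since the outline above only lists the field names.

```python
import json


def build_trip_document(agency, route, direction, vid,
                        start_time, end_time, states):
    """Assemble a trip in the output shape listed above.

    `states` is an iterable of dicts with vtime, lat and lon.
    start_time/end_time are passed in explicitly because they come
    from the raw file timestamps, not from the vehicle positions.
    """
    return {
        "agency": agency,
        "startTime": start_time,
        "endTime": end_time,
        "route": route,
        "direction": direction,
        "vid": vid,
        # Keep only the three per-state fields named in the outline.
        "states": [
            {"vtime": s["vtime"], "lat": s["lat"], "lon": s["lon"]}
            for s in states
        ],
    }


# Example with made-up values.
trip = build_trip_document(
    "sf-muni", "14", "1", "5501", 1531382405, 1531384000,
    [{"vtime": 1531382405, "lat": 37.77, "lon": -122.42}],
)
print(json.dumps(trip, indent=2))
```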
How it'll work:
- how it'll persist state:
- each time it goes to the next state, it dumps its state
- Read raw vehicle data from S3 on first startup and create trips (unique tuples in memory)
- Write trips to a state file on disk (or S3?) on first startup
- When a new raw vehicle file is put in S3, publish an SQS message that orion-trip-generator is listening for
- orion-trip-generator consumes the SQS message, which triggers:
- updating trips in memory (states array)
- writing trips that no longer exist to the agency trip S3 bucket and removing them from memory
- adding new trips to memory
- writing the current trips state to the state file
- all of the above should be atomic
- if orion-trip-generator restarts, it gets its state from the state file
- if the state file doesn't exist, it reads the latest raw file to get its state
- If there is a lag between raw files being published (hours? days?), this could lead to trips that appear to have taken too long to complete.
- If the trip generator crashes and is restarted after hours or days, the same issue as above could occur (trips that appear to take days to complete)
- Should we ignore writing trips that last longer than a certain period of time?
- Should we still write suspect trips but mark them as possibly invalid?
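One possible way to handle the state file and the suspect-trip question is sketched below, assuming the state file lives on local disk. All names here (`save_state`, `load_state`, `flag_suspect`, the `suspect` field) and the 4-hour threshold are hypothetical, and trip times are assumed to be epoch seconds. The sketch only makes the state file write atomic; it does not make the S3 writes and the in-memory update atomic as a whole.

```python
import json
import os
import tempfile

# Hypothetical cutoff; the page leaves the actual period open.
MAX_TRIP_SECONDS = 4 * 60 * 60


def save_state(trips, path):
    """Write the current trips to `path` via a temp file and rename.

    os.replace swaps the file in one step, so a crash mid-write
    leaves the previous state file intact rather than a partial one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(trips, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path):
    """Return the saved trips, or None if there is no state file yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def flag_suspect(trip, max_seconds=MAX_TRIP_SECONDS):
    """Return a copy of the trip marked as suspect if it ran too long."""
    flagged = dict(trip)
    flagged["suspect"] = (trip["endTime"] - trip["startTime"]) > max_seconds
    return flagged
```

When `load_state` returns None, the caller would fall back to reading the latest raw file, as described above. Flagging rather than dropping long trips keeps the raw data available while letting consumers filter out trips that were probably stretched by a publishing lag or a restart.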