Determine which trips have associated user input #37

Open
shankari opened this issue Jun 5, 2021 · 0 comments

Right now, there are pipeline stages that combine the user input with the cleaned trip objects to create confirmed trips.
A sample confirmed trip object from the master/ceo_ebike_program branch looks like this:

{'_id': ObjectId('606b8bf0c77a1ff9e630f422'),
 'user_id': UUID('d4376620-fbcd-4aab-95bf-8c2e0ecf9adf'),
 'metadata': {'key': 'analysis/confirmed_trip',
 'platform': 'server',
 'write_ts': 1617660912.6729634,
 'time_zone': 'America/Los_Angeles',
 'write_local_dt': {'year': 2021, 'month': 4, 'day': 5, 'hour': 15, 'minute': 15, 'second': 12, 'weekday': 0, 'timezone': 'America/Los_Angeles'},
 'write_fmt_time': '2021-04-05T15:15:12.672963-07:00'},
 'data': {'source': 'DwellSegmentationTimeFilter',
 'end_ts': 1617659216.0,
 'end_local_dt': {'year': 2021, 'month': 4, 'day': 5, 'hour': 14, 'minute': 46, 'second': 56, 'weekday': 0, 'timezone': 'America/Los_Angeles'},
 'end_fmt_time': '2021-04-05T14:46:56-07:00',
 'end_loc': {'type': 'Point', 'coordinates': [-122.0867274, 37.3911479]},
 'raw_trip': ObjectId('606b8bf0c77a1ff9e630f3e7'),
 'start_ts': 1617658219.0,
 'start_local_dt': {'year': 2021, 'month': 4, 'day': 5, 'hour': 14, 'minute': 30, 'second': 19, 'weekday': 0, 'timezone': 'America/Los_Angeles'},
 'start_fmt_time': '2021-04-05T14:30:19-07:00',
 'start_loc': {'type': 'Point', 'coordinates': [-122.0870928, 37.390054]},
 'duration': 997.0,
 'distance': 2458.7832149780197,
 'start_place': ObjectId('606b8bf0c77a1ff9e630f41a'),
 'end_place': ObjectId('606b8bf0c77a1ff9e630f41b'),
 'cleaned_trip': ObjectId('606b8bf0c77a1ff9e630f3f1'),
 'user_input': {'mode_confirm': 'bike', 'purpose_confirm': 'pick_drop', 'replaced_mode': 'drove_alone'}}}

After #23, which includes setnames(gsub("user_input", "", names(.))) %>%, the mode_confirm, purpose_confirm and replaced_mode entries automagically show up in the trip table.

But now I want to have a column in the participant table with the number of unlabeled trips.
For now, I have implemented this (in my fork only):

.[is.na(mode_confirm), .(unconfirmed = .N), by = user_id]

shankari@735a93b#diff-4e87ef70cdab2756b9d4aa419fd97ffea60f2ec8d9592573ffee2e4ab33dcf53R162-R165

But not everybody is going to use the mode_confirm object. In particular, the travel survey folks will potentially want a more complex object.

In Python, I do ct_df_confirmed = ct_df[ct_df.user_input != {}], which does not use any field names, but I don't know how to implement a similar check in R.
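
To pin down the behaviour being asked for, here is a minimal Python sketch of a field-name-free check, run over raw trip documents with the same nesting as the sample object rather than over a dataframe. The helper names has_user_input and count_unlabeled and the toy trips are hypothetical, made up for illustration. It also assumes that a missing or None user_input should count as unlabeled, which goes slightly beyond the `!= {}` comparison.

```python
from collections import Counter


def has_user_input(trip):
    """Return True if the confirmed trip carries any user input at all."""
    user_input = trip.get("data", {}).get("user_input")
    # Missing, None, or an empty dict all count as unlabeled (an assumption);
    # no key names such as mode_confirm are inspected.
    return isinstance(user_input, dict) and len(user_input) > 0


def count_unlabeled(trips):
    """Map user_id -> number of trips without any user input."""
    return Counter(t["user_id"] for t in trips if not has_user_input(t))


# Made-up trips; the ids and the survey-style entry are placeholders.
trips = [
    {"user_id": "u1", "data": {"user_input": {"mode_confirm": "bike"}}},
    {"user_id": "u1", "data": {"user_input": {}}},
    {"user_id": "u2", "data": {"user_input": {"trip_survey": {"q1": "yes"}}}},
    {"user_id": "u2", "data": {"user_input": {}}},
    {"user_id": "u2", "data": {}},
]

unlabeled = count_unlabeled(trips)
print(dict(unlabeled))  # {'u1': 1, 'u2': 2}
```

Because the check only asks whether user_input is a non-empty dict, a survey-style object with arbitrary keys counts as labeled. An R version would need the same property: test the user_input entry as a whole rather than a specific column such as mode_confirm.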

@asiripanich you asked me to file an issue with the details, and I did 😄
