Allow configurable rules to flag entries #50

shankari · 2021-07-14T23:26:26Z

Concrete example:

Flag participants who are not contributing data correctly, and give hints about why.

Example rules:

- Apple phone (iOS), no app communication, no data upload: most likely the "always" permission is turned off and the app has been closed/force-killed
- Android 11, recent app communication, but no data upload: most likely the "always" permission is turned off
- Both OSes, no recent app communication: token missing or not configured properly?
- Samsung phones acting weirdly
  - Android < 11, communication, but no data: likely aggressive power savings (https://dontkillmyapp.com/)
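The example rules above could be written as a plain if/then chain. This is a minimal Python sketch, not the dashboard's code (which the thread plans to write in R): the `Participant` fields are hypothetical, rules are ordered from most to least specific so the first match wins, and "Android 11" is read literally as major version 11.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical per-participant summary; field names are illustrative only.
    os: str                     # "ios" or "android"
    os_major: int               # major OS version, e.g. 11
    make: str                   # manufacturer, e.g. "samsung"
    recent_communication: bool
    recent_upload: bool


def hint(p: Participant) -> Optional[str]:
    """Return a troubleshooting hint, or None if no rule matches."""
    # Most specific rule first: iOS with neither communication nor upload.
    if p.os == "ios" and not p.recent_communication and not p.recent_upload:
        return "likely 'always' permission off and app force-killed"
    # Any OS with no recent communication.
    if not p.recent_communication:
        return "token missing or not configured properly?"
    # From here on the app is communicating; check for missing uploads.
    if p.os == "android" and p.os_major == 11 and not p.recent_upload:
        return "likely 'always' permission off"
    if (p.os == "android" and p.make == "samsung" and p.os_major < 11
            and not p.recent_upload):
        return "likely aggressive power savings, see https://dontkillmyapp.com/"
    return None
```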

shankari · 2021-07-14T23:28:51Z

I came up with these rules by:

- defining a "valid" user as one who has a recent trip, recent communication, and a recent upload
- finding invalid users
- grouping them by phone make and version
- contacting the users to find out what the problem was

It seems like some of these steps (e.g. finding invalid users and grouping them) could be done automatically, and are related to data analytics?
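The first two steps can be automated. A minimal Python sketch, assuming a hypothetical per-participant summary with boolean `recent_trip`/`recent_communication`/`recent_upload` flags: `is_valid` encodes the "valid user" definition above, and `group_invalid` counts invalid users per (make, OS version) so the largest groups can be looked at first.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Participant:
    # Hypothetical fields; the real participant table may differ.
    user_id: str
    make: str
    os_version: str
    recent_trip: bool
    recent_communication: bool
    recent_upload: bool


def is_valid(p: Participant) -> bool:
    # "Valid" = has a recent trip, recent communication and a recent upload.
    return p.recent_trip and p.recent_communication and p.recent_upload


def group_invalid(participants: Iterable[Participant]) -> "Counter[Tuple[str, str]]":
    """Count invalid participants per (make, OS version) pair."""
    return Counter((p.make, p.os_version) for p in participants if not is_valid(p))
```

`group_invalid(...).most_common()` then lists the groups from largest to smallest.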

shankari · 2021-07-14T23:31:38Z

Another idea is to define simple rules similar to IFTTT and/or Drools (https://www.drools.org/).
These allow end users to define rules that automatically tag users or trips based on their needs.
Unfortunately, R doesn't seem to have a built-in rule engine component.

Python has a couple of options, plus "formulas", which lets you define rules similar to Excel formulas.
For now, in R, we would probably just write if/then statements, but we should put them into a separate file called rules.R.
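As a sketch of the IFTTT-style idea (not Drools, and not an existing R or Python package), rules can be kept as data in their own module, separate from the code that evaluates them, so new rules can be added without touching the evaluator. The `Rule` type, field names and tag strings below are hypothetical; unlike a first-match if/then chain, `apply_rules` returns every tag whose condition holds.

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    name: str
    when: Callable[[dict], bool]  # the "if" part
    then: str                     # the "then" part: a tag to attach


# Hypothetical contents of a separate rules module (rules.R in the R dashboard).
RULES: List[Rule] = [
    Rule("no_comm",
         lambda p: not p["recent_communication"],
         "check token configuration"),
    Rule("android11_no_upload",
         lambda p: (p["os"] == "android" and p["os_major"] == 11
                    and p["recent_communication"] and not p["recent_upload"]),
         "check 'always' location permission"),
]


def apply_rules(participant: dict, rules: List[Rule] = RULES) -> List[str]:
    """Return the tags of all rules whose condition holds for this participant."""
    return [r.then for r in rules if r.when(participant)]
```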

shankari · 2021-07-14T23:33:35Z

So the automated part can help us come up with the "if" part of the rules, but I can't figure out how to automate the "then" part. Presumably, the "then" part will be some kind of status stored in a column of the participant table; for an Android 11 user without any data upload, it would say "check always location permission".
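One way to produce that status column, sketched in Python under assumptions: the participant table is a list of dicts with hypothetical field names, validity follows the "valid user" definition above, each rule is a (condition, status) pair, and invalid users that no rule matches get "UNKNOWN".

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical (condition, status) pair: the condition is the "if", the status the "then".
StatusRule = Tuple[Callable[[Dict], bool], str]

RULES: List[StatusRule] = [
    (lambda p: p["os"] == "android" and p["os_major"] == 11 and not p["recent_upload"],
     "check always location permission"),
]


def is_valid(p: Dict) -> bool:
    return p["recent_trip"] and p["recent_communication"] and p["recent_upload"]


def add_status_column(table: List[Dict], rules: List[StatusRule] = RULES) -> List[Dict]:
    """Return a copy of the participant table with a 'status' column filled in."""
    out = []
    for row in table:
        if is_valid(row):
            status = "valid"
        else:
            # First matching rule wins; otherwise the status is unknown.
            status = next((s for cond, s in rules if cond(row)), "UNKNOWN")
        out.append({**row, "status": status})  # copy; the input row is not modified
    return out
```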

shankari · 2021-07-14T23:35:04Z

So I would say: the if statements in R will definitely work and are the easy solution. If you want to experiment with ML and have the time, clustering would be cool too.

shankari · 2021-07-15T00:16:10Z

@allenmichael099 I am writing out a rough outline of a full ML-based system for determining users whose data collection is not working successfully. You do not have to implement all of this. You can pick and choose the parts that are interesting to you; we will have to have a second round of improvements anyway and can roll in any pending tasks then.

- Add a "status" or "reason" field to the participant dashboard. You may store this in a file or a DB table, or just compute it while loading.
- In the bottom right-hand corner of the dashboard, display the current distribution: the number of "valid" users, and the number of users with each status/reason.
- Determine the validity and status of each user.
  - Validity is hardcoded, as in the comment above
  - Status/reason can either be known (if it fits the existing model) or UNKNOWN
    - Hardcode the rules as well (easy), or
    - Determine the clusters using ML
      - Extract the features that mark the user as invalid, along with additional features such as phone make and model
      - Learn a supervised/unsupervised "cluster + labels" model from the "invalid" entries that have a status
      - Predict the status for other invalid entries, setting it to "UNKNOWN" if it cannot be predicted
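For the "cluster + labels" step, one simple supervised option (swapped in here for illustration; the thread does not specify a model) is 1-nearest-neighbour: copy the status of the closest labelled invalid entry, and fall back to "UNKNOWN" when nothing labelled is close enough. The feature encoding and the `max_dist` threshold are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Features = Sequence[float]


def featurize(p: dict) -> Tuple[float, ...]:
    # Hypothetical encoding: OS, major version, and the validity signals.
    return (
        1.0 if p["os"] == "android" else 0.0,
        float(p["os_major"]),
        1.0 if p["recent_communication"] else 0.0,
        1.0 if p["recent_upload"] else 0.0,
    )


def predict_status(x: Features, labeled: List[Tuple[Features, str]],
                   max_dist: float = 1.0) -> str:
    """1-nearest-neighbour: return the status of the closest labelled entry,
    or "UNKNOWN" if no labelled entry is within max_dist (Euclidean)."""
    best: Optional[Tuple[float, str]] = None
    for feats, status in labeled:
        d = math.dist(x, feats)
        if best is None or d < best[0]:
            best = (d, status)
    if best is None or best[0] > max_dist:
        return "UNKNOWN"
    return best[1]
```

Because `os_major` is unscaled it dominates the distance here; a real version would normalize features first. Entries far from every labelled example come back as "UNKNOWN", which is what would surface a new failure mode.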

Note that the automated approach:

- will require an editable table that allows the program manager to specify the status
- will require us to save the status where known
  - both of these are required to build the data-driven model
- will allow us to detect when something new is causing data collection to break (e.g. iOS 15)
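Saving the statuses the program manager enters could be as simple as a two-column CSV. A minimal sketch using Python's `csv` module; the column names and the user_id to status mapping are assumptions, and a DB table would work just as well.

```python
import csv
from typing import Dict, TextIO


def write_statuses(f: TextIO, statuses: Dict[str, str]) -> None:
    """Write known user_id -> status labels (e.g. entered by the program manager) as CSV."""
    writer = csv.writer(f)
    writer.writerow(["user_id", "status"])
    for user_id, status in sorted(statuses.items()):
        writer.writerow([user_id, status])


def read_statuses(f: TextIO) -> Dict[str, str]:
    """Read the labels back, e.g. as training data for a data-driven model."""
    return {row["user_id"]: row["status"] for row in csv.DictReader(f)}
```

When using a real file, open it with `newline=""` as the `csv` documentation recommends.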

This also suggests a graduated approach to implementing the feature:

- implement everything with hardcoded rules
- if time and energy permit, replace the hardcoded rules with a data-driven approach
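The graduated approach can keep a single entry point while the internals change: hardcoded rules are tried first, and an optional model (any callable returning a status, e.g. a trained predictor) is consulted only when no rule matches. This Python sketch assumes each participant dict carries a precomputed, hypothetical `valid` flag; `status_distribution` produces the per-status counts that the dashboard summary would display.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Rule = Tuple[Callable[[Dict], bool], str]
Model = Callable[[Dict], str]  # returns a status, or "UNKNOWN"


def classify(p: Dict, rules: List[Rule], model: Optional[Model] = None) -> str:
    """Stage 1: hardcoded rules. Stage 2 (optional): fall back to a data-driven model."""
    if p["valid"]:
        return "valid"
    for cond, status in rules:
        if cond(p):
            return status
    return model(p) if model is not None else "UNKNOWN"


def status_distribution(participants: Iterable[Dict], rules: List[Rule],
                        model: Optional[Model] = None) -> Counter:
    """Count participants per status, e.g. for the dashboard summary."""
    return Counter(classify(p, rules, model) for p in participants)
```

Swapping in the data-driven approach later only means passing a `model`; callers of `status_distribution` do not change.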

asiripanich · 2021-07-15T00:52:19Z

@shankari Thanks for planning out this feature so well. :) I love the idea.

asiripanich · 2021-07-24T10:00:11Z

Track the progress of this feature here: https://github.com/asiripanich/emdash.rules
