Allow configurable rules to flag entries #50

Open
shankari opened this issue Jul 14, 2021 · 7 comments

Comments

@shankari
Collaborator

Concrete example:

  • flag participants who are not contributing data correctly and give hints on why

Example rules:

  • Apple phone, no app communication, no data upload: Most likely have the “always” permission turned off and have “closed”/force-killed the app
  • Android 11, recent app communication, but no data upload: Most likely have "always" permission turned off
  • Both OSes, no recent app communication: token missing or not configured properly?
  • Samsung phone acting weirdly
@shankari
Collaborator Author

I came up with these rules basically after

  • defining a "valid" user as (has recent trip, has recent communication, has recent upload)
  • finding invalid users
  • grouping them by phone make and version
  • contacting the users to see what the problem was

It seems like some of these parts (e.g. finding invalid users and grouping them) can be done automatically and are related to data analytics?
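
A minimal sketch of the automatable steps above (marking users valid or invalid, then grouping the invalid ones by phone make and version). Python is used only for illustration; the field names (`last_trip`, `last_communication`, `last_upload`, `make`, `os_version`) and the 7-day meaning of "recent" are assumptions, not something the dashboard is known to define.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption: "recent" means within the last 7 days.
RECENT = timedelta(days=7)


def is_recent(ts, now):
    """True if the timestamp exists and falls within the RECENT window."""
    return ts is not None and now - ts <= RECENT


def is_valid(user, now):
    """A "valid" user has a recent trip, recent communication, and a recent upload."""
    return all(
        is_recent(user.get(key), now)
        for key in ("last_trip", "last_communication", "last_upload")
    )


def group_invalid(users, now):
    """Count invalid users by (phone make, OS version)."""
    return Counter(
        (u["make"], u["os_version"]) for u in users if not is_valid(u, now)
    )


# Made-up example records, for illustration only.
now = datetime(2021, 7, 14)
users = [
    {"make": "Apple", "os_version": "14",
     "last_trip": now - timedelta(days=1),
     "last_communication": now - timedelta(days=1),
     "last_upload": now - timedelta(days=1)},
    {"make": "Samsung", "os_version": "11",
     "last_trip": None,
     "last_communication": now - timedelta(days=2),
     "last_upload": None},
    {"make": "Samsung", "os_version": "11",
     "last_trip": now - timedelta(days=30),
     "last_communication": now - timedelta(days=1),
     "last_upload": now - timedelta(days=30)},
]
invalid_groups = group_invalid(users, now)
print(invalid_groups.most_common())
```

The largest groups in the output are the ones worth contacting first; that last step (asking the users what went wrong) stays manual.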

@shankari
Collaborator Author

Another idea is to define simple rules similar to IFTTT and/or Drools (https://www.drools.org/).
These allow end users to define rules to automatically tag users or trips based on their needs.
Unfortunately, R doesn't seem to have a rule engine component built in

Python has two rule engines, plus "formulas", which lets you define rules similar to Excel formulas.
For now, in R, we would probably want to just write if/then statements, but we should put them into a separate file called rules.R

@shankari
Collaborator Author

So the automated part can help us come up with the "if" part of the rules, but I can't figure out how to automate the "then" part. Presumably, the "then" part will be some kind of status stored as a column in the participant table; for an Android 11 user without any data upload, it would say "check always location permission".
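
A rough sketch of what hardcoded "if/then" rules producing a status column could look like, based on the example rules listed earlier in this issue. It is written in Python for illustration; in R the same logic would be plain if/then statements in rules.R. The field names (`os`, `os_version`, `recent_communication`, `recent_upload`), the status strings other than "check always location permission", and the rule ordering are all assumptions.

```python
# Each rule is (condition, status). Rules are checked in order, so the more
# specific iOS rule comes before the generic "no recent communication" rule.
RULES = [
    (lambda u: u["os"] == "ios"
               and not u["recent_communication"]
               and not u["recent_upload"],
     "check always location permission; app may have been force-killed"),
    (lambda u: u["os"] == "android"
               and u["os_version"] == "11"
               and u["recent_communication"]
               and not u["recent_upload"],
     "check always location permission"),
    (lambda u: not u["recent_communication"],
     "check token configuration"),
]


def status_for(user):
    """Return the status of the first matching rule, or "UNKNOWN" if none match."""
    for condition, status in RULES:
        if condition(user):
            return status
    return "UNKNOWN"


# Made-up invalid users, for illustration only.
android_user = {"os": "android", "os_version": "11",
                "recent_communication": True, "recent_upload": False}
ios_user = {"os": "ios", "os_version": "14",
            "recent_communication": False, "recent_upload": False}
print(status_for(android_user))
print(status_for(ios_user))
```

Keeping the rules in one list means a new rule is a one-line addition, and anything that matches no rule surfaces as "UNKNOWN" instead of being silently dropped.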

@shankari
Collaborator Author

So I would say: the if statements in R will definitely work and are the easy solution. If you want to experiment with ML and have the time, clustering would be cool too.

@shankari
Collaborator Author

shankari commented Jul 15, 2021

@allenmichael099 I am writing out a rough outline of a full ML-based system for determining users whose data collection is not working successfully. You do not have to implement all of this. You can pick and choose the parts that are interesting to you; we will have to have a second round of improvements anyway and can roll in any pending tasks then.

  • Add a "status" or REASON field into the participant dashboard. You may store this in a file or a DB table, or just compute it while loading
  • In the bottom right-hand corner of the dashboard, display the current distribution: "valid" users, and users with each status/reason
  • Determine the validity and status of each user.
    • Validity is hardcoded as in the comment above
    • Status/reason can either be known (if it fits the existing model) or UNKNOWN
      • Hardcode the rules as well (easy), or
      • Determine the clusters using ML
        • Extract the features which mark the user as invalid, along with additional features such as phone make and model
        • Learn a supervised/unsupervised "cluster + labels" model from the "invalid" entries that have status
        • Predict the status for other invalid entries, setting to "UNKNOWN" if not possible to predict
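
To make the last three steps concrete, here is a deliberately simple sketch: a majority vote over exact feature matches, which is a much simpler stand-in for the clustering model described above, not an implementation of it. The feature tuple (make, OS version, recent communication, recent upload), the `min_share` threshold, and the example statuses are assumptions for illustration.

```python
from collections import Counter, defaultdict


def fit(labeled):
    """Map each feature tuple to the counts of statuses recorded for it."""
    model = defaultdict(Counter)
    for features, status in labeled:
        model[features][status] += 1
    return model


def predict(model, features, min_share=0.6):
    """Return the dominant status for these features, or "UNKNOWN".

    "UNKNOWN" is returned when the features were never seen, or when no
    single status accounts for at least min_share of the labeled entries.
    """
    counts = model.get(features)
    if not counts:
        return "UNKNOWN"
    status, n = counts.most_common(1)[0]
    return status if n / sum(counts.values()) >= min_share else "UNKNOWN"


# Made-up labeled entries: (make, os_version, recent_comm, recent_upload).
labeled = [
    (("Google", "11", True, False), "check always location permission"),
    (("Google", "11", True, False), "check always location permission"),
    (("Apple", "14", False, False), "app force-killed"),
    (("Apple", "14", False, False), "check token configuration"),
]
model = fit(labeled)
print(predict(model, ("Google", "11", True, False)))
print(predict(model, ("Apple", "14", False, False)))    # tie between statuses
print(predict(model, ("Samsung", "10", True, False)))   # never seen
```

A growing share of "UNKNOWN" predictions for one make or OS version would be the signal that something new is breaking data collection.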

Note that the automated approach:

  • will require an editable table that allows the program manager to specify the status
  • will require us to save the status where known
    • both of these are required to build the data-driven model
  • will allow us to determine when something new is causing the data collection to break (e.g. iOS 15)

This also suggests a graduated approach to implementing the feature.

  1. implement everything with hardcoded rules
  2. if time and energy permit, replace the hardcoded rules with a data-driven approach.

@asiripanich
Owner

@shankari Thanks for planning out this feature so well. :) I love the idea.

@asiripanich
Owner

Track the progress of this feature here: https://github.com/asiripanich/emdash.rules
