Feature: Query authors and respondents #18 #30
#### `zulip_report2_thread_lengths.csv`
TODO

#### `zulip_report3_users.csv`
Added a "codebook" section to the README.md. I added a codebook for the most recent report, but I need to do so for the rest of the outputs. See: #31
@@ -9,6 +9,8 @@
3. The Zulip chat we're querying: https://chat.fhir.org/#
4. Category keywords google sheet:
https://docs.google.com/spreadsheets/d/1OB0CEAkOhVTN71uIhzCo_iNaiD1B6qLqL7uwil5O22Q/edit#gid=1136391153
5. User roles google sheet:
@rohaher Just FYI, Davera and I decided to open a new tab on the Google Sheet, where she'll put the "user" -> "HL7 organization role" mappings.
@@ -46,11 +48,15 @@
'zuliprc_path': os.path.join(ENV_DIR, '.zuliprc'), # rc = "runtime config"
'chat_stream_name': 'terminology',
'num_messages_per_query': 1000,
'outpath_report1': os.path.join(PROJECT_DIR, 'zulip_report1_counts.csv'),
'outpath_report2': os.path.join(PROJECT_DIR, 'zulip_report2_thread_lengths.csv'),
'outpath_user_info': os.path.join(PROJECT_DIR, 'zulip_user_info.csv'),
As we create more and more outputs, I wonder if I should think more about how these outputs are named / organized.
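One possible way to organize this, as a rough sketch only: keep the report filenames in a single mapping and derive the `outpath_*` config entries from it, writing everything into a dedicated output directory. `OUTPUT_DIR`, `REPORT_FILENAMES`, and `build_outpaths` are hypothetical names that do not exist in the repo, and `PROJECT_DIR` here is just a stand-in for the project's own constant.

```python
import os

# Assumption: stand-in for the repo's PROJECT_DIR constant.
PROJECT_DIR = os.getcwd()
# Hypothetical: a dedicated directory for generated reports.
OUTPUT_DIR = os.path.join(PROJECT_DIR, 'output')

# Filenames as they appear in the current CONFIG; keys are hypothetical.
REPORT_FILENAMES = {
    'report1': 'zulip_report1_counts.csv',
    'report2': 'zulip_report2_thread_lengths.csv',
    'user_info': 'zulip_user_info.csv',
}


def build_outpaths(output_dir, filenames):
    """Return CONFIG-style 'outpath_<name>' -> full path entries."""
    return {
        f'outpath_{name}': os.path.join(output_dir, filename)
        for name, filename in filenames.items()
    }


CONFIG = {
    'chat_stream_name': 'terminology',
    'num_messages_per_query': 1000,
    # Adding a new output then only requires one new REPORT_FILENAMES entry.
    **build_outpaths(OUTPUT_DIR, REPORT_FILENAMES),
}
```

With this layout, existing lookups such as `CONFIG['outpath_report1']` keep working, and the naming scheme lives in one place.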
    return df_report


def create_report_users(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
@rohaher I just wanted to share with you the new function I made to handle this feature. I feel like it is really repetitive, though; not very proud of it. Due for a refactor at some point.
user_participation_df.to_csv(CONFIG['outpath_raw_results_user_participation'], index=False)

# TODO: I really don't like how repetitive this is; even worse than previous block
# TODO: Have aggregated to keyword, category, and stream, but not to role agnostic of stream. Would be useful to add
Some other updates I'm thinking of making:
TODO: Have aggregated to keyword, category, and stream, but not to role agnostic of stream. Would be useful to add this, once streams feature is complete.
TODO: Also aggregate agnostic of role, at every level (stream, category, keyword)? If so, can call it 'participant'.
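A minimal sketch of how both TODOs might be handled with one helper, which would also cut down on the repetition: group by whichever columns define the aggregation level and count distinct users. The column names (`stream`, `category`, `keyword`, `role`, `user`), the toy data, and `count_participants` are all assumptions for illustration; the real DataFrame may be shaped differently.

```python
import pandas as pd


def count_participants(df, group_cols):
    """Count distinct users per group; one helper for every aggregation level."""
    return (
        df.groupby(group_cols)['user']
        .nunique()
        .reset_index(name='participants')
    )


# Hypothetical toy data; real column names and values may differ.
df = pd.DataFrame({
    'stream':   ['terminology', 'terminology', 'terminology', 'other_stream'],
    'category': ['c1', 'c1', 'c2', 'c1'],
    'keyword':  ['k1', 'k2', 'k3', 'k1'],
    'role':     ['chair', 'member', 'chair', 'chair'],
    'user':     ['alice', 'bob', 'alice', 'carol'],
})

by_stream_role = count_participants(df, ['stream', 'role'])  # existing style of level
by_role = count_participants(df, ['role'])      # role, agnostic of stream
by_stream = count_participants(df, ['stream'])  # stream, agnostic of role
```

Because the level is just a list of column names, adding a new aggregation is one more call rather than another copied block.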
- Add: New function implementing basic feature: create_report_users_and_roles()
- Add: Documentation for feature to README.md

Misc
- Add: Codebook section at bottom of README.md documentation.
- Add: Comment link to user roles GoogleSheet.
- Update: Renamed 'report1' and 'report2' variable and function names to be more descriptive.
- Update: Reorganized run()
- Update: Fixed an incorrect type.
- Update: .gitignore: Added *.pickle
    return df_report


def create_report_users(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Report: Users
    # TODO: Bugfix: Major bug; respondent/author counts are not all correct. This is because (i) threads are being
@rohaher FYI: Major bugfix I need to do:
# TODO: Bugfix: Major bug; respondent/author counts are not all correct. This is because (i) threads are being
counted multiple times when multiple keywords are matched against them, and (ii) we are counting *only* the
messages that themselves match a keyword, rather than every message in every thread where at least one message
has a keyword match.
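A rough sketch of one way the fix could look at the keyword-agnostic level, assuming a messages DataFrame with hypothetical columns `thread`, `sender`, `timestamp`, and `content` (the actual query results may use different names). Matching is done once per thread against all keywords, which avoids double counting, and the author/respondent counts are then computed over every message in the matched threads.

```python
import re

import pandas as pd

# Hypothetical toy data; t1 matches two keywords in a single message.
messages = pd.DataFrame({
    'thread':    ['t1', 't1', 't1', 't2', 't2'],
    'sender':    ['alice', 'bob', 'carol', 'dave', 'erin'],
    'timestamp': [1, 2, 3, 1, 2],
    'content':   ['SNOMED and LOINC question', 'I think so', 'agreed', 'hello', 'hi'],
})
keywords = ['snomed', 'loinc']  # assumption: example keywords only

# (i) One boolean per message, then a set of thread ids, so a thread that
# matches several keywords still appears only once.
pattern = '|'.join(re.escape(k) for k in keywords)
has_match = messages['content'].str.contains(pattern, case=False, regex=True)
matched_threads = set(messages.loc[has_match, 'thread'])

# (ii) Keep every message in a matched thread, not only the matching ones.
in_scope = (
    messages[messages['thread'].isin(matched_threads)]
    .sort_values(['thread', 'timestamp'])
)

# Author = sender of the earliest message; respondents = everyone else.
authors = in_scope.groupby('thread')['sender'].first()
in_scope = in_scope.assign(author=in_scope['thread'].map(authors))
respondents = in_scope[in_scope['sender'] != in_scope['author']]
respondent_counts = respondents.groupby('thread')['sender'].nunique()
```

Per-keyword reports would still need a thread-to-keyword mapping on top of this; the sketch only covers the counts that should not depend on how many keywords a thread matched.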
Related: #32