Feature: Query authors and respondents #18 #30

Merged
merged 1 commit into from
Aug 16, 2022
@joeflack4 joeflack4 commented Aug 15, 2022

Updates

    Feature: Query authors and respondents #18
    - Add: New function implementing basic feature: create_report_users_and_roles()
    - Add: Documentation for feature to README.md

    Misc
    - Add: Codebook section at bottom of README.md documentation.
    - Add: Comment link to user roles GoogleSheet.
    - Update: Renamed 'report1' and 'report2' variable and function names to be more descriptive.
    - Update: Reorganized run()
    - Update: Fixed an incorrect type.
    - Update: .gitignore: Added *.pickle

@joeflack4 joeflack4 self-assigned this Aug 15, 2022
@joeflack4 joeflack4 added the enhancement New feature or request label Aug 15, 2022
@joeflack4 joeflack4 linked an issue Aug 15, 2022 that may be closed by this pull request
3 tasks
#### `zulip_report2_thread_lengths.csv`
TODO

#### `zulip_report3_users.csv`

Added a "codebook" section to the README.md. I added a codebook for the most recent report, but I need to do so for the rest of the outputs. See: #31

@@ -9,6 +9,8 @@
3. The Zulip chat we're querying: https://chat.fhir.org/#
4. Category keywords google sheet:
https://docs.google.com/spreadsheets/d/1OB0CEAkOhVTN71uIhzCo_iNaiD1B6qLqL7uwil5O22Q/edit#gid=1136391153
5. User roles google sheet:

@rohaher Just FYI, Davera and I decided to open a new tab on the Google Sheet, where she'll put "user" -> "HL7 organization role" mappings.

@@ -46,11 +48,15 @@
'zuliprc_path': os.path.join(ENV_DIR, '.zuliprc'), # rc = "runtime config"
'chat_stream_name': 'terminology',
'num_messages_per_query': 1000,
'outpath_report1': os.path.join(PROJECT_DIR, 'zulip_report1_counts.csv'),
'outpath_report2': os.path.join(PROJECT_DIR, 'zulip_report2_thread_lengths.csv'),
'outpath_user_info': os.path.join(PROJECT_DIR, 'zulip_user_info.csv'),

As we create more and more outputs, I wonder if I should think more about how these outputs are named / organized.
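
One possible way to organize the outputs, as a rough sketch only: generate the `outpath_*` entries from a single mapping so that naming stays consistent as reports are added. The `output` subdirectory, the `REPORT_NAMES` mapping, and the `build_outpaths()` helper are all hypothetical; only the file names come from this PR.

```python
import os

# Placeholder: the project defines its own PROJECT_DIR.
PROJECT_DIR = os.path.abspath('.')
# Hypothetical subdirectory that keeps generated reports out of the project root.
OUTPUT_DIR = os.path.join(PROJECT_DIR, 'output')

# Hypothetical mapping: short report key -> file stem.
REPORT_NAMES = {
    'counts': 'zulip_report1_counts',
    'thread_lengths': 'zulip_report2_thread_lengths',
    'users': 'zulip_report3_users',
    'user_info': 'zulip_user_info',
}


def build_outpaths(output_dir: str, names: dict) -> dict:
    """Return CONFIG-style entries, e.g. {'outpath_counts': '<output_dir>/zulip_report1_counts.csv'}."""
    return {
        f'outpath_{key}': os.path.join(output_dir, f'{stem}.csv')
        for key, stem in names.items()
    }


OUTPATHS = build_outpaths(OUTPUT_DIR, REPORT_NAMES)
```

The resulting dict could be merged into `CONFIG` (for example with `CONFIG.update(OUTPATHS)`), so adding a report would mean adding one line to the mapping.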

return df_report


def create_report_users(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:

@rohaher I just wanted to share with you the new function I made to handle this feature. I feel like it is really repetitive, though; not very proud of it. Due for a refactor at some point.

user_participation_df.to_csv(CONFIG['outpath_raw_results_user_participation'], index=False)

# TODO: I really don't like how repetitive this is; even worse than previous block
# TODO: Have aggregated to keyword, category, and stream, but not to role agnostic of stream. Would be useful to add

Some other updates I'm thinking of making:

TODO: We aggregate by keyword, category, and stream, but not by role regardless of stream. It would be useful to add this once the streams feature is complete.
TODO: Also aggregate regardless of role, at every level (stream, category, keyword)? If so, the combined role could be called 'participant'.
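
A minimal sketch of how these aggregation levels might share one helper, which could also reduce the repetition in `create_report_users()`. This is not the code in this PR: the column names (`stream`, `category`, `keyword`, `role`, `user`), the `count_users()` helper, and the sample rows are assumptions made for illustration.

```python
from typing import List

import pandas as pd


def count_users(df: pd.DataFrame, group_cols: List[str]) -> pd.DataFrame:
    """Count distinct users and messages for each combination of `group_cols`."""
    return (
        df.groupby(group_cols)
        .agg(num_users=('user', 'nunique'), num_messages=('user', 'count'))
        .reset_index()
    )


# Hypothetical input: one row per message, already labeled with keyword, category and role.
df = pd.DataFrame({
    'stream': ['terminology'] * 4,
    'category': ['c1', 'c1', 'c2', 'c2'],
    'keyword': ['k1', 'k1', 'k2', 'k2'],
    'role': ['author', 'respondent', 'author', 'respondent'],
    'user': ['alice', 'bob', 'alice', 'alice'],
})

# Each aggregation level is just a different list of grouping columns.
LEVELS = {
    'keyword': ['stream', 'category', 'keyword', 'role'],
    'category': ['stream', 'category', 'role'],
    'stream': ['stream', 'role'],
    'role_any_stream': ['role'],  # role, regardless of stream
}
reports = {name: count_users(df, cols) for name, cols in LEVELS.items()}

# Regardless of role: relabel every row as 'participant' before grouping,
# so a user who is both author and respondent is counted once.
participants = count_users(df.assign(role='participant'), ['stream', 'role'])
```

Relabeling before grouping, rather than summing the per-role counts afterwards, avoids double counting users who appear in both roles.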

return df_report


def create_report_users(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
"""Report: Users
# TODO: Bugfix: Major bug; respondent/author counts are not all correct. This is because (i) threads are being

@rohaher FYI: Major bugfix I need to do:

    # TODO: Bugfix: Major bug; respondent/author counts are not all correct. This is because (i) threads are being 
       counted multiple times when multiple keywords are matched against them, and (ii) we are *only* counting messages
       within threads that have keyword matches; not every message in every thread that has a keyword match for any 
       message.
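
A rough sketch of one way to address both parts of the bug when computing overall counts, under stated assumptions. It is not the fix that was eventually implemented. The two input tables, their column names, and `count_authors_and_respondents()` are hypothetical. It also assumes that message ids increase chronologically and that a thread's author is the sender of its first message.

```python
import pandas as pd

# Hypothetical data: every message in the stream.
messages = pd.DataFrame({
    'thread_id': [1, 1, 1, 2, 2, 3],
    'message_id': [10, 11, 12, 20, 21, 30],
    'user': ['alice', 'bob', 'carol', 'bob', 'alice', 'dave'],
})
# Hypothetical keyword hits: one row per (message, keyword) match.
# Message 10 matches two keywords; thread 3 has no match at all.
matches = pd.DataFrame({
    'message_id': [10, 10, 21],
    'keyword': ['loinc', 'snomed', 'loinc'],
})


def count_authors_and_respondents(messages: pd.DataFrame, matches: pd.DataFrame) -> pd.DataFrame:
    # (i) Reduce the keyword hits to unique thread ids, so a thread that
    # matches several keywords is still counted once.
    has_match = messages['message_id'].isin(matches['message_id'])
    matched_threads = messages.loc[has_match, 'thread_id'].unique()

    # (ii) Take *every* message of a matched thread, not only the matching ones.
    in_scope = messages[messages['thread_id'].isin(matched_threads)].sort_values('message_id')

    # Assumption: the author is whoever sent the first message of the thread;
    # everyone else who posted in it is a respondent.
    thread_author = in_scope.groupby('thread_id')['user'].transform('first')
    role = (in_scope['user'] == thread_author).map({True: 'author', False: 'respondent'})

    return (
        in_scope.assign(role=role)
        .groupby(['user', 'role'])
        .agg(num_threads=('thread_id', 'nunique'), num_messages=('message_id', 'count'))
        .reset_index()
    )


result = count_authors_and_respondents(messages, matches)
```

If per-keyword counts are also needed, the same scoping step could be applied once per keyword; the point here is only that the deduplication happens at the thread level before counting.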


Related: #32

@joeflack4 joeflack4 merged commit 2a30506 into main Aug 16, 2022
@joeflack4 joeflack4 deleted the joe branch August 16, 2022 01:31

Successfully merging this pull request may close these issues.

Query users