WIP: (re)modularize converter (to_reproschema) #75

Draft: wants to merge 13 commits into base: main

@yibeichan (Contributor) commented Aug 28, 2024

Based on a discussion with @djarecka on 08/23/2024, we want to improve our (some format)2reproschema converter by remodularizing it. In the current converter:

  1. I made separate classes to handle item, activity, and protocol.
  2. I tried to use as few hard-coded column names as possible in each function. For example, our previous redcap2reproschema has SCHEMA_MAP, which maps REDCap column names to reproschema variables. This time I reversed the key-value pairs and made CSV_TO_REPROSCHEMA_MAP, where the keys are reproschema variables and the values are input CSV column names. This way, the classes and functions always use the keys, and the map is customized by changing the values (input CSV column names).
  3. Currently the converter takes a CSV file as input and is mostly used from the command line. We should improve its use as a Python module, which should let users pass a DataFrame as input and customize dictionaries such as CSV_TO_REPROSCHEMA_MAP, VALUE_TYPE_MAP, INPUT_TYPE_MAP, and ADDITIONAL_NOTES_LIST.
  4. I removed csv.DictReader and put the lovely pandas there instead.
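To make points 2 and 3 concrete, here is a minimal sketch of the reversed-map idea, not the code in this PR. The key names, the default column names, `LORIS_LIKE_MAP`, `is_missing`, and `rows_to_items` are all assumptions made up for illustration. It works on plain row dicts so it runs without pandas.

```python
import math

# Hypothetical default map: keys are reproschema-side names (fixed in the code),
# values are input column names (the part a user would customize).
CSV_TO_REPROSCHEMA_MAP = {
    "item_name": "Variable / Field Name",
    "question": "Field Label",
    "activity_name": "Form Name",
}


def is_missing(value):
    """Treat None, empty strings, and NaN (pandas' empty-cell marker) as missing."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def rows_to_items(records, column_map=None):
    """Return one dict per input row, keyed by reproschema-side names.

    `records` is any iterable of dict-like rows; `column_map` overrides
    entries of the default map with the caller's own column names.
    """
    column_map = {**CSV_TO_REPROSCHEMA_MAP, **(column_map or {})}
    items = []
    for row in records:
        item = {
            key: row[column]
            for key, column in column_map.items()
            if column in row and not is_missing(row[column])
        }
        items.append(item)
    return items


# Made-up column names standing in for a simpler, LORIS-like sheet.
LORIS_LIKE_MAP = {"item_name": "name", "question": "label", "activity_name": "form"}

records = [
    {"name": "age", "label": "How old are you?", "form": "demographics"},
    {"name": "dob", "label": float("nan"), "form": "demographics"},
]
items = rows_to_items(records, LORIS_LIKE_MAP)
```

With pandas, the same function could be fed `df.to_dict(orient="records")`, so the command-line path (read a CSV into a DataFrame) and the module path (the user passes a DataFrame) would share one code path. Only the map values change between input formats.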

I built this converter around the LORIS format, which is a sort of simplified version of the general REDCap format we used to deal with. The LORIS files are missing some important information (I'll email them soon). At the same time, we can think about how to generalize the converter so it handles both simple and complex cases.

TODOs (popping up when converting the LORIS format):

  • For maxValue and minValue, can we use the answers to other variables as those values? (This comes from a date field that should be later than a certain date but earlier than today.)
  • Some variable names end with "_en", which indicates English, and some end with "_es", which indicates Spanish. I haven't specified anything for them yet.
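One possible way to handle the "_en" / "_es" suffixes, sketched here only as a starting point for discussion: strip the suffix and group the texts under the base variable name, keyed by language. The helper names, the sample field names, and the "und" (undetermined) fallback key are assumptions, not something the converter does today.

```python
# Suffixes seen in the LORIS sheet, mapped to language codes.
LANGUAGE_SUFFIXES = {"_en": "en", "_es": "es"}


def split_language_suffix(name):
    """Return (base_name, language), or (name, None) if there is no known suffix."""
    for suffix, lang in LANGUAGE_SUFFIXES.items():
        if name.endswith(suffix):
            return name[: -len(suffix)], lang
    return name, None


def group_by_language(fields):
    """Collapse {'consent_en': ..., 'consent_es': ...} into {'consent': {'en': ..., 'es': ...}}."""
    grouped = {}
    for name, text in fields.items():
        base, lang = split_language_suffix(name)
        # "und" is a placeholder key for names without a language suffix (assumption).
        grouped.setdefault(base, {})[lang or "und"] = text
    return grouped


grouped = group_by_language(
    {"consent_en": "Do you agree?", "consent_es": "¿Está de acuerdo?", "age": "Age"}
)
```

Whether unsuffixed variables should default to English instead of a placeholder key is an open question.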

@yibeichan yibeichan changed the title (re)modularize converter WIP: (re)modularize converter (to_reproschema) Aug 28, 2024
@yibeichan (Contributor, Author) commented:

@yibeichan will add tests and examples.
@djarecka will try to change the old redcap2reproschema to use the new class yibei created here.
