Merge branch 'release/1.3.0'

ctsit · Jun 30, 2020 · 3a4a533
2 parents b82c410 + 21ab7bf
Showing 26 changed files with 1,269 additions and 693 deletions.
diff --git a/CHANGELOG b/CHANGELOG
@@ -1,6 +1,35 @@
 Changelog
 =========
 
+## [1.3.0] - 2020-06-30
+### Summary
+
+This version reflects changes to make NACCulator compatible with more centers. We removed some hard-coded variables for the 1Florida ADRC.
+There were changes to how the deprecated Z1 and C1S forms are handled, as well as updates to tests for new functionality in the program.
+
+### Added 
+ * Added Z1 skipping to TFP builder (Samantha Emerson)
+ * Added tests for new functionality on skip logic and CSV formats (Samantha Emerson)
+ * Add run_filters.py to setup.py installation (Samantha Emerson)
+ * Add C1S form skip to uds ivp and fvp builders (Samantha Emerson)
+ * Add Z1 form skipping to uds3 fvp (Samantha Emerson)
+ * Add Z1 form skipping to nacculator uds3 ivp (Samantha Emerson)
+
+### Changed
+ * Complete filter adjustments and repair associated unit tests (Samantha Emerson)
+
+### Removed
+ * Remove filter that removes all events that are not uds3 initial or followup (Samantha Emerson)
+
+### Updated
+ * Update and revise README (Samantha Emerson)
+ * Fix typos in IVP and FVP builder files
+ * Modify form C1S allowable_values for LOGIPREV (Samantha Emerson)
+ * Edit filters to accept any AD center's PTID from their config file (Samantha Emerson)
+ * Update README.md (Taeber Rapczak)
+ * Move Generating Forms to minimize confusion (Taeber Rapczak)
+ * Update generator to handle new CSV DED format (Taeber Rapczak)
+
 ## [1.2.0] - 2020-04-13
 ### Summary
 

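Review note on the README hunk below: it documents that NACCulator expects `redcap_event_name` to contain one of five keywords. The sketch below shows one way such keyword matching could work. It assumes a plain substring test; the `EVENT_KEYWORDS` table, the packet labels, and `packet_for_event` are hypothetical names for illustration and are not taken from NACCulator's source.

```python
from typing import Optional

# Keywords as listed in the README; the packet labels are illustrative only.
EVENT_KEYWORDS = {
    "initial_visit": "IVP",
    "followup_visit": "FVP",
    "milestone": "Milestone",
    "neuropath": "Neuropathology",
    "telephone": "TFP",
}


def packet_for_event(redcap_event_name: str) -> Optional[str]:
    """Return the packet label whose keyword appears in the event name."""
    for keyword, packet in EVENT_KEYWORDS.items():
        if keyword in redcap_event_name:
            return packet
    return None  # event name carries none of the expected keywords


print(packet_for_event("initial_visit_year_arm_1"))
print(packet_for_event("followup_visit_yea_arm_1"))
print(packet_for_event("screening_arm_1"))
```

An event name that matches none of the keywords returns `None` here, which corresponds to a record that would not be picked up by any packet flag.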
diff --git a/README.md b/README.md
@@ -12,16 +12,28 @@ _Note:_ NACCulator _**requires Python 3.**_
 HOW TO Convert from REDCap to NACC
 ----------------------------------
 
-Once the project data is exported from REDCap to the CSV file `data.csv`, run:
+To install NACCulator, run:
 
     $ pip3 install git+https://github.com/ctsit/nacculator.git
+
+Once the project data is exported from REDCap to the CSV file `data.csv`, run:
+
     $ redcap2nacc <data.csv >data.txt
 
 This command will work only in the simplest case; UDS3 IVP data only.
-If there are no errors, then submit the `data.txt` file to NACC.
+NACCulator will automatically skip PTIDs with errors, so the output `data.txt`
+file will be ready to submit to NACC.
+To filter the data in the CSV properly, NACCulator expects that
+REDCap visits (denoted by `redcap_event_name`) contain certain keywords:
+    "initial_visit" for initial visit packets
+    "followup_visit" for all followups
+    "milestone" for milestone packets
+    "neuropath" for neuropathology packets
+    "telephone" for telephone followup packets
 
 _Note: output is written to `STDOUT`; errors are written to `STDERR`; input is
-expected to be from `STDIN` unless a file is specified using the `-file` flag._
+expected to be from `STDIN` (a pipe or `<` redirect) unless a file is specified using
+the `-file` flag._
 
 
 ### Usage
@@ -36,19 +48,18 @@ expected to be from `STDIN` unless a file is specified using the `-file` flag._
 
     optional arguments:
       -h, --help            show this help message and exit
-      -fvp                  Set this flag to process as fvp data
-      -ivp                  Set this flag to process as ivp data
-      -tfp                  Set this flag to process as telephone follow-up data
-      -np                   Set this flag to process as np data
-      -m                    Set this flag to process as m data
+      -fvp                  Set this flag to process as FVP data
+      -ivp                  Set this flag to process as IVP data
+      -tfp                  Set this flag to process as Telephone Followup Packet data
+      -np                   Set this flag to process as Neuropathology data
+      -m                    Set this flag to process as Milestone data
       -csf                  Set this flag to process as NACC BIDSS CSF data
       -f {cleanPtid,replaceDrugId,fixHeaders,fillDefault,updateField,removePtid,removeDateRecord,getPtid}, --filter {cleanPtid,replaceDrugId,fixHeaders,fillDefault,updateField,removePtid,removeDateRecord,getPtid}
                               Set this flag to process the filter
       -lbd                  Set this flag to process as Lewy Body Dementia data
-      -ftld                 Set this flag to process as Frontotemporal Lobar                                     Degeneration data
+      -ftld                 Set this flag to process as Frontotemporal Lobar Degeneration data
       -file FILE            Path of the csv file to be processed.
-      -meta FILTER_META     Input file for the filter metadata (in case -filter is
-                              used)
+      -meta FILTER_META     Input file for the filter metadata (in case -filter is used)
       -ptid PTID            Ptid for which you need the records
       -vnum VNUM            Ptid for which you need the records
       -vtype VTYPE          Ptid for which you need the records
@@ -73,10 +84,12 @@ HOW TO Filter Data Using NACCulator
 -----------------------------------
 
 If your data is not clean enough to be processed by NACCulator, there are some
-built in functions to clean (read transform) the data.
+built in functions to clean (read: transform) the data.
 
 In order to properly use the filters, the first step is to check and validate
-that `nacculator_cfg.ini` has the proper settings for the filter to run.
+that `nacculator_cfg.ini` has the proper settings for the filter to run. In
+order to create this file, find the `nacculator_cfg.ini.example` file and
+remove the `.example` portion, and then fill in your center's information.
 The config file contains sections with in-code filter function name. Each of
 these sections contains elements necessary for the filter to run.
 The filters described below will discuss what is required, if anything.
@@ -89,7 +102,8 @@ the example above shows.
   This filter requires a section in the config called `filter_clean_ptid`. This
   section will contain a single key `filepath` which will point to a csv file
   of ptids to be removed. All the records whose ptid with same packet and visit
-  num found in the passed meta file will be discarded in the output file.
+  num found in the passed meta file will be discarded in the output file. This
+  filter also removes events that lack a visit number in REDCap.
 
   Example meta file:
 
@@ -112,12 +126,12 @@ the example above shows.
   This filter requires a section in the config called `filter_fix_headers` with
   as many keys as needed to replace the necessary columns. See example below.
   This filter fixes the column names of any column found in the filter mapping.
-  This filter does not check for any data. It always replaces the column names
+  This filter does not check for any data. It only replaces the column names
   if found.
 
-  Currently, below replacements are used:
+  For example, the configuration would look like this:
 
-      config:
+      [filter_fix_headers]
       c1s_2a_npsylan: c1s_2_npsycloc
       c1s_2a_npsylanx: c1s_2a_npsylan
       b6s_2a1_npsylanx: c1s_2a1_npsylanx
@@ -132,27 +146,25 @@ the example above shows.
   predefined values. Below are the current defaults :
 
       nogds    -> 0
-      arthupex -> 0
-      arthloex -> 0
-      arthspin -> 0
-      arthunk  -> 0
+      formver  -> 3
 
-  *If field is blank, always it will be updated to default value.*
+  *If a field is blank, it will be updated to its default value.*
 
 * **updateField**
 
-  This filter is used to update non blank fields. Currently, only `adcid` is
-  updated to 41.
+  This filter is used to update fields that already had a value in the REDCap
+  export. Currently, only `adcid` is updated.
 
 * **removePtid**
 
   **Filter config required**
   This filter requires a section in the config called `filter_remove_ptid` with
   a single key called `ptid_format`. The value for that key is a regex string
   to match ptids that are to be kept.
+  For example, `11\d.*` keeps PTIDs that start with `11` and a digit (110001).
 
-  This filter is used to remove ptids that may have a different set of ids for a
-  different study, or help limit which ids show up in the final result.
+  This filter is used to remove ptids that may have a different set of ids for
+  a different study, or help limit which ids show up in the final result.
 
       config:
       ptid_format: 11\d.*
@@ -165,8 +177,9 @@ the example above shows.
 
 * **getPtid**
 
-    This filter is used to get information about a single PatientID.
-    You need to use the `-ptid` flag to specify the patient ID.
+    This filter is used to get information about a single PatientID and is not
+    present in the config file. You need to use the `-ptid` flag to specify the
+    patient ID.
     You can use the `-vnum` to get the records with particular visit number and
     Patient ID or use `-vtype` to get records with particular visit type and
     Patient ID.
@@ -180,28 +193,26 @@ Example Workflow
 Once you have edited the `nacculator_cfg.ini` file with your API token and
 desired filters, you can get a filtered CSV file of the REDCap data with:
 
-    $ python3 run_filters.py nacculator_cfg.ini
-
-This will create a run folder (`$run_folder`) with the current date that
-contains the csv and each iteration of filter, ending with `final_update.csv`.
+    $ nacculator_filters nacculator_cfg.ini
 
-Next, you will need to split apart the IVP and FVP visits:
-
-    $ bash split_ivp_fvp.sh $run_folder/final_update.csv
+This will create a run folder labeled with the current date 
+(`$run_CURRENT-DATE`) (for example, `run_01-01-2000`) that contains the csv and
+each iteration of filter, ending with `final_update.csv`.
 
 The resulting files will not be in the run folder created by `run_filters.py`.
-They will be in the base directory. You can move them if you would like to, but
-you will need to modify the filepaths in the following commands.
+They will be in the base directory. The filepaths in the following commands are
+modified so that the output is deposited in your `$run_CURRENT-DATE` folder.
 
-Next, you will need to run the actual `redcap2nacc` program to produced the
-fixed width text file for NACC. As you have split the IVP and FVP visits, you
-will run the program twice, using each flag once.
+Next, you will need to run the actual `redcap2nacc` program to produce the
+fixed width text file for NACC. Only one packet flag can be used per run, so
+the program must be run twice: once with `-ivp` and once with `-fvp`.
 
-    $ redcap2nacc -ivp <initial_visits.csv >$run_folder/iv_nacc_complete.txt 2>$run_folder/ivp_errors.txt
-    $ redcap2nacc -fvp <followup_visits.csv >$run_folder/fv_nacc_complete.txt 2>$run_folder/fvp_errors.txt
+    $ redcap2nacc -ivp < $run_CURRENT-DATE/final_Update.csv > $run_CURRENT-DATE/iv_nacc_complete.txt 2> $run_CURRENT-DATE/ivp_errors.txt
+    $ redcap2nacc -fvp < $run_CURRENT-DATE/final_Update.csv > $run_CURRENT-DATE/fv_nacc_complete.txt 2> $run_CURRENT-DATE/fvp_errors.txt
 
-This will place the text files in the run folder created earlier, as well as a
-log of the run which will have any errors encountered.
+This will place the text files (e.g. `iv_nacc_complete.txt`) in the run folder
+created earlier, along with a log of each run listing any errors found
+(e.g. `ivp_errors.txt`).
 
 
 Development
@@ -234,36 +245,48 @@ This is not exhaustive, but here is an explanation of some important files.
 
 * `tools/generator.py`:
     generates Python objects based on NACC Data Element Dictionaries in CSV.
+    Used by developers to update the existing forms.py files as necessary.
+
+* `nacculator_cfg.ini`:
+    configuration file for the filters, built from `nacculator_cfg.ini.example`
+    in the root `nacculator/` directory.
 
-* `tools/preprocess/run_filters.py` and `tools/preprocess/run_filters.sh`:
+* `nacc/run_filters.py` and `tools/preprocess/run_filters.sh`:
     pulls data from REDCap based on the settings found in `nacculator_cfg.ini`
     (for .py) and `filters_config.cfg` (for .sh).
 
 
-### Generating New Forms
+### Testing
 
-**Warning: read the warnings in the `./nacc/uds3/ivp/forms.py` first!**
+To run all the tests:
 
-_Note: executing `generator.py` from within tools is an important step as the
-script assumes any corrected DEDs are stored under a folder in the current
-working directory called `corrected`._
+    $ python3 -m unittest
 
-    $ python3 tools/generator.py tools/uds3/ded/csv/ >nacc/uds3/ivp/forms.py
-    $ edit nacc/uds3/ivp/forms.py
 
+To run only the tests in a file:
 
-### Testing
+    $ python3 tests/WHICHEVER_test.py
 
-To run all the tests:
 
-    $ make tests
+### Generating Forms
 
+**Warning: the generator is currently broken due to changes in the CSV format.**
 
-To run only the tests in a file:
+You only need to generate forms when there are new DEDs from NACC. The
+NACCulator install includes the current forms automatically.
 
-    $ python3 tests/WHICHEVER_test.py
+Before running the generator, read the warnings in `./nacc/uds3/ivp/forms.py`
+first.
+
+    $ python3 tools/generator.py tools/uds3/ded/csv/ >nacc/uds3/ivp/forms.py
+    $ edit nacc/uds3/ivp/forms.py
+
+_Note: execute `generator.py` from the same folder as the `corrected`
+folder, which should contain any "corrected" DEDs._
 
 
 ### Resources
 
-* UDS3 FVP forms: https://www.alz.washington.edu/NONMEMBER/UDS/DOCS/VER3/
+* UDS3 forms: https://www.alz.washington.edu/NONMEMBER/UDS/DOCS/VER3/UDS3csvded.html
+* NACC forms and documentation: https://www.alz.washington.edu/NONMEMBER/NACCFormsAndDoc.html
+* UDS submission site: https://www.alz.washington.edu/MEMBER/sitesub.htm
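
Review note on the `removePtid` text in the README hunks above: the example pattern `11\d.*` only constrains the start of the PTID, so it keeps any ID that begins with `11` followed by a digit, whatever its length. The sketch below illustrates that behavior. It assumes the filter applies `ptid_format` with Python's `re.match`, which anchors at the start of the string; `keep_ptid` and the sample PTIDs are hypothetical and are not taken from NACCulator's source.

```python
import re


def keep_ptid(ptid: str, ptid_format: str = r"11\d.*") -> bool:
    """Return True when the PTID matches the configured pattern.

    Hypothetical helper: assumes the pattern is applied with re.match.
    """
    return re.match(ptid_format, ptid) is not None


sample_ptids = ["110001", "119999", "210001", "11A001", "110"]
kept = [ptid for ptid in sample_ptids if keep_ptid(ptid)]
print(kept)
```

A center that needs exactly six digits could tighten the pattern, for example to `11\d{4}$`, so that short IDs such as `110` are rejected.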
diff --git a/nacc/csf/forms.py b/nacc/csf/forms.py
@@ -1,5 +1,5 @@
 ###############################################################################
-# Copyright 2015-2019 University of Florida. All rights reserved.
+# Copyright 2015-2020 University of Florida. All rights reserved.
 # This file is part of UF CTS-IT's NACCulator project.
 # Use of this source code is governed by the license found in the LICENSE file.
 ###############################################################################
@@ -32,7 +32,7 @@ def header_fields():
 
 
 class FormEE2(nacc.uds3.FieldBag):
-    """ 
+    """
     Generated from Form eE2: https://www.alz.washington.edu/WEB/csfded.pdf
     """
     def __init__(self):

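Review note on the `check_redcap_event` change in `nacc/redcap2nacc.py` below: the new `try`/`except KeyError` blocks default a missing `*_z1_complete` column to an empty string, so exports that no longer include the deprecated Z1 form do not raise an error. The same default can be written more compactly with `dict.get`. The sketch below is illustrative only: `z1_forms_started` is a hypothetical helper, it assumes `record` behaves like a plain `dict`, and unlike the commit it also tolerates a missing Z1X column.

```python
def z1_forms_started(record: dict, prefix: str) -> bool:
    """Return False when both Z1 and Z1X completion fields are '0' or blank."""
    # Assumption: a missing column means the same as a blank field.
    z1 = record.get(f"{prefix}_z1_complete", "")
    z1x = record.get(f"{prefix}_z1x_complete", "")
    return not (z1 in ("0", "") and z1x in ("0", ""))


# Export without the deprecated Z1 column, but with a completed Z1X form.
no_z1_column = {"ivp_z1x_complete": "2"}
# Export where neither form has been started.
untouched = {"fvp_z1_complete": "0", "fvp_z1x_complete": ""}

print(z1_forms_started(no_z1_column, "ivp"))
print(z1_forms_started(untouched, "fvp"))
```

Whether a missing Z1X column should also be treated as blank is a design decision for the maintainers; the commit as shown still indexes `record['ivp_z1x_complete']` directly.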
diff --git a/nacc/redcap2nacc.py b/nacc/redcap2nacc.py
@@ -131,17 +131,17 @@ def check_for_bad_characters(field: Field) -> typing.List:
 
         incompatible = []
         if quote:
-            quote = "'"
-            incompatible.append(quote + " (%s)" % num_quote)
+            quote_char = "'"
+            incompatible.append(quote_char + " (%s)" % num_quote)
         if dquote:
-            dquote = '"'
-            incompatible.append(dquote + " (%s)" % num_dquote)
+            dquote_char = '"'
+            incompatible.append(dquote_char + " (%s)" % num_dquote)
         if amp:
-            amp = '&'
-            incompatible.append(amp + " (%s)" % num_amp)
+            amp_char = '&'
+            incompatible.append(amp_char + " (%s)" % num_amp)
         if percent:
-            percent = '%'
-            incompatible.append(percent + " (%s)" % num_percent)
+            percent_char = '%'
+            incompatible.append(percent_char + " (%s)" % num_percent)
 
     return incompatible
 
@@ -174,13 +174,19 @@ def check_redcap_event(options, record) -> bool:
             return False
     elif options.ivp:
         event_name = 'initial_visit'
-        form_match_z1 = record['ivp_z1_complete']
+        try:
+            form_match_z1 = record['ivp_z1_complete']
+        except KeyError:
+            form_match_z1 = ''
         form_match_z1x = record['ivp_z1x_complete']
         if form_match_z1 in ['0', ''] and form_match_z1x in ['0', '']:
             return False
     elif options.fvp:
         event_name = 'followup_visit'
-        form_match_z1 = record['fvp_z1_complete']
+        try:
+            form_match_z1 = record['fvp_z1_complete']
+        except KeyError:
+            form_match_z1 = ''
         form_match_z1x = record['fvp_z1x_complete']
         if form_match_z1 in ['0', ''] and form_match_z1x in ['0', '']:
             return False
@@ -208,25 +214,25 @@ def check_single_select(packet: uds3_packet.Packet):
     warnings = list()
 
     # D1 4
-    fields = ('AMNDEM', 'PCA', 'PPASYN', 'FTDSYN', 'LBDSYN', 'NAMNDEM')
-    if not exclusive(packet, fields):
+    fields_4 = ('AMNDEM', 'PCA', 'PPASYN', 'FTDSYN', 'LBDSYN', 'NAMNDEM')
+    if not exclusive(packet, fields_4):
         warnings.append('For Form D1, Question 4, there is unexpectedly more '
                         'than one syndrome indicated as "Present".')
 
     # D1 5
-    fields = ('MCIAMEM', 'MCIAPLUS', 'MCINON1', 'MCINON2', 'IMPNOMCI')
-    if not exclusive(packet, fields):
+    fields_5 = ('MCIAMEM', 'MCIAPLUS', 'MCINON1', 'MCINON2', 'IMPNOMCI')
+    if not exclusive(packet, fields_5):
         warnings.append('For Form D1, Question 5, there is unexpectedly more '
                         'than one syndrome indicated as "Present".')
 
     # D1 11-39
-    fields = ('ALZDISIF', 'LBDIF', 'MSAIF', 'PSPIF', 'CORTIF', 'FTLDMOIF',
+    fields_11_39 = ('ALZDISIF', 'LBDIF', 'MSAIF', 'PSPIF', 'CORTIF', 'FTLDMOIF',
               'FTLDNOIF', 'FTLDSUBX', 'CVDIF', 'ESSTREIF', 'DOWNSIF', 'HUNTIF',
               'PRIONIF', 'BRNINJIF', 'HYCEPHIF', 'EPILEPIF', 'NEOPIF', 'HIVIF',
               'OTHCOGIF', 'DEPIF', 'BIPOLDIF', 'SCHIZOIF', 'ANXIETIF',
               'DELIRIF', 'PTSDDXIF', 'OTHPSYIF', 'ALCDEMIF', 'IMPSUBIF',
               'DYSILLIF', 'MEDSIF', 'COGOTHIF', 'COGOTH2F', 'COGOTH3F')
-    if not exclusive(packet, fields):
+    if not exclusive(packet, fields_11_39):
         warnings.append('For Form D1, Questions 11-39, there is unexpectedly '
                         'more than one Primary cause selected.')
 
@@ -269,7 +275,7 @@ def set_to_zero_if_blank(*field_names):
         set_to_zero_if_blank(
             'PSPCBS', 'EYEPSP', 'DYSPSP', 'AXIALPSP', 'GAITPSP', 'APRAXSP',
             'APRAXL', 'APRAXR', 'CORTSENL', 'CORTSENR', 'ATAXL', 'ATAXR',
-            'ALIENLML', 'ALIENLMR', 'DYSTONL', 'DYSTONR')
+            'ALIENLML', 'ALIENLMR', 'DYSTONL', 'DYSTONR', 'MYOCLLT', 'MYOCLRT')
 
     # D1 4.
     if packet['DEMENTED'] == 1: