Merge branch 'release/1.3.0'
KevinHanson committed Jun 30, 2020
2 parents b82c410 + 21ab7bf commit 3a4a533
Showing 26 changed files with 1,269 additions and 693 deletions.
29 changes: 29 additions & 0 deletions CHANGELOG
@@ -1,6 +1,35 @@
Changelog
=========

## [1.3.0] - 2020-06-30
### Summary

This version reflects changes to make NACCulator compatible with more centers. We removed some hard-coded variables for the 1Florida ADRC.
There were changes to how the deprecated Z1 and C1S forms are handled, as well as updates to tests for new functionality in the program.

### Added
* Added Z1 skipping to TFP builder (Samantha Emerson)
* Added tests for new functionality on skip logic and CSV formats (Samantha Emerson)
* Add run_filters.py to setup.py installation (Samantha Emerson)
* Add C1S form skip to uds ivp and fvp builders (Samantha Emerson)
* Add Z1 form skipping to uds3 fvp (Samantha Emerson)
* Add Z1 form skipping to nacculator uds3 ivp (Samantha Emerson)

### Changed
* Complete filter adjustments and repair associated unit tests (Samantha Emerson)

### Removed
* Remove filter that removes all events that are not uds3 initial or followup (Samantha Emerson)

### Updated
* Update and revise README (Samantha Emerson)
* Fix typos in IVP and FVP builder files
* Modify form C1S allowable_values for LOGIPREV (Samantha Emerson)
* Edit filters to accept any AD center's PTID from their config file (Samantha Emerson)
* Update README.md (Taeber Rapczak)
* Move Generating Forms to minimize confusion (Taeber Rapczak)
* Update generator to handle new CSV DED format (Taeber Rapczak)

## [1.2.0] - 2020-04-13
### Summary

139 changes: 81 additions & 58 deletions README.md
@@ -12,16 +12,28 @@ _Note:_ NACCulator _**requires Python 3.**_
HOW TO Convert from REDCap to NACC
----------------------------------

Once the project data is exported from REDCap to the CSV file `data.csv`, run:
To install NACCulator, run:

$ pip3 install git+https://github.com/ctsit/nacculator.git

Once the project data is exported from REDCap to the CSV file `data.csv`, run:

$ redcap2nacc <data.csv >data.txt

This command will work only in the simplest case; UDS3 IVP data only.
If there are no errors, then submit the `data.txt` file to NACC.
NACCulator will automatically skip PTIDs with errors, so the output `data.txt`
file will be ready to submit to NACC.
To filter the data in the CSV properly, NACCulator expects the names of
REDCap visits (denoted by `redcap_event_name`) to contain certain keywords:
"initial_visit" for initial visit packets
"followup_visit" for all followups
"milestone" for milestone packets
"neuropath" for neuropathology packets
"telephone" for telephone followup packets
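The keyword rule above can be pictured as a substring match on `redcap_event_name`. This is a hypothetical helper, not NACCulator's code: `packet_type` and `EVENT_KEYWORDS` are illustrative names, and event names such as `initial_visit_arm_1` are assumed examples of REDCap unique event names.

```python
import typing

# Hypothetical lookup table built from the keywords listed above; each value
# is the matching redcap2nacc flag without its leading dash.
EVENT_KEYWORDS = {
    "initial_visit": "ivp",
    "followup_visit": "fvp",
    "milestone": "m",
    "neuropath": "np",
    "telephone": "tfp",
}


def packet_type(redcap_event_name: str) -> typing.Optional[str]:
    """Return the packet flag whose keyword occurs in the event name."""
    for keyword, flag in EVENT_KEYWORDS.items():
        if keyword in redcap_event_name:
            return flag
    return None  # the event matches none of the packet keywords


if __name__ == "__main__":
    for name in ("initial_visit_arm_1", "telephone_followup_arm_1", "baseline"):
        print(name, "->", packet_type(name))
```

Because the match is by substring, an event named `telephone_followup_arm_1` maps to `tfp`, not `fvp`: it does not contain the full `followup_visit` keyword.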

_Note: output is written to `STDOUT`; errors are written to `STDERR`; input is
expected to be from `STDIN` unless a file is specified using the `-file` flag._
expected to be from `STDIN` (the command line) unless a file is specified using
the `-file` flag._


### Usage
@@ -36,19 +48,18 @@ expected to be from `STDIN` unless a file is specified using the `-file` flag._

optional arguments:
-h, --help show this help message and exit
-fvp Set this flag to process as fvp data
-ivp Set this flag to process as ivp data
-tfp Set this flag to process as telephone follow-up data
-np Set this flag to process as np data
-m Set this flag to process as m data
-fvp Set this flag to process as FVP data
-ivp Set this flag to process as IVP data
-tfp Set this flag to process as Telephone Followup Packet data
-np Set this flag to process as Neuropathology data
-m Set this flag to process as Milestone data
-csf Set this flag to process as NACC BIDSS CSF data
-f {cleanPtid,replaceDrugId,fixHeaders,fillDefault,updateField,removePtid,removeDateRecord,getPtid}, --filter {cleanPtid,replaceDrugId,fixHeaders,fillDefault,updateField,removePtid,removeDateRecord,getPtid}
Set this flag to process the filter
-lbd Set this flag to process as Lewy Body Dementia data
-ftld Set this flag to process as Frontotemporal Lobar Degeneration data
-ftld Set this flag to process as Frontotemporal Lobar Degeneration data
-file FILE Path of the csv file to be processed.
-meta FILTER_META Input file for the filter metadata (in case -filter is
used)
-meta FILTER_META Input file for the filter metadata (in case -filter is used)
-ptid PTID Ptid for which you need the records
-vnum VNUM Ptid for which you need the records
-vtype VTYPE Ptid for which you need the records
@@ -73,10 +84,12 @@ HOW TO Filter Data Using NACCulator
-----------------------------------

If your data is not clean enough to be processed by NACCulator, there are some
built in functions to clean (read transform) the data.
built in functions to clean (read: transform) the data.

In order to properly use the filters, the first step is to check and validate
that `nacculator_cfg.ini` has the proper settings for the filter to run.
that `nacculator_cfg.ini` has the proper settings for the filter to run. In
order to create this file, find the `nacculator_cfg.ini.example` file and
remove the `.example` portion, and then fill in your center's information.
The config file contains sections with in-code filter function name. Each of
these sections contains elements necessary for the filter to run.
The filters described below will discuss what is required, if anything.
@@ -89,7 +102,8 @@ the example above shows.
This filter requires a section in the config called `filter_clean_ptid`. This
section will contain a single key `filepath` which will point to a csv file
of ptids to be removed. All the records whose ptid with same packet and visit
num found in the passed meta file will be discarded in the output file.
num found in the passed meta file will be discarded in the output file. This
filter also removes events that lack a visit number in REDCap.
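A minimal sketch of the cleanPtid rule described above, assuming both the REDCap export and the meta file expose `ptid`, `packet` and `visitnum` columns. The column names and the `clean_ptid` function are assumptions for illustration, not NACCulator's implementation.

```python
import csv
import io
import typing

Row = typing.Dict[str, str]


def clean_ptid(records: typing.Iterable[Row],
               meta_rows: typing.Iterable[Row]) -> typing.List[Row]:
    """Drop records listed in the meta file and records with no visit number."""
    # (ptid, packet, visitnum) triples that should be removed
    drop = {(m["ptid"], m["packet"], m["visitnum"]) for m in meta_rows}
    kept = []
    for rec in records:
        if not (rec.get("visitnum") or "").strip():
            continue  # event lacks a visit number
        if (rec["ptid"], rec["packet"], rec["visitnum"]) in drop:
            continue  # listed in the meta file
        kept.append(rec)
    return kept


if __name__ == "__main__":
    data = io.StringIO("ptid,packet,visitnum\n110001,I,1\n110001,F,2\n110002,I,\n")
    meta = io.StringIO("ptid,packet,visitnum\n110001,F,2\n")
    result = clean_ptid(csv.DictReader(data), csv.DictReader(meta))
    print([r["ptid"] + "/" + r["visitnum"] for r in result])  # ['110001/1']
```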

Example meta file:

@@ -112,12 +126,12 @@ the example above shows.
This filter requires a section in the config called `filter_fix_headers` with
as many keys as needed to replace the necessary columns. See example below.
This filter fixes the column names of any column found in the filter mapping.
This filter does not check for any data. It always replaces the column names
This filter does not check for any data. It only replaces the column names
if found.

Currently, below replacements are used:
For example, the configuration would look like this:

config:
[filter_fix_headers]
c1s_2a_npsylan: c1s_2_npsycloc
c1s_2a_npsylanx: c1s_2a_npsylan
b6s_2a1_npsylanx: c1s_2a1_npsylanx
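The header rename can be sketched with the standard library: `configparser` reads the `[filter_fix_headers]` section, and each CSV header is looked up once in the resulting mapping. The three entries are the ones visible in the config above; `fix_headers` is a hypothetical name, not NACCulator's function.

```python
import configparser
import csv
import io

CONFIG = """
[filter_fix_headers]
c1s_2a_npsylan: c1s_2_npsycloc
c1s_2a_npsylanx: c1s_2a_npsylan
b6s_2a1_npsylanx: c1s_2a1_npsylanx
"""


def fix_headers(csv_text: str, mapping: dict) -> str:
    """Rename header columns found in `mapping`; data rows are untouched."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # One lookup per column, so chained entries (c1s_2a_npsylanx ->
    # c1s_2a_npsylan -> c1s_2_npsycloc) are not applied twice.
    writer.writerow([mapping.get(col, col) for col in header])
    writer.writerows(reader)
    return out.getvalue()


config = configparser.ConfigParser()
config.read_string(CONFIG)
mapping = dict(config["filter_fix_headers"])
print(fix_headers("ptid,c1s_2a_npsylan,c1s_2a_npsylanx\n110001,1,2\n", mapping))
```

`configparser` lower-cases option names by default, which is harmless here because the REDCap field names are already lower case.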
@@ -132,27 +146,25 @@ the example above shows.
predefined values. Below are the current defaults :

nogds -> 0
arthupex -> 0
arthloex -> 0
arthspin -> 0
arthunk -> 0
formver -> 3

*If field is blank, always it will be updated to default value.*
*If field is blank, it will be updated to default value.*

* **updateField**

This filter is used to update non blank fields. Currently, only `adcid` is
updated to 41.
This filter is used to update fields that already had a value in the REDCap
export. Currently, only `adcid` is updated.
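fillDefault and updateField differ only in which values they touch: one writes into blank fields, the other overwrites fields that already hold a value. A minimal sketch, using two of the defaults listed above (`nogds`, `formver`) and a placeholder ADC ID of `"99"` standing in for the center's own value from the config. Both function names and the placeholder are hypothetical.

```python
import typing

Row = typing.Dict[str, str]

DEFAULTS = {"nogds": "0", "formver": "3"}  # subset of the documented defaults


def fill_default(record: Row, defaults: Row = DEFAULTS) -> Row:
    """Put a default into every listed field that is present but blank."""
    filled = dict(record)
    for field, value in defaults.items():
        if field in filled and not filled[field].strip():
            filled[field] = value
    return filled


def update_field(record: Row, field: str = "adcid", value: str = "99") -> Row:
    """Overwrite `field` only if it already had a value in the export."""
    updated = dict(record)
    if updated.get(field, "").strip():
        updated[field] = value
    return updated


row = {"ptid": "110001", "adcid": "41", "nogds": "", "formver": "3"}
print(update_field(fill_default(row)))
# {'ptid': '110001', 'adcid': '99', 'nogds': '0', 'formver': '3'}
```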

* **removePtid**

**Filter config required**
This filter requires a section in the config called `filter_remove_ptid` with
a single key called `ptid_format`. The value for that key is a regex string
to match ptids that are to be kept.
For example, `11\d.*` keeps all PTIDs that fit the format 11xxxx, such as 110001.

This filter is used to remove ptids that may have a different set of ids for a
different study, or help limit which ids show up in the final result.
This filter is used to remove ptids that may have a different set of ids for
a different study, or help limit which ids show up in the final result.

config:
ptid_format: 11\d.*
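The `ptid_format` value is a regular expression. A sketch of the keep/drop decision using Python's `re` module follows; `remove_ptid` is a hypothetical name, and whether NACCulator uses `re.match` or another matching function is an assumption.

```python
import re
import typing

Row = typing.Dict[str, str]


def remove_ptid(records: typing.Iterable[Row],
                ptid_format: str = r"11\d.*") -> typing.List[Row]:
    """Keep only records whose PTID matches `ptid_format` from the start."""
    pattern = re.compile(ptid_format)
    return [rec for rec in records if pattern.match(rec["ptid"])]


rows = [{"ptid": "110001"}, {"ptid": "120001"}, {"ptid": "A110001"}]
print([r["ptid"] for r in remove_ptid(rows)])  # ['110001']
```

Because `re.match` anchors at the start of the string, `A110001` is dropped even though it contains `110001`; `re.search` would keep it.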
@@ -165,8 +177,9 @@ the example above shows.

* **getPtid**

This filter is used to get information about a single PatientID.
You need to use the `-ptid` flag to specify the patient ID.
This filter is used to get information about a single PatientID and is not
present in the config file. You need to use the `-ptid` flag to specify the
patient ID.
You can use the `-vnum` to get the records with particular visit number and
Patient ID or use `-vtype` to get records with particular visit type and
Patient ID.
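The getPtid selection amounts to successive filters on the exported rows. In this sketch, the `visitnum` column and matching `-vtype` against `redcap_event_name` are assumptions, and `get_ptid` is a hypothetical name.

```python
import typing

Row = typing.Dict[str, str]


def get_ptid(records: typing.Iterable[Row], ptid: str,
             vnum: typing.Optional[str] = None,
             vtype: typing.Optional[str] = None) -> typing.List[Row]:
    """Return the rows for one PTID, optionally narrowed by visit."""
    selected = [r for r in records if r.get("ptid") == ptid]
    if vnum is not None:
        selected = [r for r in selected if r.get("visitnum") == vnum]
    if vtype is not None:
        # assumption: the visit type is a keyword in redcap_event_name
        selected = [r for r in selected
                    if vtype in r.get("redcap_event_name", "")]
    return selected


rows = [
    {"ptid": "110001", "visitnum": "1", "redcap_event_name": "initial_visit_arm_1"},
    {"ptid": "110001", "visitnum": "2", "redcap_event_name": "followup_visit_arm_1"},
    {"ptid": "110002", "visitnum": "1", "redcap_event_name": "initial_visit_arm_1"},
]
print(len(get_ptid(rows, "110001")))  # 2
print(get_ptid(rows, "110001", vnum="2")[0]["redcap_event_name"])  # followup_visit_arm_1
```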
@@ -180,28 +193,26 @@ Example Workflow
Once you have edited the `nacculator_cfg.ini` file with your API token and
desired filters, you can get a filtered CSV file of the REDCap data with:

$ python3 run_filters.py nacculator_cfg.ini

This will create a run folder (`$run_folder`) with the current date that
contains the csv and each iteration of filter, ending with `final_update.csv`.
$ nacculator_filters nacculator_cfg.ini

Next, you will need to split apart the IVP and FVP visits:

$ bash split_ivp_fvp.sh $run_folder/final_update.csv
This will create a run folder labeled with the current date
(`$run_CURRENT-DATE`) (for example, `run_01-01-2000`) that contains the csv and
each iteration of filter, ending with `final_update.csv`.

The resulting files will not be in the run folder created by `run_filters.py`.
They will be in the base directory. You can move them if you would like to, but
you will need to modify the filepaths in the following commands.
They will be in the base directory. The filepaths in the following commands are
modified so that the output is deposited in your `$run_CURRENT-DATE` folder.

Next, you will need to run the actual `redcap2nacc` program to produced the
fixed width text file for NACC. As you have split the IVP and FVP visits, you
will run the program twice, using each flag once.
Next, you will need to run the actual `redcap2nacc` program to produce the
fixed width text file for NACC. One type of flag can be used at a time, so the
program must be run twice.

$ redcap2nacc -ivp <initial_visits.csv >$run_folder/iv_nacc_complete.txt 2>$run_folder/ivp_errors.txt
$ redcap2nacc -fvp <followup_visits.csv >$run_folder/fv_nacc_complete.txt 2>$run_folder/fvp_errors.txt
$ redcap2nacc -ivp < $run_CURRENT-DATE/final_Update.csv > $run_CURRENT-DATE/iv_nacc_complete.txt 2> $run_CURRENT-DATE/ivp_errors.txt
$ redcap2nacc -fvp < $run_CURRENT-DATE/final_Update.csv > $run_CURRENT-DATE/fv_nacc_complete.txt 2> $run_CURRENT-DATE/fvp_errors.txt

This will place the text files in the run folder created earlier, as well as a
log of the run which will have any errors encountered.
This will place the text files (e.g., `iv_nacc_complete.txt`) in the run folder
created earlier, along with a log of any errors found in each run
(e.g., `ivp_errors.txt`).
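The "one type of flag at a time" rule maps naturally onto an argparse mutually exclusive group. This is a sketch, not NACCulator's parser: `build_parser` and `PACKET_FLAGS` are hypothetical names, and the sketch covers only the six packet flags `-ivp`, `-fvp`, `-tfp`, `-np`, `-m` and `-csf` from the usage text, leaving out `-lbd`, `-ftld` and the filter options.

```python
import argparse

# Flag -> description, copied from the usage text above.
PACKET_FLAGS = {
    "-ivp": "IVP",
    "-fvp": "FVP",
    "-tfp": "Telephone Followup Packet",
    "-np": "Neuropathology",
    "-m": "Milestone",
    "-csf": "NACC BIDSS CSF",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="redcap2nacc")
    packet = parser.add_mutually_exclusive_group()
    for flag, label in PACKET_FLAGS.items():
        # argparse derives dest "ivp" from "-ivp", "m" from "-m", and so on.
        packet.add_argument(flag, action="store_true",
                            help="Set this flag to process as %s data" % label)
    parser.add_argument("-file", help="Path of the csv file to be processed.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-ivp", "-file", "final_update.csv"])
    print(args.ivp, args.fvp, args.file)  # True False final_update.csv
```

With this design, passing both `-ivp` and `-fvp` makes argparse exit with a usage error, so IVP and FVP output must come from two separate runs.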


Development
@@ -234,36 +245,48 @@ This is not exhaustive, but here is an explanation of some important files.

* `tools/generator.py`:
generates Python objects based on NACC Data Element Dictionaries in CSV.
Used by developers to update the existing forms.py files as necessary.

* `nacculator_cfg.ini`:
configuration file for the filters, built from `nacculator_cfg.ini.example`
in the root `nacculator/` directory.

* `tools/preprocess/run_filters.py` and `tools/preprocess/run_filters.sh`:
* `nacc/run_filters.py` and `tools/preprocess/run_filters.sh`:
pulls data from REDCap based on the settings found in `nacculator_cfg.ini`
(for .py) and `filters_config.cfg` (for .sh).


### Generating New Forms
### Testing

**Warning: read the warnings in the `./nacc/uds3/ivp/forms.py` first!**
To run all the tests:

_Note: executing `generator.py` from within tools is an important step as the
script assumes any corrected DEDs are stored under a folder in the current
working directory called `corrected`._
$ python3 -m unittest

$ python3 tools/generator.py tools/uds3/ded/csv/ >nacc/uds3/ivp/forms.py
$ edit nacc/uds3/ivp/forms.py

To run only the tests in a file:

### Testing
$ python3 tests/WHICHEVER_test.py

To run all the tests:

$ make tests
### Generating Forms

**Warning: the generator is currently broken due to changes in the CSV format.**

To run only the tests in a file:
You only need to generate forms when there are new DEDs from NACC. The
NACCulator install includes the current forms automatically.

$ python3 tests/WHICHEVER_test.py
Before running the generator, read the warnings in `./nacc/uds3/ivp/forms.py`
first.

$ python3 tools/generator.py tools/uds3/ded/csv/ >nacc/uds3/ivp/forms.py
$ edit nacc/uds3/ivp/forms.py

_Note: execute `generator.py` from the same folder as the `corrected`
folder, which should contain any "corrected" DEDs._


### Resources

* UDS3 FVP forms: https://www.alz.washington.edu/NONMEMBER/UDS/DOCS/VER3/
* UDS3 forms: https://www.alz.washington.edu/NONMEMBER/UDS/DOCS/VER3/UDS3csvded.html
* NACC forms and documentation: https://www.alz.washington.edu/NONMEMBER/NACCFormsAndDoc.html
* UDS submission site: https://www.alz.washington.edu/MEMBER/sitesub.htm
4 changes: 2 additions & 2 deletions nacc/csf/forms.py
@@ -1,5 +1,5 @@
 ###############################################################################
-# Copyright 2015-2019 University of Florida. All rights reserved.
+# Copyright 2015-2020 University of Florida. All rights reserved.
 # This file is part of UF CTS-IT's NACCulator project.
 # Use of this source code is governed by the license found in the LICENSE file.
 ###############################################################################
@@ -32,7 +32,7 @@ def header_fields():


 class FormEE2(nacc.uds3.FieldBag):
-    """
+    """
     Generated from Form eE2: https://www.alz.washington.edu/WEB/csfded.pdf
     """
     def __init__(self):
40 changes: 23 additions & 17 deletions nacc/redcap2nacc.py
@@ -131,17 +131,17 @@ def check_for_bad_characters(field: Field) -> typing.List:

     incompatible = []
     if quote:
-        quote = "'"
-        incompatible.append(quote + " (%s)" % num_quote)
+        quote_char = "'"
+        incompatible.append(quote_char + " (%s)" % num_quote)
     if dquote:
-        dquote = '"'
-        incompatible.append(dquote + " (%s)" % num_dquote)
+        dquote_char = '"'
+        incompatible.append(dquote_char + " (%s)" % num_dquote)
     if amp:
-        amp = '&'
-        incompatible.append(amp + " (%s)" % num_amp)
+        amp_char = '&'
+        incompatible.append(amp_char + " (%s)" % num_amp)
     if percent:
-        percent = '%'
-        incompatible.append(percent + " (%s)" % num_percent)
+        percent_char = '%'
+        incompatible.append(percent_char + " (%s)" % num_percent)

     return incompatible

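The renamed `*_char` variables keep the names tested in the `if` statements (`quote`, `dquote`, `amp`, `percent`) separate from the characters they report. A compact, hypothetical equivalent of the reporting step follows, producing the same `"<char> (<count>)"` strings for the four characters checked in the diff; `describe_bad_characters` is an illustrative name.

```python
import typing

# The four characters checked by check_for_bad_characters in the diff above.
INCOMPATIBLE_CHARS = ("'", '"', "&", "%")


def describe_bad_characters(value: str) -> typing.List[str]:
    """Report each flagged character present in `value` with its count."""
    return ["%s (%s)" % (ch, value.count(ch))
            for ch in INCOMPATIBLE_CHARS if ch in value]


print(describe_bad_characters('Tom\'s "50%" plan'))  # ["' (1)", '" (2)', '% (1)']
```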
@@ -174,13 +174,19 @@ def check_redcap_event(options, record) -> bool:
             return False
     elif options.ivp:
         event_name = 'initial_visit'
-        form_match_z1 = record['ivp_z1_complete']
+        try:
+            form_match_z1 = record['ivp_z1_complete']
+        except KeyError:
+            form_match_z1 = ''
         form_match_z1x = record['ivp_z1x_complete']
         if form_match_z1 in ['0', ''] and form_match_z1x in ['0', '']:
             return False
     elif options.fvp:
         event_name = 'followup_visit'
-        form_match_z1 = record['fvp_z1_complete']
+        try:
+            form_match_z1 = record['fvp_z1_complete']
+        except KeyError:
+            form_match_z1 = ''
         form_match_z1x = record['fvp_z1x_complete']
         if form_match_z1 in ['0', ''] and form_match_z1x in ['0', '']:
             return False
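The new `try`/`except KeyError` blocks let projects that never had the deprecated Z1 form (no `ivp_z1_complete` or `fvp_z1_complete` column) fall back to the Z1X status. In REDCap, a `*_complete` field holds `0` (Incomplete), `1` (Unverified) or `2` (Complete). The same check can be written with `dict.get`. In this hypothetical helper, `has_z1_form`, the Z1X lookup also defaults to blank. That is looser than the diff, where a missing `*_z1x_complete` column still raises `KeyError`.

```python
import typing


def has_z1_form(record: typing.Mapping[str, str], prefix: str) -> bool:
    """False when neither Z1 nor Z1X is filled in for this packet.

    `prefix` is "ivp" or "fvp", matching the REDCap form names in the diff.
    """
    z1 = record.get(prefix + "_z1_complete", "")
    z1x = record.get(prefix + "_z1x_complete", "")
    blank = ("0", "")
    return not (z1 in blank and z1x in blank)


print(has_z1_form({"ivp_z1x_complete": "2"}, "ivp"))  # True
print(has_z1_form({"fvp_z1_complete": "0", "fvp_z1x_complete": ""}, "fvp"))  # False
```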
@@ -208,25 +214,25 @@ def check_single_select(packet: uds3_packet.Packet):
     warnings = list()

     # D1 4
-    fields = ('AMNDEM', 'PCA', 'PPASYN', 'FTDSYN', 'LBDSYN', 'NAMNDEM')
-    if not exclusive(packet, fields):
+    fields_4 = ('AMNDEM', 'PCA', 'PPASYN', 'FTDSYN', 'LBDSYN', 'NAMNDEM')
+    if not exclusive(packet, fields_4):
         warnings.append('For Form D1, Question 4, there is unexpectedly more '
                         'than one syndrome indicated as "Present".')

     # D1 5
-    fields = ('MCIAMEM', 'MCIAPLUS', 'MCINON1', 'MCINON2', 'IMPNOMCI')
-    if not exclusive(packet, fields):
+    fields_5 = ('MCIAMEM', 'MCIAPLUS', 'MCINON1', 'MCINON2', 'IMPNOMCI')
+    if not exclusive(packet, fields_5):
         warnings.append('For Form D1, Question 5, there is unexpectedly more '
                         'than one syndrome indicated as "Present".')

     # D1 11-39
-    fields = ('ALZDISIF', 'LBDIF', 'MSAIF', 'PSPIF', 'CORTIF', 'FTLDMOIF',
+    fields_11_39 = ('ALZDISIF', 'LBDIF', 'MSAIF', 'PSPIF', 'CORTIF', 'FTLDMOIF',
               'FTLDNOIF', 'FTLDSUBX', 'CVDIF', 'ESSTREIF', 'DOWNSIF', 'HUNTIF',
               'PRIONIF', 'BRNINJIF', 'HYCEPHIF', 'EPILEPIF', 'NEOPIF', 'HIVIF',
               'OTHCOGIF', 'DEPIF', 'BIPOLDIF', 'SCHIZOIF', 'ANXIETIF',
               'DELIRIF', 'PTSDDXIF', 'OTHPSYIF', 'ALCDEMIF', 'IMPSUBIF',
               'DYSILLIF', 'MEDSIF', 'COGOTHIF', 'COGOTH2F', 'COGOTH3F')
-    if not exclusive(packet, fields):
+    if not exclusive(packet, fields_11_39):
         warnings.append('For Form D1, Questions 11-39, there is unexpectedly '
                         'more than one Primary cause selected.')
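Giving each field tuple its own name (`fields_4`, `fields_5`, `fields_11_39`) avoids rebinding a single `fields` variable three times. The `exclusive` helper itself is not shown in this diff. The stand-in below, `at_most_one_present`, is hypothetical and assumes a plain dict for the packet, with "Present" or "selected" stored as `1`.

```python
import typing


def at_most_one_present(packet: typing.Mapping[str, typing.Any],
                        fields: typing.Sequence[str],
                        present: typing.Any = 1) -> bool:
    """True if no more than one of `fields` equals `present`."""
    return sum(1 for f in fields if packet.get(f) == present) <= 1


fields_4 = ('AMNDEM', 'PCA', 'PPASYN', 'FTDSYN', 'LBDSYN', 'NAMNDEM')
print(at_most_one_present({'AMNDEM': 1, 'PCA': 0}, fields_4))  # True
print(at_most_one_present({'AMNDEM': 1, 'PCA': 1}, fields_4))  # False
```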

@@ -269,7 +275,7 @@ def set_to_zero_if_blank(*field_names):
     set_to_zero_if_blank(
         'PSPCBS', 'EYEPSP', 'DYSPSP', 'AXIALPSP', 'GAITPSP', 'APRAXSP',
         'APRAXL', 'APRAXR', 'CORTSENL', 'CORTSENR', 'ATAXL', 'ATAXR',
-        'ALIENLML', 'ALIENLMR', 'DYSTONL', 'DYSTONR')
+        'ALIENLML', 'ALIENLMR', 'DYSTONL', 'DYSTONR', 'MYOCLLT', 'MYOCLRT')

     # D1 4.
     if packet['DEMENTED'] == 1:
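In the diff, `set_to_zero_if_blank` is a nested helper that reads `packet` from its enclosing function, and this commit adds `MYOCLLT` and `MYOCLRT` to its list. The standalone sketch below takes the packet explicitly. It assumes a dict and treats missing, `None` or whitespace-only values as blank; the real helper may define "blank" differently.

```python
import typing


def set_to_zero_if_blank(packet: typing.MutableMapping[str, typing.Any],
                         *field_names: str) -> None:
    """Write 0 into each named field that is missing or blank."""
    for name in field_names:
        value = packet.get(name)
        if value is None or str(value).strip() == "":
            packet[name] = 0


packet = {'DYSTONR': 1, 'MYOCLLT': '', 'MYOCLRT': ' '}
set_to_zero_if_blank(packet, 'DYSTONR', 'MYOCLLT', 'MYOCLRT')
print(packet)  # {'DYSTONR': 1, 'MYOCLLT': 0, 'MYOCLRT': 0}
```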