diff --git a/CHANGELOG b/CHANGELOG index c627af0..7447e18 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,6 +1,35 @@ Changelog ========= +## [1.3.0] - 2020-06-30 +### Summary + +This version reflects changes to have NACCulator be more compatible with more centers. We removed some hard coded variables for the 1Florida ADRC. +There were changes to how the deprecated Z1 and C1S forms are handled as well as updates to tests for new functionality in the program. + +### Added + * Added Z1 skipping to TFP builder (Samantha Emerson) + * Added tests for new functionality on skip logic and CSV formats (Samantha Emerson) + * Add run_filters.py to setup.py installation (Samantha Emerson) + * Add C1S form skip to uds ivp and fvp builders (Samantha Emerson) + * Add Z1 form skipping to uds3 fvp (Samantha Emerson) + * Add Z1 form skipping to nacculator uds3 ivp (Samantha Emerson) + + ### Changed + * Complete filter adjustments and repair associated unit tests (Samantha Emerson) + + ### Removed + * Remove filter that removes all events that are not uds3 initial or followup (Samantha Emerson) + + ### Updated + * Update and revise README (Samantha Emerson) + * Fix typos in IVP and FVP builder files + * Modify form C1S allowable_values for LOGIPREV (Samantha Emerson) + * Edit filters to accept any AD center's PTID from their config file (Samantha Emerson) + * Update README.md (Taeber Rapczak) + * Move Generating Forms to minimize confusion (Taeber Rapczak) + * Update generator to handle new CSV DED format (Taeber Rapczak) + ## [1.2.0] - 2020-04-13 ### Summary diff --git a/README.md b/README.md index 6f0fc85..3d18d0d 100644 --- a/README.md +++ b/README.md @@ -12,16 +12,28 @@ _Note:_ NACCulator _**requires Python 3.**_ HOW TO Convert from REDCap to NACC ---------------------------------- -Once the project data is exported from REDCap to the CSV file `data.csv`, run: +To install NACCulator, run: $ pip3 install git+https://github.com/ctsit/nacculator.git + +Once the project data is 
exported from REDCap to the CSV file `data.csv`, run: + $ redcap2nacc data.txt This command will work only in the simplest case; UDS3 IVP data only. -If there are no errors, then submit the `data.txt` file to NACC. +Nacculator will automatically skip PTIDs with errors, so the output `data.txt` +file will be ready to submit to NACC. +In order to properly filter the data in the csv, nacculator is expecting that +REDCap visits (denoted by `redcap_event_name`) contain certain keywords: + "initial_visit" for initial visit packets + "followup_visit" for all followups + "milestone" for milestone packets + "neuropath" for neuropathology packets + "telephone" for telephone followup packets _Note: output is written to `STDOUT`; errors are written to `STDERR`; input is -expected to be from `STDIN` unless a file is specified using the `-file` flag._ +expected to be from `STDIN` (the command line) unless a file is specified using +the `-file` flag._ ### Usage @@ -36,19 +48,18 @@ expected to be from `STDIN` unless a file is specified using the `-file` flag._ optional arguments: -h, --help show this help message and exit - -fvp Set this flag to process as fvp data - -ivp Set this flag to process as ivp data - -tfp Set this flag to process as telephone follow-up data - -np Set this flag to process as np data - -m Set this flag to process as m data + -fvp Set this flag to process as FVP data + -ivp Set this flag to process as IVP data + -tfp Set this flag to process as Telephone Followup Packet data + -np Set this flag to process as Neuropathology data + -m Set this flag to process as Milestone data -csf Set this flag to process as NACC BIDSS CSF data -f {cleanPtid,replaceDrugId,fixHeaders,fillDefault,updateField,removePtid,removeDateRecord,getPtid}, --filter {cleanPtid,replaceDrugId,fixHeaders,fillDefault,updateField,removePtid,removeDateRecord,getPtid} Set this flag to process the filter -lbd Set this flag to process as Lewy Body Dementia data - -ftld Set this flag to process as 
Frontotemporal Lobar Degeneration data + -ftld Set this flag to process as Frontotemporal Lobar Degeneration data -file FILE Path of the csv file to be processed. - -meta FILTER_META Input file for the filter metadata (in case -filter is - used) + -meta FILTER_META Input file for the filter metadata (in case -filter is used) -ptid PTID Ptid for which you need the records -vnum VNUM Ptid for which you need the records -vtype VTYPE Ptid for which you need the records @@ -73,10 +84,12 @@ HOW TO Filter Data Using NACCulator ----------------------------------- If your data is not clean enough to be processed by NACCulator, there are some -built in functions to clean (read transform) the data. +built in functions to clean (read: transform) the data. In order to properly use the filters, the first step is to check and validate -that `nacculator_cfg.ini` has the proper settings for the filter to run. +that `nacculator_cfg.ini` has the proper settings for the filter to run. In +order to create this file, find the `nacculator_cfg.ini.example` file and +remove the `.example` portion, and then fill in your center's information. The config file contains sections with in-code filter function name. Each of these sections contains elements necessary for the filter to run. The filters described below will discuss what is required, if anything. @@ -89,7 +102,8 @@ the example above shows. This filter requires a section in the config called `filter_clean_ptid`. This section will contain a single key `filepath` which will point to a csv file of ptids to be removed. All the records whose ptid with same packet and visit - num found in the passed meta file will be discarded in the output file. + num found in the passed meta file will be discarded in the output file. This + filter also removes events that lack a visit number in REDCap. Example meta file: @@ -112,12 +126,12 @@ the example above shows. 
This filter requires a section in the config called `filter_fix_headers` with as many keys as needed to replace the necessary columns. See example below. This filter fixes the column names of any column found in the filter mapping. - This filter does not check for any data. It always replaces the column names + This filter does not check for any data. It only replaces the column names if found. - Currently, below replacements are used: + For example, the configuration would look like this: - config: + [filter_fix_headers] c1s_2a_npsylan: c1s_2_npsycloc c1s_2a_npsylanx: c1s_2a_npsylan b6s_2a1_npsylanx: c1s_2a1_npsylanx @@ -132,17 +146,14 @@ the example above shows. predefined values. Below are the current defaults : nogds -> 0 - arthupex -> 0 - arthloex -> 0 - arthspin -> 0 - arthunk -> 0 + formver -> 3 - *If field is blank, always it will be updated to default value.* + *If field is blank, it will be updated to default value.* * **updateField** - This filter is used to update non blank fields. Currently, only `adcid` is - updated to 41. + This filter is used to update fields that already had a value in the REDCap + export. Currently, only `adcid` is updated. * **removePtid** @@ -150,9 +161,10 @@ the example above shows. This filter requires a section in the config called `filter_remove_ptid` with a single key called `ptid_format`. The value for that key is a regex string to match ptids that are to be kept. + 11\d.* keeps all PTIDs that fit the format 11xxxx, such as 110001. - This filter is used to remove ptids that may have a different set of ids for a - different study, or help limit which ids show up in the final result. + This filter is used to remove ptids that may have a different set of ids for + a different study, or help limit which ids show up in the final result. config: ptid_format: 11\d.* @@ -165,8 +177,9 @@ the example above shows. * **getPtid** - This filter is used to get information about a single PatientID. 
- You need to use the `-ptid` flag to specify the patient ID. + This filter is used to get information about a single PatientID and is not + present in the config file. You need to use the `-ptid` flag to specify the + patient ID. You can use the `-vnum` to get the records with particular visit number and Patient ID or use `-vtype` to get records with particular visit type and Patient ID. @@ -180,28 +193,26 @@ Example Workflow Once you have edited the `nacculator_cfg.ini` file with your API token and desired filters, you can get a filtered CSV file of the REDCap data with: - $ python3 run_filters.py nacculator_cfg.ini - -This will create a run folder (`$run_folder`) with the current date that -contains the csv and each iteration of filter, ending with `final_update.csv`. + $ nacculator_filters nacculator_cfg.ini -Next, you will need to split apart the IVP and FVP visits: - - $ bash split_ivp_fvp.sh $run_folder/final_update.csv +This will create a run folder labeled with the current date +(`$run_CURRENT-DATE`) (for example, `run_01-01-2000`) that contains the csv and +each iteration of filter, ending with `final_Update.csv`. The resulting files will not be in the run folder created by `run_filters.py`. -They will be in the base directory. You can move them if you would like to, but -you will need to modify the filepaths in the following commands. +They will be in the base directory. The filepaths in the following commands are +modified so that the output is deposited in your `$run_CURRENT-DATE` folder. -Next, you will need to run the actual `redcap2nacc` program to produced the -fixed width text file for NACC. As you have split the IVP and FVP visits, you -will run the program twice, using each flag once. +Next, you will need to run the actual `redcap2nacc` program to produce the +fixed width text file for NACC. One type of flag can be used at a time, so the +program must be run twice. 
- $ redcap2nacc -ivp $run_folder/iv_nacc_complete.txt 2>$run_folder/ivp_errors.txt - $ redcap2nacc -fvp $run_folder/fv_nacc_complete.txt 2>$run_folder/fvp_errors.txt + $ redcap2nacc -ivp < $run_CURRENT-DATE/final_Update.csv > $run_CURRENT-DATE/iv_nacc_complete.txt 2> $run_CURRENT-DATE/ivp_errors.txt + $ redcap2nacc -fvp < $run_CURRENT-DATE/final_Update.csv > $run_CURRENT-DATE/fv_nacc_complete.txt 2> $run_CURRENT-DATE/fvp_errors.txt -This will place the text files in the run folder created earlier, as well as a -log of the run which will have any errors encountered. +This will place the text files (`iv_nacc_complete.txt`) in the run folder +created earlier, as well as a log of the run that contains any found errors +(`ivp_errors.txt`). Development @@ -234,36 +245,48 @@ This is not exhaustive, but here is an explanation of some important files. * `tools/generator.py`: generates Python objects based on NACC Data Element Dictionaries in CSV. + Used by developers to update the existing forms.py files as necessary. + +* `nacculator_cfg.ini`: + configuration file for the filters, built from `nacculator_cfg.ini.example` + in the root `nacculator/` directory. -* `tools/preprocess/run_filters.py` and `tools/preprocess/run_filters.sh`: +* `nacc/run_filters.py` and `tools/preprocess/run_filters.sh`: pulls data from REDCap based on the settings found in `nacculator_cfg.ini` (for .py) and `filters_config.cfg` (for .sh). 
-### Generating New Forms +### Testing -**Warning: read the warnings in the `./nacc/uds3/ivp/forms.py` first!** +To run all the tests: -_Note: executing `generator.py` from within tools is an important step as the -script assumes any corrected DEDs are stored under a folder in the current -working directory called `corrected`._ + $ python3 -m unittest - $ python3 tools/generator.py tools/uds3/ded/csv/ >nacc/uds3/ivp/forms.py - $ edit nacc/uds3/ivp/forms.py +To run only the tests in a file: -### Testing + $ python3 tests/WHICHEVER_test.py -To run all the tests: - $ make tests +### Generating Forms +**Warning: the generator is currently broken due to changes in the CSV format.** -To run only the tests in a file: +You only need to generate forms when there are new DEDs from NACC. The +NACCulator install includes the current forms automatically. - $ python3 tests/WHICHEVER_test.py +Before running the generator, read the warnings in `./nacc/uds3/ivp/forms.py` +first. + + $ python3 tools/generator.py tools/uds3/ded/csv/ >nacc/uds3/ivp/forms.py + $ edit nacc/uds3/ivp/forms.py + +_Note: execute `generator.py` from the same folder as the `corrected` +folder, which should contain any "corrected" DEDs._ ### Resources -* UDS3 FVP forms: https://www.alz.washington.edu/NONMEMBER/UDS/DOCS/VER3/ +* UDS3 forms: https://www.alz.washington.edu/NONMEMBER/UDS/DOCS/VER3/UDS3csvded.html +* NACC forms and documentation: https://www.alz.washington.edu/NONMEMBER/NACCFormsAndDoc.html +* UDS submission site: https://www.alz.washington.edu/MEMBER/sitesub.htm diff --git a/nacc/csf/forms.py b/nacc/csf/forms.py index 37abcdf..d72419b 100644 --- a/nacc/csf/forms.py +++ b/nacc/csf/forms.py @@ -1,5 +1,5 @@ ############################################################################### -# Copyright 2015-2019 University of Florida. All rights reserved. +# Copyright 2015-2020 University of Florida. All rights reserved. # This file is part of UF CTS-IT's NACCulator project. 
# Use of this source code is governed by the license found in the LICENSE file. ############################################################################### @@ -32,7 +32,7 @@ def header_fields(): class FormEE2(nacc.uds3.FieldBag): - """ + """ Generated from Form eE2: https://www.alz.washington.edu/WEB/csfded.pdf """ def __init__(self): diff --git a/nacc/redcap2nacc.py b/nacc/redcap2nacc.py index 920b8d6..a65fc9a 100755 --- a/nacc/redcap2nacc.py +++ b/nacc/redcap2nacc.py @@ -131,17 +131,17 @@ def check_for_bad_characters(field: Field) -> typing.List: incompatible = [] if quote: - quote = "'" - incompatible.append(quote + " (%s)" % num_quote) + quote_char = "'" + incompatible.append(quote_char + " (%s)" % num_quote) if dquote: - dquote = '"' - incompatible.append(dquote + " (%s)" % num_dquote) + dquote_char = '"' + incompatible.append(dquote_char + " (%s)" % num_dquote) if amp: - amp = '&' - incompatible.append(amp + " (%s)" % num_amp) + amp_char = '&' + incompatible.append(amp_char + " (%s)" % num_amp) if percent: - percent = '%' - incompatible.append(percent + " (%s)" % num_percent) + percent_char = '%' + incompatible.append(percent_char + " (%s)" % num_percent) return incompatible @@ -174,13 +174,19 @@ def check_redcap_event(options, record) -> bool: return False elif options.ivp: event_name = 'initial_visit' - form_match_z1 = record['ivp_z1_complete'] + try: + form_match_z1 = record['ivp_z1_complete'] + except KeyError: + form_match_z1 = '' form_match_z1x = record['ivp_z1x_complete'] if form_match_z1 in ['0', ''] and form_match_z1x in ['0', '']: return False elif options.fvp: event_name = 'followup_visit' - form_match_z1 = record['fvp_z1_complete'] + try: + form_match_z1 = record['fvp_z1_complete'] + except KeyError: + form_match_z1 = '' form_match_z1x = record['fvp_z1x_complete'] if form_match_z1 in ['0', ''] and form_match_z1x in ['0', '']: return False @@ -208,25 +214,25 @@ def check_single_select(packet: uds3_packet.Packet): warnings = list() # D1 4 - 
fields = ('AMNDEM', 'PCA', 'PPASYN', 'FTDSYN', 'LBDSYN', 'NAMNDEM') - if not exclusive(packet, fields): + fields_4 = ('AMNDEM', 'PCA', 'PPASYN', 'FTDSYN', 'LBDSYN', 'NAMNDEM') + if not exclusive(packet, fields_4): warnings.append('For Form D1, Question 4, there is unexpectedly more ' 'than one syndrome indicated as "Present".') # D1 5 - fields = ('MCIAMEM', 'MCIAPLUS', 'MCINON1', 'MCINON2', 'IMPNOMCI') - if not exclusive(packet, fields): + fields_5 = ('MCIAMEM', 'MCIAPLUS', 'MCINON1', 'MCINON2', 'IMPNOMCI') + if not exclusive(packet, fields_5): warnings.append('For Form D1, Question 5, there is unexpectedly more ' 'than one syndrome indicated as "Present".') # D1 11-39 - fields = ('ALZDISIF', 'LBDIF', 'MSAIF', 'PSPIF', 'CORTIF', 'FTLDMOIF', + fields_11_39 = ('ALZDISIF', 'LBDIF', 'MSAIF', 'PSPIF', 'CORTIF', 'FTLDMOIF', 'FTLDNOIF', 'FTLDSUBX', 'CVDIF', 'ESSTREIF', 'DOWNSIF', 'HUNTIF', 'PRIONIF', 'BRNINJIF', 'HYCEPHIF', 'EPILEPIF', 'NEOPIF', 'HIVIF', 'OTHCOGIF', 'DEPIF', 'BIPOLDIF', 'SCHIZOIF', 'ANXIETIF', 'DELIRIF', 'PTSDDXIF', 'OTHPSYIF', 'ALCDEMIF', 'IMPSUBIF', 'DYSILLIF', 'MEDSIF', 'COGOTHIF', 'COGOTH2F', 'COGOTH3F') - if not exclusive(packet, fields): + if not exclusive(packet, fields_11_39): warnings.append('For Form D1, Questions 11-39, there is unexpectedly ' 'more than one Primary cause selected.') @@ -269,7 +275,7 @@ def set_to_zero_if_blank(*field_names): set_to_zero_if_blank( 'PSPCBS', 'EYEPSP', 'DYSPSP', 'AXIALPSP', 'GAITPSP', 'APRAXSP', 'APRAXL', 'APRAXR', 'CORTSENL', 'CORTSENR', 'ATAXL', 'ATAXR', - 'ALIENLML', 'ALIENLMR', 'DYSTONL', 'DYSTONR') + 'ALIENLML', 'ALIENLMR', 'DYSTONL', 'DYSTONR', 'MYOCLLT', 'MYOCLRT') # D1 4. 
if packet['DEMENTED'] == 1: diff --git a/tools/preprocess/run_filters.py b/nacc/run_filters.py similarity index 83% rename from tools/preprocess/run_filters.py rename to nacc/run_filters.py index b6bb429..daa533f 100644 --- a/tools/preprocess/run_filters.py +++ b/nacc/run_filters.py @@ -1,23 +1,23 @@ import os import sys import csv -import json import datetime -import time import nacc import configparser from cappy import API from nacc.uds3.filters import * + # Creating a folder which contains Intermediate files def recent_run_folder(out_dir): - #Check if directory exists. If not, create it. + # Check if directory exists. If not, create it. if not os.path.exists(out_dir): try: os.makedirs(out_dir) except Exception as e: raise e + def get_headers(input_ptr): reader = csv.DictReader(input_ptr) headers = reader.fieldnames @@ -31,68 +31,70 @@ def run_all_filters(folder_name, config): input_path = os.path.join(folder_name, "redcap_input.csv") output_path = os.path.join(folder_name, "clean.csv") print("Processing", file=sys.stderr) - with open (output_path,'w') as output_ptr, open (input_path,'r') as input_ptr: + with open(output_path, 'w') as output_ptr, open(input_path, 'r') as input_ptr: filter_clean_ptid(input_ptr, config, output_ptr) print("--------------Replacing drug IDs--------------------", file=sys.stderr) input_path = os.path.join(folder_name, "clean.csv") output_path = os.path.join(folder_name, "drugs.csv") - with open (output_path,'w') as output_ptr, open (input_path,'r') as input_ptr: + with open(output_path, 'w') as output_ptr, open(input_path, 'r') as input_ptr: filter_replace_drug_id(input_ptr, config, output_ptr) print("--------------Fixing Headers--------------------", file=sys.stderr) input_path = os.path.join(folder_name, "drugs.csv") output_path = os.path.join(folder_name, "clean_headers.csv") - with open (output_path,'w') as output_ptr, open (input_path,'r') as input_ptr: + with open(output_path, 'w') as output_ptr, open(input_path, 'r') as 
input_ptr: filter_fix_headers(input_ptr, config, output_ptr) print("--------------Filling in Defaults--------------------", file=sys.stderr) input_path = os.path.join(folder_name, "clean_headers.csv") output_path = os.path.join(folder_name, "default.csv") - with open (output_path,'w') as output_ptr, open (input_path,'r') as input_ptr: + with open(output_path, 'w') as output_ptr, open(input_path, 'r') as input_ptr: filter_fill_default(input_ptr, config, output_ptr) print("--------------Updating fields--------------------", file=sys.stderr) input_path = os.path.join(folder_name, "default.csv") output_path = os.path.join(folder_name, "update_fields.csv") - with open (output_path,'w') as output_ptr, open (input_path,'r') as input_ptr: + with open(output_path, 'w') as output_ptr, open(input_path, 'r') as input_ptr: filter_update_field(input_ptr, config, output_ptr) print("--------------Fixing Visit Dates--------------------", file=sys.stderr) input_path = os.path.join(folder_name, "update_fields.csv") output_path = os.path.join(folder_name, "proper_visitdate.csv") - with open (output_path,'w') as output_ptr, open (input_path,'r') as input_ptr: + with open(output_path, 'w') as output_ptr, open(input_path, 'r') as input_ptr: filter_fix_visitdate(input_ptr, config, output_ptr) + print("--------------Removing Unnecessary Records--------------------", file=sys.stderr) input_path = os.path.join(folder_name, "proper_visitdate.csv") output_path = os.path.join(folder_name, "CleanedPtid_Update.csv") - with open (output_path,'w') as output_ptr, open (input_path,'r') as input_ptr: + with open(output_path, 'w') as output_ptr, open(input_path, 'r') as input_ptr: filter_remove_ptid(input_ptr, config, output_ptr) print("--------------Removing Records without VisitDate--------------------", file=sys.stderr) input_path = os.path.join(folder_name, "CleanedPtid_Update.csv") output_path = os.path.join(folder_name, "final_Update.csv") - with open (output_path,'w') as output_ptr, open 
(input_path,'r') as input_ptr: + with open(output_path, 'w') as output_ptr, open(input_path, 'r') as input_ptr: filter_eliminate_empty_date(input_ptr, config, output_ptr) except Exception as e: print("Error in Opening a file") print(e) - return + def read_config(config_path): config = configparser.ConfigParser() config.read(config_path) return config + # Getting Data From RedCap def get_data_from_redcap(folder_name, config): # Enter the path for filters_config try: - token = config.get('cappy','token') - redcap_url= config.get('cappy','redcap_server') + token = config.get('cappy', 'token') + redcap_url = config.get('cappy', 'redcap_server') except Exception as e: print("Please check the config file and validate all the proper fields exist", file=sys.stderr) print(e) @@ -106,7 +108,7 @@ def get_data_from_redcap(folder_name, config): rawdata = str(res.text) myreader = csv.reader(rawdata.splitlines()) try: - with open(os.path.join(folder_name, "redcap_input.csv"),"w") as file: + with open(os.path.join(folder_name, "redcap_input.csv"), "w") as file: writer = csv.writer(file, delimiter=',') for row in myreader: writer.writerow(row) @@ -114,13 +116,13 @@ def get_data_from_redcap(folder_name, config): print("Error in Writing") print(e) - except: + except Exception: print("Error in CSV file") return -if __name__ == '__main__': +def main(): currentdate = datetime.datetime.now().strftime('%m-%d-%Y') folder_name = "run_" + currentdate print("Recent folder " + folder_name, file=sys.stderr) @@ -139,3 +141,7 @@ def get_data_from_redcap(folder_name, config): run_all_filters(folder_name, config_path) exit() + + +if __name__ == '__main__': + main() diff --git a/nacc/uds3/__init__.py b/nacc/uds3/__init__.py index 4bcd253..84afc7a 100644 --- a/nacc/uds3/__init__.py +++ b/nacc/uds3/__init__.py @@ -81,7 +81,8 @@ def value(self): def value(self, val): def out_of_range(v): d = decimal.Decimal(v) - return d < int(self.inclusive_range[0]) or d > int(self.inclusive_range[1]) + return d < 
int(self.inclusive_range[0]) or \ + d > int(self.inclusive_range[1]) if self.allowable_values: if val is None: diff --git a/nacc/uds3/filters.py b/nacc/uds3/filters.py index d54dff3..6fafc90 100644 --- a/nacc/uds3/filters.py +++ b/nacc/uds3/filters.py @@ -2,16 +2,9 @@ import csv import re -import fileinput import configparser from collections import defaultdict -# This dictionary contains the keys used in the config -fill_default_values = {'nogds': 0, - 'adcid': 41, - 'formver': 3} - -fill_non_blank_values = {'adcid': '41'} def validate(func): @@ -50,10 +43,14 @@ def int_or_string(value, default=-1): @validate def filter_clean_ptid(input_ptr, filter_config, output_ptr): - filepath = filter_config['filepath'] - with open(filepath, 'r') as nacc_packet_file: - output = filter_clean_ptid_do(input_ptr, nacc_packet_file, output_ptr) - return output + if filter_config: + filepath = filter_config['filepath'] + with open(filepath, 'r') as nacc_packet_file: + output = filter_clean_ptid_do(input_ptr, nacc_packet_file, output_ptr) + return output + else: + skip_filter(input_ptr, output_ptr) + return def filter_clean_ptid_do(input_ptr, nacc_packet_file, output_ptr): @@ -61,9 +58,6 @@ def filter_clean_ptid_do(input_ptr, nacc_packet_file, output_ptr): output = csv.DictWriter(output_ptr, None) write_headers(redcap_packet_list, output) - followup_visit = re.compile("followup.*") - initial_visit = re.compile("initial.*") - # TODO: Deal with M Flag in Current_db.csv. completed_subjs = defaultdict(list) @@ -75,22 +69,22 @@ def filter_clean_ptid_do(input_ptr, nacc_packet_file, output_ptr): completed_subjs[nacc_subj_id].append(nacc_visit_num) for redcap_packet in redcap_packet_list: - # if they exist in completed subjs (same id and visit num) + # if they exist in completed subjs (same id and visit num) # then remove them. 
rc_ptid = redcap_packet['ptid'] - rc_event = redcap_packet['redcap_event_name'] - if not (initial_visit.match(rc_event) or followup_visit.match(rc_event)): - print('Eliminated ptid : ' + rc_ptid + " Event Name : " + redcap_packet['redcap_event_name'] + " NOT INIT OR FOLLOWUP", file=sys.stderr) - continue if redcap_packet['visitnum']: rc_visit_num = int_or_string(redcap_packet['visitnum'], -1) else: - print('Eliminated ptid : ' + rc_ptid + " Event Name : " + redcap_packet['redcap_event_name'] + " MISSING VISIT NUM", file=sys.stderr) + print('Eliminated ptid : ' + rc_ptid + " Event Name : " + + redcap_packet['redcap_event_name'] + " MISSING VISIT NUM", + file=sys.stderr) continue if rc_ptid in completed_subjs: if rc_visit_num in completed_subjs[rc_ptid]: - print('Eliminated ptid : ' + rc_ptid + " Event Name : " + redcap_packet['redcap_event_name'] + " IN CURRENT", file=sys.stderr) + print('Eliminated ptid : ' + rc_ptid + " Event Name : " + + redcap_packet['redcap_event_name'] + " IN CURRENT", + file=sys.stderr) continue output.writerow(redcap_packet) return output @@ -106,6 +100,14 @@ def write_headers(reader, output): @validate def filter_replace_drug_id(input_ptr, filter_meta, output_ptr): + if filter_meta: + filter_replace_drug_id_do(input_ptr, output_ptr) + else: + skip_filter(input_ptr, output_ptr) + return + + +def filter_replace_drug_id_do(input_ptr, output_ptr): reader = csv.DictReader(input_ptr) output = csv.DictWriter(output_ptr, None) write_headers(reader, output) @@ -121,28 +123,37 @@ def filter_replace_drug_id(input_ptr, filter_meta, output_ptr): record[col_name] = 'd' + col_value[1:] count += 1 output.writerow(record) - print('Processed ptid : ' + record['ptid'] + ' Updated ' + str(count) + ' fields.', file=sys.stderr) + print('Processed ptid : ' + record['ptid'] + ' Updated ' + str(count) + + ' fields.', file=sys.stderr) return @validate -def filter_fix_headers(input_file, header_mapping, output_file): - return filter_fix_headers_do(input_file, 
header_mapping, output_file) - - -def filter_fix_headers_do(input_ptr, header_dictionary, output_ptr): - csv_reader = csv.reader(input_ptr) - csv_writer = csv.writer(output_ptr) - headers = next(csv_reader) - fixed_headers = list(map(lambda header: header_dictionary.get(header,header), headers)) - csv_writer.writerow(fixed_headers) - csv_writer.writerows([row for row in csv_reader]) +def filter_fix_headers(input_file, header_mapping, output_file): + if header_mapping: + return filter_fix_headers_do(input_file, header_mapping, output_file) + else: + skip_filter(input_file, output_file) + return + + +def filter_fix_headers_do(input_ptr, header_dictionary, output_ptr): + csv_reader = csv.reader(input_ptr) + csv_writer = csv.writer(output_ptr) + headers = next(csv_reader) + fixed_headers = list(map(lambda header: header_dictionary.get(header, header), headers)) + csv_writer.writerow(fixed_headers) + csv_writer.writerows([row for row in csv_reader]) return @validate def filter_remove_ptid(input_ptr, filter_config, output_ptr): - return filter_remove_ptid_do(input_ptr, filter_config, output_ptr) + if filter_config: + return filter_remove_ptid_do(input_ptr, filter_config, output_ptr) + else: + skip_filter(input_ptr, output_ptr) + return def filter_remove_ptid_do(input_ptr, filter_diction, output_ptr): @@ -156,9 +167,9 @@ def filter_remove_ptid_do(input_ptr, filter_diction, output_ptr): prog = re.compile(regex_exp) if record['ptid'] in bad_ptids_list: print('Removed ptid : ' + record['ptid'], file=sys.stderr) - elif record['ptid'] in good_ptids_list: + elif record['ptid'] in good_ptids_list: output.writerow(record) - elif prog.match(record['ptid'])!=None: + elif prog.match(record['ptid']) != None: output.writerow(record) else: print('Removed ptid : ' + record['ptid'], file=sys.stderr) @@ -166,6 +177,14 @@ def filter_remove_ptid_do(input_ptr, filter_diction, output_ptr): @validate def filter_eliminate_empty_date(input_ptr, filter_meta, output_ptr): + if filter_meta: + 
filter_eliminate_empty_date_do(input_ptr, output_ptr)
+    else:
+        skip_filter(input_ptr, output_ptr)
+    return
+
+
+def filter_eliminate_empty_date_do(input_ptr, output_ptr):
     reader = csv.DictReader(input_ptr)
     output = csv.DictWriter(output_ptr, None)
     write_headers(reader, output)
@@ -177,10 +196,12 @@ def filter_eliminate_empty_date(input_ptr, filter_meta, output_ptr):


 def _invalid_date(record):
-    return (record['visitmo']=='' or record['visitday']=='' or record['visityr']=='')
+    return (record['visitmo'] == '' or record['visitday'] == '' or
+            record['visityr'] == '')


-def fill_value_of_fields(input_ptr, output_ptr, keysDict, blankCheck=False, defaultCheck=False):
+def fill_value_of_fields(input_ptr, output_ptr, keysDict, blankCheck=False,
+                         defaultCheck=False):
     reader = csv.DictReader(input_ptr)
     output = csv.DictWriter(output_ptr, None)
     write_headers(reader, output)
@@ -189,21 +210,40 @@ def fill_value_of_fields(input_ptr, output_ptr, keysDict, blankCheck=False, defa
         for col_name in list(keysDict.keys()):
             if col_name in list(record.keys()):
                 if blankCheck and (len(record[col_name]) > 0) and (record[col_name] != keysDict[col_name]):
-                    record[col_name] = keysDict[col_name]
-                    count += 1
+                    record[col_name] = keysDict[col_name]
+                    count += 1
                 elif defaultCheck and len(record[col_name]) == 0:
-                    record[col_name] = keysDict[col_name]
-                    count += 1
+                    record[col_name] = keysDict[col_name]
+                    count += 1
+        output.writerow(record)
+        print('Processed ptid : ' + record['ptid'] + ' Updated ' + str(count) +
+              ' fields.', file=sys.stderr)
+    return
+
+
+def skip_filter(input_ptr, output_ptr):
+    reader = csv.DictReader(input_ptr)
+    output = csv.DictWriter(output_ptr, None)
+    write_headers(reader, output)
+    for record in reader:
         output.writerow(record)
-        print('Processed ptid : ' + record['ptid'] + ' Updated ' + str(count) + ' fields.', file=sys.stderr)
+    print('Filter skipped.', file=sys.stderr)
     return


 @validate
 def filter_fix_visitdate(input_ptr, filter_meta, output_ptr):
+    if filter_meta:
+        filter_fix_visitdate_do(input_ptr, output_ptr)
+    else:
+        skip_filter(input_ptr, output_ptr)
+    return
+
+
+def filter_fix_visitdate_do(input_ptr, output_ptr):
     reader = csv.DictReader(input_ptr)
     output = csv.DictWriter(output_ptr, None)
-    write_headers(reader,output)
+    write_headers(reader, output)
     for record in reader:
         if record['visitnum']:
             record['visitnum'] = int_or_string(record['visitnum'])
@@ -214,12 +254,40 @@ def filter_fix_visitdate(input_ptr, filter_meta, output_ptr):

 @validate
 def filter_fill_default(input_ptr, filter_meta, output_ptr):
-    fill_value_of_fields(input_ptr, output_ptr, fill_default_values, defaultCheck=True)
+    if filter_meta:
+        fill_value_of_fields(input_ptr, output_ptr, fill_default_values(filter_meta), defaultCheck=True)
+    else:
+        skip_filter(input_ptr, output_ptr)
+    return
+
+
+def fill_default_values(config):
+    # This dictionary contains the keys used in the config
+    try:
+        adcid = config['adcid']
+    except KeyError:
+        adcid = ''
+    fill_default_values = {'nogds': 0,
+                           'adcid': adcid,
+                           'formver': 3}
+    return fill_default_values


 @validate
 def filter_update_field(input_ptr, filter_meta, output_ptr):
-    fill_value_of_fields(input_ptr, output_ptr, fill_non_blank_values, blankCheck=True)
+    if filter_meta:
+        fill_value_of_fields(input_ptr, output_ptr, fill_non_blank_values(filter_meta), blankCheck=True)
+    else:
+        skip_filter(input_ptr, output_ptr)
+
+
+def fill_non_blank_values(config):
+    try:
+        adcid = config['adcid']
+    except KeyError:
+        adcid = ''
+    fill_non_blank_values = {'adcid': adcid}
+    return fill_non_blank_values


 def filter_extract_ptid(input_ptr, Ptid, visit_num, visit_type, output_ptr):
@@ -261,10 +329,10 @@ def filter_csv_ptid(Ptid, record):
     return record


-def load_special_case_ptid(case_name,filter_config):
+def load_special_case_ptid(case_name, filter_config):
     try:
         ptids_string = filter_config[case_name]
         li = list(ptids_string.split(","))
         return li
     except KeyError:
-        return []
\ No newline at end of file
+        return []
diff --git a/nacc/uds3/fvp/builder.py b/nacc/uds3/fvp/builder.py
index 8d77240..1530895 100644
--- a/nacc/uds3/fvp/builder.py
+++ b/nacc/uds3/fvp/builder.py
@@ -4,32 +4,57 @@
 # Use of this source code is governed by the license found in the LICENSE file.
 ###############################################################################

+import sys
+
 from nacc.uds3.fvp import forms as fvp_forms
 from nacc.uds3 import clsform
 from nacc.uds3 import packet as fvp_packet

-def build_uds3_fvp_form(record):
+
+def build_uds3_fvp_form(record, err=sys.stderr):
     """ Converts REDCap CSV data into a packet (list of FVP Form objects) """
     packet = fvp_packet.Packet()

     # Set up the forms
     add_z1_or_z1x(record, packet)
     add_a1(record, packet)
-    if record['fu_a2_sub'] == '1' or record['fu_a2sub'] == '1':
-        add_a2(record, packet)
-    if record['fu_a3_sub'] == '1' or record['fu_a3sub'] == '1':
-        add_a3(record, packet)
-    if record['fu_a4_sub'] == '1' or record['fu_a4sub'] == '1':
-        add_a4(record, packet)
-    if record['fu_b1_sub'] == '1' or record['fu_b1sub'] == '1':
-        add_b1(record, packet)
-    add_b4(record, packet)
-    if record['fu_b5_sub'] == '1' or record['fu_b5sub'] == '1':
-        add_b5(record, packet)
-    if record['fu_b6_sub'] == '1' or record['fu_b6sub'] == '1':
-        add_b6(record, packet)
-    if record['fu_b7_sub'] == '1' or record['fu_b7sub'] == '1':
-        add_b7(record, packet)
+    if record['fvp_z1x_complete'] in ['1', '2']:
+        if record['fu_a2sub'] == '1':
+            add_a2(record, packet)
+        if record['fu_a3sub'] == '1':
+            add_a3(record, packet)
+        if record['fu_a4sub'] == '1':
+            add_a4(record, packet)
+        if record['fu_b1sub'] == '1':
+            add_b1(record, packet)
+        add_b4(record, packet)
+        if record['fu_b5sub'] == '1':
+            add_b5(record, packet)
+        if record['fu_b6sub'] == '1':
+            add_b6(record, packet)
+        if record['fu_b7sub'] == '1':
+            add_b7(record, packet)
+    elif record['fvp_z1_complete'] in ['1', '2']:
+        if record['fu_a2_sub'] == '1':
+            add_a2(record, packet)
+        if record['fu_a3_sub'] == '1':
+            add_a3(record, packet)
+        if record['fu_a4_sub'] == '1':
+            add_a4(record, packet)
+        if record['fu_b1_sub'] == '1':
+            add_b1(record, packet)
+        add_b4(record, packet)
+        if record['fu_b5_sub'] == '1':
+            add_b5(record, packet)
+        if record['fu_b6_sub'] == '1':
+            add_b6(record, packet)
+        if record['fu_b7_sub'] == '1':
+            add_b7(record, packet)
+    else:
+        print("ptid " + str(record['ptid']) +
+              ": No Z1X or Z1 form found.", file=err)
+        add_b4(record, packet)
+
     add_b8(record, packet)
     add_b9(record, packet)
     add_c1s_or_c2(record, packet)
@@ -100,35 +125,38 @@ def add_z1_or_z1x(record, packet):
             setattr(z1x, key, record[value])
             z1x_filled_fields += 1

-    z1 = fvp_forms.FormZ1()
-    z1_filled_fields = 0
-    z1_field_mapping = {
-        'A2SUB': 'fu_a2_sub',
-        'A2NOT': 'fu_a2_not',
-        'A2COMM': 'fu_a2_comm',
-        'A3SUB': 'fu_a3_sub',
-        'A3NOT': 'fu_a3_not',
-        'A3COMM': 'fu_a3_comm',
-        'A4SUB': 'fu_a4_sub',
-        'A4NOT': 'fu_a4_not',
-        'A4COMM': 'fu_a4_comm',
-        'B1SUB': 'fu_b1_sub',
-        'B1NOT': 'fu_b1_not',
-        'B1COMM': 'fu_b1_comm',
-        'B5SUB': 'fu_b5_sub',
-        'B5NOT': 'fu_b5_not',
-        'B5COMM': 'fu_b5_comm',
-        'B6SUB': 'fu_b6_sub',
-        'B6NOT': 'fu_b6_not',
-        'B6COMM': 'fu_b6_comm',
-        'B7SUB': 'fu_b7_sub',
-        'B7NOT': 'fu_b7_not',
-        'B7COMM': 'fu_b7_comm'
-    }
-    for key, value in z1_field_mapping.items():
-        if record[value].strip():
-            setattr(z1, key, record[value])
-            z1_filled_fields += 1
+    try:
+        z1 = fvp_forms.FormZ1()
+        z1_filled_fields = 0
+        z1_field_mapping = {
+            'A2SUB': 'fu_a2_sub',
+            'A2NOT': 'fu_a2_not',
+            'A2COMM': 'fu_a2_comm',
+            'A3SUB': 'fu_a3_sub',
+            'A3NOT': 'fu_a3_not',
+            'A3COMM': 'fu_a3_comm',
+            'A4SUB': 'fu_a4_sub',
+            'A4NOT': 'fu_a4_not',
+            'A4COMM': 'fu_a4_comm',
+            'B1SUB': 'fu_b1_sub',
+            'B1NOT': 'fu_b1_not',
+            'B1COMM': 'fu_b1_comm',
+            'B5SUB': 'fu_b5_sub',
+            'B5NOT': 'fu_b5_not',
+            'B5COMM': 'fu_b5_comm',
+            'B6SUB': 'fu_b6_sub',
+            'B6NOT': 'fu_b6_not',
+            'B6COMM': 'fu_b6_comm',
+            'B7SUB': 'fu_b7_sub',
+            'B7NOT': 'fu_b7_not',
+            'B7COMM': 'fu_b7_comm'
+        }
+        for key, value in z1_field_mapping.items():
+            if record[value].strip():
+                setattr(z1, key, record[value])
+                z1_filled_fields += 1
+    except KeyError:
+        z1_filled_fields = 0

     # Prefer Z1X to Z1
     # If both are blank, use date (Z1X after 2018/04/02)
@@ -136,12 +164,10 @@ def add_z1_or_z1x(record, packet):
         packet.insert(0, z1x)
     elif z1_filled_fields > 0:
         packet.insert(0, z1)
-    elif (int(record['visityr'])>2018) or (int(record['visityr'])==2018 and \
-          int(record['visitmo'])>4) or (int(record['visityr'])==2018 and \
-          int(record['visitmo'])==4 and int(record['visitday'])>=2):
+    elif (int(record['visityr']) > 2018) or (int(record['visityr']) == 2018 and
+          int(record['visitmo']) > 4) or (int(record['visityr']) == 2018 and
+          int(record['visitmo']) == 4 and int(record['visitday']) >= 2):
         packet.insert(0, z1x)
-    else:
-        packet.insert(0, z1)


 def add_a1(record, packet):
@@ -619,6 +645,8 @@ def add_b8(record, packet):
     b8.ALIENLMR = record['fu_alienlmr']
     b8.DYSTONL = record['fu_dystonl']
     b8.DYSTONR = record['fu_dystonr']
+    b8.MYOCLLT = record['fu_myocllt']
+    b8.MYOCLRT = record['fu_myoclrt']
     b8.ALSFIND = record['fu_alsfind']
     b8.GAITNPH = record['fu_gaitnph']
     b8.OTHNEUR = record['fu_othneur']
@@ -768,61 +796,64 @@ def add_c1s_or_c2(record, packet):
             setattr(c2, key, record[value])
             c2_filled_fields += 1

-    c1s = fvp_forms.FormC1S()
-    c1s_filled_fields = 0
-    c1s_field_mapping = {
-        'MMSECOMP': 'fu_mmsecomp',
-        'MMSEREAS': 'fu_mmsereas',
-        'MMSELOC': 'fu_mmseloc',
-        'MMSELAN': 'fu_mmselan',
-        'MMSELANX': 'fu_mmselanx',
-        'MMSEVIS': 'fu_mmsevis',
-        'MMSEHEAR': 'fu_mmsehear',
-        'MMSEORDA': 'fu_mmseorda',
-        'MMSEORLO': 'fu_mmseorlo',
-        'PENTAGON': 'fu_pentagon',
-        'MMSE': 'fu_mmse',
-        'NPSYCLOC': 'fu_npsycloc',
-        'NPSYLAN': 'fu_npsylan',
-        'NPSYLANX': 'fu_npsylanx',
-        'LOGIMO': 'fu_logimo',
-        'LOGIDAY': 'fu_logiday',
-        'LOGIYR': 'fu_logiyr',
-        'LOGIPREV': 'fu_logiprev',
-        'LOGIMEM': 'fu_logimem',
-        'UDSBENTC': 'fu_udsbentc_c1',
-        'DIGIF': 'fu_digif',
-        'DIGIFLEN': 'fu_digiflen',
-        'DIGIB': 'fu_digib',
-        'DIGIBLEN': 'fu_digiblen',
-        'ANIMALS': 'fu_animals',
-        'VEG': 'fu_veg',
-        'TRAILA': 'fu_traila',
-        'TRAILARR': 'fu_trailarr',
-        'TRAILALI': 'fu_trailali',
-        'TRAILB': 'fu_trailb',
-        'TRAILBRR': 'fu_trailbrr',
-        'TRAILBLI': 'fu_trailbli',
-        'MEMUNITS': 'fu_memunits',
-        'MEMTIME': 'fu_memtime',
-        'UDSBENTD': 'fu_udsbentd_c1',
-        'UDSBENRS': 'fu_udsbenrs_c1',
-        'BOSTON': 'fu_boston',
-        'UDSVERFC': 'fu_udsverfc_c1',
-        'UDSVERFN': 'fu_udsverfn_c1',
-        'UDSVERNF': 'fu_udsvernf_c1',
-        'UDSVERLC': 'fu_udsverlc_c1',
-        'UDSVERLR': 'fu_udsverlr_c1',
-        'UDSVERLN': 'fu_udsverln_c1',
-        'UDSVERTN': 'fu_udsvertn_c1',
-        'UDSVERTE': 'fu_udsverte_c1',
-        'UDSVERTI': 'fu_udsverti_c1',
-        'COGSTAT': 'fu_cogstat'
-    }
-    for key, value in c1s_field_mapping.items():
-        if record[value].strip():
-            setattr(c1s, key, record[value])
-            c1s_filled_fields += 1
+    try:
+        c1s = fvp_forms.FormC1S()
+        c1s_filled_fields = 0
+        c1s_field_mapping = {
+            'MMSECOMP': 'fu_mmsecomp',
+            'MMSEREAS': 'fu_mmsereas',
+            'MMSELOC': 'fu_mmseloc',
+            'MMSELAN': 'fu_mmselan',
+            'MMSELANX': 'fu_mmselanx',
+            'MMSEVIS': 'fu_mmsevis',
+            'MMSEHEAR': 'fu_mmsehear',
+            'MMSEORDA': 'fu_mmseorda',
+            'MMSEORLO': 'fu_mmseorlo',
+            'PENTAGON': 'fu_pentagon',
+            'MMSE': 'fu_mmse',
+            'NPSYCLOC': 'fu_npsycloc',
+            'NPSYLAN': 'fu_npsylan',
+            'NPSYLANX': 'fu_npsylanx',
+            'LOGIMO': 'fu_logimo',
+            'LOGIDAY': 'fu_logiday',
+            'LOGIYR': 'fu_logiyr',
+            'LOGIPREV': 'fu_logiprev',
+            'LOGIMEM': 'fu_logimem',
+            'UDSBENTC': 'fu_udsbentc_c1',
+            'DIGIF': 'fu_digif',
+            'DIGIFLEN': 'fu_digiflen',
+            'DIGIB': 'fu_digib',
+            'DIGIBLEN': 'fu_digiblen',
+            'ANIMALS': 'fu_animals',
+            'VEG': 'fu_veg',
+            'TRAILA': 'fu_traila',
+            'TRAILARR': 'fu_trailarr',
+            'TRAILALI': 'fu_trailali',
+            'TRAILB': 'fu_trailb',
+            'TRAILBRR': 'fu_trailbrr',
+            'TRAILBLI': 'fu_trailbli',
+            'MEMUNITS': 'fu_memunits',
+            'MEMTIME': 'fu_memtime',
+            'UDSBENTD': 'fu_udsbentd_c1',
+            'UDSBENRS': 'fu_udsbenrs_c1',
+            'BOSTON': 'fu_boston',
+            'UDSVERFC': 'fu_udsverfc_c1',
+            'UDSVERFN': 'fu_udsverfn_c1',
+            'UDSVERNF': 'fu_udsvernf_c1',
+            'UDSVERLC': 'fu_udsverlc_c1',
+            'UDSVERLR': 'fu_udsverlr_c1',
+            'UDSVERLN': 'fu_udsverln_c1',
+            'UDSVERTN': 'fu_udsvertn_c1',
+            'UDSVERTE': 'fu_udsverte_c1',
+            'UDSVERTI': 'fu_udsverti_c1',
+            'COGSTAT': 'fu_cogstat'
+        }
+        for key, value in c1s_field_mapping.items():
+            if record[value].strip():
+                setattr(c1s, key, record[value])
+                c1s_filled_fields += 1
+    except KeyError:
+        c1s_filled_fields = 0

     # Prefer C2 to C1S
     # If both are blank, use date (C2 after 2017/10/23)
@@ -830,12 +861,10 @@ def add_c1s_or_c2(record, packet):
         packet.insert(0, c2)
     elif c1s_filled_fields > 0:
         packet.insert(0, c1s)
-    elif (int(record['visityr'])>2017) or (int(record['visityr'])==2017 and \
-          int(record['visitmo'])>10) or (int(record['visityr'])==2017 and \
-          int(record['visitmo'])==10 and int(record['visitday'])>=23):
+    elif (int(record['visityr']) > 2017) or (int(record['visityr']) == 2017 and
+          int(record['visitmo']) > 10) or (int(record['visityr']) == 2017 and
+          int(record['visitmo']) == 10 and int(record['visitday']) >= 23):
         packet.insert(0, c2)
-    else:
-        packet.insert(0, c1s)


 def add_d1(record, packet):
diff --git a/nacc/uds3/fvp/forms.py b/nacc/uds3/fvp/forms.py
index 090c236..25d4765 100644
--- a/nacc/uds3/fvp/forms.py
+++ b/nacc/uds3/fvp/forms.py
@@ -593,7 +593,7 @@ def __init__(self):
         self.fields['LOGIMO'] = nacc.uds3.Field(name='LOGIMO', typename='Num', position=(196, 197), length=2, inclusive_range=(0, 12), allowable_values=['88'], blanks=[])
         self.fields['LOGIDAY'] = nacc.uds3.Field(name='LOGIDAY', typename='Num', position=(199, 200), length=2, inclusive_range=(1, 31), allowable_values=['88'], blanks=[])
         self.fields['LOGIYR'] = nacc.uds3.Field(name='LOGIYR', typename='Num', position=(202, 205), length=4, inclusive_range=(2005, CURRENT_YEAR), allowable_values=['8888'], blanks=[])
-        self.fields['LOGIPREV'] = nacc.uds3.Field(name='LOGIPREV', typename='Num', position=(207, 208), length=2, inclusive_range=(0, 25), allowable_values=['88'], blanks=[])
+        self.fields['LOGIPREV'] = nacc.uds3.Field(name='LOGIPREV', typename='Num', position=(207, 208), length=2, inclusive_range=(0, 25), allowable_values=['88', '99'], blanks=[])
         self.fields['LOGIMEM'] = nacc.uds3.Field(name='LOGIMEM', typename='Num', position=(210, 211), length=2, inclusive_range=(0, 25), allowable_values=['96', '95', '98', '97'], blanks=[])
         self.fields['UDSBENTC'] = nacc.uds3.Field(name='UDSBENTC', typename='Num', position=(213, 214), length=2, inclusive_range=(0, 17), allowable_values=['96', '95', '98', '97'], blanks=[])
         self.fields['DIGIF'] = nacc.uds3.Field(name='DIGIF', typename='Num', position=(216, 217), length=2, inclusive_range=(0, 12), allowable_values=['96', '95', '98', '97'], blanks=[])
diff --git a/nacc/uds3/ivp/builder.py b/nacc/uds3/ivp/builder.py
index 8285bbc..06bccb0 100644
--- a/nacc/uds3/ivp/builder.py
+++ b/nacc/uds3/ivp/builder.py
@@ -4,34 +4,60 @@
 # Use of this source code is governed by the license found in the LICENSE file.
 ###############################################################################

+import sys
+
 from nacc.uds3 import clsform
 from nacc.uds3 import packet as ivp_packet
 from nacc.uds3.ivp import forms as ivp_forms


-def build_uds3_ivp_form(record):
+def build_uds3_ivp_form(record, err=sys.stderr):
     """ Converts REDCap CSV data into a packet (list of IVP Form objects) """
     packet = ivp_packet.Packet()

     # Set up the forms
     add_z1_or_z1x(record, packet)
     add_a1(record, packet)
-    if record['a2_sub'] == '1' or record['a2sub'] == '1':
-        add_a2(record, packet)
-    if record['a3_sub'] == '1' or record['a3sub'] == '1':
-        add_a3(record, packet)
-    if record['a4_sub'] == '1' or record['a4sub'] == '1':
-        add_a4(record, packet)
-    add_a5(record, packet)
-    if record['b1_sub'] == '1' or record['b1sub'] == '1':
-        add_b1(record, packet)
-    add_b4(record, packet)
-    if record['b5_sub'] == '1' or record['b5sub'] == '1':
-        add_b5(record, packet)
-    if record['b6_sub'] == '1' or record['b6sub'] == '1':
-        add_b6(record, packet)
-    if record['b7_sub'] == '1' or record['b7sub'] == '1':
-        add_b7(record, packet)
+    if record['ivp_z1x_complete'] in ['1', '2']:
+        if record['a2sub'] == '1':
+            add_a2(record, packet)
+        if record['a3sub'] == '1':
+            add_a3(record, packet)
+        if record['a4sub'] == '1':
+            add_a4(record, packet)
+        add_a5(record, packet)
+        if record['b1sub'] == '1':
+            add_b1(record, packet)
+        add_b4(record, packet)
+        if record['b5sub'] == '1':
+            add_b5(record, packet)
+        if record['b6sub'] == '1':
+            add_b6(record, packet)
+        if record['b7sub'] == '1':
+            add_b7(record, packet)
+    elif record['ivp_z1_complete'] in ['1', '2']:
+        if record['a2_sub'] == '1':
+            add_a2(record, packet)
+        if record['a3_sub'] == '1':
+            add_a3(record, packet)
+        if record['a4_sub'] == '1':
+            add_a4(record, packet)
+        add_a5(record, packet)
+        if record['b1_sub'] == '1':
+            add_b1(record, packet)
+        add_b4(record, packet)
+        if record['b5_sub'] == '1':
+            add_b5(record, packet)
+        if record['b6_sub'] == '1':
+            add_b6(record, packet)
+        if record['b7_sub'] == '1':
+            add_b7(record, packet)
+    else:
+        print("ptid " + str(record['ptid']) +
+              ": No Z1X or Z1 form found.", file=err)
+        add_a5(record, packet)
+        add_b4(record, packet)
+
     add_b8(record, packet)
     add_b9(record, packet)
     add_c1s_or_c2(record, packet)
@@ -106,38 +132,40 @@ def add_z1_or_z1x(record, packet):
         except KeyError:
             pass

-    z1 = ivp_forms.FormZ1()
-    z1_filled_fields = 0
-    z1_field_mapping = {
-        'A2SUB': 'a2_sub',
-        'A2NOT': 'a2_not',
-        'A2COMM': 'a2_comm',
-        'A3SUB': 'a3_sub',
-        'A3NOT': 'a3_not',
-        'A3COMM': 'a3_comm',
-        'A4SUB': 'a4_sub',
-        'A4NOT': 'a4_not',
-        'A4COMM': 'a4_comm',
-        'B1SUB': 'b1_sub',
-        'B1NOT': 'b1_not',
-        'B1COMM': 'b1_comm',
-        'B5SUB': 'b5_sub',
-        'B5NOT': 'b5_not',
-        'B5COMM': 'b5_comm',
-        'B6SUB': 'b6_sub',
-        'B6NOT': 'b6_not',
-        'B6COMM': 'b6_comm',
-        'B7SUB': 'b7_sub',
-        'B7NOT': 'b7_not',
-        'B7COMM': 'b7_comm'
-    }
-    for key, value in z1_field_mapping.items():
-        try:
+    # Check if Z1 form is present in REDCap project. If it is not present,
+    # do not map the fields and simply mark z1_filled_fields as 0.
+    try:
+        z1 = ivp_forms.FormZ1()
+        z1_filled_fields = 0
+        z1_field_mapping = {
+            'A2SUB': 'a2_sub',
+            'A2NOT': 'a2_not',
+            'A2COMM': 'a2_comm',
+            'A3SUB': 'a3_sub',
+            'A3NOT': 'a3_not',
+            'A3COMM': 'a3_comm',
+            'A4SUB': 'a4_sub',
+            'A4NOT': 'a4_not',
+            'A4COMM': 'a4_comm',
+            'B1SUB': 'b1_sub',
+            'B1NOT': 'b1_not',
+            'B1COMM': 'b1_comm',
+            'B5SUB': 'b5_sub',
+            'B5NOT': 'b5_not',
+            'B5COMM': 'b5_comm',
+            'B6SUB': 'b6_sub',
+            'B6NOT': 'b6_not',
+            'B6COMM': 'b6_comm',
+            'B7SUB': 'b7_sub',
+            'B7NOT': 'b7_not',
+            'B7COMM': 'b7_comm'
+        }
+        for key, value in z1_field_mapping.items():
             if record[value].strip():
                 setattr(z1, key, record[value])
                 z1_filled_fields += 1
-        except KeyError:
-            pass
+    except KeyError:
+        z1_filled_fields = 0

     # Prefer Z1X to Z1
     # If both are blank, use date (Z1X after 2018/04/02)
@@ -145,12 +173,10 @@ def add_z1_or_z1x(record, packet):
         packet.insert(0, z1x)
     elif z1_filled_fields > 0:
         packet.insert(0, z1)
-    elif (int(record['visityr'])>2018) or (int(record['visityr'])==2018 and \
-          int(record['visitmo'])>4) or (int(record['visityr'])==2018 and \
-          int(record['visitmo'])==4 and int(record['visitday'])>=2):
+    elif (int(record['visityr']) > 2018) or (int(record['visityr']) == 2018 and
+          int(record['visitmo']) > 4) or (int(record['visityr']) == 2018 and
+          int(record['visitmo']) == 4 and int(record['visitday']) >= 2):
         packet.insert(0, z1x)
-    else:
-        packet.insert(0, z1)


 def add_a1(record, packet):
@@ -877,49 +903,52 @@ def add_c1s_or_c2(record, packet):
         except KeyError:
             pass

-    c1s = ivp_forms.FormC1S()
-    c1s_filled_fields = 0
-    c1s_field_mapping = {
-        'MMSELOC': 'c1s_1a_mmseloc',
-        'MMSELAN': 'c1s_1a1_mmselan',
-        'MMSELANX': 'c1s_1a2_mmselanx',
-        'MMSEORDA': 'c1s_1b1_mmseorda',
-        'MMSEORLO': 'c1s_1b2_mmseorlo',
-        'PENTAGON': 'c1s_1c_pentagon',
-        'MMSE': 'c1s_1d_mmse',
-        'NPSYCLOC': 'c1s_2_npsycloc',
-        'NPSYLAN': 'c1s_2a_npsylan',
-        'NPSYLANX': 'c1s_2a1_npsylanx',
-        'LOGIMO': 'c1s_3amo_logimo',
-        'LOGIDAY': 'c1s_3ady_logiday',
-        'LOGIYR': 'c1s_3ayr_logiyr',
-        'LOGIPREV': 'c1s_3a1_logiprev',
-        'LOGIMEM': 'c1s_3b_logimem',
-        'DIGIF': 'c1s_4a_digif',
-        'DIGIFLEN': 'c1s_4b_digiflen',
-        'DIGIB': 'c1s_5a_digib',
-        'DIGIBLEN': 'c1s_5b_digiblen',
-        'ANIMALS': 'c1s_6a_animals',
-        'VEG': 'c1s_6b_veg',
-        'TRAILA': 'c1s_7a_traila',
-        'TRAILARR': 'c1s_7a1_trailarr',
-        'TRAILALI': 'c1s_7a2_trailali',
-        'TRAILB': 'c1s_7b_trailb',
-        'TRAILBRR': 'c1s_7b1_trailbrr',
-        'TRAILBLI': 'c1s_7b2_trailbli',
-        'WAIS': 'c1s_8a_wais',
-        'MEMUNITS': 'c1s_9a_memunits',
-        'MEMTIME': 'c1s_9b_memtime',
-        'BOSTON': 'c1s_10a_boston',
-        'COGSTAT': 'c1s_11a_cogstat'
-    }
-    for key, value in c1s_field_mapping.items():
-        try:
-            if record[value].strip():
-                setattr(c1s, key, record[value])
-                c1s_filled_fields += 1
-        except KeyError:
-            pass
+    try:
+        c1s = ivp_forms.FormC1S()
+        c1s_filled_fields = 0
+        c1s_field_mapping = {
+            'MMSELOC': 'c1s_1a_mmseloc',
+            'MMSELAN': 'c1s_1a1_mmselan',
+            'MMSELANX': 'c1s_1a2_mmselanx',
+            'MMSEORDA': 'c1s_1b1_mmseorda',
+            'MMSEORLO': 'c1s_1b2_mmseorlo',
+            'PENTAGON': 'c1s_1c_pentagon',
+            'MMSE': 'c1s_1d_mmse',
+            'NPSYCLOC': 'c1s_2_npsycloc',
+            'NPSYLAN': 'c1s_2a_npsylan',
+            'NPSYLANX': 'c1s_2a1_npsylanx',
+            'LOGIMO': 'c1s_3amo_logimo',
+            'LOGIDAY': 'c1s_3ady_logiday',
+            'LOGIYR': 'c1s_3ayr_logiyr',
+            'LOGIPREV': 'c1s_3a1_logiprev',
+            'LOGIMEM': 'c1s_3b_logimem',
+            'DIGIF': 'c1s_4a_digif',
+            'DIGIFLEN': 'c1s_4b_digiflen',
+            'DIGIB': 'c1s_5a_digib',
+            'DIGIBLEN': 'c1s_5b_digiblen',
+            'ANIMALS': 'c1s_6a_animals',
+            'VEG': 'c1s_6b_veg',
+            'TRAILA': 'c1s_7a_traila',
+            'TRAILARR': 'c1s_7a1_trailarr',
+            'TRAILALI': 'c1s_7a2_trailali',
+            'TRAILB': 'c1s_7b_trailb',
+            'TRAILBRR': 'c1s_7b1_trailbrr',
+            'TRAILBLI': 'c1s_7b2_trailbli',
+            'WAIS': 'c1s_8a_wais',
+            'MEMUNITS': 'c1s_9a_memunits',
+            'MEMTIME': 'c1s_9b_memtime',
+            'BOSTON': 'c1s_10a_boston',
+            'COGSTAT': 'c1s_11a_cogstat'
+        }
+        for key, value in c1s_field_mapping.items():
+            try:
+                if record[value].strip():
+                    setattr(c1s, key, record[value])
+                    c1s_filled_fields += 1
+            except KeyError:
+                pass
+    except KeyError:
+        c1s_filled_fields = 0

     # Prefer C2 to C1S
     # If both are blank, use date (C2 after 2017/10/23)
@@ -927,12 +956,10 @@ def add_c1s_or_c2(record, packet):
         packet.insert(0, c2)
     elif c1s_filled_fields > 0:
         packet.insert(0, c1s)
-    elif (int(record['visityr'])>2017) or (int(record['visityr'])==2017 and \
-          int(record['visitmo'])>10) or (int(record['visityr'])==2017 and \
-          int(record['visitmo'])==10 and int(record['visitday'])>=23):
+    elif (int(record['visityr']) > 2017) or (int(record['visityr']) == 2017 and
+          int(record['visitmo']) > 10) or (int(record['visityr']) == 2017 and
+          int(record['visitmo']) == 10 and int(record['visitday']) >= 23):
         packet.insert(0, c2)
-    else:
-        packet.insert(0, c1s)


 def add_d1(record, packet):
@@ -1108,124 +1135,6 @@ def add_d2(record, packet):
     d2.OTHCONDX = record['othcondx']
     packet.append(d2)

-    add_z1_or_z1x(record, packet)
-    update_header(record, packet)
-
-    return packet
-
-
-def add_c1s_or_c2(record, packet):
-    # Among C1S and C2 forms, one must be filled, one must be empty. After 2017/10/23, must be C2
-    if (int(record['visityr'])>2017) or (int(record['visityr'])==2017 and int(record['visitmo'])>10) or \
-            (int(record['visityr'])==2017 and int(record['visitmo'])==10 and int(record['visitday'])>=23):
-        c2 = ivp_forms.FormC2()
-        c2.MOCACOMP = record['mocacomp']
-        c2.MOCAREAS = record['mocareas']
-        c2.MOCALOC = record['mocaloc']
-        c2.MOCALAN = record['mocalan']
-        c2.MOCALANX = record['mocalanx']
-        c2.MOCAVIS = record['mocavis']
-        c2.MOCAHEAR = record['mocahear']
-        c2.MOCATOTS = record['mocatots']
-        c2.MOCATRAI = record['mocatrai']
-        c2.MOCACUBE = record['mocacube']
-        c2.MOCACLOC = record['mocacloc']
-        c2.MOCACLON = record['mocaclon']
-        c2.MOCACLOH = record['mocacloh']
-        c2.MOCANAMI = record['mocanami']
-        c2.MOCAREGI = record['mocaregi']
-        c2.MOCADIGI = record['mocadigi']
-        c2.MOCALETT = record['mocalett']
-        c2.MOCASER7 = record['mocaser7']
-        c2.MOCAREPE = record['mocarepe']
-        c2.MOCAFLUE = record['mocaflue']
-        c2.MOCAABST = record['mocaabst']
-        c2.MOCARECN = record['mocarecn']
-        c2.MOCARECC = record['mocarecc']
-        c2.MOCARECR = record['mocarecr']
-        c2.MOCAORDT = record['mocaordt']
-        c2.MOCAORMO = record['mocaormo']
-        c2.MOCAORYR = record['mocaoryr']
-        c2.MOCAORDY = record['mocaordy']
-        c2.MOCAORPL = record['mocaorpl']
-        c2.MOCAORCT = record['mocaorct']
-        c2.NPSYCLOC = record['npsycloc_c2']
-        c2.NPSYLAN = record['npsylan_c2']
-        c2.NPSYLANX = record['npsylanx_c2']
-        c2.CRAFTVRS = record['craftvrs']
-        c2.CRAFTURS = record['crafturs']
-        c2.UDSBENTC = record['udsbentc']
-        c2.DIGFORCT = record['digforct']
-        c2.DIGFORSL = record['digforsl']
-        c2.DIGBACCT = record['digbacct']
-        c2.DIGBACLS = record['digbacls']
-        c2.ANIMALS = record['animals_c2']
-        c2.VEG = record['veg_c2']
-        c2.TRAILA = record['traila_c2']
-        c2.TRAILARR = record['trailarr_c2']
-        c2.TRAILALI = record['trailali_c2']
-        c2.TRAILB = record['trailb_c2']
-        c2.TRAILBRR = record['trailbrr_c2']
-        c2.TRAILBLI = record['trailbli_c2']
-        c2.CRAFTDVR = record['craftdvr']
-        c2.CRAFTDRE = record['craftdre']
-        c2.CRAFTDTI = record['craftdti']
-        c2.CRAFTCUE = record['craftcue']
-        c2.UDSBENTD = record['udsbentd']
-        c2.UDSBENRS = record['udsbenrs']
-        c2.MINTTOTS = record['minttots']
-        c2.MINTTOTW = record['minttotw']
-        c2.MINTSCNG = record['mintscng']
-        c2.MINTSCNC = record['mintscnc']
-        c2.MINTPCNG = record['mintpcng']
-        c2.MINTPCNC = record['mintpcnc']
-        c2.UDSVERFC = record['udsverfc']
-        c2.UDSVERFN = record['udsverfn']
-        c2.UDSVERNF = record['udsvernf']
-        c2.UDSVERLC = record['udsverlc']
-        c2.UDSVERLR = record['udsverlr']
-        c2.UDSVERLN = record['udsverln']
-        c2.UDSVERTN = record['udsvertn']
-        c2.UDSVERTE = record['udsverte']
-        c2.UDSVERTI = record['udsverti']
-        c2.COGSTAT = record['cogstat_c2']
-        packet.append(c2)
-    else:
-        c1s = ivp_forms.FormC1S()
-        c1s.MMSELOC = record['c1s_1a_mmseloc'] #check for blank
-        c1s.MMSELAN = record['c1s_1a1_mmselan']
-        c1s.MMSELANX = record['c1s_1a2_mmselanx']
-        c1s.MMSEORDA = record['c1s_1b1_mmseorda']
-        c1s.MMSEORLO = record['c1s_1b2_mmseorlo']
-        c1s.PENTAGON = record['c1s_1c_pentagon']
-        c1s.MMSE = record['c1s_1d_mmse']
-        c1s.NPSYCLOC = record['c1s_2_npsycloc']
-        c1s.NPSYLAN = record['c1s_2a_npsylan']
-        c1s.NPSYLANX = record['c1s_2a1_npsylanx']
-        c1s.LOGIMO = record['c1s_3amo_logimo']
-        c1s.LOGIDAY = record['c1s_3ady_logiday']
-        c1s.LOGIYR = record['c1s_3ayr_logiyr']
-        c1s.LOGIPREV = record['c1s_3a1_logiprev']
-        c1s.LOGIMEM = record['c1s_3b_logimem']
-        c1s.DIGIF = record['c1s_4a_digif']
-        c1s.DIGIFLEN = record['c1s_4b_digiflen']
-        c1s.DIGIB = record['c1s_5a_digib']
-        c1s.DIGIBLEN = record['c1s_5b_digiblen']
-        c1s.ANIMALS = record['c1s_6a_animals']
-        c1s.VEG = record['c1s_6b_veg']
-        c1s.TRAILA = record['c1s_7a_traila']
-        c1s.TRAILARR = record['c1s_7a1_trailarr']
-        c1s.TRAILALI = record['c1s_7a2_trailali']
-        c1s.TRAILB = record['c1s_7b_trailb']
-        c1s.TRAILBRR = record['c1s_7b1_trailbrr']
-        c1s.TRAILBLI = record['c1s_7b2_trailbli']
-        c1s.WAIS = record['c1s_8a_wais']
-        c1s.MEMUNITS = record['c1s_9a_memunits']
-        c1s.MEMTIME = record['c1s_9b_memtime']
-        c1s.BOSTON = record['c1s_10a_boston']
-        c1s.COGSTAT = record['c1s_11a_cogstat'] #check for blank
-        packet.append(c1s)
-

 def update_header(record, packet):
     for header in packet:
diff --git a/nacc/uds3/ivp/forms.py b/nacc/uds3/ivp/forms.py
index a299195..21b3c9c 100644
--- a/nacc/uds3/ivp/forms.py
+++ b/nacc/uds3/ivp/forms.py
@@ -680,7 +680,7 @@ def __init__(self):
         self.fields['LOGIMO'] = nacc.uds3.Field(name='LOGIMO', typename='Num', position=(187, 188), length=2, inclusive_range=(1, 12), allowable_values=['88'], blanks=[])
         self.fields['LOGIDAY'] = nacc.uds3.Field(name='LOGIDAY', typename='Num', position=(190, 191), length=2, inclusive_range=(1, 31), allowable_values=['88'], blanks=[])
         self.fields['LOGIYR'] = nacc.uds3.Field(name='LOGIYR', typename='Num', position=(193, 196), length=4, inclusive_range=(CURRENT_YEAR-1, CURRENT_YEAR), allowable_values=['88', '8888'], blanks=[])
-        self.fields['LOGIPREV'] = nacc.uds3.Field(name='LOGIPREV', typename='Num', position=(198, 199), length=2, inclusive_range=(0, 25), allowable_values=['88'], blanks=[])
+        self.fields['LOGIPREV'] = nacc.uds3.Field(name='LOGIPREV', typename='Num', position=(198, 199), length=2, inclusive_range=(0, 25), allowable_values=['88', '99'], blanks=[])
         self.fields['LOGIMEM'] = nacc.uds3.Field(name='LOGIMEM', typename='Num', position=(201, 202), length=2, inclusive_range=(0, 25), allowable_values=['95', '96', '97', '98'], blanks=[])
         self.fields['DIGIF'] = nacc.uds3.Field(name='DIGIF', typename='Num', position=(204, 205), length=2, inclusive_range=(0, 12), allowable_values=['95', '96', '97', '98'], blanks=[])
         self.fields['DIGIFLEN'] = nacc.uds3.Field(name='DIGIFLEN', typename='Num', position=(207, 208), length=2, inclusive_range=(0, 8), allowable_values=['95', '96', '97', '98'], blanks=[])
diff --git a/nacc/uds3/m/builder.py b/nacc/uds3/m/builder.py
index 812d092..4426e9d 100644
--- a/nacc/uds3/m/builder.py
+++ b/nacc/uds3/m/builder.py
@@ -1,23 +1,22 @@
############################################################################### -# Copyright 2015-2016 University of Florida. All rights reserved. +# Copyright 2015-2020 University of Florida. All rights reserved. # This file is part of UF CTS-IT's NACCulator project. # Use of this source code is governed by the license found in the LICENSE file. ############################################################################### -from nacc.uds3 import blanks from nacc.uds3.m import forms as m_form from nacc.uds3 import packet as m_packet -import sys import re + def build_uds3_m_form(record): - + """ Converts REDCap CSV data into a packet (list of M Form objects) """ packet = m_packet.Packet() m = m_form.FormM() - m.CHANGEMO = parse_date(record['m1_1'],'M') - m.CHANGEDY = parse_date(record['m1_1'],'D') - m.CHANGEYR = parse_date(record['m1_1'],'Y') + m.CHANGEMO = parse_date(record['m1_1'], 'M') + m.CHANGEDY = parse_date(record['m1_1'], 'D') + m.CHANGEYR = parse_date(record['m1_1'], 'Y') m.PROTOCOL = record['m1_2a'] m.ACONSENT = record['m1_2a1'] m.RECOGIM = record['m1_2b___1'] @@ -25,77 +24,80 @@ def build_uds3_m_form(record): m.REREFUSE = record['m1_2b___3'] m.RENAVAIL = record['m1_2b___4'] m.RENURSE = record['m1_2b___5'] - m.NURSEMO = parse_date(record['m1_2b1'],'M') - m.NURSEDY = parse_date(record['m1_2b1'],'D') - m.NURSEYR = parse_date(record['m1_2b1'],'Y') + m.NURSEMO = parse_date(record['m1_2b1'], 'M') + m.NURSEDY = parse_date(record['m1_2b1'], 'D') + m.NURSEYR = parse_date(record['m1_2b1'], 'Y') m.REJOIN = record['m1_2b___6'] m.FTLDDISC = record['m1_3'] m.FTLDREAS = record['m1_3a'] - m.FTLDREAx = record['m1_3a1'] # Note : May need to add testing for {',",&,%} to remove + m.FTLDREAx = record['m1_3a1'] m.DECEASED = subject_deceased(record['m1_4']) - m.DISCONT = subject_discont(record['m1_4']) - m.DEATHMO = parse_date(record['m1_5a'],'M') - m.DEATHDY = parse_date(record['m1_5a'],'D') - m.DEATHYR = parse_date(record['m1_5a'],'Y') + m.DISCONT = 
subject_discont(record['m1_4']) + m.DEATHMO = parse_date(record['m1_5a'], 'M') + m.DEATHDY = parse_date(record['m1_5a'], 'D') + m.DEATHYR = parse_date(record['m1_5a'], 'Y') m.AUTOPSY = record['m1_5b'] - m.DISCMO = parse_date(record['m1_6a'],'M') - m.DISCDAY = parse_date(record['m1_6a'],'D') - m.DISCYR = parse_date(record['m1_6a'],'Y') + m.DISCMO = parse_date(record['m1_6a'], 'M') + m.DISCDAY = parse_date(record['m1_6a'], 'D') + m.DISCYR = parse_date(record['m1_6a'], 'Y') m.DROPREAS = record['m1_6b'] packet.append(m) - update_header(record,packet) + update_header(record, packet) return packet - - #update header function may be wrong + + +# update header function may be wrong def update_header(record, packet): for header in packet: header.PACKET = 'M' - header.FORMID = 'M1' # header.form_name + header.FORMID = 'M1' # header.form_name header.FORMVER = 3 - header.ADCID = 41 #record['ABCID'] + header.ADCID = 41 # record['ABCID'] header.PTID = record['ptid'] - header.VISITMO = parse_date(record['m1_form_date'],'M') - header.VISITDAY = parse_date(record['m1_form_date'],'D') - header.VISITYR = parse_date(record['m1_form_date'],'Y') - header.INITIALS = '' #record['INITIALS'] Note not in RedCap + header.VISITMO = parse_date(record['m1_form_date'], 'M') + header.VISITDAY = parse_date(record['m1_form_date'], 'D') + header.VISITYR = parse_date(record['m1_form_date'], 'Y') + header.INITIALS = '' # record['INITIALS'] Note not in RedCap + - -# parse -def parse_date(date,DMY_choice): +# parse +def parse_date(date, DMY_choice): ymd = re.compile('\d\d\d\d[-\/]\d\d[-\/]\d\d') mdy = re.compile('\d\d[-\/]\d\d[-\/]\d\d\d\d') dub = re.compile('\d\d') - if mdy.match(date) != None: # format is mdy + if mdy.match(date) != None: # format is mdy m = dub.findall(date) - if DMY_choice == "D": + if DMY_choice == "D": return m[1] elif DMY_choice == "M": return m[0] elif DMY_choice == "Y": return m[2] + m[3] - elif ymd.match(date)!= None: #format is ymd + elif ymd.match(date) != None: # format is 
ymd m = dub.findall(date) if DMY_choice == "D": return m[3] - elif DMY_choice == "M": - return m[2] + elif DMY_choice == "M": + return m[2] elif DMY_choice == "Y": return m[0] + m[1] - elif date =='': + elif date == '': return '' raise ValueError('Inccorect death date format, date must be MM/DD/YYYY') + def subject_deceased(status): """ Splits Deceased from Discont """ - if status == '1': + if status == '1': return 1 else: - return + return + def subject_discont(status): """ Splits Discont from Deceased """ - if status == '2': + if status == '2': return 1 else: - return + return diff --git a/nacc/uds3/m/forms.py b/nacc/uds3/m/forms.py index c411a1a..e96b26d 100644 --- a/nacc/uds3/m/forms.py +++ b/nacc/uds3/m/forms.py @@ -9,17 +9,18 @@ # WARNING: When generating new forms, do not overwrite this section from datetime import date -# WARNING: When generating new forms, use CURRENT_YEAR instead of "CURRENT_YEAR" +# WARNING: When generating new forms, use CURRENT_YEAR instead of "2014" # WARNING: When generating new forms, use CURRENT_YEAR-15 instead of "1999" CURRENT_YEAR = date.today().year ### END non-generated code -def header_fields(): # may not need the headers. + +def header_fields(): # may not need the headers. 
fields = {} fields['PACKET'] = nacc.uds3.Field(name='PACKET', typename='Char', position=(1, 2), length=2, inclusive_range=None, allowable_values=[], blanks=[]) fields['FORMID'] = nacc.uds3.Field(name='FORMID', typename='Char', position=(4, 6), length=3, inclusive_range=None, allowable_values=[], blanks=[]) - fields['FORMVER'] = nacc.uds3.Field(name='FORMVER', typename='Num', position=(8, 10), length=3, inclusive_range=(1, 3), allowable_values=[], blanks=[]) + fields['FORMVER'] = nacc.uds3.Field(name='FORMVER', typename='Num', position=(8, 10), length=3, inclusive_range=(1, 3), allowable_values=[], blanks=[]) fields['ADCID'] = nacc.uds3.Field(name='ADCID', typename='Num', position=(12, 13), length=2, inclusive_range=(2, 41), allowable_values=[], blanks=[]) fields['PTID'] = nacc.uds3.Field(name='PTID', typename='Char', position=(15, 24), length=10, inclusive_range=None, allowable_values=[], blanks=[]) fields['VISITMO'] = nacc.uds3.Field(name='VISITMO', typename='Num', position=(26, 27), length=2, inclusive_range=(1, 12), allowable_values=[], blanks=[]) @@ -28,33 +29,34 @@ def header_fields(): # may not need the headers. 
fields['INITIALS'] = nacc.uds3.Field(name='INITIALS', typename='Char', position=(41, 43), length=3, inclusive_range=None, allowable_values=[], blanks=[]) return fields -class FormM(nacc.uds3.FieldBag): + +class FormM(nacc.uds3.FieldBag): def __init__(self): self.fields = header_fields() - self.fields['CHANGEMO'] = nacc.uds3.Field(name='CHANGEMO', typename='Num', position=(45,46), length=2, inclusive_range=(1, 12), allowable_values=['99'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['CHANGEDY'] = nacc.uds3.Field(name='CHANGEDY', typename='Num', position=(48,49), length=2, inclusive_range=(1, 31), allowable_values=['99'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['CHANGEYR'] = nacc.uds3.Field(name='CHANGEYR', typename='Num', position=(51,54), length=4, inclusive_range=(2015, CURRENT_YEAR), allowable_values=[], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['PROTOCOL'] = nacc.uds3.Field(name='PROTOCOL', typename='Num', position=(56,56), length=1, inclusive_range=(1, 3), allowable_values=[], blanks=['Blank if Question 4a DECEASED = 1 (Yes)','Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['ACONSENT'] = nacc.uds3.Field(name='ACONSENT', typename='Num', position=(58,58), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)','Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)']) - self.fields['RECOGIM'] = nacc.uds3.Field(name='RECOGIM', typename='Num', position=(60,60), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['REPHYILL'] = nacc.uds3.Field(name='REPHYILL', typename='Num', position=(62,62), length=1, inclusive_range=(0, 1), 
allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['REREFUSE'] = nacc.uds3.Field(name='REREFUSE', typename='Num', position=(64,64), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['RENAVAIL'] = nacc.uds3.Field(name='RENAVAIL', typename='Num', position=(66,66), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['RENURSE'] = nacc.uds3.Field(name='RENURSE', typename='Num', position=(68,68), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['NURSEMO'] = nacc.uds3.Field(name='NURSEMO', typename='Num', position=(70,71), length=2, inclusive_range=(1, 12), allowable_values=['99'], blanks=['Blank if Question 2b5 RENURSE ne 1 (Yes)', 'Blank if Question 4a DECEASED = 1 (Yes)','Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['NURSEDY'] = nacc.uds3.Field(name='NURSEDY', typename='Num', position=(73,74), length=2, inclusive_range=(1, 31), allowable_values=['99'], blanks=['Blank if Question 2b5 RENURSE ne 1 (Yes)', 'Blank if Question 4a DECEASED = 1 (Yes)','Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['NURSEYR'] = nacc.uds3.Field(name='NURSEYR', typename='Num', position=(76,79), length=4, inclusive_range=(2015, CURRENT_YEAR), allowable_values=[], blanks=['Blank if Question 2b5 RENURSE ne 1 (Yes)', 'Blank if Question 4a DECEASED = 1 (Yes)','Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['REJOIN'] = nacc.uds3.Field(name='REJOIN', typename='Num', position=(81,81), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 
(Yes)']) - self.fields['FTLDDISC'] = nacc.uds3.Field(name='FTLDDISC', typename='Num', position=(83,83), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['FTLDREAS'] = nacc.uds3.Field(name='FTLDREAS', typename='Num', position=(85,85), length=1, inclusive_range=(1, 4), allowable_values=[], blanks=['Blank if Question 3 FTLDDISC ne 1 (Discontine FTLD)','Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['FTLDREAX'] = nacc.uds3.Field(name='FTLDREAX', typename='Char', position=(87,146), length=60, inclusive_range=None, allowable_values=[], blanks=['Blank if Question 3a FTLDREAS ne 4 (Other)','Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['DECEASED'] = nacc.uds3.Field(name='DECEASED', typename='Num', position=(148,148), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)'])#,'If Question 4a = 1 (Yes) then skip to Question 5a1' - self.fields['DISCONT'] = nacc.uds3.Field(name='DISCONT', typename='Num', position=(150,150), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)'])#,'If Question 4b = 1 (Yes) then skip to Question 6a1' - self.fields['DEATHMO'] = nacc.uds3.Field(name='DEATHMO', typename='Num', position=(152,153), length=2, inclusive_range=(1, 12), 
allowable_values=['99'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['DEATHDY'] = nacc.uds3.Field(name='DEATHDY', typename='Num', position=(155,156), length=2, inclusive_range=(1, 31), allowable_values=['99'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['DEATHYR'] = nacc.uds3.Field(name='DEATHYR', typename='Num', position=(158,161), length=4, inclusive_range=(2015, CURRENT_YEAR), allowable_values=[], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['AUTOPSY'] = nacc.uds3.Field(name='AUTOPSY', typename='Num', position=(163,163), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)']) - self.fields['DISCMO'] = nacc.uds3.Field(name='DISCMO', typename='Num', position=(165,166), length=2, inclusive_range=(1, 12), allowable_values=['99'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person 
follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)']) - self.fields['DISCDAY'] = nacc.uds3.Field(name='DISCDAY', typename='Num', position=(168,169), length=2, inclusive_range=(1, 31), allowable_values=['99'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)']) - self.fields['DISCYR'] = nacc.uds3.Field(name='DISCYR', typename='Num', position=(171,174), length=4, inclusive_range=(2015, CURRENT_YEAR), allowable_values=[], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)']) - self.fields['DROPREAS'] = nacc.uds3.Field(name='DROPREAS', typename='Num', position=(176,176), length=1, inclusive_range=(1, 2), allowable_values=['1', '2'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)']) + self.fields['CHANGEMO'] = nacc.uds3.Field(name='CHANGEMO', typename='Num', position=(45, 46), length=2, inclusive_range=(1, 12), allowable_values=['99'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['CHANGEDY'] = nacc.uds3.Field(name='CHANGEDY', typename='Num', position=(48, 49), length=2, inclusive_range=(1, 31), allowable_values=['99'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['CHANGEYR'] = 
nacc.uds3.Field(name='CHANGEYR', typename='Num', position=(51, 54), length=4, inclusive_range=(2015, CURRENT_YEAR), allowable_values=[], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['PROTOCOL'] = nacc.uds3.Field(name='PROTOCOL', typename='Num', position=(56, 56), length=1, inclusive_range=(1, 3), allowable_values=[], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['ACONSENT'] = nacc.uds3.Field(name='ACONSENT', typename='Num', position=(58, 58), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)']) + self.fields['RECOGIM'] = nacc.uds3.Field(name='RECOGIM', typename='Num', position=(60, 60), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['REPHYILL'] = nacc.uds3.Field(name='REPHYILL', typename='Num', position=(62, 62), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['REREFUSE'] = nacc.uds3.Field(name='REREFUSE', typename='Num', position=(64, 64), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['RENAVAIL'] = nacc.uds3.Field(name='RENAVAIL', typename='Num', position=(66, 66), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['RENURSE'] = nacc.uds3.Field(name='RENURSE', typename='Num', position=(68, 68), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a 
DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['NURSEMO'] = nacc.uds3.Field(name='NURSEMO', typename='Num', position=(70, 71), length=2, inclusive_range=(1, 12), allowable_values=['99'], blanks=['Blank if Question 2b5 RENURSE ne 1 (Yes)', 'Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['NURSEDY'] = nacc.uds3.Field(name='NURSEDY', typename='Num', position=(73, 74), length=2, inclusive_range=(1, 31), allowable_values=['99'], blanks=['Blank if Question 2b5 RENURSE ne 1 (Yes)', 'Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['NURSEYR'] = nacc.uds3.Field(name='NURSEYR', typename='Num', position=(76, 79), length=4, inclusive_range=(2015, CURRENT_YEAR), allowable_values=[], blanks=['Blank if Question 2b5 RENURSE ne 1 (Yes)', 'Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['REJOIN'] = nacc.uds3.Field(name='REJOIN', typename='Num', position=(81, 81), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['FTLDDISC'] = nacc.uds3.Field(name='FTLDDISC', typename='Num', position=(83, 83), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['FTLDREAS'] = nacc.uds3.Field(name='FTLDREAS', typename='Num', position=(85, 85), length=1, inclusive_range=(1, 4), allowable_values=[], blanks=['Blank if Question 3 FTLDDISC ne 1 (Discontine FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['FTLDREAX'] = nacc.uds3.Field(name='FTLDREAX', typename='Char', position=(87, 146), length=60, inclusive_range=None, allowable_values=[], blanks=['Blank if Question 3a FTLDREAS ne 4 (Other)', 'Blank if Question 4a DECEASED = 1 (Yes)', 
'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['DECEASED'] = nacc.uds3.Field(name='DECEASED', typename='Num', position=(148, 148), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)']) # ,'If Question 4a = 1 (Yes) then skip to Question 5a1' + self.fields['DISCONT'] = nacc.uds3.Field(name='DISCONT', typename='Num', position=(150, 150), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)']) # ,'If Question 4b = 1 (Yes) then skip to Question 6a1' + self.fields['DEATHMO'] = nacc.uds3.Field(name='DEATHMO', typename='Num', position=(152, 153), length=2, inclusive_range=(1, 12), allowable_values=['99'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['DEATHDY'] = nacc.uds3.Field(name='DEATHDY', typename='Num', position=(155, 156), length=2, inclusive_range=(1, 31), allowable_values=['99'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['DEATHYR'] = 
nacc.uds3.Field(name='DEATHYR', typename='Num', position=(158, 161), length=4, inclusive_range=(2015, CURRENT_YEAR), allowable_values=[], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['AUTOPSY'] = nacc.uds3.Field(name='AUTOPSY', typename='Num', position=(163, 163), length=1, inclusive_range=(0, 1), allowable_values=['1', '0'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4b DISCONT = 1 (Yes)']) + self.fields['DISCMO'] = nacc.uds3.Field(name='DISCMO', typename='Num', position=(165, 166), length=2, inclusive_range=(1, 12), allowable_values=['99'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)']) + self.fields['DISCDAY'] = nacc.uds3.Field(name='DISCDAY', typename='Num', position=(168, 169), length=2, inclusive_range=(1, 31), allowable_values=['99'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)']) + self.fields['DISCYR'] = nacc.uds3.Field(name='DISCYR', typename='Num', position=(171, 174), length=4, inclusive_range=(2015, CURRENT_YEAR), allowable_values=[], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 
'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)']) + self.fields['DROPREAS'] = nacc.uds3.Field(name='DROPREAS', typename='Num', position=(176, 176), length=1, inclusive_range=(1, 2), allowable_values=['1', '2'], blanks=['Blank if Question 2a PROTOCOL = 1 (Follow-up by telephone)', 'Blank if Question 2a PROTOCOL = 2 (Minimal contact)', 'Blank if Question 2a PROTOCOL = 3 (Annual in-person follow-up)', 'Blank if Question 3 FTLDDISC = 1 (Discontinue FTLD)', 'Blank if Question 4a DECEASED = 1 (Yes)']) diff --git a/nacc/uds3/tfp/builder.py b/nacc/uds3/tfp/builder.py index de5c827..c6c560f 100644 --- a/nacc/uds3/tfp/builder.py +++ b/nacc/uds3/tfp/builder.py @@ -675,26 +675,29 @@ def add_z1_or_z1x(record, packet): setattr(z1x, key, record[value]) z1x_filled_fields += 1 - z1 = tfp_forms.FormZ1() - z1_filled_fields = 0 - z1_field_mapping = { - 'A3SUB': 'tele_a3_sub', - 'A3NOT': 'tele_a3_not', - 'A3COMM': 'tele_a3_comm', - 'A4SUB': 'tele_a4_sub', - 'A4NOT': 'tele_a4_not', - 'A4COMM': 'tele_a4_comm', - 'B5SUB': 'tele_b5_sub', - 'B5NOT': 'tele_b5_not', - 'B5COMM': 'tele_b5_comm', - 'B7SUB': 'tele_b7_sub', - 'B7NOT': 'tele_b7_not', - 'B7COMM': 'tele_b7_comm' - } - for key, value in z1_field_mapping.items(): - if record[value].strip(): - setattr(z1, key, record[value]) - z1_filled_fields += 1 + try: + z1 = tfp_forms.FormZ1() + z1_filled_fields = 0 + z1_field_mapping = { + 'A3SUB': 'tele_a3_sub', + 'A3NOT': 'tele_a3_not', + 'A3COMM': 'tele_a3_comm', + 'A4SUB': 'tele_a4_sub', + 'A4NOT': 'tele_a4_not', + 'A4COMM': 'tele_a4_comm', + 'B5SUB': 'tele_b5_sub', + 'B5NOT': 'tele_b5_not', + 'B5COMM': 'tele_b5_comm', + 'B7SUB': 'tele_b7_sub', + 'B7NOT': 'tele_b7_not', + 'B7COMM': 'tele_b7_comm' + } + for key, value in z1_field_mapping.items(): + if record[value].strip(): + setattr(z1, key, record[value]) + z1_filled_fields += 
1 + except KeyError: + z1_filled_fields = 0 # Prefer Z1X to Z1 # If both are blank, use date (Z1X after 2018/04/02) @@ -706,8 +709,6 @@ def add_z1_or_z1x(record, packet): and int(record['visitmo']) > 4) or (int(record['visityr']) == 2018 and int(record['visitmo']) == 4 and int(record['visitday']) >= 2): packet.insert(0, z1x) - else: - packet.insert(0, z1) def update_header(record, packet): diff --git a/nacc/uds3/tfp/forms.py b/nacc/uds3/tfp/forms.py index 8ca95b5..0eaf686 100644 --- a/nacc/uds3/tfp/forms.py +++ b/nacc/uds3/tfp/forms.py @@ -9,7 +9,7 @@ # WARNING: When generating new forms, do not overwrite this section from datetime import date -# WARNING: When generating new forms, use CURRENT_YEAR instead of "CURRENT_YEAR" +# WARNING: When generating new forms, use CURRENT_YEAR instead of "2014" # WARNING: When generating new forms, use CURRENT_YEAR-15 instead of "1999" CURRENT_YEAR = date.today().year diff --git a/nacculator_cfg.ini.example b/nacculator_cfg.ini.example index 5614bf5..97ff0dc 100644 --- a/nacculator_cfg.ini.example +++ b/nacculator_cfg.ini.example @@ -4,23 +4,49 @@ token: Your REDCAP Token redcap_server: Your Redcap Server -#[filters] - Each section is named after the corresponding function name -#in filters.py +# [filters] - Each section is named after the corresponding function name +# in filters.py +# Comment out filters that you do not intend to use with # in front of +# the filter name and each line within that section. + [filter_clean_ptid] +# Filters out subjects with PTIDs and visitnums that are already cleared to +# NACC's "Current Database" filepath: path/to/current-db-subjects.csv -[filter_fix_headers] +[filter_replace_drug_id] +# Automatically adds the "d" prefix for each drug ID in Form A4. +present: yes + +# [filter_fix_headers] +# Corrects REDCap-exported headers that do not match NACC's DEDs. 
 # Write in format:
 # old_header: corrected_header
-c1s_2a_npsylan: c1s_2_npsycloc
-c1s_2a_npsylanx: c1s_2a_npsylan
-b6s_2a1_npsylanx: c1s_2a1_npsylanx
-fu_otherneur: fu_othneur
-fu_otherneurx: fu_othneurx
-fu_strokedec: fu_strokdec
+
+[filter_fill_default]
+# Fills fields that are typically left blank in REDCap:
+# adcid, nogds (automatically enters 0 if the field is blank), and formver = 3)
+adcid: Your ADC ID
+
+[filter_update_field]
+# Replaces the ADCID if it was previously filled in the REDCap export.
+adcid: Your ADC ID
+
+[filter_fix_visitdate]
+# Ensures that REDCap's 'visitnum' field is always an integer.
+present: yes
 [filter_remove_ptid]
+# Removes PTIDs that are not in NACC's "Current Database"
+# but still should be skipped, such as test PTIDs.
+# ptid_format specifies which ptids should be *kept*.
 ptid_format: 11\d.*
-#enter ptid in form of ptid,ptid,ptid,ext... (no spaces)
+# bad_ptid notes the exceptions to ptid_format that should still be removed.
+# enter ptid in form of ptid,ptid,ptid,ext... (no spaces)
 bad_ptid:
-good_ptid:
+# good_ptid notes the exceptions that don't fit ptid_format but should be kept.
+good_ptid:
+
+[filter_eliminate_empty_date]
+# Removes PTIDs that are missing information in their visit date.
+present: yes
diff --git a/setup.py b/setup.py
index 4655dc7..f70bf10 100644
--- a/setup.py
+++ b/setup.py
@@ -6,7 +6,7 @@
 from setuptools import setup, find_packages
-VERSION="1.2.0"
+VERSION = "1.2.0"
 setup(
     name="nacculator",
@@ -21,12 +21,13 @@
     keywords=["REDCap", "NACC", "UDS", "Clinical data"],
     download_url="https://github.com/ctsit/nacculator/releases/tag/" + VERSION,
-    package_dir = {'nacc': 'nacc'},
-    packages = find_packages(),
+    package_dir={'nacc': 'nacc'},
+    packages=find_packages(),
     entry_points={
         "console_scripts": [
-            "redcap2nacc = nacc.redcap2nacc:main"
+            "redcap2nacc = nacc.redcap2nacc:main",
+            "nacculator_filters = nacc.run_filters:main"
         ]
     },
diff --git a/tests/generator_test.py b/tests/generator_test.py
new file mode 100644
index 0000000..e8c2dcf
--- /dev/null
+++ b/tests/generator_test.py
@@ -0,0 +1,30 @@
+import csv
+import io
+import unittest
+
+from tools import generator
+
+
+class TestGenerator(unittest.TestCase):
+
+    def test_new_format(self):
+        a1_ivp_sample = """
+"DORDER","ITEM","VAR","PACKET","FLDLENG","COLUMN1","COLUMN2","DTYPE","RANGE","VALUES","VAL1D","VAL2D","VAL3D","VAL4D","VAL5D","VAL6D","VAL7D","VAL8D","VAL9D","VAL10D","MISSVALS","NEWQUEST","BLANKS"
+"1","1","REASON","I",1,45,45,1,"1||4","1||2||4||9","To participate in a research study","To have a clinical evaluation","Both (to participate in a research study and to have a clinical evaluation)","Unknown",,,,,,,"9","Primary reason for coming to ADC:",
+        """.strip()
+
+        expected = """
+fields["REASON"] = nacc.uds3.Field(name="REASON", typename="Num", position=(45, 45), length=1, inclusive_range=(1, 4), allowable_values=['1', '2', '4', '9'], blanks=[])
+        """.strip()
+
+        reader = io.StringIO(a1_ivp_sample)
+        reader = csv.DictReader(reader)
+        forms = generator.generate_form("", reader)
+        fields = generator.fields_to_strings(forms.fields, "")
+        actual = next(fields)
+
+        self.assertEqual(expected, actual)
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git
a/tests/test_c1s_c2.py b/tests/test_c1s_c2.py index b7032d3..d3c4b1c 100644 --- a/tests/test_c1s_c2.py +++ b/tests/test_c1s_c2.py @@ -4,10 +4,13 @@ from nacc.uds3.ivp import builder as ivp_builder from nacc.uds3.fvp import builder as fvp_builder + class TestC1SC2(unittest.TestCase): def test_c1s_added_to_ivp_when_filled(self): - """ If header is from before October 23, 2017, the C1S form should be added """ + """ + If header is from before October 23, 2017, the C1S form should be added + """ record = make_blank_ivp() record['visityr'] = '2016' record['c1s_1a_mmseloc'] = '1' @@ -18,7 +21,9 @@ def test_c1s_added_to_ivp_when_filled(self): self.assertEqual(ipacket['MMSELOC'], '1') def test_c2_added_to_ivp_when_filled(self): - """ If header is from after October 23, 2017, the C2 form should be added """ + """ + If header is from after October 23, 2017, the C2 form should be added + """ record = make_blank_ivp() record['visityr'] = '2018' record['mocacomp'] = '1' @@ -29,7 +34,9 @@ def test_c2_added_to_ivp_when_filled(self): self.assertEqual(ipacket['MOCACOMP'], '1') def test_c1s_added_to_fvp_when_filled(self): - """ If header is from before October 23, 2017, the C1S form should be added """ + """ + If header is from before October 23, 2017, the C1S form should be added + """ record = make_blank_fvp() record['visityr'] = '2016' record['fu_mmsecomp'] = '1' @@ -40,7 +47,9 @@ def test_c1s_added_to_fvp_when_filled(self): self.assertEqual(fpacket['MMSECOMP'], '1') def test_c2_added_to_fvp_when_filled(self): - """ If header is from after October 23, 2017, the C2 form should be added """ + """ + If header is from after October 23, 2017, the C2 form should be added + """ record = make_blank_fvp() record['visityr'] = '2018' record['fu_mocacomp'] = '1' @@ -50,6 +59,7 @@ def test_c2_added_to_fvp_when_filled(self): fvp_builder.add_c1s_or_c2(record, fpacket) self.assertEqual(fpacket['MOCACOMP'], '1') + def make_blank_ivp(): return { 'visitmo': '', @@ -88,6 +98,7 @@ def 
make_blank_ivp(): 'c1s_9b_memtime': '', 'c1s_10a_boston': '', 'c1s_11a_cogstat': '', + 'ivp_c1s_complete': '2', # C2 'mocacomp': '', 'mocareas': '', @@ -216,6 +227,7 @@ def make_blank_fvp(): 'fu_udsverte_c1': '', 'fu_udsverti_c1': '', 'fu_cogstat': '', + 'fvp_c1s_complete': '2', # C2 'fu_mocacomp': '', 'fu_mocareas': '', @@ -290,5 +302,6 @@ def make_blank_fvp(): } + if __name__ == "__main__": unittest.main() diff --git a/tests/test_cls.py b/tests/test_cls.py index 164527a..452b214 100644 --- a/tests/test_cls.py +++ b/tests/test_cls.py @@ -52,9 +52,9 @@ def test_cls_added_when_filled(self): self.assertEqual(len(fpacket), 1, "Expected packet to have CLS") def test_partial_cls_has_warning(self): - """Partially completed CLS should create a warning.""" - record = make_filled_record() - record['eng_preferred_language'] = ' ' # Make form partially complete. + """Partially completed CLS should create a warning.""" + record = make_filled_record() + record['eng_preferred_language'] = ' ' # Make form partially complete. ipacket = packet.Packet() itrap = StringIO() @@ -68,11 +68,14 @@ def test_partial_cls_has_warning(self): assert ftrap.getvalue() == "[WARNING] CLS form is incomplete for PTID: unknown\n" ftrap.close() - def test_cls_proficiency_not_100_has_warning(self): - """If language proficiency percentages do not add to 100, create a warning.""" - record = make_filled_record() - record['eng_percentage_english'] = '20' - record['eng_percentage_spanish'] = '91' + def test_cls_proficiency_not_100_has_warning(self): + """ + If language proficiency percentages do not add to 100, + create a warning. 
+ """ + record = make_filled_record() + record['eng_percentage_english'] = '20' + record['eng_percentage_spanish'] = '91' ipacket = packet.Packet() itrap = StringIO() @@ -82,7 +85,7 @@ def test_cls_proficiency_not_100_has_warning(self): fpacket = packet.Packet() ftrap = StringIO() - clsform.add_cls(record, ipacket, ivp_forms, ftrap) + clsform.add_cls(record, fpacket, fvp_forms, ftrap) assert ftrap.getvalue() == "[WARNING] language proficiency percentages do not equal 100 for PTID : unknown\n" ftrap.close() @@ -102,7 +105,7 @@ def test_check_cls_date(self): clsform.add_cls(record, fpacket, fvp_forms) def test_cls_form_marked_complete(self): - """If the completed CLS form is not marked complete, raise.""" + """ If the completed CLS form is not marked complete, raise. """ record = make_filled_record() record['form_cls_linguistic_history_of_subject_complete'] = '0 or 1' @@ -114,6 +117,7 @@ def test_cls_form_marked_complete(self): with self.assertRaises(Exception): clsform.add_cls(record, fpacket, fvp_forms) + def make_blank_record(): return { 'eng_preferred_language': '', diff --git a/tests/test_filter.py b/tests/test_filters.py similarity index 95% rename from tests/test_filter.py rename to tests/test_filters.py index 295d97b..02f9c84 100644 --- a/tests/test_filter.py +++ b/tests/test_filters.py @@ -63,7 +63,7 @@ def test_filter_eliminate_empty_date(self): actual = [] with io.StringIO(redcap_data) as data, \ io.StringIO("") as results: - filters.filter_eliminate_empty_date(data, '', results) + filters.filter_eliminate_empty_date_do(data, results) results.seek(0) reader = csv.DictReader(results) @@ -120,7 +120,7 @@ def test_filter_fix_vistdate(self): actual = [] with io.StringIO(redcap_data) as data, \ io.StringIO("") as results: - filters.filter_fix_visitdate(data, '', results) + filters.filter_fix_visitdate_do(data, results) results.seek(0) reader = csv.DictReader(results) @@ -327,10 +327,10 @@ def test_filter_fix_headers(self): '''.strip() fix_header_dict = { - 
'ptid' : 'PTID', - 'visitmo' : 'VisitMo', - 'adcid' : 'ADCid', - 'initials' : 'Initials' + 'ptid': 'PTID', + 'visitmo': 'VisitMo', + 'adcid': 'ADCid', + 'initials': 'Initials' } actual = [] @@ -345,12 +345,15 @@ def test_filter_fix_headers(self): results.seek(0) reader = csv.reader(results) actual = next(reader) - expected = ['PTID','redcap_event_name','formver','ADCid','VisitMo','visitday','visityr','visitnum','Initials','header_complete'] + expected = ['PTID', 'redcap_event_name', 'formver', 'ADCid', 'VisitMo', + 'visitday', 'visityr', 'visitnum', 'Initials', + 'header_complete'] self.assertListEqual(actual, expected) def test_filter_replace_drug_id(self): ''' - `test_filter_replace_drug_id` should replace drug id in the record, and print the processed ptid and number of updated fields. + `test_filter_replace_drug_id` should replace drug id in the record, + and print the processed ptid and number of updated fields. ''' redcap_data = ''' @@ -366,7 +369,7 @@ def test_filter_replace_drug_id(self): with io.StringIO(redcap_data) as data, \ io.StringIO("") as results: - filters.filter_replace_drug_id(data,'', results) + filters.filter_replace_drug_id_do(data, results) # Reset the file position indicator so DictReader reads from the # beginning of the results "file". 
@@ -380,3 +383,7 @@ def test_filter_replace_drug_id(self): self.assertListEqual(filter_out_1, expected_1) expected_2 = ['d11111', 'd22222', 'd22222', ''] self.assertListEqual(filter_out_2, expected_2) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/test_invalid_character_checker.py b/tests/test_invalid_character_checker.py index 53aba78..b95bfa1 100644 --- a/tests/test_invalid_character_checker.py +++ b/tests/test_invalid_character_checker.py @@ -7,24 +7,28 @@ class TestInvalidCharacters(unittest.TestCase): def test_find_any_characters(self): - field = Field('FOTHMUSX', 'Char', 0, 1, allowable_values='', value='agf&dfg') + field = Field('FOTHMUSX', 'Char', 0, 1, allowable_values='', + value='agf&dfg') found = check_for_bad_characters(field) self.assertTrue(found) def test_many_characters(self): - field = Field('FOTHMUSX', 'Char', 0, 1, allowable_values='', value='ag%fd"fg') + field = Field('FOTHMUSX', 'Char', 0, 1, allowable_values='', + value='ag%fd"fg') found_many = check_for_bad_characters(field) many = len(found_many) self.assertEqual(many, 2) def test_closed_double_quotes(self): - field = Field('FOTHMUSX', 'Char', 0, 1, allowable_values='', value='ag\"fd\"fg') + field = Field('FOTHMUSX', 'Char', 0, 1, allowable_values='', + value='ag\"fd\"fg') found_two_doublequotes = check_for_bad_characters(field) dups = found_two_doublequotes[0] self.assertEqual(dups, '" (2)') def test_closed_single_quotes(self): - field = Field('FOTHMUSX', 'Char', 0, 1, allowable_values='', value='ag\'fdf\'g') + field = Field('FOTHMUSX', 'Char', 0, 1, allowable_values='', + value='ag\'fdf\'g') found_two_quotes = check_for_bad_characters(field) dups1 = found_two_quotes[0] self.assertEqual(dups1, '\' (2)') diff --git a/tests/test_m1.py b/tests/test_m1.py index f34dd2d..99c2df3 100644 --- a/tests/test_m1.py +++ b/tests/test_m1.py @@ -11,7 +11,8 @@ class TestM1(unittest.TestCase): def test_m1_death_date_accept(self): """ death date format interpreter accept correct dates 
""" date = ['12/12/2012', '12-12-2012', '2012/12/12', '2012-12-12'] - date_parsed = ['12', '12', '2012', '12', '12', '2012', '12', '12', '2012', '12', '12', '2012'] + date_parsed = ['12', '12', '2012', '12', '12', '2012', '12', '12', + '2012', '12', '12', '2012'] record = make_blank_m() out = [] for x in date: @@ -19,13 +20,15 @@ def test_m1_death_date_accept(self): record['DEATHMO'] = m_builder.parse_date(x, 'M') record['DEATHDY'] = m_builder.parse_date(x, 'D') record['DEATHYR'] = m_builder.parse_date(x, 'Y') - out += [str(record['DEATHMO']), str(record['DEATHDY']), str(record['DEATHYR'])] + out += [str(record['DEATHMO']), str(record['DEATHDY']), + str(record['DEATHYR'])] self.assertEqual(date_parsed, out) def test_m1_death_date_reject(self): """ death date format interpreter rejects wrong dates """ date = ['12/12/2012', '12-1212', '12/2012/12', '2012-12-12'] - date_parsed = ['12', '12', '2012', '12', '12', '2012', '12', '12', '2012', '12', '12', '2012'] + date_parsed = ['12', '12', '2012', '12', '12', '2012', '12', '12', + '2012', '12', '12', '2012'] record = make_blank_m() out = [] with self.assertRaises(ValueError): @@ -34,7 +37,8 @@ def test_m1_death_date_reject(self): record['DEATHMO'] = m_builder.parse_date(x, 'M') record['DEATHDY'] = m_builder.parse_date(x, 'D') record['DEATHYR'] = m_builder.parse_date(x, 'Y') - out += [str(record['DEATHMO']), str(record['DEATHDY']), str(record['DEATHYR'])] + out += [str(record['DEATHMO']), str(record['DEATHDY']), + str(record['DEATHYR'])] self.assertNotEqual(date_parsed, out) @unittest.skip("'0' is outside of the inclusive_range for 'FTLDREAS', 'FTLDREAX' should be left blank if FTLDREAS is filled regardless of 'DECEASED' or 'DISCONT' status") @@ -140,40 +144,41 @@ def make_blank_m(): 'DROPREAS': '', } - def make_filled_m(): - # default dead - return { - 'visitmo': '01', - 'visitday': '01', - 'visityr': '2000', - 'CHANGEMO': '02', - 'CHANGEDY': '03', - 'CHANGEYR': '2000', - 'PROTOCOL': '', - 'ACONSENT': '', - 'RECOGIM': 
'', - 'REPHYILL': '', - 'REREFUSE': '', - 'RENAVAIL': '', - 'RENURSE': '', - 'NURSEMO': '', - 'NURSEDY': '', - 'NURSEYR': '', - 'REJOIN': '', - 'FTLDDISC': '', - 'FTLDREAS': '', - 'FTLDREAx': '', - 'DECEASED': '1', - 'DISCONT': '', - 'DEATHMO': '01', - 'DEATHDY': '01', - 'DEATHYR': '2000', - 'AUTOPSY': '1', - 'DISCMO': '', - 'DISCDAY': '', - 'DISCYR': '', - 'DROPREAS': '', - } + +def make_filled_m(): + # default dead + return { + 'visitmo': '01', + 'visitday': '01', + 'visityr': '2000', + 'CHANGEMO': '02', + 'CHANGEDY': '03', + 'CHANGEYR': '2000', + 'PROTOCOL': '', + 'ACONSENT': '', + 'RECOGIM': '', + 'REPHYILL': '', + 'REREFUSE': '', + 'RENAVAIL': '', + 'RENURSE': '', + 'NURSEMO': '', + 'NURSEDY': '', + 'NURSEYR': '', + 'REJOIN': '', + 'FTLDDISC': '', + 'FTLDREAS': '', + 'FTLDREAx': '', + 'DECEASED': '1', + 'DISCONT': '', + 'DEATHMO': '01', + 'DEATHDY': '01', + 'DEATHYR': '2000', + 'AUTOPSY': '1', + 'DISCMO': '', + 'DISCDAY': '', + 'DISCYR': '', + 'DROPREAS': '', + } if __name__ == "__main__": diff --git a/tests/test_skip_z1.py b/tests/test_skip_z1.py new file mode 100644 index 0000000..3f14e84 --- /dev/null +++ b/tests/test_skip_z1.py @@ -0,0 +1,326 @@ +import unittest + +from nacc.uds3 import packet +from nacc.uds3.ivp import builder as ivp_builder +from nacc.uds3.fvp import builder as fvp_builder + + +class TestFormSkip(unittest.TestCase): + + def test_z1_added_to_ivp_when_present(self): + """ If the Z1 form is present, make sure it is added """ + record = make_blank_ivp() + record['a2_sub'] = '1' + + ipacket = packet.Packet() + ivp_builder.add_z1_or_z1x(record, ipacket) + self.assertEqual(ipacket['A2SUB'], '1') + + def test_z1_skipped_from_ivp_when_absent(self): + """ If the Z1 form is absent from the csv, it should simply + be skipped without throwing an error """ + record = make_blank_ivp_Z1X() + record['langa1'] = '1' + + ipacket = packet.Packet() + ivp_builder.add_z1_or_z1x(record, ipacket) + self.assertEqual(ipacket['LANGA1'], '1') + + def 
test_z1_added_to_fvp_when_filled(self): + """ If the Z1 form is present, make sure it is added """ + record = make_blank_fvp() + record['fu_a2_sub'] = '1' + + fpacket = packet.Packet() + fvp_builder.add_z1_or_z1x(record, fpacket) + self.assertEqual(fpacket['A2SUB'], '1') + + def test_z1x_added_to_fvp_when_filled(self): + """ If the Z1 form is absent from the csv, it should simply + be skipped without throwing an error """ + record = make_blank_fvp_Z1X() + record['fu_langa1'] = '1' + + fpacket = packet.Packet() + fvp_builder.add_z1_or_z1x(record, fpacket) + self.assertEqual(fpacket['LANGA1'], '1') + + +def make_blank_ivp(): + return { + 'visitmo': '1', + 'visitday': '1', + 'visityr': '2017', + # Z1 + 'a2_sub': '', + 'a2_not': '', + 'a2_comm': '', + 'a3_sub': '', + 'a3_not': '', + 'a3_comm': '', + 'a4_sub': '', + 'a4_not': '', + 'a4_comm': '', + 'b1_sub': '', + 'b1_not': '', + 'b1_comm': '', + 'b5_sub': '', + 'b5_not': '', + 'b5_comm': '', + 'b6_sub': '', + 'b6_not': '', + 'b6_comm': '', + 'b7_sub': '', + 'b7_not': '', + 'b7_comm': '', + 'ivp_z1_complete': '2', + # Z1X + 'langa1': '', + 'langa2': '', + 'a2sub': '', + 'a2not': '', + 'langa3': '', + 'a3sub': '', + 'a3not': '', + 'langa4': '', + 'a4sub': '', + 'a4not': '', + 'langa5': '', + 'langb1': '', + 'b1sub': '', + 'b1not': '', + 'langb4': '', + 'langb5': '', + 'b5sub': '', + 'b5not': '', + 'langb6': '', + 'b6sub': '', + 'b6not': '', + 'langb7': '', + 'b7sub': '', + 'b7not': '', + 'langb8': '', + 'langb9': '', + 'langc2': '', + 'langd1': '', + 'langd2': '', + 'langa3a': '', + 'ftda3afs': '', + 'ftda3afr': '', + 'langb3f': '', + 'langb9f': '', + 'langc1f': '', + 'langc2f': '', + 'langc3f': '', + 'langc4f': '', + 'ftdc4fs': '', + 'ftdc4fr': '', + 'ftdc5fs': '', + 'ftdc5fr': '', + 'ftdc6fs': '', + 'ftdc6fr': '', + 'lange2f': '', + 'lange3f': '', + 'langcls': '', + 'clssub': '', + 'ivp_z1x_complete': '2' + } + + +def make_blank_ivp_Z1X(): + return { + 'visitmo': '1', + 'visitday': '1', + 'visityr': '2019', + # Z1X + 
'langa1': '', + 'langa2': '', + 'a2sub': '', + 'a2not': '', + 'langa3': '', + 'a3sub': '', + 'a3not': '', + 'langa4': '', + 'a4sub': '', + 'a4not': '', + 'langa5': '', + 'langb1': '', + 'b1sub': '', + 'b1not': '', + 'langb4': '', + 'langb5': '', + 'b5sub': '', + 'b5not': '', + 'langb6': '', + 'b6sub': '', + 'b6not': '', + 'langb7': '', + 'b7sub': '', + 'b7not': '', + 'langb8': '', + 'langb9': '', + 'langc2': '', + 'langd1': '', + 'langd2': '', + 'langa3a': '', + 'ftda3afs': '', + 'ftda3afr': '', + 'langb3f': '', + 'langb9f': '', + 'langc1f': '', + 'langc2f': '', + 'langc3f': '', + 'langc4f': '', + 'ftdc4fs': '', + 'ftdc4fr': '', + 'ftdc5fs': '', + 'ftdc5fr': '', + 'ftdc6fs': '', + 'ftdc6fr': '', + 'lange2f': '', + 'lange3f': '', + 'langcls': '', + 'clssub': '', + 'ivp_z1x_complete': '2' + } + + +def make_blank_fvp(): + return { + 'visitmo': '1', + 'visitday': '1', + 'visityr': '2017', + # Z1 + 'fu_a2_sub': '', + 'fu_a2_not': '', + 'fu_a2_comm': '', + 'fu_a3_sub': '', + 'fu_a3_not': '', + 'fu_a3_comm': '', + 'fu_a4_sub': '', + 'fu_a4_not': '', + 'fu_a4_comm': '', + 'fu_b1_sub': '', + 'fu_b1_not': '', + 'fu_b1_comm': '', + 'fu_b5_sub': '', + 'fu_b5_not': '', + 'fu_b5_comm': '', + 'fu_b6_sub': '', + 'fu_b6_not': '', + 'fu_b6_comm': '', + 'fu_b7_sub': '', + 'fu_b7_not': '', + 'fu_b7_comm': '', + 'fvp_z1_complete': '2', + # Z1X + 'fu_langa1': '', + 'fu_langa2': '', + 'fu_a2sub': '', + 'fu_a2not': '', + 'fu_langa3': '', + 'fu_a3sub': '', + 'fu_a3not': '', + 'fu_langa4': '', + 'fu_a4sub': '', + 'fu_a4not': '', + 'fu_langb1': '', + 'fu_b1sub': '', + 'fu_b1not': '', + 'fu_langb4': '', + 'fu_langb5': '', + 'fu_b5sub': '', + 'fu_b5not': '', + 'fu_langb6': '', + 'fu_b6sub': '', + 'fu_b6not': '', + 'fu_langb7': '', + 'fu_b7sub': '', + 'fu_b7not': '', + 'fu_langb8': '', + 'fu_langb9': '', + 'fu_langc2': '', + 'fu_langd1': '', + 'fu_langd2': '', + 'fu_langa3a': '', + 'fu_ftda3afs': '', + 'fu_ftda3afr': '', + 'fu_langb3f': '', + 'fu_langb9f': '', + 'fu_langc1f': '', + 
'fu_langc2f': '', + 'fu_langc3f': '', + 'fu_langc4f': '', + 'fu_ftdc4fs': '', + 'fu_ftdc4fr': '', + 'fu_ftdc5fs': '', + 'fu_ftdc5fr': '', + 'fu_ftdc6fs': '', + 'fu_ftdc6fr': '', + 'fu_lange2f': '', + 'fu_lange3f': '', + 'fu_langcls': '', + 'fu_clssub': '', + 'fvp_z1x_complete': '2' + } + + +def make_blank_fvp_Z1X(): + return { + 'visitmo': '1', + 'visitday': '1', + 'visityr': '2019', + # Z1X + 'fu_langa1': '', + 'fu_langa2': '', + 'fu_a2sub': '', + 'fu_a2not': '', + 'fu_langa3': '', + 'fu_a3sub': '', + 'fu_a3not': '', + 'fu_langa4': '', + 'fu_a4sub': '', + 'fu_a4not': '', + 'fu_langb1': '', + 'fu_b1sub': '', + 'fu_b1not': '', + 'fu_langb4': '', + 'fu_langb5': '', + 'fu_b5sub': '', + 'fu_b5not': '', + 'fu_langb6': '', + 'fu_b6sub': '', + 'fu_b6not': '', + 'fu_langb7': '', + 'fu_b7sub': '', + 'fu_b7not': '', + 'fu_langb8': '', + 'fu_langb9': '', + 'fu_langc2': '', + 'fu_langd1': '', + 'fu_langd2': '', + 'fu_langa3a': '', + 'fu_ftda3afs': '', + 'fu_ftda3afr': '', + 'fu_langb3f': '', + 'fu_langb9f': '', + 'fu_langc1f': '', + 'fu_langc2f': '', + 'fu_langc3f': '', + 'fu_langc4f': '', + 'fu_ftdc4fs': '', + 'fu_ftdc4fr': '', + 'fu_ftdc5fs': '', + 'fu_ftdc5fr': '', + 'fu_ftdc6fs': '', + 'fu_ftdc6fr': '', + 'fu_lange2f': '', + 'fu_lange3f': '', + 'fu_langcls': '', + 'fu_clssub': '', + 'fvp_z1x_complete': '2' + } + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/test_z1_z1x.py b/tests/test_z1_z1x.py index 46b1de2..e822bb5 100644 --- a/tests/test_z1_z1x.py +++ b/tests/test_z1_z1x.py @@ -4,10 +4,13 @@ from nacc.uds3.ivp import builder as ivp_builder from nacc.uds3.fvp import builder as fvp_builder + class TestC1SC2(unittest.TestCase): def test_z1_added_to_ivp_when_filled(self): - """ If header is from before April 2, 2018, the Z1 form should be added """ + """ + If header is from before April 2, 2018, the Z1 form should be added + """ record = make_blank_ivp() record['visityr'] = '2016' record['a2_sub'] = '1' @@ -18,7 +21,9 @@ def 
test_z1_added_to_ivp_when_filled(self): self.assertEqual(ipacket['A2SUB'], '1') def test_z1x_added_to_ivp_when_filled(self): - """ If header is from after April 2, 2018, the Z1X form should be added """ + """ + If header is from after April 2, 2018, the Z1X form should be added + """ record = make_blank_ivp() record['visityr'] = '2019' record['langa1'] = '1' @@ -29,7 +34,9 @@ def test_z1x_added_to_ivp_when_filled(self): self.assertEqual(ipacket['LANGA1'], '1') def test_z1_added_to_fvp_when_filled(self): - """ If header is from before April 2, 2018, the Z1 form should be added """ + """ + If header is from before April 2, 2018, the Z1 form should be added + """ record = make_blank_fvp() record['visityr'] = '2016' record['fu_a2_sub'] = '1' @@ -40,7 +47,9 @@ def test_z1_added_to_fvp_when_filled(self): self.assertEqual(fpacket['A2SUB'], '1') def test_z1x_added_to_fvp_when_filled(self): - """ If header is from after April 2, 2018, the Z1X form should be added """ + """ + If header is from after April 2, 2018, the Z1X form should be added + """ record = make_blank_fvp() record['visityr'] = '2019' record['fu_langa1'] = '1' @@ -50,6 +59,7 @@ def test_z1x_added_to_fvp_when_filled(self): fvp_builder.add_z1_or_z1x(record, fpacket) self.assertEqual(fpacket['LANGA1'], '1') + def make_blank_ivp(): return { 'visitmo': '', @@ -77,6 +87,7 @@ def make_blank_ivp(): 'b7_sub': '', 'b7_not': '', 'b7_comm': '', + 'ivp_z1_complete': '2', # Z1X 'langa1': '', 'langa2': '', @@ -125,7 +136,8 @@ def make_blank_ivp(): 'lange2f': '', 'lange3f': '', 'langcls': '', - 'clssub': '' + 'clssub': '', + 'ivp_z1x_complete': '2' } @@ -156,6 +168,7 @@ def make_blank_fvp(): 'fu_b7_sub': '', 'fu_b7_not': '', 'fu_b7_comm': '', + 'fvp_z1_complete': '2', # Z1X 'fu_langa1': '', 'fu_langa2': '', @@ -203,8 +216,10 @@ def make_blank_fvp(): 'fu_lange2f': '', 'fu_lange3f': '', 'fu_langcls': '', - 'fu_clssub': '' + 'fu_clssub': '', + 'fvp_z1x_complete': '2' } + if __name__ == "__main__": unittest.main() diff --git 
a/tools/generator.py b/tools/generator.py index 714903b..3a0ea3c 100644 --- a/tools/generator.py +++ b/tools/generator.py @@ -4,7 +4,9 @@ # Use of this source code is governed by the license found in the LICENSE file. ############################################################################### -"""Copyright 2015-2019 University of Florida +"""Generates Python code that represents NACC Forms + +Copyright 2015-2019 University of Florida Usage: python3 tools/generator.py -h|--help python3 tools/generator.py @@ -20,9 +22,8 @@ corrections Path to the directory containing manually corrected DED CSVs If unspecified, checking for corrected CSVs is skipped. -Note: the CSV versions of the DEDs are found on the NACC website with the form. -For example, UDS3 FVP Form A1 is at: - https://www.alz.washington.edu/NONMEMBER/UDS/DOCS/VER3/uds3dedA1FVP.csv +Note: the CSV versions of the DEDs are found on the NACC website with the form: + https://www.alz.washington.edu/NONMEMBER/UDS/DOCS/VER3/UDS3csvded.html """ import csv @@ -60,16 +61,20 @@ def method(self): def fields_to_strings(fields, this="self.") -> typing.Iterable[str]: """ Returns fields as a Python variable declaration """ for field in fields: + inclusive_range = field.inclusive_range + if inclusive_range: + inclusive_range = f"({inclusive_range[0]}, {inclusive_range[1]})" + code = ( '{qualifier}fields["{field.name}"] = nacc.uds3.Field(' 'name="{field.name}", ' 'typename="{field.type}", ' 'position={field.position}, ' 'length={field.length}, ' - 'inclusive_range={field.inclusive_range}, ' + 'inclusive_range={inclusive_range}, ' 'allowable_values={field.allowable_codes}, ' 'blanks={field.blanks})' - ).format(qualifier=this, field=field) + ).format(qualifier=this, field=field, inclusive_range=inclusive_range) yield code @@ -83,45 +88,17 @@ def __init__(self): """.strip() -def generate(ded: str, encoding: str = "utf-8"): +def generate(ded: str, encoding: str = "utf-8") -> DynamicObject: """ Generates Python code representing
each NACC Form as a class """ try: with open(ded, encoding=encoding) as stream: reader = csv.DictReader(stream) - form = DynamicObject() - form.fields = [] - - for record in reader: - form.packet = record["Packet"] - form.id = record["Form ID"] - - field = DynamicObject() - field.name = MethodField(record["Data Element"]) - field.order = record["Data Order"] - field.type = record["Data Type"] - field.length = record["Data Length"] - field.position = \ - (int(record["Column 1"]), int(record["Column 2"])) - if record["RANGE1"] not in ("", "."): - field.inclusive_range = \ - (int(record["RANGE1"]), int(record["RANGE2"])) - else: - field.inclusive_range = None - - field.allowable_codes = [] - for key, code in record.items(): - if not code or code == ".": - continue - if not re.match(r"^VAL\d\d?$", str(key)): - continue - field.allowable_codes.append(code) - - form.fields.append(field) - field.blanks = [record[f] for f in reader.fieldnames - if "BLANKS" in f and record[f]] - - form.fields.sort(key=lambda f: f.order) + match = re.match(r"ded[IFT](\w\w\w?).csv", os.path.basename(ded)) + if not match: + raise Exception("Cannot determine Form from filename: " + ded) + form_id = match[1] + form = generate_form(form_id, reader) except UnicodeDecodeError as err: if encoding != "windows-1252": @@ -131,6 +108,92 @@ def generate(ded: str, encoding: str = "utf-8"): return form +def generate_form(form_id: str, reader: csv.DictReader) -> DynamicObject: + form = DynamicObject() + form.fields = [] + + for record in reader: + form.packet = record["PACKET"] + form.id = form_id + + field = DynamicObject() + field.name = MethodField(record["VAR"]) + field.order = record["DORDER"] + + # TODO: Ask NACC about DTYPE. 
PDFs say "Num" or "Char" + field.type = "Num" + if record["DTYPE"] == "3": + field.type = "Char" + field.length = record["FLDLENG"] + field.position = \ + (int(record["COLUMN1"]), int(record["COLUMN2"])) + + field.inclusive_range = None + if record["RANGE"]: + (start, end) = record["RANGE"].split("||") + start = start.replace("current year", "CURRENT_YEAR") + start = start.replace(" minus ", " - ") + end = end.replace("current year", "CURRENT_YEAR") + end = end.replace(" minus ", " - ") + + field.inclusive_range = (start, end) + + field.allowable_codes = [] + for code in record["VALUES"].split("||"): + field.allowable_codes.append(code) + + # TODO: handle acceptable MISSING values + + form.fields.append(field) + field.blanks = [record[f] for f in reader.fieldnames + if "BLANKS" in f and record[f]] + + form.fields.sort(key=lambda f: f.order) + return form + + +def generate_header(path: str) -> DynamicObject: + with open(path) as stream: + reader = csv.DictReader(stream) + form = DynamicObject() + form.fields = [] + + for record in reader: + form.packet = record["Packet"] + form.id = record["Form ID"] + + field = DynamicObject() + field.name = MethodField(record["Data Element"]) + field.order = record["Data Order"] + field.type = record["Data Type"] + field.length = record["Data Length"] + field.position = \ + (int(record["Column 1"]), int(record["Column 2"])) + if record["RANGE1"] not in ("", "."): + (start, end) = (record["RANGE1"], record["RANGE2"]) + if end == "2014": + end = "CURRENT_YEAR" + field.inclusive_range = (start, end) + else: + field.inclusive_range = None + + field.allowable_codes = [] + for key, code in record.items(): + if not code or code == ".": + continue + if not re.match(r"^VAL\d\d?$", str(key)): + continue + field.allowable_codes.append(code) + + form.fields.append(field) + field.blanks = [record[f] for f in reader.fieldnames + if "BLANKS" in f and record[f]] + + form.fields.sort(key=lambda f: f.order) + + return form + + def indent(text, 
times=1, tab=" "): """ Returns text with times-tabs inserted at the beginning of each line """ if not text: @@ -167,17 +230,23 @@ def main(): # Search deds_path for CSV files, excluding the ded_header. deds = [filename for filename in os.listdir(deds_path) if filename.endswith(".csv") and filename != ded_header] + deds = sorted(deds) # Generate the Python module starting with the preamble, then the common # header fields, and finally the classes which represents the Forms. print("""# Generated using the NACCulator form generator tool. +from datetime import date + import nacc.uds3 +CURRENT_YEAR = date.today().year + + def header_fields(): fields = {}""") - header = generate(os.path.join(deds_path, ded_header)) + header = generate_header(os.path.join(deds_path, ded_header)) fields = sort_by_starting_position(header.fields) fields = fields_to_strings(fields, this="") for field in fields: