- This is an optional model development project on a real dataset for predicting the progressive levels of Alzheimer's disease (AD). Students are expected to use the TensorFlow library for modeling and will be asked to submit predicted labels for a test dataset, on which their score will be evaluated objectively.
- This project is included in the UpSchool - Google Developers Machine Learning - Deep Learning Program.
- In this project, you are expected to build a data science model that determines the level of Alzheimer's disease. The levels are ordinal categories, from lowest to highest: 0, 0.25, 0.50, 1.0, 2.0, 3.0 (the progressive levels of Alzheimer's disease).
- You are expected to use the following features:
['EDUC','NACCMOCA','MARISTAT','NACCFAM','NACCGDS','NACCNE4S','NACCAPOE', 'INDEPEND','RESIDENC','ANYMEDS','NACCAMD','DEL','HALL','DEPD','ANX','APA','DISN', 'IRR','MOT','AGIT','ELAT','NITE','APP','DROPACT','NACCAGEB','SEX']
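
Since the target levels listed above are ordinal values rather than consecutive integers, one possible way to prepare them for a TensorFlow classifier is to map each level to an integer class index and map predictions back before submission. The sketch below assumes the six levels exactly as listed in the task description; the names `LEVELS`, `encode_levels`, and `decode_indices` are illustrative and not part of the project.

```python
# Ordinal levels as listed in the task description (lowest to highest).
LEVELS = [0.0, 0.25, 0.5, 1.0, 2.0, 3.0]

# Level -> integer class index, and the inverse for writing predictions back.
LEVEL_TO_INDEX = {level: index for index, level in enumerate(LEVELS)}
INDEX_TO_LEVEL = dict(enumerate(LEVELS))


def encode_levels(levels):
    """Convert raw level values into integer class indices."""
    return [LEVEL_TO_INDEX[float(value)] for value in levels]


def decode_indices(indices):
    """Convert predicted class indices back into level values."""
    return [INDEX_TO_LEVEL[int(index)] for index in indices]


encoded = encode_levels([0, 0.5, 3.0, 1.0])
decoded = decode_indices(encoded)
print(encoded)
print(decoded)
```

Integer indices of this kind are what a sparse categorical cross-entropy loss expects; keeping the inverse mapping makes it easy to submit labels in the original scale.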
Index | Variable Name | Section | Variable type | Data type | Short Descriptor | Data Source | Allowable codes | Missing Codes | Description / derivation |
---|---|---|---|---|---|---|---|---|---|
1 | SEX | A1 - Subject Demographics | Original UDS question | Numeric cross-sectional | Subject's sex | rdd | 1 = Male; 2 = Female | | |
2 | EDUC | A1 - Subject Demographics | Original UDS question | Numeric cross-sectional | Years of education | rdd | 0 - 36 | 99 = Unknown | In general, 12 = high school or GED, 16 = bachelor's degree, 18 = master's degree, 20 = doctorate. Note that although this variable is not collected at follow-up visits, the value from the initial visit will be shown at all follow-up visits. |
3 | MARISTAT | A1 - Subject Demographics | Original UDS question | Numeric longitudinal | Marital Status | rdd | 1 = Married; 2 = Widowed; 3 = Divorced; 4 = Separated; 5 = Never married (or marriage was annulled); 6 = Living as married/domestic partner; 8 = Other or unknown | | Note that in v1–2 there was an option for “other” status. These have been recoded to maristat = 9. |
4 | INDEPEND | A1 - Subject Demographics | Original UDS question | Numeric longitudinal | Level of independence | rdd | 1 = Able to live independently; 2 = Requires some assistance with complex activities; 3 = Requires some assistance with basic activities; 4 = Completely dependent | 9 = Unknown | |
5 | RESIDENC | A1 - Subject Demographics | Original UDS question | Numeric longitudinal | Type of residence | rdd | 1 = Single- or multi-family private residence (apartment, condo, house); 2 = Retirement community or independent group living; 3 = Assisted living, adult family home, or boarding home; 4 = Skilled nursing facility, nursing home, hospital, or hospice; 9 = Other or unknown | | Note that in v1–2 there was an option for “other” type of residence. These have been recoded to residenc = 9. |
6 | NACCAGEB | A1 - Subject Demographics | NACC derived variable | Numeric cross-sectional | Subject's age at initial visit | rdd | 18 - 120 | | Birth month and year are required elements in the UDS; however, birth day is not collected. To calculate naccageb, birth day is set to 1 for all subjects. Baseline age is then computed as initial visit date minus birth date. Note that although this variable is listed for all visits, it does not change across visits; it is cross-sectional. |
7 | NACCFAM | A3 - Subject Family History | NACC derived variable | Numeric cross-sectional | Indicator of first-degree family member with cognitive impairment | rdd | 0 = No report of a first-degree family member with cognitive impairment; 1 = Report of at least one first-degree family member with cognitive impairment | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | UDS Form A3 version 1–2, submitted at all available visits: Subjects reporting at least one parent, sibling, or child with dementia at any visit will have naccfam = 1. Subjects who report no first-degree family members with dementia at all visits where Form A3 is submitted will have naccfam = 0. UDS Form A3 version 3.0 or subsequent versions, submitted at all available visits: If at least one parent, sibling, or child is reported to have both a primary neurological problem/psychiatric condition of cognitive impairment/behavior change (coded as 1) and one of the primary diagnosis codes listed below at any visit, then naccfam = 1. Subjects who report all first-degree family members as having a family history absent of cognitive impairment/psychiatric condition (primary neurological problem/psychiatric condition coded as 2–8) or a primary neurological problem/psychiatric condition is reported (coded as 1), but a code other than those listed below is reported, will have naccfam = 0. For subjects with Form A3 data from multiple form versions, all available data will be included in the calculation of naccfam. For example, if a family history of cognitive impairment is indicated on Form A3 using v3.0 but not on a previous version using v1–2, the subject will still have naccfam = 1. Those with a submitted Form A3 (any version) who are missing data on all first-degree family members are coded as Unknown (naccfam = 9). If some first-degree family members are coded as No and some are coded as Unknown, then they are all coded as Unknown (naccfam = 9). In general, a known history of cognitive impairment reported at any visit supersedes all visits with missing codes. Likewise, an indication of cognitive impairment at any visit supersedes all other visits where a history of cognitive impairment is indicated as not present. In all other conditions where reporting varies, data from the most recent visit are used to calculate naccfam. If Form A3 was never submitted for any version of the UDS, naccfam will take a value of -4. Note that although this variable is listed for all visits, it does not change across visits; it is cross-sectional. |
8 | ANYMEDS | A4 - Subject Medications | Original UDS question | Numeric longitudinal | Subject taking any medications | rdd | 0 = No; 1 = Yes | -4 = Did not complete medications form | If the medications form was not completed, then anymeds = -4. |
9 | NACCAMD | A4 - Subject Medications | NACC derived variable | Numeric longitudinal | Total number of medications reported at each visit | rdd | 0 - 40 | -4 = Did not complete medications form | This variable provides the total number of medications reported at a visit including all prescription and over the counter medications reported on UDS Form A4 at a single visit. If the medications form was not completed, then naccamd = -4. |
10 | CDRGLOB | B4 CDR® Plus NACC FTLD | Original UDS question | Numeric longitudinal | Global CDR® | rdd | 0.0 = No impairment; 0.5 = Questionable impairment; 1.0 = Mild impairment; 2.0 = Moderate impairment; 3.0 = Severe impairment | | |
11 | DEL | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Delusions in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (del = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
12 | HALL | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Hallucinations in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (hall = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
13 | AGIT | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Agitation or aggression in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (agit = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
14 | DEPD | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Depression or dysphoria in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (depd = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
15 | ANX | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Anxiety in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (anx = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
16 | ELAT | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Elation or euphoria in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (elat = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
17 | APA | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Apathy or indifference in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (apa = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
18 | DISN | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Disinhibition in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (disn = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
19 | IRR | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Irritability or lability in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (irr = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
20 | MOT | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Motor disturbance in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (mot = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
21 | NITE | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Nighttime behaviors in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (nite = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
22 | APP | B5 Neuropsychiatric Inventory Questionnaire (NPI-Q) | Original UDS question | Numeric longitudinal | Appetite and eating problems in the last month | rdd | 0 = No; 1 = Yes | 9 = Unknown; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | An option of Unknown (app = 9) was added to UDS v3.0 and subsequent versions. Also note that the wording in v3.0 and subsequent versions changed to be consistent with the way the NPI-Q was originally intended to be completed; the wording changes are not expected to affect the essential meaning of the question. |
23 | NACCGDS | B6 Geriatric Depression Scale (GDS) | NACC derived variable | Numeric longitudinal | Total GDS Score | rdd | 0 - 15 | 88 = Could not be calculated; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | In earlier versions of the UDS, Centers were not given instructions on how to calculate the total GDS score if three or fewer GDS items were missing. NACC has created a new derived variable for Total GDS score so that subjects who were given the GDS in the earlier versions of UDS v1 will have a total GDS score if they skipped three or fewer items on the questionnaire. If the subject was missing more than three of the 15 items on the GDS for any UDS version, naccgds = 88. The UDS Coding Guidebook for Form B6 provides the algorithm for calculating the GDS score when three or fewer items are missing. |
24 | DROPACT | B6 Geriatric Depression Scale (GDS) | Original UDS question | Numeric longitudinal | Have you dropped many of your activities and interests? | rdd | 0 = No; 1 = Yes | 9 = Did not answer; -4 = Not available: UDS form submitted did not collect data in this way, or a skip pattern precludes response to this question | Note that an option of 9 = Did not answer was added to UDS v3.0 and subsequent versions. |
25 | NACCAPOE | | NACC derived variable | Numeric cross-sectional | APOE genotype | rdd-genetic | 1 = e3,e3; 2 = e3,e4; 3 = e3,e2; 4 = e4,e4; 5 = e4,e2; 6 = e2,e2 | 9 = Missing/unknown/not assessed | APOE genotype is run independently by the ADC and reported to NACC on the NACC Neuropathology Form. APOE genotype is also reported by ADGC and NCRAD. In the rare case that the ADC-reported genotype and the genotype reported by ADGC are not the same, the genotype is set to 9 = Missing for that subject. |
26 | NACCNE4S | | NACC derived variable | Numeric cross-sectional | Number of APOE e4 alleles | rdd-genetic | 0 = No e4 allele; 1 = 1 copy of e4 allele; 2 = 2 copies of e4 allele | 9 = Missing/unknown/not assessed | APOE genotype is run independently by the ADC and reported to NACC on the NACC Neuropathology Form. APOE genotype is also reported by ADGC and NCRAD. In the rare case that the ADC-reported genotype and the genotype reported by ADGC are not the same, the genotype is set to 9 = Missing for that subject. |
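
The table above lists sentinel codes such as 99, 9, 88, and -4 for unknown or unavailable values. The provided dataset is reported to contain no missing values, so this step may be unnecessary here. If such codes were present in raw data, a sketch like the following could turn them into `NaN` before modeling. The `MISSING_CODES` mapping covers only a subset of variables, and the helper name `mask_missing_codes` is hypothetical.

```python
import pandas as pd

# Sentinel codes per variable, copied from the data dictionary (subset only).
MISSING_CODES = {
    "EDUC": [99],
    "NACCFAM": [9, -4],
    "ANYMEDS": [-4],
    "NACCAMD": [-4],
    "NACCGDS": [88, -4],
    "DEL": [9, -4],
}


def mask_missing_codes(df, missing_codes=MISSING_CODES):
    """Return a copy of df in which the listed sentinel codes are NaN."""
    out = df.copy()
    for column, codes in missing_codes.items():
        if column in out.columns:
            # Series.mask replaces entries where the condition is True with NaN.
            out[column] = out[column].mask(out[column].isin(codes))
    return out


# Toy rows for illustration only, not taken from the real dataset.
toy = pd.DataFrame(
    {"EDUC": [12, 99, 16], "NACCGDS": [3, 88, -4], "DEL": [0, 1, 9]}
)
cleaned = mask_missing_codes(toy)
print(cleaned.isna().sum())
```

Keeping the codes in one dictionary makes it easy to extend the mapping to the remaining variables.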
- The shape of the dataset is (9180, 38)
- There are 9180 observations and 38 variables.
- There are no missing values in the dataset.
- Of the 38 variables, 32 are categorical (all of them nominal) and 6 are numerical.
- Categorical column names: ['NACCFAM', 'NACCNE4S', 'ANYMEDS', 'DEL', 'HALL', 'DEPD', 'ANX', 'APA', 'DISN', 'IRR', 'MOT', 'AGIT', 'ELAT', 'NITE', 'APP', 'DROPACT', 'SEX', 'MARISTAT_1', 'MARISTAT_2', 'MARISTAT_3', 'MARISTAT_4', 'MARISTAT_5', 'MARISTAT_6', 'INDEPEND_1', 'INDEPEND_2', 'INDEPEND_3', 'INDEPEND_4', 'RESIDENC_1', 'RESIDENC_2', 'RESIDENC_3', 'RESIDENC_4', 'CDRGLOB']
- Numerical column names: ['EDUC', 'NACCMOCA', 'NACCGDS', 'NACCAPOE', 'NACCAMD', 'NACCAGEB']
- Nominal column names: ['NACCFAM', 'NACCNE4S', 'ANYMEDS', 'DEL', 'HALL', 'DEPD', 'ANX', 'APA', 'DISN', 'IRR', 'MOT', 'AGIT', 'ELAT', 'NITE', 'APP', 'DROPACT', 'SEX', 'MARISTAT_1', 'MARISTAT_2', 'MARISTAT_3', 'MARISTAT_4', 'MARISTAT_5', 'MARISTAT_6', 'INDEPEND_1', 'INDEPEND_2', 'INDEPEND_3', 'INDEPEND_4', 'RESIDENC_1', 'RESIDENC_2', 'RESIDENC_3', 'RESIDENC_4', 'CDRGLOB']
- No data were dropped.
- Outlier thresholds were derived from the 0.25 and 0.75 quantiles. Values outside the lower and upper thresholds were treated as outliers and replaced with the corresponding threshold value.
- There was no missing data.
- In both male and female patients, anxiety, depression, irritability, and apathy were observed to be associated with moderate impairment.
- A chi-square test was performed for the nominal variables. The variables with a p-value greater than 0.5 (['naccfam', 'maristat_4', 'maristat_6']) were excluded from the model.
- An ANOVA test was performed for the numerical variables. None of them had a p-value greater than 0.5.
- Label encoding was considered, but no column was found to require it.
- One-hot encoding was applied to the two features that required it (['NACCNE4S', 'NACCAPOE']).
- The class imbalance in the training data was removed with SMOTE oversampling before the model was trained.
- The models were built in two stages:
  - Baseline Model
  - Estimator / Classifier Selection (Hyperparameter Tuning)
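
The outlier step described in the list above can be sketched as follows. The notes only name the 0.25 and 0.75 quantiles; the 1.5 × IQR rule used here to turn them into thresholds is an assumption, and `cap_outliers` is a hypothetical helper name.

```python
import pandas as pd


def cap_outliers(df, columns, q_low=0.25, q_high=0.75, k=1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is an assumption)."""
    out = df.copy()
    for column in columns:
        q1 = out[column].quantile(q_low)
        q3 = out[column].quantile(q_high)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # Cast to float so that fractional thresholds can be stored.
        out[column] = out[column].astype(float).clip(lower=lower, upper=upper)
    return out


# Toy ages for illustration only; 120 is the single extreme value.
toy = pd.DataFrame({"NACCAGEB": [60, 62, 65, 67, 70, 72, 75, 120]})
capped = cap_outliers(toy, ["NACCAGEB"])
print(capped["NACCAGEB"].tolist())
```

Clipping keeps every row, which is consistent with the note that no data were dropped.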
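
The chi-square and ANOVA screening steps from the list above could look roughly like this with SciPy. The 0.5 threshold is the one quoted in the notes; the helper names and the toy data are illustrative assumptions, not the project's code.

```python
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway

P_VALUE_THRESHOLD = 0.5  # threshold quoted in the project notes


def chi_square_p_values(df, nominal_columns, target):
    """p-value of a chi-square independence test for each nominal column."""
    p_values = {}
    for column in nominal_columns:
        contingency = pd.crosstab(df[column], df[target])
        _, p_value, _, _ = chi2_contingency(contingency)
        p_values[column] = p_value
    return p_values


def anova_p_values(df, numerical_columns, target):
    """p-value of a one-way ANOVA across target classes for each numerical column."""
    p_values = {}
    for column in numerical_columns:
        groups = [group[column].to_numpy() for _, group in df.groupby(target)]
        _, p_value = f_oneway(*groups)
        p_values[column] = p_value
    return p_values


# Toy data for illustration only: DEL and NACCMOCA depend on the target,
# SEX and EDUC are distributed identically in both classes.
toy = pd.DataFrame(
    {
        "CDRGLOB": [0.0] * 20 + [1.0] * 20,
        "DEL": [0] * 18 + [1] * 2 + [0] * 4 + [1] * 16,
        "SEX": [1, 2] * 20,
        "NACCMOCA": [28, 26] * 10 + [15, 12] * 10,
        "EDUC": [12, 16] * 20,
    }
)

nominal_p = chi_square_p_values(toy, ["DEL", "SEX"], "CDRGLOB")
numerical_p = anova_p_values(toy, ["NACCMOCA", "EDUC"], "CDRGLOB")

dropped_nominal = [c for c, p in nominal_p.items() if p > P_VALUE_THRESHOLD]
dropped_numerical = [c for c, p in numerical_p.items() if p > P_VALUE_THRESHOLD]
print(dropped_nominal, dropped_numerical)
```

Columns whose p-value exceeds the threshold show no detectable relationship with the target in these tests and are the candidates for exclusion.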
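
One-hot encoding of the two features named in the list above can be done with `pandas.get_dummies`. This is a minimal sketch on toy rows; the actual project code may differ.

```python
import pandas as pd

# Toy rows for illustration only, not taken from the real dataset.
toy = pd.DataFrame(
    {
        "NACCNE4S": [0, 1, 2, 9],
        "NACCAPOE": [1, 2, 4, 9],
        "EDUC": [12, 16, 18, 20],
    }
)

# Each listed column is replaced by one 0/1 indicator column per observed code.
encoded = pd.get_dummies(toy, columns=["NACCNE4S", "NACCAPOE"], dtype=int)
print(encoded.columns.tolist())
```

The indicator columns are named `<feature>_<code>`, the same pattern as the `MARISTAT_1` … `RESIDENC_4` columns already present in the dataset.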