Targets spec #76
base: targets
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
--- | ||
title: "Targets_implementation" | ||
author: "Rostyslav Vyuha" | ||
date: "April 14, 2021" | ||
output: html_document | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE) | ||
``` | ||
|
||
# Main Export Functions | ||
|
||
### Verify Targets | ||
|
||
The purpose of this function would be to verify that the passed targets contain the correct step names, in the correct order, with the correct arguments. | ||
This is done by comparing the targets against the modules that are read in. | ||
|
||
The function would take two arguments: **targets_source** and **modules_path**. | ||
The **targets_source** would be either the tar_target list or, in most cases, the path to the _targets.R file. | ||
The **modules_path** would be the file path to modules_map.csv. | ||
|
||
This function would not make any changes to the _targets.R file or the list itself. | ||
It would simply output warnings and return a boolean indicating whether the targets are valid for the modules. | ||
|
||
#### List of warnings | ||
- error when a module step is missing | ||
- error when module steps are out of order | ||
- error when module step contains wrong arguments | ||
|
||
#### Example function usage | ||
|
||
```{r, eval=FALSE} | ||
verify_targets(targets_source = "/assets/specs/targets/depression_targets.R", modules_path = "/assets/specs/targets/modules_map.csv") | ||
``` | ||
This returns TRUE if depression_targets.R contains every step listed in the modules referenced by modules_map.csv, in the correct order and with the correct arguments. | ||
|
||
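The order and presence checks described above could be sketched roughly as follows. This is only an illustration in base R: `check_step_order` is a hypothetical helper, not part of bllflow or targets, and the message strings are borrowed from the test file in this PR. Argument checking is left out.

```r
# Hypothetical helper (not an existing bllflow function): compare the step
# names found in a targets list against the step names the modules expect.
check_step_order <- function(expected_steps, actual_steps) {
  problems <- character(0)

  # Steps required by the modules but absent from the targets.
  missing_steps <- setdiff(expected_steps, actual_steps)
  if (length(missing_steps) > 0) {
    problems <- c(problems, paste("Missing step", missing_steps))
  }

  # Among the steps present on both sides, the relative order must match.
  present <- actual_steps[actual_steps %in% expected_steps]
  expected_present <- expected_steps[expected_steps %in% actual_steps]
  if (!identical(present, expected_present)) {
    problems <- c(problems, "Wrong order of steps")
  }

  problems
}

expected <- c(
  "create_depression_score_imputation_dataset",
  "impute_depression_score",
  "merge_depression_score_imputed_dataset"
)

# An empty result means the targets pass both checks.
check_step_order(expected, expected)

# Dropping the last step reports it as missing.
check_step_order(expected, expected[1:2])
```

A real verify_targets would presumably raise these as errors or warnings and return a boolean, as the spec describes.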
### Run bllflow Targets | ||
|
||
This function would be responsible for running the targets with arguments filled in by the bllflow object. | ||
|
||
This function would take two arguments: **targets_source** and **bllflow_object**. | ||
The **targets_source** would be identical to the one passed to verify_targets. | ||
The **bllflow_object** would be the bllflow object created upon config initialization, with mandatory checks that modules.csv, variables.csv, and working_data are present. | ||
|
||
The function would first run verify_targets to confirm that the steps are present and in the correct order. It would then modify the tar_target arguments to reflect their true values rather than the shorthand (roles). | ||
Once the tar_target arguments are modified accordingly, the _targets.R file would be written and tar_make() executed, letting targets handle the returns and the pipeline. | ||
|
||
#### Example function usage | ||
|
||
```{r, eval=FALSE} | ||
run_bllflow_targets(targets_source = "/assets/specs/targets/depression_targets.R", bllflow_object = hui_object) | ||
``` | ||
|
||
This would create a _targets.R in the base package directory using the targets found at targets_source. | ||
It would essentially be a copy and paste, except for the Special Arguments, which would be populated using the bllflow_object. | ||
|
||
### Create _targets template | ||
|
||
This function would be responsible for creating the basic bllflow_targets.R file, which would only be populated by the steps in modules.csv. | ||
|
||
The function would also take two arguments: **target_path** and **modules_path**. | ||
|
||
The function would use the shorthand (roles) notation when writing the function calls, for ease of use by the analyst. | ||
|
||
#### Example function usage | ||
|
||
```{r, eval=FALSE} | ||
create_targets_template(target_path = "/assets/specs/targets/depression_targets.R", modules_path = "/assets/specs/targets/modules_map.csv") | ||
``` | ||
|
||
This would create a barebones depression_targets.R containing only the steps found in the passed modules. | ||
|
||
### Create _targets list | ||
|
||
This function would be responsible for creating a list containing tar_target objects. | ||
|
||
The function would accept one mandatory argument, **modules_path**, and one optional argument, **target_path**. | ||
If a **target_path** is supplied, the existing tar_target list is read in, appended to, and verified before being returned. If no **target_path** is supplied, a barebones template-style list is created from the modules_map. | ||
|
||
#### Example function usage | ||
|
||
```{r, eval=FALSE} | ||
create_targets_list(modules_path = "/assets/specs/targets/modules_map.csv") | ||
``` | ||
|
||
This would create a barebones tar_target list containing only the steps found in the passed modules. | ||
|
||
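One possible way to turn module rows into a barebones list is to build unevaluated `tar_target(...)` calls with base R and leave evaluation to targets later. The sketch below is only an assumption about how that could work: `build_target_calls` is a hypothetical helper, the column names follow depression_imputation_module.csv from this PR, and the two rows are shortened for illustration.

```r
# Two shortened rows in the layout of depression_imputation_module.csv,
# deliberately given out of order to show the sorting step.
steps <- data.frame(
  step_ID = c(
    "impute_depression_score",
    "create_depression_score_imputation_dataset"
  ),
  step_function = c(
    "impute_depression_score_function",
    "create_depression_score_imputation_dataset_function"
  ),
  step_arguments = c(
    '(num_multiple_imputations = 5, method = "polr")',
    '(data = data["study_dataset"], survey_cycle_lower_limit = 2003)'
  ),
  step_order = c(2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one unevaluated tar_target() call per module step.
build_target_calls <- function(steps) {
  steps <- steps[order(steps$step_order), ]
  lapply(seq_len(nrow(steps)), function(i) {
    # Parse "<step_function>(<step_arguments>)" into a call without running it.
    command <- parse(
      text = paste0(steps$step_function[i], steps$step_arguments[i])
    )[[1]]
    # Wrap it as tar_target(<step_ID>, <command>), still unevaluated.
    as.call(list(as.name("tar_target"), as.name(steps$step_ID[i]), command))
  })
}

target_calls <- build_target_calls(steps)
print(target_calls[[1]])
```

Because the calls stay unevaluated, the shorthand `data[...]` and `role[...]` arguments survive as written and could be deparsed into a template file or rewritten before execution.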
# Contents of modules.csv | ||
|
||
### Step_id | ||
|
||
The step_id column must contain a unique identifier for the step being performed. | ||
|
||
### Step_function | ||
Review comment: Let's make it so that if they do not provide a value for this column, we will just use the value in the
Review comment: How would we handle multiple identical functions in the same module?
Review comment: I need step function to be the actual function name as that's what would be fed to targets.
Review comment: Since
The function would have the same name as the value of the |
||
|
||
The step_function column contains the name of the function being performed in this step. This must match a function name present in the environment during execution. | ||
|
||
### Step_argument_name | ||
|
||
As the name implies, the step_argument_name column contains the name of a single argument. | ||
|
||
### Step_argument_value | ||
|
||
The step_argument_value column contains the value for the single argument named in step_argument_name. | ||
|
||
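With one row per argument, the rows for a step would need to be gathered back into a single argument list before a call can be built. A minimal sketch in base R, assuming the column names step_id, step_argument_name and step_argument_value described here (the example rows and the helper `collect_step_arguments` are hypothetical):

```r
# Hypothetical module rows: one row per step argument.
module <- data.frame(
  step_id = c("impute_depression_score", "impute_depression_score",
              "merge_depression_score_imputed_dataset"),
  step_argument_name = c("num_multiple_imputations", "method", "merge_by"),
  step_argument_value = c("5", "polr", 'role["id"]'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: group argument rows by step, keeping first-seen order.
collect_step_arguments <- function(module) {
  step_levels <- unique(module$step_id)
  rows_by_step <- split(module, factor(module$step_id, levels = step_levels))
  lapply(rows_by_step, function(rows) {
    # Named list: argument name -> argument value (still a string here).
    setNames(as.list(rows$step_argument_value), rows$step_argument_name)
  })
}

step_arguments <- collect_step_arguments(module)
names(step_arguments$impute_depression_score)
```

Values are kept as strings in this sketch; converting them to numbers or resolving the special arguments would be a separate step.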
#### Special Arguments | ||
Review comment: One addition would be a vector of values. For example an argument value for a list of survey cycles allowed,
Review comment: This would limit where a module can be used, I believe a column like this is best used in modules_map
Review comment: Not sure I understand. Can you give an example? |
||
|
||
- *role*: This would search variables.csv for variables matching the role and would be replaced with a vector of variable names at run time. | ||
- *data*: This would pass the object attached to the bllflow object inside the data list, i.e. bllflow$data[[<whats inside data>]]. Alternatively, it can be a reference to data generated by a previous step. | ||
- *formula*: This would create a formula of the form left side ~ right side, e.g. formula[role["outcome"], role["predictor"], sep = "+"] would result in "outcome1 + outcome2 + outcome3 ~ predictor1 + predictor2 + predictor3". | ||
Review comment: How would include terms that are interations and those that are not using this notation?
Review comment: Do you mean interactions?
Review comment: Because in case of interactions a formula is not needed and you can pass variables as just a vector
Review comment: Can you give an example? Not sure I understand. |
||
|
||
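To make the *role* and *formula* shorthands more concrete, here is a rough sketch of how they might be resolved. The layout of variables.csv is an assumption (one row per variable with `variable` and `role` columns), and `resolve_role` and `build_formula` are hypothetical helpers rather than existing bllflow functions.

```r
# Assumed layout of variables.csv: one row per variable with a single role.
variables <- data.frame(
  variable = c("outcome1", "outcome2", "predictor1", "predictor2"),
  role = c("outcome", "outcome", "predictor", "predictor"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: role["<name>"] becomes a vector of variable names.
resolve_role <- function(variables, role_name) {
  variables$variable[variables$role == role_name]
}

# Hypothetical helper: join both sides with `sep` and place "~" between them.
build_formula <- function(variables, lhs_role, rhs_role, sep = "+") {
  collapse <- paste0(" ", sep, " ")
  lhs <- paste(resolve_role(variables, lhs_role), collapse = collapse)
  rhs <- paste(resolve_role(variables, rhs_role), collapse = collapse)
  paste(lhs, "~", rhs)
}

resolve_role(variables, "predictor")
# returns c("predictor1", "predictor2")

build_formula(variables, "outcome", "predictor")
# returns "outcome1 + outcome2 ~ predictor1 + predictor2"
```

The string result can be passed to `as.formula()` where a model function expects a formula object.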
### Step_description | ||
|
||
This column contains the step description, which is used to populate comments in the template creation function. It should contain a helpful description of what the step is responsible for. | ||
|
||
### Step_order | ||
|
||
This column contains the order in which steps should be executed. | ||
|
||
# Contents of modules_map | ||
|
||
### Module_Name | ||
|
||
The name of the module being included | ||
|
||
### Module_Path | ||
|
||
The relative path to the module being loaded | ||
|
||
### Module_description | ||
|
||
A description of what the module does. | ||
|
||
### Module_order | ||
|
||
The order in which modules are run. | ||
|
||
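As a small illustration of how module_order and module_path might be used together, the sketch below reads a modules_map and returns the module paths in execution order. The second row and the helper `ordered_module_paths` are hypothetical, and the descriptions are shortened.

```r
# Inline stand-in for modules_map.csv; the second module is a made-up
# placeholder added only to show the ordering.
modules_map <- read.csv(
  text = paste(
    "module_name,module_path,module_description,module_order",
    "hypothetical_second_module,./hypothetical_second_module.csv,Placeholder module,2",
    "depression_imputation,./depression_imputation_module.csv,Imputes depression score and mood disorder,1",
    sep = "\n"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: module paths sorted by module_order, named by module.
ordered_module_paths <- function(modules_map) {
  modules_map <- modules_map[order(modules_map$module_order), ]
  setNames(modules_map$module_path, modules_map$module_name)
}

module_paths <- ordered_module_paths(modules_map)
names(module_paths)
```

Since module_path is relative, a real implementation would also need to decide what it is relative to, for example the directory containing modules_map.csv.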
# Example function usage | ||
Review comment: This section should be combined with the first section. Each example usage should be moved to the section for that function. |
||
|
||
## Verify_targets | ||
|
||
|
||
## run_bllflow_targets | ||
|
||
|
||
|
||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
step_ID,step_function,step_arguments,step_description,step_order | ||
create_depression_score_imputation_dataset,create_depression_score_imputation_dataset_function,"(data = data[""study_dataset""], variables = role[""create_depression_score_imputation_dataset""], survey_cycle_variable = role[""survey_cycle""], survey_cycle_lower_limit = 2003, survey_cycle_upper_limit = 2014)",Create the dataset with which we will impute the depression score variable. Only include the survey cycles from 2003 to 2014 since mood disorder is one of the strongest predictors of depression score and it was only available during these cycles in the PUMF,1 | ||
impute_depression_score,impute_depression_score_function,"(data = data[""create_depression_score_imputation_dataset""], outcome = role[""impute_depression_score_outcome""], predictors = role[""impute_depression_score_predictors""], num_multiple_imputations = 5, method = polr)",Imputes the depression score variables using the MICE method. Use a polytomous logistic regression method since there are multiple categories in the depression score variable.,2 | ||
merge_depression_score_imputed_dataset,merge_depression_score_imputed_dataset_function,"(depression_score_imputed_data = data[""impute_depression_score""], study_dataset = data[""study_dataset""], merge_by = role[""id""])",Merge the depression score imputed dataset back into the original study dataset using the id column.,3 | ||
create_mood_disorder_imputation_dataset,create_mood_disorder_imputation_dataset_function,"(data = data[""study_dataset""], variables = role[""create_mood_disorder_imputation_dataset""], survey_cycle_variable = role[""survey_cycle""], survey_cycle_lower_limit = 2001, survey_cycle_upper_limit = 2014)","Create the dataset with which we will impute the mood disorder variable. Include all the cycles we have, which is everything from 2001 to 2014.",4 | ||
impute_mood_disorder,impute_mood_disorder_function,"(data = data[""create_mood_disorder_imputation_dataset""], outcome = role[""impute_mood_disorder_outcome""], predictors = role[""impute_mood_disorder_predictors""], num_multiple_imputations = 5, method = logreg)","Impute the mood disorder variable using MICE method with 5 iterations. Use the logistic regression model since mood disorder has only 2 categories, Yes and No.",5 | ||
merge_imputed_mood_disorder_data,merge_imputed_mood_disorder_data_function,"(mood_disorder_imputed_data = data[""impute_mood_disorder""], study_dataset = data[""study_dataset""], merge_by = role[""id""])",Merge the mood disorder imputed dataset back into the original study dataset using the id column.,6 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
library(targets) | ||
library(huiport) # The package containing functions found in depression_imputation_module | ||
Hui_impute <- create_targets_template() | ||
|
||
list( | ||
# Create the dataset with which we will impute the depression score variable. Only include the survey cycles from 2003 to 2014 since mood disorder is one of the strongest predictors of depression score and it was only available during these cycles in the PUMF | ||
tar_target( | ||
create_depression_score_imputation_dataset, | ||
create_depression_score_imputation_dataset_function( | ||
data = data["study_dataset"], | ||
variables = role["create_depression_score_imputation_dataset"], | ||
survey_cycle_variable = role["survey_cycle"], | ||
survey_cycle_lower_limit = 2003, | ||
survey_cycle_upper_limit = 2014 | ||
) | ||
), | ||
# Imputes the depression score variables using the MICE method. Use a polytomous logistic regression method since there are multiple categories in the depression score variable. | ||
tar_target( | ||
impute_depression_score, | ||
impute_depression_score_function( | ||
data = data["create_depression_score_imputation_dataset"], | ||
outcome = role["impute_depression_score_outcome"], | ||
predictors = role["impute_depression_score_predictors"], | ||
num_multiple_imputations = 5, | ||
method = "polr" | ||
) | ||
), | ||
# Merge the depression score imputed dataset back into the original study dataset using the id column. | ||
tar_target( | ||
merge_depression_score_imputed_dataset, | ||
merge_depression_score_imputed_dataset_function( | ||
depression_score_imputed_data = data["impute_depression_score"], | ||
study_dataset = data["study_dataset"], | ||
merge_by = role["id"] | ||
) | ||
), | ||
# Create the dataset with which we will impute the mood disorder variable. Include all the cycles we have, which is everything from 2001 to 2014. | ||
tar_target( | ||
create_mood_disorder_imputation_dataset, | ||
create_mood_disorder_imputation_dataset_function( | ||
data = data["study_dataset"], | ||
variables = role["create_mood_disorder_imputation_dataset"], | ||
survey_cycle_variable = role["survey_cycle"], | ||
survey_cycle_lower_limit = 2001, | ||
survey_cycle_upper_limit = 2014 | ||
) | ||
), | ||
# Impute the mood disorder variable using MICE method with 5 iterations. Use the logistic regression model since mood disorder has only 2 categories, Yes and No. | ||
tar_target( | ||
impute_mood_disorder, | ||
impute_mood_disorder_function( | ||
data = data["create_mood_disorder_imputation_dataset"], | ||
outcome = role["impute_mood_disorder_outcome"], | ||
predictors = role["impute_mood_disorder_predictors"], | ||
num_multiple_imputations = 5, | ||
method = "logreg" | ||
) | ||
), | ||
# Merge the mood disorder imputed dataset back into the original study dataset using the id column. | ||
tar_target( | ||
merge_imputed_mood_disorder_data, | ||
merge_imputed_mood_disorder_data_function( | ||
mood_disorder_imputed_data = data["impute_mood_disorder"], | ||
study_dataset = data["study_dataset"], | ||
merge_by = role["id"] | ||
) | ||
) | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
module_name,module_path,module_description,module_order | ||
depression_imputation,./depression_imputation_module.csv,This module is responsible for imputing the depression score and mood disorder variables within the CCHS-PUMF from cycles 2001 to 2014.,1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
context("Modules Test") | ||
library(targets) | ||
source("verification_expected_input.R") | ||
|
||
test_that("Module verification returns TRUE when it matches modules.csv",{ | ||
expect_true(verify_targets(targets_source = input_one, modules_path = "./modules_map.csv")) | ||
}) | ||
|
||
test_that("Module verification returns appropriate error when a module step is missing",{ | ||
expect_error(verify_targets(targets_source = input_two, modules_path = "./modules_map.csv"), "Missing step merge_imputed_mood_disorder_data") | ||
}) | ||
|
||
test_that("Module verification returns appropriate error when module steps are out of order",{ | ||
expect_error(verify_targets(targets_source = input_two, modules_path = "./modules_map.csv"), "Wrong order of steps") | ||
}) | ||
|
||
test_that("Module verification returns appropriate error when module step contains wrong arguments",{ | ||
expect_error(verify_targets(targets_source = input_two, modules_path = "./modules_map.csv"), "create_depression_score_imputation_dataset contains invalid step arguments") | ||
}) |
Review comment: Can you document the list of warnings that this function can output?