add Targets factories to ease {maldipickr} workflow #46

Closed
wants to merge 29 commits into from
29 commits
86a4a16
draft first {target} factories for {maldipickr}
cpauvert May 3, 2024
55575ee
add {targets} and {tarchetypes} as optional dependencies
cpauvert May 3, 2024
2d14543
ignore the guidelines for contributing during pkg build
cpauvert May 3, 2024
5a97e9f
forgotten line removal after compilation
cpauvert May 3, 2024
b5692bd
add inflated changes for import target factory
cpauvert May 3, 2024
cb62ce9
fix spelling issues
cpauvert May 3, 2024
4fe03d8
fix having targets and tarchetypes as optional dependencies in Suggests
cpauvert May 3, 2024
28a0489
draft gather_spectra_stats() function
cpauvert May 3, 2024
e5dc66f
add test for structure and fix no visible binding issue
cpauvert May 3, 2024
0017bd1
inflate the now tested gather_spectra_stats function
cpauvert May 3, 2024
131caf2
fix typo in target factory example
cpauvert May 3, 2024
c91f260
Merge branch 'fix-45-spectra-stats' into targets-factories
cpauvert May 3, 2024
1257a2a
add section for {targets} functions
cpauvert May 3, 2024
7f2d55f
add TODOs and better explanations of target objects
cpauvert May 3, 2024
12e5220
add explicit section title for factory and link to docs
cpauvert May 6, 2024
a830afd
precise TODO for inherited options
cpauvert May 6, 2024
f94d897
describe the targets objects obtained with the factory
cpauvert May 6, 2024
7f43051
Merge branch 'main' into targets-factories
cpauvert May 6, 2024
bfb1623
make the call to factory cleaner using symbol to prefix targets names
cpauvert May 6, 2024
35aecfe
add {coop} to Suggests packages
cpauvert May 8, 2024
23a821f
initiate the target factory for delineation
cpauvert May 8, 2024
7e4584c
extend the factory up to the picking step
cpauvert May 9, 2024
60bd383
fix typo
cpauvert May 9, 2024
166ee94
fix issue with unrecognized processed list
cpauvert May 9, 2024
dbfbf3f
convert the factory output from vectors to list
cpauvert May 13, 2024
8afef9e
draft test for targets factory
cpauvert May 13, 2024
875e2a2
add test for import spectra target factory
cpauvert May 13, 2024
c4c2c1d
precise the cosine function and link correctly to the man page
cpauvert May 13, 2024
1c5ea4e
draft and explore objects within picking factory
cpauvert May 13, 2024
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -19,3 +19,4 @@
^CRAN-SUBMISSION$
^revdep$
^CODE_OF_CONDUCT\.md$
^CONTRIBUTING\.md$
3 changes: 3 additions & 0 deletions DESCRIPTION
@@ -41,9 +41,12 @@ Imports:
tools,
utils
Suggests:
coop,
knitr,
rmarkdown,
spelling,
tarchetypes (>= 0.9.0),
targets (>= 1.7.0),
testthat
VignetteBuilder:
knitr
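The three new Suggests entries ({coop}, {tarchetypes}, {targets}) are optional, so the factories check for them at run time with `rlang::check_installed()` before building any target. As a rough base-R sketch of that guard, assuming only an error (not an install prompt) is wanted, the hypothetical `require_suggested()` below relies on `requireNamespace()`:

```r
# Hypothetical guard for optional (Suggests) dependencies, using base R only.
# Unlike rlang::check_installed(), it never offers to install anything.
require_suggested <- function(pkgs, reason) {
  # requireNamespace() loads a package without attaching it and
  # returns FALSE (silently, with quietly = TRUE) when it is unavailable
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    stop(
      "The following packages are required ", reason, ": ",
      paste(missing, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Base packages are always available, so this passes silently
require_suggested(c("stats", "utils"), reason = "for this example")
```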
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -4,6 +4,7 @@ export("%>%")
export(check_spectra)
export(delineate_with_identification)
export(delineate_with_similarity)
export(gather_spectra_stats)
export(get_spectra_names)
export(import_biotyper_spectra)
export(import_spede_clusters)
@@ -15,6 +16,8 @@ export(read_biotyper_report)
export(read_many_biotyper_reports)
export(remove_spectra)
export(set_reference_spectra)
export(tar_import_and_process_spectra)
export(tar_pick_with_similarity)
importFrom(MALDIquant,binPeaks)
importFrom(MALDIquant,calibrateIntensity)
importFrom(MALDIquant,createMassSpectrum)
54 changes: 54 additions & 0 deletions R/gather_spectra_stats.R
@@ -0,0 +1,54 @@
# WARNING - Generated by {fusen} from dev/flat_utils.Rmd: do not edit by hand

#' Aggregate spectra quality-check statistics
#'
#'
#' @param check_vectors A list of logical vectors from [check_spectra]
#'
#' @return A tibble of one row with the following 5 columns of integers:
#' * `n_spectra`: total number of raw spectra.
#' * `n_valid_spectra`: total number of spectra passing all quality checks.
#' * `is_empty`, `is_outlier_length` and `is_not_regular`: number of spectra flagged with each of these irregularities.
#'
#' @seealso [check_spectra]
#' @export
#' @examples
#' # Get an example directory of six Bruker MALDI Biotyper spectra
#' directory_biotyper_spectra <- system.file(
#' "toy-species-spectra",
#' package = "maldipickr"
#' )
#' # Import the six spectra
#' spectra_list <- import_biotyper_spectra(directory_biotyper_spectra)
#' # Display the list of checks, with FALSE where no anomaly is detected
#' checks <- check_spectra(spectra_list)
#' # Aggregate the statistics of quality-checked spectra
#' gather_spectra_stats(checks)
gather_spectra_stats <- function(check_vectors) {
if (typeof(check_vectors) != "list" ||
is.null(names(check_vectors))) {
stop(
"check_vectors is not a named list. See maldipickr::check_spectra() help page for a correct format."
)
}
equal_length <- unique(lengths(check_vectors))
if (length(equal_length) != 1 ||
any(names(check_vectors) != c("is_empty", "is_outlier_length", "is_not_regular"))
) {
stop(
"Unexpected format for check_vectors. Are you sure this is the output of maldipickr::check_spectra()?"
)
}

# check_vectors from maldipickr::check_spectra
# src: https://stackoverflow.com/a/51140480/21085566
aggregated_checks <- Reduce(`|`, check_vectors)
check_stats <- vapply(check_vectors, sum, FUN.VALUE = integer(1)) %>%
tibble::as_tibble_row()
tibble::tibble(
"n_spectra" = length(aggregated_checks),
"n_valid_spectra" = .data$n_spectra - sum(aggregated_checks)
) %>%
dplyr::bind_cols(check_stats) %>%
return()
}
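A minimal base-R sketch of the aggregation in `gather_spectra_stats()`, assuming the input is a named list of equal-length logical vectors like the output of `check_spectra()`. The helper `summarise_checks()` and the toy `checks` list are hypothetical, and a one-row `data.frame` stands in for the tibble:

```r
# Hypothetical helper mirroring the logic of gather_spectra_stats() with base R only
summarise_checks <- function(check_vectors) {
  # A spectrum is invalid if at least one check flags it (element-wise OR)
  any_anomaly <- Reduce(`|`, check_vectors)
  # Number of spectra flagged by each individual check
  per_check <- vapply(check_vectors, sum, FUN.VALUE = integer(1))
  data.frame(
    n_spectra = length(any_anomaly),
    n_valid_spectra = length(any_anomaly) - sum(any_anomaly),
    as.list(per_check)
  )
}

# Toy checks for four spectra: the third is empty (and has an outlier length),
# the fourth only has an outlier length
checks <- list(
  is_empty = c(FALSE, FALSE, TRUE, FALSE),
  is_outlier_length = c(FALSE, FALSE, TRUE, TRUE),
  is_not_regular = c(FALSE, FALSE, FALSE, FALSE)
)
summarise_checks(checks)
```

Here `n_valid_spectra` is 2 rather than 1 (4 spectra minus 3 flags): the third spectrum fails two checks, but the element-wise OR built by `Reduce()` counts it as invalid only once.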
124 changes: 124 additions & 0 deletions R/tar_import_and_process_spectra.R
@@ -0,0 +1,124 @@
# WARNING - Generated by {fusen} from dev/maldipickr-workflow-with-targets.Rmd: do not edit by hand

#' Import and process checked spectra using targets
#'
#'
#' Given a vector of paths to MALDI Biotyper directories containing `acqus` and
#' `acqu` files, this target factory facilitates the steps from raw spectra to
#' quality-checked processed spectra. See [targets::tar_target] for more
#' information about target objects.
#'
#' @param name A symbol indicating the prefix of all targets created by the factory.
#' For instance, calling `tar_import_and_process_spectra(anaerobe, ...)` will create the target `anaerobe_spectra_raw` among others (see the Value section).
#' @param raw_spectra_directories A vector of paths to directories containing MALDI Biotyper spectra files. This is similar to the `biotyper_directory` parameter from [import_biotyper_spectra], but as a character vector.
#' @inheritParams check_spectra
#' @inheritParams targets::tar_target_raw # TODO: add other parameters to inherit, maybe by passing ... to targets::tar_target()?
#'
#' # TODO: restrict format to rds or qs only, and for qs warn with rlang::check_installed
#'
#' @return A list of target objects whose names use the `name` argument as a prefix:
#' * `*_plates_files` (e.g., `anaerobe_plates_files`) and `*_plates` (e.g., `anaerobe_plates`): unnamed and named lists, respectively, of the input paths provided by `raw_spectra_directories`, as produced by [tarchetypes::tar_files_input].
#' * `*_spectra_raw` (e.g., `anaerobe_spectra_raw`): a list of lists of imported spectra objects, produced by [import_biotyper_spectra].
#' * `*_checks` (e.g., `anaerobe_checks`): a list of lists of logical vectors, produced by [check_spectra].
#' * `*_valid_spectra` (e.g., `anaerobe_valid_spectra`): a list of lists of the spectra that passed the quality checks, as kept by [remove_spectra].
#' * `*_spectra_stats` (e.g., `anaerobe_spectra_stats`): a tibble of quality-check statistics, produced by [gather_spectra_stats], with one row per input path in `*_plates_files`.
#' * `*_processed` (e.g., `anaerobe_processed`): a list of lists of processed spectra and associated peaks, produced by [process_spectra].
#'
#' @note Once the workflow is checked (with [targets::tar_manifest] or [targets::tar_visnetwork]) and run (with [targets::tar_make]), all the returned target objects can be accessed using [targets::tar_read] (e.g., `targets::tar_read(anaerobe_spectra_stats)`).
#'
#' @export
#' @examples
#' if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
#' targets::tar_dir({ # tar_dir() runs code from a temporary directory.
#' targets::tar_script({
#' library(maldipickr)
#' list(
#' tar_import_and_process_spectra(
#' name = "anaerobe",
#' raw_spectra_directories = system.file(
#' "toy-species-spectra",
#' package = "maldipickr"),
#' tolerance = 1
#' )
#' )}, ask = FALSE)
#' targets::tar_make()
#' })
#' }
tar_import_and_process_spectra <- function(
name,
raw_spectra_directories,
tolerance,
format = targets::tar_option_get("format")) {
rlang::check_installed(c("targets", "tarchetypes"),
reason = "to facilitate {maldipickr} workflow development"
)
name <- targets::tar_deparse_language(substitute(name))
targets::tar_assert_path(raw_spectra_directories)
targets::tar_assert_dbl(tolerance)

name_plates <- paste0(name, "_plates")
name_spectra_raw <- paste0(name, "_spectra_raw")
name_checks <- paste0(name, "_checks")
name_spectra_stats <- paste0(name, "_spectra_stats")
name_valid_spectra <- paste0(name, "_valid_spectra")
name_processed <- paste0(name, "_processed")

sym_plates <- as.symbol(name_plates)
sym_spectra_raw <- as.symbol(name_spectra_raw)
sym_checks <- as.symbol(name_checks)
sym_spectra_stats <- as.symbol(name_spectra_stats)
sym_valid_spectra <- as.symbol(name_valid_spectra)

list(
tarchetypes::tar_files_input_raw(name_plates,
raw_spectra_directories,
format = "file"
),
targets::tar_target_raw(name_spectra_raw,
command = substitute(suppressWarnings(import_biotyper_spectra(sym_plates)),
env = list(sym_plates = sym_plates)
),
pattern = substitute(map(sym_plates),
env = list(sym_plates = sym_plates)
),
format = format
),
targets::tar_target_raw(name_checks,
command = substitute(check_spectra(sym_spectra_raw, tolerance),
env = list(tolerance = tolerance, sym_spectra_raw = sym_spectra_raw)
),
pattern = substitute(map(sym_spectra_raw), env = list(sym_spectra_raw = sym_spectra_raw)),
format = format
),
targets::tar_target_raw(name_spectra_stats,
command = substitute(
gather_spectra_stats(sym_checks) %>%
dplyr::mutate(maldi_plate = sym_plates),
env = list(sym_checks = sym_checks, sym_plates = sym_plates)
),
pattern = substitute(map(sym_checks, sym_plates),
env = list(sym_checks = sym_checks, sym_plates = sym_plates)
),
iteration = "vector", format = format
),
# Filter out empty and unusual spectra flagged by check_spectra()
targets::tar_target_raw(name_valid_spectra,
command = substitute(remove_spectra(sym_spectra_raw, sym_checks),
env = list(sym_spectra_raw = sym_spectra_raw, sym_checks = sym_checks)
),
pattern = substitute(map(sym_spectra_raw, sym_checks),
env = list(sym_spectra_raw = sym_spectra_raw, sym_checks = sym_checks)
),
format = format
),
targets::tar_target_raw(name_processed,
command = substitute(process_spectra(sym_valid_spectra),
env = list(sym_valid_spectra = sym_valid_spectra)
),
pattern = substitute(map(sym_valid_spectra),
env = list(sym_valid_spectra = sym_valid_spectra)
),
format = format, iteration = "list"
)
)
}
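The factory never touches spectra itself: it only writes target commands. Below is a base-R sketch of its two metaprogramming steps, assuming `deparse(substitute())` as a stand-in for `targets::tar_deparse_language()`. The first step captures the prefix without evaluating it; the second splices prefixed symbols into unevaluated calls with `substitute(env = ...)`. The helpers `prefix_of()` and `build_command()` are hypothetical:

```r
# Capture the unevaluated argument, so prefix_of(anaerobe) works even though
# no object called `anaerobe` exists; a character string is returned as-is
prefix_of <- function(name) {
  expr <- substitute(name)
  if (is.character(expr)) expr else deparse(expr)
}

# Splice the function name and the prefixed input symbol into an unevaluated call
build_command <- function(prefix, fun, input_suffix) {
  input_sym <- as.symbol(paste0(prefix, input_suffix))
  substitute(FUN(INPUT), env = list(FUN = as.symbol(fun), INPUT = input_sym))
}

prefix <- prefix_of(anaerobe)
command <- build_command(prefix, "check_spectra", "_spectra_raw")
# Same idea for the dynamic branching pattern
pattern <- substitute(
  map(INPUT),
  env = list(INPUT = as.symbol(paste0(prefix, "_spectra_raw")))
)
deparse(command)
deparse(pattern)
```

Because the command stays an unevaluated call, {targets} can detect `anaerobe_spectra_raw` as an upstream dependency through static code analysis and only evaluate the call when the pipeline runs.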
127 changes: 127 additions & 0 deletions R/tar_pick_with_similarity.R
@@ -0,0 +1,127 @@
# WARNING - Generated by {fusen} from dev/maldipickr-workflow-with-targets.Rmd: do not edit by hand

#' Delineate clusters of spectra to be picked using targets
#'
#' Given upstream targets of processed spectra (from [tar_import_and_process_spectra]),
#' this target factory facilitates the steps from quality-checked
#' processed spectra to clusters of spectra. See [targets::tar_target] for more
#' information about target objects.
#'
#' @param name A symbol indicating the prefix of all targets created by the factory.
#' For instance, calling `tar_pick_with_similarity(anaerobe, ...)` will create
#' the target `anaerobe_sim_interpolated` among others (see the Value section).
#' @param targets_spectra A list of targets produced by [tar_import_and_process_spectra] that should contain one or more targets named `*_processed`.
#' @param threshold A numeric value indicating the minimal cosine similarity between two spectra.
#' @inheritParams pick_spectra
#'
#' @return A list of target objects whose names use the `name` argument as a prefix:
#' * `*_fm_interpolated` (e.g., `anaerobe_fm_interpolated`): a matrix produced by [merge_processed_spectra].
#' * `*_sim_interpolated` (e.g., `anaerobe_sim_interpolated`): a symmetric matrix of cosine similarities between spectra, computed with `coop::tcosine()`, the transposed version of [coop::cosine].
#' * `*_df_interpolated` (e.g., `anaerobe_df_interpolated`): a tibble giving the membership (i.e., the cluster label) of each spectrum, produced by [delineate_with_similarity].
#' * `*_processed_metadata` (e.g., `anaerobe_processed_metadata`): a tibble of aggregated technical metadata for each spectrum.
#' * `*_clusters` (e.g., `anaerobe_clusters`): a tibble with the previous metadata, indicating which spectra were chosen as references, produced by [set_reference_spectra].
#' * `*_picked` (e.g., `anaerobe_picked`): a tibble containing all the previous metadata and, more importantly, which spectra should be picked, produced by [pick_spectra].
#'
#' @export
#' @examples
#'\dontrun{tar_pick_with_similarity()}
tar_pick_with_similarity <- function(
name,
targets_spectra,
threshold,
metadata_df = NULL, criteria_column = NULL,
hard_mask_column = NULL, soft_mask_column = NULL,
is_descending_order = TRUE,
is_sorted = FALSE) {
rlang::check_installed(c("targets", "tarchetypes", "coop"),
reason = "to facilitate {maldipickr} workflow development"
)
name <- targets::tar_deparse_language(substitute(name))
targets::tar_assert_dbl(threshold)
targets::tar_assert_list(targets_spectra)
targets_spectra <- unlist(list(targets_spectra), recursive = TRUE)


# It was tricky to apply the symbol transformation to a list whilst constructing
# a list structure that could be used correctly by merge_processed_spectra()
# Thankfully, @wlandau suggested an awesome solution to a similar problem
# https://github.com/ropensci/targets/discussions/461#discussioncomment-709984
#
# which will create the call c(fast_processed, slow_processed) from
# targets_spectra = c(fast_target_factory, slow_target_factory)
name_processed <- tarchetypes::tar_select_names(targets_spectra, targets::ends_with("_processed"))
processed_expr <- as.call(c(as.symbol("c"), lapply(name_processed, as.symbol)))


name_fm <- paste0(name, "_fm_interpolated")
name_sim <- paste0(name, "_sim_interpolated")
name_df <- paste0(name, "_df_interpolated")
name_processed_metadata <- paste0(name, "_processed_metadata")
name_clusters <- paste0(name, "_clusters")
name_picked <- paste0(name, "_picked")


sym_fm <- as.symbol(name_fm)
sym_sim <- as.symbol(name_sim)
sym_df <- as.symbol(name_df)
sym_processed_metadata <- as.symbol(name_processed_metadata)
sym_clusters <- as.symbol(name_clusters)

list(
targets::tar_target_raw(
name = name_fm,
command = substitute(merge_processed_spectra(processed_spectra),
env = list(processed_spectra = processed_expr)
)
),
targets::tar_target_raw(
name = name_sim,
command = substitute(coop::tcosine(fm_interpolated),
env = list(fm_interpolated = sym_fm)
)
),
targets::tar_target_raw(
name = name_df,
command = substitute(delineate_with_similarity(
sim_matrix = sim_interpolated,
threshold = threshold,
method = "complete"),
env = list(sim_interpolated = sym_sim, threshold = threshold)
)
),
targets::tar_target_raw(
name = name_processed_metadata,
command = substitute(
dplyr::bind_rows(
lapply(processed_spectra, `[[`, "metadata")),
env = list(processed_spectra = processed_expr)
),
iteration = "list"
),
targets::tar_target_raw(
name = name_clusters,
command = substitute(
set_reference_spectra(df_interpolated, processed_metadata),
env = list(df_interpolated = sym_df, processed_metadata = sym_processed_metadata)
)
),
targets::tar_target_raw(
name = name_picked,
command = substitute(
pick_spectra(cluster_df = df_interpolated,
metadata_df = metadata_df, criteria_column = criteria_column,
hard_mask_column = hard_mask_column,
soft_mask_column = soft_mask_column,
is_descending_order = is_descending_order,
is_sorted = is_sorted),
env = list(df_interpolated = sym_clusters,
metadata_df = metadata_df, criteria_column = criteria_column,
hard_mask_column = hard_mask_column,
soft_mask_column = soft_mask_column,
is_descending_order = is_descending_order,
is_sorted = is_sorted
)
)
)
)
}
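A base-R sketch of two ideas in `tar_pick_with_similarity()`: building the `c(fast_processed, slow_processed)` call from target names with `as.call()`, and the similarity-then-clustering step. The clustering part assumes, rather than reproduces, the behavior of `delineate_with_similarity()`: complete-linkage hierarchical clustering on the distance `1 - similarity`, cut at height `1 - threshold`. The helpers `row_cosine()` and `cluster_by_similarity()` and the toy matrix are hypothetical; `row_cosine()` plays the role of `coop::tcosine()`, i.e., cosine similarity between rows (spectra):

```r
# Build the unevaluated call c(fast_processed, slow_processed) from target names
name_processed <- c("fast_processed", "slow_processed")
processed_expr <- as.call(c(as.symbol("c"), lapply(name_processed, as.symbol)))

# Cosine similarity between the rows of a feature matrix (spectra in rows)
row_cosine <- function(m) {
  norms <- sqrt(rowSums(m^2))
  tcrossprod(m) / outer(norms, norms)
}

# Assumed clustering rule: complete linkage on 1 - similarity, cut at 1 - threshold
cluster_by_similarity <- function(sim, threshold, method = "complete") {
  tree <- stats::hclust(stats::as.dist(1 - sim), method = method)
  stats::cutree(tree, h = 1 - threshold)
}

# Toy intensities: spectra a and b share a profile, spectrum c does not
fm <- rbind(a = c(1, 0, 0), b = c(2, 0.1, 0), c = c(0, 0, 1))
sim <- row_cosine(fm)
clusters <- cluster_by_similarity(sim, threshold = 0.92)
deparse(processed_expr)
clusters
```

With `threshold = 0.92`, spectra a and b (cosine similarity of about 0.999) end up in the same cluster while c (similarity 0 to both) forms its own; lowering the threshold raises the cut height and merges more spectra.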
2 changes: 1 addition & 1 deletion README.Rmd
@@ -135,7 +135,7 @@ Please note that the [`{maldipickr}`](https://github.com/ClavelLab/maldipickr) p

## Credits

-### Acknowledgements
+### Acknowledgments

This R package is developed for spectra data generated by the Bruker MALDI Biotyper device. The [`{maldipickr}`](https://github.com/ClavelLab/maldipickr) package is built from a suite of Rmarkdown files using the [`{fusen}`](https://thinkr-open.github.io/fusen/) package by Rochette S (2023). It relies on:

2 changes: 1 addition & 1 deletion README.md
@@ -171,7 +171,7 @@ By contributing to this project, you agree to abide by its terms.

## Credits

-### Acknowledgements
+### Acknowledgments

This R package is developed for spectra data generated by the Bruker
MALDI Biotyper device. The
8 changes: 7 additions & 1 deletion _pkgdown.yml
@@ -40,14 +40,20 @@ reference:
- starts_with("delineate_")

- title: "Cherry-pick"
-desc: "Function to pinpoint and label specific spectra within clusters"
+desc: "Functions to pinpoint and label specific spectra within clusters"
contents:
- pick_spectra
- set_reference_spectra

- title: "Workflow"
desc: "Functions (i.e., target factories) to facilitate {targets} workflow development for {maldipickr}"
contents:
- starts_with("tar_")

- title: "Miscellaneous"
contents:
- is_well_on_edge
- gather_spectra_stats
- get_spectra_names

news: