Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create BunisHSPCData #32

Merged
merged 8 commits into from
Apr 29, 2021
Merged

Create BunisHSPCData #32

merged 8 commits into from
Apr 29, 2021

Conversation

dtm2451
Copy link
Contributor

@dtm2451 dtm2451 commented Feb 21, 2021

Addresses #28.
Still need to fill in the citation & add to longtests/.

@@ -6,6 +6,7 @@ Reference,Taxonomy,Part,Number,Call
@baron2016singlecell,10090,pancreas,1886,BaronPancreasData('mouse')
@bhaduri2020cell,9606,cortical organoids,242349,BhaduriOrganoidData()
@buettner2015computational,10090,embryonic stem cells,288,BuettnerESCData()
@bunis2021haematopoietic,9606,haematopoietic stem and progenitor,???,BunisHSPCData
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the manifest's Number come from? Is it a cell number or taxonomy item? And if cell number, the unfiltered number?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just give the number of cells that you get from running BunisHSPCData() with the default arguments. Note that the last thing is the actual command (it'll get wrapped in backticks when it gets printed in the vignette).

@dtm2451 dtm2451 changed the title Create BunisHSPCData, add to manifest & authors@R Create BunisHSPCData Feb 21, 2021
Copy link
Owner

@LTLA LTLA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some stub tests in longtests/ to make sure that things run with a variety of options; see some of the examples in there for more details.

#'
#' @details
#' Column metadata is recreated from GEO using the author-supplied TSV of per-cell annotations, or retrieved from a processed version of the data shared by authors via figshare.
#' This contains information such as the tissue & sample of origin, age group, likely cell type, and Developmental Stage Scoring. Cevelopmental Stage
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cevelopmental Stage?

#' This contains information such as the tissue & sample of origin, age group, likely cell type, and Developmental Stage Scoring. Cevelopmental Stage
#'
#' If \code{filtered=TRUE}, only the cells used by the authors in their final analysis are returned.
#' Otherwise, an additional \code{filtered} field will be present in the \code{\link{colData}}, indicating whether the cell was retained by the authors.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retained field?

#' Otherwise, an additional \code{filtered} field will be present in the \code{\link{colData}}, indicating whether the cell was retained by the authors.
#'
#' All data are downloaded from ExperimentHub and cached for local re-use.
#' Specific resources can be retrieved by searching for \code{scRNAseq/bacher-tcell}.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bunis-hspc

#' @author Daniel Bunis
#'
#' @references
#' Bunis et al. 2021
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See format of other references here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldda noted that I hadn't finalized the docs text yet, so I was gonna get to this later + likely wouldda caught the others above myself! But noted and I'll fix all of these.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought you had cottoned on to the fact that the best way to get shit past me is to distract me with lots of little things that need fixing!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL no, but now that you mention it, I did notice you missed a filtered/retained mistake in the actual code.

@@ -1,7 +1,7 @@
write.csv(file="../../extdata/2.6.0/metadata-bunis-hspc.csv",
data.frame(
Title = sprintf("Bunis human HSPC %s", c("counts", "colData", "rowData")),
Description = sprintf("%s for the Bunis human haematopoietic stem-progenitor single-cell RNA-seq dataset",
Description = sprintf("%s for the Bunis human haematopoietic stem and progenitor single-cell RNA-seq dataset",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: changing this will need a re-propagation to ExperimentHub, as the manifests there do not update automatically. Compile it to recreate the CSVs and then we'll notify the EHub maintainers that this is altered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that important really. I'll just revert this.

@dtm2451
Copy link
Contributor Author

dtm2451 commented Feb 24, 2021

=/. Now that I've tried actually testing, I seem to get an error with the colData add because the hub coldata is not full-size...

I can fix it by changing line 18 of create_sce.R from

args$colData <- hub[hub$rdatapath==file.path(host, sprintf("coldata%s.rds", suffix))][[1]]

to

args$colData <- hub[hub$rdatapath==file.path(host, sprintf("coldata%s.rds", suffix))][[1]][colnames(all.assays[[1]]),, drop = FALSE]

but I can't imagine that's what you would actually want to happen.

@LTLA
Copy link
Owner

LTLA commented Feb 24, 2021

Probably set has.coldata=FALSE in the .create_sce() call and then add it outside, once we've decided whether or not we want just the filtered cells or everything. Note that there are three filtered modes:

  • Filtered, the cells you used. This is what you get with filtered=TRUE.
  • Unfiltered cells. This is what you get with... I dunno, filtered="cells", perhaps?
  • Unfiltered barcodes (i.e., the full matrix). This is what you get with filtered=FALSE.

A little different from the other cases, and you'll have to use isTRUE and isFALSE to do the checks.

@dtm2451
Copy link
Contributor Author

dtm2451 commented Apr 28, 2021

ready to go?

@LTLA
Copy link
Owner

LTLA commented Apr 29, 2021

Looks good to me.

@LTLA LTLA merged commit 75862a1 into LTLA:master Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants