How sequencing depth impacts the number of DARs? #1789

Open
sshanyiiii opened this issue Sep 27, 2024 · 0 comments
Labels
documentation Documentation help

Hi,

I am using Signac to find differentially accessible regions (DARs) for each cell type between two experimental conditions. I have 6 samples (2 experimental conditions × 3 time points) from 10x Multiome GEX + ATAC libraries. I merged the 6 samples and defined 13 cell types.

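# Call peaks separately for each cell type
peaks <- CallPeaks(Seurat_obj, group.by = "Celltype")
# Keep standard chromosomes and drop peaks overlapping mm10 blacklist regions
peaks <- keepStandardChromosomes(peaks, pruning.mode = "coarse")
peaks <- subsetByOverlaps(x = peaks, ranges = blacklist_mm10, invert = TRUE)
# Count fragments in the new peak set for every cell
macs2_counts <- FeatureMatrix(
  fragments = Fragments(Seurat_obj),
  features = peaks,
  cells = colnames(Seurat_obj))
# Store the counts as a new "peaks" chromatin assay
Seurat_obj[["peaks"]] <- CreateChromatinAssay(
  counts = macs2_counts,
  fragments = Fragments(Seurat_obj),
  annotation = annotation)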

The sequencing depths of the 6 samples differ greatly.

seq_depth <- tapply(Seurat_obj$nCount_peaks, Seurat_obj$orig.ident, mean)
barplot(seq_depth, main="Average ATAC Reads per Cell by Sample", las = 2)
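
A per-sample mean can be pulled up by a few very deep cells, so the median and spread may be worth checking as well. Below is a minimal sketch in base R on simulated data; the `meta` data frame and its values are made up for illustration and only stand in for `Seurat_obj@meta.data`.

```r
# Toy metadata standing in for Seurat_obj@meta.data (all values are simulated).
set.seed(1)
meta <- data.frame(
  orig.ident   = rep(c("NT125", "WT125"), each = 100),
  nCount_peaks = c(rpois(100, lambda = 3000), rpois(100, lambda = 9000))
)

# Mean, median and interquartile range of per-cell peak counts for each sample.
per_sample <- split(meta$nCount_peaks, meta$orig.ident)
depth_summary <- do.call(rbind, lapply(per_sample, function(x) {
  c(mean = mean(x), median = median(x), iqr = IQR(x))
}))
print(depth_summary)

# Per-cell distributions on a log scale make depth differences easier to compare.
boxplot(log10(nCount_peaks) ~ orig.ident, data = meta, las = 2,
        ylab = "log10(nCount_peaks)")
```

With the real object, the same summary could be computed from `Seurat_obj$nCount_peaks` and `Seurat_obj$orig.ident`.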

[Image 2: bar plot of average ATAC reads per cell by sample]

When I used FindMarkers to find DARs between the two conditions at each of the 3 time points, I noticed that the number of DARs tracks sequencing depth closely. The per-sample TSS enrichment score also follows sequencing depth. The same trend holds for DARs between the two conditions within each cell type.
[Image 1]
[Image 3]
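
The relationship between DAR count and depth can be put into numbers roughly as follows. This is only a sketch: `dar_results` and `mean_depth` are hypothetical toy inputs, and the 0.05 cutoff on `p_val_adj` (a column that FindMarkers output does contain) is an assumed threshold. With only three comparisons, a rank correlation is weak evidence at best.

```r
# Toy FindMarkers-style results, one data frame per time point (values are made up).
dar_results <- list(
  "125" = data.frame(p_val_adj = c(0.001, 0.20, 0.03)),
  "145" = data.frame(p_val_adj = c(0.04, 0.60)),
  "185" = data.frame(p_val_adj = c(0.01, 0.02, 0.03, 0.90))
)

# Hypothetical mean per-cell depth of the samples in each comparison.
mean_depth <- c("125" = 4000, "145" = 2500, "185" = 7000)

# Number of peaks passing the assumed adjusted p-value cutoff in each comparison.
n_dar <- vapply(dar_results, function(df) sum(df$p_val_adj < 0.05), integer(1))

# Spearman rank correlation between DAR count and depth.
rho <- cor(n_dar, mean_depth[names(n_dar)], method = "spearman")
print(n_dar)
print(rho)
```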

As you mentioned in #373, the differential accessibility test uses TF-IDF values, which incorporate a per-cell depth normalization step, and I also pass 'nCount_peaks' as a latent variable to FindMarkers. However, the number of DARs is still strongly related to sequencing depth. Why does this happen, and how should I account for sequencing depth?

DefaultAssay(Seurat_obj) <- "peaks"
Seurat_obj <- RunTFIDF(Seurat_obj, assay = "peaks")      # TF-IDF normalization
Idents(Seurat_obj) <- Seurat_obj$orig.ident
time <- c("125", "145", "185")
NT_pst_time_DiffPeak <- list()
WT_pst_time_DiffPeak <- list()

for (i in seq_along(time)) {
  ident.1 <- paste0("NT", time[i])
  ident.2 <- paste0("WT", time[i])
  da_peaks <- FindMarkers(
    object = Seurat_obj,
    ident.1 = ident.1,
    ident.2 = ident.2,
    only.pos = TRUE,
    test.use = 'LR',
    min.pct = 0.05,
    latent.vars = 'nCount_peaks')
  NT_pst_time_DiffPeak[[time[i]]] <- da_peaks
}
for (i in seq_along(time)) {
  ident.1 <- paste0("WT", time[i])
  ident.2 <- paste0("NT", time[i])
  da_peaks <- FindMarkers(
    object = Seurat_obj,
    ident.1 = ident.1,
    ident.2 = ident.2,
    only.pos = TRUE,
    test.use = 'LR',
    min.pct = 0.05,
    latent.vars = 'nCount_peaks')
  WT_pst_time_DiffPeak[[time[i]]] <- da_peaks
}
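
One check that might help separate a depth effect from a biological one is to thin the counts so that all samples have a similar mean depth per cell, then repeat the test. The following is a minimal base R sketch on a simulated matrix, not a Signac function; `counts`, `sample_id` and the binomial thinning approach are assumptions made for illustration, and whether downsampling the count matrix (rather than the fragment files) is appropriate here is part of what I am asking.

```r
set.seed(42)

# Toy peak-by-cell count matrix (simulated): 3 shallow cells, then 3 deep cells.
counts <- matrix(
  rpois(20 * 6, lambda = rep(c(2, 8), each = 60)),
  nrow = 20,
  dimnames = list(paste0("peak", 1:20), paste0("cell", 1:6))
)
sample_id <- rep(c("shallow", "deep"), each = 3)

# Mean per-cell depth of each sample; the shallowest sample sets the target.
cell_depth  <- colSums(counts)
sample_mean <- tapply(cell_depth, sample_id, mean)
target      <- min(sample_mean)

# Probability of keeping each count: target depth divided by the sample's mean depth.
keep_prob <- pmin(1, as.numeric(target / sample_mean[sample_id]))

# Binomial thinning: every observed count is kept independently with keep_prob.
thinned <- counts
for (j in seq_len(ncol(counts))) {
  thinned[, j] <- rbinom(nrow(counts), size = counts[, j], prob = keep_prob[j])
}

# Mean depth per sample after thinning should now be roughly equal.
print(tapply(colSums(thinned), sample_id, mean))
```

If the number of DARs still differs strongly between time points after thinning, depth alone is less likely to explain the trend.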
@sshanyiiii sshanyiiii added the documentation Documentation help label Sep 27, 2024