How sequencing depth impacts the number of DARs? #1789

sshanyiiii · 2024-09-27T04:00:49Z

Hi,

I am working with Signac to find DARs for each cell type under two experimental conditions. I have 6 samples(2 experimental conditions and 3 time points) constructed by a 10X multiome GEX+ATAC library. I merged 6 samples and defined 13 cell types.

peaks <- CallPeaks(Seurat_obj, group.by = "Celltype")
peaks <- keepStandardChromosomes(peaks, pruning.mode = "coarse")
peaks <- subsetByOverlaps(x = peaks, ranges = blacklist_mm10, invert = TRUE)
macs2_counts <- FeatureMatrix(
  fragments = Fragments(Seurat_obj),
  features = peaks,
  cells = colnames(Seurat_obj))
Seurat_obj[["peaks"]] <- CreateChromatinAssay(
  counts = macs2_counts,
  fragments = Fragments(Seurat_obj),
  annotation = annotation)

The sequencing depths of the 6 samples differ greatly.

seq_depth <- tapply(Seurat_obj$nCount_peaks, Seurat_obj$orig.ident, mean)
barplot(seq_depth, main="Average ATAC Reads per Cell by Sample", las = 2)

When I used FindMarkers to find DARs between two conditions at 3-time points, I noticed that the number of DARs is highly related to sequencing depth. Also, the TSS enrichment score of samples is consistent with sequencing depth. The same trend holds for DARs between two conditions across cell types.

As you mentioned the differential accessibility test uses the TF-IDF values which incorporate a per-cell depth normalization step #373 and I use 'nCount_peaks' as a covariate with FindMarkers function. However the number of DARs still highly related to sequencing depth, so I want to ask why this happens and what can I do with sequencing depth?

DefaultAssay(Seurat_obj) <- "peaks"
Seurat_obj <- RunTFIDF(Seurat_obj, assay = "peaks")      # TF-IDF normalization
Idents(Seurat_obj) <- Seurat_obj$orig.ident
time <- c("125", "145", "185")
NT_pst_time_DiffPeak <- list()
WT_pst_time_DiffPeak <- list()

for(i in 1:length(time)){
  ident.1 <- paste0("NT", time[i])
  ident.2 <- paste0("WT", time[i])
  da_peaks <- FindMarkers(
    object = Seurat_obj,
    ident.1 = ident.1,
    ident.2 = ident.2,
    only.pos = TRUE,
    test.use = 'LR',
    min.pct = 0.05,
    latent.vars = 'nCount_peaks')
  NT_pst_time_DiffPeak[[time[i]]] <- da_peaks
}
for(i in 1:length(time)){
  ident.1 <- paste0("WT", time[i])
  ident.2 <- paste0("NT", time[i])
  da_peaks <- FindMarkers(
    object = Seurat_obj,
    ident.1 = ident.1,
    ident.2 = ident.2,
    only.pos = TRUE,
    test.use = 'LR',
    min.pct = 0.05,
    latent.vars = 'nCount_peaks')
  WT_pst_time_DiffPeak[[time[i]]] <- da_peaks
}

The text was updated successfully, but these errors were encountered:

sshanyiiii added the documentation Documentation help label Sep 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How sequencing depth impacts the number of DARs? #1789

How sequencing depth impacts the number of DARs? #1789

sshanyiiii commented Sep 27, 2024

How sequencing depth impacts the number of DARs? #1789

How sequencing depth impacts the number of DARs? #1789

Comments

sshanyiiii commented Sep 27, 2024