runGenePeakcorr - Requesting ability to specify updated genomes (such as mm39) #9

Open
iilyashov opened this issue Oct 31, 2022 · 4 comments

Comments

iilyashov commented Oct 31, 2022

Hello,

Really fantastic package and pipeline! I was wondering if it would be possible to run the function runGenePeakcorr using the updated mouse genome mm39? If this must be created manually by users, how would I go about generating the proper RefSeq TSS gene annotations, etc., as you have done previously for mm10, hg19, and hg38?

Any help is greatly appreciated, thank you!

vkartha commented Nov 6, 2022

Hi there! Thanks for your interest in using this package. For the built-in references, we obtained the annotations from NCBI (built from a GTF). I think that, rather than us building this for each desired genome, it would be better to adapt the package so that users can input their own annotations. I'm working on this; it's mostly a matter of formatting the GRanges object to fit our current references. The other thing is that you need to make sure you also have access to the corresponding BSgenome object (the internal functions use it to obtain the GC content for peak ranges), so there are two things that need changing / matching here, reference-wise. Will circle back if I can get this specific reference sorted soon.

vkartha commented Jan 1, 2023

Hi there! Sending some example code to get the gene annotation TSS object. I will leave this issue open until I formally change the code to allow users to pass their own gene TSS annotation GRanges object and matching BSgenome object (if you could test this, that would be great). One issue with this is that there is always the chance someone may accidentally use annotations and genome references that don't match (e.g. mm39 genes with the mm10 genome reference, or vice versa), which is why I made a few commonly used built-in options.

That being said, it will be impractical for me to update the package to have every single use-case, so let's start with allowing custom annotations.

See below how you can first download the corresponding GTF (gene annotation) and then derive TSS coordinates as a GRanges object that will then be fed into FigR's functions:

#For mm39

wget https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/genes/mm39.ncbiRefSeq.gtf.gz

In R

library(rtracklayer)
library(GenomicRanges)

gtfRanges <- import("./mm39.ncbiRefSeq.gtf.gz")
gtfRanges

# Subset to transcripts
gtfRanges <- gtfRanges[gtfRanges$type %in% "transcript"]

# Get TSS (strand-aware) for each transcript: start on "+", end otherwise
TSS <- ifelse(as.character(strand(gtfRanges)) == "+", start(gtfRanges), end(gtfRanges))

# Construct new GRanges with just TSS
# Convert to 1-based start coordinate system
TSS <- GRanges(seqnames(gtfRanges), IRanges(TSS + 1, width = 1), strand(gtfRanges))

# Keep only the transcript ID and gene name columns
mcols(TSS) <- mcols(gtfRanges)[, c("transcript_id", "gene_name")]

# Keep valid chromosomes (mouse has 19 autosomes plus X and Y)
validChr <- paste0("chr", c(1:19, "X", "Y"))
TSS <- TSS[seqnames(TSS) %in% validChr]
TSS <- sort(TSS)

# Only keep first transcript (to keep genes unique) 
# This generally works for our purposes since we usually take a large window around the TSS when performing gene-wise correlations

TSS <- TSS[!duplicated(TSS$gene_name)]

# Your TSS object to save and use downstream
TSS
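
For readers unsure what "first transcript" means after sort(), here is a minimal sketch on toy data. The coordinates and gene names are made up for illustration and are not real mm39 annotations. It only shows that sorting followed by !duplicated() keeps, per gene, whichever TSS comes first in the sorted order.

```r
library(GenomicRanges)

# Toy single-base TSS ranges (hypothetical values, not real mm39 coordinates)
toyTSS <- GRanges(
  seqnames  = c("chr1", "chr1", "chr2"),
  ranges    = IRanges(start = c(150, 100, 900), width = 1),
  strand    = c("+", "+", "-"),
  gene_name = c("GeneA", "GeneA", "GeneB")
)

# Sort by chromosome, strand and position, then keep the first entry per gene
toyTSS <- sort(toyTSS)
toyTSS <- toyTSS[!duplicated(toyTSS$gene_name)]

toyTSS$gene_name
start(toyTSS)
```

In this toy case GeneA has two candidate TSSs on the same strand, and the one at the lower coordinate (100) is retained.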

You would then separately install the matching BSgenome package (https://bioconductor.org/packages/release/data/annotation/html/BSgenome.Mmusculus.UCSC.mm39.html) to go with this; the idea would be to pass that as the genome param.
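
Since mismatched annotations and genome references are the main risk mentioned above, a rough sanity check can be sketched as follows. The helper tssMatchesGenome is hypothetical and not part of FigR, and the toy chromosome lengths are made up; with real data one would presumably pass seqlengths() of the loaded BSgenome object instead. It only tests that every TSS sits on a known chromosome and within its length, so it cannot catch every mismatch between assemblies.

```r
library(GenomicRanges)

# Hypothetical helper (not part of FigR): TRUE if every TSS lies on a
# chromosome present in the genome and within that chromosome's length
tssMatchesGenome <- function(TSS, genomeSeqlengths) {
  chr <- as.character(seqnames(TSS))
  known <- chr %in% names(genomeSeqlengths)
  inBounds <- end(TSS) <= genomeSeqlengths[chr]
  all(known) && all(inBounds[known])
}

# Toy stand-in for the genome's chromosome lengths (values are made up)
toyLengths <- c(chr1 = 1000L, chr2 = 800L)

okTSS  <- GRanges(c("chr1", "chr2"), IRanges(c(100, 700), width = 1), c("+", "-"))
badTSS <- GRanges(c("chr1", "chr2"), IRanges(c(100, 900), width = 1), c("+", "-"))

okResult  <- tssMatchesGenome(okTSS, toyLengths)
badResult <- tssMatchesGenome(badTSS, toyLengths)
```

Here okResult is TRUE, while badResult is FALSE because the chr2 position (900) lies beyond the toy chr2 length (800).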

LalicJ commented Feb 21, 2023

Hi, sorry to bother you. I wonder when I have constructed the TSS object myself and downloaded the reference genome, how should I modify it in the runGenePeakcorr function?
Any help is greatly appreciated, thank you!

Dalhte commented Jun 26, 2023

> Hi, sorry to bother you. I wonder when I have constructed the TSS object myself and downloaded the reference genome, how should I modify it in the runGenePeakcorr function? Any help is greatly appreciated, thank you!

Hello,
I was wondering if you found out what to do in the end? I would like to try this with the rat genome, but I don't know how to modify the runGenePeakcorr function either.

Best

David
