Create consensus bed of mirdeep2 miRNAs #176

Open
JoseEspinosa opened this issue Aug 29, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@JoseEspinosa

Description of feature

As discussed on Slack with @sdjebali and @lpantano, it would be nice to add this feature to the pipeline. I am copying the discussion here:
@sdjebali:

I want to produce a single bed file of the miRNAs found by mirdeep2 in all my samples. I will not try to merge the intervals across samples, as we would lose the precise locations of the particular subelements of a miRNA, but I will concatenate the bed files obtained in each sample, after filtering on the <>. The only problem I see is that the scores obtained in individual samples will not be comparable with each other, right? Should I instead put the <> as a score in the final bed file, as it would be more comparable between samples?

@lpantano:

An option is to overlap the bed files (e.g. with bedtools) and only keep the miRNAs that overlap on 95% of their bases in a given number of samples. Then you can use the counts, and those should be comparable. But there is always a limitation when doing this after the analysis.
If you overlap allowing a high % of nucleotides to be the same among samples, then you can take one sample as the reference and use its precursor and mature fasta files to run the pipeline again, and you will get a better quantification for novel miRNAs.
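
The overlap criterion suggested above could be sketched roughly as follows. This is only an illustration, not code from the pipeline: `count_support` is a hypothetical helper, the inputs are assumed to be standard BED6 files (chrom, start, end, name, score, strand), and "overlap 95%" is interpreted here as a reciprocal overlap on the same strand. With bedtools installed, `bedtools intersect` with `-f 0.95 -r` would likely be the more natural tool; plain awk is used here only to keep the sketch dependency-free.

```shell
# Hypothetical helper (not part of the pipeline).
# For each interval of a reference BED6 file, count in how many other sample
# BED6 files it has a same-strand reciprocal overlap >= min_frac.
# Usage: count_support ref.bed min_frac sample1.bed [sample2.bed ...]
# Output: the reference BED6 line plus a 7th column with the sample count.
count_support() {
    local ref=$1 frac=$2
    shift 2
    awk -v frac="$frac" -v OFS='\t' '
        # First file (the reference): store its intervals in input order.
        FNR == NR {
            n++
            chr[n] = $1; s[n] = $2; e[n] = $3; str[n] = $6; line[n] = $0
            next
        }
        # Start of each sample file: reset the per-sample "already counted" flags.
        FNR == 1 { split("", seen) }
        {
            for (i = 1; i <= n; i++) {
                if (seen[i] || $1 != chr[i] || $6 != str[i]) continue
                lo = ($2 > s[i]) ? $2 : s[i]
                hi = ($3 < e[i]) ? $3 : e[i]
                ov = hi - lo
                # Reciprocal: the overlap must cover >= frac of both intervals.
                if (ov >= frac * (e[i] - s[i]) && ov >= frac * ($3 - $2)) {
                    seen[i] = 1
                    support[i]++
                }
            }
        }
        END { for (i = 1; i <= n; i++) print line[i], support[i] + 0 }
    ' "$ref" "$@"
}
```

For instance, `count_support ref.bed 0.95 s1.bed s2.bed | awk '$7 >= 2'` would keep the reference miRNAs supported by at least two other samples. The comparison is quadratic in the number of intervals, which should be acceptable for miRNA-sized bed files but is an assumption about the data.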

@JoseEspinosa added the enhancement label on Aug 29, 2022
@sdjebali

sdjebali commented Aug 31, 2022

In the end I decided to do the following:

  1. filter the csv files output by mirdeep2 so that I only retain detected miRNAs with a probability of being a true positive above 75%
  2. make a bed file with the filtered miRNAs found in all samples (simple concatenation), while keeping track of
    a. whether the miRNA is novel or known and, if it is known, its closest known miRNA from other species
    b. its probability of being a true positive

On my data, the code would be the following:

sp=sus_scrofa
resdir=nf-core.smrnaseq.1.1.0.Sscrofa11.1.102.21-06-28
dir=/work/project/fragencode/workspace/geneswitch/results/srnaseq/$sp/$resdir/mirdeep2/mirdeep
pgm=/work/project/fragencode/tools/multi/Scripts/filter.and.format.mirdeep.file.awk
# For each per-sample bed file, filter and format the csv file with the same
# basename, then sort the concatenated output by chromosome, start and end.
for f in "$dir"/*bed
do
    awk -v sp="$sp" -v f="$f" -f "$pgm" "${f%.bed}.csv"
done | sort -k1,1 -k2,2n -k3,3n > "$dir/allsamples.75pcentTP.bed"

$pgm is attached here. The bed file from mirdeep2 is only used to find the sample name, so there could be a more straightforward way to write this.
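
The attached awk script is not reproduced in this thread. Purely to illustrate the two steps listed earlier (filter on the true-positive probability, then emit bed lines that remember the novel/known status and the probability), here is a minimal sketch. Everything in it is an assumption: `mirna_table_to_bed` is a hypothetical helper, and the input is a simplified tab-separated table with a header line, not the real mirdeep2 csv layout, which would first have to be mapped onto these columns.

```shell
# Hypothetical helper (not the attached script).
# Assumed input columns, tab-separated, with one header line
# (this is NOT the real mirdeep2 csv layout):
#   1 id, 2 true-positive probability (0-1), 3 novel|known,
#   4 closest known miRNA (or "."), 5 chrom, 6 start, 7 end, 8 strand
# Usage: mirna_table_to_bed sample_name min_prob table.tsv
mirna_table_to_bed() {
    local sample=$1 min_prob=$2 table=$3
    awk -F'\t' -v OFS='\t' -v sample="$sample" -v min="$min_prob" '
        NR == 1 { next }    # skip the header line
        $2 + 0 > min + 0 {
            # The name field keeps the sample, id, novel/known status
            # and closest known miRNA.
            name = sample ":" $1 ":" $3 ":" $4
            # BED6 output: chrom, start, end, name, score, strand.
            # The probability is written in the score column (a choice made
            # for this sketch so that values are comparable across samples).
            print $5, $6, $7, name, $2, $8
        }
    ' "$table"
}
```

Calling `mirna_table_to_bed sample1 0.75 sample1.tsv` for each sample and piping the concatenated output through `sort -k1,1 -k2,2n -k3,3n` would then give a single sorted bed file, in the same spirit as the loop above.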

Let me know if you need help understanding this.

nschcolnicov pushed a commit that referenced this issue Oct 10, 2024