Create consensus bed of mirdeep2 miRNAs #176

JoseEspinosa · 2022-08-29T14:41:13Z

Description of feature

As discussed on Slack with @sdjebali and @lpantano, it would be nice to add this feature to the pipeline. I am copying the discussion here:
@sdjebali:

I want to produce a single bed file of the mirnas found by mirdeep2 in all my samples. I will not try to merge the intervals across samples, as we would lose the precise locations of the particular subelements of a mirna, but I will concatenate the bed files obtained in each sample, after filtering on the <>. The only problem I see is that the scores obtained in individual samples will not be comparable with each other, right? Should I instead put the <> as a score in the final bed file, as it would be more comparable between samples?

@lpantano:

One option is to overlap the bed files with bedtools and only keep the mirnas that overlap on 95% of bases in a given number of samples. Then you can use the counts, and those should be comparable. But there is always a limitation when doing this after the analysis.
If you overlap requiring a high % of nucleotides to be the same among samples, then you can take one sample as reference and use the fasta files of its precursor and mature sequences to run the pipeline again, and you will get a better quantification for novel mirnas.
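
The overlap idea above can be sketched in plain awk, without bedtools, on a small concatenated bed file. This is only an illustration under stated assumptions: the input layout (sample name in column 4), the file contents, and the `min_samples` value are hypothetical, and "overlap 95%" is interpreted here as a reciprocal overlap on the same strand. The pairwise comparison is quadratic, so it is meant for small inputs only.

```shell
tmp=$(mktemp -d)
# Hypothetical concatenated bed: chrom, start, end, sample, score, strand
printf 'chr1\t100\t200\tS1\t0\t+\nchr1\t102\t200\tS2\t0\t+\nchr1\t100\t199\tS3\t0\t+\nchr1\t150\t260\tS2\t0\t+\nchr2\t10\t90\tS1\t0\t-\n' > "$tmp/all.bed"

min_frac=0.95   # required reciprocal overlap fraction (from the comment above)
min_samples=2   # hypothetical "given number of samples"

consensus=$(awk -F'\t' -v OFS='\t' -v frac="$min_frac" -v n="$min_samples" '
{ chr[NR]=$1; s[NR]=$2; e[NR]=$3; smp[NR]=$4; str[NR]=$6; line[NR]=$0 }
END {
  for (i = 1; i <= NR; i++) {
    split("", seen); cnt = 0            # reset the per-interval sample set
    for (j = 1; j <= NR; j++) {
      if (chr[i] != chr[j] || str[i] != str[j]) continue
      lo = (s[i] > s[j]) ? s[i] : s[j]
      hi = (e[i] < e[j]) ? e[i] : e[j]
      ov = hi - lo                      # number of shared bases
      if (ov <= 0) continue
      # reciprocal test: the overlap must cover frac of BOTH intervals
      if (ov >= frac * (e[i] - s[i]) && ov >= frac * (e[j] - s[j]) && !(smp[j] in seen)) {
        seen[smp[j]] = 1; cnt++
      }
    }
    # keep the interval if enough distinct samples support it (its own sample included)
    if (cnt >= n) print line[i], cnt
  }
}' "$tmp/all.bed")
printf '%s\n' "$consensus"
```

The last column of the output is the number of supporting samples. With bedtools available, `bedtools intersect -f 0.95 -r` between pairs of per-sample files would be the more usual way to apply the same reciprocal criterion.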

sdjebali · 2022-08-31T15:00:56Z

In the end I decided to do the following:

1. filter the csv files output by mirdeep2 so that I only retain detected mirnas with a probability of being a true positive above 75%
2. make a bed file with the filtered mirnas found in all samples (simple concatenation) but remembering
   a. whether the mirna is novel or known, and if it is known, its closest known mirna from other species
   b. its probability of being a true positive
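
The two steps listed above could look roughly like the sketch below. It does not reproduce the attached awk script or the real mirdeep2 csv layout: the input is a hypothetical, simplified tab-separated table (id, true-positive probability in %, novel/known, closest known mirna, precursor coordinate written as `chrom:start..end:strand`), the mirna names are made up, and no coordinate conversion is attempted. Storing the probability in the bed score column is also just one possible choice.

```shell
tmp=$(mktemp -d)
# Hypothetical simplified per-sample table (NOT the real mirdeep2 csv layout)
printf 'm1\t91\tnovel\t-\tchr1:100..180:+\nm2\t60\tknown\ttoy-miR-A\tchr2:50..130:-\nm3\t80\tknown\ttoy-miR-B\tchr1:20..95:-\n' > "$tmp/sampleA.tsv"

filtered=$(awk -F'\t' -v OFS='\t' -v min=75 -v sample=sampleA '
$2 > min {                       # step 1: keep probability above 75%
  split($5, c, ":")              # c[1]=chrom, c[2]="start..end", c[3]=strand
  split(c[2], p, "[.][.]")       # p[1]=start, p[2]=end
  # step 2: bed line whose name remembers sample, id, novel/known and closest mirna
  print c[1], p[1], p[2], sample ":" $1 ":" $3 ":" $4, $2, c[3]
}' "$tmp/sampleA.tsv" | sort -k1,1 -k2,2n -k3,3n)
printf '%s\n' "$filtered"
```

Running the same filter over every sample and sorting the concatenated output once gives a single bed file in which each line still records the sample it came from.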

On my data the code would be the following

sp=sus_scrofa
resdir=nf-core.smrnaseq.1.1.0.Sscrofa11.1.102.21-06-28
dir=/work/project/fragencode/workspace/geneswitch/results/srnaseq/$sp/$resdir/mirdeep2/mirdeep
pgm=/work/project/fragencode/tools/multi/Scripts/filter.and.format.mirdeep.file.awk
# For each per-sample bed file, filter and format the csv file with the same basename,
# then sort the concatenated output by chromosome, start and end
for f in "$dir"/*bed
do
    awk -v sp="$sp" -v f="$f" -f "$pgm" "${f%.bed}.csv"
done | sort -k1,1 -k2,2n -k3,3n > "$dir/allsamples.75pcentTP.bed"

$pgm is attached here (the bed file from mirdeep2 is only used to find the sample name, so there could be some more straightforward code).

Let me know if you need help understanding this.
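
As a possible simplification of the sample-name lookup mentioned above, the loop could iterate over the csv files directly and take the sample name from the file name. This is a sketch that assumes each csv file is named after its sample, which may not hold for the actual mirdeep2 output; the file names below are hypothetical.

```shell
tmp=$(mktemp -d)
touch "$tmp/sampleA.csv" "$tmp/sampleB.csv"   # hypothetical per-sample csv files

samples=$(for csv in "$tmp"/*.csv
do
    # Strip the directory and the .csv suffix to get the sample name
    basename "$csv" .csv
done)
printf '%s\n' "$samples"
```

The resulting name could then be passed to awk with `-v`, instead of passing the bed file path.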

Pre-release v1.4.1

JoseEspinosa added the enhancement New feature or request label Aug 29, 2022

nschcolnicov pushed a commit that referenced this issue Oct 10, 2024

Merge pull request #176 from Aratz/dev

278c959
