As discussed in Slack with @sdjebali and @lpantano, it would be nice to add this feature to the pipeline. I copy the discussion here:

@sdjebali:
I want to produce a single BED file of the miRNAs found by mirdeep2 in all my samples. I will not try to merge the intervals across samples, as we would lose the precise locations of the particular subelements of a miRNA. Instead, I will concatenate the BED files obtained in each sample after filtering on the <>. The only problem I see is that the scores obtained in individual samples will not be comparable with each other, right? Should I instead put the <> as the score in the final BED file, since it would be more comparable between samples?
@lpantano:

One option is to overlap the BED files with bedtools and only keep the miRNAs that overlap by 95% of their bases in a given number of samples. Then you can use the counts, and those should be comparable. But there is always a limitation when doing this after the analysis.
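The overlap filter suggested above can be sketched in plain awk, for illustration only. Here "overlap 95% on bases" is interpreted as a reciprocal overlap of at least 95% (an assumption), and the function name `keep_recurrent_mirnas`, the BED6 input, and the convention that the sample name is the first `|`-separated field of the name column are also assumptions. The sketch compares every pair of intervals (quadratic time), so on real data `bedtools intersect` with `-f 0.95 -r` is the faster way to apply the same reciprocal-overlap test.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: keep intervals that are supported, with at least MIN_FRAC
# reciprocal overlap, by at least MIN_SAMPLES distinct samples (own sample included).
# ASSUMED input: BED6, sample name = first "|"-separated token of column 4.

# Usage: keep_recurrent_mirnas MIN_SAMPLES MIN_FRAC < allsamples.bed
keep_recurrent_mirnas() {
    local min_samples=$1 min_frac=$2
    awk -v need="$min_samples" -v frac="$min_frac" 'BEGIN { FS = OFS = "\t" }
    {
        n++
        chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; str[n] = $6; line[n] = $0
        split($4, parts, "|"); smp[n] = parts[1]
    }
    END {
        for (i = 1; i <= n; i++) {
            split("", seen)                     # portable way to empty an array
            seen[smp[i]] = 1; count = 1         # interval i: its own sample counts once
            for (j = 1; j <= n; j++) {
                # same chromosome and strand, and a sample not counted yet
                if (chr[j] != chr[i] || str[j] != str[i] || (smp[j] in seen)) continue
                ov = (e[i] < e[j] ? e[i] : e[j]) - (s[i] > s[j] ? s[i] : s[j])
                # the overlap must cover MIN_FRAC of both intervals
                if (ov > 0 && ov >= frac * (e[i] - s[i]) && ov >= frac * (e[j] - s[j])) {
                    seen[smp[j]] = 1; count++
                }
            }
            if (count >= need) print line[i]
        }
    }'
}

# Example (hypothetical file names): keep candidates seen in at least 3 samples.
# keep_recurrent_mirnas 3 0.95 < allsamples.75pcentTP.bed > recurrent.bed
```

Each input line is printed unchanged, so the per-sample coordinates of the subelements are preserved rather than merged.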
If you overlap requiring a high percentage of identical nucleotides among samples, you can then take one sample as the reference, use the FASTA files of its precursor and mature sequences to run the pipeline again, and you will get a better quantification for novel miRNAs.
1. Filter the CSV files output by mirdeep2 so that only detected miRNAs with a probability of being a true positive above 75% are retained.
2. Make a BED file with the filtered miRNAs found in all samples (simple concatenation), but remembering:
   - whether the miRNA is novel or known and, if it is known, its closest known miRNA from other species;
   - its probability of being a true positive.
On my data, the code would be the following:

```bash
sp=sus_scrofa
resdir=nf-core.smrnaseq.1.1.0.Sscrofa11.1.102.21-06-28
dir=/work/project/fragencode/workspace/geneswitch/results/srnaseq/$sp/$resdir/mirdeep2/mirdeep
pgm=/work/project/fragencode/tools/multi/Scripts/filter.and.format.mirdeep.file.awk

# For each per-sample mirdeep2 BED file, run the filter on the matching CSV
# (same basename, .csv extension); the BED path is passed only to get the sample name.
for f in "$dir"/*.bed
do
    awk -v sp="$sp" -v f="$f" -f "$pgm" "${f%.bed}.csv"
done | sort -k1,1 -k2,2n -k3,3n > "$dir/allsamples.75pcentTP.bed"
```
And `$pgm` is attached here (the BED file from mirdeep2 is only used to find the sample name; there could be more straightforward code for that).
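Since the `filter.and.format.mirdeep.file.awk` script is only an attachment, here is a hypothetical stand-in that sketches the same filter-and-format idea. It also writes the true-positive probability into the BED score column, which is one way to get a score that is comparable across samples. The function name `filter_mirdeep_table` and the simplified tab-separated input layout described in the comments are assumptions for illustration; the real mirdeep2 CSV layout differs and would need its own parsing step.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the attached filter.and.format.mirdeep.file.awk.
# ASSUMED input: one candidate per line, tab-separated, with columns
#   1 id   2 probability ("91" or "91 +/- 2%")   3 status (novel|known)
#   4 closest known miRNA ("-" if none)   5 chrom   6 start (0-based)   7 end   8 strand
# Output: BED6 with name = sample|id|status|closest|prob and score = prob.

# Usage: filter_mirdeep_table SAMPLE MIN_PROB < table.tsv
filter_mirdeep_table() {
    local sample=$1 min_prob=$2
    awk -v sample="$sample" -v min="$min_prob" 'BEGIN { FS = OFS = "\t" }
    {
        prob = $2 + 0              # numeric prefix only: "91 +/- 2%" becomes 91
        if (prob <= min) next      # keep candidates strictly above the cutoff
        print $5, $6, $7, sample "|" $1 "|" $3 "|" $4 "|" prob, prob, $8
    }'
}

# Example: c1 (91%) is kept, c2 (40%) is dropped.
printf 'c1\t91 +/- 2%%\tnovel\t-\tchr1\t100\t160\t+\nc2\t40\tknown\tmir-example\tchr2\t5\t70\t-\n' \
    | filter_mirdeep_table sampleA 75
```

A probability between 0 and 100 also stays within the 0 to 1000 range that the BED format allows for the score column.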