consensus file has header of reference and not name bam/sample file #38

MarinaSci · 2023-01-31T11:09:33Z

Hello - great development and thank you, very useful!
Not sure how timely my comment can be and how active this section is...
However, I will try!
One thought I had is, when you have multiple bam files (=multiple samples) you want to extract the same consensus reference from (for subsequent phylogenetic analysis etc), then it would be best if the ococo output file had the sample or bam name on the first line after '>', as opposed to the fasta reference it came from.
I hope that makes sense... Would that be a quick fix you think?

Thank you!!
#featurerequest

karel-brinda · 2023-02-09T00:01:55Z

Hi Marina,

thanks for your comment and the suggestion. To propose a specific solution, I need to double-check whether I understand everything correctly.

Are you proposing that, e.g., if you had a ref file with sequences chr1 and chr2 and a BAM file from a sample called smp, you would like to rename the seqs from chr1 to smp.1 and from chr2 to smp.2, in order to simplify the subsequent analysis?

MarinaSci · 2023-02-12T12:51:48Z

Dear Karel, thank you very much for getting back to me so swiftly and for taking on my recommendation. Probably to rename the seqs from chr1 to smp.1 and chr2 to smp.1. I work with environmental/faecal samples and can have multiple infections present in a sample. In my references I have multiple genomes (nuclear or mitogenomes); let's say multiple chrs. So for a given sample that has more than one parasite present, it would be fantastic to get chr1 renamed to smp.1 and chr2 to smp.1. Does that make sense? Again, very grateful that you are even considering this! Best regards, Marina


-- Best wishes, Marina
Marina Papaiakovou, PhD candidate
Harding Distinguished Postgraduate Scholar
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK (she/her)
People have different working patterns; please don't feel obliged to act on this email outside of your own normal working hours.

karel-brinda · 2023-02-15T00:45:07Z

In this case, the most straightforward solution would be to post-process the outputs from Ococo.

Unfortunately, it seems that the -F parameter is unable to redirect the FASTA output to the standard output (stdout) (I have no idea why I didn't implement this – I probably focused mainly on the VCF output).

So the way to go is:

1. First storing the FASTA onto disk, e.g. ./ococo -i test.bam -f test.fa -x ococo64 -F output.fa
2. Converting the FASTA to a modified version with new seq names, e.g. seqtk seq output.fa | perl -pe 's/>chr/>smp./g' or seqtk seq output.fa | perl -pe 's/>/>smp1./g' (depends on how exactly you want to name the sequences)
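
With several samples, the renaming step could be wrapped in a small shell function. The following is only a sketch under assumptions, not something Ococo provides: rename_fasta_headers is a hypothetical helper, it uses awk rather than the perl one-liner, and it keeps the original sequence name after the sample prefix (chr1 becomes smp.chr1) so that sequences from a multi-genome reference stay distinguishable. The file naming in the commented loop is likewise made up for illustration.

```shell
# Hypothetical helper (not part of Ococo): prefix every FASTA header with a sample name.
# Usage: rename_fasta_headers SAMPLE < in.fa > out.fa
rename_fasta_headers() {
    sample="$1"
    # Lines starting with '>' are headers; all other lines pass through unchanged.
    awk -v s="$sample" '/^>/ { print ">" s "." substr($0, 2); next } { print }'
}

# Possible use over many consensus files (assumed to be named SAMPLE.consensus.fa):
# for fa in *.consensus.fa; do
#     rename_fasta_headers "$(basename "$fa" .consensus.fa)" < "$fa" > "${fa%.fa}.renamed.fa"
# done
```

Anchoring the match at the start of the line means that a '>' appearing later in a header description is left alone.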

MarinaSci · 2023-02-17T08:17:03Z

Thank you for the guidance, Karel!! Very helpful. Best wishes, Marina


karel-brinda · 2023-02-17T16:34:22Z

You are welcome!

I'll close this ticket for now as this won't be implemented as a separate feature.

I've also opened a ticket (#39) about possibly redirecting the consensus to stdout in the future.

karel-brinda added the wontfix label Feb 17, 2023

karel-brinda closed this as completed Feb 17, 2023
