consensus file has header of reference and not name bam/sample file #38

Closed
MarinaSci opened this issue Jan 31, 2023 · 5 comments

@MarinaSci

Hello - great development and thank you, very useful!
Not sure how timely my comment can be and how active this section is...
However, I will try!
One thought I had: when you have multiple BAM files (i.e., multiple samples) and want to extract a consensus against the same reference from each of them (for subsequent phylogenetic analysis etc.), it would be best if the Ococo output file had the sample or BAM name on the header line after '>', rather than the name of the reference FASTA sequence it came from.
I hope that makes sense... Would that be a quick fix, do you think?

Thank you!!
#featurerequest

@karel-brinda
Owner

Hi Marina,

thanks for your comment and the suggestion. To propose a specific solution, I need to double-check whether I understand everything correctly.

Are you proposing that, e.g., if you had a reference file with sequences chr1 and chr2 and a BAM file from a sample called smp, you would like the sequences renamed from chr1 to smp.1 and from chr2 to smp.2, in order to simplify the subsequent analysis?

@MarinaSci
Author

MarinaSci commented Feb 12, 2023 via email

@karel-brinda
Owner

karel-brinda commented Feb 15, 2023

In this case, the most straightforward solution would be to post-process the outputs from Ococo.

Unfortunately, it seems that the -F parameter cannot redirect the FASTA output to standard output (stdout). (I have no idea why I didn't implement this; I probably focused mainly on the VCF output.)

So the way to go is:

  1. First, store the FASTA on disk, e.g., ./ococo -i test.bam -f test.fa -x ococo64 -F output.fa
  2. Then convert the FASTA to a modified version with new sequence names, e.g., seqtk seq output.fa | perl -pe 's/^>chr/>smp./' or seqtk seq output.fa | perl -pe 's/^>/>smp1./' (depending on how exactly you want to name the sequences)
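
For anyone who prefers to do the renaming step without seqtk and perl, the same header rewrite can be sketched in plain Python. This is only an illustrative sketch, not part of Ococo: the function name rename_fasta_headers and its strip_prefix parameter are made up here, and it assumes the consensus FASTA has already been written to disk as in step 1.

```python
import io


def rename_fasta_headers(lines, sample, strip_prefix="chr"):
    """Yield FASTA lines with every header renamed to '>{sample}.{name}'.

    Hypothetical helper (not an Ococo feature). If a sequence name starts
    with `strip_prefix`, that prefix is dropped first, so 'chr1' becomes
    'smp.1'. Pass strip_prefix="" to keep the full original name.
    """
    for line in lines:
        if line.startswith(">"):
            # Split the header into the sequence name and an optional description.
            name, sep, rest = line[1:].rstrip("\n").partition(" ")
            if strip_prefix and name.startswith(strip_prefix):
                name = name[len(strip_prefix):]
            yield f">{sample}.{name}{sep}{rest}\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__":
    # Small in-memory FASTA standing in for output.fa.
    fasta = io.StringIO(">chr1\nACGT\n>chr2\nGGCC\n")
    print("".join(rename_fasta_headers(fasta, "smp")), end="")
```

With several samples, the same function could be applied to each per-sample output file in turn, using the BAM file's base name as the sample argument, before concatenating the results for the phylogenetic analysis.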

@MarinaSci
Author

MarinaSci commented Feb 17, 2023 via email

@karel-brinda
Owner

You are welcome!

I'll close this ticket for now, as this won't be implemented as a separate feature.

I've also opened a ticket (#39) about possibly redirecting the consensus to stdout in the future.
