Add option relabel_self to use sequence itself as a label in FASTA/FASTQ #384

Closed
torognes opened this issue Sep 2, 2019 · 5 comments

@torognes
Owner

torognes commented Sep 2, 2019

A new vsearch option to use the sequence itself as the label in FASTA/FASTQ files was requested for the vsearch plugin for qiime2. The option could be called relabel_self.

See qiime2/q2-vsearch#55 for details.

@torognes torognes self-assigned this Sep 2, 2019
@torognes
Owner Author

torognes commented Sep 3, 2019

I have some questions about some details.

  1. Should we use --relabel_self as the option name, or is --relabel_seq perhaps better?
  2. Should we use the sequence exactly as it is, or should it be normalized in some way, e.g. upper-cased?
  3. Should ambiguous nucleotide symbols (IUPAC) be allowed?
  4. Should U be converted to T (as we do before computing MD5 or SHA1)?

@colinbrislawn, what is your opinion?
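
To make questions 2 and 4 concrete, here is a minimal Python sketch of the normalization choices being discussed. It is only an illustration: the function name `normalize_label` and its parameters are hypothetical and are not part of vsearch.

```python
def normalize_label(seq: str, upper: bool = False, u_to_t: bool = False) -> str:
    """Return the label derived from seq under the chosen normalization.

    Hypothetical helper for illustration only; not vsearch code.
    """
    label = seq
    if upper:
        # Question 2: optionally upper-case the sequence.
        label = label.upper()
    if u_to_t:
        # Question 4: optionally convert U to T, preserving case.
        label = label.replace("U", "T").replace("u", "t")
    return label


# With both flags off, the sequence is used exactly as it is.
print(normalize_label("acgUn"))
# With both flags on, case and U/T differences are collapsed.
print(normalize_label("acgUn", upper=True, u_to_t=True))
```

With no normalization, two sequences that differ only in case or in U versus T would receive different labels; with normalization they would share one.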

@colinbrislawn
Contributor

Great questions!

what is your opinion?

I think the primary use of this command is to make vsearch directly compatible with other programs, like dada2 and deblur. Let's see how the dada2 dev answered these questions! @benjjneb

@torognes
Owner Author

torognes commented Sep 11, 2019

Added in commit bdb164a.

I have used relabel_self as the option name. The sequence is used "as is" for the label, without any conversion. The option is available in all commands that write FASTA or FASTQ files.
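
As a rough illustration of the "as is" behavior described above, the following Python sketch rewrites FASTA text so that each header becomes the record's own sequence. This is not the vsearch implementation; the function below is a hypothetical stand-in, and it assumes multi-line sequences are joined and written on a single line.

```python
def relabel_self(fasta_text: str) -> str:
    """Replace each FASTA header with the record's sequence, unchanged.

    Hypothetical emulation for illustration only; not vsearch code.
    """
    records = []
    seq_parts = None  # None until the first header line is seen
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_parts is not None:
                records.append("".join(seq_parts))
            seq_parts = []
        elif seq_parts is not None:
            seq_parts.append(line)
    if seq_parts is not None:
        records.append("".join(seq_parts))
    # No case conversion and no U-to-T conversion: the label is the sequence.
    return "".join(f">{seq}\n{seq}\n" for seq in records)


example = ">read1\nACGT\nacgu\n>read2\nGGNN\n"
print(relabel_self(example), end="")
```

Because nothing is converted, lower-case letters, U, and ambiguous symbols such as N all appear in the label exactly as they appear in the sequence.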

@benjjneb

Just caught this thread.

We typically don't write out FASTQ/FASTA files with the sequence itself as the ID line, but it can be a useful format when importing into other software that likes to use the ID line as the sequence identifier. So, I'd probably just do the easiest thing and use the sequence "as is", as @torognes already did.

@torognes
Owner Author

Added in version 2.14.0 just released.
