This repository has been archived by the owner on Nov 9, 2023. It is now read-only.
I know QIIME1 support ends soon, but I wanted to record this information somewhere in case people still using it run into this problem. It also seems like reasonably serious unexpected behavior, because it can cause significant downstream errors.
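Using the split_sequence_file_on_sample_ids.py script, if you supply an input fasta file but set the option --file_type fastq, the script will write out per-sample fastq files using alternating sequences in the fasta file as quality scores. For example, if you ran
split_sequence_file_on_sample_ids.py -i input.fna --file_type 'fastq' -o out_test
on a fasta file input.fna, you'd get out_test/test_sample_0.fastq, in which input sequence 2 has become the qual score for input sequence 1.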
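To make the mis-pairing concrete, here is a minimal sketch of how this kind of output can arise. It assumes a reader that groups lines four at a time without validating them; that is an assumption for illustration, not a confirmed description of the QIIME1 code path. The function name `naive_fastq_records` and the sequences are made up and are not the data from this report.

```python
from itertools import islice


def naive_fastq_records(lines):
    """Hypothetical reader: take lines four at a time as
    (header, sequence, separator, quality) with no validation."""
    it = iter(line.rstrip("\n") for line in lines)
    while True:
        chunk = list(islice(it, 4))
        if len(chunk) < 4:
            return
        header, seq, _separator, qual = chunk
        # Drop the leading marker character ('@' in real fastq, '>' here).
        yield header[1:], seq, qual


# Made-up fasta input with one sequence line per record.
fasta_lines = [
    ">sample_0_1",
    "ACGTACGT",
    ">sample_0_2",
    "TTGGCCAA",
    ">sample_0_3",
    "GGGGAAAA",
    ">sample_0_4",
    "CCCCTTTT",
]

records = list(naive_fastq_records(fasta_lines))
for name, seq, qual in records:
    print(name, seq, qual)
```

With four fasta records in, only two records come out, and the second and fourth sequences end up in the quality position.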
This is made worse by the fact that the uppercase letters {ACTG} are all valid quality scores in phred 33, so rather than getting an error in a downstream step, you will have silently halved the number of sequences and written totally misleading quality scores.
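As a rough illustration of why nothing downstream complains, the sketch below decodes the base letters as Phred+33 quality characters and adds a simple sanity check that could be run before splitting. Both helpers (`phred33_scores`, `check_declared_type`) are hypothetical names written for this example and are not part of QIIME1 or skbio.

```python
def phred33_scores(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def check_declared_type(first_line, file_type):
    """Hypothetical guard: compare the first record marker with the declared type.

    fasta headers start with '>' and fastq headers start with '@'.
    """
    expected = {"fasta": ">", "fastq": "@"}
    if file_type not in expected:
        raise ValueError("unknown file type: %r" % file_type)
    if not first_line.startswith(expected[file_type]):
        raise ValueError(
            "input does not look like %s: first line starts with %r"
            % (file_type, first_line[:1])
        )


# Every base letter decodes to a plausible-looking score.
print(phred33_scores("ACGT"))  # [32, 34, 38, 51]

# A fasta header declared as fastq is rejected up front.
try:
    check_declared_type(">sample_0_1", "fastq")
except ValueError as err:
    print("rejected:", err)
```

Checking only the first character is a deliberately small test; it would catch the case described here, but it is not a full format validator.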
Thanks for reporting. Just out of curiosity, are you sure the problem is in QIIME1 and not in another library, like skbio? I'm a bit concerned that the bug still exists somewhere else ...