Maybe I'm missing something, but extract_kraken_reads.py seems to be returning a different number of reads for each pair in a fastq, which is creating issues for downstream processing (e.g. taxon-specific assembly). Is there an option to ensure both reads in a pair are extracted when one has a hit (or the reverse)?
Many thanks,
Mat
Are you saying that the script is finding one read but not the other? What sequencing platform did you use? I've had to modify the script a couple times to account for paired reads formatting.
Are you running the script on both fastq files at the same time?
When I check the read counts:
fastaq count_sequences 47759_1#1_1.fastq.gz
3636
fastaq count_sequences 47759_1#1_2.fastq.gz
3658
Since the read counts differ between the two fastq files, my original assumption was that the script is finding one read of a pair but not the other, but I haven't confirmed that. In fact, the first and last 3 reads in each file (head -12 and tail -12) appear to be the same reads in both files. I'll have to think a bit more about how to identify the discrepant reads.
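One way to identify the discrepant reads might be to compare the read IDs in the two extracted files directly. The following is a minimal sketch, not part of extract_kraken_reads.py: the helper names (`open_fastq`, `read_ids`, `unpaired_ids`) are hypothetical, and it assumes 4-line FASTQ records and that mates share the same ID up to the first whitespace (or differ only by a trailing `/1` or `/2`).

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_ids(path):
    """Collect read IDs from a FASTQ file (assumes 4 lines per record)."""
    ids = set()
    with open_fastq(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only header lines carry the read ID
            # Drop the leading '@' and anything after the first whitespace
            fields = line[1:].split()
            if not fields:
                continue
            name = fields[0]
            # Assumption: strip an old-style /1 or /2 mate suffix if present
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            ids.add(name)
    return ids


def unpaired_ids(fastq_1, fastq_2):
    """Return (IDs only in fastq_1, IDs only in fastq_2)."""
    ids_1 = read_ids(fastq_1)
    ids_2 = read_ids(fastq_2)
    return ids_1 - ids_2, ids_2 - ids_1
```

Calling `unpaired_ids("47759_1#1_1.fastq.gz", "47759_1#1_2.fastq.gz")` would return the two sets of IDs that lack a mate in the other file. If both sets come back empty while the counts still differ, the difference is more likely due to duplicated records than to missing mates.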
This is data sequenced on NovaSeq 6000. Read headers look like this: