extract_kraken_reads.py returns different read counts for paired fastqs #85

matbeale · 2023-09-26T15:49:44Z

Hi Jen,

Maybe I'm missing something, but extract_kraken_reads.py seems to be returning a different number of reads for each pair in a fastq, which is creating issues for downstream processing (e.g. taxon-specific assembly). Is there an option to ensure both reads in a pair are extracted when one has a hit (or the reverse)?

Many thanks,
Mat

The text was updated successfully, but these errors were encountered:

jenniferlu717 · 2023-09-27T14:24:16Z

Are you saying that the script is finding one read but not the other? What sequencing platform did you use? I've had to modify the script a couple times to account for paired reads formatting.

Are you running the script on both fastq files at the same time?

matbeale · 2023-09-27T15:26:35Z

Yes, I'm running the script on both fastqs at the same time - this is my run command:

extract_kraken_reads.py --include-children --fastq-output -k 47759_1#1\_clean_kraken.out -r 47759_1#1\_clean_kraken.report -s 47759_1#1\_clean_1.fastq.gz -s2 47759_1#1\_clean_2.fastq.gz -o 47759_1#1\_1.fastq -o2 47759_1#1\_2.fastq -t 3370480

When I check the read counts:
fastaq count_sequences 47759_1#1_1.fastq.gz
3636
fastaq count_sequences 47759_1#1_2.fastq.gz
3658

Since the read counts are discrepant between the fastq files, my original assumption was that the script is finding one but not the other, but I haven't confirmed that. In fact, looking at the first 3 reads in each file (head -12 and tail -12) seems to find the same read. I'll have to think a bit more about how to identify the discrepant reads.

This is data sequenced on NovaSeq 6000. Read headers look like this:

@A00537:889:HHJ7KDRX3:1:2278:25699:35916/2
TTCTAAAACATGAACAAAGGAAGAAGGAAATAAGTAAATGCAAAGCTATGCCAACCTTTCTAAAACATGCTTGTGCAAAAGGGCGTGTATGTTATGTATGTAAAAACATTTTGCTACTTAGCAATACCATTTGGCAGTACATATATTTGGC
+
FFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFF

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

extract_kraken_reads.py returns different read counts for paired fastqs #85

extract_kraken_reads.py returns different read counts for paired fastqs #85

matbeale commented Sep 26, 2023

jenniferlu717 commented Sep 27, 2023

matbeale commented Sep 27, 2023 •

edited

Loading

extract_kraken_reads.py returns different read counts for paired fastqs #85

extract_kraken_reads.py returns different read counts for paired fastqs #85

Comments

matbeale commented Sep 26, 2023

jenniferlu717 commented Sep 27, 2023

matbeale commented Sep 27, 2023 • edited Loading

matbeale commented Sep 27, 2023 •

edited

Loading