
Number of joined reads drastically decreased #19

Open
brittbio opened this issue Aug 13, 2014 · 5 comments


@brittbio

Hello,

Thank you so much for writing this program! :) I was wondering if you could help me with a problem I have come across.

I am trying to merge MiSeq data consisting of 149 bp forward and reverse reads (no adaptors). I am using SeqPrep via the QIIME command join_paired_ends.py (http://qiime.org/scripts/join_paired_ends.html). I have successfully merged millions of reads with your command, thank you! And the number of reads merged is similar to that of other programs (fastq-join). However, when I tried an approach in which I quality-filtered the reads first, very few reads were joined.

I filtered the reads with the FASTX-Toolkit (at least 75% of each read had to have a minimum quality score of 25). After filtering I still had over 13 million reads. Other merging programs (fastq-join) still merged a significant number of reads from these now quality-filtered FASTQ files; however, SeqPrep never merged more than 1052 reads.

Do you have any idea why this may be? Please let me know if you need more information to address this question.

Thank you!

Brittany
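
For reference, a minimal sketch of the kind of whole-read filter described above, assuming Phred+33-encoded, uncompressed four-line FASTQ records; file names are placeholders (this should correspond to FASTX-Toolkit's fastq_quality_filter with -q 25 -p 75):

```python
# Keep a read only if at least 75% of its bases have quality >= 25.
# Assumes Phred+33 encoding and plain four-line FASTQ records;
# file names are placeholders.

def passes(qual, min_q=25, min_frac=0.75):
    scores = [ord(c) - 33 for c in qual]
    return sum(q >= min_q for q in scores) / len(scores) >= min_frac

def filter_fastq(in_path, out_path):
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            if passes(record[3].rstrip("\n")):
                fout.writelines(record)

filter_fastq("reads_R1.fastq", "reads_R1.filtered.fastq")
```

Note that, like the FASTX tool, this filters each file on its own, which is what sets up the pairing problem discussed below.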

@jstjohn
Owner

jstjohn commented Aug 13, 2014

Could it be an issue with supporting different read lengths? I have not tested that extensively. Does your pre-merging script do any trimming, or does it only discard whole reads?

Also, have you checked that the read identifiers are still matched up after your quality filtering? Is it possible, for example, that reads were not thrown out in pairs?

Thanks,
John
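
A quick way to answer that second question is to compare the two files record by record. A minimal sketch, assuming the typical Illumina convention that the pair ID is the @-header up to the first space; file names are placeholders:

```python
# Report the first record at which two FASTQ files fall out of sync,
# taking the read ID as the @-header up to the first space (older
# @id/1, @id/2 style headers would need that suffix stripped as well).
from itertools import zip_longest

def read_ids(path):
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # header line of each four-line record
                yield line.split()[0].lstrip("@")

pairs = zip_longest(read_ids("R1.filtered.fastq"),
                    read_ids("R2.filtered.fastq"))
for n, (fwd, rev) in enumerate(pairs, 1):
    if fwd != rev:
        print(f"files desynchronize at record {n}: {fwd!r} vs {rev!r}")
        break
else:
    print("all records are paired and in order")
```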


@brittbio
Author

Hi John,

Thanks for the fast reply! The reads are all 151 bp long (sorry, there was an error in the note above where I said 149 bp). The quality filtering completely discards reads; it does not trim them. So the reads are all still 151 bp long.

I don't believe the identifiers all match up; the forward and reverse reads are filtered independently. So some reads might be removed from the forward file but still be present in the reverse. Does SeqPrep look for matching identifiers?

That being said, over 70% of the reads are still there post-quality-filtering in both the forward and reverse files, so even if a few of the pairs are gone I would still expect more than 1052 reads to merge, since pre-quality-filtering I had over 9 million reads merged.

Thank you so much for your help and the script! :-)

Cheers,

Brittany

@jstjohn
Owner

jstjohn commented Aug 14, 2014

You definitely want to preserve matching. If SeqPrep doesn't simply error out when it finds mismatched reads, it probably keeps matching them pairwise until it gets to the end of one of the files.
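
That mechanism would also explain the numbers: with roughly 70% of reads surviving in each file independently, about 0.7 × 0.7 ≈ 49% of pairs remain intact, but pairing by file position desynchronizes at the first unmatched drop and only occasionally realigns by chance. A toy simulation (assumed parameters, not the actual data):

```python
# Drop each read independently with probability 0.3 in both files, then
# count (a) pairs where both mates survive and (b) pairs that still line
# up at the same file position, which is roughly what a positional merger
# can hope to join. Parameters are assumptions for illustration.
import random

random.seed(0)
n = 1_000_000
r1 = [i for i in range(n) if random.random() > 0.3]
r2 = [i for i in range(n) if random.random() > 0.3]

intact = len(set(r1) & set(r2))
aligned = sum(a == b for a, b in zip(r1, r2))
print(f"pairs with both mates surviving: {intact}")    # about 0.49 * n
print(f"pairs still at the same position: {aligned}")  # a tiny fraction
```

The intact pairs are still there; they just no longer face each other across the two files.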


@brittbio
Author

I see, so does SeqPrep use the read location to merge them, or does it just match reads in the order they appear in the files?

Thanks!

Britt

@jstjohn
Owner

jstjohn commented Aug 14, 2014

Just systematically in the order they appear in the file.
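
Given that, one way to salvage the filtered data is to re-synchronize the files before running SeqPrep: keep only reads whose IDs appear in both files, written out in a common order. A minimal sketch under the same header assumption as above; it holds all of R1 in memory, which can take several GB for millions of reads, and file names are placeholders. Paired-end-aware filters (for example, Trimmomatic's paired mode) avoid the problem by dropping both mates together in the first place.

```python
# Re-pair two independently filtered FASTQ files by intersecting read IDs
# so that SeqPrep sees matching records in matching order. A sketch, not
# production code: all of R1 is held in memory.

def records(path):
    with open(path) as fh:
        while True:
            rec = [fh.readline() for _ in range(4)]
            if not rec[0]:
                return
            yield rec[0].split()[0].lstrip("@"), rec

r1 = dict(records("R1.filtered.fastq"))  # read ID -> four-line record

kept = 0
with open("R1.paired.fastq", "w") as out1, \
     open("R2.paired.fastq", "w") as out2:
    for rid, rec2 in records("R2.filtered.fastq"):
        rec1 = r1.get(rid)
        if rec1 is not None:
            out1.writelines(rec1)
            out2.writelines(rec2)
            kept += 1
print(f"kept {kept} read pairs")
```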
