
Number of joined reads drastically decreased #19

Open
brittbio opened this issue Aug 13, 2014 · 5 comments


@brittbio

Hello,

Thank you so much for writing this program! :) I was wondering if you could help me with a problem I have come across.

I am trying to merge MiSeq data consisting of 149 bp forward and reverse reads (no adaptors). I am using SeqPrep via the QIIME command join_paired_ends.py (http://qiime.org/scripts/join_paired_ends.html). I have successfully merged millions of reads with your command, thank you! And the number of reads merged is similar to that of other programs (fastq-join). However, when I tried an approach in which I quality-filtered the reads first, very few reads were joined.

I filtered the reads with the FASTX-Toolkit (at least 75% of each read had to have a minimum quality score of 25). After filtering I still had over 13 million reads. Other merging programs (fastq-join) still merged a significant number of reads from these now quality-filtered FASTQ files; however, SeqPrep never merged more than 1052 reads.

Do you have any idea why this may be? Please let me know if you need more information to address this question.

Thank you!

Brittany
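
For reference, a minimal sketch of the kind of whole-read filter described above, assuming Phred+33-encoded, uncompressed four-line FASTQ records; file names are placeholders (this should correspond to FASTX-Toolkit's fastq_quality_filter with -q 25 -p 75):

```python
# Keep a read only if at least 75% of its bases have quality >= 25.
# Assumes Phred+33 encoding and plain four-line FASTQ records;
# file names are placeholders.

def passes(qual, min_q=25, min_frac=0.75):
    scores = [ord(c) - 33 for c in qual]
    return sum(q >= min_q for q in scores) / len(scores) >= min_frac

def filter_fastq(in_path, out_path):
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            if passes(record[3].rstrip("\n")):
                fout.writelines(record)

filter_fastq("reads_R1.fastq", "reads_R1.filtered.fastq")
```

Note that, like the FASTX tool, this filters each file on its own, which is what sets up the pairing problem discussed below.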

@jstjohn
Owner

jstjohn commented Aug 13, 2014

Could it be an issue with supporting different read lengths? I have not tested that extensively. Does your pre-merging script do any trimming, or does it only discard whole reads?

Also, have you checked that the read identifiers are still matched up after your quality filtering? Is it possible, for example, that reads were not thrown out in pairs?

Thanks,
John
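
A quick way to answer that second question is to compare the two files record by record. A minimal sketch, assuming the typical Illumina convention that the pair ID is the @-header up to the first space; file names are placeholders:

```python
# Report the first record at which two FASTQ files fall out of sync,
# taking the read ID as the @-header up to the first space (older
# @id/1, @id/2 style headers would need that suffix stripped as well).
from itertools import zip_longest

def read_ids(path):
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # header line of each four-line record
                yield line.split()[0].lstrip("@")

pairs = zip_longest(read_ids("R1.filtered.fastq"),
                    read_ids("R2.filtered.fastq"))
for n, (fwd, rev) in enumerate(pairs, 1):
    if fwd != rev:
        print(f"files desynchronize at record {n}: {fwd!r} vs {rev!r}")
        break
else:
    print("all records are paired and in order")
```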


@brittbio
Author

Hi John,

Thanks for the fast reply! The reads are all 151 bp long (sorry, there was an error in the note above where I said 149 bp). The quality filtering completely discards reads; it does not trim them. So the reads are all still 151 bp long.

I don't believe the identifiers all match up; the forward and reverse reads are filtered independently. So some reads might be removed from the forward file but still be present in the reverse. Does SeqPrep look for matching identifiers?

That being said, over 70% of the reads are still there post-quality-filtering in both the forward and reverse files, so even if a few of the pairs are gone I would still expect more than 1052 reads to merge, since pre-quality-filtering I had over 9 million reads merged.

Thank you so much for your help and the script! :-)

Cheers,

Brittany

@jstjohn
Owner

jstjohn commented Aug 14, 2014

You definitely want to preserve matching. If SeqPrep doesn't simply error out when it finds mismatched reads, it probably keeps matching them pairwise until it gets to the end of one of the files.
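
That mechanism would also explain the numbers: with roughly 70% of reads surviving in each file independently, about 0.7 × 0.7 ≈ 49% of pairs remain intact, but pairing by file position desynchronizes at the first unmatched drop and only occasionally realigns by chance. A toy simulation (assumed parameters, not the actual data):

```python
# Drop each read independently with probability 0.3 in both files, then
# count (a) pairs where both mates survive and (b) pairs that still line
# up at the same file position, which is roughly what a positional merger
# can hope to join. Parameters are assumptions for illustration.
import random

random.seed(0)
n = 1_000_000
r1 = [i for i in range(n) if random.random() > 0.3]
r2 = [i for i in range(n) if random.random() > 0.3]

intact = len(set(r1) & set(r2))
aligned = sum(a == b for a, b in zip(r1, r2))
print(f"pairs with both mates surviving: {intact}")    # about 0.49 * n
print(f"pairs still at the same position: {aligned}")  # a tiny fraction
```

The intact pairs are still there; they just no longer face each other across the two files.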


@brittbio
Author

I see, so does SeqPrep use the read location to merge them, or does it just match reads in the order they appear in the files?

Thanks!

Britt

@jstjohn
Owner

jstjohn commented Aug 14, 2014

Just systematically in the order they appear in the file.
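
Given that, one way to salvage the filtered data is to re-synchronize the files before running SeqPrep: keep only reads whose IDs appear in both files, written out in a common order. A minimal sketch under the same header assumption as above; it holds all of R1 in memory, which can take several GB for millions of reads, and file names are placeholders. Paired-end-aware filters (for example, Trimmomatic's paired mode) avoid the problem by dropping both mates together in the first place.

```python
# Re-pair two independently filtered FASTQ files by intersecting read IDs
# so that SeqPrep sees matching records in matching order. A sketch, not
# production code: all of R1 is held in memory.

def records(path):
    with open(path) as fh:
        while True:
            rec = [fh.readline() for _ in range(4)]
            if not rec[0]:
                return
            yield rec[0].split()[0].lstrip("@"), rec

r1 = dict(records("R1.filtered.fastq"))  # read ID -> four-line record

kept = 0
with open("R1.paired.fastq", "w") as out1, \
     open("R2.paired.fastq", "w") as out2:
    for rid, rec2 in records("R2.filtered.fastq"):
        rec1 = r1.get(rid)
        if rec1 is not None:
            out1.writelines(rec1)
            out2.writelines(rec2)
            kept += 1
print(f"kept {kept} read pairs")
```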
