extend from 3'end #538

avilella · 2024-11-01T11:10:18Z

Can medaka generate a consensus that extends from the soft-clipped 3'end of ONT reads mapped to a reference?

E.g. for B-cell repertoire or T-cell repertoire transcript sequencing with ONT, one can map the reads onto the V-gene sequence, which will look as shown below:

V-Gene ====================================================
read1  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxccccccccccccccffffffffffffff
read2  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxcccccccccccccccffffffffffffff
read3  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxixxxxxxxxxxxxxxxxxxxcccccccccccccffffffffffffff
read4  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxccccccccccccccffffffffffffff
...

The x part of each ONT read maps to the V-gene. There may be mismatches due to hypermutation, which should be handled the same way as SNVs in genomic variant calling. The c part is the CDR3 region, which is unique to each cell and has no reference. The f part is the FWR4, which continues past the CDR3 region and doesn't align to the V-gene. There can also be insertions (i) and deletions (-); when these fall within the V-gene mapping region they are always sequencing errors, as there are no indels in the V-gene part.

Given a .bam file of reads mapped to their corresponding V-gene reference, how do I run medaka to obtain a consensus sequence that includes the CDR3 and FWR4 parts that don't map to the V-gene reference?

Thanks in advance.

ftostevin-ont · 2024-11-05T10:02:33Z

Any bases that are soft-clipped in the read-to-reference bam file will be ignored when generating features that are used for consensus inference. There is not a straightforward way to remove this restriction. To extend the consensus into the FWR4 region, you would need to extend the reference sequence to include the CDR3/FWR4 regions.
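
One way to build such an extended reference is to take the soft-clipped 3' tail of a representative read and append it to the V-gene sequence as a draft, then realign the reads to that draft before running medaka. The following is a minimal sketch of that idea, not part of medaka: the function names are hypothetical, and it assumes you have already pulled the `POS`, `CIGAR` and `SEQ` fields for one read out of the BAM (for example via `samtools view`) and that the reads come from a single clonotype, so one tail is a sensible draft.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def three_prime_soft_clip(cigar, read_seq):
    """Return the soft-clipped bases at the right-hand end of the alignment.

    SEQ in a BAM record is stored in reference orientation, so the
    right-hand clip is the part that runs past the 3' end of the V-gene.
    """
    ops = parse_cigar(cigar)
    if ops and ops[-1][1] == "H":  # hard clips carry no bases in SEQ
        ops = ops[:-1]
    if ops and ops[-1][1] == "S":
        return read_seq[-ops[-1][0]:]
    return ""


def reference_end(pos, cigar):
    """0-based exclusive end of the alignment on the reference (POS is 1-based)."""
    return pos - 1 + sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def extend_reference(ref_seq, pos, cigar, read_seq):
    """Draft reference: V-gene up to where the read stops aligning, plus its tail."""
    return ref_seq[:reference_end(pos, cigar)] + three_prime_soft_clip(cigar, read_seq)


# Toy example: a 10 bp "V-gene" and a read with a 6 bp soft-clipped tail.
draft = extend_reference("ACGTACGTAC", 1, "10M6S", "ACGTACGTACGGGTTT")
print(draft)
```

The draft tail still carries that single read's sequencing errors, so it is only a scaffold: the soft-clipped bases become aligned bases after realignment, which is what lets them contribute to the consensus.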

Alternatively, you could try using medaka smolecule, which first generates a POA of the reads and then performs a consensus of alignments to the POA consensus sequence. This should span the full length of the reads, though the accuracy will be limited by how well the variable CDR3 region can be aligned in the POA.

avilella · 2024-11-05T10:14:02Z

Thanks, I'll try medaka smolecule. Is there a way in which I can spike "fake reads" into it so that the indels in the V-gene portion disappear but the SNVs remain there? Maybe do it as a 2-step process: medaka smolecule with spiked-in fake V-gene reads, then take the newly created reference with the CDR3/FWR4, and use it to re-align the reads against it?

ftostevin-ont · 2024-11-08T14:16:59Z

This may work but it seems simpler just to use the real reads. Any sequencing errors should be removed by the POA and medaka consensus steps while genuine variants would be retained.
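
As a rough intuition for why real reads are enough, here is a toy column-wise majority vote over reads that are already aligned to each other. This is a deliberately simplified stand-in and not how POA or medaka work internally: the function name and the example reads are made up for illustration. An indel seen in one read is outvoted, while a substitution carried by every read survives.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Majority base per alignment column; columns won by a gap are dropped."""
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Hypothetical reads against a V-gene stretch "ACGTACGT".
# All reads carry a G->T substitution at position 3 (a genuine variant).
# read3 has a spurious inserted A, read4 a spurious deleted G.
reads = [
    "ACTT-ACGT",
    "ACTT-ACGT",
    "ACTTAACGT",
    "ACTT-AC-T",
]
print(majority_consensus(reads))
```

In this toy case the consensus keeps the shared substitution and discards both single-read indels. Real data need enough depth per molecule for the errors to be in the minority.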

avilella added the enhancement label Nov 1, 2024
