Read sequences don't have overlapping with gene, but eachoverlap() reported there're. #75

Xiao-Zhong · 2024-08-05T06:37:37Z

As shown, the gene "ENSMUSG00000080478.1" wasn't mapped by any read, though there're reads mapping to a nearby multi-exon gene "ENSMUSG00000041560.12". In fact, the first gene is located in one intron of the second.

Please help address the issue. Thanks!

kescobo · 2024-08-05T16:41:05Z

Copying from Slack

jbieler
Today at 3:16 AM
Hum, I think the issue is that those reads do overlap the gene (they start before and end after) but do not cover it because of the gaps in the alignement. You need to compute some kind of coverage using the reads alignement.

jbieler
Today at 3:34 AM
Something like this should do it :
aln = XAM.BAM.alignment(read)

# check if the read match any position in the target interval
for position in ...
    BioAlignments.ref2seq(aln, position)[2] == BioAlignments.OP_MATCH #it covers
end
(edited)

Xiao Zhong
Today at 4:35 AM
Agree. eachoverlap() might only compare the positions of read and feature ends. It should be ok for most DNA read alignments, but will have the issue above caused by gaps like split reads from RNA sequencing.
I got alignment ranges with BAM.position(record) and BAM.rightposition(record), will compare them with features (genes, exons, introns, etc). I'm new with XAM.jl. Thank you!

jbieler
Today at 4:58 AM
Look into ref2seq & seq2ref too, they allow you to check if a reference position is covered by a read, or where a position within the read is mapping to in the reference. (edited)

Xiao-Zhong · 2024-08-06T14:57:52Z

Hi Kevin,

Unfortunately, I haven't solved the issue: how to check overlapping between a feature and the sequences aligned (or check if a feature is located within gaps if there're).

Why I'm interested in this, if a feature isn't really aligned by reads, these reads are not from the feature.

I may have to parse CIGAR string to get the coordinates of alignment blocks. Dose XAM.jl (or other package) have a function for this? Thanks!

BYW, I've asked the question on Slack. I commented here for others who will probably encounter the issue as well.

kescobo · 2024-08-07T14:00:35Z

We have some CIGAR functionality in BioAlignments.jl and some in XAM.jl. @jakobnissen recently started https://github.com/BioJulia/CIGARStrings.jl, but I think that's just a prototype

jonathanBieler · 2024-08-07T16:08:15Z

BAM.alignment and Alignment's methods should give you all you need, it's already parsing the cigar internally, as you can see in the list of anchors :

aln = BAM.alignment(r)
BioAlignments.Alignment:
  aligned range:
    seq: 0-141
    ref: 198265943-198266049
  CIGAR string: 77M1I2M1I3M2D22M35S

julia> aln.anchors
9-element Vector{BioAlignments.AlignmentAnchor}:
 AlignmentAnchor(0, 198265943, 0, '0')
 AlignmentAnchor(77, 198266020, 77, 'M')
 AlignmentAnchor(78, 198266020, 78, 'I')
 AlignmentAnchor(80, 198266022, 80, 'M')
 AlignmentAnchor(81, 198266022, 81, 'I')
 AlignmentAnchor(84, 198266025, 84, 'M')
 AlignmentAnchor(84, 198266027, 86, 'D')
 AlignmentAnchor(106, 198266049, 108, 'M')
 AlignmentAnchor(141, 198266049, 143, 'S')

Xiao-Zhong · 2024-08-11T15:14:13Z

OK. Thanks! The Alignment Operation below is what I wanted.

BioAlignments.OP_SKIP — Constant

'N': (typically long) deletion from the reference, e.g. due to RNA splicing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Read sequences don't have overlapping with gene, but eachoverlap() reported there're. #75

Read sequences don't have overlapping with gene, but eachoverlap() reported there're. #75

Xiao-Zhong commented Aug 5, 2024 •

edited

Loading

kescobo commented Aug 5, 2024 •

edited

Loading

Xiao-Zhong commented Aug 6, 2024

kescobo commented Aug 7, 2024

jonathanBieler commented Aug 7, 2024

Xiao-Zhong commented Aug 11, 2024

Read sequences don't have overlapping with gene, but eachoverlap() reported there're. #75

Read sequences don't have overlapping with gene, but eachoverlap() reported there're. #75

Comments

Xiao-Zhong commented Aug 5, 2024 • edited Loading

kescobo commented Aug 5, 2024 • edited Loading

Xiao-Zhong commented Aug 6, 2024

kescobo commented Aug 7, 2024

jonathanBieler commented Aug 7, 2024

Xiao-Zhong commented Aug 11, 2024

Xiao-Zhong commented Aug 5, 2024 •

edited

Loading

kescobo commented Aug 5, 2024 •

edited

Loading