Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read sequences don't have overlapping with gene, but eachoverlap() reported there're. #75

Open
Xiao-Zhong opened this issue Aug 5, 2024 · 5 comments

Comments

@Xiao-Zhong
Copy link

Xiao-Zhong commented Aug 5, 2024

thumbnail_image

As shown, the gene "ENSMUSG00000080478.1" wasn't mapped by any read, though there're reads mapping to a nearby multi-exon gene "ENSMUSG00000041560.12". In fact, the first gene is located in one intron of the second.

Please help address the issue. Thanks!

@kescobo
Copy link
Member

kescobo commented Aug 5, 2024

Copying from Slack

jbieler
Today at 3:16 AM
Hum, I think the issue is that those reads do overlap the gene (they start before and end after) but do not cover it because of the gaps in the alignement. You need to compute some kind of coverage using the reads alignement.

jbieler
Today at 3:34 AM
Something like this should do it :

aln = XAM.BAM.alignment(read)

# check if the read match any position in the target interval
for position in ...
    BioAlignments.ref2seq(aln, position)[2] == BioAlignments.OP_MATCH #it covers
end

(edited)

Xiao Zhong
Today at 4:35 AM
Agree. eachoverlap() might only compare the positions of read and feature ends. It should be ok for most DNA read alignments, but will have the issue above caused by gaps like split reads from RNA sequencing.
I got alignment ranges with BAM.position(record) and BAM.rightposition(record), will compare them with features (genes, exons, introns, etc). I'm new with XAM.jl. Thank you!

jbieler
Today at 4:58 AM
Look into ref2seq & seq2ref too, they allow you to check if a reference position is covered by a read, or where a position within the read is mapping to in the reference. (edited)

@Xiao-Zhong
Copy link
Author

Hi Kevin,

Unfortunately, I haven't solved the issue: how to check overlapping between a feature and the sequences aligned (or check if a feature is located within gaps if there're).

Why I'm interested in this, if a feature isn't really aligned by reads, these reads are not from the feature.

I may have to parse CIGAR string to get the coordinates of alignment blocks. Dose XAM.jl (or other package) have a function for this? Thanks!

BYW, I've asked the question on Slack. I commented here for others who will probably encounter the issue as well.

@kescobo
Copy link
Member

kescobo commented Aug 7, 2024

We have some CIGAR functionality in BioAlignments.jl and some in XAM.jl. @jakobnissen recently started https://github.com/BioJulia/CIGARStrings.jl, but I think that's just a prototype

@jonathanBieler
Copy link
Contributor

BAM.alignment and Alignment's methods should give you all you need, it's already parsing the cigar internally, as you can see in the list of anchors :

aln = BAM.alignment(r)
BioAlignments.Alignment:
  aligned range:
    seq: 0-141
    ref: 198265943-198266049
  CIGAR string: 77M1I2M1I3M2D22M35S

julia> aln.anchors
9-element Vector{BioAlignments.AlignmentAnchor}:
 AlignmentAnchor(0, 198265943, 0, '0')
 AlignmentAnchor(77, 198266020, 77, 'M')
 AlignmentAnchor(78, 198266020, 78, 'I')
 AlignmentAnchor(80, 198266022, 80, 'M')
 AlignmentAnchor(81, 198266022, 81, 'I')
 AlignmentAnchor(84, 198266025, 84, 'M')
 AlignmentAnchor(84, 198266027, 86, 'D')
 AlignmentAnchor(106, 198266049, 108, 'M')
 AlignmentAnchor(141, 198266049, 143, 'S')

@Xiao-Zhong
Copy link
Author

OK. Thanks! The Alignment Operation below is what I wanted.

BioAlignments.OP_SKIP — Constant

'N': (typically long) deletion from the reference, e.g. due to RNA splicing

Screen Shot 2024-08-11 at 11 13 37 pm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants