Do you have the plan to add the pileup functions? #51

panxiaoguang · 2022-09-26T05:31:12Z

Sometimes,We want to iter bam according to the position not record. So do you have the plan to add the function.

CiaranOMara · 2022-10-09T02:39:38Z

The coverage method in GenomicFeatures.jl will give you a pileup.

There is a plan to make this easier to query intersection and overlaps with GenomicFeatures v3, which will use the groupname method instead of the inconsistent seqname method.

panxiaoguang · 2022-10-09T03:15:51Z

The coverage method in GenomicFeatures.jl will give you a pileup.

There is a plan to make this easier to query intersection and overlaps with GenomicFeatures v3, which will use the groupname method instead of the inconsistent seqname method.

However, the coverage method can only get the base numbers in some collections of regions. We use the pileup method to get, for example,how many bases are A, G , C, and T ? How many mismatches or indels ? from the first position to the last position .Yes,we need an easy API like BAM Record.

CiaranOMara · 2022-10-09T03:58:36Z

Oh, I assume you are referring to the Pileup format. Would you provide some references to the functionality/features you are requesting, and we'll see if adding them to this package makes sense?

panxiaoguang · 2022-10-09T04:47:45Z

Oh, I assume you are referring to the Pileup format. Would you provide some references to the functionality/features you are requesting, and we'll see if adding them to this package makes sense?

A good example is python class pipeup, PileupColumn and PileupRead(https://pysam.readthedocs.io/en/latest/api.html#pysam.PileupColumn), But they are using cython, I'm not familiar with that.

MillironX · 2022-10-10T13:04:55Z

I agree this would be extremely useful. I don't think we would need to support the pileup format, per se, but provide a pileup iterator. Basically, a pileup iterator on a SAM.Reader or BAM.Reader should return a new pileup object, something like

@kwdef struct Pileup
    chr
    pos
    ref = 'N'
    count = 0 
    reads::Vector = []
    mapquals::Vector = []
    basequals::Vector = []
end #struct

So iterating through a sorted BAM should return a set like

[
    Pileup("chr1", 1),
    Pileup("chr1", 2),
    Pileup("chr1", 3),
    Pileup("chr2", 1),
    Pileup("chr2", 2),
]

CiaranOMara · 2022-10-11T06:46:13Z

I think part of the ideal solution is to further develop GenomicFeatures.jl/tree/pu/v3.0.0's GenomicIntervalCollection and its overlap iterators. Current development of the overlap iterators work with mixed subtypes of AbstractGenomicInterval, but this restriction could be removed to allow GenomicFeatures's iterators and collections to work directly with XAM records.

jonathanBieler · 2022-10-13T09:57:58Z

I think it would be good to have something like this but it would be even better if it could be made generic enough to accodate things like grouping reads pairs into fragments, e.g if you have to overlapping reads :

R1 : ATTGCGC-----
R2 : ---GTGCATTC

A read-wise pileup at position 5 would return [C = 1 , T = 1] while a fragment-wise at position 4 would return [G = 1] while at position 5 [N=1] or [C = 1 if quality of R1 > quality of R2].

Same things for groups of PCR duplicates, possibly using UMIs.

For this I think you need 1) a (possibly) user-defined grouping criteria ("take R1 and R2 together"), and 2) a (possibly) user-defined reducing/consensus method to compute the pileup number for the grouped reads. Then the interface would like something like this :

 Pileup(FragmentWise(), ReduceWithQuality(minimum_phred=5), "chr1", 1)

That said it might be difficult to make it generic and efficient.

apraga · 2023-04-28T12:07:36Z

That would be a great addition. Here's how I count reference and alternate allele it at the moment (there may be better ways, i'm a beginner in Julia):

function getBaseRecord(record, pos)
    i = pos - BAM.position(record)
    BAM.sequence(record)[i+1]
end

function getBase(reader, chrom, pos)
    [getBaseRecord(r, pos) for r in eachoverlap(reader, chrom,pos:pos)]
end

function countSNV(ref, alt)
    ref = length(filter(x -> x == ref, bp))
    alt = length(filter(x -> x == alt, bp))
    total = length(bp)
    (ref, alt, total)
end

Usage:

reader = open(BAM.Reader, bam, index=bai)
bp = getBase(reader,chrom, pos)
close(reader)

(ref, alt, total) = countSNV(convert(DNA, "A"), convert(DNA, "G"))
println("ref=$ref, alt=$alt over $total")

CiaranOMara added the enhancement New feature or request label Oct 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do you have the plan to add the pileup functions? #51

Do you have the plan to add the pileup functions? #51

panxiaoguang commented Sep 26, 2022

CiaranOMara commented Oct 9, 2022

panxiaoguang commented Oct 9, 2022

CiaranOMara commented Oct 9, 2022

panxiaoguang commented Oct 9, 2022

MillironX commented Oct 10, 2022

CiaranOMara commented Oct 11, 2022

jonathanBieler commented Oct 13, 2022 •

edited by kescobo

Loading

apraga commented Apr 28, 2023 •

edited

Loading

Do you have the plan to add the pileup functions? #51

Do you have the plan to add the pileup functions? #51

Comments

panxiaoguang commented Sep 26, 2022

CiaranOMara commented Oct 9, 2022

panxiaoguang commented Oct 9, 2022

CiaranOMara commented Oct 9, 2022

panxiaoguang commented Oct 9, 2022

MillironX commented Oct 10, 2022

CiaranOMara commented Oct 11, 2022

jonathanBieler commented Oct 13, 2022 • edited by kescobo Loading

apraga commented Apr 28, 2023 • edited Loading

jonathanBieler commented Oct 13, 2022 •

edited by kescobo

Loading

apraga commented Apr 28, 2023 •

edited

Loading