Document performance considerations? #125

alexlenail opened this issue May 3, 2022 · 4 comments

Comments

@alexlenail

I'd like to use pyBigWig to collect values at many intervals from many bigwigs, and I'd love to know what's performant.

  1. Is there overhead to opening a bigwig with pyBigWig? I.e., what's the runtime difference between:

         with pyBigWig.open(bigwig_file) as bw:
             for chrom, start, stop in intervals:
                 bw.values(chrom, start, stop)

     and

         for chrom, start, stop in intervals:
             with pyBigWig.open(bigwig_file) as bw:
                 bw.values(chrom, start, stop)

  2. If the former is optimal, is there any advantage to the intervals being sorted?

  3. Do you know the relative performance of pyBigWig entries() queries of bigBed files versus tabix queries of gzipped bed files?
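
One way to answer the first two questions empirically is to time the strategies directly. The sketch below is a minimal, hypothetical benchmark harness and not part of pyBigWig: the function names are invented for illustration, and `open_bw` is assumed to be `pyBigWig.open` (or any callable returning a context manager with a `values(chrom, start, stop)` method). It makes no claim about which strategy is faster.

```python
import time


def query_open_once(open_bw, path, intervals):
    """Open the file once and reuse the handle for every interval."""
    with open_bw(path) as bw:
        return [bw.values(chrom, start, stop) for chrom, start, stop in intervals]


def query_reopen_each(open_bw, path, intervals):
    """Reopen the file for every interval."""
    results = []
    for chrom, start, stop in intervals:
        with open_bw(path) as bw:
            results.append(bw.values(chrom, start, stop))
    return results


def query_sorted(open_bw, path, intervals):
    """Open once, visit intervals in (chrom, start, stop) order,
    and return results in the caller's original order."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    results = [None] * len(intervals)
    with open_bw(path) as bw:
        for i in order:
            chrom, start, stop = intervals[i]
            results[i] = bw.values(chrom, start, stop)
    return results


def time_call(fn, *args):
    """Return (elapsed_seconds, result) for a single call of fn(*args)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - t0, result
```

With pyBigWig installed, calling `time_call(query_open_once, pyBigWig.open, bigwig_file, intervals)` and then the same with the other two functions would give three timings to compare on your own files.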

@gokceneraslan
Contributor

gokceneraslan commented May 7, 2022

I think a vectorized version of bw.values would be much better, e.g.:

    bw.values(np.array([chrom]*3), np.array([79250, 86700, 87277]), np.array([80250, 87700, 88277]), numpy=True)

which would return a list of numpy arrays without iterating over the intervals in a Python loop. But I guess this is not implemented yet.
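
In the meantime, the proposed call shape can be emulated with a thin wrapper. This is only a sketch: `values_many` is a hypothetical helper, not a pyBigWig function, and it still loops in Python, so it offers the interface but not the speed-up a real vectorized implementation might. Extra keyword arguments such as `numpy=True` are simply forwarded to `bw.values`.

```python
def values_many(bw, chroms, starts, ends, **kwargs):
    """Call bw.values once per interval and collect the results in a list.

    chroms, starts and ends are equal-length sequences (lists or arrays).
    """
    if not (len(chroms) == len(starts) == len(ends)):
        raise ValueError("chroms, starts and ends must have the same length")
    return [
        # int() converts array scalars to plain Python integers.
        bw.values(str(chrom), int(start), int(end), **kwargs)
        for chrom, start, end in zip(chroms, starts, ends)
    ]
```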

@alexlenail
Author

@dpryan79 what is the fastest way to get arrays of values from a bigwig file for each of many genomic intervals (i.e. entries in a bed file)?

@BradBalderson

For others: I found that a better solution for the task described above was to use the bigWigAverageOverBed tool from UCSC.

@BradBalderson

BradBalderson commented Apr 4, 2024

$ ./bigWigAverageOverBed

bigWigAverageOverBed v2 - Compute average score of big wig over each bed, which may have introns.
usage:
   bigWigAverageOverBed in.bw in.bed out.tab
The output columns are:
   name - name field from bed, which should be unique
   size - size of bed (sum of exon sizes
   covered - # bases within exons covered by bigWig
   sum - sum of values over all bases covered
   mean0 - average over bases with non-covered bases counting as zeroes
   mean - average over just covered bases
Options:
   -stats=stats.ra - Output a collection of overall statistics to stat.ra file
   -bedOut=out.bed - Make output bed that is echo of input bed but with mean column appended
   -sampleAroundCenter=N - Take sample at region N bases wide centered around bed item, rather
                     than the usual sample in the bed item.
   -minMax - include two additional columns containing the min and max observed in the area.
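
For anyone calling the tool from Python, a minimal sketch of a wrapper follows. The function names are hypothetical, and the parser assumes that out.tab is tab-separated, has no header line, and contains the six default columns in the order listed in the usage text (so it does not handle the extra -minMax columns). It also assumes the `bigWigAverageOverBed` executable is on your PATH.

```python
import csv
import subprocess


def run_average_over_bed(bigwig, bed, out_tab, exe="bigWigAverageOverBed"):
    """Run: bigWigAverageOverBed in.bw in.bed out.tab (raises on failure)."""
    subprocess.run([exe, bigwig, bed, out_tab], check=True)


def read_average_over_bed(out_tab):
    """Parse out.tab into a list of dicts, one per bed item."""
    rows = []
    with open(out_tab, newline="") as fh:
        for rec in csv.reader(fh, delimiter="\t"):
            if not rec:
                continue  # skip blank lines
            name, size, covered, total, mean0, mean = rec[:6]
            rows.append({
                "name": name,
                "size": int(size),
                "covered": int(covered),
                "sum": float(total),
                "mean0": float(mean0),
                "mean": float(mean),
            })
    return rows
```

Note that this returns one summary per interval (sum and means), not the per-base arrays that `bw.values` gives.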
