
Issues with visualisation of a large set of reads for one position #444

Open · cwuensch opened this issue Nov 7, 2016 · 4 comments

cwuensch commented Nov 7, 2016

We are using pileup.js for targeted sequencing with high coverage rates (> 1000 reads per position).
When the alignment track is displayed at a position with many reads, the browser gets stuck and many errors ("NS error") appear in the browser console.
The problem gets worse when zooming out (more positions to visualize).

Maybe a solution would be to specify a maximum number of reads to be shown per position?

Here is one example of the full error stack:

```
17:59:49.740 NS_ERROR_FAILURE:
s()pileup.min.js:14
[219]</G</<.value()pileup.min.js:14
[219]</G</<.value()pileup.min.js:14
bind_applyFunctionN()self-hosted
bound ()self-hosted
[53]</<.notifyAll()pileup.min.js:5
[121]</h.close()pileup.min.js:8
[146]</i.closeAll()pileup.min.js:9
[146]</i.perform()pileup.min.js:9
[146]</i.perform()pileup.min.js:9
[129]</<.perform()pileup.min.js:9
[129]</R()pileup.min.js:9
bound ()self-hosted
[146]</i.closeAll()pileup.min.js:9
[146]</i.perform()pileup.min.js:9
[96]</p.batchedUpdates()pileup.min.js:7
u()pileup.min.js:9
r()pileup.min.js:9
[128]</l<.enqueueSetState()pileup.min.js:9
[78]</r.prototype.setState()pileup.min.js:6
[187]</g</<.value/<()pileup.min.js:11
r()pileup.min.js:5
[47]</</</v.prototype.then/</<()pileup.min.js:5
v/r.promiseDispatch()pileup.min.js:5
t/</<()pileup.min.js:5
n()pileup.min.js:5
t()pileup.min.js:5
pileup.min.js:14
```
armish (Member) commented Nov 8, 2016

Thanks for reporting this issue, @cwuensch! It sounds like the browser is having trouble keeping all that read information in memory and eventually starts having connectivity issues (NS error) due to the high volume of network traffic required to fetch all that read data.

One of the optimizations we incorporated into pileup.js was to minimize what we draw on and off the screen: we don't try to render any reads that lie beyond the boundaries of the screen. In your case, I think the problem is simply the number of reads falling within a particular genomic range and the cost of representing them in memory. IGV, for example, works around this problem by requiring the user to be at a particular zoom level before loading/showing any read data. We currently don't have that limitation in pileup.js, since it can be accomplished programmatically by the developer if/when needed.
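For example, a minimal sketch of that kind of gating (the track setup follows the pileup.js README, but MAX_BASES, referenceTrack, and the URLs here are placeholders, not part of the library):

```js
// Only include the alignment track when the requested range is narrow
// enough to keep the read count manageable (IGV-style gating).
var MAX_BASES = 500;  // illustrative threshold; tune for your coverage

function makeTracks(range) {
  var tracks = [referenceTrack];  // assumed to be defined elsewhere
  if (range.stop - range.start <= MAX_BASES) {
    tracks.push({
      viz: pileup.viz.pileup(),
      data: pileup.formats.bam({url: '/test.bam', indexUrl: '/test.bam.bai'}),
      name: 'Alignments'
    });
  }
  return tracks;
}
```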

As for your suggestion to limit the number of reads loaded into pileup.js: that might become tricky, because to fetch informative reads we still need to load all of them into memory and prioritize them before showing them. If we cut off at the level of read fetching, we risk missing reads that support a variant, which is not ideal for a user investigating the read data. Having said that, if you have any suggestions on how to accomplish this, or if you can describe your use case in detail, maybe we can come up with a solution that helps with your case.
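Concretely, capping at display time would look something like this sketch (supportsVariant and maxRows are hypothetical; note that every read is still in memory before the cut, which is exactly the cost described above):

```js
// Prioritize before capping: variant-supporting reads sort first, so a
// display cap never hides them. All reads are still fetched and held.
function selectReadsToShow(reads, maxRows) {
  var prioritized = reads.slice().sort(function(a, b) {
    return Number(supportsVariant(b)) - Number(supportsVariant(a));
  });
  return prioritized.slice(0, maxRows);
}
```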

Otherwise, I strongly suggest programmatically limiting the size of the genomic region to be shown so that the expected number of reads falling into a particular region always stays at a reasonable level. Another option might be to explore @selkovjr's solution (#403) and filter reads on the server side instead of the client.
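As a sketch of that region limiting (assuming the browser instance exposes setRange as in the pileup.js demo; MAX_BASES is again a placeholder):

```js
// Clamp any requested range to a maximum width, centered on the
// original request, before handing it to the pileup.js instance.
var MAX_BASES = 500;

function clampedSetRange(p, range) {
  if (range.stop - range.start > MAX_BASES) {
    var mid = Math.round((range.start + range.stop) / 2);
    range = {
      contig: range.contig,
      start: mid - Math.floor(MAX_BASES / 2),
      stop: mid + Math.ceil(MAX_BASES / 2)
    };
  }
  p.setRange(range);
}
```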

selkovjr commented Nov 9, 2016

I still haven't found any problems with my read delivery solution (samtools + CGI); what I did have to fix in my branch was the maximum canvas size. People working with Illumina data (as I am now) are not likely to see the problem (although I am sure they will soon). When I was with Ion Torrent, we had extremely high coverage in some panels (up to 100,000x), and of course the whole thing broke, requiring downsampling. So even though the number of reads sent over the network is limited to those overlapping the viewable region, there is no limit on the pileup size (and I do not know of a good way of limiting that besides downsampling or filtering by BAM tags / read family). A tall pileup (about 2000x or more) simply does not render in the upstream version of the widget because browsers silently fail to render a canvas taller than 32767 pixels. Microsoft's browser is far more limited than that. In my branch, I simply put a hard limit on canvas size and let the superfluous reads vanish (nobody can see all of them at once anyway).
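The cap itself is simple; something along these lines (illustrative, not the actual code in my branch):

```js
// Hard cap on canvas height: most browsers silently fail to render a
// canvas taller than 32767px, and IE/Edge give up much earlier, so we
// stay well under. Rows beyond the cap simply are not drawn.
var MAX_CANVAS_HEIGHT = 8192;  // conservative, hypothetical limit

function sizeCanvas(canvas, rowCount, rowHeight) {
  canvas.height = Math.min(rowCount * rowHeight, MAX_CANVAS_HEIGHT);
  return Math.floor(canvas.height / rowHeight);  // rows that will fit
}
```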


cwuensch (Author) commented Nov 9, 2016

Dear selkovjr,

could you please give me a brief introduction to where I can find your special branch and your CGI/samtools solution? That sounds very interesting to me, especially because it could also solve the problem of data security by not having to make the full BAM file publicly available. Unfortunately, I could not find any documentation of this solution so far, and I am just a "dumb" user experimenting with integrating a genome browser into our research database platform. So it would be great if you could give me a brief introduction (or a link) telling me what has to be done to integrate your solution into a web application.

Best regards,
Christian

selkovjr commented Nov 9, 2016

Hi Christian,

I am sorry it is so disorganized. I was supposed to let the crew pull my edits early on and then open other branches for further changes. But the initial pull request did not get through, and I was busy changing jobs and moving, adding stuff in the process, and now it has mutated so far that merging it back in will be a serious project.

Here's the branch: https://github.com/selkovjr/pileup.js

What has changed (from memory):

  • Alignments are delivered via CGI+samtools (I checked that direct BAM
    access remained intact a few times, but not recently)
  • Some fixes around canvas rendering issues
  • Added a marker to highlight a column (like the IGV center line, but
    fixed to a column for the duration of the session)
  • Replaced the browser alert in the click handler with a split frame showing
    all data about the read. This was done to display Ion Torrent flowgrams;
    for everything else it just shows BAM tags and the aligned query sequence
    with the reference.
  • Added some code to display target regions and amplicons (bigBed)

Three new directories are now in the source tree:

./frontend -- this has everything (I hope) needed to open a pileup in its own
window or in an iframe.
./backend -- various data-getters that I symlink into my server docroot
(URLs referring to them are hard-coded in the frontend code); a sketch of
that idea follows below.
./conf -- an example uwsgi configuration putting it all together. I ran it
under apache while at Ion, but have not retained the config. It was pretty
obvious, though: just allow CGI and symlinks everywhere.
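The data-getter idea, reduced to a sketch (my branch uses CGI scripts, not Node, and the endpoint, query format, and paths here are made up for illustration):

```js
// Hypothetical Node equivalent of a samtools-backed data-getter:
// stream only the reads overlapping the requested region, e.g.
// GET /reads?region=chr17:7512384-7512544
var http = require('http');
var url = require('url');
var spawn = require('child_process').spawn;

var BAM_PATH = '/data/sample.bam';  // stays on the server, never exposed

http.createServer(function(req, res) {
  var region = url.parse(req.url, true).query.region;
  // Accept only well-formed regions like chr17:7512384-7512544.
  if (!/^[\w.]+:\d+-\d+$/.test(region || '')) {
    res.writeHead(400);
    return res.end('bad region');
  }
  res.writeHead(200, {'Content-Type': 'text/plain'});
  // samtools uses the BAM index, so only overlapping reads are touched.
  spawn('samtools', ['view', BAM_PATH, region]).stdout.pipe(res);
}).listen(8000);
```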

Frontend configuration is all hard-coded in ./frontend/index.html.mustache
(similar to the pileup.js demo).

I never had time to package it properly, and I believe the tests have been broken or incomplete for some time. If you have any problems setting it up, please feel free to ask for help.

Regards,

--Gene

