Call out disconnected components in network visualizations #1

Open
eric-czech opened this issue Apr 8, 2020 · 5 comments

@eric-czech
Collaborator

I'm fairly certain a good number of these exist in STRING and I think it would be a very helpful annotation on embedding visualizations (i.e. it would be good context to know which clusters are different graphs entirely vs less related groups of proteins in the same graph).

@dhimmel
Member

dhimmel commented Apr 13, 2020

Sorry didn't see this notification until now. Turns out I wasn't getting notifications... switched to "Watch" mode.

Let me look into the number of connected components in STRING.

I think a heatmap of any of the following would help distinguish protein clusters: affinity / common neighbors / random walk distance / shortest path. I've been playing around with plotly. Would be really cool if we could get the heatmap and scatterplot to sync up, such that selecting points on the scatterplot would update the heatmap to match.

@eric-czech
Collaborator Author

Ah yea that would be very cool. If you go the dash route, this could be a helpful resource on linking scatter plot clicks or lasso/range selections to events in other figures.

It would also be cool if there was a way to link groups of scatter points so that clicking on one of them does something corresponding to the whole group, like highlight all other points in the same connected component. I think Plot.ly could do that if you made sure all the points for one connected component are in the same trace, but I'm not sure if they provide the trace name on click (I think you can assign a point id though and do a reverse-lookup).
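A minimal sketch of the reverse-lookup idea, independent of any plotting library: given a component label per point, invert the mapping once so that a clicked point id resolves to every point in its component. The ids and labels below are made up for illustration, and how the clicked id is obtained from a click event is not shown here.

```python
from collections import defaultdict

# Hypothetical toy data: point id -> connected-component label.
component_of = {"P1": 0, "P2": 0, "P3": 1, "P4": 0, "P5": 2}

# Invert the mapping once so each component lists its member points.
members = defaultdict(list)
for point_id, component in component_of.items():
    members[component].append(point_id)


def points_to_highlight(clicked_id):
    """Return every point id sharing a component with the clicked point."""
    return sorted(members[component_of[clicked_id]])


print(points_to_highlight("P2"))  # ['P1', 'P2', 'P4']
```

With this in place, a click handler would only need the clicked point's id to find the whole group to highlight.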

dhimmel added a commit that referenced this issue Apr 13, 2020
@dhimmel
Member

dhimmel commented Apr 13, 2020

I looked into connected components in 12.connected-components.ipynb. Amazingly, scipy was able to read all 12 matrices (one per evidence channel) and calculate the components in under 5 seconds.
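For readers unfamiliar with the scipy routine, here is a minimal sketch of labeling connected components on a sparse adjacency matrix with `scipy.sparse.csgraph.connected_components`. The 6-node matrix and its weights are toy values, not STRING data, and this is not necessarily how the notebook does it.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy symmetric adjacency matrix for 6 nodes (not STRING data):
# edges 0-1, 1-2 and 3-4; node 5 has no edges at all.
rows = np.array([0, 1, 1, 2, 3, 4])
cols = np.array([1, 0, 2, 1, 4, 3])
weights = np.array([900, 900, 450, 450, 700, 700])
adjacency = csr_matrix((weights, (rows, cols)), shape=(6, 6))

# directed=False treats the matrix as an undirected graph.
n_components, labels = connected_components(adjacency, directed=False)

print(n_components)  # 3
print(labels)        # one integer component label per node
```

Nodes with the same label belong to the same component, so the isolated node 5 ends up in a component of size 1.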

I plotted the cumulative coverage of components ranked by size (interactive version in notebook):

[Figure: connected-component-plot]
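As a rough illustration of what "cumulative coverage of components ranked by size" means, the sketch below sorts component sizes in descending order and accumulates the fraction of nodes covered. The labels are hypothetical and the plotting step is left out.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical component label per node (one entry per gene).
labels = [0, 0, 0, 0, 1, 1, 2, 3]

# Component sizes, largest first.
sizes = sorted(Counter(labels).values(), reverse=True)

# Fraction of all nodes covered by the k largest components.
total = len(labels)
coverage = [running / total for running in accumulate(sizes)]

print(sizes)     # [4, 2, 1, 1]
print(coverage)  # [0.5, 0.75, 0.875, 1.0]
```

The first entry of `coverage` is the share of nodes in the largest component.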

For the combined score, the largest component contains 98.9% of all genes. This obviously could change if we applied a score threshold of 500. Although in general, it's probably best if we use edge weights rather than binarization as much as possible.

> I think it would be a very helpful annotation on embedding visualizations

Yeah, I exported the component assignments in e3998c5, so we can always add this to the embedding. Maybe it'd be helpful to differentiate all genes not in the giant connected component.

Actually upon further investigation, all genes for combined_score that are not in the giant connected component have a component size of 1, i.e. are entirely disconnected. I think these genes actually drop out during the node2vec embedding stage, so they aren't in the visualization.

@dhimmel
Member

dhimmel commented Apr 13, 2020

> If you go the dash route, this could be a helpful resource on linking scatter plot clicks or lasso/range selections to events in other figures.

Wow that Explorer UI is really cool. Will check with you before proceeding with any dashboarding solution using dash or voila, since you have much more experience here than me (zero exp).

> It would also be cool if there was a way to link groups of scatter points so that clicking on one of them does something corresponding to the whole group

Yeah! Definitely. I'll look into migrating the Bokeh scatterplot to Plotly, which seems a bit more powerful, intuitive, and compatible with voila / notebooks.

@eric-czech
Collaborator Author

> I exported the component assignments in e3998c5

I think it would be good to have that for a few combined score thresholds (Jack was simply ignoring all scores below 900, which I saw in a publication or two as well).
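A sketch of what component assignments at several thresholds could look like, assuming edges scoring below the threshold are simply dropped before the components are computed. The 4-node score matrix is made up, and `components_at` is a hypothetical helper, not something from the repository.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy symmetric score matrix (made-up values, not STRING data).
scores = np.array([
    [0, 950, 0, 0],
    [950, 0, 600, 0],
    [0, 600, 0, 300],
    [0, 0, 300, 0],
])


def components_at(threshold):
    """Label components after dropping edges scoring below threshold."""
    kept = np.where(scores >= threshold, scores, 0)
    return connected_components(csr_matrix(kept), directed=False)


# Number of components at each threshold.
counts = {t: int(components_at(t)[0]) for t in (0, 500, 900)}
print(counts)  # {0: 1, 500: 2, 900: 3}
```

Raising the threshold can only split components, so the count never decreases as the cutoff goes up.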

> Will check with you before proceeding with any dashboarding solution

I'm always happy to riff about Dash! But I don't mean to bias you too much towards it. I'd love to know what other solutions can do. I default to that simply b/c I like Plot.ly and assume, probably incorrectly, that other libs don't do useful/interesting things beyond what it can. I definitely agree that the API is more intuitive though.
