ProjectionVisualizer: unifying functionality of PCA and Manifold #874
Labels:
level: expert (deep knowledge of packages required)
priority: medium (can wait until after next release)
type: feature (a new visualizer or utility for yb)
type: technical debt (work to optimize or generalize code)
bbengfort added the level: expert, priority: medium, type: feature, and type: technical debt labels on Jun 4, 2019.
Thanks, @bbengfort, for summarizing this. It makes things simpler for me.
This might also be useful for #889.
bbengfort pushed a commit that referenced this issue on Jul 2, 2019:
Updates the DataVisualizer to perform target type identification as implemented in Manifold. This was an original requirement of the DataVisualizer but remained unimplemented because ParallelCoordinates and RadViz were the only main library subclasses. This is the first step toward the ProjectionVisualizer high-dimensional visualization base class. Related to #874.
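A minimal sketch of what target type identification might look like, assuming three categories: no target (a single color), a discrete target (one color per class), and a continuous target (a colormap). The function name `determine_target_type` and the `max_discrete_classes` threshold are hypothetical choices for illustration, not Yellowbrick's API.

```python
import numpy as np

def determine_target_type(y, max_discrete_classes=12):
    """Classify a target as "single", "discrete", or "continuous".

    Hypothetical helper: the category names and the class-count
    threshold are assumptions made for illustration.
    """
    if y is None:
        # No target: every point would be drawn in a single color.
        return "single"

    y = np.asarray(y)
    if y.ndim != 1:
        raise ValueError("expected a 1D target array")

    # Non-numeric labels (strings, booleans, objects) are always discrete.
    if not np.issubdtype(y.dtype, np.number):
        return "discrete"

    # Floats with fractional parts indicate a continuous target.
    if np.issubdtype(y.dtype, np.floating) and not np.all(np.mod(y, 1) == 0):
        return "continuous"

    # Integer-valued targets with few unique values are treated as classes.
    if len(np.unique(y)) <= max_discrete_classes:
        return "discrete"
    return "continuous"
```

The threshold matters for integer-valued targets such as counts, where a heuristic is needed to decide between a categorical legend and a colorbar.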
bbengfort pushed a commit that referenced this issue on Jul 17, 2019:
This is the first major step toward completing #874: the implementation of a ProjectionVisualizer base class to unify the functionality of decomposition visualizers that use PCA and Manifold and to extend support to other decomposition methods. In a follow-up PR, we will reorganize this class and extend the functionality in Manifold and PCA.
Just a note that this issue has the potential to close (or at least address portions of) many existing issues:
This was referenced Jul 22, 2019
One of the basic high-dimensional visualization techniques that Yellowbrick uses is to decompose or project a high-dimensional space into two or three dimensions so that the data can be displayed as a scatter plot. Projections of this kind reduce the amount of space between points (decreasing sparsity) but can still give us some intuition about structures in the higher-dimensional space. Currently, we have three primary decomposition methods that use this technique:
- … sklearn.manifold to produce embeddings
These visualizers have a lot of shared functionality that can be combined to streamline these kinds of visualizations and make it easier to extend them (e.g. to add ICA, Fast PCA, etc. to the PCA decompositions, or to extend the text visualizers to use the manifold visualizations).
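As a concrete illustration of the technique, here is a minimal sketch of projecting X into two or three dimensions with PCA. It uses NumPy's SVD rather than a scikit-learn estimator so it stays self-contained; the function name `pca_project` is hypothetical and this is not Yellowbrick's implementation.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project X onto its first n_components principal components."""
    X = np.asarray(X, dtype=float)
    if n_components not in (2, 3):
        raise ValueError("scatter plots need 2 or 3 components")

    # Center each feature so the SVD finds directions of maximal variance.
    X_centered = X - X.mean(axis=0)

    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

    # Project the centered data onto the leading axes: shape (n_instances, k).
    return X_centered @ Vt[:n_components].T
```

The rows of the returned array are the X' points a scatter plot would draw; swapping the SVD for ICA or a `sklearn.manifold` estimator changes only how those coordinates are computed, which is the shared structure these visualizers could factor out.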
I propose we create a ProjectionVisualizer base class or mixin that knows how to:
- … X into X' of shape (n_instances, 2) or (n_instances, 3)
- … X for the projection
This shared functionality could then be easily used by PCA, Manifold, etc.
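A minimal sketch of the proposed base class, assuming the decomposition is any object with a scikit-learn style `fit_transform` method (for example a PCA or a `sklearn.manifold` estimator). The class name follows the proposal, but the `transformer` and `projection` parameters and the `transform_` attribute are hypothetical choices for illustration; plotting is omitted.

```python
import numpy as np

class ProjectionVisualizer:
    """Hypothetical base: decompose X into X' for a 2D or 3D scatter plot."""

    def __init__(self, transformer, projection=2):
        # Only two- and three-dimensional scatter plots are supported.
        if projection not in (2, 3):
            raise ValueError("projection must be 2 or 3")
        self.transformer = transformer
        self.projection = projection

    def fit_transform(self, X, y=None):
        # Delegate the decomposition to the wrapped transformer.
        Xp = np.asarray(self.transformer.fit_transform(X))

        # Enforce the contract subclasses rely on: (n_instances, projection).
        expected = (np.shape(X)[0], self.projection)
        if Xp.shape != expected:
            raise ValueError(f"expected projection of shape {expected}, got {Xp.shape}")

        # Keep the projected points and target for a later draw step.
        self.transform_ = Xp
        self.y_ = y
        return Xp
```

Under this design, a PCA or Manifold subclass would mainly choose its transformer (e.g. scikit-learn's `PCA(n_components=2)` with `projection=2`) and inherit the shape checks and target handling.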
Some notes about the class hierarchy:
- MultiFeatureVisualizer produces a self.features_ attribute on fit(), which is useful in PCA for biplots and for understanding the original feature set.
- DataVisualizer produces self.classes_ from y and is supposed to "provide helper functionality related to target identification", but does not implement this yet (it is implemented on Manifold).
- yellowbrick.contrib.ScatterVisualizer might be worth moving to yellowbrick.draw.scatter and using as a mixin to handle some of these cases, though I don't necessarily want to confuse things too much.
- The JointPlot visualizer would also benefit from the target color handling described above.
This implies that the ProjectionVisualizer is a DataVisualizer and that the DataVisualizer needs to be updated to handle the target identification logic that is currently in Manifold. It also implies that JointPlot should be a DataVisualizer.
More investigation on this topic is necessary, but I wanted to propose this solution to allow for further discussion by @DistrictDataLabs/team-oz-maintainers and @naresh-bachwani, who is working on PCA this summer.
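The hierarchy described in these notes could look roughly like the following simplified sketch. These stripped-down classes only borrow Yellowbrick's names to show where `features_` and `classes_` would be set on `fit()`; they are not the library's real implementations, and the `transformer` parameter and `embedding_` attribute are hypothetical.

```python
import numpy as np

class MultiFeatureVisualizer:
    """Records the original feature names on fit (useful for PCA biplots)."""

    def __init__(self, features=None):
        self.features = features

    def fit(self, X, y=None):
        X = np.asarray(X)
        # Fall back to column indices when no feature names were given.
        if self.features is not None:
            self.features_ = np.asarray(self.features)
        else:
            self.features_ = np.arange(X.shape[1])
        return self


class DataVisualizer(MultiFeatureVisualizer):
    """Adds target handling: stores classes_ when y is given."""

    def fit(self, X, y=None):
        super().fit(X, y)
        # A full version would first identify whether y is discrete or
        # continuous; this sketch assumes y holds class labels.
        self.classes_ = None if y is None else np.unique(y)
        return self


class ProjectionVisualizer(DataVisualizer):
    """Is-a DataVisualizer, so it inherits features_ and classes_."""

    def __init__(self, transformer, features=None):
        super().__init__(features=features)
        self.transformer = transformer

    def fit(self, X, y=None):
        super().fit(X, y)
        # Project into the low-dimensional space used for the scatter plot.
        self.embedding_ = np.asarray(self.transformer.fit_transform(X))
        return self
```

Because the projection class sits below DataVisualizer, any target identification added there would be picked up by PCA- and Manifold-style subclasses without duplicating it.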