Point number limit in PCA decomposition #920

Closed
hugo-pires opened this issue Jul 10, 2019 · 6 comments

Comments

@hugo-pires

Hello

Is there any limit to the number of points in PCADecomposition plot?

Thank you

@lwgray
Contributor

lwgray commented Jul 10, 2019

Hi @hugo-pires, thank you for using Yellowbrick. I don't think there is an upper limit, but I will verify this with my colleagues. Are you having problems with the PCADecomposition visualizer?

@hugo-pires
Author

hugo-pires commented Jul 10, 2019

Thank you @lwgray and congratulations for your work.

The question is that I am projecting a dataframe of approximately 5000 examples, but I'm only seeing approximately 50 points. They are probably overlaid, but I would like to double-check, or use some alpha parameter.

@rebeccabilbro
Member

Hi @hugo-pires, thanks for checking out Yellowbrick! There probably is an upper limit for PCA, but 5k samples isn't even close to it! So I agree it's likely that the points are merely being overlaid. You can test this out using the alpha parameter, like so:

from sklearn.datasets import make_classification
from yellowbrick.features.pca import PCADecomposition

X, y = make_classification(
    n_samples=5000, n_features=200, n_informative=2, n_redundant=2
)

visualizer = PCADecomposition(alpha=.25)
visualizer.fit_transform(X, y)
visualizer.poof()

The default alpha is .75 (check out our PCADecomposition docs for more on our defaults) and ranges between 0 and 1; as it decreases, the points will become more translucent.
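As a rough illustration of why a lower alpha makes overlap easier to spot, the sketch below computes the combined opacity of several markers drawn on top of one another. It assumes standard "over" alpha compositing, where each extra layer covers a fraction `alpha` of what is still showing through; `stacked_opacity` is a hypothetical helper written for this example, not part of Yellowbrick or Matplotlib.

```python
def stacked_opacity(alpha, n_points):
    """Combined opacity of n_points markers with the same alpha drawn at the
    same position, assuming standard "over" compositing."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Each layer lets (1 - alpha) of the background through.
    return 1.0 - (1.0 - alpha) ** n_points


for alpha in (0.75, 0.25):
    for n in (1, 2, 5, 100):
        opacity = stacked_opacity(alpha, n)
        print(f"alpha={alpha}, {n} overlapping points -> opacity {opacity:.3f}")
```

Under this assumption, two overlapping points at alpha=.75 are already about 94% opaque, so a stack of 2 and a stack of 100 look nearly the same. At alpha=.25 the opacity grows more gradually, so dense regions stand out from sparse ones.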

alpha=.75:
image

alpha=.25:
image
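The overlap hypothesis can also be checked numerically rather than visually. The sketch below projects the data onto its first two principal components and counts how many distinct 2D positions remain after rounding. It is only an approximation of what the visualizer does: the PCA here is a plain NumPy SVD, with no feature scaling, and the visualizer may preprocess the data differently. The helper `count_distinct_plotted_points`, the `decimals` tolerance, and the duplicated toy data are all assumptions made for this example.

```python
import numpy as np


def count_distinct_plotted_points(X, decimals=2):
    """Project X onto its first two principal components and count the
    distinct 2D positions that remain after rounding to `decimals` places."""
    X_centered = X - X.mean(axis=0)
    # PCA via SVD: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    X_2d = X_centered @ Vt[:2].T
    distinct = np.unique(np.round(X_2d, decimals), axis=0)
    return len(distinct), len(X_2d)


# Hypothetical data: 50 distinct rows, each repeated 100 times (5000 rows).
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 10))
X = np.repeat(base, 100, axis=0)

n_distinct, n_total = count_distinct_plotted_points(X)
print(f"{n_distinct} distinct positions out of {n_total} points")
```

If the number of distinct positions is far smaller than the number of rows, the points are being drawn on top of each other rather than dropped.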

@lwgray
Contributor

lwgray commented Jul 10, 2019

@hugo-pires, @rebeccabilbro is absolutely right that you should be able to plot 5000 points, and her suggestion on altering alpha is right on. I wanted to add a few things:

  1. We are actively working on the PCA visualizer, and the next version will have advanced support for opacity.
  2. You can use proj_features=True to create a biplot, which would help you interpret what's going on in the decomposition.
  3. Alternatively, you can use a Manifold visualizer (a non-linear decomposition) to see if it's just a linear effect.
  4. t-SNE or Isomap might tell you a lot about what's going on in the higher-dimensional space.
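The suggestions above can be sketched with scikit-learn directly, without going through the Yellowbrick visualizers. This is a minimal sketch, not how Yellowbrick implements them: it computes a linear PCA embedding, the per-feature loading vectors (the kind of arrows a biplot draws), and a non-linear Isomap embedding for comparison. The helper `compare_embeddings`, the `n_neighbors=10` setting, and the smaller 300-sample dataset are assumptions chosen to keep the example quick.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap


def compare_embeddings(X, n_neighbors=10):
    """Return 2D PCA (linear) and Isomap (non-linear) embeddings of X,
    plus the PCA loadings for each original feature."""
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X)
    # Rows of components_ are principal axes; transposing gives one 2D
    # loading vector per original feature.
    loadings = pca.components_.T
    X_iso = Isomap(n_neighbors=n_neighbors, n_components=2).fit_transform(X)
    return X_pca, X_iso, loadings


X, y = make_classification(
    n_samples=300, n_features=20, n_informative=2, n_redundant=2, random_state=0
)
X_pca, X_iso, loadings = compare_embeddings(X)
print(X_pca.shape, X_iso.shape, loadings.shape)
```

If the Isomap embedding spreads the points out where PCA collapses them, the clumping is more likely a limitation of the linear projection than of the data. `sklearn.manifold.TSNE` can be swapped in for `Isomap` in the same way, at a higher computational cost.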

@hugo-pires
Author

Thank you @lwgray and @rebeccabilbro. It would also be nice to have automatic colors for discrete targets, based on the levels of the label.

@lwgray
Contributor

lwgray commented Jul 10, 2019

@hugo-pires I think that #476 and #874 will address that request.

@lwgray closed this as completed Jul 10, 2019