Extend PCA Visualizer with Component-Feature Strength #615
Comments
Nice library! The existing biplot in this package already gives feature strengths, but it might not suit everyone and can be difficult to read when too many features overlap each other. This is indeed a good additional feature to add for PCA, @bbengfort. |
Thank you @mapattacker; we really enjoyed reading through your documentation! |
Is someone already working on this? If not, can I give it a shot? I haven't worked on an issue before, so this might be my first. |
@stoff3l you're more than welcome! Looking forward to your PR! Let me know if you have any questions. |
Hi there, I unfortunately don't have as much time as I thought I'd have to dedicate to this. Might be good if someone else can pick this up (seeing that it's still hacktober). Apologies |
No worries @stoff3l, thanks for your interest in contributing and feel free to check back in when you have more bandwidth! |
Can I be assigned this issue? |
Hi @naba7. We don't assign issues out; however, you are always welcome to work on the problem yourself and submit a PR. Be sure to check out the referenced issues above to fully grasp what we are looking for. Thanks. |
Ok, thank-you
|
@bbengfort Should I update the docs in pca.rst or update pca.py? |
The changes I have made
|
Hi @naba7 - for this issue, we would have to edit This feature is probably going to be a little complicated and will require a little back and forth in order to implement successfully. I would recommend writing out some example code either in a notebook or in a Python script that uses one of the yellowbrick datasets, so we can get a better feel for how this should be implemented more generally in the visualizer. Before we go too far down this road, however, I would strongly suggest that we clear up the forking and branching issues that you're currently having and resolve your currently open PR, otherwise this will likely be a mess that will be difficult to resolve. It looked like @rebeccabilbro made some excellent suggestions that might help you untangle what seems to be a git-related knot? We certainly appreciate your enthusiasm and I just want to make sure that you're set up to be successful as a YB contributor! |
Thanks a lot. It is all because of the members and fellow contributors, who are so supportive and active. This image shows that after scaling the credit data, the points merged into one point rather than being scattered. The colorbar data is as above. The code to run PcaVisualizer on the credit dataset is as follows: |
Hi @naba7 - thank you for providing the code snippet and trying the example! I've updated the code to use the new Here is my updated code snippet for you: https://gist.github.com/bbengfort/bf59c1f33b1e523ea1f4774bd3272876
When using the new dataset module, the target is excluded and that results in the following PCA image: Which is a tad better! Unfortunately, this code snippet creates two figures; here is the second figure (using imshow):
Ideally, we'd like the strengths, the colorbar, and the scatter plot all in the same figure but in different axes. This is a fairly tricky problem - but you're definitely pushing this issue forward and we really appreciate it! I've been using `make_axes_locatable` to do this, see also the following StackOverflow questions:
However, this is a bit tricky. Perhaps for this initial experimentation phase you might want to try GridSpec? Thanks again for all your hard work on this! |
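For anyone experimenting with the GridSpec route mentioned above, here is a minimal layout sketch, not the visualizer's actual implementation. It puts the scatter plot, a horizontal colorbar, and the strengths heatmap in one figure on three separate axes. The arrays `projected` and `strengths` and the `feature_i` names are hypothetical placeholders filled with random numbers; in practice they would come from a fitted PCA.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

# Placeholder data (assumption): stand-ins for a 2D PCA projection and
# for the component-feature strengths of 2 components over 5 features.
rng = np.random.default_rng(42)
projected = rng.normal(size=(100, 2))
strengths = rng.random(size=(2, 5))
features = [f"feature_{i}" for i in range(strengths.shape[1])]

# Three stacked rows: scatter plot, thin colorbar, heatmap.
fig = plt.figure(figsize=(8, 8))
gs = GridSpec(3, 1, figure=fig, height_ratios=[12, 1, 3], hspace=0.6)
ax_scatter = fig.add_subplot(gs[0])
ax_cbar = fig.add_subplot(gs[1])
ax_heat = fig.add_subplot(gs[2])

ax_scatter.scatter(projected[:, 0], projected[:, 1], s=10)
ax_scatter.set_xlabel("PC1")
ax_scatter.set_ylabel("PC2")

# pcolormesh draws one cell per (component, feature) pair.
mesh = ax_heat.pcolormesh(strengths, cmap="viridis")
ax_heat.set_xticks(np.arange(len(features)) + 0.5)  # center ticks on cells
ax_heat.set_xticklabels(features, rotation=45, ha="right")
ax_heat.set_yticks([0.5, 1.5])
ax_heat.set_yticklabels(["PC1", "PC2"])
ax_heat.invert_yaxis()  # put PC1 on the top row

# Draw the colorbar into its own axes, above the heatmap.
fig.colorbar(mesh, cax=ax_cbar, orientation="horizontal")
```

Putting the colorbar in its own GridSpec row keeps the feature names as the last thing in the chart; the height ratios are arbitrary and would need tuning for real data.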
Thank you @bbengfort for figuring out what went wrong, and yes, I have set up yb according to the contributor's guide. I will keep in mind to use yellowbrick.datasets from next time. I will work on this issue and try GridSpec too. It is really interesting, amazing, and fun to work with guidance and support from you. OSS is love. |
Proposal: |
@naresh-bachwani this is what you're currently working on right? Let's make sure this gets closed when it's finished! |
@bbengfort I think that we have achieved all of its tasks. Is there anything that I am missing? |
Describe the solution you'd like
Provide an optional heatmap and color bar underneath the PCA visualizer (by shifting the lower axes) that shows the magnitude of each feature value to the component. This provides an explanation of which features are contributing the most to which component.
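As a rough illustration of what "the magnitude of each feature value to the component" could mean numerically, the sketch below computes a components-by-features matrix of absolute loadings. It uses a plain NumPy SVD on standardized data as a stand-in for a fitted PCA (the rows should match the absolute values of scikit-learn's `PCA.components_` on the same standardized input). The function name `component_feature_strengths` and the synthetic data are hypothetical, chosen only for this example.

```python
import numpy as np


def component_feature_strengths(X, n_components=2):
    """Return absolute loadings with shape (n_components, n_features).

    Hypothetical helper for illustration; not part of yellowbrick.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each feature; guard against zero-variance columns.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Xs = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    # The sign of a principal axis is arbitrary, so keep only the magnitude.
    return np.abs(Vt[:n_components])


# Synthetic data (assumption): feature 1 is nearly a multiple of feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = 3 * X[:, 0] + rng.normal(scale=0.1, size=200)

strengths = component_feature_strengths(X, n_components=2)
print(strengths.round(2))
```

Each row is one component and each column one feature, which is the matrix a heatmap beneath the scatter plot would display.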
Is your feature request related to a problem? Please describe.
Although we have the biplot mode to plot feature strengths, they can sometimes be visually overlapping or unintelligible, particularly if there is a large number of features.
Examples
Code to generate this:
Though we will probably want to use `pcolormesh` rather than `imshow`, as in `Rank2D`, `ClassificationReport`, and `ConfusionMatrix`. Additionally, it might be a tad nicer if the color bar was above the feature plot so that the axes names were the last thing in the chart.
Notes
This idea comes from pages 55-56 of Data Science Documentation. I would be happy to include a citation to this in our documentation. (HTML version is here). @mapattacker any thoughts?
See also #476 for other updates to the PCA visualizer.