
Demo on comparison of performance of S-RerF against other classifiers on real EEG data for grasp detection #5

Open
wants to merge 1 commit into base: staging

Conversation

@sanika1201 commented Dec 9, 2019

Description
Goal: Compare performance of S-Rerf with different classifiers on grasp detection using real EEG data.

This demo is a Jupyter Notebook analyzing the performance of S-RerF against classifiers such as K-Nearest Neighbors, Random Forest, and Multi-Layer Perceptron on structured EEG data. To preserve the structure of the data, binning (based on the concept of a moving average filter) is applied before training. One challenge is that the data are highly imbalanced, so they are balanced before training. The metrics used for evaluation are precision curves, balanced accuracy, and mean test error.

Output: Precision, balanced accuracy, and mean test error plots comparing the performance of S-RerF with the other classifiers.
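The class balancing mentioned above can be done in several ways; the sketch below shows one common approach, random under-sampling of the majority class. This is a minimal illustration with hypothetical names, not code from the notebook, and the notebook's actual balancing method may differ:

```python
import numpy as np
import pandas as pd

def balance_by_undersampling(X, y, seed=0):
    """Randomly under-sample every class down to the size of the
    minority class, so the returned dataset is balanced."""
    rng = np.random.default_rng(seed)
    counts = y.value_counts()
    n_min = counts.min()
    keep = []
    for label in counts.index:
        idx = y.index[y == label].to_numpy()
        keep.extend(rng.choice(idx, size=n_min, replace=False))
    keep = sorted(keep)
    return X.loc[keep], y.loc[keep]

# toy imbalanced dataset: 8 negatives, 2 positives
X = pd.DataFrame({"f": range(10)})
y = pd.Series([0] * 8 + [1] * 2)
Xb, yb = balance_by_undersampling(X, y)
# yb now contains 2 rows of each class
```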

Code and Details of the demo:
https://nbviewer.jupyter.org/github/NeuroDataDesign/team-forbidden-forest/blob/master/Sanika/Final_PR_upload.ipynb

* update max_features to accept a fraction > 1.0

* put inequality in easier to read form.
@bdpedigo

@sanika1201 I don't understand why the one commit here is Jesse's; did you mean to PR the notebook somewhere? I know your situation is a bit special, however.

@bdpedigo

Some of your line lengths are way too long; my rule of thumb is <88 chars.

@bdpedigo

  • Remove old code that is commented out.
  • When you index by `[:, 32]` in cell 4, what is that doing?
  • I don't understand how you are doing the downsampling/resampling. Are you just grabbing time points at random?
  • The line `Y_train_downsampled = X_train_downsampled.iloc[:,32]` looks suspect to me; can you explain?
  • Again, can you explain `X_train_downsampled.drop(X_train_downsampled.columns[[32]], axis=1, inplace=True)` to me?
  • I'd just do all imports at the beginning of the notebook.
  • `raw,y_raw,raw_t,y_rar_t = None,None,None,None` followed by `print(raw)`?
  • Can you plot some of the data? Maybe a few positive and a few negative examples? It is hard for me to understand what is going on without it, and it might help you understand too. You may also want to plot before and after your train/test splitting and resampling, so you can make sure you are not messing anything up in that process.
  • It looks like the precision plot is still not making sense, if I am understanding correctly.
  • Can you remind me what the true class imbalance is?

I think my main feedback is that I want to better understand how you are splitting your data before debugging the downstream stuff too much; I am worried that may be part of the issue. To do that, I would like to see some sample time series from each class, before and after all of your preprocessing. Let me know if that does not make sense or you don't agree.

@sanika1201
Author

> @sanika1201 I don't understand why the one commit here is Jesse's, did you mean to PR the notebook somewhere? I know your situation is a bit special, however.

@bdpedigo, I meant to PR to NeuroDataDesign/SPORF; I don't know how the commit got included. Should I make a different PR?

@sanika1201
Author

> • remove old code that is commented out
> • when you index by [:, 32] in cell 4, what is that doing?
> • I don't understand how you are doing the downsampling/resampling whatever. are you just grabbing time points at random?
> • This line Y_train_downsampled = X_train_downsampled.iloc[:,32] looks suspect to me, can you explain?
> • Again, can you explain X_train_downsampled.drop(X_train_downsampled.columns[[32]],axis=1,inplace = True) to me?
> • I'd just do all imports at the beginning of the notebook
> • raw,y_raw,raw_t,y_rar_t = None,None,None,None print (raw)?
> • can you plot some of the data? Maybe a few each positive and negative examples? It is hard for me to understand what is going on without it, and that might help understand what is going on for you too. May also want to consider doing so before and after your train test splitting as well as resampling so that you can make sure you are not messing anything up in that process
> • looks like the precision plot is still not making sense if I am understanding correctly
> • can you remind me what is the true class imbalance?
>
> I think my main feedback is I want to better understand how you are splitting your data before debugging the downstream stuff too much. I am worried that may be part of the issue. I think to do that I would like to see some sample time series from each class, before and after all of your preprocessing. Let me know if that does not make sense or you don't agree

@bdpedigo I have made the changes we discussed and uploaded the latest code and plots to this PR.

@bdpedigo

> @sanika1201 I don't understand why the one commit here is Jesse's, did you mean to PR the notebook somewhere? I know your situation is a bit special, however.
>
> @bdpedigo , I meant to PR to NeuroDataDesign/SPORF, i dont know how the commit got included. Should make a different PR?

I'd rather you remove just that one commit; I don't like remaking PRs because you lose all of the comments.

@bdpedigo

The notebook itself should be part of this PR, just FYI.

@bdpedigo

I think we have talked about this already, but a moving average filter is not what I meant by binning at all.

Binning for a single channel:

  • Divide the single timeseries into n bins, each of width m.
  • Stack those individual bins into an n-by-m matrix X. Input X as the training data.

Binning for multichannel:

  • For each channel 1...C, form the data matrices X_1 ... X_C described above.
  • Concatenate the columns of X_1 ... X_C to make X_big, an n-by-(C x m) matrix.
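The binning recipe above can be sketched in a few lines of numpy. This is a minimal illustration of the described scheme, with hypothetical function names, not code from the notebook:

```python
import numpy as np

def bin_single_channel(ts, m):
    """Divide a 1-D timeseries into n = len(ts) // m bins of width m
    and stack them into an (n, m) matrix (trailing samples that do
    not fill a bin are dropped)."""
    n = len(ts) // m
    return ts[: n * m].reshape(n, m)

def bin_multichannel(channels, m):
    """channels: (C, T) array. Bin each channel as above and
    concatenate the (n, m) matrices column-wise into X_big,
    an (n, C*m) matrix."""
    return np.hstack([bin_single_channel(ch, m) for ch in channels])

# tiny example: C = 2 channels, T = 6 time points, bin width m = 3
X_big = bin_multichannel(np.arange(12).reshape(2, 6), m=3)
# X_big has shape (2, 6): n = 2 bins, C*m = 2*3 = 6 columns
```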

@bdpedigo

Does that make sense? I want to make sure I am being clear. I think we may be out of time to actually do this right now, but I still want to make sure it is clear for the future.

@bdpedigo

The plots look good though, and I think they make more sense than what you have shown in the past.

@sanika1201
Author

> I think we have talked about this already, but moving average filter is not what I meant by binning at all.
>
> Binning for a single channel:
>
> • divide single timeseries into n bins, each of width m.
> • stack those individual bins into a n by m matrix, X. Input X as the training data
>
> Binning for multichannel
>
> • For each channel 1...C, form X_1 ... X_C data matrices described above
> • concatenate columns of X_1 ... X_C to make X_big, a n by C x m matrix

Yes, I understand this, and it makes more sense. Due to memory limitations, I decided to down-sample each bin to a single representative value, the bin mean. I went through a few recommendations on Kaggle, and this was one of the suggestions that gave decent results on a neural network, so I went ahead with it.
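The mean-per-bin down-sampling described here can be sketched as follows. This is a hypothetical illustration of the idea, not the notebook's actual code:

```python
import numpy as np

def downsample_to_bin_means(ts, m):
    """Replace each bin of width m with its mean, turning a
    length-T timeseries into a length-(T // m) one; this is the
    moving-average-style down-sampling used to fit in memory."""
    n = len(ts) // m
    return ts[: n * m].reshape(n, m).mean(axis=1)

means = downsample_to_bin_means(np.array([1.0, 3.0, 2.0, 4.0]), m=2)
# means == [2.0, 3.0]
```

Compared with the full binning scheme above, this keeps one value per bin instead of m, trading temporal structure for a much smaller training matrix.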

@bdpedigo

I see. In that case, it feels like we are mostly limited by compute power at this point?

@sanika1201
Author

> I see. in that case feels like we are mostly limited by compute power at this point?

Yes. If we can get a little more compute power next semester, I will try to get better results on this with the improvements you mentioned above.

@bdpedigo

The plots are clear, and this should scale up nicely once we get you some actual compute resources; at that point I think we will be able to actually compare results. I don't have much more to recommend right now, so I think you are done. Nice work!

@sanika1201
Author

sanika1201 commented Dec 20, 2019

> plots are clear, and this should scale up nicely once we get you some actual compute resources, and at that point i think we will be able to actually compare results. I don't have much more to recommend right now so I think you are done. Nice work!

Thanks!

@sanika1201 sanika1201 closed this Dec 20, 2019
@sanika1201 sanika1201 reopened this Dec 20, 2019
@sanika1201
Author

> the notebook itself should be part of this PR, just FYI

@bdpedigo, I think the other commit got added to this pull request instead of my notebook. Should I just make another PR and link this PR there so that the comments are not lost?
