Composite option, dataString compression, and correct splits percentage #40

pjaselin · 2021-12-27T18:32:45Z

Hi @topepo! I've been working on a port of your code to Python (I believe Kirk mentioned that) over here: https://github.com/pjaselin/Cubist. Thank you so much for all the work you and your colleagues have done on this!

Some improvements/fixes I've made here:

Composite option: a single parameter lets you choose whether to use instance-based correction or let Cubist decide. I think this is helpful because instance-based correction adds a little opacity to the prediction process.
Moved number of nearest neighbors to cubistControl: I understand your intent was to set the number of neighbors at predict time, since that's where it comes into use, but I wonder if it makes more sense as part of training, since it is used in model evaluation.
dataString compression: the dataString is now compressed, so the model has a smaller memory footprint when instance-based correction is used.
Correct splits percentage: I noticed that you had hardcoded "<=" at line 212 of Cubist/R/cubist.R (as of commit 548ccd7):

```r
sum(x[, as.character(splits$variable[i])] <= splits$value[i]) / nrow(x)
```

so I came up with a way to get the right comparison operator from the model. (This is probably the one definite fix here.)
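A minimal Python sketch of the idea (not the PR's actual code): keep the operator alongside each split and look up the matching comparison instead of always applying `<=`. The names `OPS` and `split_percentage` are hypothetical, and only numeric splits are handled; categorical splits would need a set-membership test instead.

```python
import operator

# Comparison functions keyed by the operator string a split condition uses.
# Assumption: each split record exposes its operator (e.g. "<=" or ">"),
# rather than every split being treated as "<=".
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def split_percentage(column, op, value):
    """Fraction of rows in `column` that satisfy `row <op> value`."""
    compare = OPS[op]
    hits = sum(1 for v in column if compare(v, value))
    return hits / len(column)


x = [1, 2, 3, 4, 5]
# A rule condition such as "x > 2" covers 3 of 5 rows; hardcoding "<="
# would instead report the complementary fraction.
print(split_percentage(x, ">", 2))   # 0.6
print(split_percentage(x, "<=", 2))  # 0.4
```

The point of the lookup table is that the percentage reported for a split depends on which side of the threshold the rule actually selects.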

Let me know if you'd like to break this apart and I'd be happy to take feedback!
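The thread doesn't say which compression scheme the dataString change uses. As an illustration of the dataString compression item, this Python sketch assumes zlib plus base64 encoding, so the compressed bytes can still be stored as text in the model object; `compress_data_string` and `decompress_data_string` are hypothetical names. The string would have to be decompressed again before being used for instance-based correction at prediction time.

```python
import base64
import zlib


def compress_data_string(data_string: str) -> str:
    """Compress a text blob and return it as ASCII-safe base64 text."""
    raw = zlib.compress(data_string.encode("utf-8"), level=9)
    return base64.b64encode(raw).decode("ascii")


def decompress_data_string(packed: str) -> str:
    """Invert compress_data_string, recovering the original text."""
    return zlib.decompress(base64.b64decode(packed)).decode("utf-8")


# Toy stand-in for a training-data string (hypothetical, highly repetitive).
data = "1.5,2.0,3.1,yes\n" * 1000
packed = compress_data_string(data)
assert decompress_data_string(packed) == data
print(len(data), len(packed))
```

Base64 inflates the compressed bytes by about a third, but that cost is small next to the savings on repetitive tabular text.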


pjaselin · 2021-12-27T22:27:17Z

Also, exposing the composite option should help speed up prediction in cases like #28.
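Instance-based (composite) correction, as commonly described for Cubist, averages the outcomes of the nearest training cases, each shifted by the difference between the model's prediction for the new case and its prediction for that neighbor. The sketch below is a minimal Python illustration under stated assumptions: Euclidean distance on raw features and weights of 1/(d + 0.5), either of which may differ from Cubist's C implementation. It also shows why a model-only option can speed up prediction: with `neighbors = 0` there is no neighbor search at all. `composite_predict` and the toy `model` are hypothetical.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def composite_predict(x_new, model, train_x, train_y, neighbors):
    """Rule-model prediction, optionally corrected by nearby training cases.

    `model` is any callable mapping a feature vector to a prediction
    (a stand-in for the fitted rule-based model).
    """
    y_hat = model(x_new)
    if neighbors == 0:
        # Model-only prediction: no neighbor search, so this path is cheap.
        return y_hat
    # Brute-force search for the `neighbors` closest training rows.
    dists = [euclidean(x_new, row) for row in train_x]
    nearest = sorted(range(len(train_x)), key=lambda i: dists[i])[:neighbors]
    num = den = 0.0
    for i in nearest:
        w = 1.0 / (dists[i] + 0.5)  # assumed weighting: closer rows count more
        # Shift the neighbor's observed outcome by the model's predicted
        # difference between the new case and that neighbor.
        adjusted = train_y[i] + (y_hat - model(train_x[i]))
        num += w * adjusted
        den += w
    return num / den


def model(row):
    return 2.0 * row[0]  # toy model that under-predicts by 0.5 everywhere


train_x = [[0.0], [1.0], [2.0]]
train_y = [0.5, 2.5, 4.5]
print(composite_predict([1.5], model, train_x, train_y, neighbors=0))  # 3.0
print(composite_predict([1.5], model, train_x, train_y, neighbors=2))  # 3.5
```

In the toy data the model is off by a constant 0.5, and the neighbor correction recovers exactly that offset.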

topepo · 2022-02-04T19:33:30Z

Thanks for doing this; it's great to have an outside contribution.

I've been looking at it for a few days and I'd like to tweak the UI. It's a little awkward to have an argument that could be either logical or character. How about we:

control the composite/model-only decision using the neighbors argument (values > 0 are composite)?
control the auto/manual decision with auto = TRUE/FALSE?

I think that would get us to the same place. Also, the current specification breaks a lot of existing analyses (in books, vignettes, and so on). I'd like it to be backward compatible, and defaulting to auto = FALSE would do that.
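Under the proposed UI, two arguments replace the single logical-or-character one. A minimal Python sketch of how they could be resolved; `resolve_mode` and its return labels are hypothetical, and letting `auto=True` take precedence over `neighbors` is an assumption. The 0 to 9 range mirrors the documented range of `neighbors` in the R package's `predict.cubist()`.

```python
def resolve_mode(neighbors=0, auto=False):
    """Translate the proposed (neighbors, auto) arguments into a mode label.

    Labels are hypothetical; they stand for rule-only prediction,
    composite (instance-corrected) prediction, and "let Cubist decide".
    """
    if not isinstance(neighbors, int) or not 0 <= neighbors <= 9:
        raise ValueError("neighbors must be an integer from 0 to 9")
    if auto:
        # Assumed precedence: auto=True hands the decision to Cubist.
        return "auto"
    # neighbors > 0 requests a composite model; 0 keeps rules only.
    return "composite" if neighbors > 0 else "rules-only"


print(resolve_mode())                        # rules-only
print(resolve_mode(neighbors=5))             # composite
print(resolve_mode(neighbors=5, auto=True))  # auto
```

With both defaults (`neighbors=0`, `auto=False`), the rule-only path is what existing calls get, which is the backward-compatibility property asked for above.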


pjaselin · 2022-02-08T04:18:43Z

Hi @topepo, I really appreciate your feedback and ideas here. Also, thank you for allowing me to contribute! This is a much cleaner UI, and I think I'll modify my Python implementation to match it.
Changes made:

auto = FALSE: whether to let Cubist decide if composite models should be used
neighbors = NA: controls whether composite models are used
cv = NA: runs k-fold cross-validation (this is the last of my improvements from the Python code, if you'd like to use it here; note that Cubist returns no model in this case)
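A rough Python sketch of what k-fold cross-validation does: rows are shuffled into k folds, each fold is held out once while a model is fit on the remaining rows, and the held-out error is recorded. `kfold_indices`, `cross_validate`, and the mean-only model are hypothetical stand-ins, and the random, unstratified fold assignment is an assumption; Cubist's own cross-validation may form folds differently. As with the `cv` option described above, only the scores are returned, not a fitted model.

```python
import random


def kfold_indices(n, k, seed=0):
    """Randomly partition row indices 0..n-1 into k near-equal folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(x, y, k, fit, predict, seed=0):
    """Per-fold RMSE; no final model is returned."""
    scores = []
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        preds = predict(model, [x[i] for i in held_out])
        mse = sum((p - y[i]) ** 2 for p, i in zip(preds, held_out)) / len(held_out)
        scores.append(mse ** 0.5)
    return scores


# Toy usage with a mean-only "model" (stand-ins for a real fit/predict pair).
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(m, xs):
    return [m] * len(xs)


x = [[float(i)] for i in range(10)]
y = [float(i) for i in range(10)]
print(cross_validate(x, y, k=5, fit=fit_mean, predict=predict_mean))
```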

Also, please make sure this all passes your tests, not just mine!


pjaselin added 7 commits November 12, 2021 12:59

2174908 added committee and neighbor options
0f2eaeb changed assignment method
87257c5 switching to r studio
4ec4b2f added composite parameter, fixed splits table, and compressed datastring
3b18c4a Merge pull request #1 from pjaselin/feature/committees (added committee and neighbor options)
9914dcd returned the neighbors parameter to the predict method
41790ce Merge pull request #2 from pjaselin/feature/committees (returned the neighbors parameter to the predict method)

pjaselin added 7 commits February 7, 2022 22:31

29ed6ab working on Max's suggestions
2ca08e5 added cross validation
76d3d7b cleaned default values
0a68128 changed number of variables in function declaration
eb12477 added space around =
b7c29d7 cv works
48ebd15 Merge pull request #3 from pjaselin/ui_changes (working on Max's suggestions)

pjaselin added 5 commits February 7, 2022 23:34

4785cb7 fixed vignettes
436d2cf Merge pull request #4 from pjaselin/ui_changes (fixed vignettes)
3720f20 removed model from cv output
548ee6d Merge pull request #5 from pjaselin/ui_changes (removed model from cv output)
f0446d2 Merge branch 'topepo:master' into master
