Optimised tree growing method #74

ArashBayatDev · 2018-04-10T04:32:41Z

I recommend the following improvements to VariantSpark Random Forest importance analysis.

1. Compute the importance scores after every 1000 trees are built and write them to a file.
2. Automatically identify when enough trees have been built. If the first suggestion is implemented, the importance scores at each step (every 1000 trees) can be compared with those computed at the previous step. If little has changed, we can stop building more trees.
3. Periodically (every -rbs trees) dump the models (built trees) to disk, and allow previously built models to be integrated into a new run. If the process crashes halfway, the models produced so far can be reused in the next run.
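A minimal Python sketch of the first two suggestions, assuming a hypothetical `grow_batch(n)` callback that adds `n` trees to the forest and returns the current importance score per variable. The stopping criterion (relative L2 change between consecutive snapshots below a 1% tolerance) and the `importance_<n_trees>.json` file names are illustrative assumptions, not existing VariantSpark behaviour; a rank-based comparison such as overlap of the top-k variables could be substituted.

```python
import json
import math
import os
from typing import Callable, Dict, List, Optional, Tuple

Importance = Dict[str, float]  # variable name -> importance score


def relative_change(prev: Importance, curr: Importance) -> float:
    """L2 norm of the score difference divided by the L2 norm of the previous scores."""
    keys = set(prev) | set(curr)
    diff = math.sqrt(sum((curr.get(k, 0.0) - prev.get(k, 0.0)) ** 2 for k in keys))
    norm = math.sqrt(sum(v * v for v in prev.values()))
    return diff / norm if norm > 0 else math.inf


def grow_until_stable(
    grow_batch: Callable[[int], Importance],  # hypothetical: adds n trees, returns scores
    batch_size: int = 1000,
    tol: float = 0.01,
    max_batches: int = 100,
    out_dir: Optional[str] = None,
) -> Tuple[int, Importance, List[float]]:
    """Grow trees batch by batch until the importance scores stop changing."""
    if max_batches < 1:
        raise ValueError("max_batches must be at least 1")
    prev: Optional[Importance] = None
    history: List[float] = []
    for b in range(1, max_batches + 1):
        curr = grow_batch(batch_size)
        n_trees = b * batch_size
        # Suggestion 1: persist the scores after every batch (hypothetical file layout).
        if out_dir is not None:
            with open(os.path.join(out_dir, f"importance_{n_trees}.json"), "w") as fh:
                json.dump(curr, fh)
        # Suggestion 2: stop once the scores have stabilised.
        if prev is not None:
            change = relative_change(prev, curr)
            history.append(change)
            if change < tol:
                return n_trees, curr, history
        prev = curr
    # Tolerance never reached: return the last snapshot.
    return max_batches * batch_size, prev, history
```

`max_batches` caps the run in case the scores never settle; the returned `history` of changes can also be inspected to choose a sensible tolerance for a given dataset.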

Yatish0833 · 2019-05-31T02:13:03Z

Working towards updating the VariantSpark code locally so that it periodically (every -rbs trees) dumps the models (built trees) to disk, in order to create a test dataset that can be used to test the hypothesis above. Test results on this dataset will then be posted in this thread so that everyone involved can decide whether or not to move forward with this feature.
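A minimal Python sketch of the periodic dump and resume step, assuming a hypothetical `build_tree(i)` function and trees that can be pickled. In VariantSpark itself the trees are Scala objects built on Spark, so the real serialisation would differ; the `batch_NNNNN.pkl` file layout is also an assumption. `batch_size` stands in for the -rbs value.

```python
import os
import pickle
from typing import Any, Callable, List, Tuple


def load_checkpoints(ckpt_dir: str) -> Tuple[List[Any], int]:
    """Load all previously dumped batches of trees, in batch order."""
    trees: List[Any] = []
    n_batches = 0
    if not os.path.isdir(ckpt_dir):
        return trees, n_batches
    for name in sorted(os.listdir(ckpt_dir)):
        # Skip stray ".tmp" files left behind by a crash during a write.
        if name.startswith("batch_") and name.endswith(".pkl"):
            with open(os.path.join(ckpt_dir, name), "rb") as fh:
                trees.extend(pickle.load(fh))
            n_batches += 1
    return trees, n_batches


def grow_with_checkpoints(
    build_tree: Callable[[int], Any],  # hypothetical: builds tree number i
    n_trees: int,
    batch_size: int,                   # plays the role of the -rbs value
    ckpt_dir: str,
) -> List[Any]:
    """Grow n_trees trees, dumping each batch to disk and resuming from earlier dumps."""
    os.makedirs(ckpt_dir, exist_ok=True)
    # Resume from whatever an earlier (possibly crashed) run managed to save.
    trees, batch_idx = load_checkpoints(ckpt_dir)
    while len(trees) < n_trees:
        n = min(batch_size, n_trees - len(trees))
        batch = [build_tree(len(trees) + i) for i in range(n)]
        path = os.path.join(ckpt_dir, f"batch_{batch_idx:05d}.pkl")
        tmp = path + ".tmp"
        with open(tmp, "wb") as fh:
            pickle.dump(batch, fh)
        # Write then rename, so an interrupted dump never leaves a truncated .pkl.
        os.replace(tmp, path)
        trees.extend(batch)
        batch_idx += 1
    return trees
```

Because the next batch index comes from the files already on disk, rerunning with a larger `n_trees` simply appends new batches to the saved ones, which also covers integrating previously built models into a new run.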

BauerLab changed the title from "Improvement" to "Optimised tree growing method" on Jan 25, 2019.
