Some decision tree branching may be unnecessary #226
Comments
I'll have a look, thanks! In the meantime, the code where this is done lives here: taxinomitis/serverless-functions/mltraining-numbers/numbers.py Lines 43 to 56 in db15bf8
@dalelane awesome, thanks! Yeah, it seems like the scikit-learn algorithm is an approximation, since for larger real-world data sets it can be intractable to compute the optimal representation (link).
I suppose to simplify things for folks it'd be possible to write a function that does that simplification on the tree that's returned, since for ML for kids models the trees wouldn't be expensive to prune after the fact. scikit-learn exposes the internals of the decision tree structure (link), but the graphviz function takes a …

Relatedly, it looks like subsequent runs are non-deterministic (link).
It might be good either to add a line about this in the explanation, or to avoid it altogether by fixing a random seed, so that folks aren't confused if they get different trees from different training runs on the same data set (e.g., within a class).
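For what it's worth, here is a minimal sketch of the "fix a random seed" idea, assuming the training code uses scikit-learn's `DecisionTreeClassifier` (its `random_state` parameter and `export_text` are real scikit-learn APIs). The toy data, the feature names, and the `train_and_describe` helper are made up for illustration and are not taken from the taxinomitis code.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: two numeric features, two classes.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = [0, 0, 1, 1, 1, 0]


def train_and_describe(seed: int) -> str:
    """Train a tree with a fixed seed and return a text dump of it."""
    clf = DecisionTreeClassifier(random_state=seed)
    clf.fit(X, y)
    # feature_names here are placeholders chosen for this sketch
    return export_text(clf, feature_names=["a", "b"])


first = train_and_describe(0)
second = train_and_describe(0)

# With the same data and the same seed, both runs build the same tree.
print(first == second)
```

Whether a fixed seed is desirable for the project is a separate question; this only shows that repeated runs on the same data can be made to agree.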
@dalelane #221 is amazing, awesome work! 👍 Thanks as always for sharing such great work in the open! ❤️
I made some limited test data just to check this out, and noticed that the explanation had some parts of the tree that didn't seem necessary since they all result in the same classification. Here's an example, with the parts highlighted that seem unnecessary:
Maybe the way the tree is constructed can lead to this? I'm not sure. I tried to see where the tree is constructed, guessing that this was done by a service, but only got as far as:
taxinomitis/src/lib/training/numbers.ts
Line 328 in db15bf8
I'm not sure if this is an actual issue, as in something isn't working as it should, or just something that might be worth adding to your great explanation up top.
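To make "unnecessary" concrete, here's a rough sketch of how such branches could be detected after training. It works on plain lists laid out like scikit-learn's `tree_.children_left` and `tree_.children_right` arrays, where `-1` marks a leaf. The `redundant_splits` function and the toy tree are hypothetical and not part of the project. This kind of branch can show up because a split may lower impurity without changing the majority class on either side.

```python
from typing import List, Set

TREE_LEAF = -1  # scikit-learn uses -1 in children_left/right to mark a leaf


def redundant_splits(children_left: List[int],
                     children_right: List[int],
                     node_class: List[int]) -> Set[int]:
    """Return ids of split nodes whose whole subtree predicts one class."""
    redundant: Set[int] = set()

    def classes_below(node: int) -> Set[int]:
        # A leaf contributes only its own predicted class.
        if children_left[node] == TREE_LEAF:
            return {node_class[node]}
        found = (classes_below(children_left[node])
                 | classes_below(children_right[node]))
        # If every leaf below agrees, this split changes no prediction.
        if len(found) == 1:
            redundant.add(node)
        return found

    classes_below(0)  # node 0 is the root
    return redundant


# Hypothetical toy tree:
#   node 0 splits into node 1 (leaf, class 0) and node 2 (split)
#   node 2 splits into nodes 3 and 4, both leaves predicting class 1
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
node_class = [0, 0, 1, 1, 1]

print(redundant_splits(children_left, children_right, node_class))
```

For a fitted single-output classifier `clf`, the inputs would presumably come from `clf.tree_.children_left`, `clf.tree_.children_right`, and `clf.tree_.value[:, 0, :].argmax(axis=1)`. Any node reported here could be drawn as a single leaf without changing predictions.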
Also, http://www.r2d3.us/visual-intro-to-machine-learning-part-1/ has some awesome visuals of decision trees that might be a good fit as a "learn more" link in the explanation up top as well.