-
-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New example glm ordinal features #717
New example glm ordinal features #717
Conversation
+ already complete and created in another env
+ cited inside Notebook + adjusted errata
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@ maintainer team Hello! It's been years since I submitted an examples notebook. This is the first and simplest of a handful of new notebooks that I'd like to add, and am starting simple and working upwards. I've tried to add all the various formatting bits and pieces that seem to make the current standard, so I hope any changes required will be minor, but I'll happily edit as needed! I've tested this running locally and in Google Collab |
WIP need to create a simple env with jupyter-black in pre-commit so we have the same validation. My env uses more and different validation (!), will return to this in a couple of hours after I get back from something |
…: Too many commas in 'B{\\"u}rkner, P., & Charpentier, E.'
+ ran pre-commit in full + autocreated *.myst file
Alrighty - ready for review! |
View / edit / reply to this conversation on ReviewNB fonnesbeck commented on 2024-10-28T03:00:29Z Probably better as paragraphs rather than bullet points (except where there are actual lists). For consistency with other notebooks, as well as readability.
Key issue with ordinal variables is that the distance between categories are unknown and potentially unequal (e.g. the difference between better and good vs that between good and very good). jonsedar commented on 2024-10-28T05:30:51Z Thanks - I've updated to be more clear :) |
View / edit / reply to this conversation on ReviewNB fonnesbeck commented on 2024-10-28T03:00:30Z Assume the target_accept comment is not necessary since the model runs fine with 0.8? jonsedar commented on 2024-10-28T04:57:44Z Yep fair enough - I was drawing attention to it for when I change it in Model B, but we can just have the implict default |
View / edit / reply to this conversation on ReviewNB fonnesbeck commented on 2024-10-28T03:00:30Z 🔥 jonsedar commented on 2024-10-28T04:58:21Z I couldn't resist ;) |
View / edit / reply to this conversation on ReviewNB fonnesbeck commented on 2024-10-28T03:00:31Z A few typos here. jonsedar commented on 2024-10-28T05:00:22Z My tpying skills aren't what they were! |
View / edit / reply to this conversation on ReviewNB fonnesbeck commented on 2024-10-28T03:00:32Z Cells above can be removed, no? jonsedar commented on 2024-10-28T07:57:47Z Stylistic thing.. I quite like to preserve the workflow if possible to explicitly show that data prep steps have been considered but not needed. I've added a note above in section 0 (NOTE some boilerplate steps are included but ~~struck through~~ with and explanatory comment e.g. "Not needed in this simple example". This is to preserve the logical workflow which is more generally useful)
Happy to remove completely if you prefer |
View / edit / reply to this conversation on ReviewNB fonnesbeck commented on 2024-10-28T03:00:33Z A bit more exposition on the encoding of categoricals may be helpful for those unfamiliar. jonsedar commented on 2024-10-28T07:54:04Z Sure, have added more description and dicussion of the data |
View / edit / reply to this conversation on ReviewNB fonnesbeck commented on 2024-10-28T03:00:33Z A bit more exposition about the target variable would be helpful, either here or in the introduction. I'm not really sure what I'm looking at. jonsedar commented on 2024-10-28T07:53:55Z Sure, have added more description and dicussion of the data |
Well caught! View entire conversation on ReviewNB |
Thanks - I've updated to be more clear :) View entire conversation on ReviewNB |
Thanks Chris - much appreciated! Nice to have all the software admin tools surrounding the process too - makes life much better.
Your request to have more detail on the data encouraged me to properly read that Burkner paper and see that the ordinal scales are supposed to be [0-4] on both features (d450, d455), but on d450 we only observe values [0-3]. So there's a missing data problem too, which makes this example more rich and reinforces the need for ordinal handling (not numeric).
That paper actually doesn't handle the missing value, so I'll take the opportunity to improve upon it in our notebook here |
Sure, added a title with a little explantion to encourage folk to dig more View entire conversation on ReviewNB |
Indeed! I've had this boilerplate for a while and can't remember why I was using it over and above factorize :D I've changed it accordingly and suffered no impact here so maybe I'll have to change my boilerplate too View entire conversation on ReviewNB |
Poor naming on my part, I've tried to clarify View entire conversation on ReviewNB |
Sure, have added more description and dicussion of the data View entire conversation on ReviewNB |
Sure, have added more description and dicussion of the data View entire conversation on ReviewNB |
Stylistic thing.. I quite like to preserve the workflow if possible to explicitly show that data prep steps have been considered but not needed. I've added a note above in section 0 (NOTE some boilerplate steps are included but ~~struck through~~ with and explanatory comment e.g. "Not needed in this simple example". This is to preserve the logical workflow which is more generally useful)
Happy to remove completely if you prefer View entire conversation on ReviewNB |
+ new work to positively include a coeff in mdlb for d450 = c4
Huh? Cell 11 does come after 4... why is that a problem?
|
Aha, the checks are defeated! Over to you @fonnesbeck :) |
+ fixed a couple of typos
This is ready to go, how do we get this merged? @fonnesbeck ? Cheers :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Sorry this took so long to get back to! |
No worries - thanks for the merge! |
* + added new notebook GLM-ordinal-features.ipynb + already complete and created in another env * Created using Colab * minor updates for latest seaborn * Tweaks for collab and included authors * added header * + added new reference article and online + cited inside Notebook + adjusted errata * + ran black again... * + rep Burkner as B{\"u}rkner * + maybe fixed readthedocs complaint pybtex.database.InvalidNameString: Too many commas in 'B{\\"u}rkner, P., & Charpentier, E.' * + ran black-jupyter, let's see * + another run of black-jupyter * + installed local pymc_examples env + ran pre-commit in full + autocreated *.myst file * + update tags * + possibly persuaded the precommits to all pass * + rerun on colab to confirm all good post new pre-commit process * + okay, reran in colab again... lets see if this passes * + added (again) the myst.md * + minor updates post review + new work to positively include a coeff in mdlb for d450 = c4 * + reran precommit and readded myst.md * + rerun localyl e2e * + added myst.md again * + reran again to ensure cell execution order even for markdown cells * + reran again again to ensure order * + minor update: forced addtional level c4 into d450 categorical feature + fixed a couple of typos * + changed rating to intermediate
* + added new notebook GLM-ordinal-features.ipynb + already complete and created in another env * Created using Colab * minor updates for latest seaborn * Tweaks for collab and included authors * added header * + added new reference article and online + cited inside Notebook + adjusted errata * + ran black again... * + rep Burkner as B{\"u}rkner * + maybe fixed readthedocs complaint pybtex.database.InvalidNameString: Too many commas in 'B{\\"u}rkner, P., & Charpentier, E.' * + ran black-jupyter, let's see * + another run of black-jupyter * + installed local pymc_examples env + ran pre-commit in full + autocreated *.myst file * + update tags * + possibly persuaded the precommits to all pass * + rerun on colab to confirm all good post new pre-commit process * + okay, reran in colab again... lets see if this passes * + added (again) the myst.md * + minor updates post review + new work to positively include a coeff in mdlb for d450 = c4 * + reran precommit and readded myst.md * + rerun localyl e2e * + added myst.md again * + reran again to ensure cell execution order even for markdown cells * + reran again again to ensure order * + minor update: forced addtional level c4 into d450 categorical feature + fixed a couple of typos * + changed rating to intermediate
* + added new notebook GLM-ordinal-features.ipynb + already complete and created in another env * Created using Colab * minor updates for latest seaborn * Tweaks for collab and included authors * added header * + added new reference article and online + cited inside Notebook + adjusted errata * + ran black again... * + rep Burkner as B{\"u}rkner * + maybe fixed readthedocs complaint pybtex.database.InvalidNameString: Too many commas in 'B{\\"u}rkner, P., & Charpentier, E.' * + ran black-jupyter, let's see * + another run of black-jupyter * + installed local pymc_examples env + ran pre-commit in full + autocreated *.myst file * + update tags * + possibly persuaded the precommits to all pass * + rerun on colab to confirm all good post new pre-commit process * + okay, reran in colab again... lets see if this passes * + added (again) the myst.md * + minor updates post review + new work to positively include a coeff in mdlb for d450 = c4 * + reran precommit and readded myst.md * + rerun localyl e2e * + added myst.md again * + reran again to ensure cell execution order even for markdown cells * + reran again again to ensure order * + minor update: forced addtional level c4 into d450 categorical feature + fixed a couple of typos * + changed rating to intermediate
New example notebook: GLM-ordinal-features
The current example notebook for ordinal regression
GLM-ordinal-regression.ipynb
shows how to handle ordinals in the target (endogenous) feature, but not in the predictor (exogenous) feature(s).
I wanted to treat an ordinal feature in a model myself and dug up an old example by Austin Rochford, itself based on a 2018 paper by Burkner & Charpenitier. The example was good, but I thought I'd take the opportunity
to build out a full workflow, add more explanations and include in pymc-examples.
I've a few other related ones in the pipeline, but starting simple.
📚 Documentation preview 📚: https://pymc-examples--717.org.readthedocs.build/en/717/