Use categorical dtype for categorical features #603

Merged
merged 1 commit into from
Oct 15, 2023

Conversation

probberechts
Member

The categorical 'actiontype', 'result', 'bodypart' and 'bodypart_detailed' features now use the pandas categorical dtype ("category"). This allows machine learning frameworks such as xgboost to recognize these features as categorical automatically and handle them accordingly.

The feature values are now strings (e.g., "pass") instead of integer ids. The columns for these features were also renamed as follows:
type_id -> actiontype
result_id -> result
bodypart_id -> bodypart
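
To make the change concrete, here is a minimal sketch of what the new feature layout looks like from a downstream user's point of view. It uses plain pandas only. The id-to-name lookups, the sample data, and the `to_categorical_features` helper are hypothetical and only for illustration; they are not the code in socceraction/vaep/features.py.

```python
import pandas as pd

# Hypothetical feature frame in the old layout: integer ids per action.
old_features = pd.DataFrame(
    {"type_id": [0, 1, 0], "result_id": [1, 0, 1], "bodypart_id": [0, 0, 1]}
)

# Illustrative id-to-name lookups (assumption: the real tables are
# defined by the library, not here).
ACTIONTYPES = {0: "pass", 1: "cross"}
RESULTS = {0: "fail", 1: "success"}
BODYPARTS = {0: "foot", 1: "head"}

# Old column name -> (new column name, lookup), following the renames above.
RENAMES = {
    "type_id": ("actiontype", ACTIONTYPES),
    "result_id": ("result", RESULTS),
    "bodypart_id": ("bodypart", BODYPARTS),
}


def to_categorical_features(df: pd.DataFrame) -> pd.DataFrame:
    """Map integer ids to string labels stored with the 'category' dtype."""
    out = pd.DataFrame(index=df.index)
    for old_col, (new_col, names) in RENAMES.items():
        # Fixing the category list keeps the dtype stable even when a
        # label does not occur in this particular frame.
        out[new_col] = pd.Categorical(
            df[old_col].map(names), categories=list(names.values())
        )
    return out


new_features = to_categorical_features(old_features)
print(new_features.dtypes)
```

Downstream code that selected columns by the old names (for example `type_id`) or compared against integer ids needs to switch to the new names and string labels. Note also that in xgboost, categorical handling is opt-in through the `enable_categorical` parameter, so the dtype alone is not enough unless that option is set.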

@probberechts probberechts added the breaking Breaking Changes label Oct 10, 2023
codecov-commenter commented Oct 10, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (81cc37a) 82.83% compared to head (8b1df4c) 82.87%.

Additional details and impacted files
@@               Coverage Diff                @@
##           release/v1.5     #603      +/-   ##
================================================
+ Coverage         82.83%   82.87%   +0.03%     
================================================
  Files                47       47              
  Lines              3425     3433       +8     
  Branches            566      566              
================================================
+ Hits               2837     2845       +8     
  Misses              493      493              
  Partials             95       95              
Files Coverage Δ
socceraction/vaep/features.py 94.76% <100.00%> (+0.20%) ⬆️


@probberechts probberechts merged commit b45fd57 into release/v1.5 Oct 15, 2023
20 checks passed
@probberechts probberechts deleted the feat/categorical-features branch December 29, 2023 23:40
2 participants