Aggregation Bug Fixed & Support Provided for Example Datasets #12

Garen-Wang · 2021-01-27T15:30:19Z

Running the experiments directly as described in the sample raises an error, because a dict is passed as a parameter to the function agg, a usage that pandas has deprecated.

Changing the argument's type to a list solves the problem, so there is no need to pin pandas==0.25.
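
To illustrate the change, here is a minimal sketch of the list form of agg on a grouped column. The frame and the column names (key, value) are made up for the example and are not taken from this repository's code. In recent pandas releases, the dict form that renames outputs on a single grouped column is rejected, while the list form keeps working.

```python
import pandas as pd

# Toy data; "key" and "value" are hypothetical column names.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]})

# List form: one output column per aggregation function.
stats = df.groupby("key")["value"].agg(["mean", "max"])

# If custom output names are needed, rename the columns afterwards
# instead of passing a dict to agg.
stats.columns = ["value_mean", "value_max"]
```

Renaming after the aggregation keeps the call itself independent of the pandas version.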

Update, Jan 29:

When the experiments are run directly on the example datasets, feature importance and AUC are not calculated correctly.

For instance, the AUC is always 0.5, and the feature importance for the heart dataset looks like this:

   feature_name  split  gain  gain_percent  split_percent  feature_score
0           age      0   0.0           NaN            NaN            NaN
1           sex      0   0.0           NaN            NaN            NaN
2    chest-pain      0   0.0           NaN            NaN            NaN
3    bp-resting      0   0.0           NaN            NaN            NaN
4   cholesterol      0   0.0           NaN            NaN            NaN
5    bs-fasting      0   0.0           NaN            NaN            NaN
6   ecg-resting      0   0.0           NaN            NaN            NaN
7        hr-max      0   0.0           NaN            NaN            NaN
8           eia      0   0.0           NaN            NaN            NaN
9       oldpeak      0   0.0           NaN            NaN            NaN
10    k-oldpeak      0   0.0           NaN            NaN            NaN
11      vessels      0   0.0           NaN            NaN            NaN
12         thal      0   0.0           NaN            NaN            NaN

The fix is to set one of the LightGBM parameters, min_data, to a factor of the number of instances in the dataset being used. With that change, the feature importance looks normal:

   feature_name  split        gain  gain_percent  split_percent  feature_score
7        hr-max     75  173.631106     11.917672      23.291925       0.198796
4   cholesterol     48   76.677835      5.263005      14.906832       0.120137
0           age     48   70.926419      4.868240      14.906832       0.118953
2    chest-pain     14  396.282254     27.199977       4.347826       0.112035
9       oldpeak     39  125.718957      8.629084      12.111801       0.110670
12         thal     14  290.041550     19.907839       4.347826       0.090158
11      vessels     20  174.509817     11.977985       6.211180       0.079412
3    bp-resting     27   37.454598      2.570804       8.385093       0.066408
1           sex     13   35.071794      2.407254       4.037267       0.035483
6   ecg-resting     10   17.174318      1.178809       3.105590       0.025276
8           eia      8   29.409258      2.018589       2.484472       0.023447
10    k-oldpeak      6   30.023399      2.060743       1.863354       0.019226
5    bs-fasting      0    0.000000      0.000000       0.000000       0.000000
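
The min_data adjustment described above can be sketched as follows. This is only an illustration under stated assumptions: the helper scaled_min_data, the 0.05 fraction, and the row count of 300 are hypothetical and do not come from this PR. One plausible reading of the all-zero table is that a min_data value too large for a small dataset leaves no valid split, so every tree stays a single leaf and the predictions are constant.

```python
def scaled_min_data(n_rows: int, fraction: float = 0.05) -> int:
    """Hypothetical helper: derive min_data from the dataset size."""
    # Never go below 1, so very small datasets still get a valid value.
    return max(1, int(n_rows * fraction))


n_rows = 300  # illustrative row count, not taken from the PR

params = {
    "objective": "binary",
    # min_data is an alias of min_data_in_leaf in LightGBM.
    "min_data": scaled_min_data(n_rows),
}
```

A dict like this would be passed as the params argument of lightgbm.train; the value that works best for a given dataset still has to be checked empirically.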

With this change, the experiments can be extended to the other example datasets.

pull request from branch dev

Expand Scalability to Example Datasets

Garen-Wang added 9 commits January 27, 2021 23:21

bug fixed when aggregating

d78e857

add .gitignore

2215a7c

same bug fixed in function nunique & histstat

bc2e42c

delete .gitignore

07e0865

maintain original format

ac06fa7

Merge pull request #1 from Garen-Wang/dev

7459d47

pull request from branch dev

removed unused package name

979b640

add parameter min_data to lgb_model_train

df3e69a

Merge pull request #2 from Garen-Wang/dev

eb4da96

Expand Scalability to Example Datasets

Garen-Wang changed the title ~~bug fixed when aggregating~~ Aggregation Bug Fixed & Support Provided for Example Datasets Jan 29, 2021

requirements error fixed

625c0ec
