[python-package] simplify processing of pandas data #6066
Thanks a lot for this! I agree that function has gotten quite complicated, and it's great that you've found the time to simplify it. Sorry I haven't been able to review this yet; I'll provide a review in the next few days.
Thanks @jmoralez, no rush! I have a few other PRs like this planned as well, to simplify the flow of the code and make it easier for contributors.
There's something wrong with this PR: I can't approve it. I've tried from the web, my phone, and the gh CLI, and I get the same error on all of them.
I'll try again later today if that's OK with you.
Yep, no problem! GitHub is experiencing an outage in some services today.
Thanks @jmoralez!
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Contributes to #3756.
Contributes to #3867.
The Python package has an internal utility function, `_data_from_pandas()`, which takes in `pandas` DataFrames and extracts from them:

- ... `numpy` format

That function is quite complicated (in my opinion), and some logic hidden in there doesn't actually relate to `pandas` DataFrames.

This PR proposes the following to simplify the Python package's code:
- ... `_data_from_pandas` on `pandas` DataFrames
- ... `feature_name` and `categorical_feature` could be set to `None`, `"auto"`, or (`categorical_feature` only) a list of integer column indices

This makes it easier for humans reading the code and for type-checking tools like `mypy` to understand the flow of data through the package.

Notes for Reviewers
This will be easier to review if you enable the "Hide whitespace" option in the GitHub diff view.
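To make the proposal above more concrete, here is a minimal sketch of a pandas-only conversion helper that resolves `feature_name` and `categorical_feature` into explicit lists before converting the data to `numpy`. It is hypothetical: `data_from_pandas_sketch()` is not LightGBM's `_data_from_pandas()`, and its rules are assumptions made for illustration. These rules are: `None`/`"auto"` feature names mean "use the column names"; `"auto"` categorical features mean "unordered pandas categorical columns"; integer entries are column indices; and categories are encoded as integer codes with missing values as NaN.

```python
from typing import List, Optional, Tuple, Union

import numpy as np
import pandas as pd


def data_from_pandas_sketch(
    data: pd.DataFrame,
    feature_name: Optional[Union[str, List[str]]] = "auto",
    categorical_feature: Optional[Union[str, List[Union[int, str]]]] = "auto",
) -> Tuple[np.ndarray, List[str], List[str]]:
    """Hypothetical pandas-only helper (not LightGBM's actual _data_from_pandas()).

    Returns the data as a float64 numpy array, the resolved feature names,
    and the resolved categorical feature names.
    """
    # Only pandas DataFrames are accepted; other input types are assumed
    # to be handled by separate code paths.
    if not isinstance(data, pd.DataFrame):
        raise TypeError(f"expected a pandas DataFrame, got {type(data).__name__}")

    # Resolve feature_name into an explicit list of strings.
    # Assumption: None and "auto" both mean "use the DataFrame's column names".
    if feature_name is None or feature_name == "auto":
        names = [str(col) for col in data.columns]
    else:
        names = list(feature_name)
        if len(names) != data.shape[1]:
            raise ValueError("feature_name must have one entry per column")

    # Resolve categorical_feature into an explicit list of feature names.
    if categorical_feature is None:
        cat_names: List[str] = []
    elif categorical_feature == "auto":
        # Assumption: "auto" selects unordered pandas categorical columns.
        cat_names = [
            names[i]
            for i, dtype in enumerate(data.dtypes)
            if isinstance(dtype, pd.CategoricalDtype) and not dtype.ordered
        ]
    else:
        # Integer entries are column indices; string entries are already names.
        cat_names = [names[c] if isinstance(c, int) else c for c in categorical_feature]

    # Replace every categorical column with its integer codes, mapping the
    # code -1 (missing value) to NaN, then convert the whole frame at once.
    converted = data.copy()
    for col in converted.columns:
        if isinstance(converted[col].dtype, pd.CategoricalDtype):
            codes = converted[col].cat.codes.astype(np.float64)
            converted[col] = codes.where(codes >= 0, np.nan)
    return converted.to_numpy(dtype=np.float64), names, cat_names
```

With this layout, the conversion step only ever sees concrete lists of names. It never has to branch on `None`, `"auto"`, or integer indices. That is the kind of narrowing that helps both human readers and a type checker such as `mypy` follow the data.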