Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Un-pandas GroupedTimeSeriesSplit #605

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

FBruzzesi
Copy link
Collaborator

Description

In #604 there was a tiny hint at this 😂

In this PR I tried to remove the use of pandas from GroupedTimeSeriesSplit by moving on numpy backend. The only method that still needs a "dataframe"-like object is .summary() (which it's even a nice to have).

In a way this is an alternative way to achieve #597 when a dataframe is just a nice abstraction but not mandatory. The only function that has to work to proceed forward and run .split() method is indexable (from sklearn.utils.validation).

for i in range(self.n_splits):
yield np.where(groups == i)[0], np.where(groups == i + 1)[0]
yield group_indices[i], group_indices[i + 1]

def _calc_first_and_last_split_index(self, X=None, y=None, groups=None):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a way, the changes in _calc_first_and_last_split_index are the only hard core difference

@FBruzzesi
Copy link
Collaborator Author

Following all the effort put into adopting Narwhals, it would be nice to revamp this as well. I will close the PR and come back to the topic!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants