[FEATURE] Narwhals migration for dataframe-agnostic codebase #658
Comments
I'm on it
Thanks for making this tracker! I think grouping shouldn't be an issue; there's an example of that here:
https://narwhals-dev.github.io/narwhals/basics/dataframe/#example-2-group-by-and-mean
If I remember correctly, …
Before this (nice) tracker was opened, I was taking a look at using Narwhals on … Also, I'm trying to run the tests, but I'm having issues installing what's needed. Specifically, …
Hey @DeaMariaLeon, thanks for the ping. I forgot to add … Regarding cvxpy, I tried a quick search but couldn't find related material. Maybe the osqp build-from-source guide could help. Keep me posted on that 😇
One thing about that FairClassifier: there may be a solid reason to consider porting it over to fairlearn at some point. The thinking here is that while scikit-lego is a cool place for somewhat experimental and fun components, fairness might be a more serious topic, so fairlearn might be a better home for our fairness tools. No formal decision has ever been made regarding this, but it was something that was always in the back of people's minds.
I'll leave FairClassifier for later then. What I noticed is that … Regarding cvxpy, thanks for the help @FBruzzesi. Everything is installed now... But when running …
Maybe we should write down which function we are working on, so we don't overlap? I would like to work on …
Scikit-learn has a very similar function to … Is …
Some of those tools were added 5-6 years ago, back when sklearn did not support those features. Scikit-learn is a lot further along now, but on the scikit-lego side of things we've kept them around since we didn't really see a reason to remove them. Since we're interested in giving Narwhals a spin here, I'd argue it'd be good to keep them, if only because the implementation exercise will also demonstrate that Narwhals can be used in other projects.
Narwhals is not ready to easily work for …
Been looking into TimeGapSplit:
However, other than the index, …
@DeaMariaLeon sure, no problem for me! Let me know if I can help on the Narwhals side to make the missing bits available.
This is a good assumption in my opinion (but I am biased, as I do the same in …
@MarcoGorelli I may have made a commit bypassing review by accident 🙈. Could you take a look at it when you have the time? It should make …
Nice, looks good! Looks like it preserves the existing logic. This has been quite a group effort, and much progress has been made: at what point do you think a release might be warranted? I'm presenting Narwhals at PyCon Italia on the 23rd of May (in 12 days), and it would be pretty cool if I could give a live demo of scikit-lego 😎 I understand if you'd rather wait until there's a bit more done, though. No pressure!
@MarcoGorelli that would be grand! Let's try to make a release within this week (or next weekend at the latest?!). Personally, I would love to be able to replace …
@MarcoGorelli the stuff that's already been done seems sufficient to me as a "demo". It's mainly to report that folks are indeed working on the integration and that so far it seems to work.
Totally! I just meant, it would be nice to be able to demo this result:
In [15]: %timeit add_lags(df, ['a', 'b', 'c'], [1, 2, 3, 4, 5])
4.35 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [16]: %timeit pl.from_pandas(add_lags(df.to_pandas(), ['a', 'b', 'c'], [1, 2, 3, 4, 5]))
140 ms ± 1.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [17]: type(df)
Out[17]: polars.dataframe.frame.DataFrame
and say something like "This is why 'just convert to pandas' isn't a satisfying solution. Native Polars support brings your library to another level: you can use this today, go out and …
Sounds great! Given the pace at which things are moving, it may be realistic to have the whole thing done by then. If, in a couple of places, non-pandas input is converted to pandas under the hood (like the grouped predictor in #667), then TBH I think that's fine as a temporary solution. And I don't think it would stay temporary for too long 🚀
Been thinking about this one. I think what would be good to aim for would be:
One suggestion I have is:
Like this, scikit-lego keeps the same API for pandas and non-pandas users. For pandas, it keeps using … An alternative would be to let users pass in Narwhals dtypes, but I'm not sure that end users should need to know about Narwhals 🤔
Cool, thanks! Granted, I'm slightly biased, as I'm quite keen to present this as "you can try this today!" at the conference, but to be honest I don't think GroupedPredictor should be a blocker. We can definitely solve it, and I don't see any theoretical reason we can't; it'll just take a little longer.
Closed by #671
Just so folks know, it's all live now. Might make sense to see if we can collaborate on making a splash with an announcement?
Yup! The blog post is now live: https://labs.quansight.org/blog/scikit-lego-narwhals
Some posts I made: …
Description
Creating this issue to keep track of which classes/functions could benefit from adopting Narwhals.
preprocessing.ColumnDropper
preprocessing.ColumnSelector
preprocessing.PandasTypeSelector: TypeSelector and deprecate this one
common.TrainOnlyTransformerMixin: .to_numpy() and hash the array data?
model_selection.TimeGapSplit
model_selection.GroupedTimeSeriesSplit
projections.InformationFilter
meta.RegressionOutlierDetector
meta.hierarchical_predictor.py
meta.grouped_transformer.py
meta.grouped_predictor.py
linear_models._FairClassifier
pandas_utils.py
datasets.py: read_csv function

Personally, I would wait until at least the entire migration of preprocessing.pandastransformers.py is done before bumping to v0.9.0.
cc @MarcoGorelli @anopsy
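The TrainOnlyTransformerMixin idea ("call .to_numpy() and hash the array data") can be sketched as follows. This is a hypothetical, standard-library-only illustration: the rows are assumed to already be a plain 2D numeric table (standing in for the output of `.to_numpy()`), packing them with `array("d", ...)` mimics `ndarray.tobytes()`, and the class and method names are illustrative rather than scikit-lego's API. The shape is hashed alongside the values so that reshaped data does not collide.

```python
import hashlib
from array import array


def hash_array_data(rows):
    """Hash a 2D table of numbers given as a sequence of equal-length rows."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    # Pack values as C doubles: a stand-in for `X.to_numpy().tobytes()`.
    values = array("d", (float(v) for row in rows for v in row))
    digest = hashlib.sha256(f"{n_rows}x{n_cols}".encode())
    digest.update(values.tobytes())
    return digest.hexdigest()


class TrainOnlyTransformerSketch:
    """Hypothetical: apply a transformation only to the exact data seen in `fit`."""

    def fit(self, X, y=None):
        self.train_hash_ = hash_array_data(X)
        return self

    def transform(self, X):
        if hash_array_data(X) == self.train_hash_:
            return self.transform_train(X)
        return self.transform_test(X)

    def transform_train(self, X):
        # Hypothetical train-only step: shift every value by 1.
        return [[float(v) + 1.0 for v in row] for row in X]

    def transform_test(self, X):
        # Unseen data passes through unchanged (converted to floats).
        return [[float(v) for v in row] for row in X]
```

Hashing the raw values means the check is independent of the dataframe library, which is presumably why converting to a NumPy array first was suggested.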
(*) Regarding pandas_utils: changing log_step to Narwhals is fairly easy (~4 lines of code). However, since this decorator is supposed to work with any function that operates on pandas, doing so would limit its functionality. It could be reasonable to add a second decorator which, although restricted to Narwhals methods, can interoperate with all Narwhals-compatible dataframes.
Legend
✅ Done
🚧 WIP
🔲 Not Started
🚫 Won't do
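On the log_step point in the pandas_utils note: one way to keep a logging decorator usable for any function is to rely only on duck typing. The sketch below is a swapped-in approach, not the Narwhals variant discussed there and not scikit-lego's log_step. The name `log_step_sketch` and its `print_fn` parameter are hypothetical. It reports the shape only when the result exposes a `.shape` attribute, which pandas, Polars and NumPy objects all do.

```python
import functools
import time


def log_step_sketch(func=None, *, print_fn=print):
    """Hypothetical log_step-style decorator that only relies on duck typing."""

    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = f(*args, **kwargs)
            elapsed = time.perf_counter() - start
            # Works for any return type; shape is None if the object has none.
            shape = getattr(result, "shape", None)
            print_fn(f"[{f.__name__}] shape={shape} time={elapsed:.4f}s")
            return result

        return wrapper

    # Support both @log_step_sketch and @log_step_sketch(print_fn=...).
    if func is None:
        return decorator
    return decorator(func)
```

The trade-off mirrors the note above: duck typing keeps the decorator universal but can only report what every backend has in common, whereas a Narwhals-based version could log richer information for the libraries Narwhals supports.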