Speed Improvement: polars backends #77
Comments
I will check what's possible there |
Adding this as a running checklist to track what has been completed and by whom. If you wish to contribute to this issue (there are plenty of functions to work on), please @JustinKurland here and I will add you to the function you are working on. When it is completed, make sure it is listed here so that ongoing efforts are tracked and you get some credit for helping out!
Polars Backend Functions
Wrangling Pandas Time Series DataFrames
Anomaly Detection
Adding Features to Time Series DataFrames (Augmenting)
TS Features
Finance Module
Time Series for Pandas Series
Date Utilities
Extra Pandas Helpers
13 Datasets
|
Will do augment_fourier (discussed with Justin Kurland) :) |
Awesome that's much appreciated! |
I will do ts_summary at the same time because I need it. I updated checks.py like this (not yet pushed):
It seems more Pythonic to me, if you agree with it. I ran the tests and it is working :) I am writing a polars version of augment_fourier; then, if possible, I plan to merge the polars version with augment_fourier_v2: convert pandas dtypes to polars dtypes, do the computations, then convert back. Is that what you intended? |
As long as it works as intended, I'm OK with it. Thanks! |
@GTimothee yes, that is correct: pandas -> polars -> pandas, with the conversions occurring inside the function. There may be some functions at the moment that accept polars DataFrames. Do not use that pattern; those have to be refactored to accept only pandas. |
Understood :) Sorry, I am a little short on time, but I am on it! |
…d-fill-internal Update future.py forward fill internal
I think we can check off augment_fourier, no? |
Ok sounds good. I plan to release 0.2.0 tomorrow. Let me know if there is anything I can do to help. |
Actually, the main problem I have is with checking my results. I am trying %timeit in a notebook cell, but every time I run it, it gives me different results. There is also a difference between running my experiments notebook locally and in Colab: not the same output. I am not sure what I am doing wrong. But I guess my experimental function is not good enough anyway because, in general, even with the variations, the current implementation is faster. I had an implementation leveraging itertools.permutations that was faster, but I found that it does not give good results. I switched to itertools.product and now it is slower :/ |
In this function: https://github.com/business-science/pytimetk/blob/master/src/pytimetk/core/ts_summary.py#L398 why is there the comment # "America/New_York"? |
I think that's just an example of the time zone |
I was wondering if you were expecting this particular time zone |
No I believe it can be different time zones. That comment is just an example. |
There are many reasons why running something, even just locally, could produce different timings; I would not expect them to be identical. In fact, you may see the time go down as a result of caching. Do not get thrown off by this. Relatedly, I would not expect your results in Colab to be the same. I also do not know what your setup is in Colab, but you can choose to take advantage of GPUs. You can check disk information using a command like
Maybe we can connect. I am not sure why you would be using |
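For anyone following along, here is a minimal sketch of one way to cope with the run-to-run timing variation discussed above, using only the standard-library timeit module instead of the %timeit notebook magic. The function sum_of_squares is a hypothetical stand-in workload, not anything from pytimetk. The idea is to repeat the measurement several times and compare the minimum, since the slower runs mostly reflect interference from other processes, not the code itself.

```python
import timeit


def sum_of_squares(n: int) -> int:
    """Hypothetical stand-in for the function being benchmarked."""
    return sum(i * i for i in range(n))


NUMBER = 100  # calls per measurement
REPEAT = 5    # independent measurements

# Each entry in `runs` is the total time (seconds) for NUMBER calls.
runs = timeit.repeat(lambda: sum_of_squares(10_000), repeat=REPEAT, number=NUMBER)

# The minimum is the least noisy estimate; the spread shows how much
# the measurements vary between repeats on this machine.
best = min(runs) / NUMBER
spread = (max(runs) - min(runs)) / NUMBER

print(f"best per call: {best:.3e} s, spread across repeats: {spread:.3e} s")
```

Comparing two implementations on their best-of-N time, on the same machine and in the same session, is usually more stable than comparing single runs. Numbers from a local machine and from Colab are not directly comparable.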
Yes I will submit my experiments to you asap to get some feedback :) I was using itertools to generate permutations of order x period. It is how I would replace the loops. |
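As a rough illustration of the itertools point, the sketch below compares a nested loop over order x period with itertools.product and itertools.permutations. The sample values in orders and periods are made up, and the way permutations is called here is only an assumption about how it might be applied; the original experiment is not shown in this thread. product over the two lists reproduces the nested loop exactly, while permutations operates on a single iterable and therefore yields different pairs.

```python
from itertools import permutations, product

# Hypothetical sample values, not taken from pytimetk.
orders = [1, 2]
periods = [7, 30]

# Reference behaviour: the nested loops we want to replace.
nested = [(order, period) for order in orders for period in periods]

# product() over the two iterables yields the same pairs in the same order.
via_product = list(product(orders, periods))

# permutations() takes ONE iterable. Fed the concatenated values (an assumed
# usage), it also pairs orders with orders and periods with periods, and
# emits each mixed pair in both directions.
via_permutations = list(permutations(orders + periods, 2))

print(nested == via_product)
print(len(nested), len(via_permutations))
print((1, 2) in via_permutations)
```

This may explain why a permutations-based version can look faster or slower while producing different results: it is not computing the same set of (order, period) combinations.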
Can I take ceil_date? @JustinKurland |
Absolutely, @seyf97. I had begun working on this to figure out what this looked like for DataFrames
Series
Hopefully this helps jump start your effort quickly! |
Will do get_frequency_summary |
Running checklist of backends: #77 (comment)