Add advanced functionality to handle processing of big datasets #55
Comments
Hi @risenW, I'd like to work on this, please.
Sure, please go ahead.
Krisblarq, do you want to start writing the test cases? How do you think we can collaborate?
Rising, I imagine you'd like to retain the existing code structure, which could mean that the PySpark code will be maintained as a separate module?
If we agree on the structure, the development can be accelerated.
Very Best!
On Saturday, August 1, 2020 9:24:56 AM, Rising Odegua wrote: "Sure, please go ahead"
Yes @iretex, the module should be separate. It can then be called from other modules when we intend to use it. This should also be invisible to the user.
Very clear!
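A rough sketch of how a separate PySpark module could stay invisible to the user, as discussed above. Everything here is an assumption for illustration: `count_rows`, `_count_rows_spark`, `_count_rows_plain` and `spark_available` are hypothetical names, not part of datasist. The PySpark import is deferred so that users without PySpark installed are unaffected.

```python
import csv
import importlib.util


def spark_available() -> bool:
    """Return True if the optional pyspark package is installed."""
    return importlib.util.find_spec("pyspark") is not None


def _count_rows_spark(path: str) -> int:
    # Hypothetical helper that would live in the separate PySpark module.
    # pyspark is imported lazily so the rest of the library works without it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("datasist-sketch").getOrCreate()
    return spark.read.csv(path, header=True).count()


def _count_rows_plain(path: str) -> int:
    # In-memory fallback: count CSV records, excluding the header row.
    with open(path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


def count_rows(path: str, use_spark: bool = False) -> int:
    """Public entry point; the caller never touches the Spark module directly."""
    if use_spark and spark_available():
        return _count_rows_spark(path)
    return _count_rows_plain(path)
```

Callers only ever see `count_rows`; whether Spark runs underneath is decided inside the library.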
On Saturday, August 1, 2020 1:31:56 PM, Rising Odegua wrote: "Yes @iretex the module should be separate. Then it can be called from other modules when we intend to use it. This should be invisible to the user as well."
Add a PySpark module that is triggered when the input dataset has more than 1 million rows.
This capability should generally improve runtime and give a better user experience.
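The row-count trigger could be sketched roughly as below. This is a minimal illustration, not the project's implementation: `ROW_THRESHOLD`, `choose_backend` and `dispatch` are hypothetical names, and the 1,000,000 cutoff is taken from the description above.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed cutoff, taken from the issue text ("bigger than 1mil rows").
ROW_THRESHOLD = 1_000_000


def choose_backend(n_rows: int, threshold: int = ROW_THRESHOLD) -> str:
    """Return 'pyspark' when the row count exceeds the threshold, else 'pandas'."""
    return "pyspark" if n_rows > threshold else "pandas"


def dispatch(
    n_rows: int,
    small_impl: Callable[[], T],
    big_impl: Callable[[], T],
    threshold: int = ROW_THRESHOLD,
) -> T:
    """Run the implementation that matches the chosen backend."""
    if choose_backend(n_rows, threshold) == "pyspark":
        return big_impl()
    return small_impl()
```

For example, `dispatch(len(df), run_with_pandas, run_with_spark)` would pick the Spark path only above the cutoff, where `run_with_pandas` and `run_with_spark` stand for whatever zero-argument callables wrap the two implementations.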