Add advanced functionality to handle processing of big datasets #55

iretex · 2020-07-14T17:23:22Z

Add a PySpark module that is triggered when the input dataset is larger than 1 million rows.
This capability should generally improve runtime and give a better user experience.

krisblarq · 2020-07-31T19:47:27Z

Hi @risenW, I'd like to work on this, please.

risenW · 2020-08-01T08:24:43Z

Sure, please go ahead

iretex · 2020-08-01T11:26:16Z

Krisblarq, do you want to start writing the test cases? How do you think we can collaborate? Rising, I imagine you'd like to retain the existing code structure, which could mean that the PySpark code will be maintained as a separate module. If we agree on the structure, development can be accelerated. Very best!

risenW · 2020-08-01T12:31:42Z

Yes @iretex the module should be separate. Then it can be called from other modules when we intend to use it. This should be invisible to the user as well.
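
For illustration only, here is a minimal sketch of how that kind of invisible dispatch might look. None of these names exist in datasist: `ROW_THRESHOLD`, `pandas_summary`, `spark_summary`, and `summarize` are all hypothetical, and the 1 million row cutoff is simply the figure from the issue description. It assumes the input is a pandas DataFrame and that PySpark is only needed once the threshold is crossed, so it is imported lazily.

```python
from typing import Any, Callable

# Hypothetical cutoff, taken from the issue description (1 million rows).
ROW_THRESHOLD = 1_000_000


def pandas_summary(df: Any) -> Any:
    """Default path: summarize with plain pandas."""
    return df.describe()


def spark_summary(df: Any) -> Any:
    """Big-data path: hand the pandas DataFrame to Spark and bring the summary back."""
    # Imported lazily so users with small datasets never need PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return spark.createDataFrame(df).describe().toPandas()


def summarize(
    df: Any,
    threshold: int = ROW_THRESHOLD,
    small_backend: Callable[[Any], Any] = pandas_summary,
    big_backend: Callable[[Any], Any] = spark_summary,
) -> Any:
    """Pick a backend from the row count; the caller never chooses explicitly."""
    backend = big_backend if len(df) > threshold else small_backend
    return backend(df)
```

A caller would just write `summarize(df)` and never see which backend ran. One caveat with this sketch: Spark's `describe()` output is shaped differently from pandas', so a real module would likely need to normalize the result before returning it.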

iretex · 2020-08-01T12:38:45Z

Very clear!
