Long processing times when handling very large files #12

Open
hammzj opened this issue May 5, 2020 · 3 comments

hammzj commented May 5, 2020

Hello,

I've been here before and I'm back 😊
This gem has become a cornerstone of one of the projects I've developed. In most cases it performs very well, aside from some configuration options we need to customize for each scenario, but so it goes.

Now we are working with larger files, around 1.5 million rows. In some cases it seems to take hours. I've previously tested files between 500,000 and 1,000,000 rows and seen around 15 minutes or more to fully process their diffs with the gem. We can deal with that even though it's not lovely, but anything much longer than that is detrimental.

I'm not sure whether this is an issue with how we provide key_fields or something similar, so I'm mainly writing this issue as a question: what experiences have people had comparing large files? Is this a gem constraint, our own CSVDiff configuration, or something else?

What timings have you recorded when working with files of one million plus rows, with up to 100 columns?
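
For reference, here is a minimal sketch of how we call the gem, based on the CSVDiff.new(left, right, key_fields: ...) usage in the README; the file paths and key column index are placeholders for our real inputs:

```ruby
require 'csv-diff'

# Compare two large CSV extracts, keying rows on the first column.
# The paths and key column index are placeholders, not our real setup.
diff = CSVDiff.new('exports/run_old.csv', 'exports/run_new.csv',
                   key_fields: [0])

# Basic results the gem exposes once the comparison completes.
puts "Adds:    #{diff.adds.size}"
puts "Deletes: #{diff.deletes.size}"
puts "Updates: #{diff.updates.size}"
```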

hammzj commented Oct 27, 2020

@agardiner You're gonna hear from me a lot, but that's because this tool is incredibly valuable to our team.

We have been experiencing very long processing times for files over 100,000 rows, mainly files with 500k+ rows. These runs go on for hours without producing results. Have you done any performance testing with large files?
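
To put rough numbers on it, this is the kind of measurement I've been doing: timing the same keyed diff at increasing row counts. The synthetic files and column names here are just for illustration, and the call follows the README's key_fields usage:

```ruby
require 'benchmark'
require 'csv'
require 'csv-diff'

# Write a synthetic CSV with n data rows and a few columns.
def write_sample(path, n)
  CSV.open(path, 'w') do |csv|
    csv << %w[id col1 col2 col3]
    n.times { |i| csv << [i, "a#{i}", "b#{i}", rand(100)] }
  end
end

# Time the diff at increasing sizes to see how the runtime grows.
[100_000, 250_000, 500_000].each do |n|
  write_sample('left.csv', n)
  write_sample('right.csv', n)
  secs = Benchmark.realtime do
    CSVDiff.new('left.csv', 'right.csv', key_fields: [0])
  end
  puts "#{n} rows: #{secs.round(1)}s"
end
```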

agardiner (Owner) commented

I've not used this with files larger than 100k records, but I'd expect performance to drop off exponentially as your inputs grow. The implementation is pretty simple and works well for small inputs, but it was not designed for speed or to scale to large-volume inputs.
Sorry, I don't have any better news for you.

SerKnight commented

Came across this issue. If I continue using the gem, I can look into adding a debug option so that at least you have some insight into which stage of the process is running.
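
In the meantime, a crude stopgap is to wrap the call with a background thread that reports elapsed time, so a multi-hour run at least shows signs of life. This is plain Ruby around the call, not an option the gem actually provides, and the paths and key column are placeholders:

```ruby
require 'csv-diff'

started = Time.now

# Report elapsed time once a minute while the diff is running.
ticker = Thread.new do
  loop do
    sleep 60
    warn "csv-diff still running after #{((Time.now - started) / 60).round} min"
  end
end

diff = CSVDiff.new('left.csv', 'right.csv', key_fields: [0])
ticker.kill

puts "Finished in #{(Time.now - started).round}s with #{diff.diffs.size} differences"
```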
