
Better support for massive files #244

Open
Aditya94A opened this issue Feb 21, 2020 · 2 comments

Comments


Aditya94A commented Feb 21, 2020

There should be an option to simply "upload" (or point to) the input file and download the output file.

Using textareas means the browser gets stuck trying to render the entire contents of large-ish files, which makes everything laggy.

Solution: Don't show text for large files.
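A minimal sketch of what that flow could look like, assuming a browser context; the element id, file names, and the converter stub below are illustrative, not the project's actual code:

```ts
// Hypothetical sketch: read the file with the File API and offer the result
// as a download, never rendering the contents in a textarea.
const input = document.querySelector<HTMLInputElement>("#file-input")!;

input.addEventListener("change", async () => {
  const file = input.files?.[0];
  if (!file) return;

  const text = await file.text();       // read without touching the DOM
  const csv = convertJsonToCsv(text);   // stand-in for the real converter

  // Hand back a download link instead of displaying the output.
  const blob = new Blob([csv], { type: "text/csv" });
  const link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = file.name.replace(/\.json$/i, ".csv");
  link.click();
  URL.revokeObjectURL(link.href);
});

// Trivial stand-in converter so the sketch is self-contained; it assumes a
// flat array of objects and does no escaping beyond JSON.stringify.
function convertJsonToCsv(json: string): string {
  const rows = JSON.parse(json) as Record<string, unknown>[];
  const headers = Object.keys(rows[0] ?? {});
  const lines = rows.map(r => headers.map(h => JSON.stringify(r[h] ?? "")).join(","));
  return [headers.join(","), ...lines].join("\n");
}
```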

@peterdresslar

The support for massive files is already pretty amazing--I just ran a 75 MB file and it worked perfectly. The only thing I would change is mentioning in the docs that the browser may complain a few times that the page is frozen; not to worry, the file will process if you press "Wait" a few times.


gabefair commented Jan 10, 2021

I was having problems getting it to work with large files as well, but I found that the following steps helped:

  1. Remove `"_id" : ObjectId("<number>")` from the JSON. In my case I had a massive mongodb dump from a journalist, and I removed them all using the following regex: `"_id" : \w+\("\w+"\), \n`
  2. Make sure the JSON is valid. Online validators can help; I found one that was able to take the 18 MB JSON dump I pasted from my clipboard.
  3. Convert JSONL to JSON. By adding `[` to the front and `]` to the back I was able to convert the JSON Streaming (aka LDJSON) dump to valid JSON. (A sketch of steps 1 and 3 follows this list.)
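A rough Node sketch of steps 1 and 3, with placeholder file names; note that plain JSONL records also need commas between them, not just the surrounding brackets, so the sketch joins the lines explicitly:

```ts
import { readFileSync, writeFileSync } from "node:fs";

const raw = readFileSync("dump.jsonl", "utf8"); // placeholder file name

// Step 1: strip the mongodb `"_id" : ObjectId("...")` fields, mirroring the
// regex quoted above (plus an optional trailing comma).
const cleaned = raw.replace(/"_id"\s*:\s*\w+\("\w+"\),?\s*/g, "");

// Step 3: wrap the line-delimited records in [ ... ], joining them with
// commas so the result is a valid JSON array.
const records = cleaned.split("\n").filter(line => line.trim() !== "");
const json = "[\n" + records.join(",\n") + "\n]";

// Step 2: parsing locally is a quick validity check that avoids pasting a
// large dump into an online validator.
JSON.parse(json); // throws if the result is still invalid

writeFileSync("dump.json", json);
```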

My JSON had 13,122 URLs, and I was worried this would hit Eric's anonymous API request limit, but surprisingly it never did.

I used Firefox Developer Edition with the script timeout setting raised to its maximum value (2147483647), and was able to download the CSV after letting it run for about 15 minutes.
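For anyone hunting for that setting: this presumably refers to the `dom.max_script_run_time` preference in `about:config`, which controls (in seconds) how long a script may run before Firefox shows the slow-script warning.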
