JSON lines format #127
JSON Lines would be preferable, but note that https://github.com/precog/tectonic can already be used to load data like this in a streaming manner.
So, has there been any progress on this in the two years since it was requested? Or does anybody have suggestions for formatting the results? I'm using PHP to fetch the data and would like to format it with HTML/CSS. A plain string of text is impossible to format when the results vary so much from one drug to the next. At the very least, please indicate line breaks.
I made a simple Python script to modify the JSON documents. The only issue I have run into is broken JSON within the export files: the script tries to fix such files where it can and otherwise records them in a log file. It currently runs through two folders named device and drug, and this version only works on Windows.
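For anyone wanting to do something similar, here is a minimal sketch of that kind of batch conversion. It is not the script described above, and it makes no attempt to repair broken JSON: files that fail to parse are simply listed in a log. It assumes (unconfirmed by this thread) that each export is a single JSON document with its records in a top-level "results" array; the function name convert_folders and the log name errors.log are hypothetical.

```python
import json
from pathlib import Path


def convert_folders(root, folders=("device", "drug"), log_name="errors.log"):
    """Convert each *.json export under root/<folder> into a .jsonl file.

    Assumption: each export is one JSON document whose records live in a
    top-level "results" array (a bare top-level array is also accepted).
    Files that cannot be parsed are written to a log instead of converted;
    no attempt is made to repair them.
    """
    root = Path(root)
    converted, failed = 0, []
    for folder in folders:
        folder_path = root / folder
        if not folder_path.is_dir():
            continue  # skip folders that are not present
        for src in sorted(folder_path.glob("*.json")):
            try:
                with src.open(encoding="utf-8") as f:
                    doc = json.load(f)
            except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
                failed.append(f"{src}: {exc}")
                continue
            records = doc.get("results", []) if isinstance(doc, dict) else doc
            # One JSON object per line; json.dumps escapes embedded newlines.
            with src.with_suffix(".jsonl").open("w", encoding="utf-8") as out:
                for record in records:
                    out.write(json.dumps(record, ensure_ascii=False) + "\n")
            converted += 1
    # Log every file that could not be parsed (empty log if none failed).
    log_text = "".join(line + "\n" for line in failed)
    (root / log_name).write_text(log_text, encoding="utf-8")
    return converted, failed
```

Because it uses pathlib rather than hard-coded path separators, a sketch like this should behave the same on Windows, macOS, and Linux.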
It would be very useful to have a version of the dataset download files provided in JSON Lines format (one self-contained record per line), so that the files are splittable for ingestion by a distributed cluster computing system like Spark. In the current format, each file has to be loaded into memory in its entirety before it can be ingested.
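To illustrate why one record per line makes a file splittable, here is a minimal Python sketch (hypothetical helper names write_jsonl and read_split, not part of any dataset tooling). A worker given only a byte range [start, end) can find record boundaries by skipping to the next newline, which is similar in spirit to, though simpler than, how line-oriented input formats in Hadoop and Spark divide text files.

```python
import json


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_split(path, start, end):
    """Yield the records whose line starts at a byte offset in [start, end).

    A worker handed only this byte range can parse its records without
    reading the rest of the file, which is what makes JSON Lines splittable.
    """
    with open(path, "rb") as f:
        if start > 0:
            # Back up one byte and discard through the next newline, so a
            # line that began before `start` is left to the previous split.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break  # end of file
            if line.strip():
                yield json.loads(line)
```

Every line belongs to exactly one split, so the splits can be processed in parallel and their results concatenated. In practice Spark does this itself: `spark.read.json` expects JSON Lines by default, while its `multiLine` option, needed for whole-document files like the current exports, reads each file as a unit rather than splitting it.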