I would have thought this would be a popular request, but I couldn't find any existing discussion or solution.
I am trying to minimize memory usage as much as possible because the document contains a huge array. The document looks like this:
{<unpredictable stuff>, "array":[{"name":<long string>}, ...]}
Using a filter like this:
filter["array"][0]["name"] = true;
I am still running out of memory, because with this filter the parsed document still contains every "name" (again, this is a very long array).
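For reference, here is a minimal sketch of how I apply the filter (ArduinoJson v6 syntax; input is a placeholder for the actual source, e.g. a Stream or a buffer). As I understand it, element [0] of the filter acts as a template for every element of the array, which is why all the "name"s are still kept:

#include <ArduinoJson.h>

StaticJsonDocument<200> filter;
filter["array"][0]["name"] = true;   // template applied to EVERY element of "array"

DynamicJsonDocument doc(8192);       // must still hold every "name"
DeserializationError err =
    deserializeJson(doc, input, DeserializationOption::Filter(filter));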
I wonder if there are other ways to filter more selectively... perhaps something like this (which doesn't seem to work, presumably because the compiler evaluates 3-14 as plain integer subtraction before ArduinoJson ever sees it):
filter["array"][3-14]["name"] = true;
so that I would get only the fourth through fifteenth elements? Sixteen elements fit comfortably in memory, whereas all of them do not in my case.
An alternative idea would be a callback:
filter["array"][]["name"] = callback;
so I can examine each "name" as it is found. Perhaps the callback tells me the index and gives me the "name", so I can decide to keep it, throw it away, or even stop the parsing entirely (for speed).
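Purely to illustrate what I mean (nothing below exists in ArduinoJson; the enum, the callback signature, and the name onName are all hypothetical):

// Hypothetical callback-based filter -- not part of ArduinoJson today.
// The parser would invoke the callback for each matched value, and the
// return value would decide what happens next.
enum class FilterAction { Keep, Discard, Stop };

FilterAction onName(size_t index, const char* name) {
  if (index < 3) return FilterAction::Discard;   // throw away early elements
  if (index > 14) return FilterAction::Stop;     // abort parsing for speed
  return FilterAction::Keep;                     // keep only elements 3..14
}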
I also thought about deserialization-in-chunks using findUntil, but the preceding <unpredictable stuff> makes it unreliable: there may be similarly named keys at different nesting levels, etc. (unless I write my own JSON parsing code, which defeats the purpose of using this library). After all, the whole idea of a JSON document is that its members can be out of order, mixed with additional things you don't care about, etc.
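For completeness, here is roughly what I had in mind (ArduinoJson v6 syntax; input stands for any Arduino Stream, e.g. a WiFiClient or File). The initial find() is the unreliable part, since it would match "array": at any nesting level, and it also assumes no whitespace around the colon:

input.find("\"array\":[");                      // skip everything up to the array
do {
  StaticJsonDocument<256> elem;                 // room for a single element only
  DeserializationError err = deserializeJson(elem, input);
  if (err) break;
  const char* name = elem["name"];              // examine one "name" at a time
  // ...keep it, discard it, or stop parsing here...
} while (input.findUntil(",", "]"));            // advance to the next element, stop at ']'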
Thanks for any ideas.
This is indeed a popular request; see #2072, #1723, #1486, #1316, and #1708.
I like the idea of a callback-based filter, but it was impossible to do with v6, so I never added it to the backlog.
Best regards,
Benoit