I would have thought this would be a popular request, but I couldn't find any existing discussion or solution.
I am trying to minimize memory usage as much as possible because the document contains a huge array. The document looks like this:
{<unpredictable stuff>, "array":[{"name":<long string>}, ...]}
Using a filter like this:
filter["array"][0]["name"] = true;
I am still running out of memory, because with this filter the parsed document still contains every "name" (again, this is a very long array).
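For reference, here is a minimal sketch of how I apply the filter (ArduinoJson v6 syntax; input is a placeholder for the actual source, e.g. a Stream or a buffer). As I understand it, element [0] of the filter acts as a template for every element of the array, which is why all the "name"s are still kept:

#include <ArduinoJson.h>

StaticJsonDocument<200> filter;
filter["array"][0]["name"] = true;   // template applied to EVERY element of "array"

DynamicJsonDocument doc(8192);       // must still hold every "name"
DeserializationError err =
    deserializeJson(doc, input, DeserializationOption::Filter(filter));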
I wonder if there are other ways to filter more selectively... perhaps something like this (which doesn't seem to work, presumably because the compiler evaluates 3-14 as plain integer subtraction before ArduinoJson ever sees it):
filter["array"][3-14]["name"] = true;
so that I would get only the fourth through fifteenth elements? Sixteen elements fit comfortably in memory, whereas all of them do not in my case.
An alternative idea would be a callback:
filter["array"][]["name"] = callback;
so I can examine each "name" as it is found. Perhaps the callback tells me the index and gives me the "name", so I can decide to keep it, throw it away, or even stop the parsing entirely (for speed).
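Purely to illustrate what I mean (nothing below exists in ArduinoJson; the enum, the callback signature, and the name onName are all hypothetical):

// Hypothetical callback-based filter -- not part of ArduinoJson today.
// The parser would invoke the callback for each matched value, and the
// return value would decide what happens next.
enum class FilterAction { Keep, Discard, Stop };

FilterAction onName(size_t index, const char* name) {
  if (index < 3) return FilterAction::Discard;   // throw away early elements
  if (index > 14) return FilterAction::Stop;     // abort parsing for speed
  return FilterAction::Keep;                     // keep only elements 3..14
}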
I also thought about deserialization-in-chunks using findUntil, but the preceding <unpredictable stuff> makes it unreliable: there may be similarly named keys at different nesting levels, etc. (unless I write my own JSON parsing code, which defeats the purpose of using this library). After all, the whole idea of a JSON document is that its members can be out of order, mixed with additional things you don't care about, etc.
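For completeness, here is roughly what I had in mind (ArduinoJson v6 syntax; input stands for any Arduino Stream, e.g. a WiFiClient or File). The initial find() is the unreliable part, since it would match "array": at any nesting level, and it also assumes no whitespace around the colon:

input.find("\"array\":[");                      // skip everything up to the array
do {
  StaticJsonDocument<256> elem;                 // room for a single element only
  DeserializationError err = deserializeJson(elem, input);
  if (err) break;
  const char* name = elem["name"];              // examine one "name" at a time
  // ...keep it, discard it, or stop parsing here...
} while (input.findUntil(",", "]"));            // advance to the next element, stop at ']'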
Thanks for any ideas.
This is indeed a popular request; see #2072, #1723, #1486, #1316, and #1708.
I like the idea of a callback-based filter, but it was impossible to do with v6, so I never added it to the backlog.
Best regards,
Benoit