esnext.json.parse is generating (memory-wise) heavier objects #1354

Open
Kosta-Github opened this issue Jul 24, 2024 · 12 comments
@Kosta-Github

Kosta-Github commented Jul 24, 2024

When using the esnext.json.parse polyfill with a reviver function, the deserialized objects are heavier (memory-wise) than the ones generated by the native, non-polyfilled JSON.parse() function.

How to reproduce:

// require("core-js/actual/json/parse");

// generate a large JSON string containing an array of 1_000_000 random integers (as strings)
const data = [];
for (let i = 0; i < 1_000_000; i++) {
    data.push('' + Math.floor(Math.random() * 100));
}
const json = JSON.stringify({ results: [data] });

// trivial reviver function that does nothing
const reviver = (key, value) => value;

let lastMemUsage = process.memoryUsage();
const results = [];
while (true) {
    const currentMemUsage = process.memoryUsage();
    console.log(
        `[mem] current: ${Math.floor(currentMemUsage.heapTotal / (1024*1024))}, ` +
        `delta: ${Math.floor((currentMemUsage.heapTotal - lastMemUsage.heapTotal) / (1024*1024))}`
    );
    lastMemUsage = currentMemUsage;

    const parsed = JSON.parse(json, reviver);

    // keep a reference to the parsed objects to prevent the GC from collecting them
    results.push(parsed);
}

Let the above script run for a while and observe the memory usage and delta when using the unmodified (native) JSON.parse(); the output looks something like this:

[mem] current: 52, delta: 0
[mem] current: 88, delta: 35
[mem] current: 89, delta: 1
[mem] current: 120, delta: 30
[mem] current: 151, delta: 30
[mem] current: 182, delta: 30
[mem] current: 213, delta: 30
[mem] current: 127, delta: -86
[mem] current: 158, delta: 30
[mem] current: 189, delta: 30
[mem] current: 220, delta: 30
[mem] current: 251, delta: 30
[mem] current: 282, delta: 30
[mem] current: 313, delta: 31
[mem] current: 344, delta: 30
[mem] current: 375, delta: 30
[mem] current: 196, delta: -179
[mem] current: 227, delta: 31
[mem] current: 258, delta: 30
[mem] current: 289, delta: 30
[mem] current: 320, delta: 30
[mem] current: 350, delta: 30
[mem] current: 381, delta: 30
[mem] current: 412, delta: 30
[mem] current: 443, delta: 30
[mem] current: 474, delta: 30
[mem] current: 505, delta: 31
[mem] current: 536, delta: 30
[mem] current: 567, delta: 30
[mem] current: 598, delta: 30
...

When uncommenting the first line and using the polyfilled JSON.parse() function, the output looks like this:

[mem] current: 52, delta: 0
[mem] current: 202, delta: 149
[mem] current: 236, delta: 33
[mem] current: 256, delta: 20
[mem] current: 401, delta: 145
[mem] current: 557, delta: 155
[mem] current: 447, delta: -110
[mem] current: 505, delta: 57
[mem] current: 650, delta: 145
[mem] current: 806, delta: 156
[mem] current: 962, delta: 155
[mem] current: 712, delta: -250
[mem] current: 768, delta: 56
[mem] current: 826, delta: 57
[mem] current: 947, delta: 121
[mem] current: 1103, delta: 155
[mem] current: 1259, delta: 155
[mem] current: 1415, delta: 156
[mem] current: 1571, delta: 155
[mem] current: 1727, delta: 155
[mem] current: 1137, delta: -591
[mem] current: 1194, delta: 57
[mem] current: 1251, delta: 56
[mem] current: 1308, delta: 56
[mem] current: 1395, delta: 86
[mem] current: 1552, delta: 157
[mem] current: 1708, delta: 155
[mem] current: 1863, delta: 155
[mem] current: 2019, delta: 155
[mem] current: 2176, delta: 156
[mem] current: 2332, delta: 156
...

You can see that the memory usage is up to 4-5 times higher and grows much more quickly.

$ node --version

v20.15.0
@zloirock
Owner

zloirock commented Jul 24, 2024

It's the JSON.parse source text access polyfill.

Sure, we can't implement it in JS as efficiently as it can be done natively inside JS engines.

If you have proposals for how to optimize this polyfill - feel free to open a PR.

If performance is critical for you - you could update your Node (this feature is available natively from Node 21, so the polyfill is not installed there), or just exclude this module from your app if you don't use JSON.parse source text access.
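
A rough sketch of how to check whether the feature is available natively (the detection below is illustrative, not something shipped by core-js):

// Rough sketch: load the core-js module only when the engine lacks native
// JSON.parse source text access. The detection is illustrative only.
let hasSourceAccess = false;
JSON.parse('0', function (key, value, context) {
    hasSourceAccess = typeof context === 'object' && context !== null && 'source' in context;
    return value;
});
if (!hasSourceAccess) {
    require('core-js/actual/json/parse'); // installs the polyfill
}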

@Kosta-Github
Author

  • this is just a simplified example showing the behavior with a trivial reviver function
  • I actually need a reviver function and context.source support (see the sketch after this list)
  • I cannot update to a newer Node version; we need to stay on LTS versions within our company
  • it is not clear to me why the generated object hierarchies should consume more memory when using the trivial reviver function than without; there must be some additional state stored somewhere on the objects that contributes to the increased memory usage
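
For reference, this is roughly the kind of usage I need (a minimal sketch of the JSON.parse source text access proposal; the BigInt use case is just an illustration):

// With the proposal (or the core-js polyfill), the reviver receives a third
// `context` argument whose `source` property holds the raw JSON text of
// primitive values, so they can be recovered losslessly.
const text = '{ "big": 12345678901234567890 }';
const parsed = JSON.parse(text, (key, value, context) => {
    if (key === 'big' && context && typeof context.source === 'string') {
        return BigInt(context.source); // exact value, no float rounding
    }
    return value;
});
console.log(parsed.big); // 12345678901234567890n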

@zloirock
Owner

zloirock commented Jul 24, 2024

it is not clear to me why the generated object hierarchies should consume more memory when using the trivial reviver function than without?

Because in your case, without a reviver, the native JSON.parse is used, not the polyfill.

@Kosta-Github
Author

Sure, that is obvious.

The question is, why would the object tree generated by the polyfilled JSON.parse() allocate more/additional memory when used with the reviver function?

I am not concerned about additional memory usage during the parse operation itself, but about the additional memory that is kept alive and associated with the returned object hierarchy after the parse operation.

Say you are parsing the JSON { "hello": "world" } with and without the trivial reviver function. Why should the result consume more memory when the reviver function was used?

@zloirock
Owner

zloirock commented Jul 24, 2024

They have the same tree. Why do you think they don't? One more time: when you call JSON.parse with a reviver, the polyfilled method is used; without one, the native method is used.

@Kosta-Github
Author

Kosta-Github commented Jul 24, 2024

They have the same tree. Why do you think they don't?

Because the memory consumption is higher if that tree was generated with the polyfilled parse() function.

when you call JSON.parse with a reviver, the polyfilled method is used; without one, the native method is used.

Again, I get that.

This does not explain why the generated tree consumes more memory.
I am not talking about the memory consumption during parsing.

Something like:

mem_used_by_object(polyfilled.parse(json)) >= 4 * mem_used_by_object(native.parse(json))

@zloirock
Owner

zloirock commented Jul 24, 2024

Because the native JSON.parse is more optimized (including memory-wise) than the polyfill? -) They have different representations of this tree in memory, and most likely different garbage collection behavior, etc.

If you want, you could dig into it and try to optimize it. For example, Context#source is the same string on all instances and should theoretically be optimized by modern engines to refer to one place in memory - but something could be going wrong there. Or the regex usage, which is also not free. Etc. However, some specific features, like descriptor edge cases, are almost impossible to optimize because of the nature of JS.
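
As a rough illustration of the descriptor point (this is not core-js internals, just a sketch of where engine-level representation differences can come from):

// Two objects that look identical from JS but were built through different
// paths. Engines track object shapes, and properties created via
// Object.defineProperty are not guaranteed to end up in the same compact
// in-memory representation as plain assignments, even with identical
// descriptors.
const viaAssignment = {};
viaAssignment.hello = 'world';

const viaDefineProperty = {};
Object.defineProperty(viaDefineProperty, 'hello', {
    value: 'world',
    writable: true,
    enumerable: true,
    configurable: true
});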

@zloirock
Owner

The V8 JSON parser is a low-level C++ tool; it's strange to ask why a JS implementation of it takes more memory.

@zloirock zloirock changed the title from "bug: esnext.json.parse is generating (memory-wise) heavier objects" to "esnext.json.parse is generating (memory-wise) heavier objects" on Jul 24, 2024
@zloirock
Owner

zloirock commented Jul 24, 2024

If you are talking about the result objects, not about the JSON tree, I see only 2 answers: how GC works, and the descriptors usage -> the representation of the result objects in memory, which is required for the proper result. In either case, I don't see how it can be optimized on the core-js side.

@zloirock
Owner

zloirock commented Jul 24, 2024

I played with your example with the --expose-gc flag and manual GC handling. Even in this case, the polyfilled method's result object takes more memory than the native one. One possible explanation is that the result array ends up in a non-optimized representation.
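
Roughly, such a measurement can be done like this (just a sketch, not the exact script used here; it assumes the process is started with node --expose-gc so that global.gc() is available, and the numbers will vary per Node version):

// Sketch: compare retained heap after parsing the same JSON with the native
// and the polyfilled JSON.parse. Run with: node --expose-gc measure.js
function retainedHeap(parse, json, reviver) {
    global.gc();
    const before = process.memoryUsage().heapUsed;
    const result = parse(json, reviver);
    global.gc();
    const after = process.memoryUsage().heapUsed;
    return { result, bytes: after - before }; // keep `result` alive while measuring
}

const nativeParse = JSON.parse;        // capture before the polyfill is installed
require('core-js/actual/json/parse');  // replaces JSON.parse where the feature is missing
const polyfilledParse = JSON.parse;

const json = JSON.stringify({ results: [Array.from({ length: 1_000_000 }, () => '42')] });
const reviver = (key, value) => value;

console.log('native    :', retainedHeap(nativeParse, json, reviver).bytes);
console.log('polyfilled:', retainedHeap(polyfilledParse, json, reviver).bytes);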

@zloirock
Owner

zloirock commented Jul 24, 2024

In Node versions where this feature is available natively, there is also a difference in memory usage between the cases with and without a reviver - however, it is not as significant.

@zloirock
Owner

As I wrote, it's not a bug - it's a matter of optimization for specific engines. If it's interesting for you, feel free to dig into the internal representations of objects in V8 and open a PR with an optimization for this case.
