Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated fast-json-stringify #736

Closed
Uzlopak opened this issue Aug 2, 2023 · 22 comments
Closed

updated fast-json-stringify #736

Uzlopak opened this issue Aug 2, 2023 · 22 comments
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers question Further information is requested

Comments

@Uzlopak
Copy link

Uzlopak commented Aug 2, 2023

Can you please update the benchmarks?

Btw. I updated the serializer for asString and I think you are using a similar serializer for strings. Maybe you want to adapt it for more performance.
fastify/fast-json-stringify@17bb4c2#diff-8e3d45dd0e9ec504195499d7fafe3efc08756c349fce602ef6538c593aa563d8

Also the benchmarks are a little bit unfair. E.g. you are benching for stringify but fast-json-stringify is not only stringifying but also does some assertions, like checking for required fields.

@Uzlopak Uzlopak changed the title updates fast-json-stringify updated fast-json-stringify Aug 2, 2023
@samchon
Copy link
Owner

samchon commented Aug 3, 2023

https://dev.to/samchon/good-bye-typescript-is-ancestor-of-typia-20000x-faster-validator-49fi

Around the same time, I had made another library named typescript-json.

It performs AoT (Ahead of Time) compliation like typescript-is, but it was not for runtime validation, but for JSON schema generation. About JSON serialization boosting, typescript-json had utilized fast-json-stringify by automatically generated JSON schema.

For reference, purpose of typescript-json was to accomplish below nestia, generating Swagger Documents with pure TypeScript type.

Thanks for notification. As typia had started from wrapper of fast-json-stringify (at that time, package name was typescript-json), I'm still using string serializer of fast-json-stringify. I'll update string serializer following fast-json-stringify. Also, I'll perform benchmark again with latest version of fast-json-stringify.

By the way, typia has four stringify functions and assertStringify is simliar with fast-json-stringify. Benchmark result is showing all of those typia stringify functions (https://typia.io/docs/json/stringify/#performance), but as normal users don't know the difference, I can understand why you feel it as unfair. About this topic, how about adding a comment that fast-json-stringify is similar with typia.assertStringify() under the graph? If you want, you can edit https://github.com/samchon/typia/blob/master/website/pages/docs/json/stringify.mdx#L497-L507, and send a PR for that.

  • stringify
  • isStringify
  • assertStringify
  • validateStringify

@samchon samchon self-assigned this Aug 3, 2023
@samchon samchon added documentation Improvements or additions to documentation good first issue Good for newcomers enhancement New feature or request question Further information is requested labels Aug 3, 2023
@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

Sure, I will provide a PR.

Please check if the patch for serializer for asString makes a perf boost. If you see a performance regression, than I would need to investigate.

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

You can measure benchmark of typia.stringify<T>() function by below commands.

# PREVIOUS SMALL-STRING
git clone https://github.com/samchon/typia -b features/original typia@original
cd typia@original
npm install
npm run benchmark

# ADVANCED SMALL-STRING
cd ..
git clone https://github.com/samchon/typia -b features/stringify typia@advanced
cd typia@advanced
npm install
npm run benchmark

By the way, as my benchmark program of stringify function handling composite types, it would much better to make a new dedicated benchmark function for only your asStringSmall() function.

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

https://github.com/samchon/typia/blob/features/stringify/test/issues/736.ts

I made a dedicated benchmark, and you can run it through:

git clone https://github.com/samchon/typia -b features/stringify typia@stringify
cd typia@stringify
npm install
npm run issue 736

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

@Uzlopak Fixed length to be 41, and new algorithm became slower than before

previous 6867.585931699803
advanced 6861.237658316788

When fix length to be 5, and new algorithm is still slower

previous 33501.367175249296
advanced 33126.283614757245

@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

Yes, the number 42 is super arbitrary. I asked @mcollina why 42, but it seems it was a random number.

It should be investigated when the breaking point is useful.

@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

didnt you write that in a special case it was double the perf?

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

@Uzlopak Oops, I missed the STR_ESCAPE pattern checking.

After adjusting that, new algorithm became faster when length 41. However, length 5 became slower.

The 2x faster was by my mistake that skipping length > 41 checking.

# LENGTH 41
previous 5193.547705809435
advanced 5982.83813974943

# LENGTH 5
previous 23195.423057822234
advanced 20127.113157546624

samchon added a commit that referenced this issue Aug 3, 2023
@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

Your code is different from ours.

Your code is
length > 42 and call JSON.stringify, str escape check and wrap in double quotes, then fallback to simple case

It has to be length < 42 do simple case, str escape check and wrap in double quotes, fallback to JSON.stringify

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

@Uzlopak Oops, did lots of mistake. I repeated it carefully, and this may be the final result.

Sorry for repeated mistakes. It was so confusing because I did another at the same time.

Tried sequence of if conditions, but it was not a matter.

# LENGTH 41
previous 6739.206407670302
advanced 19771.155382046658

# LENGTH 5
previous 30564.183105977772
advanced 32337.116211885812

@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

Can you investigate if 42 as string length is optimal?

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

Well, as regex format condition newly added, I should consider which string to be used.

Do you have any idea about it?

@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

Make a very string matching this regex /[a-z0-9]/+. At the end put a double quote "

samchon added a commit that referenced this issue Aug 3, 2023
@samchon
Copy link
Owner

samchon commented Aug 3, 2023

Normal characters

[email protected] issue
node test/issue 736-normal

Limit Native Optimized Gap
#10 13,231 35,170 165.82 %
#20 10,373 20,038 93.17 %
#30 8,009 13,359 66.81 %
#40 5,733 8,997 56.93 %
#50 5,005 7,741 54.65 %
#60 5,260 7,851 49.28 %
#70 4,536 5,658 24.76 %
#80 3,637 4,928 35.49 %
#90 3,364 4,207 25.06 %
#100 2,504 3,829 52.92 %
#200 1,913 1,932 1.02 %
#300 1,312 1,359 3.6 %
#400 1,052 1,006 -4.41 %
#500 826 857 3.74 %
#600 781 884 13.17 %
#700 761 780 2.49 %
#800 669 552 -17.5 %
#900 521 478 -8.19 %
#1,000 449 462 2.96 %

Surrounded by " characters

[email protected] issue
node test/issue 736-special

Limit Native Optimized Gap
#10 11,640 14,012 20.38 %
#20 9,499 14,581 53.5 %
#30 6,159 10,490 70.31 %
#40 6,104 9,964 63.24 %
#50 5,640 8,743 55.03 %
#60 5,454 7,924 45.29 %
#70 4,794 6,160 28.48 %
#80 3,978 4,740 19.15 %
#90 3,949 5,202 31.71 %
#100 3,170 4,363 37.65 %
#200 1,983 2,673 34.78 %
#300 1,323 1,569 18.64 %
#400 1,062 1,355 27.56 %
#500 907 1,151 26.99 %
#600 814 963 18.29 %
#700 694 847 22.17 %
#800 642 746 16.12 %
#900 592 652 10.15 %
#1,000 509 575 13.12 %

Only regex pattern

[email protected] issue
node test/issue 736-regex

Limit Native Optimized Gap
#10 12,645 28,811 127.84 %
#20 10,155 26,702 162.93 %
#30 8,256 22,639 174.21 %
#40 6,658 23,138 247.53 %
#50 5,996 19,395 223.45 %
#60 5,562 15,214 173.54 %
#70 4,349 13,053 200.17 %
#80 4,028 11,180 177.55 %
#90 3,683 9,873 168.04 %
#100 3,027 9,763 222.49 %
#200 1,904 5,624 195.35 %
#300 1,467 4,105 179.73 %
#400 1,102 3,325 201.69 %
#500 886 2,721 207.22 %
#600 773 2,318 199.76 %
#700 687 2,056 199.52 %
#800 610 1,751 187.34 %
#900 570 1,239 117.42 %
#1,000 483 1,752 262.94 %

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

You also run that command, and determine which length to be use.

In my opinion, the length 42 is reasonable because 50 seems like the diminishing margin.

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

Comparing regex and special case, current code seems reasonable.

Diminishing margin of manual serialization logic is about 40 to 50.

Also, even though target string over the 42 length, regex pattern extremely diminish the serialization time.

@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

Currently running the benchmarks:

fastify/fast-json-stringify#637

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

I'm just confused by only regex filtered case. It is even faster than optimized case when no special character exists.

When special character exists, advanced manual stringify logic is faster, so I'm considering below implementation.

How do you think about below code, @Uzlopak ?

export const $string = (str: string): string => {
    if (STR_ESCAPE.test(str) === false) 
        return `"${str}"`;
    
    if (str.length > 41)
        return JSON.stringify(str);

    ...OPTIMIZED LOGIC
}

@samchon
Copy link
Owner

samchon commented Aug 3, 2023

Currently running the benchmarks:

fastify/fast-json-stringify#637

Great enhancement on short string, but short string with double quote be decreased.

@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

In my opinion:

The regex is theoretically always slower than processing every character in a for loop.
Doing first the regex, means that this is the geneeral bottleneck. even though the optimized logic is also handling the same unicoode and double quotes etc.

The length check seems to be the cheapest operation from all. So thats why I would do that one first.

@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

benchmarking now your consideration

fastify/fast-json-stringify#637

@Uzlopak
Copy link
Author

Uzlopak commented Aug 3, 2023

Always a tradeoff. The expectation should be actually that short string without escape characters are the majority and the strings with escape characters are exception. Also short strings are more common than huge strings

So personally I think

export const $string = (str: string): string => {
    if (STR_ESCAPE.test(str) === false) 
        return `"${str}"`;
    
    if (str.length > 41)
        return JSON.stringify(str);

    ...OPTIMIZED LOGIC
}

is the better tradeoff regarding the benchmarks.

But somehow the benchmarks are counter intuitive

samchon added a commit that referenced this issue Aug 3, 2023
Adapt #736 - optimize JSON serializer of `string` type
@samchon samchon closed this as completed in 0eefd45 Aug 4, 2023
samchon added a commit that referenced this issue Aug 4, 2023
Fix #736 - bug of undefined `length` problem.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants