.write(Buffer) support #174

Merged
merged 7 commits into from
Oct 3, 2023

Contributor

@AVVS AVVS commented May 16, 2023

Mashed up quick support for .write(chunk: Buffer). It works rather well as long as the input is a Buffer; if the input is a utf8 string, the performance degradation is quite bad. See #173 for why it might be needed.

If raw buffer support is worth exploring further, I'd be happy to put more effort into fast paths for both content types, probably by preselecting the mode of operation when SonicBoom is instantiated.

benchSonic*1000: 2.189s
benchSonicSync*1000: 8.190s
benchSonic4k*1000: 2.147s
benchSonicSync4k*1000: 1.827s
benchCore*1000: 2.561s
benchConsole*1000: 4.835s
benchSonicBuf*1000: 840.388ms
benchSonicSyncBuf*1000: 6.882s
benchSonic4kBuf*1000: 824.605ms
benchSonicSync4kBuf*1000: 481.976ms
benchCoreBuf*1000: 824.466ms
benchSonic*1000: 2.105s
benchSonicSync*1000: 8.275s
benchSonic4k*1000: 2.179s
benchSonicSync4k*1000: 1.811s
benchCore*1000: 2.220s
benchConsole*1000: 5.287s
benchSonicBuf*1000: 856.635ms
benchSonicSyncBuf*1000: 6.852s
benchSonic4kBuf*1000: 849.188ms
benchSonicSync4kBuf*1000: 480.347ms
benchCoreBuf*1000: 802.708ms

Member

@mcollina mcollina left a comment

You have found why this supports only strings.

I think supporting buffers might be helpful but I don't think it's worthwhile - you might as well use Node.js core fs streams instead. I don't expect any solution would outperform them by a significant margin.

return bufs[0]
}

return Buffer.concat(bufs, len)
Member

This is the source of the slowdown. We should not merge them, but rather keep them as a list.

Contributor Author

I tried keeping a list here, but the tests that expect a single flush rather than multiple flushes break. I'll work on it further now that there are two separate content modes.

Contributor Author

All in all, Buffer.concat seems to be cheaper than multiple fs.write calls as long as it doesn't have to allocate memory outside of Buffer.poolSize. I'll investigate further how that could be avoided.
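
The merge step under discussion can be sketched as follows. This is a minimal illustration built around the diff fragment quoted in this thread, not the code that was merged; the helper name `mergeBuffers` and the sample queue are assumptions made for the example.

```javascript
'use strict'

// Hypothetical helper: collapse the queued chunks into one Buffer so that
// a single write call can flush them.
function mergeBuffers (bufs, len) {
  // Fast path: a single queued chunk needs no copy at all.
  if (bufs.length === 1) {
    return bufs[0]
  }

  // Passing the known total length spares Buffer.concat from summing it again.
  return Buffer.concat(bufs, len)
}

// Illustrative queue of two small chunks.
const queue = [Buffer.from('hello '), Buffer.from('world\n')]
const total = queue.reduce((sum, buf) => sum + buf.length, 0)
const merged = mergeBuffers(queue, total)
```

The trade-off is one extra copy of the queued bytes in exchange for fewer write calls, which is the cost being weighed in the comments above.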

Contributor Author

tried fs.writev to avoid concat, but that made it slower than writeStream
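
For readers unfamiliar with the vectored alternative mentioned here, the following is a rough sketch of writing a list of buffers with one call and no intermediate concat. It uses the synchronous `fs.writevSync` for brevity (the comment above does not say which variant was tried), and the scratch file path is an assumption for illustration.

```javascript
'use strict'
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Hypothetical scratch file, for illustration only.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'writev-sketch-'))
const dest = path.join(dir, 'out.log')

const chunks = [Buffer.from('first\n'), Buffer.from('second\n'), Buffer.from('third\n')]
const expectedBytes = chunks.reduce((sum, buf) => sum + buf.length, 0)

const fd = fs.openSync(dest, 'w')
// One vectored call hands the whole list to the OS without merging the chunks.
// This sketch ignores partial writes, which a real implementation must handle.
const written = fs.writevSync(fd, chunks)
fs.closeSync(fd)

const contents = fs.readFileSync(dest, 'utf8')
```

Whether this beats a concat followed by a single write depends on the platform and chunk sizes, as the measurement reported above suggests.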

@AVVS
Contributor Author

AVVS commented May 16, 2023

benchSonicBuf*1000: 595.338ms

benchSonic4kBuf*1000: 590.645ms

benchCoreBuf*1000: 1.073s

There is close to a 2x speedup on my system with this vs. core streams when writing buffers, plus proper error handling, so it might be worth it. I'll see what the perf is without merging small buffers and whether it's possible to keep the exact performance profile for utf8.

@AVVS
Contributor Author

AVVS commented May 16, 2023

Added a contentMode setting. I'm not sure the approach I've selected is the best from a maintenance standpoint, but the benchmarks are OK.

benchSonic*1000: 428.597ms
benchSonicSync*1000: 6.054s
benchSonic4k*1000: 408.553ms
benchSonicSync4k*1000: 240.762ms

benchCore*1000: 2.785s
benchConsole*1000: 4.752s

benchSonicBuf*1000: 468.216ms
benchSonicSyncBuf*1000: 6.271s
benchSonic4kBuf*1000: 458.785ms
benchSonicSync4kBuf*1000: 343.907ms
benchCoreBuf*1000: 1.081s

benchSonic*1000: 407.736ms
benchSonicSync*1000: 6.123s
benchSonic4k*1000: 403.239ms
benchSonicSync4k*1000: 236.555ms

benchCore*1000: 2.768s
benchConsole*1000: 5.307s

benchSonicBuf*1000: 455.687ms
benchSonicSyncBuf*1000: 6.225s
benchSonic4kBuf*1000: 454.69ms
benchSonicSync4kBuf*1000: 334.973ms
benchCoreBuf*1000: 1.079s

@AVVS
Contributor Author

AVVS commented May 16, 2023

Master branch perf for comparison: utf8 mode (the default) seems to be unaffected by the changes.

benchSonic*1000: 452.573ms
benchSonicSync*1000: 6.064s
benchSonic4k*1000: 429.246ms
benchSonicSync4k*1000: 258.853ms
benchCore*1000: 2.722s
benchConsole*1000: 4.725s

benchSonic*1000: 422.106ms
benchSonicSync*1000: 6.054s
benchSonic4k*1000: 426.696ms
benchSonicSync4k*1000: 253.348ms
benchCore*1000: 2.724s
benchConsole*1000: 5.305s

The only downside I see is that I copied the non-buffer functions and changed the logic there, so any future logic change will have to be made in multiple places.

@mcollina
Member

@mmarchini could you take a look as well?

@mmarchini
Collaborator

Sorry, was drowned in GitHub Notifications for months. I'll take a look at it this week.

Collaborator

@mmarchini mmarchini left a comment

overall looks fine, I didn't review the tests yet. Main concern as you mentioned is code duplication, so it'll be up to the project to decide if taking the burden of having some logic implemented in two places is worth it for this feature.

As for the performance aspects of this, I can see concat being faster than multiple writes, although that'll be highly dependent on system specs. Can you share specs for the machine where you ran your benchmarks?
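
A rough way to check the concat-versus-multiple-writes question on a given machine is sketched below. It is not the project's benchmark suite: the helper names, chunk contents, and temp paths are assumptions, and it uses synchronous writes for simplicity, so the absolute numbers are not comparable to the figures posted in this thread.

```javascript
'use strict'
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Hypothetical scratch directory and workload: 1000 small log-like chunks.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'concat-bench-'))
const chunks = Array.from({ length: 1000 }, () => Buffer.from('hello world\n'))

// Run fn once and return the elapsed time in milliseconds.
function timeMs (fn) {
  const start = process.hrtime.bigint()
  fn()
  return Number(process.hrtime.bigint() - start) / 1e6
}

// Strategy 1: merge all chunks, then issue a single write.
function writeMerged (file) {
  const fd = fs.openSync(file, 'w')
  fs.writeSync(fd, Buffer.concat(chunks))
  fs.closeSync(fd)
}

// Strategy 2: issue one write per chunk.
function writeEach (file) {
  const fd = fs.openSync(file, 'w')
  for (const chunk of chunks) fs.writeSync(fd, chunk)
  fs.closeSync(fd)
}

const mergedFile = path.join(dir, 'merged.log')
const eachFile = path.join(dir, 'each.log')
const mergedMs = timeMs(() => writeMerged(mergedFile))
const eachMs = timeMs(() => writeEach(eachFile))

console.log(`one merged write: ${mergedMs.toFixed(3)}ms`)
console.log(`one write per chunk: ${eachMs.toFixed(3)}ms`)
```

A single run like this is noisy; repeating it and reporting the Node.js version and hardware alongside the numbers makes the results easier to compare.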

index.js Outdated
@@ -56,7 +57,11 @@ function openFile (file, sonic) {

// start
if (!sonic._writing && sonic._len > sonic.minLength && !sonic.destroyed) {
actualWrite(sonic)
if (sonic.contentMode === 'buffer') {
Collaborator

Wondering if contentMode as an enum instead of a string would make more sense. Either way, this should be included in types/index.d.ts and should be documented.

Contributor Author

great idea!

index.js Outdated
actualWriteBuffer(sonic)
} else {
actualWrite(sonic)
}
Collaborator

Can we call _actualWrite instead of having an if here?

Contributor Author

If we assign it as sonic._actualWrite, I think it's possible to do so. At the same time, these are not hot paths, so I'm not sure it's worth it.

Collaborator

@mmarchini mmarchini Jun 21, 2023

Oh, I was thinking of using _actualWrite for maintainability rather than performance (V8 will take care of making these ifs performant, so I'm not worried about that from a perf standpoint).
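
The suggestion in this thread, picking the write path once at construction instead of branching on contentMode at every call site, could look roughly like this. Everything here is a hypothetical stand-in: `MiniSink`, its in-memory `flushed` array, and the two toy write functions only mimic the shape of the real code.

```javascript
'use strict'

// Hypothetical stand-ins for the two write paths named in the review.
function actualWrite (sink) {
  sink.flushed.push(sink._bufs.join(''))
  sink._bufs = []
}

function actualWriteBuffer (sink) {
  sink.flushed.push(Buffer.concat(sink._bufs))
  sink._bufs = []
}

class MiniSink {
  constructor ({ contentMode = 'utf8' } = {}) {
    if (contentMode !== 'utf8' && contentMode !== 'buffer') {
      throw new Error(`unsupported contentMode: ${contentMode}`)
    }
    this.contentMode = contentMode
    this._bufs = []
    this.flushed = [] // in-memory stand-in for the file descriptor
    // Choose the write path once; callers never branch on contentMode again.
    this._actualWrite = contentMode === 'buffer'
      ? () => actualWriteBuffer(this)
      : () => actualWrite(this)
  }

  write (chunk) {
    this._bufs.push(chunk)
  }

  flush () {
    this._actualWrite()
  }
}

const textSink = new MiniSink()
textSink.write('a')
textSink.write('b')
textSink.flush()

const bufSink = new MiniSink({ contentMode: 'buffer' })
bufSink.write(Buffer.from('a'))
bufSink.write(Buffer.from('b'))
bufSink.flush()
```

With this shape, each call site simply invokes `_actualWrite`, so adding or changing a mode touches only the constructor.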

index.js Outdated
actualWriteBuffer(this)
} else {
actualWrite(this)
}
Collaborator

Same here: can't we call _actualWrite?

@AVVS
Contributor Author

AVVS commented Jun 20, 2023

> overall looks fine, I didn't review the tests yet. Main concern as you mentioned is code duplication, so it'll be up to the project to decide if taking the burden of having some logic implemented in two places is worth it for this feature.
>
> As for the performance aspects of this, I can see concat being faster than multiple writes, although that'll be highly dependent on system specs. Can you share specs for the machine where you ran your benchmarks?

Chip Apple M1 Max
Memory 64 GB

@AVVS
Contributor Author

AVVS commented Jun 29, 2023

Updates incoming. In terms of benchmarks:

In buffer mode (benchSonicBuf vs. benchCoreBuf), Node 20 performs ~25% better than core streams and Node 18 ~10% better.

node 20.3.1

benchSonic*1000: 411.93ms
benchSonicSync*1000: 6.231s
benchSonic4k*1000: 403.094ms
benchSonicSync4k*1000: 231.349ms
benchCore*1000: 1.900s
benchConsole*1000: 6.654s
benchSonicBuf*1000: 479.036ms
benchSonicSyncBuf*1000: 6.643s
benchSonic4kBuf*1000: 479.021ms
benchSonicSync4kBuf*1000: 338.352ms
benchCoreBuf*1000: 647.737ms
benchSonic*1000: 402.104ms
benchSonicSync*1000: 6.381s
benchSonic4k*1000: 411.276ms
benchSonicSync4k*1000: 230.133ms
benchCore*1000: 1.879s
benchConsole*1000: 7.073s
benchSonicBuf*1000: 488.614ms
benchSonicSyncBuf*1000: 6.739s
benchSonic4kBuf*1000: 468.49ms
benchSonicSync4kBuf*1000: 346.683ms
benchCoreBuf*1000: 657.002ms

node 18.16.0

benchSonic*1000: 435.906ms
benchSonicSync*1000: 6.272s
benchSonic4k*1000: 437.36ms
benchSonicSync4k*1000: 252.272ms
benchCore*1000: 2.252s
benchConsole*1000: 4.260s
benchSonicBuf*1000: 799.471ms
benchSonicSyncBuf*1000: 6.771s
benchSonic4kBuf*1000: 796.44ms
benchSonicSync4kBuf*1000: 645.533ms
benchCoreBuf*1000: 893.629ms
benchSonic*1000: 418.545ms
benchSonicSync*1000: 6.380s
benchSonic4k*1000: 417.882ms
benchSonicSync4k*1000: 245.55ms
benchCore*1000: 2.214s
benchConsole*1000: 5.374s
benchSonicBuf*1000: 787.246ms
benchSonicSyncBuf*1000: 6.849s
benchSonic4kBuf*1000: 822ms
benchSonicSync4kBuf*1000: 638.927ms
benchCoreBuf*1000: 888.464ms

@AVVS
Contributor Author

AVVS commented Jun 29, 2023

@mmarchini please take a look if the changes are sufficient based on your comments

@@ -106,12 +110,35 @@ function SonicBoom (opts) {
this.maxLength = maxLength || 0
this.maxWrite = maxWrite || MAX_WRITE
this.sync = sync || false
this.writable = true
Contributor Author

Forgot to mention: this is so that the Node.js stream utility compose function works properly with this.

@mcollina
Member

wow, quite a speedup also on writing things compared to core.

@AVVS
Contributor Author

AVVS commented Jun 29, 2023

Would be interesting to bench this with io_uring support as fs operations should be much faster on linux

Some weirdness is going on with win/node-14 combo during dep install

@mcollina
Member

> Some weirdness is going on with win/node-14 combo during dep install

GHA is broken on that platform, just exclude it in the workflow file.

@mmarchini
Collaborator

I'll try to take a look over the weekend :)

> Would be interesting to bench this with io_uring support as fs operations should be much faster on linux

I have a Linux machine I can try it on. Will report back with the results

@AVVS
Contributor Author

AVVS commented Aug 4, 2023

anything I should do with this to move it forward? :)

@jsumners jsumners requested review from mmarchini and mcollina August 28, 2023 16:22
@mcollina
Member

@mmarchini could you take a quick look? I would prefer if this did not break you. Otherwise I'm ok to ship it in the coming weeks.

Member

@mcollina mcollina left a comment

lgtm

@mcollina mcollina changed the title from "PoC: .write(Buffer) support, related to #173" to ".write(Buffer) support" Oct 3, 2023
@mcollina mcollina merged commit 2764745 into pinojs:master Oct 3, 2023
10 checks passed