"The algorithm" is inconsistent, buffering nuances #5

Open
ghost opened this issue Jun 14, 2024 · 0 comments
ghost commented Jun 14, 2024

I enjoyed the write-up.

The first algorithm does a direct lookup, table[state][c]; (i.e. the table's second dimension spans all UCHAR_MAX + 1 byte values, and you will be in trouble on a platform with signed char unless the code casts the index to unsigned char), while the latter does an indirect lookup via a columns table, table[state][column[c]];. Neither is wrong, of course, but it's confusing that they are inconsistent.
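To make the contrast concrete, here is a minimal sketch of the two lookup styles side by side. The two-state machine, its table contents, and the names are hypothetical, invented only to illustrate the point; they are not taken from the article:

```c
#include <limits.h>
#include <string.h>

/* Hypothetical two-state machine: enters ST_TOKEN on lowercase letters,
 * falls back to ST_START on anything else. */
enum { ST_START, ST_TOKEN, N_STATES };
enum { COL_OTHER, COL_ALPHA, N_COLS };

static unsigned char direct[N_STATES][UCHAR_MAX + 1]; /* table[state][c]       */
static unsigned char columns[UCHAR_MAX + 1];          /* byte -> character class */
static unsigned char indirect[N_STATES][N_COLS];      /* table[state][class]   */

static void init_tables(void)
{
    memset(direct, ST_START, sizeof direct);
    memset(columns, COL_OTHER, sizeof columns);
    for (int c = 'a'; c <= 'z'; c++) {
        columns[c] = COL_ALPHA;
        direct[ST_START][c] = ST_TOKEN;
        direct[ST_TOKEN][c] = ST_TOKEN;
    }
    indirect[ST_START][COL_ALPHA] = ST_TOKEN;
    indirect[ST_TOKEN][COL_ALPHA] = ST_TOKEN;
}

static int next_direct(int state, char c)
{
    /* The cast matters: if char is signed, a byte >= 0x80 would
     * otherwise index the table with a negative subscript. */
    return direct[state][(unsigned char)c];
}

static int next_indirect(int state, char c)
{
    return indirect[state][columns[(unsigned char)c]];
}
```

The indirect form trades one extra memory access per byte for a much smaller transition table; both must still cast through unsigned char before indexing.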

node.js's http_parser uses a callback model, with a function-call overhead per token. It still has to buffer at least enough bytes to be able to emit a token, and buffering all tokens versus buffering the whole request is more or less equivalent. You contrast that with "reading the entire header in and buffering it", but I think you overlook that Ethernet data arrives in packets, so you may very well have the whole header in the first packet anyway. On a production server you also have to apply size constraints, otherwise your parser may do a lot of work only to realize that you are under a denial-of-service (DoS) attempt. Either do the benchmark, or avoid extrapolating one data set to another case that is not directly comparable.
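A callback-per-token sketch makes both points visible: the parser must hold at least one token's bytes before the callback can fire, and a size cap rejects oversized input before any real work is done. Everything here (the MAX_HEADER limit, the space-delimited tokens, the function names) is an assumption for illustration, not http_parser's actual API:

```c
#include <stddef.h>

/* Hypothetical callback-style tokenizer: one function call per token,
 * in the spirit of (but not copied from) node.js's http_parser. */
typedef void (*token_cb)(const char *tok, size_t len, void *user);

enum { MAX_HEADER = 8192 };  /* assumed DoS guard: cap buffered bytes */

/* Feed one network chunk. Tokens are split on spaces purely for
 * illustration. Returns 0 on success, -1 if the size cap is hit. */
static int feed(const char *data, size_t len, char *buf, size_t *used,
                token_cb cb, void *user)
{
    for (size_t i = 0; i < len; i++) {
        if (*used >= MAX_HEADER)
            return -1;                /* possible DoS: stop parsing early */
        if (data[i] == ' ') {
            if (*used > 0)
                cb(buf, *used, user); /* per-token callback overhead */
            *used = 0;                /* buffer only one token at a time */
        } else {
            buf[(*used)++] = data[i];
        }
    }
    return 0;
}

/* Example callback: just count the tokens seen. */
static void count_tokens(const char *tok, size_t len, void *user)
{
    (void)tok; (void)len;
    ++*(int *)user;
}
```

Note that `feed` can be called once per packet, so if the whole header arrives in the first Ethernet packet, the callback model and the buffer-everything model end up touching the same bytes.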

Whenever I have explored the "pointer arithmetic" point, I have found that gcc generates identical assembler for both forms. My conclusion was to write whichever is easier to read.
