Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html-rewriter: Support streaming content replacements #3211

Merged
merged 3 commits into from
Dec 20, 2024

Conversation

npaun
Copy link
Member

@npaun npaun commented Dec 5, 2024

lol-html allows you to insert replacement values in various places in a HTML stream. Previously, these replacement values were always strings. However, recently lol-html added support for streaming replacements in cloudflare/lol-html#229. lol-html requires the streams to produce valid UTF-8 data, but it may be written in arbitrary length chunks (even partial UTF-8 characters).

This PR adds support for replacement values that are ReadableStreams or Responses.

@npaun npaun requested a review from harrishancock December 5, 2024 04:39
@npaun npaun force-pushed the npaun/lol-html-streaming-replacement branch from 2ceeba1 to 25d56aa Compare December 5, 2024 04:47
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.h Outdated Show resolved Hide resolved
@@ -304,6 +307,8 @@ class EndTag final: public HTMLRewriter::Token {

private:
kj::Maybe<CType&> impl;
// TODO(npaun): is adding rewriter field like this legit?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say, not quite legit, because this could become dangling. The reason is that subclasses of Token are the C++-side implementations of JS wrapper objects, meaning JS can arbitrarily extend their lifetimes beyond Rewriter's lifetime. This is the reason there's the TokenScope guard which nullifies impl above.

Of course, checkToken() is called in all of the places we use this rewriter reference, so the code as written is safe, but it feels like leaving a footgun around.

It occurs to me that an alternative could be to make each Token subclass own their own HashMap of registered replacers. We would expect each Token's replacers to be dropped before its associated TokenScope ends, meaning we could have ~TokenScope() assert if there are any registered replacers left in the Token.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 🙏🏻👍🏻

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanx...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dangling reference issue is fixed.
However, I didn't move the registered replacers to Token yet because it seems like the replacer callback can be invoked after their registering handler has exited. Happy to revisit this if you have any ideas to try later on.

@npaun npaun force-pushed the npaun/lol-html-streaming-replacement branch 4 times, most recently from df2e80f to 9154b3e Compare December 5, 2024 22:56
@npaun npaun marked this pull request as ready for review December 16, 2024 15:27
@npaun npaun requested review from a team as code owners December 16, 2024 15:27
@npaun npaun force-pushed the npaun/lol-html-streaming-replacement branch from 9154b3e to 64202f3 Compare December 16, 2024 15:28
deps/rust/cargo.bzl Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.h Outdated Show resolved Hide resolved
src/workerd/api/tests/htmlrewriter-test.js Show resolved Hide resolved
@npaun npaun force-pushed the npaun/lol-html-streaming-replacement branch 2 times, most recently from 62fabff to dabc3b6 Compare December 17, 2024 22:20
@npaun npaun force-pushed the npaun/lol-html-streaming-replacement branch from dabc3b6 to da5d326 Compare December 20, 2024 07:37
@npaun npaun force-pushed the npaun/lol-html-streaming-replacement branch from da5d326 to ce80785 Compare December 20, 2024 08:07
Copy link

github-actions bot commented Dec 20, 2024

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

@npaun npaun requested a review from a team as a code owner December 20, 2024 08:18
src/workerd/api/html-rewriter.h Outdated Show resolved Hide resolved
src/workerd/api/html-rewriter.c++ Outdated Show resolved Hide resolved
src/workerd/api/tests/htmlrewriter-test.js Outdated Show resolved Hide resolved
@npaun npaun force-pushed the npaun/lol-html-streaming-replacement branch from d9535a3 to 33c3910 Compare December 20, 2024 16:37
@npaun npaun merged commit c77df01 into main Dec 20, 2024
15 checks passed
@npaun npaun deleted the npaun/lol-html-streaming-replacement branch December 20, 2024 16:51
fhanau added a commit that referenced this pull request Dec 27, 2024
This reverts commit c77df01, which is causing upstream test failures.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐛 Support HTMLRewriter replacing HTML with content from a ReadableStream or Response
4 participants