src: implement whatwg's URLPattern spec #56452

Open
wants to merge 1 commit into
base: main

@anonrig
Member

anonrig commented Jan 3, 2025

Work in progress. Opening early to get feedback. Not ready to land.

Co-authored-by: Daniel Lemire (@lemire)

Blocked

This is blocked from landing due to the old macOS machines we use in our infrastructure (cc @nodejs/build)

Notes:

  • Ada now requires C++20
  • URLPattern is now a global class.
  • URLPattern is also exposed in node:url module
  • Ada now enables exceptions just like V8. This is done because std::regex, the default C++ regex library, has no non-throwing API surface, unlike std::filesystem. The alternative to enabling exceptions is to bundle Ada with a regex library or to implement its own regex parser, which is too much work for URLPattern at this stage. Future Ada releases could support such changes and disable exceptions again.

TODOs

  • Pass all web-platform tests
  • Release Ada v3 before landing this PR
  • Make sure to split all changes into multiple commits
  • Add @lemire as co-author to all commits
  • Land the upstream pull request "implement URLPattern" (ada-url/ada#785)
  • Add documentation for global and node:url module declarations.

cc @nodejs/cpp-reviewers

Fixes #40844

@anonrig anonrig requested review from jasnell and RafaelGSS January 3, 2025 16:07
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/security-wg
  • @nodejs/startup
  • @nodejs/url
  • @nodejs/web-standards

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Jan 3, 2025
@@ -24,4 +24,4 @@ for (const item of items) {
 leaks.push(item);
 }

-assert.deepStrictEqual(leaks, []);
+assert.deepStrictEqual(leaks, ['URLPattern']);
Member Author

@legendecas Is this correct? How can I make sure it is not exposed?

Member

The interfaces exposed in lib/internal/bootstrap/web/exposed-wildcard.js should be annotated in the WebIDL as [Exposed=*].

https://urlpattern.spec.whatwg.org/#urlpattern-class defines that URLPattern is annotated as [Exposed=(Window,Worker)], so it should be exposed in lib/internal/bootstrap/web/exposed-window-or-worker.js.

@legendecas (Member), Jan 3, 2025

This reminds me of the question of why URLPattern is not exposed as [Exposed=*]. whatwg/urlpattern#236 is the pending work to change that.

@targos targos added the semver-major PRs that contain breaking changes and should be released in the next major version. label Jan 3, 2025
@anonrig anonrig added macos Issues and PRs related to the macOS platform / OSX. blocked PRs that are blocked by other issues or PRs. build-agenda labels Jan 3, 2025
@targos
Member

targos commented Jan 3, 2025

> Ada now enables exceptions just like UV and V8

Can you elaborate? libuv is a C library so I don't think exceptions exist there, and I'm pretty sure V8 is built with exceptions disabled.

@anonrig
Member Author

anonrig commented Jan 3, 2025

> > Ada now enables exceptions just like UV and V8
>
> Can you elaborate? libuv is a C library so I don't think exceptions exist there, and I'm pretty sure V8 is built with exceptions disabled.

My bad, UV does not enable exceptions. Referencing the v8.gyp file:

{
  'target_name': 'torque_base',
  'type': 'static_library',
  'toolsets': ['host', 'target'],
  'sources': [
    '<!@pymod_do_main(GN-scraper "<(V8_ROOT)/BUILD.gn"  "\\"torque_base.*?sources = ")',
  ],
  'dependencies': [
    'v8_shared_internal_headers',
    'v8_libbase',
  ],
  'defines!': [
    '_HAS_EXCEPTIONS=0',
    'BUILDING_V8_SHARED=1',
  ],
  'cflags_cc!': ['-fno-exceptions'],
  'cflags_cc': ['-fexceptions'],
  'xcode_settings': {
    'GCC_ENABLE_CPP_EXCEPTIONS': 'YES',  # -fexceptions
  },
  'msvs_settings': {
    'VCCLCompilerTool': {
      'RuntimeTypeInfo': 'true',
      'ExceptionHandling': 1,
    },
  },
}

@targos
Member

targos commented Jan 3, 2025

This is not really V8. It's a build-time executable (torque) used to generate code for V8

@anonrig anonrig requested a review from Qard January 3, 2025 16:27
ada::url_pattern_options options{};
Local<Value> ignore_case;
if (obj->Get(env->context(),
FIXED_ONE_BYTE_STRING(env->isolate(), "ignoreCase"))
Member

Consider adding the "ignoreCase" string to env-properties.h.


MaybeLocal<Value> URLPattern::Hash() const {
auto context = env()->context();
return ToV8Value(context, url_pattern_.get_hash());
Member

I think the key challenge here is that this will copy the string on every call. Any chance of memoizing the string once it has been created?
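As a rough illustration of the memoization idea, here is a minimal JavaScript sketch. It is only an analogy for the C++ getter under review: `PatternWrapper`, its `conversions` counter, and the string concatenation that stands in for the expensive copy are all hypothetical and not part of this PR.

```javascript
// Hypothetical sketch: compute a derived string once and reuse it afterwards.
class PatternWrapper {
  #parts;
  #hash; // cached value, stays undefined until the first access
  conversions = 0; // counts how often the "expensive" conversion ran

  constructor(parts) {
    this.#parts = parts;
  }

  get hash() {
    if (this.#hash === undefined) {
      this.conversions++;
      // Stand-in for the costly copy from the native string.
      this.#hash = '#' + this.#parts.hash;
    }
    return this.#hash;
  }
}

const wrapper = new PatternWrapper({ hash: 'section' });
const first = wrapper.hash;
const second = wrapper.hash; // served from the cache
```

The trade-off is one extra stored reference per instance in exchange for skipping the repeated conversion.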

URLPattern::URLPattern(Environment* env,
Local<Object> object,
ada::url_pattern&& url_pattern)
: BaseObject(env, object) {
Member

We likely should introduce this as experimental in the first release, even if it graduates from experimental quickly. There should likely be a warning emitted on the first construction.
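A minimal sketch of a warn-on-first-construction guard, using only the public `process.emitWarning` API. The class `ExperimentalPattern`, the helper `warnOnce`, and the warning text are hypothetical; Node.js core has its own internal helper for experimental warnings, which this sketch does not attempt to reproduce.

```javascript
// Hypothetical sketch: emit an ExperimentalWarning only the first time
// the class is constructed.
let warned = false;

function warnOnce(feature) {
  if (warned) return false;
  warned = true;
  // process.emitWarning(warning, type) is a public Node.js API.
  process.emitWarning(
    `${feature} is an experimental feature and might change at any time`,
    'ExperimentalWarning'
  );
  return true;
}

class ExperimentalPattern {
  constructor(input) {
    warnOnce('URLPattern');
    this.input = input;
  }
}

const a = new ExperimentalPattern('/a'); // triggers the warning
const b = new ExperimentalPattern('/b'); // silent
```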

@@ -1571,6 +1572,7 @@ module.exports = {
 toPathIfFileURL,
 installObjectURLMethods,
 URL,
+URLPattern,
Member

Needs docs added.

Comment on lines +398 to +391
if (!url_pattern->Protocol().ToLocal(&result)) {
return;
}
info.GetReturnValue().Set(result);
Member

Suggested change
- if (!url_pattern->Protocol().ToLocal(&result)) {
-   return;
- }
- info.GetReturnValue().Set(result);
+ if (url_pattern->Protocol().ToLocal(&result)) {
+   info.GetReturnValue().Set(result);
+ }

It is a bit more compact to invert the checks on these.

void URLPattern::New(const FunctionCallbackInfo<Value>& args) {
Environment* env = Environment::GetCurrent(args);

CHECK(args.IsConstructCall());
Member

Since this constructor is exposed directly to users, this should throw an exception rather than abort.

@jasnell
Member

jasnell commented Jan 3, 2025

Can you also include a fairly simple benchmark?

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 80.99174% with 69 lines in your changes missing coverage. Please review.

Project coverage is 88.01%. Comparing base (ca69d0a) to head (af23313).
Report is 62 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/node_url_pattern.cc | 81.07% | 31 Missing and 36 partials ⚠️ |
| src/node_url_pattern.h | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #56452      +/-   ##
==========================================
- Coverage   88.54%   88.01%   -0.54%     
==========================================
  Files         657      662       +5     
  Lines      190393   191249     +856     
  Branches    36552    36404     -148     
==========================================
- Hits       168582   168326     -256     
- Misses      14998    16031    +1033     
- Partials     6813     6892      +79     
| Files with missing lines | Coverage Δ |
| --- | --- |
| lib/internal/bootstrap/web/exposed-wildcard.js | 99.14% <100.00%> (+0.01%) ⬆️ |
| lib/internal/url.js | 95.79% <100.00%> (-1.88%) ⬇️ |
| lib/url.js | 98.94% <100.00%> (-1.06%) ⬇️ |
| src/node_binding.cc | 83.66% <ø> (ø) |
| src/node_external_reference.h | 100.00% <ø> (ø) |
| src/node_url_pattern.h | 0.00% <0.00%> (ø) |
| src/node_url_pattern.cc | 81.07% <81.07%> (ø) |

... and 135 files with indirect coverage changes

BufferValue input_buffer(env->isolate(), args[0]);
CHECK_NOT_NULL(*input_buffer);
input = input_buffer.ToString();
} else if (args[0]->IsObject()) {

It’s not part of the spec, but can we add an “is URL instance” fast path here, to avoid re-normalizing each URL property (as happens with the JsObject route)?

Member Author

Yes, we can. I haven't started working on optimizations yet due to the inconsistencies between the spec and WPT.

@mcollina (Member) left a comment

I don’t think this is a good pattern to land in Node.js. Specifically, a server using this will create one URLPattern per route and iterate over them in a loop. This will be slow, especially if the matching route is the last one in the list.

(This feedback was provided when URLPattern was standardized and essentially ignored).

For this to be useful, we would need a Node.js-specific API that organizes these URLPatterns in a radix (prefix) trie and does the matching all at once.
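To make the concern concrete, here is a minimal sketch contrasting a linear scan over per-route matchers with a trie that resolves a path in a single walk. It does not use URLPattern: plain RegExp objects stand in for the per-route patterns, the trie is a simple per-segment trie rather than a compressed radix trie, and `linearMatch`, `TrieRouter`, and the `:param` syntax are assumptions made for illustration, not a proposed Node.js API.

```javascript
// Linear scan: one matcher per route, tested in order.
// The cost grows with the number of routes, worst when the last one matches.
const routes = [
  { name: 'users', re: /^\/users$/ },
  { name: 'user', re: /^\/users\/([^/]+)$/ },
  { name: 'posts', re: /^\/users\/([^/]+)\/posts$/ },
];

function linearMatch(path) {
  for (const route of routes) {
    if (route.re.test(path)) return route.name;
  }
  return null;
}

// Segment trie: static segments are map keys, ':name' segments share one
// parameter child per node. No backtracking; a sketch only.
function makeNode(key) {
  return { children: new Map(), param: null, name: null, key };
}

class TrieRouter {
  constructor() {
    this.root = makeNode(null);
  }

  add(pattern, name) {
    let node = this.root;
    for (const seg of pattern.split('/').filter(Boolean)) {
      if (seg.startsWith(':')) {
        if (!node.param) node.param = makeNode(seg.slice(1));
        node = node.param;
      } else {
        if (!node.children.has(seg)) node.children.set(seg, makeNode(null));
        node = node.children.get(seg);
      }
    }
    node.name = name;
  }

  match(path) {
    let node = this.root;
    const params = {};
    for (const seg of path.split('/').filter(Boolean)) {
      const next = node.children.get(seg);
      if (next) {
        node = next;
      } else if (node.param) {
        params[node.param.key] = seg;
        node = node.param;
      } else {
        return null;
      }
    }
    return node.name ? { name: node.name, params } : null;
  }
}

const router = new TrieRouter();
router.add('/users', 'users');
router.add('/users/:id', 'user');
router.add('/users/:id/posts', 'posts');
```

With the trie, the work per lookup depends on the number of path segments rather than on the number of registered routes.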

I can possibly be persuaded that we need this for Web platform compatibility, but it’s not that popular either (unlike fetch()).

@mcollina
Member

mcollina commented Jan 3, 2025

@jasnell I’ll try to build this and get a benchmark going against the ecosystem routers.

@anonrig
Member Author

anonrig commented Jan 3, 2025

> @jasnell I’ll try to build this and get a benchmark going against the ecosystem routers.

Right now, this pull request does not pass WPT and is not optimized at all. Benchmarks will not be meaningful yet.

@anonrig anonrig added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Jan 3, 2025
@jasnell
Member

jasnell commented Jan 3, 2025

> I can possibly be persuaded that we need this for Web platform compatibility, but it’s not that popular either (unlike fetch()).

It's not a great spec, but it is implemented in Deno, Bun, Workers, and browsers. We're looking at adding it to the WinterTC Common Minimum set. Node.js' lack of support is really the only thing holding that up right now. While I recognize that you don't like this particular spec, I think compat and consistency with other runtimes is a worthwhile goal here. We should land this, then keep following up with optimizations and improvements. It certainly is not the first web platform API that is suboptimal from a performance point of view.

@jasnell
Member

jasnell commented Jan 3, 2025

@nodejs/tsc ... please review and weigh in. There is definite demand for this API, as we've had a number of folks ask over the years. It's also implemented in other runtimes and is a web platform API standard. On the downside, it's not the best pattern or the most performant approach. Ideally we'd be able to resolve this without resorting to a vote. To be clear, we know this approach isn't the most performant, so how it performs relative to other router implementations should not be part of the consideration here.

@marco-ippolito
Member

I'm +1 on adding it to Node.js. I like having more web platform compatibility, and I feel like it's an API that could be useful for libraries.
I will not comment on the implementation details since I'm not an expert on the subject.

@ronag (Member) left a comment

We have landed plenty of inefficient Web APIs. I don't see why this would be different. I'm of the consistent opinion that we should implement them but strongly discourage their use in any performance-sensitive code.

@mon-jai

mon-jai commented Jan 4, 2025

Just out of curiosity, why can't we port Chromium's implementation instead of reimplementing this ourselves? (I am not a maintainer)

@jasnell
Member

jasnell commented Jan 4, 2025

> Just out of curiosity, why can't we port the Chromium's implementation but instead need to reimplement this ...

The chromium implementation depends on quite a few chromium/blink internals that we don't have here. That kind of reuse is difficult to just make happen. This ada-based implementation is standalone, has no other dependencies, etc. It's actually easier overall just to write it from scratch than to repurpose that other impl.

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch 2 times, most recently from 65631d9 to 4e224f9 Compare January 4, 2025 17:08
@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch 2 times, most recently from 4982d63 to 022eefd Compare January 4, 2025 19:40
@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch from 022eefd to af23313 Compare January 5, 2025 16:39
@domenic
Contributor

domenic commented Jan 6, 2025

One worry worth highlighting here is that, IIUC, the architecture of this PR will use a separate regexp library for the regexps that show up in the URLPattern constructor, vs. the ones that show up in the RegExp constructor.

That could be confusing for developers, as I assume the feature overlap will not be the same. For example, I wonder what results you would get after this PR for:

new URLPattern("https://example.com/:id([\\p{Decimal_Number}--[1-9]])").test("https://example.com/0");

new RegExp("[\\p{Decimal_Number}--[1-9]]", "v").test("0");

In Chromium we have architected our URLPattern implementation into two parts: one, liburlpattern, which is meant to have very few dependencies, and largely works to produce regexp pattern strings. The second part, in blink/renderer/core/url_pattern, implements more parts of the spec, including bridging to V8's regexp engine.

I suspect a similar architecture would not fit that well for you, given the different boundaries between Ada / Node.js. But maybe Ada could have some sort of regexp-matcher-callback system which would allow the use of V8's regexps instead of std::regex.
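The callback idea in the last paragraph could look roughly like the sketch below. It is written in JavaScript for brevity, although the real boundary would sit between Ada (C++) and Node.js. `compileComponent`, the provider object shape, and both providers are hypothetical and are not part of Ada or Node.js.

```javascript
// Hypothetical sketch: the pattern code never touches a regex engine directly;
// it only calls whichever provider it was given.
function compileComponent(source, provider) {
  const compiled = provider.compile(source);
  return {
    test: (input) => provider.test(compiled, input),
  };
}

// Provider backed by the host engine's RegExp (V8's when run in Node.js).
const hostRegExpProvider = {
  compile: (source) => new RegExp(`^(?:${source})$`, 'u'),
  test: (compiled, input) => compiled.test(input),
};

// A deliberately limited provider, standing in for an engine with a
// different feature set: it only understands literal strings.
const literalOnlyProvider = {
  compile: (source) => source,
  test: (compiled, input) => compiled === input,
};

const digitsViaHost = compileComponent('\\d+', hostRegExpProvider);
const digitsViaLiteral = compileComponent('\\d+', literalOnlyProvider);
const literal = compileComponent('abc', literalOnlyProvider);
```

The same pattern source gives different answers depending on the provider, which is the kind of mismatch raised above. Routing everything through one provider backed by the host's RegExp would keep the two constructors consistent.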
