-
-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adds urlpattern to ada #381
Conversation
include/ada/urlpattern.h
Outdated
bool ignore_case = false; | ||
}; | ||
|
||
struct component_result { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ada::component_result
seems a really vague naming, since it might be mistaken with ada::url_components
namespace ada::urlpattern { | ||
struct urlpattern_component_result { | ||
std::string_view input; | ||
std::unordered_map<std::string_view, std::optional<std::string_view>> groups; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lemire How's the performance comparison of using std::optional in here versus std::variant<std::nullopt, std::string_view>?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect it matters much.
namespace ada::urlpattern { | ||
|
||
struct urlpattern_options { | ||
std::string_view delimiter = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string_view delimiter = ""; | |
std::string_view delimiter{}; |
|
||
std::u32string_view input; | ||
std::vector<token> token_list; | ||
size_t component_start = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t component_start = 0; | |
size_t component_start{0}; |
|
||
namespace ada::urlpattern { | ||
// https://wicg.github.io/urlpattern/#component | ||
struct urlpattern_component { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is already under ada::urlpattern namespace, why not call this struct component?
std::string final_utf8_url(utf8_size, '\0'); | ||
ada::idna::utf32_to_utf8(url.data(), url.size(), final_utf8_url.data()); | ||
|
||
if (ada::can_parse(final_utf8_url)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a todo in here to optimize the can_parse function?
ada_really_inline bool constructor_string_parser::is_group_open() { | ||
// If parser’s token list[parser’s token index]'s type is "open", then | ||
// return true. Else return false. | ||
return token_list[token_index].type == TOKEN_TYPE::OPEN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not safe. We should add development asserts to make sure token_index is smaller than token_list length.
while (index < input.size()) { | ||
size_t pos = input.find_first_of(U".+*?^${}()[]|/\\)"); | ||
if (pos == std::string_view::npos) { | ||
result = result += input.substr(index, input.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is weird.
// 1. Set type to "full-wildcard". | ||
type = PART_TYPE::FULL_WILDCARD; | ||
// 2. Set regexp value to the empty string. | ||
regexp_value.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are assigning and later clearing this value, which is creating performance degregation. We should eventually optimize it to reduce unnecessary allocations.
} | ||
|
||
// 3. Set token to the result of running try to consume a token given parser | ||
// and "asterisk". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was a function like t.value_or("default val")
|
||
// If name token is null and token is null, then set token to the result of | ||
// running try to consume a token given parser and "asterisk". | ||
if (!name_token.has_value() && !regexp_or_wildcard.has_value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is fast path here. If name_token does not have a value, you dont need to try to consume token right?
Is there any way to crank 🔧 this up for nodejs/node#51060 needs? 🤔 |
Houston, do you read me? 📡 |
@bricss Are you available to help push this forward? |
Yes, with only one exception, I don't have big/deep experience with C++ coding 🤷♂️ atm 🙄 |
@bricss Lack of knowledge of C++ could be a problem. |
TL;DR
WIP!
So i started to implement https://wicg.github.io/urlpattern/
Seems that this is an work-in-progress spec, so maybe we'll have some freestyle stuff along the way of the implementation.
I'm using https://github.com/kenchris/urlpattern-polyfill and https://github.com/denoland/rust-urlpattern as references.
Don't know if there is a "right way" to do this, but I get the unicode identifier start/part ranges using the script:
++ something similar for the identifier-part with the regex
/[$_\u200C\u200D\p{ID_Continue}]/u
I got those regexes in https://github.com/kenchris/urlpattern-polyfill/blob/main/src/path-to-regex-modified.ts#LL70C1-L70C50
Then I used the ranges to create bitwise masks in the unicode.cpp.
Nothing is tested yet
It will took some time to finish!
ref nodejs/node#40844