Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add CurlSharePersistentHandle objects #16937

Merged
merged 19 commits into from
Jan 6, 2025

Conversation

ericnorris
Copy link
Contributor

@ericnorris ericnorris commented Nov 25, 2024

RFC: https://wiki.php.net/rfc/curl_share_persistence_improvement


This is an alternative to #15603, and both are relevant to https://wiki.php.net/rfc/curl_share_persistence.

During the voting period, @TimWolla provided some feedback on the internals mailing list:

Also badly chosen $persistent_ids might result in a
large number of handles accumulating, without any kind of garbage
collection. For the existing persistent handles (e.g. for database
connections), the ID is chosen by PHP itself, ensuring a somewhat
predictable behavior.

and:

Given the information regarding the TLS re-use, the cookie sharing is my
only remaining concern. In fact with cookie sharing disabled it might
not even be necessary for the user to choose an ID: Given that libcurl
does the heavy lifting, as a user I should only need a single share
handle and let libcurl figure out the details, no? A boolean “share
connections across requests” when initializing a CurlHandle should
probably sufficient.

After fixing up the original PR, I had some free time this past weekend to explore an alternative implementation that addressed these points.

In this implementation I've introduced a new function, curl_share_persistent_init(array $share_options), which returns a CurlSharePersistentHandle object. Under the hood, the object is identical to a CurlShareHandle, and can be used in curl_setopt. It cannot be used with curl_share_setopt, however, which ensures that the handle is immutable across requests - something the other PR could not do in a statically analyzable way.

The function uses the share options to construct a persistent ID internally, thus eliminating the need for a user to pick the persistent ID. Users will, however, need to ensure CURLOPT_MAXCONNECTS is set appropriately on a CurlHandle, since there will only be a single share handle per unique set of $share_options. Note that this was true in the other PR, but less important since you could pick a unique persistent ID.

The function disallows CURL_LOCK_DATA_COOKIE; I don't particularly feel strongly about this but I think it's easier to relax restrictions in the future compared to making them more strict.

The second commit introduces a read-only property to allow users to introspect the options set on a persistent share handle, but I could remove this commit if people felt it was unnecessary.

I'm looking for the following feedback:

  • Is this better than and preferred to feat: enable persistent CurlShareHandle objects #15603? The RFC for this PR was accepted, so this supersedes the prior PR.
  • In the second commit, I've exposed php_array_data_compare_unstable_i in order to avoid having to duplicate comparison logic. I see that collator_sort.c had a similar problem, but took the path of duplicating the logic. What is preferred?. I've tried a different approach for constructing the ->options property.
  • Is the "hash" function I used to create a unique persistent ID acceptable?

This commit introduces a new function, curl_persistent_share_init, that
creates a persistent curl share handle by storing it within a hash table
in the curl module's global variables.

Persisting a curl share handle allows PHP userspace to cache things like
DNS lookups or connections between multiple PHP requests, thus reducing
work for subsequent requests.

The function takes an array of CURL_LOCK_DATA_* constants, which are used
to either construct a new curl share handle with the given options, or
to find an existing one with the same set of options.

CURL_LOCK_DATA_COOKIE is not allowed here, however, as it is considered
unsafe; a developer could accidentally share a user's cookies from an
earlier request when handling a subsequent from a different user.
Notably, I needed to expose php_array_data_compare_unstable_i, in order
to avoid re-implementing a comparison function. ext/intl/collator_sort.c
chose to copy the implementation, and could maybe benefit from using
the exposed function instead.
I opted to use the existing Caddy testing infrastructure since it
supports keepalives, whereas it seems the PHP development server does
not.

Alternatively I could write just enough of a socket listener to confirm
that there is only ever a single connection coming from curl, but I
believe it is safe to rely on connect_time being zero when a connection
is reused.
Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I find these changes much more agreeable. Unfortunately they are not what has been agreed on in the RFC, so we probably need a follow-up RFC, that I would happily vote in favor of.

ext/curl/curl.stub.php Outdated Show resolved Hide resolved
ext/curl/curl_private.h Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
@ericnorris
Copy link
Contributor Author

Thank you. I find these changes much more agreeable.

Thank you for the feedback in the first place! As I look more at this implementation over the other, I do think I prefer this one.

Unfortunately they are not what has been agreed on in the RFC, so we probably need a follow-up RFC, that I would happily vote in favor of.

I agree that this might not be exactly the letter of the RFC vote, but I do think it's in the spirit - the title of the RFC is just "Add persistent curl share handles".

That said, I'm open to creating another RFC, but I'm wary of being too noisy. I would need to send out another discussion email, etc. Is there precedent for the amount of leeway in post-RFC implementation? I follow the internals mailing list, but not the implementation PRs closely enough to know the answer.

@TimWolla
Copy link
Member

That said, I'm open to creating another RFC, but I'm wary of being too noisy. I would need to send out another discussion email, etc. Is there precedent for the amount of leeway in post-RFC implementation? I follow the internals mailing list, but not the implementation PRs closely enough to know the answer.

Your RFC was pretty specific about the proposed API (that's why I was so concerned during the discussion and would have preferred if the vote had failed). I don't think there's precedent for that amount of implementation leeway as done here and it required a mailing list email in all cases. The #[\SensitiveParameter] RFC had a change that was resolved by simple agreement, but that was a fix for behavior that clearly was non-sense: https://wiki.php.net/rfc/redact_parameters_in_back_traces#errata

In any case there's precedent for follow-up RFCs, due to issues found only after the vote started:

As both a user and a maintainer, I prefer some additional RFC noise over the pain that a subpar implementation would cause in the next decade.

@Girgias
Copy link
Member

Girgias commented Nov 27, 2024

I think this approach is vastly better than the initial implementation. So thank you!

One question about:

Users will, however, need to ensure CURLOPT_MAXCONNECTS is set appropriately on a CurlHandle, since there will only be a single share handle per unique set of $share_options

Did you mean CurlPersistentShareHandle instead of CurlHandle in that comment? And if yes, why not add a parameter to the init function int $max_connections so that one cannot forget to set this option?

ext/curl/share.c Outdated Show resolved Hide resolved
@ericnorris
Copy link
Contributor Author

Understood, @TimWolla, I'm working on writing an "improvement" RFC as we speak, and then I'll come back to this PR.

I think this approach is vastly better than the initial implementation. So thank you!

Thank you for taking a look @Girgias!

One question about:

Users will, however, need to ensure CURLOPT_MAXCONNECTS is set appropriately on a CurlHandle, since there will only be a single share handle per unique set of $share_options

Did you mean CurlPersistentShareHandle instead of CurlHandle in that comment? And if yes, why not add a parameter to the init function int $max_connections so that one cannot forget to set this option?

I believe I meant CurlHandle. It's a little unintuitive to me, but it seems like the max number of connections is specified on curl easy and multi handles, and not curl share handles. CURLOPT_MAXCONNECTS is a CURLOPT constant, meaning it's used in curl_setopt, and CURLMOPT_MAXCONNECTS is a CURLMOPT constant, meaning it's used in curl_multi_setopt; both are not usable with curl_share_setopt.

@ericnorris
Copy link
Contributor Author

I've created the RFC at https://wiki.php.net/rfc/curl_share_persistence_improvement. I'll begin working on the current suggested PR changes shortly.

@TimWolla
Copy link
Member

I've created the RFC at https://wiki.php.net/rfc/curl_share_persistence_improvement.

Thank you. I've had a read through it. The only remark I have is clarifying whether or not those handles can explicitly be closed to “reset” them. I can imagine both widening curl_share_close to also accept a persistent handle and adding a new curl_share_close_persistent function - and also not providing this option.

Other than that it's looking good to me.

@ericnorris
Copy link
Contributor Author

The only remark I have is clarifying whether or not those handles can explicitly be closed to “reset” them. I can imagine both widening curl_share_close to also accept a persistent handle and adding a new curl_share_close_persistent function - and also not providing this option.

Interesting. By "reset", do you mean that calling curl_share_close would destroy and re-initialize the persistent handle, as opposed to just closing it?

I think I would be against allowing a user to close a persistent share handle, since it would leave the object in an invalid state (the previous PR currently does allow this, however). Resetting would probably be fine, but it's not immediately clear to me when that would be useful.

I'd probably lean towards not providing the option for now, so I might add that under 'Rejected Features' and note that we could add support for this in the future?

- use hardcoded allow list for $share_options
- use zval_try_get_long
@ericnorris ericnorris changed the title feat: add CurlPersistentShareHandle objects feat: add CurlSharePersistentHandle objects Nov 27, 2024
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits. Implementation overall LGTM. Thanks for the RFC.

ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, also the tests could try/catch so you don't print the whole stack trace.

ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
ext/curl/share.c Outdated Show resolved Hide resolved
@ericnorris
Copy link
Contributor Author

Thanks @TimWolla! Is there anything else needed from me before this can be merged? I'm up for helping with the documentation as well, but that process would be entirely new to me.

ext/curl/share.c Outdated Show resolved Hide resolved
Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

ext/curl/share.c Outdated Show resolved Hide resolved
@nielsdos
Copy link
Member

nielsdos commented Jan 3, 2025

Is there anything else needed from me before this can be merged?

I don't think so. We might wait a bit on the code owner of the curl ext to see if he has comments, but he's not actively maintaining the curl ext right now so maybe he won't comment.

I'm up for helping with the documentation as well, but that process would be entirely new to me.

The documentation lives in a separate repo. See https://github.com/php/doc-en

@ericnorris
Copy link
Contributor Author

I don't think so. We might wait a bit on the code owner of the curl ext to see if he has comments...

Makes sense.

The documentation lives in a separate repo. See https://github.com/php/doc-en

Got it, I'll start a PR there and I assume we'll wait to merge that until this is merged.

ext/curl/share.c Outdated Show resolved Hide resolved
…s_prop`

also, only declare `zval *entry` where we need it.
ext/curl/tests/curl_persistent_share_003.phpt Outdated Show resolved Hide resolved
ext/curl/tests/curl_persistent_share_003.phpt Outdated Show resolved Hide resolved
ext/curl/tests/curl_persistent_share_005.phpt Show resolved Hide resolved
@TimWolla TimWolla changed the title feat: add CurlSharePersistentHandle objects RFC: Add CurlSharePersistentHandle objects Jan 6, 2025
@TimWolla TimWolla added the RFC label Jan 6, 2025
@TimWolla
Copy link
Member

TimWolla commented Jan 6, 2025

Just added UPGRADING/NEWS and merged the latest master. Will wait for the CI and then merge, any further changes can then be made as a follow-up. Thank you 😃

@TimWolla TimWolla merged commit d20880c into php:master Jan 6, 2025
10 checks passed
@TimWolla
Copy link
Member

TimWolla commented Jan 6, 2025

Now merged. Don't forget to move the RFC to the “Implemented” section and update it with the resulting implementation commit and status.

@ericnorris
Copy link
Contributor Author

Thanks @TimWolla and everyone else for the feedback! I'm happy with the result, and I hope others are too. I'll follow up with a documentation PR shortly.

@ericnorris ericnorris deleted the feat-persistent-curl-share-handle-alt branch January 6, 2025 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants