-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
policy: Do not store policy reference in Cilium socket option #1010
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
jrajahalme
force-pushed
the
debug-build-for-testing
branch
4 times, most recently
from
November 22, 2024 14:57
07d4662
to
080d03a
Compare
jrajahalme
force-pushed
the
debug-build-for-testing
branch
from
November 22, 2024 16:02
080d03a
to
2962755
Compare
jrajahalme
changed the title
bazel: debug build
policy: Remove policy reference from SocketOption
Nov 22, 2024
jrajahalme
added
bug
Something isn't working
needs-backport/1.30
and removed
wip
work-in-progress, no need to review
labels
Nov 22, 2024
jrajahalme
force-pushed
the
debug-build-for-testing
branch
from
November 24, 2024 08:20
c6111e0
to
63d0b4a
Compare
jrajahalme
changed the title
policy: Remove policy reference from SocketOption
SocketOption: Only store a weak policy reference
Nov 24, 2024
Include also sni test not having "l7" in it's name, and check the logs for errors as well. Test with embedded cilium-envoy so that Envoy error and warning logs will also be checked for. Signed-off-by: Jarno Rajahalme <[email protected]>
Add a runtime assert that the policy destruction happens from the main or test thread. The "test thread" case also covers the destruction from the initial thread, which happens for the static members of 'AllowAllEgressPolicyInstanceImpl' when running `cilium-envoy version', for example. Signed-off-by: Jarno Rajahalme <[email protected]>
jrajahalme
force-pushed
the
debug-build-for-testing
branch
from
November 24, 2024 09:35
63d0b4a
to
1810b93
Compare
jrajahalme
changed the title
SocketOption: Only store a weak policy reference
policy: Do not store policy reference in socket_option
Nov 24, 2024
jrajahalme
force-pushed
the
debug-build-for-testing
branch
from
November 24, 2024 09:38
1810b93
to
a94e8f2
Compare
jrajahalme
force-pushed
the
debug-build-for-testing
branch
from
November 24, 2024 09:53
a94e8f2
to
2824f9e
Compare
Remove the member 'initial_policy_' from the Cilium SocketOption class, as that reference was possibly kept after the policy had already been deleted. This made it possible for the policy to be destructed from the worker thread, which can lead to Envoy crash. 'initial_policy_' was stored as we already did the policy lookup and the same policy is needed in other Cilium filters. Have the cilium.network and cilium.tls_wrapper filters do their own policy lookups instead, so that we do not need to keep the reference in SocketOption. Worker threads do the policy lookup from their own thread local maps, which hold a reference to the policy. These thread local references are relinquished from post function during policy update, which will release worker thread's last reference to the policy at the time. The main thread is the last one to update or delete policy, so the policy instance destruction happens from the main thread. For this to work it is imperative to only hold policy references in these thread local maps. Note that even keeping a weak reference accessible to the worker threads outside of the policy maps is risky, as then the worker thread could convert that weak reference to a shared pointer while the reference has already been relinquished from the thread's local policy map. In this situation the other threads could also relinquish their references concurrently to the worker thread holding the shared pointer, which would then become the last reference and destruction would happen from the worker thread again. Signed-off-by: Jarno Rajahalme <[email protected]>
jrajahalme
force-pushed
the
debug-build-for-testing
branch
from
November 24, 2024 09:54
2824f9e
to
78b3394
Compare
jrajahalme
changed the title
policy: Do not store policy reference in socket_option
policy: Do not store policy reference in Cilium socket option
Nov 24, 2024
sayboras
approved these changes
Nov 25, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have verified the stack trace before and after the changes, nice find on socket option 🎖️.
Two comments as per below, but not a blocker.
This was referenced Nov 26, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Remove the member
initial_policy_
from the CiliumSocketOption
class, as that reference was possibly kept after the policy had already been deleted. This made it possible for the policy to be destructed from the worker thread, which can lead to Envoy crash.initial_policy_
was stored as we already did the policy lookup and the same policy is needed in other Cilium filters. Have thecilium.network
andcilium.tls_wrapper
filters do their own policy lookups instead, so that we do not need to keep the reference inSocketOption
.Worker threads do the policy lookup from their own thread local maps, which hold a reference to the policy. These thread local references are relinquished from post function during policy update, which will release worker thread's last reference to the policy at the time. The main thread is the last one to update or delete policy, so the policy instance destruction happens from the main thread. For this to work it is imperative to only hold policy references in these thread local maps.
Note that even keeping a weak reference accessible to the worker threads outside of the policy maps is risky, as then the worker thread could convert that weak reference to a shared pointer while the reference has already been relinquished from the thread's local policy map. In this situation the other threads could also relinquish their references concurrently to the worker thread holding the shared pointer, which would then become the last reference and destruction would happen from the worker thread again.