Race condition can lead to egress connectivity #83

Open
XciD opened this issue Oct 2, 2023 · 14 comments
Labels
strict mode Issues blocked on strict mode implementation

Comments

@XciD

XciD commented Oct 2, 2023

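We are testing the new features provided by this agent on one of our clusters (recently updated to 1.28).

We saw that the pod's connectivity restrictions are not fully enforced when a pod starts.

For example, with the following Pod and NetworkPolicy, the Pod prints the success output shown after the manifests, even though port 1023 is outside the allowed TCP range (1024-65535):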

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: test
  name: test
spec:
  containers:
    - args:
        - http://portquiz.net:1023
      image: alpine/curl
      name: test
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test
spec:
  egress:
  - ports:
    - port: 53
      protocol: UDP
    to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
  - ports:
    - endPort: 65535
      port: 1024
      protocol: TCP
    to:
    - ipBlock:
        cidr: 0.0.0.0/0
  podSelector:
    matchExpressions:
    - key: app
      operator: Exists
  policyTypes:
  - Egress
Port test successful!
Your IP: 44.208.xx.xx

If you wait a little before making the call, it is correctly blocked.
You can see that on pod restart, the policy is enforced too.
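
To make it explicit why the request to port 1023 should be denied, the port entries of the policy above can be modelled as a short check. This is only a simplified sketch, not how the agent evaluates policies: it compares protocol and port and ignores the `to` selectors and the `ipBlock`. `EgressRule` and `is_port_allowed` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EgressRule:
    """One port entry of a NetworkPolicy egress rule (destinations ignored)."""
    protocol: str
    port: int
    end_port: Optional[int] = None  # inclusive upper bound, like endPort


def is_port_allowed(rules, protocol, port):
    """Return True if any rule covers the given protocol and port."""
    for rule in rules:
        upper = rule.end_port if rule.end_port is not None else rule.port
        if rule.protocol == protocol and rule.port <= port <= upper:
            return True
    return False


# The two port entries from the NetworkPolicy in this issue.
RULES = [
    EgressRule(protocol="UDP", port=53),
    EgressRule(protocol="TCP", port=1024, end_port=65535),
]

if __name__ == "__main__":
    print(is_port_allowed(RULES, "TCP", 1023))  # False: just below the range
    print(is_port_allowed(RULES, "TCP", 1024))  # True: first port of the range
```

Under this model, a successful connection to portquiz.net:1023 can only happen while the policy is not yet enforced.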

@jayanthvn
Contributor

@XciD - If I am not mistaken, you mean the IP is not getting blocked? If so, it looks similar to #58.

@XciD
Author

XciD commented Oct 2, 2023

Yes, portquiz.net:1023 is not blocked. But if you sleep 2 seconds before the call, it's blocked.

@jayanthvn
Contributor

jayanthvn commented Oct 2, 2023

OK. The default behavior is to allow traffic (the default Kubernetes behavior) until the policy endpoints are reconciled, i.e., the controller has to identify the pods selected by the network policy and send the update downstream to the node agent, which then enforces the policy.

We are exploring a strict mode option that would default to block/deny until the policy endpoints reconcile. Ref: aws/containers-roadmap#1478 (comment)
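
The window described above can be illustrated with a toy model. It is a sketch of the behavior as stated in this thread, not the agent's implementation; the `verdict` function and its parameters are hypothetical, and the 2-second reconcile time is just the value observed by the reporter.

```python
def verdict(elapsed_s, reconcile_s, policy_allows, strict=False):
    """Decide what happens to a connection attempted elapsed_s seconds after pod start.

    elapsed_s:     time since the pod started (seconds)
    reconcile_s:   time at which the policy endpoints are reconciled (seconds)
    policy_allows: whether the NetworkPolicy permits this connection
    strict:        default deny before reconciliation instead of default allow
    """
    if elapsed_s >= reconcile_s:
        # Policy is enforced: the NetworkPolicy decides.
        return "allow" if policy_allows else "deny"
    # Policy not enforced yet: fall back to the mode's default.
    return "deny" if strict else "allow"


if __name__ == "__main__":
    # Port 1023 is not permitted by the policy in this issue.
    print(verdict(0.0, 2.0, policy_allows=False))               # allow
    print(verdict(2.5, 2.0, policy_allows=False))               # deny
    print(verdict(0.0, 2.0, policy_allows=False, strict=True))  # deny
```

The first call is the race reported here: the connection is made before reconciliation, so the default allow applies.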

@XciD
Author

XciD commented Oct 2, 2023

It also means that if the node is under CPU pressure, it can take more than 2 seconds to enforce security policies.

@jayanthvn
Contributor

@XciD have you tried a sleep shorter than 2 seconds, or is the reconciliation consistently taking around 2 seconds?

@XciD
Author

XciD commented Oct 2, 2023

I've tried with 1s, but my e2e tests fail.
(We have a full CI/CD test suite running against our production cluster.)

With our Calico + AWS CNI setup, this code fails immediately (the request is blocked):

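import os
import requests

# Port 1023 is outside the TCP range allowed by the NetworkPolicy (1024-65535),
# so this request is expected to fail once the policy is enforced.
try:
    requests.get("http://portquiz.net:1023", timeout=2)
except requests.exceptions.RequestException:
    # Connection blocked or timed out: report it and exit immediately.
    print("error")
    os._exit(1)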

On the new cluster, with no sleep or a sleep shorter than 2 seconds before the request, the test fails because the request is not blocked yet.
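
Rather than guessing a sleep value, the enforcement delay can be measured by polling until the request starts being blocked. The following is a minimal sketch under stated assumptions: `time_until_blocked` and `http_probe` are hypothetical helpers, the probe uses only the standard library instead of `requests`, and the timeout and interval values are arbitrary.

```python
import time
import urllib.error
import urllib.request


def http_probe(url="http://portquiz.net:1023", timeout_s=2):
    """Return True if the connection went through, False if it was blocked."""
    try:
        urllib.request.urlopen(url, timeout=timeout_s)
        return True
    except urllib.error.HTTPError:
        # The server answered with an HTTP error, so traffic was not blocked.
        return True
    except OSError:
        # Covers URLError and socket timeouts: treat as blocked.
        return False


def time_until_blocked(probe, timeout_s=10.0, interval_s=0.1,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it reports blocked.

    Returns the elapsed seconds when probe() first returns False,
    or None if it still succeeds after timeout_s.
    """
    start = clock()
    while True:
        if not probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)


# Example call from inside the test pod (performs real network requests):
#   delay = time_until_blocked(http_probe)
#   print("never blocked" if delay is None else f"blocked after {delay:.2f}s")
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting or network access.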

@wiseelf

wiseelf commented Nov 2, 2023

@XciD I have an almost identical issue, #73. In my case, it blocks an already established connection when the netpol is applied. A sleep also works around that issue, but I wouldn't call it a solution :)

@Mohsen51

Mohsen51 commented Dec 1, 2023

Got the same issue. It would be great to implement a strict mode that forces the pod to wait until the network policy agent has fully configured it!

@jdn5126
Contributor

jdn5126 commented Dec 26, 2023

Strict mode implementation is still in progress. We will provide an update on this ticket when PRs are available.

@allamand

I’m not sure strict mode would be the solution here, as it would still take some time for the reconciliation to happen. What about introducing a pod readiness gate that would flag the pod as ready only once the netpol reconciliation has happened?

@jdn5126
Contributor

jdn5126 commented Feb 16, 2024

@allamand that is what strict mode does. The pod is not marked as Ready until Network Policies have been applied and properly reconciled.
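
If strict mode behaves as described in the comment above, a test could wait for the pod's Ready condition instead of sleeping. The sketch below only shows how that condition can be read from a pod object in the shape returned by `kubectl get pod test -o json`. The sample JSON is hand-written for illustration, not captured from a cluster, and `is_pod_ready` is a hypothetical helper.

```python
import json

# Hand-written fragment in the shape of `kubectl get pod test -o json`;
# illustrative only, not captured from a real cluster.
SAMPLE_POD_JSON = """
{
  "metadata": {"name": "test"},
  "status": {
    "conditions": [
      {"type": "PodScheduled", "status": "True"},
      {"type": "Ready", "status": "False"}
    ]
  }
}
"""


def is_pod_ready(pod):
    """Return True if the pod's Ready condition has status "True"."""
    conditions = pod.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


if __name__ == "__main__":
    pod = json.loads(SAMPLE_POD_JSON)
    print(is_pod_ready(pod))  # False: the Ready condition is still "False"
```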

jdn5126 added the strict mode label Feb 16, 2024
@allamand

@jdn5126 OK, this is nice, thanks. Any ETA to share?

@jdn5126
Contributor

jdn5126 commented Feb 20, 2024

> @jdn5126 ok this is nice, thanks. Any ETA to share ?

#209 is the PR, and there are some accompanying VPC CNI changes in aws/amazon-vpc-cni-k8s#2790, but I am not sure what the ETA is. I think sometime in Q2.

@achevuru
Contributor

achevuru commented Jun 3, 2024

Strict mode is now available. Let us know if that helps with the above use case/issue.
