
Cross-Cluster Service Connectivity Fails with "Host is Unreachable" Despite Successful DNS Resolution in Submariner GlobalNet Setup #3204

Closed
aswinayyolath opened this issue Oct 31, 2024 · 42 comments
Labels
bug (Something isn't working), datapath (Datapath related issues or enhancements), flannel (flannel CNI)


@aswinayyolath

aswinayyolath commented Oct 31, 2024

What happened:
I deployed Submariner with GlobalNet across two Kubernetes clusters. DNS resolution works as expected, but connectivity to services across clusters fails with a Host is unreachable error.

More information is available at the link below:

https://kubernetes.slack.com/archives/C010RJV694M/p1730390376380879

What you expected to happen:
curl requests from a pod in cluster2 to a service exposed via Submariner in cluster1 should succeed, indicating that cross-cluster communication is functioning.

How to reproduce it (as minimally and precisely as possible):

  • Set up two Kubernetes clusters and deploy Submariner with GlobalNet enabled.
  • cluster1 GlobalNet CIDR: 242.0.0.0/16
  • cluster2 GlobalNet CIDR: 243.0.0.0/16
  • Deploy an nginx pod in cluster1, expose it as a service, and export it using Submariner.
  • Deploy a test pod (tmp-shell) in cluster2.
  • Attempt to access the nginx service in cluster1 from tmp-shell in cluster2 using DNS (nginx-cluster1.default.svc.clusterset.local) or the resolved GlobalNet IP.
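The GlobalNet CIDR overlap check in the diagnose output below could not run, because subctl failed to fetch the Broker's REST config. As a rough local substitute, the two GlobalNet CIDRs listed above can be compared with Go's net/netip package. This is a minimal sketch. `prefixesOverlap` is a hypothetical helper and is not the check subctl performs.

```go
package main

import (
	"fmt"
	"net/netip"
)

// prefixesOverlap reports whether two CIDR strings share at least one address.
// Hypothetical helper for a local sanity check; not subctl's implementation.
func prefixesOverlap(a, b string) (bool, error) {
	pa, err := netip.ParsePrefix(a)
	if err != nil {
		return false, err
	}
	pb, err := netip.ParsePrefix(b)
	if err != nil {
		return false, err
	}
	return pa.Overlaps(pb), nil
}

func main() {
	// GlobalNet CIDRs taken from the reproduction steps.
	overlap, err := prefixesOverlap("242.0.0.0/16", "243.0.0.0/16")
	if err != nil {
		panic(err)
	}
	fmt.Println("GlobalNet CIDRs overlap:", overlap) // prints "GlobalNet CIDRs overlap: false"
}
```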

Anything else we need to know?:

Environment:

  • Diagnose information (use subctl diagnose all):
Cluster 1 info
Aswin 🔥🔥🔥 $ subctl diagnose all --kubeconfig /Users/aswina/Downloads/sub1
Cluster "sub1"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.30.6" is supported

 ✗ Globalnet deployment detected - checking that globalnet CIDRs do not overlap
 ✗ Error getting the Broker's REST config: error getting auth rest config: Get "https://9.66.245.122:6443/apis/submariner.io/v1/namespaces/submariner-k8s-broker/clusters/any": tls: failed to verify certificate: x509: "kube-apiserver" certificate is not trusted

 ⚠ Checking Submariner support for the CNI network plugin
 ⚠ Submariner could not detect the CNI network plugin and is using ("generic") plugin. It may or may not work.
 ✓ Checking gateway connections
 ✗ Checking route agent connections
 ✗ Connection to cluster "cluster2" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "cluster2",
    "cable_name": "submariner-cable-cluster2-10-21-82-227",
    "healthCheckIP": "243.0.255.254",
    "hostname": "sub2-worker-1.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "10.21.82.227",
    "public_ip": "129.41.87.3",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "cluster2" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "cluster2",
    "cable_name": "submariner-cable-cluster2-10-21-82-227",
    "healthCheckIP": "243.0.255.254",
    "hostname": "sub2-worker-1.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "10.21.82.227",
    "public_ip": "129.41.87.3",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "cluster2" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "cluster2",
    "cable_name": "submariner-cable-cluster2-10-21-82-227",
    "healthCheckIP": "243.0.255.254",
    "hostname": "sub2-worker-1.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "10.21.82.227",
    "public_ip": "129.41.87.3",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✓ Checking Submariner support for the kube-proxy mode
 ✓ The kube-proxy mode is supported
 ✗ Checking that firewall configuration allows intra-cluster VXLAN traffic
 ✗ The tcpdump output from the sniffer pod does not contain the expected remote endpoint IP 243.0.0.0. Please check that your firewall configuration allows UDP/4800 traffic. Actual pod output:
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vx-submariner, link-type EN10MB (Ethernet), snapshot length 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

 ✓ Checking that Globalnet is correctly configured and functioning

 ✓ Checking that services have been exported properly

Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.

subctl version: v0.19.0

Aswin 🔥🔥🔥 $
Cluster 2 info
Aswin 🔥🔥🔥 $ subctl diagnose all --kubeconfig /Users/aswina/Downloads/sub2
Cluster "sub2"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.30.6" is supported

 ✗ Globalnet deployment detected - checking that globalnet CIDRs do not overlap
 ✗ Error getting the Broker's REST config: error getting auth rest config: Get "https://9.66.245.122:6443/apis/submariner.io/v1/namespaces/submariner-k8s-broker/clusters/any": tls: failed to verify certificate: x509: "kube-apiserver" certificate is not trusted

 ⚠ Checking Submariner support for the CNI network plugin
 ⚠ Submariner could not detect the CNI network plugin and is using ("generic") plugin. It may or may not work.
 ✓ Checking gateway connections
 ✗ Checking route agent connections
 ✗ Connection to cluster "cluster1" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"242.0.255.254\"",
  "spec": {
    "cluster_id": "cluster1",
    "cable_name": "submariner-cable-cluster1-10-21-101-75",
    "healthCheckIP": "242.0.255.254",
    "hostname": "sub1-worker-1.fyre.ibm.com",
    "subnets": [
      "242.0.0.0/16"
    ],
    "private_ip": "10.21.101.75",
    "public_ip": "129.41.87.4",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "cluster1" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"242.0.255.254\"",
  "spec": {
    "cluster_id": "cluster1",
    "cable_name": "submariner-cable-cluster1-10-21-101-75",
    "healthCheckIP": "242.0.255.254",
    "hostname": "sub1-worker-1.fyre.ibm.com",
    "subnets": [
      "242.0.0.0/16"
    ],
    "private_ip": "10.21.101.75",
    "public_ip": "129.41.87.4",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "cluster1" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"242.0.255.254\"",
  "spec": {
    "cluster_id": "cluster1",
    "cable_name": "submariner-cable-cluster1-10-21-101-75",
    "healthCheckIP": "242.0.255.254",
    "hostname": "sub1-worker-1.fyre.ibm.com",
    "subnets": [
      "242.0.0.0/16"
    ],
    "private_ip": "10.21.101.75",
    "public_ip": "129.41.87.4",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✓ Checking Submariner support for the kube-proxy mode
 ✓ The kube-proxy mode is supported
 ✗ Checking that firewall configuration allows intra-cluster VXLAN traffic
 ✗ The tcpdump output from the sniffer pod does not contain the expected remote endpoint IP 242.0.0.0. Please check that your firewall configuration allows UDP/4800 traffic. Actual pod output:
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vx-submariner, link-type EN10MB (Ethernet), snapshot length 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

 ✓ Checking that Globalnet is correctly configured and functioning

 ✓ Checking that services have been exported properly

Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.

subctl version: v0.19.0
  • Gather information (use subctl gather):

sub1.zip

sub2.zip

  • Cloud provider or hardware configuration:

Kubernetes is installed on Ubuntu VMs.

OS INFO

PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
root@sub2-master:~# kubectl version
Client Version: v1.30.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.6
root@sub1-master:~# kubectl get networkpolicies --all-namespaces
No resources found
root@sub1-master:~#
@aswinayyolath aswinayyolath added the bug Something isn't working label Oct 31, 2024
@yboaron
Contributor

yboaron commented Nov 3, 2024

Thanks for reaching out @aswinayyolath.

A. As mentioned in the Slack discussion, the inter-cluster libreswan tunnel is up and communication between the gw nodes is fine, while communication from a non-GW node to the gw node is failing.

Further datapath investigation is needed here. I assume that, for some reason (maybe an infrastructure firewall or connection tracking), ingress packets are being dropped on the gwnode@clusterX to nongwnode@clusterX segment.

Can you please run ping from non-gw node@sub1 to gw-node@sub2 (for the gw-node@sub2 IP address, use the endpoint health-check IP, 243.0.255.254) and run tcpdump on both the gw node and the non-gw node in cluster sub1?

B. Also, although this is not related to the datapath issue, I noticed that Submariner detected the CNI as generic instead of flannel. Submariner uses this code to discover network details for the flannel CNI.
Could you please share the list of DaemonSets in the kube-system namespace?
If one of those DaemonSets' names contains the string 'flannel', please also share that DaemonSet's volume list (and, if a volume/ConfigMap whose name contains the substring 'flannel' exists, share its content).
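Here is a rough sketch of the discovery logic these questions probe. It assumes, as the questions suggest, that only DaemonSets in kube-system are searched by name and that the flannel settings come from a ConfigMap-backed volume. The `DaemonSet` and `Volume` structs are hypothetical simplifications of the Kubernetes API types, and this is not Submariner's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified, hypothetical stand-ins for the Kubernetes DaemonSet and Volume types.
type Volume struct {
	Name          string
	ConfigMapName string // empty when the volume is not backed by a ConfigMap
}

type DaemonSet struct {
	Namespace string
	Name      string
	Volumes   []Volume
}

// findFlannelConfigMap looks only at DaemonSets in kube-system whose name
// contains "flannel" and returns the first ConfigMap volume whose ConfigMap
// name also contains "flannel".
func findFlannelConfigMap(dss []DaemonSet) (string, bool) {
	for i := range dss {
		ds := &dss[i]
		if ds.Namespace != "kube-system" || !strings.Contains(ds.Name, "flannel") {
			continue
		}
		for _, v := range ds.Volumes {
			if strings.Contains(v.ConfigMapName, "flannel") {
				return v.ConfigMapName, true
			}
		}
	}
	return "", false
}

func main() {
	// Layout reported later in this thread: flannel runs in the kube-flannel namespace.
	dss := []DaemonSet{
		{Namespace: "kube-system", Name: "kube-proxy"},
		{Namespace: "kube-flannel", Name: "kube-flannel-ds", Volumes: []Volume{
			{Name: "run"},
			{Name: "flannel-cfg", ConfigMapName: "kube-flannel-cfg"},
		}},
	}
	name, found := findFlannelConfigMap(dss)
	fmt.Printf("found=%v configmap=%q\n", found, name) // found=false configmap=""
}
```

The reporter later shares a layout where kube-flannel-ds lives in the kube-flannel namespace. With that layout, a kube-system-only search like this finds nothing, which would be consistent with Submariner falling back to the generic CNI plugin.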

@yboaron yboaron added flannel flannel CNI datapath Datapath related issues or enhancements labels Nov 3, 2024
@yboaron yboaron added this to Backlog Nov 3, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Backlog Nov 3, 2024
@aswinayyolath
Author

Please note: I had to create a new cluster because I broke the old one while trying various things.

Endpoint health-check IP for the gateway node in sub2: 243.0.255.254

kubectl get endpoint cluster2-submariner-cable-cluster2-10-21-3-236 -n submariner-operator -o jsonpath='{.spec.healthCheckIP}'

243.0.255.254
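In the diagnose output, the health-check IP 243.0.255.254 sits inside the subnet the endpoint advertises, 243.0.0.0/16. The sketch below decodes the relevant endpoint spec fields and checks that relationship. It keeps only the `cluster_id`, `healthCheckIP` and `subnets` keys seen in the subctl output. The struct and helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// endpointSpec keeps only the endpoint spec fields used here; key names follow
// the JSON printed by subctl diagnose in this issue.
type endpointSpec struct {
	ClusterID     string   `json:"cluster_id"`
	HealthCheckIP string   `json:"healthCheckIP"`
	Subnets       []string `json:"subnets"`
}

func parseEndpointSpec(raw string) (endpointSpec, error) {
	var s endpointSpec
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

// healthCheckInSubnets reports whether the health-check IP lies in one of the
// endpoint's advertised subnets.
func healthCheckInSubnets(s endpointSpec) (bool, error) {
	ip, err := netip.ParseAddr(s.HealthCheckIP)
	if err != nil {
		return false, err
	}
	for _, cidr := range s.Subnets {
		p, err := netip.ParsePrefix(cidr)
		if err != nil {
			return false, err
		}
		if p.Contains(ip) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Trimmed from the cluster2 endpoint spec in the diagnose output above.
	s, err := parseEndpointSpec(`{"cluster_id": "cluster2", "healthCheckIP": "243.0.255.254", "subnets": ["243.0.0.0/16"]}`)
	if err != nil {
		panic(err)
	}
	in, err := healthCheckInSubnets(s)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s in %v: %v\n", s.ClusterID, s.HealthCheckIP, s.Subnets, in)
}
```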

GW node of cluster 1

root@st-1-master:~# kubectl get nodes -l submariner.io/gateway=true -o wide
NAME                         STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
st-1-worker-1.fyre.ibm.com   Ready    <none>   10h   v1.30.6   10.21.68.239   <none>        Ubuntu 22.04.4 LTS   5.15.0-118-generic   containerd://1.7.22
root@st-1-master:~#

Ping Test from non-GW node (sub1) to GW node (sub2)

root@st-1-worker-3:~# ping 243.0.255.254
PING 243.0.255.254 (243.0.255.254) 56(84) bytes of data.
From 10.244.3.0 icmp_seq=1 Destination Host Unreachable
From 10.244.3.0 icmp_seq=2 Destination Host Unreachable
From 10.244.3.0 icmp_seq=3 Destination Host Unreachable
From 10.244.3.0 icmp_seq=4 Destination Host Unreachable
From 10.244.3.0 icmp_seq=5 Destination Host Unreachable
From 10.244.3.0 icmp_seq=6 Destination Host Unreachable
From 10.244.3.0 icmp_seq=7 Destination Host Unreachable
From 10.244.3.0 icmp_seq=8 Destination Host Unreachable
From 10.244.3.0 icmp_seq=9 Destination Host Unreachable
^C
--- 243.0.255.254 ping statistics ---
10 packets transmitted, 0 received, +9 errors, 100% packet loss, time 9214ms
pipe 4
root@st-1-worker-3:~#

Ping Test from GW node (sub1) to GW node (sub2)

root@st-1-worker-1:~# ping 243.0.255.254
PING 243.0.255.254 (243.0.255.254) 56(84) bytes of data.
64 bytes from 243.0.255.254: icmp_seq=1 ttl=64 time=0.931 ms
64 bytes from 243.0.255.254: icmp_seq=2 ttl=64 time=0.918 ms
64 bytes from 243.0.255.254: icmp_seq=3 ttl=64 time=0.720 ms
64 bytes from 243.0.255.254: icmp_seq=4 ttl=64 time=0.899 ms
64 bytes from 243.0.255.254: icmp_seq=5 ttl=64 time=1.09 ms
^C
--- 243.0.255.254 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4008ms
rtt min/avg/max/mdev = 0.720/0.911/1.091/0.117 ms
root@st-1-worker-1:~#

Capture Traffic on GW Node and non-GW Node (sub1) with tcpdump

[Screenshot: tcpdump captures on the GW and non-GW nodes in sub1]

Results

Ping Test from Non-GW Node

  • The non-GW node in cluster1 pinged the health-check IP 243.0.255.254 (the GW node in cluster2).
  • The ping failed with Destination Host Unreachable, so the non-GW node could not reach 243.0.255.254 😔.

TCPDump on Non-GW Node

  • I started a tcpdump on the non-GW node to capture any traffic related to 243.0.255.254.
  • No ICMP requests or replies seem to appear in the output, indicating that either the packets aren't being sent from this node or they're being dropped somewhere along the path.

TCPDump on GW Node in cluster1

  • On the GW node in cluster1, tcpdump shows multiple ICMP echo replies from 243.0.255.254 to 242.0.0.1 (an address in cluster1's GlobalNet CIDR 242.0.0.0/16; I'm not sure what it is assigned to).
  • It looks like the GW node in cluster1 is receiving traffic from the gateway in cluster2, but it isn't successfully reaching the non-gateway node or responding to it.

@aswinayyolath
Author

DaemonSet List:

NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-proxy   4         4         4       4            4           kubernetes.io/os=linux   11h

Checked Pods

NAME                                               READY   STATUS    RESTARTS   AGE
coredns-55cb58b774-5xgj7                           1/1     Running   0          11h
coredns-55cb58b774-rs6ln                           1/1     Running   0          11h
etcd-st-1-master.fyre.ibm.com                      1/1     Running   0          11h
kube-apiserver-st-1-master.fyre.ibm.com            1/1     Running   0          11h
kube-controller-manager-st-1-master.fyre.ibm.com   1/1     Running   0          11h
kube-proxy-hrqjb                                   1/1     Running   0          11h
kube-proxy-htzg6                                   1/1     Running   0          11h
kube-proxy-rd267                                   1/1     Running   0          11h

CNI Configuration

root@st-1-master:~# cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

@yboaron
Contributor

yboaron commented Nov 4, 2024


Is there a flannel DaemonSet in another namespace?

@aswinayyolath
Author

Yes

root@st-1-master:~# kubectl get daemonset -A
NAMESPACE             NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                AGE
kube-flannel          kube-flannel-ds            4         4         4       4            4           <none>                       14h
kube-system           kube-proxy                 4         4         4       4            4           kubernetes.io/os=linux       14h
submariner-operator   submariner-gateway         1         1         1       1            1           submariner.io/gateway=true   14h
submariner-operator   submariner-globalnet       1         1         1       1            1           submariner.io/gateway=true   14h
submariner-operator   submariner-metrics-proxy   1         1         1       1            1           submariner.io/gateway=true   14h
submariner-operator   submariner-routeagent      4         4         4       4            4           <none>                       14h
root@st-1-master:~#

@aswinayyolath
Author

The kube-flannel-ds DaemonSet has the following volumes:

volumes:
- name: run
  hostPath:
    path: /run/flannel
- name: cni-plugin
  hostPath:
    path: /opt/cni/bin
- name: cni
  hostPath:
    path: /etc/cni/net.d
- name: flannel-cfg
  configMap:
    name: kube-flannel-cfg
- name: xtables-lock
  hostPath:
    path: /run/xtables.lock
    type: FileOrCreate

ConfigMap details:

root@st-1-master:~# kubectl get configmap kube-flannel-cfg -n kube-flannel -o yaml
apiVersion: v1
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "EnableNFTables": false,
      "Backend": {
        "Type": "vxlan"
      }
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"cni-conf.json":"{\n  \"name\": \"cbr0\",\n  \"cniVersion\": \"0.3.1\",\n  \"plugins\": [\n    {\n      \"type\": \"flannel\",\n      \"delegate\": {\n        \"hairpinMode\": true,\n        \"isDefaultGateway\": true\n      }\n    },\n    {\n      \"type\": \"portmap\",\n      \"capabilities\": {\n        \"portMappings\": true\n      }\n    }\n  ]\n}\n","net-conf.json":"{\n  \"Network\": \"10.244.0.0/16\",\n  \"EnableNFTables\": false,\n  \"Backend\": {\n    \"Type\": \"vxlan\"\n  }\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app":"flannel","k8s-app":"flannel","tier":"node"},"name":"kube-flannel-cfg","namespace":"kube-flannel"}}
  creationTimestamp: "2024-11-03T16:24:55Z"
  labels:
    app: flannel
    k8s-app: flannel
    tier: node
  name: kube-flannel-cfg
  namespace: kube-flannel
  resourceVersion: "282"
  uid: f0058e4b-4ba9-49be-a759-fd0c9843a88d
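The net-conf.json entry above holds the flannel network settings: the pod network 10.244.0.0/16 and the vxlan backend. Below is a minimal sketch of reading it with encoding/json. The struct mirrors only the keys visible in this ConfigMap and is an illustration, not the parser Submariner uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flannelNetConf mirrors only the keys shown in the kube-flannel-cfg ConfigMap.
type flannelNetConf struct {
	Network        string `json:"Network"`
	EnableNFTables bool   `json:"EnableNFTables"`
	Backend        struct {
		Type string `json:"Type"`
	} `json:"Backend"`
}

// Copied from the net-conf.json key above.
const netConf = `{
  "Network": "10.244.0.0/16",
  "EnableNFTables": false,
  "Backend": {
    "Type": "vxlan"
  }
}`

func parseNetConf(raw string) (flannelNetConf, error) {
	var c flannelNetConf
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	c, err := parseNetConf(netConf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("pod network %s, backend %s\n", c.Network, c.Backend.Type)
}
```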

@yboaron
Contributor

yboaron commented Nov 4, 2024

Thanks for the information.

Regarding flannel discovery, it looks like we need to update the flannel discovery code.
Maybe we should list DaemonSets in all namespaces and filter by the k8s-app=flannel label.

QQ: does kubectl get ds -A -l k8s-app=flannel return the flannel ds?
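Here is a sketch of the suggested approach: filter DaemonSets from all namespaces on the k8s-app=flannel label. To stay self-contained, it filters client-side over a hypothetical, simplified `DaemonSet` struct. Real code would more likely pass the label selector to the API server in the list call.

```go
package main

import "fmt"

// Hypothetical, simplified DaemonSet type carrying only what the filter needs.
type DaemonSet struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// findByLabel returns the first DaemonSet in any namespace whose labels
// include key=value, or nil if none matches. The key must be present, so an
// empty value does not match DaemonSets that lack the label.
func findByLabel(dss []DaemonSet, key, value string) *DaemonSet {
	for i := range dss {
		if v, ok := dss[i].Labels[key]; ok && v == value {
			return &dss[i]
		}
	}
	return nil
}

func main() {
	// Names and the k8s-app=flannel label come from the kubectl output in this thread.
	dss := []DaemonSet{
		{Namespace: "kube-system", Name: "kube-proxy"},
		{Namespace: "kube-flannel", Name: "kube-flannel-ds", Labels: map[string]string{"k8s-app": "flannel"}},
	}
	if ds := findByLabel(dss, "k8s-app", "flannel"); ds != nil {
		fmt.Printf("found %s/%s\n", ds.Namespace, ds.Name) // found kube-flannel/kube-flannel-ds
	}
}
```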

Could you please report a new issue for flannel CNI discovery? Please attach the relevant information; we welcome any code contribution here :-).

As for the datapath issue, traffic initiated on nongw node@clusterA towards the remote cluster is encapsulated in VXLAN (UDP port 4800, interface vx-submariner) and sent to gw node@clusterA, and the gw node should forward it to the remote cluster's gw.

Can you double-check (maybe using tcpdump -pi) that no packets are sent from the non-GW node? I can see that on the gw node, the iptables (filter table) packet counter for input traffic on the vx-submariner interface is > 0; check [1].

[1]
Chain SUBMARINER-INPUT (1 references)
num   pkts bytes target  prot opt in  out  source     destination
1      952 74256 ACCEPT  17   --  *   *    0.0.0.0/0  0.0.0.0/0    udp dpt:4800
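To compare the counter in [1] between runs, a small parser over the iptables listing can extract the packet count of the rule matching dpt:4800. This is a hedged sketch. It assumes the listing was produced with `iptables -t filter -L SUBMARINER-INPUT -v -n -x --line-numbers`, so that counters are exact and the first column is num. Without --line-numbers, as in the listing further down this thread, the counter is the first column instead. The function names are hypothetical, and the code does not call iptables itself.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Listing in the layout shown in [1] (assumed: -v -n -x --line-numbers).
const iptablesSample = `Chain SUBMARINER-INPUT (1 references)
num   pkts bytes target  prot opt in  out  source     destination
1      952 74256 ACCEPT  17   --  *   *    0.0.0.0/0  0.0.0.0/0    udp dpt:4800`

// udpPortPackets returns the packet counter of the first rule that has an
// exact "dpt:<port>" field, so dpt:48000 does not match port 4800.
func udpPortPackets(output string, port int) (uint64, bool) {
	needle := "dpt:" + strconv.Itoa(port)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || !containsField(fields, needle) {
			continue
		}
		// With --line-numbers the columns start: num pkts bytes target ...
		if pkts, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
			return pkts, true
		}
	}
	return 0, false
}

func containsField(fields []string, want string) bool {
	for _, f := range fields {
		if f == want {
			return true
		}
	}
	return false
}

func main() {
	if n, ok := udpPortPackets(iptablesSample, 4800); ok {
		fmt.Println("UDP/4800 packets:", n) // UDP/4800 packets: 952
	}
}
```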

@aswinayyolath
Author

QQ: does kubectl get ds -A -l k8s-app=flannel return the flannel ds?
Ans: Yes

root@st-1-master:~# kubectl get ds -A -l k8s-app=flannel
NAMESPACE      NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-flannel   kube-flannel-ds   4         4         4       4            4           <none>          17h
root@st-1-master:~#

I will report a new issue and see if I can contribute (I guess the changes should be relatively small).

Packet Transmission on the Non-GW Node in sub1 (ClusterA)

  • I pinged one of the nodes in cluster B from a non-GW node of sub1 (cluster A):
root@st-1-worker-3:~# ping 9.46.96.194
PING 9.46.96.194 (9.46.96.194) 56(84) bytes of data.
64 bytes from 9.46.96.194: icmp_seq=1 ttl=63 time=0.752 ms
64 bytes from 9.46.96.194: icmp_seq=2 ttl=63 time=0.725 ms
64 bytes from 9.46.96.194: icmp_seq=3 ttl=63 time=0.771 ms
^C
--- 9.46.96.194 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.725/0.749/0.771/0.018 ms
root@st-1-worker-3:~#

Run Packet Capture on the Non-Gateway Node in ClusterA

root@st-1-worker-3:~# sudo tcpdump -i vx-submariner port 4800
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vx-submariner, link-type EN10MB (Ethernet), snapshot length 262144 bytes

Verified Reception on the Gateway Node in ClusterA

root@st-1-master:~# sudo iptables -t filter -L SUBMARINER-INPUT -v -n
Chain SUBMARINER-INPUT (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0
root@st-1-master:~#

Is this what you wanted me to do? I am not 100% sure.

@aswinayyolath
Author

I have created a new issue: #3210. A draft change has been pushed here: #3268.

@yboaron, I haven't yet looked into linting, unit tests, or e2e testing; I'm just checking whether the changes should look something like this (draft linked above). I also modified the loop structure from

       for k := range daemonsets.Items {
               if strings.Contains(daemonsets.Items[k].Name, "flannel") {
                       volumes = daemonsets.Items[k].Spec.Template.Spec.Volumes
                       // ...
               }
       }

to

       for _, ds := range daemonsets.Items {
               if strings.Contains(ds.Name, "flannel") {
                       // ds is a per-iteration copy of the element; because the loop
                       // breaks right away, the pointer keeps referring to the match.
                       flannelDaemonSet = &ds
                       volumes = ds.Spec.Template.Spec.Volumes
                       break
               }
       }

to enhance code readability and clarity. I think this approach makes it clear that ds represents a DaemonSet object, eliminating the need for indexing. Additionally, by storing a pointer to the found DaemonSet and breaking out of the loop once it is found, I believe the code becomes more efficient and less prone to the errors associated with accessing elements by index.
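Whether `flannelDaemonSet = &ds` behaves as intended depends on Go's range semantics. `ds` is a copy of the slice element, so the pointer refers to that copy rather than to `daemonsets.Items[k]`. Because the loop breaks immediately after the match, the copy is never overwritten, so this is safe in every Go version. For modules declaring Go 1.22 or later, each iteration gets a fresh variable anyway. The self-contained sketch below uses a hypothetical `item` type to contrast pointing at the range copy with pointing at the slice element via `&items[i]`, which also avoids copying each element.

```go
package main

import "fmt"

// item is a hypothetical stand-in for a DaemonSet.
type item struct{ Name string }

// pickCopy mirrors the `for _, ds := range` form: the returned pointer
// refers to the loop variable, i.e. a copy of the matching element.
func pickCopy(items []item, name string) *item {
	for _, it := range items {
		if it.Name == name {
			return &it
		}
	}
	return nil
}

// pickElem indexes the slice and returns a pointer to the element itself.
func pickElem(items []item, name string) *item {
	for i := range items {
		if items[i].Name == name {
			return &items[i]
		}
	}
	return nil
}

func main() {
	items := []item{{"kube-proxy"}, {"kube-flannel-ds"}}
	c := pickCopy(items, "kube-flannel-ds")
	e := pickElem(items, "kube-flannel-ds")
	items[1].Name = "renamed"
	// The copy keeps the old name; the element pointer sees the change.
	fmt.Println(c.Name, e.Name) // kube-flannel-ds renamed
}
```

For a read-only lookup like flannel discovery, either form works. The indexed form simply avoids copying each DaemonSet value.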

@yboaron
Contributor

yboaron commented Nov 4, 2024


Submariner only handles egress routing, and only for packets destined for remote clusters (the destination IP is from the remote pod or service CIDRs; in your case, it is the remote cluster's GlobalNet CIDR). Please run tcpdump while pinging the remote endpoint's health-check IP address.
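The rule above says that only traffic whose destination lies in a remote cluster's CIDRs takes the Submariner path. That would explain why the earlier ping to the node IP 9.46.96.194 never showed up on vx-submariner. Below is a hedged Go sketch of that classification, using the remote GlobalNet CIDR from this thread. The function name and the idea of passing a plain list of CIDRs are assumptions for illustration, not Submariner's route-agent code.

```go
package main

import (
	"fmt"
	"net/netip"
)

// viaSubmariner reports whether dst falls inside any of the remote cluster
// CIDRs, i.e. whether it would be handled by Submariner's egress routing.
func viaSubmariner(dst string, remoteCIDRs []string) (bool, error) {
	ip, err := netip.ParseAddr(dst)
	if err != nil {
		return false, err
	}
	for _, c := range remoteCIDRs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return false, err
		}
		if p.Contains(ip) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	remote := []string{"243.0.0.0/16"} // cluster2 GlobalNet CIDR
	for _, dst := range []string{"243.0.255.254", "9.46.96.194"} {
		ok, err := viaSubmariner(dst, remote)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-15s via Submariner: %v\n", dst, ok)
	}
}
```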

@aswinayyolath
Author

[Screenshot attached]

@yboaron
Contributor

yboaron commented Nov 5, 2024

Can you try running tcpdump -vv -penni any | grep -i icmp on the non-GW node and check whether you get anything?

@aswinayyolath
Author

I am seeing a lot of output from tcpdump -vv -penni any | grep -i icmp, but I don't really understand it.

group record(s) [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:fffc:d5dc to_in { }] [gaddr ff02::1:fff3:9969 to_ex { }]
01:50:24.775041 eth0  M   ifindex 2 00:00:0a:15:47:c4 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:fffc:d5dc to_in { }] [gaddr ff02::1:fff3:9969 to_ex { }]
01:50:24.775188 eth0  M   ifindex 2 00:00:0a:15:48:2a ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:fffc:d5dc to_in { }] [gaddr ff02::1:fff3:9969 to_ex { }]
01:50:24.776386 eth0  M   ifindex 2 00:00:0a:15:41:6d ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:fffc:d5dc to_in { }] [gaddr ff02::1:fff3:9969 to_ex { }]
01:50:24.804302 eth0  M   ifindex 2 00:00:0a:15:50:4b ethertype IPv6 (0x86dd), length 116: (hlim 1, next-header Options (0) payload length: 56) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 2 group record(s) [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:fff3:9969 to_ex { }]
01:50:24.823240 eth0  M   ifindex 2 00:00:0a:15:49:bb ethertype IPv6 (0x86dd), length 92: (hlim 255, next-header ICMPv6 (58) payload length: 32) :: > ff02::1:fff3:9969: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::d80a:bb86:85f3:9969
01:50:24.824565 eth0  M   ifindex 2 00:00:0a:15:44:ac ethertype IPv6 (0x86dd), length 92: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::d80a:bb86:85f3:9969 > ff02::1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fe80::d80a:bb86:85f3:9969, Flags [override]
01:50:24.826553 eth0  M   ifindex 2 00:00:0a:15:49:d2 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.826709 eth0  M   ifindex 2 00:00:0a:15:4e:24 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.826887 eth0  M   ifindex 2 00:00:0a:15:42:a0 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.827218 eth0  M   ifindex 2 00:00:0a:15:4d:17 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.827280 eth0  M   ifindex 2 00:00:0a:15:4f:48 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.827483 eth0  M   ifindex 2 00:00:0a:15:47:f3 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.827632 eth0  M   ifindex 2 00:00:0a:15:40:a5 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.827839 eth0  M   ifindex 2 00:00:0a:15:43:25 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.828020 eth0  M   ifindex 2 00:00:0a:15:49:c3 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.828150 eth0  M   ifindex 2 00:00:0a:15:49:bb ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.828307 eth0  M   ifindex 2 00:00:0a:15:51:c2 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.828473 eth0  M   ifindex 2 00:00:0a:15:47:bd ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.828628 eth0  M   ifindex 2 00:00:0a:15:41:d2 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.828787 eth0  M   ifindex 2 00:00:0a:15:4b:6a ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.828940 eth0  M   ifindex 2 00:00:0a:15:4c:be ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.829140 eth0  M   ifindex 2 00:00:0a:15:4d:f4 ethertype IPv6 (0x86dd), length 156: (hlim 1, next-header Options (0) payload length: 96) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 4 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::fb to_ex { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.829286 eth0  M   ifindex 2 00:00:0a:15:47:be ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.829430 eth0  M   ifindex 2 00:00:0a:15:50:a7 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.829591 eth0  M   ifindex 2 00:00:0a:15:46:90 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.829780 eth0  M   ifindex 2 00:00:0a:15:48:2a ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.829939 eth0  M   ifindex 2 00:00:0a:15:47:c4 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.830090 eth0  M   ifindex 2 00:00:0a:15:49:27 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.830245 eth0  M   ifindex 2 00:00:0a:15:43:34 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.830411 eth0  M   ifindex 2 00:00:0a:15:50:4b ethertype IPv6 (0x86dd), length 116: (hlim 1, next-header Options (0) payload length: 56) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 2 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.830600 eth0  M   ifindex 2 00:00:0a:15:41:6d ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffd6:f6f8 to_in { }] [gaddr ff02::1:ff8e:4640 to_ex { }]
01:50:24.831228 eth0  M   ifindex 2 00:00:0a:15:49:bb ethertype IPv6 (0x86dd), length 92: (hlim 255, next-header ICMPv6 (58) payload length: 32) :: > ff02::1:ff8e:4640: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::3874:f6a7:af8e:4640
01:50:24.832144 eth0  M   ifindex 2 00:00:0a:15:45:7e ethertype IPv6 (0x86dd), length 92: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::3874:f6a7:af8e:4640 > ff02::1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fe80::3874:f6a7:af8e:4640, Flags [override]
01:50:24.834225 eth0  M   ifindex 2 00:00:0a:15:49:c3 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.834496 eth0  M   ifindex 2 00:00:0a:15:47:bd ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.834685 eth0  M   ifindex 2 00:00:0a:15:4d:f4 ethertype IPv6 (0x86dd), length 156: (hlim 1, next-header Options (0) payload length: 96) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 4 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::fb to_ex { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.835124 eth0  M   ifindex 2 00:00:0a:15:4e:24 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.835278 eth0  M   ifindex 2 00:00:0a:15:49:d2 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.835421 eth0  M   ifindex 2 00:00:0a:15:43:34 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.835572 eth0  M   ifindex 2 00:00:0a:15:49:27 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.835723 eth0  M   ifindex 2 00:00:0a:15:42:a0 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.835916 eth0  M   ifindex 2 00:00:0a:15:4d:17 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.836052 eth0  M   ifindex 2 00:00:0a:15:4f:48 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.836226 eth0  M   ifindex 2 00:00:0a:15:47:f3 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.836429 eth0  M   ifindex 2 00:00:0a:15:40:a5 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.836550 eth0  M   ifindex 2 00:00:0a:15:43:25 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.836695 eth0  M   ifindex 2 00:00:0a:15:51:c2 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.836837 eth0  M   ifindex 2 00:00:0a:15:49:bb ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.836963 eth0  M   ifindex 2 00:00:0a:15:41:d2 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.837107 eth0  M   ifindex 2 00:00:0a:15:4b:6a ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.837246 eth0  M   ifindex 2 00:00:0a:15:4c:be ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.837370 eth0  M   ifindex 2 00:00:0a:15:47:be ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.837555 eth0  M   ifindex 2 00:00:0a:15:50:a7 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.837738 eth0  M   ifindex 2 00:00:0a:15:46:90 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.837859 eth0  M   ifindex 2 00:00:0a:15:48:2a ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.838279 eth0  M   ifindex 2 00:00:0a:15:50:4b ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.838393 eth0  M   ifindex 2 00:00:0a:15:47:c4 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.838520 eth0  M   ifindex 2 00:00:0a:15:41:6d ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:fff3:9969 to_in { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.840276 eth0  M   ifindex 2 00:00:0a:15:4d:f4 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::fb to_ex { }] [gaddr ff02::1:ffa3:b55e to_ex { }]
01:50:24.846137 eth0  M   ifindex 2 00:00:0a:15:47:f3 ethertype IPv6 (0x86dd), length 92: (hlim 255, next-header ICMPv6 (58) payload length: 32) :: > ff02::1:ffa3:b55e: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::5dda:4834:24a3:b55e
01:50:24.847290 eth0  M   ifindex 2 00:00:0a:15:46:21 ethertype IPv6 (0x86dd), length 92: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5dda:4834:24a3:b55e > ff02::1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fe80::5dda:4834:24a3:b55e, Flags [override]
01:50:24.849025 eth0  M   ifindex 2 00:00:0a:15:49:c3 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffa3:b55e to_in { }] [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:ff6f:6de1 to_ex { }]
01:50:24.849206 eth0  M   ifindex 2 00:00:0a:15:49:bb ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffa3:b55e to_in { }] [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:ff6f:6de1 to_ex { }]
01:50:24.849409 eth0  M   ifindex 2 00:00:0a:15:47:bd ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffa3:b55e to_in { }] [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:ff6f:6de1 to_ex { }]
01:50:24.849632 eth0  M   ifindex 2 00:00:0a:15:4d:f4 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffa3:b55e to_in { }] [gaddr ff02::fb to_ex { }] [gaddr ff02::1:ff6f:6de1 to_ex { }]
01:50:24.849776 eth0  M   ifindex 2 00:00:0a:15:4e:24 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffa3:b55e to_in { }] [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:ff6f:6de1 to_ex { }]
01:50:24.849911 eth0  M   ifindex 2 00:00:0a:15:42:a0 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffa3:b55e to_in { }] [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:ff6f:6de1 to_ex { }]
01:50:24.850069 eth0  M   ifindex 2 00:00:0a:15:4d:17 ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3 group record(s) [gaddr ff02::1:ffa3:b55e to_in { }] [gaddr ff02::1:ff8e:4640 to_in { }] [gaddr ff02::1:ff6f:6de1 to_ex { }]
01:50:24.850233 eth0  M   ifindex 2 00:00:0a:15:47:be ethertype IPv6 (0x86dd), length 136: (hlim 1, next-header Options (0) payload length: 76) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 3

@yboaron
Contributor

yboaron commented Nov 5, 2024

Hmmmm, it's ICMP/IPv6 traffic. Don't you get any ICMP/IPv4 traffic (tcpdump -penni any -vv | grep -i icmp | grep IPv4)?

@aswinayyolath
Author

root@st-1-worker-3:~# tcpdump -penni any -vv | grep -i icmp | grep IPv4
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
20:03:40.206272 lo    In  ifindex 1 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 100: (tos 0xc0, ttl 64, id 12508, offset 0, flags [none], proto ICMP (1), length 80)
20:04:40.206323 lo    In  ifindex 1 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 100: (tos 0xc0, ttl 64, id 18713, offset 0, flags [none], proto ICMP (1), length 80)
20:05:40.206530 lo    In  ifindex 1 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 100: (tos 0xc0, ttl 64, id 24579, offset 0, flags [none], proto ICMP (1), length 80)
20:06:40.206287 lo    In  ifindex 1 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 100: (tos 0xc0, ttl 64, id 27221, offset 0, flags [none], proto ICMP (1), length 80)
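To check whether any IPv4 ICMP toward the remote cluster appears in a capture like this, the lines can be tallied per interface and direction. The following is a minimal Python sketch, not part of subctl or Submariner. It assumes the LINUX_SLL2 line layout that tcpdump -e prints on the "any" interface, as in the output above. The function name ipv4_icmp_by_interface is hypothetical.

```python
import re
from collections import Counter

# Start of a `tcpdump -e` line captured on "any" (LINUX_SLL2 link type):
# "<timestamp> <interface> <direction> ifindex <n> <mac> ethertype <type> ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<iface>\S+)\s+(?P<dir>\S+)\s+ifindex\s+\d+"
)


def ipv4_icmp_by_interface(lines):
    """Count IPv4 ICMP packets per (interface, direction) in tcpdump text output.

    Hypothetical helper for illustration. Only the first line of each packet is
    inspected; with -vv, tcpdump prints the IP header (including "proto ICMP")
    there and the ICMP details on an indented continuation line.
    """
    counts = Counter()
    for line in lines:
        # IPv6 ICMP shows up as "ICMP6", so require both IPv4 markers.
        if "ethertype IPv4" not in line or "proto ICMP" not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("iface"), match.group("dir"))] += 1
    return counts
```

For example, a capture saved with tcpdump -penni any -vv -c 1000 > capture.txt can be summarized with ipv4_icmp_by_interface(open("capture.txt")). Applied to the four packets shown above, every IPv4 ICMP packet is on lo (direction In), so nothing in this capture left through eth0.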

@yboaron
Contributor

yboaron commented Nov 12, 2024

Hmm, strange: I can't see the IPv4 ICMP sent to the remote cluster.
If you still have the environment available, could you please upload the subctl gather output?

@aswinayyolath
Author

I no longer have the cluster 😔, but I will create a new one (in fact, two). @yboaron, I would like to check with you whether the steps I am following are correct. Could you please review the steps here (https://kubernetes.slack.com/archives/C010RJV694M/p1730390398271589?thread_ts=1730390376.380879&cid=C010RJV694M) and let me know if I am missing anything?

@aswinayyolath
Author

aswinayyolath commented Nov 12, 2024

I would also like to test the same setup on AWS across two regions. Once I know whether the steps I followed are correct, I will try them both on the VMs I used before and on two EKS clusters in two different AWS regions, and see if that works.

@yboaron
Contributor

yboaron commented Nov 12, 2024

Yep, looks fine.

Can you try reinstalling without the --globalnet-cidr 242.0.0.0/16 flag in the subctl join command, for both clusters?
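Omitting --globalnet-cidr lets the broker pick each cluster's Globalnet CIDR. Our understanding from the Submariner Globalnet documentation is that, by default, it carves 65536-address (/16) blocks out of 242.0.0.0/8; treat those defaults as an assumption and check them against your subctl version. The hypothetical helper below sketches a first-free-block allocator of that kind. It is an illustration only, not Submariner's actual allocation code.

```python
import ipaddress


def next_free_block(parent: str, cluster_size: int, taken: list[str]) -> str:
    """Return the first block of `cluster_size` addresses inside `parent`
    that does not overlap any CIDR already listed in `taken`.

    Hypothetical helper for illustration; not Submariner's implementation.
    """
    if cluster_size <= 0 or cluster_size & (cluster_size - 1):
        raise ValueError("cluster_size must be a positive power of two")
    net = ipaddress.ip_network(parent)
    # A block of 2**k addresses needs a prefix of (max_prefixlen - k) bits.
    new_prefix = net.max_prefixlen - (cluster_size.bit_length() - 1)
    used = [ipaddress.ip_network(cidr) for cidr in taken]
    # Walk the candidate blocks in address order and return the first free one.
    for block in net.subnets(new_prefix=new_prefix):
        if not any(block.overlaps(u) for u in used):
            return str(block)
    raise RuntimeError(f"no free /{new_prefix} block left in {parent}")
```

The same first-free pattern is consistent with the ClustersetIP CIDRs reported in the kind runs later in this thread (243.0.0.0/20 for cluster1, then 243.0.16.0/20 for cluster2): with a hypothetical 243.0.0.0/8 parent range, next_free_block("243.0.0.0/8", 4096, ["243.0.0.0/20"]) returns "243.0.16.0/20".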

@rohan-anilkumar

rohan-anilkumar commented Nov 25, 2024

Hello @yboaron. Since @aswinayyolath is busy with some other tasks, I'm looking at this issue. We're on the same team working on the same project.

Since our K8s clusters use the same CIDRs, we cannot run Submariner without Globalnet. To work around this, we created an AWS account and tried running Submariner on EKS.
We followed this tutorial to set up the AWS EKS control plane: https://www.youtube.com/watch?v=0bUEKcjC_jM&t=261s
We then followed this tutorial to set up Submariner on AWS: https://www.youtube.com/watch?v=fMhZRNn0fxQ&t=5s

However, this does not work, and running the diagnostics gives the following output:

rohananilkumar@Rohans-MacBook-Pro .kube % subctl diagnose all --kubeconfig config-str-aws
Cluster "arn:aws:eks:eu-north-1:<SNIPPED>:cluster/stretch-1"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.31.2-eks-7f9249a" is supported

 ✗ Non-Globalnet deployment detected - checking that cluster CIDRs do not overlap
 ✗ Error getting the Broker's REST config: error getting auth rest config: Get "https://<SNIPPED>.eu-north-1.eks.amazonaws.com/apis/submariner.io/v1/namespaces/submariner-k8s-broker/clusters/any": tls: failed to verify certificate: x509: “kube-apiserver” certificate is not trusted

 ⚠ Checking Submariner support for the CNI network plugin
 ⚠ Submariner could not detect the CNI network plugin and is using ("generic") plugin. It may or may not work.
 ✗ Checking gateway connections
 ✗ There are no active connections on gateway "ip-<SNIPPED>.eu-north-1.compute.internal"
 ✓ Checking Submariner support for the kube-proxy mode
 ✓ The kube-proxy mode is supported
 ✗ Checking that firewall configuration allows intra-cluster VXLAN traffic
 ✗ Unable to obtain a remote endpoint: endpoints.submariner.io "remote Endpoint" not found

 ✓ Checking that services have been exported properly

Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.

subctl version: v0.18.0

I suspect there is an issue with the subnet setup. What should I try next to get Submariner up and running on AWS?

@yboaron @Jaanki
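Globalnet is needed in this case because the clusters' CIDRs collide. A pairwise overlap check of the two clusters' Service and Cluster (pod) CIDRs makes that concrete. It mirrors the intent of the "checking that cluster CIDRs do not overlap" step in the diagnose output above, though subctl's own implementation may differ. This is a minimal Python sketch; overlapping_cidrs is a hypothetical helper.

```python
import ipaddress
from itertools import product


def overlapping_cidrs(cluster_a: list[str], cluster_b: list[str]) -> list[tuple[str, str]]:
    """Return every (a, b) pair of CIDRs, one from each cluster, that overlap.

    Hypothetical helper for illustration. Each list should hold a cluster's
    Service CIDRs and Cluster (pod) CIDRs. An empty result means the ranges
    are disjoint.
    """
    nets_a = [ipaddress.ip_network(cidr) for cidr in cluster_a]
    nets_b = [ipaddress.ip_network(cidr) for cidr in cluster_b]
    # Compare every network of cluster A with every network of cluster B.
    return [(str(a), str(b)) for a, b in product(nets_a, nets_b) if a.overlaps(b)]
```

With the kind clusters from later in this thread (cluster1: 100.66.0.0/16 and 10.130.0.0/16; cluster2: 100.67.0.0/16 and 10.131.0.0/16) the result is empty, so by this check alone those two would not need Globalnet. A hypothetical pair of clusters that both used the CIDRs discovered for the Calico cluster at the end of this thread (10.96.0.0/12 and 192.168.0.0/16) would overlap on both ranges.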

@yboaron
Contributor

yboaron commented Dec 11, 2024

Maybe you can follow this link?

If the deployment fails, please attach debug details from the clusters (subctl gather, subctl diagnose all).

@aswinayyolath
Author

aswinayyolath commented Jan 2, 2025

@Jaanki @aswinsuryan @yboaron

I tried using kind, following the quickstart documentation at https://submariner.io/getting-started/quickstart/kind/

root@stretch-cluster1:~/submariner-operator# kind get clusters
cluster1
cluster2
root@stretch-cluster1:~/submariner-operator# kubectl config get-contexts
CURRENT   NAME       CLUSTER    AUTHINFO   NAMESPACE
*         cluster1   cluster1   cluster1
          cluster2   cluster2   cluster2
root@stretch-cluster1:~/submariner-operator# subctl deploy-broker
 ✓ Setting up broker RBAC
 ✓ Deploying the Submariner operator
 ✓ Created operator namespace: submariner-operator
 ✓ Deploying the broker
 ✓ Saving broker info to file "broker-info.subm"
root@stretch-cluster1:~/submariner-operator#
root@stretch-cluster1:~/submariner-operator# subctl join  broker-info.subm --clusterid cluster1 --natt=false
 ✓ broker-info.subm indicates broker is at https://172.18.0.7:6443/
 ✓ Discovering network details
        Network plugin:  kindnet
        Service CIDRs:   [100.66.0.0/16]
        Cluster CIDRs:   [10.130.0.0/16]
        ClustersetIP CIDR:     243.0.0.0/20
   There are 1 node(s) labeled as gateways:
    - cluster1-worker
 ✓ Retrieving the gateway nodes
 ✓ Gathering relevant information from Broker
 ✓ Retrieving Globalnet information from the Broker
 ✓ Validating Globalnet configuration
 ✓ Retrieving ClustersetIP information from the Broker
 ✓ Validating ClustersetIP configuration
 ✓ Assigning ClustersetIP IPs
 ✓ Using pre-configured clustersetip CIDR 243.0.0.0/20
 ✓ Deploying the Submariner operator
 ✓ Created operator namespace: submariner-operator
 ✓ Creating SA for cluster
 ✓ Connecting to Broker
 ✓ Deploying submariner
 ✓ Submariner is up and running
root@stretch-cluster1:~/submariner-operator# kubectl config use-context cluster2
Switched to context "cluster2".
root@stretch-cluster1:~/submariner-operator# subctl join broker-info.subm --clusterid cluster2 --natt=false
 ✓ broker-info.subm indicates broker is at https://172.18.0.7:6443/
 ✓ Discovering network details
        Network plugin:  kindnet
        Service CIDRs:   [100.67.0.0/16]
        Cluster CIDRs:   [10.131.0.0/16]
        ClustersetIP CIDR:     243.0.16.0/20
   There are 1 node(s) labeled as gateways:
    - cluster2-worker
 ✓ Retrieving the gateway nodes
 ✓ Gathering relevant information from Broker
 ✓ Retrieving Globalnet information from the Broker
 ✓ Validating Globalnet configuration
 ✓ Retrieving ClustersetIP information from the Broker
 ✓ Validating ClustersetIP configuration
 ✓ Assigning ClustersetIP IPs
 ✓ Using pre-configured clustersetip CIDR 243.0.16.0/20
 ✓ Deploying the Submariner operator
 ✓ Created operator namespace: submariner-operator
 ✓ Creating SA for cluster
 ✓ Connecting to Broker
 ✓ Deploying submariner
 ✓ Submariner is up and running
root@stretch-cluster1:~/submariner-operator# subctl verify --context cluster1 --tocontext cluster2 --only service-discovery,connectivity --verbose
Performing the following verifications: service-discovery, connectivity
Jan  2 02:02:02.160: Creating kubernetes clients
panic: Your Test Panicked
github.com/submariner-io/[email protected]/test/e2e/framework/framework.go:562
  When you, or your assertion library, calls Ginkgo's Fail(),
  Ginkgo panics to prevent subsequent assertions from running.

  Normally Ginkgo rescues this panic so you shouldn't see it.

  However, if you make an assertion in a goroutine, Ginkgo can't capture the
  panic.
  To circumvent this, you should call

  	defer GinkgoRecover()

  at the top of the goroutine that caused this panic.

  Alternatively, you may have made an assertion outside of a Ginkgo
  leaf node (e.g. in a container node or some out-of-band function) - please
  move your assertion to
  an appropriate Ginkgo node (e.g. a BeforeSuite, BeforeEach, It, etc...).

  Learn more at:
  http://onsi.github.io/ginkgo/#mental-model-how-ginkgo-handles-failure


goroutine 1 [running]:
github.com/onsi/ginkgo/v2.Fail({0xc0007db680, 0xb4}, {0xc000777be0?, 0xc00007fce0?, 0x40f47f?})
	github.com/onsi/ginkgo/[email protected]/core_dsl.go:427 +0x21e
github.com/onsi/gomega/internal.(*Assertion).match(0xc0006522c0, {0x3e94038, 0x5992800}, 0x0, {0xc000504180, 0x1, 0x1})
	github.com/onsi/[email protected]/internal/assertion.go:106 +0x1f0
github.com/onsi/gomega/internal.(*Assertion).NotTo(0xc0006522c0, {0x3e94038, 0x5992800}, {0xc000504180, 0x1, 0x1})
	github.com/onsi/[email protected]/internal/assertion.go:74 +0xad
github.com/submariner-io/shipyard/test/e2e/framework.AwaitUntil({0xc000646e70?, 0x18?}, 0xc000579498?, 0x2?)
	github.com/submariner-io/[email protected]/test/e2e/framework/framework.go:562 +0xcb
github.com/submariner-io/shipyard/test/e2e/framework.fetchClusterIDs()
	github.com/submariner-io/[email protected]/test/e2e/framework/framework.go:336 +0x165
github.com/submariner-io/shipyard/test/e2e/framework.BeforeSuite()
	github.com/submariner-io/[email protected]/test/e2e/framework/framework.go:206 +0x3e5
github.com/submariner-io/subctl/cmd/subctl.runVerify(0xc00048da00, 0xc000424f40, 0x0, {0x3964071, 0x13}, {0xc00069b0a0, 0x2, 0x2})
	github.com/submariner-io/subctl/cmd/subctl/verify.go:356 +0x7c5
github.com/submariner-io/subctl/cmd/subctl.init.func33.1.1(0xc000424f40, {0x8?, 0x394192c?}, {0x3ea9370?, 0xc000430fb0?})
	github.com/submariner-io/subctl/cmd/subctl/verify.go:104 +0xe5
github.com/submariner-io/subctl/internal/restconfig.(*Producer).RunOnSelectedPrefixedContext(0xc0000aaea0, {0x394192c, 0x2}, 0xc000579b98, {0x3ea9370, 0xc000430fb0})
	github.com/submariner-io/subctl/internal/restconfig/restconfig.go:292 +0x443
github.com/submariner-io/subctl/cmd/subctl.init.func33.1(0xc000253860?, {0x3964071?, 0xc000328d88?}, {0x3ea9370?, 0xc000430fb0?})
	github.com/submariner-io/subctl/cmd/subctl/verify.go:92 +0x7d
github.com/submariner-io/subctl/internal/restconfig.(*Producer).RunOnSelectedContext(0xc0000aaea0, 0xc000579c78, {0x3ea9370, 0xc000430fb0})
	github.com/submariner-io/subctl/internal/restconfig/restconfig.go:229 +0x1fd
github.com/submariner-io/subctl/cmd/subctl.init.func33(0x58fbfc0, {0xc00041eee0?, 0x0?, 0x394243e?})
	github.com/submariner-io/subctl/cmd/subctl/verify.go:89 +0x51
github.com/spf13/cobra.(*Command).execute(0x58fbfc0, {0xc00041ee70, 0x7, 0x7})
	github.com/spf13/[email protected]/command.go:989 +0xab1
github.com/spf13/cobra.(*Command).ExecuteC(0x58f7da0)
	github.com/spf13/[email protected]/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/[email protected]/command.go:1041
github.com/submariner-io/subctl/cmd/subctl.Execute()
	github.com/submariner-io/subctl/cmd/subctl/root.go:76 +0x1a
main.main()
	github.com/submariner-io/subctl/cmd/main.go:24 +0xf
root@stretch-cluster1:~/submariner-operator# kubectl get pod
NAME                     READY   STATUS    RESTARTS   AGE
nginx-676b6c5bbc-btx7z   1/1     Running   0          9m41s
root@stretch-cluster1:~/submariner-operator# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   100.67.0.1      <none>        443/TCP   153m
nginx        ClusterIP   100.67.70.213   <none>        80/TCP    8m50s
root@stretch-cluster1:~/submariner-operator# subctl export service nginx
 ✗ Failed to export Service: the server could not find the requested resource

subctl version: v0.19.1

submariner-operator logs

submariner-operator version: release-0.19-077b8d39d39e
2025-01-02T10:18:41.459Z INF ..er-operator/main.go:108 cmd                  Starting submariner-operator
2025-01-02T10:18:41.459Z INF ..ner-operator/main.go:73 cmd                  Go Version: go1.22.7
2025-01-02T10:18:41.459Z INF ..ner-operator/main.go:74 cmd                  Go OS/Arch: linux/amd64
2025-01-02T10:18:41.460Z INF ..5.0/leader/leader.go:97 leader               Trying to become the leader.
2025-01-02T10:18:41.460Z INF ...0/leader/leader.go:267 leader               Found podname Pod.Name=submariner-operator-7cd64cbc9c-zq2ht
2025-01-02T10:18:48.633Z ERR ...0/leader/leader.go:273 leader               Failed to get Pod error="failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://100.67.0.1:443/api/v1\": dial tcp 100.67.0.1:443: connect: no route to host" Pod.Name=submariner-operator-7cd64cbc9c-zq2ht Pod.Namespace=submariner-operator
2025-01-02T10:18:48.633Z ERR ..er-operator/main.go:129 cmd                   error="failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://100.67.0.1:443/api/v1\": dial tcp 100.67.0.1:443: connect: no route to host"
root@stretch-cluster1:~/submariner-operator# subctl show networks
Cluster "cluster2"
 ✓ Showing Network details
    Discovered network details via Submariner:
        Network plugin:
        Service CIDRs:   []
        Cluster CIDRs:   []

Cluster "cluster1"
 ✓ Showing Network details
    Discovered network details via Submariner:
        Network plugin:
        Service CIDRs:   []
        Cluster CIDRs:   []

@aswinayyolath
Author

As per @aswinsuryan's suggestion, we switched from Flannel to Calico.

Steps Performed

rohananilkumar@Rohans-MacBook-Pro .kube % subctl deploy-broker --kubeconfig config-shibu --globalnet
 ✓ Setting up broker RBAC
 ✓ Deploying the Submariner operator
 ✓ Created operator CRDs
 ✓ Created operator namespace: submariner-operator
 ✓ Created operator service account and role
 ✓ Created submariner service account and role
 ✓ Created lighthouse service account and role
 ✓ Deployed the operator successfully
 ✓ Deploying the broker
 ✓ Saving broker info to file "broker-info.subm"
 ✓ Backed up previous file "broker-info.subm" to "broker-info.subm.2025-01-02T14_41_58+05_30"
rohananilkumar@Rohans-MacBook-Pro .kube % subctl join --kubeconfig config-shibu --clusterid cluster1 --globalnet-cidr 242.0.0.0/16 broker-info.subm --check-broker-certificate=false
 ✓ broker-info.subm indicates broker is at https://9.46.108.74:6443/
 ✓ Discovering network details
        Network plugin:  calico
        Service CIDRs:   [10.96.0.0/12]
        Cluster CIDRs:   [192.168.0.0/16]
 ✓ Retrieving the gateway nodes
 ✓ Retrieving all worker nodes
? Which node should be used as the gateway? stretch-calico-1-master.fyre.ibm.com
 ✓ Labeling node "stretch-calico-1-master.fyre.ibm.com" as a gateway
 ✓ Gathering relevant information from Broker
 ✓ Retrieving Globalnet information from the Broker
 ✓ Validating Globalnet configuration
 ✓ Assigning Globalnet IPs
 ✓ Using specified global CIDR 242.0.0.0/16
 ✓ Updating the Globalnet information on the Broker
 ✓ Deploying the Submariner operator
 ✓ Created operator namespace: submariner-operator
 ✓ Creating SA for cluster
 ✓ Connecting to Broker
 ✓ Deploying submariner
 ✓ Submariner is up and running
rohananilkumar@Rohans-MacBook-Pro .kube % subctl join --kubeconfig config-rohan --clusterid cluster2 --globalnet-cidr 243.0.0.0/16 broker-info.subm --check-broker-certificate=false
 ✓ broker-info.subm indicates broker is at https://9.46.108.74:6443/
 ✓ Discovering network details
        Network plugin:  calico
        Service CIDRs:   [10.96.0.0/12]
        Cluster CIDRs:   [192.168.0.0/16]
 ✓ Retrieving the gateway nodes
 ✓ Retrieving all worker nodes
? Which node should be used as the gateway? rak-4-master.fyre.ibm.com
 ✓ Labeling node "rak-4-master.fyre.ibm.com" as a gateway
 ✓ Gathering relevant information from Broker
 ✓ Retrieving Globalnet information from the Broker
 ✓ Validating Globalnet configuration
 ✓ Assigning Globalnet IPs
 ✓ Using specified global CIDR 243.0.0.0/16
 ✓ Updating the Globalnet information on the Broker
 ✓ Deploying the Submariner operator
 ✓ Created operator CRDs
 ✓ Created operator namespace: submariner-operator
 ✓ Created operator service account and role
 ✓ Created submariner service account and role
 ✓ Created lighthouse service account and role
 ✓ Deployed the operator successfully
 ✓ Creating SA for cluster
 ✓ Connecting to Broker
 ✓ Deploying submariner
 ✓ Submariner is up and running
rohananilkumar@Rohans-MacBook-Pro .kube % subctl show connections --kubeconfig config-rohan
Cluster "rak-4"
 ✓ Showing Connections
GATEWAY                          CLUSTER    REMOTE IP     NAT   CABLE DRIVER   SUBNETS        STATUS      RTT avg.
stretch-calico-1-master.fyre.i   cluster1   9.46.108.74   no    libreswan      242.0.0.0/16   connected   1.959807ms

rohananilkumar@Rohans-MacBook-Pro .kube % subctl show connections --kubeconfig config-shibu
Cluster "stretch-calico-1"
 ✓ Showing Connections
GATEWAY                     CLUSTER    REMOTE IP     NAT   CABLE DRIVER   SUBNETS        STATUS   RTT avg.
rak-4-master.fyre.ibm.com   cluster2   9.46.66.208   no    libreswan      243.0.0.0/16   error    0s
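
The two outputs above disagree: from rak-4 (cluster2) the cable to cluster1 is `connected`, while from stretch-calico-1 (cluster1) the cable to cluster2 is in `error`. When re-checking this repeatedly, a small filter can turn the table into a pass/fail exit code. This is a minimal sketch, assuming the table layout shown above (STATUS is the second-to-last column and "RTT avg." the last); `all_connected` is a hypothetical helper, not a subctl feature.

```shell
#!/usr/bin/env bash
# all_connected (hypothetical helper): reads `subctl show connections` output on
# stdin and exits 0 only if at least one connection row exists and every row's
# STATUS column is "connected".
# Assumption: STATUS is the second-to-last column and "RTT avg." the last,
# as in the tables above.
all_connected() {
  awk '
    /^Cluster /  { in_table = 0; next }   # a new cluster section starts
    /^GATEWAY/   { in_table = 1; next }   # column header: data rows follow
    !in_table || NF < 2 { next }          # banner lines and blank lines
    { rows++; if ($(NF - 1) != "connected") bad++ }
    END { if (rows > 0 && bad == 0) exit 0; exit 1 }
  '
}
```

For example, `subctl show connections --kubeconfig config-shibu | all_connected || echo "at least one cable is not connected"` would report the failing direction shown above.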

@rohan-anilkumar, could you please upload the output of subctl diagnose all?

@yboaron
Contributor

yboaron commented Jan 2, 2025

@aswinayyolath, did you follow the instructions for Submariner with Calico?

@aswinayyolath
Author

I think so, but I'm not 100% sure. I will ask @rohan-anilkumar to confirm; we read about it here.

@rohan-anilkumar

@aswinayyolath @yboaron, we haven't installed the Calico API server. From the linked docs, it seems the Calico API server needs to be installed for this to work.

@aswinayyolath
Author

Aswin 🔥🔥🔥 $ kubectl get crd --kubeconfig rak-4 | grep -i IPPool
ippools.crd.projectcalico.org                         2025-01-02T07:42:43Z
Aswin 🔥🔥🔥 $ kubectl get pods -n kube-system --kubeconfig rak-4 | grep calico
calico-kube-controllers-596754b6c7-lqhxg            1/1     Running   0          23h
calico-node-2rbvr                                   1/1     Running   0          23h
calico-node-8l5wz                                   1/1     Running   0          23h
calico-node-c27c6                                   1/1     Running   0          23h
calico-node-n2hsv                                   1/1     Running   0          23h
calico-node-pkhr5                                   1/1     Running   0          23h
calico-node-vw586                                   1/1     Running   0          23h
calico-node-xrp2z                                   1/1     Running   0          23h
calico-node-zxtxv                                   1/1     Running   0          23h
Aswin 🔥🔥🔥 $

Cluster 1

Aswin 🔥🔥🔥 $ kubectl cluster-info dump --kubeconfig stretch-calico-1 | grep -m 1 service-cluster-ip-range
                            "--service-cluster-ip-range=10.96.0.0/12",
Aswin 🔥🔥🔥 $ kubectl cluster-info dump --kubeconfig stretch-calico-1 | grep -m 1 cluster-cidr
                            "--cluster-cidr=192.168.0.0/16",

Cluster 2

Aswin 🔥🔥🔥 $ kubectl cluster-info dump --kubeconfig rak-4 | grep -m 1 service-cluster-ip-range
                            "--service-cluster-ip-range=10.96.0.0/12",
Aswin 🔥🔥🔥 $
Aswin 🔥🔥🔥 $ kubectl cluster-info dump --kubeconfig rak-4 | grep -m 1 cluster-cidr
                            "--cluster-cidr=192.168.0.0/16",
Aswin 🔥🔥🔥 $

Since both clusters have the same Service CIDR (10.96.0.0/12) and Pod CIDR (192.168.0.0/16), their CIDRs overlap. @yboaron, can I use Globalnet in this case and proceed with the next set of steps?
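
Identical CIDRs are the trivial case; in general, two IPv4 CIDRs overlap when they agree on the network bits of the shorter of the two prefixes. A minimal pure-bash sketch for checking a pair of clusters before deciding whether Globalnet is needed (IPv4 only, no input validation; `ip_to_int` and `cidrs_overlap` are hypothetical helper names):

```shell
#!/usr/bin/env bash
# ip_to_int (hypothetical helper): dotted-quad IPv4 address -> 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

# cidrs_overlap (hypothetical helper): returns 0 if the two IPv4 CIDRs share
# any address, i.e. they agree on the network bits of the shorter prefix.
cidrs_overlap() {
  local net1=${1%/*} len1=${1#*/} net2=${2%/*} len2=${2#*/}
  local len=$(( len1 < len2 ? len1 : len2 ))
  # Netmask for the shorter prefix, kept to 32 bits (bash arithmetic is 64-bit).
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  local a b
  a=$(ip_to_int "$net1")
  b=$(ip_to_int "$net2")
  (( (a & mask) == (b & mask) ))
}
```

For the two clusters here, `cidrs_overlap 192.168.0.0/16 192.168.0.0/16` succeeds, while the Globalnet CIDRs used earlier in this thread, 242.0.0.0/16 and 243.0.0.0/16, do not overlap.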

@aswinayyolath
Author

aswinayyolath commented Jan 4, 2025

I tried deploying Submariner locally on kind using this guide: https://submariner.io/getting-started/quickstart/kind/

I'm getting the same issue even with kind.

OS details and versions of binaries used:

Image: Ubuntu 24.04 LTS
Size: 4 cores, 8 GB, 250 GB

root@c64324v1:~# kind version
kind v0.27.0-alpha+3ab1dab1c81267 go1.23.4 linux/amd64
root@c64324v1:~# kubectl version
Client Version: v1.32.0
Kustomize Version: v5.5.0
root@c64324v1:~# docker version
Client:
 Version:           27.2.0
 API version:       1.47
 Go version:        go1.21.13
 Git commit:        3ab4256
 Built:             Fri Sep  6 19:08:12 2024
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          27.2.0
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.21.13
  Git commit:       3ab5c7d
  Built:            Fri Sep  6 19:08:45 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.21
  GitCommit:        472731909fa34bd7bc9c087e4c27943f9835f111
 runc:
  Version:          1.1.13
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
root@c64324v1:~#
root@c64324v1:~# subctl version
subctl version: v0.19.1
root@c64324v1:~#
root@c64324v1:~# cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
root@c64324v1:~#

Result of the "Verify Automatically with subctl" step:

Summarizing 9 Failures:
  [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, basic]
  github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
  [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane]
  github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
  [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane]
  github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
  [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, basic]
  github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
  [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is not on a gateway and the remote service is on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane]
  github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
  [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane]
  github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
  [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod with HostNetworking connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane]
  github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
  [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod with HostNetworking connects via TCP to a remote pod when the pod is on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane]
  github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
  [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod in reverse direction when the pod is not on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane]
  github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196

Ran 26 of 48 Specs in 1889.641 seconds
FAIL! -- 17 Passed | 9 Failed | 0 Pending | 22 Skipped

subctl gather output

cluster1.zip
cluster2.zip

Other details which might be useful

root@c64324v1:~/submariner-operator# kubectl get gateways -n submariner-operator --kubeconfig output/kubeconfigs/kind-config-cluster1
NAME                     HA STATUS
cluster1-control-plane   active
root@c64324v1:~/submariner-operator# kubectl get gateways -n submariner-operator --kubeconfig output/kubeconfigs/kind-config-cluster2
NAME                     HA STATUS
cluster2-control-plane   active
root@c64324v1:~/submariner-operator# kubectl get pod --kubeconfig output/kubeconfigs/kind-config-cluster2
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          7m9s
root@c64324v1:~/submariner-operator# kubectl get svc --kubeconfig output/kubeconfigs/kind-config-cluster2
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   100.67.0.1     <none>        443/TCP   28m
nginx        ClusterIP   100.67.27.14   <none>        80/TCP    6m26s
root@c64324v1:~/submariner-operator# subctl export service --kubeconfig output/kubeconfigs/kind-config-cluster2 --namespace default nginx
 ✓ Service exported successfully
root@c64324v1:~/submariner-operator# kubectl --kubeconfig output/kubeconfigs/kind-config-cluster1 -n default run tmp-shell --rm -i --tty --image quay.io/submariner/nettest \
-- /bin/bash
If you don't see a command prompt, try pressing enter.
bash-5.0# curl nginx.default.svc.clusterset.local
^C
bash-5.0# curl -vvv  nginx.default.svc.clusterset.local
*   Trying 100.67.27.14:80...
* connect to 100.67.27.14 port 80 failed: Operation timed out
* Failed to connect to nginx.default.svc.clusterset.local port 80: Operation timed out
* Closing connection 0
curl: (28) Failed to connect to nginx.default.svc.clusterset.local port 80: Operation timed out

subctl diagnose all

root@c64324v1:~/submariner-operator# subctl diagnose all --kubeconfig output/kubeconfigs/kind-config-cluster1
Cluster "cluster1"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.31.2" is supported

 ✓ Non-Globalnet deployment detected - checking that cluster CIDRs do not overlap
 ✓ Checking DaemonSet "submariner-gateway"
 ✓ Checking DaemonSet "submariner-routeagent"
 ✓ Checking DaemonSet "submariner-metrics-proxy"
 ✓ Checking Deployment "submariner-lighthouse-agent"
 ✓ Checking Deployment "submariner-lighthouse-coredns"
 ✓ Checking the status of all Submariner pods
 ✓ Checking that gateway metrics are accessible from non-gateway nodes

 ✓ Checking Submariner support for the CNI network plugin
 ✓ The detected CNI network plugin ("kindnet") is supported
 ✓ Checking gateway connections
 ✗ Checking route agent connections
 ✗ Connection to cluster "cluster2" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"10.131.0.1\"",
  "spec": {
    "cluster_id": "cluster2",
    "cable_name": "submariner-cable-cluster2-172-18-0-4",
    "healthCheckIP": "10.131.0.1",
    "hostname": "cluster2-control-plane",
    "subnets": [
      "100.67.0.0/16",
      "10.131.0.0/16"
    ],
    "private_ip": "172.18.0.4",
    "public_ip": "170.225.223.17",
    "nat_enabled": false,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✓ Checking Submariner support for the kube-proxy mode
 ✓ The kube-proxy mode is supported
 ✓ Checking that firewall configuration allows intra-cluster VXLAN traffic

 ✓ Checking that services have been exported properly

Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.

subctl version: v0.19.1
root@c64324v1:~/submariner-operator# subctl diagnose all --kubeconfig output/kubeconfigs/kind-config-cluster2
Cluster "cluster2"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.31.2" is supported

 ✓ Non-Globalnet deployment detected - checking that cluster CIDRs do not overlap
 ✓ Checking DaemonSet "submariner-gateway"
 ✓ Checking DaemonSet "submariner-routeagent"
 ✓ Checking DaemonSet "submariner-metrics-proxy"
 ✓ Checking Deployment "submariner-lighthouse-agent"
 ✓ Checking Deployment "submariner-lighthouse-coredns"
 ✓ Checking the status of all Submariner pods
 ✓ Checking that gateway metrics are accessible from non-gateway nodes

 ✓ Checking Submariner support for the CNI network plugin
 ✓ The detected CNI network plugin ("kindnet") is supported
 ✓ Checking gateway connections
 ✗ Checking route agent connections
 ✗ Connection to cluster "cluster1" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"10.130.0.1\"",
  "spec": {
    "cluster_id": "cluster1",
    "cable_name": "submariner-cable-cluster1-172-18-0-6",
    "healthCheckIP": "10.130.0.1",
    "hostname": "cluster1-control-plane",
    "subnets": [
      "100.66.0.0/16",
      "10.130.0.0/16"
    ],
    "private_ip": "172.18.0.6",
    "public_ip": "170.225.223.21",
    "nat_enabled": false,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✓ Checking Submariner support for the kube-proxy mode
 ✓ The kube-proxy mode is supported
 ✓ Checking that firewall configuration allows intra-cluster VXLAN traffic

 ✓ Checking that services have been exported properly

Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.

subctl version: v0.19.1
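
In both directions the gateway check passes, but the route-agent check fails because the ping to the remote cluster's health-check IP fails (10.131.0.1 from cluster1, 10.130.0.1 from cluster2). When comparing runs, it can help to pull those fields out of the connection details. A minimal sketch without jq, assuming each key/value pair sits on its own line as in the output above and that the details are written to stdout; `json_field` is a hypothetical helper that only handles simple string values, not a general JSON parser.

```shell
#!/usr/bin/env bash
# json_field KEY (hypothetical helper): prints the string value of "KEY" from
# pretty-printed JSON on stdin, one match per line. Only handles string values
# on the same line as their key and without escaped quotes.
json_field() {
  sed -n "s/^[[:space:]]*\"$1\":[[:space:]]*\"\([^\"]*\)\",\{0,1\}[[:space:]]*\$/\1/p"
}
```

For example, `subctl diagnose all --kubeconfig output/kubeconfigs/kind-config-cluster1 | json_field healthCheckIP` should print the IP the route agent tried to reach.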

@yboaron
Contributor

yboaron commented Jan 5, 2025

IIRC, other users have also reported problems deploying kind on Ubuntu in the past, probably due to environment configuration issues.

A. I usually apply script [1] on my host before deploying Submariner on kind; could you try that?
B. Also, as a workaround, you can try deploying Submariner with OVN-Kubernetes (OVNK) as the CNI by running
make deploy using=ovn and see if that helps.

[1]
# Put SELinux into permissive mode and set the inotify limits
sudo setenforce 0
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512

# Replace firewalld with the plain iptables services (dnf and /etc/sysconfig are Fedora/RHEL-specific)
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo dnf install -y iptables-services
sudo touch /etc/sysconfig/iptables
sudo touch /etc/sysconfig/ip6tables
sudo systemctl start iptables
sudo systemctl start ip6tables
sudo systemctl enable iptables
sudo systemctl enable ip6tables
# Flush the filter table and stop bridged traffic from being passed to iptables/arptables/ip6tables
sudo iptables -t filter -F
sudo iptables -t filter -X
sudo sysctl net.bridge.bridge-nf-call-iptables=0
sudo sysctl net.bridge.bridge-nf-call-arptables=0
sudo sysctl net.bridge.bridge-nf-call-ip6tables=0
sudo systemctl restart docker
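
Script [1] sets the inotify limits with `sysctl`, which changes only the running kernel; the values are lost on reboot unless persisted (for example under /etc/sysctl.d). A minimal sketch to confirm the two limits are actually in effect before deploying; it reads /proc/sys directly, and `PROC_SYS` is a hypothetical override so the check can be pointed at a test directory. `check_inotify_limits` is likewise a hypothetical name.

```shell
#!/usr/bin/env bash
# check_inotify_limits (hypothetical helper): compares the current inotify
# limits with the values used in script [1]. Prints one line per limit and
# returns non-zero if any limit is below the target or cannot be read.
check_inotify_limits() {
  local root=${PROC_SYS:-/proc/sys} rc=0 pair key want have
  for pair in max_user_watches=524288 max_user_instances=512; do
    key=${pair%%=*}
    want=${pair#*=}
    if ! have=$(cat "$root/fs/inotify/$key" 2>/dev/null); then
      echo "fs.inotify.$key: unreadable"
      rc=1
      continue
    fi
    if (( have >= want )); then
      echo "fs.inotify.$key=$have (ok)"
    else
      echo "fs.inotify.$key=$have (below $want)"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_inotify_limits` on the host right before creating the kind clusters shows whether an earlier `sysctl` call is still in effect.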

@aswinayyolath
Author

aswinayyolath commented Jan 5, 2025

@yboaron, I found a nice blog post you wrote recently. Thanks for that!

https://medium.com/@yboaron/connecting-k8s-cilium-cluster-and-k8s-calico-cluster-using-submariner-d56d7c39f0cb

Instead of Cilium, I used Calico for both clusters, and I was able to verify connectivity:

bash-5.0# curl nginx.default.svc.clusterset.local
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
bash-5.0# exit
exit
Session ended, resume using 'kubectl attach tmp-shell -c tmp-shell -i -t' command when the pod is running
pod "tmp-shell" deleted

I used the steps below:

# Download the latest Kind binary
curl -Lo ./kind https://kind.sigs.k8s.io/dl/latest/kind-linux-amd64

# Make the Kind binary executable
chmod +x ./kind

# Move the binary to PATH
sudo mv ./kind /usr/local/bin/kind

# Verify the installation
kind --version


# Download the latest stable release of kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

# Make the kubectl binary executable
chmod +x kubectl

# Move the binary to  PATH
sudo mv kubectl /usr/local/bin/kubectl

# Verify the installation
kubectl version --client


# Download the latest Subctl release
curl -Ls https://get.submariner.io | bash
export PATH=$PATH:~/.local/bin
echo export PATH=\$PATH:~/.local/bin >> ~/.profile

# Verify the installation
subctl version

# Install Docker 
snap install docker

# Install Make
apt install make

# Create kind clusters without a CNI (Calico is installed later)
git clone https://github.com/submariner-io/shipyard.git
cd shipyard/


cat > deploy.two.clusters.nocni.yaml << EOF
nodes: control-plane worker
clusters:
 cluster1:
   cni: none
 cluster2:
   cni: none
EOF

make SETTINGS=deploy.two.clusters.nocni.yaml clusters

# Increase inotify resource limits
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512


# List clusters
kind get clusters

# Point KUBECONFIG at the generated kubeconfig files and list the contexts
export KUBECONFIG=$(find $(git rev-parse --show-toplevel)/output/kubeconfigs/ -type f -printf %p:)
kubectl config get-contexts

# Confirm that we have two nodes in each cluster
kubectl --context cluster1 get nodes
kubectl --context cluster2 get nodes

# Deploy Calico on cluster1

kubectl --context cluster1 create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.0/manifests/tigera-operator.yaml

mkdir calico_manifests
wget -O calico_manifests/custom-resources.yaml  https://raw.githubusercontent.com/projectcalico/calico/v3.29.0/manifests/custom-resources.yaml

sed -i 's,cidr: 192.168.0.0/16,cidr: 10.130.0.0/16,g' calico_manifests/custom-resources.yaml

sed -i 's,VXLANCrossSubnet,VXLAN,g' calico_manifests/custom-resources.yaml

kubectl --context cluster1 apply -f calico_manifests/custom-resources.yaml

# Install Calico on Cluster 2

kubectl --context cluster2 create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.0/manifests/tigera-operator.yaml

wget -O calico_manifests/custom-resources.yaml https://raw.githubusercontent.com/projectcalico/calico/v3.29.0/manifests/custom-resources.yaml

sed -i 's,cidr: 192.168.0.0/16,cidr: 10.131.0.0/16,g' calico_manifests/custom-resources.yaml

sed -i 's,VXLANCrossSubnet,VXLAN,g' calico_manifests/custom-resources.yaml

kubectl --context cluster2 apply -f calico_manifests/custom-resources.yaml

# Deploy Submariner

subctl deploy-broker --context cluster1

subctl join --context cluster1  broker-info.subm --clusterid cluster1 --natt=false

subctl join --context cluster2  broker-info.subm --clusterid cluster2 --natt=false

# Check the status of the Submariner inter-cluster tunnels
subctl show connections --context cluster2
subctl show connections --context cluster1

# Verify inter-cluster connectivity
kubectl --context cluster2 create deployment nginx --image=nginx

kubectl --context cluster2 expose deployment nginx --port=80

subctl export service --context cluster2 --namespace default nginx

# Run nettest pod on cluster1 to access the nginx service
kubectl --context cluster1 -n default run tmp-shell --rm -i --tty --image quay.io/submariner/nettest -- /bin/bash
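
In the Calico steps above, the same calico_manifests/custom-resources.yaml is downloaded twice and edited in place, and the pod CIDR is the only per-cluster difference. A minimal sketch that wraps the two `sed` edits in a function writing one copy per cluster, and fails if the expected defaults were not found; `patch_calico_cr` is a hypothetical name, and the check assumes the upstream v3.29.0 manifest still contains `cidr: 192.168.0.0/16` and `encapsulation: VXLANCrossSubnet`.

```shell
#!/usr/bin/env bash
# patch_calico_cr SRC DST POD_CIDR (hypothetical helper): applies the two edits
# from the steps above to a copy of Calico's custom-resources.yaml:
#   - replace the default pod CIDR 192.168.0.0/16 with POD_CIDR
#   - switch encapsulation from VXLANCrossSubnet to VXLAN
# Returns non-zero if the edited copy does not contain both changes.
patch_calico_cr() {
  local src=$1 dst=$2 pod_cidr=$3
  sed -e "s,cidr: 192\.168\.0\.0/16,cidr: ${pod_cidr},g" \
      -e 's,VXLANCrossSubnet,VXLAN,g' "$src" > "$dst" || return 1
  grep -q "cidr: ${pod_cidr}\$" "$dst" && grep -q 'encapsulation: VXLAN$' "$dst"
}
```

For example, `patch_calico_cr calico_manifests/custom-resources.yaml calico_manifests/cluster1.yaml 10.130.0.0/16 && kubectl --context cluster1 apply -f calico_manifests/cluster1.yaml`, and likewise with 10.131.0.0/16 for cluster2.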

I wanted to test Submariner on regular Kubernetes, OCP, etc., to establish cross-cluster connectivity and see whether it works. I will continue testing on different machines and Kubernetes flavors, but at least I can now test it on kind using Calico.

@aswinayyolath
Author

Maybe it would be worth revisiting the docs to check whether they are outdated or need any changes.

As a next step, I will try Submariner directly on a regular Kubernetes cluster running on Ubuntu, without kind. I will try the commands you provided here.

@yboaron
Contributor

yboaron commented Jan 5, 2025

> Maybe it would be worth revisiting the docs to check whether they are outdated or need any changes.
>
> As a next step, I will try Submariner directly on a regular Kubernetes cluster running on Ubuntu, without kind. I will try the commands you provided here.

Did updating the fs.inotify.max_user_watches and fs.inotify.max_user_instances values fix things in your environment?

Sure, we can update the docs if needed.

Please let me know how it goes with Submariner testing on non-kind clusters.

@aswinayyolath
Author

I ran the fs.inotify.max_user_watches and fs.inotify.max_user_instances commands as a prerequisite, and I don't think they fixed the issue. The official doc uses kindnet as the CNI, whereas your blog explains how to set up kind with Calico; I guess that is the difference. I had already run the commands below

sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512

when following the steps in the official doc, but I still got the error below:

bash-5.0# curl nginx.default.svc.clusterset.local
^C
bash-5.0# curl -vvv  nginx.default.svc.clusterset.local
*   Trying 100.67.27.14:80...
* connect to 100.67.27.14 port 80 failed: Operation timed out
* Failed to connect to nginx.default.svc.clusterset.local port 80: Operation timed out
* Closing connection 0
curl: (28) Failed to connect to nginx.default.svc.clusterset.local port 80: Operation timed out

@aswinayyolath
Author

@rohan-anilkumar

I have created two Kubernetes clusters with the following Pod and Service CIDRs:

Aswin 🔥🔥🔥 $ kubectl --kubeconfig kubeconfig/rohan get cm  kubeadm-config -n kube-system -o yaml  | grep -A2 Subnet
      podSubnet: 192.168.0.0/16
      serviceSubnet: 10.96.0.0/12
    proxy: {}
    scheduler: {}
Aswin 🔥🔥🔥 $ kubectl --kubeconfig kubeconfig/shibu get cm  kubeadm-config -n kube-system -o yaml  | grep -A2 Subnet
      podSubnet: 192.168.0.0/16
      serviceSubnet: 10.96.0.0/12
    proxy: {}
    scheduler: {}
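
Rather than comparing the two `grep -A2 Subnet` outputs by eye, the values can be extracted and compared directly. A minimal sketch, assuming the kubeadm-config ConfigMap contains the `podSubnet:` and `serviceSubnet:` keys shown above; `kubeadm_subnets` is a hypothetical helper, and comparing its output only detects identical values, not partial overlaps.

```shell
#!/usr/bin/env bash
# kubeadm_subnets (hypothetical helper): reads kubeadm-config YAML on stdin and
# prints "<podSubnet> <serviceSubnet>" on one line (empty fields if absent).
kubeadm_subnets() {
  awk '
    $1 == "podSubnet:"     { pod = $2 }
    $1 == "serviceSubnet:" { svc = $2 }
    END { print pod, svc }
  '
}
```

For example, pipe `kubectl --kubeconfig kubeconfig/rohan get cm kubeadm-config -n kube-system -o yaml` into `kubeadm_subnets`, repeat with `kubeconfig/shibu`, and compare the two lines; for the clusters above both would print `192.168.0.0/16 10.96.0.0/12`.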

Both clusters use Calico as the CNI:

Aswin 🔥🔥🔥 $ kubectl get pods  --kubeconfig kubeconfig/rohan -n kube-system
NAME                                                           READY   STATUS    RESTARTS   AGE
calico-kube-controllers-596754b6c7-dqvv2                       1/1     Running   0          3h47m
calico-node-86c9c                                              1/1     Running   0          3h47m
calico-node-9hslz                                              1/1     Running   0          3h47m
calico-node-c5xb6                                              1/1     Running   0          3h47m
calico-node-x4wnr                                              1/1     Running   0          3h47m
coredns-7c65d6cfc9-6vr2q                                       1/1     Running   0          3h47m
coredns-7c65d6cfc9-qv25x                                       1/1     Running   0          3h47m
etcd-stretch-calico-2-master.fyre.ibm.com                      1/1     Running   0          3h47m
kube-apiserver-stretch-calico-2-master.fyre.ibm.com            1/1     Running   0          3h47m
kube-controller-manager-stretch-calico-2-master.fyre.ibm.com   1/1     Running   0          3h47m
kube-proxy-8jdlf                                               1/1     Running   0          3h47m
kube-proxy-gcb7k                                               1/1     Running   0          3h47m
kube-proxy-nfcz5                                               1/1     Running   0          3h47m
kube-proxy-xkb6k                                               1/1     Running   0          3h47m
kube-scheduler-stretch-calico-2-master.fyre.ibm.com            1/1     Running   0          3h47m
Aswin 🔥🔥🔥 $
Aswin 🔥🔥🔥 $ kubectl get pods  --kubeconfig kubeconfig/shibu -n kube-system
NAME                                                READY   STATUS    RESTARTS   AGE
calico-kube-controllers-596754b6c7-gmsn5            1/1     Running   0          3h34m
calico-node-274rd                                   1/1     Running   0          3h34m
calico-node-cs2sb                                   1/1     Running   0          3h34m
calico-node-l8kdc                                   1/1     Running   0          3h34m
calico-node-pxg7f                                   1/1     Running   0          3h34m
calico-node-q9r7m                                   1/1     Running   0          3h34m
calico-node-s9p68                                   1/1     Running   0          3h34m
calico-node-wnlr4                                   1/1     Running   0          3h34m
calico-node-xwdxm                                   1/1     Running   0          3h34m
coredns-7c65d6cfc9-ldgn4                            1/1     Running   0          3h35m
coredns-7c65d6cfc9-m6whd                            1/1     Running   0          3h35m
etcd-rak-5-master.fyre.ibm.com                      1/1     Running   0          3h35m
kube-apiserver-rak-5-master.fyre.ibm.com            1/1     Running   0          3h35m
kube-controller-manager-rak-5-master.fyre.ibm.com   1/1     Running   0          3h35m
kube-proxy-bm4fv                                    1/1     Running   0          3h35m
kube-proxy-d2ws6                                    1/1     Running   0          3h34m
kube-proxy-h959k                                    1/1     Running   0          3h34m
kube-proxy-p84jd                                    1/1     Running   0          3h34m
kube-proxy-p8k2w                                    1/1     Running   0          3h34m
kube-proxy-tr9bx                                    1/1     Running   0          3h34m
kube-proxy-xcc8l                                    1/1     Running   0          3h34m
kube-proxy-zrq4t                                    1/1     Running   0          3h34m
kube-scheduler-rak-5-master.fyre.ibm.com            1/1     Running   0          3h35m
Aswin 🔥🔥🔥 $

Since the CIDRs overlap, we need to use Globalnet.

Aswin 🔥🔥🔥 $ kubectl get crd ippools.crd.projectcalico.org -o yaml --kubeconfig kubeconfig/rohan
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: ippools.crd.projectcalico.org
spec:
  conversion:
    strategy: None
  group: crd.projectcalico.org
  names:
    kind: IPPool
    listKind: IPPoolList
    plural: ippools
    singular: ippool
  scope: Cluster
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: IPPoolSpec contains the specification for an IPPool resource.
            properties:
              allowedUses:
                description: AllowedUse controls what the IP pool will be used for.  If
                  not specified or empty, defaults to ["Tunnel", "Workload"] for back-compatibility
                items:
                  type: string
                type: array
              blockSize:
                description: The block size to use for IP address assignments from
                  this pool. Defaults to 26 for IPv4 and 122 for IPv6.
                type: integer
              cidr:
                description: The pool CIDR.
                type: string
              disableBGPExport:
                description: 'Disable exporting routes from this IP Pool''s CIDR over
                  BGP. [Default: false]'
                type: boolean
              disabled:
                description: When disabled is true, Calico IPAM will not assign addresses
                  from this pool.
                type: boolean
              ipip:
                description: 'Deprecated: this field is only used for APIv1 backwards
                  compatibility. Setting this field is not allowed, this field is
                  for internal use only.'
                properties:
                  enabled:
                    description: When enabled is true, ipip tunneling will be used
                      to deliver packets to destinations within this pool.
                    type: boolean
                  mode:
                    description: The IPIP mode.  This can be one of "always" or "cross-subnet".  A
                      mode of "always" will also use IPIP tunneling for routing to
                      destination IP addresses within this pool.  A mode of "cross-subnet"
                      will only use IPIP tunneling when the destination node is on
                      a different subnet to the originating node.  The default value
                      (if not specified) is "always".
                    type: string
                type: object
              ipipMode:
                description: Contains configuration for IPIP tunneling for this pool.
                  If not specified, then this is defaulted to "Never" (i.e. IPIP tunneling
                  is disabled).
                type: string
              nat-outgoing:
                description: 'Deprecated: this field is only used for APIv1 backwards
                  compatibility. Setting this field is not allowed, this field is
                  for internal use only.'
                type: boolean
              natOutgoing:
                description: When natOutgoing is true, packets sent from Calico networked
                  containers in this pool to destinations outside of this pool will
                  be masqueraded.
                type: boolean
              nodeSelector:
                description: Allows IPPool to allocate for a specific node by label
                  selector.
                type: string
              vxlanMode:
                description: Contains configuration for VXLAN tunneling for this pool.
                  If not specified, then this is defaulted to "Never" (i.e. VXLAN
                  tunneling is disabled).
                type: string
            required:
            - cidr
            type: object
        type: object
    served: true
    storage: true
status:
  acceptedNames:
    kind: IPPool
    listKind: IPPoolList
    plural: ippools
    singular: ippool
  conditions:
  - lastTransitionTime: "2025-01-06T04:44:50Z"
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: "2025-01-06T04:44:50Z"
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  storedVersions:
  - v1

So we need to use the IPPool CR with API group crd.projectcalico.org/v1 instead of projectcalico.org/v3, as given in the doc here. Could you please test Submariner with Globalnet in this cluster?

@aswinayyolath
Author

@yboaron I have tried Submariner directly on a K8s cluster with Calico and Globalnet. I installed the Calico API server so that Submariner can create the IPPools:

Aswin 🔥🔥🔥 $ subctl deploy-broker --globalnet --globalnet-cidr-range 243.0.0.0/8  --kubeconfig kubeconfig/rohan
 ✓ Setting up broker RBAC
 ✓ Deploying the Submariner operator
 ✓ Created operator CRDs
 ✓ Created operator namespace: submariner-operator
 ✓ Created operator service account and role
 ✓ Created submariner service account and role
 ✓ Created lighthouse service account and role
 ✓ Deployed the operator successfully
 ✓ Deploying the broker
 ✓ Saving broker info to file "broker-info.subm"
 ✓ Backed up previous file "broker-info.subm" to "broker-info.subm.2025-01-06T18_24_15+05_30"
Aswin 🔥🔥🔥 $ subctl join --kubeconfig kubeconfig/rohan --clusterid rohan --natt=true  broker-info.subm --check-broker-certificate=false
 ✓ broker-info.subm indicates broker is at https://9.46.87.198:6443
 ✓ Discovering network details
        Network plugin:  calico
        Service CIDRs:   [10.96.0.0/12]
        Cluster CIDRs:   [192.168.0.0/16]
 ✓ Retrieving the gateway nodes
 ✓ Retrieving all worker nodes
? Which node should be used as the gateway? stretch-calico-2-master.fyre.ibm.com
 ✓ Labeling node "stretch-calico-2-master.fyre.ibm.com" as a gateway
 ✓ Gathering relevant information from Broker
 ✓ Retrieving Globalnet information from the Broker
 ✓ Validating Globalnet configuration
 ✓ Assigning Globalnet IPs
 ✓ Allocated global CIDR 243.0.0.0/16
 ✓ Updating the Globalnet information on the Broker
 ✓ Retrieving ClustersetIP information from the Broker
 ✓ Validating ClustersetIP configuration
 ✓ Assigning ClustersetIP IPs
 ✓ Allocated clustersetip CIDR 243.0.0.0/20
 ✓ Updating the ClustersetIP information on the Broker
 ✓ Deploying the Submariner operator
 ✓ Created operator namespace: submariner-operator
 ✓ Creating SA for cluster
 ✓ Connecting to Broker
 ✓ Deploying submariner
 ✓ Submariner is up and running
Aswin 🔥🔥🔥 $ kubectl get pods -n submariner-operator --kubeconfig kubeconfig/rohan
NAME                                             READY   STATUS    RESTARTS   AGE
submariner-gateway-n6xxl                         1/1     Running   0          49s
submariner-globalnet-jxfdz                       1/1     Running   0          49s
submariner-lighthouse-agent-5b5d68ccbc-l6bbc     1/1     Running   0          49s
submariner-lighthouse-coredns-85c87f685c-4dmqm   1/1     Running   0          49s
submariner-lighthouse-coredns-85c87f685c-lmk67   1/1     Running   0          49s
submariner-metrics-proxy-94dpt                   2/2     Running   0          49s
submariner-operator-566c47dbb-kzz7b              1/1     Running   0          2m15s
submariner-routeagent-cdb2f                      1/1     Running   0          49s
submariner-routeagent-k74xn                      1/1     Running   0          49s
submariner-routeagent-nx42n                      1/1     Running   0          49s
submariner-routeagent-ttb2w                      1/1     Running   0          49s
Aswin 🔥🔥🔥 $ subctl join --kubeconfig kubeconfig/shibu --clusterid shibu --natt=true  broker-info.subm --check-broker-certificate=false
 ✓ broker-info.subm indicates broker is at https://9.46.87.198:6443
 ✓ Discovering network details
        Network plugin:  calico
        Service CIDRs:   [10.96.0.0/12]
        Cluster CIDRs:   [192.168.0.0/16]
 ✓ Retrieving the gateway nodes
 ✓ Retrieving all worker nodes
? Which node should be used as the gateway? rak-5-master.fyre.ibm.com
 ✓ Labeling node "rak-5-master.fyre.ibm.com" as a gateway
 ✓ Gathering relevant information from Broker
 ✓ Retrieving Globalnet information from the Broker
 ✓ Validating Globalnet configuration
 ✓ Assigning Globalnet IPs
 ✓ Allocated global CIDR 243.1.0.0/16
 ✓ Updating the Globalnet information on the Broker
 ✓ Retrieving ClustersetIP information from the Broker
 ✓ Validating ClustersetIP configuration
 ✓ Assigning ClustersetIP IPs
 ✓ Allocated clustersetip CIDR 243.0.16.0/20
 ✓ Updating the ClustersetIP information on the Broker
 ✓ Deploying the Submariner operator
 ✓ Created operator CRDs
 ✓ Created operator namespace: submariner-operator
 ✓ Created operator service account and role
 ✓ Created submariner service account and role
 ✓ Created lighthouse service account and role
 ✓ Deployed the operator successfully
 ✓ Creating SA for cluster
 ✓ Connecting to Broker
 ✓ Deploying submariner
 ✓ Submariner is up and running
Aswin 🔥🔥🔥 $ kubectl get pods -n submariner-operator --kubeconfig kubeconfig/shibu
NAME                                             READY   STATUS    RESTARTS   AGE
submariner-gateway-pxrrw                         1/1     Running   0          9s
submariner-globalnet-dz5pk                       1/1     Running   0          9s
submariner-lighthouse-agent-6f6898f895-tvrwz     1/1     Running   0          8s
submariner-lighthouse-coredns-55fb8f88c8-dkndp   1/1     Running   0          8s
submariner-lighthouse-coredns-55fb8f88c8-tjrdw   1/1     Running   0          8s
submariner-metrics-proxy-849kw                   2/2     Running   0          9s
submariner-operator-566c47dbb-bsbqh              1/1     Running   0          22s
submariner-routeagent-8228b                      1/1     Running   0          9s
submariner-routeagent-j7qxs                      1/1     Running   0          9s
submariner-routeagent-khp7k                      1/1     Running   0          9s
submariner-routeagent-n572w                      1/1     Running   0          9s
submariner-routeagent-psrr4                      1/1     Running   0          9s
submariner-routeagent-rjss6                      1/1     Running   0          9s
submariner-routeagent-sjg6r                      1/1     Running   0          9s
submariner-routeagent-wjpfw                      1/1     Running   0          9s
Aswin 🔥🔥🔥 $

Then I tested Submariner:

Aswin 🔥🔥🔥 $ kubectl get pods -n test-app --kubeconfig kubeconfig/rohan
NAME             READY   STATUS    RESTARTS   AGE
nginx-cluster1   1/1     Running   0          5h6m
Aswin 🔥🔥🔥 $ kubectl get svc --kubeconfig kubeconfig/rohan -n test-app
NAME                                          TYPE        CLUSTER-IP      EXTERNAL-IP     PORT(S)   AGE
nginx-cluster1                                ClusterIP   10.97.244.244   <none>          80/TCP    5h6m
submariner-t2cajkojelyzlt2kt66s4w54cvykmeou   ClusterIP   10.98.196.253   243.0.255.253   80/TCP    5h5m
Aswin 🔥🔥🔥 $ subctl  export service  nginx-cluster1 -n test-app --kubeconfig kubeconfig/rohan
 ✓ Service already exported
Aswin 🔥🔥🔥 $ subctl show connections --kubeconfig kubeconfig/shibu
Cluster "rak-5"
 ✓ Showing Connections
GATEWAY                          CLUSTER   REMOTE IP     NAT   CABLE DRIVER   SUBNETS        STATUS      RTT avg.
stretch-calico-2-master.fyre.i   rohan     9.46.87.198   no    libreswan      243.0.0.0/16   connected   2.467209ms

Aswin 🔥🔥🔥 $ subctl show connections --kubeconfig kubeconfig/rohan
Cluster "stretch-calico-2"
 ✓ Showing Connections
GATEWAY                     CLUSTER   REMOTE IP    NAT   CABLE DRIVER   SUBNETS        STATUS      RTT avg.
rak-5-master.fyre.ibm.com   shibu     9.46.72.88   no    libreswan      243.1.0.0/16   connected   1.740257ms
Aswin 🔥🔥🔥 $ kubectl exec tmp-shell -it bash -n test-app --kubeconfig kubeconfig/shibu
bash-5.0# curl -vv nginx-cluster1.test-app.svc.clusterset.local
*   Trying 243.0.255.253:80...
* connect to 243.0.255.253 port 80 failed: Host is unreachable
* Failed to connect to nginx-cluster1.test-app.svc.clusterset.local port 80: Host is unreachable
* Closing connection 0
curl: (7) Failed to connect to nginx-cluster1.test-app.svc.clusterset.local port 80: Host is unreachable
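The clusterset name resolved to 243.0.255.253. This matches the EXTERNAL-IP of the submariner-t2cajkojelyzlt2kt66s4w54cvykmeou service listed in rohan, and it falls inside rohan's Globalnet CIDR. The following minimal sketch finds which connection's SUBNETS column covers that destination, which identifies the cable the traffic is expected to use. It assumes the plain-text table layout of subctl show connections exactly as printed in this thread, with one CIDR per row. That layout is not a stable interface. The helper name connection_for is hypothetical.

```python
import ipaddress

# "subctl show connections --kubeconfig kubeconfig/shibu" output, copied from this thread.
SHOW_CONNECTIONS = """\
GATEWAY                          CLUSTER   REMOTE IP     NAT   CABLE DRIVER   SUBNETS        STATUS      RTT avg.
stretch-calico-2-master.fyre.i   rohan     9.46.87.198   no    libreswan      243.0.0.0/16   connected   2.467209ms
"""


def connection_for(dest_ip: str, table: str):
    """Hypothetical helper: return (cluster, status) of the connection whose
    SUBNETS CIDR contains dest_ip, or None if no row matches."""
    dest = ipaddress.ip_address(dest_ip)
    for line in table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:  # not a data row in the layout shown above
            continue
        cluster, subnet, status = fields[1], fields[5], fields[6]
        # Assumes a single CIDR per row, as in this thread.
        if dest in ipaddress.ip_network(subnet):
            return cluster, status
    return None


if __name__ == "__main__":
    print(connection_for("243.0.255.253", SHOW_CONNECTIONS))
```

For the table above, it returns ('rohan', 'connected').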

Output of subctl diagnose all for both clusters:

Aswin 🔥🔥🔥 $ subctl diagnose  all --kubeconfig kubeconfig/rohan
Cluster "stretch-calico-2"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.31.4" is supported

 ✗ Globalnet deployment detected - checking that globalnet CIDRs do not overlap
 ✗ Error getting the Broker's REST config: error getting auth rest config: Get "https://9.46.87.198:6443/apis/submariner.io/v1/namespaces/submariner-k8s-broker/clusters/any": tls: failed to verify certificate: x509: “kube-apiserver” certificate is not trusted

 ✓ Checking Submariner support for the CNI network plugin
 ✓ The detected CNI network plugin ("calico") is supported
 ✓ Calico CNI detected, checking if the Submariner IPPool pre-requisites are configured
 ✓ Checking gateway connections
 ✗ Checking route agent connections
 ✗ Connection to cluster "shibu" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.1.255.254\"",
  "spec": {
    "cluster_id": "shibu",
    "cable_name": "submariner-cable-shibu-9-46-72-88",
    "healthCheckIP": "243.1.255.254",
    "hostname": "rak-5-master.fyre.ibm.com",
    "subnets": [
      "243.1.0.0/16"
    ],
    "private_ip": "9.46.72.88",
    "public_ip": "129.41.87.0",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "shibu" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.1.255.254\"",
  "spec": {
    "cluster_id": "shibu",
    "cable_name": "submariner-cable-shibu-9-46-72-88",
    "healthCheckIP": "243.1.255.254",
    "hostname": "rak-5-master.fyre.ibm.com",
    "subnets": [
      "243.1.0.0/16"
    ],
    "private_ip": "9.46.72.88",
    "public_ip": "129.41.87.0",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "shibu" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.1.255.254\"",
  "spec": {
    "cluster_id": "shibu",
    "cable_name": "submariner-cable-shibu-9-46-72-88",
    "healthCheckIP": "243.1.255.254",
    "hostname": "rak-5-master.fyre.ibm.com",
    "subnets": [
      "243.1.0.0/16"
    ],
    "private_ip": "9.46.72.88",
    "public_ip": "129.41.87.0",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✓ Checking Submariner support for the kube-proxy mode
 ✓ The kube-proxy mode is supported
 ✗ Checking that firewall configuration allows intra-cluster VXLAN traffic
 ✗ The tcpdump output from the sniffer pod does not contain the expected remote endpoint IP 243.1.0.0. Please check that your firewall configuration allows UDP/4800 traffic. Actual pod output:
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vx-submariner, link-type EN10MB (Ethernet), snapshot length 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

 ✓ Checking that Globalnet is correctly configured and functioning

 ✓ Checking that services have been exported properly

Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.

subctl version: v0.19.0
Aswin 🔥🔥🔥 $ subctl diagnose  all --kubeconfig kubeconfig/shibu
Cluster "rak-5"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.31.4" is supported

 ✗ Globalnet deployment detected - checking that globalnet CIDRs do not overlap
 ✗ Error getting the Broker's REST config: error getting auth rest config: Get "https://9.46.87.198:6443/apis/submariner.io/v1/namespaces/submariner-k8s-broker/clusters/any": tls: failed to verify certificate: x509: “kube-apiserver” certificate is not trusted

 ✓ Checking Submariner support for the CNI network plugin
 ✓ The detected CNI network plugin ("calico") is supported
 ✓ Calico CNI detected, checking if the Submariner IPPool pre-requisites are configured
 ✓ Checking gateway connections
 ✗ Checking route agent connections
 ✗ Connection to cluster "rohan" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "rohan",
    "cable_name": "submariner-cable-rohan-9-46-87-198",
    "healthCheckIP": "243.0.255.254",
    "hostname": "stretch-calico-2-master.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "9.46.87.198",
    "public_ip": "129.41.87.6",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "rohan" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "rohan",
    "cable_name": "submariner-cable-rohan-9-46-87-198",
    "healthCheckIP": "243.0.255.254",
    "hostname": "stretch-calico-2-master.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "9.46.87.198",
    "public_ip": "129.41.87.6",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "rohan" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "rohan",
    "cable_name": "submariner-cable-rohan-9-46-87-198",
    "healthCheckIP": "243.0.255.254",
    "hostname": "stretch-calico-2-master.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "9.46.87.198",
    "public_ip": "129.41.87.6",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "rohan" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "rohan",
    "cable_name": "submariner-cable-rohan-9-46-87-198",
    "healthCheckIP": "243.0.255.254",
    "hostname": "stretch-calico-2-master.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "9.46.87.198",
    "public_ip": "129.41.87.6",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "rohan" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "rohan",
    "cable_name": "submariner-cable-rohan-9-46-87-198",
    "healthCheckIP": "243.0.255.254",
    "hostname": "stretch-calico-2-master.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "9.46.87.198",
    "public_ip": "129.41.87.6",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "rohan" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "rohan",
    "cable_name": "submariner-cable-rohan-9-46-87-198",
    "healthCheckIP": "243.0.255.254",
    "hostname": "stretch-calico-2-master.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "9.46.87.198",
    "public_ip": "129.41.87.6",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✗ Connection to cluster "rohan" is not established. Connection details:
{
  "status": "error",
  "statusMessage": "Failed to successfully ping the remote endpoint IP \"243.0.255.254\"",
  "spec": {
    "cluster_id": "rohan",
    "cable_name": "submariner-cable-rohan-9-46-87-198",
    "healthCheckIP": "243.0.255.254",
    "hostname": "stretch-calico-2-master.fyre.ibm.com",
    "subnets": [
      "243.0.0.0/16"
    ],
    "private_ip": "9.46.87.198",
    "public_ip": "129.41.87.6",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
 ✓ Checking Submariner support for the kube-proxy mode
 ✓ The kube-proxy mode is supported
 ✗ Checking that firewall configuration allows intra-cluster VXLAN traffic
 ✗ The tcpdump output from the sniffer pod does not contain the expected remote endpoint IP 243.0.0.0. Please check that your firewall configuration allows UDP/4800 traffic. Actual pod output:
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vx-submariner, link-type EN10MB (Ethernet), snapshot length 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

 ✓ Checking that Globalnet is correctly configured and functioning

 ✓ Checking that services have been exported properly

Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.

subctl version: v0.19.0
Aswin 🔥🔥🔥 $ kubectl get IPPool   --kubeconfig kubeconfig/rohan
NAME                            AGE
default-ipv4-ippool             8h
submariner-shibu-243.1.0.0-16   27m
Aswin 🔥🔥🔥 $ kubectl get IPPool   --kubeconfig kubeconfig/shibu
NAME                            AGE
default-ipv4-ippool             8h
submariner-rohan-243.0.0.0-16   27m
Aswin 🔥🔥🔥 $

@yboaron could you please advise?
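The Globalnet CIDR overlap check in the subctl diagnose output above never actually ran. Fetching the Broker's REST config failed with an untrusted-certificate error. The following minimal sketch is a manual substitute. It compares every Globalnet and ClustersetIP range printed by the subctl join commands above, pairwise. The list layout and the helper name overlapping_pairs are illustrative assumptions.

```python
import ipaddress
from itertools import combinations

# (cluster, purpose, CIDR) as allocated by "subctl join" in this thread.
ALLOCATED = [
    ("rohan", "globalnet", "243.0.0.0/16"),
    ("rohan", "clustersetip", "243.0.0.0/20"),
    ("shibu", "globalnet", "243.1.0.0/16"),
    ("shibu", "clustersetip", "243.0.16.0/20"),
]


def overlapping_pairs(ranges):
    """Hypothetical helper: yield pairs of (cluster, purpose) labels whose CIDRs overlap."""
    for (c1, p1, n1), (c2, p2, n2) in combinations(ranges, 2):
        if ipaddress.ip_network(n1).overlaps(ipaddress.ip_network(n2)):
            yield (c1, p1), (c2, p2)


if __name__ == "__main__":
    for a, b in overlapping_pairs(ALLOCATED):
        print("overlap:", a, b)
```

With these values, the sketch flags both ClustersetIP ranges (243.0.0.0/20 and 243.0.16.0/20) as lying inside rohan's Globalnet range 243.0.0.0/16. The thread does not establish whether this overlap contributes to the connectivity failure.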

@aswinayyolath
Author

subctl gather output for both clusters:

stretch-calico-2.zip

rak-5.zip

@aswinayyolath
Author

subctl gather output from the working Kind cluster:

cluster1.zip

@aswinsuryan
Contributor

aswinsuryan commented Jan 7, 2025

It seems the working cluster uses VXLAN encapsulation between nodes, while the cluster with the issue uses IP-in-IP (IPIP) mode. @yboaron, do we have an issue with IPIP mode when using Calico?

@yboaron
Contributor

yboaron commented Jan 7, 2025


Yes, we have tested Submariner with Calico using VXLAN encapsulation.
Please change Calico's intra-cluster encapsulation to VXLAN and see if that helps.
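To compare the encapsulation used by the two setups, the following minimal sketch reads the ipipMode and vxlanMode fields of each IPPool. It assumes JSON input in the form produced by kubectl get ippools -o json, which has the same structure as the YAML listings in this thread. The treatment of missing fields as "Never" follows the IPPool CRD descriptions shown earlier. The helper names are hypothetical, and the sketch only inspects pool settings, not the rest of the CNI configuration.

```python
import json


def pool_encapsulation(spec: dict) -> str:
    """Hypothetical helper: classify an IPPool spec by tunnelling mode.
    Per the CRD descriptions, an unset ipipMode/vxlanMode means "Never"."""
    ipip = spec.get("ipipMode", "Never")
    vxlan = spec.get("vxlanMode", "Never")
    if ipip != "Never" and vxlan != "Never":
        return "both IPIP and VXLAN set"
    if ipip != "Never":
        return f"IPIP ({ipip})"
    if vxlan != "Never":
        return f"VXLAN ({vxlan})"
    return "none"


def summarize(ippool_list_json: str) -> dict:
    """Map each IPPool name to its encapsulation, given a List of IPPools as JSON."""
    pools = json.loads(ippool_list_json)["items"]
    return {p["metadata"]["name"]: pool_encapsulation(p.get("spec", {})) for p in pools}
```

For example, pass it the captured stdout of kubectl get ippools -o json. Applied to the default-ipv4-ippool spec shown later in this thread (ipipMode: Never, vxlanMode: Always), it reports "VXLAN (Always)".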

@aswinayyolath
Author

Does this look correct?

Aswin 🔥🔥🔥 $ kubectl get ippools -A  --kubeconfig rohan -o yaml
apiVersion: v1
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: "2025-01-06T04:45:09Z"
    name: default-ipv4-ippool
    resourceVersion: "310349"
    uid: 29a06c8c-47de-4ef7-80a0-fd7bf99624ec
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 26
    cidr: 192.168.0.0/16
    ipipMode: Never
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Always
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: "2025-01-07T11:01:45Z"
    labels:
      submariner.io/ippool: "true"
    name: submariner-shibu-244.1.0.0-16
    resourceVersion: "315166"
    uid: 8bcb804d-24fc-427a-a29d-7b8fcb3b3907
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 26
    cidr: 244.1.0.0/16
    disableBGPExport: true
    disabled: true
    ipipMode: Never
    nodeSelector: all()
    vxlanMode: Never
kind: List
metadata:
  resourceVersion: ""
Aswin 🔥🔥🔥 $
Aswin 🔥🔥🔥 $ kubectl get ippools -A  --kubeconfig shibu -o yaml
apiVersion: v1
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: "2025-01-06T04:57:58Z"
    name: default-ipv4-ippool
    resourceVersion: "351735"
    uid: 333e2785-699e-495e-87f4-e4783e6eab6c
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 26
    cidr: 192.168.0.0/16
    ipipMode: Never
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Always
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    creationTimestamp: "2025-01-07T11:01:47Z"
    labels:
      submariner.io/ippool: "true"
    name: submariner-rohan-244.0.0.0-16
    resourceVersion: "355951"
    uid: 8935212e-62bd-4949-9295-a7f07b5d3f53
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 26
    cidr: 244.0.0.0/16
    disableBGPExport: true
    disabled: true
    ipipMode: Never
    nodeSelector: all()
    vxlanMode: Never
kind: List
metadata:
  resourceVersion: ""
Aswin 🔥🔥🔥 $
Aswin 🔥🔥🔥 $ subctl diagnose firewall intra-cluster --kubeconfig kubeconfig/rohan
Cluster "stretch-calico-2"
 ✗ Checking that firewall configuration allows intra-cluster VXLAN traffic
 ✗ The tcpdump output from the sniffer pod does not contain the expected remote endpoint IP 244.1.0.0. Please check that your firewall configuration allows UDP/4800 traffic. Actual pod output:
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vx-submariner, link-type EN10MB (Ethernet), snapshot length 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel


subctl version: v0.19.0
Aswin 🔥🔥🔥 $ kubectl get routeagents -o yaml -n submariner-operator --kubeconfig kubeconfig/rohan
apiVersion: v1
items:
- apiVersion: submariner.io/v1
  kind: RouteAgent
  metadata:
    creationTimestamp: "2025-01-07T11:00:52Z"
    generation: 3
    name: stretch-calico-2-master.fyre.ibm.com
    namespace: submariner-operator
    resourceVersion: "316267"
    uid: 0b9292cb-35ed-46a0-96dd-42f466be9008
  status:
    remoteEndpoints:
    - spec:
        backend: libreswan
        backend_config:
          natt-discovery-port: "4490"
          preferred-server: "false"
          udp-port: "4500"
        cable_name: submariner-cable-shibu-10-21-98-97
        cluster_id: shibu
        healthCheckIP: 244.1.255.254
        hostname: rak-5-worker-1.fyre.ibm.com
        nat_enabled: true
        private_ip: 10.21.98.97
        public_ip: 129.41.87.4
        subnets:
        - 244.1.0.0/16
      status: error
      statusMessage: Failed to successfully ping the remote endpoint IP "244.1.255.254"
    statusFailure: ""
    version: release-0.19-63bfdce6ad6e
- apiVersion: submariner.io/v1
  kind: RouteAgent
  metadata:
    creationTimestamp: "2025-01-07T11:00:52Z"
    generation: 2
    name: stretch-calico-2-worker-1.fyre.ibm.com
    namespace: submariner-operator
    resourceVersion: "315186"
    uid: 6d3f06ba-88ca-4615-8f5d-954a60181707
  status:
    remoteEndpoints:
    - spec:
        backend: libreswan
        backend_config:
          natt-discovery-port: "4490"
          preferred-server: "false"
          udp-port: "4500"
        cable_name: submariner-cable-shibu-10-21-98-97
        cluster_id: shibu
        healthCheckIP: 244.1.255.254
        hostname: rak-5-worker-1.fyre.ibm.com
        nat_enabled: true
        private_ip: 10.21.98.97
        public_ip: 129.41.87.4
        subnets:
        - 244.1.0.0/16
      status: none
      statusMessage: Health check is not performed on gateway nodes
    statusFailure: ""
    version: release-0.19-63bfdce6ad6e
- apiVersion: submariner.io/v1
  kind: RouteAgent
  metadata:
    creationTimestamp: "2025-01-07T11:00:51Z"
    generation: 3
    name: stretch-calico-2-worker-2.fyre.ibm.com
    namespace: submariner-operator
    resourceVersion: "316260"
    uid: eb3318e7-4e43-44f9-8fa2-779aeaeb7f25
  status:
    remoteEndpoints:
    - spec:
        backend: libreswan
        backend_config:
          natt-discovery-port: "4490"
          preferred-server: "false"
          udp-port: "4500"
        cable_name: submariner-cable-shibu-10-21-98-97
        cluster_id: shibu
        healthCheckIP: 244.1.255.254
        hostname: rak-5-worker-1.fyre.ibm.com
        nat_enabled: true
        private_ip: 10.21.98.97
        public_ip: 129.41.87.4
        subnets:
        - 244.1.0.0/16
      status: error
      statusMessage: Failed to successfully ping the remote endpoint IP "244.1.255.254"
    statusFailure: ""
    version: release-0.19-63bfdce6ad6e
- apiVersion: submariner.io/v1
  kind: RouteAgent
  metadata:
    creationTimestamp: "2025-01-07T11:00:51Z"
    generation: 3
    name: stretch-calico-2-worker-3.fyre.ibm.com
    namespace: submariner-operator
    resourceVersion: "316258"
    uid: 1b7dfd8a-7e24-41be-83f0-8a00d33affca
  status:
    remoteEndpoints:
    - spec:
        backend: libreswan
        backend_config:
          natt-discovery-port: "4490"
          preferred-server: "false"
          udp-port: "4500"
        cable_name: submariner-cable-shibu-10-21-98-97
        cluster_id: shibu
        healthCheckIP: 244.1.255.254
        hostname: rak-5-worker-1.fyre.ibm.com
        nat_enabled: true
        private_ip: 10.21.98.97
        public_ip: 129.41.87.4
        subnets:
        - 244.1.0.0/16
      status: error
      statusMessage: Failed to successfully ping the remote endpoint IP "244.1.255.254"
    statusFailure: ""
    version: release-0.19-63bfdce6ad6e
kind: List
metadata:
  resourceVersion: ""
Aswin 🔥🔥🔥 $
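The RouteAgent listing above is long. The following minimal sketch condenses it into one row per node and remote cluster. It assumes the JSON form of the same output (kubectl get routeagents -n submariner-operator -o json). The field names are taken from the YAML above, and the helper names are hypothetical.

```python
import json


def route_agent_health(routeagent_list_json: str) -> list:
    """Hypothetical helper: return (node, remote cluster, status, message) rows
    from a List of RouteAgent objects."""
    rows = []
    for agent in json.loads(routeagent_list_json)["items"]:
        node = agent["metadata"]["name"]
        for ep in agent.get("status", {}).get("remoteEndpoints", []):
            rows.append((
                node,
                ep["spec"]["cluster_id"],
                ep.get("status", ""),
                ep.get("statusMessage", ""),
            ))
    return rows


def failing_nodes(rows: list) -> list:
    """Sorted names of nodes whose health check toward a remote endpoint reported an error."""
    return sorted({node for node, _, status, _ in rows if status == "error"})
```

Applied to the listing above, every rohan node reports an error for the health check to 244.1.255.254, except stretch-calico-2-worker-1.fyre.ibm.com. That node reports "Health check is not performed on gateway nodes".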

@aswinsuryan
Contributor

The encapsulation needs to be changed in the CNI configuration, not just in the IP pools.

@aswinsuryan
Contributor

Since this setup is no longer in use and we cannot validate whether changing the encapsulation solves the problem, shall we close this issue? Please feel free to reopen it if the problem occurs again.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Submariner 0.20 Jan 14, 2025
Projects
Status: Done
Development

No branches or pull requests

5 participants