Fix: De-centralize the network to support multiple clusters #82

Merged
merged 1 commit into from
Jul 13, 2024

Conversation

codinja1188
Contributor

@codinja1188 codinja1188 commented Jun 25, 2024

Description:

This pull request introduces changes to decentralize the network setup in the Terraform Equinix Metal Nutanix cluster module to support multiple clusters. Key modifications include:

  • Network Decentralization: Adjustments in network configurations to facilitate multiple cluster setups.
  • Variable Updates: Introduction of optional variables for netmask and other network parameters, allowing more flexible configurations.
  • Code Refinements: Various updates to templates and scripts to ensure proper functionality and connectivity between clusters.

Issues fixed:

https://github.com/equinix-labs/terraform-equinix-metal-nutanix-cluster/issues/74

@codinja1188
Contributor Author

@displague ,

Here are some snapshots

VRF: [image]

Metal Gateways: [image]

main.tf Outdated
@@ -60,7 +61,8 @@ resource "equinix_metal_device" "bastion" {
user_data = templatefile("${path.module}/templates/bastion-userdata.tmpl", {
metal_vlan_id = local.vxlan,
address = cidrhost(var.cluster_subnet, 2),
- netmask = cidrnetmask(var.cluster_subnet),
+ netmask = cidrnetmask(cidrsubnet(var.cluster_subnet, -1, -1)),
Member

Netmask should be an optional variable of the module.

When netmask is given, it will be used. Your example will pass the VRF /21 netmask into this module.

When netmask is omitted (default, empty), a local will calculate it using the cidrnetmask function.

Since netmask is only used in one place in the userdata script, this seems sufficient:
https://github.com/equinix-labs/terraform-equinix-metal-nutanix-cluster/blob/main/templates/bastion-userdata.tmpl#L44
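
The optional-netmask pattern described in this comment could look roughly like the sketch below. The variable name `cluster_netmask` and the local name `netmask` are hypothetical, chosen for illustration; the thread does not show what the merged code calls them.

```hcl
# Hypothetical optional input: an empty string means "derive it".
variable "cluster_netmask" {
  description = "Netmask passed to the bastion userdata. Leave empty to derive it from cluster_subnet."
  type        = string
  default     = ""
}

variable "cluster_subnet" {
  description = "Cluster subnet in CIDR notation"
  type        = string
}

locals {
  # Prefer the caller's value; otherwise fall back to cidrnetmask().
  netmask = var.cluster_netmask != "" ? var.cluster_netmask : cidrnetmask(var.cluster_subnet)
}
```

With this shape, a multi-cluster example can pass the wider VRF netmask explicitly, while single-cluster callers leave the variable unset and get the mask of `cluster_subnet`.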

main.tf Outdated
@@ -134,7 +136,8 @@ resource "equinix_metal_device" "nutanix" {
wait_for_reservation_deprovision = length(var.nutanix_reservation_ids) > 0

ip_address {
- type = "private_ipv4"
+ type = "private_ipv4"
+ cidr = 21
Member

The private_addr CIDR assigned on equinix_metal_device create cannot be a /21.

ip_address represents the Layer3, per-server, IP assignments.

Modifying the VRF range to represent a /21 would be done in your example (#68), not in the main module.

The range passed to this module would be a /22, from which the gateway would be derived, or the gateway IP could also be passed in as a variable.
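
To make the /21 versus /22 arithmetic concrete, here is a minimal sketch of how an example configuration might split one VRF range into per-cluster subnets and derive each gateway. The range `192.168.96.0/21` and all local names are assumptions for illustration; the thread only shows addresses in 192.168.96.x.

```hcl
locals {
  # Assumed example VRF range (not confirmed by the thread).
  vrf_range = "192.168.96.0/21"

  # Adding one prefix bit splits the /21 into two /22 networks.
  cluster_a_subnet = cidrsubnet(local.vrf_range, 1, 0) # 192.168.96.0/22
  cluster_b_subnet = cidrsubnet(local.vrf_range, 1, 1) # 192.168.100.0/22

  # First host address of each /22, used here as that cluster's gateway.
  cluster_a_gateway = cidrhost(local.cluster_a_subnet, 1) # 192.168.96.1
  cluster_b_gateway = cidrhost(local.cluster_b_subnet, 1) # 192.168.100.1
}
```

Each /22 would then be passed to one instance of the module as `cluster_subnet`, keeping the /21 itself in the example rather than in the main module.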

@@ -51,12 +51,15 @@ write_files:
dhcp-range=${host_dhcp_start},${host_dhcp_end},${lease_time}
dhcp-mac=set:${set},${nutanix_mac}
dhcp-range=tag:${set},${vm_dhcp_start},${vm_dhcp_end},${lease_time}
dhcp-option=option:netmask,${netmask}
Member

I'm curious how this was working without the netmask previously defined. Perhaps the default behavior was to use the netmask and gateway from the host's interface where the DHCP range fits.

I think these explicit definitions do make sense.

Contributor Author

Actually, it was blocking connectivity to the outside: the image could not be downloaded.

packages:
- iptables-persistent
- expect
- sshpass
- dnsmasq
runcmd:
- sysctl -p /etc/sysctl.d/10-ip-forwarding.conf
- sysctl -w net.ipv4.ip_forward=1
Member

This line is redundant to the line above it.

Contributor Author

Updated in the latest commits.

@codinja1188
Contributor Author

codinja1188 commented Jun 28, 2024

@displague,

How can I verify whether the Metal Gateways are reachable?

@codinja1188
Contributor Author

codinja1188 commented Jun 28, 2024

@displague,

In both clusters, the hosts (bastion, Nutanix AHV, CVM) can reach their own gateway IPs:

admin@NTNX-7WWG2N3-A-CVM:192.168.96.18:~$ ping 192.168.96.1
PING 192.168.96.1 (192.168.96.1) 56(84) bytes of data.
64 bytes from 192.168.96.1: icmp_seq=1 ttl=64 time=0.187 ms
64 bytes from 192.168.96.1: icmp_seq=2 ttl=64 time=0.204 ms
64 bytes from 192.168.96.1: icmp_seq=3 ttl=64 time=0.191 ms
^C
--- 192.168.96.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.187/0.194/0.204/0.007 ms
admin@NTNX-7WWG2N3-A-CVM:192.168.96.18:~$ exit
logout
Connection to 192.168.96.18 closed.
[root@NTNX-7WWG2N3-A ~]# ping 192.168.96.1
PING 192.168.96.1 (192.168.96.1) 56(84) bytes of data.
64 bytes from 192.168.96.1: icmp_seq=1 ttl=64 time=0.363 ms
64 bytes from 192.168.96.1: icmp_seq=2 ttl=64 time=0.154 ms
64 bytes from 192.168.96.1: icmp_seq=3 ttl=64 time=0.148 ms
64 bytes from 192.168.96.1: icmp_seq=4 ttl=64 time=0.169 ms
^C
--- 192.168.96.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3057ms
rtt min/avg/max/mdev = 0.148/0.208/0.363/0.090 ms
[root@NTNX-7WWG2N3-A ~]# exit
logout
Connection to 192.168.96.4 closed.
root@bastion:~# ping 192.168.96.1 -c 5
PING 192.168.96.1 (192.168.96.1) 56(84) bytes of data.
64 bytes from 192.168.96.1: icmp_seq=1 ttl=64 time=0.189 ms
64 bytes from 192.168.96.1: icmp_seq=2 ttl=64 time=0.216 ms
64 bytes from 192.168.96.1: icmp_seq=3 ttl=64 time=0.274 ms
64 bytes from 192.168.96.1: icmp_seq=4 ttl=64 time=0.177 ms
64 bytes from 192.168.96.1: icmp_seq=5 ttl=64 time=0.240 ms

--- 192.168.96.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4090ms
rtt min/avg/max/mdev = 0.177/0.219/0.274/0.035 ms

Cluster A cannot reach Cluster B's gateway. Do you think we need to add any firewall rules?

@codinja1188 codinja1188 marked this pull request as ready for review July 1, 2024 15:42
@codinja1188
Contributor Author

@displague ,

I can successfully ping VIPs between clusters.

variables.tf Outdated
variable "cluster_gateway" {
description = "The cluster gateway IP address"
type = string
default = "192.168.96.1"
Contributor

Why is this the default value?

Contributor Author

@ctreatma ,

Updated in the latest commits. Could you take a quick look?

Contributor Author

I removed the default value and made it an optional user input. If the user does not provide it, the module derives it from the mandatory variable cluster_subnet.
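
A minimal sketch of that fallback follows. It assumes an empty-string default and host index 1 for the gateway; neither detail is shown in the thread, although index 1 matches the 192.168.96.1 gateway pinged earlier and leaves index 2 for the bastion.

```hcl
variable "cluster_subnet" {
  description = "Cluster subnet in CIDR notation (mandatory)"
  type        = string
}

variable "cluster_gateway" {
  description = "The cluster gateway IP address. Leave empty to derive it from cluster_subnet."
  type        = string
  default     = ""
}

locals {
  # Use the caller's gateway when set; otherwise take the first host in the subnet.
  cluster_gateway = var.cluster_gateway != "" ? var.cluster_gateway : cidrhost(var.cluster_subnet, 1)
}
```

The rest of the module would then reference `local.cluster_gateway` rather than the variable, so both paths behave the same downstream.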

@codinja1188
Contributor Author

@displague ,

PR ready for review

main.tf Outdated
@@ -20,6 +21,10 @@ resource "terraform_data" "input_validation" {
condition = (var.metal_project_name != "" && var.metal_project_id == "") || (var.metal_project_name == "" && var.metal_project_id != "")
error_message = "One (and only one) of `metal_project_name` or `metal_project_id` is required"
}
precondition {
condition = (var.cluster_subnet != "" && var.cluster_subnet == "")
Member

This condition is impossible. I don't believe we need a precondition for this at all.

Did you mean to check if var.cluster_subnet is not empty but local.cluster_gateway is empty, in which case cluster_gateway would be required?
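
For context, `var.cluster_subnet != "" && var.cluster_subnet == ""` can never be true, so a precondition built on it would fail on every run. If a check on this input is wanted at all, one satisfiable option is sketched below. It is an assumption about what could be useful here, not what the PR ended up doing: it only verifies that `cluster_subnet` parses as a CIDR block.

```hcl
variable "cluster_subnet" {
  description = "Cluster subnet in CIDR notation, for example 192.168.100.0/22"
  type        = string
}

resource "terraform_data" "input_validation" {
  lifecycle {
    precondition {
      # cidrhost() errors on anything that is not a valid CIDR prefix
      # (including an empty string), and can() turns that error into false.
      condition     = can(cidrhost(var.cluster_subnet, 1))
      error_message = "`cluster_subnet` must be a valid CIDR block such as 192.168.100.0/22."
    }
  }
}
```

Because `cluster_gateway` falls back to a derived value when it is omitted, a "gateway is required" precondition may not be needed.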

@codinja1188
Contributor Author

@ctreatma / @displague ,

data.local_file.cvm_ip_address: Reading...
data.local_file.cvm_ip_address: Read complete after 0s [id=a77db81fae96a895fc3db05af25db50e6fc84a91]
╷
│ Error: error reserving IP address block: json: cannot unmarshal array into Go struct field ErrorResponse.errors of type string
│
│   with equinix_metal_reserved_ip_block.nutanix,
│   on main.tf line 109, in resource "equinix_metal_reserved_ip_block" "nutanix":
│  109: resource "equinix_metal_reserved_ip_block" "nutanix" {
│

Are there any known issues on the infrastructure side? I observed the same error on the main branch too.

@ctreatma
Contributor

ctreatma commented Jul 11, 2024

@codinja1188 I think someone else ran into a similar problem recently. Could you run with TF_LOG=debug in order to log HTTP details to stdout and share the request URL and response body that led to this error?

One thing to note is that the parse error is happening on an error response from the API, which likely means there's something wrong with the attributes being passed in to terraform rather than a problem inside the terraform provider. When we've seen this parse error before, it seemed to be triggered by sending invalid IP addresses to the API, so you should double-check that you're not including unnecessary slashes or cidr notation in the network attribute or other attributes of the equinix_metal_reserved_ip_block resource.
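
As an illustration of that advice, the sketch below keeps the slash out of `network` by splitting a CIDR string into its address and prefix length. It assumes the block is reserved from a VRF (`type = "vrf"`); the variables `metal_vrf_id` and `metal_metro` are hypothetical names, and the module's actual resource may be configured differently.

```hcl
variable "cluster_subnet" {
  description = "Cluster subnet in CIDR notation, for example 192.168.100.0/22"
  type        = string
}

variable "metal_project_id" {
  type = string
}

# Hypothetical inputs for this sketch.
variable "metal_vrf_id" {
  type = string
}

variable "metal_metro" {
  type = string
}

resource "equinix_metal_reserved_ip_block" "nutanix" {
  project_id = var.metal_project_id
  metro      = var.metal_metro
  type       = "vrf"
  vrf_id     = var.metal_vrf_id

  # Bare network address, with no "/22" suffix.
  network = cidrhost(var.cluster_subnet, 0)

  # Prefix length as a number, taken from the part after the slash.
  cidr = tonumber(split("/", var.cluster_subnet)[1])
}
```

Passing the full `192.168.100.0/22` string as `network` is the kind of input that may trigger the unmarshal error described above.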

@codinja1188
Contributor Author

@displague / @ctreatma ,

I verified that it's working. Please check and approve.

@displague displague merged commit dff27b9 into equinix-labs:main Jul 13, 2024
1 check passed