Nutanix Cluster on Equinix Metal

This Terraform module deploys a proof-of-concept Nutanix Cluster in Layer 2 isolation on Equinix Metal. DNS, DHCP, and cluster internet access are managed by an Ubuntu 22.04 bastion/gateway node.

See https://deploy.equinix.com/labs/terraform-equinix-metal-nutanix-cluster/ for a workshop and recorded demonstration of this Terraform module.

Acronyms and Terms

  • AOS: Acropolis Operating System
  • NOS: Nutanix Operating System (Used interchangeably with AOS)
  • AHV: AOS Hypervisor
  • Phoenix: The AOS/NOS Installer
  • CVM: Controller Virtual Machine
  • Prism: AOS Cluster Web UI

Nutanix Installation in a Nutshell

For those who are unfamiliar with it, Nutanix is HCI (Hyperconverged Infrastructure) software. See https://www.nutanix.com/products/nutanix-cloud-infrastructure for more details from Nutanix.

Nutanix AOS is typically deployed in a private network without public IPs assigned directly to the host. This experience differs from what many cloud users would expect in an OS deployment.

This POC Terraform module is inspired by the Deploying a multi-node Nutanix cluster on Metal guide which goes into detail about how to deploy Nutanix and the required networking configuration on Equinix Metal. Follow that guide for step-by-step instructions that you can customize along the way.

By deploying this POC Terraform module, you will get an automated and opinionated minimal Nutanix Cluster that will help provide a quick introduction to the platform's capabilities.

⚠️ Warning: This project is NOT intended to demonstrate best practices or Day-2 operations, including security, scale, monitoring, and disaster recovery. Working with your account representative to run this demo with workload-optimized servers is HIGHLY recommended.

To accommodate deployment requirements, this module will create:

ℹ️ Important: Workload Optimized hardware reservations are required for capacity and to ensure hardware compatibility with Nutanix. This project will deploy On-demand instances by default, which requires authorized access to a special nutanix_lts_6_5_poc OS image. See "On-Demand Instances" notes below for more details.

  • 1x VLAN and Metal Gateway with VRF

    The VRF will route a /22 IP range within the VLAN, providing ample IP space for POC purposes.

    The bastion node will attach to this VLAN, and the Nutanix nodes will use it as their native VLAN, receiving DHCP addresses from the VRF IP space via the bastion node.

  • 1x SSH Key configured to access the bastion node

    Terraform will create an SSH key scoped to this deployment. The key will be stored in the Terraform workspace.

  • 1x Metal Project

    Optionally deploy a new project to test the POC in isolation, or deploy it within an existing project (a sketch of reusing an existing project follows this list).
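To point the module at an existing project rather than have it create one, a minimal sketch using Terraform's TF_VAR_ environment variables is shown below. The variable names come from the Inputs table later in this document; the project ID is a placeholder.

export TF_VAR_create_project=false                     # do not create a new Metal project
export TF_VAR_metal_project_id="existing-project-uuid" # placeholder: UUID of your existing project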

Terraform installation

You'll need Terraform installed and an Equinix Metal account with an API key.

If you have the Metal CLI configured, the following will set up your authentication and project settings in a macOS or Linux shell environment.

eval $(metal env -o terraform --export) # Export your Metal CLI auth token and project settings as TF_VAR_ variables
export TF_VAR_metal_metro=sl # Deploy to Seoul

Otherwise, copy terraform.tfvars.example to terraform.tfvars and edit the input values before continuing.
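Equivalently, you can write a minimal terraform.tfvars directly from the shell. The sketch below covers only the two required inputs (see the Inputs table later in this document); the values shown are placeholders.

cat > terraform.tfvars <<'EOF'
metal_auth_token = "your-equinix-metal-api-token" # placeholder
metal_metro      = "sl"                           # placeholder: any supported metro code
EOF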

Run the following from your console terminal:

terraform init
terraform apply

When complete, after roughly 45m, you'll see something like the following:

Outputs:

bastion_public_ip = "ipv4-address"
nutanix_sos_hostname = [
  "instance-uuid@sos.facility.platformequinix.com",
  "instance-uuid@sos.facility.platformequinix.com",
  "instance-uuid@sos.facility.platformequinix.com",
]
ssh_private_key = "/terraform/workspace/ssh-key-abc123"
ssh_forward_command = "ssh -L 9440:1.2.3.4:9440 -i /terraform/workspace/ssh-key-abc123 root@ipv4-address"

See "Known Problems" if your terraform apply does not complete successfully.

Next Steps

You have several ways to access the bastion node, Nutanix nodes, and the cluster. The following sections offer commands to access the nodes. See the additional examples for more ways to use this Terraform module.
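The commands in these sections rely on Terraform outputs from this workspace. You can re-display them at any time:

terraform output                          # list all outputs
terraform output -raw ssh_forward_command # print a single output, for example the port-forward command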

Login to Prism GUI

  • First, create an SSH port forward session with the bastion host:

    Mac or Linux

    $(terraform output -raw ssh_forward_command)

    Windows

    invoke-expression $(terraform output -raw ssh_forward_command)
  • Then open a browser and navigate to https://localhost:9440 (the certificate will not match the domain)

  • See Logging Into Prism Central for more details (including default credentials)

Access the Bastion host over SSH

For access to the bastion node, for troubleshooting the installation, or for network access to the Nutanix nodes, you can SSH into the bastion host using:

ssh -i $(terraform output -raw ssh_private_key) root@$(terraform output -raw bastion_public_ip)

Access the Nutanix nodes over SSH

You can open a direct SSH session to the Nutanix nodes using the bastion host as a jumpbox. Debug details from the Cluster install can be found within /home/nutanix/data/logs.

ssh -i $(terraform output -raw ssh_private_key) -J root@$(terraform output -raw bastion_public_ip) nutanix@$(terraform output -raw cvim_ip_address)

Access the Nutanix nodes out-of-band

You can use the SOS (Serial-Over-SSH) interface for out-of-band access using the default credentials for Nutanix nodes.

ssh -i $(terraform output -raw ssh_private_key) $(terraform output -raw nutanix_sos_hostname[0]) # access the first node
ssh -i $(terraform output -raw ssh_private_key) $(terraform output -raw nutanix_sos_hostname[1]) # access the second node
ssh -i $(terraform output -raw ssh_private_key) $(terraform output -raw nutanix_sos_hostname[2]) # access the third node
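If your Terraform version does not accept the bracketed index in the commands above, an equivalent lookup using JSON output is shown below (this assumes jq is installed):

ssh -i $(terraform output -raw ssh_private_key) $(terraform output -json nutanix_sos_hostname | jq -r '.[0]') # access the first node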

Known Problems

On-Demand Instances

This POC allocates an m3.small.x86 node for the bastion host by default; you can change this to another instance type by setting the metal_bastion_plan variable.

This POC allocates m3.large.x86 instances for the Nutanix nodes. Please note that the nutanix_lts_6_5 and nutanix_lts_6_5_poc OS images are only available for certified hardware plans; not every on-demand m3.large.x86 node will work, so it is HIGHLY recommended to work with your account representative for access to workload-optimized servers.

If you have preapproved access to the nutanix_lts_6_5_poc OS images, at the time of writing, we recommend the SL (Seoul), AM (Amsterdam), and TR (Toronto) Metros for deployment.

If a Nutanix node fails to provision, run terraform apply again. A node that fails to provision with the Nutanix AOS will be automatically removed from your project, and Terraform will subsequently attempt to replace it.
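If you want Terraform to rebuild one specific Nutanix node on the next run, you can also force its replacement. The command below is a sketch that assumes the nodes are created with count (matching the indexed equinix_metal_device.nutanix resource listed under Resources); adjust the index to the node you want to recreate.

terraform apply -replace='equinix_metal_device.nutanix[0]' # recreate the first Nutanix node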

Production deployments should use qualified Workload Optimized instances for Nutanix nodes. Create a hardware reservation or contact Equinix Metal to obtain validated Nutanix compatible servers. You can also convert a successfully deployed on-demand instance to a hardware reservation. Hardware Reservations will ensure that you get the correct hardware for your Nutanix deployments. Remember that this project is for initial proof-of-concept builds only, not production deployments.

SSH failures while running on macOS

The Nutanix devices have sshd configured with MaxSessions 1. In most cases this is not a problem, but in our testing on macOS we observed frequent SSH connection errors. These connection errors can be resolved by turning off the SSH agent in your terminal before running terraform apply. To turn off your SSH agent in a macOS terminal, run unset SSH_AUTH_SOCK.
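For example, in the same terminal session used to run Terraform:

unset SSH_AUTH_SOCK # turn off the SSH agent for this shell
terraform apply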

Error messages that match this problem:

  • Error chmodding script file to 0777 in remote machine: ssh: rejected: administratively prohibited (open failed)
  • Failed to upload script: ssh: rejected: administratively prohibited (open failed)

VLAN Cleanup Failure

During the execution of a Terraform destroy operation, the deletion of a VLAN may fail with an HTTP 422 Unprocessable Entity response. The debug logs indicate that the DELETE request to remove the VLAN was sent successfully, but the response from the Equinix Metal API indicated a failure to process the request. The specific VLAN identified by the ID "xxxx" could not be deleted.

Fix:

If you encounter this issue, re-run the terraform destroy command to clean up the resources.

terraform destroy

Other Timeouts and Connection issues

This POC project has not ironed out all potential networking and provisioning timing hiccups that can occur. In many situations, running terraform apply again will progress the deployment to the next step. If you do not see progress after 3 attempts, open an issue on GitHub: https://github.com/equinix-labs/terraform-equinix-metal-nutanix-cluster/issues/new.
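If you would rather not re-run the command by hand, a simple retry sketch (up to three attempts, each auto-approved) could look like this; treat it as a POC convenience only.

for attempt in 1 2 3; do terraform apply -auto-approve && break; done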

Error messages that match this problem:

  • timeout while waiting for state to become 'active, failed' (last state: 'provisioning', timeout:

Examples

For additional examples of this module, please see the examples directory.

  • cluster-migration - Demonstrates migration of VMs between two Nutanix clusters in the same project and metro.
  • cluster-with-ad - Demonstrates configuration of connecting a Nutanix Cluster to Windows Active Directory.

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0 |
| equinix | >= 1.30 |
| local | >= 2.5 |
| null | >= 3 |
| random | >= 3 |

Providers

| Name | Version |
|------|---------|
| equinix | >= 1.30 |
| local | >= 2.5 |
| null | >= 3 |
| random | >= 3 |
| terraform | n/a |

Modules

| Name | Source | Version |
|------|--------|---------|
| ssh | ./modules/ssh/ | n/a |

Resources

| Name | Type |
|------|------|
| equinix_metal_device.bastion | resource |
| equinix_metal_device.nutanix | resource |
| equinix_metal_gateway.gateway | resource |
| equinix_metal_port.bastion_bond0 | resource |
| equinix_metal_port.nutanix | resource |
| equinix_metal_project.nutanix | resource |
| equinix_metal_reserved_ip_block.nutanix | resource |
| equinix_metal_vlan.nutanix | resource |
| equinix_metal_vrf.nutanix | resource |
| null_resource.finalize_cluster | resource |
| null_resource.get_cvm_ip | resource |
| null_resource.reboot_nutanix | resource |
| null_resource.wait_for_dhcp | resource |
| null_resource.wait_for_firstboot | resource |
| random_string.vrf_name_suffix | resource |
| terraform_data.input_validation | resource |
| equinix_metal_project.nutanix | data source |
| equinix_metal_vlan.nutanix | data source |
| equinix_metal_vrf.nutanix | data source |
| local_file.cvm_ip_address | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| metal_auth_token | Equinix Metal API token. | string | n/a | yes |
| metal_metro | The metro to create the cluster in. | string | n/a | yes |
| cluster_gateway | The cluster gateway IP address. | string | "" | no |
| cluster_name | The name of the Nutanix cluster, used as a prefix for resources. | string | "nutanix" | no |
| cluster_subnet | Nutanix cluster subnet. | string | "192.168.100.0/22" | no |
| create_project | (Optional) To use an existing project matching metal_project_name, set this to false. | bool | true | no |
| create_vlan | Whether to create a new VLAN for this project. | bool | true | no |
| create_vrf | Whether to create a new VRF for this project. | bool | true | no |
| metal_bastion_plan | Which plan to use for the bastion host. | string | "m3.small.x86" | no |
| metal_nutanix_os | Which OS to use for the Nutanix nodes. | string | "nutanix_lts_6_5" | no |
| metal_nutanix_plan | Which plan to use for the Nutanix nodes (must be Nutanix compatible, see https://deploy.equinix.com/developers/os-compatibility/). | string | "m3.large.x86" | no |
| metal_organization_id | The ID of the Metal organization in which to create the project if create_project is true. | string | null | no |
| metal_project_id | The ID of the Metal project in which to deploy the cluster. If create_project is false and you do not specify a project name, the project will be looked up by ID. One (and only one) of metal_project_name or metal_project_id is required or metal_project_id must be set. | string | "" | no |
| metal_project_name | The name of the Metal project in which to deploy the cluster. If create_project is false and you do not specify a project ID, the project will be looked up by name. One (and only one) of metal_project_name or metal_project_id is required or metal_project_id must be set. Required if create_project is true. | string | "" | no |
| metal_vlan_description | Description to add to created VLAN. | string | "ntnx-demo. Deployed with Terraform module terraform-equinix-metal-nutanix-cluster." | no |
| metal_vlan_id | ID of the VLAN you wish to use. | number | null | no |
| nutanix_node_count | The number of Nutanix nodes to create. | number | 3 | no |
| nutanix_reservation_ids | Hardware reservation IDs to use for the Nutanix nodes. If specified, the length of this list must be the same as nutanix_node_count. Each item can be a reservation UUID or next-available. If you use reservation UUIDs, make sure that they are in the same metro specified in metal_metro. | list(string) | [] | no |
| skip_cluster_creation | Skip the creation of the Nutanix cluster. | bool | false | no |
| vrf_id | ID of the VRF you wish to use. | string | null | no |

Outputs

| Name | Description |
|------|-------------|
| bastion_public_ip | The public IP address of the bastion host |
| cluster_gateway | The Nutanix cluster gateway IP |
| cvim_ip_address | The IP address of the CVM |
| iscsi_data_services_ip | Reserved IP for the cluster iSCSI Data Services IP |
| nutanix_metal_project_id | Project ID for the Nutanix cluster |
| nutanix_metal_vlan_id | VLAN ID for the Nutanix cluster |
| nutanix_sos_hostname | The SOS address of the Nutanix machine |
| prism_central_ip_address | Reserved IP for the Prism Central VM |
| ssh_forward_command | SSH port forward command to use to connect to the Prism GUI |
| ssh_key_id | The ID of the SSH keypair |
| ssh_private_key | The private key for the SSH keypair |
| ssh_private_key_contents | The private key contents for the SSH keypair |
| virtual_ip_address | Reserved IP for the cluster virtual IP |

Contributing

If you would like to contribute to this module, see the CONTRIBUTING page.

License

Apache License, Version 2.0. See LICENSE.