Vault nodes won't join raft cluster when using transit auto-unseal #29258

Open
derekrobertson opened this issue Dec 24, 2024 · 0 comments

Describe the bug
I have an upstream Vault Enterprise deployment that uses namespaces.
I created a namespace, /apps/myapp, and enabled a transit secrets engine within it.
Within that transit secrets engine I created auto-unseal keys, which I intend to use to auto-unseal a downstream Vault open-source cluster.

I applied the policy and generated a periodic orphan token following the process described in:
Auto-unseal Vault using transit secrets engine | Vault | HashiCorp Developer
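
For reference, the two upstream artefacts from that tutorial can be sketched as follows: a policy granting update on the transit key's encrypt and decrypt paths, and a request for a periodic orphan token made inside the namespace. This is a minimal sketch that only builds the policy text and the HTTP request; nothing is sent. The mount path and key name come from the seal stanza in this report; the policy name "autounseal", the 24h period and the example base URL are assumptions.

```python
import json
import urllib.request

NAMESPACE = "apps/myapp"   # namespace from this report
MOUNT = "transit"          # mount_path from the seal stanza
KEY = "autounseal"         # key_name from the seal stanza


def autounseal_policy(mount: str, key: str) -> str:
    """Policy HCL letting the seal encrypt/decrypt with one transit key."""
    return (
        f'path "{mount}/encrypt/{key}" {{\n'
        '  capabilities = ["update"]\n'
        '}\n'
        f'path "{mount}/decrypt/{key}" {{\n'
        '  capabilities = ["update"]\n'
        '}\n'
    )


def orphan_token_request(base_url: str, admin_token: str, namespace: str,
                         policy_name: str,
                         period: str = "24h") -> urllib.request.Request:
    """Build (but do not send) a request for a periodic orphan token.

    The period default is an assumption; pick one that suits your renewal setup.
    """
    body = json.dumps({"policies": [policy_name], "period": period}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/auth/token/create-orphan",
        data=body,
        method="POST",
        headers={
            "X-Vault-Token": admin_token,      # token allowed to create tokens
            "X-Vault-Namespace": namespace,    # scope the request to apps/myapp
            "Content-Type": "application/json",
        },
    )
```

Passing the result of orphan_token_request("https://vault.example.co.uk", admin_token, NAMESPACE, "autounseal") to urllib.request.urlopen would send it; the hostname there is a placeholder, since the real one is redacted in this report.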

On my downstream Vault open-source cluster, I added the seal stanza as described in the docs. The docs state that I can either:

  • Set the token field in the seal stanza, or
  • Use the VAULT_TOKEN environment variable instead.

https://developer.hashicorp.com/vault/docs/configuration/seal/transit#token
token (string: ): The Vault token to use. This may also be specified by the VAULT_TOKEN environment variable.
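
The two documented token sources can be modelled as below. This is a hypothetical sketch, not Vault's code: in particular, the assumption that an explicit token in the stanza wins over VAULT_TOKEN when both are set is mine and should be checked against the Vault source.

```python
from typing import Mapping, Optional


def resolve_seal_token(seal_config: Mapping[str, str],
                       environ: Mapping[str, str]) -> Optional[str]:
    """Hypothetical model of where the transit seal gets its token.

    Assumption: a non-empty 'token' in the seal stanza takes precedence
    over VAULT_TOKEN; Vault's real precedence may differ.
    """
    token = seal_config.get("token")
    if token:
        return token
    # Fall back to the process environment; treat an empty value as unset.
    return environ.get("VAULT_TOKEN") or None
```

The difference that matters for this report is scope: the stanza field is read only by the seal configuration, whereas VAULT_TOKEN is set for the whole vault server process.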

If I set the token field in the seal stanza, everything works as expected.

However, if I use the VAULT_TOKEN environment variable instead, the first node in the cluster auto-unseals successfully after vault operator init, but the other peer nodes cannot join the Raft cluster and log this error:

2024-12-20T10:40:09.426Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault-os-1.*****.*****.co.uk:8200
 error=
 | error during raft bootstrap init call: Error making API request.
 |
 | URL: PUT https://vault-os-1.*****.*****.co.uk:8200/v1/sys/storage/raft/bootstrap/challenge
 | Code: 500. Errors:
 |
 | * error performing token check: failed to look up namespace from the token: no namespace

Tested on Vault open source 1.17.2 and 1.18.3.

To Reproduce
Steps to reproduce the behavior:

  1. Have an upstream Vault Enterprise cluster that uses namespaces.
  2. Within a test namespace on the upstream cluster, create the unseal policy and a periodic orphan token as described in the HashiCorp docs linked above.
  3. Deploy a downstream three-node Vault open-source cluster using the vault.hcl files shown below.
  4. In /etc/vault.d/vault.env, set VAULT_TOKEN=<token>.
  5. On node 1, set VAULT_ADDR as normal, then run vault operator init. Vault unseals and presents the recovery keys and root token.
  6. The other nodes in the downstream Vault open-source cluster do NOT join the cluster; only the leader is listed:
vault operator raft list-peers
Node                                     Address                                       State     Voter
----                                     -------                                       -----     -----
vault-os-1.******.******.co.uk    vault-os-1.******.******.co.uk:8201    leader    true

Expected behavior
After running vault operator init on the first node, the other Raft nodes should join the cluster through the retry_join stanzas in their configuration.
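
To see why a persistent server-side error keeps followers out indefinitely, here is a simplified, hypothetical model of a retry_join loop: each configured leader_api_addr is tried in turn, and the pass repeats until one accepts the join. The attempt callback stands in for the bootstrap challenge call; Vault's actual loop, timing and stopping conditions may differ, and max_rounds exists only so the sketch terminates.

```python
import time
from typing import Callable, Optional, Sequence


def retry_join(leaders: Sequence[str],
               attempt: Callable[[str], bool],
               max_rounds: int = 3,
               delay_s: float = 0.0) -> Optional[str]:
    """Hypothetical retry_join model.

    Returns the first leader address whose join attempt succeeds, or None
    after max_rounds full passes over all configured leaders.
    """
    for _ in range(max_rounds):
        for addr in leaders:
            if attempt(addr):
                return addr
        # Wait before the next pass over all leaders.
        time.sleep(delay_s)
    return None
```

If every attempt fails the same way, for example with the 500 from the challenge endpoint shown above, no number of passes helps; the node stays outside the cluster.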

In a working environment, the output of vault operator raft list-peers should show the 3 nodes:

Node                           Address                             State      Voter
----                           -------                             -----      -----
vault-os-1.*****.*****.co.uk   vault-os-1.*****.*****.co.uk:8201   leader     true
vault-os-3.*****.*****.co.uk   vault-os-3.*****.*****.co.uk:8201   follower   true
vault-os-2.*****.*****.co.uk   vault-os-2.*****.*****.co.uk:8201   follower   true
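
To check this state from a script rather than by eye, the table printed by vault operator raft list-peers can be parsed as below. This is a minimal sketch that assumes the four whitespace-separated columns shown in this report; the expected count of 3 matches this cluster.

```python
from typing import List, NamedTuple


class Peer(NamedTuple):
    node: str
    address: str
    state: str
    voter: bool


def parse_list_peers(output: str) -> List[Peer]:
    """Parse `vault operator raft list-peers` table output.

    Skips the header row and the dashed separator row.
    """
    peers = []
    for line in output.strip().splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] in ("Node", "----"):
            continue
        node, address, state, voter = fields
        peers.append(Peer(node, address, state, voter == "true"))
    return peers


def cluster_healthy(peers: List[Peer], expected: int = 3) -> bool:
    """True if there are `expected` voters and exactly one leader."""
    voters = [p for p in peers if p.voter]
    leaders = [p for p in peers if p.state == "leader"]
    return len(voters) == expected and len(leaders) == 1
```

Run against the failing output from step 6, cluster_healthy returns False, because only the leader is present.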

Environment:

  • Vault Server Version (retrieve with vault status): 1.18.3

  • Vault CLI Version (retrieve with vault version): Vault v1.18.3 (7ae4eca), built 2024-12-16T14:00:53Z

  • Server Operating System/Architecture: Tested on Ubuntu 22.04 and OEL 9u5

  • Vault sits behind an HAProxy layer 4 load balancer with a frontend on port 443 and a backend on port 8200.

Vault server configuration file(s):

#Node 1
# Full configuration options can be found at https://developer.hashicorp.com/vault/docs/configuration

ui = true

api_addr      = "https://vault-os.*****.******.co.uk:443"

cluster_name  = "vault"
cluster_addr  = "https://vault-os-1.*****.******.co.uk:8201"

disable_mlock = true

listener "tcp" {
  address            = "vault-os-1.*****.******.co.uk:8200"
  tls_cert_file      = "/opt/vault/tls/vault-cert.pem"
  tls_key_file       = "/opt/vault/tls/vault-key.pem"
  tls_client_ca_file = "/opt/vault/tls/vault-ca.pem"
}

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-os-1.*****.******.co.uk"

  retry_join {
    leader_tls_servername   = "vault-os-2.*****.******.co.uk"
    leader_api_addr         = "https://vault-os-2.*****.******.co.uk:8200"
    leader_client_cert_file = "/opt/vault/tls/vault-cert.pem"
    leader_client_key_file  = "/opt/vault/tls/vault-key.pem"
    leader_ca_cert_file     = "/opt/vault/tls/vault-ca.pem"
  }
  retry_join {
    leader_tls_servername   = "vault-os-3.*****.******.co.uk"
    leader_api_addr         = "https://vault-os-3.*****.******.co.uk:8200"
    leader_client_cert_file = "/opt/vault/tls/vault-cert.pem"
    leader_client_key_file  = "/opt/vault/tls/vault-key.pem"
    leader_ca_cert_file     = "/opt/vault/tls/vault-ca.pem"
  }
}


seal "transit" {
  address = "https://vault.*****.*****.co.uk"
  namespace = "apps/myapp"
  disable_renewal = "false"
  key_name = "autounseal"
  mount_path = "transit/"
  tls_skip_verify = "false"
}

enable_response_header_hostname = "true"
enable_response_header_raft_node_id = "true"

log_level = "info"
log_file = "/var/log/vault/vault.log"
log_rotate_duration = "24h"
log_rotate_max_files = "30"

#Node 2
# Full configuration options can be found at https://developer.hashicorp.com/vault/docs/configuration

ui = true

api_addr      = "https://vault-os.*****.*****.co.uk:443"

cluster_name  = "vault"
cluster_addr  = "https://vault-os-2.*****.*****.co.uk:8201"

disable_mlock = true

listener "tcp" {
  address            = "vault-os-2.*****.*****.co.uk:8200"
  tls_cert_file      = "/opt/vault/tls/vault-cert.pem"
  tls_key_file       = "/opt/vault/tls/vault-key.pem"
  tls_client_ca_file = "/opt/vault/tls/vault-ca.pem"
}

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-os-2.*****.*****.co.uk"

  retry_join {
    leader_tls_servername   = "vault-os-1.*****.*****.co.uk"
    leader_api_addr         = "https://vault-os-1.*****.*****.co.uk:8200"
    leader_client_cert_file = "/opt/vault/tls/vault-cert.pem"
    leader_client_key_file  = "/opt/vault/tls/vault-key.pem"
    leader_ca_cert_file     = "/opt/vault/tls/vault-ca.pem"
  }
  retry_join {
    leader_tls_servername   = "vault-os-3.*****.*****.co.uk"
    leader_api_addr         = "https://vault-os-3.*****.*****.co.uk:8200"
    leader_client_cert_file = "/opt/vault/tls/vault-cert.pem"
    leader_client_key_file  = "/opt/vault/tls/vault-key.pem"
    leader_ca_cert_file     = "/opt/vault/tls/vault-ca.pem"
  }
}


seal "transit" {
  address = "https://vault.*****.*****.co.uk"
  namespace = "apps/myapp"
  disable_renewal = "false"
  key_name = "autounseal"
  mount_path = "transit/"
  tls_skip_verify = "false"
}

enable_response_header_hostname = "true"
enable_response_header_raft_node_id = "true"

log_level = "info"
log_file = "/var/log/vault/vault.log"
log_rotate_duration = "24h"
log_rotate_max_files = "30"

#Node 3
# Full configuration options can be found at https://developer.hashicorp.com/vault/docs/configuration

ui = true

api_addr      = "https://vault-os.*****.*****.co.uk:443"

cluster_name  = "vault"
cluster_addr  = "https://vault-os-3.*****.*****.co.uk:8201"

disable_mlock = true

listener "tcp" {
  address            = "vault-os-3.*****.*****.co.uk:8200"
  tls_cert_file      = "/opt/vault/tls/vault-cert.pem"
  tls_key_file       = "/opt/vault/tls/vault-key.pem"
  tls_client_ca_file = "/opt/vault/tls/vault-ca.pem"
}

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-os-3.*****.*****.co.uk"

  retry_join {
    leader_tls_servername   = "vault-os-1.*****.*****.co.uk"
    leader_api_addr         = "https://vault-os-1.*****.*****.co.uk:8200"
    leader_client_cert_file = "/opt/vault/tls/vault-cert.pem"
    leader_client_key_file  = "/opt/vault/tls/vault-key.pem"
    leader_ca_cert_file     = "/opt/vault/tls/vault-ca.pem"
  }
  retry_join {
    leader_tls_servername   = "vault-os-2.*****.*****.co.uk"
    leader_api_addr         = "https://vault-os-2.*****.*****.co.uk:8200"
    leader_client_cert_file = "/opt/vault/tls/vault-cert.pem"
    leader_client_key_file  = "/opt/vault/tls/vault-key.pem"
    leader_ca_cert_file     = "/opt/vault/tls/vault-ca.pem"
  }
}


seal "transit" {
  address = "https://vault.*****.*****.co.uk"
  namespace = "apps/myapp"
  disable_renewal = "false"
  key_name = "autounseal"
  mount_path = "transit/"
  tls_skip_verify = "false"
}

enable_response_header_hostname = "true"
enable_response_header_raft_node_id = "true"

log_level = "info"
log_file = "/var/log/vault/vault.log"
log_rotate_duration = "24h"
log_rotate_max_files = "30"
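
The three configuration files differ only in the node's own hostname and in which peers its retry_join stanzas point to. A minimal sketch for rendering those stanzas from a single host list is below, so the files cannot drift apart. The hostnames in the usage are hypothetical placeholders, since the real ones are redacted; the port and TLS file names mirror the configs above.

```python
from typing import List


def retry_join_blocks(self_host: str, hosts: List[str],
                      tls_dir: str = "/opt/vault/tls") -> str:
    """Render one retry_join stanza per peer, skipping the node itself."""
    blocks = []
    for peer in hosts:
        if peer == self_host:
            continue  # a node never lists itself as a join target
        blocks.append(
            "  retry_join {\n"
            f'    leader_tls_servername   = "{peer}"\n'
            f'    leader_api_addr         = "https://{peer}:8200"\n'
            f'    leader_client_cert_file = "{tls_dir}/vault-cert.pem"\n'
            f'    leader_client_key_file  = "{tls_dir}/vault-key.pem"\n'
            f'    leader_ca_cert_file     = "{tls_dir}/vault-ca.pem"\n'
            "  }"
        )
    return "\n".join(blocks)
```

For example, retry_join_blocks("vault-os-2.example.co.uk", hosts) yields the two stanzas for vault-os-1 and vault-os-3, matching the shape of Node 2's storage "raft" block.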

Additional context
If I set the token directly in the vault.hcl file like below, everything works correctly.

seal "transit" {
  address = "https://vault.*****.******.co.uk"
  namespace = "apps/myapp"
  token = "hvs.*********************"
  disable_renewal = "false"
  key_name = "autounseal"
  mount_path = "transit/"
  tls_skip_verify = "false"
}

The problem seems to be that VAULT_TOKEN, when set in vault.env, is either not respected or interferes with the Raft challenge workflow.
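
One way to picture the second possibility is sketched below. This is an assumption, not something confirmed in this report. VAULT_TOKEN is set for the whole server process, and Vault's Go API client reads VAULT_TOKEN from the environment by default. If the client a joining node uses for the Raft join were built that way, the follower would present the upstream, namespaced token to the downstream leader, which has no such namespace. That would be consistent with the "no namespace" error above. Both functions are hypothetical illustrations.

```python
from typing import Dict, Mapping, Optional


def default_client_token(environ: Mapping[str, str]) -> Optional[str]:
    """Hypothetical client default: pick up VAULT_TOKEN from the environment."""
    return environ.get("VAULT_TOKEN") or None


def challenge_request_headers(environ: Mapping[str, str]) -> Dict[str, str]:
    """Headers a joining node would send with
    PUT /v1/sys/storage/raft/bootstrap/challenge under this hypothesis."""
    headers = {"Content-Type": "application/json"}
    token = default_client_token(environ)
    if token:
        # The upstream token leaks into a request aimed at the downstream leader.
        headers["X-Vault-Token"] = token
    return headers
```

With the token in the seal stanza instead, the process environment has no VAULT_TOKEN, so under this model no token header is attached, which matches the working case described above.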

The vault.env file is loaded with the systemd service:

[Service]
Type=notify
EnvironmentFile=/etc/vault.d/vault.env
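
To confirm what the unit actually passes to Vault, the environment file can be read with a small parser. This is a simplified sketch of systemd EnvironmentFile syntax: it handles KEY=VALUE lines, blank lines, "#" and ";" comments and one pair of surrounding quotes, but deliberately ignores the escapes and line continuations systemd also accepts. The sample values in the usage are hypothetical.

```python
from typing import Dict


def parse_environment_file(text: str) -> Dict[str, str]:
    """Simplified parser for a systemd EnvironmentFile."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE assignment
        value = value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env
```

Applied to /etc/vault.d/vault.env, the result should contain a VAULT_TOKEN entry on every node, not just node 1.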