
Commit

Merge pull request #136 from ExpediaGroup/fix/tcp_keep_alive_in_eks
Added keepalive config for EKS
patduin authored Jul 1, 2024
2 parents 7a78f06 + 5db127f commit b243ea6
Showing 4 changed files with 36 additions and 6 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,11 @@ All notable changes to this project will be documented in this file.

 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
+
+## [4.5.3] - 2024-07-01
+### Added
+- Added support for configuring the TCP keepalive settings of Waggledance.
+
 ## [4.5.2] - 2024-06-04
 ### Updated
 - Changed Service account creation to make it work with eks 1.24 and later.
7 changes: 4 additions & 3 deletions README.md
@@ -58,9 +58,10 @@ For more information please refer to the main [Apiary](https://github.com/Expedi
 | root_vol_type | Waggle Dance EC2 root volume type. | string | `gp2` | no |
 | root_vol_size | Waggle Dance EC2 root volume size. | string | `10` | no |
 | enable_query_functions_across_all_metastores | This controls the thrift call for `get_all_functions`. It is generally used to initialize a client and get built-in functions and registered UDF's from a metastore. Setting this to `false` is more performant as WD then only gets the functions from the `primary` metastore. However, setting this to `true` will collate results by calling `get_all_functions` from all configured metastores. This could be potentially slow if some of the metastores are slow to respond. If all the metastores configured are of the same version and no additional UDF's are installed, then WD gets the same functions back so it's not very useful to call this across metastores. For backwards compatibility, this property can be set to `true`. Further read: https://github.com/ExpediaGroup/waggle-dance#server | bool | false | no |
-| tcp_keepalive_time | Sets net.ipv4.tcp_keepalive_time (seconds), currently only supported in ECS. | number | `200` | no |
-| tcp_keepalive_intvl | Sets net.ipv4.tcp_keepalive_intvl (seconds), currently only supported in ECS. | number | `30` | no |
-| tcp_keepalive_probes | Sets net.ipv4.tcp_keepalive_probes (seconds), currently only supported in ECS. | number | `2` | no |
+| enable_tcp_keepalive | Enable tcp_keepalive settings on the Waggledance pods. To use this, your Kubernetes cluster must allow changing sysctl settings. For EKS you need to allow this on your cluster (see https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/ and check your EKS version for details). If your EKS version is below 1.24, you need to create a PodSecurityPolicy allowing the sysctls `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_intvl`, and `net.ipv4.tcp_keepalive_probes`, plus a ClusterRole and RoleBinding for the service account running the HMS pods (or for all service accounts in the namespace where Apiary is running) so that Kubernetes can apply the tcp_keepalive configuration. For EKS 1.25 and above, see https://kubernetes.io/blog/2022/08/23/kubernetes-v1-25-release/#pod-security-changes. Also see the tcp_keepalive_* variables. | bool | false | no |
+| tcp_keepalive_time | Sets net.ipv4.tcp_keepalive_time (seconds), currently only supported in ECS. | number | `200` | no |
+| tcp_keepalive_intvl | Sets net.ipv4.tcp_keepalive_intvl (seconds), currently only supported in ECS. | number | `30` | no |
+| tcp_keepalive_probes | Sets net.ipv4.tcp_keepalive_probes (number of probes), currently only supported in ECS. | number | `2` | no |
 | datadog_key_secret_name | Name of the secret containing the DataDog API key. This needs to be created manually in AWS secrets manager. | string | | no |
 | datadog_agent_version | Version of the Datadog Agent running in the ECS cluster. | string | `7.46.0-jmx` | no |
 | include_datadog_agent | Whether to include the datadog-agent container alongside Waggledance. | string | bool | no |
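
As a rough illustration of the new README rows, a caller might switch the feature on along these lines. This is only a sketch: the module label and `source` path are hypothetical placeholders, the module's other inputs are omitted, and the three `tcp_keepalive_*` values simply repeat the documented defaults.

```hcl
# Hypothetical caller configuration (fragment). The module label and source
# path are placeholders, not taken from the commit.
module "waggle_dance" {
  source = "./modules/waggle-dance" # placeholder: point this at the real module

  # ... other module inputs omitted ...

  # Opt in to the pod-level sysctls added by this commit (default is false).
  enable_tcp_keepalive = true

  # Optional overrides; the values shown are the documented defaults.
  tcp_keepalive_time   = 200
  tcp_keepalive_intvl  = 30
  tcp_keepalive_probes = 2
}
```

Per the `enable_tcp_keepalive` description, the cluster must already permit these sysctls; otherwise the pods may be rejected when the deployment is applied.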
17 changes: 17 additions & 0 deletions k8s.tf
@@ -78,6 +78,23 @@ resource "kubernetes_deployment_v1" "waggle_dance" {
       spec {
         service_account_name            = kubernetes_service_account_v1.waggle_dance[0].metadata.0.name
         automount_service_account_token = true
+        dynamic "security_context" {
+          for_each = var.enable_tcp_keepalive ? ["enabled"] : []
+          content {
+            sysctl {
+              name  = "net.ipv4.tcp_keepalive_time"
+              value = var.tcp_keepalive_time
+            }
+            sysctl {
+              name  = "net.ipv4.tcp_keepalive_intvl"
+              value = var.tcp_keepalive_intvl
+            }
+            sysctl {
+              name  = "net.ipv4.tcp_keepalive_probes"
+              value = var.tcp_keepalive_probes
+            }
+          }
+        }
         container {
           image = "${var.docker_image}:${var.docker_version}"
           name  = local.instance_alias
13 changes: 10 additions & 3 deletions variables.tf
@@ -394,24 +394,31 @@ variable "datadog_metrics_enabled" {
   default     = false
 }
 
+variable "enable_tcp_keepalive" {
+  description = "Enable tcp keepalive settings on the waggledance pods."
+  type        = bool
+  default     = false
+}
+
 variable "tcp_keepalive_time" {
-  description = "Sets net.ipv4.tcp_keepalive_time (seconds), currently only supported in ECS."
+  description = "Sets net.ipv4.tcp_keepalive_time (seconds)."
   type        = number
   default     = 200
 }
 
 variable "tcp_keepalive_intvl" {
-  description = "Sets net.ipv4.tcp_keepalive_intvl (seconds), currently only supported in ECS."
+  description = "Sets net.ipv4.tcp_keepalive_intvl (seconds)."
   type        = number
   default     = 30
 }
 
 variable "tcp_keepalive_probes" {
-  description = "Sets net.ipv4.tcp_keepalive_probes (number), currently only supported in ECS."
+  description = "Sets net.ipv4.tcp_keepalive_probes (number)."
   type        = number
   default     = 2
 }
 
+
 variable "datadog_key_secret_name" {
   description = "Name of the secret containing the DataDog API key. This needs to be created manually in AWS secrets manager. This is only applicable to ECS deployments."
   type        = string
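
To make the three defaults concrete: on Linux, an idle connection gets its first keepalive probe after `tcp_keepalive_time` seconds, further probes follow every `tcp_keepalive_intvl` seconds, and the connection is dropped after `tcp_keepalive_probes` unanswered probes. The standalone sketch below works through that arithmetic for the default values. The local and output names are illustrative and not part of the commit.

```hcl
# Illustrative only: these locals mirror the module defaults and are not part
# of the commit.
locals {
  tcp_keepalive_time   = 200 # idle seconds before the first probe is sent
  tcp_keepalive_intvl  = 30  # seconds between unanswered probes
  tcp_keepalive_probes = 2   # unanswered probes before the connection is dropped

  # Approximate time for an idle connection to a dead peer to be torn down.
  dead_peer_detection_seconds = local.tcp_keepalive_time + local.tcp_keepalive_intvl * local.tcp_keepalive_probes
}

output "dead_peer_detection_seconds" {
  value = local.dead_peer_detection_seconds # 200 + 30 * 2 = 260
}
```

With the defaults, an idle connection to an unresponsive peer would therefore be dropped after roughly 260 seconds.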
