
# Variables

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| apiary_assume_roles | Cross-account AWS IAM roles allowed write access to managed Apiary S3 buckets using an assume policy. | `list(any)` | `[]` | no |
| apiary_consumer_iamroles | AWS IAM roles allowed unrestricted (not subject to `apiary_customer_condition`) read access to all data in managed Apiary S3 buckets. | `list(string)` | `[]` | no |
| apiary_consumer_prefix_iamroles | AWS IAM roles allowed unrestricted (not subject to `apiary_customer_condition`) read access to certain prefixes in managed Apiary S3 buckets. See the section below for more information and format. | `map(map(list(string)))` | `{}` | no |
| apiary_customer_accounts | AWS account IDs for clients of this Metastore. | `list(string)` | `[]` | no |
| apiary_customer_condition | IAM policy condition applied to customer account S3 object access. | `string` | `""` | no |
| apiary_database_name | Database name to create in RDS for Apiary. | `string` | `"apiary"` | no |
| apiary_deny_iamrole_actions | List of S3 actions that `apiary_deny_iamroles` are not allowed to perform. | `list(string)` | `["s3:Abort*", "s3:Bypass*", "s3:Delete*", "s3:GetObject", "s3:GetObjectTorrent", "s3:GetObjectVersion", "s3:GetObjectVersionTorrent", "s3:ObjectOwnerOverrideToBucketOwner", "s3:Put*", "s3:Replicate*", "s3:Restore*"]` | no |
| apiary_deny_iamroles | AWS IAM roles denied access to Apiary managed S3 buckets. | `list(string)` | `[]` | no |
| apiary_domain_name | Apiary domain name for Route 53. | `string` | `""` | no |
| apiary_domain_private_zone | Whether the Apiary Route 53 zone is private. | `bool` | `true` | no |
| apiary_governance_iamroles | AWS IAM governance roles allowed read and tagging access to managed Apiary S3 buckets. | `list(string)` | `[]` | no |
| apiary_log_bucket | Bucket for Apiary logs. If this is blank, the module will create a bucket. | `string` | `""` | no |
| apiary_log_prefix | Prefix for Apiary logs. | `string` | `""` | no |
| apiary_managed_schemas | List of maps. Each map contains a schema name from which an S3 bucket name will be derived, plus various properties. The corresponding S3 bucket will be named `apiary_instance-aws_account-aws_region-schema_name`. | `list(map(string))` | `[]` | no |
| apiary_producer_iamroles | AWS IAM roles allowed write access to managed Apiary S3 buckets. | `map(any)` | `{}` | no |
| apiary_rds_additional_sg | List of additional security groups to attach to RDS. | `list(any)` | `[]` | no |
| apiary_shared_schemas | Schema names accessible from the read-only metastore; the default is all schemas. | `list(any)` | `[]` | no |
| apiary_tags | Common tags applied to all resources. | `map(any)` | n/a | yes |
| atlas_cluster_name | Name of the Atlas cluster where the metastore plugin will send DDL events. Defaults to `var.instance_name` if not set. | `string` | `""` | no |
| atlas_kafka_bootstrap_servers | Kafka instance URL. | `string` | `""` | no |
| aws_region | AWS region. | `string` | n/a | yes |
| apiary_common_producer_iamroles | AWS IAM roles allowed general (not tied to a schema) write access to managed Apiary S3 buckets. | `list(string)` | `[]` | no |
| dashboard_namespace | k8s namespace to deploy the Grafana dashboard in. | `string` | `"monitoring"` | no |
| db_apply_immediately | Specifies whether any cluster modifications are applied immediately, or during the next maintenance window. | `bool` | `false` | no |
| db_backup_retention | The number of days to retain backups for the RDS Metastore DB. | `string` | `"7"` | yes |
| db_backup_window | Preferred backup window for the RDS Metastore DB, in UTC. | `string` | `"02:00-03:00"` | no |
| db_copy_tags_to_snapshot | Copy all cluster tags to snapshots. | `bool` | `true` | no |
| db_enable_performance_insights | Enable RDS Performance Insights. | `bool` | `false` | no |
| db_enhanced_monitoring_interval | RDS monitoring interval (in seconds) for enhanced monitoring. Valid values are 0, 1, 5, 10, 15, 30, 60. | `number` | `0` | no |
| db_instance_class | Instance type for the RDS Metastore DB. | `string` | `"db.t4g.medium"` | yes |
| db_instance_count | Desired count of database cluster instances. | `string` | `"2"` | no |
| db_maintenance_window | Preferred maintenance window for the RDS Metastore DB, in UTC. | `string` | `"wed:03:00-wed:04:00"` | no |
| db_master_username | Aurora cluster MySQL master user name. | `string` | `"apiary"` | no |
| db_ro_secret_name | Aurora cluster MySQL read-only user SecretsManager secret name. | `string` | `""` | no |
| db_rw_secret_name | Aurora cluster MySQL read/write user SecretsManager secret name. | `string` | `""` | no |
| disallow_incompatible_col_type_changes | Hive metastore setting to disallow incompatible column type changes. | `bool` | `true` | no |
| docker_registry_auth_secret_name | Docker Registry authentication SecretsManager secret name. | `string` | `""` | no |
| ecs_domain_extension | Domain name to use for the hosted zone created by ECS service discovery. | `string` | `"lcl"` | no |
| elb_timeout | Idle timeout for the Apiary ELB. | `string` | `"1800"` | no |
| enable_apiary_s3_log_hive | Create a Hive database to archive S3 logs in Parquet format. Only applicable when the module manages the logs S3 bucket. | `bool` | `true` | no |
| enable_autoscaling | Enable read-only Hive Metastore k8s horizontal pod autoscaling. | `bool` | `true` | no |
| enable_data_events | Enable managed buckets S3 event notifications. | `bool` | `false` | no |
| enable_gluesync | Enable metadata sync from Hive to the Glue catalog. | `bool` | `false` | no |
| enable_hive_metastore_metrics | Enable sending Hive Metastore metrics to CloudWatch. | `bool` | `false` | no |
| enable_metadata_events | Enable the Hive Metastore SNS listener. | `bool` | `false` | no |
| enable_s3_paid_metrics | Enable managed S3 buckets request and data transfer metrics. | `bool` | `false` | no |
| enable_vpc_endpoint_services | Enable metastore NLB, Route 53 entries VPC access and VPC endpoint services, for cross-account access. | `bool` | `true` | no |
| encrypt_db | Specifies whether the DB cluster is encrypted. | `bool` | `false` | no |
| external_data_buckets | Buckets that are not managed by Apiary but are added to Hive Metastore IAM role access. | `list(any)` | `[]` | no |
| external_database_host | External Metastore database host to support legacy installations. The MySQL database won't be created by Apiary when this option is specified. | `string` | `""` | no |
| external_database_host_readonly | External Metastore read-only database host to support legacy installations. | `string` | `""` | no |
| hive_metastore_port | Port on which both the read/write and read-only Hive Metastores will run. | `number` | `9083` | no |
| hms_additional_environment_variables | Additional environment variables for the Hive Metastore. | `map(any)` | `{}` | no |
| hms_housekeeper_additional_environment_variables | Additional environment variables for Hive Housekeeper. | `map(any)` | `{}` | no |
| hms_autogather_stats | Read/write Hive metastore setting to enable/disable statistics auto-gather on table/partition creation. | `bool` | `true` | no |
| hms_docker_image | Docker image ID for the Hive Metastore. | `string` | n/a | yes |
| hms_docker_version | Version of the Docker image for the Hive Metastore. | `string` | n/a | yes |
| hms_instance_type | Hive Metastore instance type; possible values: `ecs`, `k8s`. | `string` | `"ecs"` | no |
| hms_log_level | Log level for the Hive Metastore. | `string` | `"INFO"` | no |
| hms_nofile_ulimit | Ulimit for the Hive Metastore container. | `string` | `"32768"` | no |
| hms_ro_cpu | CPU for the read-only Hive Metastore ECS task. Valid values are 256, 512, 1024, 2048 and 4096. Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html | `string` | `"512"` | no |
| hms_ro_db_connection_pool_size | Read-only Hive metastore setting for the size of the MySQL connection pool. | `number` | `10` | no |
| hms_ro_ecs_task_count | Desired ECS task count of the read-only Hive Metastore service. | `string` | `"3"` | no |
| hms_ro_heapsize | Heap size for the read-only Hive Metastore. Valid values: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html | `string` | `"2048"` | no |
| hms_ro_k8s_replica_count | Initial number of read-only Hive Metastore k8s pod replicas to create. | `number` | `"2048"` | no |
| hms_ro_k8s_max_replica_count | Maximum number of read-only Hive Metastore k8s pod replicas to create. | `number` | `"2048"` | no |
| hms_rw_k8s_pdb_settings | Add a PodDisruptionBudget to the HMS read/write pods. | `object` | `{ max_unavailable = 1 }` | no |
| hms_rw_k8s_rolling_update_strategy | Configure the HMS read/write deployment rolling update strategy. | `object` | `{ max_unavailable = 1 }` | no |
| hms_ro_target_cpu_percentage | Read-only Hive Metastore autoscaling threshold for CPU target usage. | `number` | `"2048"` | no |
| hms_ro_request_partition_limit | Read-only Hive Metastore limit on requested partitions. | `string` | n/a | no |
| hms_ro_node_affinity | Add node affinities to the read-only Hive metastore pods. | `list(object)` | n/a | no |
| hms_ro_tolerations | Add tolerations to the read-only Hive metastore pods. | `list(object)` | n/a | no |
| hms_rw_cpu | CPU for the read/write Hive Metastore ECS task. Valid values are 256, 512, 1024, 2048 and 4096. Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html | `string` | `"512"` | no |
| hms_rw_db_connection_pool_size | Read/write Hive metastore setting for the size of the MySQL connection pool. | `number` | `10` | no |
| hms_rw_ecs_task_count | Desired ECS task count of the read/write Hive Metastore service. | `string` | `"3"` | no |
| hms_rw_heapsize | Heap size for the read/write Hive Metastore. Valid values: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html | `string` | `"2048"` | no |
| hms_rw_k8s_replica_count | Initial number of read/write Hive Metastore k8s pod replicas to create. | `number` | `"2048"` | no |
| hms_rw_request_partition_limit | Read/write Hive Metastore limit on requested partitions. | `string` | n/a | no |
| hms_rw_node_affinity | Add node affinities to the read/write Hive metastore pods. | `list(object)` | n/a | no |
| hms_rw_tolerations | Add tolerations to the read/write Hive metastore pods. | `list(object)` | n/a | no |
| iam_name_root | Name to identify Hive Metastore IAM roles. | `string` | `"hms"` | no |
| ingress_cidr | Generally allowed ingress CIDR list. | `list(string)` | n/a | yes |
| instance_name | Apiary instance name to identify resources in multi-instance deployments. | `string` | `""` | no |
| k8s_docker_registry_secret | Docker Registry authentication k8s secret name. | `string` | `""` | no |
| kafka_bootstrap_servers | Kafka bootstrap servers to send metastore events to; setting this enables the Hive Metastore Kafka listener. | `string` | `""` | no |
| kafka_topic_name | Kafka topic to send metastore events to. | `string` | `""` | no |
| kiam_arn | Kiam server IAM role ARN. | `string` | `""` | no |
| ldap_base | Active Directory LDAP base DN to search users and groups. | `string` | `""` | no |
| ldap_ca_cert | Base64-encoded Certificate Authority bundle to validate LDAPS connections. | `string` | `""` | no |
| ldap_secret_name | Active Directory LDAP bind DN SecretsManager secret name. | `string` | `""` | no |
| ldap_url | Active Directory LDAP URL to configure Hadoop LDAP group mapping. | `string` | `""` | no |
| metastore_namespace | k8s namespace to deploy metastore containers in. | `string` | `"metastore"` | no |
| oidc_provider | EKS cluster OIDC provider name, required for configuring IAM using IRSA. | `string` | `""` | no |
| private_subnets | Private subnets. | `list(any)` | n/a | yes |
| ranger_audit_db_url | Ranger DB audit provider configuration. | `string` | `""` | no |
| ranger_audit_secret_name | Ranger DB audit secret name. | `string` | `""` | no |
| ranger_audit_solr_url | Ranger Solr audit provider configuration. | `string` | `""` | no |
| ranger_policy_manager_url | Ranger admin URL to synchronize policies. | `string` | `""` | no |
| rds_max_allowed_packet | RDS/MySQL setting for the `max_allowed_packet` parameter, in bytes. Default is 128 MB (note that the MySQL default is 4 MB). | `number` | `134217728` | no |
| rw_ingress_cidr | Read/write metastore ingress CIDR list. If not set, defaults to `var.ingress_cidr`. | `list(string)` | `[]` | no |
| s3_enable_inventory | Enable S3 inventory configuration. | `bool` | `false` | no |
| s3_inventory_customer_accounts | AWS account IDs allowed to access the S3 inventory database. | `list(string)` | `[]` | no |
| s3_inventory_format | Output format for S3 inventory results. Can be Parquet, ORC or CSV. | `string` | `"ORC"` | no |
| s3_inventory_update_schedule | Cron schedule to update S3 inventory tables (if enabled). Defaults to every 12 hours. | `string` | `"0 */12 * * *"` | no |
| s3_lifecycle_abort_incomplete_multipart_upload_days | Number of days after which incomplete multipart uploads will be deleted. | `string` | `"7"` | no |
| s3_lifecycle_policy_transition_period | Number of days for the S3 lifecycle policy transition rule. | `string` | `"30"` | no |
| s3_log_expiry | Number of days after which Apiary S3 bucket logs expire. | `string` | `"365"` | no |
| s3_logs_sqs_delay_seconds | The time in seconds that the delivery of all messages in the queue will be delayed. | `number` | `300` | no |
| s3_logs_sqs_message_retention_seconds | Time in seconds after which a message will be deleted from the queue. | `number` | `345600` | no |
| s3_logs_sqs_receive_wait_time_seconds | The time for which a ReceiveMessage call will wait for a message to arrive (long polling) before returning. | `number` | `10` | no |
| s3_logs_sqs_visibility_timeout_seconds | Time in seconds after which a message will be returned to the queue if it has not been deleted. | `number` | `3600` | no |
| s3_storage_class | S3 storage class after transition using the lifecycle policy. | `string` | `"INTELLIGENT_TIERING"` | no |
| secondary_vpcs | List of VPCs to associate with the Service Discovery namespace. | `list(any)` | `[]` | no |
| system_schema_customer_accounts | AWS account IDs allowed to access the system database. | `list(string)` | `[]` | no |
| system_schema_name | Name for the internal system database. | `string` | `"apiary_system"` | no |
| table_param_filter | A regular expression for selecting the table parameters sent by the SNS listener. If the value isn't set, no table parameters are selected. | `string` | `""` | no |
| vpc_id | VPC ID. | `string` | n/a | yes |
| enable_dashboard | Make the EKS & ECS dashboards optional. | `bool` | `true` | no |
| rds_family | RDS family. | `string` | `"aurora5.6"` | no |
| datadog_metrics_enabled | Enable Datadog metrics for HMS. | `bool` | `false` | no |
| datadog_metrics_hms_readwrite_readonly | Prometheus metrics sent to Datadog. | `list(string)` | `["metrics_classloading_loaded_value", "metrics_threads_count_value", "metrics_memory_heap_max_value", "metrics_init_total_count_tables_value", "metrics_init_total_count_dbs_value", "metrics_memory_heap_used_value", "metrics_init_total_count_partitions_value"]` | no |
| datadog_metrics_port | Port on which metrics will be sent to Datadog. | `string` | `8080` | no |
| datadog_key_secret_name | Name of the secret containing the Datadog API key. This needs to be created manually in AWS Secrets Manager. Only applicable to ECS deployments. | `string` | `null` | no |
| datadog_agent_version | Version of the Datadog agent running in the ECS cluster. Only applicable to ECS deployments. | `string` | `"7.50.3-jmx"` | no |
| datadog_agent_enabled | Whether to include the datadog-agent container. Only applicable to ECS deployments. | `string` | `false` | no |
| enable_tcp_keepalive | Enable tcp_keepalive settings on HMS pods. To use this you need to enable the ability to change sysctl settings on your Kubernetes cluster. For EKS you need to allow this on your cluster (https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/; check your EKS version for details). If your EKS version is below 1.24 you need to create a PodSecurityPolicy allowing the sysctls `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_intvl` and `net.ipv4.tcp_keepalive_probes`, plus a ClusterRole and RoleBinding for the service account running the HMS pods (or all service accounts in the namespace where Apiary is running) so that Kubernetes can apply the tcp_keepalive configuration. For EKS 1.25 and above, see https://kubernetes.io/blog/2022/08/23/kubernetes-v1-25-release/#pod-security-changes. Also see the `tcp_keepalive_*` variables. | `bool` | `false` | no |
| tcp_keepalive_time | Sets `net.ipv4.tcp_keepalive_time` (seconds). | `number` | `200` | no |
| tcp_keepalive_intvl | Sets `net.ipv4.tcp_keepalive_intvl` (seconds). | `number` | `30` | no |
| tcp_keepalive_probes | Sets `net.ipv4.tcp_keepalive_probes` (number of probes). | `number` | `2` | no |
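Only a handful of inputs are marked required above (`apiary_tags`, `aws_region`, `hms_docker_image`, `hms_docker_version`, `ingress_cidr`, `private_subnets`, `vpc_id`, plus the DB settings marked required). As a rough sketch, a minimal invocation might look like the following; the module `source` reference and every value here are placeholders, not recommendations:

```hcl
module "apiary" {
  # Placeholder: point this at your copy of the apiary-data-lake module.
  source = "path/or/registry/reference/to/apiary-data-lake"

  # Required inputs (all values are illustrative).
  apiary_tags        = { Team = "data-platform" }
  aws_region         = "us-east-1"
  hms_docker_image   = "<account_id>.dkr.ecr.us-east-1.amazonaws.com/apiary-metastore"
  hms_docker_version = "<image_tag>"
  ingress_cidr       = ["10.0.0.0/8"]
  private_subnets    = ["subnet-aaaa", "subnet-bbbb"]
  vpc_id             = "vpc-12345678"

  db_backup_retention = "7"
  db_instance_class   = "db.t4g.medium"
}
```

Everything else falls back to the defaults listed in the table.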

## apiary_assume_roles

A list of maps. Each map entry describes a role that is created in this account, and a list of principals (IAM ARNs) in other accounts that are allowed to assume this role. Each entry also specifies a list of Apiary schemas that this role is allowed to write to.

An example entry looks like:

```
apiary_assume_roles = [
  {
    name         = "client_name"
    principals   = [ "arn:aws:iam::account_number:role/cross-account-role" ]
    schema_names = [ "dm", "lz", "test_1" ]
    max_role_session_duration_seconds = "7200"
    allow_cross_region_access         = true
  }
]
```

apiary_assume_roles map entry fields:

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| name | Short name of the IAM role to be created. The full name will be `apiary-<name>-<region>`. | `string` | - | yes |
| principals | List of IAM role ARNs from other accounts that can assume this role. | `list(string)` | - | yes |
| schema_names | List of Apiary schemas that this role can read/write. | `list(string)` | - | yes |
| max_role_session_duration_seconds | Number of seconds that the assumed credentials are valid for. | `string` | `"3600"` | no |
| allow_cross_region_access | If true, this role may write to these Apiary schemas in all AWS regions where the schemas exist (in this account). If false, it can only write in this region. | `bool` | `false` | no |

## apiary_managed_schemas

A list of maps. Each map contains a schema name from which an S3 bucket name will be derived (the corresponding bucket will be named `apiary_instance-aws_account-aws_region-schema_name`), along with S3 storage properties such as the storage class and the number of days before transition.

An example entry looks like:

```
apiary_managed_schemas = [
  {
    schema_name                           = "sandbox"
    s3_lifecycle_policy_transition_period = "30"
    s3_storage_class                      = "INTELLIGENT_TIERING"
    s3_object_expiration_days             = 60
    tags                   = jsonencode({ Domain = "search", ComponentInfo = "1234" })
    enable_data_events_sqs = "1"
    encryption        = "aws:kms"                  // supported values: AES256, aws:kms
    admin_roles       = "role1_arn,role2_arn"      // KMS key management will be restricted to these roles
    client_roles      = "role3_arn,role4_arn"      // S3 bucket read/write and KMS key usage will be restricted to these roles
    customer_accounts = "account_id1,account_id2"  // overrides module-level apiary_customer_accounts
  }
]
```

apiary_managed_schemas map entry fields:

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| schema_name | Name of the S3 bucket. The full name will be `apiary_instance-aws_account-aws_region-schema_name`. | `string` | - | yes |
| enable_data_events_sqs | If set to `"1"`, S3 data event notifications for ObjectCreated and ObjectRemoved will be sent to an SQS queue for processing by external systems. | `string` | - | no |
| s3_lifecycle_policy_transition_period | Number of days before transition to a different storage class using the lifecycle policy. | `string` | `"30"` | no |
| s3_storage_class | Destination S3 storage class for transition in the lifecycle policy. For valid values, see: https://www.terraform.io/docs/providers/aws/r/s3_bucket.html#storage_class | `string` | `"INTELLIGENT_TIERING"` | no |
| s3_object_expiration_days | Number of days after which objects in Apiary managed schema buckets expire. | `number` | `null` | no |
| tags | Additional tags added to the S3 data bucket. The map of tags must be encoded as a string using `jsonencode` (see the sample above). If `var.apiary_tags` and the tags passed to `apiary_managed_schemas` both contain the same tag name, the value passed to `apiary_managed_schemas` wins. | `string` | `null` | no |
| encryption | S3 object encryption type; supported values are AES256 and aws:kms. | `string` | `null` | no |
| admin_roles | IAM roles configured with admin access on the corresponding KMS keys. Required when the encryption type is aws:kms. | `string` | `null` | no |
| client_roles | IAM roles configured with usage access on the corresponding KMS keys. Required when the encryption type is aws:kms. | `string` | `null` | no |
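Since `schema_name` is the only required field, the simplest possible configuration just lists bucket names and accepts the defaults for everything else. The schema names below are illustrative:

```hcl
# Minimal sketch: two managed schemas with default storage settings,
# no encryption overrides and no per-schema customer accounts.
apiary_managed_schemas = [
  { schema_name = "raw" },
  { schema_name = "curated" }
]
```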

## apiary_consumer_iamroles

A list of cross-account IAM role ARNs that are allowed to read all data in all Apiary managed schemas. These roles are not subject to any restrictions imposed by apiary_customer_condition policies.

An example entry looks like:

```
apiary_consumer_iamroles = [
  "arn:aws:iam::<account_id>:role/<iam_role_1>",
  "arn:aws:iam::<account_id>:role/<iam_role_2>",
  ...
]
```

## apiary_consumer_prefix_iamroles

A map of maps of lists of IAM roles. Each top-level map key is the name of an Apiary managed schema. Each key of the nested map is an S3 prefix in that schema, and its value is a list of IAM roles that have unrestricted read access to objects under that prefix. These roles are not subject to any restrictions imposed by `apiary_customer_condition` policies.

An example entry looks like:

```
apiary_consumer_prefix_iamroles = {
  sandbox = {
    "prefix1/with/several/levels" = [
      "arn:aws:iam::<account_id>:role/<iam_role_1>",
      "arn:aws:iam::<account_id>:role/<iam_role_2>"
    ]
    prefix2 = [
      "arn:aws:iam::<account_id>:role/<iam_role_1>"
    ]
  }
  test = {
    prefixroletest = [
      "arn:aws:iam::<account_id>:role/<iam_role_1>",
      "arn:aws:iam::<account_id>:role/<iam_role_2>"
    ]
    "prefixroletest2" = [
      "arn:aws:iam::<account_id>:role/<iam_role_2>"
    ]
  }
}
```

## apiary_customer_condition

A string that defines a list of conditions that restrict which objects in an Apiary schema's S3 bucket may be read cross-account by accounts in the customer_accounts list. The string is a semicolon-delimited list of comma-delimited strings that specify conditions that are valid in AWS S3 bucket policy Condition sections. This condition is applied to every Apiary schema's S3 bucket policy.

An example entry to limit access to:

- only requests from certain VPC CIDR blocks, and
- only objects that have either an S3 tag of `data-sensitivity=false` or an S3 tag of `data-type=image*`

looks like:

```
apiary_customer_condition = <<EOF
  "IpAddress": {"aws:VpcSourceIp": ["10.0.0.0/8","100.64.0.0/10"]},
  "StringEquals": {"s3:ExistingObjectTag/data-sensitivity": "false" };
  "IpAddress": {"aws:VpcSourceIp": ["10.0.0.0/8","100.64.0.0/10"]},
  "StringLike": {"s3:ExistingObjectTag/data-type": "image*" }
EOF
```

Each semicolon-delimited section will create a new statement entry in the bucket policy's `Statement` array. Each comma-delimited section will create an entry in the `Condition` section of that statement. For the above example, the `Statement` and `Condition` entries would be:

```
"Statement": [
    {
        "Sid": "Apiary customer account object permissions",
        "Effect": "Allow",
        "Principal": {
            "AWS": [
                "<customer_account1_ARN>",
                "<customer_account2_ARN>",
                ...
                "<customer_accountN_ARN>"
            ]
        },
        "Action": [
            "s3:GetObject",
            "s3:GetObjectAcl"
        ],
        "Resource": "arn:aws:s3:::apiary-<account_num>-<region>-<schema_name>/*",
        "Condition": {
            "StringEquals": {
                "s3:ExistingObjectTag/data-sensitivity": "false"
            },
            "IpAddress": {
                "aws:VpcSourceIp": [
                    "10.0.0.0/8",
                    "100.64.0.0/10"
                ]
            }
        }
    },
    {
        "Sid": "Apiary customer account object permissions",
        "Effect": "Allow",
        "Principal": {
            "AWS": [
                "<customer_account1_ARN>",
                "<customer_account2_ARN>",
                ...
                "<customer_accountN_ARN>"
            ]
        },
        "Action": [
            "s3:GetObject",
            "s3:GetObjectAcl"
        ],
        "Resource": "arn:aws:s3:::apiary-<account_num>-<region>-<schema_name>/*",
        "Condition": {
            "StringLike": {
                "s3:ExistingObjectTag/data-type": "image*"
            },
            "IpAddress": {
                "aws:VpcSourceIp": [
                    "10.0.0.0/8",
                    "100.64.0.0/10"
                ]
            }
        }
    }
]
```

### Interactions with apiary_consumer_iamroles and apiary_consumer_prefix_iamroles

- Note that any IAM roles in `apiary_consumer_iamroles` are not subject to the restrictions from `apiary_customer_condition`, and so can read any S3 object, even objects that lack a `data-sensitivity` tag, have `data-sensitivity=true`, or have no `data-type` tag matching `image*`.
- Note that any IAM roles in `apiary_consumer_prefix_iamroles` are not subject to the restrictions from `apiary_customer_condition` for the schemas and prefixes specified in the map, and so can read any S3 object under those prefixes, even objects that lack a `data-sensitivity` tag, have `data-sensitivity=true`, or have no `data-type` tag matching `image*`.
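Putting these pieces together, a configuration that restricts customer accounts via `apiary_customer_condition` while exempting a single reader role might look like the following sketch. The account ID and role name are placeholders:

```hcl
# Customer accounts can only read objects tagged data-sensitivity=false.
apiary_customer_accounts = ["123456789012"]

apiary_customer_condition = <<EOF
  "StringEquals": {"s3:ExistingObjectTag/data-sensitivity": "false" }
EOF

# This role bypasses the condition and can read all objects,
# regardless of their tags.
apiary_consumer_iamroles = [
  "arn:aws:iam::123456789012:role/governance-reader"
]
```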

## apiary_common_producer_iamroles

A list of cross-account IAM role ARNs that are allowed to read and write data in all Apiary managed schemas.

An example entry looks like:

```
apiary_common_producer_iamroles = [
  "arn:aws:iam::<account_id>:role/<iam_role_1>",
  "arn:aws:iam::<account_id>:role/<iam_role_2>",
  ...
]
```

## Deny global writes to bucket - deny_global_write_access and producer_roles

Write access is granted by default to roles within the same AWS account. If you would like to protect a bucket so that only certain roles can write to it, use `deny_global_write_access` and `producer_roles`.

To protect all buckets, set the module-level variable `deny_global_write_access` to `true`. Enabling it for only one bucket looks like this:

```
apiary_managed_schemas = [
  {
    schema_name = "sandbox"
    ...
    deny_global_write_access = true
    producer_roles = "arn:aws:iam::000000000:role/role-1,arn:aws:iam::000000000:role/role-2"
  }
]
```