givanovexpe/apiary-data-lake

Overview

This repo contains a Terraform module to deploy the Apiary data lake component. The module deploys the stateful components of a typical Hadoop-compatible data lake on AWS.

For more information please refer to the main Apiary project page.

Architecture

Figure: Datalake architecture

Key Features

  • Highly available (HA) metastore service - packaged as a Docker container and running on an ECS Fargate cluster.
  • PrivateLinks - Network load balancers and VPC endpoints to enable federated access to read-only and read/write metastores.
  • Managed schemas - integrated way of managing Hive schemas, S3 buckets and bucket policies.
  • SNS Listener - A Hive metastore event listener to publish all metadata updates to an SNS topic; see ApiarySNSListener for more details.
  • Gluesync - A metastore event listener to replay Hive metadata events in a Glue catalog.
  • Metastore authorization - A metastore pre-event listener to handle authorization using Ranger.
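To illustrate the PrivateLinks feature from the consumer side, the following is a minimal sketch of how a customer account might create an interface VPC endpoint to reach a metastore. It assumes Terraform 0.12 or later. The variable apiary_endpoint_service_name, the VPC and subnet IDs, and the resource names are hypothetical placeholders and are not defined by this module; port 9083 is the conventional Hive metastore Thrift port and is assumed here.

```hcl
# Hypothetical consumer-side configuration; every name and ID below is a placeholder.
variable "apiary_endpoint_service_name" {
  description = "VPC endpoint service name shared by the account running Apiary (placeholder)"
  type        = string
}

resource "aws_security_group" "metastore_client" {
  name   = "apiary-metastore-client"
  vpc_id = "vpc-abcdef"

  # Allow clients inside the VPC to reach the endpoint on the
  # conventional Hive metastore Thrift port (assumed, not confirmed by the module).
  ingress {
    from_port   = 9083
    to_port     = 9083
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }
}

resource "aws_vpc_endpoint" "apiary_readonly" {
  vpc_id             = "vpc-abcdef"
  service_name       = var.apiary_endpoint_service_name
  vpc_endpoint_type  = "Interface"
  subnet_ids         = ["subnet-a", "subnet-b"]
  security_group_ids = [aws_security_group.metastore_client.id]
}
```

Hive clients in the consumer VPC would then point their metastore URI at the DNS name of the endpoint rather than at the metastore service directly.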

Variables

Please refer to VARIABLES.md.

Usage

Example module invocation:

module "apiary" {
  source                   = "git::https://github.com/ExpediaGroup/apiary-data-lake.git"
  aws_region               = "us-west-2"
  instance_name            = "test"
  apiary_tags              = "${var.tags}"
  private_subnets          = ["subnet1", "subnet2", "subnet3"]
  vpc_id                   = "vpc-123456"
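  # var.aws_account and var.aws_region are caller-defined variables for the account and region hosting your ECR
  hms_docker_image         = "${var.aws_account}.dkr.ecr.${var.aws_region}.amazonaws.com/apiary-metastore"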
  hms_docker_version       = "1.0.0"
  hms_ro_cpu               = "2048"
  hms_rw_cpu               = "2048"
  hms_ro_heapsize          = "8192"
  hms_rw_heapsize          = "8192"
  apiary_log_bucket        = "s3-logs-bucket"
  db_instance_class        = "db.t2.medium"
  db_backup_retention      = "7"
  apiary_managed_schemas   = ["db1", "db2", "dm"]
  apiary_customer_accounts = ["aws_account_no_1", "aws_account_no_2"]
  ingress_cidr             = ["10.0.0.0/8"]
}
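
The invocation above reads apiary_tags from var.tags, which the calling configuration has to declare. A minimal sketch of such a declaration follows, assuming Terraform 0.12 or later; the tag keys and values are illustrative only and are not prescribed by the module.

```hcl
# Caller-side variable referenced by the module invocation as var.tags.
# The default tag keys and values are examples, not module requirements.
variable "tags" {
  description = "Tags applied to the Apiary resources"
  type        = map(string)

  default = {
    Application = "apiary"
    Environment = "test"
  }
}
```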

Notes

The Apiary metastore Docker image is not yet published to a public repository; you can build it from this repo and then publish it to your own ECR.
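
As a sketch of the ECR side of that step, the repository that hosts the image could itself be created with Terraform. The resource and output names below are hypothetical, the repository name simply mirrors the one used in the usage example, and Terraform 0.12 or later is assumed. Building and pushing the image is still done separately with Docker.

```hcl
# Hypothetical ECR repository to hold a self-built metastore image.
resource "aws_ecr_repository" "apiary_metastore" {
  name = "apiary-metastore"
}

# The repository URL has the form <account>.dkr.ecr.<region>.amazonaws.com/apiary-metastore.
output "apiary_metastore_image" {
  value = aws_ecr_repository.apiary_metastore.repository_url
}
```

The output value can then be passed to the module as hms_docker_image, with hms_docker_version set to the tag you pushed.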

Contact

Mailing List

If you would like to ask questions about Apiary or discuss it, please join our mailing list at

https://groups.google.com/forum/#!forum/apiary-user

Legal

This project is available under the Apache 2.0 License.

Copyright 2018-2019 Expedia, Inc.
