🎯 The GitOps Platform for Data Analytics uses Kubernetes (K8s) and HashiCorp Terraform for Infrastructure as Code (IaC) on the AWS Cloud 🖥️ to deliver speed, scalability, agility, and cost efficiency. ⚡
The diagram below shows the open-source data tools, Kubernetes operators, and frameworks that DoK8s supports. It also shows how Data Analytics managed services integrate with the DoK8s open-source tools, which are reusable, composable, and configurable.
The Data on K8s (DoK8s) solution is organized into the following focus areas:
- 🎯 Data Analytics on K8s
- 🎯 AI/ML on K8s
- 🎯 Streaming Platforms on K8s
- 🎯 Scheduler Workflow Platforms on K8s
- 🎯 Distributed Databases & Query Engines on K8s
- Reproducible Local Development with Dev Containers: VS Code, K8s, TF, Python/R
- data-engineering-python: Docker + VS Code + Python = ❤️
- JupyterHub on EKS: This blueprint deploys a self-managed JupyterHub on EKS with Amazon Cognito authentication.
- Spark Operator with Apache YuniKorn on EKS: This blueprint deploys an EKS cluster and uses Spark Operator and Apache YuniKorn to run self-managed Spark jobs.
- Self-managed Airflow on EKS: This blueprint sets up self-managed Apache Airflow on an Amazon EKS cluster, following best practices.
- Argo Workflows on EKS: This blueprint sets up self-managed Argo Workflows on an Amazon EKS cluster, following best practices.
- Kafka on EKS: This blueprint deploys a self-managed Kafka cluster on EKS using the Strimzi Kafka operator.
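
Because the blueprints are Terraform-based, deploying one typically follows the standard Terraform CLI sequence of `init`, `plan`, and `apply`. The Python sketch below only builds and prints those commands so they can be reviewed first. The directory `blueprints/jupyterhub` and the helper `terraform_commands` are hypothetical names used for illustration, not paths or functions taken from this repository.

```python
import shlex
from typing import List


def terraform_commands(blueprint_dir: str) -> List[List[str]]:
    """Return the Terraform CLI calls for one blueprint directory.

    `blueprint_dir` is a hypothetical path; substitute the directory
    that actually holds the blueprint's .tf files.
    """
    # -chdir is a global Terraform option that switches the working
    # directory before the subcommand runs.
    chdir = f"-chdir={blueprint_dir}"
    return [
        ["terraform", chdir, "init"],                 # download providers and modules
        ["terraform", chdir, "plan", "-out=tfplan"],  # save the proposed changes
        ["terraform", chdir, "apply", "tfplan"],      # apply exactly the saved plan
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in terraform_commands("blueprints/jupyterhub"):
        print(shlex.join(cmd))
```

To execute the steps rather than print them, each command list could be passed to `subprocess.run(cmd, check=True)`, assuming Terraform is installed and AWS credentials are configured.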
Built with ❤️ at AWS 🖥️ · K8s · Terraform.