Debussy is a free, open-source, opinionated Data Architecture and Engineering framework. It enables data analysts and engineers to build better data platforms through first-class data pipelines, following a low-code, self-service approach.
Description · Key Features · Key Benefits · Quick Start · Integrations · Full Documentation · Communication · Contributions · License
In the data engineering field, everyone is reinventing the wheel all the time: it's still rare to see software engineering best practices such as DRY, KISS or YAGNI adopted. Despite the existence of several tools for data orchestration (e.g. Apache Airflow, Prefect, Dagster) and distributed data processing (e.g. Apache Spark, Apache Beam), every new data pipeline demand usually turns into a lengthy development project. Think of developing a web application without the help of a web framework such as Django or Flask!
What's even worse, although they share key concepts, these data orchestration tools have very distinct syntaxes and features, making migrations a daunting task! Moreover, simply adopting these tools does not guarantee that best practices are being followed, including those for data architecture (think of data modeling and the data management lifecycle, among others).
While lots of companies have faced these same issues, most of them have decided to develop their own in-house solutions, missing the opportunity for collaboration and wider adoption of data architecture and software engineering best practices.
With that in mind, we created Debussy! Debussy Concert is the core component of Debussy. It's a code generation engine for orchestration tools, currently supporting only Airflow, with others on the roadmap. It provides abstraction layers in the form of a musically themed semantic model, decoupling the pipeline logic from the underlying orchestration tool and enabling a low-code approach to data engineering. We also provide pipeline templates (e.g. data ingestion, data transformation and reverse ETL) built with our engine, while always striving to apply the aforementioned best practices.
- Dynamic data pipeline generation from YAML configuration files or directly through Python
- Provides a semantic model for data pipeline development, abstracting the underlying orchestration engine
- Enables seamless integration of first-class data projects, such as Airflow, Spark, and dbt
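To make the first two features more concrete, here is a minimal sketch of the general idea, not Debussy's actual API. All class and function names (`Motif`, `Phrase`, `Composition`, `composition_from_config`, `render_task_graph`) and the sample configuration are hypothetical and only echo the musical theme. The `config` dictionary stands in for what a YAML parser would return from a configuration file. The point is that the pipeline is described without reference to any orchestrator, and a separate rendering step produces the task graph.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Motif:
    """Smallest unit of work (hypothetical name), e.g. one extract or load step."""
    name: str


@dataclass
class Phrase:
    """An ordered group of motifs forming one logical stage (hypothetical name)."""
    name: str
    motifs: List[Motif] = field(default_factory=list)


@dataclass
class Composition:
    """A whole pipeline, described without reference to any orchestrator."""
    name: str
    phrases: List[Phrase] = field(default_factory=list)


def composition_from_config(config: dict) -> Composition:
    """Build the semantic model from a parsed configuration mapping."""
    phrases = [
        Phrase(p["name"], [Motif(m) for m in p["motifs"]])
        for p in config["phrases"]
    ]
    return Composition(config["name"], phrases)


def render_task_graph(composition: Composition) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Flatten the model into task ids and linear dependency edges.

    An orchestrator-specific backend would translate these into real tasks
    (for example Airflow operators); that step is deliberately left out here.
    """
    task_ids = [
        f"{composition.name}.{phrase.name}.{motif.name}"
        for phrase in composition.phrases
        for motif in phrase.motifs
    ]
    edges = list(zip(task_ids, task_ids[1:]))
    return task_ids, edges


# Illustrative configuration only; stands in for a parsed YAML file.
config = {
    "name": "sales_ingestion",
    "phrases": [
        {"name": "extract", "motifs": ["export_table"]},
        {"name": "load", "motifs": ["load_raw", "merge_trusted"]},
    ],
}

composition = composition_from_config(config)
task_ids, edges = render_task_graph(composition)
print(task_ids)
print(edges)
```

Because only `render_task_graph` knows how tasks are laid out, swapping the orchestrator in a design like this would mean replacing that one function while the configuration and the model stay unchanged.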
✔ Reduce time to delivery and the cost of data pipeline development, while enabling higher ROI
✔ Avoid pipeline debt by following sound software engineering design principles
✔ Ensure your platform is following data architecture best practices
Debussy works on any installation of Apache Airflow 2.0, but since we currently support only GCP-based data platforms as the target Data Lakehouse, we recommend deploying to Cloud Composer.
In order to use Debussy, you first need to go through the following steps:
- Select or create a Google Cloud Platform project.
- Enable billing for your project.
- Create a Cloud Composer 2 environment.
- Install Debussy on your Cloud Composer instance: just upload the project to your plugins/ folder.
- Check our User's Guide and examples to learn how to use it!
Debussy works with the tools and systems that you're already using with your data, including:
See the Wiki for full documentation, examples, operational details and other information.
We welcome all community contributions!
In order to have a more open and welcoming community, Debussy adheres to a code of conduct adapted from Contributor Covenant.
Please read through our contributing guidelines. Included are directions for opening issues, coding standards, and notes on development.
Copyright 2022 Dotz, Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.