Adding information about Open Data Discovery (ODD) integration #857

Open
wants to merge 1 commit into
base: main


RamanDamayeu

Adding information about Open Data Discovery (ODD) to the list of "Tools integrating with Airflow". The integration uses Airflow's Listeners capability and is implemented in https://github.com/opendatadiscovery/odd-airflow-2

@potiuk
Member

potiuk commented Aug 29, 2023

Could you change it to a link to the GitHub project? The link you proposed does not refer to Airflow in any way, and it's unclear why it would be mentioned on Airflow's ecosystem page.

@RamanDamayeu
Author

Thanks, very valid point!

Of course, I left a link to the implementation of the ODD integration with Airflow in a comment, but I agree that it would not be easy for users to understand how exactly this platform is connected with Airflow.

Please advise how best to proceed here. The bottom line is that ODD consists of several components, the main one being the platform itself. Here is a link to a repository with it.

Integrations with the platform (in effect, the components that mediate between external systems and the platform itself) are implemented in one of two ways. Collectors implement the pull approach: the components themselves connect to external systems and collect metadata at some interval. For example, there are collectors for a set of databases implemented here, collectors for some AWS services, and separate ones for GCP, Azure, etc., each with its own repository. Adapters implement the push approach by extending the functionality of an external system. That is the case for Airflow: this is a repository for versions up to 1.10.15, and for versions >= 2.5.1 here is a new repo, which uses Listeners for the integration. The same approach is used to integrate Great Expectations, dbt, and Spark. The push/pull approach is described here.
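To make the pull/push distinction concrete, here is a minimal sketch in plain Python. Every name in it (`Platform`, `Collector`, `Adapter`, `on_task_success`, the sample metadata) is hypothetical and chosen for illustration; none of it is ODD's actual API or the Airflow Listeners interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Platform:
    """Hypothetical stand-in for a metadata platform (not ODD's real API)."""
    entities: List[Dict[str, str]] = field(default_factory=list)

    def ingest(self, entity: Dict[str, str]) -> None:
        self.entities.append(entity)


class Collector:
    """Pull approach: the collector goes to the source system itself."""

    def __init__(self, platform: Platform,
                 fetch_metadata: Callable[[], List[Dict[str, str]]]) -> None:
        self.platform = platform
        self.fetch_metadata = fetch_metadata

    def run_once(self) -> int:
        # One polling cycle; a scheduler would call this periodically.
        items = self.fetch_metadata()
        for item in items:
            self.platform.ingest(item)
        return len(items)


class Adapter:
    """Push approach: the host system calls the adapter when an event occurs."""

    def __init__(self, platform: Platform) -> None:
        self.platform = platform

    def on_task_success(self, dag_id: str, task_id: str) -> None:
        # Would be invoked from the host system's event hook
        # (for Airflow, something like a listener callback).
        self.platform.ingest({
            "source": "airflow",
            "dag_id": dag_id,
            "task_id": task_id,
            "state": "success",
        })


platform = Platform()

# Pull: the collector fetches metadata from a (fake) database catalog.
collector = Collector(platform, lambda: [{"source": "postgres", "table": "orders"}])
collector.run_once()

# Push: the host system reports an event through the adapter.
Adapter(platform).on_task_success("daily_etl", "load_orders")
```

In this sketch the only difference is who initiates the transfer: the collector decides when to fetch, while the adapter reacts to events raised inside the external system.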

An overview of the architecture can be found here: https://docs.opendatadiscovery.org/architecture

Having said all this, there is a whole set of repositories, and integrating with Airflow requires at least two of them: the platform itself and the adapter. Since we need to leave a single link on the page, should I change it to the adapter for Airflow versions >= 2.5.1, to the platform, or, to keep it simple, to the ODD GitHub organization?

@potiuk
Member

potiuk commented Aug 30, 2023

I think it's best to link to one of your GitHub repos and have a README there explaining how to integrate with Airflow.

2 participants