Data.Rentgen is a data lineage service compatible with the OpenLineage specification.
Note: the service is under active development and is not yet ready for use.
- Collect lineage events produced by OpenLineage clients & integrations (Spark, Airflow, Flink, custom ones).
- Store events at operation granularity (rather than the job granularity used by Marquez), for finer detail.
- Provide an API for run ↔ dataset lineage, as well as parent run → child run lineage.
- Handle large volumes of lineage events, using Kafka as an event buffer and storing data in tables partitioned by event timestamp.
- This is not a data catalog. Use DataHub or OpenMetadata instead.
- Static dataset → dataset lineage (like view → table) is not supported.
- Column-level lineage is not currently supported.
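
To make the points above about parent run → child run lineage and timestamp-based partitioning more concrete, here is a minimal sketch in plain Python. It builds an OpenLineage-style run event carrying a `parent` run facet and derives a partition key from the event timestamp. The function names (`build_run_event`, `monthly_partition`), the `example` namespace, the producer URL, and the monthly partition granularity are all assumptions made for illustration; they are not taken from Data.Rentgen itself, which may structure and partition its data differently.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical producer URI, used only for this illustration.
PRODUCER = "https://example.com/hypothetical-producer"


def build_run_event(job_name, event_time, parent_run_id=None, parent_job_name=None):
    """Build a dict shaped like an OpenLineage RunEvent (illustrative only)."""
    run_facets = {}
    if parent_run_id is not None:
        # The "parent" run facet links a child run to the run that started it.
        run_facets["parent"] = {
            "_producer": PRODUCER,
            # Schema URLs are illustrative; use the version your client emits.
            "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/ParentRunFacet.json",
            "run": {"runId": parent_run_id},
            "job": {"namespace": "example", "name": parent_job_name},
        }
    return {
        "eventType": "START",
        "eventTime": event_time.isoformat(),
        "producer": PRODUCER,
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4()), "facets": run_facets},
        "job": {"namespace": "example", "name": job_name},
        "inputs": [{"namespace": "example", "name": "raw.orders"}],
        "outputs": [{"namespace": "example", "name": "mart.orders"}],
    }


def monthly_partition(event_time_iso):
    """Map an ISO 8601 event timestamp to a partition key such as '2024_05'.

    Monthly granularity is an assumption; the real service may differ.
    """
    ts = datetime.fromisoformat(event_time_iso).astimezone(timezone.utc)
    return f"{ts.year:04d}_{ts.month:02d}"


parent_id = str(uuid.uuid4())
event = build_run_event(
    "etl.load_orders",
    datetime(2024, 5, 17, 10, 30, tzinfo=timezone.utc),
    parent_run_id=parent_id,
    parent_job_name="etl_dag",
)
partition = monthly_partition(event["eventTime"])
```

In this sketch, the child run's event points at its parent through `run.facets.parent`, and the partition key is computed from `eventTime` alone, so events can be routed to a partition without inspecting the rest of the payload.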