Design End-to-End Lineage Solution — Presentation
In the last post, I summarized the most common lineage cases and possible solutions for each. I also mentioned an approach, implemented in metadata discovery tools such as Datahub (Atlas may have implemented it as well, but I have not verified this), that provides an end-to-end lineage graph even when a dataset's history spans different systems and points in time.
In this post, I focus on lineage presentation. It is essentially a demo that answers the question: now that I have all the lineage info, what will it look like for the end user, and how much can I gain from it?
First of all, a lineage presentation should give the full picture of a dataset's (table schema's) derivation history.
In this example, the Snowflake table came from S3, and a view was then created from that table within Snowflake. Clicking a block shows the details of that table or view.
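To make the idea concrete, here is a minimal sketch of how such a derivation history could be read out of stored lineage edges. It is not how Datahub or any other tool implements it; the dataset names, the `EDGES` list, and the `upstream_history` function are all hypothetical and only mirror the S3 to table to view example.

```python
from collections import deque

# Hypothetical lineage edges, each written as (upstream, downstream).
# The names are made up to mirror the S3 -> table -> view example.
EDGES = [
    ("s3://bucket/raw/orders", "snowflake.db.orders"),
    ("snowflake.db.orders", "snowflake.db.orders_view"),
]


def upstream_history(dataset, edges):
    """Return every ancestor of `dataset`, nearest ancestor first."""
    # Index the edges by downstream dataset so parents are easy to look up.
    parents = {}
    for up, down in edges:
        parents.setdefault(down, []).append(up)

    seen = {dataset}
    history = []
    queue = deque([dataset])
    while queue:  # breadth-first walk against the edge direction
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                history.append(parent)
                queue.append(parent)
    return history


print(upstream_history("snowflake.db.orders_view", EDGES))
```

Each entry in the returned list corresponds to one block in the lineage graph that a UI would draw to the left of the selected dataset.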
From what I have seen, tools such as Datahub, Alation, and Atlas also provide a feature called Impact Analysis. Here is an example from Datahub.
In Datahub’s example, the impact analysis seems too simple: it is just another way to show how an upstream dataset affects its downstream datasets.
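Viewed this way, impact analysis is the same graph walked in the other direction. The sketch below is an illustration under my own assumptions, not Datahub's implementation: `LINEAGE`, its dataset names, and `impacted_datasets` are hypothetical. It lists every downstream dataset of a changed one, together with how many hops away it is.

```python
from collections import deque

# Hypothetical lineage, stored as upstream -> list of direct downstreams.
LINEAGE = {
    "raw_orders": ["orders"],
    "orders": ["orders_view", "daily_revenue"],
    "daily_revenue": ["finance_dashboard"],
}


def impacted_datasets(changed, lineage):
    """Map each downstream dataset to its hop distance from `changed`."""
    distance = {}
    queue = deque([(changed, 0)])
    while queue:  # breadth-first walk along the edge direction
        node, hops = queue.popleft()
        for child in lineage.get(node, []):
            if child != changed and child not in distance:
                distance[child] = hops + 1
                queue.append((child, hops + 1))
    return distance


print(impacted_datasets("orders", LINEAGE))
```

The hop distance is one simple way to rank impact: direct dependents usually need attention first, while datasets several hops away may only need a notification.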
Some other tools, notably Alation, which uses a query-log-parser-based approach, also highlight the relevant part of the SQL statement when you click one of those tables.