Why Column Lineage Should Be Built-In, Not Bolted-On
In my first few data engineering jobs, I spent hours, sometimes days, debugging data quality issues. My workflow was always the same: dig through SQL queries, check the information schema, scan query logs, and try to piece together what went wrong. For impact analysis, I'd do the same thing in reverse: trace forward through the code to see what might break.
I found myself drawing diagrams on paper to visualize the relationships between columns, sometimes within a single query with complex CTEs, sometimes across multiple queries in a pipeline. These hand-drawn graphs were the only way I could keep track of how data flowed.