Visualization API
Functions for creating GraphViz visualizations of pipeline lineage.
from clgraph import (
    visualize_pipeline_lineage,
    visualize_table_dependencies,
    visualize_table_dependencies_with_levels,
    visualize_column_lineage,
    visualize_lineage_path,
    visualize_column_path,
)
# Quick example (assumes `pipeline` is an existing Pipeline object)
dot = visualize_pipeline_lineage(pipeline.column_graph)
print(f"Generated DOT with {len(dot.source)} characters")
Pipeline Visualization
visualize_pipeline_lineage
Visualize column lineage across the entire pipeline.
visualize_pipeline_lineage(
    graph: PipelineLineageGraph,
    max_columns: int = 200,
    return_debug_info: bool = False
) -> graphviz.Digraph | tuple[graphviz.Digraph, dict]
Parameters:
- graph: The pipeline's column lineage graph (pipeline.column_graph)
- max_columns: Maximum columns to display (default: 200)
- return_debug_info: If True, returns a (graphviz.Digraph, dict) tuple whose second element holds debug info (default: False)
Returns: graphviz.Digraph object, or a (graphviz.Digraph, dict) tuple if return_debug_info=True
Example:
from clgraph import Pipeline, visualize_pipeline_lineage
queries = [
    ("staging", "CREATE TABLE staging AS SELECT id, name FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")
# Create visualization
dot = visualize_pipeline_lineage(pipeline.column_graph)
# Save to file
with open("lineage.dot", "w") as f:
    f.write(dot.source)
print(f"Created visualization: {len(dot.source)} chars")
With debug info:
from clgraph import Pipeline, visualize_pipeline_lineage
queries = [
    ("staging", "CREATE TABLE staging AS SELECT id, name FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")
dot, debug_info = visualize_pipeline_lineage(
    pipeline.column_graph,
    return_debug_info=True
)
print(f"Columns displayed: {debug_info['columns_displayed']}")
print(f"Edges displayed: {debug_info['edges_displayed']}")
Table Dependency Visualization
visualize_table_dependencies
Visualize table-level dependencies as a DAG.
Parameters:
- table_graph: The pipeline's table dependency graph (pipeline.table_graph)
Returns: graphviz.Digraph object
Example:
from clgraph import Pipeline, visualize_table_dependencies
queries = [
    ("staging", "CREATE TABLE staging AS SELECT id FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")
dot = visualize_table_dependencies(pipeline.table_graph)
with open("tables.dot", "w") as f:
    f.write(dot.source)
print("Table dependency graph created")
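To make the DOT output more concrete, the following is a minimal sketch of how a table-level DAG can be written out as DOT text by hand. It does not use clgraph and does not reproduce its styling; the deps mapping and the to_dot helper are hypothetical names introduced only for illustration.

```python
def to_dot(deps: dict[str, list[str]]) -> str:
    """Render a {table: [upstream tables]} mapping as DOT source text."""
    lines = ["digraph tables {", "  rankdir=LR;"]
    for table in sorted(deps):
        for upstream in sorted(deps[table]):
            # Edges point from the upstream table to the table built from it.
            lines.append(f'  "{upstream}" -> "{table}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical dependencies mirroring the two-query example above.
deps = {"staging": ["raw.users"], "output": ["staging"]}
dot_text = to_dot(deps)
print(dot_text)
```

Text in this shape is what ends up in a .dot file, which is why dot.source can be written to disk and rendered later.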
visualize_table_dependencies_with_levels
Visualize table dependencies grouped by execution level, showing which tables can be built in parallel.
visualize_table_dependencies_with_levels(
    table_graph: TableDependencyGraph,
    pipeline: Pipeline
) -> graphviz.Digraph
Parameters:
- table_graph: The pipeline's table dependency graph
- pipeline: The Pipeline object (needed for execution levels)
Returns: graphviz.Digraph object with tables grouped by execution level
Example:
from clgraph import Pipeline, visualize_table_dependencies_with_levels
queries = [
    ("staging", "CREATE TABLE staging AS SELECT id FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")
dot = visualize_table_dependencies_with_levels(pipeline.table_graph, pipeline)
with open("tables_levels.dot", "w") as f:
    f.write(dot.source)
print("Table dependency graph with levels created")
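For intuition about what an execution level is, here is a rough sketch that groups tables into levels with a simple topological pass. This is an assumption about the general technique, not clgraph's actual implementation; execution_levels and the deps mapping are hypothetical.

```python
def execution_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tables into levels; tables within one level do not depend on each other."""
    # Only tables produced by the pipeline need scheduling; external sources
    # such as raw.users are treated as already available.
    remaining = {
        table: {u for u in upstream if u in deps}
        for table, upstream in deps.items()
    }
    levels: list[list[str]] = []
    while remaining:
        # A table is ready once all of its in-pipeline upstreams are done.
        ready = sorted(t for t, upstream in remaining.items() if not upstream)
        if not ready:
            raise ValueError("cycle detected in table dependencies")
        levels.append(ready)
        for table in ready:
            del remaining[table]
        for upstream in remaining.values():
            upstream.difference_update(ready)
    return levels


# Hypothetical pipeline: two staging tables feed one output table.
deps = {
    "staging_users": {"raw.users"},
    "staging_orders": {"raw.orders"},
    "output": {"staging_users", "staging_orders"},
}
levels = execution_levels(deps)
print(levels)
```

In this sketch the two staging tables land in the same level, so they could run in parallel, while output has to wait for both.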
Single-Query Visualization
visualize_column_lineage
Visualize column lineage for a single query.
Parameters:
- graph: A single query's column lineage graph (pipeline.query_graphs["query_id"])
- max_nodes: Maximum nodes to display (default: 100)
Returns: graphviz.Digraph object
Example:
from clgraph import Pipeline, visualize_column_lineage
queries = [
    ("my_query", "CREATE TABLE output AS SELECT id, UPPER(name) as name FROM source"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")
# Get single query graph
query_graph = pipeline.query_graphs["my_query"]
dot = visualize_column_lineage(query_graph)
with open("query_lineage.dot", "w") as f:
    f.write(dot.source)
print("Single query lineage created")
Lineage Path Visualization
visualize_lineage_path
Visualize a traced lineage path (from backward or forward tracing).
visualize_lineage_path(
    nodes: list[ColumnNode],
    edges: list[ColumnEdge],
    is_backward: bool = True
) -> graphviz.Digraph
Parameters:
- nodes: List of nodes from trace_column_backward_full() or trace_column_forward_full()
- edges: List of edges from the same tracing method
- is_backward: True for backward trace, False for forward trace (affects styling)
Returns: graphviz.Digraph object
Example:
from clgraph import Pipeline, visualize_lineage_path
queries = [
    ("staging", "CREATE TABLE staging AS SELECT id, amount FROM raw.orders"),
    ("output", "CREATE TABLE output AS SELECT SUM(amount) as total FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")
# Backward trace
nodes, edges = pipeline.trace_column_backward_full("output", "total")
dot = visualize_lineage_path(nodes, edges, is_backward=True)
with open("backward_trace.dot", "w") as f:
    f.write(dot.source)
# Forward trace
nodes, edges = pipeline.trace_column_forward_full("raw.orders", "amount")
dot = visualize_lineage_path(nodes, edges, is_backward=False)
with open("forward_trace.dot", "w") as f:
    f.write(dot.source)
print("Lineage path visualizations created")
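The difference between a backward and a forward trace can be illustrated on a plain edge list. The sketch below is a generic breadth-first walk, not clgraph's tracing code; EDGES and trace are hypothetical, and the column names simply mirror the example above.

```python
from collections import deque

# Hypothetical column-level edges as (source, target) pairs.
EDGES = [
    ("raw.orders.amount", "staging.amount"),
    ("raw.orders.id", "staging.id"),
    ("staging.amount", "output.total"),
]


def trace(start: str, edges: list[tuple[str, str]], backward: bool = True):
    """Return (nodes, edges) reachable from start, walking upstream or downstream."""
    seen = [start]
    used = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for src, dst in edges:
            # Backward follows edges that end at current; forward follows
            # edges that start at current.
            here, nxt = (dst, src) if backward else (src, dst)
            if here != current:
                continue
            used.append((src, dst))
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen, used


back_nodes, back_edges = trace("output.total", EDGES, backward=True)
fwd_nodes, fwd_edges = trace("raw.orders.amount", EDGES, backward=False)
print(back_nodes)
print(fwd_nodes)
```

Both walks return only the nodes and edges on the traced path, which is the shape of input that visualize_lineage_path expects; raw.orders.id never appears because it does not feed output.total.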
visualize_column_path
Visualize the path to a specific column.
visualize_column_path(
    graph: ColumnLineageGraph | PipelineLineageGraph,
    column_name: str
) -> graphviz.Digraph
Parameters:
- graph: Either a single query graph or pipeline column graph
- column_name: Full column name (e.g., "table.column")
Returns: graphviz.Digraph object
Example:
from clgraph import Pipeline, visualize_column_path
queries = [
    ("staging", "CREATE TABLE staging AS SELECT id FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")
dot = visualize_column_path(pipeline.column_graph, "output.id")
with open("column_path.dot", "w") as f:
    f.write(dot.source)
print("Column path visualization created")
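Because column_name is a single "table.column" string while the tracing methods take the table and column separately, a small helper for converting between the two forms can be handy. This is a sketch under the assumption that the column is always the part after the last dot (so a table such as raw.orders keeps its own dot); both helper names are hypothetical and not part of clgraph.

```python
def full_column_name(table: str, column: str) -> str:
    """Join a table name and a column name into the "table.column" form."""
    return f"{table}.{column}"


def split_column_name(full_name: str) -> tuple[str, str]:
    """Split "table.column" at the last dot, so "raw.orders.amount" keeps its schema."""
    table, sep, column = full_name.rpartition(".")
    if not sep or not table or not column:
        raise ValueError(f"expected 'table.column', got {full_name!r}")
    return table, column


print(full_column_name("output", "id"))
print(split_column_name("raw.orders.amount"))
```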
Rendering DOT Files
All visualization functions return graphviz.Digraph objects. Convert them to images using either of the following:
GraphViz CLI
# PNG (raster)
dot -Tpng lineage.dot -o lineage.png
# SVG (vector, recommended for web)
dot -Tsvg lineage.dot -o lineage.svg
# PDF (vector, for documents)
dot -Tpdf lineage.dot -o lineage.pdf
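The same CLI calls can also be scripted from Python. The following is a minimal sketch using only the standard library; dot_command and render are hypothetical helper names, and rendering is skipped when the dot executable is not on PATH.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def dot_command(dot_file: str, fmt: str = "svg") -> list[str]:
    """Build the argument list for `dot -T<fmt> <file> -o <output>`."""
    output = Path(dot_file).with_suffix(f".{fmt}")
    return ["dot", f"-T{fmt}", dot_file, "-o", str(output)]


def render(dot_file: str, fmt: str = "svg") -> Optional[Path]:
    """Run the Graphviz CLI if it is installed; return the output path, else None."""
    if shutil.which("dot") is None:
        return None  # Graphviz is not on PATH, so nothing is rendered.
    cmd = dot_command(dot_file, fmt)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


print(dot_command("lineage.dot", "png"))
```

Building the command as a list instead of a shell string avoids quoting problems when file names contain spaces.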
Python graphviz library
# Render directly to lineage.png (requires the Graphviz system package to be installed)
dot.render("lineage", format="png", cleanup=True)
# Or get SVG string
svg_content = dot.pipe(format="svg").decode("utf-8")
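If the rendered output should go straight to a file without an intermediate .dot file, the bytes from pipe() can be written directly. A small sketch, assuming dot is any object with a graphviz-style pipe(format=...) method that returns bytes; save_rendered is a hypothetical helper name.

```python
from pathlib import Path


def save_rendered(dot, path: str, fmt: str = "svg") -> int:
    """Write the bytes from dot.pipe(format=fmt) to path and return the byte count."""
    # graphviz returns bytes from pipe() when no encoding argument is given.
    data = dot.pipe(format=fmt)
    return Path(path).write_bytes(data)


# Usage, given a Digraph returned by one of the visualize_* functions:
#   save_rendered(dot, "lineage.svg")
#   save_rendered(dot, "lineage.png", fmt="png")
```

Like render(), this still needs the Graphviz system package, because pipe() invokes the dot executable under the hood.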
Function Reference
| Function | Input | Purpose |
|---|---|---|
| visualize_pipeline_lineage() | pipeline.column_graph | Full pipeline column lineage |
| visualize_table_dependencies() | pipeline.table_graph | Table-level DAG |
| visualize_table_dependencies_with_levels() | table_graph, pipeline | Table DAG with execution levels |
| visualize_column_lineage() | pipeline.query_graphs[id] | Single query column lineage |
| visualize_lineage_path() | nodes, edges | Traced path visualization |
| visualize_column_path() | graph, column_name | Path to specific column |