Visualization API

Functions for creating GraphViz visualizations of pipeline lineage.

from clgraph import (
    visualize_pipeline_lineage,
    visualize_table_dependencies,
    visualize_table_dependencies_with_levels,
    visualize_column_lineage,
    visualize_lineage_path,
    visualize_column_path,
)

# Quick example (assumes `pipeline` is an existing Pipeline instance)
dot = visualize_pipeline_lineage(pipeline.column_graph)
print(f"Generated DOT with {len(dot.source)} characters")

Pipeline Visualization

visualize_pipeline_lineage

Visualize column lineage across the entire pipeline.

visualize_pipeline_lineage(
    graph: PipelineLineageGraph,
    max_columns: int = 200,
    return_debug_info: bool = False
) -> graphviz.Digraph | tuple[graphviz.Digraph, dict]

Parameters:

- graph: The pipeline's column lineage graph (pipeline.column_graph)
- max_columns: Maximum number of columns to display (default: 200)
- return_debug_info: If True, returns a (graphviz.Digraph, dict) tuple that includes debug info

Returns: graphviz.Digraph object, or a (graphviz.Digraph, dict) tuple if return_debug_info=True

Example:

from clgraph import Pipeline, visualize_pipeline_lineage

queries = [
    ("staging", "CREATE TABLE staging AS SELECT id, name FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")

# Create visualization
dot = visualize_pipeline_lineage(pipeline.column_graph)

# Save to file
with open("lineage.dot", "w") as f:
    f.write(dot.source)

print(f"Created visualization: {len(dot.source)} chars")
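The save step in this example recurs throughout the page, so it can be factored into a small helper. This is a minimal sketch that assumes only what the examples show, namely that the returned object exposes a `.source` string. `save_dot` is a hypothetical name and not part of clgraph, and the `SimpleNamespace` below merely stands in for a real `graphviz.Digraph`.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_dot(dot, path):
    """Write dot.source to path as UTF-8 and return the number of characters."""
    source = dot.source
    Path(path).write_text(source, encoding="utf-8")
    return len(source)


# Stand-in for the object returned by a visualize_* function (assumption).
stand_in = SimpleNamespace(source="digraph { staging -> output }")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "lineage.dot"
    written = save_dot(stand_in, target)
    print(f"Wrote {written} characters to {target.name}")
```

With a real result, the call would be `save_dot(dot, "lineage.dot")`.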

With debug info:

from clgraph import Pipeline, visualize_pipeline_lineage

queries = [
    ("staging", "CREATE TABLE staging AS SELECT id, name FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")

dot, debug_info = visualize_pipeline_lineage(
    pipeline.column_graph,
    return_debug_info=True
)

print(f"Columns displayed: {debug_info['columns_displayed']}")
print(f"Edges displayed: {debug_info['edges_displayed']}")
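One possible use of the debug info is to flag when the displayed column count has reached `max_columns`. The sketch below relies only on the `columns_displayed` key shown above. `reached_column_limit` is a hypothetical helper, and treating "count equals the limit" as a sign that columns may have been omitted is an assumption about the library's behavior, not documented fact.

```python
def reached_column_limit(debug_info, max_columns=200):
    """Return True when the number of displayed columns has reached max_columns."""
    return debug_info["columns_displayed"] >= max_columns


# Made-up values for illustration only; real ones come from return_debug_info=True.
sample = {"columns_displayed": 200, "edges_displayed": 340}

if reached_column_limit(sample):
    print("Column limit reached; consider raising max_columns.")
```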


Table Dependency Visualization

visualize_table_dependencies

Visualize table-level dependencies as a DAG.

visualize_table_dependencies(
    table_graph: TableDependencyGraph
) -> graphviz.Digraph

Parameters:

- table_graph: The pipeline's table dependency graph (pipeline.table_graph)

Returns: graphviz.Digraph object

Example:

from clgraph import Pipeline, visualize_table_dependencies

queries = [
    ("staging", "CREATE TABLE staging AS SELECT id FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")

dot = visualize_table_dependencies(pipeline.table_graph)

with open("tables.dot", "w") as f:
    f.write(dot.source)

print("Table dependency graph created")

visualize_table_dependencies_with_levels

Visualize table dependencies grouped by execution level, showing which tables can be built in parallel.

visualize_table_dependencies_with_levels(
    table_graph: TableDependencyGraph,
    pipeline: Pipeline
) -> graphviz.Digraph

Parameters:

- table_graph: The pipeline's table dependency graph
- pipeline: The Pipeline object (needed for execution levels)

Returns: graphviz.Digraph object with tables grouped by execution level

Example:

from clgraph import Pipeline, visualize_table_dependencies_with_levels

queries = [
    ("staging", "CREATE TABLE staging AS SELECT id FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")

dot = visualize_table_dependencies_with_levels(pipeline.table_graph, pipeline)

with open("tables_levels.dot", "w") as f:
    f.write(dot.source)

print("Table dependency graph with levels created")
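To make the idea of execution levels concrete, the sketch below groups tables so that each level depends only on earlier levels, which means tables within a level could run in parallel. It uses the standard-library `graphlib` module and a plain dict of dependencies. It is not necessarily how clgraph computes its levels, and `execution_levels` and the `deps` mapping are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_levels(dependencies):
    """Group tables into levels; a table only depends on tables in earlier levels.

    dependencies maps each table to the set of tables it reads from.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tables whose inputs are all done
        levels.append(ready)
        sorter.done(*ready)
    return levels


# Hypothetical dependencies: output reads from two independent staging tables.
deps = {
    "staging": {"raw.users"},
    "orders": {"raw.orders"},
    "output": {"staging", "orders"},
}

for level, tables in enumerate(execution_levels(deps)):
    print(level, tables)
```

Here `staging` and `orders` land in the same level because neither reads from the other.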


Single-Query Visualization

visualize_column_lineage

Visualize column lineage for a single query.

visualize_column_lineage(
    graph: ColumnLineageGraph,
    max_nodes: int = 100
) -> graphviz.Digraph

Parameters:

- graph: A single query's column lineage graph (pipeline.query_graphs["query_id"])
- max_nodes: Maximum nodes to display (default: 100)

Returns: graphviz.Digraph object

Example:

from clgraph import Pipeline, visualize_column_lineage

queries = [
    ("my_query", "CREATE TABLE output AS SELECT id, UPPER(name) as name FROM source"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")

# Get single query graph
query_graph = pipeline.query_graphs["my_query"]

dot = visualize_column_lineage(query_graph)

with open("query_lineage.dot", "w") as f:
    f.write(dot.source)

print("Single query lineage created")


Lineage Path Visualization

visualize_lineage_path

Visualize a traced lineage path (from backward or forward tracing).

visualize_lineage_path(
    nodes: list[ColumnNode],
    edges: list[ColumnEdge],
    is_backward: bool = True
) -> graphviz.Digraph

Parameters:

- nodes: List of nodes from trace_column_backward_full() or trace_column_forward_full()
- edges: List of edges from the same tracing method
- is_backward: True for backward trace, False for forward trace (affects styling)

Returns: graphviz.Digraph object

Example:

from clgraph import Pipeline, visualize_lineage_path

queries = [
    ("staging", "CREATE TABLE staging AS SELECT id, amount FROM raw.orders"),
    ("output", "CREATE TABLE output AS SELECT SUM(amount) as total FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")

# Backward trace
nodes, edges = pipeline.trace_column_backward_full("output", "total")
dot = visualize_lineage_path(nodes, edges, is_backward=True)

with open("backward_trace.dot", "w") as f:
    f.write(dot.source)

# Forward trace
nodes, edges = pipeline.trace_column_forward_full("raw.orders", "amount")
dot = visualize_lineage_path(nodes, edges, is_backward=False)

with open("forward_trace.dot", "w") as f:
    f.write(dot.source)

print("Lineage path visualizations created")
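For intuition about what a backward trace returns, the following sketch walks a plain edge list upstream from a target column and collects the nodes and edges it passes. It is a conceptual illustration only: strings and `(source, destination)` tuples stand in for `ColumnNode` and `ColumnEdge`, the column names are assumed, and `trace_backward` is a hypothetical function rather than clgraph's implementation.

```python
from collections import deque


def trace_backward(edges, target):
    """Collect every node and edge upstream of target.

    edges is a list of (source, destination) pairs.
    """
    # Index edges by destination so upstream neighbors are quick to look up.
    incoming = {}
    for src, dst in edges:
        incoming.setdefault(dst, []).append((src, dst))

    seen = {target}
    nodes, path_edges = [target], []
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for src, dst in incoming.get(current, []):
            path_edges.append((src, dst))
            if src not in seen:
                seen.add(src)
                nodes.append(src)
                queue.append(src)
    return nodes, path_edges


# Assumed column-level edges mirroring the two queries in the example above.
edges = [
    ("raw.orders.id", "staging.id"),
    ("raw.orders.amount", "staging.amount"),
    ("staging.amount", "output.total"),
]

nodes, path = trace_backward(edges, "output.total")
print(nodes)
print(path)
```

The `id` columns are absent from the result because they do not feed `output.total`. A forward trace would index edges by source instead.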

visualize_column_path

Visualize the path to a specific column.

visualize_column_path(
    graph: ColumnLineageGraph | PipelineLineageGraph,
    column_name: str
) -> graphviz.Digraph

Parameters:

- graph: Either a single query graph or pipeline column graph
- column_name: Full column name (e.g., "table.column")

Returns: graphviz.Digraph object

Example:

from clgraph import Pipeline, visualize_column_path

queries = [
    ("staging", "CREATE TABLE staging AS SELECT id FROM raw.users"),
    ("output", "CREATE TABLE output AS SELECT id FROM staging"),
]
pipeline = Pipeline.from_tuples(queries, dialect="bigquery")

dot = visualize_column_path(pipeline.column_graph, "output.id")

with open("column_path.dot", "w") as f:
    f.write(dot.source)

print("Column path visualization created")


Rendering DOT Files

All visualization functions return graphviz.Digraph objects. The examples above write dot.source to a .dot file, which can be converted to an image in either of the following ways:

GraphViz CLI

# PNG (raster)
dot -Tpng lineage.dot -o lineage.png

# SVG (vector, recommended for web)
dot -Tsvg lineage.dot -o lineage.svg

# PDF (vector, for documents)
dot -Tpdf lineage.dot -o lineage.pdf
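The same commands can be driven from Python when rendering is part of a script. The sketch below builds the `dot -T<format>` argument list shown above and runs it only if a `dot` executable is found on PATH. `dot_command` and `render_with_cli` are hypothetical helper names, not part of clgraph or GraphViz.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(dot_file, fmt="svg"):
    """Build the argument list for `dot -T<fmt> <input> -o <output>`."""
    output = Path(dot_file).with_suffix(f".{fmt}")
    return ["dot", f"-T{fmt}", str(dot_file), "-o", str(output)]


def render_with_cli(dot_file, fmt="svg"):
    """Run GraphViz's dot if it is installed; return the output path or None."""
    if shutil.which("dot") is None:
        return None  # GraphViz is not on PATH
    command = dot_command(dot_file, fmt)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(command[-1])


print(dot_command("lineage.dot", "png"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.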

Python graphviz library

# Render directly to lineage.png (requires the GraphViz system package);
# cleanup=True deletes the intermediate DOT source file after rendering
dot.render("lineage", format="png", cleanup=True)

# Or get SVG string
svg_content = dot.pipe(format="svg").decode("utf-8")
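Since SVG is the format suggested for the web, one way to use the SVG string is to embed it inline in an HTML page. The sketch below drops the XML prolog that GraphViz emits before the `<svg>` element and wraps the rest in a minimal page. `svg_to_html` is a hypothetical helper, and the sample string is a stand-in for real `dot.pipe(format="svg")` output.

```python
import html


def svg_to_html(svg_content, title="Lineage"):
    """Wrap an SVG document in a minimal HTML page, dropping any XML prolog."""
    start = svg_content.find("<svg")
    if start == -1:
        raise ValueError("no <svg> element found")
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        "<body>\n"
        f"{svg_content[start:]}\n"
        "</body>\n"
        "</html>\n"
    )


# Stand-in SVG; real content would come from dot.pipe(format="svg").decode("utf-8").
sample_svg = '<?xml version="1.0"?>\n<svg xmlns="http://www.w3.org/2000/svg"></svg>'
page = svg_to_html(sample_svg, title="Pipeline lineage")
print(page)
```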

Function Reference

| Function | Input | Purpose |
|---|---|---|
| visualize_pipeline_lineage() | pipeline.column_graph | Full pipeline column lineage |
| visualize_table_dependencies() | pipeline.table_graph | Table-level DAG |
| visualize_table_dependencies_with_levels() | table_graph, pipeline | Table DAG with execution levels |
| visualize_column_lineage() | pipeline.query_graphs[id] | Single query column lineage |
| visualize_lineage_path() | nodes, edges | Traced path visualization |
| visualize_column_path() | graph, column_name | Path to specific column |