FAQ
Frequently asked questions about clgraph.
General
Do I need to change my SQL?
No. clgraph works with your existing SQL files. No annotations, no special syntax required.
What databases are supported?
BigQuery, Snowflake, PostgreSQL, DuckDB, Redshift, and many more. clgraph uses sqlglot for parsing, which supports 20+ SQL dialects.
Is it open source?
Yes. MIT license. View on GitHub.
Do I need to migrate my entire codebase at once?
No. You can adopt clgraph incrementally, query by query. There's no big-bang rewrite required.
Start with a few queries, see the lineage, and expand from there. As a bonus, writing lineage-friendly SQL (explicit column names, clear transformations) makes your code easier to review anyway, so migration improves code quality along the way.
Performance
Does it work with large pipelines?
Yes. Tested on 1,000+ queries and 10,000+ columns. Parse time is typically under 5 seconds.
How much memory does it use?
Memory usage scales with pipeline size. A pipeline with 1,000 columns typically uses ~50MB.
Integration
Can I use it with Airflow?
Yes. Generate Airflow DAGs automatically with pipeline.to_airflow_dag(). See Pipeline Orchestration for details.
Can I use it with dbt?
Yes. Run clgraph on dbt's compiled SQL output rather than on the templated model files.
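As a minimal sketch, assuming dbt's default output location (`dbt compile` writes rendered SQL under `target/compiled/`), the compiled files can be located like this. The helper `find_compiled_sql` is hypothetical and not part of clgraph or dbt, and the `from clgraph import Pipeline` import path in the trailing comment is an assumption; `Pipeline.from_sql_files` is the call shown under Troubleshooting.

```python
from pathlib import Path


def find_compiled_sql(project_dir: str) -> list[Path]:
    """Return compiled .sql files under <project_dir>/target/compiled, sorted by path.

    Hypothetical helper for illustration; not part of clgraph or dbt.
    """
    compiled_root = Path(project_dir) / "target" / "compiled"
    if not compiled_root.is_dir():
        # `dbt compile` has not been run yet, or the target path was customized.
        return []
    return sorted(compiled_root.rglob("*.sql"))


files = find_compiled_sql(".")
print(f"{len(files)} compiled SQL file(s) found")

# Assumption: Pipeline is importable from the top-level clgraph package.
# from clgraph import Pipeline
# pipeline = Pipeline.from_sql_files("target/compiled/", dialect="bigquery")
```

If the list comes back empty, run `dbt compile` first or check whether the project overrides the default target path.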
See Template Variables for more details.
Can I export lineage to my data catalog?
Yes. Export to JSON, CSV, or GraphViz formats:
# JSON for data catalogs
pipeline.to_json()
# CSV for spreadsheets
from clgraph.export import CSVExporter
CSVExporter.export_columns_to_file(pipeline, "columns.csv")
Features
How does column lineage work?
clgraph parses your SQL and builds a graph of column dependencies. It tracks:
- Direct column references
- Transformations (SUM, JOIN, CASE, etc.)
- Star expansion (SELECT *)
- CTE and subquery resolution
See From SQL to Lineage Graph for details.
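To make the idea of a column dependency graph concrete, here is a toy sketch in plain Python. It does not use clgraph or reflect its internals; the column names and the `trace_sources` helper are made up for illustration.

```python
from collections import defaultdict

# Toy lineage graph: each edge points from a source column to a column derived from it.
# Column names are made up for illustration.
edges = [
    ("raw.orders.amount", "staging.orders.amount"),
    ("raw.orders.user_id", "staging.orders.user_id"),
    ("staging.orders.amount", "marts.revenue.total_amount"),  # e.g. SUM(amount)
    ("staging.orders.user_id", "marts.revenue.user_id"),
]

# Index the edges by target so we can walk from a column back to its sources.
upstream = defaultdict(set)
for source, target in edges:
    upstream[target].add(source)


def trace_sources(column: str) -> set[str]:
    """Return every column that `column` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = [column]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(trace_sources("marts.revenue.total_amount")))
# ['raw.orders.amount', 'staging.orders.amount']
```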
How does metadata propagation work?
When you mark a column as PII or add tags, clgraph can propagate that metadata through the lineage graph:
pipeline.columns["raw.users.email"].pii = True
pipeline.propagate_all_metadata()
# All downstream columns now marked as PII
See Metadata from Comments for details.
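Conceptually, propagation is a walk from the flagged column to everything derived from it. The sketch below shows that walk on a toy graph in plain Python; it is an illustration of the idea, not clgraph's implementation, and the `downstream` mapping and `propagate_flag` helper are hypothetical.

```python
from collections import deque

# Toy graph: column -> columns derived from it (names are made up).
downstream = {
    "raw.users.email": ["staging.users.email"],
    "staging.users.email": ["marts.contacts.email_hash"],
    "raw.users.id": ["staging.users.id"],
}


def propagate_flag(flagged: set[str]) -> set[str]:
    """Return the flagged columns plus every column reachable downstream of them."""
    result = set(flagged)
    queue = deque(flagged)
    while queue:
        column = queue.popleft()
        for child in downstream.get(column, []):
            if child not in result:
                result.add(child)
                queue.append(child)
    return result


pii_columns = propagate_flag({"raw.users.email"})
print(sorted(pii_columns))
# ['marts.contacts.email_hash', 'raw.users.email', 'staging.users.email']
```

Columns that do not descend from a flagged column, such as `staging.users.id` here, stay unflagged.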
Can I split a pipeline into smaller pieces?
Yes. Use pipeline.split() to create sub-pipelines by sink tables.
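One plausible reading of "by sink tables" is that each sub-pipeline contains a final table plus everything needed to build it. The toy sketch below illustrates that grouping in plain Python. It is an assumption about the concept, not a description of what pipeline.split() actually returns, and the table names and `tables_for_sink` helper are made up.

```python
# Toy table-level dependencies: table -> tables it reads from (names are made up).
reads_from = {
    "staging.orders": ["raw.orders"],
    "staging.users": ["raw.users"],
    "marts.revenue": ["staging.orders"],
    "marts.contacts": ["staging.users"],
}


def tables_for_sink(sink: str) -> set[str]:
    """Return the sink plus every table it depends on, directly or transitively."""
    needed = {sink}
    stack = [sink]
    while stack:
        table = stack.pop()
        for parent in reads_from.get(table, []):
            if parent not in needed:
                needed.add(parent)
                stack.append(parent)
    return needed


# In this sketch, a sink is a table that no other table reads from.
all_parents = {p for parents in reads_from.values() for p in parents}
sinks = sorted(t for t in reads_from if t not in all_parents)

for sink in sinks:
    print(sink, "->", sorted(tables_for_sink(sink)))
```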
See Pipeline Orchestration for details.
Troubleshooting
My SQL isn't parsing correctly
- Check the dialect is correct: Pipeline.from_sql_files("sql/", dialect="bigquery")
- Verify SQL is valid in your target database
- Check for unsupported syntax (some edge cases may not be supported)
Column lineage is missing some columns
This can happen with:
- Dynamic SQL or templated queries (use the template_context parameter)
- SELECT * from external tables (clgraph can't know the schema)
- Unqualified column names in JOINs (use table aliases)
I'm getting import errors
Make sure clgraph is installed in the environment your Python interpreter is actually using.
LLM features require installing clgraph with its optional extras.
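Import errors often come from installing into one environment and running from another. A quick check with the standard library shows which interpreter is running and whether it can see the package; the `is_installed` helper is hypothetical and uses only `importlib`.

```python
import importlib.util
import sys


def is_installed(package: str) -> bool:
    """Return True if `package` can be imported by the running interpreter."""
    return importlib.util.find_spec(package) is not None


# The interpreter path reveals which virtual environment (if any) is active.
print("Interpreter:", sys.executable)
print("clgraph importable:", is_installed("clgraph"))
```

If the interpreter path is not the environment you installed into, activate the right environment or reinstall there.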
More Questions?
- GitHub Issues: Report a bug or ask a question
- Discussions: Start a discussion