It was one of those quiet evenings when every human questions the meaning of life, and their pipelines, of course :). I'd been debugging a transformation script that had broken for the fifth time that week. The problem wasn't the data; it was the process.
That's when I stumbled upon dbt (Data Build Tool), not a new ETL tool, but a framework that thinks like a software engineer. SQL models, modularity, testing, lineage. Suddenly, transformation felt… elegant.
If you’ve ever written SQL transformations in a warehouse and wondered why everything feels disconnected — dbt is the missing link.
dbt is a data transformation framework that turns your warehouse into a software development environment. You define models as .sql files, manage dependencies with ref(), run automated tests, and version your transformations — all in Git.
Think of dbt as software engineering for SQL: version-controlled models, explicit dependencies, and automated tests, all living in your warehouse.
In essence, Bronze, Silver and Gold aren't just a naming convention — they're a design philosophy. dbt lets you build, test, and document each layer with confidence.
Like every good story, this one starts with chaos.
The Bronze layer is where raw data lands — unfiltered, imperfect, and full of potential.
In my project, the Bronze models ingested four datasets: orders, products, users, and reviews.
```sql
-- models/bronze/bronze_orders.sql
{{ config(materialized='table') }}

SELECT
    order_id,
    user_id,
    product_id,
    quantity,
    unit_price,
    order_date,
    current_timestamp() AS ingestion_timestamp
FROM {{ source('raw', 'orders') }}
```
This stage is all about fidelity over perfection — preserving every record from the source system. If data were gold ore, the Bronze layer is the mining site.
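For the Bronze models to call source() as above, the raw tables need to be declared once in a sources file. Here is a minimal sketch; the file path and schema name are assumptions inferred from the {{ source('raw', 'orders') }} reference and the four datasets listed earlier:

```yaml
# models/bronze/_sources.yml -- illustrative; names assumed from the model above
version: 2

sources:
  - name: raw
    schema: raw            # assumed landing schema in the warehouse
    tables:
      - name: orders
      - name: products
      - name: users
      - name: reviews
```

With this in place, dbt's lineage graph also starts at the source tables, not just at the Bronze models.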
The Silver layer is where the alchemy begins. Here, the goal is to clean, standardize, and enrich — turning raw ore into polished data assets.
```sql
-- models/silver/silver_orders.sql
{{ config(materialized='table') }}

SELECT
    o.order_id,
    o.user_id,
    o.product_id,
    o.quantity,
    o.unit_price,
    (o.quantity * o.unit_price) AS order_amount,
    u.country,
    p.category,
    p.brand
FROM {{ ref('bronze_orders') }} o
JOIN {{ ref('bronze_users') }} u ON o.user_id = u.user_id
JOIN {{ ref('bronze_products') }} p ON o.product_id = p.product_id
```
This layer introduces business logic — standardizing currency, validating foreign keys, and computing new metrics.
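The foreign-key validation mentioned above can be expressed declaratively with dbt's built-in generic tests. A sketch, assuming the column names from the silver_orders model (the exact file layout is my choice, not from the project):

```yaml
# models/silver/_silver.yml -- generic-test sketch
version: 2

models:
  - name: silver_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: user_id
        tests:
          # every order must point at a known user
          - relationships:
              to: ref('bronze_users')
              field: user_id
```

dbt compiles each of these one-liners into a SQL query that returns violating rows, so a failing test pinpoints the bad records.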
I even used a simple unit test in dbt to validate calculations:
```yaml
# models/silver/_silver.yml
unit_tests:
  - name: test_order_amount_calc
    model: silver_orders
    given:
      - input: ref('bronze_orders')
        format: csv
        rows: |
          order_id,quantity,unit_price
          1,10,100.00
          2,2,5.00
    expect:
      format: csv
      rows: |
        order_id,order_amount
        1,1000.00
        2,10.00
```
Testing is where dbt shines — imagine asserting your data logic before it breaks production.
By the time data reaches the Gold layer, it has survived the turbulence of the Bronze lake and the refinements of the Silver refinery.
Now, it’s ready to speak business.
In the Medallion Architecture, the Gold layer is the semantic and analytics layer where data is no longer raw or transactional but aggregated, curated, and meaningful. It’s here that analysts, decision-makers, and BI tools finally meet the story your data has been trying to tell.
While the Silver layer focuses on conformance and cleansing, the Gold layer is where business value is defined.
This layer answers questions like: How much revenue did we earn each day? How many orders were placed? How satisfied are our customers?
These are not raw metrics — they are insights, built from Silver tables that have already handled all the data quality, relationships, and enrichments.
My first Gold model, gold_sales__daily.sql, takes transaction-level data from silver_orders and aggregates it into daily revenue facts.
```sql
-- models/gold/gold_sales__daily.sql
{{ config(materialized='view') }}

SELECT
    order_date,
    SUM(order_amount) AS total_sales,
    COUNT(DISTINCT order_id) AS total_orders
FROM {{ ref('silver_orders') }}
GROUP BY order_date
```
Each record in this table represents the business pulse for a single day: total revenue, number of orders, and sales volume. This model is perfect for trend analysis, daily dashboards, and KPI tracking in tools like Power BI, Preset, or Tableau.
Under the hood, dbt automatically understands dependencies via the ref('silver_orders') function — ensuring this model only runs after silver_orders has successfully built.
The second Gold model, gold_avg_rating__daily.sql, captures how customers feel about their purchases — a perfect complement to sales metrics.
```sql
-- models/gold/gold_avg_rating__daily.sql
{{ config(materialized='view') }}

SELECT
    review_date,
    AVG(rating) AS avg_rating,
    COUNT(review_id) AS total_reviews
FROM {{ ref('silver_reviews') }}
GROUP BY review_date
```
Here, I’m transforming individual review events into a daily sentiment summary, providing an at-a-glance view of customer satisfaction trends.
Like the Silver layer, the Gold layer includes unit tests to verify that business logic holds even as upstream data changes.
```yaml
# models/gold/_gold.yml
version: 2

unit_tests:
  - name: test_gold_avg_rating
    model: gold_avg_rating__daily
    given:
      - input: ref('silver_reviews')
        format: csv
        rows: |
          review_id,review_date,rating
          1,2018-01-04,5
          2,2018-01-04,4
          3,2020-01-04,3
    expect:
      format: csv
      rows: |
        review_date,avg_rating,total_reviews
        2018-01-04,4.5,2
        2020-01-04,3,1
```
This test asserts that when reviews are averaged by review date, the resulting daily ratings are correct — proof that the business logic in the model works as intended.
At this point, the Gold layer feeds directly into dashboards and data applications. The same datasets that began as chaotic CSVs in Bronze are now refined insights driving executive reports, automated KPIs, and machine learning pipelines.
For example, gold_sales__daily can power daily revenue dashboards, while gold_avg_rating__daily feeds customer-satisfaction KPIs.
With dbt, the entire journey from ingestion to insight is transparent, testable, and reproducible — the holy trinity of modern data engineering. The Gold layer is where data storytelling truly happens.
Your transformations are no longer technical — they are narrative. Each SQL statement contributes to a chapter in the business story, one that explains not just what happened, but why it matters.
By the time I reached this stage, I realized that dbt wasn't just transforming data; it was changing how I build pipelines. Under the hood, dbt runs on an elegant philosophy: if you can describe your logic clearly, dbt can orchestrate it reliably. This chapter is about the machinery — the quiet genius that holds the Medallion Architecture together.
Every dbt project starts with a heart — a file called dbt_project.yml.
It’s the project’s DNA, describing where models live, how they’re built, and which schema or materialization rules to apply.
Here’s what mine looks like:
```yaml
# dbt_project.yml
name: 'dbt_databricks_project'
version: '1.0.0'
profile: 'default'

model-paths: ["models"]
macro-paths: ["macros"]
test-paths: ["tests"]
snapshot-paths: ["snapshots"]

models:
  dbt_databricks_project:
    bronze:
      +schema: bronze
      +materialized: table
    silver:
      +schema: silver
      +materialized: table
    gold:
      +schema: gold
      +materialized: view
```
This single file tells dbt where your models live, which schema each layer writes to, and how every model should be materialized.
Simple, declarative, and consistent. That's one of dbt's core superpowers — configuration over chaos.
In a traditional pipeline, you’d write scripts to manually define job dependencies. In dbt, this logic is handled elegantly with one word: ref().
```sql
SELECT * FROM {{ ref('silver_orders') }}
```
That single Jinja function call tells dbt that this model depends on silver_orders: resolve the correct schema at compile time, and build this model only after silver_orders has succeeded.
From there, dbt automatically constructs a directed acyclic graph (DAG) — the same kind of dependency tree you’d see in tools like Airflow or Dagster.
When you run:
dbt run --select gold
dbt doesn't just execute a SQL script — it resolves the full dependency chain, building every upstream Bronze and Silver model before the Gold models run.
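One way to see the resolved chain without running anything is dbt's graph-selector syntax, where a leading + includes all upstream models (shown here against the Gold model from earlier):

```shell
# List gold_sales__daily together with everything upstream of it
dbt ls --select +gold_sales__daily

# Build the model and all of its ancestors in dependency order
dbt run --select +gold_sales__daily
```

The same selectors work for dbt test and dbt build, so you can exercise any slice of the DAG on its own.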
dbt encourages modularity through packages, just like Python or Node.
```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.1
```
The dbt_utils package brings in a toolbox of reusable macros — functions that simplify things like testing, joins, and common data patterns. This modular design means you never reinvent the wheel. Need to check if a column is unique or not null? One line of YAML does it. dbt treats reusability as a design principle, not an afterthought.
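As an example, a dbt_utils generic test can enforce a numeric range without any custom SQL. A sketch against the silver_orders model; the specific test choice here is mine, not from the project:

```yaml
# models/silver/_silver.yml -- dbt_utils test sketch
version: 2

models:
  - name: silver_orders
    columns:
      - name: order_amount
        tests:
          # order_amount = quantity * unit_price should never be negative
          - dbt_utils.accepted_range:
              min_value: 0
              inclusive: true
```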
Tests in dbt are not unit tests in code — they’re trust checks for your data.
They assert truths like: primary keys are unique, foreign keys resolve to real records, and critical columns are never null.
These can be written inline in your YAML schema files or as custom tests using macros.
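A custom (singular) test is just a SELECT that returns the rows violating a rule; dbt fails the test if any rows come back. A hypothetical example for this project, using the silver_orders model from earlier:

```sql
-- tests/assert_no_negative_order_amounts.sql (hypothetical singular test)
-- dbt flags the test as failed if this query returns any rows
SELECT order_id, order_amount
FROM {{ ref('silver_orders') }}
WHERE order_amount < 0
```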
When you run:
dbt test
dbt runs every test in parallel, ensuring your pipelines are safe, validated, and production-ready.
Then comes my favourite feature — dbt Docs.
With a single command:
dbt docs generate
dbt docs serve
dbt builds a self-documenting data catalogue with column descriptions, data types, and source links, plus a beautiful lineage graph showing how Bronze flows into Silver, and Silver into Gold.
It’s not just documentation; it’s transparency — the kind that builds confidence between engineers, analysts, and executives.
When all these pieces come together — configurations, references, packages, tests, and docs — dbt orchestrates a seamless symphony. You're not writing isolated SQL scripts anymore; you're building a data ecosystem where every model knows its place and purpose.
So when you run:
dbt run --select silver
dbt knows exactly what to do.
It compiles, orders, and executes every dependent model in sequence — no Airflow DAG required.
dbt keeps the Medallion Architecture alive by automating discipline:
Every model has a schema, purpose, and test.
Every dependency is explicit.
Every layer — Bronze, Silver, and Gold — is reproducible from scratch.
It’s the invisible scaffolding that ensures your Lakehouse doesn’t collapse under complexity. In a world where pipelines often break in silence, dbt whispers clarity.
Building this project offered several practical insights into implementing dbt and the Medallion Architecture effectively within a modern data platform.
Successful data projects begin with a clear architectural vision, not SQL syntax.
Defining data flow from raw ingestion to curated output ensures every model has a specific purpose within the pipeline. Planning dependencies, naming conventions, and transformation logic upfront reduces complexity and rework later.
Reliable analytics depend on trusted data.
Integrating dbt tests into each layer (Bronze, Silver, Gold) enforces data validation and prevents downstream issues. Tests for nulls, duplicates, and calculation accuracy (e.g., order amount and average rating) help maintain integrity as data scales. In dbt, testing becomes part of the engineering process rather than an afterthought.
Documentation is as critical as modelling.
dbt automatically generates lineage, descriptions, and schema visibility through dbt docs generate, enabling transparency across teams. Clear documentation accelerates onboarding, simplifies debugging, and improves collaboration between engineering and analytics functions.
dbt encourages data teams to think like software engineers: modular design, version control, and CI/CD integration.
Breaking logic into reusable components, leveraging packages such as dbt_utils, and maintaining models in Git repositories results in scalable, maintainable pipelines. Consistency and automation improve reliability and simplify operational governance.
Effective data architecture prioritizes readability and structure over complexity. dbt's declarative configuration, ref() dependencies, and schema-based organization make pipelines self-describing and predictable. Simplicity ensures long-term maintainability and reduces friction when adapting to business changes.
The Medallion Architecture emphasizes progression and control: raw data lands in Bronze, is cleansed and conformed in Silver, and is curated for the business in Gold.
Viewing the pipeline as an evolving system, not isolated scripts, enforces discipline and traceability throughout the data lifecycle.
Ultimately, modern data engineering is about trust in process, models, and outcomes. dbt and the Lakehouse pattern provide the structure to make that trust measurable through testing, documentation, and transparency.
Every model should answer a clear business question and pass objective validation.
This project reinforced that scalable data platforms depend on design, discipline, and documentation.
By combining dbt’s engineering-driven approach with the Medallion Architecture, teams can build reproducible, testable, and insight-ready data systems that evolve reliably over time.
In the end, dbt isn’t just another data tool; it’s a movement.
It bridges the gap between analytics and engineering, empowering teams to build data like software: tested, versioned, and documented.