Hands-on Guide to Data Modeling with dbt and Visualization with Lightdash

aibunny
10 min read · Dec 20, 2023


Introduction

Welcome to our hands-on guide where we’ll explore the powerful combination of dbt (data build tool) for data modeling and Lightdash for visualization. In this blog, I’ll take you through the process of transforming raw data into meaningful insights using the well-known Microsoft Northwind database.

Why dbt and Lightdash?

Data modeling and visualization are crucial components of any data-driven decision-making process. dbt simplifies the data modeling process, allowing you to define transformations and relationships in your data warehouse effortlessly. Lightdash, on the other hand, brings these models to life with an intuitive interface for data exploration and visualization.

What to Expect

In this guide, I’ll walk you through setting up your environment, modeling data using dbt, and visualizing the results with Lightdash. Whether you’re a data analyst, engineer, or business user, this hands-on experience will provide you with practical insights into the seamless integration of dbt and Lightdash.

Prerequisites

Here are the tools and resources you’ll need:

  • PostgreSQL database
  • Microsoft Northwind database
  • GitHub
  • dbt (install instructions)
  • Lightdash (install instructions)

Section 1: Setup PostgreSQL and Create a Database

1.1 Create an Account on Supabase

Visit Supabase and sign up for a new account.

Create a New Project:

  • Once logged in, click on the “New Project” button.
  • Provide a name for your project, give it a few minutes to initialize, and proceed.

1.2 Access the PostgreSQL Database on Supabase

Navigate to the Settings tab:

  • After creating your project, go to the “Settings” tab in the left menu, then “Database”. Here you’ll find the connection details for your PostgreSQL database (host, user, password, port, dbname).

1.3 Connect to Supabase PostgreSQL Database Locally

Install psql:

  • Ensure you have PostgreSQL’s command-line tool (psql) installed locally.
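If you don’t have it yet, a typical install looks like the following (package names can differ slightly by OS version):

# Debian/Ubuntu
sudo apt-get install postgresql-client
# macOS with Homebrew
brew install postgresql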

Connect to Supabase PostgreSQL:

  • Open your terminal and run the following command:
psql -h <YOUR_SUPABASE_DB_HOST> -p <YOUR_SUPABASE_DB_PORT> -d <YOUR_SUPABASE_DB_NAME> -U <YOUR_SUPABASE_DB_USER>
  • Replace <YOUR_SUPABASE_DB_HOST>, <YOUR_SUPABASE_DB_PORT>, <YOUR_SUPABASE_DB_NAME>, and <YOUR_SUPABASE_DB_USER> with the details from your Supabase project.
  • Enter your password when prompted.

1.4 Upload Northwind Database to Supabase

Download Northwind SQL Dump:

  • Download the Northwind SQL dump file from this link.

Execute SQL Dump:

  • Once connected to the Supabase PostgreSQL database, run the following command:
\i /path/to/downloaded/northwind.sql
  • Replace /path/to/downloaded/northwind.sql with the actual path where you downloaded the Northwind SQL dump file.
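To sanity-check the import while still inside psql, you can list the tables and count a few rows (the table names below match the standard Northwind PostgreSQL dump we just loaded):

\dt
SELECT COUNT(*) FROM customers;
SELECT COUNT(*) FROM orders;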

Section 2: Setting Up dbt with PostgreSQL and Creating Model Files

Now let’s install dbt and initialize a dbt project connected to the Postgres database we set up above.

2.1 Create and Activate a Virtual Environment

Choose a directory for your project and create a virtual environment. Open your terminal and run:

mkdir your_project_directory
cd your_project_directory
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`

2.2 Install dbt

For this project we’ll use dbt-core, dbt’s Python library for local development, along with dbt’s Postgres adapter, dbt-postgres. Open your terminal and run:

pip install dbt-core dbt-postgres

Verify the installation:

dbt --version

2.3 Initialize a New dbt Project

In your project directory, run dbt init project_name. This command creates the structure and configuration files your dbt project needs:

project_name/
├── dbt_project.yml
├── models/
├── seeds/
├── snapshots/
├── tests/
├── macros/
└── analyses/

For this hands-on walkthrough, let’s name our project `dbt_northwind`:

dbt init dbt_northwind

dbt will scaffold your project and prompt you to select the database you’d like to use. The options depend on the adapters installed in your project’s environment; since we only installed the Postgres adapter, it will be the only choice listed.

Enter 1, then provide the details for your Postgres database.

This step creates the profiles.yml file in your home directory at ~/.dbt, which centralizes dbt connection information. On Linux you can view or edit it with:

nano ~/.dbt/profiles.yml
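For this project, the file should end up looking roughly like this. The placeholder values are the Supabase credentials from Section 1; schema and threads are reasonable defaults here, not requirements:

dbt_northwind:
  target: dev
  outputs:
    dev:
      type: postgres
      host: <YOUR_SUPABASE_DB_HOST>
      port: <YOUR_SUPABASE_DB_PORT>
      user: <YOUR_SUPABASE_DB_USER>
      password: <YOUR_SUPABASE_DB_PASSWORD>
      dbname: <YOUR_SUPABASE_DB_NAME>
      schema: public
      threads: 1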

Now let’s run `dbt debug` to confirm that our Postgres connection works.

Section 3: Creating dbt Models for Northwind Database

This step-by-step walkthrough will help you set up meaningful dbt models, enabling powerful analytics on your data. Below, we define models with clear purposes and dimensions for the Northwind database.

Step 1: Define dbt Models

In your models directory you’ll find an example directory with sample models generated by dbt; delete it, since it’s only there for demonstration. We’ll then create a SQL file for each of our four models, each serving a distinct purpose. (Note that dbt models must not end with a semicolon, since dbt wraps the SQL in its own statements.) But first, we’ll create a schema file in the models directory that tells dbt which source schema and tables to use.

# models/northwind/schema.yml
version: 2

sources:
  - name: northwind_data
    schema: public
    tables:
      - name: order_details
      - name: customers
      - name: products
      - name: orders

This tells dbt where the raw Northwind tables live (the public schema), so your models can reference them with the source() macro and stay well-organized within your dbt project.

Now let’s define our models

  1. Customer Orders:
  • Purpose: Analyzing customer orders to derive insights into customer purchasing behavior.
  • SQL Model:
-- models/northwind/customer_orders.sql
WITH customer_orders AS (
    SELECT
        o.order_id,
        c.customer_id,
        c.company_name,
        COUNT(od.product_id) AS num_products_ordered,
        SUM(od.unit_price * od.quantity * (1 - od.discount)) AS total_order_amount
    FROM
        {{ source('northwind_data', 'orders') }} o
        JOIN {{ source('northwind_data', 'customers') }} c ON o.customer_id = c.customer_id
        LEFT JOIN {{ source('northwind_data', 'order_details') }} od ON o.order_id = od.order_id
    GROUP BY
        o.order_id, c.customer_id, c.company_name
)

SELECT *
FROM customer_orders

2. Customer Segmentation:

  • Purpose: Segmenting customers based on their order count and total spend for targeted marketing strategies.
  • SQL Model:
-- models/northwind/customer_segmentation.sql
WITH customer_segmentation AS (
    SELECT
        c.customer_id,
        c.company_name,
        COUNT(o.order_id) AS num_orders,
        SUM(od.unit_price * od.quantity * (1 - od.discount)) AS total_spent
    FROM
        {{ source('northwind_data', 'customers') }} c
        LEFT JOIN {{ source('northwind_data', 'orders') }} o ON c.customer_id = o.customer_id
        LEFT JOIN {{ source('northwind_data', 'order_details') }} od ON o.order_id = od.order_id
    GROUP BY
        c.customer_id, c.company_name
)

SELECT *,
    CASE
        WHEN num_orders >= 10 AND total_spent >= 1000 THEN 'High Value'
        WHEN num_orders >= 5 THEN 'Mid Value'
        ELSE 'Low Value'
    END AS customer_segment
FROM customer_segmentation

3. Product Sales:

  • Purpose: Analyzing product sales to identify top-performing products and optimize inventory.
  • SQL Model:
-- models/northwind/product_sales.sql
WITH product_sales AS (
    SELECT
        p.product_id,
        p.product_name,
        COUNT(od.order_id) AS num_orders,
        SUM(od.unit_price * od.quantity * (1 - od.discount)) AS total_sales
    FROM
        {{ source('northwind_data', 'products') }} p
        LEFT JOIN {{ source('northwind_data', 'order_details') }} od ON p.product_id = od.product_id
    GROUP BY
        p.product_id, p.product_name
)

SELECT *
FROM product_sales

4. Revenue by Region:

  • Purpose: Understanding revenue distribution across different shipping regions.
  • SQL Model:
-- models/northwind/revenue_by_region.sql
WITH revenue_by_region AS (
    SELECT
        o.ship_region,
        COUNT(o.order_id) AS num_orders,
        SUM(od.unit_price * od.quantity * (1 - od.discount)) AS total_sales
    FROM
        {{ source('northwind_data', 'orders') }} o
        LEFT JOIN {{ source('northwind_data', 'order_details') }} od ON o.order_id = od.order_id
    GROUP BY
        o.ship_region
)

SELECT *
FROM revenue_by_region
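While iterating on an individual model, you don’t have to rebuild everything; dbt’s node selection lets you build just one model at a time, for example:

dbt run --select customer_orders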

Step 2: Define Dimensions for Our Models

Create a models.yml file to define dimensions for each model, providing context and enhancing the data modeling process.

# models/northwind/models.yml
version: 2

models:
  - name: customer_orders
    columns:
      - name: customer_id
        meta:
          dimension:
            type: number
            label: 'Customer ID'
            description: 'The unique identifier for a customer.'
      - name: company_name
        meta:
          dimension:
            type: string
            label: 'Company Name'
            description: 'The name of the customer company.'
      - name: num_products_ordered
        meta:
          dimension:
            type: number
            label: 'Number of Products Ordered'
            description: 'The total number of products ordered.'
      - name: total_order_amount
        meta:
          dimension:
            type: number
            label: 'Total Order Amount'
            description: 'The total amount of the order.'

  - name: customer_segmentation
    columns:
      - name: customer_id
        meta:
          dimension:
            type: number
            label: 'Customer ID'
            description: 'The unique identifier for a customer.'
      - name: company_name
        meta:
          dimension:
            type: string
            label: 'Company Name'
            description: 'The name of the customer company.'
      - name: num_orders
        meta:
          dimension:
            type: number
            label: 'Number of Orders'
            description: 'The total number of orders for the customer.'
      - name: total_spent
        meta:
          dimension:
            type: number
            label: 'Total Spent'
            description: 'The total amount spent by the customer.'
      - name: customer_segment
        meta:
          dimension:
            type: string
            label: 'Customer Segment'
            description: 'Segmentation based on order count and total spend.'

  - name: product_sales
    columns:
      - name: product_name
        meta:
          dimension:
            type: string
            label: 'Product Name'
            description: 'The name of the product.'
      - name: num_orders
        meta:
          dimension:
            type: number
            label: 'Number of Orders'
            description: 'The total number of orders for the product.'
      - name: total_sales
        meta:
          dimension:
            type: number
            label: 'Total Sales'
            description: 'The total sales amount for the product.'

  - name: revenue_by_region
    columns:
      - name: ship_region
        meta:
          dimension:
            type: string
            label: 'Ship Region'
            description: 'The region to which the order is shipped.'
      - name: num_orders
        meta:
          dimension:
            type: number
            label: 'Number of Orders'
            description: 'The total number of orders for the ship region.'
      - name: total_sales
        meta:
          dimension:
            type: number
            label: 'Total Sales'
            description: 'The total sales amount for the ship region.'
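Dimensions let you slice and group data in Lightdash, but you’ll usually want metrics (aggregations) as well, and Lightdash lets you define them in the same column meta. Here’s a minimal sketch for the total_order_amount column; the metric name total_revenue and the sum type are illustrative choices, not something this project requires:

      - name: total_order_amount
        meta:
          dimension:
            type: number
            label: 'Total Order Amount'
          metrics:
            total_revenue:
              type: sum
              label: 'Total Revenue'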

Now, with defined models, dimensions, and schema, you’re ready to run dbt and explore the analytical possibilities within the Northwind database.

Now let’s run dbt to create the views in our database:

dbt run

Once the project runs successfully, we can move on to the next step.

Section 4: Upload Your dbt Project to GitHub

To enhance collaboration and facilitate seamless integration with Lightdash, we encourage you to upload your dbt project to GitHub. By doing so, you'll leverage version control, enable collaborative development, and ensure a smooth transition to Lightdash for impactful data visualization.

Follow these simple steps:

  1. Create a GitHub Repository:
  • Head to GitHub and create a new repository for your dbt project.

2. Upload Your dbt Project:

  • Push your dbt project files, including the dbt_project.yml, models, and any associated configurations, to your GitHub repository.
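If you’re working from the command line, a typical sequence looks like this (the repository URL is a placeholder for your own):

cd dbt_northwind
git init
git add .
git commit -m "Initial dbt Northwind project"
git remote add origin https://github.com/<your-username>/dbt_northwind.git
git push -u origin main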

Section 5: Connect Lightdash to your project

Now that your dbt project is securely hosted on GitHub, let’s seamlessly integrate it with Lightdash Cloud for powerful and interactive data visualization. Follow these steps:

  1. Sign Up for Lightdash Cloud:

Once you’ve signed up, select Postgres on the Get Started page.

Then choose the manual setup so you can connect the GitHub project we created above.

Choose “I’ve defined them!” on the next screen.

On the connection screen, fill in your PostgreSQL details and disable SSL mode. Disabling SSL is only acceptable here because we did not set up an SSL certificate for our demo database.

Now connect the GitHub repository that contains our dbt project by granting Lightdash the necessary permissions.

Finally, click the “Test & compile” button. If your project compiles successfully, you’ll land on a page listing your model tables.

Click on a table, e.g. product_sales, select all of its dimensions, and run a query by clicking “Run query”.

You’ll see the full results, and you can view a sample chart by opening the dropdown on the chart.

Conclusion: Elevate Your Data Modeling and Visualization with dbt and Lightdash

Congratulations on mastering the art of data modeling with dbt and creating stunning visualizations with Lightdash! As we wrap up this journey, here’s a quick recap and some insights for the road ahead:

Recap:

  • You began by setting up a PostgreSQL database using Supabase, effortlessly importing the Microsoft Northwind database to kickstart your modeling journey.
  • The installation and initialization of dbt paved the way for an organized project structure, complete with models, seeds, sources, and tests.
  • We delved into creating meaningful models, each designed to uncover valuable insights — from customer segmentation to product sales and revenue by region.

Next Steps:

  • Leverage the flexibility of dbt to connect to other databases effortlessly. Whether it’s BigQuery, Snowflake, or Redshift, dbt’s adaptability makes cross-database modeling a breeze.
  • Lightdash opens up a world of possibilities for creating compelling visuals. Dive deeper into the rich feature set of Lightdash to enhance your analytics game. Explore advanced functionalities such as custom metrics, filters, and drill-downs for more nuanced insights.

Powerful Collaboration:

  • Hosting your dbt project on GitHub ensures seamless collaboration and version control. Lightdash Cloud’s integration with GitHub allows for effortless synchronization, enabling you and your team to stay on top of the latest changes.
  • Collaborate with fellow data enthusiasts, share insights, and iterate on models to continually refine your data strategy.

Nerdy Thrills Await:

  • Dive into the world of advanced analytics with dbt and Lightdash. Leverage dbt’s ability to create powerful transformations, and let Lightdash unleash the full potential of your data through interactive dashboards.
  • Embrace the nerdiness of fine-tuning SQL queries, optimizing data models, and crafting visuals that tell compelling data stories.

Stay Curious, Stay Data-Driven: Your journey into the realms of dbt and Lightdash has just begun. Stay curious, explore new datasets, experiment with advanced features, and continue refining your skills. As the data landscape evolves, your ability to adapt and innovate will set you apart.

Check out this project on GitHub.

Happy Modeling and Visualizing! 🚀

