Enrolling now

Data Engineering

Build the pipelines and warehouses that power modern data teams.

Three modules: data warehousing, Azure data engineering, and core engineering with Python, Spark, Airflow, and Snowflake. Designed to take you from junior to job-ready Azure Data Engineer.

Next cohort starts

June 10, 2026

Enrollment open — reserve your seat before it fills.

Duration

20 weeks

Courses

8

Class schedule

Tue + Thu · 6:00–8:00 PM MST

Curriculum

8 courses across the 20-week program.

Each course runs as a self-contained unit with live instruction, weekly assignments, and a hands-on lab.

DE101

Introduction to Data Engineering

2 lessons

Foundational concepts of data engineering and how data engineers fit into the modern data ecosystem. Map common business scenarios to the right architectural choices and describe the end-to-end data engineering lifecycle.

LESSON 01
What is Data Engineering?
- Why data and the modern data economy
- The role of a Data Engineer
- Data Engineer vs Data Analyst vs ML/AI Engineer
- Modern data stack overview
LESSON 02
The Data Engineering Lifecycle
- Sources and ingestion
- Storage layer (warehouse, lake, lakehouse)
- Transformation (ETL vs ELT)
- Serving and consumption (analytics, ML, AI)
- Orchestration, observability, and security
- OLTP vs OLAP, batch vs streaming

DE102

Databases & Data Modeling

5 lessons

Relational database internals, normalization, indexing, and query performance — plus the major NoSQL families and how warehouses, lakes, and lakehouses fit together. Choose the right storage technology for any workload.

LESSON 01
Foundations of Databases
- Why databases vs files and spreadsheets
- DBMS overview and architecture
- ACID properties and transactions
- Isolation levels and concurrency
LESSON 02
Relational Databases & Schema Design
- Tables, rows, columns, and data types
- Primary keys, foreign keys, and relationships
- Normalization (1NF, 2NF, 3NF, BCNF)
- Constraints, defaults, and check constraints
LESSON 03
Indexing & Query Performance
- B-tree and hash indexes
- Clustered vs non-clustered indexes
- Covering indexes and included columns
- Statistics and the optimizer
- When not to add an index
LESSON 04
NoSQL Databases
- Document stores (MongoDB)
- Key-value stores (Redis, DynamoDB)
- Column-family stores (Cassandra)
- Graph databases (Neo4j)
- CAP theorem and consistency trade-offs
- Real-world use cases
LESSON 05
Modern Data Platforms
- OLTP vs OLAP
- Data warehouses (Snowflake, Synapse)
- Data lakes (ADLS Gen2, S3)
- Lakehouses (Databricks, Iceberg, Delta)
- Streaming systems (Kafka, Azure Event Hubs)

DE103

SQL for Data Engineering

10 lessons

SQL from beginner through advanced — querying, joins, set theory, table expressions, window functions, performance tuning, stored procedures, MERGE, and warehouse-specific patterns. Write production-quality SQL on Snowflake, Azure SQL, and SQL Server.

LESSON 01
Querying Data
- Selecting Data
- Sorting Data
- Data Types and Nullability
- Limiting Results
LESSON 02
Filtering Data & Preparing Outputs
- The Where Clause
- Operators
- Working with Null
LESSON 03
Aggregation
- Group By
- SQL Aggregate Functions
- Filtering with Aggregates
LESSON 04
Expressions & Functions
- Conditional Logic
- Handling unknowns
- String Functions
- Date Functions
- Other Functions
LESSON 05
Joins
- Keys discussion
- Cross joins
- Filtering with Inner Join
- Outer joins — mismatches and unknowns
LESSON 06
Set Theory
- Union, Intersect, Except
- Precedence
LESSON 07
Subqueries
- Correlated vs Non-Correlated
- Using IN
- EXISTS Operator
LESSON 08
Table Expressions
- Common Table Expressions
- Recursive CTEs
- Derived Tables
- Views and Materialized Views
LESSON 09
Window Functions
- Numbering, Counting, and Ranking
- Frames and Partitions
- Analytic Functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE)
LESSON 10
Performance Tuning, Stored Procedures & Advanced Patterns
- Reading Execution Plans
- Indexes for Query Patterns
- MERGE Statements
- Stored Procedures
- Pivoting and Unpivoting
- Using Dynamic SQL Safely

DE104

Dimensional Modeling

3 lessons

Kimball methodology, star and snowflake schemas, slowly changing dimensions, plus modern approaches like Data Vault 2.0 and the medallion architecture. Design dimensional models from business requirements and ship them on Snowflake, Synapse, or Databricks.

LESSON 01
Dimensional Modeling Foundations
- Kimball methodology
- Fact tables vs dimension tables
- Grain — the most important decision
- Additive, semi-additive, and non-additive measures
- Conformed dimensions
LESSON 02
Schemas & Slowly Changing Dimensions
- Star vs snowflake schemas
- SCD Type 1 (overwrite)
- SCD Type 2 (history)
- SCD Type 3 (limited history) and Type 6 (hybrid)
- Surrogate keys and natural keys
- Junk, degenerate, and role-playing dimensions
LESSON 03
Modern Modeling Approaches
- Data Vault 2.0 — hubs, links, satellites
- One Big Table (OBT) and denormalization patterns
- Medallion architecture (bronze / silver / gold)
- Inmon vs Kimball comparison

DE201

Git & GitHub for Data Engineers

2 lessons

Version control fundamentals using Git and team collaboration on GitHub, with branching strategies, pull requests, code review, conflict resolution, and GitHub Actions for data-pipeline CI/CD. Operate confidently inside a professional Git workflow on day one.

LESSON 01
Git Fundamentals
- Init, clone, and the working directory
- Add, commit, and the staging area
- Branches, checkout, switch, and merge
- Push, pull, and remotes
- Rebase vs merge
- .gitignore and resolving conflicts
LESSON 02
GitHub Workflows for Data Engineers
- Pull requests and code review etiquette
- Branching strategies (Git Flow vs trunk-based)
- GitHub Actions for data pipeline CI/CD
- Secrets and environment management
- Protecting main and required reviews

DE202

Python for Data Engineering

10 lessons

Python for production-grade data engineering — language fundamentals, working with structured/semi-structured data, integrating with Azure SQL, ADLS Gen2, and Snowflake, designing ETL/ELT pipelines, Kafka streaming, and Apache Airflow orchestration. Capstone: an end-to-end pipeline from REST API → ADLS → Snowflake.

LESSON 01
Python Foundations for Engineers
- Virtual environments (venv, conda)
- Modules, packages, and project structure
- OOP fundamentals and type hints
- Logging and structured error handling
- Unit testing with pytest
LESSON 02
Working with Data in Python
- File I/O — CSV, JSON, Parquet, Avro
- Calling REST APIs with requests
- Parsing semi-structured data
- Handling large files efficiently
LESSON 03
Pandas & Data Manipulation Patterns
- DataFrames and Series
- Transformations, joins, and group-by
- Common ETL patterns in Pandas
- Performance tips and when not to use Pandas
LESSON 04
Database Connectivity & Azure SQL
- SQLAlchemy and pyodbc
- Connecting to Azure SQL Database
- Parameterized queries and SQL injection
- Bulk inserts and transactions
- Reading data from Azure SQL into Python
LESSON 05
Cloud Storage with Azure Data Lake Storage Gen2
- ADLS Gen2 hierarchical namespace
- Authentication (service principals, managed identities)
- Reading and writing files (CSV, JSON, Parquet)
- Partitioning by date and key
- Writing landing-zone data from a pipeline
LESSON 06
Snowflake Integration
- snowflake-connector-python
- Snowpark for Python
- Stages, COPY INTO, and bulk loads
- Loading dimensional models
- Project: load API data into Snowflake tables
LESSON 07
ETL vs ELT Design Patterns
- Extraction strategies (full vs incremental)
- Staging and idempotency
- Change Data Capture (CDC) patterns
- Backfills and reprocessing
- Error handling and dead-letter queues
LESSON 08
Streaming with Apache Kafka
- Topics, partitions, and offsets
- Producers and consumers in kafka-python
- Schema Registry overview
- Stream-to-table patterns
LESSON 09
Workflow Orchestration with Apache Airflow
- DAGs, operators, and sensors
- Scheduling, intervals, and backfills
- XComs and task dependencies
- Idempotency and retries
- Deploying Airflow on Azure
LESSON 10
End-to-End Pipeline Project
- Design: source, target, and transformations
- Ingest from REST API into ADLS
- Transform with Pandas and PySpark
- Load into Snowflake dimensional model
- Orchestrate with Airflow and monitor
- Code review and deployment via GitHub

DE203

Apache Spark & Databricks

4 lessons

Distributed data processing with Apache Spark and the Databricks platform — Spark fundamentals, the PySpark DataFrames API, Spark SQL, performance tuning, Delta Lake, and the medallion architecture. Build, tune, and operate big-data pipelines against terabyte-scale datasets.

LESSON 01
Distributed Computing & Spark Fundamentals
- Why distributed computing
- RDDs vs DataFrames vs Datasets
- Drivers, executors, and the cluster manager
- Lazy evaluation, transformations vs actions
LESSON 02
PySpark DataFrames API
- Creating DataFrames from files and tables
- Transformations: select, filter, withColumn, groupBy, join
- User-Defined Functions (UDFs) and Pandas UDFs
- Partitioning, repartition, and coalesce
LESSON 03
Spark SQL & Performance Tuning
- Catalyst optimizer and the Tungsten engine
- Broadcast joins and skew handling
- Caching and persistence
- Adaptive Query Execution (AQE)
- Reading explain plans
LESSON 04
Databricks & Delta Lake
- Databricks workspace, notebooks, jobs, and clusters
- Delta Lake operations — MERGE, time travel, vacuum, optimize
- Medallion architecture (bronze / silver / gold)
- Unity Catalog overview
- Project: end-to-end medallion pipeline on Databricks

DE301

AI-Era Data Engineering

3 lessons

The modern, AI-era responsibilities of the data engineer — data quality and observability, vector databases and embedding-based RAG pipelines, and feature stores. Be ready for the new AI Data Engineer and ML Platform Engineer roles.

LESSON 01
Data Quality, Testing & Observability
- Great Expectations and dbt tests
- Data contracts
- OpenLineage and lineage tracking
- Monitoring, alerting, and SLAs for data
- PII, GDPR, and CCPA basics
LESSON 02
Vector Databases & RAG Pipelines
- Embeddings — what they are and how to compute them
- pgvector, Pinecone, and Azure AI Search
- Chunking strategies for documents
- Building an ingestion pipeline for RAG
- Hybrid search and re-ranking
LESSON 03
Feature Stores & ML/AI Data Infrastructure
- Why feature stores exist
- Online vs offline features
- Feast and Databricks Feature Store
- Training-serving skew
- The data engineer's role on ML and LLM teams

Skills you'll build

- SQL
- Python
- Apache Spark
- Dimensional Modeling
- AI-Era Engineering

Tools you'll use

- Microsoft Teams
- SQL Server Management Studio
- Snowflake
- Visual Studio Code
- Anaconda / JupyterLab
- Git + GitHub
- Databricks
- Apache Kafka
- Apache Airflow
- Azure (ADLS Gen2, Azure SQL)
- pgvector / Pinecone / Azure AI Search

Prerequisites

Basic computer literacy and a steady internet connection. No prior tech experience required — the program builds from foundations to advanced engineering.

Outcomes

- Job-ready as a Data Engineer, AI Data Engineer, or ML Platform Engineer
- End-to-end pipeline portfolio: REST API → ADLS → Snowflake on Airflow
- Hands-on Apache Spark, Delta Lake, and medallion architecture on Databricks
- RAG pipeline + vector database project for the AI-era role market
- Capstone team project + 1-on-1 mentorship throughout
- Dedicated career services: resume, mock interviews, placement support

Cohort enrolling now

Ready to start? Apply now.

Talk to a program advisor, get your questions answered, and reserve your seat in the next cohort.
