Data Engineering
Build the pipelines and warehouses that power modern data teams.
Three modules: data warehousing, Azure data engineering, and core engineering with Python, Spark, Airflow, and Snowflake. Designed to take you from junior to job-ready Azure Data Engineer.
June 10, 2026
Enrollment open — reserve your seat before it fills.
20 weeks
8 courses
Tue + Thu · 5:00–8:00 PM MST
8 courses across the 20-week program.
Each course runs as a self-contained unit with live instruction, weekly assignments, and a hands-on lab.
DE101 · Introduction to Data Engineering
2 lessons
Foundational concepts of data engineering and how data engineers fit into the modern data ecosystem. Map common business scenarios to the right architectural choices and describe the end-to-end data engineering lifecycle.
LESSON 01 · What is Data Engineering?
- Why data matters: the modern data economy
- The role of a Data Engineer
- Data Engineer vs Data Analyst vs ML/AI Engineer
- Modern data stack overview
LESSON 02 · The Data Engineering Lifecycle
- Sources and ingestion
- Storage layer (warehouse, lake, lakehouse)
- Transformation (ETL vs ELT)
- Serving and consumption (analytics, ML, AI)
- Orchestration, observability, and security
- OLTP vs OLAP, batch vs streaming
DE102 · Databases & Data Modeling
5 lessons
Relational database internals, normalization, indexing, and query performance — plus the major NoSQL families and how warehouses, lakes, and lakehouses fit together. Choose the right storage technology for any workload.
LESSON 01 · Foundations of Databases
- Why use databases instead of files and spreadsheets
- DBMS overview and architecture
- ACID properties and transactions
- Isolation levels and concurrency
LESSON 02 · Relational Databases & Schema Design
- Tables, rows, columns, and data types
- Primary keys, foreign keys, and relationships
- Normalization (1NF, 2NF, 3NF, BCNF)
- Constraints (including CHECK constraints) and defaults
LESSON 03 · Indexing & Query Performance
- B-tree and hash indexes
- Clustered vs non-clustered indexes
- Covering indexes and included columns
- Statistics and the optimizer
- When not to add an index
LESSON 04 · NoSQL Databases
- Document stores (MongoDB)
- Key-value stores (Redis, DynamoDB)
- Column-family stores (Cassandra)
- Graph databases (Neo4j)
- CAP theorem and consistency trade-offs
- Real-world use cases
LESSON 05 · Modern Data Platforms
- OLTP vs OLAP
- Data warehouses (Snowflake, Synapse)
- Data lakes (ADLS Gen2, S3)
- Lakehouses (Databricks, Apache Iceberg, Delta Lake)
- Streaming systems (Apache Kafka, Azure Event Hubs)
DE103 · SQL for Data Engineering
10 lessons
SQL from beginner through advanced — querying, joins, set theory, table expressions, window functions, performance tuning, stored procedures, MERGE, and warehouse-specific patterns. Write production-quality SQL on Snowflake, Azure SQL, and SQL Server.
LESSON 01 · Querying Data
- Selecting data
- Sorting data
- Data types and nullability
- Limiting results
LESSON 02 · Filtering Data & Preparing Outputs
- The WHERE clause
- Operators
- Working with NULL
LESSON 03 · Aggregation
- GROUP BY
- SQL aggregate functions
- Filtering aggregates with HAVING
LESSON 04 · Expressions & Functions
- Conditional logic
- Handling unknowns
- String functions
- Date functions
- Other functions
LESSON 05 · Joins
- Join keys
- Cross joins
- Filtering with inner joins
- Outer joins: mismatches and unknowns
LESSON 06 · Set Theory
- UNION, INTERSECT, and EXCEPT
- Set operator precedence
LESSON 07 · Subqueries
- Correlated vs non-correlated subqueries
- Using IN
- The EXISTS operator
LESSON 08 · Table Expressions
- Common table expressions (CTEs)
- Recursive CTEs
- Derived tables
- Views and materialized views
LESSON 09 · Window Functions
- Numbering, counting, and ranking
- Frames and panes
- Analytic functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE)
LESSON 10 · Performance Tuning, Stored Procedures & Advanced Patterns
- Reading execution plans
- Indexes for query patterns
- MERGE statements
- Stored procedures
- Pivoting and unpivoting
- Writing dynamic SQL safely
DE104 · Dimensional Modeling
3 lessons
Kimball methodology, star and snowflake schemas, slowly changing dimensions, plus modern approaches like Data Vault 2.0 and the medallion architecture. Design dimensional models from business requirements and ship them on Snowflake, Synapse, or Databricks.
LESSON 01 · Dimensional Modeling Foundations
- Kimball methodology
- Fact tables vs dimension tables
- Grain — the most important decision
- Additive, semi-additive, and non-additive measures
- Conformed dimensions
LESSON 02 · Schemas & Slowly Changing Dimensions
- Star vs snowflake schemas
- SCD Type 1 (overwrite)
- SCD Type 2 (history)
- SCD Type 3 (limited history) and Type 6 (hybrid)
- Surrogate keys and natural keys
- Junk, degenerate, and role-playing dimensions
LESSON 03 · Modern Modeling Approaches
- Data Vault 2.0 — hubs, links, satellites
- One Big Table (OBT) and denormalization patterns
- Medallion architecture (bronze / silver / gold)
- Inmon vs Kimball comparison
DE201 · Git & GitHub for Data Engineers
2 lessons
Version control fundamentals using Git and team collaboration on GitHub, with branching strategies, pull requests, code review, conflict resolution, and GitHub Actions for data-pipeline CI/CD. Operate confidently inside a professional Git workflow on day one.
LESSON 01 · Git Fundamentals
- Init, clone, and the working directory
- Add, commit, and the staging area
- Branches, checkout, switch, and merge
- Push, pull, and remotes
- Rebase vs merge
- .gitignore and resolving conflicts
LESSON 02 · GitHub Workflows for Data Engineers
- Pull requests and code review etiquette
- Branching strategies (Git Flow vs trunk-based)
- GitHub Actions for data pipeline CI/CD
- Secrets and environment management
- Protecting main and required reviews
DE202 · Python for Data Engineering
10 lessons
Python for production-grade data engineering — language fundamentals, working with structured/semi-structured data, integrating with Azure SQL, ADLS Gen2, and Snowflake, designing ETL/ELT pipelines, Kafka streaming, and Apache Airflow orchestration. Capstone: an end-to-end pipeline from REST API → ADLS → Snowflake.
LESSON 01 · Python Foundations for Engineers
- Virtual environments (venv, conda)
- Modules, packages, and project structure
- OOP fundamentals and type hints
- Logging and structured error handling
- Unit testing with pytest
LESSON 02 · Working with Data in Python
- File I/O — CSV, JSON, Parquet, Avro
- Calling REST APIs with the requests library
- Parsing semi-structured data
- Handling large files efficiently
LESSON 03 · Pandas & Data Manipulation Patterns
- DataFrames and Series
- Transformations, joins, and group-by
- Common ETL patterns in Pandas
- Performance tips and when not to use Pandas
LESSON 04 · Database Connectivity & Azure SQL
- SQLAlchemy and pyodbc
- Connecting to Azure SQL Database
- Parameterized queries and preventing SQL injection
- Bulk inserts and transactions
- Reading data from Azure SQL into Python
LESSON 05 · Cloud Storage with Azure Data Lake Storage Gen2
- ADLS Gen2 hierarchical namespace
- Authentication (service principals, managed identities)
- Reading and writing files (CSV, JSON, Parquet)
- Partitioning by date and key
- Writing landing-zone data from a pipeline
LESSON 06 · Snowflake Integration
- snowflake-connector-python
- Snowpark for Python
- Stages, COPY INTO, and bulk loads
- Loading dimensional models
- Project: load API data into Snowflake tables
LESSON 07 · ETL vs ELT Design Patterns
- Extraction strategies (full vs incremental)
- Staging and idempotency
- Change Data Capture (CDC) patterns
- Backfills and reprocessing
- Error handling and dead-letter queues
LESSON 08 · Streaming with Apache Kafka
- Topics, partitions, and offsets
- Producers and consumers in kafka-python
- Schema Registry overview
- Stream-to-table patterns
LESSON 09 · Workflow Orchestration with Apache Airflow
- DAGs, operators, and sensors
- Scheduling, intervals, and backfills
- XComs and task dependencies
- Idempotency and retries
- Deploying Airflow on Azure
LESSON 10 · End-to-End Pipeline Project
- Design: source, target, and transformations
- Ingest from REST API into ADLS
- Transform with Pandas and PySpark
- Load into Snowflake dimensional model
- Orchestrate with Airflow and monitor
- Code review and deployment via GitHub
DE203 · Apache Spark & Databricks
4 lessons
Distributed data processing with Apache Spark and the Databricks platform — Spark fundamentals, the PySpark DataFrames API, Spark SQL, performance tuning, Delta Lake, and the medallion architecture. Build, tune, and operate big-data pipelines against terabyte-scale datasets.
LESSON 01 · Distributed Computing & Spark Fundamentals
- Why distributed computing
- RDDs vs DataFrames vs Datasets
- Drivers, executors, and the cluster manager
- Lazy evaluation, transformations vs actions
LESSON 02 · PySpark DataFrames API
- Creating DataFrames from files and tables
- Transformations: select, filter, withColumn, groupBy, join
- User-Defined Functions (UDFs) and Pandas UDFs
- Partitioning, repartition, and coalesce
LESSON 03 · Spark SQL & Performance Tuning
- Catalyst optimizer and the Tungsten engine
- Broadcast joins and skew handling
- Caching and persistence
- Adaptive Query Execution (AQE)
- Reading explain plans
LESSON 04 · Databricks & Delta Lake
- Databricks workspace, notebooks, jobs, and clusters
- Delta Lake operations: MERGE, time travel, VACUUM, and OPTIMIZE
- Medallion architecture (bronze / silver / gold)
- Unity Catalog overview
- Project: end-to-end medallion pipeline on Databricks
DE301 · AI-Era Data Engineering
3 lessons
The modern, AI-era responsibilities of the data engineer — data quality and observability, vector databases and embedding-based RAG pipelines, and feature stores. Be ready for the new AI Data Engineer and ML Platform Engineer roles.
LESSON 01 · Data Quality, Testing & Observability
- Great Expectations and dbt tests
- Data contracts
- OpenLineage and lineage tracking
- Monitoring, alerting, and SLAs for data
- PII, GDPR, and CCPA basics
LESSON 02 · Vector Databases & RAG Pipelines
- Embeddings — what they are and how to compute them
- pgvector, Pinecone, and Azure AI Search
- Chunking strategies for documents
- Building an ingestion pipeline for RAG
- Hybrid search and re-ranking
LESSON 03 · Feature Stores & ML/AI Data Infrastructure
- Why feature stores exist
- Online vs offline features
- Feast and Databricks Feature Store
- Training-serving skew
- The data engineer's role on ML and LLM teams
Ready to start? Apply now.
Talk to a program advisor, get your questions answered, and reserve your seat in the next cohort.
