NookTek
Enrolling now

Data Engineering

Build the pipelines and warehouses that power modern data teams.

Three modules: data warehousing, Azure data engineering, and core engineering with Python, Spark, Airflow, and Snowflake. The program is designed to take you from junior level to job-ready Azure Data Engineer.

Next cohort starts

June 10, 2026

Enrollment open — reserve your seat before it fills.

Duration

20 weeks

Courses

8

Class schedule

Tue + Thu · 5:00–8:00 PM MST

Curriculum

8 courses across the 20-week program.

Each course runs as a self-contained unit with live instruction, weekly assignments, and a hands-on lab.

DE101

Introduction to Data Engineering

2 lessons

Foundational concepts of data engineering and how data engineers fit into the modern data ecosystem. Map common business scenarios to the right architectural choices and describe the end-to-end data engineering lifecycle.

  1. LESSON 01

    What is Data Engineering?

    • Why data and the modern data economy
    • The role of a Data Engineer
    • Data Engineer vs Data Analyst vs ML/AI Engineer
    • Modern data stack overview
  2. LESSON 02

    The Data Engineering Lifecycle

    • Sources and ingestion
    • Storage layer (warehouse, lake, lakehouse)
    • Transformation (ETL vs ELT)
    • Serving and consumption (analytics, ML, AI)
    • Orchestration, observability, and security
    • OLTP vs OLAP, batch vs streaming
DE102

Databases & Data Modeling

5 lessons

Relational database internals, normalization, indexing, and query performance — plus the major NoSQL families and how warehouses, lakes, and lakehouses fit together. Choose the right storage technology for any workload.

  1. LESSON 01

    Foundations of Databases

    • Why databases vs files and spreadsheets
    • DBMS overview and architecture
    • ACID properties and transactions
    • Isolation levels and concurrency
  2. LESSON 02

    Relational Databases & Schema Design

    • Tables, rows, columns, and data types
    • Primary keys, foreign keys, and relationships
    • Normalization (1NF, 2NF, 3NF, BCNF)
    • Constraints, defaults, and check constraints
  3. LESSON 03

    Indexing & Query Performance

    • B-tree and hash indexes
    • Clustered vs non-clustered indexes
    • Covering indexes and included columns
    • Statistics and the optimizer
    • When not to add an index
  4. LESSON 04

    NoSQL Databases

    • Document stores (MongoDB)
    • Key-value stores (Redis, DynamoDB)
    • Column-family stores (Cassandra)
    • Graph databases (Neo4j)
    • CAP theorem and consistency trade-offs
    • Real-world use cases
  5. LESSON 05

    Modern Data Platforms

    • OLTP vs OLAP
    • Data warehouses (Snowflake, Synapse)
    • Data lakes (ADLS Gen2, S3)
    • Lakehouses (Databricks, Iceberg, Delta Lake)
    • Streaming systems (Kafka, Azure Event Hubs)
DE103

SQL for Data Engineering

10 lessons

SQL from beginner through advanced — querying, joins, set theory, table expressions, window functions, performance tuning, stored procedures, MERGE, and warehouse-specific patterns. Write production-quality SQL on Snowflake, Azure SQL, and SQL Server.

  1. LESSON 01

    Querying Data

    • Selecting Data
    • Sorting Data
    • Data Types and Nullability
    • Limiting Results
  2. LESSON 02

    Filtering Data & Preparing Outputs

    • The WHERE Clause
    • Operators
    • Working with NULL
  3. LESSON 03

    Aggregation

    • GROUP BY
    • SQL Aggregate Functions
    • Filtering with Aggregates
  4. LESSON 04

    Expressions & Functions

    • Conditional Logic
    • Handling unknowns
    • String Functions
    • Date Functions
    • Other Functions
  5. LESSON 05

    Joins

    • Keys discussion
    • Cross joins
    • Filtering with Inner Join
    • Outer joins — mismatches and unknowns
  6. LESSON 06

    Set Theory

    • UNION, INTERSECT, EXCEPT
    • Precedence
  7. LESSON 07

    Subqueries

    • Correlated vs Non-Correlated
    • Using IN
    • EXISTS Operator
  8. LESSON 08

    Table Expressions

    • Common Table Expressions
    • Recursive CTEs
    • Derived Tables
    • Views and Materialized Views
  9. LESSON 09

    Window Functions

    • Numbering, Counting, and Ranking
    • Partitions and Frames
    • Analytic Functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE)
  10. LESSON 10

    Performance Tuning, Stored Procedures & Advanced Patterns

    • Reading Execution Plans
    • Indexes for Query Patterns
    • MERGE Statements
    • Stored Procedures
    • Pivoting and Unpivoting
    • Writing Dynamic SQL Safely
DE104

Dimensional Modeling

3 lessons

Kimball methodology, star and snowflake schemas, slowly changing dimensions, plus modern approaches like Data Vault 2.0 and the medallion architecture. Design dimensional models from business requirements and ship them on Snowflake, Synapse, or Databricks.

  1. LESSON 01

    Dimensional Modeling Foundations

    • Kimball methodology
    • Fact tables vs dimension tables
    • Grain — the most important decision
    • Additive, semi-additive, and non-additive measures
    • Conformed dimensions
  2. LESSON 02

    Schemas & Slowly Changing Dimensions

    • Star vs snowflake schemas
    • SCD Type 1 (overwrite)
    • SCD Type 2 (history)
    • SCD Type 3 (limited history) and Type 6 (hybrid)
    • Surrogate keys and natural keys
    • Junk, degenerate, and role-playing dimensions
  3. LESSON 03

    Modern Modeling Approaches

    • Data Vault 2.0 — hubs, links, satellites
    • One Big Table (OBT) and denormalization patterns
    • Medallion architecture (bronze / silver / gold)
    • Inmon vs Kimball comparison
DE201

Git & GitHub for Data Engineers

2 lessons

Version control fundamentals using Git and team collaboration on GitHub, with branching strategies, pull requests, code review, conflict resolution, and GitHub Actions for data-pipeline CI/CD. Operate confidently inside a professional Git workflow on day one.

  1. LESSON 01

    Git Fundamentals

    • Init, clone, and the working directory
    • Add, commit, and the staging area
    • Branches, checkout, switch, and merge
    • Push, pull, and remotes
    • Rebase vs merge
    • .gitignore and resolving conflicts
  2. LESSON 02

    GitHub Workflows for Data Engineers

    • Pull requests and code review etiquette
    • Branching strategies (Git Flow vs trunk-based)
    • GitHub Actions for data pipeline CI/CD
    • Secrets and environment management
    • Protecting main and required reviews
DE202

Python for Data Engineering

10 lessons

Python for production-grade data engineering — language fundamentals, working with structured/semi-structured data, integrating with Azure SQL, ADLS Gen2, and Snowflake, designing ETL/ELT pipelines, Kafka streaming, and Apache Airflow orchestration. Capstone: an end-to-end pipeline from REST API → ADLS → Snowflake.

  1. LESSON 01

    Python Foundations for Engineers

    • Virtual environments (venv, conda)
    • Modules, packages, and project structure
    • OOP fundamentals and type hints
    • Logging and structured error handling
    • Unit testing with pytest
  2. LESSON 02

    Working with Data in Python

    • File I/O — CSV, JSON, Parquet, Avro
    • Calling REST APIs with requests
    • Parsing semi-structured data
    • Handling large files efficiently
  3. LESSON 03

    Pandas & Data Manipulation Patterns

    • DataFrames and Series
    • Transformations, joins, and group-by
    • Common ETL patterns in Pandas
    • Performance tips and when not to use Pandas
  4. LESSON 04

    Database Connectivity & Azure SQL

    • SQLAlchemy and pyodbc
    • Connecting to Azure SQL Database
    • Parameterized queries and SQL injection
    • Bulk inserts and transactions
    • Reading data from Azure SQL into Python
  5. LESSON 05

    Cloud Storage with Azure Data Lake Storage Gen2

    • ADLS Gen2 hierarchical namespace
    • Authentication (service principals, managed identities)
    • Reading and writing files (CSV, JSON, Parquet)
    • Partitioning by date and key
    • Writing landing-zone data from a pipeline
  6. LESSON 06

    Snowflake Integration

    • snowflake-connector-python
    • Snowpark for Python
    • Stages, COPY INTO, and bulk loads
    • Loading dimensional models
    • Project: load API data into Snowflake tables
  7. LESSON 07

    ETL vs ELT Design Patterns

    • Extraction strategies (full vs incremental)
    • Staging and idempotency
    • Change Data Capture (CDC) patterns
    • Backfills and reprocessing
    • Error handling and dead-letter queues
  8. LESSON 08

    Streaming with Apache Kafka

    • Topics, partitions, and offsets
    • Producers and consumers in kafka-python
    • Schema Registry overview
    • Stream-to-table patterns
  9. LESSON 09

    Workflow Orchestration with Apache Airflow

    • DAGs, operators, and sensors
    • Scheduling, intervals, and backfills
    • XComs and task dependencies
    • Idempotency and retries
    • Deploying Airflow on Azure
  10. LESSON 10

    End-to-End Pipeline Project

    • Design: source, target, and transformations
    • Ingest from REST API into ADLS
    • Transform with Pandas and PySpark
    • Load into Snowflake dimensional model
    • Orchestrate with Airflow and monitor
    • Code review and deployment via GitHub
DE203

Apache Spark & Databricks

4 lessons

Distributed data processing with Apache Spark and the Databricks platform — Spark fundamentals, the PySpark DataFrames API, Spark SQL, performance tuning, Delta Lake, and the medallion architecture. Build, tune, and operate big-data pipelines against terabyte-scale datasets.

  1. LESSON 01

    Distributed Computing & Spark Fundamentals

    • Why distributed computing
    • RDDs vs DataFrames vs Datasets
    • Drivers, executors, and the cluster manager
    • Lazy evaluation, transformations vs actions
  2. LESSON 02

    PySpark DataFrames API

    • Creating DataFrames from files and tables
    • Transformations: select, filter, withColumn, groupBy, join
    • User-Defined Functions (UDFs) and Pandas UDFs
    • Partitioning, repartition, and coalesce
  3. LESSON 03

    Spark SQL & Performance Tuning

    • Catalyst optimizer and the Tungsten engine
    • Broadcast joins and skew handling
    • Caching and persistence
    • Adaptive Query Execution (AQE)
    • Reading explain plans
  4. LESSON 04

    Databricks & Delta Lake

    • Databricks workspace, notebooks, jobs, and clusters
    • Delta Lake operations — MERGE, time travel, vacuum, optimize
    • Medallion architecture (bronze / silver / gold)
    • Unity Catalog overview
    • Project: end-to-end medallion pipeline on Databricks
DE301

AI-Era Data Engineering

3 lessons

The modern, AI-era responsibilities of the data engineer — data quality and observability, vector databases and embedding-based RAG pipelines, and feature stores. Be ready for the new AI Data Engineer and ML Platform Engineer roles.

  1. LESSON 01

    Data Quality, Testing & Observability

    • Great Expectations and dbt tests
    • Data contracts
    • OpenLineage and lineage tracking
    • Monitoring, alerting, and SLAs for data
    • PII, GDPR, and CCPA basics
  2. LESSON 02

    Vector Databases & RAG Pipelines

    • Embeddings — what they are and how to compute them
    • pgvector, Pinecone, and Azure AI Search
    • Chunking strategies for documents
    • Building an ingestion pipeline for RAG
    • Hybrid search and re-ranking
  3. LESSON 03

    Feature Stores & ML/AI Data Infrastructure

    • Why feature stores exist
    • Online vs offline features
    • Feast and Databricks Feature Store
    • Training-serving skew
    • The data engineer's role on ML and LLM teams
Cohort enrolling now

Ready to start? Apply now.

Talk to a program advisor, get your questions answered, and reserve your seat in the next cohort.