2026

Modern Data Engineering Guide

A 2026 first-principles data engineering guide covering how data moves from raw source to trusted, query-ready tables: storage & file formats, SQL & query engines, modeling & warehousing, Spark, ingestion/CDC, dbt, orchestration, streaming, and the lakehouse.

Docusaurus · TypeScript · React · MDX · Mermaid
Private repo · code available on request

Highlights

  • Authored all 12 chapters: foundations, storage & formats, SQL & query engines, modeling & warehousing, Spark, ingestion & CDC, dbt, orchestration, streaming (Kafka), the lakehouse, data quality & governance, and career
  • Curriculum mined from real 2026 data-engineering job postings and gap-checked by a completeness critic; authored concept-first (columnar storage, desired-state pipelines, exactly-once) then mapped to today's tools (Snowflake/BigQuery, dbt, Airflow, Kafka, Iceberg/Delta)
  • Built on the shared `@throughline/guide-kit` design system with checkpoint quizzes; source-available (© 2026 To Yin Yu, All Rights Reserved), deployed on Vercel