SlideShare a Scribd company logo
1 of 32
Download to read offline
Data Engineering
Zoomcamp
DataTalks.Club
January 17, 2022
Plan
● Instructors
● Course
○ Is it for me?
○ Syllabus
● Course logistics
● DataTalks.Club slack
● Questions
Ankush Khanna
● Senior Data Engineer at Wayfair
● Anything Streaming or Batch
Picture
Sejal Vaidya
● Data & ML Engineer
● Course Topics: Platform Infrastructure
and Data Ingestion
Picture
Victoria Perez Mola
● Team lead of the Analytics platform at
Tier Mobility
● Not a data engineer
● Course topics: Analytics engineering
Picture
Alexey Grigorev
● Principal Data Scientist at OLX Group
● Not a data engineer =)
● Instructor for ML Zoomcamp
Picture
Plan
● Instructors
● Course
○ Is it for me?
○ Syllabus
● Course logistics
● DataTalks.Club slack
● Questions
Is it for me?
● Pre-requisites
○ Experience with programming (Python)
○ Being comfortable with command line (Git, etc)
○ Exposure to SQL
● Not required
○ Previous experience with data engineering
https://github.com/DataTalksClub/data-engineering-zoomcamp
https://github.com/DataTalksClub/data-engineering-zoomcamp
https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
Course overview
● Week 1: Introduction & Prerequisites
● Week 2: Ingestion and orchestration
● Week 3: Data warehouse (Big query)
● Week 4: Analytics engineering (dbt)
● Week 5: Batch processing (Spark)
● Week 6: Streaming (Kafka)
● Weeks 7-10: Project
Week 1: Introduction & Prerequisites
● Setting up the environment
○ Google Cloud account
○ Docker
○ Terraform
● Running Postgres in Docker
● Taking a look at the NY taxi dataset
● SQL refresher
Week 2: Ingestion and orchestration
● Data Lake
○ What is a Data Lake, ELT vs. ETL, Using GCS
● Orchestration
○ What is an Orchestration Pipeline, Data Ingestion, Introducing & Using Airflow
● Demo:
○ Setting up Airflow with Docker
○ Data ingestion DAG
■ Extraction, Pre-processing (parquet, partitioning), Loading, Exploration
with BigQuery, etc.
● Best Practices
Week 3: Data Warehouse
● What is Data warehouse
● BigQuery?
○ Partitioning and Clustering
○ With Airflow
○ Best practices
Week 4: Analytics Engineering
● What is dbt and how does it fit the tech stack?
● Using dbt:
○ Anatomy of a dbt model
○ Seeds
○ Jinja, Macros and tests
○ Documentation
○ Packages
● Build a dashboard in Google data studio
Week 5: Batch processing
● Spark internals
● Broadcasting
● Partitioning
● Shuffling
● Spark + Airflow
● Apache Flink as alternative to Spark
Week 6: Stream processing
● Basics of Kafka
● Consumer-Producer
● Kafka Streams
● Kafka Connect
Project
● Putting everything we learned in practice
Plan
● Instructors
● Course
○ Is it for me?
○ Syllabus
● Course logistics
● DataTalks.Club slack
● Questions
Course logistics
● Lessons
○ Pre-recorded
○ Published on our YouTube channel
● Office hours
○ Live on Mondays 17:00 CET
○ Homework solutions and answering questions
● Project
○ 1-2 weeks - working on the project
○ 3 week - peer reviewing
https://www.youtube.com/c/DataTalksClub
Course logistics
● Certificate - for passing the project
● Leaderboard
○ Scores for homework
○ Project
○ Learning in public
Learning in public
● LinkedIn
● Twitter
● Blogs
Plan
● Instructors
● Course
○ Is it for me?
○ Syllabus
● Course logistics
● DataTalks.Club slack
● Questions
Plan
● Instructors
● Course
○ Is it for me?
○ Syllabus
● Course logistics
● DataTalks.Club slack
● Questions
Data engineering zoomcamp  introduction

More Related Content

What's hot

Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache Spark
Databricks
 

What's hot (20)

Pre processing big data
Pre processing big dataPre processing big data
Pre processing big data
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in Python
 
Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at Airbnb
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache Spark
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Neo4j Training Series - Spring Data Neo4j
Neo4j Training Series - Spring Data Neo4jNeo4j Training Series - Spring Data Neo4j
Neo4j Training Series - Spring Data Neo4j
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 

Similar to Data engineering zoomcamp introduction

Similar to Data engineering zoomcamp introduction (20)

Getting started with BigQuery
Getting started with BigQueryGetting started with BigQuery
Getting started with BigQuery
 
Running Moodle for High Concurrent Users
Running Moodle for High Concurrent UsersRunning Moodle for High Concurrent Users
Running Moodle for High Concurrent Users
 
Google Associate Cloud Engineer Certification Tips
Google Associate Cloud Engineer Certification TipsGoogle Associate Cloud Engineer Certification Tips
Google Associate Cloud Engineer Certification Tips
 
Google Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine LearningGoogle Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine Learning
 
Google cloud infrastructure workshop
Google cloud infrastructure workshopGoogle cloud infrastructure workshop
Google cloud infrastructure workshop
 
Data science workflows: from notebooks to production
Data science workflows: from notebooks to productionData science workflows: from notebooks to production
Data science workflows: from notebooks to production
 
Socket Programming with Python
Socket Programming with PythonSocket Programming with Python
Socket Programming with Python
 
Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OK
 
Machine Learning with Python
Machine Learning with Python Machine Learning with Python
Machine Learning with Python
 
SQL for Data Science
SQL for Data ScienceSQL for Data Science
SQL for Data Science
 
GraphQL Bangkok meetup 5.0
GraphQL Bangkok meetup 5.0GraphQL Bangkok meetup 5.0
GraphQL Bangkok meetup 5.0
 
Role of ML engineer
Role of ML engineerRole of ML engineer
Role of ML engineer
 
Databricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog FoodDatabricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog Food
 
Workflow Engines + Luigi
Workflow Engines + LuigiWorkflow Engines + Luigi
Workflow Engines + Luigi
 
Clearing Airflow Obstructions
Clearing Airflow ObstructionsClearing Airflow Obstructions
Clearing Airflow Obstructions
 
Data Analytics with DBMS
Data Analytics with DBMSData Analytics with DBMS
Data Analytics with DBMS
 
OpenStack Cinder On-Boarding Education - Boston Summit - 2017
OpenStack Cinder On-Boarding Education - Boston Summit - 2017OpenStack Cinder On-Boarding Education - Boston Summit - 2017
OpenStack Cinder On-Boarding Education - Boston Summit - 2017
 
Zettabyte File System (ZFS)
Zettabyte File System (ZFS)Zettabyte File System (ZFS)
Zettabyte File System (ZFS)
 

More from Alexey Grigorev

More from Alexey Grigorev (20)

MLOps week 1 intro
MLOps week 1 introMLOps week 1 intro
MLOps week 1 intro
 
Codementor - Data Science at OLX
Codementor - Data Science at OLX Codementor - Data Science at OLX
Codementor - Data Science at OLX
 
AI in Fashion - Size & Fit - Nour Karessli
 AI in Fashion - Size & Fit - Nour Karessli AI in Fashion - Size & Fit - Nour Karessli
AI in Fashion - Size & Fit - Nour Karessli
 
AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
AI-Powered Computer Vision Applications in Media Industry - Yulia PavlovaAI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
 
ML Zoomcamp 10 - Kubernetes
ML Zoomcamp 10 - KubernetesML Zoomcamp 10 - Kubernetes
ML Zoomcamp 10 - Kubernetes
 
Paradoxes in Data Science
Paradoxes in Data ScienceParadoxes in Data Science
Paradoxes in Data Science
 
ML Zoomcamp 8 - Neural networks and deep learning
ML Zoomcamp 8 - Neural networks and deep learningML Zoomcamp 8 - Neural networks and deep learning
ML Zoomcamp 8 - Neural networks and deep learning
 
Algorithmic fairness
Algorithmic fairnessAlgorithmic fairness
Algorithmic fairness
 
MLOps at OLX
MLOps at OLXMLOps at OLX
MLOps at OLX
 
ML Zoomcamp 6 - Decision Trees and Ensemble Learning
ML Zoomcamp 6 - Decision Trees and Ensemble LearningML Zoomcamp 6 - Decision Trees and Ensemble Learning
ML Zoomcamp 6 - Decision Trees and Ensemble Learning
 
ML Zoomcamp 5 - Model deployment
ML Zoomcamp 5 - Model deploymentML Zoomcamp 5 - Model deployment
ML Zoomcamp 5 - Model deployment
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
 
ML Zoomcamp 4 - Evaluation Metrics for Classification
ML Zoomcamp 4 - Evaluation Metrics for ClassificationML Zoomcamp 4 - Evaluation Metrics for Classification
ML Zoomcamp 4 - Evaluation Metrics for Classification
 
ML Zoomcamp 3 - Machine Learning for Classification
ML Zoomcamp 3 - Machine Learning for ClassificationML Zoomcamp 3 - Machine Learning for Classification
ML Zoomcamp 3 - Machine Learning for Classification
 
ML Zoomcamp Week #2 Office Hours
ML Zoomcamp Week #2 Office HoursML Zoomcamp Week #2 Office Hours
ML Zoomcamp Week #2 Office Hours
 
AMLD2021 - ML in online marketplaces
AMLD2021 - ML in online marketplacesAMLD2021 - ML in online marketplaces
AMLD2021 - ML in online marketplaces
 
ML Zoomcamp 2 - Slides
ML Zoomcamp 2 - SlidesML Zoomcamp 2 - Slides
ML Zoomcamp 2 - Slides
 
ML Zoomcamp 2.1 - Car Price Prediction Project
ML Zoomcamp 2.1 - Car Price Prediction ProjectML Zoomcamp 2.1 - Car Price Prediction Project
ML Zoomcamp 2.1 - Car Price Prediction Project
 
ML Zoomcamp - Course Overview and Logistics
ML Zoomcamp - Course Overview and LogisticsML Zoomcamp - Course Overview and Logistics
ML Zoomcamp - Course Overview and Logistics
 
ML Zoomcamp 1.10 - Summary
ML Zoomcamp 1.10 - SummaryML Zoomcamp 1.10 - Summary
ML Zoomcamp 1.10 - Summary
 

Recently uploaded

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Recently uploaded (20)

How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 

Data engineering zoomcamp introduction