SlideShare una empresa de Scribd logo
1 de 35
Descargar para leer sin conexión
You can do it in SQL
Drew Banin (he/him)
Chief Product Officer
2
mo’ data mo’ problems
@drewbanin on Twitter
● Open core data modeling software
● dbt provides a SQL-first workflow for:
- Data modeling
- Data testing
- Data documentation
- And more!
You can learn more at getdbt.com
We’re the makers of dbt
Confessions & disclaimers
● I have never written a line of production Scala code in my life
● I have written a whole lot of SQL in my life
● People who tolerate Python packaging don’t get to complain about the JVM
I ❤ SQL
, Expressive
, Accessible
, Universal
, Declarative
, Thought provoking….?
I ❤ SQL
, Expressive - SQL reads more or less like a real sentence
, Accessible
, Universal
, Declarative
, Thought provoking….?
I ❤ SQL
, Expressive
, Accessible - Lots of people can learn in pretty quickly
, Universal
, Declarative
, Thought provoking….?
I ❤ SQL
, Expressive
, Accessible
, Universal - Most databases speak SQL with pretty consistent support
, Declarative
, Thought provoking….?
I ❤ SQL
, Expressive
, Accessible
, Universal
, Declarative - No moving parts, limited opportunities for the bad kind of magic
, Thought provoking….?
I ❤ SQL
, Expressive
, Accessible
, Universal
, Declarative
, Thought provoking….?
Some things you can do in SQL
Technique: Pivoting Columns
What: convert rows of data into columns of data
Why: to derive new metrics from existing rows in a table
Pivoting columns
What: convert rows of data into columns of data
Why: to derive new metrics from existing rows in a table
Pivoting columns
user_id event_name happened_at
1 signup 2021-01-01
1 purchase 2021-01-02
2 signup 2021-01-03
3 signup 2021-01-04
3 purchase 2021-01-05
1 purchase 2021-01-05
What: convert rows of data into columns of data
Why: to derive new metrics from existing rows in a table
Pivoting columns
user_id event_name happened_at
1 signup 2021-01-01
1 purchase 2021-01-02
2 signup 2021-01-03
3 signup 2021-01-04
3 purchase 2021-01-05
1 purchase 2021-01-05
What: convert rows of data into columns of data
Why: to derive new metrics from existing rows in a table
Pivoting columns
user_id event_name happened_at
1 signup 2021-01-01
1 purchase 2021-01-02
2 signup 2021-01-03
3 signup 2021-01-04
3 purchase 2021-01-05
1 purchase 2021-01-05
user_id signup_date purchases
1 2021-01-01 2
2 2021-01-02 0
3 2021-01-04 1
Technique: Appending datasets
Unions
What: append tables together
Why: combine tables into a taller table
Unions
What: append tables together
Why: combine tables into a taller table
user_id event_name happened_at
1 signup 2021-01-01
1 purchase 2021-01-02
2 signup 2021-01-03
user_id event_name happened_at
1 app_install 2021-01-01
1 screen_view 2021-01-02
2 app_install 2021-01-03
web_events mobile_events
Unions
What: append tables together
Why: combine tables into a taller table
source user_id event_name happened_at
mobile 1 app_install 2021-01-01
mobile 1 screen_view 2021-01-02
mobile 2 app_install 2021-01-03
web 1 signup 2021-01-01
web 1 purchase 2021-01-02
web 2 signup 2021-01-03
all_events
Technique: Window functions
Window functions
What: apply transformations over part of a table
Why: use cases abound!
Window functions
What: apply transformations over part of a table
Why: use cases abound!
user_id happened_at page_path
1 2021-01-01 /product
1 2021-01-02 /signup
2 2021-01-03 /app
pageviews
Window functions
What: apply transformations over part of a table
Why: use cases abound!
user_id happened_at page_path
1 2021-01-01 /product
1 2021-01-02 /signup
2 2021-01-03 /docs
pageviews
user_id first_page_path
1 /product
2 /docs
user_touches
Helpful window functions
● first_value
● last_value
● lag
● lead
● row_number
● rank
● sum
● min
● max
● avg
● count
When you write a lot of SQL...
SQL: The bad parts
● No “functions” for code reuse
● Lots of typing → lots of typos
● It might be the most copy/pasted language of all time!
○ Citation needed
● But…. what if we add templating?
● Yes!
● Accessibility is 🔑
● What happens when you teach people SQL?
○ Data (engineers|scientists|analysts) can collaborate in code
○ Non-engineers are empowered to understand business logic
○ More people can contribute!
You can do it in SQL… but should you?
What does the future hold?
● More data workloads will converge around SQL
● Databases will improve support for sophisticated SQL operations
- AI & ML?
- Streaming?
- ….?
● More roles will require SQL knowledge and aptitude
● More & better tools will exist to aid in writing and managing SQL
What do you think?

Más contenido relacionado

La actualidad más candente

SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastDatabricks
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks
 
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...Databricks
 
The Rise of Vector Data
The Rise of Vector DataThe Rise of Vector Data
The Rise of Vector DataDatabricks
 
Accelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraAccelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraDatabricks
 
Advanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLAdvanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLDatabricks
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsDatabricks
 
Leveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch
Leveraging Apache Spark to Develop AI-Enabled Products and Services at BoschLeveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch
Leveraging Apache Spark to Develop AI-Enabled Products and Services at BoschDatabricks
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Accelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowAccelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowDatabricks
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeDatabricks
 
Northwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to CloudNorthwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to CloudDatabricks
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Databricks
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOpsDatabricks
 
Democratizing Data
Democratizing DataDemocratizing Data
Democratizing DataDatabricks
 
A High Performance Mutable Engagement Activity Delta Lake
A High Performance Mutable Engagement Activity Delta LakeA High Performance Mutable Engagement Activity Delta Lake
A High Performance Mutable Engagement Activity Delta LakeDatabricks
 
Semantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflowSemantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflowDatabricks
 
Simplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeDatabricks
 

La actualidad más candente (20)

SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI Initiatives
 
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
 
The Rise of Vector Data
The Rise of Vector DataThe Rise of Vector Data
The Rise of Vector Data
 
Accelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraAccelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & Privacera
 
Advanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLAdvanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using ML
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
 
Leveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch
Leveraging Apache Spark to Develop AI-Enabled Products and Services at BoschLeveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch
Leveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Accelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowAccelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflow
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks Runtime
 
Northwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to CloudNorthwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to Cloud
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
 
Democratizing Data
Democratizing DataDemocratizing Data
Democratizing Data
 
A High Performance Mutable Engagement Activity Delta Lake
A High Performance Mutable Engagement Activity Delta LakeA High Performance Mutable Engagement Activity Delta Lake
A High Performance Mutable Engagement Activity Delta Lake
 
Semantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflowSemantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflow
 
Simplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta Lake
 

Similar a You Can Do It in SQL

Waiting too long for Excel's VLOOKUP? Use SQLite for simple data analysis!
Waiting too long for Excel's VLOOKUP? Use SQLite for simple data analysis!Waiting too long for Excel's VLOOKUP? Use SQLite for simple data analysis!
Waiting too long for Excel's VLOOKUP? Use SQLite for simple data analysis!Amanda Lam
 
Recommender Systems from A to Z – Real-Time Deployment
Recommender Systems from A to Z – Real-Time DeploymentRecommender Systems from A to Z – Real-Time Deployment
Recommender Systems from A to Z – Real-Time DeploymentCrossing Minds
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudIke Ellis
 
Daniel Egan Msdn Tech Days Oc Day2
Daniel Egan Msdn Tech Days Oc Day2Daniel Egan Msdn Tech Days Oc Day2
Daniel Egan Msdn Tech Days Oc Day2Daniel Egan
 
BI on Big Data Presentation
BI on Big Data PresentationBI on Big Data Presentation
BI on Big Data PresentationArcadia Data
 
E-Commerce Product Rating Based on Customer Review
E-Commerce Product Rating Based on Customer ReviewE-Commerce Product Rating Based on Customer Review
E-Commerce Product Rating Based on Customer ReviewIRJET Journal
 
What's new in VisibleThread 3.0
What's new in VisibleThread 3.0What's new in VisibleThread 3.0
What's new in VisibleThread 3.0VisibleThread
 
Evolutionary db development
Evolutionary db development Evolutionary db development
Evolutionary db development Open Party
 
A case for teaching SQL to scientists
A case for teaching SQL to scientistsA case for teaching SQL to scientists
A case for teaching SQL to scientistsdhalperi
 
Map Reduce amrp presentation
Map Reduce amrp presentationMap Reduce amrp presentation
Map Reduce amrp presentationrenjan131
 
Library management system project
Library management system projectLibrary management system project
Library management system projectAJAY KUMAR
 
CV_Anantha_Krishnan_Senior_Oracle_ETL_PLSQL
CV_Anantha_Krishnan_Senior_Oracle_ETL_PLSQLCV_Anantha_Krishnan_Senior_Oracle_ETL_PLSQL
CV_Anantha_Krishnan_Senior_Oracle_ETL_PLSQLAnantha TMS
 
Agile Data Warehousing
Agile Data WarehousingAgile Data Warehousing
Agile Data WarehousingDavide Mauri
 
AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)
AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)
AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)Amazon Web Services
 
Real-life Customer Cases using Data Vault and Data Warehouse Automation
Real-life Customer Cases using Data Vault and Data Warehouse AutomationReal-life Customer Cases using Data Vault and Data Warehouse Automation
Real-life Customer Cases using Data Vault and Data Warehouse AutomationPatrick Van Renterghem
 

Similar a You Can Do It in SQL (20)

Waiting too long for Excel's VLOOKUP? Use SQLite for simple data analysis!
Waiting too long for Excel's VLOOKUP? Use SQLite for simple data analysis!Waiting too long for Excel's VLOOKUP? Use SQLite for simple data analysis!
Waiting too long for Excel's VLOOKUP? Use SQLite for simple data analysis!
 
Recommender Systems from A to Z – Real-Time Deployment
Recommender Systems from A to Z – Real-Time DeploymentRecommender Systems from A to Z – Real-Time Deployment
Recommender Systems from A to Z – Real-Time Deployment
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloud
 
Daniel Egan Msdn Tech Days Oc Day2
Daniel Egan Msdn Tech Days Oc Day2Daniel Egan Msdn Tech Days Oc Day2
Daniel Egan Msdn Tech Days Oc Day2
 
BI on Big Data Presentation
BI on Big Data PresentationBI on Big Data Presentation
BI on Big Data Presentation
 
E-Commerce Product Rating Based on Customer Review
E-Commerce Product Rating Based on Customer ReviewE-Commerce Product Rating Based on Customer Review
E-Commerce Product Rating Based on Customer Review
 
Mr bi amrp
Mr bi amrpMr bi amrp
Mr bi amrp
 
What's new in VisibleThread 3.0
What's new in VisibleThread 3.0What's new in VisibleThread 3.0
What's new in VisibleThread 3.0
 
Project report
Project report Project report
Project report
 
Evolutionary db development
Evolutionary db development Evolutionary db development
Evolutionary db development
 
Dax & sql in power bi
Dax & sql in power biDax & sql in power bi
Dax & sql in power bi
 
A case for teaching SQL to scientists
A case for teaching SQL to scientistsA case for teaching SQL to scientists
A case for teaching SQL to scientists
 
Map Reduce amrp presentation
Map Reduce amrp presentationMap Reduce amrp presentation
Map Reduce amrp presentation
 
Library management system project
Library management system projectLibrary management system project
Library management system project
 
CV_Anantha_Krishnan_Senior_Oracle_ETL_PLSQL
CV_Anantha_Krishnan_Senior_Oracle_ETL_PLSQLCV_Anantha_Krishnan_Senior_Oracle_ETL_PLSQL
CV_Anantha_Krishnan_Senior_Oracle_ETL_PLSQL
 
Agile Data Warehousing
Agile Data WarehousingAgile Data Warehousing
Agile Data Warehousing
 
Ganesh profile
Ganesh profileGanesh profile
Ganesh profile
 
AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)
AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)
AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)
 
Real-life Customer Cases using Data Vault and Data Warehouse Automation
Real-life Customer Cases using Data Vault and Data Warehouse AutomationReal-life Customer Cases using Data Vault and Data Warehouse Automation
Real-life Customer Cases using Data Vault and Data Warehouse Automation
 
Ebook9
Ebook9Ebook9
Ebook9
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

Más de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Último

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 

Último (20)

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 

You Can Do It in SQL

  • 1. You can do it in SQL
  • 2. Drew Banin (he/him) Chief Product Officer 2 mo’ data mo’ problems @drewbanin on Twitter
  • 3. ● Open core data modeling software ● dbt provides a SQL-first workflow for: - Data modeling - Data testing - Data documentation - And more! You can learn more at getdbt.com We’re the makers of dbt
  • 4. Confessions & disclaimers ● I have never written a line of production Scala code in my life ● I have written a whole lot of SQL in my life ● People who tolerate Python packaging don’t get to complain about the JVM
  • 5. I ❤ SQL , Expressive , Accessible , Universal , Declarative , Thought provoking….?
  • 6. I ❤ SQL , Expressive - SQL reads more or less like a real sentence , Accessible , Universal , Declarative , Thought provoking….?
  • 7. I ❤ SQL , Expressive , Accessible - Lots of people can learn in pretty quickly , Universal , Declarative , Thought provoking….?
  • 8. I ❤ SQL , Expressive , Accessible , Universal - Most databases speak SQL with pretty consistent support , Declarative , Thought provoking….?
  • 9. I ❤ SQL , Expressive , Accessible , Universal , Declarative - No moving parts, limited opportunities for the bad kind of magic , Thought provoking….?
  • 10. I ❤ SQL , Expressive , Accessible , Universal , Declarative , Thought provoking….?
  • 11. Some things you can do in SQL
  • 13. What: convert rows of data into columns of data Why: to derive new metrics from existing rows in a table Pivoting columns
  • 14. What: convert rows of data into columns of data Why: to derive new metrics from existing rows in a table Pivoting columns user_id event_name happened_at 1 signup 2021-01-01 1 purchase 2021-01-02 2 signup 2021-01-03 3 signup 2021-01-04 3 purchase 2021-01-05 1 purchase 2021-01-05
  • 15. What: convert rows of data into columns of data Why: to derive new metrics from existing rows in a table Pivoting columns user_id event_name happened_at 1 signup 2021-01-01 1 purchase 2021-01-02 2 signup 2021-01-03 3 signup 2021-01-04 3 purchase 2021-01-05 1 purchase 2021-01-05
  • 16. What: convert rows of data into columns of data Why: to derive new metrics from existing rows in a table Pivoting columns user_id event_name happened_at 1 signup 2021-01-01 1 purchase 2021-01-02 2 signup 2021-01-03 3 signup 2021-01-04 3 purchase 2021-01-05 1 purchase 2021-01-05 user_id signup_date purchases 1 2021-01-01 2 2 2021-01-02 0 3 2021-01-04 1
  • 17.
  • 19. Unions What: append tables together Why: combine tables into a taller table
  • 20. Unions What: append tables together Why: combine tables into a taller table user_id event_name happened_at 1 signup 2021-01-01 1 purchase 2021-01-02 2 signup 2021-01-03 user_id event_name happened_at 1 app_install 2021-01-01 1 screen_view 2021-01-02 2 app_install 2021-01-03 web_events mobile_events
  • 21. Unions What: append tables together Why: combine tables into a taller table source user_id event_name happened_at mobile 1 app_install 2021-01-01 mobile 1 screen_view 2021-01-02 mobile 2 app_install 2021-01-03 web 1 signup 2021-01-01 web 1 purchase 2021-01-02 web 2 signup 2021-01-03 all_events
  • 22.
  • 24. Window functions What: apply transformations over part of a table Why: use cases abound!
  • 25. Window functions What: apply transformations over part of a table Why: use cases abound! user_id happened_at page_path 1 2021-01-01 /product 1 2021-01-02 /signup 2 2021-01-03 /app pageviews
  • 26. Window functions What: apply transformations over part of a table Why: use cases abound! user_id happened_at page_path 1 2021-01-01 /product 1 2021-01-02 /signup 2 2021-01-03 /docs pageviews user_id first_page_path 1 /product 2 /docs user_touches
  • 27.
  • 28. Helpful window functions ● first_value ● last_value ● lag ● lead ● row_number ● rank ● sum ● min ● max ● avg ● count
  • 29. When you write a lot of SQL...
  • 30. SQL: The bad parts ● No “functions” for code reuse ● Lots of typing → lots of typos ● It might be the most copy/pasted language of all time! ○ Citation needed ● But…. what if we add templating?
  • 31.
  • 32.
  • 33. ● Yes! ● Accessibility is 🔑 ● What happens when you teach people SQL? ○ Data (engineers|scientists|analysts) can collaborate in code ○ Non-engineers are empowered to understand business logic ○ More people can contribute! You can do it in SQL… but should you?
  • 34. What does the future hold? ● More data workloads will converge around SQL ● Databases will improve support for sophisticated SQL operations - AI & ML? - Streaming? - ….? ● More roles will require SQL knowledge and aptitude ● More & better tools will exist to aid in writing and managing SQL
  • 35. What do you think?