SlideShare una empresa de Scribd logo
1 de 40
Descargar para leer sin conexión
© 2023
A Modern & Sustainable
Analytics Stack
W A L D
PyConDE / PyData 2023, April 17th
Florian Wilhelm
Mathematical Modelling
Modern Data Warehousing & Analytics
Personalisation & RecSys
Uncertainty Quantification & Causality
Python Data Stack
OSS Contributor & Creator of PyScaffold
Dr. Florian Wilhelm
• H E A D O F D ATA S C I E N C E
FlorianWilhelm.info
florian.wilhelm@inovex.de
Florian Wilhelm
@FlorianWilhelm
‣ Application Development (Web Platforms,
Mobile Apps, Smart Devices and Robotics, UI/
UX design,Backend Services)
‣ Data Management and Analytics (Business
Intelligence, Big Data, Searches, Data Science
and Deep Learning, Machine Perception and
Artificial Intelligence)
‣ Scalable IT-Infrastructures (IT Engineering,
Cloud Services, DevOps, Replatforming,
Security)
‣ Training and Coaching (inovex Academy)
is an innovation and quality-
driven IT project house with a
focus on digital transformation.
Using technology to
inspire our clients.
And ourselves.
Berlin · Karlsruhe · Pforzheim · Stuttgart · München · Köln · Hamburg · Erlangen
www.inovex.de
V S
Data Warehouse
Data Lake
V S
ETL
ELT
&
C O M PA R I S O N
Data Lake vs Data Warehouse
Data Lake Data Warehouse
Data Structure Raw Processed
Purpose of Data Not yet determined Currently in use
Users Data scientists /
engineers
Business professionals
Accessibility Highly accessible and
quick to update
More complicated and
costly to make changes
C O M PA R I S O N
ETL — Extract, Transform, Load
S TA G I N G A R E A
I N D ATA L A K E
T R A N S F O R M
S Q L R D B M S
N O S Q L D B M S
X M L F I L E S
S A A S P L AT F O R M
E X T R A C T
D ATA
WA R E H O U S E
L O A D
A N A LY T I C S
A N A L Y S E
C O M PA R I S O N
ELT — Extract, Load, Transform
D ATA
WA R E H O U S E
T R A N S F O R M
S Q L R D B M S
N O S Q L D B M S
X M L F I L E S
S A A S P L AT F O R M
E X T R A C T L O A D
A N A LY T I C S
A N A L Y S E
P R O G R E S S
The Evolution of Databricks & Snowflake
D ATA L A K E
& E T L
D ATA WA R E H O U S E
& E LT
P R O G R E S S
The Evolution of Databricks & Snowflake
D ATA B R I C K S
L A K E H O U S E
S N OW F L A K E
D ATA C L O U D
U N I F I C AT I O N
O F D ATA WA R E H O U S E & L A K E
W A L D
S T A C K
W A L D
S T A C K
WA L D S TA C K
A Modern & Sustainable Analytics Stack
Warehouse like
Snowflake, Google
BigQuery, Databricks
Airbyte for
Integration of
hundreds of data
sources
Lightdash for
Business
Intelligence
DBT for data
transformations
WA L D S TA C K
WALD based on Google Big Query
‣ fully managed enterprise data
warehouse
‣ only available on Google Cloud
Platform
‣ uses dbt-bigquery to run Python
UDFs with Dataproc using
PySpark
WA L D S TA C K
WALD based on Databricks
‣ uses Databrick’s SQL warehouse
‣ available on all major cloud
providers
‣ uses dbt-databricks to run Python
UDFs with PySpark
WA L D S TA C K
WALD based on Snowflake
‣ comes with all the benefits of
Snowflake, e.g. governance, easy
collaboration, security, near zero
maintenance
‣ runs on all major cloud providers
‣ tons of great documentation
‣ the easiest to use and full-
featured option to get started
with WALD
WA L D S TA C K
Modern Analytics Stack
DATA CLOUD
DATA SOURCES
E X T R A C T L O A D T R A N S F O R M A n a l y s e
SNOWPARK
WA L D S TA C K
Example
Use-Case
>find all code & details under
https://waldstack.org
F O R M U L A 1
Analysis and Position Prediction
1. Upload data using Airbyte to
Snowflake
2. Use dbt for some basic analysis
3. Use dbt with Snowpark for some
ML to predict future position
4. Visualise our results with
Lightdash
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
S N OW F L A K E
Snowflake Setup
‣ activate the Anaconda Packages
(Preview Feature)
‣ create a warehouse to be used
for dbt
‣ that’s it already!
A I R B Y T E
Ingesting the
Data with Airbyte
Simply select
‣ file as source
‣ Snowflake as destination
and hit Launch
P Y T H O N + S N OW PA R K
Setup Python/Snowpark for dbt
‣ just install along dbt also
‣ dbt-snowflake
‣ snowflake-snowpark-
python
‣ snowflake-connector-
python
‣ and configure Snowflake in your
profiles.yml
P Y T H O N + S N OW PA R K
Training a Python Model in dbt/Snowpark
D B T / S N OW PA R K
Predicting with a trained Model in dbt/Snowpark
L I G H T D A S H
Visualisation with Lightdash
Lightdash pulls dimensions and metrics from dbt.
Lightdash
‣ runs ad-hoc queries
‣ creates charts from queries
‣ creates dashboards from charts
‣ organises everything in spaces
What are the avg lap times split by year and grand prix?
M E T R I C D I M E N S I O N S
L I G H T D A S H
Main View of Lightdash
L I G H T D A S H
Analysing the Average Lap Times
L I G H T D A S H
Example of a Simple Dashboard
F O R M U L A 1
Analysis and Position Prediction
One more thing…
O N M O R E T H I N G O N …
Streamlit
‣ provides faster way to build and
share data apps
‣ was recently acquired by
Snowflake
‣ might be a nice addition to
Lightdash for interactive data
apps
Dr. Florian Wilhelm, 2023
WA L D S TA C K
Conclusion
The WALD stack combines 4 powerful
technologies that work very well
together.
The WALD stack is flexible enough for
many use-cases from analytics to
building a full-fledged end-to-end
data product at scale.
C H E E R S T O T H E C O M M U N I T Y
Credits & Resources
‣ python-snowpark-formula
Github repo by Hope Watson
from dbt labs
‣ kind support by Michael Gorkow
& Marko Cavar from Snowflake
‣ these awesome slides by
Michael Hofmann from inovex
© 2023
Thank you!
Dr. Florian Wilhelm
Head of Data Science
P y C o n D E / P y D a t a 2 0 2 3
inovex.de
florian.wilhelm@inovex.de
@inovexlife
@inovexgmbh
waldstack.org

Más contenido relacionado

La actualidad más candente

Data Governance in a big data era
Data Governance in a big data eraData Governance in a big data era
Data Governance in a big data eraPieter De Leenheer
 
AI today and its power to transform healthcare
AI today and its power to transform healthcareAI today and its power to transform healthcare
AI today and its power to transform healthcareBonnie Cheuk
 
Knowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdfKnowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdfVaticle
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleDATAVERSITY
 
Data Management is Data Governance
Data Management is Data GovernanceData Management is Data Governance
Data Management is Data GovernanceDATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingEmerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingDATAVERSITY
 
La Big Data et ses applications
La Big Data et ses applicationsLa Big Data et ses applications
La Big Data et ses applicationsAffinity Engine
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtSiligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtJon Su
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeDatabricks
 
DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...DATAVERSITY
 
Data Observability Best Pracices
Data Observability Best PracicesData Observability Best Pracices
Data Observability Best PracicesAndy Petrella
 
The Challenges of model based and traditional plm
The Challenges of model based and traditional plmThe Challenges of model based and traditional plm
The Challenges of model based and traditional plmJos Voskuil
 
seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019DataKitchen
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionDenodo
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Julien Le Dem
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
Digital transformation in 50 soundbites
Digital transformation in 50 soundbitesDigital transformation in 50 soundbites
Digital transformation in 50 soundbitesJulie Dodd
 
Data strategy demistifying data
Data strategy demistifying dataData strategy demistifying data
Data strategy demistifying dataHans Verstraeten
 

La actualidad más candente (20)

Data Governance in a big data era
Data Governance in a big data eraData Governance in a big data era
Data Governance in a big data era
 
AI today and its power to transform healthcare
AI today and its power to transform healthcareAI today and its power to transform healthcare
AI today and its power to transform healthcare
 
Knowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdfKnowledge Graphs for Supply Chain Operations.pdf
Knowledge Graphs for Supply Chain Operations.pdf
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
 
Data Management is Data Governance
Data Management is Data GovernanceData Management is Data Governance
Data Management is Data Governance
 
Emerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingEmerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big Thing
 
La Big Data et ses applications
La Big Data et ses applicationsLa Big Data et ses applications
La Big Data et ses applications
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtSiligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy – Practical Steps for Aligning with Busi...
 
Data Observability Best Pracices
Data Observability Best PracicesData Observability Best Pracices
Data Observability Best Pracices
 
The Challenges of model based and traditional plm
The Challenges of model based and traditional plmThe Challenges of model based and traditional plm
The Challenges of model based and traditional plm
 
seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Digital transformation in 50 soundbites
Digital transformation in 50 soundbitesDigital transformation in 50 soundbites
Digital transformation in 50 soundbites
 
Data strategy demistifying data
Data strategy demistifying dataData strategy demistifying data
Data strategy demistifying data
 

Similar a WALD: A Modern & Sustainable Analytics Stack

How to Transform Into a Data-Driven Organization
How to Transform Into a Data-Driven OrganizationHow to Transform Into a Data-Driven Organization
How to Transform Into a Data-Driven OrganizationWarrenCruz3
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with SparkKrishna Sankar
 
Foundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information ArchitectureFoundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information ArchitectureInside Analysis
 
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWS
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWSPuppet Camp Sydney 2014 - Evolving Design Patterns in AWS
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWSjohnpainter_id_au
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Denodo
 
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDemystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDenodo
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataPentaho
 
Trivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis - Microsoft Transform your data estate with cloud, data and AITrivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis - Microsoft Transform your data estate with cloud, data and AITrivadis
 
Digital transformation with microsoft data and ai
Digital transformation with microsoft data and ai Digital transformation with microsoft data and ai
Digital transformation with microsoft data and ai MichaelRoenker
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopSkillspeed
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudInside Analysis
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
Cloud Infrastructure: Changing Today's World
Cloud Infrastructure: Changing Today's WorldCloud Infrastructure: Changing Today's World
Cloud Infrastructure: Changing Today's WorldKirti Khanna
 
How to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDBHow to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDBVoltDB
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonDataWorks Summit/Hadoop Summit
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run GraphVaticle
 
The Central Hub: Defining the Data Lake
The Central Hub: Defining the Data LakeThe Central Hub: Defining the Data Lake
The Central Hub: Defining the Data LakeEric Kavanagh
 
Process big data within an hour, with the OVH Public Cloud
Process big data within an hour, with the OVH Public CloudProcess big data within an hour, with the OVH Public Cloud
Process big data within an hour, with the OVH Public CloudOVHcloud
 

Similar a WALD: A Modern & Sustainable Analytics Stack (20)

How to Transform Into a Data-Driven Organization
How to Transform Into a Data-Driven OrganizationHow to Transform Into a Data-Driven Organization
How to Transform Into a Data-Driven Organization
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
Foundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information ArchitectureFoundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information Architecture
 
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWS
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWSPuppet Camp Sydney 2014 - Evolving Design Patterns in AWS
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWS
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
 
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDemystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
 
Trivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis - Microsoft Transform your data estate with cloud, data and AITrivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis - Microsoft Transform your data estate with cloud, data and AI
 
Digital transformation with microsoft data and ai
Digital transformation with microsoft data and ai Digital transformation with microsoft data and ai
Digital transformation with microsoft data and ai
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via Hadoop
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the Cloud
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Cloud Infrastructure: Changing Today's World
Cloud Infrastructure: Changing Today's WorldCloud Infrastructure: Changing Today's World
Cloud Infrastructure: Changing Today's World
 
Data democratised
Data democratisedData democratised
Data democratised
 
What is CRATE?
What is CRATE?What is CRATE?
What is CRATE?
 
How to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDBHow to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDB
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run Graph
 
The Central Hub: Defining the Data Lake
The Central Hub: Defining the Data LakeThe Central Hub: Defining the Data Lake
The Central Hub: Defining the Data Lake
 
Process big data within an hour, with the OVH Public Cloud
Process big data within an hour, with the OVH Public CloudProcess big data within an hour, with the OVH Public Cloud
Process big data within an hour, with the OVH Public Cloud
 

Más de Florian Wilhelm

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unlocking the Power of Integer Programming
Unlocking the Power of Integer ProgrammingUnlocking the Power of Integer Programming
Unlocking the Power of Integer ProgrammingFlorian Wilhelm
 
Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!Florian Wilhelm
 
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...Florian Wilhelm
 
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...Florian Wilhelm
 
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...Florian Wilhelm
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AIFlorian Wilhelm
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use caseFlorian Wilhelm
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionFlorian Wilhelm
 
How mobile.de brings Data Science to Production for a Personalized Web Experi...
How mobile.de brings Data Science to Production for a Personalized Web Experi...How mobile.de brings Data Science to Production for a Personalized Web Experi...
How mobile.de brings Data Science to Production for a Personalized Web Experi...Florian Wilhelm
 
Deep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
Deep Learning-based Recommendations for Germany's Biggest Vehicle MarketplaceDeep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
Deep Learning-based Recommendations for Germany's Biggest Vehicle MarketplaceFlorian Wilhelm
 
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...Florian Wilhelm
 
Declarative Thinking and Programming
Declarative Thinking and ProgrammingDeclarative Thinking and Programming
Declarative Thinking and ProgrammingFlorian Wilhelm
 
Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017Florian Wilhelm
 
PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19Florian Wilhelm
 
Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...Florian Wilhelm
 

Más de Florian Wilhelm (16)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unlocking the Power of Integer Programming
Unlocking the Power of Integer ProgrammingUnlocking the Power of Integer Programming
Unlocking the Power of Integer Programming
 
Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!
 
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
 
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
 
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AI
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
 
How mobile.de brings Data Science to Production for a Personalized Web Experi...
How mobile.de brings Data Science to Production for a Personalized Web Experi...How mobile.de brings Data Science to Production for a Personalized Web Experi...
How mobile.de brings Data Science to Production for a Personalized Web Experi...
 
Deep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
Deep Learning-based Recommendations for Germany's Biggest Vehicle MarketplaceDeep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
Deep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
 
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
 
Declarative Thinking and Programming
Declarative Thinking and ProgrammingDeclarative Thinking and Programming
Declarative Thinking and Programming
 
Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017
 
PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19
 
Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...
 

Último

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 

Último (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

WALD: A Modern & Sustainable Analytics Stack

  • 1. © 2023 A Modern & Sustainable Analytics Stack W A L D PyConDE / PyData 2023, April 17th Florian Wilhelm
  • 2. Mathematical Modelling Modern Data Warehousing & Analytics Personalisation & RecSys Uncertainty Quantification & Causality Python Data Stack OSS Contributor & Creator of PyScaffold Dr. Florian Wilhelm • H E A D O F D ATA S C I E N C E FlorianWilhelm.info florian.wilhelm@inovex.de Florian Wilhelm @FlorianWilhelm
  • 3. ‣ Application Development (Web Platforms, Mobile Apps, Smart Devices and Robotics, UI/ UX design,Backend Services) ‣ Data Management and Analytics (Business Intelligence, Big Data, Searches, Data Science and Deep Learning, Machine Perception and Artificial Intelligence) ‣ Scalable IT-Infrastructures (IT Engineering, Cloud Services, DevOps, Replatforming, Security) ‣ Training and Coaching (inovex Academy) is an innovation and quality- driven IT project house with a focus on digital transformation. Using technology to inspire our clients. And ourselves. Berlin · Karlsruhe · Pforzheim · Stuttgart · München · Köln · Hamburg · Erlangen www.inovex.de
  • 4. V S Data Warehouse Data Lake V S ETL ELT &
  • 5. C O M PA R I S O N Data Lake vs Data Warehouse Data Lake Data Warehouse Data Structure Raw Processed Purpose of Data Not yet determined Currently in use Users Data scientists / engineers Business professionals Accessibility Highly accessible and quick to update More complicated and costly to make changes
  • 6. C O M PA R I S O N ETL — Extract, Transform, Load S TA G I N G A R E A I N D ATA L A K E T R A N S F O R M S Q L R D B M S N O S Q L D B M S X M L F I L E S S A A S P L AT F O R M E X T R A C T D ATA WA R E H O U S E L O A D A N A LY T I C S A N A L Y S E
  • 7. C O M PA R I S O N ELT — Extract, Load, Transform D ATA WA R E H O U S E T R A N S F O R M S Q L R D B M S N O S Q L D B M S X M L F I L E S S A A S P L AT F O R M E X T R A C T L O A D A N A LY T I C S A N A L Y S E
  • 8. P R O G R E S S The Evolution of Databricks & Snowflake D ATA L A K E & E T L D ATA WA R E H O U S E & E LT
  • 9. P R O G R E S S The Evolution of Databricks & Snowflake D ATA B R I C K S L A K E H O U S E S N OW F L A K E D ATA C L O U D U N I F I C AT I O N O F D ATA WA R E H O U S E & L A K E
  • 10. W A L D S T A C K
  • 11. W A L D S T A C K WA L D S TA C K A Modern & Sustainable Analytics Stack Warehouse like Snowflake, Google BigQuery, Databricks Airbyte for Integration of hundreds of data sources Lightdash for Business Intelligence DBT for data transformations
  • 12. WA L D S TA C K WALD based on Google Big Query ‣ fully managed enterprise data warehouse ‣ only available on Google Cloud Platform ‣ uses dbt-bigquery to run Python UDFs with Dataproc using PySpark
  • 13. WA L D S TA C K WALD based on Databricks ‣ uses Databrick’s SQL warehouse ‣ available on all major cloud providers ‣ uses dbt-databricks to run Python UDFs with PySpark
  • 14. WA L D S TA C K WALD based on Snowflake ‣ comes with all the benefits of Snowflake, e.g. governance, easy collaboration, security, near zero maintenance ‣ runs on all major cloud providers ‣ tons of great documentation ‣ the easiest to use and full- featured option to get started with WALD
  • 15. WA L D S TA C K Modern Analytics Stack DATA CLOUD DATA SOURCES E X T R A C T L O A D T R A N S F O R M A n a l y s e SNOWPARK
  • 16. WA L D S TA C K Example Use-Case >find all code & details under https://waldstack.org
  • 17. F O R M U L A 1 Analysis and Position Prediction 1. Upload data using Airbyte to Snowflake 2. Use dbt for some basic analysis 3. Use dbt with Snowpark for some ML to predict future position 4. Visualise our results with Lightdash
  • 18. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 19. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 20. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 21. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 22. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 23. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 24. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 25. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 26. S N OW F L A K E Snowflake Setup ‣ activate the Anaconda Packages (Preview Feature) ‣ create a warehouse to be used for dbt ‣ that’s it already!
  • 27. A I R B Y T E Ingesting the Data with Airbyte Simply select ‣ file as source ‣ Snowflake as destination and hit Launch
  • 28. P Y T H O N + S N OW PA R K Setup Python/Snowpark for dbt ‣ just install along dbt also ‣ dbt-snowflake ‣ snowflake-snowpark- python ‣ snowflake-connector- python ‣ and configure Snowflake in your profiles.yml
  • 29. P Y T H O N + S N OW PA R K Training a Python Model in dbt/Snowpark
  • 30. D B T / S N OW PA R K Predicting with a trained Model in dbt/Snowpark
  • 31. L I G H T D A S H Visualisation with Lightdash Lightdash pulls dimensions and metrics from dbt. Lightdash ‣ runs ad-hoc queries ‣ creates charts from queries ‣ creates dashboards from charts ‣ organises everything in spaces What are the avg lap times split by year and grand prix? M E T R I C D I M E N S I O N S
  • 32. L I G H T D A S H Main View of Lightdash
  • 33. L I G H T D A S H Analysing the Average Lap Times
  • 34. L I G H T D A S H Example of a Simple Dashboard
  • 35. F O R M U L A 1 Analysis and Position Prediction
  • 37. O N M O R E T H I N G O N … Streamlit ‣ provides faster way to build and share data apps ‣ was recently acquired by Snowflake ‣ might be a nice addition to Lightdash for interactive data apps
  • 38. Dr. Florian Wilhelm, 2023 WA L D S TA C K Conclusion The WALD stack combines 4 powerful technologies that work very well together. The WALD stack is flexible enough for many use-cases from analytics to building a full-fledged end-to-end data product at scale.
  • 39. C H E E R S T O T H E C O M M U N I T Y Credits & Resources ‣ python-snowpark-formula Github repo by Hope Watson from dbt labs ‣ kind support by Michael Gorkow & Marko Cavar from Snowflake ‣ these awesome slides by Michael Hofmann from inovex
  • 40. © 2023 Thank you! Dr. Florian Wilhelm Head of Data Science P y C o n D E / P y D a t a 2 0 2 3 inovex.de florian.wilhelm@inovex.de @inovexlife @inovexgmbh waldstack.org