Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich


The term “Lambda Architecture” stands for a generic, scalable, and fault-tolerant data processing architecture. As the hyperscale cloud providers now offer a variety of PaaS services for data ingestion, storage, and processing, the need for a revised, cloud-native implementation of the Lambda Architecture is arising. In this talk we demonstrate a blueprint for such an implementation in Microsoft Azure, with Azure Databricks, a PaaS Spark offering, as a key component. We go back to some core principles of functional programming and link them to the capabilities of Apache Spark for various end-to-end big data analytics scenarios. We also illustrate the Lambda Architecture in use, and the associated trade-offs, with a real customer scenario: the Rijksmuseum in Amsterdam, where a terabyte-scale Azure-based data platform handles data from 2,500,000 visitors per year.


Lambda Architecture in the Cloud with Azure Databricks
Andrei Varanovich, InSpark
#SAISDev6

Selfie
Data & AI Lead
@DrGigabit
andrei.varanovich
andrei.varanovich@inspark.nl
• Data
• Programmability
• Cloud
• High-performance teams
• Neural Networks

A Big Data problem is many small data problems

• 2,500,000 visitors per year
• 8,000 objects of art and history
• 1,000,000 objects stored from the year 1200

Under the hood
• Retail
• Sponsorship engagement
• Occupancy rate
• Service Management
• Incident Management
• Marketing Performance
• Warehouse Management
• Capacity Planning
• Financial Performance
• Ticket Sales

THE DATA JOURNEY
• Consolidation: consolidate data in a centralized store
• Insights: organizational processes and efficiency
• Innovation: new ideas, leveraging machine learning

IN NEED OF A PLATFORM
START
• Begin small and focused
• Prove value
GROW
• Grow organically as more use cases arise
SCALE
• Go to production and scale to the level required
We need a truly elastic data platform that avoids upfront planning, deployment, and operations expenses and puts business value discovery first. The platform should support [big] data projects at any stage, without the need to re-engineer the whole solution.

Lambda Architecture on Azure
• Sources: Social, LOB, Graph, IoT, Image, CRM
• INGEST BATCH: Azure Data Factory
• INGEST STREAM: Event Hubs
• STORE: Azure Data Lake Store
• ANALYZE: Azure Databricks (managed Spark; batch & streaming)
• AI models / APIs: Cognitive Services, Azure Container Service & registry
• Insight sharing: Power BI / other tools
• SECURITY & MANAGEMENT: Azure Log Analytics, Azure Graph API, cost monitoring, Azure Active Directory
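
The batch and stream paths on this slide follow the usual Lambda Architecture split into a batch layer, a speed layer, and a serving layer. The following is a minimal sketch of that split in plain Python, not the Azure implementation from the talk: the `Event` type, its fields, and the ticket-sales numbers are all hypothetical, and in-memory lists stand in for the data lake and the event stream.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical ticket-sale event; the field names are illustrative."""
    timestamp: int  # arbitrary time unit
    channel: str    # e.g. "online" or "desk"
    tickets: int


def batch_view(master_dataset: Iterable[Event], up_to: int) -> Counter:
    """Batch layer: recompute totals from the full, immutable history."""
    view: Counter = Counter()
    for event in master_dataset:
        if event.timestamp <= up_to:
            view[event.channel] += event.tickets
    return view


def speed_view(recent_events: Iterable[Event], since: int) -> Counter:
    """Speed layer: cover only the events newer than the last batch run."""
    view: Counter = Counter()
    for event in recent_events:
        if event.timestamp > since:
            view[event.channel] += event.tickets
    return view


def serve(batch: Counter, speed: Counter) -> Counter:
    """Serving layer: merge both views to answer a query."""
    return batch + speed


events = [
    Event(10, "online", 2),
    Event(20, "desk", 1),
    Event(30, "online", 4),  # arrived after the last batch run
]
last_batch_run = 20
merged = serve(batch_view(events, last_batch_run),
               speed_view(events, last_batch_run))
print(dict(merged))
```

The batch view is always recomputable from the raw history, so the speed view only has to be correct for the short window since the last batch run.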

Azure Databricks
• Data sources: cloud storage, data warehouses, Hadoop storage, IoT / streaming data
• Collaborative workspace: data engineer, data scientist, business analyst
• Production jobs & workflows: multi-stage pipelines, job scheduler, notification & logs
• Optimized Databricks Runtime Engine: Databricks I/O, Apache Spark, serverless
• Outputs: REST APIs, machine learning models, BI tools, data exports, data warehouses

Simplicity is the ultimate sophistication
Leonardo da Vinci

LAMBDA TO THE RESCUE

Composition of functions is applying one function to the result of another

$f(x) = x + 1$
$g(x) = x^2$
$(g \circ f)(x) = g(f(x)) = (x + 1)^2$

f: input -> input + 1
g: input -> input^2
g after f: input -> input + 1 -> (input + 1)^2
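
The composition on this slide can be written directly in code. This is a small illustrative sketch; the helper name `compose` is an assumption introduced here, while `f` and `g` are the two functions from the slide.

```python
from typing import Callable


def compose(g: Callable[[int], int], f: Callable[[int], int]) -> Callable[[int], int]:
    """Return the composition of g after f: apply f first, then g."""
    return lambda x: g(f(x))


def f(x: int) -> int:
    return x + 1


def g(x: int) -> int:
    return x ** 2


g_after_f = compose(g, f)
print(g_after_f(2))  # (2 + 1) ** 2 = 9
```

Note that the order matters: `compose(f, g)` would square first and then add one.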

Transformation pipeline as a series of transitions
[Diagram labels: s1, f1, f2, f3, sum]
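
One way to read this slide is that an initial state s1 passes through the transformations f1, f2, and f3 before a final aggregation. The sketch below assumes that reading; the concrete stage bodies and the `pipeline` helper are hypothetical and only illustrate chaining stages by function composition.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[List[int]], List[int]]


def pipeline(*stages: Stage) -> Stage:
    """Chain stages left to right: pipeline(f1, f2)(s) == f2(f1(s))."""
    return lambda state: reduce(lambda acc, stage: stage(acc), stages, state)


def f1(rows: List[int]) -> List[int]:
    # Drop invalid (negative) values.
    return [r for r in rows if r >= 0]


def f2(rows: List[int]) -> List[int]:
    # Scale every value.
    return [r * 2 for r in rows]


def f3(rows: List[int]) -> List[int]:
    # Apply a constant offset.
    return [r + 1 for r in rows]


s1 = [3, -1, 4]
transform = pipeline(f1, f2, f3)
total = sum(transform(s1))
print(total)  # [3, 4] -> [6, 8] -> [7, 9] -> 16
```

Because each stage is a pure function from one state to the next, stages can be tested on their own and reordered or replaced without touching the rest of the chain.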


Conclusions
… with proper design, the features come cheaply. This approach is arduous, but continues to succeed.
Dennis Ritchie
• Standardization on Apache Spark allows us to move forward without introducing extra complexity.
• A 100% PaaS offering is important: there is no need to maintain the infrastructure. All components we use are offered as PaaS on Azure.
• Treating data pipelines as function composition allows us to ensure end-to-end consistency and spot errors quickly.
• Saving intermediate states allows us to quickly inspect the data sets.
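
The last two points can be sketched together: a pipeline built from composed stages that also keeps a copy of every intermediate state. This is a minimal illustration under stated assumptions, not the code from the talk. The function `run_with_checkpoints`, the stage names, and the sample rows are hypothetical, and an in-memory dictionary stands in for whatever storage a real platform would write to.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[dict]
Stage = Tuple[str, Callable[[Rows], Rows]]


def run_with_checkpoints(rows: Rows, stages: List[Stage]) -> Tuple[Rows, Dict[str, Rows]]:
    """Apply each stage in order and keep a shallow copy of every intermediate result."""
    snapshots: Dict[str, Rows] = {"input": list(rows)}
    current = rows
    for name, stage in stages:
        current = stage(current)
        snapshots[name] = list(current)
    return current, snapshots


def drop_missing_price(rows: Rows) -> Rows:
    # Remove rows that have no price.
    return [r for r in rows if r.get("price") is not None]


def add_total(rows: Rows) -> Rows:
    # Build new dicts so earlier snapshots are not modified.
    return [{**r, "total": r["price"] * r["qty"]} for r in rows]


rows = [
    {"item": "ticket", "price": 20, "qty": 2},
    {"item": "guide", "price": None, "qty": 1},
]
result, snapshots = run_with_checkpoints(
    rows,
    [("drop_missing_price", drop_missing_price), ("add_total", add_total)],
)
for name, state in snapshots.items():
    print(name, len(state))
```

When a result looks wrong, comparing adjacent snapshots shows which stage introduced the problem.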

Thank you!

Questions?
