SlideShare una empresa de Scribd logo
1 de 11
Lambda Architecture
Lambda architecture, devised by Nathan Marz, is a layered architecture which solves the
problem of computing arbitrary functions on arbitrary data in real time. In a real time system
the requirement is something like this -
result = function (all data)
With increasing volume of data, the query will take a significant amount of time to execute no
matter what resources we have used.
Lambda Architecture uses three layer architecture and a concept of pre-computed views to
solve this problem. Three layers are
● Batch Layer
● Speed Layer
● Serving Layer
Batch Layer
Batch layer stores immutable master data, computes arbitrary functions on all data and creates batch views.
Function of batch layer can be summarized as
batch view = function (all data)
Batch layer continuously does this job and updates batch views.
Traffic from Social Media
Serving Layer
Purpose of Serving Layer is to store batch views obtained from batch layer and provide random access to batch views.
When batch layer computes new views, they are updated in Serving Layer by Batch Layer.
The Serving Layer can be achieved by using a random access database.
Speed Layer
While batch layer computes batch view, it will not include data which came while re-computing batch views.
The purpose of Speed layer is to compute incremental views on recent data that is not included in batch views.
These views are called real time views.
A Speed Layer can be summarized as
real time view = function (real time view, new data)
So, our final query can be served by speed layer or serving layer.
batch view = function (all data)
real time view = function (real time view, new data)
result = merge (query (batch view), query (real time view))
An Example using Apache Spark
Suppose we want to build a system to find popular hash tags in a twitter stream, we can implement lambda architecture
using Apache Spark to build this system.
Batch Layer Implementation - Batch layer will read a file of tweets and calculate hash tag frequency map and will save
it to Cassandra database table.
Batch.java
Speed Layer Implementation - Speed layer can also be written in Apache spark using spark streaming feature.
We can get a stream of recent tweets and calculate recent real time view from this stream we can also save this
real time view to Cassandra for simplicity.
Speed.java :
Serving Layer implementation - Serving layer can be implemented as a RESTful web service which will query
Cassandra tables to get the final result in real time.
Unique Page Views
References and image credits
http://www.databasetube.com/database/big-data-lambda-architecture/
Big Data Principles and best practices of scalable real time data systems by Nathan Marz and James Warren

Más contenido relacionado

Más de Quovantis

9 Deadliest Start-up Sins by Steve Blank
9 Deadliest Start-up Sins by Steve Blank9 Deadliest Start-up Sins by Steve Blank
9 Deadliest Start-up Sins by Steve BlankQuovantis
 
How caring for each design element changes everything!
How caring for each design element changes everything!How caring for each design element changes everything!
How caring for each design element changes everything!Quovantis
 
How to be an amazing presenter
How to be an amazing presenterHow to be an amazing presenter
How to be an amazing presenterQuovantis
 
Quovantis design principles
Quovantis design principlesQuovantis design principles
Quovantis design principlesQuovantis
 
How to succeed as technical lead or development manager
How to succeed as technical lead or development managerHow to succeed as technical lead or development manager
How to succeed as technical lead or development managerQuovantis
 
Frisby: Rest API Automation Framework
Frisby: Rest API Automation FrameworkFrisby: Rest API Automation Framework
Frisby: Rest API Automation FrameworkQuovantis
 
Who is an architect and Why care about Architecture
Who is an architect and Why care about ArchitectureWho is an architect and Why care about Architecture
Who is an architect and Why care about ArchitectureQuovantis
 

Más de Quovantis (7)

9 Deadliest Start-up Sins by Steve Blank
9 Deadliest Start-up Sins by Steve Blank9 Deadliest Start-up Sins by Steve Blank
9 Deadliest Start-up Sins by Steve Blank
 
How caring for each design element changes everything!
How caring for each design element changes everything!How caring for each design element changes everything!
How caring for each design element changes everything!
 
How to be an amazing presenter
How to be an amazing presenterHow to be an amazing presenter
How to be an amazing presenter
 
Quovantis design principles
Quovantis design principlesQuovantis design principles
Quovantis design principles
 
How to succeed as technical lead or development manager
How to succeed as technical lead or development managerHow to succeed as technical lead or development manager
How to succeed as technical lead or development manager
 
Frisby: Rest API Automation Framework
Frisby: Rest API Automation FrameworkFrisby: Rest API Automation Framework
Frisby: Rest API Automation Framework
 
Who is an architect and Why care about Architecture
Who is an architect and Why care about ArchitectureWho is an architect and Why care about Architecture
Who is an architect and Why care about Architecture
 

Último

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Último (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Lambda Architecture using Apache Spark – with Java code examples

  • 1.
  • 3. Lambda architecture, devised by Nathan Marz, is a layered architecture which solves the problem of computing arbitrary functions on arbitrary data in real time. In a real time system the requirement is something like this - result = function (all data) With increasing volume of data, the query will take a significant amount of time to execute no matter what resources we have used. Lambda Architecture uses three layer architecture and a concept of pre-computed views to solve this problem. Three layers are ● Batch Layer ● Speed Layer ● Serving Layer
  • 4.
  • 5. Batch Layer Batch layer stores immutable master data, computes arbitrary functions on all data and creates batch views. Function of batch layer can be summarized as batch view = function (all data) Batch layer continuously does this job and updates batch views.
  • 6. Traffic from Social Media Serving Layer Purpose of Serving Layer is to store batch views obtained from batch layer and provide random access to batch views. When batch layer computes new views, they are updated in Serving Layer by Batch Layer. The Serving Layer can be achieved by using a random access database. Speed Layer While batch layer computes batch view, it will not include data which came while re-computing batch views. The purpose of Speed layer is to compute incremental views on recent data that is not included in batch views. These views are called real time views. A Speed Layer can be summarized as real time view = function (real time view, new data) So, our final query can be served by speed layer or serving layer. batch view = function (all data) real time view = function (real time view, new data) result = merge (query (batch view), query (real time view))
  • 7.
  • 8. An Example using Apache Spark Suppose we want to build a system to find popular hash tags in a twitter stream, we can implement lambda architecture using Apache Spark to build this system. Batch Layer Implementation - Batch layer will read a file of tweets and calculate hash tag frequency map and will save it to Cassandra database table. Batch.java
  • 9. Speed Layer Implementation - Speed layer can also be written in Apache spark using spark streaming feature. We can get a stream of recent tweets and calculate recent real time view from this stream we can also save this real time view to Cassandra for simplicity. Speed.java :
  • 10. Serving Layer implementation - Serving layer can be implemented as a RESTful web service which will query Cassandra tables to get the final result in real time.
  • 11. Unique Page Views References and image credits http://www.databasetube.com/database/big-data-lambda-architecture/ Big Data Principles and best practices of scalable real time data systems by Nathan Marz and James Warren