SlideShare una empresa de Scribd logo
1 de 32
Descargar para leer sin conexión
diciembre 2010
Kappa Architecture
Our Experience
Who am I
CDO ASPgems
Former President of Hispalinux (Spanish
LUG)
Author “La Pastilla Roja” first spanish book
about Free Software.
Menu
A little context about Kappa Architecture
What’s Kappa Architecture
What is not Kappa Architecture
How we implement it
Real use cases with KA
A little context
July 2, 2014 Jay Kreps coined the term
Kappa Architecture in an article for
O’reilly Radar
Who is Jay Kreps
Jay has been involved in lots of projects:
Author of the essay:
The Log: What every software engineer
should know about real-time data's
unifying abstraction (12/16/2013)
https://engineering.linkedin.com/distributed-systems/log-what-every-software-
engineer-should-know-about-real-time-datas-unifying
Jay Kreps
Author of the book: I ♥ Logs
Jay Kreps
Involved with projects as:
Apache Kafka
Apache Samza
Voldemort
Azkaban
Ex-Linkedin
Now co-founder and CEO of Confluent
Lambda Architecture
Look something like this:
https://www.mapr.com/developercentral/lambda-architecture
Lambda Architecture
Batch layer that provides the following
functionality
managing the master dataset, an
immutable, append-only set of raw
data.
pre-computing arbitrary query
functions, called batch views.
https://www.mapr.com/developercentral/lambda-architecture
Lambda Architecture
Serving layer
This layer indexes the batch views so
that they can be queried in ad hoc
with low latency.
Speed layer
This layer accommodates all requests
that are subject to low latency
requirements. Using fast and
incremental algorithms, the speed
layer deals with recent data only.
Lambda Architecture
batch layer datasets can be in a distributed
filesystem, while MapReduce can be used to create
batch views that can be fed to the serving layer.
The serving layer can be implemented using NoSQL
technologies such as HBase,Apache Druid, etc.
Querying can be implemented by technologies such as
Apache Drill or Impala
Speed layer can be realized with data streaming
technologies such as Apache Storm or Spark Streaming
https://www.mapr.com/developercentral/lambda-architecture
Pros of Lambda
Architecture
Retain the input data unchanged.
Think about modeling data transformations,
series of data states from the original input.
Lambda architecture take in account the problem
of reprocessing data.
this happens all the time, the code will
change, and you will need to reprocess all the
information. Lots of reasons and you will need
to live with this.
Cons of Lambda
Architecture
Maintain the code that need to produce the same
result from two complex distributed system is
painful.
Very different code for MapReduce and Storm/
Apache Spark
Not only is about different code, is also about
debugging and interaction with other products like
(hive, Oozie, Cascading, etc)
At the end is a problem about different and
diverging programming paradigms.
So what is Kappa
Architecture
The proposal of Jay Kreps is so simple:
Use kafka (or other system) that will let you
retain the full log of the data you need to
reprocess.
When you want to do the reprocessing, start a
second instance of your stream processing job
that starts processing from the beginning of
the retained data, but direct this output data to
a new output table.
So what is Kappa
Architecture
part II
When the second job has caught up, switch the
application to read from the new table.
Stop the old version of the job, and delete the
old output table.
So what is Kappa
Architecture
This architecture looks something like this:
So what is Kappa
Architecture
The first benefit is that only you need to
reprocessing only when you change the code.
You can check if the new version is working ok and
if not reverse to the old output table.
You can mirror a Kafka topic to HDFS so you are
not limited to the Kafka retention configuration.
You have only a code to maintain with an unique
framework.
So what is Kappa
Architecture
The real advantage is not about efficiency at all
(You will need extra temporarily storage when
reprocessing for example) is allowing your team
to develop, test, debug and operate their systems
on top of a single processing framework.
What is not Kappa
Architecture
Is not a silver bullet to solve every problem at
Big Data.
Is not a list of prescriptions of technologies. You
can implement with your favorite frameworks.
Is not a rigid set of rules. But helps to maintain
the complex projects simple.
How we use Kappa
Architecture
We start working with projects with a complex
structure like Linkedin looks at early stage.
That’s very usual.
How we use Kappa
Architecture
How we use Kappa
Architecture
We try to refactoring the data flows to fix in a
Kappa Architecture.
How we use Kappa
Architecture
How we use Kappa
Architecture
We use Kafka as Stream Data Platform
Instead of Samza we feel more comfortable with
Spark Streaming.
At ASPGems we choose Apache Spark as our
Analytics Engine and not only for Spark
Streaming.
How we use Kappa
Architecture
At the end, Kappa Architecture is design pattern
for us.
We use/clone this pattern in almost our projects.
We have projects of every size, volume of data
or speed needing and fix with the Kappa
Architecture.
Use Cases
Telefónica - MSS
We use KA to calculate near real time KPIs,
SLAs related with the managed security system.
We simplify the data flow of the input data.
Kafka in the streaming data platform.
As MPP we use CassandraDB.
IOT - OBD II
One of our clients install On Board Devices in
the cars of its customers.
We implement an API to got all the information
in real time and inject the information in Kafka.
The business rules are implemented in a CEP
running into Apache Spark Streaming.
As MPP we use Elastic Search.
Insurance Company
We implement Kappa Architecture to process
click stream in real time and clustering users
We show content and offers that better fix users
Energy Facility
We implement Kappa Architecture to process
and predict energy consume.
Our customer include energy storage systems
and we got all the information about energy
storage (ultra-capacitors and batteries).
We process this information to calculate the
effective lifetime of the components and its
degradation.
Questions
diciembre 2010
Thank you
Juantomás García
juantomas@aspgems.com
@juantomas

Más contenido relacionado

La actualidad más candente

Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaEdureka!
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 
SeaweedFS introduction
SeaweedFS introductionSeaweedFS introduction
SeaweedFS introductionchrislusf
 
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Databricks
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark FundamentalsZahra Eskandari
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkHome
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreMariaDB plc
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentationTao Feng
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasDataWorks Summit/Hadoop Summit
 

La actualidad más candente (20)

Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Hadoop
HadoopHadoop
Hadoop
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
SeaweedFS introduction
SeaweedFS introductionSeaweedFS introduction
SeaweedFS introduction
 
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 

Destacado

Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016LibreCon
 
El software como acción humana
El software como acción humanaEl software como acción humana
El software como acción humanaOpenSistemas
 
El futuro Data Driven en e-Learning y RR.HH.
El futuro Data Driven en e-Learning y RR.HH.El futuro Data Driven en e-Learning y RR.HH.
El futuro Data Driven en e-Learning y RR.HH.OpenSistemas
 
Apache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectosApache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectosOpenSistemas
 
Construyendo una Infraestructura de Big Data rentable y escalable (la evoluci...
Construyendo una Infraestructura de Big Data rentable y escalable (la evoluci...Construyendo una Infraestructura de Big Data rentable y escalable (la evoluci...
Construyendo una Infraestructura de Big Data rentable y escalable (la evoluci...Socialmetrix
 
Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0 Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0 Dr. Mohan K. Bavirisetty
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in ClustersNode Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in ClustersAhsan Javed Awan
 
Bai tap thuc_hanh_excel_2010
Bai tap thuc_hanh_excel_2010Bai tap thuc_hanh_excel_2010
Bai tap thuc_hanh_excel_2010mainth_gtvt
 
Real time data ingestion and Hybrid Cloud
Real time data ingestion and Hybrid CloudReal time data ingestion and Hybrid Cloud
Real time data ingestion and Hybrid CloudNeeraj Sabharwal
 
A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013Nathan Bijnens
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APICarol McDonald
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Gyula Fóra
 
Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to ProductionVinoth Chandar
 
Big Data y el sector salud
Big Data y el sector saludBig Data y el sector salud
Big Data y el sector saludBEEVA_es
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data ArchitecturesGuido Schmutz
 

Destacado (20)

Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016
 
Knowledge Discovery
Knowledge DiscoveryKnowledge Discovery
Knowledge Discovery
 
El software como acción humana
El software como acción humanaEl software como acción humana
El software como acción humana
 
El futuro Data Driven en e-Learning y RR.HH.
El futuro Data Driven en e-Learning y RR.HH.El futuro Data Driven en e-Learning y RR.HH.
El futuro Data Driven en e-Learning y RR.HH.
 
Apache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectosApache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectos
 
Construyendo una Infraestructura de Big Data rentable y escalable (la evoluci...
Construyendo una Infraestructura de Big Data rentable y escalable (la evoluci...Construyendo una Infraestructura de Big Data rentable y escalable (la evoluci...
Construyendo una Infraestructura de Big Data rentable y escalable (la evoluci...
 
Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0 Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Arquitectura Lambda
Arquitectura LambdaArquitectura Lambda
Arquitectura Lambda
 
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in ClustersNode Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
 
Bai tap thuc_hanh_excel_2010
Bai tap thuc_hanh_excel_2010Bai tap thuc_hanh_excel_2010
Bai tap thuc_hanh_excel_2010
 
Real time data ingestion and Hybrid Cloud
Real time data ingestion and Hybrid CloudReal time data ingestion and Hybrid Cloud
Real time data ingestion and Hybrid Cloud
 
A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka API
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to Production
 
Apache Zeppelin Helium and Beyond
Apache Zeppelin Helium and BeyondApache Zeppelin Helium and Beyond
Apache Zeppelin Helium and Beyond
 
Big Data y el sector salud
Big Data y el sector saludBig Data y el sector salud
Big Data y el sector salud
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
 

Similar a ASPgems - kappa architecture

Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with SparkKnoldus Inc.
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaItai Yaffe
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtssiddharth30121
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsDr. Mirko Kämpf
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsDr. Mirko Kämpf
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewMario Cartia
 
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonChristian Perone
 
Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Jyotasana Bharti
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCMark Smith
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorBlueData, Inc.
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the SurfaceJosi Aranda
 

Similar a ASPgems - kappa architecture (20)

Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and Kafka
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemts
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
 
Data streaming
Data streamingData streaming
Data streaming
 
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
 
AI at Scale
AI at ScaleAI at Scale
AI at Scale
 
Module01
 Module01 Module01
Module01
 
Apache spark
Apache sparkApache spark
Apache spark
 
Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKC
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline Accelerator
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
 

Más de Juantomás García Molina

#AbadIA machine learning pipelines commit conf 2019
#AbadIA   machine learning pipelines commit conf 2019#AbadIA   machine learning pipelines commit conf 2019
#AbadIA machine learning pipelines commit conf 2019Juantomás García Molina
 
AbadIA: the abbey of the crime AI - GDG Cloud London 2018
AbadIA:  the abbey of the crime AI - GDG Cloud London 2018AbadIA:  the abbey of the crime AI - GDG Cloud London 2018
AbadIA: the abbey of the crime AI - GDG Cloud London 2018Juantomás García Molina
 
#AbadIA: the abbey of the crime AI - IO18 extended madrid 2018
#AbadIA:  the abbey of the crime AI - IO18 extended madrid 2018#AbadIA:  the abbey of the crime AI - IO18 extended madrid 2018
#AbadIA: the abbey of the crime AI - IO18 extended madrid 2018Juantomás García Molina
 
#AbadIA: the abbey of the crime AI - IBM meetup Madrid 2018
#AbadIA: the abbey of the crime AI - IBM meetup Madrid 2018#AbadIA: the abbey of the crime AI - IBM meetup Madrid 2018
#AbadIA: the abbey of the crime AI - IBM meetup Madrid 2018Juantomás García Molina
 
AbadIA: the abbey of the crime AI - Vaas Madrid 2018
AbadIA: the abbey of the crime AI - Vaas Madrid 2018AbadIA: the abbey of the crime AI - Vaas Madrid 2018
AbadIA: the abbey of the crime AI - Vaas Madrid 2018Juantomás García Molina
 
From Alpha Go to Alpha Zero - Vaas Madrid 2018
From Alpha Go to Alpha Zero -  Vaas Madrid 2018From Alpha Go to Alpha Zero -  Vaas Madrid 2018
From Alpha Go to Alpha Zero - Vaas Madrid 2018Juantomás García Molina
 
Codemotion madrid 2017 Arquitectura kappa 2.0
Codemotion madrid 2017  Arquitectura kappa 2.0Codemotion madrid 2017  Arquitectura kappa 2.0
Codemotion madrid 2017 Arquitectura kappa 2.0Juantomás García Molina
 
Meetup big data developers 2017 madrid - spark real use cases
Meetup big data developers 2017 madrid - spark real use casesMeetup big data developers 2017 madrid - spark real use cases
Meetup big data developers 2017 madrid - spark real use casesJuantomás García Molina
 
Gdg cloud london 2017 kappa architecture 2.0 copia
Gdg cloud london 2017   kappa architecture 2.0 copiaGdg cloud london 2017   kappa architecture 2.0 copia
Gdg cloud london 2017 kappa architecture 2.0 copiaJuantomás García Molina
 
Datascience lab 2017 odessa kappa architecture 2.0
Datascience lab 2017 odessa   kappa architecture 2.0Datascience lab 2017 odessa   kappa architecture 2.0
Datascience lab 2017 odessa kappa architecture 2.0Juantomás García Molina
 
Databeers madrid 2017 - Paas pigeons as a service
Databeers madrid 2017 - Paas pigeons as a serviceDatabeers madrid 2017 - Paas pigeons as a service
Databeers madrid 2017 - Paas pigeons as a serviceJuantomás García Molina
 

Más de Juantomás García Molina (20)

#AbadIA machine learning pipelines commit conf 2019
#AbadIA   machine learning pipelines commit conf 2019#AbadIA   machine learning pipelines commit conf 2019
#AbadIA machine learning pipelines commit conf 2019
 
AbadIA - sphere it krakow 2019
AbadIA -   sphere it krakow 2019AbadIA -   sphere it krakow 2019
AbadIA - sphere it krakow 2019
 
AbadIA ING Direct - Madrid 2019
AbadIA ING Direct - Madrid 2019AbadIA ING Direct - Madrid 2019
AbadIA ING Direct - Madrid 2019
 
AbadIA US Secret Tour - Pittsburgh'19
AbadIA US Secret Tour - Pittsburgh'19AbadIA US Secret Tour - Pittsburgh'19
AbadIA US Secret Tour - Pittsburgh'19
 
From alpha go to alpha zero TLP innova 2018
From alpha go to alpha zero  TLP innova 2018From alpha go to alpha zero  TLP innova 2018
From alpha go to alpha zero TLP innova 2018
 
AbadIA: the abbey of the crime AI - GDG Cloud London 2018
AbadIA:  the abbey of the crime AI - GDG Cloud London 2018AbadIA:  the abbey of the crime AI - GDG Cloud London 2018
AbadIA: the abbey of the crime AI - GDG Cloud London 2018
 
#AbadIA: the abbey of the crime AI - IO18 extended madrid 2018
#AbadIA:  the abbey of the crime AI - IO18 extended madrid 2018#AbadIA:  the abbey of the crime AI - IO18 extended madrid 2018
#AbadIA: the abbey of the crime AI - IO18 extended madrid 2018
 
#AbadIA: the abbey of the crime AI - IBM meetup Madrid 2018
#AbadIA: the abbey of the crime AI - IBM meetup Madrid 2018#AbadIA: the abbey of the crime AI - IBM meetup Madrid 2018
#AbadIA: the abbey of the crime AI - IBM meetup Madrid 2018
 
AbadIA: the abbey of the crime AI - Vaas Madrid 2018
AbadIA: the abbey of the crime AI - Vaas Madrid 2018AbadIA: the abbey of the crime AI - Vaas Madrid 2018
AbadIA: the abbey of the crime AI - Vaas Madrid 2018
 
From Alpha Go to Alpha Zero - Vaas Madrid 2018
From Alpha Go to Alpha Zero -  Vaas Madrid 2018From Alpha Go to Alpha Zero -  Vaas Madrid 2018
From Alpha Go to Alpha Zero - Vaas Madrid 2018
 
Alpha zero - London 2018
Alpha zero  - London 2018 Alpha zero  - London 2018
Alpha zero - London 2018
 
Codemotion madrid 2017 Arquitectura kappa 2.0
Codemotion madrid 2017  Arquitectura kappa 2.0Codemotion madrid 2017  Arquitectura kappa 2.0
Codemotion madrid 2017 Arquitectura kappa 2.0
 
JBCN barcelona 2017 kappa architecture 2.0
JBCN barcelona 2017 kappa architecture 2.0JBCN barcelona 2017 kappa architecture 2.0
JBCN barcelona 2017 kappa architecture 2.0
 
Meetup big data developers 2017 madrid - spark real use cases
Meetup big data developers 2017 madrid - spark real use casesMeetup big data developers 2017 madrid - spark real use cases
Meetup big data developers 2017 madrid - spark real use cases
 
Gdg cloud madrid 2017 - GDG kick off metuup
Gdg cloud madrid 2017  - GDG kick off metuupGdg cloud madrid 2017  - GDG kick off metuup
Gdg cloud madrid 2017 - GDG kick off metuup
 
Scalaua 2017 kyev kappa architecture 2.0
Scalaua 2017 kyev   kappa architecture 2.0Scalaua 2017 kyev   kappa architecture 2.0
Scalaua 2017 kyev kappa architecture 2.0
 
Icea 2017 big data - recursos humanos
Icea 2017   big data - recursos humanosIcea 2017   big data - recursos humanos
Icea 2017 big data - recursos humanos
 
Gdg cloud london 2017 kappa architecture 2.0 copia
Gdg cloud london 2017   kappa architecture 2.0 copiaGdg cloud london 2017   kappa architecture 2.0 copia
Gdg cloud london 2017 kappa architecture 2.0 copia
 
Datascience lab 2017 odessa kappa architecture 2.0
Datascience lab 2017 odessa   kappa architecture 2.0Datascience lab 2017 odessa   kappa architecture 2.0
Datascience lab 2017 odessa kappa architecture 2.0
 
Databeers madrid 2017 - Paas pigeons as a service
Databeers madrid 2017 - Paas pigeons as a serviceDatabeers madrid 2017 - Paas pigeons as a service
Databeers madrid 2017 - Paas pigeons as a service
 

Último

ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 

Último (20)

ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 

ASPgems - kappa architecture

  • 2. Who am I CDO ASPgems Former President of Hispalinux (Spanish LUG) Author “La Pastilla Roja” first spanish book about Free Software.
  • 3. Menu A little context about Kappa Architecture What’s Kappa Architecture What is not Kappa Architecture How we implement it Real use cases with KA
  • 4. A little context July 2, 2014 Jay Kreps coined the term Kappa Architecture in an article for O’reilly Radar
  • 5. Who is Jay Kreps Jay has been involved in lots of projects: Author of the essay: The Log: What every software engineer should know about real-time data's unifying abstraction (12/16/2013) https://engineering.linkedin.com/distributed-systems/log-what-every-software- engineer-should-know-about-real-time-datas-unifying
  • 6. Jay Kreps Author of the book: I ♥ Logs
  • 7. Jay Kreps Involved with projects as: Apache Kafka Apache Samza Voldemort Azkaban Ex-Linkedin Now co-founder and CEO of Confluent
  • 8. Lambda Architecture Look something like this: https://www.mapr.com/developercentral/lambda-architecture
  • 9. Lambda Architecture Batch layer that provides the following functionality managing the master dataset, an immutable, append-only set of raw data. pre-computing arbitrary query functions, called batch views. https://www.mapr.com/developercentral/lambda-architecture
  • 10. Lambda Architecture Serving layer This layer indexes the batch views so that they can be queried in ad hoc with low latency. Speed layer This layer accommodates all requests that are subject to low latency requirements. Using fast and incremental algorithms, the speed layer deals with recent data only.
  • 11. Lambda Architecture batch layer datasets can be in a distributed filesystem, while MapReduce can be used to create batch views that can be fed to the serving layer. The serving layer can be implemented using NoSQL technologies such as HBase,Apache Druid, etc. Querying can be implemented by technologies such as Apache Drill or Impala Speed layer can be realized with data streaming technologies such as Apache Storm or Spark Streaming https://www.mapr.com/developercentral/lambda-architecture
  • 12. Pros of Lambda Architecture Retain the input data unchanged. Think about modeling data transformations, series of data states from the original input. Lambda architecture take in account the problem of reprocessing data. this happens all the time, the code will change, and you will need to reprocess all the information. Lots of reasons and you will need to live with this.
  • 13. Cons of Lambda Architecture Maintain the code that need to produce the same result from two complex distributed system is painful. Very different code for MapReduce and Storm/ Apache Spark Not only is about different code, is also about debugging and interaction with other products like (hive, Oozie, Cascading, etc) At the end is a problem about different and diverging programming paradigms.
  • 14. So what is Kappa Architecture The proposal of Jay Kreps is so simple: Use kafka (or other system) that will let you retain the full log of the data you need to reprocess. When you want to do the reprocessing, start a second instance of your stream processing job that starts processing from the beginning of the retained data, but direct this output data to a new output table.
  • 15. So what is Kappa Architecture part II When the second job has caught up, switch the application to read from the new table. Stop the old version of the job, and delete the old output table.
  • 16. So what is Kappa Architecture This architecture looks something like this:
  • 17. So what is Kappa Architecture The first benefit is that only you need to reprocessing only when you change the code. You can check if the new version is working ok and if not reverse to the old output table. You can mirror a Kafka topic to HDFS so you are not limited to the Kafka retention configuration. You have only a code to maintain with an unique framework.
  • 18. So what is Kappa Architecture The real advantage is not about efficiency at all (You will need extra temporarily storage when reprocessing for example) is allowing your team to develop, test, debug and operate their systems on top of a single processing framework.
  • 19. What is not Kappa Architecture Is not a silver bullet to solve every problem at Big Data. Is not a list of prescriptions of technologies. You can implement with your favorite frameworks. Is not a rigid set of rules. But helps to maintain the complex projects simple.
  • 20. How we use Kappa Architecture We start working with projects with a complex structure like Linkedin looks at early stage. That’s very usual.
  • 21. How we use Kappa Architecture
  • 22. How we use Kappa Architecture We try to refactoring the data flows to fix in a Kappa Architecture.
  • 23. How we use Kappa Architecture
  • 24. How we use Kappa Architecture We use Kafka as Stream Data Platform Instead of Samza we feel more comfortable with Spark Streaming. At ASPGems we choose Apache Spark as our Analytics Engine and not only for Spark Streaming.
  • 25. How we use Kappa Architecture At the end, Kappa Architecture is design pattern for us. We use/clone this pattern in almost our projects. We have projects of every size, volume of data or speed needing and fix with the Kappa Architecture.
  • 27. Telefónica - MSS We use KA to calculate near real time KPIs, SLAs related with the managed security system. We simplify the data flow of the input data. Kafka in the streaming data platform. As MPP we use CassandraDB.
  • 28. IOT - OBD II One of our clients install On Board Devices in the cars of its customers. We implement an API to got all the information in real time and inject the information in Kafka. The business rules are implemented in a CEP running into Apache Spark Streaming. As MPP we use Elastic Search.
  • 29. Insurance Company We implement Kappa Architecture to process click stream in real time and clustering users We show content and offers that better fix users
  • 30. Energy Facility We implement Kappa Architecture to process and predict energy consume. Our customer include energy storage systems and we got all the information about energy storage (ultra-capacitors and batteries). We process this information to calculate the effective lifetime of the components and its degradation.
  • 32. diciembre 2010 Thank you Juantomás García juantomas@aspgems.com @juantomas