SlideShare una empresa de Scribd logo
1 de 32
Descargar para leer sin conexión
FAST DATA
SMACK DOWN
Data Done Right
Created by /Stefan Siprell @stefansiprell
SMACK?
Spark
Mesos
Akka
Cassandra
Kafka
Spark
Swiss Army Knife for Data
ETL Jobs? No problem.
µ-Batching on Streams? No problem.
SQL and Joins on non-RDBMS? No problem.
Graph Operations on non-Graphs? No problem.
Map/Reduce in super fast? Thanks.
Mesos
Distributed Kernel for the
Cloud
Links machines to a logical instance
Static deployment of Mesos
Dynamic deployment of the workload
Good integration with Hadoop, Kafka, Spark, and Akka
Lots and lots of machines - data-centers
Akka
Framework for reactive
applications
Highly performant
Simple concurrency via asynchronous processing
elastic and without single point of failure
resilient
Lots and lots of performance - 50 million messages per machine
and second
Cassandra
Performant and Always-Up No-
SQL Database
Linear scaling - approx. 10'000 requests per machine and second
No Downtime
Comfort of a column index with append-only performance
Data-Safety over multiple data-centers
Strong in denormalized models
Kafka
Messaging for Big Data
Fast - Delivers hundreds of MegaBytes per second to 1000s of
clients
Scales - Partitions data to manageable volumes
Data-Safety - Append Only allows buffering of TBs without a
performance impact
Distributed - from the ground up
In the beginning there was
HADOOP
Big-Batch saw
that this was
good
und has mapped and reduced ever since
But Business does not wait.
It always demands more...
ever faster
Way too fast
Realtime Processing is For
High Frequency Trading
Power Grid Monitoring
Data-Center Monitoring
is based on one-thread-per-core, cache-optimized, memory-barrier
ring-buffer, is lossy and work on a very limited context-footprint.
Between Batch and Realtime...
...lies a new Sweet Spot
SMACK
Updating News Pages
User Classification
(Business) Realtime Bidding for Advertising
Automative IoT
Industry 4.0
What do we
need?
Reliable Ingestion
Flexible Storage und Query Alternatives
Sophisticated Analysis Features
Management Out-Of-the-Box
What is SMACK -
Part II
Architecture Toolbox
Best-of-Breed Platform
Also a Product
µ-Batching
When do I need this?
I don't want to react on individual events.
I want to establish a context.
Abuse Classification
User Classification
Site Classification
How does it work?
Spark Streaming collects events und generates RDDs out of
windows.
Uses Kafka, Databases, Akka or streams as input
Windows can be flushed to persistent storage
Windows can be queried / modified per SQL In-Memory/Disk
Cascades of aggregations / classifications can be assembled /
flushed
What about λ-Architectures?
Spark Operations cab be run unaltered in either batch or stream
mode - it is always an RDD!
I need Realtime!
The Bot need to stop!
Which ad do we want to deliver?
Which up-sell offer shall I show??
Using Akka I can react to individual events. With Spark and
Cassandra I have two quick data-stores to establish a sufficient large
context.
3 Streams of
Happiness
Direct streams between Kafka and Spark.
Raw streams for TCP, UDP connections, Files, etc.
Reactive streams for back-pressure support.
Kafka can translate between raw und reactive streams.
Backpressure?
During Peak Times, the amount of incoming data may massively
exceed the capacity - just think of IoT. The back-pressure in the
processing pipelines needs to be actively managed, otherwise data is
lost.
to be continued
Flow
If a an event needs specific handling - a reaction - it needs to be dealt
with in Akka.
Why Kafka?
Append Only:Consumer may be offline for days.
Broker:Reduction of point-to-point connections.
Router:Routing of Streams including (de-)multiplexing.
Buffer:Compensate short overloads.
Replay:Broken Algorithm on incorrect data? Redeploy and Replay!
Exactly Once?
Whoever demands aexactly-once runtime, has no clue of
distributed systems.
Kafka supports at-least once. Make your business-logic idempotent.
How to deal with repetitive requests is a requirement.
Cloud? Bare
Metal?
Bare Metal is possible.
Cloud offers more redundancy and elasticity.
Mesos requires no virtualization oder containerization. Big Data
tools can run natively on the host OS.. The workload defines the
cluster setup.
Streams
... are an unbound and continuous sequence of events. Throughput
and end are not defined.
Conventional
Streams
Streams run over long periods of time and have a threatening
fluctuation in load.
Buffer can compensate peaks. Unbound Buffer load to fatal
problems, once the available memory is exhausted.
Bound Buffer can react in different ways to exhausted memory. FIFO
Drop? FILO Drop? Reduced Sampling?
Reactive Streams
If a consumer cannot cope with the load or bind the buffer, it falls
back from PUSH to PULL.
This fall-back may propagate its way against the pipeline to the
source.
The source is the last-line of defense and needs to deal with the
problem.
SMACK Reactive
Streams
Akka implements Reactive Streams.
Spark Streaming 1.5 implements Reactive Streams.
Spark Streaming 1.6 allows it's clients to use Reactive Stream as a
Protocol.
Kafka is a perfect candidate of a bound buffer for streams - the last
line of defense.
Mesos can scale up consumers on-the-fly during the fall-back.
Functional?
Streams love to be processed in parallel. Streams love Scala!
Events therefore need to be immutable - nobody likes production-
only concurrency issues.
Functions do not have side-effects - do not track state between
function calls!
Functions need to be 1st class citizens - maximize reuse of code.
Reuse?
sparkContext.textFile("/path/to/input")
.map { line =>
val array = line.split(", ", 2)
(array(0), array(1))
}.flatMap {
case (id,contents) => toWords(contents).map(w => ((w, id), 1))
}.reduceByKey {
(count1, count2) => count1 + count2
}.map {
case ((word, path), n) => (word, (path, n))
}.groupByKey
.map {
case(word, list) => (word, sortByCount(list))
}.saveAsTextFile("/path/to/output")
‫شكرا‬
Thank You

Más contenido relacionado

La actualidad más candente

2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
DB Tsai
 

La actualidad más candente (20)

Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDsApache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
 
Rethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleRethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For Scale
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data Product
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Streaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For ScaleStreaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For Scale
 
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationUsing Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
 

Destacado

Destacado (17)

Data processing platforms with SMACK: Spark and Mesos internals
Data processing platforms with SMACK:  Spark and Mesos internalsData processing platforms with SMACK:  Spark and Mesos internals
Data processing platforms with SMACK: Spark and Mesos internals
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Webinar - How to Build Data Pipelines for Real-Time Applications with SMACK &...
Webinar - How to Build Data Pipelines for Real-Time Applications with SMACK &...Webinar - How to Build Data Pipelines for Real-Time Applications with SMACK &...
Webinar - How to Build Data Pipelines for Real-Time Applications with SMACK &...
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Building a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformBuilding a fully-automated Fast Data Platform
Building a fully-automated Fast Data Platform
 
Introduction to Storm
Introduction to StormIntroduction to Storm
Introduction to Storm
 
Real-time Personal Trainer on the SMACK Stack
Real-time Personal Trainer on the SMACK StackReal-time Personal Trainer on the SMACK Stack
Real-time Personal Trainer on the SMACK Stack
 
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
 
SMACK Stack @ Nitro
SMACK Stack @ NitroSMACK Stack @ Nitro
SMACK Stack @ Nitro
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelines
 
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotron
 

Similar a SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai

TupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and SparkTupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
DataStax Academy
 
Spinnaker VLDB 2011
Spinnaker VLDB 2011Spinnaker VLDB 2011
Spinnaker VLDB 2011
sandeep_tata
 

Similar a SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai (20)

BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
 
Connecting kafka message systems with scylla
Connecting kafka message systems with scylla   Connecting kafka message systems with scylla
Connecting kafka message systems with scylla
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and SparkTupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream Processing
 
kafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdfkafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdf
 
Stefano Rocco, Roberto Bentivoglio - Scala in increasingly demanding environm...
Stefano Rocco, Roberto Bentivoglio - Scala in increasingly demanding environm...Stefano Rocco, Roberto Bentivoglio - Scala in increasingly demanding environm...
Stefano Rocco, Roberto Bentivoglio - Scala in increasingly demanding environm...
 
Spinnaker VLDB 2011
Spinnaker VLDB 2011Spinnaker VLDB 2011
Spinnaker VLDB 2011
 
Scala in increasingly demanding environments - DATABIZ
Scala in increasingly demanding environments - DATABIZScala in increasingly demanding environments - DATABIZ
Scala in increasingly demanding environments - DATABIZ
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Shifting gear reactive, a reactive story in banking
Shifting gear reactive, a reactive story in bankingShifting gear reactive, a reactive story in banking
Shifting gear reactive, a reactive story in banking
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
Getting Started with Spark Streaming
Getting Started with Spark StreamingGetting Started with Spark Streaming
Getting Started with Spark Streaming
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
How kafka is transforming hadoop, spark & storm
How kafka is transforming hadoop, spark & stormHow kafka is transforming hadoop, spark & storm
How kafka is transforming hadoop, spark & storm
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams Presentation
 

Más de Codemotion Dubai

Más de Codemotion Dubai (11)

Embrace chatOps, stop installing deployment software by Geshan Manandhar at C...
Embrace chatOps, stop installing deployment software by Geshan Manandhar at C...Embrace chatOps, stop installing deployment software by Geshan Manandhar at C...
Embrace chatOps, stop installing deployment software by Geshan Manandhar at C...
 
Microservices for Mortals by Bert Ertman at Codemotion Dubai
 Microservices for Mortals by Bert Ertman at Codemotion Dubai Microservices for Mortals by Bert Ertman at Codemotion Dubai
Microservices for Mortals by Bert Ertman at Codemotion Dubai
 
Patterns and practices for real-world event-driven microservices by Rachel Re...
Patterns and practices for real-world event-driven microservices by Rachel Re...Patterns and practices for real-world event-driven microservices by Rachel Re...
Patterns and practices for real-world event-driven microservices by Rachel Re...
 
Dockerize it: stop living in the past and embrace the future by Alex Nadalin
 Dockerize it: stop living in the past and embrace the future by Alex Nadalin Dockerize it: stop living in the past and embrace the future by Alex Nadalin
Dockerize it: stop living in the past and embrace the future by Alex Nadalin
 
Modelling complex game economy with Neo4j by Yan Cui at Codemotion Dubai
Modelling complex game economy with Neo4j by Yan Cui at Codemotion DubaiModelling complex game economy with Neo4j by Yan Cui at Codemotion Dubai
Modelling complex game economy with Neo4j by Yan Cui at Codemotion Dubai
 
Chaos Testing with F# and Azure by Rachel Reese at Codemotion Dubai
Chaos Testing with F# and Azure by Rachel Reese at Codemotion DubaiChaos Testing with F# and Azure by Rachel Reese at Codemotion Dubai
Chaos Testing with F# and Azure by Rachel Reese at Codemotion Dubai
 
Getting Developers hooked on your API by Nicolas Garnier at Codemotion Dubai
Getting Developers hooked on your API by Nicolas Garnier at Codemotion DubaiGetting Developers hooked on your API by Nicolas Garnier at Codemotion Dubai
Getting Developers hooked on your API by Nicolas Garnier at Codemotion Dubai
 
F# in social gaming by Yan Cui at Codemotion Dubai
F# in social gaming by Yan Cui at Codemotion DubaiF# in social gaming by Yan Cui at Codemotion Dubai
F# in social gaming by Yan Cui at Codemotion Dubai
 
Building event driven serverless apps by Danilo Poccia at Codemotion Dubai
Building event driven serverless apps by Danilo Poccia at Codemotion DubaiBuilding event driven serverless apps by Danilo Poccia at Codemotion Dubai
Building event driven serverless apps by Danilo Poccia at Codemotion Dubai
 
Javascript leverage: Isomorphic Applications by Luciano Colosio at Codemotion...
Javascript leverage: Isomorphic Applications by Luciano Colosio at Codemotion...Javascript leverage: Isomorphic Applications by Luciano Colosio at Codemotion...
Javascript leverage: Isomorphic Applications by Luciano Colosio at Codemotion...
 
Demystifying the 3D Web by Pietro Grandi @ Codemotion Dubai 2016
Demystifying the 3D Web by Pietro Grandi @ Codemotion Dubai 2016Demystifying the 3D Web by Pietro Grandi @ Codemotion Dubai 2016
Demystifying the 3D Web by Pietro Grandi @ Codemotion Dubai 2016
 

Último

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai

  • 1. FAST DATA SMACK DOWN Data Done Right Created by /Stefan Siprell @stefansiprell
  • 3. Spark Swiss Army Knife for Data ETL Jobs? No problem. µ-Batching on Streams? No problem. SQL and Joins on non-RDBMS? No problem. Graph Operations on non-Graphs? No problem. Map/Reduce in super fast? Thanks.
  • 4. Mesos Distributed Kernel for the Cloud Links machines to a logical instance Static deployment of Mesos Dynamic deployment of the workload Good integration with Hadoop, Kafka, Spark, and Akka Lots and lots of machines - data-centers
  • 5. Akka Framework for reactive applications Highly performant Simple concurrency via asynchronous processing elastic and without single point of failure resilient Lots and lots of performance - 50 million messages per machine and second
  • 6. Cassandra Performant and Always-Up No- SQL Database Linear scaling - approx. 10'000 requests per machine and second No Downtime Comfort of a column index with append-only performance Data-Safety over multiple data-centers Strong in denormalized models
  • 7. Kafka Messaging for Big Data Fast - Delivers hundreds of MegaBytes per second to 1000s of clients Scales - Partitions data to manageable volumes Data-Safety - Append Only allows buffering of TBs without a performance impact Distributed - from the ground up
  • 8. In the beginning there was HADOOP
  • 9. Big-Batch saw that this was good und has mapped and reduced ever since
  • 10. But Business does not wait. It always demands more... ever faster
  • 11. Way too fast Realtime Processing is For High Frequency Trading Power Grid Monitoring Data-Center Monitoring is based on one-thread-per-core, cache-optimized, memory-barrier ring-buffer, is lossy and work on a very limited context-footprint.
  • 12. Between Batch and Realtime... ...lies a new Sweet Spot SMACK Updating News Pages User Classification (Business) Realtime Bidding for Advertising Automative IoT Industry 4.0
  • 13. What do we need? Reliable Ingestion Flexible Storage und Query Alternatives Sophisticated Analysis Features Management Out-Of-the-Box
  • 14. What is SMACK - Part II Architecture Toolbox Best-of-Breed Platform Also a Product
  • 15. µ-Batching When do I need this? I don't want to react on individual events. I want to establish a context. Abuse Classification User Classification Site Classification
  • 16. How does it work? Spark Streaming collects events und generates RDDs out of windows. Uses Kafka, Databases, Akka or streams as input Windows can be flushed to persistent storage Windows can be queried / modified per SQL In-Memory/Disk Cascades of aggregations / classifications can be assembled / flushed
  • 17. What about λ-Architectures? Spark Operations cab be run unaltered in either batch or stream mode - it is always an RDD!
  • 18. I need Realtime! The Bot need to stop! Which ad do we want to deliver? Which up-sell offer shall I show?? Using Akka I can react to individual events. With Spark and Cassandra I have two quick data-stores to establish a sufficient large context.
  • 19. 3 Streams of Happiness Direct streams between Kafka and Spark. Raw streams for TCP, UDP connections, Files, etc. Reactive streams for back-pressure support. Kafka can translate between raw und reactive streams.
  • 20. Backpressure? During Peak Times, the amount of incoming data may massively exceed the capacity - just think of IoT. The back-pressure in the processing pipelines needs to be actively managed, otherwise data is lost. to be continued
  • 21. Flow If a an event needs specific handling - a reaction - it needs to be dealt with in Akka. Why Kafka? Append Only:Consumer may be offline for days. Broker:Reduction of point-to-point connections. Router:Routing of Streams including (de-)multiplexing. Buffer:Compensate short overloads. Replay:Broken Algorithm on incorrect data? Redeploy and Replay!
  • 22. Exactly Once? Whoever demands aexactly-once runtime, has no clue of distributed systems. Kafka supports at-least once. Make your business-logic idempotent. How to deal with repetitive requests is a requirement.
  • 23. Cloud? Bare Metal? Bare Metal is possible. Cloud offers more redundancy and elasticity. Mesos requires no virtualization oder containerization. Big Data tools can run natively on the host OS.. The workload defines the
  • 25. Streams ... are an unbound and continuous sequence of events. Throughput and end are not defined.
  • 26. Conventional Streams Streams run over long periods of time and have a threatening fluctuation in load. Buffer can compensate peaks. Unbound Buffer load to fatal problems, once the available memory is exhausted. Bound Buffer can react in different ways to exhausted memory. FIFO Drop? FILO Drop? Reduced Sampling?
  • 27. Reactive Streams If a consumer cannot cope with the load or bind the buffer, it falls back from PUSH to PULL. This fall-back may propagate its way against the pipeline to the source. The source is the last-line of defense and needs to deal with the problem.
  • 28. SMACK Reactive Streams Akka implements Reactive Streams. Spark Streaming 1.5 implements Reactive Streams. Spark Streaming 1.6 allows it's clients to use Reactive Stream as a Protocol. Kafka is a perfect candidate of a bound buffer for streams - the last line of defense.
  • 29. Mesos can scale up consumers on-the-fly during the fall-back. Functional? Streams love to be processed in parallel. Streams love Scala! Events therefore need to be immutable - nobody likes production- only concurrency issues. Functions do not have side-effects - do not track state between function calls! Functions need to be 1st class citizens - maximize reuse of code.
  • 30. Reuse? sparkContext.textFile("/path/to/input") .map { line => val array = line.split(", ", 2) (array(0), array(1)) }.flatMap { case (id,contents) => toWords(contents).map(w => ((w, id), 1)) }.reduceByKey { (count1, count2) => count1 + count2 }.map { case ((word, path), n) => (word, (path, n)) }.groupByKey .map { case(word, list) => (word, sortByCount(list)) }.saveAsTextFile("/path/to/output")