Dean Wampler (Typesafe), Patrick Di Loreto (William Hill)
Cassandra, Spark
and Kafka:
The Streaming Data Troika
2
About Typesafe
Typesafe Reactive Platform
• Akka, Play, and Spark, for Scala and Java.
• typesafe.com/reactive-big-data
3
What’s Reactive?
Responsive
Elastic Resilient
Message Driven
4
About
Online Sportsbook and Gaming provider
• Every day we push more than 5
million price changes
• 160TB of data flowing through our
platform each day
We're Hiring

https://careers.williamhill.com
WH Apple Watch App Interactive Scoreboard Virtual Reality Horse Race

Oculus Rift
6
Big Data Circa 2010
7
Big Data Circa 2010
Generally two camps. One was the offline, batch-mode processing of massive data sets done with Hadoop.
8
Big Data Circa 2010
Akka
The other was the online, real-time processing and storage of “transactional” data at scale, as exemplified by Cassandra for the data store and by middleware tools
and libraries like Akka, Spring, etc.
9
Big Data Circa 2010
Akka
?
Two camps together with some overlap and connectivity, but not a lot.
10
Big Data Circa 2015
11
Big Data Circa 2015
We still have this:
Akka
?
Five years later (this year), we still have these architectures in wide use, but…
12
Big Data Circa 2015
But now we have this:
Big Data
Streaming
Mesos, EC2, or Bare
A new, streaming-oriented architecture is emerging, which can also be used for batch mode analysis, if we process resident data sets as finite streams.
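The idea of treating a resident data set as a finite stream can be sketched in a few lines. This is only an illustration in plain Python, not Spark code; `running_total` and `live_feed` are hypothetical names, and `live_feed` stands in for an unbounded source even though it stops after three items here.

```python
from typing import Iterable, Iterator


def running_total(events: Iterable[int]) -> Iterator[int]:
    """Yield the cumulative sum after each event, one event at a time."""
    total = 0
    for value in events:
        total += value
        yield total


def live_feed() -> Iterator[int]:
    """Hypothetical stand-in for a streaming source (it ends after 3 items)."""
    for value in (5, 7, 1):
        yield value


# Batch mode: a resident data set is simply a stream that ends.
batch_result = list(running_total([5, 7, 1]))
# Streaming mode: the same function consumes events as they arrive.
stream_result = list(running_total(live_feed()))
```

The same processing logic serves both modes; only the source differs.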
Topic A
General Principles
• Spark Streaming: Analytics/aggregations
• C*: Storage, queries
• Kafka: durable message store; allows
replay of messages lost downstream.
Spark Streaming provides rich analytics.

Need a durable system of record, like Kafka, which allows repeat reads in case of loss. See
https://medium.com/@foundev/real-time-analytics-with-spark-streaming-and-cassandra-2f90d03342f7 for a nice summary of design patterns and tips.
Mesos, EC2, or Bare Metal
14
Let’s explore this.
Mesos, EC2, or Bare Metal
15
Cassandra remains the flexible, scalable datastore, well suited to high-volume ingestion of streaming data, such as event streams (e.g., click streams from web apps) and logs.
Mesos, EC2, or Bare Metal
16
Kafka is growing in popularity as a tool for durable ingestion of diverse event streams, with partitioning for scale and organization into topics (like a typical message queue) for
downstream consumers.
[Diagram: Producers and Consumers (Service 1, Service 2, Service 3, Log & Other Files, Internet, other Services) connected directly to one another: N * M links.]
One use of Kafka is to solve the problem of N*M direct links between producers and consumers. Such links are hard to manage, and they couple services directly to one another. That coupling is
fragile when a given service needs to be scaled up through replication or replaced, and it sometimes extends to the protocol that both ends need to speak.
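The arithmetic behind the N*M versus N+M comparison is simple enough to check directly. The function names and the service counts below are illustrative assumptions, not figures from the talk.

```python
def direct_links(producers: int, consumers: int) -> int:
    """Every producer is wired to every consumer: N * M links."""
    return producers * consumers


def hub_links(producers: int, consumers: int) -> int:
    """Every service is wired only to the central hub: N + M links."""
    return producers + consumers


# Hypothetical deployment: 4 producers and 3 consumers.
point_to_point = direct_links(4, 3)
via_hub = hub_links(4, 3)
```

With these numbers, point-to-point wiring needs 12 links, while routing through a hub needs 7, and the gap widens as services are added.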
[Diagram: the same Producers and Consumers, now each connected only to a central Kafka hub: N + M links.]
So Kafka can function as a central hub, yet it’s distributed and scalable so it isn’t a bottleneck or single point of failure.
Kafka Usage
[Diagram: Topic A shown as an ordered queue of messages n through n+5. Producer 1 and Producer 2 append to it; Consumer 1 and Consumer 2 each read from their own position (n+?).]
The message queue structure looks basically like this: different producers can append messages to a topic, and different consumers can read the
messages in the queue at their own pace, in order.
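A toy in-memory model may help make the "own pace, in order" behaviour concrete. This is a sketch in plain Python, not the Kafka client API; `TopicLog`, `append`, and `poll` are hypothetical names, and real Kafka adds partitions, persistence, and replication.

```python
from typing import Dict, List


class TopicLog:
    """Toy in-memory stand-in for a single topic: an append-only list."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._offsets: Dict[str, int] = {}  # next offset to read, per consumer

    def append(self, message: str) -> int:
        """Append a message and return the offset it was stored at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer: str, max_messages: int = 1) -> List[str]:
        """Return up to max_messages unread messages for this consumer, in order."""
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


topic_a = TopicLog()
for msg in ("p1-a", "p2-a", "p1-b", "p2-b"):  # two producers interleaving writes
    topic_a.append(msg)

fast = topic_a.poll("consumer-1", max_messages=4)  # reads everything at once
slow = topic_a.poll("consumer-2", max_messages=1)  # lags behind, same order
```

Reading does not remove messages, so a slow consumer never affects what a fast one sees.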
Kafka Resiliency
Data loss downstream? Can replay lost
messages.
Could use C* for this, but then you’ve changed the read/write load (and hence tuning, scaling, etc. of your C* ring).
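Replay after downstream loss can be sketched as re-applying the durable log from a known offset. The event data, the `apply_from` helper, and the snapshot below are all hypothetical; the point is only that a retained, ordered log lets a consumer rebuild lost state.

```python
from typing import Dict, List, Tuple

# Durable, ordered record of events: (offset, customer, amount). Made-up data.
log: List[Tuple[int, str, int]] = [(0, "alice", 10), (1, "bob", 5), (2, "alice", 7)]


def apply_from(offset: int, state: Dict[str, int]) -> Dict[str, int]:
    """Apply every logged event at or after `offset` to a downstream state."""
    for event_offset, customer, amount in log:
        if event_offset >= offset:
            state[customer] = state.get(customer, 0) + amount
    return state


# Normal operation: downstream state built from the whole log.
downstream = apply_from(0, {})

# Simulated loss: only a snapshot taken after offset 0 survives downstream.
snapshot = {"alice": 10}
# Recovery: replay the messages the snapshot has not seen (offset 1 onward).
recovered = apply_from(1, dict(snapshot))
```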
Mesos, EC2, or Bare Metal
21
The third element of the “troika” is Spark, the next generation, scalable compute engine that is replacing MapReduce in Hadoop. However, Spark is flexible enough to run
in many cluster configurations, including a local mode for development, a simple standalone cluster mode for simple scenarios, Mesos for general scalability and
flexibility, and integrated with Cassandra itself.
Topic A
Spark Streaming Dos/Don’ts
Do
• Use for rich analytics and aggregations.
• Use with Kafka/C* source if data loss not
tolerable. Or, use the write ahead log
(WAL) - less optimal.
Spark Streaming offers rich analytics, even SQL, machine learning, and graph representations. It’s a more complex engine, so there is more “room” for data loss. Hence,
use Kafka or C* for durability and replay capabilities, but if you do ingest data directly from other sources without replay capability, at least use the WAL.
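The general write-ahead-log idea mentioned above can be sketched as "persist first, process second". This is a generic illustration using a local file, not Spark Streaming's actual WAL implementation; the class name, file name, and event fields are assumptions.

```python
import json
import os
import tempfile
from typing import List


class WriteAheadLog:
    """Toy write-ahead log: each record is appended to a file before processing."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk before continuing

    def replay(self) -> List[dict]:
        """Read back every logged record, oldest first."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


wal_path = os.path.join(tempfile.mkdtemp(), "receiver.wal")
wal = WriteAheadLog(wal_path)
for event in ({"id": 1, "price": 2.5}, {"id": 2, "price": 3.0}):
    wal.append(event)  # 1. log first
    # 2. ...then hand the event to in-memory processing (omitted here)

# After a simulated crash, a fresh instance recovers the events from disk.
recovered_events = WriteAheadLog(wal_path).replay()
```

The extra synchronous write on every record is the kind of cost that makes this less optimal than reading from a replayable source.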
Topic A
Spark Streaming Don’ts
Don’t
• Use for counting (use C*).
• Low-latency, per-event processing.
C* is faster and more accurate for counting, because repeat execution of Spark tasks (for error recovery, speculative execution, etc.) will cause over-counting (e.g., using
the “aggregator” feature). Also, Spark is a mini-batch system, for processing time slices of events (down to ~1 sec.). If you need low-latency and/or per-event processing,
use Akka…
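The over-counting problem can be reproduced without Spark. In the sketch below a batch of events is applied twice, as would happen if a task were re-executed. The idempotent variant, which remembers event IDs, is one common remedy shown for contrast; it is an assumption of this example, not something the slides prescribe, and all names and data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (event_id, customer)


def naive_count(counts: Dict[str, int], batch: List[Event]) -> None:
    """Increment per-customer counters; re-running the batch counts it again."""
    for _event_id, customer in batch:
        counts[customer] = counts.get(customer, 0) + 1


def idempotent_count(seen: Set[str], counts: Dict[str, int], batch: List[Event]) -> None:
    """Count each event ID at most once, so retries have no extra effect."""
    for event_id, customer in batch:
        if event_id not in seen:
            seen.add(event_id)
            counts[customer] = counts.get(customer, 0) + 1


batch = [("e1", "alice"), ("e2", "bob"), ("e3", "alice")]

naive: Dict[str, int] = {}
naive_count(naive, batch)
naive_count(naive, batch)  # the task is retried after a failure

exact: Dict[str, int] = {}
seen_ids: Set[str] = set()
idempotent_count(seen_ids, exact, batch)
idempotent_count(seen_ids, exact, batch)  # same retry, no double counting
```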
Mesos, EC2, or Bare Metal
24
Other parts of a complete infrastructure include a distributed file system like CSFv2, for when you don’t need a full database, e.g., for logs that you’ll dump into the file system
and then process in batches later on with Spark.
Mesos, EC2, or Bare Metal
25
Typesafe Reactive Platform provides infrastructure tools for integrating these and other components, including Akka Streams for resilient, low-latency event processing
(based on the Reactive Streams standard for streams with dynamic back pressure), ConductR for orchestrating services, and Play for web services and consoles.
Topic A
Typesafe Reactive Platform
• Akka Streams: low-latency, per-event
processing.
• ConductR for orchestrating services.
• Play for web services, consoles.
• … and commercial Spark support.
Akka Streams implements the Reactive Streams standard for streams with dynamic back pressure. It sits on top of the more general Akka Actor framework for highly
distributed concurrent applications.
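Demand-driven back pressure can be illustrated with a publisher that emits only as many elements as the subscriber asks for. This is a plain-Python sketch of the idea, not the Akka Streams or Reactive Streams API; the class names, the buffer capacity, and the one-element-per-loop processing rate are assumptions.

```python
from collections import deque
from typing import Deque, Iterator, List


class Publisher:
    """Emits elements only when the subscriber has signalled demand."""

    def __init__(self, source: Iterator[int]) -> None:
        self._source = source

    def request(self, n: int) -> List[int]:
        """Return at most n elements; never more than was asked for."""
        out: List[int] = []
        for _ in range(n):
            try:
                out.append(next(self._source))
            except StopIteration:
                break
        return out


class SlowSubscriber:
    """Keeps a bounded buffer and requests only what currently fits in it."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer: Deque[int] = deque()
        self.processed: List[int] = []
        self.max_buffered = 0

    def run(self, publisher: Publisher) -> None:
        while True:
            free = self.capacity - len(self.buffer)
            batch = publisher.request(free)  # demand is tied to free space
            if not batch and not self.buffer:
                break  # source exhausted and buffer drained
            self.buffer.extend(batch)
            self.max_buffered = max(self.max_buffered, len(self.buffer))
            self.processed.append(self.buffer.popleft())  # slow: one per loop


subscriber = SlowSubscriber(capacity=3)
subscriber.run(Publisher(iter(range(10))))
```

Because demand is tied to free buffer space, the buffer never grows past its capacity however fast the source could produce.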

Typesafe offers commercial support for teams developing advanced Spark applications, and production runtime support for Spark running on Mesos
clusters.
Mesos, EC2, or Bare Metal
27
Finally, there’s a wealth of cluster systems possible. You could deploy these tools on the servers for your Cassandra ring, which has excellent integration with Spark.
You can run in EC2 or on bare metal. You can use a general-purpose cluster management system like Mesos.
Presented by Patrick Di Loreto
R&D Engineering Lead
Site: https://developer.williamhill.com
Twitter: https://twitter.com/patricknoir
OMNIA



Distributed & Reactive 

platform for data management

Motivations
29
Omnia: Distributed & Reactive platform for data management
Users
Feeds
System
3rd Party
In order to be in a position to innovate we need to control and
understand our data
Social Networks
IoT
William Hill
Need for control over the data
DMP based on the Lambda architecture and the Reactive principles
What is Omnia?
30
Chronos
DataSource
NeoCortex
Speed Layer
Fates
Batch Layer
Hermes
ServingLayer
Data Flow
Input Output
Lambda architecture
Reactive principles
31
Responsive
Resilient
Message Driven
Elastic
The Reactive Manifesto
http://www.reactivemanifesto.org/
Reactive Manifesto
Chronos is a reliable and scalable component that collects data from different
sources and organizes it into Streams of observable events.
Chronos: Data acquisition
32
Incident: {
  type: "bet",
  version: "1.0",
  time: "2015-09-03 06:00:10",
  acquisitionTime: "2015-09-03 06:00:06",
  source: "BetSystem",
  payload: {…. Any valid JSON}
}
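The Incident envelope above maps naturally onto a small typed record. The sketch below mirrors the field names shown on the slide. The payload contents, the helper names, and the choice of a Python dataclass are assumptions for illustration, since the slides do not show how Chronos represents incidents in code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class Incident:
    type: str
    version: str
    time: str
    acquisitionTime: str
    source: str
    payload: Dict[str, Any]  # any valid JSON object


def to_json(incident: Incident) -> str:
    """Serialize the envelope and its payload to a JSON string."""
    return json.dumps(asdict(incident), sort_keys=True)


def from_json(text: str) -> Incident:
    """Rebuild an Incident from its JSON form."""
    return Incident(**json.loads(text))


bet = Incident(
    type="bet",
    version="1.0",
    time="2015-09-03 06:00:10",
    acquisitionTime="2015-09-03 06:00:06",
    source="BetSystem",
    payload={"stake": 10, "currency": "GBP"},  # hypothetical payload
)
round_tripped = from_json(to_json(bet))
```

Keeping the envelope fixed and the payload free-form lets every source share one schema for routing and storage.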
Chronos
DataSource
TCP
HTTP
WS
…
JMS
HTTP
Poll
SSE
Adapter
Streams
Converter Persistence
Bets Deposits Prices
Stream = Adapter + Converter + Persistence
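The "Stream = Adapter + Converter + Persistence" equation can be read as function composition. The sketch below is one possible reading in plain Python, not Chronos code; the type aliases, the fake adapter, and the message format are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Adapter = Callable[[], Iterable[str]]        # pulls raw messages from a source
Converter = Callable[[str], Dict[str, str]]  # raw message -> incident-like dict
Persistence = Callable[[Dict[str, str]], None]


def make_stream(adapter: Adapter, converter: Converter,
                persistence: Persistence) -> Callable[[], int]:
    """Compose the three stages; the returned function runs the stream once."""
    def run() -> int:
        count = 0
        for raw in adapter():
            persistence(converter(raw))
            count += 1
        return count
    return run


stored: List[Dict[str, str]] = []  # stand-in for a durable sink


def fake_http_adapter() -> Iterable[str]:
    """Hypothetical adapter returning two raw messages."""
    return ["bet:42", "deposit:7"]


def split_converter(raw: str) -> Dict[str, str]:
    """Turn 'kind:ref' into an incident-like dict."""
    kind, ref = raw.split(":", 1)
    return {"type": kind, "source": "demo", "payload": ref}


bets_stream = make_stream(fake_http_adapter, split_converter, stored.append)
handled = bets_stream()
```

Swapping the adapter (TCP, JMS, SSE, and so on) leaves the converter and persistence stages untouched.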
Chronos: Data acquisition
33
Chronos 1
(SSE, Bets placed)
Chronos 2
(JMS, Deposits)
Chronos 3
(HTTP, Events)
Chronos N
(SSE, Twitter)
….…
Chronos 2
(JMS, Deposits)
(SSE, Bet Placed)
High throughput distributed messaging system
• High Availability
• Efficiency
• Durable
Chronos: Why Kafka
Kafka is a high-throughput distributed messaging system.
Design Principles:
• Highly Available: replicated, distributed
• High throughput: stateless broker
• Efficiency:
  • Disk efficiency: “Don’t fear the file system”. Modern OSs optimize sequential disk operations and disk caching strategy.
  • Use of the OS filesystem cache rather than an application-level cache: more efficient (no use of the GC) and survives an application restart.
  • I/O efficiency: batching reduces small I/O operations, which amortizes network round-trip overhead and enables larger sequential disk operations.
• Durable
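The effect of batching on I/O can be shown with simple arithmetic: grouping messages cuts the number of round trips roughly by the batch size. The helper names and message counts below are illustrative assumptions.

```python
import math
from typing import Iterator, List, Sequence


def io_operations(messages: int, batch_size: int) -> int:
    """Number of write or send operations needed for the given batch size."""
    return math.ceil(messages / batch_size)


def batches(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Split items into consecutive batches of at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


unbatched = io_operations(1000, 1)   # one operation per message
batched = io_operations(1000, 100)   # one operation per 100 messages
example = list(batches([1, 2, 3, 4, 5], 2))
```

Each batch pays the fixed per-operation overhead once, which is the amortization described above.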
  
Fates represents the long term memory of Omnia. It organizes the incidents that
Chronos collected into timelines and also elaborates new information as views by
using machine learning, logical reasoning and time series analysis.
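Organizing incidents into timelines amounts to grouping by an entity key and ordering by time. The sketch below assumes a flattened (customer, time, type) tuple and made-up data; how Fates actually models timelines in Cassandra is not shown in the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FlatIncident = Tuple[str, str, str]  # (customer, time, type), hypothetical shape


def build_timelines(incidents: List[FlatIncident]) -> Dict[str, List[str]]:
    """Group incidents per customer and order each group by time."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for customer, time, kind in incidents:
        grouped[customer].append((time, kind))
    return {
        customer: [kind for _time, kind in sorted(events)]
        for customer, events in grouped.items()
    }


incidents = [
    ("123", "06:05", "Deposit"),
    ("123", "06:00", "Login"),
    ("456", "06:01", "Login"),
    ("123", "06:10", "Bet placed"),
]
timelines = build_timelines(incidents)
```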
Fates: Batch layer
35
Customer: 123
Login
Deposit
Bet placed
…
Logout
Event: 78
Started
Fault
Penalty
…
Goal
Timelines & Views
Bets Deposits Events Session
Fates
Batch Layer
Fates: Batch layer
36
Timelines
Views
Jobs
Fates
Fates: Cassandra
Cassandra is the long term storage for our data.
• Highly Available (CAP)
• Linear Scalability
• Multi DC – Separation of Concerns (Production and Analytic DCs)
• High performance and optimal for WRITE operations
NeoCortex represents the short term memory of Omnia. It offers a framework to
develop micro services on top of Apache Spark. It performs fast and real time data
processing with the data acquired from Chronos and Fates.
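In a Lambda architecture, a query is typically answered by combining a precomputed batch view with a small real-time view covering data the batch layer has not yet processed. The sketch below shows that generic pattern with made-up counts; it is not a description of how NeoCortex and Fates are wired together, and `merged_view` is a hypothetical helper.

```python
from typing import Dict


def merged_view(batch_view: Dict[str, int], speed_view: Dict[str, int]) -> Dict[str, int]:
    """Combine batch totals with the increments seen since the last batch run."""
    result = dict(batch_view)
    for key, delta in speed_view.items():
        result[key] = result.get(key, 0) + delta
    return result


# Bets per customer computed by the last batch job (hypothetical numbers).
batch_view = {"alice": 17, "bob": 5}
# Bets that arrived after that job started, tracked by the speed layer.
speed_view = {"alice": 3, "carol": 1}

current = merged_view(batch_view, speed_view)
```

When the next batch run completes, its view absorbs those recent events and the speed view can be reset.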
NeoCortex: Speed layer
38
NeoCortex
Bets Deposits Events Session
Micro Services
Output
Hermes is a scalable, full-duplex communication layer for B2C and B2B.
Hermes: Serving Layer
39
B2C
Browser
B2B
Loadbalancer
Push
Server
Distribute
Cache
Push
Server
Push
Server
…
TCP
WS
HTTP
JSAPI
WH
Apps
Cache
Cache
Apps
Custom advert, bonus, data load prediction, bot detection...
Omnia Data Flow
40
Chronos
DataSource
NeoCortex
Speed Layer
Fates
Batch Layer
Hermes
ServingLayer
Input Output
Users become a new data producer
Real time monitoring and elasticity
Docker and Mesos: scale in and out based on demand
Omnia on Omnia
41
Chronos
DataSource
NeoCortex
Speed Layer
Fates
Batch Layer
Hermes
ServingLayer
Input Output
JMX
JMX
JMX
Omnia infrastructure
42
Omnia
Docker
Marathon
Mesos
Node Node NodeNodeNode
Thank you
careers.williamhillplc.com
omnia.williamhill.com/
typesafe.com/reactive-big-data

Más contenido relacionado

La actualidad más candente

Kafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringKafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringAnant Rustagi
 
Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayJacob Park
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...Simon Ambridge
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductEvans Ye
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackAnirvan Chakraborty
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016 Hiromitsu Komatsu
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
 
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion DubaiSMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion DubaiCodemotion Dubai
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lightbend
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1Joe Stein
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache sparkRahul Kumar
 
A Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionA Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionLightbend
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache KafkaExploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache KafkaLightbend
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Helena Edelson
 

La actualidad más candente (20)

Kafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringKafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroring
 
Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and Spray
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data Product
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stack
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion DubaiSMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
 
A Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionA Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In Production
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache KafkaExploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
 

Destacado

[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...
[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...
[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...Camp de Bases (Webedia Data Services)
 
Cassandra Day SV 2014: Basic Operations with Apache Cassandra
Cassandra Day SV 2014: Basic Operations with Apache CassandraCassandra Day SV 2014: Basic Operations with Apache Cassandra
Cassandra Day SV 2014: Basic Operations with Apache CassandraDataStax Academy
 
Cassandra Basics: Indexing
Cassandra Basics: IndexingCassandra Basics: Indexing
Cassandra Basics: IndexingBenjamin Black
 
The DMP 101 - Data Management Platforms Explained
The DMP 101 - Data Management Platforms ExplainedThe DMP 101 - Data Management Platforms Explained
The DMP 101 - Data Management Platforms ExplainedEddy Widerker
 
Achieving High Load in Advertising Technology
Achieving High Load in Advertising TechnologyAchieving High Load in Advertising Technology
Achieving High Load in Advertising TechnologyPeter Milne
 

Destacado (6)

[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...
[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...
[Infographie] Comment ameliorer la qualité de vos données pour votre DMP mark...
 
Cassandra Day SV 2014: Basic Operations with Apache Cassandra
Cassandra Day SV 2014: Basic Operations with Apache CassandraCassandra Day SV 2014: Basic Operations with Apache Cassandra
Cassandra Day SV 2014: Basic Operations with Apache Cassandra
 
What is a DMP
What is a DMPWhat is a DMP
What is a DMP
 
Cassandra Basics: Indexing
Cassandra Basics: IndexingCassandra Basics: Indexing
Cassandra Basics: Indexing
 
The DMP 101 - Data Management Platforms Explained
The DMP 101 - Data Management Platforms ExplainedThe DMP 101 - Data Management Platforms Explained
The DMP 101 - Data Management Platforms Explained
 
Achieving High Load in Advertising Technology
Achieving High Load in Advertising TechnologyAchieving High Load in Advertising Technology
Achieving High Load in Advertising Technology
 

Similar a Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data Troika

kafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdfkafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdfPriyamTomar1
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...GeeksLab Odessa
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analyticsconfluent
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For BeginnersRiby Varghese
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Kai Wähner
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Streaming Data with Apache Kafka
Streaming Data with Apache KafkaStreaming Data with Apache Kafka
Streaming Data with Apache KafkaMarkus Günther
 
Connecting kafka message systems with scylla
Connecting kafka message systems with scylla   Connecting kafka message systems with scylla
Connecting kafka message systems with scylla Maheedhar Gunturu
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaAttunity
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comCedric Vidal
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingPalani Kumar
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedEdureka!
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformJean-Paul Azar
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Timothy Spann
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 

Similar a Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data Troika (20)

kafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdfkafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdf
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For Beginners
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Streaming Data with Apache Kafka
Streaming Data with Apache KafkaStreaming Data with Apache Kafka
Streaming Data with Apache Kafka
 
Connecting kafka message systems with scylla
Connecting kafka message systems with scylla   Connecting kafka message systems with scylla
Connecting kafka message systems with scylla
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache Kafka
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_Computing
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 

Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data Troika

  • 1. Dean Wampler (Typesafe), Patrick Di Loreto (William Hill) Cassandra, Spark and Kafka: The Streaming Data Troika
  • 2. 2 About Typesafe Typesafe Reactive Platform • Akka, Play, and Spark, for Scala and Java. • typesafe.com/reactive-big-data
  • 4. 4 About Online Sportsbook and Gaming provider • Every day we push more than 5 million price changes • 160TB of data flowing through our platform each day
  • 5. We're Hiring https://careers.williamhill.com WH Apple Watch App Interactive Scoreboard Virtual Reality Horse Race Oculus Rift
  • 7. 7 Big Data Circa 2010 Generally two camps. One was the offline, batch-mode processing of massive data sets done with Hadoop.
  • 8. 8 Big Data Circa 2010 Akka The other was the online, real-time processing and storage of “transactional” data at scale, as exemplified by Cassandra for the data store and middleware tools and libraries like Akka, Spring, etc.
  • 9. 9 Big Data Circa 2010 Akka ? Two camps together with some overlap and connectivity, but not a lot.
  • 11. 11 Big Data Circa 2015 We still have this: Akka ? Five years later (this year), we still have these architectures in wide use, but…
  • 12. 12 Big Data Circa 2015 But now we have this: Big Data Streaming Mesos, EC2, or Bare A new, streaming-oriented architecture is emerging, which can also be used for batch mode analysis, if we process resident data sets as finite streams.
  • 13. Topic A General Principles • Spark Streaming: Analytics/aggregations • C*: Storage, queries • Kafka: durable message store; allows replay of messages lost downstream. Spark Streaming provides rich analytics. Need a durable system of record, like Kafka, which allows repeat reads in case of loss. See https://medium.com/@foundev/real-time-analytics-with-spark-streaming-and-cassandra-2f90d03342f7 for a nice summary of design patterns and tips.
  • 14. Mesos, EC2, or Bare Metal 14 Let’s explore this.
  • 15. Mesos, EC2, or Bare Metal 15 Cassandra remains the flexible, scalable datastore suitable for scalable ingesting of streaming data, such as event streams (e.g., click streams from web apps) and logs.
  • 16. Mesos, EC2, or Bare Metal 16 Kafka is growing popular as a tool for durable ingestion of diverse event streams with partitioning for scale and organization into topics (like a typical message queue) for downstream consumers.
  • 17. Service 1 Log & Other Files Internet Services Service 2 Service 3 Services Services N * M links Consumers Producers One use of Kafka is to solve the problem of N*M direct links between producers and consumers. This arrangement is hard to manage, and it couples services directly to each other, which is fragile when a given service needs to be scaled up through replication or replaced. Sometimes the coupling extends to the protocol that both ends need to speak.
  • 18. Service 1 Log & Other Files Internet Services Service 2 Service 3 Services Services N + M links Consumers Producers So Kafka can function as a central hub, yet it’s distributed and scalable so it isn’t a bottleneck or single point of failure.
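The difference between the two wiring schemes on slides 17 and 18 can be made concrete with a little arithmetic. The following is a minimal sketch; the function names and the sample sizes are illustrative and do not come from the slides.

```python
def direct_links(producers: int, consumers: int) -> int:
    """Point-to-point wiring: every producer connects to every consumer."""
    return producers * consumers


def hub_links(producers: int, consumers: int) -> int:
    """Hub wiring: each service holds a single connection to the central log."""
    return producers + consumers


if __name__ == "__main__":
    # Sample sizes are arbitrary, chosen only to show how the gap widens.
    for n, m in [(3, 3), (10, 20), (50, 100)]:
        print(f"N={n} M={m} direct={direct_links(n, m)} hub={hub_links(n, m)}")
```

With 10 producers and 20 consumers, direct wiring needs 200 links while the hub needs 30, and adding one more consumer costs 10 new links in the first scheme but only one in the second.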
  • 19. n+5 n+4 n+3 n+2 n+1 n Consumer 1 Producer 1 Producer 2 n+? n+? Consumer 2 Kafka Usage Topic A The message queue structure looks basically like this: different producers append messages to a topic, and different consumers read the messages in the queue at their own pace, in order.
  • 20. Kafka Resiliency Data loss downstream? Can replay lost messages. Could use C* for this, but then you’ve changed the read/write load (and hence tuning, scaling, etc. of your C* ring).
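The topic structure and the replay idea on slides 19 and 20 can be sketched with an in-memory model. This is a toy under stated assumptions, not Kafka's client API: `TopicLog`, `Consumer`, `poll` and `seek` are hypothetical names, and a plain Python list stands in for one partition of a topic.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy stand-in for one partition of a topic: an append-only list."""
    messages: list = field(default_factory=list)

    def append(self, message) -> int:
        """Append a message and return the offset it was stored at."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> list:
        """Return up to max_messages starting at offset; the log is not modified."""
        return self.messages[offset:offset + max_messages]


@dataclass
class Consumer:
    """Each consumer tracks its own position, so readers proceed independently."""
    log: TopicLog
    offset: int = 0

    def poll(self, max_messages: int = 10) -> list:
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move the read position, e.g. backwards to replay after downstream loss."""
        self.offset = offset


if __name__ == "__main__":
    topic = TopicLog()
    for i in range(6):
        topic.append(f"event-{i}")

    fast, slow = Consumer(topic), Consumer(topic)
    print(fast.poll())   # the fast consumer reads everything available
    print(slow.poll(2))  # the slow consumer reads only two messages
    fast.seek(3)         # suppose events 3..5 were lost downstream
    print(fast.poll())   # they can be read again because reads do not delete
```

Because reading never removes messages, a replay is just a change of offset, and it does not disturb the other consumer's position.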
  • 21. Mesos, EC2, or Bare Metal 21 The third element of the “troika” is Spark, the next generation, scalable compute engine that is replacing MapReduce in Hadoop. However, Spark is flexible enough to run in many cluster configurations, including a local mode for development, a simple standalone cluster mode for simple scenarios, Mesos for general scalability and flexibility, and integrated with Cassandra itself.
  • 22. Topic A Spark Streaming Dos/Don’ts Do • Use for rich analytics and aggregations. • Use with Kafka/C* source if data loss not tolerable. Or, use the write ahead log (WAL) - less optimal. Spark Streaming offers rich analytics, even SQL, machine learning, and graph representations. It’s a more complex engine, so there is more “room” for data loss. Hence, use Kafka or C* for durability and replay capabilities, but if you do ingest data directly from other sources without replay capability, at least use the WAL.
  • 23. Topic A Spark Streaming Don’ts Don’t • Use for counting (use C*). • Low-latency, per-event processing. C* is faster and more accurate for counting, because repeat execution of Spark tasks (for error recovery, speculative execution, etc.) will cause over-counting (e.g., using the “aggregator” feature). Also, Spark is a mini-batch system, for processing time slices of events (down to ~1 sec.). If you need low-latency and/or per-event processing, use Akka…
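The over-counting problem mentioned on slide 23 can be illustrated generically. The sketch below is an assumption-laden toy, not a description of Spark's or Cassandra's internals: `run_with_retry` simulates a scheduler that re-executes a task after its side effects were already applied, and the two counter classes are hypothetical.

```python
def run_with_retry(task, batch, fail_first_attempt: bool) -> None:
    """Simulate re-execution: the first attempt applies its side effects,
    is then treated as failed, and the task runs again on the same batch."""
    task(batch)
    if fail_first_attempt:
        task(batch)


class IncrementCounter:
    """Non-idempotent: every execution adds to the running total."""

    def __init__(self):
        self.total = 0

    def apply(self, batch):
        self.total += len(batch)


class IdempotentCounter:
    """Idempotent: events are stored under a unique id, so a replay overwrites."""

    def __init__(self):
        self.seen = {}

    def apply(self, batch):
        for event_id, payload in batch:
            self.seen[event_id] = payload

    @property
    def total(self):
        return len(self.seen)


if __name__ == "__main__":
    batch = [("e1", "bet"), ("e2", "bet"), ("e3", "deposit")]
    naive, keyed = IncrementCounter(), IdempotentCounter()
    run_with_retry(naive.apply, batch, fail_first_attempt=True)
    run_with_retry(keyed.apply, batch, fail_first_attempt=True)
    print(naive.total, keyed.total)  # 6 3
```

The incrementing counter reports six events for a batch of three, while the keyed version stays at three however many times the batch is re-applied.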
  • 24. Mesos, EC2, or Bare Metal 24 Other parts of complete infrastructure include a distributed file system like CSFv2, when you don’t need a full database, e.g., for logs that you’ll dump into the file system and then process in batches later on with Spark.
  • 25. Mesos, EC2, or Bare Metal 25 Typesafe Reactive Platform provides infrastructure tools for integrating these and other components, including Akka Streams for resilient, low-latency event processing (based on the Reactive Streams standard for streams with dynamic back pressure), ConductR for orchestrating services, and Play for web services and consoles.
  • 26. Topic A Typesafe Reactive Platform • Akka Streams: low-latency, per-event processing. • ConductR for orchestrating services. • Play for web services, consoles. • … and commercial Spark support. Akka Streams implements the Reactive Streams standard for streams with dynamic back pressure. It sits on top of the more general Akka Actor framework for highly distributed concurrent applications. Typesafe offers commercial support for development teams developing advanced Spark applications. We offer production runtime support for Spark running on Mesos clusters.
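The back pressure idea behind Reactive Streams is that the subscriber signals how many items it is ready for and the publisher never sends more than that. The following is a synchronous, single-threaded sketch of that demand signal only; it is not Akka Streams code, real implementations are asynchronous, and the class names and `run` loop are hypothetical.

```python
class Publisher:
    """Emits items only in response to demand signalled by the subscriber."""

    def __init__(self, items):
        self._items = iter(items)
        self._subscriber = None

    def subscribe(self, subscriber):
        self._subscriber = subscriber
        subscriber.on_subscribe(self)

    def request(self, n: int):
        """Deliver at most n further items, then wait for more demand."""
        for _ in range(n):
            try:
                item = next(self._items)
            except StopIteration:
                self._subscriber.on_complete()
                return
            self._subscriber.on_next(item)


class SlowSubscriber:
    """Asks for small batches so it never holds more than batch_size items."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.received = []
        self.max_in_flight = 0
        self.done = False
        self._in_flight = 0
        self._publisher = None

    def on_subscribe(self, publisher):
        self._publisher = publisher

    def on_next(self, item):
        self._in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self._in_flight)
        self.received.append(item)

    def on_complete(self):
        self.done = True

    def run(self):
        while not self.done:
            self._publisher.request(self.batch_size)
            self._in_flight = 0  # pretend the batch has now been processed


if __name__ == "__main__":
    subscriber = SlowSubscriber(batch_size=3)
    Publisher(range(10)).subscribe(subscriber)
    subscriber.run()
    print(subscriber.received, subscriber.max_in_flight)
```

The publisher holds ten items but the subscriber never has more than three outstanding, which is the property that keeps a slow consumer from being overwhelmed.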
  • 27. Mesos, EC2, or Bare Metal 27 Finally, there’s a wealth of cluster systems possible. You could deploy these tools on the servers for your Cassandra ring, which has an excellent integration with Spark. You can run in EC2 or on bare metal. You can use a general-purpose cluster management system like Mesos.
  • 28. Presented by Patrick Di Loreto, R&D Engineering Lead. Site: https://developer.williamhill.com Twitter: https://twitter.com/patricknoir OMNIA: Distributed & Reactive platform for data management

  • 29. Motivations: Need for control over the data. Users Feeds System 3rd Party Social Networks IoT William Hill. In order to be in a position to innovate we need to control and understand our data.
  • 30. What is Omnia? DMP (data management platform) based on the Lambda architecture and the Reactive principles. Chronos Data Source NeoCortex Speed Layer Fates Batch Layer Hermes Serving Layer Data Flow Input Output
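The Lambda architecture splits work between a batch layer that recomputes views from the full history, a speed layer that covers recent data, and a serving layer that merges the two. A minimal sketch of that split follows; the event fields, the cutoff convention and the function names are assumptions made for illustration, not details of Omnia.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: recompute totals from the full history up to the cutoff."""
    return Counter(e["customer"] for e in events if e["time"] <= cutoff)


def speed_view(events, cutoff):
    """Speed layer: cover only events that arrived after the last batch run."""
    return Counter(e["customer"] for e in events if e["time"] > cutoff)


def serve(batch, speed):
    """Serving layer: answer queries by merging the two views."""
    return batch + speed


if __name__ == "__main__":
    events = [
        {"customer": "123", "time": 1},
        {"customer": "123", "time": 2},
        {"customer": "456", "time": 3},
        {"customer": "123", "time": 5},  # arrived after the batch cutoff
    ]
    cutoff = 3
    merged = serve(batch_view(events, cutoff), speed_view(events, cutoff))
    print(dict(merged))
```

In Omnia's terms, Fates plays the batch role, NeoCortex the speed role and Hermes the serving role; the merge step here is only one possible way the serving side could combine them.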
  • 31. Reactive principles: Responsive, Resilient, Message Driven, Elastic. The Reactive Manifesto http://www.reactivemanifesto.org/
  • 32. Chronos: Data acquisition. Chronos is a reliable and scalable component which collects data from different sources and organizes it into Streams of observable events. Incident: { type: "bet", version: "1.0", time: "2015-09-03 06:00:10", acquisitionTime: "2015-09-03 06:00:06", source: "BetSystem", payload: {… any valid JSON} } Chronos DataSource TCP HTTP WS … JMS HTTP Poll SSE Adapter Streams Converter Persistence Bets Deposits Prices Stream = Adapter + Converter + Persistence
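The formula "Stream = Adapter + Converter + Persistence" can be read as a three-stage pipeline around the incident envelope shown on the slide. The sketch below follows the envelope's field names; everything else (`to_incident`, `run_stream`, the shape of the raw record, and a Python list standing in for the persistence stage) is a hypothetical illustration rather than Chronos code.

```python
import json
from datetime import datetime


def to_incident(raw: dict, source: str, event_type: str,
                acquired_at: datetime, version: str = "1.0") -> dict:
    """Converter: wrap a raw source record in the incident envelope."""
    return {
        "type": event_type,
        "version": version,
        "time": raw["time"],
        "acquisitionTime": acquired_at.strftime("%Y-%m-%d %H:%M:%S"),
        "source": source,
        "payload": raw["payload"],  # any valid JSON value
    }


def run_stream(adapter, converter, persist) -> int:
    """Stream = Adapter + Converter + Persistence.

    adapter:   any iterable of raw records (assumed already decoded to dicts)
    converter: maps a raw record to an incident
    persist:   callable that stores the serialized incident
    """
    count = 0
    for raw in adapter:
        persist(json.dumps(converter(raw)))
        count += 1
    return count


if __name__ == "__main__":
    raw_bets = [{"time": "2015-09-03 06:00:10", "payload": {"stake": 5}}]
    stored = []  # stand-in for the durable store
    run_stream(
        raw_bets,
        lambda r: to_incident(r, "BetSystem", "bet", datetime(2015, 9, 3, 6, 0, 12)),
        stored.append,
    )
    print(stored[0])
```

Keeping the three stages as separate callables means a new source only needs a new adapter, while the envelope and the persistence step stay the same.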
  • 33. Chronos: Data acquisition. Chronos 1 (SSE, Bets placed) Chronos 2 (JMS, Deposits) Chronos 3 (HTTP, Events) Chronos N (SSE, Twitter) … Chronos 2 (JMS, Deposits) (SSE, Bet Placed)
  • 34. Chronos: Why Kafka. High throughput distributed messaging system • High Availability • Efficiency • Durable. Kafka is a high-throughput distributed messaging system. Design principles: Highly available: replicated, distributed. High throughput: stateless broker. Efficiency: Disk efficiency: “Don’t fear the file system”; modern OSs optimize sequential disk operations and disk caching. Using the OS filesystem cache rather than an application-level cache is more efficient (no GC involvement) and survives an application restart. I/O efficiency: batching reduces small I/O operations, which amortizes network round-trip overhead and enables larger sequential disk operations. Durable.
  • 35. Fates: Batch layer. Fates represents the long-term memory of Omnia. It organizes the incidents that Chronos collected into timelines and also derives new information as views by using machine learning, logical reasoning and time series analysis. Customer: 123 Login Deposit Bet placed … Logout Event: 78 Started Fault Penalty … Goal Timelines & Views Bets Deposits Events Session Fates Batch Layer
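A timeline, as described on this slide, is the ordered sequence of incidents for one entity such as a customer or a sporting event. A minimal sketch of building timelines from a flat list of incidents follows; the `customer` field and the function name are assumptions for illustration, and the timestamps use the same string format as the incident example so that they sort correctly as text.

```python
from collections import defaultdict


def build_timelines(incidents, key):
    """Group incidents by the given key and order each group by time."""
    timelines = defaultdict(list)
    for incident in incidents:
        timelines[incident[key]].append(incident)
    for events in timelines.values():
        # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
        events.sort(key=lambda i: i["time"])
    return dict(timelines)


if __name__ == "__main__":
    incidents = [
        {"customer": "123", "type": "bet", "time": "2015-09-03 06:05:00"},
        {"customer": "123", "type": "login", "time": "2015-09-03 06:00:00"},
        {"customer": "456", "type": "login", "time": "2015-09-03 06:01:00"},
        {"customer": "123", "type": "deposit", "time": "2015-09-03 06:02:00"},
    ]
    for customer, events in build_timelines(incidents, "customer").items():
        print(customer, [e["type"] for e in events])
```

For customer 123 this yields login, deposit, bet in that order, regardless of the order in which the incidents were collected.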
  • 36. Fates: Batch layer. Timelines Views Jobs Fates
  • 37. Fates: Cassandra Cassandra is the long term storage for our data. • Highly Available (CAP) • Linear Scalability • Multi DC – Separation of Concerns (Production and Analytic DCs) • High performance and optimal for WRITE operations
  • 38. NeoCortex: Speed layer. NeoCortex represents the short-term memory of Omnia. It offers a framework to develop microservices on top of Apache Spark. It performs fast, real-time processing of the data acquired from Chronos and Fates. NeoCortex Bets Deposits Events Session Micro Services Output
  • 39. Hermes: Serving Layer. Hermes is a scalable, full-duplex communication layer for B2C and B2B. B2C Browser B2B Load balancer Push Server Distributed Cache Push Server Push Server … TCP WS HTTP JSAPI WH Apps Cache Cache Apps
  • 40. Omnia Data Flow. Custom advert, bonus, data load prediction, bot detection... Chronos DataSource NeoCortex Speed Layer Fates Batch Layer Hermes Serving Layer Input Output. Users become a new data producer.
  • 41. Omnia on Omnia. Real-time monitoring and elasticity. Docker and Mesos: scale in and out based on demand. Chronos DataSource NeoCortex Speed Layer Fates Batch Layer Hermes Serving Layer Input Output JMX JMX JMX
  • 42. Omnia infrastructure. Omnia Docker Marathon Mesos Node Node Node Node Node