Cassandra &
Next Generation Analysis
Cassandra for a high-velocity data
ingestion and real-time analysis system.
Ameet Chaubal & Fausto Inestroza
Presentation Route
•  Describe conventional technology solution
•  Highlight deficiencies
•  Showcase new solution implemented using Cassandra
•  Lay out architecture with improvements
  
Business Case
•  Capture messages from a high-volume e-Commerce site.
•  Store them in a database.
•  Perform near real-time queries for troubleshooting.
•  Perform deeper analysis à la BI.
Olden Days …
[Diagram: eCommerce Website → JMS Queue → Transient Storage (RDBMS) → Data warehouse → Analysis]
Business Case, Details…
Messages: 5000 msg/sec
~ 250 million / day
Message size : 1 Kb
[Diagram: eCommerce Website, JMS Queue, Transient Storage (RDBMS), Data warehouse. Annotations: decouple UI from storage; multiple sinks; dedicated storage for triage; data analysis and business intelligence]
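A quick back-of-the-envelope check of these figures, as a sketch only. It assumes "1 Kb" means 1,024 bytes and that 5,000 msg/sec is a peak rate. The slide does not say which, but a sustained 5,000 msg/sec would come to about 432 million messages per day, so the ~250 million/day figure implies a lower average rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60

PEAK_RATE = 5_000            # msg/sec, from the slide
DAILY_VOLUME = 250_000_000   # msg/day, from the slide
MESSAGE_BYTES = 1_024        # assumption: "1 Kb" read as 1,024 bytes


def summarize(peak_rate, daily_volume, message_bytes):
    """Derive bandwidth and volume figures from the slide's numbers."""
    return {
        # Write bandwidth at the peak message rate, in MB/s.
        "peak_mb_per_sec": peak_rate * message_bytes / 1_000_000,
        # Average message rate implied by the daily volume.
        "avg_rate": daily_volume / SECONDS_PER_DAY,
        # Raw payload stored per day, in GB (before replication).
        "daily_gb": daily_volume * message_bytes / 1_000_000_000,
        # Daily volume if the peak rate were sustained all day.
        "daily_volume_at_peak": peak_rate * SECONDS_PER_DAY,
    }


stats = summarize(PEAK_RATE, DAILY_VOLUME, MESSAGE_BYTES)
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

The raw payload works out to roughly a quarter of a terabyte per day before any replication factor is applied.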
What’s the problem?
[Diagram: JMS Queues at Site I and Site II; batch load from transient storage into the data warehouse]
•  Queue replication problems
•  Message loss
•  Other applications affected in case of failover
•  Triage data isolated
•  No universal view
•  Data consolidation adds delay
•  Inability to keep up with increasing messages
•  Analysis always lagging the action
•  No low-latency queries
Problems Recap
•  Over 5,000 msg/sec: high write speed
•  Extraction & load very slow: ETL from transient storage to data warehouse takes over 4 hours
•  Analysis always lags events by hours: ETL performed in batches 4 hours apart
•  No high availability: no geo-redundancy for transient storage
•  Data stored in disparate buckets: no universal view of data for “Triage” applications/troubleshooting
•  No dashboard: no low-latency queries
•  No immediate alert, pattern detection: no real-time analysis
  
Solution Blueprint
[Diagram with callouts A1 to A10 and A12. Components shown: Online e-Commerce Application; JMS Publisher; Event JMS queue ("Write event to queue", "Fetch from queue"); Consumers (Hector / Java Client 1 to n); Thrift Connection Pool; Load Balancer VIP; Cassandra; Replication; Cassandra + Hadoop; Map/Reduce; Hive Queries/BI; Real-Time Dashboard]
Role of Data Model
Before we get there: what features are missing from Cassandra in comparison to a traditional RDBMS?
Shortcomings… Opportunities
•  No Joins across Column Families
•  No analytical functions such as sum, count…
•  Difficulty in constructing “WHERE” clause predicates across composite columns
•  Inability to order a range of keys with the Random Partitioner
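Why key ranges lose their order under the Random Partitioner can be illustrated with a small sketch. It assumes tokens are derived roughly the way RandomPartitioner does it, as the absolute value of the key's MD5 digest read as a signed integer. The minute-style keys are made up for the example.

```python
import hashlib


def random_partitioner_token(key: str) -> int:
    """MD5 of the key, read as a signed big-endian integer, absolute value."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


# Sixty consecutive, lexically ordered keys (hypothetical minute buckets).
keys = [f"2012-07-18-08-{minute:02d}" for minute in range(60)]

# Order in which the keys would sit on the token ring.
ring_order = sorted(keys, key=random_partitioner_token)

print("key order :", keys[:4])
print("ring order:", ring_order[:4])
```

Because neighbouring keys hash to unrelated tokens, a "range of keys" is not contiguous on the ring, which is why the data model later relies on keys that can be constructed by the client rather than scanned in order.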
Importance of Data model - Cassandra
•  In lieu of JOINS, “smart” de-normalization techniques are crucial.
•  Need to use “FEATURES” of Cassandra to effectively model the business rules and business data.
•  “Client” or “Application” code becomes extremely important.
•  “APPLICATION” + “DATABASE” => Full Package
Features of Cassandra Modeling
•  “WIDE” Column Family
–  Organize data in “horizontal” as opposed to “vertical” fashion as in RDBMS
•  Automatic Sorting of Columns
–  Important to “MODEL” the data in “COLUMNS” as opposed to rows.
•  Faster Access to ALL COLUMNS of a Row Key
–  All columns of a row key stored on ONE server =>fast iteration/aggregations
•  Useful info in “COLUMN NAME”
–  Ground breaking from RDBMS perspective
–  Enables “MORE” “INFORMATION” to be PACKED
–  “COLUMN” as entity becomes “MORE POWERFUL”.
•  COMPOSITE Column NAMES:
–  Column names can be COMPOSITES !!! Made up of multiple columns
–  Auto sorting still works
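The features listed here can be pictured with a small in-memory model. This is only a sketch of the idea, not Cassandra or Hector code: the `WideRow` class, the column layout (data center, timestamp, user) and the sample values are all hypothetical. It shows one row key holding many columns that stay sorted by composite name, so a slice over a name range is cheap.

```python
import bisect


class WideRow:
    """One row key holding columns kept sorted by composite column name."""

    def __init__(self):
        self._names = []    # sorted composite column names (tuples)
        self._values = {}   # column name -> column value

    def insert(self, name, value):
        # Keep names sorted on insert, mimicking automatic column sorting.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return columns with start <= name < finish, in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_left(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = WideRow()
row.insert(("DC2", 1005, "bob"), "checkout failed")
row.insert(("DC1", 1002, "alice"), "login ok")
row.insert(("DC1", 1001, "carol"), "cart updated")
row.insert(("DC1", 1009, "dave"), "payment ok")

# All columns for DC1, already in timestamp order.
dc1_columns = row.slice(("DC1",), ("DC2",))
for name, value in dc1_columns:
    print(name, value)
```

A prefix such as `("DC1",)` sorts before every composite that starts with `"DC1"`, which is what makes prefix-based range slices work in this model.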
Data Model
Wide rows with sharding
Row Key = “<min>|<part#>”
Role of partition #:
•  Each row is stored by a single server. At 5,000 x 60 = 300,000 events per minute, a single row per minute would put a large load on one server for that minute.
•  A “partition” contraption aims to “break” this huge row, remove hotspots and spread the load to possibly all servers.
•  The # of partitions is some multiple of the # of servers.
•  A finite # of partitions still keeps the row key meaningful, i.e. we can construct the keys for a certain minute and fetch records for them.
Composite Columns
•  Composite Columns:
–  Actual message stored as part of composite column
•  Variable granularity grouping
–  Minute: Row key based on minute
Min_partition (TEXT)    | DC:TimeUUID:UserID:Message (Composite) | …
2012-07-18-08-13-p-1    | Status                                 |
…                       | …                                      |
2012-07-19-11-21-p-3    | Status                                 |
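The composite column name DC:TimeUUID:UserID:Message can be sketched as follows. This is an illustration in plain Python rather than Hector code: `time_uuid` builds a version 1 (time-based) UUID from a millisecond timestamp using only the standard `uuid` module, and the zero clock sequence and node id are placeholder assumptions.

```python
import uuid
from datetime import datetime, timezone

# 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def time_uuid(unix_millis: int, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Build a version 1 UUID whose timestamp encodes unix_millis."""
    ts = unix_millis * 10_000 + UUID_EPOCH_OFFSET
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000        # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def unix_millis_of(u: uuid.UUID) -> int:
    """Recover the millisecond timestamp from a time-based UUID."""
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


def composite_column(dc: str, unix_millis: int, user_id: str, message: str):
    """Composite column name in the DC:TimeUUID:UserID:Message layout."""
    return (dc, time_uuid(unix_millis), user_id, message)


event_time = datetime(2012, 7, 18, 8, 13, 45, tzinfo=timezone.utc)
event_millis = int(event_time.timestamp() * 1000)
column = composite_column("DC1", event_millis, "user42", "Status")
print(column)
print(unix_millis_of(column[1]) == event_millis)
```

Note that Python compares `uuid.UUID` objects by their integer value, which is not time order; to order columns by time in this sketch, sort on the UUID's `.time` attribute.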
  
Benefits
Geo-Redundancy
[Diagram: four data centers. Data Center 1 (RW), Data Center 2 (RW), Data Center 3 (RO), Data Center 4 (RO)]
Data Consolidation and Extraction
•  Single view of data across multiple locations
•  Data extraction can be performed in parallel
•  Data extraction process performed in a dedicated cluster of machines.
Low-Latency & Batch Applications
•  Triaging
–  Troubleshooting customer issues within 10 minutes of occurrence
–  Feeding a dashboard of live feed data through aggregations performed in Counter CFs
•  Analysis
–  Analytical and ad hoc queries to eventually replace the need for a remote data warehouse
–  Map/Reduce via Hive without ETL
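The dashboard aggregation can be sketched with an in-memory stand-in for a counter column family. This assumes one counter per (minute, status) pair; the status values and function names are hypothetical, and a real deployment would increment Cassandra counter columns instead of a Python `Counter`.

```python
from collections import Counter
from datetime import datetime


def record_event(counts: Counter, ts: datetime, status: str) -> None:
    """Increment the counter for this event's minute and status."""
    minute = ts.strftime("%Y-%m-%d-%H-%M")
    counts[(minute, status)] += 1


def dashboard_view(counts: Counter, minute: str) -> dict:
    """Read back all counters for one minute, as a dashboard would."""
    return {status: n for (m, status), n in counts.items() if m == minute}


minute_counts = Counter()
record_event(minute_counts, datetime(2012, 7, 18, 8, 13, 5), "OK")
record_event(minute_counts, datetime(2012, 7, 18, 8, 13, 20), "OK")
record_event(minute_counts, datetime(2012, 7, 18, 8, 13, 41), "ERROR")
record_event(minute_counts, datetime(2012, 7, 18, 8, 14, 2), "OK")

print(dashboard_view(minute_counts, "2012-07-18-08-13"))
```

Aggregating at write time keeps dashboard reads to a handful of counters per minute rather than a scan over the raw events.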
Opportunities Remaining
•  Near real-time pattern detection and response
•  Message loss in JMS queue
•  JMS queue replication
•  Reducing the impact of queue failover on other applications
Further Improvements…
HOW ???
Accenture Cloud Platform
•  Recommender as a Service
•  …
•  Network Analytics Services
  
Big Data Platform
Drivers
consumer devices
video usage
Issues
Operational Costs
Understanding service quality degradation
Inefficient capacity planning
[Diagram: INGEST, PROCESS, VISUALIZE, ANALYZE, STORE]
  
WHY STORM?
What do we need?
•  Scalability
•  Reliability
•  Fault-tolerance
•  Processing, computation, etc.
Driven by: data types, size, velocity; mission critical data; time series / pattern analysis; multiple use cases
How do we get this from Storm?
•  Parallelization
•  Processing guarantees
•  Robust fail-over strategies
•  Low-level primitives
PRIMITIVES
•  Stream: sequence of tuples carrying request info (IP, user-agent, etc.)
•  Spout: pulls messages from a distributed queue
•  Bolt: sessionization, speed calculation
•  Topology: suboptimal network speed, geospatial analysis
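The four primitives can be mimicked with Python generators to show how they fit together. This is a conceptual sketch only and does not use the Storm API: the message format ("ip,bytes,millis") and all function names are assumptions made for the example.

```python
def queue_spout(messages):
    """Spout: emits one tuple per raw message pulled from a queue (a list here)."""
    for raw in messages:
        yield raw


def parse_bolt(stream):
    """Bolt: turns 'ip,bytes,millis' strings into (ip, bytes, millis) tuples."""
    for raw in stream:
        ip, size, millis = raw.split(",")
        yield (ip, int(size), int(millis))


def speed_bolt(stream):
    """Bolt: computes transfer speed per tuple.

    Bytes per millisecond equals KB/s when 1 KB is taken as 1,000 bytes.
    """
    for ip, size, millis in stream:
        yield (ip, size / millis)


def run_topology(messages):
    """Topology: wires the spout and bolts into one pipeline."""
    return list(speed_bolt(parse_bolt(queue_spout(messages))))


sample = ["10.0.0.1,500000,1000", "10.0.0.2,20000,2000"]
print(run_topology(sample))
```

In Storm the same wiring is declared once and then executed continuously across a cluster; the generator chain only shows the data flow from spout through bolts.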
  
Integration with Cassandra
Cassandra
•  Optimal for time series data
•  Near-linear scalability
•  Low read/write latency
•  Scales in conjunction with Storm
Custom Bolt
•  Uses Hector API to access Cassandra
•  Creates dynamic columns per request
•  Stores relevant network data
SUBOPTIMAL NETWORK SPEED TOPOLOGY
AN EXAMPLE
[Diagram: Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Sub-Optimal Speed → Store in Cassandra → Cassandra. Tuples for IP 1 flow through each stage.]
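The stages of the example topology can be sketched as plain functions. All specifics here are assumptions for illustration: the request layout (ip, timestamp in ms, bytes, duration in ms), the 30-minute session gap, the 100 KB/s threshold and the averaging of session speeds per IP are not taken from the deck.

```python
from collections import defaultdict

SESSION_GAP_MS = 30 * 60 * 1000   # assumption: 30 idle minutes end a session
MIN_KBPS = 100.0                  # assumption: hypothetical "sub-optimal" threshold


def sessionize(requests):
    """Group (ip, ts_ms, bytes, duration_ms) requests into per-IP sessions."""
    sessions = defaultdict(list)   # ip -> list of sessions (lists of requests)
    for req in sorted(requests, key=lambda r: (r[0], r[1])):
        ip, ts = req[0], req[1]
        ip_sessions = sessions[ip]
        if ip_sessions and ts - ip_sessions[-1][-1][1] <= SESSION_GAP_MS:
            ip_sessions[-1].append(req)   # continues the current session
        else:
            ip_sessions.append([req])     # starts a new session
    return sessions


def session_speed(session):
    """Speed of one session in KB/s (bytes per ms, with 1 KB = 1,000 bytes)."""
    total_bytes = sum(r[2] for r in session)
    total_ms = sum(r[3] for r in session)
    return total_bytes / total_ms


def speed_per_ip(sessions):
    """Average of the session speeds for each IP."""
    return {ip: sum(map(session_speed, s)) / len(s) for ip, s in sessions.items()}


def suboptimal(speeds):
    """IPs whose speed falls below the threshold."""
    return {ip: kbps for ip, kbps in speeds.items() if kbps < MIN_KBPS}


requests = [
    ("10.0.0.1", 0, 500_000, 1_000),
    ("10.0.0.1", 60_000, 300_000, 1_000),
    ("10.0.0.2", 0, 20_000, 2_000),
    ("10.0.0.2", 4_000_000, 30_000, 1_000),
]
speeds = speed_per_ip(sessionize(requests))
print(speeds)
print(suboptimal(speeds))
```

In the topology each function would be its own bolt, with the final result written to Cassandra by the storage bolt.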
  
Parallelism
[Diagram: the same topology (Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Sub-Optimal Speed → Store in Cassandra → Cassandra), with tuples for IP 1 and IP 2 flowing through the stages in parallel.]
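When a bolt keeps per-IP state (sessions, running speed), parallel instances only work if every tuple for a given IP reaches the same instance. In Storm this is typically done with a fields grouping on the IP field; whether the authors used one is an assumption. The sketch below imitates that routing with a CRC32 hash, which is an illustrative choice and not Storm's own hash function.

```python
import zlib


def task_for(ip: str, num_tasks: int) -> int:
    """Choose a task index deterministically from the grouping field."""
    return zlib.crc32(ip.encode("utf-8")) % num_tasks


def route(tuples, num_tasks):
    """Distribute tuples over tasks so equal IPs always share a task."""
    tasks = [[] for _ in range(num_tasks)]
    for tup in tuples:
        tasks[task_for(tup[0], num_tasks)].append(tup)
    return tasks


stream = [("ip1", 1), ("ip2", 2), ("ip1", 3), ("ip2", 4), ("ip3", 5)]
for index, assigned in enumerate(route(stream, 4)):
    print(index, assigned)
```

The trade-off is that a very active IP still lands on a single task, so skewed traffic can unbalance the bolt instances.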
  
Branching and Joins
[Diagram: Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Join → Compare Speed → Store in Cassandra → Cassandra, with a Speed by Location stage also feeding the Join. Streams are labeled Stream 1 and Stream 2; sample tuples are (ip 1), (ip 1/NY) and (NY).]
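The branch-and-join idea on this slide can be sketched as follows: one branch carries per-IP speeds tagged with a location, the other aggregates speed by location, and the join compares each IP against its location's average. The tuple layout, the 0.5 ratio and the sample values are hypothetical.

```python
from collections import defaultdict


def speed_by_location(ip_speeds):
    """Second branch: average speed per location from (ip, location, kbps)."""
    totals = defaultdict(lambda: [0.0, 0])   # location -> [sum, count]
    for _ip, loc, kbps in ip_speeds:
        totals[loc][0] += kbps
        totals[loc][1] += 1
    return {loc: total / count for loc, (total, count) in totals.items()}


def join_and_compare(ip_speeds, ratio=0.5):
    """Join each IP tuple with its location average; keep IPs well below it."""
    by_loc = speed_by_location(ip_speeds)
    return [
        (ip, loc, kbps, by_loc[loc])
        for ip, loc, kbps in ip_speeds
        if kbps < ratio * by_loc[loc]
    ]


ip_speeds = [
    ("ip1", "NY", 400.0),
    ("ip2", "NY", 100.0),
    ("ip3", "NY", 400.0),
    ("ip4", "SF", 50.0),
]
print(speed_by_location(ip_speeds))
print(join_and_compare(ip_speeds))
```

Comparing against a per-location baseline rather than a global threshold avoids flagging every user in a region that is slow as a whole.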
  
Lessons Learned
•  Rebalance Topology
•  Tweak parallelism in bolt
•  Isolation of Topologies
•  Use TimeUUIDUtils
•  Log4j level set to INFO by default
Thank You
Q & A

Más contenido relacionado

La actualidad más candente

Hadoop Networking at Datasift
Hadoop Networking at DatasiftHadoop Networking at Datasift
Hadoop Networking at Datasifthuguk
 
Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Jay Patel
 
20150627 bigdatala
20150627 bigdatala20150627 bigdatala
20150627 bigdatalagethue
 
Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Dave Gardner
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeMichael Stack
 
Streaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka StreamsStreaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka Streamsvito jeng
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon RedshiftKel Graham
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsScyllaDB
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiMichael Stack
 
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationDataStax Academy
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Lviv Startup Club
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightHBaseCon
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexasArvind Prabhakar
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsKeeyong Han
 
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack Clustrix
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data Omid Vahdaty
 

La actualidad más candente (20)

Hadoop Networking at Datasift
Hadoop Networking at DatasiftHadoop Networking at Datasift
Hadoop Networking at Datasift
 
Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013
 
20150627 bigdatala
20150627 bigdatala20150627 bigdatala
20150627 bigdatala
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
 
Streaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka StreamsStreaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka Streams
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS Options
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at Xiaomi
 
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data
 

Similar a C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, Now & Future by Ameet Chaubal and Fausto Inestroza

Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike, Inc.
 
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...DataStax Academy
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsVoltDB
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)Amazon Web Services Korea
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsAsis Mohanty
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyershuguk
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...DataStax Academy
 
Real-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
Real-Time Streaming: Move IMS Data to Your Cloud Data WarehouseReal-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
Real-Time Streaming: Move IMS Data to Your Cloud Data WarehousePrecisely
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 

Similar a C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, Now & Future by Ameet Chaubal and Fausto Inestroza (20)

Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory Architecture
 
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
Cassandra Summit 2014: Cassandra Compute Cloud: An elastic Cassandra Infrastr...
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Real-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
Real-Time Streaming: Move IMS Data to Your Cloud Data WarehouseReal-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
Real-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 

Más de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Más de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 


C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, Now & Future by Ameet Chaubal and Fausto Inestroza

  • 1. Cassandra & Next Generation Analysis Cassandra for a high-velocity data ingestion and real-time analysis system. Ameet Chaubal & Fausto Inestroza
  • 2. Presentation Route • Describe conventional technology solution • Highlight deficiencies • Showcase new solution implemented using Cassandra • Lay out architecture with improvements
  • 3. Business Case • Capture messages from high-volume e-Commerce site. • Store them into a database • Perform near real-time queries for troubleshooting • Perform deeper analysis a la BI.
  • 4. Olden Days … [Diagram labels: eCommerce Website; JMS Queue; Transient Storage (RDBMS); Data warehouse; Analysis]
  • 5. Business Case, Details… Messages: 5000 msg/sec, ~ 250 million / day. Message size: 1 KB. [Diagram labels: eCommerce Website; JMS Queue (decouple UI from storage, multiple sinks); Transient Storage, RDBMS (dedicated storage, triage); Data warehouse (data analysis, business intelligence)]
  • 6. What’s the problem? [Diagram labels: SITE I; SITE II; JMS Queue; Transient storage; Batch Load; Data warehouse] • Queue replication problems • Message loss • Other applications affected in case of failover • Triage data isolated • No universal view • Data consolidation adds delay • Inability to keep up with increasing messages • Analysis always lagging the action • No low-latency queries
  • 7. Problems Recap • Over 5000 msg/sec: high write speed • Extraction & load very slow: ETL from transient storage to data warehouse takes over 4 hours • Analysis always lags events by hours: ETL performed in batches 4 hours apart • No high availability: no geo-redundancy for transient storage • Data stored in disparate buckets: no universal view of data for “Triage” applications/troubleshooting • No dashboard: no low-latency queries • No immediate alert, pattern detection: no real-time analysis
  • 8. Solution Blueprint [Diagram labels: Online e-Commerce Application; Event; JMS Publisher; JMS; Write event to queue; Fetch from queue; Consumers (Hector / Java Client 1 … n); Thrift Connection Pool; Load Balancer VIP; Cassandra; Replication; Cassandra + Hadoop; Map/Reduce; Hive Queries / BI; Real-Time Dashboard; step markers A1 to A10 and A12]
  • 9. Role of Data Model: Before we get there, what features are missing from Cassandra in comparison to a traditional RDBMS?
  • 10. Shortcomings… Opportunities •  No Joins across Column Families •  No analytical functions such as sum, count… •  Difficulty in constructing “WHERE” clause predicates across composite columns •  Inability to order range of Keys in Random Partitioner
  • 11. Importance of Data model - Cassandra •  In lieu of JOINS, “smart” de-normalization techniques are crucial. •  Need to use “FEATURES” of Cassandra to effectively model the business rules and business data •  “Client” or “Application” code becomes extremely important. •  “APPLICATION” + “DATABASE” => Full Package
  • 12. Features of Cassandra Modeling •  “WIDE” Column Family –  Organize data in “horizontal” as opposed to “vertical” fashion as in RDBMS •  Automatic Sorting of Columns –  Important to “MODEL” the data in “COLUMNS” as opposed to rows. •  Faster Access to ALL COLUMNS of a Row Key –  All columns of a row key stored on ONE server =>fast iteration/aggregations •  Useful info in “COLUMN NAME” –  Ground breaking from RDBMS perspective –  Enables “MORE” “INFORMATION” to be PACKED –  “COLUMN” as entity becomes “MORE POWERFUL”. •  COMPOSITE Column NAMES: –  Column names can be COMPOSITES !!! Made up of multiple columns –  Auto sorting still works
  • 13. Data Model: Wide rows with sharding. Row Key = “<min>|<part#>”. Role of partition #: • Each row is stored by a single server and with 5,000x60=300,000 events per minute, that would put large load for a minute on a single server. • A “partition” contraption aims to “break” this huge row, remove hotspots and spread the load to possibly all servers • The # of partitions: some multiple of the # of servers • Finite # of partitions – still maintains the row key as meaningful, i.e. we can construct the keys for a certain minute and fetch records for them.
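The sharded row key described on this slide can be sketched as follows. This is a minimal illustration, assuming a key layout like the `2012-07-18-08-13-p-1` example on the next slide; the function names, the partition count of 12, the 1-based partition numbering and the round-robin assignment are all assumptions made for the example, not details from the talk.

```python
from datetime import datetime
from itertools import count

# Assumption: the talk only says "some multiple of the # of servers".
NUM_PARTITIONS = 12

_write_counter = count()


def row_key(ts, partition):
    """Build a '<minute>-p-<partition>' row key (hypothetical layout)."""
    return f"{ts:%Y-%m-%d-%H-%M}-p-{partition}"


def key_for_write(ts):
    """Pick a partition round-robin so one minute's writes spread over all shards."""
    partition = next(_write_counter) % NUM_PARTITIONS + 1
    return row_key(ts, partition)


def keys_for_minute(ts):
    """The partition count is finite, so a reader can enumerate every key for a minute."""
    return [row_key(ts, p) for p in range(1, NUM_PARTITIONS + 1)]
```

A reader interested in one minute would fetch all keys returned by `keys_for_minute` and merge the results on the client side.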
  • 14. Composite Columns • Composite Columns: – Actual message stored as part of composite column • Variable granularity grouping – Minute: Row key based on minute. [Table: row key Min_partition (TEXT), with columns named DC:TimeUUID:UserID:Message (Composite) and column value Status; example row keys: 2012-07-18-08-13-p-1, 2012-07-19-11-21-p-3]
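To illustrate how a composite column name of the form DC:TimeUUID:UserID:Message keeps its automatic ordering, here is a small in-memory sketch. It assumes the components compare left to right; the `ColumnName` type, the integer `time` field standing in for the TimeUUID, and `slice_columns` are hypothetical names for the example and not part of any Cassandra client API.

```python
from typing import NamedTuple


class ColumnName(NamedTuple):
    """Stand-in for a composite column name; tuples compare component by component."""
    dc: str
    time: int  # assumption: an integer timestamp in place of the TimeUUID
    user_id: str
    message: str


def slice_columns(row, dc, start, end):
    """Return one data center's columns with start <= time < end, in sorted order."""
    return [c for c in sorted(row) if c.dc == dc and start <= c.time < end]


row = [
    ColumnName("DC2", 5, "u1", "login failed"),
    ColumnName("DC1", 20, "u2", "checkout ok"),
    ColumnName("DC1", 10, "u3", "cart updated"),
]
recent_dc1 = slice_columns(row, "DC1", 0, 15)
```

Because the data center is the first component, all columns for one data center are contiguous, and within it they are ordered by time.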
  • 16. Geo-Redundancy [Diagram labels: Data Center 1 (RW); Data Center 2 (RW); Data Center 3 (RO); Data Center 4 (RO)]
  • 17. Data Consolidation and Extraction •  Single view of data across multiple locations •  Data extraction can be performed in parallel •  Data extraction process performed in dedicated cluster of machines.
  • 18. Low-Latency & Batch Applications • Triaging – Troubleshooting customer issues within 10 minutes of occurrence – Feeding a dashboard of live feed data through aggregations performed in Counter CFs • Analysis – Analytical and ad hoc queries to replace the need for remote data warehouse eventually – Map/Reduce via Hive without ETL
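The dashboard aggregation mentioned above can be pictured with an in-memory stand-in for a counter column family. This is only a sketch: `collections.Counter` replaces the real Counter CF, and the per-minute row key and the `status` field are assumptions chosen for the example.

```python
from collections import Counter
from datetime import datetime

# In-memory stand-in for a counter column family:
# key = (minute row key, column name), value = running count.
counters = Counter()


def record(ts, status):
    """Increment the counter for this event's minute and status."""
    counters[(f"{ts:%Y-%m-%d-%H-%M}", status)] += 1


def dashboard_row(minute):
    """Read back all counters for one minute, as a dashboard widget might."""
    return {status: n for (m, status), n in counters.items() if m == minute}
```

The point of the design is that the write path maintains the aggregate, so the dashboard reads a handful of counters instead of scanning raw events.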
  • 19. Opportunities Remaining • Near real-time pattern detection and response • Message loss in JMS queue • JMS queue replication • Reducing the impact of queue failover on other applications
  • 21. Accenture Cloud Platform: Recommender as a Service; …; Network Analytics Services; Big Data Platform
  • 22. Drivers: consumer devices, video usage. Issues: operational costs, understanding service quality degradation, inefficient capacity planning
  • 23. INGEST, PROCESS, VISUALIZE, ANALYZE, STORE
  • 25. What do we need? Scalability; Reliability; Data types, size, velocity; Mission critical data; Processing, computation, etc.; Time series / pattern analysis; Fault-tolerance; Multiple use cases
  • 26. How do we get this from Storm? Processing guarantees; Low-level primitives; Parallelization; Robust fail-over strategies; Scalability; Reliability; Fault-tolerance; Processing, computation, etc.
  • 28. Stream: tuples carrying request info (IP, user-agent, etc.) • Spout: pull messages from distributed queue • Bolt: sessionization, speed calculation • Topology: suboptimal network speed, geospatial analysis
  • 29. Integration with Cassandra • Cassandra: optimal for time series data; near-linear scalability; low read/write latency; scales in conjunction with Storm • Custom Bolt: uses Hector API to access Cassandra; creates dynamic columns per request; stores relevant network data
  • 30. SUBOPTIMAL NETWORK SPEED TOPOLOGY: AN EXAMPLE
  • 31. [Diagram: Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Sub-Optimal Speed → Store in Cassandra → Cassandra; a Tuple (ip 1) is passed between each stage]
  • 32. Parallelism [Diagram: Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Sub-Optimal Speed → Store in Cassandra → Cassandra; tuples for ip 1 and ip 2 flow through the stages in parallel]
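One way to picture how tuples for different IPs can be processed in parallel while per-IP state stays consistent is to route every tuple by a hash of its IP, which is the idea behind a fields grouping in Storm. The sketch below is plain Python rather than Storm code; `task_for`, `SpeedPerIpTask`, `route` and the running-average logic are hypothetical names and choices made for the example.

```python
import zlib


def task_for(ip, num_tasks):
    """Map an IP to a task index; a stable hash sends the same IP to the same task."""
    return zlib.crc32(ip.encode("utf-8")) % num_tasks


class SpeedPerIpTask:
    """One parallel instance of an 'Update Speed per IP' step (illustrative only)."""

    def __init__(self):
        self.totals = {}  # ip -> (sample count, sum of speeds)

    def update(self, ip, speed):
        n, total = self.totals.get(ip, (0, 0.0))
        n, total = n + 1, total + speed
        self.totals[ip] = (n, total)
        return total / n  # running average speed for this IP


def route(tuples, num_tasks):
    """Send each (ip, speed) tuple to its task and collect the updated averages."""
    tasks = [SpeedPerIpTask() for _ in range(num_tasks)]
    return [(ip, tasks[task_for(ip, num_tasks)].update(ip, speed)) for ip, speed in tuples]
```

Because all tuples for one IP land on the same task, each task can keep its state locally without coordinating with the others.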
  • 33. Branching and Joins [Diagram labels: Stream 1 and Stream 2, each from a Kafka Spout; Pre-process; Sessionize; Calculate N/W Speed per Session; Update Speed per IP; Speed by Location; Join; Compare Speed; Store in Cassandra; Cassandra. Tuples shown: (ip 1), (ip 1/NY), (NY), (ip 1/NY)]
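The join and compare steps on this slide can be sketched in plain Python. The sketch assumes one stream carries per-IP speeds tagged with a location and the other carries a speed per location, as the (ip 1/NY) and (NY) tuples suggest; the function name, the tuple shapes and the 0.5 threshold are assumptions for illustration and do not come from the talk.

```python
def join_and_compare(ip_speeds, location_speeds, threshold=0.5):
    """Join per-IP speeds with per-location speeds and flag slow IPs.

    ip_speeds: iterable of (ip, location, speed)
    location_speeds: iterable of (location, typical_speed)
    threshold: assumed fraction of the location's speed below which an IP is flagged
    """
    baseline = dict(location_speeds)
    for ip, location, speed in ip_speeds:
        typical = baseline.get(location)
        if typical is not None and speed < threshold * typical:
            yield ip, location, speed, typical


flagged = list(
    join_and_compare(
        [("ip1", "NY", 1.0), ("ip2", "NY", 9.0), ("ip3", "SF", 1.0)],
        [("NY", 10.0)],
    )
)
```

In this example only `ip1` is flagged: `ip2` is close to the New York baseline, and `ip3` has no baseline to compare against.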
  • 34. Lessons Learned •  Rebalance Topology •  Tweak parallelism in bolt •  Isolation of Topologies •  Use TimeUUIDUtils •  Log4j level set to INFO by default
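On the TimeUUID point: a time-based (version 1) UUID embeds its creation timestamp, which is what makes it usable as a time-ordered column name component. The sketch below shows the idea with Python's standard `uuid` module rather than Hector's TimeUUIDUtils; the helper name `uuid_to_datetime` is hypothetical.

```python
import uuid
from datetime import datetime, timezone

# Number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid_to_datetime(u):
    """Recover the creation time embedded in a version 1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


event_id = uuid.uuid1()
created_at = uuid_to_datetime(event_id)
```

Note that ordering by time requires comparing the embedded timestamp, not the raw UUID bytes, which is why a dedicated helper is useful.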