Analytics on top of Cassandra and Hadoop

Dmitry Mezhensky | Mirantis Inc

#CASSANDRAEU
What we will discuss today
● Analytics on Cassandra using Hadoop
● Various types of statistics and their implementation
● Scalability of the approach

Problems
● Too many statistics (more than 100)
● Various types:
○ Top N
○ Time series
○ Min/max/average/median
○ Extremum values on a time interval
○ Fraud analysis
● Huge amounts of data
● Scalability of the approach

Statistics implementation on Hadoop

Top N
● The map phase generates <Key, Value> pairs; the top N is built by Value
● The reduce phase accumulates the values; results are persisted to Cassandra via a custom output format
● A suitable comparator was used in Cassandra for the top N entities
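The Top N job can be illustrated with a minimal local Python simulation of the map and reduce phases. This is a sketch under assumptions, not the deck's code: the input records, summing as the accumulation, and the names `map_phase`, `reduce_phase`, and `top_n` are hypothetical, the in-memory grouping stands in for Hadoop's shuffle, and the custom output format that writes to Cassandra is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw input: (entity_key, amount), e.g. views per article.
Event = Tuple[str, int]


def map_phase(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Map: emit one <Key, Value> pair per input record."""
    for key, value in events:
        yield key, value


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce: accumulate (here, sum) all values emitted for the same key.
    The dict grouping stands in for Hadoop's shuffle-and-sort."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def top_n(totals: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Order entities by accumulated Value, highest first (ties broken by key)."""
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    events = [("a", 5), ("b", 3), ("a", 2), ("c", 9), ("b", 1)]
    totals = reduce_phase(map_phase(events))
    print(totals)            # {'a': 7, 'b': 4, 'c': 9}
    print(top_n(totals, 2))  # [('c', 9), ('a', 7)]
```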

Top N
● On the write stage to Cassandra, sorting is done by value
● On the read stage, the first N records are the top N values
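The read-side idea can be illustrated with a small in-memory stand-in for a Cassandra row whose columns are kept in comparator order. The slides do not give the schema, so this sketch assumes the column name is a composite of (value, key) and that the comparator orders by value descending; `SortedRow` and its methods are hypothetical names, not a Cassandra API.

```python
import bisect
from typing import List, Tuple


class SortedRow:
    """Hypothetical in-memory stand-in for one Cassandra row whose columns are
    kept in comparator order. Assumed column name: composite (value, key),
    ordered by value descending."""

    def __init__(self) -> None:
        # Stored as (-value, key): ascending tuple order == descending value.
        self._columns: List[Tuple[int, str]] = []

    def write(self, key: str, value: int) -> None:
        """Write stage: each column is inserted at its sorted position."""
        bisect.insort(self._columns, (-value, key))

    def read_first(self, n: int) -> List[Tuple[str, int]]:
        """Read stage: the first N columns are the top N values; no sort needed."""
        return [(key, -neg_value) for neg_value, key in self._columns[:n]]


if __name__ == "__main__":
    row = SortedRow()
    for key, total in [("a", 7), ("b", 4), ("c", 9)]:
        row.write(key, total)
    print(row.read_first(2))  # [('c', 9), ('a', 7)]
```

Because the ordering cost is paid at write time, reading the top N is just a slice of the first N columns. A real row would also need the old column removed when an entity's value changes; the sketch assumes the batch job writes each entity's final total once.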

Time series
● The map phase generates <Time, Value> pairs
● The reduce phase accumulates the values (the behaviour differs per statistic)
● Results are persisted to Cassandra via a custom output format, using one row key per statistic and one column per date
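A minimal sketch of the time-series flow, assuming daily buckets and raw `(timestamp, value)` records (both assumptions; the slide only says one column per date). The accumulator is passed in as a function because the reduce behaviour differs per statistic, and the hypothetical `to_rows` helper only mimics the one-row-per-statistic, one-column-per-date layout in a dict rather than writing to Cassandra.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw record: (timestamp, value).
Event = Tuple[datetime, float]


def map_phase(events: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Map: emit <Time, Value> pairs, truncating each timestamp to its date."""
    for ts, value in events:
        yield ts.strftime("%Y-%m-%d"), value


def reduce_phase(
    pairs: Iterable[Tuple[str, float]],
    accumulate: Callable[[List[float]], float],
) -> Dict[str, float]:
    """Reduce: group values by date, then fold each group with a
    per-statistic accumulator (sum, max, len, ...)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for date, value in pairs:
        grouped[date].append(value)
    return {date: accumulate(values) for date, values in grouped.items()}


def to_rows(stat_name: str, series: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Mimic the storage layout: one row key per statistic, one column per date.
    ISO date strings sort chronologically, so columns end up in date order."""
    return {stat_name: dict(sorted(series.items()))}


if __name__ == "__main__":
    events = [
        (datetime(2013, 1, 2, 9), 2.0),
        (datetime(2013, 1, 1, 18), 3.0),
        (datetime(2013, 1, 2, 8), 4.0),
    ]
    print(to_rows("daily_total", reduce_phase(map_phase(events), sum)))
    print(to_rows("daily_max", reduce_phase(map_phase(events), max)))
```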

Maximum, minimum, extremum on an interval
● Max/min values are simple to calculate
● The extremum on an interval is calculated similarly to a time series
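The sketch below, assuming in-memory lists and hypothetical `(unix_seconds, value)` samples, shows the simple whole-dataset aggregates and how an extremum per interval reuses the time-series pattern: the interval start plays the role of the time key and each bucket is reduced to its `(min, max)`. The function names are illustrative only.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample: (unix_seconds, value).
Sample = Tuple[int, float]


def summary(values: List[float]) -> Dict[str, float]:
    """Whole-dataset min/max/average/median."""
    return {
        "min": min(values),
        "max": max(values),
        "avg": statistics.mean(values),
        "median": statistics.median(values),
    }


def extremum_per_interval(
    samples: Iterable[Sample], interval_s: int
) -> Dict[int, Tuple[float, float]]:
    """Like a time series: key each sample by the start of its interval,
    then reduce each interval to its (min, max)."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % interval_s].append(value)
    return {start: (min(vals), max(vals)) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    print(summary([3.0, 1.0, 4.0, 1.0, 5.0]))
    print(extremum_per_interval([(0, 1.0), (30, 5.0), (60, 2.0), (90, -1.0)], 60))
```

Min, max, and average (carried as a sum and a count) can be combined from partial results, so they fit Hadoop combiners naturally; an exact median cannot, because it needs all values for a key in one place.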

Fraud analysis
● Fraud analysis runs after all statistics have been calculated
● The processed data is filtered by fraud filters
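The slides do not describe the fraud filters themselves, so the following only shows the shape of the step: a hypothetical `Record` type and two made-up filters (an activity-rate limit and a user blocklist) applied as predicates after the statistics have been computed. The sketch partitions records into clean and flagged, since the slide does not say whether flagged data is discarded or reported.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical processed record; the slides do not give a schema."""
    user_id: str
    events_per_minute: float


# A fraud filter returns True when a record looks fraudulent.
FraudFilter = Callable[[Record], bool]


def rate_above(limit: float) -> FraudFilter:
    """Hypothetical filter: flag implausibly high activity rates."""
    return lambda r: r.events_per_minute > limit


def user_in(blocklist: Iterable[str]) -> FraudFilter:
    """Hypothetical filter: flag records from known-bad users."""
    blocked = frozenset(blocklist)
    return lambda r: r.user_id in blocked


def apply_fraud_filters(
    records: Iterable[Record], filters: List[FraudFilter]
) -> Tuple[List[Record], List[Record]]:
    """Run after all statistics are computed: split records into
    (clean, flagged) according to whether any filter matches."""
    clean: List[Record] = []
    flagged: List[Record] = []
    for r in records:
        (flagged if any(f(r) for f in filters) else clean).append(r)
    return clean, flagged


if __name__ == "__main__":
    recs = [Record("u1", 2.0), Record("u2", 500.0), Record("u3", 1.0)]
    clean, flagged = apply_fraud_filters(recs, [rate_above(100.0), user_in(["u3"])])
    print(clean)
    print(flagged)
```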

Scalability approach
● Data is read from and written to Cassandra only
● Hadoop is elastically scalable
● Cassandra is elastically scalable
● No bottleneck

Questions?

Thank you!

C* Summit EU 2013: Analytics On Top of Cassandra and Hadoop
