Wide Column Store for Big Data

APACHE CASSANDRA

                        Kai Spichale
Outline

• Motivation
• Introduction to Cassandra
• Big Data Solution
“Must Haves” for Big Data?
• What do modern businesses need for big data?
• A scalable, high-performance database that is easy to use and cost-effective

[Figure: four requirements: Scalable, Performance, Cost Effective, Operational Ease]
“Must Haves” for Big Data?
• “Modern businesses need to be able to manage large
  volumes of realtime data and run analytic and enterprise
  search operations on that same data as quickly as possible
  to make business decisions.”

[Figure: Real-Time Databases and Analytic/Search Databases, connected by Data Movement (ETL Process)]
Legacy RDBMS ≠ Big Data
• “Big data is comprised of (1) Velocity – how fast the data is coming in;
  (2) Variety – all types are now being captured; (3) Volume – TBs to
  PBs of data; (4) Complexity – multi-location, data center, etc.”

• “Big data technologies describe a new generation of technologies and
  architectures, designed to economically extract value from very large
  volumes of a wide variety of data, by enabling high-velocity capture,
  discovery, and/or analysis.”

• “Big data is data that exceeds the processing capacity of conventional
  database systems. The data is too big, moves too fast, or doesn’t fit the
  strictures of your database architectures. To gain value from this data,
  you must choose an alternative way to process it.”
Trends & Challenges in Data Management

[Figure: trends shown side by side with NoSQL store categories]

Trend                      Store category
Exponential data growth    Key Value
Cloud                      Wide Column (Apache Cassandra)
Semi-structured data       Document
More connected data        Graph
Apache Cassandra

• A massively scalable, decentralized, structured data store (aka database).
• Project history: [timeline figure not preserved]
Cassandra is…

• O(1) Distributed Hash Table
• Sharding, Replication
• Elastic
• Fault tolerant
• No Single Point of Failure
• Durable

Node   Token
A      0
B      4
C      8
D      12
E      16
F      20
G      24
H      28

[Figure: nodes A through H arranged clockwise on a ring]
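
The node and token table on this slide can be read as a lookup rule. The Python sketch below is only an illustration under stated assumptions: the token space is taken to be 0 to 31, each node is assumed to own the tokens up to and including its own, and replicas are assumed to sit on the next nodes clockwise. The MD5-based `token_for` helper, `RING_SIZE`, and the replication factor of 3 are hypothetical choices, not taken from the slides.

```python
import bisect
import hashlib

# Token assignments copied from the slide's table.
RING = [(0, "A"), (4, "B"), (8, "C"), (12, "D"),
        (16, "E"), (20, "F"), (24, "G"), (28, "H")]
TOKENS = [token for token, _ in RING]
RING_SIZE = 32  # assumption: tokens live in the range 0..31


def token_for(key: str) -> int:
    """Hash a row key onto the ring (illustrative hash, not Cassandra's partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for_token(token: int, replication_factor: int = 3) -> list:
    """Return the owning node plus the next nodes clockwise on the ring."""
    # First node whose token is >= the key's token; wrap around past H to A.
    start = bisect.bisect_left(TOKENS, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


def replicas_for(key: str, replication_factor: int = 3) -> list:
    """Sharding and replication in one step: key -> token -> replica nodes."""
    return replicas_for_token(token_for(key), replication_factor)
```

Under these assumptions a key with token 5 lands on C and is replicated to D and E, while a key with token 29 wraps around to A, B, and C. Because every node can hold the full table, any node can compute the owner locally, which is presumably what the slide's "O(1)" refers to.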
Cassandra is…

• AP system (CAP theorem)
   - Eventual consistency
• Tunable trade-offs:
   - Consistency vs. latency
   - Choose between synchronous or asynchronous
     replication for each update

[Figure: CAP triangle with corners C, A, P]
C = Consistency
A = High Availability
P = Partition Tolerance
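
The consistency versus latency trade-off can be made concrete with the usual replica-overlap rule: a read is guaranteed to see the latest write when the replicas contacted for the read and for the write must overlap, that is, when R + W > N. The sketch below is a simplified model, not driver code. It covers only the consistency level names ONE, QUORUM, and ALL, and the helper names are hypothetical.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > N)."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return r + w > replication_factor
```

With three replicas, writing and reading at QUORUM gives 2 + 2 > 3, so reads are consistent at the cost of waiting for two nodes. Writing and reading at ONE gives 1 + 1, which is not greater than 3: latency is lowest, but a read may return stale data until replication catches up.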
Cassandra is…

• A BigTable clone
• No schema
• Well suited to
   - Semi-structured data
   - Sparse data

[Figure: data model hierarchy]
Keyspace
  Column Family
    Key -> Row: Column, Column
    Key -> Row: Column
    Key -> Row: Column, Column, Column
  Column Family (with super columns)
    Row: SuperColumn, SuperColumn (each containing Columns)
    Row: SuperColumn (containing Columns)
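
The hierarchy on this slide maps naturally onto nested maps. The following Python sketch models a keyspace as column family -> row key -> column name -> value, so each row stores only the columns it actually has. It is an in-memory illustration only: the `put` and `get_row` helpers and the `users` data are hypothetical, and no real Cassandra API is involved.

```python
from collections import defaultdict

# keyspace: column family -> row key -> column name -> value
keyspace = defaultdict(lambda: defaultdict(dict))


def put(column_family: str, row_key: str, column: str, value) -> None:
    """Set one column in one row; no schema is declared up front."""
    keyspace[column_family][row_key][column] = value


def get_row(column_family: str, row_key: str) -> dict:
    """Return a copy of the row's columns, or an empty dict if the row is absent."""
    return dict(keyspace[column_family].get(row_key, {}))


# Rows in the same column family may carry different columns (sparse data).
put("users", "alice", "email", "alice@example.org")
put("users", "alice", "city", "Berlin")
put("users", "bob", "email", "bob@example.org")

# A super column adds one more nesting level: super column -> column -> value.
put("user_events", "alice", "2012-01-01", {"action": "login", "ip": "10.0.0.1"})
```

Here `alice` has two columns and `bob` only one, with no placeholder stored for the missing `city`, which is the sense in which the model suits sparse and semi-structured data.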
Cassandra-based Big Data Solution

• Real-time queries with Cassandra
• Distributed search with Solr
• Analytics with Hadoop MapReduce

[Figure: a single Cassandra cluster (replication) whose nodes are grouped by workload: Real-time (Cassandra), Analytics (Hadoop), Search (Solr)]
Summary
• Apache Cassandra is an elastically scalable, fault-tolerant data store

• Tunable consistency levels

• Wide Column: flexible data model without schema

• Supports: real-time queries, analytics through
  Hadoop integration, Solr-based full-text search
Thank you!



             Q&A
