SlideShare una empresa de Scribd logo
1 de 14
Descargar para leer sin conexión
Wide Column Store for Big Data

APACHE CASSANDRA

                        Kai Spichale
Outline

 Motivation
 Introduction to Cassandra
 Big Data Solution
„Must Haves“ for Big Data?
   What do modern businesses need for big data?

   A scalable high-performance database
    that is easy to use and
    cost effective
                                         Scalable
                                       Performance


                                     Cost      Operational
                                   Effective      Ease
„Must Haves“ for Big Data?
   „Modern businesses need to be able to manage large
    volumes of realtime data and run analytic and enterprise
    search operations on that same data as quickly as possible
    to make business decisions.“


              Real-Time                   Analytic/Search
              Databases                     Databases

                          Data Movement
                            ETL Process
Legacy RDBMS ≠ Big Data
   „Big data is comprised of (1) Velocity – how fast the data is coming in;
    (2) Variety – all types are new being captured; (3) Volume – TB‘s to
    PB‘s of data; (4) Complexity – mulit-location, data center, etc.“



   “Big data technologies describe a new generation of technologies and
    architectures, designed to economically extract value from very large
    volumes of a wide variety of data, by enabling high-velocity capture,
    discovery, and/or analysis.”



   “Big data is data that exceeds the processing capacity of conventional
    database systems. The data is too big, moves too fast, or doesn’t fit the
    strictures of your database architectures. To gain value from this data,
    you must choose an alternative way to process it.”
Trends & Challenges in Data Mngt.

Exponential Data
                          Key Value
    Growth



     Cloud               Wide Column



 Semi Structured
                          Document
      Data


 More Connected
                           Graph
      Data
Trends & Challenges in Data Mngt.

Exponential Data
                          Key Value
    Growth


                           Apache
     Cloud               Wide Column
                          Cassandra


 Semi Structured
                          Document
      Data


 More Connected
                           Graph
      Data
Apache Cassandra

   A massively scalable, decentralized, structured
    data store (aka database).

   Project history:
Cassandra is…
                                         Nodes   Token
                                         A       0
                                         B       4
                                         C       8
                                         D       12
                                         E       16
                                         F       20
                                         G       24
 O(1) Distributed Hash Table            H       28

 Sharding, Replication
 Elastic                            H                   A


                                 G                           B
   Fault tolerant
   No Single Point of Failure
                                 F                           C
   Durable

                                     E                   D
Cassandra is…
                                          C

   AP-System (CAP Theorem)
     Eventual consistency

                                 A                   P
   Tunable trade-offs:
     Consistency vs. Latency
     Choose between synchronous or asynchronous
      replication for each update

                                     C = Consistency
                                     A = High Availability
                                     P = Partitioning Tolerance
Cassandra is…
                             Keyspace
   A BigTable Clone
                                        Column Family
   No schema                 Key          Row
                                          Column Column

                              Key          Row
                                          Column

                              Key          Row
   Predestined for                       Column Column Column

     Semi-structured data              Column Family

                                           Row
     Sparse data                         SuperColumn   SuperColumn
                                          Column Column Column Column

                                           Row
                                          SuperColumn
                                          Column Column Column
Cassandra-based Big Data
Solution
              H
            Analytics
            Hadoop
                            A
                        Real-time
                        Cassandra
                                                   Real-time queries with
                                                    Cassandra

    G
Analytics                             B
                                    Real-time
                                    Cassandra
Hadoop
            Cassandra Cluster                      Distributed Search with
              (Replication)                         Solr

    F
Search                                C
                                    Real-time
 Solr
                                    Cassandra
                                                   Analytics with Hadoop
                                                    MapReduce
             E
            Search         D
                         Search
             Solr         Solr
Summary
   Apache Cassandra is a elastic scalable, fault-
    tolerant data store

   Tunable consistency levels

   Wide Column: flexible datamodel without schema

   Supports: real-time queries, analytics through
    Hadoop integration, Solr-based fulltext search
Thank you!



             Q&A

Más contenido relacionado

La actualidad más candente

NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
Carol McDonald
 
Strategies for Distributed Data Storage
Strategies for Distributed Data StorageStrategies for Distributed Data Storage
Strategies for Distributed Data Storage
kakugawa
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn Hadoop
Silicon Halton
 

La actualidad más candente (20)

Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
Strategies for Distributed Data Storage
Strategies for Distributed Data StorageStrategies for Distributed Data Storage
Strategies for Distributed Data Storage
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient Data
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
Time series database by Harshil Ambagade
Time series database by Harshil AmbagadeTime series database by Harshil Ambagade
Time series database by Harshil Ambagade
 
Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosql
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Hadoop white papers
Hadoop white papersHadoop white papers
Hadoop white papers
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandra
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed Datasets
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.ir
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka API
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introduction
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn Hadoop
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
 

Similar a Cassandra

NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
Felix Gessert
 
Storage cassandra
Storage   cassandraStorage   cassandra
Storage cassandra
PL dream
 
Data Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web ServicesData Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web Services
Amazon Web Services
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
Joydeep Sen Sarma
 

Similar a Cassandra (20)

Cassandra
CassandraCassandra
Cassandra
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services
 
Storage cassandra
Storage   cassandraStorage   cassandra
Storage cassandra
 
Big Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarBig Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace Webinar
 
Data Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web ServicesData Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web Services
 
Learn Cassandra at edureka!
Learn Cassandra at edureka!Learn Cassandra at edureka!
Learn Cassandra at edureka!
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and Cassandra
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
5266732.ppt
5266732.ppt5266732.ppt
5266732.ppt
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon Thomas
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBase
 
Stratio big data spain
Stratio   big data spainStratio   big data spain
Stratio big data spain
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure Considerations
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwieler
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Cassandra

  • 1. Wide Column Store for Big Data APACHE CASSANDRA Kai Spichale
  • 2. Outline  Motivation  Introduction to Cassandra  Big Data Solution
  • 3. „Must Haves“ for Big Data?  What do modern businesses need for big data?  A scalable high-performance database that is easy to use and cost effective Scalable Performance Cost Operational Effective Ease
  • 4. „Must Haves“ for Big Data?  „Modern businesses need to be able to manage large volumes of realtime data and run analytic and enterprise search operations on that same data as quickly as possible to make business decisions.“ Real-Time Analytic/Search Databases Databases Data Movement ETL Process
  • 5. Legacy RDBMS ≠ Big Data  „Big data is comprised of (1) Velocity – how fast the data is coming in; (2) Variety – all types are new being captured; (3) Volume – TB‘s to PB‘s of data; (4) Complexity – mulit-location, data center, etc.“  “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”  “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
  • 6. Trends & Challenges in Data Mngt. Exponential Data Key Value Growth Cloud Wide Column Semi Structured Document Data More Connected Graph Data
  • 7. Trends & Challenges in Data Mngt. Exponential Data Key Value Growth Apache Cloud Wide Column Cassandra Semi Structured Document Data More Connected Graph Data
  • 8. Apache Cassandra  A massively scalable, decentralized, structured data store (aka database).  Project history:
  • 9. Cassandra is… Nodes Token A 0 B 4 C 8 D 12 E 16 F 20 G 24  O(1) Distributed Hash Table H 28  Sharding, Replication  Elastic H A G B  Fault tolerant  No Single Point of Failure F C  Durable E D
  • 10. Cassandra is… C  AP-System (CAP Theorem)  Eventual consistency A P  Tunable trade-offs:  Consistency vs. Latency  Choose between synchronous or asynchronous replication for each update C = Consistency A = High Availability P = Partitioning Tolerance
  • 11. Cassandra is… Keyspace  A BigTable Clone Column Family  No schema Key Row Column Column Key Row Column Key Row  Predestined for Column Column Column  Semi-structured data Column Family Row  Sparse data SuperColumn SuperColumn Column Column Column Column Row SuperColumn Column Column Column
  • 12. Cassandra-based Big Data Solution H Analytics Hadoop A Real-time Cassandra  Real-time queries with Cassandra G Analytics B Real-time Cassandra Hadoop Cassandra Cluster  Distributed Search with (Replication) Solr F Search C Real-time Solr Cassandra  Analytics with Hadoop MapReduce E Search D Search Solr Solr
  • 13. Summary  Apache Cassandra is a elastic scalable, fault- tolerant data store  Tunable consistency levels  Wide Column: flexible datamodel without schema  Supports: real-time queries, analytics through Hadoop integration, Solr-based fulltext search
  • 14. Thank you! Q&A