Making Hadoop & Cassandra Work Together

          © Altoros Systems, Inc.
About Altoros
 • Software delivery acceleration specialist for big data application implementation services
 • 200+ employees globally (US, Eastern Europe, UK, Denmark, Norway)
 • Big data practice areas
   – Automated device analytics
   – Advertising analytics
   – Big data warehouse


Customers

Partners
   – Implementation Partner


The Product




The Problem: Data is Big


 • 10-20 sensors per household
 • Must support tens of thousands of households and beyond

 • 1 sensor: ~1.1 MB/day
 • 1,000 households: 11 GB/day
 • 500,000 households: ~5.5 TB/day




The Dashboard




Full Visibility




The Problem: Performance


 • MySQL was slow under write-intensive load
   – It could not scale to the target throughput

 • Disk performance is a bottleneck
   – Monitored with iostat -dmx

 • Old-fashioned single-threaded batch processing is slow
   – Make it parallel!
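
As an illustration of the "make it parallel" point, here is a minimal sketch that splits a list of readings into batches and processes them concurrently. It is not the system described in the deck: process_batch is a hypothetical placeholder, and a thread pool is used on the assumption that the real per-batch work is I/O-bound (database or disk writes). CPU-bound work in Python would call for a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch(batch):
    """Hypothetical per-batch work; here it only sums the readings."""
    return sum(batch)


def split_into_batches(readings, batch_size):
    """Yield consecutive slices of at most batch_size readings."""
    for start in range(0, len(readings), batch_size):
        yield readings[start:start + batch_size]


def process_parallel(readings, batch_size=1000, workers=4):
    """Process batches concurrently; results are returned in batch order."""
    batches = list(split_into_batches(readings, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_batch, batches))


partial_sums = process_parallel(list(range(10_000)))
print(len(partial_sums), sum(partial_sums))
```

Because each batch is independent, the same split can later be handed to a cluster framework without changing the per-batch logic.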




Requirements


 • Highly responsive system with parallel processing

 • Reliable
   – Partial failure is acceptable
   – Node and data recoverability

 • Scalable
   – Load capacity
   – Max throughput

 • Total cost of ownership
   – Data compression
NoSQL Database Requirements


   –   Fast writes are critical
   –   Querying by column and range of keys
   –   Secondary indices
   –   Good map/reduce compatibility using Apache Hadoop
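
To make "querying by column and range of keys" concrete, here is a toy in-memory model. It is not Cassandra's API; SortedStore, the key format "sensor:date", and the column names are all hypothetical. It only shows why keeping keys in sorted order makes a key-range scan cheap.

```python
import bisect


class SortedStore:
    """Toy store that keeps row keys sorted so key ranges can be sliced."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> {column name: value}

    def put(self, key, column, value):
        """Insert or overwrite one column of one row."""
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key][column] = value

    def range_query(self, start_key, end_key, column):
        """Return (key, value) pairs with start_key <= key <= end_key
        for rows that contain the requested column."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_right(self._keys, end_key)
        return [(k, self._rows[k][column])
                for k in self._keys[lo:hi] if column in self._rows[k]]


store = SortedStore()
store.put("s1:2012-01-01", "temp", 20.5)
store.put("s1:2012-01-02", "temp", 21.0)
store.put("s1:2012-01-03", "humidity", 40)
store.put("s2:2012-01-01", "temp", 19.0)
print(store.range_query("s1:2012-01-01", "s1:2012-01-03", "temp"))
```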




Why Cassandra


  – Good overall balance of features, scalability, reliability
  – We wanted BigTable-like features: columns, column families
  – Well suited for large streams of non-transactional data
  – Provides good, consistent write throughput
  – Tunable trade-offs for distribution and replication (N, R, W)
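
The (N, R, W) trade-off can be sketched as a simple rule. In Dynamo-style replication, N is the number of replicas, R the number that must answer a read, and W the number that must acknowledge a write. When R + W > N, every read set overlaps every write set. The helper names below are illustrative, and the deck does not state which values were used in production.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_sees_latest_write(n, r, w):
    """True when any R replicas must overlap any W replicas (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


n = 3
for r, w in [(1, 1), (quorum(n), quorum(n)), (1, n)]:
    print(f"N={n} R={r} W={w} overlap={read_sees_latest_write(n, r, w)}")
```

Lowering W favors write throughput at the cost of weaker read guarantees, which is the kind of tuning a write-heavy sensor workload tends to lean on.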




File system


 • HDFS
   – Is a file system behind our Cassandra implementation
   – Data coherency: write-once-read-many access




Cassandra: Best Used When…



 • You write more than you read (e.g., logging)
 • Every component of the system must be in Java
 • You have, or may later have, complex configuration requirements




Cassandra Challenges


 • High, unpredictable write volume
 • Varying schema, variable message size
 • Two types of series: data and lookups
 • Everything is a time series, even metadata; no supplemental database




No Cassandra Compression?


 • Built-in Cassandra compression claims to compress across columns with identical names.
 • All our data columns are timestamped, so no two will ever have identical names.
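
One way around the identical-names limitation is to compress whole batches of readings in the application before writing them. The sketch below illustrates that idea only; the deck does not describe how compression was actually applied. zlib from the Python standard library stands in for LZO, and the JSON layout and function names are assumptions.

```python
import json
import zlib


def pack_readings(readings):
    """Serialize a batch of (timestamp, value) readings and compress it.

    Returns both the raw and the compressed bytes so sizes can be compared.
    """
    raw = json.dumps(readings, separators=(",", ":")).encode("utf-8")
    return raw, zlib.compress(raw)


def unpack_readings(blob):
    """Inverse of pack_readings: decompress and rebuild the tuples."""
    return [tuple(item) for item in json.loads(zlib.decompress(blob))]


# Hypothetical batch: one reading per minute for one day, constant value.
readings = [(1325376000 + 60 * i, 21.5) for i in range(1440)]
raw, packed = pack_readings(readings)
print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes")
```

Timestamps that differ only in their low digits compress well as a block even though no two of them are equal, which is why batch-level compression can help where per-column-name compression cannot.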




Numbers


          “Benchmark” Cassandra node
               LZO Compression




Lessons Learned


 • Consider a hybrid
   – RDBMS + NoSQL + Hadoop

 • Hadoop
   – Is for offline processing and analysis
   – Is NOT for random reads and writes of individual records

 • Cassandra complements Hadoop with querying capabilities
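
The offline-processing role can be illustrated with the shape of a map/reduce job. This is plain Python, not the Hadoop API; the record layout (sensor id, timestamp, value) and the function names are assumptions. The map step emits one key/value pair per record and the reduce step aggregates per key, here into a per-sensor average.

```python
from collections import defaultdict


def map_reading(record):
    """Map step: emit (sensor_id, (value, 1)) for one raw record."""
    sensor_id, _timestamp, value = record
    return sensor_id, (value, 1)


def reduce_readings(pairs):
    """Reduce step: sum values and counts per sensor, then average."""
    totals = defaultdict(lambda: [0.0, 0])
    for sensor_id, (value, count) in pairs:
        totals[sensor_id][0] += value
        totals[sensor_id][1] += count
    return {sensor_id: total / count
            for sensor_id, (total, count) in totals.items()}


records = [("s1", 1, 20.0), ("s1", 2, 22.0), ("s2", 1, 18.0)]
averages = reduce_readings(map(map_reading, records))
print(averages)
```

A job like this scans every record once, which suits batch analysis. Looking up one sensor's latest reading is a random read and belongs in the database layer.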




Thank you!

 @renatkhasanshyn
      @altoros
renat.k@altoros.com




