Leveraging C* for real-time multi-dc public cloud analytics
Julien Anguenot
VP Software Engineering
@anguenot
1 iland cloud story & use case
2 data & domain constraints
3 deployment, hardware, configuration and architecture overview
4 lessons learned
5 future platform extensions
iland cloud story & use case
Who are we?
• public, private, DRaaS, BaaS cloud provider
• Cisco CMSP
• VMware VSPP for 7+ years
• 20+ years in business
• HQ in Houston, TX
• http://www.iland.com
Yet another cloud provider? Well, …
• performance and stability
• custom SLA
• compliance
• security
• DRaaS
• global datacenter footprint: US, UK and Singapore
• dedicated support staff!
• iland cloud platform, Web management console and API
The iland cloud platform
iland cloud platform essentially
• data warehouse running across multiple data-centers
• monitoring (resource consumption / performance)
• billing (customers and internal use)
• alerting
• predictive analytics
• cloud management
• cloud services (backups, DR, etc.)
• desktop and mobile management consoles
• API
• Cassandra powered!
The iland cloud Web management console
So, why did we do all this?
• Initial motivations (v1)
• vendor software (VMware vCloud Director) lacking:
• performance analytics (real-time and historical)
• billing
• alerts
• cross datacenter visibility
• more private cloud type transparency
• abstract ourselves from vendors and integrate an umbrella of heterogeneous services
• modern UX and good looking UI
data and domain constraints
Constraints
• write latency
• high throughput
• precision (used for billing)
• availability
• multi-data center
• scalability: tens of thousands of VMs
• agent-less
• pull/poll vs push
• high latency environs (multi-dc)
Pipeline
• collection of real-time data
• store
• aggregation
• correlation
• rollups (historical)
• processing
• alerting
• billing
• reporting
• querying
Real-time collected perf counters
• 20 seconds samples
• compute, storage, network
• 15+ perf counters collected
• ~50 data points per minute and per VM
• time series
• (timestamp, value)
• metadata
• unit
• interval
• etc.
• 1 year TTL
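The figures above can be turned into a rough ingest estimate. This is a back-of-the-envelope sketch only: the 20-second interval and the ~50 data points per minute per VM come from the slide, while the VM count of 10,000 is an assumption taken from the low end of "tens of thousands of VMs". Note that 3 samples per minute times 15+ counters gives 45+ points, which is consistent with the ~50 quoted.

```python
SAMPLE_INTERVAL_S = 20        # from the slide: 20-second samples
POINTS_PER_VM_PER_MIN = 50    # from the slide: ~50 data points per minute per VM
ASSUMED_VM_COUNT = 10_000     # assumption: low end of "tens of thousands of VMs"


def samples_per_minute(interval_s: int = SAMPLE_INTERVAL_S) -> int:
    """Number of samples one counter produces per minute."""
    return 60 // interval_s


def writes_per_second(vm_count: int,
                      points_per_vm_per_min: int = POINTS_PER_VM_PER_MIN) -> float:
    """Sustained data points per second across all VMs."""
    return vm_count * points_per_vm_per_min / 60


def points_per_vm_per_year(points_per_vm_per_min: int = POINTS_PER_VM_PER_MIN) -> int:
    """Data points retained per VM under a 1 year TTL."""
    return points_per_vm_per_min * 60 * 24 * 365


if __name__ == "__main__":
    print(samples_per_minute())
    print(round(writes_per_second(ASSUMED_VM_COUNT)))
    print(points_per_vm_per_year())
```

Under these assumptions the cluster absorbs on the order of several thousand data points per second before any aggregation or rollup is written.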
VM CPU 20 seconds perf counters
Group  Name       Type
CPU    USAGE      AVERAGE
CPU    USAGE_MHZ  AVERAGE
CPU    READY      SUMMATION
VM memory 20 seconds perf counters
Group  Name         Type
MEM    ACTIVE       AVERAGE
MEM    CONSUMED     AVERAGE
MEM    VM_MEM_CTRL  SUMMATION
VM network 20 seconds perf counters
Group  Name         Type
NET    RECEIVED     AVERAGE
NET    TRANSMITTED  AVERAGE
NET    USAGE        AVERAGE
VM disk 20 seconds perf counters
Group  Name                   Type
DISK   READ                   AVERAGE
DISK   WRITE                  AVERAGE
DISK   MAX_TOTAL_LATENCY      LATEST
DISK   USAGE                  AVERAGE
DISK   PROVISIONED            LATEST
DISK   USED                   LATEST
DISK   NUMBER_WRITE_AVERAGED  AVERAGE
DISK   NUMBER_READ_AVERAGED   AVERAGE
More counters collected for 3rd party services
VM to time series bindings
• binding on VM UUID
• series UUID
• <VM_UUID>:disk:numberReadAveraged:average
• Simple, fast and easy to construct at application level.
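A minimal sketch of how such a series identifier could be built at application level. The function names are hypothetical, and the mapping from the counter tables (for example DISK / NUMBER_READ_AVERAGED / AVERAGE) to the lower-case, camelCase form shown above is inferred from the single example on the slide, not a documented rule.

```python
def to_camel(counter_name: str) -> str:
    """Convert an upper-case counter name to camelCase.

    Example: NUMBER_READ_AVERAGED -> numberReadAveraged
    """
    head, *rest = counter_name.lower().split("_")
    return head + "".join(part.capitalize() for part in rest)


def series_id(vm_uuid: str, group: str, name: str, rollup_type: str) -> str:
    """Build '<VM_UUID>:<group>:<name>:<type>' (assumed layout)."""
    return f"{vm_uuid}:{group.lower()}:{to_camel(name)}:{rollup_type.lower()}"


if __name__ == "__main__":
    print(series_id("<VM_UUID>", "DISK", "NUMBER_READ_AVERAGED", "AVERAGE"))
```

Because the identifier is derived purely from values the application already holds, no lookup is needed before writing a sample.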
VM containment and aggregation of real-time samples
• what’s this?
• resource pool vs. instance-based $$
• 20 seconds samples aggregated from VM to VDC top level
• separated tables
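The containment aggregation can be sketched as a group-by on the parent container and the sample timestamp. This is an illustration only: the slides do not say which reduction is applied, so summing is an assumption (it would suit a counter such as CPU USAGE_MHZ), and the identifiers and the `parent_of` mapping are hypothetical.

```python
from collections import defaultdict


def aggregate_to_parent(samples, parent_of):
    """Sum VM samples into their parent container, per timestamp.

    samples:   iterable of (vm_id, timestamp, value)
    parent_of: dict mapping vm_id -> parent id (for example a VDC)
    Returns {(parent_id, timestamp): summed value}.
    """
    totals = defaultdict(float)
    for vm_id, ts, value in samples:
        totals[(parent_of[vm_id], ts)] += value
    return dict(totals)


if __name__ == "__main__":
    samples = [("vm-1", 0, 100.0), ("vm-2", 0, 50.0), ("vm-1", 20, 80.0)]
    parents = {"vm-1": "vdc-a", "vm-2": "vdc-a"}
    print(aggregate_to_parent(samples, parents))
```

Applying the same function again with a VDC-to-ORG mapping would carry the totals one level further up.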
Historical rollups and intervals
• VM, VAPP, VDC, ORG and network
• 1 minute (TTL = 1 year)
• 1 hour (used for billing)
• 1 day
• 1 week
• 1 month
• separated tables
• new performance counter types created
• TTL > 3 years for 1h samples for compliance & billing reasons
• application level responsibilities
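Since rollups are an application-level responsibility, a rollup step can be sketched as bucketing raw samples by interval and reducing each bucket. This is a simplified illustration, not the platform's code: the default reducer (mean) is an assumption, and the Type column in the counter tables (AVERAGE, SUMMATION, LATEST) suggests the real reducer depends on the counter.

```python
from collections import defaultdict
from statistics import mean

ONE_MINUTE_S = 60
ONE_HOUR_S = 3600
ONE_YEAR_TTL_S = 365 * 24 * 3600  # 1 year TTL expressed in seconds


def rollup(samples, interval_s, reducer=mean):
    """Roll (epoch_seconds, value) samples up into fixed intervals.

    Returns a list of (bucket_start, reduced_value) sorted by bucket start.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align every timestamp to the start of its interval.
        buckets[ts - ts % interval_s].append(value)
    return [(start, reducer(values)) for start, values in sorted(buckets.items())]


if __name__ == "__main__":
    raw = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 40.0)]
    print(rollup(raw, ONE_MINUTE_S))     # average per minute
    print(rollup(raw, ONE_HOUR_S, sum))  # summation per hour
```

Each interval would be written to its own table with its own TTL, in line with the "separated tables" bullet.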
1 minute rollups processing
• processed to trigger alerts (usage, billing)
• processed to compute real-time billing
1 hour rollups processing
• processed for final billing computation
• leveraging salesforce.com collected data
Data sources essentially
• compute
• storage
• network
• Management
• users
• cloud configuration
• salesforce.com
• 3rd party services: backups, DR, etc.
• pluggable: add / upgrade / remove services
Cassandra is the sole record keeper
deployment, configuration, hardware and architecture overview
iland cloud platform foundation
• Cisco UCS
• VMware ESXi
• VMware vSphere (management)
• our Cassandra cluster runs on the exact same base foundation as our customer public clouds.
Simplified architecture (each DC)
[Architecture diagram. Components shown: HAProxy, Apache, KeyCloak, Wildfly AS, Postgres, Resteasy API, Wildfly AS cluster, Apache Lucene, NFS, Apache Cassandra (iland cloud Cassandra ring), Redis Sentinel, AMQP, syslog-ng, AngularJS / API. Data sources: compute, storage, network, 3rd parties, Salesforce.]
Cassandra version history
• late 2014: 2.1.x
• early 2014: 2.0.x w/ Java CQL driver
• late 2013: 2.0 beta w/ Astyanax (CQL3) (v1)
• empty cluster
• early 2013: 1.2.x w/ Astyanax (initial proto)
iland’s cassandra cluster overall
• 6 datacenters
• 1 (one) keyspace
• 27 nodes
• 1.5 to 2TB per node (TTL)
• US: Reston, VA; LA, CA; Dallas, TX
• EU: London, UK; Manchester, UK
• Asia: Singapore
Each DC
• 1 or 2 C* rack(s) of 3 Cassandra nodes
• endpoint_snitch: RackInferringSnitch
• RF=3
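RackInferringSnitch derives a node's location from its IPv4 address: the second octet is taken as the datacenter and the third octet as the rack. The sketch below mimics that rule to show what datacenter names a keyspace replication map would then refer to. The IP addresses and the helper names are made up for illustration, and the replication map is an assumed example of RF=3 per datacenter, not the platform's actual keyspace definition.

```python
import ipaddress


def infer_location(ip: str):
    """Return (datacenter, rack) the way RackInferringSnitch infers them:
    2nd octet -> datacenter, 3rd octet -> rack."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return octets[1], octets[2]


def replication_map(node_ips, rf: int = 3):
    """Build a NetworkTopologyStrategy-style map with `rf` replicas per inferred DC."""
    datacenters = sorted({infer_location(ip)[0] for ip in node_ips})
    options = {"class": "NetworkTopologyStrategy"}
    options.update({dc: rf for dc in datacenters})
    return options


if __name__ == "__main__":
    nodes = ["10.20.30.40", "10.20.31.41", "10.50.30.40"]  # hypothetical addresses
    print([infer_location(ip) for ip in nodes])
    print(replication_map(nodes))
```

A consequence of this snitch is that the IP addressing plan of each datacenter has to be fixed before nodes join, since changing a subnet changes the inferred datacenter or rack.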
Each node
• VM
• Ubuntu 14.04 LTS
• Apache Cassandra Open Source distribution
• 32GB of RAM
• 16 CPUs
• 3 disks: system, commit logs, data
Hardware
• Cisco UCS B200 M3
• not very expensive
• Disks
• Initially 10K SAS disks
• now hybrid array (accelerated SSD)
• reads off SSD (75/25)
• boot time
• maintenance ops
• Cassandra CPU and RAM intensive.
• No need to get crazy on disks initially
• C* really runs well on non-SSD
Network
• 1G and 10G lines (currently switching all to 10G)
• Cassandra chatty but performs well in high latency environs
• network usage is pretty much constant
• 25 Mb/s in between DC:
• default C* 2.1 outbound throttle
• Increase when streaming node is needed
• Permanent VPN in between DC (no C* SSL)
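To see why the inter-DC throttle is raised when a node has to be streamed, a rough transfer-time estimate helps. The slide's "25 Mb/s" can be read as megabits or megabytes per second (25 megabytes/s equals 200 megabits/s), so the sketch computes both readings. The 2 TB figure is the upper bound of the per-node data size quoted earlier; compression, parallel streams and protocol overhead are ignored, so treat the numbers as an order of magnitude only.

```python
TB = 10**12  # decimal terabyte, in bytes


def transfer_hours(data_bytes: float, rate_bits_per_s: float) -> float:
    """Hours needed to move `data_bytes` at a sustained `rate_bits_per_s`."""
    return data_bytes * 8 / rate_bits_per_s / 3600


if __name__ == "__main__":
    node_data = 2 * TB
    # Reading 1: 25 megabytes/s, i.e. 200 megabits/s
    print(round(transfer_hours(node_data, 200e6), 1))
    # Reading 2: 25 megabits/s
    print(round(transfer_hours(node_data, 25e6), 1))
```

Either way, streaming a full node across datacenters at the throttled rate takes somewhere between most of a day and about a week, which is consistent with raising the limit temporarily.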
ultimately an API for everything and everywhere
[Diagram: iland ReST API, iland core platform, C* W and C* R in each DC.]
C* R only deployed in: Dallas, TX - London, UK - Singapore
Lessons learned
Tuning Cassandra node: JVM
• Java 8
• MAX_HEAP_SIZE="8G"
• HEAP_NEWSIZE="2G"
• Still using CMS but eager to switch to G1 w/ latest Cassandra version.
• no magic bullet
• test and monitor
• 2.0.x to 2.1.x: had to revisit drastically
Tuning Cassandra node: some config opts
• concurrent_writes / concurrent_reads
• nodetool tpstats
• concurrent_compactors
• nodetool compactionstats
• ++
• auto_snapshot
• batch_size_warn_threshold_in_kb
• monitor
• no magic bullet
• test and monitor
Minimize C* reads (with Redis in our case)
• writes are great / reads are good
• application level optimizations
• 16G of cached data in every DC
• very little in Redis. Bindings and alerts
• in-memory only (no save on disk)
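The read-minimization idea can be sketched as a read-through cache in front of the datastore. To keep the example self-contained, a plain dictionary stands in for Redis and a `loader` callable stands in for the Cassandra read; the class name, the expiry policy and the 300-second default are assumptions made for illustration, not details from the talk.

```python
import time


class ReadThroughCache:
    """In-memory read-through cache with per-entry expiry."""

    def __init__(self, loader, ttl_s: float = 300.0, clock=time.monotonic):
        self._loader = loader    # called on a miss (stand-in for a Cassandra read)
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}       # key -> (expires_at, value); stand-in for Redis
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # fresh hit: no read against the datastore
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl_s, value)
        return value


if __name__ == "__main__":
    cache = ReadThroughCache(lambda vm: f"bindings-for-{vm}")
    cache.get("vm-1")
    cache.get("vm-1")
    print(cache.misses)
```

Since the cache holds only derived data and nothing is saved to disk, losing it costs a warm-up period rather than any records.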
Migration
• went live with 2.1.1 because of UDT
• suggest waiting for at least 5 or 6 dot releases
• 2.0.x / 2.1.x
• have to re-tune the whole cluster
• new features can be an issue initially (drivers)
• Python driver very slow for data migration
Don’ts
• secondary indexes (or make sure you know what you’re doing)
• IN operator
• don’t forget TTL
• no easy way around range deletes
• complex “relational” type of models
Do’s
• design simple data model
• query-driven data model
• writes are cheap: duplicate data to accommodate queries
• prepared statements
• batches
• minimize reads from C*
• UDT
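Several of these points (query-driven tables, duplicated writes, prepared statements, never forgetting the TTL) can be shown together. The sketch below only builds the statements and their bound values; it does not talk to a cluster. The table and column names are hypothetical, and the `?` placeholders follow the style used when a statement is prepared once through a driver and then bound many times.

```python
ONE_YEAR_S = 365 * 24 * 3600

# One table per query pattern (hypothetical schema): the same sample is written twice.
INSERT_BY_SERIES = (
    "INSERT INTO samples_by_series (series_id, ts, value) "
    "VALUES (?, ?, ?) USING TTL ?"
)
INSERT_LATEST_BY_VM = (
    "INSERT INTO latest_by_vm (vm_uuid, series_id, ts, value) "
    "VALUES (?, ?, ?, ?) USING TTL ?"
)


def writes_for_sample(vm_uuid, series_id, ts, value, ttl_s=ONE_YEAR_S):
    """Return (statement, bound values) pairs for one incoming sample.

    Every write carries an explicit TTL so that no row is stored forever by accident.
    """
    return [
        (INSERT_BY_SERIES, (series_id, ts, value, ttl_s)),
        (INSERT_LATEST_BY_VM, (vm_uuid, series_id, ts, value, ttl_s)),
    ]


if __name__ == "__main__":
    for statement, values in writes_for_sample("vm-1", "vm-1:cpu:usage:average", 0, 42.0):
        print(statement, values)
```

The trade-off is extra storage and write volume in exchange for reads that hit exactly one partition, which matches the "writes are cheap" bullet.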
#pain
• bootstrapping new DC
• streaming very hard to complete OK w/ 2.0
• temp node tuning during streaming
• Cassandra 2.2 should help with bootstrap resume
• repairs
• very long and costly op
• incremental repairs broken until late 2.1.x
future platform extensions
Issue with in-app server aggregations and rollups
• JEE container works great but…
• lack of traceability / monitoring around jobs
• separation of concerns
• need to minimize reads against Cassandra
• in-memory computation
• code base growing fast (200k+ Java loc)
Spark for aggregations and rollups
• tackling issues in previous slides
• multiple new use cases:
• for instance, heavy throughput data for network analysis
• machine learning
• Kafka & Spark Streaming
• currently experimenting
Multiple Keyspaces
• compliance / data isolation
• lower network traffic
Thank you

Más contenido relacionado

La actualidad más candente

Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...DataStax Academy
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandraAaron Ploetz
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
Beginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsBeginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsDataStax Academy
 
Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...DataStax Academy
 
PagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresPagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresDataStax Academy
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsAcunu
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesPhil Peace
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Sparknickmbailey
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... CassandraInstaclustr
 
Understanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraDataStax
 
The Automation Factory
The Automation FactoryThe Automation Factory
The Automation FactoryNathan Milford
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 

La actualidad más candente (20)

Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandra
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
Beginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsBeginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra Ops
 
Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...
 
PagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresPagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra Failures
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problems
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
 
Apache cassandra v4.0
Apache cassandra v4.0Apache cassandra v4.0
Apache cassandra v4.0
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Understanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache Cassandra
 
The Automation Factory
The Automation FactoryThe Automation Factory
The Automation Factory
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 

Similar a Leveraging Cassandra for real-time multi-datacenter public cloud analytics

Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenParticular Software
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AITyrone Systems
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Fwdays
 
Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster inwin stack
 
"Clouds on the Horizon Get Ready for Drizzle" by David Axmark @ eLiberatica 2009
"Clouds on the Horizon Get Ready for Drizzle" by David Axmark @ eLiberatica 2009"Clouds on the Horizon Get Ready for Drizzle" by David Axmark @ eLiberatica 2009
"Clouds on the Horizon Get Ready for Drizzle" by David Axmark @ eLiberatica 2009eLiberatica
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSAmazon Web Services
 
Containers orchestrators: Docker vs. Kubernetes
Containers orchestrators: Docker vs. KubernetesContainers orchestrators: Docker vs. Kubernetes
Containers orchestrators: Docker vs. KubernetesDmitry Lazarenko
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stackNitin Mehta
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterPatrick Quairoli
 
How to Build a Multi-DC Cassandra Cluster in AWS with OpsCenter LCM
How to Build a Multi-DC Cassandra Cluster in AWS with OpsCenter LCMHow to Build a Multi-DC Cassandra Cluster in AWS with OpsCenter LCM
How to Build a Multi-DC Cassandra Cluster in AWS with OpsCenter LCMAnant Corporation
 
Swift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer StorySwift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer StoryBrian Cline
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverviewDimas Prasetyo
 
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)Cédrick Lunven
 

Similar a Leveraging Cassandra for real-time multi-datacenter public cloud analytics (20)

Devops kc
Devops kcDevops kc
Devops kc
 
Server 2016 sneak peek
Server 2016 sneak peekServer 2016 sneak peek
Server 2016 sneak peek
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves Goeleven
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AI
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
Accelerated SDN in Azure
Accelerated SDN in AzureAccelerated SDN in Azure
Accelerated SDN in Azure
 
Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster
 
"Clouds on the Horizon Get Ready for Drizzle" by David Axmark @ eLiberatica 2009
"Clouds on the Horizon Get Ready for Drizzle" by David Axmark @ eLiberatica 2009"Clouds on the Horizon Get Ready for Drizzle" by David Axmark @ eLiberatica 2009
"Clouds on the Horizon Get Ready for Drizzle" by David Axmark @ eLiberatica 2009
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECS
 
Containers orchestrators: Docker vs. Kubernetes
Containers orchestrators: Docker vs. KubernetesContainers orchestrators: Docker vs. Kubernetes
Containers orchestrators: Docker vs. Kubernetes
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stack
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage Cluster
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
How to Build a Multi-DC Cassandra Cluster in AWS with OpsCenter LCM
How to Build a Multi-DC Cassandra Cluster in AWS with OpsCenter LCMHow to Build a Multi-DC Cassandra Cluster in AWS with OpsCenter LCM
How to Build a Multi-DC Cassandra Cluster in AWS with OpsCenter LCM
 
Swift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer StorySwift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer Story
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview
 
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
 

Último

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Último (20)

(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...

Leveraging Cassandra for real-time multi-datacenter public cloud analytics

  • 1. Leveraging C* for real-time multi-dc public cloud analytics Julien Anguenot VP Software Engineering @anguenot
  • 2. (1) iland cloud story & use case (2) data & domain constraints (3) deployment, hardware, configuration and architecture overview (4) lessons learned (5) future platform extensions
  • 3. iland cloud story & use case
  • 4. Who are we? • public, private, DRaaS, BaaS cloud provider • Cisco CMSP • VMware VSPP for 7+ years • 20+ years in business • HQ in Houston, TX • http://www.iland.com
  • 5. Yet another cloud provider? Well, … • performance and stability • custom SLA • compliance • security • DRaaS • global datacenter footprint: US, UK and Singapore • dedicated support staff! • iland cloud platform, Web management console and API
  • 6. The iland cloud platform
  • 7. iland cloud platform essentially • data warehouse running across multiple data-centers • monitoring (resource consumption / performance) • billing (customers and internal use) • alerting • predictive analytics • cloud management • cloud services (backups, DR, etc.) • desktop and mobile management consoles • API • Cassandra powered!
  • 8. The iland cloud Web management console
  • 18. So, why did we do all this? • Initial motivations (v1) • vendor software (VMware vCloud Director) lacking: • performance analytics (real-time and historical) • billing • alerts • cross datacenter visibility • more private cloud type transparency • abstract ourselves from vendors and integrate an umbrella of heterogeneous services • modern UX and good-looking UI
  • 19. data and domain constraints
  • 20. Constraints • write latency • high throughput • precision (used for billing) • availability • multi-data center • scalability: tens of thousands of VMs • agent-less • pull/poll vs push • high latency environs (multi-dc)
  • 21. Pipeline • collection of real-time data • store • aggregation • correlation • rollups (historical) • processing • alerting • billing • reporting • querying
  • 22. Real-time collected perf counters • 20-second samples • compute, storage, network • 15+ perf counters collected • ~50 data points per minute and per VM • time series • (timestamp, value) • metadata • unit • interval • etc. • 1 year TTL
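
As a rough check on the figures in slide 22, here is a minimal Python sketch. It assumes exactly 15 counters per VM (the slide says "15+", so this is a lower bound) and one sample per counter every 20 seconds. The function names are illustrative only.

```python
SAMPLE_INTERVAL_SECONDS = 20  # from the slide
COUNTERS_PER_VM = 15          # assumption: the slide says "15+", so a lower bound


def data_points_per_minute(counters: int, interval_seconds: int) -> int:
    """Number of (timestamp, value) samples one VM produces per minute."""
    samples_per_counter = 60 // interval_seconds
    return counters * samples_per_counter


def data_points_per_day(counters: int, interval_seconds: int) -> int:
    """Same figure scaled to a full day (1440 minutes)."""
    return data_points_per_minute(counters, interval_seconds) * 60 * 24


per_minute = data_points_per_minute(COUNTERS_PER_VM, SAMPLE_INTERVAL_SECONDS)
per_day = data_points_per_day(COUNTERS_PER_VM, SAMPLE_INTERVAL_SECONDS)
print(per_minute, per_day)
```

With 15 counters this gives 45 points per minute and per VM, in line with the "~50" on the slide once the extra counters are included.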
  • 23. VM CPU 20-second perf counters (Group / Name / Type) • CPU / USAGE / AVERAGE • CPU / USAGE_MHZ / AVERAGE • CPU / READY / SUMMATION
  • 24. VM memory 20-second perf counters (Group / Name / Type) • MEM / ACTIVE / AVERAGE • MEM / CONSUMED / AVERAGE • MEM / VM_MEM_CTRL / SUMMATION
  • 25. VM network 20-second perf counters (Group / Name / Type) • NET / RECEIVED / AVERAGE • NET / TRANSMITTED / AVERAGE • NET / USAGE / AVERAGE
  • 26. VM disk 20-second perf counters (Group / Name / Type) • DISK / READ / AVERAGE • DISK / WRITE / AVERAGE • DISK / MAX_TOTAL_LATENCY / LATEST • DISK / USAGE / AVERAGE • DISK / PROVISIONED / LATEST • DISK / USED / LATEST • DISK / NUMBER_WRITE_AVERAGED / AVERAGE • DISK / NUMBER_READ_AVERAGED / AVERAGE
  • 27. More counters collected for 3rd party services
  • 28. VM to time series bindings • binding on VM UUID • series UUID • <VM_UUID>:disk:numberReadAveraged:average • Simple, fast and easy to construct at application level.
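
The binding format on slide 28 can be sketched in Python as plain string construction. The helper names `series_key` and `parse_series_key` are hypothetical, and the sample UUID is made up; the slides do not show how the platform actually builds or stores these keys.

```python
import uuid


def series_key(vm_uuid: uuid.UUID, group: str, name: str, rollup_type: str) -> str:
    """Build a key shaped like '<VM_UUID>:disk:numberReadAveraged:average'."""
    return f"{vm_uuid}:{group}:{name}:{rollup_type}"


def parse_series_key(key: str) -> tuple:
    """Split a key back into (VM UUID, group, name, type)."""
    vm, group, name, rollup_type = key.split(":")
    return uuid.UUID(vm), group, name, rollup_type


vm = uuid.UUID("12345678-1234-5678-1234-567812345678")  # made-up VM UUID
key = series_key(vm, "disk", "numberReadAveraged", "average")
print(key)
```

Because the key is derived only from values the application already holds, no lookup is needed to find the series for a given VM and counter, which matches the "easy to construct at application level" remark.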
  • 31. VM containment and aggregation of real-time samples • what’s this? • resource pool vs instance-based $$ • 20-second samples aggregated from VM to VDC top level • separate tables
  • 32. Historical rollups and intervals • VM, VAPP, VDC, ORG and network • 1 minute (TTL = 1 year) • 1 hour (used for billing) • 1 day • 1 week • 1 month • separate tables • new performance counter types created • TTL > 3 years for 1h samples for compliance & billing reasons • application level responsibilities
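
A minimal Python sketch of what an application-level rollup from 20-second samples to 1-minute buckets might look like. This is an illustration under assumptions, not the platform's code: samples are taken to be (epoch seconds, value) pairs, and the `combine` parameter stands in for the AVERAGE and SUMMATION counter types listed earlier.

```python
from collections import defaultdict
from statistics import mean


def rollup(samples, bucket_seconds=60, combine=mean):
    """Group (epoch_seconds, value) samples into fixed-width buckets.

    Returns a list of (bucket_start, combined_value) pairs sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, combine(values)) for start, values in sorted(buckets.items())]


# Five 20-second samples spanning two minutes (made-up values).
samples = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 40.0), (80, 60.0)]
one_minute_avg = rollup(samples)               # AVERAGE-type counter
one_minute_sum = rollup(samples, combine=sum)  # SUMMATION-type counter
print(one_minute_avg, one_minute_sum)
```

The same function can be reapplied with `bucket_seconds=3600` to the 1-minute output to produce hourly values, although averaging averages is only exact when every bucket holds the same number of samples.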
  • 33. 1 minute rollups processing • processed to trigger alerts (usage, billing) • processed to compute real-time billing
  • 34. 1 hour rollups processing • processed for final billing computation • leveraging salesforce.com collected data
  • 35. Data sources essentially • compute • storage • network • management • users • cloud configuration • salesforce.com • 3rd party services: backups, DR, etc. • pluggable: add / upgrade / remove services
  • 36. Cassandra is the sole record keeper
  • 38. iland cloud platform foundation • Cisco UCS • VMware ESXi • VMware vSphere (management) • our Cassandra cluster runs on the exact same base foundation as our customer public clouds.
  • 39. Simplified architecture (each DC) [diagram labels: HAProxy Apache KeyCloak Wildfly AS Postgres Wildfly AS Resteasy API Wildfly AS cluster Apache Lucene NFS Apache Cassandra Compute Storage Network + 3rd parties Salesforce iland cloud Cassandra ring API AngularJS / API Redis Sentinel AMQP syslog-ng]
  • 40. Cassandra version history • late 2014: 2.1.x • early 2014: 2.0.x w/ Java CQL driver • late 2013: 2.0 beta w/ Astyanax (CQL3) (v1) • empty cluster • early 2013: 1.2.x w/ Astyanax (initial proto)
  • 41. iland’s Cassandra cluster overall • 6 datacenters • 1 (one) keyspace • 27 nodes • 1.5 to 2 TB per node (TTL)
  • 43. Each DC • 1 or 2 C* rack(s) of 3 Cassandra nodes • endpoint_snitch: RackInferringSnitch • RF=3
  • 45. Each node • VM • Ubuntu 14.04 LTS • Apache Cassandra Open Source distribution • 32GB of RAM • 16 CPUs • 3 disks: system, commit logs, data
  • 46. Hardware • Cisco UCS B200 M3 • not very expensive • Disks • initially 10K SAS disks • now hybrid array (accelerated SSD) • reads off SSD (75/25) • boot time • maintenance ops • Cassandra is CPU and RAM intensive • no need to get crazy on disks initially • C* really runs well on non-SSD
  • 47. Network • 1G and 10G lines (currently switching all to 10G) • Cassandra is chatty but performs well in high latency environs • network usage is pretty much constant • 25 Mb/s between DCs: • default C* 2.1 outbound throttle • increase when a node needs to be streamed • permanent VPN between DCs (no C* SSL)
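
To see why the throttle is raised when a node has to be streamed, a back-of-the-envelope Python sketch helps. It assumes a full 1.5 TB node (the low end of the per-node figure given earlier) crosses a single inter-DC link at the sustained throttle rate. The 200 Mb/s comparison value is purely illustrative and does not come from the slides.

```python
def streaming_hours(data_terabytes: float, throughput_megabits_per_sec: float) -> float:
    """Hours needed to move `data_terabytes` at a sustained link rate.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mb = 10**6 bits) and ignores
    compression, protocol overhead and parallel streams.
    """
    bits = data_terabytes * 10**12 * 8
    seconds = bits / (throughput_megabits_per_sec * 10**6)
    return seconds / 3600


hours_at_25 = streaming_hours(1.5, 25)    # throttle value quoted on the slide
hours_at_200 = streaming_hours(1.5, 200)  # illustrative higher setting (assumption)
print(round(hours_at_25, 1), round(hours_at_200, 1))
```

Under these assumptions, 1.5 TB at 25 Mb/s takes roughly five and a half days, which is consistent with the bootstrap and streaming pain described later in the deck.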
  • 49. ultimately an API for everything and everywhere
  • 50. [diagram labels: C* W iland ReST API iland core platform iland core platform iland ReST API C* R C* R C* W C* R] only deployed in: Dallas, TX - London, UK - Singapore
  • 52. Tuning Cassandra node: JVM • Java 8 • MAX_HEAP_SIZE="8G" • HEAP_NEWSIZE="2G" • still using CMS but eager to switch to G1 w/ latest Cassandra version • no magic bullet • test and monitor • 2.0.x to 2.1.x: had to revisit drastically
  • 53. Tuning Cassandra node: some config opts • concurrent_writes / concurrent_reads • nodetool tpstats • concurrent_compactors • nodetool compactionstats • ++ • auto_snapshot • batch_size_warn_threshold_in_kb • monitor • no magic bullet • test and monitor
  • 54. Minimize C* reads (with Redis in our case) • writes are great / reads are good • application level optimizations • 16G of cached data in every DC • very little in Redis. Bindings and alerts • in-memory only (no save on disk)
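
The read-minimizing idea on slide 54 is commonly implemented as a cache-aside lookup. The Python sketch below shows the pattern with stand-ins only: a dict plays the role of Redis and a plain callable plays the role of a Cassandra read. The class name, the `store_reads` counter and the sample data are all hypothetical.

```python
class ReadThroughCache:
    """Cache-aside lookup: check memory first, fall back to the slow store."""

    def __init__(self, load_from_store):
        self._load = load_from_store  # callable standing in for a Cassandra read
        self._cache = {}              # dict standing in for Redis
        self.store_reads = 0          # counts how often the slow store is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.store_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, e.g. after the underlying record changes."""
        self._cache.pop(key, None)


# Made-up binding data: VM id -> list of series keys.
bindings = {"vm-1": ["vm-1:cpu:usage:average"]}
cache = ReadThroughCache(bindings.get)
first = cache.get("vm-1")
second = cache.get("vm-1")
print(cache.store_reads)
```

Since the cached data can always be reloaded from the system of record, keeping the cache in memory only, as the slide describes, costs nothing more than a cold start after a restart.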
  • 55. Migration • went live with 2.1.1 because of UDT • suggest waiting for at least 5 or 6 dot releases • 2.0.x / 2.1.x • have to re-tune the whole cluster • new features can be an issue initially (drivers) • Python driver very slow for data migration
  • 56. Don’ts • secondary indexes (or make sure you know what you’re doing) • IN operator • don’t forget TTL • no easy way around range deletes • complex “relational” type of models
  • 57. Do’s • design simple data model • query-driven data model • writes are cheap: duplicate data to accommodate queries • prepared statements • batches • minimize reads from C* • UDT
  • 58. #pain • bootstrapping new DC • streaming very hard to complete OK w/ 2.0 • temp node tuning during streaming • Cassandra 2.2 should help with bootstrap resume • repairs • very long and costly op • incremental repairs broken until late 2.1.x
  • 60. Issue with in-app server aggregations and rollups • JEE container works great but… • lack of traceability / monitoring around jobs • separation of concerns • need to minimize reads against Cassandra • in-memory computation • code base growing fast (200k+ Java LOC)
  • 61. Spark for aggregations and rollups • tackling issues in previous slides • multiple new use cases: • for instance, heavy throughput data for network analysis • machine learning • Kafka & Spark Streaming • currently experimenting
  • 62. Multiple Keyspaces • compliance / data isolation • lower network traffic