©2013 DataStax Confidential. Do not distribute without consent.
Benjamin Coverston
DSE Architect, DataStax Inc.
NoSQL, Big Data, and Real Time
Monday, September 2, 13
Who am I?
• Ben Coverston
• DSE Architect
• DataStax since 2010
• Previous Experience in the Travel Industry
• Low Cost Airlines / Web Reservations
• Past: HP / Accenture
• Lived in Santa Catarina for a few years.
What is it?
NoSQL
What is NoSQL?
NoSQL is a term coined by Carlo Strozzi and
repurposed by Eric Evans to refer to “some”
storage systems. The NoSQL term should be used
as in the Not-Only-SQL and not as No to SQL or
Never SQL.
-- Alex Popescu
What is NoSQL (Cont.)
• It’s not:
• No to SQL
• About performance
• About scaling
• ACID
• Eventual consistency
• Volume
• It is:
• About choice
Diversity in Data
• Big Data has the 3 (or 4 or 5) V’s
• Volume
• Variety
• Velocity
• Variability (sometimes)
• Value (other times)
Diversity in Data
• The V’s don’t cover everything
• Availability is important
• Your use case is important too
Is NoSQL Big Data?
• You can store Big Data with an RDBMS
• Is it easy?
• Is it cost effective?
• What kind of compromises do you have to make?
The Problems
• In general there are two classes of data problems
• OLTP (Real-Time)
• Analytics (Batch)
• Usually you want both
• No solution is perfect for everyone
• Popularity is no indication of fitness
Use Cases
• OLTP
• Low Latency
• High Throughput
• LOB Applications
• Batch
• Predictive Models
• Complex Queries
• Tomorrow (or precalculated, but now we need OLTP)
Where to put your ‘Stuff’
• Sharded RDBMS
• MPP -- Greenplum, Teradata
• Hadoop
• Key/Value
• Columnar
• Other
Why not just one?
• Analytics
• Optimize serial IO
• Limitations in Storage
• OLTP
• Working Set
• Distribution
• Availability
• Storage Medium
Why Do We Need Something Else?
• ACID semantics are often overkill
• ACID also makes the database layer brittle
• This means you get less Availability (CAP Theorem)
The Application Stack
• www.example.com
• Load balancers: LB1, LB2, LB3
• Web servers: ws1 through ws9
• Cache
• Databases: DB 1, 2, 3, 4
Sharding
• Storage Limitations
• Working Set
• So just make more!
Monday, September 2, 13
(“The eBay Architecture,” Randy Shoup and Dan Pritchett)
But Sharding
• Is Painful
• Requires ‘something else’
• Most NoSQL solutions auto-shard
• Sharding requires tradeoffs.
• Which means your application will need to change
Which should I choose?
• Analytics
• Hadoop (probably) if your data is big
• Spark, other (sometimes faster) solutions available now
• NoSQL
• Let’s talk!
Decisions are about tradeoffs, never a zero-sum game
Fast, Cheap, Good -- Choose Two
CAP Theorem
• More of two, less of one
• Consistency
• Availability
• Partition tolerance
• You have to accept P
• That leaves C and A
How To Scale Anything
• Partition By Function
• Split Horizontally
• Avoid Distributed Transactions
• Avoid Synchronous Coupling
• Virtualize Everywhere
• Cache Everything
Partition By Function
• Don’t put everything in the same database
• Physical
• Pools of Machines
• Geographical Distribution
• Automatic sharding (look for this)
• Make sure it works!
• Virtual
• Logical Tables, Schema
• Not 100% necessary, but schema is nice
Partitioning (cont.)
• Pros
• Isolate failure
• To a region
• To a service
• Simplify Failover
• Cons
• Your DB has to handle multi-region replication
• If you chose CP (CAP) you’re going to have a bad time
• AP systems do OK here (Cassandra actually excels)
• “Relational” part of databases becomes complex
• Everything gets denormalized
Split Horizontally
• Scaling Vertically is easy
• To a point, then it gets expensive... fast.
• Easy if your system has no state to maintain
• Or if the states are known, and small
• Sharding over dependent fields complicates design
• Some things distribute themselves easily
• key/value stores
• Others not so much
• BTree indexes, foreign keys
• P2P architecture is helpful when splitting
• In other words, avoid masters
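One way to see why key/value data distributes easily without a master is a consistent-hash ring. The sketch below is illustrative only: `HashRing`, the node names, and the `vnodes` count are hypothetical, and it does not reproduce the partitioner of any particular database. Every node can compute key ownership on its own, and adding a node moves only the keys that the new node takes over.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a ring position (SHA-256 used only to spread keys)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; no master is needed to locate a key."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._tokens = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node claims several positions so load spreads more evenly.
        for i in range(self.vnodes):
            t = _token(f"{node}#{i}")
            self._owner[t] = node
            bisect.insort(self._tokens, t)

    def node_for(self, key: str) -> str:
        if not self._tokens:
            raise ValueError("ring has no nodes")
        # The first ring position clockwise from the key's token owns the key.
        idx = bisect.bisect_right(self._tokens, _token(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Only keys that now belong to the new node change owner.
print(f"{len(moved)} of {len(keys)} keys moved")
```

Contrast this with `hash(key) % node_count`, where changing the node count remaps most keys.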
Split Horizontally (cont)
• Pros
• Can be as fast or faster than traditional design
• Can scale up as long as you can afford more machines
• Scaling is easy if you avoid having masters
• Replication and failover don’t have to be special cases
• Cons
• Even logical pieces of your app are distributed over many machines
• example: your catalog is not all in one place
• Real time analytics is difficult, or slower
Avoid Distributed Transactions
• Have you tried this?
• Hard to do right
• Paxos gives us some hope
• CAS in Cassandra 2.0 looks promising
• Even then, it’s not good for everything!
• MVCC works for many use cases
• Compensating Mechanisms
• Customer Service (Amazon, inventory)
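As a rough illustration of the compare-and-set (CAS) idea mentioned above, here is an in-memory sketch. `VersionedStore` and `reserve_seat` are hypothetical names, and this is not Cassandra's API; in a real cluster the check would be coordinated across replicas (for example with Paxos), which is where the cost comes from. The point is only that a write succeeds if nobody else wrote first, with no transaction spanning multiple keys.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a store offering compare-and-set on one key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}   # key -> (version, value)

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, new_value):
        """Write only if nobody else has written since expected_version."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False
            self._data[key] = (current_version + 1, new_value)
            return True


def reserve_seat(store, seat, passenger):
    """Claim a seat if it is free; report failure instead of blocking."""
    version, holder = store.read(seat)
    if holder is not None:
        return False
    return store.compare_and_set(seat, version, passenger)


store = VersionedStore()
print(reserve_seat(store, "12A", "alice"))  # True: first writer wins
print(reserve_seat(store, "12A", "bob"))    # False: seat already held
```

When the CAS fails, the caller decides what to do next (retry, offer another seat, or hand off to a compensating process) rather than the database blocking everyone.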
Avoid Distributed Transactions
• Pros
• Consistency in a distributed environment
• Cons
• Slow
• Overkill
• Did I say slow?
• We chose CP so we get less A
• What happens when they don’t succeed?
• Do we shut the whole thing down?
Avoid Synchronous Coupling
• What?
• A or B can be down
• A can be down, B continues to work
• B can suffer, while A continues to work
• If your recommendation engine fails, your customers can still buy stuff!
• Master/Slave failover is a good example of synchronous coupling
• Master is down, slave needs to take over, but in the meantime.. what happens?
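The recommendation-engine example above can be sketched as follows. All function names are hypothetical stand-ins, and the recommendation call is hard-coded to fail so the degraded path is visible. The critical path (placing the order) does not depend on the optional service being up.

```python
def fetch_recommendations(customer_id):
    """Stand-in for a call to a separate recommendation service (hypothetical)."""
    raise ConnectionError("recommendation service is down")


def place_order(customer_id, items):
    """Stand-in for the critical path (hypothetical)."""
    return {"customer": customer_id, "items": list(items), "status": "confirmed"}


def checkout(customer_id, items):
    order = place_order(customer_id, items)
    try:
        order["recommendations"] = fetch_recommendations(customer_id)
    except (ConnectionError, TimeoutError):
        # Service B is down; service A keeps working with a degraded page.
        order["recommendations"] = []
    return order


result = checkout("c-42", ["sku-1", "sku-2"])
print(result["status"], result["recommendations"])  # confirmed []
```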
Avoid Synchronous Coupling
• Pros
• Fewer shared dependencies means less failure
• Less failure means more total uptime
• For the whole
• Less coupling means that your application topology is more modular
• Introducing new, decoupled services is less risky
• Cons
• More duplication of your infrastructure
• e.g. now you have an application stack for each of your services.
Avoid Synchronous Processing Flows
• AKA
• Blocking Sockets
• Serialized Processes
• Locking in General
• Do what is important FIRST
• Take their money
• Modify Inventory
• Other less important stuff can be queued
• Triggers
• Joins
• Stored Procedures
• Consistency Checks
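A minimal sketch of "do what is important first, queue the rest", using only the Python standard library. The task names and `handle_purchase` are hypothetical, and a production system would more likely use a durable queue than an in-process one. The critical steps run inline; the nice-to-haves are handed to a background worker.

```python
import queue
import threading

deferred = queue.Queue()
log = []


def worker():
    """Drain deferred tasks in the background until a None sentinel arrives."""
    while True:
        task = deferred.get()
        if task is None:
            deferred.task_done()
            break
        name, func = task
        func()
        log.append(f"deferred: {name}")
        deferred.task_done()


def handle_purchase(order_id):
    # Critical steps run first, inline.
    log.append(f"charged {order_id}")
    log.append(f"inventory updated {order_id}")
    # Nice-to-haves are queued and do not block the response.
    deferred.put(("consistency check", lambda: None))
    deferred.put(("send receipt email", lambda: None))
    return "ok"


t = threading.Thread(target=worker, daemon=True)
t.start()
status = handle_purchase("order-1")
deferred.put(None)   # tell the worker to stop once the queue is drained
deferred.join()
t.join()
print(status, log)
```

Swapping `queue.Queue` for `queue.PriorityQueue` is one way to give some deferred tasks precedence over others.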
Avoid Synchronous Processing Flows
• Pros
• Critical operations will not block for nice to haves
• Easy to monitor queues and assign priorities to tasks
• Problem areas are easier to identify
• Cons
• Race conditions
• More up front development cost
Virtualize Everything
• DON’T
• Pick your database because it has a sexy API
• Pick your database because it worked for somebody else
• DO
• Pick a database that will fit with your use case
• Virtualize your data model
• Encourage manipulation of your logical models
• DO NOT force interaction with your database
• Good virtualization means that you can change your data store later...
• And most of your code will still work.
Virtualize Everything
• Virtualization isn’t just for the programmer
• Things fall apart
• Requests have to be re-routed
• Parts Replaced
• APIs change
• Good virtualization means you can make changes without impacting availability
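One common way to virtualize a data model is to have application code depend on a small interface instead of on a database driver. The sketch below assumes a hypothetical `CustomerStore` interface with an in-memory backend; the names are illustrative and not taken from the talk. Replacing the backend later should leave `rename_customer` untouched.

```python
from abc import ABC, abstractmethod


class CustomerStore(ABC):
    """Logical model the application codes against (hypothetical interface)."""

    @abstractmethod
    def save(self, customer_id: str, profile: dict) -> None: ...

    @abstractmethod
    def load(self, customer_id: str) -> dict: ...


class InMemoryCustomerStore(CustomerStore):
    """One backend; a Cassandra- or RDBMS-backed class could replace it."""

    def __init__(self):
        self._rows = {}

    def save(self, customer_id, profile):
        self._rows[customer_id] = dict(profile)

    def load(self, customer_id):
        return dict(self._rows[customer_id])


def rename_customer(store: CustomerStore, customer_id: str, new_name: str) -> dict:
    """Application logic touches only the logical model, never the database."""
    profile = store.load(customer_id)
    profile["name"] = new_name
    store.save(customer_id, profile)
    return profile


store = InMemoryCustomerStore()
store.save("c-1", {"name": "Ana"})
print(rename_customer(store, "c-1", "Ana Souza"))
```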
Cache Appropriately
• You can’t cache everything
• But you can cache stuff that doesn’t change
• Or is expensive to retrieve
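A small cache-aside sketch for data that rarely changes or is expensive to retrieve. `TTLCache` and `load_product` are hypothetical, and the 60-second expiry is an arbitrary choice standing in for whatever staleness the use case tolerates. The application checks the cache first and only falls through to the slow store on a miss.

```python
import time


class TTLCache:
    """Tiny cache-aside helper (hypothetical); entries expire after ttl seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]                 # hit
        value = loader(key)                 # miss: go to the slow store
        self._entries[key] = (now + self.ttl, value)
        return value


calls = []


def load_product(key):
    """Stand-in for an expensive database read."""
    calls.append(key)
    return {"sku": key, "name": "example"}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("sku-1", load_product)
second = cache.get_or_load("sku-1", load_product)
print(first == second, len(calls))  # True 1
```

The expiry bounds how stale an entry can get, which is one simple answer to the coherence problem; it does nothing for the "what if it all doesn't fit?" question, which needs an eviction policy.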
Cache Appropriately
• Pros
• Cache is fast (compared to traditional RDBMS access)
• Can give you a performance buffer
• Cons
• Cache Coherence
• Cache Dependency
• Is it a SPOF?
• What if it all doesn’t fit?
What about NoSQL?
• All of this applies
• Evaluate Products on their Strengths
• If easy things are easy
• The hard might be impossible
• Pick something that makes the hard things possible
What are the ‘easy’ things?
• Serialization Formats
• JSON/BSON
• Data Models
• HTTP/REST/JSON APIs
• NodeJS Drivers!
• etc.
The ‘hard’ things
• Automatic Sharding
• Where does the data go?
• How do I find it?
• How do I add another?
• Multi DC
• Replication
• No SPOF
• Anti-Entropy
• Continuous Availability
• Upgrades
• Failure
• Etc.
What should you use?
• Your decision
• Every database is not a fit for every problem.
DataStax Enterprise
• DSE
• Cassandra (OLTP)
• Analytics
• Search
• The hard things are possible
• We’re making the easy things easier

Más contenido relacionado

La actualidad más candente

Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15Derek Ashmore
 
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemDownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemFITC
 
DevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of OperationsDevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of OperationsRandy Shoup
 
Building Enterprise Grade Front-End Applications with JavaScript Frameworks
Building Enterprise Grade Front-End Applications with JavaScript FrameworksBuilding Enterprise Grade Front-End Applications with JavaScript Frameworks
Building Enterprise Grade Front-End Applications with JavaScript FrameworksFITC
 
Scaling on DigitalOcean
Scaling on DigitalOceanScaling on DigitalOcean
Scaling on DigitalOceandavid_e_worth
 
Getting 100B Metrics to Disk
Getting 100B Metrics to DiskGetting 100B Metrics to Disk
Getting 100B Metrics to Diskjthurman42
 
Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)guest0f8e278
 
Big Data! Great! Now What? #SymfonyCon 2014
Big Data! Great! Now What? #SymfonyCon 2014Big Data! Great! Now What? #SymfonyCon 2014
Big Data! Great! Now What? #SymfonyCon 2014Ricard Clau
 
Scaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsScaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsAchievers Tech
 
Integrating multiple CDNs at Etsy
Integrating multiple CDNs at EtsyIntegrating multiple CDNs at Etsy
Integrating multiple CDNs at EtsyLaurie Denness
 
Cloud hosting your ePortfolio
Cloud hosting your ePortfolioCloud hosting your ePortfolio
Cloud hosting your ePortfolioMahara Hui
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?CQD
 
Stig: Social Graphs & Discovery at Scale
Stig: Social Graphs & Discovery at ScaleStig: Social Graphs & Discovery at Scale
Stig: Social Graphs & Discovery at ScaleDATAVERSITY
 
Effective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldEffective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldRandy Shoup
 
Cvcc performance tuning
Cvcc performance tuningCvcc performance tuning
Cvcc performance tuningJohn McCaffrey
 
An Iterative Approach to Service Oriented Architecture
An Iterative Approach to Service Oriented ArchitectureAn Iterative Approach to Service Oriented Architecture
An Iterative Approach to Service Oriented ArchitectureEric Saxby
 
Windycityrails page performance
Windycityrails page performanceWindycityrails page performance
Windycityrails page performanceJohn McCaffrey
 
50 Shades of Fail KScope16
50 Shades of Fail KScope1650 Shades of Fail KScope16
50 Shades of Fail KScope16Christian Berg
 
Freelancing and side-projects on Rails
Freelancing and side-projects on RailsFreelancing and side-projects on Rails
Freelancing and side-projects on RailsJohn McCaffrey
 

La actualidad más candente (20)

Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15
 
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemDownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
 
DevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of OperationsDevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of Operations
 
Building Enterprise Grade Front-End Applications with JavaScript Frameworks
Building Enterprise Grade Front-End Applications with JavaScript FrameworksBuilding Enterprise Grade Front-End Applications with JavaScript Frameworks
Building Enterprise Grade Front-End Applications with JavaScript Frameworks
 
Scaling on DigitalOcean
Scaling on DigitalOceanScaling on DigitalOcean
Scaling on DigitalOcean
 
Getting 100B Metrics to Disk
Getting 100B Metrics to DiskGetting 100B Metrics to Disk
Getting 100B Metrics to Disk
 
Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)
 
Big Data! Great! Now What? #SymfonyCon 2014
Big Data! Great! Now What? #SymfonyCon 2014Big Data! Great! Now What? #SymfonyCon 2014
Big Data! Great! Now What? #SymfonyCon 2014
 
Scaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsScaling High Traffic Web Applications
Scaling High Traffic Web Applications
 
Integrating multiple CDNs at Etsy
Integrating multiple CDNs at EtsyIntegrating multiple CDNs at Etsy
Integrating multiple CDNs at Etsy
 
Cloud hosting your ePortfolio
Cloud hosting your ePortfolioCloud hosting your ePortfolio
Cloud hosting your ePortfolio
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?
 
Stig: Social Graphs & Discovery at Scale
Stig: Social Graphs & Discovery at ScaleStig: Social Graphs & Discovery at Scale
Stig: Social Graphs & Discovery at Scale
 
Effective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldEffective Microservices In a Data-centric World
Effective Microservices In a Data-centric World
 
Stackato v3
Stackato v3Stackato v3
Stackato v3
 
Cvcc performance tuning
Cvcc performance tuningCvcc performance tuning
Cvcc performance tuning
 
An Iterative Approach to Service Oriented Architecture
An Iterative Approach to Service Oriented ArchitectureAn Iterative Approach to Service Oriented Architecture
An Iterative Approach to Service Oriented Architecture
 
Windycityrails page performance
Windycityrails page performanceWindycityrails page performance
Windycityrails page performance
 
50 Shades of Fail KScope16
50 Shades of Fail KScope1650 Shades of Fail KScope16
50 Shades of Fail KScope16
 
Freelancing and side-projects on Rails
Freelancing and side-projects on RailsFreelancing and side-projects on Rails
Freelancing and side-projects on Rails
 

Similar a Qcon talk

To Cloud or Not To Cloud?
To Cloud or Not To Cloud?To Cloud or Not To Cloud?
To Cloud or Not To Cloud?Greg Lindahl
 
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013Amazon Web Services
 
Scaling a High Traffic Web Application: Our Journey from Java to PHP
Scaling a High Traffic Web Application: Our Journey from Java to PHPScaling a High Traffic Web Application: Our Journey from Java to PHP
Scaling a High Traffic Web Application: Our Journey from Java to PHP120bi
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profilingJon Haddad
 
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013Amazon Web Services
 
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planningasya999
 
SQL Azure - the good, the bad and the ugly.
SQL Azure - the good, the bad and the ugly.SQL Azure - the good, the bad and the ugly.
SQL Azure - the good, the bad and the ugly.Pini Krisher
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudChris Dagdigian
 
Why you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise EnvironmentWhy you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise EnvironmentVoltDB
 
Cloud Computing: The Hard Problems Never Go Away
Cloud Computing: The Hard Problems Never Go AwayCloud Computing: The Hard Problems Never Go Away
Cloud Computing: The Hard Problems Never Go AwayZendCon
 
RailsAdmin - Overview and Best practices
RailsAdmin - Overview and Best practicesRailsAdmin - Overview and Best practices
RailsAdmin - Overview and Best practicesBenoit Bénézech
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core ConceptsJon Haddad
 
Cassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoCassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoJon Haddad
 
Big iron 2 (published)
Big iron 2 (published)Big iron 2 (published)
Big iron 2 (published)Ben Stopford
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Bridging the Developer and the Datacenter
Bridging the Developer and the DatacenterBridging the Developer and the Datacenter
Bridging the Developer and the Datacenterlurs83
 
Lessons PostgreSQL learned from commercial databases, and didn’t
Lessons PostgreSQL learned from commercial databases, and didn’tLessons PostgreSQL learned from commercial databases, and didn’t
Lessons PostgreSQL learned from commercial databases, and didn’tPGConf APAC
 
[db tech showcase Tokyo 2017] C16: Azure SQL Database - Are you ready for the...
[db tech showcase Tokyo 2017] C16: Azure SQL Database - Are you ready for the...[db tech showcase Tokyo 2017] C16: Azure SQL Database - Are you ready for the...
[db tech showcase Tokyo 2017] C16: Azure SQL Database - Are you ready for the...Insight Technology, Inc.
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 

Similar a Qcon talk (20)

Cassandra at scale
Cassandra at scaleCassandra at scale
Cassandra at scale
 
To Cloud or Not To Cloud?
To Cloud or Not To Cloud?To Cloud or Not To Cloud?
To Cloud or Not To Cloud?
 
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
 
Scaling a High Traffic Web Application: Our Journey from Java to PHP
Scaling a High Traffic Web Application: Our Journey from Java to PHPScaling a High Traffic Web Application: Our Journey from Java to PHP
Scaling a High Traffic Web Application: Our Journey from Java to PHP
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profiling
 
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
SmugMug's Zero-Downtime Migration to AWS (ARC312) | AWS re:Invent 2013
 
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
 
SQL Azure - the good, the bad and the ugly.
SQL Azure - the good, the bad and the ugly.SQL Azure - the good, the bad and the ugly.
SQL Azure - the good, the bad and the ugly.
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the Cloud
 
Why you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise EnvironmentWhy you really want SQL in a Real-Time Enterprise Environment
Why you really want SQL in a Real-Time Enterprise Environment
 
Cloud Computing: The Hard Problems Never Go Away
Cloud Computing: The Hard Problems Never Go AwayCloud Computing: The Hard Problems Never Go Away
Cloud Computing: The Hard Problems Never Go Away
 
RailsAdmin - Overview and Best practices
RailsAdmin - Overview and Best practicesRailsAdmin - Overview and Best practices
RailsAdmin - Overview and Best practices
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Cassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoCassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day Toronto
 
Big iron 2 (published)
Big iron 2 (published)Big iron 2 (published)
Big iron 2 (published)
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Bridging the Developer and the Datacenter
Bridging the Developer and the DatacenterBridging the Developer and the Datacenter
Bridging the Developer and the Datacenter
 
Lessons PostgreSQL learned from commercial databases, and didn’t
Lessons PostgreSQL learned from commercial databases, and didn’tLessons PostgreSQL learned from commercial databases, and didn’t
Lessons PostgreSQL learned from commercial databases, and didn’t
 
[db tech showcase Tokyo 2017] C16: Azure SQL Database - Are you ready for the...
[db tech showcase Tokyo 2017] C16: Azure SQL Database - Are you ready for the...[db tech showcase Tokyo 2017] C16: Azure SQL Database - Are you ready for the...
[db tech showcase Tokyo 2017] C16: Azure SQL Database - Are you ready for the...
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 

Último

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 

Último (20)

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 

Qcon talk

  • 1. ©2013 DataStax Confidential. Do not distribute without consent. Benjamin Coverston DSE Architect, DataStax Inc. NoSQL, Big Data, and Real Time 1 Monday, September 2, 13
  • 2. Who am I? • Ben Coverston • DSE Architect • DataStax since 2010 • Previous Experience in the Travel Industry • Low Cost Airlines / Web Reservations • Past: HP / Accenture • Lived in Santa Catarina for a few years.
  • 3. What is it? NoSql
  • 5. What is NoSQL? NoSQL is a term coined by Carlo Strozzi and repurposed by Eric Evans to refer to “some” storage systems. The NoSQL term should be used as in the Not-Only-SQL and not as No to SQL or Never SQL. -- Alex Popescu
  • 6. What is NoSQL (Cont.) • It’s not • No to SQL • About performance • About scaling • ACID • Eventual consistency • Volume • It is: • About choice
  • 7. Diversity in Data • Big Data has the 3 (or 4 or 5) V’s • Volume • Variety • Velocity • Variability (sometimes) • Value (other times)
  • 8. Diversity in Data • The V’s don’t cover everything • Availability is important • Your use case is important too
  • 9. Is NoSQL Big Data? • You can store Big Data with an RDBMS • Is it easy? • Is it cost effective? • What kind of compromises do you have to make?
  • 10. The Problems • In general there are two classes of data problems • OLTP (Real-Time) • Analytics (Batch) • Usually you want both • No solution is perfect for everyone • Popularity is no indication of fitness
  • 11. Use Cases • OLTP • Low Latency • High Throughput • LOB Applications • Batch • Predictive Models • Complex Queries • Tomorrow (or precalculated, but now we need OLTP)
  • 12. Where to put your ‘Stuff’ Sharded RDBMS MPP -- Greenplum, Teradata Hadoop Key/Value Columnar Other
  • 13. Why not just one? • Analytics • Optimize serial IO • Limitations in Storage • OLTP • Working Set • Distribution • Availability • Storage Medium
  • 14. Why Do We Need Something Else? • ACID semantics are often overkill • ACID also makes the database layer brittle • This means you get less Availability (CAP Theorem)
  • 15. The Application Stack [Diagram: www.example.com • load balancers LB1, LB2, LB3 • web servers ws1 through ws9 • cache • DB# 1, 2, 3, 4]
  • 16. Sharding • Storage Limitations • Working Set • So just make more!
  • 18. But Sharding • Is Painful • Requires ‘something else’ • Most NoSQL solutions auto-shard • Sharding requires tradeoffs • Which means your application will need to change
  • 20. Which should I choose? • Analytics • Hadoop (probably) if your data is big • Spark, other (sometimes faster) solutions available now • NoSql • Let’s talk!
  • 21. Decisions are about tradeoffs, never a zero-sum game • Fast, Cheap, Good -- Choose Two
  • 22. CAP Theorem • More of two, less of one • Consistency • Availability • Partition tolerance • You have to accept P • That leaves C and A
  • 23. How To Scale Anything • Partition By Function • Split Horizontally • Avoid Distributed Transactions • Avoid Synchronous Coupling • Virtualize Everywhere • Cache Everything
  • 24. Partition By Function • Don’t put everything in the same database • Physical • Pools of Machines • Geographical Distribution • Automatic sharding (look for this) • Make sure it works! • Virtual • Logical Tables, Schema • Not 100% necessary, but schema is nice
  • 25. Partitioning (cont.) • Pros • Isolate failure • To a region • To a service • Simplify Failover • Cons • Your DB has to handle multi-region replication • If you chose CP (CAP) you’re going to have a bad time • AP systems do OK here (Cassandra actually excels) • “Relational” part of databases becomes complex • Everything gets denormalized
  • 26. Split Horizontally • Scaling Vertically is easy • To a point, then it gets expensive... fast • Easy if your system has no state to maintain • Or if the states are known, and small • Sharding over dependent fields complicates design • Some things distribute themselves easily • key/value stores • Others not so much • BTree indexes, foreign keys • P2P architecture is helpful when splitting • In other words, avoid masters
  • 27. Split Horizontally (cont) • Pros • Can be as fast or faster than traditional design • Can scale up as long as you can afford more machines • Scaling is easy if you avoid having masters • Replication and failover don’t have to be special cases • Cons • Even logical pieces of your app are distributed over many machines • example: your catalog is not all in one place • Real time analytics is difficult, or slower
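
A minimal sketch of why key/value data "distributes itself easily": the key alone decides which machine owns a record. The shard names and the `shard_for` and `place` helpers are hypothetical, and the hash-modulo scheme is only an illustration, not how any particular database places data.

```python
import hashlib

SHARDS = ["db1", "db2", "db3", "db4"]  # hypothetical shard names


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def place(keys, shards=SHARDS):
    """Group keys by the shard that owns them."""
    placement = {name: [] for name in shards}
    for key in keys:
        placement[shard_for(key, shards)].append(key)
    return placement
```

A lookup by key touches exactly one shard. A query over a dependent field (say, every order for one product) would have to visit all of them, which is where the design gets complicated.
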
  • 28. Avoid Distributed Transactions • Have you tried this? • Hard to do right • Paxos gives us some hope • CAS in Cassandra 2.0 looks promising • Even then, it’s not good for everything! • MVCC works for many use cases • Compensating Mechanisms • Customer Service (Amazon, inventory)
  • 29. Avoid Distributed Transactions • Pros • Consistency in a distributed environment • Cons • Slow • Overkill • Did I say slow? • We chose CP so we get less A • What happens when they don’t succeed? • Do we shut the whole thing down?
  • 30. Avoid Synchronous Coupling • What? • A or B can be down • A can be down, B continues to work • B can suffer, while A continues to work • If your recommendation engine fails, your customers can still buy stuff! • Master/Slave failover is a good example of synchronous coupling • Master is down, slave needs to take over, but in the meantime... what happens?
  • 31. Avoid Synchronous Coupling • Pros • Fewer shared dependencies means less failure • Less failure means more total uptime • For the whole • Less coupling means that your application topology is more modular • Introducing new, decoupled services is less risky • Cons • More duplication of your infrastructure • e.g. now you have an application stack for each of your services.
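
The recommendation-engine case can be sketched as a page that degrades instead of failing. This is an illustration only: `recommendations_for`, `render_product_page`, and the `fetch` callable standing in for the remote service are hypothetical names, not part of any real API.

```python
def recommendations_for(customer_id, fetch, fallback=()):
    """Ask the recommendation service; fall back to a default list if it fails."""
    try:
        return list(fetch(customer_id))
    except Exception:
        # Service B is down or suffering; service A keeps working without it.
        return list(fallback)


def render_product_page(customer_id, product, fetch_recommendations):
    """Build the page; buying never depends on recommendations being up."""
    return {
        "product": product,
        "can_buy": True,
        "recommendations": recommendations_for(customer_id, fetch_recommendations),
    }
```

The purchase path has no hard dependency on the recommendation call, so an outage there costs a page section rather than a sale.
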
  • 32. Avoid Synchronous Processing Flows • AKA • Blocking Sockets • Serialized Processes • Locking in General • Do what is important FIRST • Take their money • Modify Inventory • Other less important stuff can be queued • Triggers • Joins • Stored Procedures • Consistency Checks
  • 33. Avoid Synchronous Processing Flows • Pros • Critical operations will not block for nice-to-haves • Easy to monitor queues and assign priority to tasks • Problem areas are easier to identify • Cons • Race conditions • More up-front development cost
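
"Do what is important first, queue the rest" might look like the sketch below. It assumes an in-process `queue.Queue` and hypothetical task names (`send_email`, `update_recommendations`); a production system would more likely use a durable message queue.

```python
import queue


def checkout(order, inventory, deferred):
    """Do the critical steps synchronously; queue everything else."""
    # Critical path: modify inventory and take the money.
    if inventory.get(order["sku"], 0) < order["qty"]:
        raise ValueError("out of stock")
    inventory[order["sku"]] -= order["qty"]
    receipt = {"order_id": order["id"], "charged": order["qty"] * order["price"]}

    # Nice-to-haves are queued and handled later by a worker.
    deferred.put(("send_email", order["id"]))
    deferred.put(("update_recommendations", order["id"]))
    return receipt


def drain(deferred, handlers):
    """Worker body: run queued tasks until the queue is empty."""
    done = []
    while True:
        try:
            task, order_id = deferred.get_nowait()
        except queue.Empty:
            return done
        handlers[task](order_id)
        done.append(task)
```

The queue depth is also something you can monitor, which is the "easy to monitor" benefit listed under Pros. The cost is that deferred work can race with later requests.
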
  • 34. Virtualize Everything • DON’T • Pick your database because it has a sexy API • Pick your database because it worked for somebody else • DO • Pick a database that will fit with your use case • Virtualize your data model • Encourage manipulation of your logical models • DO NOT force interaction with your database • Good virtualization means that you can change your data store later... • And most of your code will still work.
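
One way to read "virtualize your data model" is to have application code talk to a logical interface rather than to the database driver. The sketch below assumes that reading; `CustomerStore`, `InMemoryCustomerStore`, and `rename_customer` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class CustomerStore(ABC):
    """Logical model the application manipulates; hides the data store."""

    @abstractmethod
    def get(self, customer_id):
        ...

    @abstractmethod
    def save(self, customer_id, record):
        ...


class InMemoryCustomerStore(CustomerStore):
    """Stand-in backend; a Cassandra or RDBMS backend would implement the same API."""

    def __init__(self):
        self._rows = {}

    def get(self, customer_id):
        row = self._rows.get(customer_id)
        return dict(row) if row is not None else None

    def save(self, customer_id, record):
        self._rows[customer_id] = dict(record)


def rename_customer(store: CustomerStore, customer_id, new_name):
    """Application logic: depends only on the interface, never on a driver."""
    record = store.get(customer_id) or {}
    record["name"] = new_name
    store.save(customer_id, record)
    return record
```

Swapping the data store later means writing one new `CustomerStore` subclass; `rename_customer` and code like it stay unchanged.
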
  • 35. Virtualize Everything • Virtualization isn’t just for the programmer • Things fall apart • Requests have to be re-routed • Parts Replaced • APIs change • Good virtualization means you can make changes w/o impacting availability
  • 36. Cache Appropriately • You can’t cache everything • But you can cache stuff that doesn’t change • Or is expensive to retrieve
  • 37. Cache Appropriately • Pros • Cache is fast (compared to traditional RDBMS access) • Can give you a performance buffer • Cons • Cache Coherence • Cache Dependency • Is it a SPOF? • What if it all doesn’t fit?
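
A small cache-aside sketch for data that rarely changes or is expensive to retrieve. The `TTLCache` class and its `get_or_load` method are hypothetical; the time-to-live is one simple, assumed answer to the coherence concern, trading freshness for speed.

```python
import time


class TTLCache:
    """Cache-aside helper: serve from memory until an entry expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # hit: skip the expensive call
        value = loader(key)      # miss or expired: go to the data store
        self._entries[key] = (now + self._ttl, value)
        return value
```

Because a miss falls through to the loader, losing the cache slows the application down but does not stop it, as long as the backing store can absorb the extra load.
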
  • 38. What about NoSQL? • All of this applies • Evaluate Products on their Strengths • If easy things are easy • The hard things might be impossible • Pick something that makes the hard things possible
  • 39. What are the ‘easy’ things? • Serialization Formats • JSON/BSON • Data Models • HTTP/REST/JSON APIs • NodeJS Drivers! • etc.
  • 40. The ‘hard’ things • Automatic Sharding • Where does the data go? • How do I find it? • How do I add another? • Multi DC • Replication • No SPOF • Anti-Entropy • Continuous Availability • Upgrades • Failure • Etc.
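
The three automatic-sharding questions (where does the data go, how do I find it, how do I add another) are commonly answered with a consistent hash ring. The sketch below is a generic illustration of that technique under assumed names (`HashRing`, `node_for`, 64 virtual nodes per machine); it is not a description of how Cassandra or DSE implements placement.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit token derived from a string."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._tokens = []    # sorted tokens on the ring
        self._owner = {}     # token -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """How do I add another? Insert the node's tokens into the ring."""
        for i in range(self._vnodes):
            token = _hash(f"{node}#{i}")
            self._owner[token] = node
            bisect.insort(self._tokens, token)

    def node_for(self, key):
        """Where does the data go, and how do I find it? First token clockwise."""
        if not self._tokens:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._tokens, _hash(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]
```

When a node is added, the only keys that change owner are the ones the new node takes over; every other key stays where it was, which keeps rebalancing traffic bounded.
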
  • 41. What should you use? • Your decision • Every database is not a fit for every problem.
  • 42. DataStax Enterprise • DSE • Cassandra (OLTP) • Analytics • Search • The hard things are possible • We’re making the easy things easier