SlideShare una empresa de Scribd logo
1 de 95
Descargar para leer sin conexión
Specialties / Focus Areas / Passions:
• Performance Tuning & Troubleshooting
• Very Large Databases
• SQL Server Storage Engine
• HA/DR
• Cloud
@sqlbob
bob@bobpusateri.com
heraflux.com
bobpusateri
We were developing an IoT system
…Which needed to ingest data from thousands/millions of devices
… and that data needed to be queried within seconds?
We were building an e-commerce site
Which needed guaranteed performance and availability
… anywhere on Earth
… and needed to be able to scale up/down in response to conditions?
A globally distributed, massively scalable, multi-model database service
A globally distributed, massively scalable, multi-model database service
A globally distributed, massively scalable, multi-model database service
A globally distributed, massively scalable, multi-model database service
Key-Value GraphColumn-Family Document
A globally distributed, massively scalable, multi-model database service
Has multiple APIs as well
Table API
MongoDB
A database service featuring an engine built to excel at several things, but
especially:
Partitioning
Replication
It’s a NoSQL offering!
3 DBAS WALKED INTO
A NOSQL BAR….
A WHILE LATER THEY
WALKED OUT BECAUSE THEY
COULDN’T FIND A TABLE
• I often hear NoSQL == No Schema == No Design
 Not True
• GENERALLY NoSQL schemas
 Do Exist
 Are somewhat enforced by the database
 Are fully enforced by the application
• There are still design decisions that need to happen early on
 (And if they’re wrong you will pay for it later)
• Microsoft started having problems with internal large scale apps
 2010 – “Project Florence”
 2014 – Azure DocumentDB
 2017 – Azure Cosmos DB
• MS leverages this internally
• Designed for the cloud
• One of the fastest-growing
services on Azure
https://tinyurl.com/ycmhp6kd
• We’re developing an internal app for a global company
• Thousands of users reading/updating data
• How would we architect this?
• Cosmos DB is a Foundational Azure service
• Put your data where the users are
• Replication between regions is automatic
• Multi-homing APIs
 Clients automatically connect to the nearest region
• Adding or removing regions? No code changes!
• Manual or automatic failovers
• Designed from the ground up for HA
• Both storage and throughput can be scaled transparently
• A single machine is never a bottleneck
• Collections can scale from GB to PB across many machines and regions
• Requests are served from the nearest region
• Database engine optimized for writes, latch-free
• Indexing is synchronous and automatic
• Single-digit millisecond latency at 99th Percentile
Reads (1KB) Indexed Writes (1KB)
50th Percentile < 2ms < 6ms
99th Percentile < 10ms < 15ms
• 99.99% availability when in a single region
• 99.999% availability in multiple regions
• Highly-redundant storage architecture
• Automatic or manual failover
• All data is encrypted, period.
• In transit and at rest
• Two types of keys:
• Master Keys
 Administrative
 Grant access to the entire account (not granular)
 Read-write and Read-only
• Resource Tokens
• Used for application resources (Containers, docs, SPs, Triggers, UDF, etc.)
 Kinda like SQL Permissions
• Tokens are specific to {user, resource, permission}
• Tokens are time-sensitive (default 1 hour, max 5 hours)
• Resource Tokens
DIRECT ACCESS
Server
Object
Instance
Database
Account
Item
Database
Container
Account
Item
Database
Container
Account
Item
Database
Container
• Containers
• Users
• Permissions
Account
Item
Database
Container
• Data Model
• Document (Collection)
• Graph
• Key-Value
• Column-Value
• Throughput
Account
Item
Database
Container
• Data Model
• Document (Collection)
• Graph
• Key-Value
• Column-Value
• Throughput
ATOM RECORD SEQUENCE (ARS) SYSTEM
Atoms = primitives (string, bool, etc)
Records = structs of atoms
Sequences = arrays of {atom, record, sequence}
Cosmos DB translates & projects all data models
into an ARS model
Account
Item
Database
Container
Account
Item
Database
Container
• Depends on data model
• Document
• Node/Edge
• Row/Item
• Stored Procedures
• Triggers
• UDFs
Image: Microsoft
• RU is the rate-based currency of Cosmos DB
• Represents a combination of CPU, Memory, and IO
• 1 RU = 1 read of 1KB
• Every request is assigned a “cost” in RUs
 Reads, writes, stored procedures, etc
• Provisioned in units of RU/second
• Can be changed at any time; metered hourly
• Exceeding your RU budget = rate limiting
• When quiescent, background tasks run
 Index Maintenance
 TTL Expiration
Min RU/sec
Max RU/sec
RequestRate
Rate
Limiting
No
Limiting
Replica
Quiescent
• Define boundary values between partitions
• Map partitions to physical locations (filegroups)
• Similar values generally in the same partition
 Can lead to “hot” partitions
 Especially if on dates
• Partition management is manual
• Hard Limit: 15.000 partitions per table
• There are no “ranges”, every partition key is hashed
• Logical partitions (keys) are spread across physical partitions
• Partition management is automatic!
• No limit on number of partitions
• Hard limit: 10GB max of data per partition key
• The most important design decision in Cosmos DB
• Has a direct effect on
 How well it will scale
 How much you will pay
• Think through partitioning during the design phase, it’s easier!
Partition Key: User ID
Cosmos DB Container
Partition Key: User ID
hash(User ID)
Pseudo-random data distribution of hash values
hash(User ID)
Chase
Ryder
Tracker
Cap’n Turbot
Everest
Skye
Rocky
Rubble
Zuma
Marshall
Robo-Dog
hash(User ID)
Chase
Ryder
Tracker
Cap’n Turbot
Everest
Skye
Rocky
Rubble
Zuma
Marshall
Robo-Dog
Physical
Partition
Logical
Partition
hash(User ID)
Chase
Ryder
Tracker
Cap’n Turbot
Everest
Skye
Rocky
Rubble
Zuma
Marshall
Robo-Dog
What happens when it needs to grow?
Tracker
Cap’n Turbot
Skye
Rocky
Robo-Dog
hash(User ID)
Tracker
Cap’n Turbot
Robo-Dog
Skye
Rocky+
Partitions can be dynamically subdivided
to grow the database without affecting
availability
This is done automatically.
• Plan to distribute both request and storage volume
 Remember the 10GB limit
 Adding dates after partition values can help with this
• For greatest efficiency, queries should eliminate partitions
• Queries can be routed/filtered via partition key
• “Fan-Out” is something to try to avoid where possible
• Understand your workload!
• Understand the most frequent/expensive queries
• Understand insert vs update ratios
• Remember partition keys are logical!
 Don’t be afraid of having too many
 More key values = better scalability
• This is huge because we have multiple replicas
• If a change is replicated, what is seen elsewhere?
• Why replicate, anyway?
 HA – multiple copies for failover
 Speed!
• Bring the data closer to the user
• “cheat” the speed of light!
Image: Microsoft
North Central US
UK South
Japan East
• Relational Databases: ACID
 Atomic
 Consistent
 Independent
 Durable
• Distributed Systems: Brewer’s CAP Theorem
 Consistency
 Availability
 Partition tolerance
 (pick two)
• Problem: There’s no consistent definition of “consistent” in CS
• Transactions (ACID)
 Each transaction moves from a single valid state to another
• Replication (CAP)
 Getting a consistent view across replicated copies of data
 And CAP doesn’t even cover all cases….
• An extension of the CAP Theorem
• Partitioning: Availability vs. Consistency ELSE Latency vs. Consistency
• When partitioning a distributed system you have to choose between
availability and consistency, but also when not partitioning one must
choose between latency and consistency.
• Reader is far away from writer
• Value gets updated by writer
• Should the reader:
 See the old value? (prioritize latency)
 See the same result as the master?
 Wait for the new value (prioritize consistency)
• I love consistency models
• I also love isolation levels
• Azure Cosmos DB has 5 of them
• You can choose what gets prioritized
• Can be overridden on a per-request basis
Bounded
Staleness
Strong Consistent
Prefix
Session Eventual
• Linearizability guarantee: reads will always return the most recent version
of an item
• (Like SERIALIZABLE [maybe?])
• Writes are only visible after committed by a majority quorum of replicas
• If using this model, you are limited to a single Azure region
Bounded
Staleness
Strong Consistent
Prefix
Session Eventual
• Guarantees that “absence of any further writes, replicas will eventually
converge”
• No guarantee of order
 Client may get “new” values older than ones it had previously seen
• Lowest latency for reads and writes
 …but it’s fast!
Bounded
Staleness
Strong Consistent
Prefix
Session Eventual
• Guarantees that readers will always see writes in order
Bounded
Staleness
Strong Consistent
Prefix
Session Eventual
• Scoped to a client session
 There’s a session key that is passed around
• Provides predictable consistency within a session
 Monotonic reads & writes
 Guarantee that you can read your own writes immediately
• Great predictability for your session, good performance for everyone else
Bounded
Staleness
Strong Consistent
Prefix
Session Eventual
• Define a “window” of staleness in terms of # revisions or time
• If a replica gets too far behind (is outside the “window”)
 Cosmos DB will prioritize consistency over all else
 May even rate limit writes until stale replica catches up
Bounded
Staleness
Strong Consistent
Prefix
Session Eventual
• What if you’re not doing geo-replication? Does this matter?
• Yes it does!
• Even in local regions there are still 4 replicas of your database
1
2 3
4
• Yeah, about that….
• Schema-agnostic
• Automatic
 Every property of every record is indexed by default
 No latches involved (remember it’s highly write-optimized)
• Customizable
 You can define what is indexed (and save space)
{ "cars": [
{ "make": "Hyundai", "model": "Santa Fe" },
{ "make": "Subaru", "model": "Forester", “plate": "T SQL" }
],
"city": "Chicago"
}
city
Chicago0 1
make model make model license
cars
Hyundai
Santa
Fe
Subaru Forester T SQL
{ "cars": [
{ "make": "Tesla", "model": "X" }
],
"city": “Oslo"
}
city
Oslo0
make model
cars
Tesla X
city
Chicago0 1
make model make model license
cars
Hyundai
Santa
Fe
Subaru Forester T SQL
Oslo
{1,2}
{1,2}{1,2}
{1} {2}{1,2} {1}
Tesla X
{1,2} {1,2}
{1} {1}{2} {2} {1} {1} {1}
{1} {1}{1}
Term Postings
$/cars/0 1,2
$/cars/0/make 1,2
$/cars/0/model 1,2
$/cars/1 1
……
• Remember what I said about indexes?
• Backups are automatic
• Snapshots taken and stored separately in Azure Blob Storage
• For speed, it’s written to same region as current Cosmos DB write region
• For safety, it’s replicated to another region as well
• Taken every 4 hours
• Only the last 2 snapshots are retained
• “If the data is accidentally dropped or corrupted, contact Azure support
within eight hours.”
• You can maintain your own backups
 Azure Cosmos DB Data Migration Tool “export to JSON” option
• If you delete a container/database, backups retained for 30 days
https://docs.microsoft.com/en-us/azure/cosmos-db/use-cases
• You Pay For:
 Storage ($0.25 per GB/month)
 Throughput ($0.008 per 100 RUs/hour)
 Data Transfer for geo-replication (varies by region)
 North Central US: $0.087 per GB
• Check Azure Portal for most current pricing info
• https://azure.microsoft.com/en-us/pricing/details/cosmos-db/
• There’s a Cosmos DB Emulator!
• Run locally on your machine for free
• https://docs.microsoft.com/en-us/azure/cosmos-db/local-emulator
• There’s a Cosmos DB Emulator!
• Run locally on your machine for free
• https://docs.microsoft.com/en-us/azure/cosmos-db/local-emulator
106
@sqlbob
bob@bobpusateri.com
bobpusateri.com
bobpusateri

Más contenido relacionado

La actualidad más candente

Podila mesos con-northamerica_sep2017
Podila mesos con-northamerica_sep2017Podila mesos con-northamerica_sep2017
Podila mesos con-northamerica_sep2017Sharma Podila
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraJon Haddad
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayDataStax Academy
 
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scaleHow LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scaleLinkedIn
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaSteven Wu
 
Microservices for a Streaming World
Microservices for a Streaming WorldMicroservices for a Streaming World
Microservices for a Streaming WorldBen Stopford
 
6/18/14 Billing & Payments Engineering Meetup I
6/18/14 Billing & Payments Engineering Meetup I6/18/14 Billing & Payments Engineering Meetup I
6/18/14 Billing & Payments Engineering Meetup IMathieu Chauvin
 
Load balancing theory and practice
Load balancing theory and practiceLoad balancing theory and practice
Load balancing theory and practiceFoundationDB
 
Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011Ben Stopford
 
The Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceThe Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceAbdelmonaim Remani
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebookGwen (Chen) Shapira
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaGrant Henke
 
Consistency Models in New Generation Databases
Consistency Models in New Generation DatabasesConsistency Models in New Generation Databases
Consistency Models in New Generation Databasesiammutex
 
Hands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarHands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarSijie Guo
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...Thomas Alex
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core ConceptsJon Haddad
 
Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistencyseldo
 

La actualidad más candente (20)

Podila mesos con-northamerica_sep2017
Podila mesos con-northamerica_sep2017Podila mesos con-northamerica_sep2017
Podila mesos con-northamerica_sep2017
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
 
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scaleHow LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Microservices for a Streaming World
Microservices for a Streaming WorldMicroservices for a Streaming World
Microservices for a Streaming World
 
6/18/14 Billing & Payments Engineering Meetup I
6/18/14 Billing & Payments Engineering Meetup I6/18/14 Billing & Payments Engineering Meetup I
6/18/14 Billing & Payments Engineering Meetup I
 
Load balancing theory and practice
Load balancing theory and practiceLoad balancing theory and practice
Load balancing theory and practice
 
Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011
 
The Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceThe Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot Persistence
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
 
Consistency Models in New Generation Databases
Consistency Models in New Generation DatabasesConsistency Models in New Generation Databases
Consistency Models in New Generation Databases
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Hands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarHands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache Pulsar
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistency
 
NoSQL and ACID
NoSQL and ACIDNoSQL and ACID
NoSQL and ACID
 

Similar a Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server User Group)

Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Bob Pusateri
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud DatabaseAzure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud DatabaseBizTalk360
 
The Economies of Scaling Software
The Economies of Scaling SoftwareThe Economies of Scaling Software
The Economies of Scaling SoftwareAbdelmonaim Remani
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache KuduAndriy Zabavskyy
 
Azure DocumentDB Overview
Azure DocumentDB OverviewAzure DocumentDB Overview
Azure DocumentDB OverviewAndrew Liu
 
The economies of scaling software - Abdel Remani
The economies of scaling software - Abdel RemaniThe economies of scaling software - Abdel Remani
The economies of scaling software - Abdel Remanijaxconf
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learnJohn D Almon
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?CQD
 
Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Dharma Shukla
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
JasperWorld 2012: Reinventing Data Management by Max Schireson
JasperWorld 2012: Reinventing Data Management by Max SchiresonJasperWorld 2012: Reinventing Data Management by Max Schireson
JasperWorld 2012: Reinventing Data Management by Max SchiresonMongoDB
 
Lessons learnt from building a globally distributed database service from the...
Lessons learnt from building a globally distributed database service from the...Lessons learnt from building a globally distributed database service from the...
Lessons learnt from building a globally distributed database service from the...J On The Beach
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Anubhav Kale
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
Cool NoSQL on Azure with DocumentDB
Cool NoSQL on Azure with DocumentDBCool NoSQL on Azure with DocumentDB
Cool NoSQL on Azure with DocumentDBJan Hentschel
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 

Similar a Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server User Group) (20)

Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud DatabaseAzure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
 
The Economies of Scaling Software
The Economies of Scaling SoftwareThe Economies of Scaling Software
The Economies of Scaling Software
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
Azure DocumentDB Overview
Azure DocumentDB OverviewAzure DocumentDB Overview
Azure DocumentDB Overview
 
The economies of scaling software - Abdel Remani
The economies of scaling software - Abdel RemaniThe economies of scaling software - Abdel Remani
The economies of scaling software - Abdel Remani
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learn
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?
 
Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
JasperWorld 2012: Reinventing Data Management by Max Schireson
JasperWorld 2012: Reinventing Data Management by Max SchiresonJasperWorld 2012: Reinventing Data Management by Max Schireson
JasperWorld 2012: Reinventing Data Management by Max Schireson
 
From 0 to syncing
From 0 to syncingFrom 0 to syncing
From 0 to syncing
 
Lessons learnt from building a globally distributed database service from the...
Lessons learnt from building a globally distributed database service from the...Lessons learnt from building a globally distributed database service from the...
Lessons learnt from building a globally distributed database service from the...
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
Cool NoSQL on Azure with DocumentDB
Cool NoSQL on Azure with DocumentDBCool NoSQL on Azure with DocumentDB
Cool NoSQL on Azure with DocumentDB
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 

Más de Bob Pusateri

Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsBob Pusateri
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (New England SQ...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (New England SQ...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (New England SQ...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (New England SQ...Bob Pusateri
 
Supercharging Backups and Restores (For Fun and Profit!) (SQL Saturday Boston...
Supercharging Backups and Restores (For Fun and Profit!) (SQL Saturday Boston...Supercharging Backups and Restores (For Fun and Profit!) (SQL Saturday Boston...
Supercharging Backups and Restores (For Fun and Profit!) (SQL Saturday Boston...Bob Pusateri
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASSDC User Gr...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASSDC User Gr...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASSDC User Gr...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASSDC User Gr...Bob Pusateri
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASS DBA Virtu...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASS DBA Virtu...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASS DBA Virtu...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASS DBA Virtu...Bob Pusateri
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (Chicago Suburb...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (Chicago Suburb...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (Chicago Suburb...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (Chicago Suburb...Bob Pusateri
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (SQL Saturday M...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (SQL Saturday M...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (SQL Saturday M...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (SQL Saturday M...Bob Pusateri
 

Más de Bob Pusateri (7)

Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAs
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (New England SQ...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (New England SQ...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (New England SQ...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (New England SQ...
 
Supercharging Backups and Restores (For Fun and Profit!) (SQL Saturday Boston...
Supercharging Backups and Restores (For Fun and Profit!) (SQL Saturday Boston...Supercharging Backups and Restores (For Fun and Profit!) (SQL Saturday Boston...
Supercharging Backups and Restores (For Fun and Profit!) (SQL Saturday Boston...
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASSDC User Gr...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASSDC User Gr...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASSDC User Gr...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASSDC User Gr...
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASS DBA Virtu...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASS DBA Virtu...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASS DBA Virtu...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (PASS DBA Virtu...
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (Chicago Suburb...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (Chicago Suburb...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (Chicago Suburb...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (Chicago Suburb...
 
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (SQL Saturday M...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (SQL Saturday M...Locks, Blocks, and Snapshots: Maximizing Database Concurrency (SQL Saturday M...
Locks, Blocks, and Snapshots: Maximizing Database Concurrency (SQL Saturday M...
 

Último

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Último (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server User Group)

  • 1.
  • 2. Specialties / Focus Areas / Passions: • Performance Tuning & Troubleshooting • Very Large Databases • SQL Server Storage Engine • HA/DR • Cloud @sqlbob bob@bobpusateri.com heraflux.com bobpusateri
  • 3.
  • 4. We were developing an IoT system …Which needed to ingest data from thousands/millions of devices … and that data needed to be queried within seconds?
  • 5. We were building an e-commerce site Which needed guaranteed performance and availability … anywhere on Earth … and needed to be able to scale up/down in response to conditions?
  • 6.
  • 7.
  • 8. A globally distributed, massively scalable, multi-model database service
  • 9. A globally distributed, massively scalable, multi-model database service
  • 10. A globally distributed, massively scalable, multi-model database service
  • 11. A globally distributed, massively scalable, multi-model database service Key-Value GraphColumn-Family Document
  • 12. A globally distributed, massively scalable, multi-model database service Has multiple APIs as well Table API MongoDB
  • 13. A database service featuring an engine built to excel at several things, but especially: Partitioning Replication
  • 14. It’s a NoSQL offering! 3 DBAS WALKED INTO A NOSQL BAR…. A WHILE LATER THEY WALKED OUT BECAUSE THEY COULDN’T FIND A TABLE
  • 15. • I often hear NoSQL == No Schema == No Design  Not True • GENERALLY NoSQL schemas  Do Exist  Are somewhat enforced by the database  Are fully enforced by the application • There are still design decisions that need to happen early on  (And if they’re wrong you will pay for it later)
  • 16. • Microsoft started having problems with internal large scale apps  2010 – “Project Florence”  2014 – Azure DocumentDB  2017 – Azure Cosmos DB • MS leverages this internally • Designed for the cloud • One of the fastest-growing services on Azure https://tinyurl.com/ycmhp6kd
  • 17. • We’re developing an internal app for a global company • Thousands of users reading/updating data • How would we architect this?
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23. • Cosmos DB is a Foundational Azure service • Put your data where the users are • Replication between regions is automatic • Multi-homing APIs  Clients automatically connect to the nearest region • Adding or removing regions? No code changes! • Manual or automatic failovers • Designed from the ground up for HA
  • 24. • Both storage and throughput can be scaled transparently • A single machine is never a bottleneck • Collections can scale from GB to PB across many machines and regions
  • 25. • Requests are served from the nearest region • Database engine optimized for writes, latch-free • Indexing is synchronous and automatic • Single-digit millisecond latency at 99th Percentile Reads (1KB) Indexed Writes (1KB) 50th Percentile < 2ms < 6ms 99th Percentile < 10ms < 15ms
  • 26. • 99.99% availability when in a single region • 99.999% availability in multiple regions • Highly-redundant storage architecture • Automatic or manual failover
  • 27.
  • 28. • All data is encrypted, period. • In transit and at rest
  • 29. • Two types of keys: • Master Keys  Administrative  Grant access to the entire account (not granular)  Read-write and Read-only
  • 30. • Resource Tokens • Used for application resources (Containers, docs, SPs, Triggers, UDF, etc.)  Kinda like SQL Permissions • Tokens are specific to {user, resource, permission} • Tokens are time-sensitive (default 1 hour, max 5 hours)
  • 32.
  • 36. Account Item Database Container • Data Model • Document (Collection) • Graph • Key-Value • Column-Value • Throughput
  • 37. Account Item Database Container • Data Model • Document (Collection) • Graph • Key-Value • Column-Value • Throughput ATOM RECORD SEQUENCE (ARS) SYSTEM Atoms = primitives (string, bool, etc) Records = structs of atoms Sequences = arrays of {atom, record, sequence} Cosmos DB translates & projects all data models into an ARS model
  • 39. Account Item Database Container • Depends on data model • Document • Node/Edge • Row/Item • Stored Procedures • Triggers • UDFs
  • 41. • RU is the rate-based currency of Cosmos DB • Represents a combination of CPU, Memory, and IO • 1 RU = 1 read of 1KB • Every request is assigned a “cost” in RUs  Reads, writes, stored procedures, etc
  • 42. • Provisioned in units of RU/second • Can be changed at any time; metered hourly • Exceeding your RU budget = rate limiting • When quiescent, background tasks run  Index Maintenance  TTL Expiration Min RU/sec Max RU/sec RequestRate Rate Limiting No Limiting Replica Quiescent
  • 43.
  • 44. • Define boundary values between partitions • Map partitions to physical locations (filegroups) • Similar values generally in the same partition  Can lead to “hot” partitions  Especially if on dates • Partition management is manual • Hard Limit: 15.000 partitions per table
  • 45. • There are no “ranges”, every partition key is hashed • Logical partitions (keys) are spread across physical partitions • Partition management is automatic! • No limit on number of partitions • Hard limit: 10GB max of data per partition key
  • 46. • The most important design decision in Cosmos DB • Has a direct effect on  How well it will scale  How much you will pay • Think through partitioning during the design phase, it’s easier!
  • 47.
  • 48. Partition Key: User ID Cosmos DB Container
  • 49. Partition Key: User ID hash(User ID) Pseudo-random data distribution of hash values
  • 53. Tracker Cap’n Turbot Skye Rocky Robo-Dog hash(User ID) Tracker Cap’n Turbot Robo-Dog Skye Rocky+ Partitions can be dynamically subdivided to grow the database without affecting availability This is done automatically.
  • 54. • Plan to distribute both request and storage volume  Remember the 10GB limit  Adding dates after partition values can help with this • For greatest efficiency, queries should eliminate partitions • Queries can be routed/filtered via partition key • “Fan-Out” is something to try to avoid where possible
  • 55. • Understand your workload! • Understand the most frequent/expensive queries • Understand insert vs update ratios • Remember partition keys are logical!  Don’t be afraid of having too many  More key values = better scalability
  • 56.
  • 57. • This is huge because we have multiple replicas • If a change is replicated, what is seen elsewhere? • Why replicate, anyway?  HA – multiple copies for failover  Speed! • Bring the data closer to the user • “cheat” the speed of light!
  • 59.
  • 60. North Central US UK South Japan East
  • 61. • Relational Databases: ACID  Atomic  Consistent  Independent  Durable • Distributed Systems: Brewer’s CAP Theorem  Consistency  Availability  Partition tolerance  (pick two)
  • 62. • Problem: There’s no consistent definition of “consistent” in CS • Transactions (ACID)  Each transaction moves from a single valid state to another • Replication (CAP)  Getting a consistent view across replicated copies of data  And CAP doesn’t even cover all cases….
  • 63. • An extension of the CAP Theorem • Partitioning: Availability vs. Consistency ELSE Latency vs. Consistency • When partitioning a distributed system you have to choose between availability and consistency, but also when not partitioning one must choose between latency and consistency.
  • 64. • Reader is far away from writer • Value gets updated by writer • Should the reader:  See the old value? (prioritize latency)  See the same result as the master?  Wait for the new value (prioritize consistency)
  • 65. • I love consistency models • I also love isolation levels
  • 66. • Azure Cosmos DB has 5 of them • You can choose what gets prioritized • Can be overridden on a per-request basis Bounded Staleness Strong Consistent Prefix Session Eventual
  • 67. • Linearizability guarantee: reads will always return the most recent version of an item • (Like SERIALIZABLE [maybe?]) • Writes are only visible after committed by a majority quorum of replicas • If using this model, you are limited to a single Azure region Bounded Staleness Strong Consistent Prefix Session Eventual
  • 68. • Guarantees that “absence of any further writes, replicas will eventually converge” • No guarantee of order  Client may get “new” values older than ones it had previously seen • Lowest latency for reads and writes  …but it’s fast! Bounded Staleness Strong Consistent Prefix Session Eventual
  • 69. • Guarantees that readers will always see writes in order Bounded Staleness Strong Consistent Prefix Session Eventual
  • 70. • Scoped to a client session  There’s a session key that is passed around • Provides predictable consistency within a session  Monotonic reads & writes  Guarantee that you can read your own writes immediately • Great predictability for your session, good performance for everyone else Bounded Staleness Strong Consistent Prefix Session Eventual
  • 71. • Define a “window” of staleness in terms of # revisions or time • If a replica gets too far behind (is outside the “window”)  Cosmos DB will prioritize consistency over all else  May even rate limit writes until stale replica catches up Bounded Staleness Strong Consistent Prefix Session Eventual
  • 72.
  • 73. • What if you’re not doing geo-replication? Does this matter? • Yes it does! • Even in local regions there are still 4 replicas of your database 1 2 3 4
  • 74.
  • 75. • Yeah, about that….
  • 76.
  • 77.
  • 78.
  • 79.
  • 80. • Schema-agnostic • Automatic  Every property of every record is indexed by default  No latches involved (remember it’s highly write-optimized) • Customizable  You can define what is indexed (and save space)
  • 81. { "cars": [ { "make": "Hyundai", "model": "Santa Fe" }, { "make": "Subaru", "model": "Forester", “plate": "T SQL" } ], "city": "Chicago" } city Chicago0 1 make model make model license cars Hyundai Santa Fe Subaru Forester T SQL
  • 82. { "cars": [ { "make": "Tesla", "model": "X" } ], "city": “Oslo" } city Oslo0 make model cars Tesla X
  • 83. city Chicago0 1 make model make model license cars Hyundai Santa Fe Subaru Forester T SQL Oslo {1,2} {1,2}{1,2} {1} {2}{1,2} {1} Tesla X {1,2} {1,2} {1} {1}{2} {2} {1} {1} {1} {1} {1}{1} Term Postings $/cars/0 1,2 $/cars/0/make 1,2 $/cars/0/model 1,2 $/cars/1 1 ……
  • 84.
  • 85.
  • 86. • Remember what I said about indexes?
  • 87. • Backups are automatic • Snapshots taken and stored separately in Azure Blob Storage • For speed, it’s written to same region as current Cosmos DB write region • For safety, it’s replicated to another region as well
  • 88. • Taken every 4 hours • Only the last 2 snapshots are retained • “If the data is accidentally dropped or corrupted, contact Azure support within eight hours.” • You can maintain your own backups  Azure Cosmos DB Data Migration Tool “export to JSON” option • If you delete a container/database, backups retained for 30 days
  • 89.
  • 91.
  • 92. • You Pay For:  Storage ($0.25 per GB/month)  Throughput ($0.008 per 100 RUs/hour)  Data Transfer for geo-replication (varies by region)  North Central US: $0.087 per GB • Check Azure Portal for most current pricing info • https://azure.microsoft.com/en-us/pricing/details/cosmos-db/
  • 93. • There’s a Cosmos DB Emulator! • Run locally on your machine for free • https://docs.microsoft.com/en-us/azure/cosmos-db/local-emulator
  • 94. • There’s a Cosmos DB Emulator! • Run locally on your machine for free • https://docs.microsoft.com/en-us/azure/cosmos-db/local-emulator