Azure Cosmos DB

Microsoft Azure Data Services
Program
Microsoft Azure Cosmos DB
Mohamed Tawfik
Cloud Solutions Architect
Azure CoE EMEA

Welcome to the fast data explosion
Cosmos DB is designed for it

1 exabyte (EB) = 1,000,000,000,000,000,000 bytes
Cosmos DB is designed for big data growth

Azure Cosmos DB (How Customers Use It)
• Operational database
• Analytics database
• Hot updatable data lake
• Database for serverless
• Database for AI
• Database for IoT/time-series data
Cloud-born database for modern apps

6
Challenges
• Relational databases can be challenging to scale out across multiple servers.
• Some data, such as JSON documents, key-value pairs, or graph structures, does not fit well in relational databases.
• NoSQL storage addresses these challenges with a different way of storing this type of data.

7
NoSQL Storage
Most NoSQL stores share some common features:
• Simpler horizontal scaling
• Flexible data structures
• Most of them are BASE (Basically Available, Soft state, Eventual consistency) instead of ACID (Atomic, Consistent, Isolated, Durable)
• Schema-free
• Simple API
Despite its name, NoSQL storage does not necessarily lack SQL-like capabilities such as indexes, a structured query language, or relationships between elements. However, the data is not stored and organized as in SQL databases, and these stores provide more than just SQL features.

8
Azure Table Storage
• A NoSQL key-value store that stores data in tables as a collection of entities.
• You are not charged for provisioned compute when inserting, updating, or retrieving your data. You pay mainly for the storage your data consumes, plus a small fee per transaction.
• Each entity is a set of properties. An entity can have up to 255 properties (or columns), including the three system properties PartitionKey, RowKey, and Timestamp. The total entity size (or row size) cannot exceed 1 MB.
• Azure Tables store entities based on a partition key and a row key.
• The Table service REST API supports OData, which exposes a simple query interface for interacting with table data:
https://<your account name>.table.core.windows.net/<your table name>(PartitionKey='<partition-key>',RowKey='<row-key>')?$select=<comma-separated property names>
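To make the OData URL format above concrete, here is a minimal Python sketch that assembles a point-query URL for one entity. The function names, account, table, and key values are hypothetical; the sketch only builds the URL and does not add the Shared Key or SAS authentication that a real request needs.

```python
from typing import Optional, Sequence
from urllib.parse import quote


def odata_string(value: str) -> str:
    """Quote a value as an OData string literal (embedded single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_entity_url(account: str, table: str, partition_key: str, row_key: str,
                     select: Optional[Sequence[str]] = None) -> str:
    """Build the OData URL that addresses one entity by PartitionKey and RowKey."""
    key_segment = (f"PartitionKey={odata_string(partition_key)},"
                   f"RowKey={odata_string(row_key)}")
    # Percent-encode characters such as spaces, but keep the OData punctuation readable.
    encoded_key = quote(key_segment, safe="=,'")
    url = f"https://{account}.table.core.windows.net/{table}({encoded_key})"
    if select:
        # $select limits the response to the listed properties.
        url += "?$select=" + ",".join(select)
    return url


print(build_entity_url("myaccount", "orders", "customer-42", "2018-06-01", ["Total", "Status"]))
# https://myaccount.table.core.windows.net/orders(PartitionKey='customer-42',RowKey='2018-06-01')?$select=Total,Status
```

Because the partition key and row key together identify exactly one entity, this is the cheapest kind of Table Storage query.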

10
Azure Table Storage vs Azure SQL Database
• The Azure Tables service does not enforce any schema for tables. It simply stores the properties of your entity based on the partition key and the row key.
• Developers need to enforce the schema on the client side.
• Azure SQL Database also has many features that Azure Tables lacks, including stored procedures, triggers, indexes, constraints, functions, default values, row- and column-level security, SQL injection detection, and much more.
• You are not charged for provisioned compute resources when using Azure Tables, whereas you are in Azure SQL Database. This makes Azure Tables extremely affordable for large datasets. With an effective partitioning scheme, Azure Tables also scales very well without sacrificing performance.

11
Azure Table Storage vs Azure Cosmos DB
• Azure Cosmos DB is much faster, with latency lower than 10 ms on reads and 15 ms on writes at the 99th percentile, at any scale.
• Azure Table Storage supports only a single primary region, with one optional readable secondary region for high availability. Azure Cosmos DB supports over 30 regions.
• Azure Table Storage indexes only the partition key and the row key. Azure Cosmos DB automatically indexes all properties.
• Azure Table Storage supports only strong or eventual consistency. Azure Cosmos DB supports five different consistency models and allows the model to be specified at the session level, so one user or feature can use a different consistency level than another.

12
Azure Table Storage vs Azure Cosmos DB
• Azure Table Storage charges you mainly for storage (plus a small per-transaction fee), not for provisioned compute. This makes Azure Table Storage very affordable. Azure Cosmos DB charges for provisioned Request Units (RUs), which are effectively how this PaaS product charges for compute. If you need more RUs, you can scale them up. This makes Cosmos DB significantly more expensive than Azure Table Storage.

15
Azure Cosmos DB
• You can access your databases, collections, and documents through the REST API over HTTP/HTTPS. Microsoft also provides SDKs for .NET, Node.js, Java, JavaScript, and Python.
• These SDKs all call the REST API underneath. The REST API also lets you use a language that does not have an SDK, such as Elixir.
• You do not need to provision a Cosmos DB account to develop applications that integrate with Cosmos DB. For development and testing with the SQL API, Microsoft provides the Azure Cosmos DB Emulator, which runs in your local environment. You can download it from https://docs.microsoft.com/en-us/azure/cosmos-db/local-emulator.
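As a sketch of what the SDKs do underneath, the following Python builds the master-key Authorization token that the Cosmos DB REST API expects: an HMAC-SHA256 signature over the verb, resource type, resource link, and date, keyed with the base64-decoded account key. The key below is a hypothetical placeholder and `dbs/ToDoList` is just an example resource; a real request also needs an `x-ms-version` header and must be sent to your account (or emulator) endpoint.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import quote


def master_key_auth_token(verb: str, resource_type: str, resource_link: str,
                          date: str, master_key: str) -> str:
    """Build a Cosmos DB master-key authorization token for the REST API."""
    key = base64.b64decode(master_key)
    # String to sign: lowercase verb, resource type and date; the resource link keeps its case.
    payload = (f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n"
               f"{date.lower()}\n\n")
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # The whole token is URL-encoded before it goes into the Authorization header.
    return quote(f"type=master&ver=1.0&sig={signature}", safe="")


# Hypothetical key for illustration only; use your account's primary key instead.
fake_key = base64.b64encode(b"not-a-real-key").decode("ascii")
now = formatdate(usegmt=True)  # RFC 1123 date, also sent as the x-ms-date header
headers = {
    "Authorization": master_key_auth_token("GET", "dbs", "dbs/ToDoList", now, fake_key),
    "x-ms-date": now,
}
```

The same date string must be signed and sent in `x-ms-date`, which is why the sketch computes it once and reuses it.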

16
Multiple APIs and Data Models

20

21

How To Do Multi-Model
Using it as Documents
Using it as Graph

Comparison with Competition
Capability: Multi-model
• Cosmos DB: Yes (documents, graphs, key-value, column family)
• MongoDB Atlas: Document
• MongoDB IaaS: Document
• DSE Cassandra: Column-family
• AWS DynamoDB: Yes, multi-model, but not native
• Google Cloud Spanner: Relational

24
Azure Tables API
• Cosmos DB lets you connect to your database with the same API calls that you use for Azure Table Storage. This allows you to move from Table Storage to Cosmos DB without changing your application code; typically only the connection string changes.

30
SQL API
• When you work with the SQL API, you use a document model: information is organized in databases, collections, and documents.
• This way, you can query your NoSQL storage using your existing SQL skills.

43
MongoDB API
• MongoDB is a NoSQL storage system that uses a document data model. Similar to JSON objects, a MongoDB document is composed of field-value pairs, where a value can be another document, an array, or an array of documents.
• When using the MongoDB API, you can reuse your existing libraries, code, and tools to access your Cosmos DB databases.

44
Cassandra API
• Azure Cosmos DB provides the Cassandra API (preview) for applications written for Apache Cassandra.
• This means that, by using existing Apache-licensed drivers compliant with CQLv4, your application written for Cassandra can communicate with the Azure Cosmos DB Cassandra API.

45
Graph API
• A graph data model is useful when your entities and the relationships between them are equally important and you need to define properties for both kinds of element.
• Azure Cosmos DB implements the property graph model. In this model, each entity is a vertex (or node) that represents a discrete object such as a car, a person, or a place. Relationships between vertices are called edges, and both vertices and edges have properties. NoSQL engines are usually a good option for implementing graphs, thanks to their schema-free structure.
• The Graph API is compatible with Apache TinkerPop and its graph traversal language, Gremlin, so it works with tools built for other TinkerPop-compatible graph systems.
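To illustrate the property graph model, here is a tiny in-memory Python sketch in which both vertices and edges carry properties. The class and method names are hypothetical and are not the Cosmos DB API; against the Graph API you would express the one-hop query as the Gremlin traversal `g.V('john').out('manages')`.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


class PropertyGraph:
    """Minimal in-memory property graph: vertices and edges both carry properties."""

    def __init__(self) -> None:
        self.vertices: Dict[str, dict] = {}
        # vertex id -> list of (edge label, target vertex id, edge properties)
        self.edges: Dict[str, List[Tuple[str, str, dict]]] = defaultdict(list)

    def add_vertex(self, vertex_id: str, label: str, **props) -> None:
        self.vertices[vertex_id] = {"label": label, **props}

    def add_edge(self, source: str, label: str, target: str, **props) -> None:
        self.edges[source].append((label, target, props))

    def out(self, vertex_id: str, label: str) -> List[str]:
        """One hop along edges with the given label (similar in spirit to Gremlin's out())."""
        return [target for (edge_label, target, _) in self.edges[vertex_id]
                if edge_label == label]

    def reachable(self, start: str, label: str) -> Set[str]:
        """All vertices reachable from `start` by repeatedly following `label` edges (BFS)."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            for nxt in self.out(queue.popleft(), label):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


g = PropertyGraph()
g.add_vertex("john", "person", name="John")
g.add_vertex("mary", "person", name="Mary")
g.add_vertex("alice", "person", name="Alice")
g.add_edge("john", "manages", "mary", since=2015)
g.add_edge("mary", "manages", "alice", since=2017)
print(g.out("john", "manages"))                # ['mary']
print(sorted(g.reachable("john", "manages")))  # ['alice', 'mary']
```

The multi-hop `reachable` query is the kind of traversal that would need repeated self-joins in a relational design.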

46
Graph API
[Figure: an "Attendee" vertex connected to a "Session" vertex by an "attends" edge]
• A graph is a collection of nodes and edges:
– Nodes: entities, for example customer, supplier, product
– Edges: relationships that entities share with each other
– Properties: node or edge attributes

47
Graph API
• Hierarchical or interconnected data: entities with multiple parents.
• Analyze interconnected data: materialize new information from existing facts and identify non-obvious connections.
• Complex many-to-many relationships: one relation flexibly connecting multiple entities.
[Figure: example organization graph of John, Mary, Alice, Shaun, Jacob, Jerry, Natalie, and Bob connected by "leads" and "manages" edges]

48
Graph API
• Graph and relational designs can answer the same questions.
• But if traversing relationships defines the primary application requirements, a graph can scale better and solve the problem more intuitively, with less code.

49
Graph API
• Recommendation Systems
• Fraud Detection
• Content Management
• Bill of Materials, product hierarchy
• CRM

52
Backup
• Azure automatically backs up your Cosmos DB account every four hours and keeps the last two backups.
• To ensure that the backup process does not affect the latency of your account, Cosmos DB stores backups in a separate Azure Blob Storage account.
• Taking a backup does not consume any of the RUs provisioned for your account.
• These automatic backups are also resilient against regional disasters, because the backup data is replicated to another region using geo-redundant storage (GRS).
• Although only the last two backups are normally available for recovery, if you accidentally delete a database or collection, its backups are retained for 30 days.
• If you need a longer retention time, you can use the Azure Cosmos DB Data Migration Tool to schedule additional backups.
• You can only perform a restore by opening a support ticket.

54
Global Distribution
• Cosmos DB provides two levels of automatic failover for the region that is configured for write operations:
– Regional: if a regional outage happens, Cosmos DB automatically moves requests to another region. Writes that had not yet been replicated when the outage began may be lost during this transition.
– Internal: internal failover mechanisms protect you from failures at the database, collection, or partition level. These automatic failovers are transparent to you, and you have no control over them.
• Although global distribution helps with high availability and disaster recovery (HADR), its primary purpose is to bring data closer to users for lower network latency.
• To test the availability features of Cosmos DB with your application, you can start a failover manually; Azure guarantees zero data loss for a manual failover. Cosmos DB also lets you configure failover priorities, which tell it the order in which regions should take over during an automatic failover.
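Failover priorities can be pictured as an ordered list of regions. This Python sketch is a conceptual model only, not how Cosmos DB implements failover: it picks the highest-priority region that is still available. The function name and region names are illustrative.

```python
from typing import Iterable, List


def promote_write_region(failover_priorities: List[str], unavailable: Iterable[str]) -> str:
    """Pick the region that should accept writes, given the configured priority order.

    failover_priorities[0] is the current write region; the remaining entries are
    read regions in the order they should be promoted during a failover.
    """
    down = set(unavailable)
    for region in failover_priorities:
        if region not in down:
            return region
    raise RuntimeError("no available region left to promote")


priorities = ["West Europe", "North Europe", "East US"]
print(promote_write_region(priorities, []))               # West Europe
print(promote_write_region(priorities, ["West Europe"]))  # North Europe
```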

55
Global Distribution
• You can have only a single write region, but as many read regions as you want. Read requests are routed to the region nearest to the client, so Cosmos DB can keep read latency low.
• A main advantage of Cosmos DB global distribution is that you do not need to change your application when you add or change replication regions. With the Cosmos DB multi-homing API, you can configure your application to access your Cosmos DB account through logical endpoints, which are region-agnostic. Logical endpoints let your application keep accessing the data transparently if a region fails over. If you need more granular control from the application over which regions serve reads and writes, you can use physical (region-specific) endpoints.
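Routing reads to the closest region can be modeled as choosing the read region with the lowest measured latency. In practice the SDKs route requests using the account's regions and the preferred locations you configure; this Python sketch, with a hypothetical function and hypothetical latency figures, only illustrates why adding a read region requires no change to the calling code.

```python
from typing import Dict, Sequence


def nearest_read_region(read_regions: Sequence[str], latency_ms: Dict[str, float]) -> str:
    """Return the read region with the lowest measured latency from this client."""
    candidates = [r for r in read_regions if r in latency_ms]
    if not candidates:
        raise ValueError("no measured read region available")
    return min(candidates, key=lambda region: latency_ms[region])


# Hypothetical latencies measured from a client in Cairo.
latency = {"West Europe": 55.0, "East US": 140.0, "Southeast Asia": 190.0}
print(nearest_read_region(["East US", "Southeast Asia"], latency))                 # East US
print(nearest_read_region(["East US", "Southeast Asia", "West Europe"], latency))  # West Europe
```

Adding "West Europe" to the account's read regions changes the answer without any change to the caller.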

60
Throughput
• The Request Unit (RU) is the measure used both to assign resources per partition and for billing.
• You can think of a partition (or physical partition) as a server. To assign resources to your Cosmos DB account, you provision request units per second (RU/s).
• Each RU represents a fixed amount of resources (memory, CPU, and IOPS). This unit, or currency, simplifies provisioning throughput for the application, because you do not need to differentiate between read and write capacity units. As a rule of thumb, a write operation needs five times as many RUs as a read operation of the same size: if reading a 1 KB document costs one RU, writing a 1 KB document costs five RUs.
• You can estimate your throughput needs with the Azure Cosmos DB request unit calculator.
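The five-to-one rule of thumb can be turned into a quick back-of-the-envelope estimate. This Python sketch assumes the cost grows linearly with document size and that throughput is provisioned in increments of 100 RU/s; the function and constants are hypothetical. For real sizing, use the request unit calculator or the request charge returned with each response.

```python
import math

READ_RU_PER_KB = 1.0    # rule of thumb: reading a 1 KB document costs about 1 RU
WRITE_MULTIPLIER = 5.0  # rule of thumb: a write costs about 5x a read of the same size
RU_INCREMENT = 100      # assumption: throughput is provisioned in steps of 100 RU/s


def estimate_rus_per_second(reads_per_sec: float, writes_per_sec: float,
                            doc_size_kb: float = 1.0) -> int:
    """Rough RU/s estimate from the rules of thumb above (cost assumed linear in size)."""
    read_cost = READ_RU_PER_KB * doc_size_kb
    write_cost = read_cost * WRITE_MULTIPLIER
    required = reads_per_sec * read_cost + writes_per_sec * write_cost
    # Round up to the next provisioning increment.
    return int(math.ceil(required / RU_INCREMENT) * RU_INCREMENT)


print(estimate_rus_per_second(500, 100))  # 500*1 + 100*5 = 1000
```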

61
Throughput
• Within an Azure Cosmos DB database, you can at any time, programmatically or through the portal:
– Provision throughput for a container.
– Provision throughput for a set of containers collectively, all of which share that throughput.
• Standard data transfer rates apply to replication traffic between regions.
• Globally distributed containers are billed for the storage consumed in each region, plus the throughput provisioned for each container multiplied by the number of regions associated with the Azure Cosmos DB account.
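The billing rule above (storage in each region, plus provisioned throughput times the number of regions) can be written out as a small Python function. The function is hypothetical, the prices are placeholders rather than real Azure rates, and replication data transfer is left out.

```python
from typing import Dict, Sequence


def estimate_bill(container_rus: Sequence[int], storage_gb_by_region: Dict[str, float],
                  price_per_100_rus: float, price_per_gb: float) -> float:
    """Estimate one billing period for a globally distributed account.

    Throughput: RU/s provisioned per container, multiplied by the number of regions.
    Storage: billed separately for the data stored in each region.
    """
    region_count = len(storage_gb_by_region)
    throughput_cost = sum(container_rus) * region_count / 100 * price_per_100_rus
    storage_cost = sum(storage_gb_by_region.values()) * price_per_gb
    return throughput_cost + storage_cost


# Two containers (400 and 1,000 RU/s) replicated to two regions, with placeholder prices.
print(estimate_bill([400, 1000], {"West Europe": 50.0, "East US": 50.0}, 0.5, 0.25))  # 39.0
```

Adding a region doubles the throughput term here, which is why regions should be chosen for where users actually are.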

63
Consistency
• Traditional relational databases use a strong consistency level, which is great for data integrity but creates problems with concurrency.
• This has caused issues particularly when scaling out relational databases. If a write occurs on one partition and has not yet replicated to another, readers are frustrated to see incorrect or out-of-date data.
• Cosmos DB offers a low-latency guarantee for read and write operations. Azure can provide this thanks to the consistency models used for data replication.
• In a Cosmos DB collection in a single geographic location, you will rarely notice the difference between the consistency choices: data replicates so fast that, with few exceptions, users always see the latest copy of the data. When replicating data around the globe, choosing the correct consistency level becomes more important.

65
Consistency
Depending on your needs, you can configure one of five well-defined consistency models for your Cosmos DB account:
• Strong: guarantees that a read operation returns the most recent version of an item. A write becomes available for reading only once it has been committed by a majority quorum of replicas, so the client never sees partially committed data. If you configure your account with this consistency model, you cannot associate more than one region with your account. Read operations cost more than with the session or eventual consistency models.
• Bounded staleness: you configure a staleness value as a number of versions (K) or a time interval (t). This level guarantees that reads lag behind writes by at most K versions or t time. It is ideal when you want consistency close to strong while keeping the low-latency guarantee. You can associate any number of regions with your account when you use this consistency model. Read operations cost the same as with the strong consistency model.

66
Consistency
• Session: consistency is scoped to a client session. This model is ideal for scenarios where a user or device typically reads its own writes. You can associate any number of regions with your account at this consistency level. Read operations cost less than with strong or bounded staleness, but more than with eventual consistency.
• Consistent prefix: replicas eventually converge in the absence of further writes, and reads are always ordered. If you wrote A, B, C, a read can return A, or A, B, or A, B, C, but never A, C or B, A, C. You can associate any number of regions with your account when you use this consistency model.
• Eventual: replicas eventually converge in the absence of further writes, but there is no ordering guarantee for reads. You can associate any number of regions with your account when you use this consistency model. This level has the lowest cost for read operations.
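The ordering guarantee of consistent prefix and the lag bound of bounded staleness can be expressed as simple predicates over what a reader observes. This Python sketch is a conceptual check with hypothetical function names, not part of any SDK; under eventual consistency, even the out-of-order reads it rejects would be allowed.

```python
from typing import Sequence


def is_consistent_prefix(writes: Sequence[str], observed: Sequence[str]) -> bool:
    """Consistent prefix: a read must see an in-order prefix of the writes."""
    return list(observed) == list(writes[:len(observed)])


def within_staleness_bound(latest_version: int, read_version: int, max_lag: int) -> bool:
    """Bounded staleness (version form): the read may lag by at most K = max_lag versions."""
    return 0 <= latest_version - read_version <= max_lag


writes = ["A", "B", "C"]
for observed in (["A"], ["A", "B"], ["A", "B", "C"], ["A", "C"], ["B", "A", "C"]):
    print(observed, is_consistent_prefix(writes, observed))
# ['A'] True, ['A', 'B'] True, ['A', 'B', 'C'] True, ['A', 'C'] False, ['B', 'A', 'C'] False
```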

68
Sharding (Partitioning) in Azure SQL Database
• We may shard a database because:
– It is too large to be stored in a single Azure SQL Database.
– It holds too much data to back up and restore in a reasonable amount of time.
– Our customers require that their data be stored separately from other customers' data.
• Sharding involves rewriting a significant portion of our applications to handle multiple databases.
• Sharding is easy in Azure Table Storage and Azure Cosmos DB, but significantly more difficult in a relational database like Azure SQL Database. The complexity comes from remaining transactionally consistent while the data is spread across several databases.

69
Partitioning
• We can shard automatically by using a partition key: Azure Cosmos DB creates multiple partitions for us, and partitioning is completely transparent to the application. All documents with the same partition key value are always stored in the same partition, while documents with different partition key values may or may not share a partition. The provisioned throughput of a collection is distributed evenly among its partitions.
• You can also have a single-partition collection. Remember that partitioning is always configured per collection, not at the Cosmos DB account level, so a single-partition collection can exist alongside multi-partition collections. Single-partition collections have a 10 GB storage limit and a maximum of 10,000 RU/s.
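Hash partitioning, as described above, can be sketched in a few lines of Python. MD5, the fixed 2^32 hash space, and the function names are stand-ins chosen for illustration; Cosmos DB's actual hash function and range layout are internal to the service.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumption: size of the hashed partition-key space in this sketch


def hash_partition_key(value: str) -> int:
    """Map a partition key value to a point in [0, HASH_SPACE). MD5 is a stand-in hash."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def physical_partition(value: str, partition_count: int) -> int:
    """Each physical partition owns one equal, contiguous range of the hash space."""
    range_size = HASH_SPACE // partition_count
    return min(hash_partition_key(value) // range_size, partition_count - 1)


def rus_per_partition(collection_rus: int, partition_count: int) -> float:
    """Provisioned throughput is spread evenly across the collection's partitions."""
    return collection_rus / partition_count


for user_id in ["Andrew", "Mike", "Bob", "Dharma", "Alice"]:
    print(user_id, "-> partition", physical_partition(user_id, 4))
print(rus_per_partition(10_000, 4))  # 2500.0
```

Because the mapping depends only on the key value, every document for a given user lands in the same partition, while different users spread pseudo-randomly across partitions.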

Cosmos DB Container (i.e., Collection), Partition Key: User Id
[Figure: the logical partitioning abstraction is mapped behind the scenes to physical partition sets (Partition 1 … Partition n) by hash(User Id), a pseudo-random distribution of data over the range of possible hashed values]
Frugal number of partitions based on actual storage and throughput needs (yielding scalability with low total cost of ownership).
What happens when partitions need to grow?
Behind the scenes, partition ranges can be dynamically subdivided to grow the database seamlessly as the application grows, while maintaining high availability.
[Figure: Partition X is split into Partition X1 and Partition X2, each taking over part of the original keys]
Best of all: partition management is completely taken care of by the system. You don't have to lift a finger; the database takes care of you.
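The split in the figure can be modeled as halving a partition's hash range and moving each item to the half that owns its hashed key. This Python sketch uses a hypothetical `PartitionRange` type and made-up hashed values that mirror the slide; in Cosmos DB the service chooses split points and moves the data itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionRange:
    name: str
    low: int    # inclusive lower bound of hashed key values
    high: int   # exclusive upper bound
    items: List[Tuple[int, str]] = field(default_factory=list)  # (hashed key, document id)


def split(partition: PartitionRange) -> Tuple[PartitionRange, PartitionRange]:
    """Split a partition's hash range in half and move each item to the half that owns it."""
    mid = (partition.low + partition.high) // 2
    left = PartitionRange(partition.name + "1", partition.low, mid)
    right = PartitionRange(partition.name + "2", mid, partition.high)
    for hashed_key, doc_id in partition.items:
        (left if hashed_key < mid else right).items.append((hashed_key, doc_id))
    return left, right


# Hypothetical hashed keys chosen to mirror the slide: Partition X splits into X1 and X2.
x = PartitionRange("Partition X", 0, 100,
                   [(10, "Dharma"), (30, "Shireesh"), (60, "Rimma"), (90, "Karthik")])
x1, x2 = split(x)
print(x1.name, [d for _, d in x1.items])  # Partition X1 ['Dharma', 'Shireesh']
print(x2.name, [d for _, d in x2.items])  # Partition X2 ['Rimma', 'Karthik']
```

Because each half still owns a contiguous hash range, routing a request after the split only requires knowing the new boundary.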

79
Unique keys
By creating a unique key policy when a container is created, you ensure that one or more values are unique within each logical partition (that is, per partition key value).
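A unique key policy scopes uniqueness to a logical partition, so the same value may appear under different partition key values. This hypothetical in-memory Python container illustrates that rule with top-level property names; in the real service, unique key paths are written like `/email`, and a violating write is rejected with an HTTP 409 (Conflict) response.

```python
from typing import Dict, Iterable, Set, Tuple


class UniqueKeyViolation(Exception):
    """Raised when an insert would duplicate a unique key within one logical partition."""


class ToyContainer:
    """Hypothetical in-memory container that enforces a unique key policy per partition key value."""

    def __init__(self, partition_key: str, unique_keys: Iterable[str]) -> None:
        self.partition_key = partition_key
        self.unique_keys = tuple(unique_keys)
        self._seen: Dict[object, Set[Tuple]] = {}

    def insert(self, doc: dict) -> None:
        pk_value = doc[self.partition_key]
        key = tuple(doc.get(path) for path in self.unique_keys)
        seen = self._seen.setdefault(pk_value, set())
        if key in seen:
            raise UniqueKeyViolation(f"duplicate {self.unique_keys}={key} in partition {pk_value!r}")
        seen.add(key)


users = ToyContainer(partition_key="companyId", unique_keys=["email"])
users.insert({"companyId": "contoso", "email": "ana@example.com"})
users.insert({"companyId": "fabrikam", "email": "ana@example.com"})  # OK: different partition
try:
    users.insert({"companyId": "contoso", "email": "ana@example.com"})
except UniqueKeyViolation as err:
    print("rejected:", err)
```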

81
Indexing
• Automatically indexes every property of every record, without requiring you to define schemas or indexes up front
• No need for schema and index management
• Works across every data model
• Multiple index types: hash, range, and geospatial
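Automatic indexing means that every property path of every document becomes searchable without a declared schema. The following Python sketch flattens documents into path/value pairs and builds an inverted index over them; the function names are hypothetical, and the path notation is only loosely modeled on Cosmos DB indexing-policy paths such as `/address/city/?`.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple


def index_paths(value: Any, prefix: str = "") -> List[Tuple[str, Any]]:
    """Flatten a JSON-like document into (path, scalar value) pairs, e.g. ('/address/city', 'Cairo')."""
    if isinstance(value, dict):
        pairs: List[Tuple[str, Any]] = []
        for key, child in value.items():
            pairs.extend(index_paths(child, f"{prefix}/{key}"))
        return pairs
    if isinstance(value, list):
        pairs = []
        for child in value:
            pairs.extend(index_paths(child, f"{prefix}/[]"))  # array elements share one path
        return pairs
    return [(prefix or "/", value)]


def build_index(docs: List[dict]) -> Dict[Tuple[str, Any], Set[str]]:
    """Inverted index over every property path, with no schema declared up front."""
    index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)
    for doc in docs:
        for path, scalar in index_paths(doc):
            index[(path, scalar)].add(doc["id"])
    return index


docs = [
    {"id": "1", "address": {"city": "Cairo"}, "tags": ["nosql", "azure"]},
    {"id": "2", "address": {"city": "Dubai"}, "tags": ["azure"]},
]
index = build_index(docs)
print(sorted(index[("/tags/[]", "azure")]))  # ['1', '2']
```

A document with a brand-new property is indexed the moment it is inserted, which is what removes the need for index management.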

89
Security
Security & Compliance
Always encrypted at rest and in motion
Fine grained “row level” authorization
Network security with IP firewall rules
Comprehensive Azure compliance certification:
• ISO 27001
• ISO 27018
• EUMC
• HIPAA
• PCI
• SOC1 and SOC2

94
TTL
To set the TTL on a collection, provide a non-zero positive number that indicates the period, in seconds, after which documents in the collection expire, counted from each document's last-modified timestamp (_ts). Alternatively, you can set the default to -1, which means that all documents inserted into the collection live indefinitely by default.
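using Microsoft.Azure.Documents; // DocumentDB .NET SDK (SQL API)

DocumentCollection collectionDefinition = new DocumentCollection();
collectionDefinition.Id = "orders";
collectionDefinition.PartitionKey.Paths.Add("/customerId");
// DefaultTimeToLive is in seconds: 90 days * 24 h * 60 min * 60 s = 7,776,000
collectionDefinition.DefaultTimeToLive = 90 * 24 * 60 * 60; // expire all documents after 90 days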
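The expiry rule can be expressed as a small Python function over the `_ts` value. It assumes the documented semantics that an item-level `ttl` overrides the collection default only when TTL is enabled on the collection, and that `-1` means "never expire"; the function name is hypothetical, and in the service expired items are deleted by a background process, so this only models when an item becomes eligible.

```python
from typing import Optional

NINETY_DAYS = 90 * 24 * 60 * 60  # 7,776,000 seconds, matching DefaultTimeToLive above


def is_expired(ts: int, now: int, collection_ttl: Optional[int],
               item_ttl: Optional[int] = None) -> bool:
    """Return True if an item last modified at `ts` (_ts, epoch seconds) has expired by `now`."""
    if collection_ttl is None:
        return False  # TTL not enabled on the collection: item-level ttl is ignored
    ttl = item_ttl if item_ttl is not None else collection_ttl
    if ttl == -1:
        return False  # -1 means the item lives indefinitely
    return now >= ts + ttl  # assumption: the boundary second counts as expired


print(is_expired(ts=0, now=NINETY_DAYS - 1, collection_ttl=NINETY_DAYS))  # False
print(is_expired(ts=0, now=NINETY_DAYS, collection_ttl=NINETY_DAYS))      # True
```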

98
Cosmos DB query cheat sheets
https://docs.microsoft.com/en-us/azure/cosmos-db/query-cheat-sheet

99
Azure Cosmos DB: Data migration tool
https://github.com/azure/azure-documentdb-datamigrationtool

100
DEMO: MovieApp with DocumentDB API
https://github.com/mikepfeiffer/movieapp-documentdb

Thank You
Mohamed Tawfik
Cloud Solutions Architect
Azure CoE EMEA
