Application developers support unprecedented rates of change – functionality must rapidly evolve to meet changing customer needs and to respond to competitive pressures while user populations can grow dramatically and unpredictably. To address these realities, developers are selecting document-oriented databases for schema flexibility, scalability and high-performance data storage.
In this session, we will get hands-on with Azure’s NoSQL document database service. Azure DocumentDB offers full indexing of JSON documents, SQL query capabilities and multi-document transactions. Learn how to get started with Azure DocumentDB and hear about some of the recent improvements to the service.
12.
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
13.
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
Lenovo ThinkPad X1 Carbon | ??? | ??? | ???
20. fully managed, scalable, queryable, schema-free JSON document database service for modern applications
transactional processing
rich query
managed as a service
elastic scale
Internet-accessible HTTP/REST
schema-free data model
arbitrary data formats
23. No need to define secondary indices / schema hints for indexing!
24. SQL Query Grammar
-- Nested lookup against index
SELECT Books.Author
FROM Books
WHERE Books.Author.Name = "Leo Tolstoy"
-- Transformation, Filters, Array access
SELECT { Name: Books.Title, Author: Books.Author.Name }
FROM Books
WHERE Books.Price > 10 AND Books.Languages[0] = "English"
-- Joins, User Defined Functions (UDF); UDFs are invoked with the udf. prefix
SELECT udf.CalculateRegionalTax(Books.Price, "USA", "WA")
FROM Books
JOIN LanguagesArr IN Books.Languages
WHERE LanguagesArr.Language = "Russian"
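An intra-document JOIN pairs each document with each element of one of its own arrays, then filters those pairs; it never crosses document boundaries. The Python sketch below mimics a simplified form of the JOIN query above over in-memory dicts, projecting the title instead of calling the UDF. The sample data, the `Languages` shape (a list of objects with a `Language` field, as that query implies) and the function name are assumptions for illustration; this is not how DocumentDB executes queries.

```python
def join_languages(books, language):
    """Mimic: SELECT Books.Title FROM Books JOIN l IN Books.Languages WHERE l.Language = <language>."""
    results = []
    for book in books:                              # FROM Books
        for entry in book.get("Languages", []):     # JOIN l IN Books.Languages (same document only)
            if entry.get("Language") == language:   # WHERE l.Language = <language>
                results.append(book["Title"])       # projection
    return results


# Hypothetical sample documents.
books = [
    {"Title": "War and Peace", "Languages": [{"Language": "Russian"}, {"Language": "English"}]},
    {"Title": "DocumentDB 101", "Languages": [{"Language": "English"}]},
]
print(join_languages(books, "Russian"))  # ['War and Peace']
```

A document with no `Languages` array simply produces no pairs, which matches the cross-product reading of JOIN.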
35. “With Azure DocumentDB, we didn’t have to say ‘no’ to
the business, and we weren’t a bottleneck to launching
the promotion — in fact, we came in ahead of schedule.”
52. No magic bullet
Think about how your data is going to be written and read, and model it accordingly
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"countOfBooks": 3,
"books": ["1", "2", "3"],
"images": [
{"thumbnail": "http://....png"},
{"profile": "http://....png"}
]
}
{
"id": "1",
"name": "DocumentDB 101",
"authors": [
{"id": "1", "name": "Thomas Andersen", "thumbnail": "http://....png"},
{"id": "2", "name": "William Wakefield", "thumbnail": "http://....png"}
]
}
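The book document above embeds each author's name and thumbnail (denormalized, so a book page is one read), while the author document keeps only book IDs (a reference). The price of embedding is paid on writes: changing an author's thumbnail means rewriting every book that carries a copy. A minimal in-memory sketch of that fan-out; `rewrite_embedded_author` and the `.png` names are hypothetical, not a DocumentDB API.

```python
def rewrite_embedded_author(books, author_id, **changes):
    """Copy changed author fields into every book that embeds that author.

    Returns the IDs of the book documents that had to be rewritten.
    """
    rewritten = []
    for book in books:
        for embedded in book.get("authors", []):
            if embedded["id"] == author_id:
                embedded.update(changes)     # keep the denormalized copy in sync
                rewritten.append(book["id"])
                break                        # each author appears at most once per book
    return rewritten


books = [
    {"id": "1", "name": "DocumentDB 101",
     "authors": [{"id": "1", "name": "Thomas Andersen", "thumbnail": "old.png"},
                 {"id": "2", "name": "William Wakefield", "thumbnail": "w.png"}]},
    {"id": "2", "name": "Another book",
     "authors": [{"id": "1", "name": "Thomas Andersen", "thumbnail": "old.png"}]},
]
print(rewrite_embedded_author(books, "1", thumbnail="new.png"))  # ['1', '2']
```

If thumbnails change often and books are read rarely, referencing the author by ID (as the author document does with `books`) may be the cheaper choice.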
54. Predictable Performance
Request Unit (RU) is the normalized currency: % Memory, % IOPS, % CPU
Each replica gets a fixed budget of Request Units
[Diagram: resources and resource sets (documents, SQL, sprocs, args)]
55.
Operation | Request units (RUs) consumed*
Reading a single 1 KB document | 1
Reading a single 2 KB document | 2
Query with a simple predicate for a 1 KB document | 3
Creating a single 1 KB document with 10 JSON properties (consistent indexing) | 14
Creating a single 1 KB document with 100 JSON properties (consistent indexing) | 20
Replacing a single 1 KB document | 28
Executing a stored procedure that creates two documents | 30
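Because every operation is priced in RUs against a fixed per-second budget, capacity planning reduces to multiplication and division. A sketch using the per-operation costs from the table and the 2,500 RU/sec maximum quoted for a collection; the workload mix and operation keys are hypothetical, and real charges vary with document size and indexing policy.

```python
# Per-operation costs (RUs) taken from the table above.
RU_COST = {
    "read_1kb": 1,
    "read_2kb": 2,
    "query_1kb": 3,
    "create_1kb_10_props": 14,
    "create_1kb_100_props": 20,
    "replace_1kb": 28,
    "sproc_two_creates": 30,
}


def ru_per_second(ops_per_second):
    """Total RU/sec needed for a workload given as {operation: operations per second}."""
    return sum(RU_COST[op] * rate for op, rate in ops_per_second.items())


def max_rate(operation, budget_ru_per_sec):
    """How many whole operations of one kind fit in a per-second RU budget."""
    return budget_ru_per_sec // RU_COST[operation]


# Hypothetical mix: 500 reads, 50 queries and 20 replaces per second.
mix = {"read_1kb": 500, "query_1kb": 50, "replace_1kb": 20}
print(ru_per_second(mix))             # 1210  (500*1 + 50*3 + 20*28)
print(max_rate("replace_1kb", 2500))  # 89
```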
57. • Data size
A single collection holds up to 10 GB
• Throughput
3 performance tiers, with a maximum of 2,500 RU/sec
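These two per-collection limits give a lower bound on how many collections a workload needs: enough to hold the data and enough to supply the throughput. A sketch of that arithmetic, assuming data and load spread evenly across collections (which is what sharding aims for); the workload figures are hypothetical.

```python
import math

COLLECTION_MAX_GB = 10             # per-collection data limit from the slide
COLLECTION_MAX_RU_PER_SEC = 2500   # top performance tier from the slide


def collections_needed(data_gb, ru_per_sec):
    """Minimum collection count so neither the storage nor the throughput limit is exceeded."""
    by_size = math.ceil(data_gb / COLLECTION_MAX_GB)
    by_throughput = math.ceil(ru_per_sec / COLLECTION_MAX_RU_PER_SEC)
    return max(1, by_size, by_throughput)


print(collections_needed(data_gb=45, ru_per_sec=6000))  # 5  (storage needs 5, throughput needs 3)
```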
63. Hash sharding
• Examples: Profile data (user ID, app ID), (user ID), Device and vehicle data (device/vin ID),
Catalog data (item ID)
• Pros: balanced, stateless
• Cons: reshuffling is hard
Range sharding
• Examples: Operational data (timestamp), (timestamp, event ID)
• Pros: easy sliding window, range queries
• Cons: stateful
Lookup sharding
• Examples: SaaS/multitenant service (tenant ID), Metadata store (type ID)
• Pros: simple, easy to reshuffle, can span accounts
• Cons: stateful, works only on discrete keys
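The three schemes differ only in how a key is mapped to a shard (collection). A minimal Python sketch of each mapping; the function names, boundaries and tenant directory are hypothetical and are not the DocumentDB SDK's partition resolvers. MD5 is used only as a stable spreading function, because Python's built-in `hash()` for strings changes between processes.

```python
import bisect
import hashlib


def hash_shard(key, shard_count):
    """Hash sharding: a stable hash of the key picks the shard (balanced, stateless)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def range_shard(timestamp, boundaries):
    """Range sharding: boundaries[i] is the first timestamp stored in shard i (stateful)."""
    index = bisect.bisect_right(boundaries, timestamp) - 1
    if index < 0:
        raise ValueError("timestamp precedes the first shard boundary")
    return index


def lookup_shard(tenant_id, directory):
    """Lookup sharding: an explicit map from a discrete key to its shard."""
    return directory[tenant_id]


print(range_shard(1500, [0, 1000, 2000]))                 # 1
directory = {"contoso": "collection-a", "fabrikam": "collection-b"}
directory["contoso"] = "collection-c"                      # reshuffle one tenant
print(lookup_shard("contoso", directory))                  # collection-c
```

This makes the listed cons concrete: changing `shard_count` in `hash_shard` remaps most keys, the boundary list for `range_shard` is state that must be kept somewhere, and moving a tenant under lookup sharding is a single directory update.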
65. How it works
Automatic indexing of documents
JSON documents are represented as trees
Structural information and instance values are normalized into JSON paths
Fixed upper bound on index size (typically 5-10% in real production data)
Example
{"headquarters": "Belgium"} → /"headquarters"/"Belgium"
{"exports": [{"city": "Moscow"}, {"city": "Athens"}]} → /"exports"/0/"city"/"Moscow" and /"exports"/1/"city"/"Athens"
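The tree-to-path normalization in the example can be sketched in a few lines: walk the document, append a quoted property name or an array index at each level, and end each path with the leaf value. A minimal Python sketch that reproduces the two example outputs; `json_paths` is a hypothetical helper and says nothing about how DocumentDB stores its index internally.

```python
import json


def json_paths(value, prefix=""):
    """Yield one path per leaf value, in the /"prop"/index/"value" style shown above."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from json_paths(child, f"{prefix}/{json.dumps(key)}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from json_paths(child, f"{prefix}/{index}")
    else:
        # Leaf: the instance value itself becomes the last path segment.
        yield f"{prefix}/{json.dumps(value)}"


print(list(json_paths({"headquarters": "Belgium"})))
# ['/"headquarters"/"Belgium"']
print(list(json_paths({"exports": [{"city": "Moscow"}, {"city": "Athens"}]})))
# ['/"exports"/0/"city"/"Moscow"', '/"exports"/1/"city"/"Athens"']
```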
66.
Configuration | Level | Options
Automatic | Per collection | True (default) or False; override with each document write
Indexing Mode | Per collection | Consistent or Lazy; Lazy for eventual updates/bulk ingestion
Included and excluded paths | Per path | Individual path or recursive includes (? and *)
Indexing Type | Per path | Hash (default) and Range; Hash for equality, Range for range queries
Indexing Precision | Per path | 3 – 7 per path; trades off storage, query RUs and write RUs
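The Hash/Range choice is the classic trade-off between a hash table and a sorted structure: both answer equality, but only ordered data can answer `>` without scanning everything. A conceptual Python sketch, not DocumentDB's log-structured index; precision is not modeled, and the class names and sample documents are hypothetical.

```python
import bisect
from collections import defaultdict


class HashIndex:
    """Equality lookups only: value -> set of document IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, value, doc_id):
        self._postings[value].add(doc_id)

    def equal(self, value):
        return set(self._postings.get(value, ()))


class RangeIndex:
    """Values kept in sorted order, so range predicates become a binary search plus a slice."""

    def __init__(self):
        self._keys, self._ids = [], []

    def add(self, value, doc_id):
        position = bisect.bisect_right(self._keys, value)
        self._keys.insert(position, value)
        self._ids.insert(position, doc_id)

    def greater_than(self, value):
        start = bisect.bisect_right(self._keys, value)  # first key strictly greater than value
        return set(self._ids[start:])


docs = {"a": {"prop": 3}, "b": {"prop": 7}, "c": {"prop": 9}}
hash_index, range_index = HashIndex(), RangeIndex()
for doc_id, doc in docs.items():
    hash_index.add(doc["prop"], doc_id)
    range_index.add(doc["prop"], doc_id)
print(hash_index.equal(7))                   # {'b'}
print(sorted(range_index.greater_than(5)))   # ['b', 'c']
```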
67.
Path | Description/use case
/ | Default path for the collection. Recursive; applies to the whole document tree.
/"prop"/? | Serves queries like the following (with Hash or Range types respectively):
SELECT * FROM collection c WHERE c.prop = "value"
SELECT * FROM collection c WHERE c.prop > 5
/"prop"/* | All paths under the specified label.
/"prop"/"subprop"/ | Used during query execution to prune documents that do not have the specified path.
/"prop"/"subprop"/? | Serves queries like the following (with Hash or Range types respectively):
SELECT * FROM collection c WHERE c.prop.subprop = "value"
SELECT * FROM collection c WHERE c.prop.subprop > 5
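One way to read the `?` and `*` wildcards in this table: `?` covers the scalar value sitting exactly at that property, while `*` covers everything beneath the label. The Python matcher below encodes that reading as an assumption, for intuition only; `path_matches` is hypothetical, property paths are written without their leaf value, and the bare pruning form is deliberately not modeled.

```python
def path_matches(pattern, path):
    """Hypothetical matcher: does an index path pattern cover a property path like '/"prop"/"subprop"'?"""
    if pattern == "/":
        return True                                   # default path: the whole document tree
    if pattern.endswith("/?"):
        return path == pattern[:-2]                   # only the scalar at exactly this property
    if pattern.endswith("/*"):
        prefix = pattern[:-2]
        return path == prefix or path.startswith(prefix + "/")  # anything under the label
    raise ValueError("pattern form not modeled in this sketch: " + pattern)


print(path_matches('/"prop"/?', '/"prop"'))             # True
print(path_matches('/"prop"/?', '/"prop"/"subprop"'))   # False
print(path_matches('/"prop"/*', '/"prop"/"subprop"'))   # True
```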
Speaker notes
Image licensed under the Creative Commons Attribution-Share Alike 2.0 Generic license.
http://commons.wikimedia.org/wiki/File:Crying-girl.jpg
The “write” index for consistent queries
Highly concurrent, lock-free, log-structured indexing technology developed with Microsoft Research
Optimized for SSDs (also works well on HDDs)
Resource governed for tenant isolation
Automatic indexing of JSON documents without requiring schema or secondary indices, but configurable via:
Modes
Policies
Paths
Types
Query over heterogeneous documents without defining schema or managing indexes
Query arbitrary paths, properties and values without specifying secondary indexes or indexing hints
Execute queries with consistent results in the face of sustained writes
Query through fluent language integration, including LINQ for .NET developers and a “document-oriented” SQL grammar for traditional SQL developers
Extend query execution through application-supplied JavaScript UDFs
Supported SQL features include: predicates, iterations (arrays), sub-queries, logical operators, UDFs, intra-document JOINs, JSON transforms
Stored Procedures and Triggers
Familiar programming model constructs for executing application logic
Registered as named, URI-addressable, durable resources
Scoped to a DocumentDB collection
JavaScript as a procedural language to express business logic
Language integration
A JavaScript throw statement aborts the transaction
Execution
JavaScript runtime is hosted on each replica
Pre-compiled on registration
The entire procedure is wrapped in an implicit database transaction
Fully resource governed and sandboxed execution
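The key semantic here is all-or-nothing: everything the procedure writes commits together, and a throw anywhere rolls all of it back. Real procedures are JavaScript running on the replica; the Python sketch below only models that behavior with a snapshot-and-restore context manager over an in-memory dict. `implicit_transaction` and `create_two_documents` are hypothetical names, not DocumentDB APIs.

```python
import copy
from contextlib import contextmanager


@contextmanager
def implicit_transaction(store):
    """Snapshot the collection; restore it if the procedure raises (like a JavaScript throw)."""
    snapshot = copy.deepcopy(store)
    try:
        yield store
    except Exception:
        store.clear()
        store.update(snapshot)   # undo every write made inside the procedure
        raise


def create_two_documents(store, first, second):
    """Hypothetical procedure body: both creates commit, or neither does."""
    store[first["id"]] = first
    if second["id"] in store:
        raise ValueError("conflict: " + second["id"])  # aborts the whole transaction
    store[second["id"]] = second


collection = {"2": {"id": "2"}}
try:
    with implicit_transaction(collection) as docs:
        create_two_documents(docs, {"id": "1"}, {"id": "2"})
except ValueError:
    pass
print(sorted(collection))  # ['2']  (document "1" was rolled back)
```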
In theoretical computer science, the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:[1][2][3]
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it succeeded or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
Strong: guarantees that a write is only visible after it is committed durably by the majority quorum of replicas, and reads are always acknowledged by the majority read quorum
Session: provides predictable read consistency for a session while offering low-latency writes. Reads are also low latency, as a read is served by a single replica
Bounded Staleness: guarantees the total order of propagation of writes, but reads may lag writes by at most N seconds or operations (configurable)
Eventual: the weakest form of consistency, in which, over time, a client may read values that are older than ones it has seen before
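The difference between these levels comes down to which replicas a read may consult. A toy Python model, not DocumentDB's replication protocol: a write commits on a majority quorum only, a strong read consults a (different) majority and takes the newest value, a session read accepts any single replica that has caught up to the client's session token, and an eventual read takes whatever one replica has. The three-replica default, the class and the helper names are arbitrary assumptions; bounded staleness is not modeled.

```python
from itertools import combinations


class ToyReplicaSet:
    """Hypothetical in-memory replica set used only to contrast read behaviors."""

    def __init__(self, replica_count=3):
        self.replicas = [{"lsn": 0, "value": None} for _ in range(replica_count)]
        self.majority = replica_count // 2 + 1
        self.last_lsn = 0

    def write(self, value):
        """Commit on a majority quorum only; the remaining replicas lag behind."""
        self.last_lsn += 1
        for replica in self.replicas[: self.majority]:
            replica.update(lsn=self.last_lsn, value=value)
        return self.last_lsn  # handed back to the client as a session token

    def read_strong(self):
        """Read a majority quorum (not the write quorum) and take the newest value."""
        quorum = self.replicas[-self.majority:]
        return max(quorum, key=lambda r: r["lsn"])["value"]

    def read_session(self, token):
        """Read one replica that has applied at least the session's last write."""
        for replica in reversed(self.replicas):
            if replica["lsn"] >= token:
                return replica["value"]
        raise RuntimeError("no replica has caught up to the session token")

    def read_eventual(self, index):
        """Read one arbitrary replica, which may be stale."""
        return self.replicas[index]["value"]


def majorities_always_overlap(replica_count):
    """Any two majority quorums share a replica, which is why strong reads see committed writes."""
    majority = replica_count // 2 + 1
    quorums = [set(q) for q in combinations(range(replica_count), majority)]
    return all(a & b for a in quorums for b in quorums)


replica_set = ToyReplicaSet()
token = replica_set.write("v1")
print(replica_set.read_strong())        # v1
print(replica_set.read_session(token))  # v1
print(replica_set.read_eventual(2))     # None (replica 2 has not applied the write yet)
```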
Image licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license:
http://commons.wikimedia.org/wiki/File:Fale_F1_Monza_2004_73.jpg
Image licensed under the Creative Commons Attribution 2.0 Generic license:
http://en.wikipedia.org/wiki/File:A_smiling_baby.jpg
Talk about productivity and iterative development. No rigid schemas to weigh you down!
Source: http://en.wikipedia.org/wiki/Denormalization
In computing, denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data.[1][2] In some cases, denormalization is a means of addressing performance or scalability in relational database software.
With DocumentDB, you can also choose a hybrid model that mimics the advantages of normalization.