In this session, we will take you through the different database services you can choose from on AWS. We will take a look at the workings of each one, from Amazon RDS for relational databases to Amazon QLDB for ledger databases.
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the right DB
1. Cobus Bernard
Sr Developer Advocate
Amazon Web Services
Getting Started with Databases on AWS:
Choosing the right DB
@cobusbernard
cobusbernard
cobusbernard
CobusCloud
2. © 2020, Amazon Web Services, Inc. or its Affiliates.
Agenda
History of Databases
Why so many?
Relational Data
NoSQL
Q & A
3.
1970 1980 1990 2000
Oracle DB2
SQL Server
MySQL
PostgreSQL
DynamoDB
Redis
MongoDB
Elasticsearch
Neptune
Cassandra
Access
Aurora
2010
Timestream
QLDB
Amazon
DocumentDB
4.
Modern apps create new requirements
Users: 1 million+
Data volume: TB → PB → EB
Locality: Global
Performance: Milliseconds → microseconds
Request rate: Millions
Access: Web, mobile, IoT, devices
Scale: Up-down, out-in
Economics: Pay for what you use
Developer access: No assembly required
Examples: social media, ride hailing, media streaming, dating
5.
User search history: Amazon DynamoDB
• Massive data volume
• Need quick lookups for personalized search
Session state: Amazon ElastiCache
• In-memory store for sub-millisecond fetch
Relational data: Amazon RDS
• Referential integrity
• Primary transactional database
6.
300M total users
7B exercises per month
• DynamoDB: 31B items tracking language exercises
• Aurora: primary transactional DB (user data)
• ElastiCache: instant access to common words and phrases
7.
Data categories and common use cases
• Relational: referential integrity, ACID transactions, schema-on-write. Use cases: lift and shift, ERP, CRM, finance.
• Key value: low-latency key lookups with high throughput and fast ingestion of data. Use cases: real-time bidding, shopping cart, social.
• Document: indexing and storing documents with support for query on any attribute. Use cases: content management, personalization, mobile.
• In-memory: microsecond latency, key-based queries, and specialized data structures. Use cases: leaderboards, real-time analytics, caching.
• Graph: creating and navigating data relations easily and quickly. Use cases: fraud detection, social networks, recommendation engines.
• Search: indexing and searching semistructured logs and data. Use cases: product catalog, help and FAQs, full text.
• Time series: collect, store, and process data sequenced by time. Use cases: IoT applications, event tracking.
• Ledger: complete, immutable, and verifiable history of all changes to application data. Use cases: systems of record, supply chain, healthcare, registrations, financial.
8.
AWS: Purpose-built databases
• Relational: Amazon RDS and Amazon Aurora (commercial and community engines)
• Key value: Amazon DynamoDB
• Document: Amazon DocumentDB
• In-memory: Amazon ElastiCache (Redis and Memcached)
• Graph: Amazon Neptune
• Search: Amazon Elasticsearch Service
• Time series: Amazon Timestream
• Ledger: Amazon Quantum Ledger Database
9.
Relational data
10.
Traditional SQL
• TCP-based wire protocol
• Well known, widely used
• Common drivers (e.g., JDBC)
• Frequently used with ORMs
• Scale UP individual instances
• Scale OUT with read replicas
• Sharding at the application level
• Many flavors, but a very similar language

INSERT INTO users (id, first_name, last_name)
VALUES (1, 'Cobus', 'Bernard');

SELECT col1, col2, col3
FROM table1
WHERE col4 = 1 AND col5 = 2
GROUP BY col1
HAVING count(*) > 1
ORDER BY col2;
11.
Relational model
Data model
• Data is stored in rows and tables
• Data is normalized
• Strict schema
• Relationships established via keys, enforced by the system
• Data accuracy and consistency
• Complex queries to extract and reshape data on demand
Patient
* Patient ID
First Name
Last Name
Gender
DOB
* Doctor ID
Visit
* Visit ID
* Patient ID
* Hospital ID
Date
* Treatment ID
MedicalTreatment
* Treatment ID
Procedure
How Performed
Adverse Outcome
Contraindication
Doctor
* Doctor ID
First Name
Last Name
Medical Specialty
* Hospital Affiliation
Hospital
* Hospital ID
Name
Address
Rating
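The "relationships enforced by the system" point can be demonstrated with a small sketch using SQLite (an illustration, not from the slides): a cut-down Doctor/Patient pair of tables where the database itself rejects a row that references a doctor that does not exist.

```python
import sqlite3

# In-memory database; foreign key enforcement is off by default in SQLite
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE doctor (doctor_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    name TEXT,
    doctor_id INTEGER REFERENCES doctor(doctor_id))""")

conn.execute("INSERT INTO doctor VALUES (1, 'Dr. Smith')")
conn.execute("INSERT INTO patient VALUES (10, 'Sue', 1)")   # valid reference

# The system enforces the relationship: an unknown doctor_id is rejected
try:
    conn.execute("INSERT INTO patient VALUES (11, 'Bob', 999)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

This is referential integrity in miniature: the schema, not the application, guarantees that every patient row points at a real doctor.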
12.
Relational model
Query model: SQL
SELECT
  d.first_name, d.last_name, count(*)
FROM
  visit AS v,
  hospital AS h,
  doctor AS d
WHERE
  v.hospital_id = h.hospital_id
  AND h.hospital_id = d.hospital
  AND v.t_date > date_trunc('week',
    CURRENT_TIMESTAMP - interval '1 week')
GROUP BY
  d.first_name, d.last_name;
13.
Amazon Relational Database Service (RDS)
Managed relational database service with a choice of six popular database engines
Easy to administer Available and durable Highly scalable Fast and secure
No need for infrastructure
provisioning, installing, and
maintaining DB software
Automatic Multi-AZ data replication;
automated backup, snapshots,
failover
Scale database compute
and storage with a few
clicks with no app
downtime
SSD storage and guaranteed
provisioned I/O; data
encryption at rest and in transit
14.
Amazon Aurora ascendant: How we designed a cloud-native
relational database
allthingsdistributed.com
16.
Amazon Aurora
MySQL and PostgreSQL-compatible relational database built for the cloud
Performance and availability of commercial-grade databases at 1/10th the cost
Performance
and scalability
Availability
and durability
Highly secure Fully managed
5x throughput of standard MySQL
and 3x of standard PostgreSQL;
scale-out up to
15 read replicas
Fault-tolerant, self-healing storage;
six copies of data
across three Availability Zones;
continuous backup to Amazon S3
Network isolation,
encryption at rest/transit
Managed by RDS:
No hardware provisioning, software
patching, setup, configuration, or
backups
17.
NoSQL vs SQL
18.
SQL vs NoSQL
• SQL: optimized for storage / NoSQL: optimized for compute
• SQL: normalized, relational / NoSQL: denormalized, hierarchical
• SQL: ad hoc queries / NoSQL: instantiated views
• SQL: scales vertically / NoSQL: scales horizontally
• SQL: good for OLAP / NoSQL: built for OLTP at scale
19.
Key-Value DBs
20.
Key-value data
• Simple key-value pairs
• Partitioned by keys
• Resilient to failure
• High-throughput, low-latency reads and writes
• Consistent performance at scale
[Diagram: a table split across multiple partitions; highly partitionable data]
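"Partitioned by keys" can be sketched in a few lines (an illustration of the general idea, not DynamoDB's actual partitioning scheme): hash the key, take it modulo the partition count, and every read and write for that key deterministically routes to the same partition.

```python
import hashlib

NUM_PARTITIONS = 4
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def partition_for(key: str) -> int:
    """Deterministically map a key to a partition by hashing it."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def put(key, value):
    partitions[partition_for(key)][key] = value

def get(key):
    return partitions[partition_for(key)].get(key)

put("Hammer57", {"Level": 87, "Tier": "Elite"})
put("FluffyDuffy", {"Level": 5, "Tier": "Trainee"})
print(get("Hammer57"))
```

Because the hash spreads keys evenly, adding partitions spreads load horizontally, which is why this model scales out so well.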
21.
Gamers table (primary key: GamerTag + Type; remaining attributes vary by item):
• Hammer57 / Rank: Level 87, Points 4050, Tier Elite
• Hammer57 / Status: Health 90, Progress 30
• Hammer57 / Weapon: Class Taser, Damage 87%, Range 50
• FluffyDuffy / Rank: Level 5, Points 1072, Tier Trainee
• FluffyDuffy / Status: Health 37, Progress 8

// Status of Hammer57
GET {
  TableName: "Gamers",
  Key: {
    "GamerTag": "Hammer57",
    "Type": "Status" } }

// Return all Hammer57 items (key condition expressions are used with Query, not Scan)
QUERY {
  TableName: "Gamers",
  KeyConditionExpression: "GamerTag = :a",
  ExpressionAttributeValues: {
    ":a": "Hammer57" } }
Key-value data
22.
Amazon DynamoDB
Fast and flexible key value database service for any scale
Comprehensive
security
Encrypts all data by default
and fully integrates with AWS
Identity and Access
Management for robust
security
Performance at scale
Consistent, single-digit millisecond
response times at any scale; build
applications with virtually unlimited
throughput
Global database for global
users and apps
Build global applications with fast
access to local data by easily
replicating tables across multiple
AWS Regions
Serverless
No server provisioning, software
patching, or upgrades; scales up
or down automatically;
continuously backs up your data
23.
Document DBs
24.
Document databases
• Data is stored in JSON-like documents
• Documents map naturally to how humans model data
• Flexible schema and indexing
• Expressive query language built for documents (ad hoc queries and aggregations)
JSON documents are first-class objects of the database
{
id: 1,
name: "sue",
age: 26,
email: "sue@example.com",
promotions: ["new user", "5%", "dog lover"],
memberDate: 2018-2-22,
shoppingCart: [
{product:"abc", quantity:2, cost:19.99},
{product:"edf", quantity:3, cost: 2.99}
]
}
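A toy illustration of "query on any attribute" (plain Python, not a real document database API): because documents carry their own structure, you can filter on a top-level field or a nested field without a predeclared schema, and documents in the same collection need not have the same fields.

```python
# A tiny collection of JSON-like documents with no fixed schema
users = [
    {"id": 1, "name": "sue", "age": 26,
     "shoppingCart": [{"product": "abc", "quantity": 2}]},
    {"id": 2, "name": "bob", "age": 31},  # no shoppingCart at all
]

def find(collection, predicate):
    """Ad hoc query: return documents matching an arbitrary predicate."""
    return [doc for doc in collection if predicate(doc)]

# Query on a top-level attribute
adults = find(users, lambda d: d.get("age", 0) > 30)

# Query on a nested attribute that only some documents have
with_abc = find(users, lambda d: any(
    item["product"] == "abc" for item in d.get("shoppingCart", [])))

print([d["name"] for d in adults], [d["name"] for d in with_abc])
```

Real document databases do the same thing declaratively and back it with indexes, so these lookups stay fast as the collection grows.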
25.
Evolution of document databases
• JSON became the de facto data interchange format
• Friction arose when converting JSON to the relational model
• Object-relational mappers (ORMs) were created to help with this friction
• Document databases solved the problem
[Diagram: JSON flows from the client to the app, which must convert it to the relational model for the database]
27.
Use cases for document data
• Mobile
• Retail and marketing
• User profiles
• Catalog
• Content management
• Personalization
28.
Amazon DocumentDB
Fast, scalable, highly available, fully managed MongoDB-compatible database service
Fully Managed
Managed by AWS:
No hardware provisioning,
software patching, setup,
configuration, or backups
Fast
Millions of requests per second,
millisecond latency
MongoDB-compatible
Compatible with MongoDB
Community Edition 3.6. Use the
same drivers and tools
Reliable
Six replicas of your data across
three AZs with full backup and
restore
29.
In-Memory stores
30.
In-memory
• No persistence; data is held in memory
• Microsecond performance
• Simple commands for manipulating in-memory data structures
• Strings, hashes, lists, sets, and sorted sets
[Diagram: a traditional database serves queries through a query processor, buffer pool (memory), storage engine, and disk; an in-memory store serves get/put APIs directly from memory, taking latency from milliseconds to microseconds (roughly 10x faster)]
31.
In-memory ops
set a "hello"          // Set key "a" with a string value and no expiration
OK
get a                  // Get value for key "a"
"hello"
get b                  // Get value for key "b" results in a miss
(nil)
set b "Good-bye" EX 5  // Set key "b" with a string value and a 5-second expiration
OK
get b                  // Get value for key "b"
"Good-bye"
// wait >= 5 seconds
get b                  // Key has expired, nothing is returned
(nil)
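The SET/GET/EX semantics above can be sketched in a few lines of Python (an illustration only, not Redis itself): each key stores a value plus an optional expiry timestamp, and a read past the expiry returns nothing.

```python
import time

class TTLStore:
    """Minimal in-memory key-value store with per-key expiration,
    mimicking Redis SET ... EX semantics."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ex=None):
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return "OK"

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None  # miss -> (nil)
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiration on read
            return None
        return value

store = TTLStore()
store.set("a", "hello")           # no expiration
store.set("b", "Good-bye", ex=5)  # expires after 5 seconds
print(store.get("a"), store.get("b"))
```

Real caches add eviction policies (LRU and friends) on top of this, but the core idea, everything in memory with cheap expiry, is what buys the microsecond reads.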
32.
Amazon ElastiCache
Redis and Memcached compatible, in-memory data store and cache
Secure and reliable
Network isolation, encryption at rest/in transit, HIPAA, PCI, FedRAMP, Multi-AZ, and automatic failover
Redis & Memcached
compatible
Fully compatible with open source
Redis and Memcached
Easily scalable
Scale writes and reads with sharding
and replicas
Extreme performance
In-memory data store and cache
for microsecond response times
33.
Searching data
34.
Search: Full text search
35.
Search
_search?q=house
{
"hits": {
"total": 85,
"max_score": 6.6137657,
"hits": [{
"_index": "movies",
"_type": "movie",
"_id": "tt0077975",
"_score": 6.6137657,
"_source": {
"directors": [ "John Landis" ],
"release_date": "2020-08-11T00:00:00Z",
"rating": 7.5,
"genres": [ "Comedy", "Romance" ],
"image_url": "http://ia.jpg",
"plot": "In a webinar in 2020, the beard gets cut!",
"title": "Animal House",
"rank": 527,
"running_time_secs": 6540,
"actors": [ "John Belushi","Karen Allen","Tom Hulce" ],
"year": 1978, "id": "tt0077975"
}
}]
}
},
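Under the hood, full-text engines like Elasticsearch build an inverted index: a map from each term to the documents that contain it, so a query term resolves to matching documents without scanning everything. A minimal sketch (illustrative only; the movie entries besides Animal House are made up for the example):

```python
from collections import defaultdict

movies = [
    {"id": "tt0077975", "title": "Animal House"},
    {"id": "doc-002", "title": "The Long Road"},   # hypothetical entry
    {"id": "doc-003", "title": "House of Cards"},  # hypothetical entry
]

# Build the inverted index: lowercase term -> set of document ids
index = defaultdict(set)
for movie in movies:
    for term in movie["title"].lower().split():
        index[term].add(movie["id"])

def search(q):
    """Return ids of documents whose title contains the query term."""
    return sorted(index.get(q.lower(), set()))

print(search("house"))
```

Production engines extend this with tokenization, stemming, and relevance scoring (the `_score` field in the response above), but the inverted index is the core data structure.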
36.
Amazon Elasticsearch Service
Fully managed, reliable, and scalable Elasticsearch service
Easy to use Scalable Highly available Secure
Deploy a production-ready
Elasticsearch
cluster in minutes
Resize your cluster
with a few clicks
or a single API call
Replicate across
AZs, with monitoring and
automated self-healing
Deploy into a VPC
and restrict access
using security groups
and IAM policies
37.
Graph data
38.
Graph
Use cases
• Social networking
• Recommendation engines
• Fraud detection
• Knowledge graphs
• Life sciences
• Network / IT operations
[Diagram: customer and product/sport vertices connected by FOLLOWS, KNOWS, and PURCHASED edges]
39.
Graph use case
// Product recommendation to a user
gremlin> g.V().has('name','cobus').as('customer').
  out('follows').in('follows').out('purchased').
  where(neq('customer')).dedup().values('name')

// Identify a friend in common and make a recommendation
gremlin> g.V().has('name','mary').as('start').
  both('knows').both('knows').
  where(neq('start')).
  dedup().by('name').properties('name')
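The "friend in common" traversal can be sketched over a plain adjacency map (illustrative Python, not Gremlin; the names mirror the speaker's example, where Mary knows Amit and Amit knows Kevin): walk two hops along `knows` edges, exclude the starting vertex, and deduplicate.

```python
# Undirected 'knows' edges as an adjacency map
knows = {
    "mary": {"amit"},
    "amit": {"mary", "kevin"},
    "kevin": {"amit"},
}

def friends_of_friends(start):
    """Two hops along 'knows' edges, excluding the start vertex."""
    one_hop = knows.get(start, set())
    two_hop = set()
    for person in one_hop:
        two_hop |= knows.get(person, set())
    return sorted(two_hop - {start})

print(friends_of_friends("mary"))  # Kevin is the suggested connection
```

A graph database runs exactly this kind of pointer-chasing natively; expressing the same two-hop traversal in SQL requires self-joins that are hard to optimize and tune.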
40.
Amazon Neptune
Fully managed graph database
Easy
Build powerful queries easily
with Gremlin and SPARQL
Fast
Query billions of relationships with
millisecond latency
Open
Supports Apache TinkerPop & W3C
RDF graph models
Reliable
Six replicas of your data across
three AZs with full backup and
restore
41.
Time series
42.
Time series data
What is time series data? A sequence of data points recorded over a time interval.
What is special about a time-series database? Time is the single primary axis of the data model.
43.
Time series use cases
• Application events
• IoT sensor readings
• DevOps data
[Chart: humidity (% water vapor) readings over time: 91.0, 94.0, 86.0, 93.0]
44.
Building with time-series data is challenging
Relational databases: inefficient time-series data processing; unnatural for time-series data; rigid schema inflexible for fast-moving time-series data.
Existing time-series databases: difficult to maintain high availability; difficult to scale; limited data lifecycle management.
45.
Amazon Timestream
Fast, scalable, fully managed time-series database
1,000x faster and 1/10th the cost
of relational databases
Collect data at the rate of
millions of inserts per second
(10M/second)
Trillions of
daily events
Adaptive query processing
engine maintains steady,
predictable performance
Time-series analytics
Built-in functions for
interpolation, smoothing, and
approximation
Serverless
Automated setup, configuration,
server provisioning, software
patching
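The "built-in functions for interpolation" bullet refers to filling gaps in a series. A minimal linear-interpolation sketch in plain Python (illustrative only, not the Timestream API; the humidity numbers are taken from the chart two slides back):

```python
def interpolate(points, t):
    """Linearly interpolate a value at time t from sorted (time, value) samples.

    t must lie within the sampled range.
    """
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return v0
            # Linear blend between the two surrounding samples
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside sampled range")

# Humidity readings with a gap between t=10 and t=30
humidity = [(0, 91.0), (10, 94.0), (30, 86.0), (40, 93.0)]
print(interpolate(humidity, 20))
```

Doing this per-series in SQL takes window functions and self-joins (the speaker notes mention hundreds of lines of code); a time-series database exposes it as a single built-in function.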
46.
Ledgers & journals
47.
Common customer use cases
Ledgers with centralized control
Healthcare
Verify and track hospital
equipment inventory
Manufacturers
Track distribution of a
recalled product
HR & Payroll
Track changes to an
individual's profile
Government
Track vehicle title
history
48.
Challenges with building ledgers
RDBMS with audit tables: custom audit functionality built with triggers or stored procedures is hard to build and difficult to maintain, and it is impossible to verify, since there is no way to verify changes made to data by system administrators.
Blockchain: adds unnecessary complexity; hard to use and slow.
49.
Ledger database concepts
A ledger (L) comprises a journal (J) plus the current state and history of the data (C | H). The journal determines the current state and history.
50. ยฉ 2020, Amazon Web Services, Inc. or its Affiliates.
Amazon Quantum Ledger Database (QLDB)
Fully managed ledger database
Track and verify the history of all changes made to your application's data
Immutable
Maintains a sequenced record of all changes to your data, which cannot be deleted or modified; you can query and analyze the full history
Cryptographically verifiable
Uses cryptography to generate a secure output file of your data's history
Easy to use
Lets you use familiar database capabilities like SQL APIs for querying the data
Highly scalable
Executes 2-3x as many transactions as ledgers in common blockchain frameworks
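The "immutable and cryptographically verifiable" idea can be sketched as a hash chain (a simplified illustration, not QLDB's actual journal format): each entry records the hash of the previous entry, so altering any earlier entry breaks verification of everything after it.

```python
import hashlib, json

def entry_hash(entry):
    """Stable hash of a journal entry (canonical JSON encoding)."""
    return hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append(journal, change):
    """Append a change, chaining it to the previous entry's hash."""
    prev = entry_hash(journal[-1]) if journal else "0" * 64
    journal.append({"change": change, "prev_hash": prev})

def verify(journal):
    """Recompute the chain; False if any earlier entry was altered."""
    prev = "0" * 64
    for entry in journal:
        if entry["prev_hash"] != prev:
            return False
        prev = entry_hash(entry)
    return True

# Hypothetical vehicle-title history (cf. the government use case above)
journal = []
append(journal, {"vehicle": "car-1", "owner": "Sue"})
append(journal, {"vehicle": "car-1", "owner": "Bob"})
print(verify(journal))   # chain is intact

journal[0]["change"]["owner"] = "Mallory"  # tamper with history
print(verify(journal))   # tampering is detected
```

This is why a ledger database can prove history to an auditor: a system administrator can overwrite a row, but cannot do so without invalidating the chain.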
51.
Purpose-built
52.
AWS: Purpose-built databases
• Relational: Amazon RDS and Amazon Aurora (commercial and community engines)
• Key value: Amazon DynamoDB
• Document: Amazon DocumentDB
• In-memory: Amazon ElastiCache (Redis and Memcached)
• Graph: Amazon Neptune
• Search: Amazon Elasticsearch Service
• Time series: Amazon Timestream
• Ledger: Amazon Quantum Ledger Database
53.
twitch.tv/aws – Mon/Fri @ 11am SAST Bean Streaming
twitch.tv/aws – Thu @ 12:00 SAST AWS Africa Office Hours
youtube.com/c/CobusCloud
youtube.com/c/Ruptwelve
bit.ly/notC_notD (Watch the recorded weekly sessions)
54. Thank you!
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cobus Bernard
Sr Developer Advocate
Amazon Web Services
@cobusbernard
cobusbernard
cobusbernard
CobusCloud
Editor's Notes
Let's take a look back at how database technology has evolved over the years.
In the 1970s and 1980s, we had relational databases, and then in the 1990s we added open source options like MySQL and PostgreSQL.
Then, from around the mid 2000s to the current day, we have seen significant growth in specialized databases that differ from the relational model. I don't think it is a coincidence that these new databases emerged at the same time cloud was taking off, as customers were beginning to build internet-scale apps that demanded functionality, performance, and scale for many different use cases in the same application.
If you look across the data categories, there are a number of different ways to model data. Each of those models has a distinct purpose, and each lends itself well to a particular set of use cases.
If you look at our offerings and how they align to these categories, our database strategy is fairly simple: we want to ensure you as developers have the very best purpose-built databases in each of these categories so that you never have to sacrifice scale, performance, or functionality.
Today we're going to talk about Amazon DocumentDB - a fully managed MongoDB-compatible database service designed from the ground up to be fast, scalable, and highly available.
RDS makes it easy to set up, operate, and scale a relational database in the cloud.
It provides cost-efficient and resizable capacity
Automating time-consuming administration tasks
- hardware provisioning, database setup, patching and backups.
Frees your time to work on apps.
Available in several DB instance types (optimized for memory, performance, or I/O)
Familiar engines (Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server)
Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud,
Aurora 5x faster than standard MySQL and 3x standard PostgreSQL
Provides security, availability, and reliability of commercial databases at 1/10th the cost.
automates time-consuming administration tasks like hardware provisioning, database setup, patching, and backups.
Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64TB per database instance. It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across three Availability Zones (AZs).
The first consideration that needs to be made when selecting a database is the characteristics of the data you are looking to leverage. If the data has a simple tabular structure, like an accounting spreadsheet, then the relational model could be adequate. Data such as geo-spatial, engineering parts, or molecular modeling, on the other hand, tends to be very complex.
The purpose of DynamoDB is to provide consistent single-digit millisecond latency for any scale of workloads.
fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale.
fully managed cloud database and supports both document and key-value store models.
flexible data model, reliable performance, and automatic scaling of throughput capacity, makes it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.
Backup Notes
1/ Performance at scale - Consistent, single-digit millisecond response times at any scale. Build applications with virtually unlimited throughput and storage, backed by a service level agreement for reliability.
2/Serverless - No hardware provisioning, software patching, or upgrades. Scales up or down automatically to accommodate your performance needs. Optimize costs by paying for only the resources you use. Protect your data with on-demand and continuous backups with no downtime.
3/Comprehensive Security- Encrypts all data by default and fully integrates with AWS Identity and Access Management for robust security. Get oversight of your tables by using integrated monitoring on audit logs with AWS CloudTrail, and network isolation with Amazon Virtual Private Cloud.
4/Modern applications - A database for serverless applications that includes AWS Lambda integration. Supports ACID transactions for business-critical applications. Build global applications with fast access to local data by easily replicating tables across multiple AWS Regions.
----
https://www.allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html
As I have talked about before, one of the reasons why we built Amazon DynamoDB was that Amazon was pushing the limits of what was a leading commercial database at the time, and we were unable to sustain the availability, scalability, and performance needs that our growing Amazon.com business demanded. We found that about 70 percent of our operations were key-value lookups, where only a primary key was used and a single row would be returned. With no need for referential integrity and transactions, we realized these access patterns could be better served by a different type of database.
This doesn't mean relational databases do not provide utility in present-day development and are not available, scalable, or high performance. The opposite is true. In fact, this has been proven by our customers, as Amazon Aurora remains the fastest growing service in AWS history. What we experienced at Amazon.com was using a database beyond its intended purpose. That learning is at the heart of this blog post: databases are built for a purpose, and matching the use case with the database will help you write high-performance, scalable, and more functional applications faster.
Data is stored in JSON-like documents and JSON documents are first-class objects within the database โ documents are not a data type or a value, they are the key design point of the database
Document databases have flexible schema and make the representation of hierarchical and semi-structured data easy โ they also enable powerful indexing to make the querying to such documents fast
Documents map naturally to object-oriented programming, which makes the flow of data with your app to persistent layer easier
Expressive query languages built for documents that enable ad hoc queries and aggregations across documents
Together Document databases help developers build applications faster and iterate quickly
If you look at the evolution of document databases, what was the need that document databases filled in this new world?
At some point, JSON became the de facto standard for data interchange and data modeling within applications
Using JSON in the application and then trying to map JSON to relational introduced friction and complication
Object Relational Mappers (ORMs) were created to help with this friction, but there were complications with performance and functionality
A crop of document databases popped up to really solve the problem.
Let's take user profiles for example. Let's say that Susan plays a new game called ExploidingSnails; we can easily add that information to her profile. Notice here that we didn't have to design a complicated schema or create any new tables, we simply added a new set of fields in our document.
Similarly, we can add an array for the promotions that Susan has achieved.
The document model really enables developers to evolve their applications quickly over time and build applications faster.
Content management โ if you think about a lot of the content that we read today from news articles, to blogs, to recipes, to patient records, all of that data lends itself really well to a document model
Catalogs โ being able to record output of machine learning experiments, inventory descriptions, output of pharmaceutical trials, mapping of content and meta data in S3 โ we see a broad set of customers that use documents for these use cases
MongoDB compatibility
fast, scalable, highly available, and fully managed document database
Hard to build HA, scalable to multi-TB, 100k's of reads + writes / sec
- MongoDB clusters hard at scale
DocDB designed for perf, scale, and avail
Implements Apache 2.0 open source MongoDB 3.6 API
- So use existing drivers + tools
uses a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64 TB per database cluster.
----
Other notes I might include depending on time when I do a run through.
DocumentDB uses a MongoDB compatible APIs, so can be suitable for those customers that don't want to use proprietary AWS APIs of DynamoDB or that already run unmanaged MongoDB and want to easily switch to a managed DB.
There are many differences between DynamoDB and DocumentDB. A few main ones which affect which of the DBs will suit a particular use case:
- DynamoDB doesn't have any aggregation abilities (sum, average, group by, ...); MongoDB (and so DocumentDB) has such support.
- DocumentDB doesn't use sharding, so a DB can only grow up to 64 TB and a single table can grow to 32 TB. DynamoDB tables can grow to much more than that. Also, the absence of sharding in DocumentDB means that the max amount of parallel queries is limited, while with DynamoDB it is only limited by the amount of capacity units you provision to your tables.
- MongoDB (and so DocumentDB) is much more flexible regarding the abilities to store and query different types of data, specifically complex nested structures.
- Under significant load, DynamoDB should be able to provide lower and more consistent response times, which if needed can be decreased even more with DAX.
- Items in DynamoDB are limited to 400 KB; with DocumentDB the limit is 16 MB.
- DocumentDB doesn't currently provide cross-region replication abilities, while DynamoDB provides it with global tables.
What does an in-memory DB give you: query by key with microsecond latency, performance that can't be achieved with disk-based stores.
Improve perf using fast in memory data stores
Key value in memory DB (persisted only via snapshot in redis)
Redis-compatible in-memory service
availability, reliability and performance suitable for the most demanding applications.
15 node cluster available, up to 6.1 TiB of in-memory data.
Memcached - a widely adopted memory object caching system.
Fully mgd: hardware provisioning, software patching, setup, config, monitoring, failure recovery, and backups (redis)
Easily scalable: scale out, in and up to meet demands. Write and memory scaling with sharding. Replicas for read scaling.
Use cases: Leaderboards, real-time analytics, caching
Financial services, ecommerce, web, and mobile applications have use cases such as leaderboards, session stores, and real-time analytics that require microsecond response times and can have large spikes in traffic coming at any time. We built Amazon ElastiCache, offering Memcached and Redis, to serve low latency, high throughput workloads, such as McDonald's, that cannot be served with disk-based data stores.
----
Don't think I have time to look at the below, but will nail the in-memory use case instead.
Memcached:
You need the simplest model possible.
You need to run large nodes with multiple cores or threads.
You need the ability to scale out and in, adding and removing nodes as demand on your system increases and decreases.
You need to cache objects, such as a database.
Redis:
You need complex data types, such as strings, hashes, lists, sets, sorted sets, and bitmaps.
You need to sort or rank in-memory datasets.
You need persistence of your key store.
You need to replicate your data from the primary to one or more read replicas for read intensive applications.
You need automatic failover if your primary node fails.
You need publish and subscribe (pub/sub) capabilities to inform clients about events on the server.
You need backup and restore capabilities.
You need to support multiple databases.
My favorite scenario here is going to a bank/credit card website to notify them you will be out of the country. Given all the menu options, it's nearly impossible to find by visual inspection, so I used to search the help field.
You can tell when you type "travel international" how well the experience has been architected by the responses that come back.
Working backwards from our customers and these challenges, we built Amazon DocumentDB to be a fast, scalable, fully-managed, and MongoDB-compatible AWS database service.
What does Graph give you: Quickly and easily create and navigate relationships between data
A graph database's purpose is to make it easy to build and run applications that work with highly connected datasets.
Typical use cases for a graph database include social networking, recommendation engines, fraud detection, and knowledge graphs.
Relationships are first-class objects - They have attributes that can be queried
Simple diagram we have Verticies, customers, categories (Product and sport)
Edges are the connections between these nodes and they can have attributes
- This is what you are querying
e.g. on next page
Builds out:
1) Product recommendation
- Bill and Kevin follow sport and purchased product
2+3) Sara signs up to the app and follows sport
4) goes to the shop to make a purchase
5) we make the recommendation
6) this is the gremlin query for that.
7) FRIEND IN COMMON
8) Mary knows Amit
9+10) we suggest she might know Kevin
11) Gremlin for this query.
Better than complex SQL which is hard to optimize and tune
Point again about how complex this would be in a different service like relational DB, but simple in graph.
Neptune fast, reliable, fully-managed graph database service
Purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with milliseconds latency.ย
Supports graph models Property Graph and W3C's RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL,
Reliable: 99.99% availability - 6 copies across 3 AZs
Backups in S3
< 30 seconds failover
Customers can choose the best model for their application needs.
Devs like Property Graphs because the model is somewhat familiar from relational databases,
and TinkerPop Gremlin provides a way to quickly traverse property graphs.
Devs like RDF as it provides flexibility for modeling complex information, and there are a lot of existing public-domain data sets available in RDF, including Wikidata.
----
Other notes not for preso at the moment.
Amazon Neptune is a fully-managed graph database service. Neptune supports both the Property Graph model and the Resource Description Framework (RDF), giving you the choice of two graph APIs: TinkerPop and RDF/SPARQL. Current Neptune users are building knowledge graphs, making in-game offer recommendations, and detecting fraud. For example, Thomson Reuters is helping their customers navigate a complex web of global tax policies and regulations by using Neptune.
Challenges in trying to build these workloads
Relational: you don't need a rigid schema, might want to change attributes on the fly or start collecting a new attribute from a sensor now, and you struggle to interpolate missing data points. This is hard to do in SQL (around 800 lines of code vs. a built-in function in a time-series DB).
Existing time-series DBs:
Scale constraints
Start to purge data
Or could turn off ingest
Very large data volumes:
Want a policy to move data from in-memory to warm to cold storage, which is cheap.
Hard to manage that lifecycle; it should be just a policy, not a full-time job.
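The lifecycle policy idea can be illustrated with a tiny sketch. The tier names and horizons below are hypothetical, not a Timestream API; the point is that tiering is something you declare as a rule, instead of running a migration job yourself.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: keep data in memory for 1 hour, in warm
# storage for 30 days, then move it to cheap cold storage. A managed
# time-series service applies rules like this automatically.
POLICY = [
    ("memory", timedelta(hours=1)),
    ("warm",   timedelta(days=30)),
]

def tier_for(sample_time, now):
    """Return which storage tier a sample belongs to, based on its age."""
    age = now - sample_time
    for tier, horizon in POLICY:
        if age <= horizon:
            return tier
    return "cold"

now = datetime(2020, 6, 1, tzinfo=timezone.utc)
print(tier_for(now - timedelta(minutes=5), now))  # memory
print(tier_for(now - timedelta(days=3), now))     # warm
print(tier_for(now - timedelta(days=90), now))    # cold
```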
Amazon Timestream is a purpose-built, time-series database designed specifically for collecting, storing, and analyzing time series data.
At the very core of Amazon Timestream, time isn't just an attribute, but rather the single primary axis of the data model. This allows for simplification and specialization across the database.
1/ 1,000X faster, and at 1/10th the cost of relational databases
2/ Trillions of daily events
3/ Analytics optimized for time series data
4/ Serverless
Amazon Timestream is available for preview today.
---
Need to sprinkle in notes from below:
1/ 1,000X faster, and at 1/10th the cost of relational databases - Amazon Timestream can collect fast-moving time-series data from multiple sources at the rate of millions of inserts per second (10M/second). Amazon Timestream organizes data by time intervals, reducing the amount of data that needs to be scanned to answer a query. Amazon Timestream also executes inserts and queries in separate processing tiers, which eliminates resource contention and improves performance. [Note: This claim is based on internal benchmarks and applies to both queries and inserts. The service team is comfortable making the claim.]
2/ Trillions of daily events - With its speed and purpose-built architecture, Amazon Timestream is capable of processing trillions of events daily. This opens up the door to more IoT devices, more sensor reads, and ultimately a larger data set to make smarter decisions on time series and machine data. Amazon Timestream's adaptive query processing engine and data retention policies adjust the query performance and storage capacity to maintain steady, predictable performance at the lowest possible cost as your data grows over time.
3/ Analytics optimized for time series data - Analyzing time-series data with Amazon Timestream is easy, with built-in functions for interpolation, smoothing, and approximation that can be used to identify trends, patterns, and anomalies. For example, a smart home device manufacturer can use Amazon Timestream to collect motion or temperature data from the device sensors in a home, interpolate that data to identify the time ranges without any motion in the home, and alert consumers to take actions such as turning off the lights or turning down the heat to save energy during times when no one is in the house.
4/ Serverless - With Amazon Timestream, there are no servers to manage. As your application needs change, Amazon Timestream automatically scales up or down to adjust capacity and performance. Amazon Timestream takes care of time-consuming tasks such as server provisioning, software patching, setup, and configuration so you can focus on building your applications. In addition, you can set policies to automate the retention and tiering of how data is stored, which can significantly reduce your manual effort, storage requirements, and cost.
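As a rough illustration of what a built-in interpolation function saves you (the function and readings below are invented for the example, not a Timestream API), linearly interpolating a missing sensor value is a one-liner in a time-series engine but substantial work in plain SQL:

```python
def interpolate(points, t):
    """Linear interpolation of a missing reading at time t.
    `points` is a sorted list of (timestamp, value) pairs. Time-series
    databases ship this as a built-in; in plain SQL it takes far more work."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the recorded range")

readings = [(0, 20.0), (60, 22.0), (120, 21.0)]  # (seconds, degrees C)
print(interpolate(readings, 30))   # 21.0
print(interpolate(readings, 90))   # 21.5
```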
Track every application change, cryptographically verifiable.
Have a central trust authority.
E.g. use cases:
- People don't ask for a ledger DB; instead they want:
Data that is immutable
Cryptographically verifiable
Manufacturing: trust the lineage of the data, that it hasn't changed
Health care: want to verify and track hospital equipment
We built this several years ago for our own services; only recently have customers been asking for it.
Challenges in building a ledger in RDBMS + Blockchain
RDBMS
Hard to build an audit table: stored procs, triggers; and what if something changes in the audit table itself, how do we record that...
Even with an audit table, it's hard to verify that an admin didn't log in and change it.
Blockchain: some need distributed consensus, but some customers don't; they need a cryptographically verifiable way to track and verify changes, and don't want to set up a distributed blockchain.
Create ledger
Serverless
Key component is journal
When you create a transaction it is written to the journal and stored in a block; once written, it can't be changed.
If you made a mistake you need to write a new record; you can't undo, e.g. write a new keeper.
Maintains current state and history.
Think of a bank: debit, credit, debit, credit, etc. The current state is the current balance.
History, e.g. the last 30 days of account activity.
There is a table, created by default, to query this history.
A ledger comprises current state (C), history (H), and the journal (J).
J determines C and H, so you can destroy C and H and rebuild them from J.
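The journal idea can be sketched as a hash-chained, append-only log. This is an illustration of the concept only, not QLDB's actual implementation; the `Journal` class and its records are invented for the example. It shows why the journal alone is enough to rebuild current state, and why tampering with history is detectable.

```python
import hashlib
import json

# Minimal sketch of a hash-chained, append-only journal (the "J" above),
# from which the current state (C) can be rebuilt by replay.
class Journal:
    def __init__(self):
        self.blocks = []  # append-only; never modified in place

    def append(self, record):
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.blocks.append({"prev": prev, "record": record, "hash": digest})

    def verify(self):
        """Recompute the whole chain; any tampering breaks a link."""
        prev = "0" * 64
        for b in self.blocks:
            payload = json.dumps(b["record"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if b["prev"] != prev or b["hash"] != expected:
                return False
            prev = b["hash"]
        return True

    def current_state(self):
        """Replay the journal: the last write per key wins (the 'C' view)."""
        state = {}
        for b in self.blocks:
            state[b["record"]["key"]] = b["record"]["value"]
        return state

j = Journal()
j.append({"key": "balance", "value": 100})  # credit
j.append({"key": "balance", "value": 70})   # debit: a new record, no undo
print(j.current_state())                    # {'balance': 70}
print(j.verify())                           # True
j.blocks[0]["record"]["value"] = 999        # tamper with history...
print(j.verify())                           # False: the chain is broken
```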
transparent, immutable, and cryptographically verifiable ledger database
For apps that act as a "system of record".
Eliminates the need to build complex audit functionality within your relational database, or rely on the ledger capabilities within blockchain frameworks.
Built with tried + tested tech used extensively in AWS and Amazon.com
Immutable
Cryptographically Verifiable
Transparent
Fast
Highly scalable
Easy to use
---
Need to sprinkle notes from the below detail into above notes:
3/ Immutable - QLDB uses an append-only immutable journal to maintain a sequenced and permanent record of each data change, which cannot be deleted or modified.
4/ Cryptographically Verifiable - All changes are cryptographically chained and verifiable. QLDB provides APIs that allow you to verify the integrity of data to ensure that there have been no unintended modifications.
5/ Transparent - Customers have full visibility into their entire data lineage. Customers can look back and easily track and analyze each change for the data's entire history.
6/ Fast - Execute 2x more transactions. Unlike ledgers in blockchain frameworks, QLDB's transactions don't require distributed consensus to execute. This allows QLDB to easily scale up and execute twice as many transactions as ledgers in common blockchain frameworks.
7/ Highly scalable - With QLDB, there are no worries about provisioning capacity or configuring read and write limits. Simply create a ledger, define tables, and QLDB automatically scales to support the demands of your application.
8/ Easy to use - QLDB provides familiar database capabilities to get you started quickly. With QLDB you use a familiar SQL query language to interact with the database, and there are no blockchain-specific programming APIs to learn.
If you look at our offerings and how they align to these categories, our database strategy is fairly simple, we want to ensure you as developers have the very best purpose-built databases in each of these categories so that you never have to sacrifice on scale, performance, and functionality.
Today we're going to talk about Amazon DocumentDB - a fully managed, MongoDB-compatible database service designed from the ground up to be fast, scalable, and highly available.