Database Agnostic Workload Management (CIDR 2019)

Database-Agnostic Workload Management
Shrainik Jain, Jiaqi Yan*, Thierry Cruanes*, Bill Howe
1/21/2019

Workload Management and Analytics
● Workload Summarization
● Index Selection
● Query Routing / Resource Allocation
● Query Recommendation
● Pick your favorite next challenge:
○ Query Forensics
○ Multi-Query Optimization
○ Self-Tuning Databases
○ Predicting Cache Performance
○ Modeling User Behavior

Jain et al., CIDR 2019
Q → High priority? → (Q, priority) or (Q, normal) → Fast server

Q → Heavy hitter? → (Q, heavy) or (Q, normal) → Big cluster

Q → Likely error? → (Q, error) or (Q, no error) → Instrumented cluster

Q → Atypical query? → (Q, atypical) or (Q, typical) → Workload summary for periodic index recommendation

Q → Suspicious query? → (Q, suspicious) or (Q, not suspicious) → Audit log

Q → Optimizer → (Q, estimated cost) → Big cluster

Workload Management = Learning and operationalizing a set of query labeling functions
Q → heavy → (Q, heavy)
→ suspicious → (Q, heavy, suspicious)
→ atypical → (Q, heavy, suspicious, atypical)
→ priority → (Q, heavy, suspicious, atypical, priority)
→ RDS
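The idea of chaining labeling functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rules inside `heavy` and `suspicious` are hypothetical hand-written stand-ins for learned classifiers, and the names `Labeler` and `label_query` do not come from the talk.

```python
from typing import Callable, List, Optional, Tuple

# A labeling function looks at the query text and returns a label, or None.
Labeler = Callable[[str], Optional[str]]


def heavy(query: str) -> Optional[str]:
    # Hypothetical rule: treat a query with three or more joins as heavy.
    return "heavy" if query.upper().count(" JOIN ") >= 3 else None


def suspicious(query: str) -> Optional[str]:
    # Hypothetical rule: flag any query that mentions a "credentials" table.
    return "suspicious" if "credentials" in query.lower() else None


def label_query(query: str, labelers: List[Labeler]) -> Tuple[str, ...]:
    """Apply each labeler in order and return (Q, label, label, ...)."""
    labels = []
    for labeler in labelers:
        label = labeler(query)
        if label is not None:
            labels.append(label)
    return (query, *labels)
```

A router could then inspect the accumulated labels to decide where the query runs, for example sending anything labeled `heavy` to a larger cluster.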


Every workload management task => feature engineering
○ Extract query type, count joins, etc. [Chaudhuri et al. 2002]
○ Extract fragments [Khoussainova et al. 2010]
○ Extract operators and SQL functions [Jain et al. 2016]
○ etc.
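To make the manual approach concrete, here is a minimal sketch of a hand-written feature extractor of the kind listed above (query type, join count, function names). It is an assumption-laden illustration, not any of the cited systems: the regular expressions are crude, ignore dialect differences, and `extract_features` is a hypothetical name.

```python
import re
from typing import Dict, List, Union


def extract_features(sql: str) -> Dict[str, Union[str, int, List[str]]]:
    """Crude hand-engineered features from a SQL string."""
    # Query type: the first keyword (SELECT, INSERT, ...).
    first = re.match(r"\s*(\w+)", sql)
    query_type = first.group(1).upper() if first else "UNKNOWN"

    # Join count: occurrences of the JOIN keyword (misses comma-style joins).
    num_joins = len(re.findall(r"\bJOIN\b", sql, flags=re.IGNORECASE))

    # Function names: any identifier directly followed by "(".
    # This over-matches, e.g. a table name before a column list.
    functions = sorted(
        {name.upper() for name in re.findall(r"\b([A-Za-z_]\w*)\s*\(", sql)}
    )
    return {"query_type": query_type, "num_joins": num_joins, "functions": functions}
```

Each of these rules has to be revisited whenever a dialect changes, which is the maintenance burden the slide points at.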

Manual feature engineering is hopeless
N Tasks: Summarization, Error Prediction, Query Routing, Security audits
M SQL Dialects: PostgreSQL, Snowflake, SQL Server, and so on...
=> N * M feature extractors (more if tenant-specific features are important)
● Many databases, many tasks
● Maybe ~10 database services, each with different dialects of SQL
● The dialects may change frequently, at different rates:
○ Ex: Snowflake SQL parser changes ~10 times / month on average
● 100s of millions of SQL-like queries per day (hour/minute/sec)...
● Workloads are diverse (yet structured) due to multi-tenancy

We want a query representation that can support all these learning tasks
Given a query, find a vector in k-dimensional space that represents it.
SELECT A
FROM tableA, tableB
WHERE tableA.B = tableB.A
AND tableA.C LIKE '%something%'
=> [0.2, 1, 23, 0.01 … … … … …]

Totally novel automatic feature learning:
Predict a token from its context; use the learned weights as a vector to represent the predicted token
● Word2Vec
● Doc2Vec
SELECT D,E,F,G FROM tableA, tableB WHERE tableA.A = tableB.B AND tableA.C = 4
[Figure labels: Q23, predict]
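The "predict a token from its context" setup can be illustrated by building the (context, target) training pairs that a Word2Vec-style model would consume. This sketch stops before any training; the tokenizer is a crude assumption, and `tokenize`, `context_target_pairs`, and the window size of 2 are hypothetical choices, not details from the talk.

```python
import re
from typing import List, Tuple


def tokenize(sql: str) -> List[str]:
    # Assumed tokenizer: identifiers (with dots), integers, or single symbols.
    return re.findall(r"[A-Za-z_][\w.]*|\d+|[^\s\w]", sql)


def context_target_pairs(
    tokens: List[str], window: int = 2
) -> List[Tuple[List[str], str]]:
    """For each token, pair it with up to `window` neighbors on each side."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs
```

A model trained to predict each target from its context learns one weight vector per token; Doc2Vec adds a per-query vector to the context, which then serves as the query's representation.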

Lots of generic representations…
● Treat queries (or plans) as sentences (natural language text)
● Use representation learning methods for text
○ Doc2Vec
○ LSTM autoencoders
○ LSTM encoder-classifiers
○ TreeLSTM encoder-classifiers on query plans
○ CNNs

Sanity check: TPC-H
Do generic NLP representations produce anything meaningful?
[Figure: query representations for a TPC-H workload projected onto two dimensions using t-SNE. Each color is a different TPC-H query template.]
The learned representations are at least minimally coherent

Error Prediction
[Figure: big, real SQL workload. Each point is a query that generated an error. Random sample of 4200 error-generating queries over a 7-day period. Colors are selected error codes: OOM Error, Unknown Timezone in Date, Date Parse Error, Divide by Zero.]

Error Prediction
Clusters are repeated syntactic patterns in the workload; they’re meaningful

DOES THIS ACTUALLY WORK?

Datasets used
● Datasets for training embedders
Workload              Total Queries  Distinct Queries
Snowflake             500000         175958
TPC-H                 4200           2180
● Datasets for training classifiers
Workload              Total Queries  Distinct Queries
Snowflake-MultiError  100000         17311
Snowflake-OOM         4491           2501

Predicting OOM Errors
Method                                             Precision  Recall  f1-score
Contains heavy joins                               0.729      0.115   0.198
Contains window functions                          0.762      0.377   0.504
Contains heavy joins OR window functions           0.724      0.403   0.518
Contains heavy joins AND window functions          0.931      0.162   0.162
Query2Vec-LSTM                                     0.983      0.977   0.980
Query2Vec-Doc2Vec                                  0.919      0.823   0.869
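For readers less familiar with the metrics in this table, the sketch below shows the standard definitions: precision and recall from confusion counts, and the f1-score as their harmonic mean. The function names are illustrative, not from the talk.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Plugging in the Query2Vec-LSTM row, `f1_score(0.983, 0.977)` rounds to the 0.980 reported above.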

Predicting Other Errors
ErrorCode Precision Recall f1-score #queries
-1 (No Error) 0.986 0.992 0.989 7464
604 0.878 0.927 0.902 1106
606 0.929 0.578 0.712 45
608 0.996 0.993 0.995 3119
630 0.894 0.864 0.879 88
2031 0.765 0.667 0.712 39
90030 1 0.998 0.999 1529
100035 1 0.71 0.83 31
100037 1 0.417 0.588 12
100038 0.981 0.968 0.975 1191
100040 0.952 0.833 0.889 48
100046 1 0.923 0.96 13
100051 0.941 0.913 0.927 104
100069 0.857 0.5 0.632 12
100071 0.857 0.5 0.632 12
100078 1 0.974 0.987 77
100094 0.833 0.921 0.875 38
100097 0.923 0.667 0.774 18
~90% P/R

Security Audits: Predict user, compare with actual user
#queries #users Accuracy
73881 28 49.30%
55333 10 37.40%
18487 46 31.80%
5471 21 96.20%
4213 6 58.50%
3894 12 99.70%
3373 9 99.80%
2867 6 99.80%
1953 15 89.10%
1924 4 98.10%
1776 9 95.20%
1699 5 99.80%
1108 12 98.20%
Method            Account Labeling  User Labeling
Doc2Vec           78.8%             39%
LSTM Autoencoder  99.1%             55.4%
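The audit idea (predict the user from the query, then compare with the user who actually issued it) can be sketched as a small loop. This is an illustration only: `audit` and the `predict_user` callback are hypothetical names, and in practice the predictor would be a classifier over query embeddings rather than a hand-written rule.

```python
from typing import Callable, Iterable, List, Tuple


def audit(
    log: Iterable[Tuple[str, str]],
    predict_user: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (query, actual_user, predicted_user) for every mismatch.

    `log` holds (query_text, actual_user) pairs; a mismatch means the query
    does not look like what this user normally submits.
    """
    flagged = []
    for query, actual_user in log:
        predicted_user = predict_user(query)
        if predicted_user != actual_user:
            flagged.append((query, actual_user, predicted_user))
    return flagged
```

Given the accuracy spread in the table above, a mismatch is a signal for the audit log rather than proof of misuse, especially for accounts where many users issue similar queries.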

Workload Summarization for Index Recommendation
Workload (a lot of queries) → Apply filters (Account_name = 'xyz') → Uniform sample → Output workload (100 queries)

Workload Summarization for Index Recommendation
Workload (a lot of queries) → Apply filters (Account_name = 'xyz') → Summarization using query vectors → Output workload (100 queries)
** Jiaqi Yan, Qiuye Jin, Shrainik Jain, Stratis D. Viglas, Allison Lee, “Snowtrail: Testing with Production Queries on a Cloud Database”, DBTEST 2018
** Jiaqi Yan, Qiuye Jin, Shrainik Jain, Stratis D. Viglas, Allison Lee, “Snowtrail: Testing with Production Queries on a Cloud Database”, US Patent Application No. 62/646,817
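The slides do not spell out how the query vectors are turned into a summary, so the following is a stand-in technique, not the authors' method: a greedy farthest-point (k-center) selection that picks queries whose vectors are spread out, so that repeated patterns are not over-represented the way they can be in a uniform sample. The name `summarize` and the choice of Euclidean distance are assumptions.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def summarize(vectors: List[Sequence[float]], k: int) -> List[int]:
    """Pick up to k indices greedily, each farthest from those already chosen."""
    if not vectors or k <= 0:
        return []
    chosen = [0]  # start from the first query (an arbitrary choice)
    # Distance from every vector to its nearest chosen representative.
    nearest = [euclidean(v, vectors[0]) for v in vectors]
    while len(chosen) < min(k, len(vectors)):
        candidate = max(range(len(vectors)), key=lambda i: nearest[i])
        if nearest[candidate] == 0.0:
            break  # every remaining vector duplicates a chosen one
        chosen.append(candidate)
        nearest = [
            min(d, euclidean(v, vectors[candidate]))
            for d, v in zip(nearest, vectors)
        ]
    return chosen
```

The returned indices identify the queries to keep in the output workload; a clustering-based selection would be an equally reasonable alternative.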

Evaluation of workload summary: index recommendation
○ Run the full workload with no indexes, record the time (t1)
○ Recommend and create indexes on the FULL workload
○ Run the full workload again, record the time (t2)
○ Generate small workload summary
○ Recommend and create indexes on the SUMMARY workload
○ Run the full workload again, record the time (t3)
○ Set a time budget for the recommender
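The three timings from the procedure above can be compared with a few lines of arithmetic. This is a minimal sketch: `evaluate_summary` and the `gain_retained` ratio are hypothetical conveniences for reading the results, not metrics defined in the talk.

```python
from typing import Dict, Optional


def evaluate_summary(t1: float, t2: float, t3: float) -> Dict[str, Optional[float]]:
    """Compare runtimes of the full workload.

    t1: no indexes
    t2: indexes recommended from the FULL workload
    t3: indexes recommended from the SUMMARY workload
    """
    full_gain = t1 - t2
    summary_gain = t1 - t3
    return {
        "speedup_full": t1 / t2,
        "speedup_summary": t1 / t3,
        # Fraction of the full-workload improvement the summary preserves.
        "gain_retained": summary_gain / full_gain if full_gain > 0 else None,
    }
```

For example, with t1 = 100, t2 = 50 and t3 = 60 (made-up numbers), indexes built from the summary keep 80% of the improvement obtained from the full workload.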

Workload Summarization for Index Selection
Transfer learning: We can even learn the model on a Snowflake workload, and use it to infer representations for the TPC-H workload

Querc: Query Classifier
● Reuse embeddings where possible
● Collect training labels from the databases (cost, error codes)
● Retrain models periodically, or online
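One way to picture "reuse embeddings where possible" is a service that embeds each query once and feeds the same vector to every task-specific classifier. The sketch below is purely illustrative: `QueryLabelingService`, its methods, and the callback signatures are hypothetical and do not describe the actual Querc implementation.

```python
from typing import Callable, Dict, List

Vector = List[float]


class QueryLabelingService:
    """Embeds each distinct query once and shares the vector across tasks."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed
        self._cache: Dict[str, Vector] = {}
        self._classifiers: Dict[str, Callable[[Vector], str]] = {}

    def register(self, task: str, classifier: Callable[[Vector], str]) -> None:
        # Classifiers can be swapped out when they are retrained.
        self._classifiers[task] = classifier

    def embedding(self, query: str) -> Vector:
        if query not in self._cache:
            self._cache[query] = self._embed(query)
        return self._cache[query]

    def label(self, query: str) -> Dict[str, str]:
        vector = self.embedding(query)
        return {task: clf(vector) for task, clf in self._classifiers.items()}
```

Because the classifiers are registered by task name, retraining one of them periodically only requires calling `register` again with the new model.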

Last slide
● Every workload management task is query labeling
● You don’t need fancy features
● You can’t maintain fancy features anyway
● SQL strings (and plans) have a lot of signal
● There is a ton of training data
● Your workload is not “all possible queries” – use the patterns
● Transfer learning works – you can train on one workload and use it on another
● Opens up a lot of simple, interesting little applications
○ User behavior modeling, resource allocation, …
● External “query labeling service” keeps everything organized
Shrainik Jain

Query recommendation: Predict next query in a session

Learned features about as good as manual features, even with generous assumptions
[Figure annotation: up is good]
