This combined #SFMySQL and #SFPHP meetup talked about Shard-Query. You can find the video to accompany this set of slides here: https://www.youtube.com/watch?v=vC3mL_5DfEM
2. Introduction
Presenter
• Justin Swanhart
• Principal Support Engineer at Percona
• Previously a trainer and consultant at Percona too
Developer
• Swanhart-tools
• Shard-Query – MPP sharding middleware for MySQL
• Flexviews – Materialized views (fast refresh) for MySQL
• bcmath UDF – arbitrary precision math for MySQL
3. Intended Audience
• MySQL users with data too large to query efficiently using a single
machine
• Big Data
• Analytics / OLAP
• User generated content analysis
• People interested in distributed database processing
5. MPP – Massively Parallel Processing
• An MPP system is a system that can process a SQL statement in
parallel on a single machine or even many machines
• A collection of machines is often called a Grid
• MPP is also sometimes called Grid Computing
6. MPP (cont)
• Not many open source databases (none?) support MPP
• Community editions of closed source offerings are limited
• Some closed source databases include Vertica, Greenplum, Redshift
7. The Cloud
• Managed collection of virtual servers
• Easy to add servers on demand
• Ideal for a federated, distributed database grid
• Easy to “scale up” by moving to a VM with more cores
• Easy to “scale out” by adding machines
• Amazon is one of the most popular cloud environments
8. LAMP stack
• Linux
• Amazon Linux
• RHEL
• Ubuntu LTS, etc.
• Apache Web Server
• Most popular web server on the planet
• MySQL
• The world’s most popular open source database
• PHP
• High level language makes development easier
9. Database Middleware
• A piece of software that sits between an end-user application and
the database
• Operates on the queries submitted by the application, then
returns the results to the application
• Usually a proxy of some sort
• MySQL Proxy is an open source, user-configurable proxy for MySQL
• Supports Lua scripts which intercept queries
• Shard-Query can use MySQL Proxy out of the box
10. Message Queue / Job Server
• Accepts jobs or messages and places them in a queue
• A worker reads jobs/messages from the queue and acts on them
• Offers support for asynchronous jobs
• Gearman
• My job server of choice for PHP
• Has two different PHP interfaces (PEAR and PECL)
• SQ comes bundled with a modified version* of the pear interface
• Excellent integration with MySQL as well (UDF)
* Removes warnings triggered by modern PHP strict mode
11. Sharding
• Sharding is a shared-nothing approach
• Means splitting up your data onto more than one machine
• Tables that are split up are called sharded tables
• Lookup tables are not sharded. In other words, they must be
duplicated on all nodes
• Shard-Query supports directory based or hash based sharding
12. Shard mapper
• Shard-Query supports DIRECTORY and HASH mapping out of the
box
• DIRECTORY based sharding allows you to add or remove shards
from the system, but lookups may go over the network, reducing
performance* compared to HASH mapping
• HASH based sharding uses a hash algorithm to balance rows over
the sharded databases. However, since a hash algorithm is used,
the number of database shards cannot change after initial data
loading.
* But only for queries like “select count(*) from table where customer_id = 50”
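The trade-off between the two mappers can be sketched in a few lines of Python. This is only an illustration, not Shard-Query's PHP implementation: `hash_shard`, `DirectoryMapper`, the use of CRC32 as the hash and the round-robin placement of new keys are all assumptions made for the example.

```python
import zlib


def hash_shard(shard_key, shard_count):
    """HASH mapping: the shard is a pure function of the key and the shard count."""
    # CRC32 is only a stand-in for whatever hash a real mapper uses (assumption).
    return zlib.crc32(str(shard_key).encode()) % shard_count


class DirectoryMapper:
    """DIRECTORY mapping: an explicit key -> shard table that can be edited."""

    def __init__(self, shards):
        self.shards = list(shards)
        self.directory = {}

    def add_shard(self, shard):
        # Adding a shard does not disturb keys that are already placed.
        self.shards.append(shard)

    def lookup(self, shard_key):
        # Unknown keys are placed round-robin (hypothetical placement policy).
        if shard_key not in self.directory:
            target = self.shards[len(self.directory) % len(self.shards)]
            self.directory[shard_key] = target
        return self.directory[shard_key]


# With HASH mapping, changing the shard count moves existing keys.
moved = sum(1 for k in range(1000) if hash_shard(k, 4) != hash_shard(k, 5))

# With DIRECTORY mapping, an existing key stays put when a shard is added.
mapper = DirectoryMapper(["shard1", "shard2"])
before = mapper.lookup(50)
mapper.add_shard("shard3")
after = mapper.lookup(50)
```

In this sketch the directory is an in-memory dict; in a real deployment it would live in a shared store, which is where the extra network lookup mentioned above comes from.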
13. What is “big data”
Most machine generated data
• Line order information for a large organization like Wal-Mart™
• Any data so large that you can’t effectively operate on it on one
machine
• For example, an important query that needs to run daily executes in
greater than 24 hours. It is impossible to meet the daily goal unless
you can find a way to make the query execute faster.
• These kinds of problems can happen on relatively small amounts of
data (tens of gigabytes)
14. Analytics(OLAP) versus OLTP
• OLTP is focused on short lived small transactions that read or
write small amounts of data
• OLAP is focused on bulk loading and reading large amounts of
data in a single query.
• Aggregation queries are OLAP queries
• Shard-Query is designed for analytics (OLAP) not OLTP
• Shard-Query must parse all commands sent to it (and make multiple round trips)
• Minimum query time of around 20ms
16. Single thread queries in the database
• MySQL, PostgreSQL, Firebird and all other major open source
databases have single threaded queries
• This means that a single query can only ever utilize the resources
of a single core
• As the data size grows, analytical queries get slower and slower
• Even when the data fits in memory, speed decreases as the data grows,
because a single query must access all of it
• As the number of rows to be examined increases, performance
decreases
17. Why single threaded
• MySQL is optimized for getting small amounts of data
quickly(OLTP)
• It was created at a time when having more than one CPU was not
common
• Adding parallelism now is a very complex task, particularly since
MySQL supports multiple storage engines
• So adding parallel query is not a high priority (not even on the
roadmap)
• Designed to run LOTS of small queries simultaneously, not one
big query
18. Single Threading – bad for IO
• If the data set is significantly larger than memory, single threaded
queries often cause the buffer pool to “churn”
• For example, small lookup tables can easily be pushed out of the buffer
pool, resulting in frequent IO to look up values
• While SSDs may help somewhat, one database thread cannot read
from an SSD at maximum device capacity
• While the disk may be capable of 1000+ MB/sec, a single thread is
generally limited to <100MB/sec (usually 30-40)
• This is because a single thread shares doing IO AND running the query
on one CPU (MySQL does not use read threads for queries)
19. The OLAP Example
• A large company maintains a star schema of their sales history for
analytics purposes
• This company likes to present a sum total of orders for all time on
the dashboard
• In the beginning the query is very fast
• It gets slower, though, as months of data are added and as the business
(and its data volume) grows
• Eventually the query takes more than 24 hours to run, which means it
can no longer be updated daily
• “Drill down” gets slower as data increases
20. What can be done?
• Caching?
• Materialized views?
• Partitioning?
• Sharding?
21. Making OLAP more like OLTP!
• Shard-Query breaks one big query up into smaller queries that can
access the database in parallel
• Partitioning and sharding are used to keep data size for any single
query to a minimum
• If your table has 16 partitions, you can get up to 16 way parallelism
• If you also have 2 nodes, you get 32 way parallelism, and so on
• You can use multiple database schemas on a single server instead (a
form of sharding) if you don’t partition your data
23. Sharding Reviewed
• A sharded database contains multiple nodes or databases called
shards
• One physical machine might host many shards
• Each shard has an identical schema
24. Sharding Reviewed (cont)
• The multiple shards function together as one RDBMS system.
• You can think of the shards as a big UNION ALL of the data, with
only a portion of the data on any one machine
• A mechanism must control which server holds each
particular piece of data.
• In Shard-Query a particular column controls data placement – this
is called the shard key
25. Sharding – Data distribution
• There are usually one or two large tables that are sharded
• These are usually called FACT tables
• An example might be blogs, blog_posts and blog_comments. All three
share a “shard key” of blog_id
• Most common case is one big table with smaller lookup tables
26. Sharding Reviewed (cont)
• The shard key is very important!
• Since a specific column acts as the “shard key”, all sharded tables must
contain the shard key.
• For example: blog_id might be the shard key.
• The rows for a specific blog_id are then located on the same shard in
any table that has the blog_id column
27. Optimization - Shard Elimination
• When Shard-Query sees an expression on the shard key it looks up*
the shard that contains the appropriate data and only sends queries to
the necessary shards.
• Equality lookup is most efficient, but IN, BETWEEN and other operators are allowed
as well
• Lookups may not use subqueries (i.e., blog_id IN (1,2,3) is okay, but blog_id IN (select
…) is not)
• This is called “shard elimination”
• Shard elimination is analogous to partition elimination.
• where blog_id = 10, for example
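Shard elimination can be sketched as a lookup from the shard key predicate to the set of shards that need to be queried. The Python below is a simplified illustration, not Shard-Query's code: the function name, the `(op, values)` predicate form, the example directory and the restriction to integer keys are all assumptions.

```python
def shards_for_predicate(op, values, directory, all_shards):
    """Return the shards a query must visit for a predicate on the shard key."""
    if op == "=":
        keys = [values]
    elif op == "IN":
        keys = list(values)
    elif op == "BETWEEN":
        low, high = values
        keys = range(low, high + 1)  # integer shard keys assumed
    else:
        # No usable predicate on the shard key: every shard must be queried.
        return sorted(all_shards)
    return sorted({directory[k] for k in keys if k in directory})


# Hypothetical directory: blog_id -> shard name.
directory = {10: "shard1", 11: "shard2", 12: "shard1", 13: "shard3"}
all_shards = {"shard1", "shard2", "shard3"}

one_shard = shards_for_predicate("=", 10, directory, all_shards)
in_list = shards_for_predicate("IN", (10, 12), directory, all_shards)
between = shards_for_predicate("BETWEEN", (10, 11), directory, all_shards)
everything = shards_for_predicate(None, None, directory, all_shards)
```

An equality lookup resolves to a single shard, which is why it is the most efficient case; a query with no shard key predicate falls back to all shards.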
28. Can Shard-Query help on 1 machine?
• Yes! - Use MySQL partitioning on a single machine
• Shard-Query can access the partitions of a table in parallel!
• This means that if you have many partitions, then Shard-Query can
utilize many cores to answer the query
Use partitions for
parallelism
29. How does that work?
• Shard-Query runs EXPLAIN on the query
• The EXPLAIN output shows the partitions that MySQL will access
when running the query
• Shard-Query uses the MySQL 5.6 PARTITION clause to generate one query
per partition
• These queries can execute in parallel
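As a rough illustration of the steps above, the following Python sketch builds one query per partition and fans them out over a thread pool. It assumes the partition names have already been read from the EXPLAIN output; `partition_queries`, `run_parallel` and the `run_sql` callable are hypothetical names, and Shard-Query itself does this in PHP via Gearman rather than with threads.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_queries(select_list, table, partitions, where="1=1"):
    """Build one SELECT per partition using the PARTITION(...) clause."""
    return [
        f"SELECT {select_list} FROM {table} PARTITION({p}) AS `{table}` WHERE {where}"
        for p in partitions
    ]


def run_parallel(queries, run_sql, max_workers=4):
    """Run every query through run_sql concurrently; results keep query order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_sql, queries))


# Partition names as they might appear in EXPLAIN output (assumed here).
queries = partition_queries("COUNT(*) AS cnt", "lineorder", ["p0", "p1", "p2", "p3"])

# A stand-in for a real database call, so the sketch runs without MySQL.
results = run_parallel(queries, run_sql=lambda sql: len(sql))
```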
30. Sharding can help too
• How?
• Shard-Query adds parallelism to queries by spreading them over nodes
in parallel
• Spread the data over four nodes and queries can be up to 4x faster
MySQL database shards
Shard-Query
31. Sharding + Partitioning is best
• Why?
• Partition the tables to add parallelism to each node
• Use sharding to have multiple nodes working together
• 4 nodes with 3 partitions each = 12 way parallelism
Shard-Query
MySQL database shards
Partitions
33. Configuration Repository
• Shard-Query stores all configuration information in a MySQL
database called the configuration repository
• This should be a highly available replication pair (or XtraDB
cluster) for HA
• Web interface can change the settings
• Manual settings changes can be done via SQL
• schemata_config table in Shard-Query repository
• Makes using Shard-Query easier, especially when using more than
one node
34. Shard-Query Architecture
[Diagram: Interfaces (PHP OO, Apache web interface, MySQL Proxy); Communication (Gearman message queue); Workers (four workers); Storage (MySQL database shards); Configuration management (config repository)]
35. Shard-Query Architecture – Gearman job server
• Provides the parallel mechanism
for Shard-Query
• Multiple Gearman servers are
supported for HA
• Enables Shard-Query to use a
map/reduce like architecture
• Sends jobs to workers when they
arrive at the queue
• If all workers are busy the job
waits
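The queue-and-workers pattern described here can be sketched with Python's standard library. This is not the Gearman API, only a stand-in: `run_jobs`, the `handler` callable and the use of threads are assumptions made to show jobs waiting in a queue until a worker is free.

```python
import queue
import threading


def run_jobs(jobs, handler, worker_count=4):
    """Queue every job, let a fixed pool of workers drain the queue."""
    job_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: no more work for this worker
                break
            outcome = handler(job)
            with lock:
                results.append((job, outcome))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:
        job_queue.put(job)  # jobs wait here while all workers are busy
    for _ in threads:
        job_queue.put(None)
    for t in threads:
        t.join()
    return results


# Eight jobs, four workers: at most four run at once, the rest wait in the queue.
done = run_jobs(range(8), handler=lambda n: n * n)
```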
36. Gearman at a glance
Shard-Query OO
Store-resultset
Loader worker
SQ run SQL worker
37. Shard-Query Architecture – Three kinds of workers
• loader_worker – Listens for
loader jobs and executes them.
Used by parallel loader.
• shard_query_worker – Listens
for SQL jobs, runs the job via
Shard-Query and returns the
results as JSON. Used by web
and proxy interfaces.
• store_resultset_worker – Main
worker used by Shard-Query. It
runs SQL and stores the result
in a table.
38. Shard-Query Architecture – PHP Object Oriented Interface
• Very simple to use
• Constructor parameters are usually
not even needed
• Just one function to run a SQL
query and get results back
• Complete example comes with
Shard-Query as:
bin/run_query
39. PHP OO Example (from bin/run_query):
// Assumes the ShardQuery class is already loaded and $sql holds the query text.
$shard_query = new ShardQuery();

// Run the query and time it
$stime = microtime(true);
$stmt = $shard_query->query($sql);
$endtime = microtime(true);

// Errors are returned as a member of the object
if (!empty($shard_query->errors)) {
    echo "ERRORS RETURNED BY OPERATION:\n";
    print_r($shard_query->errors);
}

if (is_resource($stmt) || is_object($stmt)) {
    // A simple data access layer (DAL) comes with Shard-Query
    $count = 0;
    while ($row = $shard_query->DAL->my_fetch_assoc($stmt)) {
        print_r($row);
        ++$count;
    }
    echo "$count rows returned\n";
    $shard_query->DAL->my_free_result($stmt);
} else {
    if (!empty($shard_query->info)) print_r($shard_query->info);
    echo "no query results\n";
}
echo "Exec time: " . ($endtime - $stime) . "\n";
40. Shard-Query Architecture – Apache web interface
• GUI
• Easy to set up
• Run queries and get results
• Serves as an example of using
Shard-Query in a web app with
asynchronous queries
• Submits queries via Gearman
• Simple HTTP authentication
41. Shard-Query Architecture – MySQL Proxy Interface
• Lua script for MySQL Proxy
• Supports most SHOW
commands
• Intercepts queries, and sends
them to Shard-Query using the
MySQL Gearman UDF
• Serves as another example of
using Gearman to execute
queries.
• Behaves slightly differently than
MySQL for some commands
42. Shard-Query Data Flow
Map/reduce like workflow:
Query submitted → SQL is parsed → Query rewrite for parallelism yields multiple queries → Gearman jobs (map/combine) → Final aggregation (reduce) → Return result
44. SQL Parser
• Find it at http://github.com/greenlion/php-sql-parser
• Supports
• SELECT/INSERT/UPDATE/DELETE
• REPLACE
• RENAME
• SHOW/SET
• DROP/CREATE INDEX/CREATE TABLE
• EXPLAIN/DESCRIBE
Used by SugarCRM too, as
well as other open source
projects.
46. Query Rewrite for parallelism
• Shard-Query has to manipulate the SQL statement so that it can
be executed over more than one partition or machine
• COUNT() turns into SUM of COUNTs from each query
• AVG turns into SUM and COUNT
• SEMI-JOIN is turned into a materialized join
• STDDEV/VARIANCE are rewritten as well, using the sum of squares
method
• Push down LIMIT when possible
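The arithmetic behind these rewrites can be checked with a short Python sketch. Each shard returns only a count, a sum and a sum of squares, and the final answer is derived from those partials. The function names are illustrative, and population variance is assumed (the behaviour of MySQL's VARIANCE() and STDDEV()).

```python
import math
import statistics


def partial(values):
    """What each shard computes: COUNT(x), SUM(x), SUM(x*x)."""
    return (len(values), sum(values), sum(v * v for v in values))


def combine(partials):
    """Derive the global aggregates from the per-shard partials."""
    n = sum(p[0] for p in partials)    # COUNT becomes SUM of COUNTs
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n                   # AVG becomes SUM / COUNT
    variance = total_sq / n - mean * mean  # sum of squares method
    return {
        "count": n,
        "avg": mean,
        "variance": variance,
        "stddev": math.sqrt(max(variance, 0.0)),
    }


# Three shards holding different slices of the same column.
shards = [[1, 2, 3], [4, 5], [6]]
combined = combine([partial(s) for s in shards])

# Reference values computed over the unsharded data.
all_values = [v for s in shards for v in s]
expected_avg = statistics.mean(all_values)
expected_var = statistics.pvariance(all_values)
```

Note that averaging the per-shard averages would give the wrong answer when shards hold different numbers of rows, which is why AVG has to be split into SUM and COUNT.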
47. Query Rewrite for parallelism (cont)
• Because lookup tables are duplicated on all shards, the query
executes in a shared-nothing way
• All joins, filtering and aggregation are pushed down
• This means very little data must flow between nodes in most cases
• High performance
• Meets or beats Amazon Redshift in testing at 200GB of data
49. Map/Combine
• The store_resultset gearman worker runs SQL and stores the result
in a table
• To keep the number of rows in the table (and the time it takes to
aggregate results in the end) small, an INSERT … ON DUPLICATE
KEY UPDATE (ODKU) statement is used when inserting the rows
• There is a UNIQUE KEY over the GROUP BY attributes to facilitate
the upsert
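The combine step can be sketched with an in-memory SQLite database, where SQLite's `ON CONFLICT ... DO UPDATE` stands in for MySQL's `INSERT … ON DUPLICATE KEY UPDATE`. The table and column names and the sample rows are made up for the example; only the upsert-on-the-GROUP-BY-key idea is taken from the slide. It needs SQLite 3.24 or newer.

```python
import sqlite3


def combine_and_reduce(shard_results):
    """Upsert each shard's grouped rows into one table, then aggregate it."""
    conn = sqlite3.connect(":memory:")
    # The PRIMARY KEY plays the role of the UNIQUE KEY over the GROUP BY column.
    conn.execute(
        "CREATE TABLE aggregation_tmp ("
        "date_key INTEGER PRIMARY KEY, cnt INTEGER NOT NULL)"
    )
    for rows in shard_results:
        # Combine: a row for an existing group adds to it instead of
        # creating a second row, so the table stays small.
        conn.executemany(
            "INSERT INTO aggregation_tmp (date_key, cnt) VALUES (?, ?) "
            "ON CONFLICT(date_key) DO UPDATE SET cnt = cnt + excluded.cnt",
            rows,
        )
    # Reduce: COUNT(*) from the original query is now a SUM of counts.
    final = conn.execute(
        "SELECT date_key, SUM(cnt) FROM aggregation_tmp "
        "GROUP BY date_key ORDER BY date_key"
    ).fetchall()
    row_count = conn.execute("SELECT COUNT(*) FROM aggregation_tmp").fetchone()[0]
    conn.close()
    return final, row_count


# Hypothetical per-shard results of: SELECT date_key, COUNT(*) ... GROUP BY date_key
shard_results = [
    [(19930101, 5), (19930102, 3)],
    [(19930101, 2)],
    [(19930103, 7), (19930102, 1)],
]
final, row_count = combine_and_reduce(shard_results)
```

Five incoming rows collapse to three stored rows here, one per group, which is the point of doing the upsert during the map/combine step.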
51. Final aggregation
• Shard-Query has to return a proper result, combining the results
in the result table together to return the correct answer
• Again, for example COUNT must be rewritten as SUM to combine
all the counts (from each shard) in the result table
• Aggregated result is returned to the client
52. Shard-Query Flow as SQL
[justin@localhost bin]$ ./run_query --verbose
select count(*) from lineorder;
Shard-Query optimizer messages:
SQL TO SEND TO SHARDS:
Array
(
[0] => SELECT COUNT(*) AS expr_2913896658
FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1
[1] => SELECT COUNT(*) AS expr_2913896658
FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1
[2] => SELECT COUNT(*) AS expr_2913896658
FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1
[3] => SELECT COUNT(*) AS expr_2913896658
FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1
)
SQL TO SEND TO COORDINATOR NODE:
SELECT SUM(expr_2913896658) AS ` count `
FROM `aggregation_tmp_58392079`
Array
(
[count ] => 0
)
1 rows returned
Exec time: 0.03083610534668
Stages shown above: initial query → query rewrite / map → final aggregation / reduce → final result
53. Map/Combine example
select LO_OrderDateKey, count(*) from lineorder group by LO_OrderDateKey;
Shard-Query optimizer messages:
* The following projections may be selected for a UNIQUE CHECK on the storage node operation:
expr$0
* storage node result set merge optimization enabled:
ON DUPLICATE KEY UPDATE
expr_2445085448=expr_2445085448 + VALUES(expr_2445085448)
SQL TO SEND TO SHARDS:
Array
(
[0] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448
FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 GROUP BY expr$0
[1] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448
FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 GROUP BY expr$0
[2] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448
FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 GROUP BY expr$0
[3] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448
FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 GROUP BY expr$0
)
SQL TO SEND TO COORDINATOR NODE:
SELECT expr$0 AS `LO_OrderDateKey`,SUM(expr_2445085448) AS ` count `
FROM `aggregation_tmp_12033903` GROUP BY expr$0
combine
reduce
55. Machine generated data
• Sensor readings
• Metrics
• Logs
• Any large table with short lookup tables
Star schemas are ideal
56. Call detail records
• Shard-Query is used in the billing system of a large cellular provider
• CDRs generate a lot of data
• Shard-Query includes a fast PERCENTILE function
57. Green energy meter processing
• High volume of data means sharding is necessary
• With Shard-Query, reporting is possible over all the shards,
making queries possible that would not work with Fabric or other
sharding solutions
• Used in India for reporting on a green power grid
58. Log analysis
• Performance logs from a web application for example
• Aggregate many different statistics and shard if log volumes are
high enough
• Search text logs with regular expressions
60. Star Schema Benchmark – SF 20
• 119 million rows of data (12GB)
• Infobright Community Database
• Only 1st query from each “flight” selected
• Unsharded compared to four shards (box has 4 CPUs – Amazon
m1.xlarge)
61. Query 1
COLD
• MySQL – 35.39s
• Shard-Query – 11.62s
HOT
• MySQL – 10.99s
• Shard-Query – 2.95s
select sum(lo_extendedprice*lo_discount) as revenue
from lineorder join dim_date on lo_orderdatekey = d_datekey
where d_year = 1993
and lo_discount between 1 and 3
and lo_quantity < 25;
62. Query 2
COLD
• MySQL – 34.24s
• Shard-Query – 12.74s
HOT
• MySQL – 12.74s
• Shard-Query – 3.26s
select sum(lo_revenue), d_year, p_brand
from lineorder
join dim_date on lo_orderdatekey = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_category = 'MFGR#12'
and s_region = 'AMERICA'
group by d_year, p_brand
order by d_year, p_brand;
63. Query 3
COLD
• MySQL – 27.29s
• Shard-Query – 7.97s
HOT
• MySQL – 18.89s
• Shard-Query – 5.06s
select c_nation, s_nation, d_year, sum(lo_revenue) as revenue
from customer join lineorder
on lo_custkey = c_customerkey
join supplier on lo_suppkey = s_suppkey
join dim_date on lo_orderdatekey = d_datekey
where c_region = 'ASIA'
and s_region = 'ASIA'
and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, revenue desc;
64. Query 4
COLD
• MySQL – 23.02s
• Shard-Query – 8.48s
HOT
• MySQL – 14.77s
• Shard-Query – 4.29s
select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit
from lineorder join dim_date on lo_orderdatekey = d_datekey
join customer on lo_custkey = c_customerkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'
and s_region = 'AMERICA'
and (p_mfgr = 'MFGR#1'
or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;
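To put the four benchmark slides side by side, the speedup of Shard-Query over unsharded MySQL can be computed directly from the timings quoted above. The Python below only restates those numbers; the dictionary layout and the `speedup` helper are illustrative.

```python
# (MySQL seconds, Shard-Query seconds) as reported on the benchmark slides.
timings = {
    "Query 1": {"cold": (35.39, 11.62), "hot": (10.99, 2.95)},
    "Query 2": {"cold": (34.24, 12.74), "hot": (12.74, 3.26)},
    "Query 3": {"cold": (27.29, 7.97), "hot": (18.89, 5.06)},
    "Query 4": {"cold": (23.02, 8.48), "hot": (14.77, 4.29)},
}


def speedup(mysql_seconds, shard_query_seconds):
    """How many times faster Shard-Query was than unsharded MySQL."""
    return mysql_seconds / shard_query_seconds


speedups = {
    query: {state: speedup(*pair) for state, pair in states.items()}
    for query, states in timings.items()
}

for query, states in speedups.items():
    print(f"{query}: cold {states['cold']:.2f}x, hot {states['hot']:.2f}x")
```

With four shards on a 4-CPU box, the ratios fall between roughly 2.7x and 3.9x, so none of the queries reaches a full 4x, and the hot runs come closer to it than the cold runs.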