MySQL Schema design in practice
© 2016 Wikimedia Foundation & Jaime Crespo. http://wikimedia.org. License: CC-BY-SA-3.0
MySQL Schema Design in Practice
Jaime Crespo
Percona Live Amsterdam 2016
-Amsterdam, 3 Oct 2016-
https://wikitech.wikimedia.org/wiki/User:Jcrespo/plam16
Agenda
0. Introduction & setup
1. Case #1: Random pages
2. Case #2: Supporting 290 languages
3. Case #3: An abnormal denormalization
4. Case #4: Key-value system
5. Case #5: Revisions and deletions
6. Case #6: A large table
7. Case #7: What links here
8. Case #8: Anecdotes: The ghost tables and timestamps
9. Case #9: Slots
INTRODUCTION & SETUP
● Sr. Database Administrator at Wikimedia Foundation
● Used to work as a trainer for Oracle (MySQL), as a consultant (Percona) and as a freelance administrator (DBAHire.com)
This is me fighting bad query performance
Schema design is key for query performance
● Check my past presentations at: http://www.slideshare.net/jynus/
MediaWiki as the example application
• MediaWiki code is distributed under GPL 2 or later
• All Wikimedia projects' data is licensed under CC-BY-SA-2.5
Accessing Wikimedia Production Database (I)
● Log in to or register a Wikimedia SUL account (for example, on https://en.wikipedia.org )
● Use that account to authenticate on Quarry: http://quarry.wmflabs.org/
● Send your queries to the right database (for example, enwiki_p)!
Accessing Wikimedia Production Database (II)
Session dynamics
● A real database design problem is presented
● A brief discussion starts (5-10 minutes, tops)
– If you know the answer, let other people make proposals first; crazy ideas are encouraged
– Assume anything you need
– This is the place to be wrong, not to show off
● We analyze the proposals, weigh their strengths and weaknesses, and compare them with the one in use
Case #0: Designing a schema for Wikipedia
● Which entities would we need?
● Which relationships?
● What kind of queries are probably the most common ones?
● What do you think are the main scalability and pain points?
Potential entities related to content
● Page: do we need one?
● Edit: different from page?
● Diff: should it be a first-class entity?
● Revision: large table?
What about page types and properties?
● Talk pages: same entity or separate table?
● Categories, images (files), redirections: are they regular pages or should they be stored as their own entities?
● Similar question for image description pages
● Categories and tags for pages/revisions: how to implement them?
● Protection: some pages can have restrictions on who can edit them
● Other properties (tags?)
Identifiers
● How to handle two pages that should have the same title: Ben Hur (1959 film) and Ben Hur (2016 film)?
● Should we use the page name or an arbitrary id to identify a page?
● What about revision ids?
● If we need one, should it be a UUID or a numerical id?
Brainstorming time
Mediawiki Schema
Disclaimers
● The best solution on paper is not necessarily the best in production
– It may be too difficult to migrate existing logic (15-year-old application) or not worth it
– Performance is not the only metric: security, scalability, reliability, simplicity, etc.
● You will see many examples of compromises like this in our application
CASE #1: RANDOM PAGES
Problem #1 Description
● At the left of each page there is a link to a “Random article”
● It should allow filtering by namespace (e.g. not all pages are articles)
● It is a relatively important page compared to, let’s say, Google’s “I’m Feeling Lucky”, as it gives an overview of the kind of articles you will find in a project
Problem #1 Restrictions
● It has to work as fast as a regular page, but for obvious reasons cannot be cached
● It has to be realistically pseudorandom
● It has to always return a result
● It has to work on a continuously increasing number of pages, and scale from 1 to millions
Potential solutions
● ORDER BY rand() LIMIT 1
● Use a well-distributed integer id and pick one at random
● Questions?
– What indexes would be beneficial in each case?
– How to count the total number of ids?
– How to handle deletions?
Brainstorming time
Actual solution: Table design (I)
CREATE TABLE /*_*/page (
  […]
  -- A page name is broken into a namespace and a title.
  -- The namespace keys are UI-language-independent constants,
  -- defined in includes/Defines.php
  page_namespace int NOT NULL,
  -- The rest of the title, as text.
  -- Spaces are transformed into underscores in title storage.
  page_title varchar(255) binary NOT NULL,
  -- 1 indicates the article is a redirect.
  page_is_redirect tinyint unsigned NOT NULL default 0,
  […]
  -- Random value between 0 and 1, used for Special:Randompage
  page_random real unsigned NOT NULL,
Actual solution: Table design (II)
[…]
CREATE INDEX /*i*/page_random ON /*_*/page (page_random);
Actual solution: Relevant code
protected function getQueryInfo( $randstr ) {
    $redirect = $this->isRedirect() ? 1 : 0;
    $tables = [ 'page' ];
    $conds = array_merge( [
        'page_namespace' => $this->namespaces,
        'page_is_redirect' => $redirect,
        'page_random >= ' . $randstr
    ], $this->extra );
    $joinConds = [];
    // Allow extensions to modify the query
    Hooks::run( 'RandomPageQuery', [ &$tables, &$conds, &$joinConds ] );
    return [
        'tables' => $tables,
        'fields' => [ 'page_title', 'page_namespace' ],
        'conds' => $conds,
        'options' => [
            'ORDER BY' => 'page_random',
            'LIMIT' => 1,
        ],
        'join_conds' => $joinConds
    ];
}
From: mediawiki/core/includes/specials/SpecialRandompage.php
Actual solution: Query generated
SELECT page_title, page_namespace
FROM page
LEFT JOIN page_props
ON page_id = pp_page AND
pp_propname = ?
WHERE page_namespace IN (…) AND
page_is_redirect = 0 AND
page_random >= $rand
ORDER BY page_random
LIMIT 1;
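The lookup above can be tried end to end with an in-memory SQLite database standing in for MySQL. This is a minimal sketch, not MediaWiki's code: the helper name `random_page`, the sample rows, and the wrap-around retry from 0 when no row has `page_random >= r` are assumptions made for illustration (the retry is one way to meet the "always return a result" restriction).

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE page (
           page_id INTEGER PRIMARY KEY,
           page_namespace INTEGER NOT NULL,
           page_title TEXT NOT NULL,
           page_is_redirect INTEGER NOT NULL DEFAULT 0,
           page_random REAL NOT NULL
       )"""
)
conn.execute("CREATE INDEX page_random ON page (page_random)")

# Each page gets its random value once, at insert time (fixed here for clarity).
rows = [
    (0, "Amsterdam", 0, 0.10),
    (0, "MySQL", 0, 0.50),
    (0, "Old_title", 1, 0.70),   # redirect, must never be returned
    (1, "Talk_page", 0, 0.80),   # a different namespace
    (0, "Wikipedia", 0, 0.90),
]
conn.executemany(
    "INSERT INTO page (page_namespace, page_title, page_is_redirect, page_random)"
    " VALUES (?, ?, ?, ?)",
    rows,
)

QUERY = """SELECT page_title FROM page
           WHERE page_namespace = ? AND page_is_redirect = 0 AND page_random >= ?
           ORDER BY page_random LIMIT 1"""


def random_page(namespace, r=None):
    """Return the first page at or after the random point r, wrapping to 0."""
    if r is None:
        r = random.random()
    row = conn.execute(QUERY, (namespace, r)).fetchone()
    if row is None:  # r was above every page_random: retry from the start
        row = conn.execute(QUERY, (namespace, 0.0)).fetchone()
    return row[0] if row else None
```

Calling `random_page(0, 0.95)` finds nothing above 0.95 and wraps around to the lowest value. Pages that follow a large gap in `page_random` are picked more often than others, so the distribution is only approximately uniform.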
Actual solution: Performance (I)
mysql> EXPLAIN SELECT … \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: page
type: range
possible_keys: name_title,page_random,page_redirect_namespace_len
key: page_random
key_len: 8
ref: NULL
rows: 20473233
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: page_props
type: eq_ref
possible_keys: PRIMARY,pp_propname_page,pp_propname_sortkey_page
key: PRIMARY
key_len: 66
ref: enwiki.page.page_id,const
rows: 1
Extra: Using where; Using index; Not exists
Actual solution: Performance (II)
mysql> SELECT … FROM sys.x$statement_analysis …
*************************** 1. row ***************************
exec_count: 27126203
max_latency: 450802755000
avg_latency: 755698000
lock_latency: 2224515688000000
rows_sent: 27125869
rows_sent_avg: 1
rows_examined: 0
rows_examined_avg: 0
rows_affected: 0
rows_affected_avg: 0
tmp_tables: 0
tmp_disk_tables: 0
rows_sorted: 777598
sort_merge_passes: 0
+------------------------------+
| sys.format_time(avg_latency) |
+------------------------------+
| 755.70 us |
+------------------------------+
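The raw `x$` views of the sys schema report latencies in picoseconds, which is why `avg_latency: 755698000` is displayed as 755.70 us. The conversion can be checked with a few lines of Python; `format_picoseconds` is a hypothetical helper that only approximates what `sys.format_time` prints.

```python
def format_picoseconds(ps):
    """Render a picosecond count with the largest unit that keeps the value >= 1."""
    units = [("s", 10**12), ("ms", 10**9), ("us", 10**6), ("ns", 10**3)]
    for name, factor in units:
        if ps >= factor:
            return f"{ps / factor:.2f} {name}"
    return f"{ps} ps"


avg = format_picoseconds(755698000)       # avg_latency from the output above
worst = format_picoseconds(450802755000)  # max_latency from the output above
```

Under these assumptions the average comes out as 755.70 us and the worst case as roughly 450 ms.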
CASE #2: SUPPORTING 290 LANGUAGES
Wikipedia launch and early growth
● Wikipedia was launched on January 15, 2001, as a single English-language edition
● By August 8, 2001, Wikipedia had over 8,000 articles
● On September 25, 2001, Wikipedia had over 13,000 articles
● By the end of 2001, it had grown to approximately 20,000 articles and 18 language editions
References: https://en.wikipedia.org/wiki/Wikipedia#History
Single-tenancy vs Multi-tenancy
● 1 database per wiki:
– Easier to code
– Easier to scale (?): you can move wikis to a different server
● Several wikis on a single database:
– More efficient, especially for small wikis
– They can share an existing user database
Should we shard?
● From early on, people were telling us “you need to shard to scale”
● Is it really such a bad idea? When is it needed? When can it be avoided?
● If we shard, based on which key?
Brainstorming time
Wikimedia solution
● As of this writing, the Wikimedia Foundation hosts 897 wiki-like projects (different types and languages)
● They are divided into 7 “shards” (functional partitions)
– 1 master per shard and datacenter
– Multiple slaves sharing the read load
● English Wikipedia has its own separate shard (s1)
● s3 hosts most of the wikis (892)
Functional partitioning
Source: https://dbtree.wikimedia.org
No sharding
● Our number of edits is “low” compared to our reads
– Only 500-3000 logical page edits per minute: https://grafana.wikimedia.org/dashboard/db/edit-count
– That means 2000-8000 unique rows written per second: https://grafana.wikimedia.org/dashboard/db/mysql-aggregated
– Compared to ~300K total QPS (10-40M rows read/s)
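A quick back-of-the-envelope check of the figures above shows how read-heavy the workload is. The variable names are illustrative; the numbers are the ones quoted on the slide.

```python
edits_per_minute = (500, 3000)
rows_written_per_second = (2_000, 8_000)
rows_read_per_second = (10_000_000, 40_000_000)

edits_per_second = tuple(e / 60 for e in edits_per_minute)

# Lowest and highest read/write row ratios implied by the ranges.
min_ratio = rows_read_per_second[0] / rows_written_per_second[1]
max_ratio = rows_read_per_second[1] / rows_written_per_second[0]
```

That is roughly 8 to 50 logical edits per second, and between 1,250 and 20,000 rows read for every row written.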
Lessons learned: users were separate for years
● Users were required to register on each project and language independently
● Different users had the same name registered on different wikis
● From discussion to universal deployment, it took almost 10 years: https://meta.wikimedia.org/wiki/Help:Unified_login
Unicode
Escaping Latin1
● The mission of the Wikimedia Foundation is to provide free content in every language
● That was not possible with Latin1
– It only supports Western languages
Multi-language support was limited back in the day
● How many of you have ever created a database in latin1_swedish_ci?
● Real UTF-8 support beyond the BMP was added in MySQL 5.5 (utf8mb4)
● Even today, support for the latest collations is relatively new
Requirements
● Full support for all available character sets in the world
● Support for fully customizable ordering (e.g. entries within categories), which can differ depending on the language
● It has to work with the technology available 15 years ago
Brainstorming time
Wikimedia solution: character set
● Text-like fields are stored in binary fields
– Technically, they are strings with the binary character set
● The latest versions of MediaWiki allow utf8mb4, too
– It wouldn’t work for Wikimedia sites: collation support has traditionally been very limiting
tables.sql
CREATE TABLE /*_*/user (
  user_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
  user_name varchar(255) binary NOT NULL default '',
Wikimedia solution: collation
● Ordered lists of articles are avoided
● Whenever custom ordering is needed, an additional, indexed field is used to allow per-table configurable ordering
● Whenever the ordering has to be changed, only row-level changes have to be made, instead of ALTER TABLEs
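The sort-key approach can be sketched as follows: the application computes a binary sort key per row, the database only orders bytes, and switching collation becomes a batch of UPDATEs. This is a simplified sketch using SQLite; the two toy collations (`uppercase` and `accent_insensitive`) and the helper names are invented for the example and are not MediaWiki's collation classes. For brevity, `cl_from` stores the title directly rather than a page id.

```python
import sqlite3
import unicodedata


def sortkey_uppercase(title):
    return title.upper().encode("utf-8")


def sortkey_accent_insensitive(title):
    # Decompose characters and drop combining marks, so "É" sorts like "E".
    decomposed = unicodedata.normalize("NFD", title.upper())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("utf-8")


COLLATIONS = {
    "uppercase": sortkey_uppercase,
    "accent_insensitive": sortkey_accent_insensitive,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorylinks ("
    " cl_from TEXT NOT NULL, cl_to TEXT NOT NULL,"
    " cl_sortkey BLOB NOT NULL, cl_collation TEXT NOT NULL)"
)
conn.execute("CREATE INDEX cl_sortkey ON categorylinks (cl_to, cl_sortkey)")


def add_link(title, category, collation):
    key = COLLATIONS[collation](title)
    conn.execute(
        "INSERT INTO categorylinks VALUES (?, ?, ?, ?)",
        (title, category, key, collation),
    )


def change_collation(category, collation):
    """Row-level change: recompute the keys, no ALTER TABLE involved."""
    titles = conn.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_to = ?", (category,)
    ).fetchall()
    for (title,) in titles:
        conn.execute(
            "UPDATE categorylinks SET cl_sortkey = ?, cl_collation = ?"
            " WHERE cl_from = ? AND cl_to = ?",
            (COLLATIONS[collation](title), collation, title, category),
        )


def members(category):
    rows = conn.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_to = ? ORDER BY cl_sortkey",
        (category,),
    )
    return [title for (title,) in rows]


for t in ["Zebra", "\u00c9clair", "apple"]:
    add_link(t, "Food", "uppercase")
before = members("Food")
change_collation("Food", "accent_insensitive")
after = members("Food")
```

With the byte-wise `uppercase` keys the accented title sorts last; after the row-level update it sorts between the other two, and the table definition never changed.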
$wgCategoryCollation
● https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation
if ( !$dryRun ) {
    $dbw->update(
        'categorylinks',
        [
            'cl_sortkey' => $newSortKey,
            'cl_sortkey_prefix' => $prefix,
            'cl_collation' => $collationName,
            'cl_type' => $type,
            'cl_timestamp = cl_timestamp',
        ],
        [ 'cl_from' => $row->cl_from, 'cl_to' => $row->cl_to ],
        __METHOD__
    );
}
Unicode is not the only challenge
CASE #3: AN ABNORMAL DENORMALIZATION
Users on Wikimedia projects
● An account is not needed to view the content
● An account is not needed to edit the content (anonymous edits)
● Registered users get some advantages:
– Better tools for editing
– Persistent configurable preferences
Content review
● There must be a way to see the edits from the same user (all edits must be publicly reviewable)
● There must be a way to block vandalism (misbehaving users, both registered and unregistered)
● In extreme cases, there must be a way to protect content from certain user groups (page protections)
● Only trusted users should be able to destroy content
How to represent users?
● Should we use strings or arbitrary numerical ids?
– If we use strings, how to rename registered users?
– If we use arbitrary ids for registered users, how to reference non-registered ones?
● How to block returning users, including anonymous ones?
● How to allow anonymous edits in countries with doubtful privacy laws?
Brainstorming time
Wikimedia solution
● Registered users have a local user id and a string identifier
● Local wiki accounts are linked to a global unified account (SUL) on the “centralauth” database
● Anonymous users are identified by their IPv4 or IPv6 address, stored as a string
● In general, edits store both the user id and the user text
Table revision
CREATE TABLE `revision` (
  `rev_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `rev_page` int(10) unsigned NOT NULL,
  `rev_text_id` int(10) unsigned NOT NULL,
  `rev_comment` tinyblob NOT NULL,
  `rev_user` int(10) unsigned NOT NULL DEFAULT '0',
  `rev_user_text` varbinary(255) NOT NULL DEFAULT '',
  […]
) ENGINE=InnoDB AUTO_INCREMENT=N DEFAULT CHARSET=binary
Actual user content (registered user)
mysql> SELECT * FROM revision WHERE rev_id = 742218408\G
************************ 1. row ************************
rev_id: 742218408
rev_page: 46812822
rev_text_id: 750242058
rev_comment: Testing revision comment
rev_user: 25118340
rev_user_text: JCrespo (WMF)
rev_timestamp: 20161002111252
...
Actual user content (anonymous user)
mysql> SELECT * FROM revision WHERE rev_id = 742219056\G
************************ 1. row ************************
rev_id: 742219056
rev_page: 46812822
rev_text_id: 750242734
rev_comment: As an anonymous user, my public IP
will get saved, instead of a username
rev_user: 0
rev_user_text: 80.113.15.100
rev_timestamp: 20161002111851
...
Pros and cons of the current implementation
● By denormalizing the table, most of the time only this table has to be checked
– The user table is accessed only when a user is clicked
– No need to store information for the large number of anonymous users with very few edits
● User renames are painful database-wise, and almost impossible for users with a huge number of edits (like bots)
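The trade-off can be made concrete with a small SQLite sketch. Table and column names follow the slides; the sample rows (reusing the user name and IP address shown earlier), the new name and the helper functions are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER NOT NULL,
        rev_user INTEGER NOT NULL DEFAULT 0,    -- 0 means anonymous
        rev_user_text TEXT NOT NULL DEFAULT ''  -- user name or IP address
    );
    INSERT INTO user VALUES (25118340, 'JCrespo (WMF)');
    INSERT INTO revision VALUES (1, 10, 25118340, 'JCrespo (WMF)');
    INSERT INTO revision VALUES (2, 10, 0, '80.113.15.100');
    INSERT INTO revision VALUES (3, 11, 25118340, 'JCrespo (WMF)');
    """
)


def page_history(page_id):
    """Who edited a page: answered from revision alone, no join needed."""
    rows = conn.execute(
        "SELECT rev_user_text FROM revision WHERE rev_page = ? ORDER BY rev_id",
        (page_id,),
    )
    return [name for (name,) in rows]


def rename_user(user_id, new_name):
    """The cost of denormalizing: one row rewritten per revision the user made."""
    conn.execute(
        "UPDATE user SET user_name = ? WHERE user_id = ?", (new_name, user_id)
    )
    cur = conn.execute(
        "UPDATE revision SET rev_user_text = ? WHERE rev_user = ?",
        (new_name, user_id),
    )
    return cur.rowcount  # number of revision rows rewritten


history = page_history(10)
rows_rewritten = rename_user(25118340, "Jaime")
```

Here the rename touches two revision rows; for a bot with millions of edits the same statement would rewrite millions of rows.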
CASE #4: KEY-VALUE SYSTEM
Content storage
● In a typical MediaWiki installation, content is referenced this way:
– Pages can have several revisions; by default the last one is parsed and displayed
– Revisions point to a text row
– Text contains wikitext that has to be parsed (“rendered”) and sent to the user
Wikimedia sites need to return hundreds of thousands of pages per second
● The size of the content (wikitext) is multiple times that of the metadata
● The content's growth also differs from that of the metadata, and is very different for each wiki
● Should we set up a separate key-value system to store those edits? What should we look for?
– Compression
– Automatic sharding
– Automatic failover
– JSON support for flexible datatypes
Brainstorming time
Wikimedia solution
● Each page can have a different content model/storage
● For example:
– Regular wikitext pages
– User-editable JS/CSS/application messages
– Forum threads (Flow feature, etc.)
– Any other created by new extensions
Wikitext storage
● The “text” table only contains pointers to content, not real content
mysql> SELECT * FROM text ...;
+---------+------------------------------------------+---------------------+
| old_id  | old_text                                 | old_flags           |
+---------+------------------------------------------+---------------------+
|       1 | #REDIRECT [[Town of 1770]]               | utf-8               |
|       2 | #REDIRECT [[Project:One-liner listings]] | utf-8               |
...
| 3027206 | DB://cluster24/545108                    | utf-8,gzip,external |
| 3027205 | DB://cluster25/544444                    | utf-8,gzip,external |
+---------+------------------------------------------+---------------------+
External Storage
● Several shards of MySQL servers can serve content for every wiki
● The text is compressed and decompressed in gzip format at application level
● Smart “compressing” can be done:
– Revisions with the same content for the same page are deduplicated
– Older revisions are stored only as diffs
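How a row of the text table could be turned back into wikitext is sketched below. This is an assumption-laden illustration: the dictionary `CLUSTERS` stands in for the external storage MySQL servers, `zlib` stands in for whatever compression routine the application uses, and `store` and `fetch_text` are hypothetical helpers, not MediaWiki functions.

```python
import zlib

# Stand-in for the external storage shards: cluster name -> {blob_id: bytes}.
CLUSTERS = {"cluster24": {}, "cluster25": {}}


def store(cluster, blob_id, wikitext):
    """Compress at application level and return the (old_text, old_flags) pair."""
    CLUSTERS[cluster][blob_id] = zlib.compress(wikitext.encode("utf-8"))
    return f"DB://{cluster}/{blob_id}", "utf-8,gzip,external"


def fetch_text(old_text, old_flags):
    """Resolve a text row: follow the pointer if external, then decompress."""
    flags = set(old_flags.split(","))
    if "external" in flags:
        cluster, blob_id = old_text[len("DB://"):].split("/")
        data = CLUSTERS[cluster][int(blob_id)]
    else:
        data = old_text.encode("utf-8")
    if "gzip" in flags:
        data = zlib.decompress(data)
    return data.decode("utf-8")


pointer, flags = store("cluster24", 545108, "'''Amsterdam''' is a city.")
```

The metadata database only ever holds the short `DB://cluster/id` pointer, so the bulky content can grow, be compressed and be moved independently of it.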
External Storage Cluster
Solid performance over HDs
mysql> SELECT
digest, digest_text, sum(COUNT_STAR),
sum(SUM_TIMER_WAIT)/SUM(COUNT_STAR)/1000000 as microseconds,
min(first_seen), max(last_seen)
FROM events_statements_summary_by_digest
GROUP BY DIGEST ORDER BY sum(COUNT_STAR) DESC LIMIT 1\G
*************************** 1. row ***************************
digest: 1bf861e8cd3ea6bcac323bdf9caf4876
digest_text: SELECT `blob_text` FROM `blobs_cluster25` WHERE `blob_id` = ? LIMIT ?
sum(COUNT_STAR): 2356927126
microseconds: 3384.60227946
min(first_seen): 2015-12-09 15:00:45
max(last_seen): 2016-10-02 11:41:01
Speed optimizations
● External storage is only the canonical place where data is stored
– Several layers of caching make it low-traffic, disk-based storage
● Parsercache in memory (memcached) and on disk (parsercache MySQL servers) avoids frequent access to it
– Local memcached is fast
– Parsercache is shared even between datacenters, and survives restarts
CASE #5: REVISIONS AND DELETIONS
Deletions
● Pages can be deleted, to several degrees:
– A new revision could just override a previous version of the page
– A deletion that could be restored afterwards
– Personal information or copyrighted material that must be hidden from everyone
● In some cases, only some revisions should be deleted, not the whole page, e.g. someone editing an otherwise legitimate page with someone’s private data
How to implement deletions and restores?
● Should we delete content rows when doing hard deletes, or just overwrite them with garbage?
● Should we move the rows to an archive table, or should we mark them with deleted=1?
Brainstorming time
Wikimedia implementation
● Individual revisions can be hidden (“suppressed”), no matter the page status
● The normal procedure is that full pages are deleted:
– In that case, revisions are moved to the archive table
– The page entry is deleted
● On restore, revisions are moved back to the revision table
– A new page, with a new page_id, is created
– Not all revisions necessarily have to be restored
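The delete and restore flow can be sketched with SQLite as below, with each step wrapped in a transaction. Column lists are reduced to the minimum, and the `ar_` column names, `delete_page` and `restore_page` are simplified assumptions for the example, not the real schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY AUTOINCREMENT, page_title TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text TEXT);
    CREATE TABLE archive (ar_rev_id INTEGER PRIMARY KEY, ar_title TEXT, ar_text TEXT);
    INSERT INTO page (page_title) VALUES ('Example');
    INSERT INTO revision VALUES (1, 1, 'first'), (2, 1, 'second');
    """
)


def delete_page(page_id):
    with conn:  # commit on success, roll back on error
        (title,) = conn.execute(
            "SELECT page_title FROM page WHERE page_id = ?", (page_id,)
        ).fetchone()
        # Read the rows first, then insert them: no INSERT ... SELECT.
        revs = conn.execute(
            "SELECT rev_id, rev_text FROM revision WHERE rev_page = ?", (page_id,)
        ).fetchall()
        conn.executemany(
            "INSERT INTO archive VALUES (?, ?, ?)",
            [(rev_id, title, text) for rev_id, text in revs],
        )
        conn.execute("DELETE FROM page WHERE page_id = ?", (page_id,))
        conn.execute("DELETE FROM revision WHERE rev_page = ?", (page_id,))


def restore_page(title):
    with conn:
        # A brand new page row is created, so the page_id changes.
        new_id = conn.execute(
            "INSERT INTO page (page_title) VALUES (?)", (title,)
        ).lastrowid
        revs = conn.execute(
            "SELECT ar_rev_id, ar_text FROM archive WHERE ar_title = ?", (title,)
        ).fetchall()
        conn.executemany(
            "INSERT INTO revision VALUES (?, ?, ?)",
            [(rev_id, new_id, text) for rev_id, text in revs],
        )
        conn.execute("DELETE FROM archive WHERE ar_title = ?", (title,))
    return new_id


delete_page(1)
archived = conn.execute("SELECT COUNT(*) FROM archive").fetchone()[0]
new_page_id = restore_page("Example")
```

After the round trip the revisions are back in place but attached to a different `page_id`, which hints at why following the history of a text across deletions is hard.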
Deleting a page: code
/**
 * Back-end article deletion
 * Deletes the article with database consistency, writes logs, purges caches
 …
 */
public function doDeleteArticleReal(
    $reason, $suppress = false, $u1 = null, $u2 = null, &$error = '', User $user = null,
    $tags = []
) {
    $dbw = wfGetDB( DB_MASTER );
    $res = $dbw->select(
        'revision',
        array_merge( $fields, $deletionFields ),
        [ 'rev_page' => $id ],
        __METHOD__,
        'FOR UPDATE'
    );
    $dbw->insert( 'archive', $rowsInsert, __METHOD__ );
    // Now that it's safely backed up, delete it
    $dbw->delete( 'page', [ 'page_id' => $id ], __METHOD__ );
    $dbw->delete( 'revision', [ 'rev_page' => $id ], __METHOD__ );
    // Log the deletion, if the page was suppressed, put it in the suppression log instead
    $logEntry = new ManualLogEntry( $logtype, 'delete' );
INSERT … SELECTs are avoided on HEAD
Lessons learned
● INSERT … SELECTs are painful for both performance and consistency reasons
– They created actual issues when combined with filtering or on pages with many revisions
– The new implementation avoids them
● Moving rows between tables is a terrible idea
– Specifically, our implementation makes it almost impossible to track the history of a text
CASE #6: A LARGE TABLE
Revision table on enwiki
MariaDB MARIADB s1-master enwiki > SHOW CREATE TABLE revision;
CREATE TABLE `revision` (
`rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
…
) ENGINE=InnoDB AUTO_INCREMENT=741078657 DEFAULT CHARSET=binary
MariaDB MARIADB s1-master enwiki > SHOW TABLE STATUS like 'revision'\G
*************************** 1. row ***************************
Name: revision
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 614837124
Avg_row_length: 153
Data_length: 94510252032
Max_data_length: 0
Index_length: 90206896128
Point SELECTs are fast
MariaDB PRODUCTION s1 localhost performance_schema > SELECT * FROM
events_statements_summary_by_digest ORDER BY count_star DESC LIMIT 1\G
*************************** 1. row ***************************
SCHEMA_NAME: enwiki
DIGEST: ed3c3539910af27e0f7e4ea442db9124
DIGEST_TEXT: SELECT `page_id` , `page_len` ,
`page_is_redirect` , `page_latest` , `page_content_model` FROM `page` WHERE
`page_namespace` = ? AND `page_title` = ? LIMIT ?
COUNT_STAR: 37247176536
SUM_TIMER_WAIT: 6633094681562378000
MIN_TIMER_WAIT: 43209000
AVG_TIMER_WAIT: 178083000
MAX_TIMER_WAIT: 258818851000
SUM_LOCK_TIME: 1867944737116000000
...
1 row in set (0.06 sec)
+---------------------------------+
| sys.format_time(AVG_TIMER_WAIT) |
+---------------------------------+
| 178.08 us |
+---------------------------------+
Ranges are slow
Host User Schema Client Source Thread Transaction Runtime Stamp
db1066 wikiuser enwiki mw1193 - 42910008229 133321827778 318s 2016-09-25 15:31:15
SELECT /* ApiQueryContributors::execute */ rev_page AS `page`, rev_user AS `user`,
MAX(rev_user_text) AS `username` FROM `revision` WHERE rev_page = '6768170' AND
(rev_user != 0) AND ((rev_deleted & 4) = 0) GROUP BY rev_page, rev_user ORDER BY
rev_user LIMIT 501
revision is not the only tall table
MariaDB PRODUCTION s1 localhost enwiki > SHOW CREATE TABLE logging\G
*************************** 1. row ***************************
Table: logging
Create Table: CREATE TABLE `logging` (
`log_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
...
) ENGINE=InnoDB AUTO_INCREMENT=77577756 DEFAULT CHARSET=binary
1 row in set (0.00 sec)
MariaDB PRODUCTION s1 localhost enwiki > SHOW TABLE STATUS like 'logging'\G
*************************** 1. row ***************************
Name: logging
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 72871150
Avg_row_length: 164
Data_length: 11963203584
Max_data_length: 0
Index_length: 36273225728
Brainstorming time
recentchanges table
-- Primarily a summary table for Special:Recentchanges,
-- this table contains some additional info on edits from
-- the last few days, see Article::editUpdates()
--
CREATE TABLE /*_*/recentchanges (
  rc_id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
  rc_timestamp varbinary(14) NOT NULL default '',
  -- As in revision
  rc_user int unsigned NOT NULL default 0,
  rc_user_text varchar(255) binary NOT NULL,
More on:
https://phabricator.wikimedia.org/diffusion/MW/browse/master/maintenance/tables.sql;bc05426ae2708f8ac23b9106911fe35b5c51fd30$1057
recentchanges usage
●
Recentchanges is mainly a revision summary table
– Most reviews are only of recent edits -entries are
purged after 30 days:
MariaDB MARIADB s1-master enwiki > SELECT now(), min(rc_timestamp) from recentchanges;
+---------------------+-------------------+
| now() | min(rc_timestamp) |
+---------------------+-------------------+
| 2016-09-25 09:04:35 | 20160826090422 |
+---------------------+-------------------+
1 row in set (0.00 sec)
●
It is updated synchronously with the edits
●
It also contains additional fields / related tables like tags
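The 30-day window can be checked from the query output above with a little date arithmetic. This is a minimal sketch, not Mediawiki code: the helper name `parse_mw_timestamp` is an assumption made for illustration, and the two values are copied from the slide.

```python
from datetime import datetime


def parse_mw_timestamp(value: str) -> datetime:
    """Parse a 14-character YYYYMMDDHHMMSS timestamp string."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


# Values copied from the query output on the slide.
now = datetime(2016, 9, 25, 9, 4, 35)
oldest = parse_mw_timestamp("20160826090422")

# Age of the oldest row still present in recentchanges.
age = now - oldest
print(age.days, age.seconds)  # 30 13
```

The oldest surviving row is 30 days and 13 seconds old, which is consistent with a purge that removes entries older than 30 days.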
Special slaves && partitioning
●
Mediawiki allows defining instance groups for certain
queries: https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php
'db1051' => 50, # 2.8TB 96GB, watchlist, recentchanges, contributions, logpager
'db1055' => 50, # 2.8TB 96GB, watchlist, recentchanges, contributions, logpager
●
Querying contributions uses a separate group,
contributions:
https://phabricator.wikimedia.org/diffusion/MW/browse/master/includes/specials/pagers/ContribsPager.php;bc05426ae2708f8ac23b9106911fe35b5c51fd30$73
// Most of this code will use the 'contributions' group DB, which can map to replica DBs
// with extra user based indexes or partitioning by user. The additional metadata
// queries should use a regular replica DB since the lookup pattern is not all by user.
$this->mDbSecondary = wfGetDB( DB_REPLICA ); // any random replica DB
$this->mDb = wfGetDB( DB_REPLICA, 'contributions' );
enwiki tables have special partitioning
ALTER TABLE enwiki.revision
DROP PRIMARY KEY,
DROP INDEX rev_id,
ADD PRIMARY KEY (rev_id, rev_user)
PARTITION BY RANGE (rev_user) (
PARTITION p1 VALUES LESS THAN (1),
PARTITION p50000 VALUES LESS THAN (50000),
PARTITION p100000 VALUES LESS THAN (100000),
PARTITION p200000 VALUES LESS THAN (200000),
PARTITION p300000 VALUES LESS THAN (300000),
PARTITION p400000 VALUES LESS THAN (400000),
PARTITION p500000 VALUES LESS THAN (500000),
PARTITION p750000 VALUES LESS THAN (750000),
…
More on: https://phabricator.wikimedia.org/diffusion/OSOF/browse/master/dbtools/s1-pager.sql
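To see how RANGE partitioning routes a row, the lookup can be sketched outside the database. The following is an illustration only, assuming nothing beyond the bounds listed above: `partition_for` is a hypothetical helper, and because the slide elides the partitions after p750000, larger user ids return `None` here.

```python
from bisect import bisect_right
from typing import Optional

# Exclusive upper bounds copied from the ALTER TABLE above. The slide elides
# the partitions that follow p750000, so they are not listed here.
UPPER_BOUNDS = [1, 50000, 100000, 200000, 300000, 400000, 500000, 750000]


def partition_for(rev_user: int) -> Optional[str]:
    """Return the partition a rev_user value falls in, or None if elided."""
    # VALUES LESS THAN (x) is exclusive, so a value equal to a bound
    # belongs to the next partition; bisect_right gives exactly that.
    index = bisect_right(UPPER_BOUNDS, rev_user)
    if index == len(UPPER_BOUNDS):
        return None
    return f"p{UPPER_BOUNDS[index]}"


print(partition_for(0))      # p1
print(partition_for(49999))  # p50000
print(partition_for(50000))  # p100000
```

A query filtered on a single rev_user only needs to touch the one partition that holds that value, which is what makes per-user listings cheaper on the specially partitioned servers.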
The solution has problems
●
It is a server-side hack, with poor to no awareness on
the application side
– Mediawiki has to support MySQL 5.0, which has no
partitioning support
●
It is an enwiki-only patch, which is a bad idea in general
●
Special slaves are a threat to High Availability
CASE #7: WHAT LINKS HERE
Categories, templates, images and special pages
●
There is a lot of information that needs to be updated
when a new edit is made:
– The category must include the new page, in its right
position
– If it is a template or an image, all pages that include it
have to change
– The “What links here” page has to reflect the new links
– If a new page is created, links to it have to turn from red
to blue
Latency on edit has to be low
●
Users understand that an “edit” will be slower than
loading a page, but it still cannot take more than 0.5-1
second tops:
https://grafana.wikimedia.org/dashboard/db/save-timing
●
But we just said that an edit may require an update on
millions of other pages!
●
How to implement all the previous changes?
Brainstorming time
Wikimedia solution
●
Only immediate tasks are done synchronously,
transactionally:
– Wikitext content is saved on External Storage
– The recentchanges table adds a new entry for
reviewing
– The user edit count is increased in some wikis
– The page is parsed for display to the user
Background jobs
●
Many tasks are enqueued on a Redis job queue
●
Most of those tasks are cached/denormalized on database
tables for easy joining:
– Add the page to the proper category (categorylinks)
– Add the page to the proper links (pagelinks)
– Add the page to the list of templates used (templatelinks)
– Many others, such as refreshing the list of titles and its
index on Elasticsearch
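The split between immediate and deferred work can be sketched as follows. This is a toy model under stated assumptions, not Wikimedia's implementation: a `deque` stands in for the Redis job queue, plain Python containers stand in for the tables, and the function and job names are hypothetical.

```python
from collections import deque

job_queue = deque()   # in-memory stand-in for the Redis job queue
recentchanges = []    # updated synchronously with the edit
pagelinks = set()     # denormalized table, updated later by a background job


def save_edit(page_id, links):
    """Do only the immediate work, then enqueue the expensive link refresh."""
    recentchanges.append(page_id)
    job_queue.append(("refreshLinks", page_id, links))


def run_pending_jobs():
    """Background worker: drain the queue and update the denormalized table."""
    while job_queue:
        _job_type, page_id, links = job_queue.popleft()
        for target in links:
            pagelinks.add((page_id, target))


save_edit(1, ["Amsterdam", "MySQL"])
print(len(pagelinks))  # 0: the edit returned before the links were stored
run_pending_jobs()
print(len(pagelinks))  # 2
```

The edit returns as soon as the synchronous part is done; the links tables catch up shortly afterwards, so they are eventually consistent with the page content.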
doRefreshLinks() Job
function run() {
    // Job to update all (or a range of) backlink pages for a page
    $this->runForTitle( $this->title );
}
protected function runForTitle( Title $title ) {
    $revision = Revision::newFromTitle( $title, false, Revision::READ_LATEST );
    // [...] excerpt: $content and $parserOptions are set up in code not shown here
    $parserOutput = $content->getParserOutput(
        $title, $revision->getId(), $parserOptions, false );
    $updates = $content->getSecondaryDataUpdates(
        $title,
        null,
        !empty( $this->params['useRecursiveLinksUpdate'] ),
        $parserOutput
    );
    // [...]
    InfoAction::invalidateCache( $title );
    // [...]
}
*links tables
●
They store the page_id of the page where the resource is
used, but the namespace and title of what it includes, not its id
●
This is because pages reference a title, not an entity
(e.g. you can include the template {{stub}}, or link to a
page that may or may not exist)
●
Some of those tables can grow a lot, and store very
redundant data (there could be millions of rows with
the template {{cc-by-sa-2.5}} on Wikimedia Commons)
Example: PageLinks Table
SHOW CREATE TABLE pagelinks\G
*************************** 1. row ***************************
Table: pagelinks
Create Table: CREATE TABLE `pagelinks` (
`pl_from` int(8) unsigned NOT NULL DEFAULT '0',
`pl_namespace` int(11) NOT NULL DEFAULT '0',
`pl_title` varbinary(255) NOT NULL DEFAULT '',
`pl_from_namespace` int(11) NOT NULL DEFAULT '0',
UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`),
KEY `pl_namespace` (`pl_namespace`,`pl_title`,`pl_from`),
KEY `pl_backlinks_namespace`
(`pl_from_namespace`,`pl_namespace`,`pl_title`,`pl_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
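The reason links are stored by title rather than by page id becomes clearer with a small runnable model. This sketch uses SQLite from Python instead of MariaDB and keeps only the columns needed; the sample rows and the helpers `what_links_here` and `red_links` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
  page_id INTEGER PRIMARY KEY, page_namespace INT, page_title TEXT
);
CREATE TABLE pagelinks (
  pl_from INT NOT NULL, pl_namespace INT NOT NULL, pl_title TEXT NOT NULL,
  UNIQUE (pl_from, pl_namespace, pl_title)
);
CREATE INDEX pl_namespace ON pagelinks (pl_namespace, pl_title, pl_from);
INSERT INTO page VALUES (1, 0, 'Amsterdam'), (2, 0, 'Netherlands');
INSERT INTO pagelinks VALUES
  (1, 0, 'Netherlands'), (2, 0, 'Amsterdam'), (2, 0, 'Missing_page');
""")


def what_links_here(namespace, title):
    """Ids of pages linking to a title; served by the pl_namespace index."""
    rows = conn.execute(
        "SELECT pl_from FROM pagelinks"
        " WHERE pl_namespace = ? AND pl_title = ? ORDER BY pl_from",
        (namespace, title),
    )
    return [row[0] for row in rows]


def red_links(page_id):
    """Link targets of page_id that have no matching row in page."""
    rows = conn.execute(
        "SELECT pl.pl_title FROM pagelinks pl"
        " LEFT JOIN page p ON p.page_namespace = pl.pl_namespace"
        "                 AND p.page_title = pl.pl_title"
        " WHERE pl.pl_from = ? AND p.page_id IS NULL",
        (page_id,),
    )
    return [row[0] for row in rows]


print(what_links_here(0, "Amsterdam"))  # [2]
print(red_links(2))                     # ['Missing_page']
```

A link to a page that does not exist yet can still be stored, and it stops being a red link as soon as a page with that title is created, without touching pagelinks.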
Future
●
A lot of space could be saved by normalizing/deduplicating the title
text:
mysql> SHOW TABLE STATUS like '%links';
+--------------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length |
+--------------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+
| categorylinks | InnoDB | 10 | Compact | 106308892 | 210 | 22379020288 | 0 | 30838489088 |
| externallinks | InnoDB | 10 | Compact | 91491185 | 279 | 25571622912 | 0 | 40650997760 |
| imagelinks | InnoDB | 10 | Compact | 81270504 | 83 | 6765756416 | 0 | 9644818432 |
| iwlinks | InnoDB | 10 | Compact | 16909955 | 95 | 1622573056 | 0 | 2367488000 |
| langlinks | InnoDB | 10 | Compact | 26155404 | 74 | 1941733376 | 0 | 1434042368 |
| msg_resource_links | InnoDB | 10 | Compact | 3524 | 130 | 458752 | 0 | 0 |
| pagelinks | InnoDB | 10 | Compact | 1192043172 | 74 | 89018269696 | 0 | 122491224064 |
| templatelinks | InnoDB | 10 | Compact | 616611334 | 80 | 49860050944 | 0 | 66098577408 |
+--------------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+
●
That would make title a first-class entity, different from the page
entity
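One possible shape for that deduplication is sketched below with SQLite. It is only an illustration of the idea: the `title` and `templatelinks_normalized` tables, their columns and the helper `add_template_link` are hypothetical names, not an existing or planned Mediawiki schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title (                     -- hypothetical: one row per distinct title
  t_id INTEGER PRIMARY KEY,
  t_namespace INT NOT NULL,
  t_title TEXT NOT NULL,
  UNIQUE (t_namespace, t_title)
);
CREATE TABLE templatelinks_normalized (  -- hypothetical: stores an id, not the text
  tl_from INT NOT NULL,
  tl_target INT NOT NULL REFERENCES title (t_id),
  PRIMARY KEY (tl_from, tl_target)
);
""")


def add_template_link(page_id, namespace, title):
    """Store the title text once and reference it by id from the link row."""
    conn.execute(
        "INSERT OR IGNORE INTO title (t_namespace, t_title) VALUES (?, ?)",
        (namespace, title),
    )
    (target_id,) = conn.execute(
        "SELECT t_id FROM title WHERE t_namespace = ? AND t_title = ?",
        (namespace, title),
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO templatelinks_normalized VALUES (?, ?)",
        (page_id, target_id),
    )


# 1000 pages that all include the same template.
for page_id in range(1, 1001):
    add_template_link(page_id, 10, "Cc-by-sa-2.5")

titles = conn.execute("SELECT COUNT(*) FROM title").fetchone()[0]
links = conn.execute("SELECT COUNT(*) FROM templatelinks_normalized").fetchone()[0]
print(titles, links)  # 1 1000
```

The title text is stored once instead of a thousand times; the price is an extra join, or an extra lookup on write, whenever the text is needed.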
CASE #8: ANECDOTES: THE GHOST TABLES AND TIMESTAMPS
Story time
●
A recently pooled slave broke its replication with:
Error 'Table 'idwiki.hitcounter' doesn't exist' on
query. Default database: 'idwiki'. Query: 'DELETE
FROM `idwiki`.`hitcounter`'
●
The table indeed did not exist there, but it was
deprecated and no longer used by Mediawiki
●
Hackers? Replication bug? Someone doing maintenance
out-of-band?
Brainstorming time
Architecture
●
At the moment, Wikimedia servers mostly use
STATEMENT-based replication due to several
application dependencies
●
Master-Master active/passive replication was in use
between datacenters
●
Replication was temporarily routed through a slave to
deploy new TLS certificates
●
The middle slave was rebooted
hitcounter table was using the MEMORY engine
●
On server restart, a MEMORY table comes back empty, so
to avoid replication issues the server sends a DELETE command
●
The DELETE got replicated towards the remote master,
back to the primary master, and finally to the new slave
– This broke replication because the new slave didn’t
have the obsolete table, even though nothing was writing to it
●
Remember to clean up your tables on all servers to
avoid issues like this!
Timestamps and MySQL < 4.1
●
They were automatically updated on INSERT and
UPDATE, with no possibility of disabling that
●
No strict mode disallowing zero dates
●
Support for different databases and standards was needed:
https://www.mediawiki.org/wiki/Manual:WfTimestamp#Formats
How to implement timestamps?
●
They must store UTC times, controlled on the application
side
●
They must be strictly sortable to be used on listings
●
They must work on MySQL 4.0
●
They must work similarly for other database backends
Brainstorming time
Mediawiki solution
●
Timestamps are stored as binary(14)
https://www.mediawiki.org/wiki/Manual:Timestamp
●
For backwards and join compatibility, that is still the
preferred format
●
TIMESTAMP (4 bytes) now has all required features and
would be much more compact, but converting millions
of records is not worth the effort right now
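A short sketch shows why a 14-byte YYYYMMDDHHMMSS string meets the sortability requirement. It is illustrative only and does not reproduce Mediawiki's own timestamp functions; the helper names `to_mw_timestamp` and `from_mw_timestamp` and the sample dates are assumptions.

```python
from datetime import datetime, timezone

MW_FORMAT = "%Y%m%d%H%M%S"


def to_mw_timestamp(moment: datetime) -> bytes:
    """Format an aware datetime as 14 ASCII bytes, always in UTC."""
    return moment.astimezone(timezone.utc).strftime(MW_FORMAT).encode("ascii")


def from_mw_timestamp(value: bytes) -> datetime:
    """Parse the 14-byte form back into an aware UTC datetime."""
    parsed = datetime.strptime(value.decode("ascii"), MW_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


moments = [
    datetime(2016, 10, 3, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2002, 1, 15, 8, 30, 0, tzinfo=timezone.utc),
    datetime(2016, 9, 25, 9, 4, 35, tzinfo=timezone.utc),
]
stamps = [to_mw_timestamp(m) for m in moments]

# Byte-wise sorting gives chronological order because the fields run from
# most to least significant and are zero-padded to a fixed width.
assert sorted(stamps) == [to_mw_timestamp(m) for m in sorted(moments)]
print(sorted(stamps)[0], len(stamps[0]))  # b'20020115083000' 14
```

Because the comparison is plain byte order, the same values sort identically in any database backend and in application code, at a cost of 14 bytes per value instead of 4.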
tables.sql
-- The MySQL table backend for MediaWiki currently uses
-- 14-character BINARY or VARBINARY fields to store timestamps.
-- The format is YYYYMMDDHHMMSS, which is derived from the
-- text format of MySQL's TIMESTAMP fields.
--
-- Historically TIMESTAMP fields were used, but abandoned
-- in early 2002 after a lot of trouble with the fields
-- auto-updating.
--
-- The Postgres backend uses TIMESTAMPTZ fields for timestamps,
-- and we will migrate the MySQL definitions at some point as
-- well.
CREATE TABLE /*_*/logging (
-- Log ID, for referring to this specific log entry, probably for deletion and such.
log_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
-- Symbolic keys for the general log type and the action type
-- within the log. The output format will be controlled by the
-- action field, but only the type controls categorization.
log_type varbinary(32) NOT NULL default '',
log_action varbinary(32) NOT NULL default '',
-- Timestamp. Duh.
log_timestamp binary(14) NOT NULL default '19700101000000',
CASE #9: SLOTS
Structured data for Wikipedia
●
Wikitext is powerful but complex
– It is not easy to make changes to things such as categories
or infoboxes (templates)
– It is also not easy to offer them in a computer-readable
way
●
Tools like Wikidata or image metadata require an easier
way to integrate structured data
How to implement Multi-Content Revisions?
●
A page may combine multiple sources (Image & Image
metadata, wikitext and discussions, etc.)
●
A new revision could be created by editing regular text
or some of the structured data
●
The types of structured data may vary and could be
added later with new extensions
Brainstorming time
Current Multi-Content Revision proposal
●
page and revision will work as usual (except a few of their
fields will be made redundant)
●
2 new tables: content and content_revision will be
added
●
content initially will have the metadata for wikitext
●
Other types of content can be referenced by revision
through content_revision
– A revision now can handle multiple contents
The idea is to multiplex revision
●
https://phabricator.wikimedia.org/T107595
●
We go from the straightforward model:
– page -> revision -> text ( -> external store )
●
To a more indirect model:
– page -> ( revision -> ) slots -> content ->
( text | external store )
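The indirect model can be made concrete with a small SQLite sketch. Everything below is an assumption for illustration: the proposal slide names the new tables content and content_revision, the link table is called `slots` here to match the diagram, and all column names, sample rows and the reuse of unchanged content between revisions are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INT NOT NULL);
CREATE TABLE content (
  cont_id INTEGER PRIMARY KEY,
  cont_model TEXT NOT NULL,    -- hypothetical: kind of content, e.g. wikitext
  cont_address TEXT NOT NULL   -- hypothetical: where the blob is stored
);
CREATE TABLE slots (           -- hypothetical link table (revision -> content)
  slot_revision INT NOT NULL REFERENCES revision (rev_id),
  slot_role TEXT NOT NULL,
  slot_content INT NOT NULL REFERENCES content (cont_id),
  PRIMARY KEY (slot_revision, slot_role)
);
INSERT INTO revision VALUES (100, 1), (101, 1);
INSERT INTO content VALUES
  (1, 'wikitext', 'es:1'), (2, 'json', 'es:2'), (3, 'wikitext', 'es:3');
-- Revision 101 only changed the wikitext, so it points at content 2 again.
INSERT INTO slots VALUES
  (100, 'main', 1), (100, 'metadata', 2),
  (101, 'main', 3), (101, 'metadata', 2);
""")


def contents_of(rev_id):
    """All contents attached to one revision, one row per slot."""
    rows = conn.execute(
        "SELECT s.slot_role, c.cont_model, c.cont_address"
        " FROM slots s JOIN content c ON c.cont_id = s.slot_content"
        " WHERE s.slot_revision = ? ORDER BY s.slot_role",
        (rev_id,),
    )
    return rows.fetchall()


print(contents_of(101))
# [('main', 'wikitext', 'es:3'), ('metadata', 'json', 'es:2')]
```

Reading one revision now costs an extra join, and the link table holds one row per revision and content type, so it grows faster than revision itself.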
The proposal is not issue-free
●
Revision table is already very large for non-point SELECTs
– How to handle ranges when content will be an even taller
table (several content types per revision)?
– Also, previously independent tables will now be
integrated there (image_revisions, etc.)
●
Maybe a different, multi-table implementation for the
polymorphic association would be preferred?
– Normalization and ease of coding vs. performance of the
implementation on large wikis
FINAL REMARKS
“Do not let reality spoil your perfect on-paper design”
●
The most elegant design is not always the most
appropriate
– Reliability is the enemy of performance: simple, fast,
safe – choose two
– Technology available now does not normally cover
100% of the use cases; choose the one that covers 99% and
implement the other 1%
Where to learn more about design
●
Learn how others are doing it:
– Uber:
●
https://eng.uber.com/schemaless-part-one/
●
https://www.percona.com/live/plam16/sessions/relational-databases-uber-mysql-postgres
– Facebook:
●
https://www.percona.com/live/plam16/sessions/massive-schema-changes-facebook
●
https://www.facebook.com/MySQLatFacebook/
– Youtube:
https://www.percona.com/live/plam16/sessions/launching-vitess-how-run-youtubes-mysql-sharding-engine
– Pinterest: https://engineering.pinterest.com/blog/sharding-pinterest-how-we-scaled-our-mysql-fleet
●
Choose the right literature:
– https://pragprog.com/book/bksqla/sql-antipatterns
Where to learn more about Mediawiki / MySQL@Wikipedia
●
MySQL at Wikipedia Introduction:
https://www.mediawiki.org/wiki/File:MySQL_at_Wikipedia.pdf
●
Mediawiki source code and documentation:
https://www.mediawiki.org/wiki/MediaWiki
●
Wikitech (technical documentation):
https://wikitech.wikimedia.org/
●
Operations/puppet (infrastructure) git repository:
https://phabricator.wikimedia.org/diffusion/OPUP/
Was this session helpful?
●
Consider supporting the Wikimedia Foundation!
– The Wikimedia Foundation, Inc. is a nonprofit charitable organization
dedicated to encouraging the growth, development and distribution of
free, multilingual, educational content, and to providing the full content
of these wiki-based projects to the public free of charge.
– https://wikimediafoundation.org
●
You can contribute with:
– Your code (including infrastructure!):
https://phabricator.wikimedia.org/diffusion/
– Your time: https://en.wikipedia.org/wiki/Help:Editing
– Your money: https://donate.wikimedia.org/
Thank You! Remember to rate my session

Más contenido relacionado

La actualidad más candente

Plny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practicesPlny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practicesDimas Prasetyo
 
Mongo and node mongo dc 2011
Mongo and node mongo dc 2011Mongo and node mongo dc 2011
Mongo and node mongo dc 2011async_io
 
MySQL on Docker - Containerizing the Dolphin
MySQL on Docker - Containerizing the DolphinMySQL on Docker - Containerizing the Dolphin
MySQL on Docker - Containerizing the DolphinSeveralnines
 
Redis memcached pdf
Redis memcached pdfRedis memcached pdf
Redis memcached pdfErin O'Neill
 
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...Severalnines
 
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)M Malai
 
Automating and Managing MongoDB: An Analysis of Ops Manager vs. ClusterControl
Automating and Managing MongoDB: An Analysis of Ops Manager vs. ClusterControlAutomating and Managing MongoDB: An Analysis of Ops Manager vs. ClusterControl
Automating and Managing MongoDB: An Analysis of Ops Manager vs. ClusterControlSeveralnines
 
The Complete MariaDB Server Tutorial - Percona Live 2015
The Complete MariaDB Server Tutorial - Percona Live 2015The Complete MariaDB Server Tutorial - Percona Live 2015
The Complete MariaDB Server Tutorial - Percona Live 2015Colin Charles
 
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...Severalnines
 
High Performance Drupal with MariaDB
High Performance Drupal with MariaDBHigh Performance Drupal with MariaDB
High Performance Drupal with MariaDBMariaDB Corporation
 
MySQL topology healing at OLA.
MySQL topology healing at OLA.MySQL topology healing at OLA.
MySQL topology healing at OLA.Mydbops
 
Clug 2012 March web server optimisation
Clug 2012 March   web server optimisationClug 2012 March   web server optimisation
Clug 2012 March web server optimisationgrooverdan
 
How THINQ runs both transactions and analytics at scale
How THINQ runs both transactions and analytics at scaleHow THINQ runs both transactions and analytics at scale
How THINQ runs both transactions and analytics at scaleMariaDB plc
 
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise ClusterWebseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise ClusterMariaDB Corporation
 
M|18 Scalability via Expendable Resources: Containers at BlaBlaCar
M|18 Scalability via Expendable Resources: Containers at BlaBlaCarM|18 Scalability via Expendable Resources: Containers at BlaBlaCar
M|18 Scalability via Expendable Resources: Containers at BlaBlaCarMariaDB plc
 
EWD 3 Training Course Part 15: Using a Framework other than jQuery with QEWD
EWD 3 Training Course Part 15: Using a Framework other than jQuery with QEWDEWD 3 Training Course Part 15: Using a Framework other than jQuery with QEWD
EWD 3 Training Course Part 15: Using a Framework other than jQuery with QEWDRob Tweed
 

La actualidad más candente (20)

Galaxy Big Data with MariaDB
Galaxy Big Data with MariaDBGalaxy Big Data with MariaDB
Galaxy Big Data with MariaDB
 
Plny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practicesPlny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practices
 
Mongo and node mongo dc 2011
Mongo and node mongo dc 2011Mongo and node mongo dc 2011
Mongo and node mongo dc 2011
 
MySQL on Docker - Containerizing the Dolphin
MySQL on Docker - Containerizing the DolphinMySQL on Docker - Containerizing the Dolphin
MySQL on Docker - Containerizing the Dolphin
 
Redis memcached pdf
Redis memcached pdfRedis memcached pdf
Redis memcached pdf
 
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
 
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
 
Automating and Managing MongoDB: An Analysis of Ops Manager vs. ClusterControl
Automating and Managing MongoDB: An Analysis of Ops Manager vs. ClusterControlAutomating and Managing MongoDB: An Analysis of Ops Manager vs. ClusterControl
Automating and Managing MongoDB: An Analysis of Ops Manager vs. ClusterControl
 
The Complete MariaDB Server Tutorial - Percona Live 2015
The Complete MariaDB Server Tutorial - Percona Live 2015The Complete MariaDB Server Tutorial - Percona Live 2015
The Complete MariaDB Server Tutorial - Percona Live 2015
 
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
 
High Performance Drupal with MariaDB
High Performance Drupal with MariaDBHigh Performance Drupal with MariaDB
High Performance Drupal with MariaDB
 
Varnish intro
Varnish introVarnish intro
Varnish intro
 
MySQL topology healing at OLA.
MySQL topology healing at OLA.MySQL topology healing at OLA.
MySQL topology healing at OLA.
 
Clug 2012 March web server optimisation
Clug 2012 March   web server optimisationClug 2012 March   web server optimisation
Clug 2012 March web server optimisation
 
Varnish - PLNOG 4
Varnish - PLNOG 4Varnish - PLNOG 4
Varnish - PLNOG 4
 
How THINQ runs both transactions and analytics at scale
How THINQ runs both transactions and analytics at scaleHow THINQ runs both transactions and analytics at scale
How THINQ runs both transactions and analytics at scale
 
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise ClusterWebseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
 
MySQL highav Availability
MySQL highav AvailabilityMySQL highav Availability
MySQL highav Availability
 
M|18 Scalability via Expendable Resources: Containers at BlaBlaCar
M|18 Scalability via Expendable Resources: Containers at BlaBlaCarM|18 Scalability via Expendable Resources: Containers at BlaBlaCar
M|18 Scalability via Expendable Resources: Containers at BlaBlaCar
 
EWD 3 Training Course Part 15: Using a Framework other than jQuery with QEWD
EWD 3 Training Course Part 15: Using a Framework other than jQuery with QEWDEWD 3 Training Course Part 15: Using a Framework other than jQuery with QEWD
EWD 3 Training Course Part 15: Using a Framework other than jQuery with QEWD
 

Destacado

How to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinarHow to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinaroysteing
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost ModelOlav Sandstå
 
Percona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentPercona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentspil-engineering
 
Ansible for large scale deployment
Ansible for large scale deploymentAnsible for large scale deployment
Ansible for large scale deploymentRemote MySQL DBA
 
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricksQuery Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricksJaime Crespo
 
1 introduction
1 introduction1 introduction
1 introductionUtkarsh De
 
6 relational schema_design
6 relational schema_design6 relational schema_design
6 relational schema_designUtkarsh De
 
Managing your tech career
Managing your tech careerManaging your tech career
Managing your tech careerGreg Jensen
 
5 data storage_and_indexing
5 data storage_and_indexing5 data storage_and_indexing
5 data storage_and_indexingUtkarsh De
 
Best Practices for Database Schema Design
Best Practices for Database Schema DesignBest Practices for Database Schema Design
Best Practices for Database Schema DesignIron Speed
 
4 the sql_standard
4 the  sql_standard4 the  sql_standard
4 the sql_standardUtkarsh De
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performanceoysteing
 
Webinar: Build an Application Series - Session 2 - Getting Started
Webinar: Build an Application Series - Session 2 - Getting StartedWebinar: Build an Application Series - Session 2 - Getting Started
Webinar: Build an Application Series - Session 2 - Getting StartedMongoDB
 
3 relational model
3 relational model3 relational model
3 relational modelUtkarsh De
 
Advanced MySQL Query and Schema Tuning
Advanced MySQL Query and Schema TuningAdvanced MySQL Query and Schema Tuning
Advanced MySQL Query and Schema TuningMYXPLAIN
 
MySQL Replication: Pros and Cons
MySQL Replication: Pros and ConsMySQL Replication: Pros and Cons
MySQL Replication: Pros and ConsRachel Li
 
Distributed Postgres
Distributed PostgresDistributed Postgres
Distributed PostgresStas Kelvich
 
Week3 Lecture Database Design
Week3 Lecture Database DesignWeek3 Lecture Database Design
Week3 Lecture Database DesignKevin Element
 
Database Design
Database DesignDatabase Design
Database Designlearnt
 

Destacado (20)

How to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinarHow to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinar
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost Model
 
Percona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentPercona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environment
 
Ansible for large scale deployment
Ansible for large scale deploymentAnsible for large scale deployment
Ansible for large scale deployment
 
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricksQuery Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
Query Optimization with MySQL 5.7 and MariaDB 10: Even newer tricks
 
1 introduction
1 introduction1 introduction
1 introduction
 
6 relational schema_design
6 relational schema_design6 relational schema_design
6 relational schema_design
 
Managing your tech career
Managing your tech careerManaging your tech career
Managing your tech career
 
5 data storage_and_indexing
5 data storage_and_indexing5 data storage_and_indexing
5 data storage_and_indexing
 
Best Practices for Database Schema Design
Best Practices for Database Schema DesignBest Practices for Database Schema Design
Best Practices for Database Schema Design
 
Normalization
NormalizationNormalization
Normalization
 
4 the sql_standard
4 the  sql_standard4 the  sql_standard
4 the sql_standard
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
Webinar: Build an Application Series - Session 2 - Getting Started
Webinar: Build an Application Series - Session 2 - Getting StartedWebinar: Build an Application Series - Session 2 - Getting Started
Webinar: Build an Application Series - Session 2 - Getting Started
 
3 relational model
3 relational model3 relational model
3 relational model
 
Advanced MySQL Query and Schema Tuning
Advanced MySQL Query and Schema TuningAdvanced MySQL Query and Schema Tuning
Advanced MySQL Query and Schema Tuning
 
MySQL Replication: Pros and Cons
MySQL Replication: Pros and ConsMySQL Replication: Pros and Cons
MySQL Replication: Pros and Cons
 
Distributed Postgres
Distributed PostgresDistributed Postgres
Distributed Postgres
 
Week3 Lecture Database Design
Week3 Lecture Database DesignWeek3 Lecture Database Design
Week3 Lecture Database Design
 
Database Design
Database DesignDatabase Design
Database Design
 

MySQL Schema Design in Practice

  • 1. MySQL Schema design in practice © 2016 Wikimedia Foundation & Jaime Crespo. http://wikimedia.org. License: CC-BY-SA-3.0 MySQL Schema Design in Practice Jaime Crespo Percona Live Amsterdam 2016 -Amsterdam, 3 Oct 2016- https://wikitech.wikimedia.org/wiki/User:Jcrespo/plam16
  • 2. Agenda
    0. Introduction & setup
    1. Case #1: Random pages
    2. Case #2: Supporting 290 Languages
    3. Case #3: An abnormal denormalization
    4. Case #4: Key-value system
    5. Case #5: Revisions and deletions
    6. Case #6: A large table
    7. Case #7: What links here
    8. Case #8: Anecdotes: The ghost tables and Timestamps
    9. Case #9: Slots
  • 3. INTRODUCTION & SETUP
  • 4.
    ● Sr. Database Administrator at Wikimedia Foundation
    ● Used to work as a trainer for Oracle (MySQL), as a Consultant (Percona) and as a Freelance administrator (DBAHire.com)
    [Image caption: This is me fighting bad query performance]
  • 5. Schema design is key for query performance
    ● Check my past presentations at: http://www.slideshare.net/jynus/
  • 6. MediaWiki as the example application
    ● MediaWiki code distributed under GPL 2 or later
    ● All Wikimedia projects' data licensed under CC-BY-SA-2.5
  • 7. Accessing Wikimedia Production Database (I)
    ● Log in to or register a Wikimedia SUL account (for example, on https://en.wikipedia.org )
    ● Use that account to authenticate on Quarry: http://quarry.wmflabs.org/
    ● Send your queries to the right database (for example, enwiki_p)!
  • 8. Accessing Wikimedia Production Database (II)
  • 9. Session Dynamic
    ● A real database design problem is presented
    ● A brief discussion starts (5-10 minutes tops)
      – If you know the answer, let other people make proposals first; crazy ideas are encouraged
      – Assume anything you need
      – This is the place to be wrong, not to show off
    ● We analyze the proposals, weigh their strengths and weaknesses and compare them with the one in use
  • 10. Case #0: Designing a schema for Wikipedia
    ● Which entities would we need?
    ● Which relationships?
    ● What kind of queries are probably the most common ones?
    ● What do you think are the main scalability and pain points?
  • 11. Potential entities related to content
    ● Page: do we need one?
    ● Edit: different from page?
    ● Diff: Should it be a first-class entity?
    ● Revision: Large table?
  • 12. What about page types and properties?
    ● Talk pages: Same entity or separate table?
    ● Categories, images (files), redirections: are they regular pages or should they be stored as their own entities?
    ● Similar question for image description pages
    ● Categories and tags for pages/revisions: how to implement them?
    ● Protection: Some pages can have restrictions on who can edit them
    ● Other properties (tags?)
  • 13. Identifiers
    ● How to solve the problem of 2 pages that should have the same title: Ben Hur (1959 film) and Ben Hur (2016 film)?
    ● Should we use the page name or an arbitrary id to identify a page?
    ● What about revision ids?
    ● If we need an id, should it be a UUID or a numerical id?
  • 14. Brainstorming time
  • 15. MediaWiki Schema
  • 16. Disclaimers
    ● The best solution on paper is not necessarily the best in production
      – It may be too difficult to migrate existing logic (15-year-old application) or not worth it
      – Performance is not the only metric: Security, scalability, reliability, simplicity, etc.
    ● You will see many examples of compromises like this in our application
  • 17. CASE #1: RANDOM PAGES
  • 18. Problem #1 Description
    ● At the left of each page there is a link to a “Random article”
    ● It should allow filtering by namespace (e.g. not all pages are articles)
    ● It is a relatively important page compared to, let’s say, Google’s “I am feeling lucky”, as it gives an overview of what kind of articles you will find in a project
  • 19. Problem #1 Restrictions
    ● It has to work as fast as a regular page, but for obvious reasons cannot be cached
    ● It has to be realistically pseudorandom
    ● It has to always return a result
    ● It has to work on a continuously increasing number of pages, and scale from 1 to millions
  • 20. Potential solutions
    ● ORDER BY rand() LIMIT 1
    ● Use a well-distributed integer id, use it to get one at random
    ● Questions?
      – What indexes would be beneficial in each case?
      – How to count the number of total ids?
      – How to handle deletions?
  • 21. Brainstorming time
  • 22. Actual solution: Table design (I)
    CREATE TABLE /*_*/page (
      […]
      -- A page name is broken into a namespace and a title.
      -- The namespace keys are UI-language-independent constants,
      -- defined in includes/Defines.php
      page_namespace int NOT NULL,
      -- The rest of the title, as text.
      -- Spaces are transformed into underscores in title storage.
      page_title varchar(255) binary NOT NULL,
      -- 1 indicates the article is a redirect.
      page_is_redirect tinyint unsigned NOT NULL default 0,
      […]
      -- Random value between 0 and 1, used for Special:Randompage
      page_random real unsigned NOT NULL,
  • 23. Actual solution: Table design (II)
    […]
    CREATE INDEX /*i*/page_random ON /*_*/page (page_random);
  • 24. Actual solution: Relevant code
    protected function getQueryInfo( $randstr ) {
        $redirect = $this->isRedirect() ? 1 : 0;
        $tables = [ 'page' ];
        $conds = array_merge( [
            'page_namespace' => $this->namespaces,
            'page_is_redirect' => $redirect,
            'page_random >= ' . $randstr
        ], $this->extra );
        $joinConds = [];
        // Allow extensions to modify the query
        Hooks::run( 'RandomPageQuery', [ &$tables, &$conds, &$joinConds ] );
        return [
            'tables' => $tables,
            'fields' => [ 'page_title', 'page_namespace' ],
            'conds' => $conds,
            'options' => [
                'ORDER BY' => 'page_random',
                'LIMIT' => 1,
            ],
            'join_conds' => $joinConds
        ];
    }
    From: mediawiki/core/includes/specials/SpecialRandompage.php
  • 25. Actual solution: Query generated
    SELECT page_title, page_namespace
    FROM page
    LEFT JOIN page_props
      ON page_id = pp_page AND pp_propname = ?
    WHERE page_namespace IN (…)
      AND page_is_redirect = 0
      AND page_random >= $rand
    ORDER BY page_random
    LIMIT 1;
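    The lookup on the slide above can be tried outside MySQL. What follows is a minimal sketch, not MediaWiki's code: it uses Python's built-in sqlite3 module instead of MySQL, a cut-down `page` table, and hypothetical helper names (`build_demo_db`, `random_page`, `idx_page_random`). The retry from 0 when nothing lies above the drawn value is an assumption about how the "always return a result" restriction could be met.

```python
import random
import sqlite3

# Same shape as the generated query, reduced to one namespace and without
# the page_props join (both simplifications are assumptions of this sketch).
QUERY = """
SELECT page_title, page_namespace
FROM page
WHERE page_namespace = ?
  AND page_is_redirect = 0
  AND page_random >= ?
ORDER BY page_random
LIMIT 1
"""


def build_demo_db(n_pages=1000, seed=42):
    """Create an in-memory table with a precomputed random column and an index on it."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page ("
        " page_id INTEGER PRIMARY KEY,"
        " page_namespace INTEGER NOT NULL,"
        " page_title TEXT NOT NULL,"
        " page_is_redirect INTEGER NOT NULL DEFAULT 0,"
        " page_random REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_page_random ON page (page_random)")
    rows = [
        (i % 2, f"Page_{i}", int(i % 10 == 0), rng.random())
        for i in range(n_pages)
    ]
    conn.executemany(
        "INSERT INTO page (page_namespace, page_title, page_is_redirect, page_random)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def random_page(conn, namespace, rand=None):
    """Return (title, namespace) of the first page at or above a random point."""
    if rand is None:
        rand = random.random()
    row = conn.execute(QUERY, (namespace, rand)).fetchone()
    if row is None:
        # Nothing above the drawn value: wrap around and start from 0 so a
        # non-empty namespace always yields a page.
        row = conn.execute(QUERY, (namespace, 0.0)).fetchone()
    return row


if __name__ == "__main__":
    conn = build_demo_db()
    print(random_page(conn, namespace=0))
```

    Because the random value is stored and indexed, each lookup is a short index range scan rather than a sort of the whole table, which is what ORDER BY rand() would require.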
  • 26. Actual solution: Performance (I)
    mysql> EXPLAIN SELECT … \G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: page
             type: range
    possible_keys: name_title,page_random,page_redirect_namespace_len
              key: page_random
          key_len: 8
              ref: NULL
             rows: 20473233
            Extra: Using where
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            table: page_props
             type: eq_ref
    possible_keys: PRIMARY,pp_propname_page,pp_propname_sortkey_page
              key: PRIMARY
          key_len: 66
              ref: enwiki.page.page_id,const
             rows: 1
            Extra: Using where; Using index; Not exists
  • 27. Actual solution: Performance (II)
    mysql> SELECT … FROM sys.x$statement_analysis …
    *************************** 1. row ***************************
           exec_count: 27126203
          max_latency: 450802755000
          avg_latency: 755698000
         lock_latency: 2224515688000000
            rows_sent: 27125869
        rows_sent_avg: 1
        rows_examined: 0
    rows_examined_avg: 0
        rows_affected: 0
    rows_affected_avg: 0
           tmp_tables: 0
      tmp_disk_tables: 0
          rows_sorted: 777598
    sort_merge_passes: 0
    +------------------------------+
    | sys.format_time(avg_latency) |
    +------------------------------+
    | 755.70 us                    |
    +------------------------------+
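    The latency columns in the sys schema's x$ views are raw picosecond counts, which is why the slide passes avg_latency through sys.format_time. As a small worked check of that arithmetic, the sketch below converts the two latency figures from the slide by hand; the helper names are hypothetical and only the unit conversion is being illustrated.

```python
# Raw values copied from the statement_analysis output on the slide (picoseconds).
AVG_LATENCY_PS = 755_698_000
MAX_LATENCY_PS = 450_802_755_000

PS_PER_MICROSECOND = 1_000_000
PS_PER_MILLISECOND = 1_000_000_000


def format_us(picoseconds):
    """Render a picosecond count as microseconds with two decimals."""
    return f"{picoseconds / PS_PER_MICROSECOND:.2f} us"


def format_ms(picoseconds):
    """Render a picosecond count as milliseconds with two decimals."""
    return f"{picoseconds / PS_PER_MILLISECOND:.2f} ms"


if __name__ == "__main__":
    print("average:", format_us(AVG_LATENCY_PS))  # matches the 755.70 us on the slide
    print("maximum:", format_ms(MAX_LATENCY_PS))
```

    So the average random-page query takes well under a millisecond, while the slowest recorded execution took roughly 0.45 seconds.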
  • 28. CASE #2: SUPPORTING 290 LANGUAGES
  • 29. Wikipedia launch and early growth
    ● Wikipedia was launched on January 15, 2001, as a single English-language edition
    ● By August 8, 2001, Wikipedia had over 8,000 articles.
    ● On September 25, 2001, Wikipedia had over 13,000 articles.
    ● By the end of 2001, it had grown to approximately 20,000 articles and 18 language editions.
    References: https://en.wikipedia.org/wiki/Wikipedia#History
  • 30. Single-tenancy vs Multi-tenancy
    ● 1 database per wiki:
      – Easier to code
      – Easier to scale (?): you can move wikis to a different server
    ● Several wikis on a single database:
      – More efficient, especially for small wikis
      – They can share an existing user database
  • 31. Should we Shard?
    ● From early on, people were telling us “you need to shard to scale”
    ● Is it really such a bad idea? When is it needed? When can it be avoided?
    ● If we shard, based on which key?
  • 32. Brainstorming time
  • 33. Wikimedia solution
    ● As of this writing, the Wikimedia Foundation hosts 897 wiki-like projects (different types and languages)
    ● They are divided into 7 “shards” (functional partitions)
      – 1 master per shard and datacenter
      – Multiple slaves sharing the read load
    ● English Wikipedia has its own separate shard (s1)
    ● s3 hosts most of the wikis (892)
  • 34. Functional partitioning
    [Figure: replication tree of the shards. Source: https://dbtree.wikimedia.org]
  • 35. No sharding
    ● Our number of edits is “low” compared to our reads
      – Only 500-3000 logical page edits per minute: https://grafana.wikimedia.org/dashboard/db/edit-count
      – That means 2000-8000 unique rows written per second: https://grafana.wikimedia.org/dashboard/db/mysql-aggregated
      – Compared to ~300K total QPS (10-40M rows read/s)
  • 36. Lessons learned: users were separate for years
    ● Users were required to register on each project and language independently
    ● Different users had the same name registered on different wikis
    ● From discussion to universal deployment, it took almost 10 years: https://meta.wikimedia.org/wiki/Help:Unified_login
  • 37. Unicode
  • 38. Escaping Latin1
    ● The mission of the Wikimedia Foundation is to provide free content in every language
    ● That was not possible with Latin1
      – It only supports Western languages
  • 39. Multi-language support was limited back in the day
    ● How many of you have ever created a database in latin1_swedish_ci?
    ● Real UTF-8 support beyond the BMP was added in MySQL 5.5 (utf8mb4)
    ● Even today, support for the latest collations is relatively new
  • 40. Requirements
    ● Full support of all available character sets in the world
    ● Support for fully customizable ordering (e.g. entries within categories), which can differ depending on the language
    ● It has to work with the technology available 15 years ago
  • 41. Brainstorming time
  • 42. Wikimedia solution: character set
    ● Text-like fields are stored in binary fields
      – Technically, they are strings with the binary charset set
    ● Latest versions of MediaWiki allow utf8mb4, too
      – It wouldn’t work for Wikimedia sites: collation support has traditionally been very limiting
  • 43. tables.sql
    CREATE TABLE /*_*/user (
      user_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
      user_name varchar(255) binary NOT NULL default '',
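A minimal sketch of what storing names in binary fields implies, assuming a hypothetical table `user_demo` (not part of MediaWiki; only the column definition mirrors the slide). With the binary charset, comparison and uniqueness are byte-wise, so any case folding or normalization is left to the application.

```sql
-- Hypothetical demo table for illustration only.
CREATE TABLE user_demo (
  user_id   INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
  user_name VARBINARY(255) NOT NULL DEFAULT '',
  UNIQUE KEY user_name (user_name)
) ENGINE=InnoDB DEFAULT CHARSET=binary;

-- Both rows are accepted: 'Jaime' and 'jaime' differ at the byte level,
-- so the unique key does not treat them as duplicates.
INSERT INTO user_demo (user_name) VALUES ('Jaime'), ('jaime');

-- Byte-wise match: only the row stored exactly as 'jaime' is returned.
SELECT user_id, user_name FROM user_demo WHERE user_name = 'jaime';

-- Ordering is by byte value, not by any language-aware collation.
SELECT user_name FROM user_demo ORDER BY user_name;
```

The design choice is that MySQL never interprets the bytes, which sidesteps charset and collation limits at the cost of doing all language-aware work in the application.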
  • 44. Wikimedia solution: collation
    ● Lists of ordered articles are avoided
    ● Whenever custom ordering is needed, an additional, indexed field is used to allow per-table configurable ordering
    ● Whenever the ordering has to be changed, only row-level changes have to be done, instead of ALTER TABLEs
  • 45. $wgCategoryCollation
    ● https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation
    if ( !$dryRun ) {
        $dbw->update(
            'categorylinks',
            [
                'cl_sortkey' => $newSortKey,
                'cl_sortkey_prefix' => $prefix,
                'cl_collation' => $collationName,
                'cl_type' => $type,
                'cl_timestamp = cl_timestamp',
            ],
            [ 'cl_from' => $row->cl_from, 'cl_to' => $row->cl_to ],
            __METHOD__
        );
    }
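The sort-key approach can be sketched in SQL as follows. This is a simplified, hypothetical `categorylinks_demo` table; the column names come from the update shown on the slide, but the sizes, the index definition and the sample values are assumptions.

```sql
-- Hypothetical, simplified version of categorylinks for illustration.
CREATE TABLE categorylinks_demo (
  cl_from      INT UNSIGNED   NOT NULL,
  cl_to        VARBINARY(255) NOT NULL,
  cl_sortkey   VARBINARY(230) NOT NULL DEFAULT '',
  cl_collation VARBINARY(32)  NOT NULL DEFAULT '',
  PRIMARY KEY (cl_from, cl_to),
  KEY cl_sortkey (cl_to, cl_sortkey, cl_from)
) ENGINE=InnoDB DEFAULT CHARSET=binary;

-- Listing a category in display order: the precomputed key is read in
-- index order, independent of any MySQL collation.
SELECT cl_from
FROM categorylinks_demo
WHERE cl_to = 'Cities_in_the_Netherlands'
ORDER BY cl_sortkey, cl_from
LIMIT 200;

-- Changing the ordering rules: the application recomputes the key and
-- rewrites the affected rows; no ALTER TABLE is involved.
UPDATE categorylinks_demo
SET cl_sortkey = 'AMSTERDAM', cl_collation = 'uppercase'
WHERE cl_from = 844 AND cl_to = 'Cities_in_the_Netherlands';
```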
  • 46. Unicode is not the only challenge
  • 47. CASE #3: AN ABNORMAL DENORMALIZATION
  • 48. Users on Wikimedia projects
    ● An account is not needed to view the content
    ● An account is not needed to edit the content (anonymous edits)
    ● Registered users get some advantages:
      – Better tools for editing
      – Persistent configurable preferences
  • 49. Content review
    ● There must be a way to see all the edits from the same user (all edits must be publicly reviewable)
    ● There must be a way to block “vandalism” (misbehaving users, both registered and unregistered)
    ● In extreme cases, there must be a way to protect content from certain user groups (page protections)
    ● Only trusted users should be able to destroy content
  • 50. How to represent users?
    ● Should we use strings or arbitrary numerical ids?
      – If we use strings, how to rename registered users?
      – If we use arbitrary ids for registered users, how to reference non-registered ones?
    ● How to be able to block returning users, including anonymous ones?
    ● How to allow anonymous edits in countries with doubtful privacy laws?
  • 51. Brainstorming time
  • 52. Wikimedia solution
    ● Registered users have a local user id and a string identifier
    ● Local wiki accounts are linked to a global unified account (SUL) on the “centralauth” database
    ● Anonymous users are identified by their IPv4 or IPv6 string
    ● In general, edits store both the id and the user text
  • 53. Table revision
    CREATE TABLE `revision` (
      `rev_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `rev_page` int(10) unsigned NOT NULL,
      `rev_text_id` int(10) unsigned NOT NULL,
      `rev_comment` tinyblob NOT NULL,
      `rev_user` int(10) unsigned NOT NULL DEFAULT '0',
      `rev_user_text` varbinary(255) NOT NULL DEFAULT '',
      […]
    ) ENGINE=InnoDB AUTO_INCREMENT=N DEFAULT CHARSET=binary
  • 54. Actual user content (registered user)
    mysql> SELECT * FROM revision WHERE rev_id = 742218408\G
    ************************ 1. row ************************
    rev_id: 742218408
    rev_page: 46812822
    rev_text_id: 750242058
    rev_comment: Testing revision comment
    rev_user: 25118340
    rev_user_text: JCrespo (WMF)
    rev_timestamp: 20161002111252
    ...
  • 55. Actual user content (anonymous user)
    mysql> SELECT * FROM revision WHERE rev_id = 742219056\G
    ************************ 1. row ************************
    rev_id: 742219056
    rev_page: 46812822
    rev_text_id: 750242734
    rev_comment: As an anonymous user, my public IP will get saved, instead of a username
    rev_user: 0
    rev_user_text: 80.113.15.100
    rev_timestamp: 20161002111851
    ...
  • 56. Pros and cons of the current implementation
    ● By denormalizing the table, most of the time only this table has to be checked
      – The user table is accessed only when a user is clicked
      – No need to store information for the large number of anonymous users with very few edits
    ● User renames are painful database-wise, and almost impossible for users with a huge number of edits (like bots)
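A sketch of both sides of this trade-off, using the revision columns and ids shown on the previous slides. The new user name and the rev_id batch boundaries are made-up values for illustration.

```sql
-- Pro: a page history shows author names straight from revision,
-- with no join against the user table.
SELECT rev_id, rev_timestamp, rev_user, rev_user_text
FROM revision
WHERE rev_page = 46812822
ORDER BY rev_timestamp DESC
LIMIT 50;

-- Con: a rename has to rewrite every revision row of that user.
-- Splitting the work by primary key range (hypothetical boundaries)
-- keeps each transaction small, but a bot with millions of edits
-- still needs a very large number of such batches.
UPDATE revision
SET rev_user_text = 'New name'
WHERE rev_user = 25118340
  AND rev_id BETWEEN 742000000 AND 742999999;
```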
  • 57. CASE #4: KEY-VALUE SYSTEM
  • 58. Content storage
    ● In the typical MediaWiki installation, content is referenced this way:
      – Pages can have several revisions; by default the last one is parsed and displayed
      – Revisions point to a text row
      – Text contains wikitext that has to be parsed (“rendered”) and sent to the user
  • 59. Wikimedia sites need to return hundreds of thousands of pages per second
    ● The size of the content (wikitext) is multiple times that of the metadata
    ● The database growth is also different from the metadata, and very different for each wiki
    ● Should we set up a separate key-value system to store those edits? What should we seek?
      – Compression
      – Automatic sharding
      – Automatic failover
      – JSON support for flexible datatypes
  • 60. Brainstorming time
  • 61. Wikimedia solution
    ● Each page can have a different content model/storage
    ● For example:
      – Regular Wikitext pages
      – User-editable JS/CSS/application messages
      – Forum threads (Flow feature, etc.)
      – Any other created by new extensions
  • 62. Wikitext storage
    ● The “text” table only contains pointers to content, not real content
    mysql> SELECT * FROM text ...;
    +---------+------------------------------------------+---------------------+
    | old_id  | old_text                                 | old_flags           |
    +---------+------------------------------------------+---------------------+
    |       1 | #REDIRECT [[Town of 1770]]               | utf-8               |
    |       2 | #REDIRECT [[Project:One-liner listings]] | utf-8               |
    ...
    | 3027206 | DB://cluster24/545108                    | utf-8,gzip,external |
    | 3027205 | DB://cluster25/544444                    | utf-8,gzip,external |
    +---------+------------------------------------------+---------------------+
  • 63. External Storage
    ● Several shards of MySQL servers can serve content for every wiki
    ● The text is compressed and decompressed in gzip format at application level
    ● Smart “compressing” can be done:
      – Revisions with the same content for the same page are deduplicated
      – Older revisions are stored only using diffs
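The pointer resolution can be sketched as three steps. The table `blobs_cluster25` and its columns are taken from the query digest shown two slides later; doing the string splitting in SQL is purely illustrative, since in practice that step belongs to the application.

```sql
-- Step 1 (core database): fetch the pointer and its flags.
SELECT old_text, old_flags FROM text WHERE old_id = 3027205;

-- Step 2 (application logic, written in SQL only for illustration):
-- split 'DB://cluster25/544444' into a cluster name and a blob id.
SELECT
  SUBSTRING_INDEX(SUBSTRING_INDEX('DB://cluster25/544444', '/', -2), '/', 1) AS cluster,
  SUBSTRING_INDEX('DB://cluster25/544444', '/', -1) AS blob_id;
-- cluster = 'cluster25', blob_id = '544444'

-- Step 3 (External Storage server of that cluster): point lookup by id.
SELECT blob_text FROM blobs_cluster25 WHERE blob_id = 544444 LIMIT 1;
```

Because the `gzip` flag is set on the row, the application would then decompress `blob_text` before parsing it.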
  • 64. External Storage Cluster
    [Figure: External Storage cluster]
  • 65. Solid performance over HDs
    mysql> SELECT digest, digest_text, sum(COUNT_STAR),
                  sum(SUM_TIMER_WAIT)/SUM(COUNT_STAR)/1000000 as microseconds,
                  min(first_seen), max(last_seen)
           FROM events_statements_summary_by_digest
           GROUP BY DIGEST
           ORDER BY sum(COUNT_STAR) DESC LIMIT 1\G
    *************************** 1. row ***************************
    digest: 1bf861e8cd3ea6bcac323bdf9caf4876
    digest_text: SELECT `blob_text` FROM `blobs_cluster25` WHERE `blob_id` = ? LIMIT ?
    sum(COUNT_STAR): 2356927126
    microseconds: 3384.60227946
    min(first_seen): 2015-12-09 15:00:45
    max(last_seen): 2016-10-02 11:41:01
  • 66. Speed optimizations
    ● External storage is only the canonical place where data is stored
      – Several layers of caching make it low-traffic, disk-based storage
    ● Parsercache in memory (memcached) and on disk (parsercache MySQL servers) avoids frequent usage
      – Local memcache is fast
      – Parsercache is shared even between datacenters, and survives restarts
  • 67. CASE #5: REVISIONS AND DELETIONS
  • 68. Deletions
    ● Pages can be deleted, to several degrees:
      – A new revision could just override a previous version of the page
      – A deletion that could be restored afterwards
      – Personal information or copyrighted material that must be hidden from everyone
    ● In some cases, only some revisions should be deleted, not the whole page, e.g. someone editing an otherwise legitimate page to add someone’s private data
  • 69. How to implement deletions and restores?
    ● Should we delete content rows when doing hard-deletes or just overwrite them with garbage?
    ● Should we move the rows to an archive table or should we mark them with deleted=1?
  • 70. Brainstorming time
  • 71. Wikimedia implementation
    ● Individual revisions can be hidden (“suppressed”), no matter the page status
    ● The normal procedure is to delete full pages:
      – In that case, revisions are moved to the archive table
      – The page entry is deleted
    ● On restore, revisions are moved back to the revision table
      – A new page, with a new page_id, is created
      – Not all revisions necessarily have to be restored
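Hiding an individual revision can be sketched with a bit field. The `rev_deleted` column and the test `(rev_deleted & 4) = 0` appear in a production query later in this deck; reading bit value 4 as "author hidden" is an assumption here, and the ids are reused from the earlier revision examples.

```sql
-- Hide the author of a single revision by setting bit 4
-- (assumed meaning: user name hidden), leaving the other bits untouched.
UPDATE revision
SET rev_deleted = rev_deleted | 4
WHERE rev_id = 742219056;

-- Public listings filter on the same bit, as the production query does.
SELECT rev_id, rev_user, rev_user_text
FROM revision
WHERE rev_page = 46812822
  AND (rev_deleted & 4) = 0;

-- Unhide: clear the bit again.
UPDATE revision
SET rev_deleted = rev_deleted & ~4
WHERE rev_id = 742219056;
```

Unlike a full page deletion, this flips a flag in place, so no rows move between tables.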
  • 73. Deleting a page: code (excerpt)
    /**
     * Back-end article deletion
     * Deletes the article with database consistency, writes logs, purges caches …
     */
    public function doDeleteArticleReal(
        $reason, $suppress = false, $u1 = null, $u2 = null, &$error = '',
        User $user = null, $tags = []
    ) {
        $dbw = wfGetDB( DB_MASTER );
        $res = $dbw->select(
            'revision',
            array_merge( $fields, $deletionFields ),
            [ 'rev_page' => $id ],
            __METHOD__,
            'FOR UPDATE'
        );
        $dbw->insert( 'archive', $rowsInsert, __METHOD__ );
        // Now that it's safely backed up, delete it
        $dbw->delete( 'page', [ 'page_id' => $id ], __METHOD__ );
        $dbw->delete( 'revision', [ 'rev_page' => $id ], __METHOD__ );
        // Log the deletion, if the page was suppressed, put it in the suppression log instead
        $logEntry = new ManualLogEntry( $logtype, 'delete' );
    INSERT … SELECTs are avoided on HEAD
  • 74. Lessons learned
    ● INSERT … SELECTs are painful for performance and/or consistency reasons
      – They created actual issues when combined with filtering or on pages with many revisions
      – The new implementation avoids them
    ● Moving rows between tables is a terrible idea
      – Specifically, our implementation makes it almost impossible to track the history of a text
  • 75. CASE #6: A LARGE TABLE
  • 76. Revision table on enwiki
    MariaDB MARIADB s1-master enwiki > SHOW CREATE TABLE revision;
    CREATE TABLE `revision` (
      `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
      …
    ) ENGINE=InnoDB AUTO_INCREMENT=741078657 DEFAULT CHARSET=binary
    MariaDB MARIADB s1-master enwiki > SHOW TABLE STATUS like 'revision'\G
    *************************** 1. row ***************************
    Name: revision
    Engine: InnoDB
    Version: 10
    Row_format: Compact
    Rows: 614837124
    Avg_row_length: 153
    Data_length: 94510252032
    Max_data_length: 0
    Index_length: 90206896128
  • 77. Point SELECTs are fast
    MariaDB PRODUCTION s1 localhost performance_schema > SELECT * FROM events_statements_summary_by_digest ORDER BY count_star DESC LIMIT 1\G
    *************************** 1. row ***************************
    SCHEMA_NAME: enwiki
    DIGEST: ed3c3539910af27e0f7e4ea442db9124
    DIGEST_TEXT: SELECT `page_id` , `page_len` , `page_is_redirect` , `page_latest` , `page_content_model` FROM `page` WHERE `page_namespace` = ? AND `page_title` = ? LIMIT ?
    COUNT_STAR: 37247176536
    SUM_TIMER_WAIT: 6633094681562378000
    MIN_TIMER_WAIT: 43209000
    AVG_TIMER_WAIT: 178083000
    MAX_TIMER_WAIT: 258818851000
    SUM_LOCK_TIME: 1867944737116000000
    ...
    1 row in set (0.06 sec)
    +---------------------------------+
    | sys.format_time(AVG_TIMER_WAIT) |
    +---------------------------------+
    | 178.08 us                       |
    +---------------------------------+
  • 78. Ranges are slow
    Host    User      Schema  Client  Source  Thread       Transaction   Runtime  Stamp
    db1066  wikiuser  enwiki  mw1193  -       42910008229  133321827778  318s     2016-09-25 15:31:15
    SELECT /* ApiQueryContributors::execute */
      rev_page AS `page`, rev_user AS `user`, MAX(rev_user_text) AS `username`
    FROM `revision`
    WHERE rev_page = '6768170' AND (rev_user != 0) AND ((rev_deleted & 4) = 0)
    GROUP BY rev_page, rev_user
    ORDER BY rev_user
    LIMIT 501
  • 79. revision is not the only tall table
    MariaDB PRODUCTION s1 localhost enwiki > SHOW CREATE TABLE logging\G
    *************************** 1. row ***************************
    Table: logging
    Create Table: CREATE TABLE `logging` (
      `log_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      ...
    ) ENGINE=InnoDB AUTO_INCREMENT=77577756 DEFAULT CHARSET=binary
    1 row in set (0.00 sec)
    MariaDB PRODUCTION s1 localhost enwiki > SHOW TABLE STATUS like 'logging'\G
    *************************** 1. row ***************************
    Name: logging
    Engine: InnoDB
    Version: 10
    Row_format: Compact
    Rows: 72871150
    Avg_row_length: 164
    Data_length: 11963203584
    Max_data_length: 0
    Index_length: 36273225728
  • 80. Brainstorming time
  • 81. recentchanges table
    -- Primarily a summary table for Special:Recentchanges,
    -- this table contains some additional info on edits from
    -- the last few days, see Article::editUpdates()
    --
    CREATE TABLE /*_*/recentchanges (
      rc_id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
      rc_timestamp varbinary(14) NOT NULL default '',
      -- As in revision
      rc_user int unsigned NOT NULL default 0,
      rc_user_text varchar(255) binary NOT NULL,
    More on: https://phabricator.wikimedia.org/diffusion/MW/browse/master/maintenance/tables.sql;bc05426ae2708f8ac23b9106911fe35b5c51fd30$1057
  • 82. recentchanges usage
    ● Recentchanges is mainly a revision summary table
      – Most reviews are only of recent edits, so entries are purged after 30 days:
    MariaDB MARIADB s1-master enwiki > SELECT now(), min(rc_timestamp) from recentchanges;
    +---------------------+-------------------+
    | now()               | min(rc_timestamp) |
    +---------------------+-------------------+
    | 2016-09-25 09:04:35 | 20160826090422    |
    +---------------------+-------------------+
    1 row in set (0.00 sec)
    ● It is updated synchronously with the edits
    ● It also contains additional fields / related tables like tags
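A 30-day purge of this kind could be sketched as below. This is not MediaWiki's actual maintenance code; the batch size and the user variables `@cutoff` and `@batch_end` are assumptions, while `rc_id` and `rc_timestamp` are the columns shown above.

```sql
-- Cutoff in the same YYYYMMDDHHMMSS text format that rc_timestamp uses.
SET @cutoff = DATE_FORMAT(UTC_TIMESTAMP() - INTERVAL 30 DAY, '%Y%m%d%H%i%s');

-- Pick the id boundary of the next batch of at most 1000 expired rows.
SELECT MAX(rc_id) INTO @batch_end
FROM (
  SELECT rc_id
  FROM recentchanges
  WHERE rc_timestamp < @cutoff
  ORDER BY rc_id
  LIMIT 1000
) AS batch;

-- Delete by primary key range instead of DELETE … LIMIT, so the
-- statement touches the same rows when replayed on replicas.
-- If nothing has expired, @batch_end is NULL and no row matches.
DELETE FROM recentchanges
WHERE rc_id <= @batch_end
  AND rc_timestamp < @cutoff;
```

Repeating the last two statements until no rows are deleted keeps each transaction short on a table that is written on every edit.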
  • 83. Special slaves && partitioning
    ● MediaWiki allows defining instance groups for certain queries: https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php
    'db1051' => 50, # 2.8TB 96GB, watchlist, recentchanges, contributions, logpager
    'db1055' => 50, # 2.8TB 96GB, watchlist, recentchanges, contributions, logpager
    ● Querying contributions uses a separate group, contributions: https://phabricator.wikimedia.org/diffusion/MW/browse/master/includes/specials/pagers/ContribsPager.php;bc05426ae2708f8ac23b9106911fe35b5c51fd30$73
    // Most of this code will use the 'contributions' group DB, which can map to replica Dbs
    // with extra user based indexes or partioning by user. The additional metadata
    // queries should use a regular replica DB since the lookup pattern is not all by user.
    $this->mDbSecondary = wfGetDB( DB_REPLICA ); // any random replica DB
    $this->mDb = wfGetDB( DB_REPLICA, 'contributions' );
  • 84. enwiki tables have special partitioning
    ALTER TABLE enwiki.revision
      DROP PRIMARY KEY,
      DROP INDEX rev_id,
      ADD PRIMARY KEY (rev_id, rev_user)
      PARTITION BY RANGE (rev_user) (
        PARTITION p1 VALUES LESS THAN (1),
        PARTITION p50000 VALUES LESS THAN (50000),
        PARTITION p100000 VALUES LESS THAN (100000),
        PARTITION p200000 VALUES LESS THAN (200000),
        PARTITION p300000 VALUES LESS THAN (300000),
        PARTITION p400000 VALUES LESS THAN (400000),
        PARTITION p500000 VALUES LESS THAN (500000),
        PARTITION p750000 VALUES LESS THAN (750000),
        …
    More on: https://phabricator.wikimedia.org/diffusion/OSOF/browse/master/dbtools/s1-pager.sql
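A sketch of how such a layout is meant to be used, with ids reused from the earlier revision examples. The primary key gains `rev_user` because MySQL requires every unique key of a partitioned table to include all columns of the partitioning expression. `EXPLAIN PARTITIONS` is the MariaDB and MySQL 5.x syntax; newer MySQL versions show the partitions column with plain `EXPLAIN`.

```sql
-- Contributions of one user: the optimizer can prune to the single
-- partition whose range contains this rev_user value.
EXPLAIN PARTITIONS
SELECT rev_id, rev_page, rev_timestamp
FROM revision
WHERE rev_user = 25118340
ORDER BY rev_timestamp DESC
LIMIT 50;

-- A lookup by rev_id alone gives the optimizer no rev_user value,
-- so no pruning is possible and every partition has to be probed.
EXPLAIN PARTITIONS
SELECT rev_page
FROM revision
WHERE rev_id = 742218408;
```

This asymmetry is why the partitioned layout would suit only replicas dedicated to per-user queries, not general-purpose ones.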
  • 85. The solution has problems
    ● It is a server-side hack, with poor to no awareness on the application side
      – MediaWiki has to support MySQL 5.0, which has no partitioning support
    ● It is an enwiki-only patch, which is a bad idea in general
    ● Special slaves are a threat to High Availability
  • 86. CASE #7: WHAT LINKS HERE
  • 87. Categories, templates, images and special pages
    ● There is a lot of information that needs to be updated when a new edit is made:
      – The category must include the new page, in its right position
      – If it is a template or an image, all pages that include it have to change
      – The “What links here” page has to reflect the new links
      – If a new page is created, links to it have to go from red to blue
  • 88. Latency on edit has to be low
    ● Users understand that an “edit” will be slower than loading a page, but it still cannot take more than 0.5-1 second tops: https://grafana.wikimedia.org/dashboard/db/save-timing
    ● But we just said that an edit may require an update on millions of other pages!
    ● How to implement all the previous changes?
  • 89. Brainstorming time
  • 90. Wikimedia solution
    ● Only immediate tasks are done synchronously, transactionally:
      – Wikitext content is saved on External Storage
      – The recentchanges table adds a new entry for reviewing
      – The user edit count is increased in some wikis
      – The page is parsed for display to the user
  • 91. Background jobs
    ● Many tasks are enqueued on a Redis job queue
    ● The results of most of those tasks are cached/denormalized on database tables for easy joining:
      – Add the page to the proper category (categorylinks)
      – Add the page to the proper links (pagelinks)
      – Add the page to the list of templates used (templatelinks)
      – Many others, such as refreshing the list of titles and its index on Elasticsearch
  • 92. doRefreshLinks() Job (excerpt)
    function run() {
        // Job to update all (or a range of) backlink pages for a page
        $this->runForTitle( $this->title );
    }
    protected function runForTitle( Title $title ) {
        $revision = Revision::newFromTitle( $title, false, Revision::READ_LATEST );
        $parserOutput = $content->getParserOutput(
            $title, $revision->getId(), $parserOptions, false );
        $updates = $content->getSecondaryDataUpdates(
            $title, null, !empty( $this->params['useRecursiveLinksUpdate'] ), $parserOutput );
        InfoAction::invalidateCache( $title );
  • 93. *links tables
    ● They store the page_id of the page where the resource is used, but for the resource itself they store the namespace and title, not an id
    ● This is because pages reference a title, not an entity (e.g. you can include the template {{stub}}, or link to a page that may or may not exist)
    ● Some of those tables can grow a lot and store very redundant data (there could be millions of rows with the template {{cc-by-sa-2.5}} on Wikimedia Commons)
  • 94. Example: PageLinks Table
    SHOW CREATE TABLE pagelinks\G
    *************************** 1. row ***************************
    Table: pagelinks
    Create Table: CREATE TABLE `pagelinks` (
      `pl_from` int(8) unsigned NOT NULL DEFAULT '0',
      `pl_namespace` int(11) NOT NULL DEFAULT '0',
      `pl_title` varbinary(255) NOT NULL DEFAULT '',
      `pl_from_namespace` int(11) NOT NULL DEFAULT '0',
      UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`),
      KEY `pl_namespace` (`pl_namespace`,`pl_title`,`pl_from`),
      KEY `pl_backlinks_namespace` (`pl_from_namespace`,`pl_namespace`,`pl_title`,`pl_from`)
    ) ENGINE=InnoDB DEFAULT CHARSET=binary
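Two queries this schema is built for, as a sketch. The title 'Amsterdam' is a made-up example value; the `page` columns are the ones shown in the earlier point-select digest.

```sql
-- "What links here" for a title in the main namespace (0): served by
-- the pl_namespace index (pl_namespace, pl_title, pl_from).
SELECT pl_from
FROM pagelinks
WHERE pl_namespace = 0
  AND pl_title = 'Amsterdam'
ORDER BY pl_from
LIMIT 50;

-- Red or blue links on one page: a link target exists only if a page
-- row with the same namespace and title is found.
SELECT pl_namespace, pl_title, page_id IS NOT NULL AS target_exists
FROM pagelinks
LEFT JOIN page
  ON page_namespace = pl_namespace
 AND page_title = pl_title
WHERE pl_from = 46812822;
```

Storing the target as namespace plus title is what lets the second query return rows for pages that do not exist yet.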
  • 95. Future
    ● A lot of space could be saved by normalizing/deduplicating the title text:
    mysql> SHOW TABLE STATUS like '%links';
    +--------------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+
    | Name               | Engine | Version | Row_format | Rows       | Avg_row_length | Data_length | Max_data_length | Index_length |
    +--------------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+
    | categorylinks      | InnoDB |      10 | Compact    |  106308892 |            210 | 22379020288 |               0 |  30838489088 |
    | externallinks      | InnoDB |      10 | Compact    |   91491185 |            279 | 25571622912 |               0 |  40650997760 |
    | imagelinks         | InnoDB |      10 | Compact    |   81270504 |             83 |  6765756416 |               0 |   9644818432 |
    | iwlinks            | InnoDB |      10 | Compact    |   16909955 |             95 |  1622573056 |               0 |   2367488000 |
    | langlinks          | InnoDB |      10 | Compact    |   26155404 |             74 |  1941733376 |               0 |   1434042368 |
    | msg_resource_links | InnoDB |      10 | Compact    |       3524 |            130 |      458752 |               0 |            0 |
    | pagelinks          | InnoDB |      10 | Compact    | 1192043172 |             74 | 89018269696 |               0 | 122491224064 |
    | templatelinks      | InnoDB |      10 | Compact    |  616611334 |             80 | 49860050944 |               0 |  66098577408 |
    +--------------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+
    ● That would make title a first-class entity, different from the page entity
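One way such a normalization could look, as a purely hypothetical sketch: the tables `title_demo` and `pagelinks_demo` and all their columns are invented for illustration and do not describe any actual MediaWiki schema.

```sql
-- Hypothetical: each distinct (namespace, title) pair is stored once.
CREATE TABLE title_demo (
  title_id        INT UNSIGNED   NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title_namespace INT            NOT NULL,
  title_text      VARBINARY(255) NOT NULL,
  UNIQUE KEY title_ns_text (title_namespace, title_text)
) ENGINE=InnoDB DEFAULT CHARSET=binary;

-- Hypothetical: link rows shrink to two integers instead of repeating
-- the title string in the row and in every index.
CREATE TABLE pagelinks_demo (
  pl_from   INT UNSIGNED NOT NULL,
  pl_target INT UNSIGNED NOT NULL,
  PRIMARY KEY (pl_from, pl_target),
  KEY pl_target (pl_target, pl_from)
) ENGINE=InnoDB DEFAULT CHARSET=binary;

-- "What links here" then resolves the title once and scans integers.
SELECT pl_from
FROM pagelinks_demo
JOIN title_demo ON title_id = pl_target
WHERE title_namespace = 0
  AND title_text = 'Amsterdam';
```

The trade-off is an extra join and an extra insert path for new titles, in exchange for much smaller rows and indexes.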
  • 96. CASE #8: ANECDOTES: THE GHOST TABLES AND TIMESTAMPS
  • 97. Story time
    ● A recently pooled slave broke its replication with:
      Error 'Table 'idwiki.hitcounter' doesn't exist' on query. Default database: 'idwiki'. Query: 'DELETE FROM `idwiki`.`hitcounter`'
    ● The table was indeed non-existent, but it was deprecated and no longer used by MediaWiki
    ● Hackers? Replication bug? Someone doing maintenance out-of-band?
  • 98. Brainstorming time
  • 99. Architecture
    ● At the moment, Wikimedia servers mostly use STATEMENT-based replication due to several application dependencies
    ● A Master-Master active/passive replication was being used among datacenters
    ● Replication was temporarily routed through a slave to deploy new TLS certificates
    ● The middle slave was rebooted
  • 100. hitcounter table was using the MEMORY engine
    ● On server restart, MEMORY tables come back empty, so the server sends a DELETE command to avoid replication issues
    ● The DELETE got replicated towards the remote master, back to the primary master, and finally to the new slave
      – This broke replication because the new slave didn’t have the obsolete table, even though nothing was writing to it
    ● Remember to clean up your tables on all servers to avoid issues like this!
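A sketch of the clean-up check, to be run on every server. The list of system schemas to skip is an assumption and may need adjusting per installation; the table name comes from the replication error above.

```sql
-- List application tables that still use the MEMORY engine here.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'MEMORY'
  AND table_schema NOT IN
      ('information_schema', 'performance_schema', 'mysql', 'sys');

-- Drop the obsolete table. IF EXISTS keeps the statement from failing
-- on hosts where the table is already gone.
DROP TABLE IF EXISTS idwiki.hitcounter;
```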
  • 101. Timestamps and MySQL < 4.1
    ● They were automatically updated on INSERT and UPDATE, with no possibility of disabling that
    ● No strict mode disallowing zero dates
    ● Support for different databases and standards was needed: https://www.mediawiki.org/wiki/Manual:WfTimestamp#Formats
  • 102. How to implement timestamps? ● They must store UTC times, controlled on the application side ● They must be strictly sortable to be used in listings ● They must work on MySQL 4.0 ● They must work similarly for other database backends
  • 103. Brainstorming time
  • 104. Mediawiki solution ● Timestamps are stored as binary(14) https://www.mediawiki.org/wiki/Manual:Timestamp ● For backwards and join compatibility, that is still the preferred format ● TIMESTAMP (4 bytes) now has all required features and would be much more compact, but converting millions of records is not worth the effort right now
  • 105. tables.sql
    -- The MySQL table backend for MediaWiki currently uses
    -- 14-character BINARY or VARBINARY fields to store timestamps.
    -- The format is YYYYMMDDHHMMSS, which is derived from the
    -- text format of MySQL's TIMESTAMP fields.
    --
    -- Historically TIMESTAMP fields were used, but abandoned
    -- in early 2002 after a lot of trouble with the fields
    -- auto-updating.
    --
    -- The Postgres backend uses TIMESTAMPTZ fields for timestamps,
    -- and we will migrate the MySQL definitions at some point as
    -- well.
    CREATE TABLE /*_*/logging (
      -- Log ID, for referring to this specific log entry, probably for deletion and such.
      log_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
      -- Symbolic keys for the general log type and the action type
      -- within the log. The output format will be controlled by the
      -- action field, but only the type controls categorization.
      log_type varbinary(32) NOT NULL default '',
      log_action varbinary(32) NOT NULL default '',
      -- Timestamp. Duh.
      log_timestamp binary(14) NOT NULL default '19700101000000',
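A short sketch of why the YYYYMMDDHHMMSS format meets the requirements listed earlier: the fields are fixed-width, zero-padded and ordered from most to least significant, so plain byte comparison sorts the values chronologically. The helper names `to_mw_timestamp` and `from_mw_timestamp` are invented for this illustration and are not Mediawiki functions; the last lines compare the 14 bytes against a 4-byte integer of the kind TIMESTAMP uses.

```python
import struct
from datetime import datetime, timezone

MW_FORMAT = "%Y%m%d%H%M%S"  # YYYYMMDDHHMMSS, 14 characters


def to_mw_timestamp(dt):
    """Encode an aware datetime as the 14-byte UTC string (hypothetical helper)."""
    return dt.astimezone(timezone.utc).strftime(MW_FORMAT).encode("ascii")


def from_mw_timestamp(raw):
    """Decode the 14-byte string back into an aware UTC datetime."""
    parsed = datetime.strptime(raw.decode("ascii"), MW_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


moments = [
    datetime(2016, 10, 3, 9, 0, 0, tzinfo=timezone.utc),
    datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc),
    datetime(2002, 2, 28, 23, 59, 59, tzinfo=timezone.utc),
]
encoded = [to_mw_timestamp(m) for m in moments]

# Sorting the raw bytes gives the same order as sorting the datetimes.
assert sorted(encoded) == [to_mw_timestamp(m) for m in sorted(moments)]

# Storage cost: 14 bytes per value versus a 4-byte seconds counter.
packed = struct.pack("<I", int(moments[0].timestamp()))
print(len(encoded[0]), len(packed))  # prints "14 4"
```

The trade-off matches the slide: the string form is portable across backends and old MySQL versions, at the cost of 10 extra bytes per value.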
  • 106. CASE #9: SLOTS
  • 107. Structured data for Wikipedia ● Wikitext is powerful but complex – It is not easy to make changes to things such as categories or infoboxes (templates) – It is also not easy to offer them in a computer-readable way ● Tools like Wikidata or image metadata require an easier way to integrate structured data
  • 108. How to implement Multi-Content Revisions? ● A page may combine multiple sources (image and image metadata, wikitext and discussions, etc.) ● A new revision could be created by editing regular text or some of the structured data ● The types of structured data may vary and could be added later with new extensions
  • 109. Brainstorming time
  • 110. Current Multi-Content Revision proposal ● page and revision will work as usual (except that a few of their fields will be made redundant) ● 2 new tables, content and content_revision, will be added ● content will initially hold the metadata for wikitext ● Other types of content can be referenced by revision through content_revision – A revision can now handle multiple contents
  • 111. The idea is to multiplex revision ● https://phabricator.wikimedia.org/T107595 ● We go from the straightforward – page -> revision -> text ( -> external store ) ● To a more indirect model: – page -> ( revision -> ) slots -> content -> ( text | external store )
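The indirect model can be sketched with an in-memory SQLite database standing in for MySQL. Only the table names page, revision, content and content_revision come from the slides; every column name, the role and address values, and the idea that an unchanged content row is shared between revisions are assumptions made for this illustration, not the actual proposal.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
CREATE TABLE revision (
    rev_id   INTEGER PRIMARY KEY,
    rev_page INTEGER NOT NULL REFERENCES page(page_id)
);
-- One row per stored blob: its type and where the data lives.
CREATE TABLE content (
    cont_id      INTEGER PRIMARY KEY,
    cont_model   TEXT NOT NULL,
    cont_address TEXT NOT NULL
);
-- The slot table: links one revision to several contents, one per role.
CREATE TABLE content_revision (
    cr_revision INTEGER NOT NULL REFERENCES revision(rev_id),
    cr_role     TEXT NOT NULL,
    cr_content  INTEGER NOT NULL REFERENCES content(cont_id),
    PRIMARY KEY (cr_revision, cr_role)
);
""")

db.execute("INSERT INTO page VALUES (1, 'Example.jpg')")
db.executemany("INSERT INTO revision VALUES (?, ?)", [(10, 1), (11, 1)])
db.executemany("INSERT INTO content VALUES (?, ?, ?)", [
    (100, "wikitext", "text:1"),
    (101, "image-metadata", "text:2"),
    (102, "wikitext", "text:3"),
])
# Revision 11 edits only the wikitext and reuses metadata content 101.
db.executemany("INSERT INTO content_revision VALUES (?, ?, ?)", [
    (10, "main", 100), (10, "metadata", 101),
    (11, "main", 102), (11, "metadata", 101),
])


def slots_for(rev_id):
    """Return (role, model, address) for every content of one revision."""
    rows = db.execute(
        """SELECT cr_role, cont_model, cont_address
           FROM content_revision JOIN content ON cont_id = cr_content
           WHERE cr_revision = ? ORDER BY cr_role""",
        (rev_id,),
    )
    return rows.fetchall()


print(slots_for(11))
# [('main', 'wikitext', 'text:3'), ('metadata', 'image-metadata', 'text:2')]
```

Reading one revision now takes an extra join, and the slot table grows by one row per content type per revision.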
  • 112. The proposal is not issue-free ● The revision table is already very large for non-point SELECTs – How to handle ranges when content will be an even taller table (several content types per revision)? – Also, previously independent tables will now be integrated there (image_revisions, etc.) ● Maybe a different, multi-table implementation of the polymorphic association would be preferable? – Normalization and ease of coding vs. performance of the implementation on large wikis
  • 113. FINAL REMARKS
  • 114. “Do not let reality spoil your perfect on-paper design” ● The most elegant design is not always the most appropriate – Reliability is the enemy of performance: simple, fast, safe – choose two – Technology available now does not normally cover 100% of the use cases; choose the one that covers 99%, implement the other 1%
  • 115. Where to learn more about design ● Learn how others are doing it: – Uber: ● https://eng.uber.com/schemaless-part-one/ ● https://www.percona.com/live/plam16/sessions/relational-databases-uber-mysql-postgres – Facebook: ● https://www.percona.com/live/plam16/sessions/massive-schema-changes-facebook ● https://www.facebook.com/MySQLatFacebook/ – Youtube: https://www.percona.com/live/plam16/sessions/launching-vitess-how-run-youtubes-mysql-sharding-engine – Pinterest: https://engineering.pinterest.com/blog/sharding-pinterest-how-we-scaled-our-mysql-fleet ● Choose the right literature: – https://pragprog.com/book/bksqla/sql-antipatterns
  • 116. Where to learn more about Mediawiki / MySQL@Wikipedia ● MySQL at Wikipedia Introduction: https://www.mediawiki.org/wiki/File:MySQL_at_Wikipedia.pdf ● Mediawiki source code and documentation: https://www.mediawiki.org/wiki/MediaWiki ● Wikitech (technical documentation): https://wikitech.wikimedia.org/ ● Operations/puppet (infrastructure) git repository: https://phabricator.wikimedia.org/diffusion/OPUP/
  • 117. Was this session helpful? ● Consider supporting the Wikimedia Foundation! – The Wikimedia Foundation, Inc. is a nonprofit charitable organization dedicated to encouraging the growth, development and distribution of free, multilingual, educational content, and to providing the full content of these wiki-based projects to the public free of charge. – https://wikimediafoundation.org ● You can contribute with: – Your code (including infrastructure!): https://phabricator.wikimedia.org/diffusion/ – Your time: https://en.wikipedia.org/wiki/Help:Editing – Your money: https://donate.wikimedia.org/
  • 118. Thank You! Remember to rate my session