The Future of Big Data is Relational (or why you can't escape SQL)
1. The Future of Relational (or Why You Can't Escape SQL)
tobrien@discursive.com
Twitter: @tobrien
Thursday, February 28, 13
2. In this session...
Ouroboros
Copernican Revolution
Ptolemaic Entrenchment
Janus
A two minute summary of the last 15 years
Google Magic
The Future of SQL
3. Tim O’Brien
I’m a developer who also writes
tobrien@discursive.com
Twitter: @tobrien
18. Codd + (Google’s BigTable Paper - 2006, Hadoop - 2007) =
Google F1, Spanner
Translattice, Impala, Drawn-to-Scale
NuoDB, Akiban, many more NewSQL products
27. 2000 In the beginning...
Proprietary app servers
Big Oracle database
28. 2001 More traffic?
Specialized application servers
Throw hardware at the database
29. 2002-2005 More traffic?
Specialized application servers
Throw hardware at the database
30. 2005 Even More Traffic?
Sharding.... ugh.
Everything else was scaling horizontally except the database.
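To make "sharding" concrete, here is a minimal sketch of hash-based routing in application code. The shard names and helper functions are hypothetical, invented for illustration; real setups of that era varied widely.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key, so the same key always lands on the same database."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by shard: anything touching several keys may have to visit several databases."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key, shards), []).append(key)
    return groups
```

The "ugh" follows from `group_by_shard`: once related rows can live on different databases, joins and transactions across them become the application's problem.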
31. 2006 - New Reality of Big Data
Q: What would Google do?
A: Not use an RDBMS
Google’s BigTable Paper - 2006
Hadoop - 2007
32. 2006
Big Data (for a few) vs. RDBMS (for most)
33. 2007
• The rise of Database “Luddites”
Who needs Foreign Keys? Transactions? Just Simplify
34. 2007
• The rise of Database “Luddites”
Rails hacked away @ database “orthodoxy”
Opened the door to alternative approaches
36. 2007 - present == Alternatives
• Documents
– MongoDB – Started in 2007, OSS in 2009
– CouchDB – Started in 2005
• Graphs
– Neo4j
• Key-Value Stores
– Cassandra
– Riak
– Tokyo Cabinet
• Memory
– Memcached / Redis
• Tabular
– HBase
37. 2012
Q: What database do you use?
A: All of them
Oracle, Mongo, MySQL, Impala, Riak, some memcache, and some Hadoop thrown in for fun
39. Big Data a Necessity at Largest Scale
“A certain kind of developer at a certain kind of company”
Most development still RDBMS
40.
• There’s this company that sells advertising
– ~96% of revenue came from advertising in 2011
– ~75% of the US Search Advert Market in 2011
– ~44% share of overall online ad market
• One of the most important applications at Google ran on MySQL
– AdWords missed the NoSQL revolution
41. Digging into the evolution of Storage at Google
• Google’s BigTable – 2006
– Tabular
– Sparse, distributed, multi-dimensional sorted map
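To illustrate what a "sparse, multi-dimensional sorted map" can look like, here is a toy, single-process sketch that maps (row, column, timestamp) to a value and keeps keys in sorted order. The class and method names are hypothetical, and the "distributed" part is not modelled at all.

```python
from bisect import bisect_left, insort


class SparseSortedMap:
    """Toy model: (row, column, timestamp) -> value, with keys kept in sorted order."""

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._cells = {}  # same keys -> value; cells never written take no space (sparse)

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so newer versions sort first
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None if it was never written."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for rows in [start_row, end_row)."""
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1
```

Because keys are sorted by row, a range scan over neighbouring rows is cheap, while anything resembling a join or a multi-row transaction is left to the caller.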
42. Digging into the evolution of Storage at Google
• Google’s BigTable – 2006
– “New users [...] uncertain of how to best use the BigTable interface, particularly if they are accustomed to using relational databases that support general-purpose transactions.”
43. Digging into the evolution of Storage at Google
• Google’s Megastore – 2010
– Hierarchical “schemas”
– Positioned as a NoSQL store
– ACID within partitions
44. Digging into the evolution of Storage at Google
• Google’s Megastore – 2010
– “Supports two-phase commit for atomic updates [...] these transactions have much higher latency and increase the risk of contention, we generally discourage applications from using the feature”
45. Digging into the evolution of Storage at Google
• Google’s Spanner & F1 – 2012
• Paper published in 2012
– Hierarchical, Semi-relational Schemas
– ACID across continents possible - 14ms transaction overhead in a data-center with clock uncertainty of 1ms.
– SQL
– Focus on Performance
• Gated by Clock Uncertainty
• Consensus: Paxos
46. What Differentiates Google Spanner?
• Transactions are only possible because of Paxos
• Forget NTP, Google has “Reified Clock Uncertainty”
• Epsilon, clock uncertainty, is the gating factor for gaining consensus on transaction timestamps.
• It’s all about Time
• “as the underlying system enforces tighter bounds on clock uncertainty, the overhead of the stronger semantics decreases. As a community, we should no longer depend on loosely synchronized clocks and weak time APIs in designing distributed algorithms.”
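To show why epsilon gates transaction overhead, here is a rough single-process sketch of the "commit wait" idea: the clock reports an interval instead of a point in time, and a commit is held back until its timestamp is certainly in the past. `UncertainClock`, `Interval`, and `commit` are hypothetical names invented for this example, not Google's API, and Paxos replication is left out entirely.

```python
import time
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: float
    latest: float


class UncertainClock:
    """Hypothetical stand-in for a clock API that exposes its own uncertainty (epsilon)."""

    def __init__(self, epsilon_s):
        self.epsilon_s = epsilon_s

    def now(self):
        t = time.monotonic()  # assumption: the local monotonic clock plays the role of real time
        return Interval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, timestamp):
        """True only once `timestamp` is guaranteed to be in the past."""
        return self.now().earliest > timestamp


def commit(clock, apply_writes):
    """Choose a commit timestamp, then wait out the uncertainty before exposing the writes."""
    commit_ts = clock.now().latest
    while not clock.after(commit_ts):
        time.sleep(0.0001)
    apply_writes(commit_ts)
    return commit_ts
```

In this sketch each commit waits roughly twice epsilon, so halving the clock uncertainty halves the wait. That is the sense in which tighter bounds make the stronger semantics cheaper.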
47. Let me reiterate: Google has Mastered Time
48. What Differentiates Google Spanner?
• Hierarchical, Schematized Tables
• Similar to Akiban’s approach.
• Leads to some interesting possibilities.
• Nested Subqueries and Tree Results
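One way to picture hierarchical tables and tree results: child rows are stored right after their parent by sharing its key prefix, so a parent and its children come back from one contiguous range read. This is an illustrative sketch with a hypothetical `HierarchicalStore` class and a made-up customer/order schema, not Spanner's or Akiban's actual storage format.

```python
from bisect import bisect_left, insort


class HierarchicalStore:
    """Toy store where a child's key extends its parent's key, keeping them adjacent."""

    def __init__(self):
        self._keys = []  # sorted composite keys, e.g. ("customer", 1, "order", 100)
        self._rows = {}

    def put(self, key, row):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = row

    def subtree(self, parent_key):
        """Return the parent row with its descendant rows nested under "children"."""
        i = bisect_left(self._keys, parent_key)
        if i == len(self._keys) or self._keys[i] != parent_key:
            return None
        result = dict(self._rows[parent_key], children=[])
        i += 1
        # Descendants share the parent's key prefix, so they sit in one contiguous run.
        while i < len(self._keys) and self._keys[i][:len(parent_key)] == parent_key:
            result["children"].append(self._rows[self._keys[i]])
            i += 1
        return result
```

For example, after storing `("customer", 1)` and two `("customer", 1, "order", n)` rows, `subtree(("customer", 1))` returns the customer with both orders nested, without a join.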
49. What Differentiates Google Spanner?
To reiterate:
* hierarchical, schematized tables
* distributed “compute fabric” for data
* Google has mastered Time
* Google built a warp reactor
50. As goes Google, so goes the world...
Translattice
Drawn-to-Scale
Akiban
Impala
Several NewSQL companies quickly jumped on this train:
- NuoDB
- VoltDB
Yes, we’ve had Hive for a while, but these new initiatives represent a more robust effort.
51. Translattice
Translattice identifies itself as a database that resembles F1.
It is a hosted database service which provides distributed transactions.
Translattice uses Paxos.
They’ve extended PostgreSQL and emphasize customer control over data. A distributed, cloud-based database.
52. Akiban
Akiban’s approach to storage almost *exactly* matches the strategy Google uses in Spanner.
Akiban lacks the distributed transaction capability of Spanner and F1, but they are working on developing the capability.
Akiban has implemented a query parser, optimizer, and execution engine atop a hierarchical approach to storage.
53. Drawn-to-Scale
Reportedly the most similar to F1 on the market. Fault-tolerant in distributed environments.
Created a Query Parser + Optimizer + Execution Engine atop a distributed “compute fabric”.
No Paxos or Transactions... yet. To be released shortly. Stay tuned.
Drawn to Scale aims to be an “installable” database. Not going the hosted route.
Data stored in HDFS/HBase.
54. So there.
Big Data is turning into a Big Relational Database