The Future of Relational
(or Why You Can't Escape SQL)

tobrien@discursive.com
Twitter: @tobrien




Thursday, February 28, 13
In this session...
• Ouroboros
• Copernican Revolution
• Ptolemaic Entrenchment
• Janus
• A two-minute summary of the last 15 years
• Google Magic
• The Future of SQL



Tim O’Brien
I’m a developer who also writes

tobrien@discursive.com
Twitter: @tobrien




Revolution

Remember all that Big Data stuff?

Remember when we all thought it was time to give up schemas?

Man, wasn’t that a lot of work?




What if the relational database “catches up”?

What then?




How we market Big Data:

Big Data == Paradigm Shift

“singularity” > “disruptor”



“Big Data” is to “Traditional Databases” as...
Copernicus is to Ptolemy



Out with the “old”
In with the “new”



Claudius Ptolemy: ~150 AD
Copernicus’ model: 1543 AD




Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks”: 1970
Google’s BigTable paper: 2006
Hadoop: 2007




Codd + (Google’s BigTable paper, 2006; Hadoop, 2007) =
• Google F1, Spanner
• Translattice, Impala, Drawn-to-Scale
• NuoDB, Akiban, many more NewSQL products




Youth: Looking Forward
Age: Looking Backward




“Let’s create a schema. Ok?”

“Whatever. Haven’t you heard? Databases don’t scale.”




And, both are right...



2000: In the beginning...

Proprietary app servers
Big Oracle database




2001: More traffic?

Specialized application servers
Throw hardware at the database




2002-2005: More traffic?

Specialized application servers
Throw hardware at the database




2005: Even more traffic?

Sharding.... ugh.

Everything else was scaling horizontally except the database.
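
To make the "ugh" concrete, here is a minimal sketch of application-level sharding. It is an illustration only: `shard_for`, `ShardedStore`, and the in-memory dicts standing in for separate database servers are all hypothetical names, not anything from a particular product.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the key (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical application-level sharding over N independent 'databases' (plain dicts here)."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, row: dict) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def count_where(self, predicate) -> int:
        # The "ugh": any query that is not keyed by the shard key has to fan out
        # to every shard and merge the results in application code.
        return sum(1 for shard in self.shards for row in shard.values() if predicate(row))


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i, "plan": "pro" if i % 10 == 0 else "free"})

print(store.get("user:42"))                                  # single-shard lookup
print(store.count_where(lambda row: row["plan"] == "pro"))   # cross-shard fan-out
```

Lookups by key stay cheap, but joins, cross-shard transactions, and resharding when `num_shards` changes all become the application's problem, which is the pain the slide is pointing at.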




2006: New Reality of Big Data

Google’s BigTable paper: 2006
Hadoop: 2007

Q: What would Google do?
A: Not use an RDBMS

2006

Big Data (for a few) vs. RDBMS (for most)


2007
• The rise of Database “Luddites”

Who needs Foreign Keys?
Transactions? Just Simplify




2007
• The rise of Database “Luddites”

Rails hacked away @ database “orthodoxy”
Opened the door to alternative approaches



• Although, Basecamp is still a single RDBMS…




2007 - present == Alternatives
• Documents
  – MongoDB: started in 2007, OSS in 2009
  – CouchDB: started in 2005
• Graphs
  – Neo4j
• Key-Value Stores
  – Cassandra
  – Riak
  – Tokyo Cabinet
• Memory
  – Memcached / Redis
• Tabular
  – HBase


2012

Q: What database do you use?
A: All of them

Oracle, Mongo, MySQL, Impala, Riak, some memcache, and some Hadoop thrown in for fun


Big Data a Necessity at Largest Scale

“A certain kind of developer at a certain kind of company”

Most development still RDBMS




• There’s this company that sells advertising
  – ~96% of revenue came from advertising in 2011
  – ~75% of the US Search Advert Market in 2011
  – ~44% share of overall online ad market

• One of the most important applications at Google ran on MySQL
  – AdWords missed the NoSQL revolution




Digging into the evolution of Storage at Google
• Google’s BigTable – 2006
  – Tabular
  – Sparse, distributed, multi-dimensional sorted map
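
A rough way to picture a "sparse, multi-dimensional sorted map" is a single-process toy keyed by (row, column, timestamp). The sketch below leaves out the "distributed" part entirely, and the class and method names (`SparseSortedMap`, `get_latest`, `scan`) are made up for illustration; this is not BigTable's API.

```python
import bisect


class SparseSortedMap:
    """Toy single-process model of a map keyed by (row, column, timestamp)."""

    def __init__(self) -> None:
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._cells = {}   # same key -> value; absent cells cost nothing (sparse)

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        key = (row, column, -timestamp)  # negate so the newest version sorts first
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get_latest(self, row: str, column: str):
        # A 2-tuple sorts before every 3-tuple that shares its prefix.
        i = bisect.bisect_left(self._keys, (row, column))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row: str, end_row: str):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row, in key order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1


table = SparseSortedMap()
table.put("com.example/a", "contents:html", 1, "<v1>")
table.put("com.example/a", "contents:html", 2, "<v2>")
table.put("com.example/b", "anchor:home", 1, "Home")
table.put("org.sample/x", "contents:html", 1, "<x>")

print(table.get_latest("com.example/a", "contents:html"))
for cell in table.scan("com.", "org."):
    print(cell)
```

Because everything is ordered by row key, a range scan over adjacent rows is cheap. There is no join, no secondary index, and no multi-row transaction in this model, which is the gap the next slides pick up.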




Digging into the evolution of Storage at Google
• Google’s BigTable – 2006

  – “New users [] uncertain of how to best use the BigTable interface, particularly if they are accustomed to using relational databases that support general-purpose transactions.”




Digging into the evolution of Storage at Google
• Google’s Megastore – 2010
  – Hierarchical “schemas”
  – Positioned as a NoSQL store
  – ACID within partitions




Digging into the evolution of Storage at Google
• Google’s Megastore – 2010

  – “Supports two-phase commit for atomic updates [] these transactions have much higher latency and increase the risk of contention, we generally discourage applications from using the feature”
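
For readers who have not met two-phase commit, the sketch below shows the general textbook shape of the protocol, not Megastore's implementation. `Participant` and `two_phase_commit` are hypothetical names, and the sketch ignores durability, locking, and coordinator failure.

```python
class Participant:
    """Hypothetical resource manager holding one partition of the data."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.data = {}
        self._staged = None

    def prepare(self, updates: dict) -> bool:
        # Phase 1: stage the writes and vote. A real system would also take locks
        # and persist the staged state before voting yes.
        if not self.healthy:
            return False
        self._staged = dict(updates)
        return True

    def commit(self) -> None:
        # Phase 2 (success path): make the staged writes visible.
        self.data.update(self._staged)
        self._staged = None

    def abort(self) -> None:
        # Phase 2 (failure path): discard anything staged.
        self._staged = None


def two_phase_commit(work) -> bool:
    """work is a list of (participant, updates) pairs; returns True if all committed."""
    prepared = []
    for participant, updates in work:
        if participant.prepare(updates):
            prepared.append(participant)
        else:
            for p in prepared:
                p.abort()
            return False
    for p in prepared:
        p.commit()
    return True


a, b = Participant("a"), Participant("b")
broken = Participant("broken", healthy=False)

print(two_phase_commit([(a, {"x": 1}), (b, {"y": 2})]))        # every vote is yes
print(two_phase_commit([(a, {"x": 99}), (broken, {"z": 3})]))  # one vote is no
print(a.data, b.data, broken.data)
```

Every participant is contacted twice and holds its staged state between the two rounds, which gives some intuition for the higher latency and contention the quote warns about.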




Digging into the evolution of Storage at Google
• Google’s Spanner & F1 – 2012
• Paper published in 2012
  – Hierarchical, Semi-relational Schemas
  – ACID across continents possible - 14ms transaction overhead in a data-center with clock uncertainty of 1ms.
  – SQL

  – Focus on Performance
    • Gated by Clock Uncertainty
    • Consensus: Paxos


What Differentiates Google Spanner?
• Transactions are only possible because of Paxos

• Forget NTP, Google has “Reified Clock Uncertainty”
  • Epsilon, clock uncertainty, is the gating factor for gaining consensus on transaction timestamps.

• It’s all about Time
  • “as the underlying system enforces tighter bounds on clock uncertainty, the overhead of the stronger semantics decreases. As a community, we should no longer depend on loosely synchronized clocks and weak time APIs in designing distributed algorithms.”
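
Why epsilon gates performance can be seen in a small simulation of the "commit wait" idea described in the Spanner paper: the clock API returns an interval instead of a point, a transaction takes the upper bound as its timestamp, and it then waits until that timestamp is definitely in the past. Everything below is an assumption-laden sketch: `SimulatedTrueTime`, `Interval`, and `commit_with_wait` are invented names, and a stepped fake clock stands in for real time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Fake clock whose reading is only known to within +/- epsilon_ms (hypothetical API)."""

    def __init__(self, epsilon_ms: float) -> None:
        self.epsilon_ms = epsilon_ms
        self._now_ms = 0.0

    def now(self) -> Interval:
        return Interval(self._now_ms - self.epsilon_ms, self._now_ms + self.epsilon_ms)

    def advance(self, ms: float) -> None:
        self._now_ms += ms


def commit_with_wait(tt: SimulatedTrueTime, step_ms: float = 0.5):
    """Pick a commit timestamp, then wait until it is guaranteed to have passed."""
    commit_ts = tt.now().latest          # no correct clock can read later than this right now
    waited_ms = 0.0
    while tt.now().earliest <= commit_ts:
        tt.advance(step_ms)              # stand-in for sleeping
        waited_ms += step_ms
    return commit_ts, waited_ms


for epsilon in (1.0, 4.0):
    ts, waited = commit_with_wait(SimulatedTrueTime(epsilon_ms=epsilon))
    print(f"epsilon={epsilon}ms  commit_ts={ts}ms  waited={waited}ms")
```

In this model the wait is a little over twice epsilon, so tighter clock bounds translate directly into cheaper transactions, which is the point of the quote above.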



Let me reiterate: Google has Mastered Time




What Differentiates Google Spanner?
• Hierarchical, Schematized Tables
  • Similar to Akiban’s approach.
  • Leads to some interesting possibilities.
  • Nested Subqueries and Tree Results
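
One way to read "hierarchical, schematized tables" is that child rows are stored under their parent's key, so a parent and its children sit next to each other and come back as a tree in one range read. The sketch below assumes that reading; `InterleavedStore`, `customer_tree`, and the customer/order schema are hypothetical and are not Spanner's or Akiban's actual interfaces.

```python
import bisect


class InterleavedStore:
    """Toy ordered store where a child row's key starts with its parent's key."""

    def __init__(self) -> None:
        self._keys = []   # sorted list of tuple keys
        self._rows = {}

    def put(self, key: tuple, row: dict) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def subtree(self, parent_key: tuple):
        """Return the parent row and all descendants with one contiguous range read."""
        i = bisect.bisect_left(self._keys, parent_key)
        out = []
        while i < len(self._keys) and self._keys[i][:len(parent_key)] == parent_key:
            out.append((self._keys[i], self._rows[self._keys[i]]))
            i += 1
        return out


def customer_tree(store: InterleavedStore, customer_id: int):
    """Assemble a nested result: a customer with its orders attached."""
    rows = store.subtree((customer_id,))
    if not rows or rows[0][0] != (customer_id,):
        return None
    tree = dict(rows[0][1])
    tree["orders"] = [dict(row) for key, row in rows[1:]]
    return tree


store = InterleavedStore()
store.put((1,), {"name": "Ada"})
store.put((2,), {"name": "Grace"})
store.put((1, "orders", 100), {"total": 25})
store.put((1, "orders", 101), {"total": 40})
store.put((2, "orders", 200), {"total": 5})

print(customer_tree(store, 1))
```

The join between customer and orders never has to be computed here; it falls out of the key layout, which hints at why nested subqueries and tree-shaped results become natural in this kind of storage.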




What Differentiates Google Spanner?

To reiterate:

* hierarchical, schematized tables
* distributed “compute fabric” for data
* Google has mastered Time
* Google built a warp reactor

As goes Google, so goes the world...
• Translattice
• Drawn-to-Scale
• Akiban
• Impala

Several NewSQL companies quickly jumped on this train:
- NuoDB
- VoltDB

Yes, we’ve had Hive for a while, but these new initiatives represent a more robust effort.



Translattice
• Translattice identifies itself as a database that resembles F1.
• It is a hosted database service which provides distributed transactions.
• Translattice uses Paxos.
• They’ve extended PostgreSQL and emphasize customer control over data: a distributed, cloud-based database.




Akiban
• Akiban’s approach to storage almost *exactly* matches the strategy Google uses in Spanner.
• Akiban lacks the distributed transaction capability of Spanner and F1, but they are working on developing the capability.
• Akiban has implemented a query parser, optimizer, and execution engine atop a hierarchical approach to storage.




Drawn-to-Scale
• Reportedly the most similar to F1 in the market. Fault-tolerant in distributed environments.
• Created a Query Parser + Optimizer + Execution Engine atop a distributed “compute fabric”.
• No Paxos or Transactions... yet. To be released shortly. Stay tuned.
• Drawn to Scale aims to be an “installable” database. Not going the hosted route.
• Data stored in HDFS/HBase.



So there.
Big Data is turning into a Big Relational Database




The Future of Big Data is Relational (or why you can't escape SQL)

  • 1. The Future of Relational (or Why You Can't Escape SQL) tobrien@discursive.com Twitter: @tobrien Thursday, February 28, 13
  • 2. In this session...  Ouroboros  Copernican Revolution  Ptolemaic Entrenchment  Janus  A two minute summary of the last 15 years  Google Magic  The Future of SQL Thursday, February 28, 13
  • 3. Tim O’Brien  I’m a developer who also writes  tobrien@discursive.com  Twitter: @tobrien Thursday, February 28, 13
  • 7. Remember all that Big Data Stuff? Thursday, February 28, 13
  • 8. Remember when we all thought it was time to give up schemas? Man, wasn’t that a lot of work. Thursday, February 28, 13
  • 9. What if the relational database “catches up”? What then? Thursday, February 28, 13
  • 10. How we market Big Data: Big Data == Paradigm Shift “singularity” > “disruptor” Thursday, February 28, 13
  • 13. “Big Data” is to “Traditional Databases” as... Copernicus is to Ptolemy Thursday, February 28, 13
  • 14. Out with the “old” In with the “new” Thursday, February 28, 13
  • 15. Claudius Ptolemy Copernicus’ ~150 AD model 1543 AD Thursday, February 28, 13
  • 16. Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks”, 1970. Google’s BigTable Paper - 2006. Hadoop - 2007.
  • 18. Codd + Google’s BigTable Paper - 2006, Hadoop - 2007 = Google F1, Spanner, Translattice, Impala, Drawn-to-Scale, NuoDB, Akiban, many more NewSQL products
  • 21. Youth: Looking Forward. Age: Looking Backward.
  • 22. “Haven’t you heard? Databases don’t scale.” “Whatever. Let’s create a schema. Ok?”
  • 23. And, both are right...
  • 27. 2000: In the beginning... Proprietary app servers. Big Oracle database.
  • 28. 2001: More traffic? Specialized application servers. Throw hardware at the database.
  • 29. 2002-2005: More traffic? Specialized application servers. Throw hardware at the database.
  • 30. 2005: Even more traffic? Sharding.... ugh. Everything else was scaling horizontally except the database.
  • 31. 2006 - New Reality of Big Data. Q: What would Google do? A: Not use an RDBMS. Google’s BigTable Paper - 2006. Hadoop - 2007.
  • 32. 2006: Big Data for a few vs. RDBMS for most
  • 33. 2007 • The rise of Database “Luddites” • Who needs Foreign Keys? Transactions? Just Simplify
  • 34. 2007 • The rise of Database “Luddites” • Rails hacked away @ database “orthodoxy” • Opened the door to alternative approaches
  • 35. • Although, Basecamp is still a single RDBMS…
  • 36. 2007 - present == Alternatives • Documents: MongoDB (started in 2007, OSS in 2009), CouchDB (started in 2005) • Graphs: Neo4j • Key-Value Stores: Cassandra, Riak, Tokyo Cabinet • Memory: Memcached / Redis • Tabular: HBase
  • 37. 2012. Q: What database do you use? A: All of them. Oracle, Mongo, MySQL, Impala, Riak, some memcache, and some Hadoop thrown in for fun
  • 39. Big Data a Necessity at Largest Scale. “A certain kind of developer at a certain kind of company”. Most development still RDBMS.
  • 40. • There’s this company that sells advertising: ~96% of revenue came from advertising in 2011; ~75% of the US Search Advert Market in 2011; ~44% share of overall online ad market • One of the most important applications at Google ran on MySQL: AdWords missed the NoSQL revolution
  • 41. Digging into the evolution of Storage at Google • Google’s BigTable – 2006: Tabular; Sparse, distributed, multi-dimensional sorted map
  • 42. Digging into the evolution of Storage at Google • Google’s BigTable – 2006: “New users [] uncertain of how to best use the BigTable interface, particularly if they are accustomed to using relational databases that support general-purpose transactions.”
  • 43. Digging into the evolution of Storage at Google • Google’s Megastore – 2010: Hierarchical “schemas”; Positioned as a NoSQL store; ACID within partitions
  • 44. Digging into the evolution of Storage at Google • Google’s Megastore – 2010: “Supports two-phase commit for atomic updates [] these transactions have much higher latency and increase the risk of contention, we generally discourage applications from using the feature”
  • 45. Digging into the evolution of Storage at Google • Google’s Spanner & F1 – 2012 • Paper published in 2012: Hierarchical, Semi-relational Schemas; ACID across continents possible - 14ms transaction overhead in a data-center with clock uncertainty of 1ms; SQL; Focus on Performance (Gated by Clock Uncertainty; Consensus: Paxos)
  • 46. What Differentiates Google Spanner? • Transactions are only possible because of Paxos • Forget NTP, Google has “Reified Clock Uncertainty” • Epsilon, clock uncertainty, is the gating factor for gaining consensus on transaction timestamps. • It’s all about Time • “as the underlying system enforces tighter bounds on clock uncertainty, the overhead of the stronger semantics decreases. As a community, we should no longer depend on loosely synchronized clocks and weak time APIs in designing distributed algorithms.”
  • 47. Let me reiterate: Google has Mastered Time
  • 48. What Differentiates Google Spanner? • Hierarchical, Schematized Tables • Similar to Akiban’s approach. • Leads to some interesting possibilities. • Nested Subqueries and Tree Results
  • 49. What Differentiates Google Spanner? To reiterate: • hierarchical, schematized tables • distributed “compute fabric” for data • Google has mastered Time • Google built a warp reactor
  • 50. As goes Google so does the world... • Translattice • Drawn-to-Scale • Akiban • Impala • Several NewSQL companies quickly jumped on this train: NuoDB, VoltDB • Yes, we’ve had Hive for a while, but these new initiatives resemble a more robust effort.
  • 51. Translattice • Translattice identifies itself as a database that resembles F1 • It is a hosted database service which provides distributed transactions. • Translattice uses Paxos • They’ve extended PostgreSQL and emphasize customer control over data. • A distributed, cloud-based database
  • 52. Akiban • Akiban’s approach to storage almost *exactly* matches the strategy Google uses in Spanner. • Akiban lacks the distributed transaction capability of Spanner and F1, but they are working on developing the capability. • Akiban has implemented a query parser, optimizer, and execution engine atop a hierarchical approach to storage.
  • 53. Drawn-to-Scale • Reports: the most similar to F1 in the market. Fault-tolerant in distributed environments • Created a Query Parser + Optimizer + Execution Engine atop a distributed “compute fabric” • No Paxos or Transactions... yet. To be released, shortly. Stay tuned. • Drawn to Scale aims to be an “installable” database. Not going the hosted route. • Data stored in HDFS/HBase.
  • 54. So there. Big Data is turning into a Big Relational Database
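
Slide 46 calls epsilon, the clock uncertainty, the gating factor for agreeing on transaction timestamps. A rough way to picture that is the "commit wait" idea from the Spanner paper: the clock API returns an interval rather than a single instant, and a commit only becomes visible once its timestamp is guaranteed to be in the past. The sketch below is a toy, single-process illustration under stated assumptions, not Google's implementation: `TrueTimeSketch`, `TTInterval`, and `commit_wait` are hypothetical names, the uncertainty is a fixed constant, and Paxos replication and locking are left out entirely.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """An interval assumed to contain the true current time."""
    earliest: float
    latest: float


class TrueTimeSketch:
    """Hypothetical stand-in for a clock API that exposes its own uncertainty.

    epsilon_s is an assumed, fixed uncertainty bound in seconds. A real
    system would derive the bound from its time sources; this toy does not.
    """

    def __init__(self, epsilon_s, clock=time.monotonic):
        if epsilon_s <= 0:
            raise ValueError("epsilon_s must be positive")
        self.epsilon_s = epsilon_s
        self._clock = clock

    def now(self):
        # Report the local reading widened by epsilon on both sides.
        t = self._clock()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, timestamp):
        # True only once the timestamp has definitely passed.
        return self.now().earliest > timestamp


def commit_wait(tt, sleep=time.sleep):
    """Pick a commit timestamp, then block until it is definitely in the past.

    The wait is roughly 2 * epsilon, which is why tighter clock bounds
    translate directly into lower commit overhead.
    """
    commit_ts = tt.now().latest
    while not tt.after(commit_ts):
        sleep(tt.epsilon_s / 10)
    return commit_ts
```

Calling `commit_wait(TrueTimeSketch(epsilon_s=0.001))` should block for a little over two milliseconds before returning the chosen timestamp; raising `epsilon_s` lengthens the wait proportionally, which is the relationship the quoted passage on slide 46 describes.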