DB Revolution: 2nd Roundtable
Wednesday, March 14, 12
Eric Kavanagh
                          Eric.kavanagh@bloorgroup.com




                                          Twitter Tag: #briefr
To conduct an Open Research program that invites the participation of both IT users and technology vendors

To assist IT buyers in understanding database technology and the architecture that surrounds it

To allow audience members to pose serious questions... and get answers!

To publish all findings


Your Host: Eric Kavanagh
Research Leader: Mark Madsen - Third Nature
Primary Collaborator: Robin Bloor - The Bloor Group
Guest Analyst 1: Colin White - BI Research
Guest Analyst 2: Steve Dine - DataSource Consulting



Colin White is the president of DataBase Associates Inc. and founder of BI Research. He is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies. He has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. For ten years he was the conference chair of the DCI and Shared Insights Portals, Content Management, and Collaboration conference.




Big Data is Bigger than NoSQL

Colin White
President, BI Research
March 2012




What is Big Data?

A term that represents workloads and data management solutions that could not previously be supported because of cost considerations and/or technology limitations

Three important technologies:
    • Optimized analytic RDBMSs
    • Non-relational “NoSQL” systems
    • Stream processing systems



Copyright © BI Research, 2012
Big Data: The Business Case

Smarter Decisions
    • Analyze new sources of data, e.g., sensor data, web content, system logs, text, XML files, graph data, map data
    • More sophisticated analyses: advanced analytics

Faster Decisions
    • Support workloads that were previously difficult to implement in a timely or cost-effective manner
    • Faster data analysis, e.g., analysis of large detailed data stores and dramatically faster analytic model execution

Faster Time to Value
    • Analyze data that is outside the enterprise data warehouse, e.g., machine-generated data such as sensor data


Non-Relational Solutions

Some organizations have developed their own non-relational (NoSQL) systems to support extreme workloads
    • Google: MapReduce + Bigtable DBMS + Google File System

Non-relational systems are not new, but modern versions are often available to the open source community
    • Often support commodity hardware in a large-scale distributed computing environment
    • Several types of data stores (key-value, graph, document, indexed file/DB systems)
    • A key vendor focus area is the Hadoop distributed computing system



Hadoop versus an RDBMS

This debate is reminiscent of the object versus relational database debates of the 1980s, and the reasons are similar
    • Programmers prefer procedural approaches for accessing and manipulating data, e.g., MapReduce
    • Non-programmers prefer declarative languages, e.g., SQL in RDBMSs

Adding the SQL-like Hive language to Hadoop, and MapReduce (MR) functions to RDBMSs, however, complicates the debate

Key requirements are:
    • The ability for organizations to easily analyze large volumes of multi-structured data with good price/performance
    • Making the technologies for developing and running these analyses more usable by data scientists

Organizations will likely use both Hadoop and an RDBMS; the challenges are deciding which to use when, and interconnecting the systems
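To make the procedural-versus-declarative contrast concrete, here is a minimal sketch that counts page views per URL twice: once in a MapReduce style, where the programmer writes the map, shuffle, and reduce steps explicitly, and once declaratively in SQL through Python's built-in sqlite3 module. The click-log records and function names are hypothetical, and this is plain Python, not Hadoop's API.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical click-log records: (user_id, url)
LOG = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/cart"),
    ("u3", "/home"), ("u2", "/cart"), ("u3", "/checkout"),
]

def map_phase(record):
    """Map: emit a (key, 1) pair for the URL in one record."""
    _user, url = record
    yield (url, 1)

def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts for one key."""
    return key, sum(values)

def mapreduce_counts(records):
    # Procedural: the programmer spells out how data flows through each step.
    pairs = chain.from_iterable(map_phase(r) for r in records)
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in groups.items())

def sql_counts(records):
    # Declarative: state what result is wanted; the engine decides how.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_id TEXT, url TEXT)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", records)
    rows = conn.execute("SELECT url, COUNT(*) FROM clicks GROUP BY url").fetchall()
    conn.close()
    return dict(rows)

if __name__ == "__main__":
    print(mapreduce_counts(LOG))
    print(sql_counts(LOG))
```

Both functions return the same counts; the difference is who chooses the execution strategy, which is the heart of the programmer-versus-non-programmer preference described above.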


The Value of Big Data: McKinsey Report

www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation



Robin Bloor is Chief Analyst at The Bloor Group.

Robin.Bloor@Bloorgroup.com




The Hardware Landscape
    • CPUs go multicore
    • Memory/disk cost ratio falls
    • Random-read speeds lag serial-read speeds
    • Faster networking and fast switches
    • Parallelism becomes more important
    • Commodity servers
    • Cloud computing cuts H/W costs




That MapReduce Thing
    • There are two fundamental approaches to parallelism:
        • Data partitioning
        • Process partitioning
    • MapReduce implements an approach oriented to data partitioning
    • This relates to data processing rather than to database management
    • Hadoop is often used for ETL
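Data-partitioned parallelism, the approach MapReduce is oriented to, means splitting the data into partitions, running the same function on every partition independently, and then combining the partial results. The following is a minimal sketch using Python's standard concurrent.futures module, assuming a toy sum-of-squares job and a hypothetical default of four partitions. It illustrates the pattern only, not how Hadoop actually schedules tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Split data into n_parts contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def partial_sum_of_squares(chunk):
    # The same function runs on every partition, with no shared state.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_parts=4):
    chunks = partition(data, n_parts)
    # Threads keep the sketch simple and portable; in CPython, real speedup
    # for CPU-bound work would need processes or separate machines.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # Combine step: merge the per-partition results.
        return sum(pool.map(partial_sum_of_squares, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1, 1001))))
```

The design choice that makes this work is that each partition can be processed without seeing any other partition. Process partitioning, the other approach, would instead split the work into different stages or functions.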




The Devil Is In The Workload
    • NoSQL is a distraction
    • Big Data can be Big USDATA (unstructured) or Big SDATA (structured)
    • Unstructured workloads are rarely suited to traditional RDBMS-type engines
    • Analytical workloads span both

[Figure: database types placed by data volume (vertical axis) and degree of structure (More Structured to Less Structured): Big Table, XML Store, Column Store, Document Store, RDBMS Database, ODBMS Database]




If you don’t know the expected workloads, you shouldn’t be selecting a database




Steve Dine is the founder of Datasource Consulting, LLC. He has extensive experience delivering and managing successful, highly scalable and maintainable data integration and business intelligence solutions. Steve combines hands-on technical experience across the entire BI project lifecycle with strong business acumen. He currently works as a consultant for Fortune 500 companies. Steve is a faculty member at TDWI and a judge for the Annual TDWI Best Practices Awards. He teaches courses and presents on many BI topics.
Contact info: Twitter: @steve_dine | Email: sdine@datasourceconsulting.com | Web: http://www.datasourceconsulting.com




The State of NoSQL & BI
From the trenches…

“Hey Bob, seems like a no-brainer. So, what’s the catch?”

* Graphic from http://schriftman.wordpress.com/category/booksbook-reviews/c-s-lewis/page/2/
Confidential, Datasource Consulting, LLC
Why NoSQL?
    • More data
    • More types of data (semi-structured, unstructured)
    • More frequent changes to the structure of the data we need to store and analyze
    • More demand for long-tail analysis
    • More “affordable” commodity hardware available (blade servers, “cheap” storage, cloud)
    • More buzz!

* Graphic from http://www.fredberinger.com/musings-on-nosql/
Why Not NoSQL?
    • Relatively immature (0.x – 2.x)
    • Difficult to describe to decision makers
    • Not fit for some purposes (low latency, update-heavy workloads, complex joins)
    • In many organizations it’s a solution looking for a problem
    • Lack of “BI” support
    • Skills gap!

* Graphic based on http://www.fredberinger.com/musings-on-nosql/
BI-NoSQL Skills Gap

“SQL” Skills:
    • GUIs (mostly)
    • Relational data modeling
    • RDBMS
    • SQL
    • Stored procedures
    • LDAP
    • JavaScript
    • Batch/shell scripts

NoSQL Skills:
    • Command line
    • Key-value / column family modeling
    • Distributed data store
    • Programming (Java, JScript, Python, etc.)
    • MapReduce (Hive)
    • JSON
    • Shell scripts

* Graphic based on http://www.beckshome.com/index.php/2007/09/the-soa-chasm/
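The modeling skills listed above differ in a concrete way. Relational modeling normalizes data into tables that are joined at query time, while key-value / column family modeling denormalizes data under a row key chosen for the expected read path. The following minimal sketch models the same hypothetical customers and orders both ways. The nested dictionary stands in for a column-family store and is not any particular product's API.

```python
import sqlite3

# Hypothetical sample data
CUSTOMERS = [(1, "Acme"), (2, "Globex")]          # (id, name)
ORDERS = [(100, 1, 250.0), (101, 1, 75.5), (102, 2, 40.0)]  # (id, customer_id, amount)

def relational_orders_for(customer_name):
    """Relational model: normalized tables, joined when queried."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute(
        "SELECT o.id, o.amount FROM orders o "
        "JOIN customer c ON c.id = o.customer_id "
        "WHERE c.name = ? ORDER BY o.id",
        (customer_name,),
    ).fetchall()
    conn.close()
    return rows

def build_column_family():
    """Column-family style: one row per customer, keyed by name,
    with each order stored as a column named 'order:<id>'."""
    names = dict(CUSTOMERS)
    table = {}
    for order_id, customer_id, amount in ORDERS:
        row = table.setdefault(names[customer_id], {})
        row[f"order:{order_id}"] = amount
    return table

def column_family_orders_for(table, customer_name):
    # A single key lookup and no join: the data was laid out
    # for this access path when it was written.
    row = table.get(customer_name, {})
    return sorted((int(col.split(":")[1]), amt) for col, amt in row.items())
```

The trade-off is visible in the code: the column-family layout makes the anticipated query trivial but fixes the access path at write time, while the relational layout supports queries that were not anticipated.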
Conclusions?
    • Best to evaluate your true data size, data growth, data formats, data structure, and analytic requirements before deciding on a solution
    • Make sure to evaluate your available skills
        • Experienced NoSQL resources with BI experience are not always easy to find
    • Need to plan for additional technology risk in the project plan
        • Consider starting out with one part of your DW architecture (i.e., staging)
        • POC, POC, POC
    • NoSQL is maturing quickly and will likely continue to evolve into a hybrid solution
Mark Madsen is founder of Third Nature, a research and consulting firm focused on analytics, BI and decision-making. Mark has spent the past two decades working on analysis and decision support in many industries and countries. He is an award-winning architect and former CTO whose work has been featured in numerous industry publications. Over the past ten years Mark has received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institution. He is an international speaker, a contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.




One Size Doesn’t Fit All
Choosing which big data, NoSQL or database technology to use

March 14, 2012

Mark R. Madsen
http://ThirdNature.net




Big data?

Unstructured data isn’t really unstructured.
The problem is that this data is unmodeled.
The real challenge is complexity.
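The claim that “unstructured” data is usually just unmodeled can be illustrated with a web-server log line. The line arrives as free text, but a small amount of modeling recovers named, typed fields. The following is a minimal sketch, assuming log lines in the Common Log Format. The regular expression and field names are illustrative choices, not a standard parser.

```python
import re

# Illustrative pattern for a Common Log Format line:
# host ident user [timestamp] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    record = m.groupdict()
    # Apply a model to the text: status and size become integers.
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

if __name__ == "__main__":
    line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_clf(line))
```

The structure was in the text all along. What was missing was a model that names and types the fields. Lines that do not fit the model come back as None, which is where the real complexity tends to hide.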


The holy grail of databases under current market hype

A key problem is that when we talk about “big data” and analytics, we are mostly talking about computation over data, which is a potential mismatch for both relational and NoSQL systems.
Solving the Problem Depends on the Diagnosis




You must understand your workload: throughput and response time requirements aren’t enough.
    ▪ 100 simple queries accessing month-to-date data
    ▪ 90 simple queries accessing month-to-date data plus 10 complex queries using two years of history
    ▪ Hazard calculation for the entire customer master
    ▪ Performance problems are rarely due to a single factor.
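The first two workloads above issue the same number of queries, yet they demand very different amounts of data. The following is a rough worked comparison, assuming that each simple query scans one month of data and each complex query scans all 24 months of history. These per-query figures are hypothetical simplifications, and the class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryClass:
    count: int           # number of queries of this kind
    months_scanned: int  # months of data each query touches (assumed)

def total_queries(workload):
    return sum(q.count for q in workload)

def total_months_scanned(workload):
    # Data volume touched, in "month-scans": queries x months per query.
    return sum(q.count * q.months_scanned for q in workload)

# Hypothetical workloads mirroring the first two examples on the slide.
WORKLOAD_A = [QueryClass(count=100, months_scanned=1)]
WORKLOAD_B = [QueryClass(count=90, months_scanned=1),
              QueryClass(count=10, months_scanned=24)]

if __name__ == "__main__":
    for name, w in [("A", WORKLOAD_A), ("B", WORKLOAD_B)]:
        print(name, total_queries(w), total_months_scanned(w))
```

Under these assumptions both workloads run 100 queries, but workload B touches 3.3 times as much data (330 versus 100 month-scans). This is why a query-count or throughput figure alone does not describe the workload.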
  


Workload: One big query or many small queries?




Retrieval: small return set or large?
Selectivity: large volume of data scanned or small?
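Selectivity shows up directly in how a database executes a query. The following minimal sketch uses SQLite through Python's built-in sqlite3 module on a hypothetical sales table. A selective lookup on an indexed column can use the index, while an aggregate over all rows has to scan the whole table. The exact wording of SQLite's plan text varies between versions, so the sketch only looks for the keywords.

```python
import sqlite3

def build_sales_db(n_days=365, rows_per_day=100):
    """Hypothetical sales table: one row per (day, store), indexed on day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day INTEGER, store INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        ((d, s, 1.0) for d in range(n_days) for s in range(rows_per_day)),
    )
    conn.execute("CREATE INDEX idx_sales_day ON sales (day)")
    return conn

def plan(conn, sql, params=()):
    """Join the 'detail' column (the last one) of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

if __name__ == "__main__":
    conn = build_sales_db()
    # Selective: one day out of 365, so a small volume of data is scanned.
    print(plan(conn, "SELECT SUM(amount) FROM sales WHERE day = ?", (42,)))
    # Not selective: every row contributes, so the table is scanned.
    print(plan(conn, "SELECT store, SUM(amount) FROM sales GROUP BY store"))
```

Both queries return small result sets, but they differ sharply in the volume of data scanned. This is why retrieval size and selectivity are listed as separate workload questions.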
Important workload parameters to know
    • Read-intensive vs. write-intensive
    • Mutable vs. immutable data
    • Immediate vs. eventual consistency
    • Short vs. long access latency
    • Predictable vs. unpredictable data access patterns
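The immediate-versus-eventual consistency parameter is easiest to see in a replicated store. The following toy sketch assumes one primary and one replica, with replication applied only when explicitly triggered as a stand-in for network delay. Reading from the primary is used here as a simplified model of immediate consistency. The class and method names are hypothetical and do not model any specific database.

```python
from collections import deque

class ReplicatedStore:
    """Toy key-value store: writes go to the primary and are shipped
    to a single replica asynchronously (here, only when replicate() runs)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()  # replication log not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key, consistency="eventual"):
        # "immediate": read from the primary, which always has the latest write.
        # "eventual": read from the replica, which may return stale data.
        source = self.primary if consistency == "immediate" else self.replica
        return source.get(key)

    def replicate(self):
        """Apply all pending writes to the replica, in order."""
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value

if __name__ == "__main__":
    store = ReplicatedStore()
    store.write("balance", 100)
    print(store.read("balance", "immediate"))  # the latest value
    print(store.read("balance", "eventual"))   # not yet replicated
    store.replicate()
    print(store.read("balance", "eventual"))   # replica has caught up
```

Whether a workload can tolerate the stale read in the middle is exactly the “eventual consistency OK” question. If it cannot, reads must go to an up-to-date copy, which limits how far reads can be spread across replicas.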




Types of workloads

Write-biased:
    ▪ OLTP
    ▪ OLTP, batch
    ▪ OLTP, lite
    ▪ Object persistence
    ▪ Data ingest, batch
    ▪ Data ingest, real-time

Read-biased:
    ▪ Query
    ▪ Query, simple retrieval
    ▪ Query, complex
    ▪ Query-hierarchical / object / network
    ▪ Analytic

Mixed?
Inline analytic execution, operational BI

Matching to parameters, assuming data scale

[Chart: Standard RDBMS, Parallel RDBMS, NoSQL (kv, dht, obj), Hadoop*, and Streaming database rated against the workload parameters write-biased, read-biased, updateable data, eventual consistency OK, unpredictable query path, and compute-intensive]

You see the problem: it’s an intersection of multiple parameters, and this chart only includes the first tier of parameters. Plus, workload factors can completely invert these general rules of thumb.
Matching to parameters, assuming data scale

[Chart: Standard RDBMS, Parallel RDBMS, NoSQL (kv, dht, obj), Hadoop, and Streaming database rated against the workload parameters complex queries, selective queries, low-latency queries, high concurrency, and high ingest rate]

You have to look at the combination of workload factors (data scale, concurrency, latency and response time), then chart the parameters.

Always build a proof of concept!




Dissection & Discussion




March:
    Vendor Research
    March 14th: Second Roundtable, focusing on NoSQL databases and their application
    DB Revolution Survey conducted

April:
    Vendor Research
    Publishing of Roundtable Transcripts, with comments

May:
    Authoring of White Paper
    Publishing of White Paper
    Publishing of survey activity



March Briefing Room: Integration

April Briefing Room: Discovery

May Briefing Room: Analytics




Thank You For Your Attention



Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Roundtable 2: Big Data Analytics and NoSQL

  • 1. DB Revolution: 2nd Roundtable Wednesday, March 14, 12
  • 2. Eric Kavanagh, Eric.kavanagh@bloorgroup.com Twitter Tag: #briefr
  • 3. To conduct an Open Research program that invites the participation of both IT users and technology vendors. To assist IT buyers in understanding database technology and the architecture that surrounds it. To allow audience members to pose serious questions... and get answers! To publish all findings.
  • 4. Your Host: Eric Kavanagh; Research Leader: Mark Madsen - Third Nature; Primary Collaborator: Robin Bloor - The Bloor Group; Guest Analyst 1: Colin White - BI Research; Guest Analyst 2: Steve Dine - Datasource Consulting
  • 5. Colin White is the president of DataBase Associates Inc. and founder of BI Research. He is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies. He has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. For ten years he was the conference chair of the DCI and Shared Insights Portals, Content Management, and Collaboration conference.
  • 6. Big Data is Bigger than NoSQL. Colin White, President, BI Research, March 2012
  • 7. What is Big Data? A term that represents workloads and data management solutions that could not previously be supported because of cost considerations and/or technology limitations. Three important technologies: • Optimized analytic RDBMSs • Non-relational “NoSQL” systems • Stream processing systems. Copyright © BI Research, 2012
  • 8. Big Data: The Business Case. Smarter Decisions: • Analyze new sources of data, e.g., sensor data, web content, system logs, text, XML files, graph data, map data, etc. • More sophisticated analyses (advanced analytics). Faster Decisions: • Supports workloads that were previously difficult to implement in a timely or cost-effective manner • Faster data analysis, e.g., analysis of large detailed data stores, dramatically faster analytic model execution. Faster Time to Value: • Analyze data that is outside of the enterprise data warehouse, e.g., machine-generated data such as sensor data.
  • 9. Non-Relational Solutions. Some organizations have developed their own non-relational (NoSQL) systems to support extreme workloads • Google: MapReduce + BigTable DBMS + Google File System. Non-relational systems are not new, but modern versions are often available to the open source community • Often support commodity hardware in a large-scale distributed computing environment • Several types of data stores (key-value, graph, document, indexed file/DB systems) • A key vendor focus area is the Hadoop distributed computing system.
  • 10. Hadoop versus an RDBMS. This debate is reminiscent of the object versus relational database debates of the 1980s, and the reasons are similar: • Programmers prefer procedural, programmatic approaches for accessing and manipulating data, e.g., MapReduce • Non-programmers prefer declarative languages, e.g., SQL on an RDBMS. Adding the SQL-like Hive language to Hadoop, and MapReduce (MR) functions to RDBMSs, however, complicates the debate. Key requirements are: • The ability for organizations to easily analyze large volumes of multi-structured data with good price/performance • The need to make technologies for developing and running these analyses more usable by data scientists. Organizations will likely use both Hadoop and an RDBMS; the challenges are deciding which to use when, and interconnecting the systems.
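To make the procedural-versus-declarative contrast concrete, here is a minimal sketch that computes the same per-region totals two ways: once declaratively with SQL (using Python's built-in sqlite3 as a stand-in for an RDBMS) and once procedurally in a map / shuffle / reduce style. The sample rows and function names are hypothetical, and this is a single-machine illustration of the two programming styles, not Hadoop or any vendor's API.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (order_id, region, amount)
ROWS = [(1, "east", 10), (2, "west", 5), (3, "east", 7), (4, "north", 3), (5, "west", 1)]

def declarative_totals(rows):
    """Declarative style: say WHAT you want in SQL; the engine decides HOW."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
    conn.close()
    return result

def procedural_totals(rows):
    """Procedural, MapReduce-like style: spell out each step yourself."""
    # Map: project each record to a (key, value) pair
    mapped = [(region, amount) for _order_id, region, amount in rows]
    # Shuffle: group the values by key
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: aggregate each group
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    print(declarative_totals(ROWS))
    print(procedural_totals(ROWS))
```

Both functions return the same totals; the difference is who owns the execution plan. Hive and in-database MR functions blur exactly this line, which is why the debate is harder to settle than it first appears.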
  • 12. The Value of Big Data: McKinsey Report www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
  • 13. Robin Bloor is Chief Analyst at The Bloor Group. Robin.Bloor@Bloorgroup.com
  • 14. The Hardware Landscape: • CPUs go multicore • Memory/disk cost ratio falls • Speed of random reads lags speed of serial reads • Faster networking and fast switches • Parallelism becomes more important • Commodity servers • Cloud computing cuts H/W costs
  • 15. That MapReduce Thing. There are two fundamental approaches to parallelism: data partitioning and process partitioning. MapReduce implements an approach that is oriented to data partitioning. This relates to data processing rather than to databases. Hadoop is often used for ETL.
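A minimal sketch of data-partitioned parallelism in the MapReduce style, using word count as the example: the input is split into partitions, every worker runs the same map function on its own partition, and a reduce step merges the partial results. The function names and the round-robin partitioning are illustrative assumptions; Python threads are used only to show the pattern (they give little real speedup for CPU-bound work), and this is not Hadoop's API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(records, n):
    """Split the input into n data partitions (round-robin assignment)."""
    return [records[i::n] for i in range(n)]

def map_partition(lines):
    """Map step: a worker counts words in its own partition only."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(records, workers=3):
    """Same code runs on every partition; only the data is divided."""
    parts = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, parts))
    return reduce_counts(partials)
```

Usage: `word_count(["big data is big", "nosql is not big data"])` counts "big" three times, regardless of which partition each line landed in. In process partitioning, by contrast, different workers would run different stages or functions over the same data.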
  • 16. The Devil Is In The Workload. [Figure: data volume (vertical axis) against degree of structure (horizontal axis, from more structured to less structured), placing RDBMS database, table store, column store, ODBMS database and document store, with labels including Big Data and Big XML Data.] NoSQL is a distraction. Unstructured workloads are rarely suited to traditional RDBMS-type engines. Analytical workloads span both.
  • 17. If you don’t know the expected workloads, you shouldn’t be selecting a database.
  • 18. Steve Dine is the founder of Datasource Consulting, LLC. He has extensive experience delivering and managing successful, highly scalable and maintainable data integration and business intelligence solutions. Steve combines hands-on technical experience across the entire BI project lifecycle with strong business acumen. He currently works as a consultant for Fortune 500 companies. Steve is a faculty member at TDWI and a judge for the Annual TDWI Best Practices Awards. He teaches courses and presents on many BI topics. Contact info: Twitter: @steve_dine; Email: sdine@datasourceconsulting.com; Web: http://www.datasourceconsulting.com
  • 19. The State of NoSQL & BI: From the trenches… “Hey Bob, seems like a no-brainer. So, what’s the catch?” (Graphic from http://schri@man.wordpress.com/category/booksbook-reviews/c-s-lewis/page/2/) Confidential, Datasource Consulting, LLC
  • 20. Why NoSQL? More  data More  different  types  of  data  (semi-­‐structured,   unstructured) More  frequent  changes  to  the  structure  of  the  data  we   need  to  store  and  analyze More  demand  for  the  long  tail  analysis More  “affordable”,  commodity  hardware  available  (blade   servers,  “cheap”  storage,  cloud) More  buzz! *  Graphic  from  h=p://www.fredberinger.com/musings-­‐on-­‐nosql/ Confiden)al,  Datasource  Consul)ng,  LLC 20 Wednesday, March 14, 12
  • 21. Why Not Not NoSQL? RelaCvely  immature  (0.x  –  2.x) Difficult  to  describe  to  decision  makers Not  fit  for  purpose  (low  latency,  update  heavy,  complex   joins) In  many  organizaCons  it’s  a  soluCon  looking  for  a  problem   Lack  of  “BI”  support Skills  gap! *  Graphic  based  on  h=p://www.fredberinger.com/musings-­‐on-­‐nosql/ Confiden)al,  Datasource  Consul)ng,  LLC 21 Wednesday, March 14, 12
  • 22. BI-NoSQL Skills Gap. “SQL” skills: • GUIs (mostly) • Relational data modeling • RDBMS • SQL • Stored procedures • LDAP • JavaScript • Batch/shell scripts. NoSQL skills: • Command line • Key-value / column-family modeling • Distributed data store • Programming (Java, JScript, Python, etc.) • MapReduce (Hive) • JSON • Shell scripts. (Graphic based on http://www.beckshome.com/index.php/2007/09/the-soa-chasm/)
  • 23. Conclusions? • Best to evaluate your true data size, data growth, data formats, data structure and analytic requirements before deciding on a solution • Make sure to evaluate your available skills • Experienced NoSQL resources with BI experience are not always easy to find • Need to plan for additional technology risk in the project plan • Consider starting out with one part of your DW architecture (e.g., staging) • POC, POC, POC • NoSQL is maturing quickly and will likely continue to evolve into a hybrid solution
  • 24. Mark Madsen is founder of Third Nature, a research and consulting firm focused on analytics, BI and decision-making. Mark has spent the past two decades working on analysis and decision support in many industries and countries. He is an award-winning architect and former CTO whose work has been featured in numerous industry publications. Over the past ten years Mark has received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institution. He is an international speaker, a contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.
  • 25. One Size Doesn’t Fit All: Choosing which big data, NoSQL or database technology to use. March 14, 2012. Mark R. Madsen, http://ThirdNature.net
  • 27. Big data? Unstructured data isn’t really unstructured. The problem is that this data is unmodeled. The real challenge is complexity.
  • 28. The holy grail of databases under current market hype. A key problem is that when we talk about “big data” and analytics, we’re mostly talking about computation over data, a potential mismatch for both relational and NoSQL systems.
  • 29. Solving the Problem Depends on the Diagnosis
  • 30. You must understand your workload - throughput and response time requirements aren’t enough. ▪ 100 simple queries accessing month-to-date data ▪ 90 simple queries accessing month-to-date data plus 10 complex queries using two years of history ▪ Hazard calculation for the entire customer master ▪ Performance problems are rarely due to a single factor.
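A back-of-the-envelope sketch of why query counts alone can mislead, using the first two example workloads. It assumes, for illustration only, that "month-to-date" means scanning about one month of data and that "two years of history" means 24 months; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumptions for illustration: month-to-date ~ 1 month, two years = 24 months.
MONTH_TO_DATE = 1
TWO_YEARS = 24

@dataclass
class QueryClass:
    count: int           # number of queries of this kind
    months_scanned: int  # months of data each query reads

def query_count(workload):
    """Total number of queries in the workload."""
    return sum(q.count for q in workload)

def total_months_scanned(workload):
    """Total data volume scanned, in month-sized units."""
    return sum(q.count * q.months_scanned for q in workload)

workload_a = [QueryClass(100, MONTH_TO_DATE)]
workload_b = [QueryClass(90, MONTH_TO_DATE), QueryClass(10, TWO_YEARS)]
```

Under these assumptions both workloads issue 100 queries, but workload A scans 100 month-units of data while workload B scans 90 + 10 × 24 = 330, about 3.3 times as much, even though only 10% of its queries changed.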
  • 31. Workload: One big query or many small queries? Retrieval: small return set or large? Selectivity: large volume of data scanned or small?
  • 36. Important workload parameters to know: • Read-intensive vs. write-intensive • Mutable vs. immutable data • Immediate vs. eventual consistency • Short vs. long access latency • Predictable vs. unpredictable data access patterns
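Of these parameters, immediate versus eventual consistency is often the least intuitive. A toy sketch, assuming a single primary with one asynchronously updated replica (the class is hypothetical and models no particular product): a read served by the replica can return stale data until replication catches up.

```python
class ReplicatedStore:
    """Toy primary/replica store with asynchronous (eventual) replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes accepted by the primary, not yet on the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply queued writes to the replica in the order they were made."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read(self, key, consistent=False):
        # consistent=True: read the primary (immediate consistency).
        # consistent=False: read the replica, which may lag (eventual consistency).
        source = self.primary if consistent else self.replica
        return source.get(key)
```

Usage: after `store.write("balance", 100)`, `store.read("balance")` returns `None` until `store.replicate()` runs, while `store.read("balance", consistent=True)` returns `100` immediately. Whether a workload can tolerate that window is exactly the question this parameter asks.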
  • 37. Types of workloads. Write-biased: ▪ OLTP ▪ OLTP, batch ▪ OLTP, lite ▪ Object persistence ▪ Data ingest, batch ▪ Data ingest, real-time. Read-biased: ▪ Query ▪ Query, simple retrieval ▪ Query, complex ▪ Query, hierarchical / object / network ▪ Analytic. Mixed? Inline analytic execution, operational BI.
  • 38. Matching to parameters, at assumption of data scale. [Chart rating Standard RDBMS, Parallel RDBMS, NoSQL (kv, dht, obj), Hadoop* and Streaming database against the workload parameters: write-biased, read-biased, updateable data, eventual consistency OK, unpredictable query path, compute intensive.] You see the problem: it’s an intersection of multiple parameters, and this chart only includes the first tier of parameters. Plus, workload factors can completely invert these general rules of thumb.
  • 39. Matching to parameters, at assumption of data scale. [Chart rating Standard RDBMS, Parallel RDBMS, NoSQL (kv, dht, obj), Hadoop and Streaming database against further workload parameters: complex queries, selective queries, low-latency queries, high concurrency, high ingest rate.] You have to look at the combination of workload factors: data scale, concurrency, latency and response time, then chart the parameters.
  • 40. Always build a proof of concept!
  • 41. Dissection & Discussion
  • 43. March: Vendor research; March 14th: Second Roundtable, focusing on NoSQL databases and their application; DB Revolution Survey conducted. April: Vendor research; publishing of Roundtable transcripts, with comments. May: Authoring of white paper; publishing of white paper; publishing of survey activity.
  • 44. March Briefing Room: Integration; April Briefing Room: Discovery; May Briefing Room: Analytics
  • 45. Thank You For Your Attention