Bigdata vs. Data Warehousing
     Synergy or Conflict?



          Thomas Kejser
        thomas@kejser.org
       http://blog.kejser.org
          @thomaskejser
Who is this Guy?


Thomas Kejser
http://blog.kejser.org
@thomaskejser

• Formerly: Lead SQLCAT EMEA
• Now:      CTO FusionIo EMEA

• 15 years of database experience
• Performance Tuner
Human Consciousness Doesn’t Scale
[Chart: world population in billions (axis 5 to 10) by year, 2000 to 2250. Source: United Nations Projections]
Text Messages in a Table

CREATE TABLE AllTexts (
    Sender BIGINT                 --   8 B
    , Receiver BIGINT             --   8 B
    , SenderLocation BIGINT       --   8 B
    , ReceiverLocation BIGINT     --   8 B
    , Time DATETIME               --   8 B
    , SMS VARCHAR(140)            -- 140 B
)
-- Total: 180 bytes per row
How much do we text?

• World Average
    •   6.1 Trillion Text Messages / year
    •   About 80% cell phone coverage
    •   7 billion people
    •   About 3 messages/day per person with a phone
• But:
    • Teenagers: 50 messages/day




Source: Pew Internet Research 2010 & ITU
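The world-average rate can be reproduced from the other three figures on this slide. This is a minimal sketch that assumes the average is the yearly volume divided by the covered population and by a 365-day year; the slide does not state how it was derived, and the names are illustrative only.

```python
# Figures quoted on the slide (Pew Internet Research 2010 & ITU).
TEXTS_PER_YEAR = 6.1e12    # 6.1 trillion text messages per year
WORLD_POPULATION = 7e9     # 7 billion people
PHONE_COVERAGE = 0.80      # about 80% cell phone coverage
DAYS_PER_YEAR = 365        # assumption: non-leap year


def texts_per_subscriber_per_day() -> float:
    """Average daily messages per person who has a cell phone."""
    subscribers = WORLD_POPULATION * PHONE_COVERAGE
    return TEXTS_PER_YEAR / subscribers / DAYS_PER_YEAR


print(f"{texts_per_subscriber_per_day():.2f} messages/day per subscriber")
```

Under these assumptions the result rounds to the 3 messages/day shown above.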
How much will we EVER text?

• 9B people acting like teenagers (in 2050)
  • 50 texts/day
• That’s 450 billion texts/day
  • 164 Trillion texts/year (about 27x today)
  • 180 bytes each
  • Assume x3 compression
• Approximation: 10 Petabytes/year in 2050
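The estimate above can be checked step by step. The sketch below uses only the slide's inputs; the 365-day year and the decimal petabyte (10^15 bytes) are assumptions, and the function name is illustrative.

```python
PEOPLE_2050 = 9e9               # 9 billion people
TEXTS_PER_PERSON_PER_DAY = 50   # everyone texting like a teenager
BYTES_PER_TEXT = 180            # row size of the AllTexts table
COMPRESSION_RATIO = 3           # "assume x3 compression"
DAYS_PER_YEAR = 365             # assumption: non-leap year
PETABYTE = 1e15                 # assumption: decimal petabyte


def yearly_text_volume_pb() -> float:
    """Compressed storage needed per year, in petabytes."""
    texts_per_day = PEOPLE_2050 * TEXTS_PER_PERSON_PER_DAY   # 450 billion
    texts_per_year = texts_per_day * DAYS_PER_YEAR           # about 164 trillion
    raw_bytes = texts_per_year * BYTES_PER_TEXT
    return raw_bytes / COMPRESSION_RATIO / PETABYTE


print(f"{yearly_text_volume_pb():.2f} PB/year")
```

The result is just under 10 PB, which matches the slide's approximation.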
Moore’s Hard Drives


[Chart: hard drive capacity in GB (log scale) by year]

Can it be done?
How Large is this/year?



Hard Disk (4TB): 2.5”
Wine Bottle (75cl): 4.0”



            About 1500 Wine Bottles
In the Data Center

• Calculating:
  • 2U Storage = 24 Disks (includes compute)
  • 4TB per Disk
  • 100TB in 2U (a bit less)
  • 10PB = 200U storage
• About six racks
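The rack estimate can be sketched as follows. The disk count and capacity come from the slide; the rack height and the number of usable units per rack are assumptions, since the slide does not state them.

```python
import math

DISKS_PER_2U = 24     # from the slide
TB_PER_DISK = 4       # from the slide
FULL_RACK_U = 42      # assumption: standard full-height rack
USABLE_RACK_U = 36    # assumption: some units reserved for switches and power


def rack_units_needed(target_pb: float) -> int:
    """Rack units of 2U storage chassis needed for the target capacity."""
    tb_per_chassis = DISKS_PER_2U * TB_PER_DISK   # 96 TB, "a bit less" than 100 TB
    chassis = math.ceil(target_pb * 1000 / tb_per_chassis)
    return chassis * 2


def racks_needed(rack_units: int, usable_u_per_rack: int) -> int:
    """Racks needed when only part of each rack holds storage."""
    return math.ceil(rack_units / usable_u_per_rack)


units = rack_units_needed(10)
print(units, racks_needed(units, FULL_RACK_U), racks_needed(units, USABLE_RACK_U))
```

With exact 96 TB chassis the total is slightly above the rounded 200U. Fully packed 42U racks would need five; reserving a few units per rack gives the slide's figure of about six.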
Warehouses Serve Us Well…
… And it is Becoming a Commodity

• Good Management Interfaces
• Standard SQL
  • with a few extensions
• Appliances
• Support system
• Homogeneous HW
  • In chunks
vs.
PDW vs. Hive – Scan/seek
Query 1:
SELECT count(*)
FROM lineitem

Query 2:
SELECT max(l_quantity)
FROM lineitem
WHERE l_orderkey > 1000
  AND l_orderkey < 100000
GROUP BY l_linestatus

[Chart: elapsed seconds (axis 0 to 1500) for Query 1 and Query 2, Hive vs. PDW]
PDW vs. Hive - Joins
SELECT max(l_orderkey)
FROM orders
JOIN lineitem
ON l_orderkey = o_orderkey

PDW-U:
• orders partitioned on o_custkey
• lineitem partitioned on l_partkey
PDW-P:
• orders partitioned on o_orderkey
• lineitem partitioned on l_orderkey

[Chart: elapsed seconds (axis 0 to 4000) for Hive, PDW-U and PDW-P]
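Why the partitioning choice matters for this join can be illustrated with a toy model. This is a sketch, not how PDW or Hive place data: it assumes simple modulo partitioning over a hypothetical four-node cluster, uses made-up rows, and writes the orders customer key as o_custkey (the TPC-H column name), which is an assumption about what the slide means.

```python
NODES = 4  # hypothetical cluster size


def partition(rows, column, nodes=NODES):
    """Assign each row to a node by its partition column modulo the node count."""
    placement = {node: [] for node in range(nodes)}
    for row in rows:
        placement[row[column] % nodes].append(row)
    return placement


def lineitems_to_shuffle(orders_by_node, lineitem_by_node):
    """Count lineitem rows stored on a different node than their matching order."""
    order_home = {
        order["o_orderkey"]: node
        for node, rows in orders_by_node.items()
        for order in rows
    }
    return sum(
        1
        for node, rows in lineitem_by_node.items()
        for item in rows
        if order_home[item["l_orderkey"]] != node
    )


# Made-up sample rows; only the column names follow the slide.
orders = [{"o_orderkey": k, "o_custkey": (k * 7) % 5} for k in range(1, 9)]
lineitem = [
    {"l_orderkey": k, "l_partkey": k * 3 + p}
    for k in range(1, 9)
    for p in range(2)
]

# PDW-P layout: both tables partitioned on the join key.
aligned = lineitems_to_shuffle(
    partition(orders, "o_orderkey"), partition(lineitem, "l_orderkey")
)
# PDW-U layout: both tables partitioned on columns unrelated to the join.
unaligned = lineitems_to_shuffle(
    partition(orders, "o_custkey"), partition(lineitem, "l_partkey")
)
print(aligned, unaligned)
```

When both tables are partitioned on the join key, every lineitem row already sits next to its order and nothing has to move. With unrelated partition columns, rows must cross the network before they can be joined.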
What does Big Data need to Catch up?

• Thread startup times
• Co-location awareness
• Files vs. optimized DB memory structures
• Column stores and other DB tech

Generic is good…
… but when there is structure, make use of it!
What is Bigdata?
Very Unstructured Data
How many Pictures of Cats?

• Flickr Today:
  • 300MB/month
  • 2GB/year
  • 51M users (too small?)


• Estimate: 102 PB / year

• 10 x text messages

Source: Wikipedia
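The 102 PB figure appears to be the user count multiplied by the yearly volume per user. The sketch below makes that assumption explicit; it also assumes decimal units (1 PB = 1,000,000 GB) and reuses the roughly 10 PB/year text-message estimate from earlier in the deck for comparison.

```python
FLICKR_USERS = 51_000_000     # 51M users, from the slide
GB_PER_USER_PER_YEAR = 2      # 2 GB/year, from the slide
GB_PER_PB = 1_000_000         # assumption: decimal units
TEXT_MESSAGES_PB = 10         # earlier estimate for text messages in 2050


def flickr_volume_pb() -> float:
    """Yearly picture volume in petabytes if every user uploads 2 GB."""
    return FLICKR_USERS * GB_PER_USER_PER_YEAR / GB_PER_PB


volume = flickr_volume_pb()
print(f"{volume:.0f} PB/year, {volume / TEXT_MESSAGES_PB:.1f}x text messages")
```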
How big is this in wine bottles?
We have learned how to store it!
What is HDFS?

• Distributed File System
• Open Source
• No more SAN

• The Failure Unit is the Server
Fully unstructured data is boring

…Unless you get money for storing it
Acquiring Personal Information




Your Semi-structured Data, the Old Fashioned Way
The Social Angle

Who do you talk to and how often?
The Reasons

Why do you own a cell phone?
Saturday, 1:39am   - at The Pub




Your Semi-structured Data, For Free
Big Value

Extraction of meaning and insight
from semi-structured data
Extracting Meaning from Humans

Method                             Examples
Turn semi-structure to structure   Image recognition, network proximity and super nodes, social media
Needle in a haystack               Extract outliers, Fraud
Herd behaviors                     Clustering, Pattern Recognition, “Customers who bought this also bought”
Text classification and search     Text indexes, syntactic counting, pagerank
Text to structure                  Semantic analysis, loose structure into structure
Find New Customers



“Michael, who is respected among his peers, often talks about his new, cool gadgets”

[Diagram: social graph with nodes Tommy, Thomas and Michael]
Cross Sell




“Families who own an Aston Martin will often buy a Mini Cooper too”
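A statement like this can be backed by a simple co-ownership ratio. The sketch below computes the confidence of a rule "owns A, so also owns B" over a handful of made-up households; the data, the function name, and the choice of confidence as the measure are all assumptions for illustration.

```python
def confidence(households, antecedent, consequent):
    """Share of households owning `antecedent` that also own `consequent`."""
    owners = [cars for cars in households if antecedent in cars]
    if not owners:
        return 0.0
    return sum(1 for cars in owners if consequent in cars) / len(owners)


# Hypothetical households, each described by the set of cars it owns.
households = [
    {"Aston Martin", "Mini Cooper"},
    {"Aston Martin", "Mini Cooper"},
    {"Aston Martin"},
    {"Mini Cooper"},
    {"Volvo"},
]

share = confidence(households, "Aston Martin", "Mini Cooper")
print(f"{share:.0%} of Aston Martin households also own a Mini Cooper")
```

On real data the same ratio would be computed per pair of products, and a high value would flag a cross-sell candidate.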
Free Information
Need: Lots of CPU Cores!
Need: Data Centers!
Provisioning has to be REALLY fast
Things to Learn for the Future

• Get good at
  • Statistics (again)
  • Distributed Algorithms
  • Tuning
• Understand Physical Constraints
• Acquire deep domain knowledge
Something is Changing


      Today                             Tomorrow




     CAPEX Hardware     OPEX Hardware       You
The Mother of All Stovepipes
Big Data / Staging (No Model)

[Diagram labels: Data you are afraid to lose; Data you actually need; Delivery (Model)]
Synergy




[Diagram labels: “Create Structure for me”; “Here is a table”; Warehouse]
Applying Social Media to Structure
Summary

    Data Warehouse                     Big Data

•   There is a model                   •   Don’t bother modeling!
•   Seek Co-location                   •   Optional Co-Location
•   Respond in seconds                 •   Respond in minutes
•   Calculate first, query after       •   Calculate while querying
•   Expensive HW                       •   Cheap HW
•   Optimise for target HW             •   Good enough on all HW
•   Homogeneous HW                     •   Heterogeneous HW
•   Pay vendor, expect optimised       •   Free license, optimise yourself


Editor's Notes

  1. We are at the end of the growth curve. 9B is our total population. This is an important observation because many data estimates are based on human activity and have so far assumed exponential growth. This is NOT the case anymore!
  2. This shows the development of hard drive capacity over time.
  3. The calculation is not meant to be read; it just lets people know we did the calculation and what it PHYSICALLY means (see the animation). There is a real cost to storing a lot of data, and this is one of the reasons cloud makes a lot of sense.
  4. This is Hyde Park, from one end to the other.