SlideShare una empresa de Scribd logo
1 de 25
It’s Not About the “Big”
             in Big Data
$26.5 BILLION




  Nick Leeson      Jerome Kerviel      Kweku Adoboli      Bruno Iksil
   Barings Bank     Societe Generale         UBS              JPMC
    $1.4 billion       $6 billion         $1.4 billion     $17.5 Billion




          Nasa                          $17.6B

Rogue Traders                                            $26.5B
Hadoop in 2009
Hadoop 2012
Hype Cycle




        We are
         here
Crossing the Chasm




        We are
         here
Hadoop - Enabler
Database Harddrive




                              Unstructured ( 61.7% growths )




       time to find one record        =    logb N * 10ms
       log100(100,000,000) * 10ms    =    40ms
       time to read record           =    10ms
       10,000,000 * 50ms             =    5.8 days
Hadoop Harddrive




      throughput                =   10MB/s
      time to transfer record   =   10ms
      10,000,000 * 10ms         =   1.5 days
      random reads              =   (5.8 days)
Laws of Physics



             Random
             Sequential                                                             Values/Sec.
                                316
  Disk
                                                                               53,200,000



                                      1,924
  SSD
                                                                              42,200,000



                                                                             36,700,000
Memory
                                                                                          358,200,000


         1      10        100         1,000   10,000   100,000   1,000,000   10,000,000    100,000,000 1,000,000,000
                                                                                                          Adam Jacobs
                                                                                           The Pathologies of Big Data
People cost




Hardware Cost
Time to insight




Time for decision
Volume

                        Data
                        Size


                      Data
                    Complexity
            Sp ang




                                       es
                                 So ta
              Ch
              ee e




                                   urc
                                   Da
                 do




                                              ety
Ve




                    f




                                               ri
   loc




                                            Va
     y it
Impossible: 360 View
Game Changer

     Slow         Static        Barrier

                                 Business
      ETL      Data Warehouse
                                Intelligence




     Fast       Dynamic            View


    Raw Load      Hadoop        Data Pipeline
NO - SQL




                     RDBMS


  Standard SQL   Structured Data     Response in sec.


     No SQL      Unstructured Data        Batch



                     Hadoop
Common Applications


               Asset Management Analytics
                      Security Analytics
                   Product Cohort Analytics
                   Advanced Web Analytics



    Structured +
                           Many
    Unstructured                           Decision Makers
                        Data Sources
        Data
follow us: @datameer
 @stefanGroschupf
Not about the Big in Big Data
Not about the Big in Big Data
Not about the Big in Big Data
Not about the Big in Big Data
Not about the Big in Big Data
Not about the Big in Big Data
Not about the Big in Big Data

Más contenido relacionado

Similar a Not about the Big in Big Data

Streaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise Adoption
DATAVERSITY
 
Data Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web ServicesData Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web Services
Amazon Web Services
 

Similar a Not about the Big in Big Data (20)

What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBase
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
 
Streaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise Adoption
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data Applications
 
Next Generation Hadoop Introduction
Next Generation Hadoop IntroductionNext Generation Hadoop Introduction
Next Generation Hadoop Introduction
 
The causes and consequences of too many bits
The causes and consequences of too many bitsThe causes and consequences of too many bits
The causes and consequences of too many bits
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoop
 
Galaxy of bits
Galaxy of bitsGalaxy of bits
Galaxy of bits
 
DataStax Enterprise in the Field – 20160920
DataStax Enterprise in the Field – 20160920DataStax Enterprise in the Field – 20160920
DataStax Enterprise in the Field – 20160920
 
Treasure Data and Heroku
Treasure Data and HerokuTreasure Data and Heroku
Treasure Data and Heroku
 
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear ScalabilityBeyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
 
Guy Coates
Guy CoatesGuy Coates
Guy Coates
 
Top Technology Trends
Top Technology Trends Top Technology Trends
Top Technology Trends
 
hadoop事例紹介
hadoop事例紹介hadoop事例紹介
hadoop事例紹介
 
Big Data and the Cloud a Best Friend Story
Big Data and the Cloud a Best Friend StoryBig Data and the Cloud a Best Friend Story
Big Data and the Cloud a Best Friend Story
 
Data Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web ServicesData Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web Services
 
Introducing DynamoDB
Introducing DynamoDBIntroducing DynamoDB
Introducing DynamoDB
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 

Más de DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Not about the Big in Big Data

  • 1. It’s Not About the “Big” in Big Data
  • 2. $26.5 BILLION Nick Leeson Jerome Kerviel Kweku Adoboli Bruno Iksil Barings Bank Societe Generale UBS JPMC $1.4 billion $6 billion $1.4 billion $17.5 Billion Nasa $17.6B Rogue Traders $26.5B
  • 5. Hype Cycle We are here
  • 6. Crossing the Chasm We are here
  • 8. Database Harddrive Unstructured ( 61.7% growths ) time to find one record = logb N * 10ms log100(100,000,000) * 10ms = 40ms time to read record = 10ms 10,000,000 * 50ms = 5.8 days
  • 9. Hadoop Harddrive throughput = 10MB/s time to transfer record = 10ms 10,000,000 * 10ms = 1.5 days random reads = (5.8 days)
  • 10. Laws of Physics Random Sequential Values/Sec. 316 Disk 53,200,000 1,924 SSD 42,200,000 36,700,000 Memory 358,200,000 1 10 100 1,000 10,000 100,000 1,000,000 10,000,000 100,000,000 1,000,000,000 Adam Jacobs The Pathologies of Big Data
  • 12. Time to insight Time for decision
  • 13. Volume Data Size Data Complexity Sp ang es So ta Ch ee e urc Da do ety Ve f ri loc Va y it
  • 15. Game Changer Slow Static Barrier Business ETL Data Warehouse Intelligence Fast Dynamic View Raw Load Hadoop Data Pipeline
  • 16. NO - SQL RDBMS Standard SQL Structured Data Response in sec. No SQL Unstructured Data Batch Hadoop
  • 17. Common Applications Asset Management Analytics Security Analytics Product Cohort Analytics Advanced Web Analytics Structured + Many Unstructured Decision Makers Data Sources Data
  • 18. follow us: @datameer @stefanGroschupf