Making Sense of
   Big Data
  October 11, 2012
     #MakeSenseBD
#MakeSenseBD




   “Information is powerful.

    But it is how we use it that will define us.”



   10/15/2012          Infochimps Confidential    2
#MakeSenseBD




   First
    What is Big Data?

   “data sets so large and complex that it becomes
   difficult to process using on-hand database
   management tools.”


#MakeSenseBD




                                #Volume
                                #Velocity
                                #Variety




                2010 = 1.2 Zettabytes/yr
                2020 = 35.2 Zettabytes/yr

                                                               Source: 2011 IDC Digital Universe Study
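
The two figures above imply a growth rate that is easy to work out. The sketch below is only arithmetic on the slide's numbers; the assumption of a constant yearly rate and the function name are illustrative and do not come from the IDC study.

```python
# Figures quoted on the slide (2011 IDC Digital Universe Study).
ZB_2010 = 1.2
ZB_2020 = 35.2
YEARS = 2020 - 2010


def compound_annual_growth_rate(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` after `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


growth_multiple = ZB_2020 / ZB_2010
cagr = compound_annual_growth_rate(ZB_2010, ZB_2020, YEARS)

print(f"Growth multiple 2010-2020: {growth_multiple:.1f}x")
print(f"Implied compound annual growth: {cagr:.1%}")
```

Under that assumption, the yearly volume grows roughly 29-fold over the decade, or about 40% per year.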
#MakeSenseBD


   It’s All About The Data
    DIGITAL CONTENT, OPERATIONAL DATA, WEB LOGS, SOCIAL MEDIA, FILES,
    SMART GRIDS, TRANSACTIONAL DATA, AD IMPRESSIONS, R&D DATA




#MakeSenseBD




   Problem
                “Little Data For Business Users”




#MakeSenseBD

   Problem
    One Size Does Not Fit All

    [Figure: database landscape chart. Axes: Non-Relational vs. Relational, Analytic vs. Operational.
     Groupings: Hadoop, NoSQL (Key Value, Document, Big Tables, Graph), NewSQL, ‘Data as a Service’]

    Products named on the chart:
    Teradata, Aster, IBM InfoSphere, Netezza, HP Vertica, Infobright, ParAccel, Calpont,
    EMC, Greenplum, SAP Hana, SAP Sybase IQ, Oracle, Times-Ten, VectorWise,
    Hortonworks, Cloudera, MapR, Zettaset, Hadapt, Spark,
    Oracle, IBM DB2, SQL Server, JustOneDB, MySQL, Ingres, PostgreSQL, Sybase ASE, EnterpriseDB,
    InterSystems, Progress, Objectivity, Versant, McObject, MarkLogic,
    Lotus Notes, CouchDB, MongoDB, Couchbase, RavenDB, Cloudant,
    Riak, Redis, Membrain, Voldemort, BerkeleyDB, Cassandra, HyperTable, HBase,
    FlockDB, InfiniteGraph, Neo4j, AllegroGraph,
    App Engine, SimpleDB, Amazon RDS, SQL Azure, Database.com, Xeround, FathomDB,
    HandlerSocket, Akiban, MySQL Cluster, Clustrix, Drizzle, GenieDB, SchoonerSQL, ScaleBase,
    ScalArc, Tokutek, NimbusDB, CodeFutures, Continuent, VoltDB, Translattice

#MakeSenseBD

   Problem
    Complexity of A New Data Architecture

    [Figure: data architecture diagram spanning Structured and Semi-structured data]
    Sources: Online Click Data, Online BI Data, CRM Data, POS Data, Cust Srvc Call Logs, Social
    Platforms: Teradata Data Warehouse (with Virt DMs), Real-Time Data Streaming,
    Hadoop Data Warehouse (with Sandboxes), SQL Departmental Data Mart, NoSQL Platform,
    In-Memory Analytics
    Servers: BI Server (two), App Server, IT (ETL)
    Consumers: BI User (reports), Operational Customer Application, Analytics User,
    Bus User (Reports)
#MakeSenseBD




                “Big Data For Business Users”




#MakeSenseBD




    [Figure labels: Data, Executive, “?”, “$ $ $ $”]


#MakeSenseBD




                #thisisreallygood




#MakeSenseBD

                #timeforaPOLL




#MakeSenseBD




   Next
    Hadoop + NoSQL technologies =

    the ability to process large and complex
    data sets without the challenges
    associated with legacy systems, and at a
    fraction of the price.

#MakeSenseBD


   Enterprise Data Warehouse
    [Figure: a Request enters the Parsing Engines, which reach multiple AMP Nodes
     over the BYNET Interconnect and return the Answer]


                                                                         PARC | 16
#MakeSenseBD


   Big Data Warehouse
    [Figure: an Analytic Request (Search, Recommend, Rank, Score, Next-Best-Action)
     goes to the Master (Name Node, Job Tracker), which reaches multiple Slaves
     (Task Tracker, Data Node) over an Ethernet Interconnect and returns the Answer.
     The slaves hold Semi-Structured Data]
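
The master/slave layout on this slide is the shape of a MapReduce job. The following is a minimal single-process sketch of that pattern, not Hadoop's actual API: the function names, the word-count task and the sample log lines are all hypothetical, and each inner list merely stands in for a block held on one Data Node.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) pairs for one log line; in Hadoop this runs on each slave."""
    for word in record.split():
        yield word.lower(), 1


def run_job(blocks):
    """Play the master's role: run the map step per block, then merge the results."""
    # Map: each block stands in for data stored on one Data Node.
    intermediate = []
    for block in blocks:
        for record in block:
            intermediate.extend(map_phase(record))

    # Shuffle: group all emitted values by key.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    # Reduce: collapse each key's values into a single count.
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical request logs, split into three blocks.
blocks = [
    ["search shoes", "search hats"],
    ["recommend shoes"],
    ["search shoes", "rank hats"],
]
counts = run_job(blocks)
print(counts["search"], counts["shoes"], counts["hats"])  # prints "3 3 2"
```

In a real cluster the map step runs in parallel next to the data and the framework handles the shuffle over the network; the sketch only shows the division of labor.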



#MakeSenseBD



    [Figure: positioning chart. Vertical axis: Batch to Real Time.
     Horizontal axis: Large Enterprise to Small Enterprise.
     Regions: Traditional Operational; Application Ecosystem; Analytic Appliances;
     Traditional Decision Support.
     Labels: Deployment in Public/Private Cloud; Toolset Integration; Hardened]



#MakeSenseBD




                #lotsofdata        + #simpleanalytics




#MakeSenseBD

    [Figure: data sources feeding two layers of data stores]
    Sources: Images; Docs, Text; Web Logs; Social; Sensors; GPS
    Business Transactions & Interactions (Web, Mobile, CRM, ERP, SCM…): SQL, NoSQL, NewSQL
    Business Intelligence & Analytics (Dashboards, Reports, Visualization…): EDW, MPP, NewSQL



#MakeSenseBD




   Use Case
    Hedge Fund

    How do I predict whether companies will
    make their quarterly earnings forecast?



#MakeSenseBD




                Walmart




#MakeSenseBD




                Target




#MakeSenseBD

    [Figure: signals feeding a Quarterly Revenue Prediction]
    Cars In Lot, News Text, Web Pricing, Social Sentiment, Weather Sensors, Local Employment
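
One way to read “lots of data plus simple analytics” here is to standardize each signal against its own history and combine the results into a single score. The sketch below is a toy illustration of that idea only: the readings, the weights and the decision rule are all hypothetical and are not taken from the presentation or from any real fund.

```python
from statistics import mean, pstdev

# Hypothetical quarterly readings for two of the signals named on the slide.
# None of these numbers come from the presentation.
history = {
    "cars_in_lot": [100.0, 110.0, 120.0, 130.0],
    "social_sentiment": [0.2, 0.4, 0.1, 0.5],
}
latest = {"cars_in_lot": 140.0, "social_sentiment": 0.6}
weights = {"cars_in_lot": 0.7, "social_sentiment": 0.3}  # assumed, not fitted


def z_score(value, series):
    """How many standard deviations `value` sits above the mean of `series`."""
    spread = pstdev(series)
    return 0.0 if spread == 0 else (value - mean(series)) / spread


def composite_signal(latest, history, weights):
    """Weighted sum of each signal's z-score against its own history."""
    return sum(
        weights[name] * z_score(latest[name], history[name]) for name in weights
    )


score = composite_signal(latest, history, weights)
# Toy decision rule: a positive score suggests an above-trend quarter.
print("above trend" if score > 0 else "below trend", round(score, 2))
```

A production model would fit the weights against past earnings outcomes rather than assume them.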



#MakeSenseBD




   Use Case
    Media Company

    How do I merge my traditional media
    sources with new media sources to
    provide improved and instant insights to
    my customers?

#MakeSenseBD
    [Figure: media listening architecture]
    New Media sources: Gnip Powertrack, Gnip EDC, Moreover Metabase
    Traditional Media sources: TV Transcription, Radio Transcription, Print Transcription
    Components: In-Motion Data Delivery Service, NoSQL, APIs, Listening Application (Sources, Sentiment)
    People: Data Scientist, App Developer, IT Staff, Business Users
#MakeSenseBD




   Use Case
    Retail Company

    How do I increase online revenue?




#MakeSenseBD
    [Figure: Known & Unknown Customers & Online/Offline Behavior, Current Approved Offers,
     and Existing Approved Product Content combine into a Dynamically Populated Personalized Email]
    Offer tiles shown: Family 60% + 10%, Million $ Q 40%, Color 30%, Welcome 15%, Baby 60%,
    Weekend 15%, Sunday 25%, Spring 25%, Color Denim 30%, Hoodies 10%
    Product tiles shown: Kids Exclusive, Hue Denim, Denim, Khakis, Threadless
#MakeSenseBD
    [Figure: retail analytics pipeline]
    Sources: Current Campaign Offers, Online Click Data, Online BI Data, Past CRM Data,
    POS Data, Cust Srvc Call Logs, Social, Product Content
    Processing: Hadoop Cluster; Data Model with Traditional Analytics; Data Model with Graph Analytics
    Output: Targeted Offers & Products, Personalized Email Campaign, Measure Performance
#MakeSenseBD




                #85%AccurateFirstTime




#MakeSenseBD


                #timeforaPOLL




#MakeSenseBD




   I’m Ready
    So How Do I Start?

    …without spending a *$#&-load of
    money before proving ROI?



#MakeSenseBD


                     Deployment Options


                                 On-Premise




                Public Cloud
                  Provider                                     Trusted
                                                         Data Center Provider



#MakeSenseBD

    [Figure: deployment options compared on $Cost, Security Risk and Time To Market]
    You Manage: Private Big Data Cloud; Virtual Private Big Data Cloud; Public Big Data Cloud
    Someone Else Manages: Virtual Private Big Data Cloud (Managed Service);
    Public Big Data Cloud (Managed Service)


#MakeSenseBD




   Who?
                #InfochimpsOfCourse




#MakeSenseBD


   Infochimps
   • Managed Big Data Services
   • Elastic & Secure Private & Public Clouds
   • Across a Global Network of Trusted Data Center Providers
   • With Batch & Real-time Analytic Framework
   • Supporting Structured & Unstructured Data

    [Figure: platform stack serving Enterprise Customers.
     Top: App BI, Analytics, Sys BI.
     Middle: Data Lang, Data Intelligence Delivery Network, Data Delivery.
     Bottom: Hadoop, Infra Delivery, NoSQL.
     Base: Global Network Of Data Center Infrastructure Providers]
#MakeSenseBD



   Data Intelligence Network
    Data Marketplace: 15,000 Data Sets
    Cloud-based Data PaaS: Virtual Private & Public Cloud; Tier4 Lights Out Data Centers;
    OpenStack & vSphere; Managed Services
    Big Data PaaS: Public Cloud; Amazon & Rackspace; Managed Services
#MakeSenseBD


                           Elastic Big Data PaaS




   Deployment From Laptop to Cloud (Public & Private)                       Amazon, Rackspace, OpenStack & VSphere
                                                          Ironfan



#MakeSenseBD


   Big Data Managed Service Offerings

    Community
     Access to Infochimps Big Data Platform via open source
     Deploy Anywhere
     Try It Under Your Control

    Public Cloud
     Pre-integrated, pre-tested Big Data stack
     Quickly deploy in Amazon Cloud, Rackspace Cloud
     Fully Managed Service

    Virtual Private Cloud
     Pre-integrated, pre-tested Big Data stack
     Deployed in a trusted lights-out data center network
     High SLA Managed Service

    Private Cloud
     Pre-integrated, pre-tested Big Data stack
     Deployed in your Data Center - OpenStack or vSphere
     Customized Managed Service
#MakeSenseBD


                #LastPOLL




#MakeSenseBD




        #1 Big Data Platform For The Cloud
                        #MakeSenseBD

                  www.infochimps.com/demo

                1-855-DATA-FUN (1-855-328-2386)


 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

Making Sense of Big Data

  • 1. Making Sense of Big Data October 11, 2012 #MakeSenseBD
  • 2. “Information is powerful. But it is how we use it that will define us.” 10/15/2012 Infochimps Confidential 2
  • 3. First: What is Big Data? “data sets so large and complex that it becomes difficult to process using on-hand database management tools.”
  • 4. #Volume #Velocity #Variety 2010 = 1.2 Zettabytes/yr 2020 = 35.2 Zettabytes/yr Source: 2011 IDC Digital Universe Study
  • 5. It’s All About The Data DIGITAL CONTENT OPERATIONAL DATA WEB LOGS SOCIAL MEDIA FILES SMART GRIDS TRANSACTIONAL DATA AD IMPRESSIONS R&D DATA
  • 6. Problem “Little Data For Business Users”
  • 8. Problem One Size Does Not Fit All Non-Relational Relational Analytic Teradata IBM InfoSphere Aster Netezza HP Vertica Infobright Hadoop Hadapt ParAccel Horton Calpont EMC SAP Hana Oracle Cloudera VectorWise Greenplum SAP Sybase IQ Times-Ten MapR Zettaset Operational Spark Oracle IBM DB2 SQLSrvr JustOneDB InterSystems Progress Document MarkLogic MySQL Ingress PostgreSQL Objectivity McObject Lotus Notes Sybase ASE EnterpriseDB Versant NoSQL CouchDB NewSQL MongoDB ‘Data as a Service’ HandlerSocket Key Amazon RDS Couchbase RavenDB Akiban Cloudant SQL Azure Value App Engine MySQL Cluster Database.com SimpleDB Clustrix Xeround FathomDB Drizzle Riak Big Tables GenieDB Redis Cassandra Graph SchoonerSQL ScaleBase ScalArc Membrain Tokutek NimbusDB FlockDB CodeFutures Voldemort HyperTable InfiniteGraph Continuent VoltDB BerkeleyDB HBase Neo4j Translattice AllegroGraph
  • 9. Problem Complexity of A New Data Architecture Structured BI User Departmental Reports (reports) Online Teradata Click Data Data Warehouse SQL BI Data Mart Server Virt Virt Virt Online DM DM DM BI Data CRM Real-Time Data App Data Streaming Server Operational Customer Application POS Data BI Hadoop Server Cust Srvc Analytics User Data Call Logs NoSQL Warehouse Platform In-Memory Social Sandbox Sandbox Sandbox Sandbox Analytics Bus User Semi-structured IT (ETL) (Reports)
  • 11. “Big Data For Business Users”
  • 12. $ $ $ $ ? Executive Data
  • 13. #thisisreallygood
  • 14. #timeforaPOLL
  • 15. Next Hadoop + NoSQL technologies = the ability to process large and complex data sets without the challenges associated with legacy, and at a fraction of the price.
  • 16. Enterprise Data Warehouse Request Answer Parsing ? Engines BYNET Interconnect Amp Amp Amp Node Node Node .... PARC | 16
  • 17. Big Data Warehouse Search Recommend Rank Analytic Request Master: Answer Score Next-Best-Action Name Node Job Tracker Ethernet Interconnect Slave: Slave: Slave: Task Trckr Task Trckr Task Trckr Data Node Data Node Data Node Semi- .... Structured Data PARC | 17
  • 18. Real Time Traditional Operational Application Ecosystem Deployment in Analytic Public/Private Cloud Appliances Toolset Integration Traditional Decision Support Hardened Batch Large Small Enterprise Enterprise
  • 19. #lotsofdata + #simpleanalytics
  • 20. Images Web, Mobile, CRM, ERP, SCM… Business Docs, Transactions & Text Interactions Web Logs SQL NoSQL NewSQL Social EDW MPP NewSQL Sensors Business Intelligence & Analytics Dashboards, Reports GPS Visualization…
  • 21. Use Case Hedge Fund How do I predict whether companies will make their quarterly earnings forecast?
  • 22. Walmart
  • 23. Target
  • 24. Cars In Lot News Text Web Pricing Quarterly Revenue Prediction Social Sentiment Weather Sensors Local Employment
  • 25. Use Case Media Company How do I merge my traditional media sources with new media sources to provide improved and instant insights to my customers?
  • 26. New Media Data Scientist App Developer Gnip Powertrack Business Users Gnip EDC Sources Sentiment Moreover Metabase In-Motion Data Delivery APIs Listening Service Application TV Transcription NoSQL Radio Transcription Print Transcription IT Staff Traditional Media
  • 27. Use Case Retail Company How do I increase online revenue?
  • 28. Family 60% + 10% Million $ Q 40% Color 30% Welcome 15% Kids Exclusive Current Baby 60% Approved Hue Denim Weekend 15% Threadless Offers Sunday 25% Denim Million $ Q Spring 25% Khakis Color 30% Color 30% Million $ Q Color Denim 30% Khakis Hoodies 10% Dynamically Populated Personalized Email Known & Unknown Existing Customers & Approved Online/Offline Behavior Product Content
  • 29. Current Campaign Offers Online Click Data Online Traditional BI Data Analytics Targeted Offers Personalized Data & Products Email Campaign Past CRM Data Model Hadoop Graph POS Cluster Analytics Data Data Model Cust Srvc Measure Call Logs Performance Social Product Content
  • 30. #85%AccurateFirstTime
  • 31. #timeforaPOLL
  • 32. I’m Ready So How Do I Start? …without spending a *$#&-load of money before proving ROI?
  • 33. Deployment Options On-Premise Public Cloud Provider Trusted Data Center Provider
  • 34. You Manage: Private Big Data Cloud (You Manage), Virtual Private Big Data Cloud (You Manage), Public Big Data Cloud (You Manage). Someone Else Manages: Virtual Private Big Data Cloud (Managed Service), Public Big Data Cloud (Managed Service). Compared on: $Cost, Security Risk, Time To Market
  • 35. Who? #InfochimpsOfCourse
  • 36. Infochimps Enterprise Customers • Managed Big Data Services • Elastic & Secure Private & Public Clouds • Across a Global Network of App BI Analytics Sys BI Trusted Data Center Data Lang Data Intelligence Data Delivery Delivery Network Providers Hadoop NoSQL Infra • With Batch & Real-time Delivery Analytic Framework Global Network Of • Supporting Structured & Data Center Infrastructure Providers Unstructured Data
  • 37. Data Intelligence Network Cloud-based Data PaaS Virtual Private & Public Cloud Data Tier4 Lights Out Data Centers Marketplace OpenStack & VSphere Managed Services Big Data PaaS Public Cloud 15,000 Data Sets Amazon & Rackspace Managed Services
  • 38. Elastic Big Data PaaS Deployment From Laptop to Cloud (Public & Private) Amazon, Rackspace, OpenStack & VSphere Ironfan
  • 39. Big Data Managed Service Offerings Community Public Virtual Private Private Cloud Cloud Cloud Access to Pre-integrated, pre- Pre-integrated, pre- Pre-integrated, pre- Infochimps Big tested Big Data tested Big Data tested Big Data Data Platform via stack stack stack open source Quickly deploy in Deployed in a Deployed in your Deploy Anywhere Amazon trusted lights-out Data Center - Cloud, Rackspace data center Open Stack or Cloud network Vsphere Try It Under Your High SLA Control Fully Managed Managed Service Customized Service Managed Service
  • 40. #LastPOLL
  • 41. #1 Big Data Platform For The Cloud #MakeSenseBD www.infochimps.com/demo 1-855-DATA-FUN (1-855-328-2386)
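
The two volume figures on slide 4 (1.2 zettabytes/yr in 2010 and 35.2 zettabytes/yr in 2020, per the 2011 IDC Digital Universe Study) imply a yearly growth rate that is easy to check. The following is a minimal sketch of that arithmetic; the function name is illustrative, and the assumption that growth compounds smoothly from year to year is ours, not IDC's.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted on slide 4, in zettabytes per year.
ZB_2010 = 1.2
ZB_2020 = 35.2

rate = compound_annual_growth_rate(ZB_2010, ZB_2020, years=10)

# Project 2012 from the 2010 baseline, assuming smooth compounding (an assumption).
zb_2012 = ZB_2010 * (1.0 + rate) ** 2

print(f"Implied growth rate: {rate:.1%} per year")
print(f"Projected 2012 volume: {zb_2012:.1f} ZB")
```

Under these assumptions the rate works out to roughly 40% per year and the 2012 projection to about 2.4 ZB.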

Editor's notes

  1. Title slide: "Making Sense of Big Data" (I like the elephant on the motorcycle as the background image here, along with the descriptor, "We provide a suite of big data services in the cloud, used by enterprise customers who want to quickly unlock the value of their data").
     Slide 1: "Information is Powerful, but it's how we use it..." Set the stage: we are here today to learn how to leverage Big Data to derive value and achieve insights.
     Slide 2: "What is Big Data?" The message here is to start at the beginning and define it for those in the audience who might be unclear (we know there are many people who are). Use the first slide from your CloudCon deck here.
     Slide 3: State of the world: data is increasing exponentially and it's only going to continue, and will therefore require infrastructure and management in order to provide useful insights. Use slide 2 from your CloudCon deck; it has a nice image of volume (which is one of the tenets of big data).
     Slide 4: Why is this occurring? Here the message is new types of data, batch vs. real-time. Everyone is "listening" now and measuring more activity, actions, and conversations than ever before. Use the CloudCon slide that builds vertically from batch to real time and horizontally from large enterprise to small enterprise.
     Slide 5: Problem: "Little Data for Business Users" slide from CloudCon. The message here is that due to the influx and types of data, etc., the actual users are too far removed from it and therefore blind to how to instill insights from it. Walk through the build, as it explains really well how data moves throughout an organization and where the roadblocks are for getting insight to execs to act upon.
     Slide 6: Use the #thisreallysucks slide here to drive home the current state of being.
     Slide 7: "Big Data for Business Users" slide. This is the end state of being for executives looking to use data to improve operational efficiency and competitive advantage.
     Slide 8: Use the build slide here to show how we bring the data to the app developer and therefore reduce the friction for executives.
     Slide 9: Use the #thisisreallygood slide here to enhance the point that this is the way data and info should flow.
     Slide 10: How do you achieve this state?
     Slide 11: Introducing Infochimps. Use the "Good to Great" (#2) slide from your 451 deck to give a brief overview of who we are. We cut our teeth on big data having built the largest data marketplace, where we leveraged the latest technology (Hadoop, etc.) to manage big data. We realized that others must be realizing the same issues as IC and decided to externalize our platform to help companies implement their own Big Data infrastructure.
     Slide 12: Big Data Cloud Platform (the solution to the Big Data problem). Use slide #7 from the 451 deck. Walk through the platform and the components, allowing attendees to see that we offer an end-to-end, cloud-based solution. Call out the value of our 4 pillars here: Fast, Simple, Flexible, Enterprise-ready.
     Slide 13: Deployment options slide. This is where we talk about IC being offered as a managed service and the value it affords. NOTE: Not sure if we want to communicate the Data Intelligence Network since we have not publicly or formally announced it.
     Slide 14: IC in action: Infomart use case (challenge, IC solution, result).
     Slide 15: One more use case if time (Koupon)?
     Slide 16: Close: Infochimps, the #1 Big Data Platform for the Cloud. Include sales contact number at the bottom of the slide along with web address.
  2. Avinash Kaushik gave a talk at Strata 2012 in Santa Clara in March… and quoted a Kenyan farmer. If you listen to all the hype of Big Data, it solves for the first problem. If you listen to all the vendors, there is a lot of emphasis on the first part (perhaps Infochimps included), and very little on the second. I think that’s because we don’t exactly know how to truly empower the organization to interact directly with any/all data available. It’s too expensive, risky, complex.
  3. Avinash Kaushik gave a talk at Strata 2012 in Santa Clara in March. If you listen to all the hype of Big Data, it solves for the first problem. If you listen to all the vendors, there is a lot of emphasis on the first part (perhaps Infochimps included), and very little on the second. I think that’s because we don’t exactly know how to truly empower the organization to interact directly with any/all data available. It’s too expensive, risky, complex.
  4. 40%+ YoY growth, with 2012 generating 2.4 zettabytes alone. http://jameskaskade.com/?p=2040 http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
  5. Discussions with O’Reilly Media, Teradata, Aster Data, Yahoo!, eBay, and Facebook. The issue is not just the fact that unstructured data is exploding, but the number of sources and types of data as well… all fed from the explosion of devices used by people to interact with each other, products, and services.
  6. We have a problem today WITH our data infrastructure… our ability to glean insights. I think all of you know what I’m referring to… It’s the fact that we’re operating on less than 15% of the corporate data available to us… even with the ENTERPRISE DATA WAREHOUSE, the EDW, which is supposedly storing a COMPLETE, SINGLE VIEW OF THE TRUTH… We’re still giving our business users… a tiny bit… a little bit of data.
  7. http://blogs.the451group.com/opensource/2011/04/15/nosql-newsql-and-beyond-the-answer-to-sprained-relational-databases/
     NoSQL: databases designed to meet scalability requirements of distributed architectures and/or schema-less data management requirements, including big tables, key value stores, document databases and graph databases.
     NewSQL: databases designed to meet scalability requirements of distributed architectures or to improve performance such that horizontal scalability is no longer a necessity, including new MySQL storage engines, transparent sharding technologies, software and hardware appliances, and completely new databases.
     Data grid/cache: products designed to store data in memory to increase application and database performance, covering a spectrum of data management capabilities from non-persistent data caching to persistent caching, replication, and distributed data and compute grid functionality.
     http://en.wikipedia.org/wiki/Database
     The first generation of database systems were navigational; applications typically accessed data by following pointers from one record to another. The two main data models at this time were the hierarchical model, epitomized by IBM's IMS system, and the Codasyl model (Network model), implemented in a number of products such as IDMS.
     http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
     New SQL: The “new relational databases” that retain SQL & ACID compliance
       * Scalable with distributed architectures, or
       * Performance improved such that horizontal scalability is no longer a necessity
     SchoonerSQL: http://www.schoonerinfotech.com/
     Tokutek: http://www.tokutek.com/
     Continuent: http://www.continuent.com/
     Translattice: http://www.translattice.com/
     ScaleBase: http://www.scalebase.com/
     CodeFutures: http://www.codefutures.com/database-products/
     VoltDB: http://voltdb.com/
     HandlerSocket: https://github.com/ahiguti/HandlerSocket-Plugin-for-MySQL
     Akiban: http://www.akiban.com/
     MySQL Cluster: http://www.mysql.com/products/cluster/
     Clustrix: http://www.clustrix.com/
     Drizzle: http://www.drizzle.org/
     GenieDB: http://www.geniedb.com/
     ScalArc: http://scalarc.com/
     NimbusDB: http://nimbusdb.com/NimbusDB/NimbusDb.html
  8. http://www.nrf-arts.org/content/unifiedpos
     Step 1: Integrate into CRM (email)
     Step 2: Integrate into Web
     Step 3: Integrate into POS (UnifiedPOS)
  9. The Business User
  10. The Business User
  11. The Business User
  12. Being the CEO of Infochimps, I felt compelled to share a little “chimpy” research with you… The “Infinite Monkey Theorem” is a METAPHOR that directly relates to Big Data, that I think you’ll appreciate. So what is the “Infinite Monkey Theorem”? The following definition is a variant of the original theorem… let me read it to you. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations.
  13. AMP: Access Module Processor. PE: Parsing Engine. BYNET: Banyan Cross-bar Switch YNET (Y Network).
      Store:
        * The Parsing Engine dispatches a request to retrieve one or more rows.
        * The BYNET ensures that appropriate AMP(s) are activated.
        * The Parsing Engine dispatches a request to insert a row.
        * The BYNET ensures that the row gets to the appropriate AMP (Access Module Processor) via the hashing algorithm.
        * The AMP stores the row on its associated disk.
        * Each AMP can have multiple physical disks associated with it.
      Retrieve:
        * The AMPs (Access Module Processors) locate and retrieve desired rows in parallel and will sort, aggregate or format if needed.
        * The BYNET returns retrieved rows to the Parsing Engine.
        * The Parsing Engine returns row(s) to the requesting client application.
      Teradata’s shared-nothing architecture allows for highly scalable data volumes.
  14. 3 node Hadoop system: $8K/node; $10K switch; $4K/node Hadoop distro. $24K + $10K x 25% x 3 maintenance = $43K. $4K x 3 x 3 = $36K. Total =
      There are three essential elements of an analytic platform:
        * Strong support for analytic database query. A variety of query styles: at a minimum, SQL, MDX or graph.
        * Strong support for analytic processes other than queries. Typically these would be in the areas of mathematics (statistics, predictive analytics, data mining, linear algebra, optimization, graph theory, etc.) and/or data transformation (e.g. sessionization, entity extraction).
        * Strong integration between the first two.
      The point is that an analytic platform is something on which you can build a range of powerful analytic applications. Some specifics of what to look for in an analytic platform may be found in the link above.
      http://www.dbms2.com/2011/02/24/analytic-platforms/
      http://www.dbms2.com/2011/01/18/architectural-options-for-analytic-database-management-systems/
      Enterprise data warehouse (full or partial)
        * Kinds of data likely to be included: All, but especially operational
        * Likely use styles: All
        * Canonical example: Central EDW for a big enterprise
        * Stresses: Concurrency, reliability, workload management
        * Classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server
      Traditional data mart
        * Kinds of data likely to be included: All
        * Likely use styles: Business intelligence, budgeting/consolidation, investigative
        * Examples: Reporting servers, planning/consolidation servers, anything MOLAP, etc.
        * Stresses: Performance, concurrency, TCO
        * Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them, e.g. Sybase IQ and Vertica, have excellent track records in concurrent usage as well.
      Investigative data mart (agile)
        * Kinds of data likely to be included: All, especially customer-centric
        * Likely use styles: Investigative
        * Canonical example: A few analysts getting a few TB to examine
        * Stresses: Ease of setup/load, ease of admin, price/performance
        * Infobright is often cost-effective among columnar analytic DBMS.
      Investigative data mart (big)
        * Kinds of data likely to be included: All, especially customer-centric, logs, financial trade, scientific
        * Likely use styles: Investigative
        * Canonical example: Single-subject 20 TB – 20 PB relational database
        * Stresses: Performance, scale-out, analytic functionality
        * Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum.
      Bit bucket (Hadoop)
        * Kinds of data likely to be included: Logs, other technical/external
        * Likely use styles: Staging/ETL, investigative
        * Canonical example: Log files in a Hadoop cluster
        * Stresses: TCO, scale-out, transform/big-query performance, ETL functionality
      Archival data store
        * Kinds of data likely to be included: Operational, CDR (call detail record), security log
        * Likely use styles: Archival, reporting (for compliance), possibly also investigative
        * Examples: Any long-term detailed historical store
        * Stresses: TCO, compression, scale-out, performance (if multi-use)
        * Perhaps only Rainstor truly embraces the archival positioning
      Outsourced data mart
        * Kinds of data likely to be included: All
        * Likely use styles: Traditional BI, investigative analytics, staging/ETL
        * Examples: Advertising tracking, SaaS CRM
        * Stresses: Performance, TCO, reliability, concurrency
        * Oracle shops = Vertica gets the nod in a number of these cases
      Operational analytic(s) server
        * Kinds of data likely to be included: Customer-centric, log, financial trade
        * Likely use styles: Advanced operational analytics
        * Examples: Lower latency: Web or call-center personalization, anti-fraud. Higher latency: Customer profiling, Basel 3 risk analysis
        * Stresses: Performance, reliability, analytic functionality, perhaps concurrency
      http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/
  15. The Business User
  16. The way this is performed is by taking data sources like images and storing them in Hadoop, then using Big Data tools like MapReduce to perform sophisticated analysis on those aggregated data sets. Why is this concept so disruptive? Things like a fraction of the price… no structured data model, aka no star schema… yet the ability to run sophisticated queries and algorithms against all your detailed data.
  17. Being the CEO of Infochimps, I felt compelled to share a little “chimpy” research with you… The “Infinite Monkey Theorem” is a METAPHOR that directly relates to Big Data, that I think you’ll appreciate. So what is the “Infinite Monkey Theorem”? The following definition is a variant of the original theorem… let me read it to you. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations.
  18. The current image shows a Walmart in Wichita, Kansas. Analysts count cars in Wal-Mart parking lots to measure overall customer traffic to understand growth versus its competition. For example, Wal-Mart's growth was determined to come mostly from areas of high unemployment. This type of analysis is being performed in Amazon's EC2…
  19. The current image shows a Target in the Moraine Point Plaza located in Gardiner, North. Analysts comparing satellite parking lot data with regional unemployment trends found Target's growth tended to come in areas of lower-than-average unemployment. Again, these processes are being performed in Amazon EC2… this is interesting… but how do we process the data further to help derive more relevant insights? http://www.cnbc.com/id/38738810/Spying_For_Profits_The_Satellite_Image_Indicator
  20. The previous examples of Walmart and Target involved using a regression algorithm which was executed against the satellite data + other data to produce a quarterly revenue prediction which BEAT all previous models.
  21. Being the CEO of Infochimps, I felt compelled to share a little “chimpy” research with you… The “Infinite Monkey Theorem” is a METAPHOR that directly relates to Big Data, that I think you’ll appreciate. So what is the “Infinite Monkey Theorem”? The following definition is a variant of the original theorem… let me read it to you. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations.
  22. Being the CEO of Infochimps, I felt compelled to share a little “chimpy” research with you… The “Infinite Monkey Theorem” is a METAPHOR that directly relates to Big Data, that I think you’ll appreciate. So what is the “Infinite Monkey Theorem”? The following definition is a variant of the original theorem… let me read it to you. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations.
  23. The Business User
  24. Being the CEO of Infochimps, I felt compelled to share a little “chimpy” research with you… The “Infinite Monkey Theorem” is a METAPHOR that directly relates to Big Data, that I think you’ll appreciate. So what is the “Infinite Monkey Theorem”? The following definition is a variant of the original theorem… let me read it to you. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations.
  25. Being the CEO of Infochimps, I felt compelled to share a little “chimpy” research with you… The “Infinite Monkey Theorem” is a METAPHOR that directly relates to Big Data, that I think you’ll appreciate. So what is the “Infinite Monkey Theorem”? The following definition is a variant of the original theorem… let me read it to you. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations.
  26. Slide 1: Company Overview. The best way to give an overview of your company is to state concisely your core value proposition: What unique benefit will you provide to what set of customers to address what particular need? Then you can add three or four additional dot points to clarify your target markets, your unique technology/solution, and your status (launch date, current customers, revenue rate, pipeline, funding needed). Key objective: Flesh out the foundation you established at the beginning. At this point, no one should have any question about what it is that your company does, or plans to do. The only questions that should remain are the details of how you are going to do it. Another key objective you should have achieved by this point in your presentation is to make sure that if there are some compelling brand names associated with your company (customers, partners, investors, advisors), your audience knows about them. Feel free to drop names early and often, starting with your first email introduction to the investor. Brand name relationships build your credibility, but do not overstate them if they are tenuous.
      Use cases:
        * Runa: Automated real-time online offers. Monitors and analyzes shopper behavior on the web, and then makes each shopper a personalized offer. Infochimps helps Runa configure and manage their entire production system, including Hadoop, HBase, messaging, monitoring, and more. (using Ironfan – Robert Berger)
        * SpringSense: Intelligent enterprise document search. SpringSense uses Infochimps to scale its award-winning technology to process the full Wikipedia corpus (over 4 million articles) for rapid meaning-based search. (using Ironfan)
        * Black Locus: Competitive pricing analytics platform for enterprises. Ingesting millions of product pricing data points from the web, analyzing historical and current data, presenting analytic results in real-time.
        * Koupon Media: Mobile coupon platform. For every user who enters into the mobile coupon system, more demographic information is needed to help target the right coupon to the right customer and in real-time.
        * BlueCava: Behavioral target marketing platform. Joins customers across any/all devices and augments with demographic / behavioral data for targeted advertising. For every user who enters into the mobile coupon system, more demographic information is needed to help target the right coupon to the right customer and in real-time. A new Attribution data product (using Hadoop) which determines correlations between customer purchases / conversions and advertising impressions and website behavior.
        * InfoMart: Largest media company in Canada, transforming its business from print to digital; the focus is on engaging and better understanding their audiences. Social media listening platform which consists of both real-time social feed search / analytics / reporting for InfoMart and their customers, plus historic analysis / trending research.
  27. Slide 3: Solution. What specifically are you offering to whom? Software, hardware, services, a combination? Use common terms to state concretely what you have, or what you do, that solves the problem you’ve identified. Avoid acronyms and don’t try to use these precious few words to create and trademark a bunch of terms that won’t mean anything to most people, and don’t use this as an opportunity to showcase your insider status and facility with the idiomatic lingo of the industry. If you can demonstrate your solution (briefly) in a meeting, this is the place to do it.
      Slide 3.1: Delivering the Solution. You might need an extra slide to show how your solution fits in the value chain or ecosystem of your target market. Do you complement commonly used technologies, or do you displace them? Do you change the way certain business processes get executed, or do you just do them the same way, but faster, better and cheaper? Do you disrupt the current value chain, or do you fit into established channels? Who exactly is the buyer, and is that person different than the user?
  28. Slide 7: Go to Market Strategy. The single most compelling slide in any pitch is a pipeline of customers and strategic partners that have already expressed some interest in your solution, if they haven’t already joined your beta program. Too often this slide is, instead, a bland laundry list of standard sales and marketing tactics. You should focus on articulating the non-obvious, potentially disruptive elements of your strategy. Even better, frame your comments in terms of the critical hurdles you need to get over, and how you are going to jump them. If you don’t have a pipeline, and there is nothing unique or innovative about your strategy, then drop this slide and make the elements of your sales model clear in the discussion of your business model (next slide).
  29. Poll questions:
      1. Which best describes your position in your organization?
         a. Executive (VP, SVP, C-level)
         b. Business User (Marketing, Product, etc.)
         c. Analytics Team (Data Scientist, Analyst, etc.)
         d. IT User (App Dev, DevOps, Project Manager, etc.)
         e. Other
      2. Do you have a current or upcoming Big Data project?
         a. Yes
         b. No
         c. Not Sure
      3. Which deployment option do you prefer?
         a. Public Cloud
         b. Private Cloud
         c. Virtual Private Cloud
         d. No Cloud
         e. Not Sure
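
Editor's note 13 above describes how, in a shared-nothing warehouse, a row is routed to an AMP by a hashing algorithm on insert and found again on that same AMP on retrieval. The toy sketch below illustrates only that routing idea. It is not Teradata's actual hashing algorithm or API: `ToyCluster`, `amp_for_key`, the use of MD5, and the four-AMP cluster size are all assumptions made for illustration.

```python
import hashlib

NUM_AMPS = 4  # hypothetical cluster size, chosen only for illustration


def amp_for_key(key, num_amps=NUM_AMPS):
    """Map a row key to an AMP number using a stable hash (illustrative, not Teradata's)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


class ToyCluster:
    """Stand-in for parsing engine + interconnect: routes each row to one AMP's private store."""

    def __init__(self, num_amps=NUM_AMPS):
        self.num_amps = num_amps
        # Each AMP owns its own storage; nothing is shared between AMPs.
        self.amps = [dict() for _ in range(num_amps)]

    def insert(self, key, row):
        # The hash of the key decides which AMP stores the row.
        self.amps[amp_for_key(key, self.num_amps)][key] = row

    def retrieve(self, key):
        # The same hash locates the owning AMP, so only that AMP is consulted.
        return self.amps[amp_for_key(key, self.num_amps)].get(key)


cluster = ToyCluster()
cluster.insert(1001, {"customer": "A", "amount": 25.0})
cluster.insert(1002, {"customer": "B", "amount": 40.0})
cluster.insert(1003, {"customer": "C", "amount": 12.5})

print(cluster.retrieve(1002))
print([len(amp) for amp in cluster.amps])  # rows held by each AMP
```

Because the same hash function is used for both storing and looking up, a single-key retrieval touches exactly one AMP, while a full scan can run on all AMPs in parallel.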