Data Science
A Practitioner’s Perspective

Mass Technology Leadership Council Panel Discussion
David Menninger, Formerly VP & Research Director, Ventana Research
David.Menninger@emc.com




                                                                ©2012, Ventana Research
David Menninger
Former Vice President – Ventana Research

Now head of business development and strategy for EMC Greenplum.

Until last week, covered analytics, business intelligence and information
management for Ventana Research. Over two decades of experience
developing and bringing to market leading-edge technologies that help
organizations analyze data to support a range of action-taking and
decision-making processes.

Prior to joining Ventana Research, served as VP of Marketing and
Product Management at Vertica Systems, Oracle, Applix, InforSense
and IRI Software. Helped create over three-quarters of a billion dollars of
shareholder value while serving in these roles.

Email: david.menninger@emc.com




Some Recent Relevant Research
Volume and Velocity of Data Are Most Important In Evaluating Big Data Technology

Data volume                Percent
less than 1 TB             10%
1-10 TB                    29%
11-100 TB                  31%
101 TB-1 PB                13%
more than 1 PB             11%
Don't know                 7%

Data velocity              Percent
less than 10 GB per day    26%
11-100 GB per day          33%
101 GB-1 TB per day        20%
1-10 TB per day            4%
More than 10 TB per…       6%
Don't know                 12%

Source: Ventana Research The Challenge of Big Data Benchmark Research

Hadoop Is Being Adopted or Considered by 54% of Enterprises

Production   22%
Planned      15%
Evaluating   17%

Source: Ventana Research Hadoop Information Management Analytics Research

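The 54% headline appears to be the sum of the three Hadoop adoption stages reported on this slide (22% production, 15% planned, 17% evaluating). A minimal Python sketch of that arithmetic follows. It assumes the stages are mutually exclusive, which the slide does not state explicitly, and the variable names are illustrative only.

```python
# Hadoop adoption stages from the slide (percent of enterprises).
# Assumption: each enterprise is counted in at most one stage.
hadoop_adoption = {"Production": 22, "Planned": 15, "Evaluating": 17}

# Adding the stages reproduces the headline figure.
adopted_or_considering = sum(hadoop_adoption.values())
print(f"Adopting or considering Hadoop: {adopted_or_considering}%")  # 54%
```
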
…but the Vast Majority Use a Variety of Big Data Technologies

Columns: (a) Currently in production, (b) Plan to use within 12 months, (c) Plan to use in 12-24 months, (d) Still evaluating, (e) No plans to use

Technology                                                                                       (a)   (b)   (c)   (d)   (e)
An RDBMS (for example, IBM DB2, Microsoft SQLServer, MySQL, Oracle) on standard hardware         89%   2%    3%    2%    3%
Flat files                                                                                       70%   7%    1%    4%    18%
A DW appliance (for example, Netezza, Exadata, EMC Greenplum, Teradata)                          34%   11%   3%    21%   31%
In-memory databases                                                                              33%   13%   4%    17%   33%
Hadoop                                                                                           22%   12%   3%    17%   45%
Other                                                                                            26%   4%    4%    10%   57%
A specialized DBMS (for example, Aster Data, Infobright, Kognitio, Paraccel, SybaseIQ, Vertica)  15%   9%    5%    19%   51%

Source: Ventana Research The Challenge of Big Data Benchmark Research

What Types of Applications?

What types of large-scale data applications is your organization running?

Application                                           Hadoop   Non-Hadoop
Query and reporting                                   60%      89%
Consolidation of multiple data sources for analysis   63%      71%
Custom/production application                         65%      68%
Data preparation                                      56%      60%
Advanced analyses                                     69%      47%
Analysis or indexing of unstructured data             46%      32%
Data sandbox / data experimentation                   44%      32%

Hadoop is most often used for advanced analyses and is more likely to be used to analyze unstructured data and for data sandboxing than other technologies. It is less likely to be used for query and reporting.

Source: Ventana Research Hadoop Information Management Analytics Research

Predictive Analytics Still Emerging

Despite its potential, predictive analytics remains a specialist tool, ranking 10th among BI capabilities with only 13% using it.

Capability                 Percent using
Spreadsheets               60%
Business Intelligence      49%
Analytic Databases         41%
Custom-built systems       34%
Data warehouse             28%
Planning and forecasting   26%
Application server         20%
LOB analytics              18%
RDB                        14%
Predictive Analytics       13%

… yet 80% ranked predictive analytics capabilities as important or very important.

Source: Ventana Research Business Analytics Benchmark Research

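The "ranking 10th" claim can be checked against the usage figures on this slide. The sketch below sorts the ten listed capabilities by usage and looks up where predictive analytics falls. It assumes the slide lists every capability that was ranked; the dictionary and variable names are illustrative only.

```python
# Usage shares from the slide (percent of organizations using each capability).
usage = {
    "Spreadsheets": 60,
    "Business Intelligence": 49,
    "Analytic Databases": 41,
    "Custom-built systems": 34,
    "Data warehouse": 28,
    "Planning and forecasting": 26,
    "Application server": 20,
    "LOB analytics": 18,
    "RDB": 14,
    "Predictive Analytics": 13,
}

# Sort capabilities from most to least used, then find the 1-based rank.
ranked = sorted(usage, key=usage.get, reverse=True)
rank = ranked.index("Predictive Analytics") + 1
print(f"Predictive Analytics: rank {rank} of {len(ranked)}")  # rank 10 of 10
```
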
Forecasting and Marketing are the Most Common Uses of Predictive Analytics

Use                                     Current   Future
Forecasting…                            72%       24%
Marketing analyses…                     70%       22%
Customer service or support…            45%       34%
Product recommendations or offers       43%       22%
Fraud detection                         34%       31%
Intelligence or surveillance analysis   28%       28%
Social network analysis                 27%       38%
Logistics analysis                      26%       27%
Predicting product development …        18%       34%
Predicting prices in the supply chain   17%       36%
Scientific or clinical research         17%       27%
Healthcare decisions                    16%       29%
Predicting mechanical failures          9%        33%
Other                                   17%       24%

Source: Ventana Research Predictive Analytics Benchmark Research

Organizations Employ a Variety of Predictive Analytics Algorithms

Algorithm                                       Frequently   Occasionally   Not at all
Classification and regression trees /…          69%          25%            6%
Linear Regression                               66%          33%
Logistic regression or other discrete choice…   61%          29%            10%
Association rules                               49%          37%            14%
K-nearest neighbors                             36%          42%            21%
Neural networks                                 30%          36%            34%
Box Jenkins, Autoregressive…                    30%          35%            35%
Exponential smoothing / double exponential…     22%          43%            34%
Naïve Bayes                                     21%          43%            36%
Support vector machines                         20%          23%            57%
Survival analysis                               15%          41%            44%
Monte Carlo Simulations                         13%          47%            40%

Classification and regression trees / decision trees and Linear Regression are the most popular predictive analytics techniques used.

Source: Ventana Research Predictive Analytics Benchmark Research

Who Designs and Deploys Predictive Analytics?

Data Scientist / Data Mining Resources       32%
Bus. Intelligence / Data Warehouse Team      31%
Line-of-Business Analysts                    19%

… but who should be performing these tasks?

Source: Ventana Research Predictive Analytics Benchmark Research

Who Does the Best Job?

Satisfaction vs. Project Team

Specialized data scientist, statistical or data mining resources   70%
Line of business analysts                                          65%
Business intelligence and data warehouse team                      59%

Source: Ventana Research Predictive Analytics Benchmark Research

Real-Time Scoring of New Records

Regularly      30%
Occasionally   18%
Infrequently   22%
Not at all     30%

More than half the organizations perform real-time scoring infrequently or not at all.

Source: Ventana Research Predictive Analytics Benchmark Research

Organizations Need More Timely Results from Predictive Analytics

Satisfaction vs. Use of Real-time Scoring

Regularly                    88%
Occasionally                 73%
Infrequently or Not at all   47%

Source: Ventana Research Predictive Analytics Benchmark Research

Frequency of Updating Predictive Models

Constantly                  12%
Hourly                      2%
Daily                       6%
Weekly                      11%
Monthly                     14%
Quarterly                   22%
Less often than quarterly   17%
Don't know                  16%

Most organizations don’t update their analytic models frequently enough. Nearly four in 10 update their models quarterly or less frequently.

Source: Ventana Research Predictive Analytics Benchmark Research

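The "nearly four in 10" figure follows from the update-frequency shares on this slide. A short sketch of the arithmetic, assuming "quarterly or less frequently" covers exactly the Quarterly and Less often than quarterly categories (the names below are illustrative):

```python
# Model update frequencies from the slide (percent of organizations).
update_frequency = {
    "Constantly": 12,
    "Hourly": 2,
    "Daily": 6,
    "Weekly": 11,
    "Monthly": 14,
    "Quarterly": 22,
    "Less often than quarterly": 17,
    "Don't know": 16,
}

# Assumption: these two categories make up "quarterly or less frequently".
infrequent = ["Quarterly", "Less often than quarterly"]
infrequent_share = sum(update_frequency[k] for k in infrequent)
total = sum(update_frequency.values())

print(f"Quarterly or less often: {infrequent_share}%")  # 39%
print(f"All categories combined: {total}%")             # 100%
```
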
Organizations that Update Models More Frequently Have Higher Satisfaction

Satisfaction vs. Model Updates

At Least Daily     81%
At least Monthly   74%
Less Frequently    48%

Source: Ventana Research Predictive Analytics Benchmark Research

Most Organizations Are Not Providing Adequate Support and Training

Resource                                                                   Adequately   Only somewhat adequately   Inadequately
Training in Predictive analytics concepts and techniques                   44%          32%                        24%
Product training                                                           42%          33%                        26%
Training in the application of predictive analytics to business problems   39%          38%                        23%
Specialized consulting resources (internal or external)                    31%          39%                        31%
Help desk resources                                                        24%          34%                        42%

Source: Ventana Research Predictive Analytics Benchmark Research

What Types of Training and Support Are Most Effective?

Satisfaction vs. Training and Support

Training in Predictive analytics concepts and techniques                   89%
Help desk resources                                                        89%
Training in the application of predictive analytics to business problems   86%
Product training                                                           79%
Specialized consulting resources (internal or external)                    77%

Source: Ventana Research Predictive Analytics Benchmark Research

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Mass tlc presentation menninger

  • 1. Data Science: A Practitioner’s Perspective
      Mass Technology Leadership Council Panel Discussion
      David Menninger, Formerly VP & Research Director, Ventana Research
      David.Menninger@emc.com
      ©2012, Ventana Research
  • 2. David Menninger
      Former Vice President – Ventana Research
      Now head of business development and strategy for EMC Greenplum.
      Until last week, covered analytics, business intelligence and information management for Ventana Research. Over two decades of experience developing and bringing to market some of the leading-edge technologies for helping organizations analyze data to support a range of action-taking and decision-making processes.
      Prior to joining Ventana Research, served as VP of Marketing and Product Management at Vertica Systems, Oracle, Applix, InforSense and IRI Software. Helped create over three-quarters of a billion dollars of shareholder value while serving in these roles.
      Email: david.menninger@emc.com
      ©2011, Ventana Research, Inc.
  • 4. Volume and Velocity of Data Are Most Important In Evaluating Big Data Technology
      Volume:
        less than 1 TB: 10%
        1-10 TB: 29%
        11-100 TB: 31%
        101 TB-1 PB: 13%
        more than 1 PB: 11%
        Don't know: 7%
      Velocity:
        less than 10 GB per day: 26%
        11-100 GB per day: 33%
        101 GB-1 TB per day: 20%
        1-10 TB per day: 4%
        More than 10 TB per…: 6%
        Don't know: 12%
      Source: Ventana Research, The Challenge of Big Data Benchmark Research
  • 5. Hadoop Is Being Adopted or Considered by 54% of Enterprises
      Production: 22%
      Planned: 15%
      Evaluating: 17%
      Source: Ventana Research, Hadoop Information Management Analytics Research
  • 6. …but the Vast Majority Use a Variety of Big Data Technologies
      Values: Currently in production / Plan to use within 12 months / Plan to use in 12-24 months / Still evaluating / No plans to use
      An RDBMS (for example, IBM DB2, Microsoft SQL Server, MySQL, Oracle) on standard hardware: 89% / 2% / 3% / 2% / 3%
      Flat files: 70% / 7% / 1% / 4% / 18%
      A DW appliance (for example, Netezza, Exadata, EMC Greenplum, Teradata): 34% / 11% / 3% / 21% / 31%
      In-memory databases: 33% / 13% / 4% / 17% / 33%
      Hadoop: 22% / 12% / 3% / 17% / 45%
      Other: 26% / 4% / 4% / 10% / 57%
      A specialized DBMS (for example, Aster Data, Infobright, Kognitio, ParAccel, Sybase IQ, Vertica): 15% / 9% / 5% / 19% / 51%
      Source: Ventana Research, The Challenge of Big Data Benchmark Research
  • 7. What Types of Applications?
      What types of large-scale data applications is your organization running?
      Values: Hadoop / Non-Hadoop
      Query and reporting: 60% / 89%
      Consolidation of multiple data sources for analysis: 63% / 71%
      Custom/production application: 65% / 68%
      Data preparation: 56% / 60%
      Advanced analyses: 69% / 47%
      Analysis or indexing of unstructured data: 46% / 32%
      Data sandbox/experimentation: 44% / 32%
      Hadoop is most often used for advanced analyses and is more likely to be used to analyze unstructured data and for data sandboxing than other technologies. It is less likely to be used for query and reporting.
      Source: Ventana Research, Hadoop Information Management Analytics Research
  • 8. Predictive Analytics Still Emerging
      Despite its potential, predictive analytics remains a specialist tool, ranking 10th among BI capabilities with only 13% using it.
      Spreadsheets: 60%
      Business Intelligence: 49%
      Analytic Databases: 41%
      Custom-built systems: 34%
      Data warehouse: 28%
      Planning and forecasting: 26%
      Application server: 20%
      LOB analytics: 18%
      RDB: 14%
      Predictive Analytics: 13%
      … yet 80% ranked predictive analytics capabilities as important or very important.
      Source: Ventana Research, Business Analytics Benchmark Research
  • 9. Forecasting and Marketing are the Most Common Uses of Predictive Analytics
      Values: Current / Future
      Forecasting…: 72% / 24%
      Marketing analyses…: 70% / 22%
      Customer service or support…: 45% / 34%
      Product recommendations or offers: 43% / 22%
      Fraud detection: 34% / 31%
      Intelligence or surveillance analysis: 28% / 28%
      Social network analysis: 27% / 38%
      Logistics analysis: 26% / 27%
      Predicting product development …: 18% / 34%
      Predicting prices in the supply chain: 17% / 36%
      Scientific or clinical research: 17% / 27%
      Healthcare decisions: 16% / 29%
      Predicting mechanical failures: 9% / 33%
      Other: 17% / 24%
      Source: Ventana Research, Predictive Analytics Benchmark Research
  • 10. Organizations Employ a Variety of Predictive Analytics Algorithms
      Values: Frequently / Occasionally / Not at all
      Classification and regression trees /…: 69% / 25% / 6%
      Linear Regression: 66% / 33% / (not shown)
      Logistic regression or other discrete choice…: 61% / 29% / 10%
      Association rules: 49% / 37% / 14%
      K-nearest neighbors: 36% / 42% / 21%
      Neural networks: 30% / 36% / 34%
      Box Jenkins, Autoregressive…: 30% / 35% / 35%
      Exponential smoothing / double exponential…: 22% / 43% / 34%
      Naïve Bayes: 21% / 43% / 36%
      Support vector machines: 20% / 23% / 57%
      Survival analysis: 15% / 41% / 44%
      Monte Carlo Simulations: 13% / 47% / 40%
      Classification and regression trees / decision trees and Linear Regression are the most popular predictive analytics techniques used.
      Source: Ventana Research, Predictive Analytics Benchmark Research
  • 11. Who Designs and Deploys Predictive Analytics?
      Data Scientist / Data Mining Resources: 32%
      Bus. Intelligence / Data Warehouse Team: 31%
      Line-of-Business Analysts: 19%
      … but who should be performing these tasks?
      Source: Ventana Research, Predictive Analytics Benchmark Research (Q18)
  • 12. Who Does the Best Job?
      Satisfaction vs. Project Team
      Specialized data scientist, statistical or data mining resources: 70%
      Line of business analysts: 65%
      Business intelligence and data warehouse team: 59%
      Chart legend also marks an Overall Average (value not shown).
      Source: Ventana Research, Predictive Analytics Benchmark Research
  • 13. Real-Time Scoring of New Records
      Regularly: 30%
      Occasionally: 18%
      Infrequently: 22%
      Not at all: 30%
      More than half the organizations perform real-time scoring infrequently or not at all.
      Source: Ventana Research, Predictive Analytics Benchmark Research (Q26)
  • 14. Organizations Need More Timely Results from Predictive Analytics
      Satisfaction vs. Use of Real-time Scoring
      Regularly: 88%
      Occasionally: 73%
      Infrequently or Not at all: 47%
      Chart legend also marks an Overall Average (value not shown).
      Source: Ventana Research, Predictive Analytics Benchmark Research
  • 15. Frequency of Updating Predictive Models
      Constantly: 12%
      Hourly: 2%
      Daily: 6%
      Weekly: 11%
      Monthly: 14%
      Quarterly: 22%
      Less often than quarterly: 17%
      Don't know: 16%
      Most organizations don’t update their analytic models frequently enough. Nearly four in 10 update their models quarterly or less frequently.
      Source: Ventana Research, Predictive Analytics Benchmark Research (Q27)
  • 16. Organizations that Update Models More Frequently Have Higher Satisfaction
      Satisfaction vs. Model Updates
      At Least Daily: 81%
      At Least Monthly: 74%
      Less Frequently: 48%
      Chart legend also marks an Overall Average (value not shown).
      Source: Ventana Research, Predictive Analytics Benchmark Research
  • 17. Most Organizations Are Not Providing Adequate Support and Training
      Values: Adequately / Only somewhat adequately / Inadequately
      Training in predictive analytics concepts and techniques: 44% / 32% / 24%
      Product training: 42% / 33% / 26%
      Training in the application of predictive analytics to business problems: 39% / 38% / 23%
      Specialized consulting resources (internal or external): 31% / 39% / 31%
      Help desk resources: 24% / 34% / 42%
      Source: Ventana Research, Predictive Analytics Benchmark Research
  • 18. What Types of Training and Support Are Most Effective?
      Satisfaction vs. Training and Support
      Training in predictive analytics concepts and techniques: 89%
      Help desk resources: 89%
      Training in the application of predictive analytics to business problems: 86%
      Product training: 79%
      Specialized consulting resources (internal or external): 77%
      Chart legend also marks an Overall Average (value not shown).
      Source: Ventana Research, Predictive Analytics Benchmark Research
  • 19. Data Science: A Practitioner’s Perspective
      Mass Technology Leadership Council Panel Discussion
      David Menninger, Formerly VP & Research Director, Ventana Research
      David.Menninger@emc.com
      ©2012, Ventana Research
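
Several slide headlines are sums of chart categories: 54% adopting or considering Hadoop (slide 5), "more than half" scoring infrequently or not at all (slide 13), and "nearly four in 10" updating models quarterly or less often (slide 15). The sketch below recomputes those totals from the percentages in the transcript. The dictionary names and the `share` helper are illustrative assumptions, not part of the deck, and the label-to-value pairing for slides 13 and 15 is read from the flattened chart text.

```python
# Percentages transcribed from slides 5, 13 and 15 of the deck.
# Variable names and the helper are hypothetical, for illustration only.
hadoop_status = {"Production": 22, "Planned": 15, "Evaluating": 17}

real_time_scoring = {
    "Regularly": 30,
    "Occasionally": 18,
    "Infrequently": 22,
    "Not at all": 30,
}

model_updates = {
    "Constantly": 12,
    "Hourly": 2,
    "Daily": 6,
    "Weekly": 11,
    "Monthly": 14,
    "Quarterly": 22,
    "Less often than quarterly": 17,
    "Don't know": 16,
}


def share(distribution, categories):
    """Sum the percentage points of the named categories."""
    return sum(distribution[name] for name in categories)


# Slide 5 headline: adopted or considered.
hadoop_adopting = share(hadoop_status, hadoop_status.keys())

# Slide 13 commentary: infrequently or not at all.
rarely_scoring = share(real_time_scoring, ["Infrequently", "Not at all"])

# Slide 15 commentary: quarterly or less frequently.
quarterly_or_less = share(model_updates, ["Quarterly", "Less often than quarterly"])

print(f"Hadoop adopted or considered: {hadoop_adopting}%")
print(f"Real-time scoring infrequently or not at all: {rarely_scoring}%")
print(f"Models updated quarterly or less often: {quarterly_or_less}%")
```

The totals match the slide commentary, which also supports the label-to-value pairing used for the two pie charts.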

Editor's notes

  1. 93% of RDBMS users also use another technology.
  2. Q17 What types of large-scale data applications is your organization running?
  3. Q106 What technologies does your organization use today to generate analytics? (Select the five most important.)
  4. When adequate training and support are provided, satisfaction increases. All types have a positive influence, but training in predictive analytics concepts and help desk support seem to have the most positive impact.