1. Data Science
A Practitioner’s Perspective
Mass Technology Leadership Council Panel Discussion
David Menninger, Formerly VP & Research Director, Ventana Research
David.Menninger@emc.com
©2012, Ventana Research
2. David Menninger
Former Vice President – Ventana Research
Now head of business development and strategy for EMC Greenplum.
Until last week, covered analytics, business intelligence and information
management for Ventana Research. Over two decades of experience
developing and bringing to market some of the leading-edge
technologies for helping organizations analyze data to support a range
of action-taking and decision-making processes.
Prior to joining Ventana Research, served as VP of Marketing and
Product Management at Vertica Systems, Oracle, Applix, InforSense
and IRI Software. Helped create over three-quarters of a billion dollars of
shareholder value while serving in these roles.
Email: david.menninger@emc.com
4. Volume and Velocity of Data Are Most
Important In Evaluating Big Data Technology
Volume:
less than 1 TB 10%
1-10 TB 29%
11-100 TB 31%
101 TB-1 PB 13%
more than 1 PB 11%
Don't know 7%

Velocity:
less than 10 GB per day 26%
11-100 GB per day 33%
101 GB-1 TB per day 20%
1-10 TB per day 4%
More than 10 TB per… 6%
Don't know 12%
Source: Ventana Research The Challenge of Big Data Benchmark Research
5. Hadoop Is Being Adopted or Considered
by 54% of Enterprises
Production 22%
Planned 15%
Evaluating 17%
Source: Ventana Research Hadoop Information Management Analytics Research
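The 54% in this slide's title is the sum of the three adoption stages listed on the slide. A minimal Python check of that arithmetic, using only the slide's figures (the variable names are illustrative, not from the research):

```python
# Adoption-stage shares as shown on the slide (percent of enterprises).
hadoop_stages = {"Production": 22, "Planned": 15, "Evaluating": 17}

# "Adopted or considered" is taken here as the sum of all three stages.
adopted_or_considered = sum(hadoop_stages.values())
print(f"Adopted or considered: {adopted_or_considered}%")  # prints "Adopted or considered: 54%"
```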
6. …but the Vast Majority Use a Variety of
Big Data Technologies
Columns: currently in production / plan to use within 12 months / plan to use in 12-24 months / still evaluating / no plans to use
An RDBMS (for example, IBM DB2, Microsoft SQL Server, MySQL, Oracle) on standard hardware: 89% / 2% / 3% / 2% / 3%
Flat files: 70% / 7% / 1% / 4% / 18%
A DW appliance (for example, Netezza, Exadata, EMC Greenplum, Teradata): 34% / 11% / 3% / 21% / 31%
In-memory databases: 33% / 13% / 4% / 17% / 33%
Hadoop: 22% / 12% / 3% / 17% / 45%
Other: 26% / 4% / 4% / 10% / 57%
A specialized DBMS (for example, Aster Data, Infobright, Kognitio, Paraccel, Sybase IQ, Vertica): 15% / 9% / 5% / 19% / 51%
Source: Ventana Research The Challenge of Big Data Benchmark Research
7. What Types of Applications?
What types of large-scale data applications is your
organization running?
Columns: Hadoop / Non-Hadoop
Query and reporting: 60% / 89%
Consolidation of multiple data sources for analysis: 63% / 71%
Custom/production application: 65% / 68%
Data preparation: 56% / 60%
Advanced analyses: 69% / 47%
Analysis or indexing of unstructured data: 46% / 32%
Data sandbox / data experimentation: 44% / 32%

Hadoop is most often used for advanced analyses and is more likely to be used to analyze unstructured data and for data sandboxing than other technologies. It is less likely to be used for query and reporting.
Source: Ventana Research Hadoop Information Management Analytics Research
8. Predictive Analytics Still Emerging
Despite its potential, predictive analytics remains a
specialist tool, ranking 10th among BI capabilities, with
only 13% using it
Spreadsheets 60%
Business Intelligence 49%
Analytic Databases 41%
Custom-built systems 34%
Data warehouse 28%
Planning and forecasting 26%
Application server 20%
LOB analytics 18%
RDB 14%
Predictive Analytics 13%
… yet 80% ranked predictive analytics capabilities as important or very important
Source: Ventana Research Business Analytics Benchmark Research
9. Forecasting and Marketing are the Most
Common Uses of Predictive Analytics
Columns: current / future
Forecasting…: 72% / 24%
Marketing analyses…: 70% / 22%
Customer service or support…: 45% / 34%
Product recommendations or offers: 43% / 22%
Fraud detection: 34% / 31%
Intelligence or surveillance analysis: 28% / 28%
Social network analysis: 27% / 38%
Logistics analysis: 26% / 27%
Predicting product development …: 18% / 34%
Predicting prices in the supply chain: 17% / 36%
Scientific or clinical research: 17% / 27%
Healthcare decisions: 16% / 29%
Predicting mechanical failures: 9% / 33%
Other: 17% / 24%
Source: Ventana Research Predictive Analytics Benchmark Research
10. Organizations Employ a Variety of Predictive
Analytics Algorithms
Columns: frequently / occasionally / not at all
Classification and regression trees /…: 69% / 25% / 6%
Linear Regression: 66% / 33% / not shown
Logistic regression or other discrete choice…: 61% / 29% / 10%
Association rules: 49% / 37% / 14%
K-nearest neighbors: 36% / 42% / 21%
Neural networks: 30% / 36% / 34%
Box-Jenkins, Autoregressive…: 30% / 35% / 35%
Exponential smoothing / double exponential…: 22% / 43% / 34%
Naïve Bayes: 21% / 43% / 36%
Support vector machines: 20% / 23% / 57%
Survival analysis: 15% / 41% / 44%
Monte Carlo Simulations: 13% / 47% / 40%
Classification and regression trees / decision trees and Linear
Regression are the most popular predictive analytics techniques used.
Source: Ventana Research Predictive Analytics Benchmark Research
11. Who Designs and Deploys Predictive Analytics?
Data Scientist / Data Mining Resources 32%
Business Intelligence / Data Warehouse Team 31%
Line-of-Business Analysts 19%
… but who should be performing these tasks?
Source: Ventana Research Predictive Analytics Benchmark Research
Survey question: Q18
12. Who Does the Best Job?
Satisfaction vs. Project Team
Specialized data scientist, statistical or data mining resources 70%
Line of business analysts 65%
Business intelligence and data warehouse team 59%
Source: Ventana Research Predictive Analytics Benchmark Research
13. Real-Time Scoring of New Records
Regularly 30%
Occasionally 18%
Infrequently 22%
Not at all 30%

More than half the organizations perform real-time scoring infrequently or not at all.
Source: Ventana Research Predictive Analytics Benchmark Research
Survey question: Q26
14. Organizations Need More Timely Results
from Predictive Analytics
Satisfaction vs. Use of Real-time Scoring
Regularly 88%
Occasionally 73%
Infrequently or Not at all 47%
Source: Ventana Research Predictive Analytics Benchmark Research
15. Frequency of Updating Predictive Models
Constantly 12%
Hourly 2%
Daily 6%
Weekly 11%
Monthly 14%
Quarterly 22%
Less often than quarterly 17%
Don't know 16%

Most organizations don’t update their analytic models frequently enough. Nearly four in 10 update their models quarterly or less frequently.
Source: Ventana Research Predictive Analytics Benchmark Research
Survey question: Q27
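The "nearly four in 10" figure on slide 15 can be checked by adding the two slowest update categories. A small Python sketch, assuming the pie-chart slices pair with their labels as Monthly 14%, Quarterly 22% and Less often than quarterly 17% (that pairing is read from the flattened chart, and the variable names are illustrative):

```python
# Update-frequency shares from slide 15 (percent of organizations).
# Assumption: slice/label pairing reconstructed from the flattened pie chart.
update_frequency = {
    "Constantly": 12,
    "Hourly": 2,
    "Daily": 6,
    "Weekly": 11,
    "Monthly": 14,
    "Quarterly": 22,
    "Less often than quarterly": 17,
    "Don't know": 16,
}

# Sanity check: the slices should account for every respondent.
total = sum(update_frequency.values())

# "Quarterly or less frequently" combines the two slowest categories.
quarterly_or_less = (
    update_frequency["Quarterly"] + update_frequency["Less often than quarterly"]
)

print(f"Total: {total}%")
print(f"Quarterly or less frequently: {quarterly_or_less}%")
```

Under that pairing the slices total 100% and the two slowest categories add up to 39%, which matches the slide's wording.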
16. Organizations that Update Models More
Frequently Have Higher Satisfaction
Satisfaction vs. Model Updates
At least daily 81%
At least monthly 74%
Less frequently 48%
Source: Ventana Research Predictive Analytics Benchmark Research
17. Most Organizations Are Not Providing
Adequate Support and Training
Columns: adequately / only somewhat adequately / inadequately
Training in predictive analytics concepts and techniques: 44% / 32% / 24%
Product training: 42% / 33% / 26%
Training in the application of predictive analytics to business problems: 39% / 38% / 23%
Specialized consulting resources (internal or external): 31% / 39% / 31%
Help desk resources: 24% / 34% / 42%
Source: Ventana Research Predictive Analytics Benchmark Research
18. What Types of Training and Support Are
Most Effective?
Satisfaction vs. Training and Support
Training in predictive analytics concepts and techniques 89%
Help desk resources 89%
Training in the application of predictive analytics to business problems 86%
Product training 79%
Specialized consulting resources (internal or external) 77%
Source: Ventana Research Predictive Analytics Benchmark Research
Editor's notes
93% of RDBMS users also use another technology.
Q17: What types of large-scale data applications is your organization running?
Q106: What technologies does your organization use today to generate analytics? (Select the five most important.)
When adequate training and support are provided, satisfaction increases. All types have a positive influence, but training in predictive analytics concepts and help desk support seem to have the most positive impact.