Waiting for Hadoop 
(Apologies to Samuel Beckett) 
© 2014 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written 
permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on gartner.com. The information contained in this publication has been obtained from 
sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This 
publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Gartner research may 
include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms 
and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research 
organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity." 
Merv Adrian 
Research Vice President, Information Management 
@merv 
Blogs.gartner.com/merv-adrian
What Is "Big Data"?
"Big data" is high-volume, high-velocity and high-variety 
information assets that demand cost-effective, 
innovative forms of information processing 
for enhanced insight and decision making. 
Source: The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney, G00235055
Let's go. We can't. 
Waiting for Hadoop… 
Why not? Let's wait till we know exactly how we stand.
Waiting for Hadoop…
Big Data Plans? 
Many Find Themselves Waiting …
Investments Are on the Rise, 
And Deployments Are Beginning 
Response                                2012 (N = 473)   2013 (N = 720)
Have invested in big data technology         27%              30%
Plan to within next year                     15%              19%
Plan to within 2 years                       16%              15%
No plans at this time                        31%              31%
Don't know                                   11%               5%
Investing or Planning (total)                58%              64%
Source: Gartner Research Circle Surveys, 2012, 2013
But They Know the Leading Opportunities 
[Bar chart comparing 2013 and 2014 responses; horizontal axis 0% to 35%. Categories:]
Monetizing Data (Directly/Indirectly)
Marketing & Sales Growth
New Products & Services
Innovation
Risk & Fraud Detection
Operational & Financial Performance
I wouldn't even know him if I saw him.
Who is he?
So We Search… on Gartner.com, 2nd Highest Term 
[Chart of monthly search counts on Gartner.com for "Big Data + Hadoop" and "Magic Quadrant"; vertical axis 0 to 1600; month labels: January, Feb, March, January.]
Over 1000 searches per month
Starting With What You Need to Do, 
We See Pieces of a Solution… 
Analyze 
Compute 
Persist 
Ingest 
Monitor, Administer
Describe
How to Begin?
It's the start that's difficult.
You can start from anything.
Yes, but you have to decide.
The Complexity of Stack Composition Is Rising 
Ingest/Propagate 
Describe, Develop 
Compute, Search 
Persist 
Monitor, Administer 
Analytics, Machine Learning
And Usage Moves - From Pilot to Production 
[Pie chart; segment values: 10%, 15%, 57%, 14%, 4%. Segment labels:]
Piloting on premise
Piloting in the cloud
Production on premise with cluster
Production on premise with appliance
Production in the cloud
Source: Gartner Webinar n=127
And “Production” Means Growth 
[Pie chart; segment values: 1%, 15%, 4%, 18%, 62%. Segment labels:]
None – haven’t started yet
Fewer than 10 nodes
Between 11 and 50 nodes
Between 51 and 100 nodes
Over 100 nodes
Source: Gartner Webinar, April 2014 n=145
What is Your Secondary Processing Mode for Hadoop? 
[Pie chart; segment values: 18%, 14%, 53%, 6%, 9%. Segment labels:]
Stream processing
Interactive analytics
Graph applications
Database Management Systems
Search
Source: Gartner Webinar, April 2014 n=120
Then all we have to do is wait on here.
It’s not certain.
No, nothing is certain...
So, After Batch, What’s Next? 
YARN Changes the Game – It All Starts Here 
YARN: Cluster Resource Management
HDFS: Distributed Storage
Workloads: SQL, Interactive, Streaming and events, DBMSs (Graph, others), Batch, In-Memory, Search
SQL-on-Hadoop Is The Most Typical Addition in 2014 
Which SQL-on-Hadoop Approach are You MOST Likely to Use in 2014? 
[Pie chart; segment values: 9%, 27%, 23%, 9%, 32%. Segment labels:]
Creating your own SQL queries via Hive
Using a distribution-specific SQL solution (e.g., Cloudera Impala, Pivotal HAWQ)
Using interfaces to HDFS/HBase from analytics tool providers (e.g., Cognos, SAS, Tableau)
Using Hadoop BI specialists (e.g., Platfora, Datameer)
Getting to HDFS/HBase data from your DBMS's external table capability (e.g., Kognitio HDFS Connector, Teradata SQL-H)
Source: Gartner Webinars 2014 n=164
HBase Is The Default “Hadoop Database,” But Not Alone 
• In every distribution 
• Not just the Valleybase anymore: Bloomberg, Nielsen, others adopt 
• Becoming more secure: cell level is coming 
• But there are alternatives: 
- NoSQL (Accumulo, Apache Cassandra, MongoDB...)
- RDBMS on cluster and off 
Let’s go.
We can’t.
Why not?
We’re waiting for [Hadoop].
Spark Powers Machine Learning, 
Other Iterative Uses in-Memory 
Spark
Unifies batch, streaming, interactive computation
Easy to build sophisticated applications
» Support iterative, graph-parallel algorithms
» Powerful APIs in Scala, Python, Java
• In-memory execution engine (richer alternative to MapReduce) for multiple reuse of data to support:
  - Iterative algorithms (machine learning, graphs)
  - Interactive data mining
• Directed acyclic graphs, function pipelining, partition-aware (minimize shuffle)
• Used with HDFS, HBase 
• Streaming applications 
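
The in-memory reuse point can be illustrated without Spark itself. What follows is a minimal plain-Python sketch, not Spark code: `CountingSource`, `cached` and `fit_mean` are hypothetical names invented for this illustration, and the "slow storage" is just a list that counts how often it is read. It shows why an iterative algorithm benefits when the data set is loaded once and reused across passes.

```python
class CountingSource:
    """Hypothetical stand-in for slow storage (for example HDFS); counts reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)


def cached(load):
    """Wrap a loader so the data is read once and then reused from memory."""
    store = []

    def load_once():
        if not store:
            store.append(load())
        return store[0]

    return load_once


def fit_mean(load, iterations=20, learning_rate=0.5):
    """Iterative algorithm: gradient descent on squared error.

    Every pass needs the full data set, so each iteration calls load().
    The estimate converges toward the mean of the data.
    """
    estimate = 0.0
    for _ in range(iterations):
        data = load()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= learning_rate * gradient
    return estimate


# Re-reading from storage on every pass (MapReduce-style chaining of jobs).
reread_source = CountingSource([2.0, 4.0, 6.0])
reread_result = fit_mean(reread_source.load)

# Reading once and keeping the data in memory for all passes.
cached_source = CountingSource([2.0, 4.0, 6.0])
cached_result = fit_mean(cached(cached_source.load))

print(reread_source.reads, cached_source.reads)
```

Both runs compute the same estimate; only the number of reads from the underlying source differs, which is the cost an in-memory engine is meant to avoid.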
[Diagram: Spark with Spark Streaming, Shark SQL, BlinkDB, GraphX and MLlib; labels: Streaming; Batch, Interactive; Interactive; Data-parallel, Iterative; Sophisticated algos.]
Storm: Do-It-Yourself Stream Processing 
• Storm processes streams
• Spouts emit tuples: k/v tuples representing events
• Bolts consume tuples and pass them through rest of topology
• Logic & topology are up to you
• Apache: Incubating 
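
The spout-and-bolt model can be sketched in a few lines of plain Python. This is an illustration of the idea only and does not use the Storm API: `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are hypothetical names, and the topology is a simple linear chain of generators rather than a distributed graph.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# A k/v tuple representing one event (assumed shape for this sketch).
Event = Tuple[str, str]


def sentence_spout(lines: Iterable[str]) -> Iterator[Event]:
    """Spout: emit one (source id, sentence) tuple per incoming line."""
    for index, line in enumerate(lines):
        yield (f"line-{index}", line)


def split_bolt(stream: Iterable[Event]) -> Iterator[Event]:
    """Bolt: consume sentence tuples and emit one (word, source id) tuple per word."""
    for source_id, sentence in stream:
        for word in sentence.lower().split():
            yield (word, source_id)


def count_bolt(stream: Iterable[Event]) -> Counter:
    """Final bolt: consume word tuples and keep a count per word."""
    counts: Counter = Counter()
    for word, _source_id in stream:
        counts[word] += 1
    return counts


def run_topology(lines: Iterable[str]) -> Counter:
    """Wire spout -> split bolt -> count bolt; the wiring is the 'topology'."""
    return count_bolt(split_bolt(sentence_spout(lines)))


counts = run_topology(["we are waiting", "waiting for Hadoop"])
print(counts.most_common(1))
```

The processing logic in each bolt and the way the stages are wired together are both written by hand here, which mirrors the do-it-yourself point of the slide.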
[Diagram: two spouts feeding a topology of six bolts]
Tackling the Limitations of Search 
Finding Stuff | Shifting Schemas | On-the-fly Aggregations | Distributed Computing
• Iterating over a large number of results
• Doing calculations on field values for lots of documents
• Joining values from multiple indexes
• Does not do complex analytic chains well
• You must precalculate answers to facilitate responsiveness
• If new data changes stored answers, you must reindex
• Indexes are HUGE
Hadoop to the Rescue? Maybe… 
• Scalable, reliable, fault-tolerant data processing 
• Very good for batch processing of lots of data 
• Can do very complex analysis 
• Can work on data from multiple records at once 
• But it’s hands-on. Much assembly required. 
...we'll come back tomorrow. And then the day after tomorrow. And so on.
He should be here. And if he doesn't come?
So Now We Wait… 
For What’s Next. But First…
Securing HDFS – There’s No DBMS There 
Supported Distribution
Access Restriction (Physical and Logical)
Configuration & Vulnerability Management
Identity & Access Management
Network traffic encryption
Audit & Protection
Data masking
Tokenization, encryption
Data Protection
Monitoring For Sensitive Data
Data Anonymization
Admin. Privilege Management
Change Management
Log Management
Operations Hygiene
HDFS Data
Data lake… …or reservoir? 
"Big Data Replacing the Data Warehouse?" 
Not a Relevant Notion. It Joins the Warehouse. 
Data warehouses are collections of data — not technology platforms. 
A data warehouse can be made out of anything that manages data. 
The key point is that when we find value, it is indeed managed. 
Source MQ DW DBMS Survey, Nov. 2013 and Nov. 2012 
What Are Organizations Planning for Their DWs?
A reservoir contains water that is…
Managed
Transformed
Filtered
Secured (somewhat)
Portable
Potable (fit for consumption)
And it’s not over.
Apparently not.
It’s only beginning.
The Journey from Pilot… 
to Production… 
to Platform 
Begins here. 
Thank you! 
http://www.flickr.com/photos/orinrobertjohn/3267286885/sizes/o/in/photostream/

Más contenido relacionado

La actualidad más candente

Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...Cloudera, Inc.
 
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...Big Data Spain
 
"When a Startup Hits Growth Mode: Scaling from 200GB to 20TB! "
"When a Startup Hits Growth Mode: Scaling from 200GB to 20TB! ""When a Startup Hits Growth Mode: Scaling from 200GB to 20TB! "
"When a Startup Hits Growth Mode: Scaling from 200GB to 20TB! "MongoDB
 
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | EdurekaBig Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | EdurekaEdureka!
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Cloudera, Inc.
 
Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionAdnan Masood
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big dataDr. Wilfred Lin (Ph.D.)
 
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...Edureka!
 
Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios
Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios
Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios SpagoWorld
 
LFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
LFS302_Real-World Evidence Platform to Enable Therapeutic InnovationLFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
LFS302_Real-World Evidence Platform to Enable Therapeutic InnovationAmazon Web Services
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Edureka!
 
Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Pactera_US
 
Non geeks-big-data-playbook-106947
Non geeks-big-data-playbook-106947Non geeks-big-data-playbook-106947
Non geeks-big-data-playbook-106947CMR WORLD TECH
 
Non-geek's big data playbook - Hadoop & EDW - SAS Best Practices
Non-geek's big data playbook - Hadoop & EDW - SAS Best PracticesNon-geek's big data playbook - Hadoop & EDW - SAS Best Practices
Non-geek's big data playbook - Hadoop & EDW - SAS Best PracticesJyrki Määttä
 
What is big data
What is big data What is big data
What is big data DeZyre
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduceRyan Tabora
 
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaHadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaEdureka!
 

La actualidad más candente (20)

Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
 
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
Health Insurance Predictive Analysis with Hadoop and Machine Learning. JULIEN...
 
"When a Startup Hits Growth Mode: Scaling from 200GB to 20TB! "
"When a Startup Hits Growth Mode: Scaling from 200GB to 20TB! ""When a Startup Hits Growth Mode: Scaling from 200GB to 20TB! "
"When a Startup Hits Growth Mode: Scaling from 200GB to 20TB! "
 
Big data primer
Big data primerBig data primer
Big data primer
 
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | EdurekaBig Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
 
Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief Introduction
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big data
 
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
 
Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios
Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios
Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios
 
LFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
LFS302_Real-World Evidence Platform to Enable Therapeutic InnovationLFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
LFS302_Real-World Evidence Platform to Enable Therapeutic Innovation
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
 
Big data with java
Big data with javaBig data with java
Big data with java
 
Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks
 
Non geeks-big-data-playbook-106947
Non geeks-big-data-playbook-106947Non geeks-big-data-playbook-106947
Non geeks-big-data-playbook-106947
 
Non-geek's big data playbook - Hadoop & EDW - SAS Best Practices
Non-geek's big data playbook - Hadoop & EDW - SAS Best PracticesNon-geek's big data playbook - Hadoop & EDW - SAS Best Practices
Non-geek's big data playbook - Hadoop & EDW - SAS Best Practices
 
What is big data
What is big data What is big data
What is big data
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduce
 
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaHadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
 

Similar a Waiting for Hadoop

Hadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureHadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureDataWorks Summit
 
The Revolutionary Impact of the Cloud by Arun Chandrasekaran
The Revolutionary Impact of the Cloud by Arun ChandrasekaranThe Revolutionary Impact of the Cloud by Arun Chandrasekaran
The Revolutionary Impact of the Cloud by Arun ChandrasekaranPoh Lee
 
2015 HortonWorks MDA Roadshow Presentation
2015 HortonWorks MDA Roadshow Presentation2015 HortonWorks MDA Roadshow Presentation
2015 HortonWorks MDA Roadshow PresentationFelix Liao
 
Automate Hadoop Jobs with Real World Business Impact
Automate Hadoop Jobs with Real World Business ImpactAutomate Hadoop Jobs with Real World Business Impact
Automate Hadoop Jobs with Real World Business ImpactCA Technologies
 
As Novas Tecnologias e o Potencial de Inovação em Governo
As Novas Tecnologias e o Potencial de Inovação em GovernoAs Novas Tecnologias e o Potencial de Inovação em Governo
As Novas Tecnologias e o Potencial de Inovação em GovernoinovaDay .
 
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...AgileNetwork
 
From Data to Data Driven - Applications that will change your business
From Data to Data Driven - Applications that will change your businessFrom Data to Data Driven - Applications that will change your business
From Data to Data Driven - Applications that will change your businessNG DATA
 
Co-innovation in Action - 2014 SAP Co-innovation Lab Project Highlights
Co-innovation in Action - 2014 SAP Co-innovation Lab Project HighlightsCo-innovation in Action - 2014 SAP Co-innovation Lab Project Highlights
Co-innovation in Action - 2014 SAP Co-innovation Lab Project HighlightsTom Turchioe
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsFredReynolds2
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldSean Roberts
 
Operationalizing Data Analytics
Operationalizing Data AnalyticsOperationalizing Data Analytics
Operationalizing Data AnalyticsVMware Tanzu
 
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...StampedeCon
 
Teradata Investor Presentation
Teradata Investor Presentation Teradata Investor Presentation
Teradata Investor Presentation teradata2014
 
Lowering the entry point to getting going with Hadoop and obtaining business ...
Lowering the entry point to getting going with Hadoop and obtaining business ...Lowering the entry point to getting going with Hadoop and obtaining business ...
Lowering the entry point to getting going with Hadoop and obtaining business ...DataWorks Summit
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleSpringPeople
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationInside Analysis
 
(ENT312) Should You Build or Buy Cloud Infrastructure and Platforms? | AWS re...
(ENT312) Should You Build or Buy Cloud Infrastructure and Platforms? | AWS re...(ENT312) Should You Build or Buy Cloud Infrastructure and Platforms? | AWS re...
(ENT312) Should You Build or Buy Cloud Infrastructure and Platforms? | AWS re...Amazon Web Services
 
How to Use Hybrid Integration Platforms Effectively
How to Use Hybrid Integration Platforms EffectivelyHow to Use Hybrid Integration Platforms Effectively
How to Use Hybrid Integration Platforms EffectivelyMuleSoft
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data AnalyticsDatameer
 

Similar a Waiting for Hadoop (20)

Hadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureHadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the Future
 
The Revolutionary Impact of the Cloud by Arun Chandrasekaran
The Revolutionary Impact of the Cloud by Arun ChandrasekaranThe Revolutionary Impact of the Cloud by Arun Chandrasekaran
The Revolutionary Impact of the Cloud by Arun Chandrasekaran
 
2015 HortonWorks MDA Roadshow Presentation
2015 HortonWorks MDA Roadshow Presentation2015 HortonWorks MDA Roadshow Presentation
2015 HortonWorks MDA Roadshow Presentation
 
Automate Hadoop Jobs with Real World Business Impact
Automate Hadoop Jobs with Real World Business ImpactAutomate Hadoop Jobs with Real World Business Impact
Automate Hadoop Jobs with Real World Business Impact
 
As Novas Tecnologias e o Potencial de Inovação em Governo
As Novas Tecnologias e o Potencial de Inovação em GovernoAs Novas Tecnologias e o Potencial de Inovação em Governo
As Novas Tecnologias e o Potencial de Inovação em Governo
 
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
Agile Network India | Agility Day @Noida | Enterprise agility through enginee...
 
From Data to Data Driven - Applications that will change your business
From Data to Data Driven - Applications that will change your businessFrom Data to Data Driven - Applications that will change your business
From Data to Data Driven - Applications that will change your business
 
Co-innovation in Action - 2014 SAP Co-innovation Lab Project Highlights
Co-innovation in Action - 2014 SAP Co-innovation Lab Project HighlightsCo-innovation in Action - 2014 SAP Co-innovation Lab Project Highlights
Co-innovation in Action - 2014 SAP Co-innovation Lab Project Highlights
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to Production
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential Tools
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
Operationalizing Data Analytics
Operationalizing Data AnalyticsOperationalizing Data Analytics
Operationalizing Data Analytics
 
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
 
Teradata Investor Presentation
Teradata Investor Presentation Teradata Investor Presentation
Teradata Investor Presentation
 
Lowering the entry point to getting going with Hadoop and obtaining business ...
Lowering the entry point to getting going with Hadoop and obtaining business ...Lowering the entry point to getting going with Hadoop and obtaining business ...
Lowering the entry point to getting going with Hadoop and obtaining business ...
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeople
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop Acceleration
 
(ENT312) Should You Build or Buy Cloud Infrastructure and Platforms? | AWS re...
(ENT312) Should You Build or Buy Cloud Infrastructure and Platforms? | AWS re...(ENT312) Should You Build or Buy Cloud Infrastructure and Platforms? | AWS re...
(ENT312) Should You Build or Buy Cloud Infrastructure and Platforms? | AWS re...
 
How to Use Hybrid Integration Platforms Effectively
How to Use Hybrid Integration Platforms EffectivelyHow to Use Hybrid Integration Platforms Effectively
How to Use Hybrid Integration Platforms Effectively
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data Analytics
 

Waiting for Hadoop

  • 1. Waiting for Hadoop (Apologies to Samuel Beckett) © 2014 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on gartner.com. The information contained in this publication has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity." Merv Adrian Research Vice President, Information Management @merv Blogs.gartner.com/merv-adrian
  • 2. What Is "Big Data"? "Big data" is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Source: The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney, G00235055
  • 3. Let's go. We can't. Waiting for Hadoop…
  • 4. Why not? Let's wait till we know exactly how we stand. Waiting for Hadoop…
  • 5. Big Data Plans? Many Find Themselves Waiting …
  • 6. Investments Are on the Rise, And Deployments Are Beginning [Chart: big data investment plans. 2013 (N = 720): have invested in big data technology 30%, plan to within next year 19%, plan to within 2 years 15%, no plans at this time 31%, don't know 5%; investing or planning 64%. 2012 (N = 473): have invested in big data technology 27%, plan to within next year 15%, plan to within 2 years 16%, no plans at this time 31%, don't know 11%; investing or planning 58%.] Source: Gartner Research Circle Surveys, 2012, 2013
  • 7. But They Know the Leading Opportunities [Bar chart comparing 2013 and 2014 responses on a 0% to 35% scale. Categories: Monetizing Data (Directly/Indirectly); Marketing & Sales Growth; New Products & Services Innovation; Risk & Fraud Detection; Operational & Financial Performance]
  • 8. I wouldn't even know him if I saw him. Who is he?
  • 9. So We Search… on Gartner.com, 2nd Highest Term [Chart: searches per month, y-axis 0 to 1600; labels: Big Data + Hadoop, Magic Quadrant; months: January, Feb, March, January] Over 1000 searches per month
  • 10. Starting With What You Need to Do, We See Pieces of a Solution… Analyze; Compute; Persist; Ingest; Monitor, Administer; Describe
  • 12. It's the start that's difficult. You can start from anything.
  • 13. Yes, but you have to decide.
  • 19. The Complexity of Stack Composition Is Rising: Ingest/Propagate; Describe, Develop; Compute, Search; Persist; Monitor, Administer; Analytics, Machine Learning
  • 20. And Usage Moves - From Pilot to Production. Categories: Piloting on premise; Piloting in the cloud; Production on premise with cluster; Production on premise with appliance; Production in the cloud [Pie chart; values as extracted, not matched to categories: 10%, 15%, 57%, 14%, 4%] Source: Gartner Webinar n=127
  • 21. And “Production” Means Growth. Categories: None – haven’t started yet; Fewer than 10 nodes; Between 11 and 50 nodes; Between 51 and 100 nodes; Over 100 nodes [Pie chart; values as extracted, not matched to categories: 1%, 15%, 4%, 18%, 62%] Source: Gartner Webinar, April 2014 n=145
  • 22. What is Your Secondary Processing Mode for Hadoop? Categories: Stream processing; Interactive analytics; Graph applications; Database Management Systems; Search [Pie chart; values as extracted, not matched to categories: 18%, 14%, 53%, 6%, 9%] Source: Gartner Webinar, April 2014 n=120
  • 23. Then all we have to do is wait on here. It’s not certain.
  • 24. No, nothing is certain...
  • 25. So, After Batch, What’s Next?
  • 26. YARN Changes the Game – It All Starts Here [Diagram: YARN (Cluster Resource Management) over HDFS (Distributed Storage), supporting: SQL Interactive; Streaming and events; DBMSs: Graph, others; Batch; In-Memory; Search]
  • 27. SQL-on-Hadoop Is The Most Typical Addition in 2014. Which SQL-on-Hadoop Approach are You MOST Likely to Use in 2014? Options: Creating your own SQL queries via Hive; Using a distribution-specific SQL solution (e.g., Cloudera Impala, Pivotal HAWQ); Using interfaces to HDFS/HBase from analytics tool providers (e.g., Cognos, SAS, Tableau); Using Hadoop BI specialists (e.g., Platfora, Datameer); Getting to HDFS/HBase data from your DBMS’ external table capability (e.g., Kognitio HDFS Connector, Teradata SQL-H) [Pie chart; values as extracted, not matched to options: 9%, 27%, 23%, 9%, 32%] Source: Gartner Webinars 2014 n=164
  • 28. HBase Is The Default “Hadoop Database,” But Not Alone • In every distribution • Not just the Valleybase anymore: Bloomberg, Nielsen, others adopt • Becoming more secure: cell level is coming • But there are alternatives: - NoSQL (Accumulo, Apache Cassandra, MongoDB...) - RDBMS on cluster and off
  • 29. Let’s go. We can’t.
  • 30. Why not? We’re waiting for [Hadoop].
  • 31. Spark Powers Machine Learning, Other Iterative Uses in-Memory. Spark unifies batch, streaming and interactive computing. Easy to build sophisticated applications » Supports iterative, graph-parallel algorithms » Powerful APIs in Scala, Python, Java • In-memory execution engine (richer alternative to MapReduce) that reuses data multiple times to support: iterative algorithms (machine learning, graphs), interactive data mining, streaming applications • Directed acyclic graphs, function pipelining, partition-aware (minimizes shuffle) • Used with HDFS, HBase [Diagram: Spark stack with components Spark, Spark Streaming, Shark SQL, BlinkDB, GraphX, MLlib; workload labels: Streaming; Batch, Interactive; Interactive; Data-parallel, Iterative; Sophisticated algos.]
  • 32. Storm: Do-It-Yourself Stream Processing • Storm processes streams • Spouts emit tuples: k/v tuples representing events • Bolts consume tuples and pass them through rest of topology • Logic & topology is up to you • Apache: Incubating [Diagram: two spouts feeding a topology of six bolts]
  • 33. Tackling the Limitations of Search [Diagram labels: Finding Stuff; Shifting Schemas; On-the-fly Aggregations; Distributed Computing] • Iterating over a large number of results • Doing calculations on field values for lots of documents • Joining values from multiple indexes • Does not do complex analytic chains well • You must precalculate answers to facilitate responsiveness • If new data changes stored answers, you must reindex • Indexes are HUGE
  • 34. Hadoop to the Rescue? Maybe… • Scalable, reliable, fault-tolerant data processing • Very good for batch processing of lots of data • Can do very complex analysis • Can work on data from multiple records at once • But it’s hands-on. Much assembly required.
  • 35. ...we'll come back tomorrow. And then the day after tomorrow. And so on. He should be here. And if he doesn't come?
  • 36. So Now We Wait… For What’s Next. But First…
  • 37. Securing HDFS – There’s No DBMS There [Diagram labels: Supported Distribution; Access Restriction (Physical and Logical); Configuration & Vulnerability Management; Identity & Access Management; Network traffic encryption; Audit & Protection; Data masking; Tokenization, encryption; Data Protection; Monitoring For Sensitive Data; Data Anonymization; Admin. Privilege Management; Change Management; Log Management; Operations Hygiene; HDFS Data]
  • 38. Data lake… …or reservoir?
  • 39. "Big Data Replacing the Data Warehouse?" Not a Relevant Notion. It Joins the Warehouse. Data warehouses are collections of data, not technology platforms. A data warehouse can be made out of anything that manages data. The key point is that when we find value, it is indeed managed. [Chart: What Are Organizations Planning for Their DWs?] Source: MQ DW DBMS Survey, Nov. 2013 and Nov. 2012
  • 40. A reservoir contains water that is… Managed; Transformed; Filtered; Secured (somewhat); Portable; Potable (fit for consumption)
  • 41. And it’s not over. Apparently not.
  • 42. It’s only beginning.
  • 43. The Journey from Pilot… to Production… to Platform Begins here. Thank you! http://www.flickr.com/photos/orinrobertjohn/3267286885/sizes/o/in/photostream/
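
Slide 32 describes Storm's model in words: spouts emit key/value tuples representing events, and bolts consume those tuples and pass them through the rest of the topology. The flow can be sketched in plain Python with generators. This is only an illustration of the idea, not the Storm API: the names `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are hypothetical, and the sketch runs in a single process with none of Storm's distribution or fault tolerance.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# A (key, value) tuple representing one event, as on slide 32.
Event = Tuple[str, str]


def sentence_spout(lines: Iterable[str]) -> Iterator[Event]:
    """Hypothetical spout: emits one (key, value) tuple per incoming line."""
    for i, line in enumerate(lines):
        yield (f"line-{i}", line)


def split_bolt(stream: Iterable[Event]) -> Iterator[Event]:
    """Hypothetical bolt: consumes sentence tuples, emits one tuple per word."""
    for key, sentence in stream:
        for word in sentence.lower().split():
            yield (key, word)


def count_bolt(stream: Iterable[Event]) -> Counter:
    """Hypothetical terminal bolt: consumes word tuples and counts them."""
    counts: Counter = Counter()
    for _key, word in stream:
        counts[word] += 1
    return counts


def run_topology(lines: List[str]) -> Counter:
    """Wire the topology by hand: spout -> split bolt -> count bolt."""
    return count_bolt(split_bolt(sentence_spout(lines)))


if __name__ == "__main__":
    result = run_topology(["waiting for hadoop", "still waiting"])
    print(result.most_common(1))
```

The hand-written wiring in `run_topology` mirrors the slide's point that the logic and the topology are up to you: swapping or chaining bolts means changing that composition.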