Grab some coffee and enjoy 
the pre-show banter before 
the top of the hour!
Game Changed: How Hadoop is Reinventing Enterprise Thinking 
The Briefing Room
Twitter Tag: #briefr 
The Briefing Room 
Welcome 
Host: 
Eric Kavanagh 
eric.kavanagh@bloorgroup.com 
@eric_kavanagh
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics 
This Month: BIG DATA 
May: DATABASE 
June: ANALYTICS & MACHINE LEARNING 
2014 Editorial Calendar at 
www.insideanalysis.com/webcasts/the-briefing-room
Big Data
Analyst: Robin Bloor 
Robin Bloor is 
Chief Analyst at 
The Bloor Group 
robin.bloor@bloorgroup.com 
@robinbloor
RedPoint Global
• RedPoint Global is a data management and integrated marketing technology company
• Its Convergent Marketing Platform™ offers products designed for data management, collaboration and architecture integration.
• RedPoint Data Management for Hadoop is YARN-compliant and enables analysts to access and manipulate data directly within the Hadoop cluster.
Guest: George Corugedo 
George Corugedo is Chief Technology Officer & Co-Founder
at RedPoint Global Inc. A mathematician
and seasoned technology executive, George has 
over 20 years of business and technical expertise. 
As co-founder and CTO of RedPoint Global, George 
is responsible for leading the development of the 
RedPoint Convergent Marketing Platform™. A 
former math professor, George left academia to 
co-found Accenture’s Customer Insight Practice, 
which specialized in strategic data utilization, 
analytics and customer strategy. Previous positions 
include director of client delivery at ClarityBlue, 
Inc., a provider of hosted customer intelligence 
solutions to enterprise commercial entities, and 
COO/CIO of Riscuity, a receivables management 
company specializing in the utilization of analytics 
to drive collections.
RedPoint Overview for Bloor Group
Overview - What is Hadoop/Hadoop 2.0
Lower cost scaling
No need for structure
Ease of data capture
Hadoop 1.0
• All operations based on MapReduce
• Intrinsic inconsistency of code-based solutions
• Highly skilled and expensive resources needed
• 3rd-party applications constrained by the need to generate code
Hadoop 2.0
• Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
• Mature applications can now operate directly on Hadoop
• Reduced skill requirements and increased consistency
© RedPoint Global Inc. Confidential, 8 April 2014
Overview – Challenges to Adoption
Skills Gap
• Severe shortage of MR-skilled resources
• Very expensive resources that are hard to retain
• Inconsistent skills lead to inconsistent results
• Underutilizes existing resources
• Prevents broad leverage of investments across the enterprise
Maturity & Governance
• A nascent technology ecosystem around Hadoop
• Emerging technologies only address narrow slivers of functionality
• New applications are not enterprise class
• Legacy applications have built short-term capabilities
Data Into Information
• Data is not useful in its raw state; it must be turned into information
• The benefit of Hadoop is that the same data can be used from many perspectives
• Analysts must now structure the data based on its intended use
How RedPoint Achieves this
First YARN compliant ETL/data quality toolset on the market – brings together both Big Data and traditional data to create Big Information!
RANKED #1 by [logo] in:
• Customer or Party Data
• Processing Speed
• Match Quality
• Ease of Use
The power to make your data the biggest asset your organization has
Key features of RedPoint Data Management
ETL & ELT
• Profiling, reads/writes, transformations
• Single project for all jobs
Data Quality
• Cleanse data
• Parsing, correction
• Geo-spatial analysis
Integration & Matching
• Grouping
• Fuzzy match
Master Key Management
• Create keys
• Track changes
• Maintain matches over time
Web Services Integration
• Consume and publish
• HTTP/HTTPS protocols
• XML/JSON/SOAP formats
Process Automation & Operations
• Job scheduling, monitoring, notifications
• Central point of control
All functions can be used on both TRADITIONAL and BIG DATA
Creates clean, integrated, actionable data – quickly, reliably and at low cost
Spotlight on RedPoint Data Management for Hadoop
For data management in Hadoop:
PREVIOUS OPTIONS
Use MapReduce
x complex
x requires new skills
x inefficient execution
Move data out of Hadoop
x extra time and effort
x extra storage (expensive)
x defeats the purpose of Hadoop
WITH REDPOINT
• Easy-to-use interface
• Leverages existing skills
• Executes in Hadoop 2.0 (using YARN architecture)
• Fast – no MapReduce
• Can combine Big Data with traditional data
• Data becomes actionable by RedPoint Interaction
The only pure YARN data management platform
Makes Hadoop data management easy, fast, low-cost.
Makes Big Data clean, integrated, usable.
You get more out of your Big Data investment.
Data Management on Hadoop
[Diagram labels: Partitioning AM / Tasks; Execution AM / Tasks; Data I/O; Key / Split Analysis; Parallel Section; Partition; Data server; YARN; HDFS/MapReduce]
RedPoint DM for Hadoop: Processing Flow
[Diagram labels: Data Management Designer; DM Execution Server; Parallel Section; Resource Manager; Launches DM App Master; DM App Master; Launches Tasks; Node Manager; DM Task; Running DM Task; steps 1, 2, 3]
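As a rough illustration of the flow on this slide, the toy sketch below uses a local thread pool in place of a cluster. It does not call any YARN or RedPoint API: the class `ParallelSectionSketch`, the method `runParallelSection`, and the "count non-blank records" task are all hypothetical stand-ins. It only shows the general idea of a coordinator that partitions the input, launches one task per partition, and combines the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSectionSketch {
    // Hypothetical stand-in for an application master: splits the input into
    // partitions, runs one task per partition, and sums the partial results.
    static long runParallelSection(List<String> records, int partitions) {
        // The thread pool stands in for containers granted by a resource manager.
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> tasks = new ArrayList<>();
            int size = (records.size() + partitions - 1) / partitions; // ceiling division
            for (int start = 0; start < records.size(); start += size) {
                final List<String> slice =
                        records.subList(start, Math.min(start + size, records.size()));
                // Each task counts the non-blank records in its own partition.
                Callable<Long> task =
                        () -> slice.stream().filter(r -> !r.trim().isEmpty()).count();
                tasks.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> t : tasks) {
                total += t.get(); // wait for each task and add its partial count
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("a task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "", "b", "c", " ");
        System.out.println(runParallelSection(records, 2)); // prints 3
    }
}
```

On a real cluster the partitions would be HDFS splits and the tasks would run in containers on separate nodes; the sketch keeps everything in one JVM so it can be run as-is.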
The Data Management designer
DM Parallel Section on Hadoop
DM Hadoop Settings
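RedPoint Benchmarks – Project Gutenberg

Sample MapReduce (small subset of the entire code, which totals nearly 150 lines):

    // Imports required at the top of the enclosing source file.
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // WordOffset is a custom key class from the full job; it is not part of this excerpt.
    public static class MapClass extends Mapper<WordOffset, Text, Text, IntWritable> {
        // Characters treated as separators between words (the double quote is escaped).
        private final static String delimiters =
            "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emits (word, 1) for every token in the input line.
        public void map(WordOffset key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line, delimiters);
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }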
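For readers without a cluster at hand, the logic of the excerpt above can be sketched in plain Java. This is an illustrative stand-in, not the benchmark code: the class `WordCountSketch` and method `countWords` are hypothetical, the delimiter string is an ASCII-only subset of the slide's (an assumption made to avoid source-encoding issues), and a single in-memory map plays the role of both the mapper's `context.write(word, one)` and the summing reducer that the slide does not show.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // ASCII-only subset of the slide's delimiter set (assumption for portability).
    static final String DELIMITERS = "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |";

    // Tokenizes one line the way the mapper does, then sums a count per word,
    // which is the job a reducer would do after the shuffle.
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        StringTokenizer itr = new StringTokenizer(line, DELIMITERS);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {sea=1, ship=1, the=3, whale=1}
        System.out.println(countWords("the whale, the sea; the ship!"));
    }
}
```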
Sample Pig script without the UDF:

    SET pig.maxCombinedSplitSize 67108864
    SET pig.splitCombination true
    A = LOAD '/testdata/pg/*/*/*';
    B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
    C = FOREACH B GENERATE UPPER(word) AS word;
    D = GROUP C BY word;
    E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
    F = ORDER E BY occurrences DESC;
    STORE F INTO '/user/cleonardi/pg/pig-count';
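The stages of the Pig script map fairly directly onto a stream pipeline, which may help readers who know Java but not Pig. The sketch below is an in-memory analogy only: `PigPipelineSketch` and `wordCounts` are hypothetical names, tokens are split on whitespace alone (Pig's `TOKENIZE` also splits on some punctuation), and ties in the final ordering are broken alphabetically, which the Pig script does not specify.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PigPipelineSketch {
    static Map<String, Long> wordCounts(List<String> lines) {
        Map<String, Long> grouped = lines.stream()
                // B: TOKENIZE + FLATTEN (whitespace only, a simplification)
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(w -> !w.isEmpty())
                // C: UPPER(word)
                .map(String::toUpperCase)
                // D and E: GROUP BY word, then COUNT per group
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // F: ORDER BY occurrences DESC (alphabetical tie-break is an added assumption)
        return grouped.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // prints {THE=3, CAT=1, DOG=1}
        System.out.println(wordCounts(Arrays.asList("the cat", "The dog the")));
    }
}
```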
              MapReduce                        Pig                                                         RedPoint
Code          >150 lines of MR code            ~50 lines of script code                                    0 lines of code
Development   6 hours                          3 hours                                                     15 minutes
Runtime       3 hours                          15 minutes                                                  3 minutes
Tuning        Extensive optimization needed    User Defined Functions required prior to running script    No tuning or optimization required
Who Should Care
• Companies interested in exploring the promise of Big Data Analytics that need an easy way to get started.
• Companies already investing heavily in Big Data Analytics technologies but stuck due to the shortage of skilled resources
• Large organizations that are focused on “Operational Offloading” and need to achieve it cost-effectively
• Companies that recognize that much of the data that lands in Hadoop is external to the organization and need to have Data Quality and proper data governance applied to their Hadoop data.
Why RedPoint
• Directly overcomes the Hadoop skills gap
• Reduced TCO because existing resources can be leveraged
• Increased productivity and consistency of solutions
• Only pure YARN Data Quality application on the market
• Delivers enterprise grade data quality and governance into the Hadoop cluster
Perceptions & Questions 
Analyst: 
Robin Bloor
Where Is That 
Elephant Going? 
Robin Bloor, Ph.D.
The Key-Value Store is Back!
• General purpose key-value stores used to be called ISAM files
• They were available on Mainframes (VSAM) and DEC VAX (RMS) and other minicomputers
• But not on Unix or Windows or Linux
• Well now they’re back, and they’re scalable
WHAT DID WE LIKE ABOUT THEM?
The Open Source Landscape
• Hadoop + components
  • The data reservoir
  • The archive store
  • The analytics sandbox
• Machine Learning Algorithms
  • Raw power
• The R Language
  • Over 1 million users
These are COMPONENTS of a solution
A Process Not an Activity
• Data Analytics is a multi-disciplinary end-to-end process
• Until recently it was a walled-garden, but the walls were torn down by…
  • Data availability
  • Scalable technology
  • Open source tools
• Hadoop has a role here
The Evolution of Hadoop
• There were many components before YARN and Tez
• But YARN and Tez have changed the picture
• MapReduce is now an option
• Most likely Hadoop will become the default scale out file system and the OS for data flow
The Hadoop Ecosystem
• Even though it may not seem so, Hadoop is in its infancy
• Hadoop’s popularity guarantees its future
• Its future is also guaranteed by its commercial ecosystem
• That’s the Open Source Way
u Do you see Hadoop as a replacement for the data 
warehouse? 
u Which specific components of the Hadoop 
ecosystem do you always (or nearly always) 
employ? 
u Which other technologies/products do you 
integrate with? 
u How does a RedPoint engagement normally pan 
out?
u What do you see as the natural business 
applications for Hadoop (and its ecosystem)? 
u Do you think there any natural industry specific 
(i.e., vertical) applications? 
u Which companies/technologies do you see as 
competitive with RedPoint
THANK YOU for your ATTENTION!

Game Changed – How Hadoop is Reinventing Enterprise Thinking

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. Game Changed: How Hadoop is Reinventing Enterprise Thinking The Briefing Room
  • 3. Twitter Tag: #briefr The Briefing Room Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com @eric_kavanagh
  • 4. ! Reveal the essential characteristics of enterprise software, good and bad ! Provide a forum for detailed analysis of today’s innovative technologies ! Give vendors a chance to explain their product to savvy analysts ! Allow audience members to pose serious questions... and get answers! Twitter Tag: #briefr The Briefing Room Mission
  • 5. The Briefing Room: Topics. This Month: BIG DATA; May: DATABASE; June: ANALYTICS & MACHINE LEARNING. 2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
  • 6. The Briefing Room: Big Data
  • 7. The Briefing Room: Analyst: Robin Bloor. Robin Bloor is Chief Analyst at The Bloor Group. robin.bloor@bloorgroup.com @robinbloor
  • 8. The Briefing Room: RedPoint Global
      • RedPoint Global is a data management and integrated marketing technology company.
      • Its Convergent Marketing Platform™ offers products designed for data management, collaboration and architecture integration.
      • RedPoint Data Management for Hadoop is YARN-compliant and enables analysts to access and manipulate data directly within the Hadoop cluster.
  • 9. The Briefing Room: Guest: George Corugedo. George Corugedo is Chief Technology Officer & Co-Founder at RedPoint Global Inc. A mathematician and seasoned technology executive, George has over 20 years of business and technical expertise. As co-founder and CTO of RedPoint Global, George is responsible for leading the development of the RedPoint Convergent Marketing Platform™. A former math professor, George left academia to co-found Accenture’s Customer Insight Practice, which specialized in strategic data utilization, analytics and customer strategy. Previous positions include director of client delivery at ClarityBlue, Inc., a provider of hosted customer intelligence solutions to enterprise commercial entities, and COO/CIO of Riscuity, a receivables management company specializing in the utilization of analytics to drive collections.
  • 10. RedPoint Overview for Bloor Group
  • 11. Overview – What is Hadoop/Hadoop 2.0
      Lower cost scaling; no need for structure; ease of data capture
      Hadoop 1.0
      • All operations based on MapReduce
      • Intrinsic inconsistency of code-based solutions
      • Highly skilled and expensive resources needed
      • 3rd-party applications constrained by the need to generate code
      Hadoop 2.0
      • Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
      • Mature applications can now operate directly on Hadoop
      • Reduced skill requirements and increased consistency
      © RedPoint Global Inc. Confidential, 8 April 2014
  • 12. Overview – Challenges to Adoption
      Skills Gap
      • Severe shortage of MapReduce-skilled resources
      • Very expensive resources that are hard to retain
      • Inconsistent skills lead to inconsistent results
      • Underutilizes existing resources
      • Prevents broad leverage of investments across the enterprise
      Maturity & Governance
      • A nascent technology ecosystem around Hadoop
      • Emerging technologies only address narrow slivers of functionality
      • New applications are not enterprise class
      • Legacy applications have built short-term capabilities
      Data Into Information
      • Data is not useful in its raw state; it must be turned into information
      • A benefit of Hadoop is that the same data can be used from many perspectives
      • Analysts must now structure the data based on its intended use
  • 13. How RedPoint Achieves This
      First YARN-compliant ETL/data quality toolset on the market: brings together both Big Data and traditional data to create Big Information!
      RANKED #1 by […] in:
      • Customer or Party Data
      • Processing Speed
      • Match Quality
      • Ease of Use
      The power to make your data the biggest asset your organization has
  • 14. Key features of RedPoint Data Management
      ETL & ELT
      • Profiling, reads/writes, transformations
      • Single project for all jobs
      Data Quality
      • Cleanse data
      • Parsing, correction
      • Geo-spatial analysis
      Integration & Matching
      • Grouping
      • Fuzzy match
      Master Key Management
      • Create keys
      • Track changes
      • Maintain matches over time
      Web Services Integration
      • Consume and publish
      • HTTP/HTTPS protocols
      • XML/JSON/SOAP formats
      Process Automation & Operations
      • Job scheduling, monitoring, notifications
      • Central point of control
      All functions can be used on both TRADITIONAL and BIG DATA
      Creates clean, integrated, actionable data – quickly, reliably and at low cost
  • 15. Spotlight on RedPoint Data Management for Hadoop: the only pure YARN data management platform
      For data management in Hadoop:
      WITH REDPOINT
      • Easy-to-use interface
      • Leverages existing skills
      • Executes in Hadoop 2.0 (using YARN architecture)
      • Fast – no MapReduce
      • Can combine Big Data with traditional data
      • Data becomes actionable by RedPoint Interaction
      PREVIOUS OPTIONS
      • Use MapReduce: complex; requires new skills; inefficient execution
      • Move data out of Hadoop: extra time and effort; extra storage (expensive); defeats the purpose of Hadoop
      Makes Hadoop data management easy, fast, low-cost. Makes Big Data clean, integrated, usable. You get more out of your Big Data investment.
  • 16. Data Management on Hadoop (diagram labels: Partitioning AM / Tasks; Execution AM / Tasks; Data I/O; Key / Split Analysis; Parallel Section; Partition; Data server; YARN; HDFS/MapReduce)
  • 17. RedPoint DM for Hadoop: Processing Flow (diagram labels: Data Management Designer; DM Execution Server; Parallel Section; Resource Manager; Launches DM App Master; DM App Master; Launches Tasks; Node Manager; DM Task; Running DM Task; steps 1, 2, 3)
  • 18. The Data Management Designer
  • 19. DM Parallel Section on Hadoop
  • 20. DM Hadoop Settings
  • 21. RedPoint Benchmarks – Project Gutenberg
      Sample MapReduce (small subset of the entire code, which totals nearly 150 lines):

          import java.io.IOException;
          import java.util.StringTokenizer;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Mapper;

          // WordOffset is a custom key class defined elsewhere in the full code.
          public static class MapClass extends Mapper<WordOffset, Text, Text, IntWritable> {
              // Characters treated as token separators (the inner double quote is escaped).
              private final static String delimiters = "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";
              private final static IntWritable one = new IntWritable(1);
              private Text word = new Text();

              // Emits (token, 1) for every token in the input line.
              public void map(WordOffset key, Text value, Context context)
                      throws IOException, InterruptedException {
                  String line = value.toString();
                  StringTokenizer itr = new StringTokenizer(line, delimiters);
                  while (itr.hasMoreTokens()) {
                      word.set(itr.nextToken());
                      context.write(word, one);
                  }
              }
          }

      Sample Pig script without the UDF:

          SET pig.maxCombinedSplitSize 67108864
          SET pig.splitCombination true
          A = LOAD '/testdata/pg/*/*/*';
          B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
          C = FOREACH B GENERATE UPPER(word) AS word;
          D = GROUP C BY word;
          E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
          F = ORDER E BY occurrences DESC;
          STORE F INTO '/user/cleonardi/pg/pig-count';

      Comparison:

                              MapReduce                       Pig                                                           RedPoint
          Code                >150 lines of MR code           ~50 lines of script code                                      0 lines of code
          Development time    6 hours                         3 hours                                                       15 minutes
          Runtime             3 hours                         15 minutes                                                    3 minutes
          Tuning              Extensive optimization needed   User Defined Functions required prior to running script       No tuning or optimization required
  • 22. Who Should Care
      • Companies interested in exploring the promise of Big Data Analytics that need an easy way to get started.
      • Companies already investing heavily in Big Data Analytics technologies that are stuck due to the shortage of skilled resources.
      • Large organizations that are focused on “Operational Offloading” and need to achieve it cost-effectively.
      • Companies that recognize that much of the data that lands in Hadoop is external to the organization and need Data Quality and proper data governance applied to their Hadoop data.
  • 23. Why RedPoint
      • Directly overcomes the Hadoop skills gap
      • Reduced TCO because existing resources can be leveraged
      • Increased productivity and consistency of solutions
      • Only pure YARN Data Quality application on the market
      • Delivers enterprise-grade data quality and governance into the Hadoop cluster
  • 24. The Briefing Room: Perceptions & Questions. Analyst: Robin Bloor
  • 25. Where Is That Elephant Going? Robin Bloor, Ph.D.
  • 26. The Key-Value Store is Back!
      • General purpose key-value stores used to be called ISAM files
      • They were available on Mainframes (VSAM) and DEC VAX (RMS) and other minicomputers
      • But not on Unix or Windows or Linux
      • Well now they’re back, and they’re scalable
      WHAT DID WE LIKE ABOUT THEM?
  • 27. The Open Source Landscape
      • Hadoop + components: the data reservoir; the archive store; the analytics sandbox
      • Machine Learning Algorithms: raw power
      • The R Language: over 1 million users
      These are COMPONENTS of a solution
  • 28. A Process Not an Activity
      • Data Analytics is a multi-disciplinary end-to-end process
      • Until recently it was a walled garden, but the walls were torn down by data availability, scalable technology and open source tools
      • Hadoop has a role here
  • 29. The Evolution of Hadoop
      • There were many components before YARN and Tez
      • But YARN and Tez have changed the picture
      • MapReduce is now an option
      • Most likely Hadoop will become the default scale-out file system and the OS for data flow
  • 30. The Hadoop Ecosystem
      • Even though it may not seem so, Hadoop is in its infancy
      • Hadoop’s popularity guarantees its future
      • Its future is also guaranteed by its commercial ecosystem
      • That’s the Open Source Way
  • 31. u Do you see Hadoop as a replacement for the data warehouse? u Which specific components of the Hadoop ecosystem do you always (or nearly always) employ? u Which other technologies/products do you integrate with? u How does a RedPoint engagement normally pan out?
  • 32. u What do you see as the natural business applications for Hadoop (and its ecosystem)? u Do you think there any natural industry specific (i.e., vertical) applications? u Which companies/technologies do you see as competitive with RedPoint
  • 33. Twitter Tag: #briefr The Briefing Room
  • 34. The Briefing Room: Upcoming Topics. This Month: BIG DATA; May: DATABASE; June: ANALYTICS & MACHINE LEARNING. www.insideanalysis.com/webcasts/the-briefing-room. 2014 Editorial Calendar at www.insideanalysis.com
  • 35. Twitter Tag: #briefr THANK YOU for your ATTENTION! The Briefing Room
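
For readers who want to see what the word-count benchmark on slide 21 actually computes, here is a minimal single-machine sketch in plain Java. It mirrors the steps of the Pig script (tokenize, upper-case, group and count, order by occurrences descending) but it is not Hadoop code and not RedPoint's implementation. The class name `WordCountSketch`, the helper names, the tie-break by word, and the shortened ASCII delimiter set are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {
    // Assumption: a shortened, ASCII-only version of the delimiter list on slide 21,
    // with whitespace added because this sketch reads raw text rather than split lines.
    private static final String DELIMITERS = " \t\n\r',./<>?;:\"[]{}-=_+()&*%^#$!@`~|";

    /** Tokenizes the text, upper-cases each token, and counts occurrences. */
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text, DELIMITERS);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken().toUpperCase(), 1, Integer::sum);
        }
        return counts;
    }

    /** Orders entries by count, highest first; ties are broken alphabetically (an assumption). */
    public static List<Map.Entry<String, Integer>> ranked(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // Prints one "count<TAB>word" line per distinct word, most frequent first.
        for (Map.Entry<String, Integer> e : ranked(count("the cat and the hat; the end."))) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The logic itself is small; what the slide's comparison of development time and runtime measures is the effort of distributing it across a cluster, which this sketch deliberately leaves out.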