© Hortonworks Inc. 2013
The Rise of Apache Hadoop…
…and its Role in Enterprise Data Architectures
Himanshu Bari
Sr. Product Manager, Hortonworks
Page 1
Topics
•  Market trends & emergence of Hadoop
•  Hadoop’s role and future direction in the Enterprise
•  Enterprise Hadoop use cases
Page 2
Big Data & Big Impact
Page 3
•  Big Data: 15x growth rate of machine-generated data by 2020 (Source: IDC)
•  Big Impact: companies leveraging data will outperform their peers by 20%; 1.5M data-savvy managers needed (Source: McKinsey)
2013: CIOs take note…
Page 4
2013 STATE OF THE CIO SURVEY (Jan 2013)
When it comes to adoption of Big Data:
•  34% of IT executives surveyed classify their organization as late majority
•  25% of IT executives surveyed classify their organization as laggards
What is Apache Hadoop?
Page 5
Apache Hadoop = Open Source Data Management Software
•  Petabyte-scale reliable storage management (HDFS) on commodity disks
•  Highly distributed computation framework (MapReduce)
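To make the MapReduce computation model concrete, here is a minimal single-process sketch in Python. It only imitates the map, shuffle, and reduce phases in memory; it is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big impact", "big cluster"]))
```

On a real cluster the map and reduce functions run in parallel across many nodes and the framework performs the shuffle between them; the sketch keeps everything in one process so the three phases are easy to follow.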
Quick History: Hadoop at Yahoo!
Source: http://developer.yahoo.com/blogs/ydn/posts/2013/02/hadoop-at-yahoo-more-than-ever-before/
Page 6
Topics
•  Market Trends & Emergence of Hadoop
•  Hadoop’s role and future direction in the Enterprise
•  Enterprise Hadoop Use Cases
Page 7
Current Data Architecture
[Diagram: three layers plus supporting tools.
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP)
DATA SOURCES: OLTP, POS Systems; Traditional Sources (RDBMS, OLTP, OLAP)
OPERATIONAL TOOLS: Manage & Monitor
DEV & DATA TOOLS: Build & Test
ETL/ELT]
Page 8
Current Data Architecture Pressured
[Diagram: the same layers as the current architecture, with new data sources added.
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP)
DATA SOURCES: OLTP, POS Systems; Traditional Sources (RDBMS, OLTP, OLAP); New Sources, 85% data growth (sentiment, clickstream, geo, sensor, …)
OPERATIONAL TOOLS: Manage & Monitor
DEV & DATA TOOLS: Build & Test
ETL/ELT]
Page 9
Next generation data architecture
[Diagram: the same layers, with an Enterprise Hadoop Platform added to the data systems.
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP); ENTERPRISE HADOOP PLATFORM
DATA SOURCES: OLTP, POS Systems; Traditional Sources (RDBMS, OLTP, OLAP); New Sources, 85% data growth (sentiment, clickstream, geo, sensor, …)
OPERATIONAL TOOLS: Manage & Monitor
DEV & DATA TOOLS: Build & Test
ETL/ELT]
Page 10
New architecture enables schema on read
Page 11
OLD WAY
•  Define table with schema
•  Load only table-conforming data
•  Change? Fight for eternity
HADOOP WAY
•  Load COMPLETE data in Hadoop
•  Read data as you like
•  Change? Just read differently
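The contrast above can be illustrated with a small Python sketch of schema on read. The raw records, field names, and the `read_with_schema` helper are all hypothetical and chosen only for illustration; the point is that the stored data is never reshaped, and each reader applies its own schema when it reads.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List

# Raw records are stored exactly as they arrived; no schema is enforced on load.
RAW_EVENTS = """2013-06-26,alice,click,/home
2013-06-26,bob,purchase,/cart,19.99
2013-06-27,alice,click,/products
"""


def read_with_schema(raw: str, fields: List[str],
                     casts: Dict[str, Callable[[str], object]]) -> Iterator[dict]:
    """Apply a schema at read time: name the columns and cast the ones we know."""
    for row in csv.reader(io.StringIO(raw)):
        record = {}
        for i, name in enumerate(fields):
            value = row[i] if i < len(row) else None  # tolerate short rows
            cast = casts.get(name)
            record[name] = cast(value) if (cast and value is not None) else value
        yield record


# Reader 1 only cares about who did what.
activity = list(read_with_schema(RAW_EVENTS, ["date", "user", "action"], {}))

# Reader 2 "reads differently": same stored data, extra typed column, no reload.
sales = [r for r in read_with_schema(
    RAW_EVENTS, ["date", "user", "action", "path", "amount"], {"amount": float})
    if r["amount"] is not None]
```

When requirements change, only the reader's field list and casts change; under schema on write, the same change would typically mean altering the table and reloading the data.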
Evolution of Enterprise Hadoop
Page 12
[Diagram: the Enterprise Hadoop Platform and its deployment options.
HADOOP CORE: HDFS, MapReduce, YARN, Tez, other
DATA SERVICES: Stinger, Hive & HCatalog, Pig, HBase, Sqoop, Flume, NFS, WebHDFS
OPERATIONAL SERVICES: Oozie, Ambari, Falcon
PLATFORM SERVICES: Knox; Enterprise Readiness (High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots)
Deployment: OS/VM, Cloud, Appliance, OpenStack]
YARN: General purpose resource management framework
•  Why is it needed?
–  New ways of processing data, such as graph and stream processing, have different resource management needs than MapReduce
–  Need to improve scalability & utilization of the clusters
–  Support multiple versions of MapReduce
•  How does it work?
–  Splits JobTracker responsibilities into a global resource manager and a per-application ApplicationMaster
–  Provides an extensible framework
[Diagram: HDFS (redundant, reliable storage) at the base; YARN (cluster resource management) above it; MapReduce, Tez, stream processing, and other engines on top.]
Page 13
HADOOP CORE
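The split of responsibilities described on this slide can be pictured with a toy Python model. This is only a sketch under simplifying assumptions, not YARN's API: the class and method names (`allocate`, `release`, `run`) are hypothetical, containers are reduced to a counter, and the applications run one after another rather than concurrently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceManager:
    """Global arbiter: knows only how many containers the cluster has free."""
    free_containers: int

    def allocate(self, requested: int) -> int:
        """Grant as many containers as are available, up to the request."""
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        return granted

    def release(self, count: int) -> None:
        """Return containers to the shared pool."""
        self.free_containers += count


@dataclass
class ApplicationMaster:
    """Per-application coordinator: owns the task list for one job only."""
    name: str
    pending_tasks: List[str]
    finished: List[str] = field(default_factory=list)

    def run(self, rm: ResourceManager) -> None:
        """Negotiate containers from the RM and run tasks until none remain."""
        while self.pending_tasks:
            granted = rm.allocate(len(self.pending_tasks))
            if granted == 0:
                raise RuntimeError("no capacity; a real AM would wait and retry")
            batch = self.pending_tasks[:granted]
            self.pending_tasks = self.pending_tasks[granted:]
            self.finished.extend(batch)   # stand-in for executing the tasks
            rm.release(granted)           # hand containers back when done


rm = ResourceManager(free_containers=2)
mapreduce_app = ApplicationMaster("mapreduce-job", ["m1", "m2", "m3", "r1"])
stream_app = ApplicationMaster("stream-job", ["s1"])
for app in (mapreduce_app, stream_app):
    app.run(rm)
```

The resource manager never looks at task lists, and each ApplicationMaster never looks at another application's work; that separation is what lets different processing engines share one cluster.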
Apache Tez (“Speed”): Alternative to MapReduce
• Why is it needed?
– Widens the platform for Hadoop use cases beyond batch
– Crucial to improving the performance of low-latency applications
• Core idea
– Create a pool of pre-allocated containers
– Reuse containers for multiple tasks
[Diagram: a Tez task composed of a pluggable input, a pluggable processor, and a pluggable output.]
Page 14
HADOOP CORE
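The core idea of container reuse can be sketched in a few lines of Python. This is an illustrative model only, not the Tez API: `Container`, `ContainerPool`, and the round-robin assignment are assumptions made for the example.

```python
from typing import Callable, List


class Container:
    """Stand-in for a pre-allocated worker; counts how many tasks it has run."""

    def __init__(self, container_id: int) -> None:
        self.container_id = container_id
        self.tasks_run = 0

    def execute(self, task: Callable[[], int]) -> int:
        self.tasks_run += 1
        return task()


class ContainerPool:
    """Allocate containers once up front, then hand them out round-robin."""

    def __init__(self, size: int) -> None:
        self.containers: List[Container] = [Container(i) for i in range(size)]
        self._next = 0

    def submit(self, task: Callable[[], int]) -> int:
        container = self.containers[self._next]
        self._next = (self._next + 1) % len(self.containers)
        return container.execute(task)


pool = ContainerPool(size=2)  # start-up cost is paid for two containers only
results = [pool.submit(lambda n=n: n * n) for n in range(6)]  # six tasks
```

Six tasks run on two containers, so the allocation cost is paid twice instead of six times; avoiding that repeated start-up cost is the saving the slide points to for low-latency workloads.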
Stinger: Improve Hive performance and SQL compliance
Page 15
Stinger Project: Simple Focus
•  Improves existing tools & preserves investments
•  Enable Hive to support interactive workloads
•  Query planner + Hive execution engine (Tez) + new file format (ORC file) = 100X+
•  Data types + windowing & subqueries = SQL compliance
DATA SERVICES
Falcon: One-stop Shop for Data Lifecycle Management (DLM)
•  Data management needs: data processing, replication, retention, scheduling, reprocessing, multi-cluster management
•  Tools: Oozie, Sqoop, Distcp, Flume, Map/Reduce, Hive and Pig jobs
[Diagram: Apache Falcon provides the data management needs and orchestrates the tools.]
Falcon provides a single interface to orchestrate data lifecycle.
Sophisticated DLM easily added to Hadoop applications.
OPERATIONAL SERVICES
Knox: Make Hadoop Security Simple
•  Simplify Security: Simplify security for both users and operators.
•  Aggregate Access: Deliver unified and centralized access to the Hadoop cluster to give a ‘single application’ feel
•  Client Agility: Ensure service users are abstracted from where services are located and how services are configured & scaled
[Diagram: a client reaches the Hadoop cluster over REST through a Knox gateway cluster, with authentication & verification against a user store (KDC, AD, LDAP).]
PLATFORM SERVICES
Project Savanna to enable Hadoop on OpenStack
•  OpenStack provides operational agility and deployment choice
•  Hadoop is a net new workload and a perfect app for OpenStack
•  Integration marries two of the largest open source movements
–  Community-driven innovation outpaces any single vendor
–  Both are attracting major ecosystem players: IBM, RHT, HP, RAX, etc.
•  Project Savanna: automate deployment of Apache Hadoop on OpenStack
Page 18
CLOUD PLATFORM ENABLEMENT
Topics
•  Market Trends & Emergence of Hadoop
•  Hadoop’s role and future direction in the Enterprise
•  Enterprise Hadoop Use Cases
Page 19
Fundamental business drivers the same…
• Better
– Automation
– Transparency
– Segmentation
– Innovation & experimentation
• Faster
– Everything
• Cheaper
– Across the value chain
Page 20
6 Common TYPES OF DATA
1.  Sentiment
Understand how your customers feel about your brand and
products – right now
2.  Clickstream
Capture and analyze website visitors’ data trails and
optimize your website
3.  Sensor/Machine
Discover patterns in data streaming automatically from
remote sensors and machines
4.  Geographic
Analyze location-based data to manage operations where
they occur
5.  Server Logs
Research logs to diagnose process failures and prevent
security breaches
6.  Text
Understand patterns in text across millions of web pages,
emails, and documents
Value
Page 21
Financial services
• Industry specific drivers for Hadoop
– Increasing compliance and regulatory pressure
– Bad guys never stop
– Never-ending macroeconomic volatility
– Cost pressures – more than ever
– Extreme competition
• Common use cases
– Fraud & risk reduction (e.g. during new account creation)
– Sentiment-based trading strategies
– Improve insurance underwriting based on usage and longer
history
– Data reservoir (for archival, compliance inquiries, etc.)
Page 22
Retail
• Industry specific drivers for Hadoop
– Increasingly ‘value sensitive’ and SMART consumer
– Constant margin pressure
– Emergence of the multi-channel approach
• Common use cases
– 360-degree customer view (behavior, location, sentiment, etc.)
– Micro and dynamic segmentation
– Optimizations – Price, assortment, layout, supply chain
– Seasonal predictions – product styles, labor needs etc.
Page 23
Telcos
• Industry specific drivers for Hadoop
– Infrastructure under stress with rise of smart devices
– 4G/LTE investment needs CapEx but revenue growth largely flat
– Cloud computing changing the game
– Increasing competition with non-Telcos
– Sitting on a gold mine of data
• Common use cases
– Understanding customer behavior AND context (e.g. location-based)
in real time
– Packaging and selling data
– Call Detail Record (CDR) & extended data record (XDR) analysis
for service quality improvement & capacity planning/optimization
– Customer churn analysis & prevention
Page 24
Healthcare & Pharma
• Industry specific drivers for Hadoop
– Sudden data deluge
– Health data initiative (HDI) by US Govt.
– Digitization of health records and rise of sensor data
– Huge accumulation of R&D data
– Rising healthcare costs
– Payors shifting to outcome-based payment models with providers as
well as pharmaceutical companies
• Common use cases
– Improved patient outcome tracking
– Optimized patient recruitment for drug trials
– Reduce drug modeling time
– Improve insurance claim validation accuracy
Page 25
Manufacturing
• Industry specific drivers for Hadoop
– Proliferation of sensors
– Globalization of the supply chain
– Ongoing miniaturization of products
• Common use cases
– Failure analysis to perform proactive maintenance
– Improving equipment quality by more frequent sample testing and
rigorous prototype testing
– Supply chain optimization
Page 26
Hadoop Summit
•  June 26-27, 2013, San Jose Convention Center
•  Co-hosted by Hortonworks & Yahoo!
•  Theme: Enabling the Next Generation
Enterprise Data Platform
•  90+ Sessions and 7 Tracks
•  Community Focused Event
–  Sessions selected by a Conference Committee
–  Community Choice allowed public to vote for
sessions they want to see
•  Training classes offered pre-event
–  Apache Hadoop Essentials: A Technical
Understanding for Business Users
–  Understanding Microsoft HDInsight and Apache
Hadoop
–  Developing Solutions with Apache Hadoop –
HDFS and MapReduce
–  Applying Data Science using Apache Hadoop
Page 27
hadoopsummit.org
Thank You
Follow us: @hortonworks
Page 28
http://hortonworks.com/products/hortonworks-sandbox/
© Hortonworks Inc. 2012
Similar solution architecture across use cases
[Diagram: a numbered flow (steps 1 to 5) around a Hadoop cluster.
SOURCE DATA
LOAD: Sqoop, Flume, WebHDFS, NFS
USE: DB, EDW, MPP
Processing modes: BATCH, STREAMING, INTERACTIVE, ONLINE (labels shown: Storm, MapReduce, Pig, Hive/SQL, HBase)
Inside Hadoop: Ambari; HCatalog (table metadata); Pig (data processing); Hive (data processing); YARN; compute & storage nodes]

 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 

Último (20)

MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 

Apache Hadoop and its role in Big Data architecture - Himanshu Bari

  • 1. © Hortonworks Inc. 2013 The Rise of Apache Hadoop… …and its Role in Enterprise Data Architectures Himanshu Bari Sr. Product Manager, Hortonworks Page 1
  • 2. Topics: Market trends & emergence of Hadoop; Hadoop’s role and future direction in the Enterprise; Enterprise Hadoop use cases
  • 3. Big Data & Big Impact. Big Data: 15x growth rate of machine-generated data by 2020 (Source: IDC). Big Impact: 20%, the percentage by which companies leveraging data will outperform their peers; 1.5M data-savvy managers needed (Source: McKinsey).
  • 4. 2013: CIOs take note… 2013 STATE OF THE CIO SURVEY (Jan 2013). When it comes to adoption of Big Data: 34% of IT executives surveyed classify their organization as late majority; 25% of IT executives surveyed classify their organization as laggards.
  • 5. What is Apache Hadoop? Petabyte-scale reliable storage management (HDFS) on commodity disks. Highly distributed computation framework (MapReduce). Apache Hadoop = Open Source Data Management Software
  • 6. Quick History: Hadoop at Yahoo! Source: http://developer.yahoo.com/blogs/ydn/posts/2013/02/hadoop-at-yahoo-more-than-ever-before/
  • 7. Topics: Market Trends & Emergence of Hadoop; Hadoop’s role and future direction in the Enterprise; Enterprise Hadoop Use Cases
  • 8. Current Data Architecture: APPLICATIONS; DATA SYSTEMS; TRADITIONAL REPOS (RDBMS, EDW, MPP); DATA SOURCES; OLTP, POS SYSTEMS; OPERATIONAL TOOLS (MANAGE & MONITOR); Traditional Sources (RDBMS, OLTP, OLAP); DEV & DATA TOOLS (BUILD & TEST); Business Analytics; Custom Applications; Packaged Applications; ETL/ELT
  • 9. Current Data Architecture Pressured: APPLICATIONS; DATA SYSTEMS; TRADITIONAL REPOS (RDBMS, EDW, MPP); DATA SOURCES; OLTP, POS SYSTEMS; OPERATIONAL TOOLS (MANAGE & MONITOR); Traditional Sources (RDBMS, OLTP, OLAP); New Sources, 85% data growth (sentiment, clickstream, geo, sensor, …); DEV & DATA TOOLS (BUILD & TEST); Business Analytics; Custom Applications; Packaged Applications; ETL/ELT
  • 10. Next generation data architecture: APPLICATIONS; DATA SYSTEMS; TRADITIONAL REPOS (RDBMS, EDW, MPP); DATA SOURCES; OLTP, POS SYSTEMS; OPERATIONAL TOOLS (MANAGE & MONITOR); Traditional Sources (RDBMS, OLTP, OLAP); New Sources, 85% data growth (sentiment, clickstream, geo, sensor, …); DEV & DATA TOOLS (BUILD & TEST); Business Analytics; Custom Applications; Packaged Applications; ENTERPRISE HADOOP PLATFORM; ETL/ELT
  • 11. New architecture enables schema on read. OLD WAY: define table with schema; load only table-conforming data; change? Fight for eternity. HADOOP WAY: load COMPLETE data in Hadoop; read data as you like; change? Just read differently.
  • 12. Evolution of Enterprise Hadoop. ENTERPRISE HADOOP PLATFORM on OS/VM, Cloud, Appliance. HADOOP CORE; PLATFORM SERVICES; DATA SERVICES; Stinger; HIVE & HCATALOG; PIG; HBASE; SQOOP; FLUME; NFS; WebHDFS; HDFS; MAP REDUCE; YARN; TEZ; OTHER; OPERATIONAL SERVICES; OOZIE; AMBARI; FALCON. Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots. KNOX; OpenStack
  • 13. YARN: General-purpose resource management framework. Why is it needed? New ways of data processing, such as graph and stream processing, have different resource management needs than MapReduce; need to improve scalability & utilization of the clusters; support multiple versions of MapReduce. How does it work? Splits JobTracker responsibilities into a global resource manager and a per-application ApplicationMaster; provides an extensible framework. HADOOP CORE: HDFS (Redundant, Reliable Storage); YARN: Cluster Resource Management; MapReduce; Tez; Stream Processing; Other…
  • 14. Apache Tez (“Speed”): Alternative to MapReduce. Why is it needed? Widens the platform for Hadoop use cases beyond batch; crucial to improving the performance of low-latency applications. Core idea: create a pool of pre-allocated containers; reuse containers for multiple tasks. Task: Pluggable Input, Pluggable Processor, Pluggable Output. HADOOP CORE
  • 15. Stinger: Improve Hive performance and SQL compliance. Improves existing tools & preserves investments. Enable Hive to support interactive workloads. Stinger Project, Simple Focus: Query Planner + Hive Execution Engine (Tez) + New File Format (ORC file) = 100X+; Data Types + Windowing & Subqueries = SQL Compliance. DATA SERVICES
  • 16. Falcon: One-stop Shop for Data Lifecycle Management (DLM). Data Management Needs: Data Processing, Replication, Retention, Scheduling, Reprocessing, Multi Cluster Management. Tools: Oozie, Sqoop, Distcp, Flume, Map / Reduce, Hive and Pig Jobs. Apache Falcon: Provides, Orchestrates. Falcon provides a single interface to orchestrate data lifecycle. Sophisticated DLM easily added to Hadoop applications. OPERATIONAL SERVICES
  • 17. Knox: Make Hadoop Security Simple. Simplify Security: simplify security for both users and operators. Aggregate Access: deliver unified and centralized access to the Hadoop cluster to give a ‘single application’ feel. Client Agility: ensure service users are abstracted from where services are located and how services are configured & scaled. PLATFORM SERVICES: Client; {REST}; Knox gateway cluster; Authentication & Verification; User Store (KDC, AD, LDAP); Hadoop Cluster
  • 18. Project Savanna to enable Hadoop on OpenStack. OpenStack provides operational agility and deployment choice. Hadoop is a net new workload and a perfect app for OpenStack. Integration marries two of the largest open source movements: community-driven innovation outpaces any single vendor; both are attracting major ecosystem players (IBM, RHT, HP, RAX, etc.). CLOUD PLATFORM ENABLEMENT. Project Savanna: automate deployment of Apache Hadoop on OpenStack.
  • 19. Topics: Market Trends & Emergence of Hadoop; Hadoop’s role and future direction in the Enterprise; Enterprise Hadoop Use Cases
  • 20. Fundamental business drivers the same… Better: automation, transparency, segmentation, innovation & experimentation. Faster: everything. Cheaper: across the value chain.
  • 21. 6 Common TYPES OF DATA. 1. Sentiment: understand how your customers feel about your brand and products, right now. 2. Clickstream: capture and analyze website visitors’ data trails and optimize your website. 3. Sensor/Machine: discover patterns in data streaming automatically from remote sensors and machines. 4. Geographic: analyze location-based data to manage operations where they occur. 5. Server Logs: research logs to diagnose process failures and prevent security breaches. 6. Text: understand patterns in text across millions of web pages, emails, and documents.
  • 22. Financial services. Industry-specific drivers for Hadoop: increasing compliance and regulatory pressure; bad guys never stop; never-ending macroeconomic volatility; cost pressures, more than ever; extreme competition. Common use cases: fraud & risk reduction (e.g. during new account creation); sentiment-based trading strategies; improve insurance underwriting based on usage and longer history; data reservoir (for archival, compliance inquiries, etc.).
  • 23. Retail. Industry-specific drivers for Hadoop: increasingly ‘value sensitive’ and SMART consumer; constant margin pressure; emergence of the multi-channel approach. Common use cases: 360-degree customer view (behavior, location, sentiment, etc.); micro and dynamic segmentation; optimizations (price, assortment, layout, supply chain); seasonal predictions (product styles, labor needs, etc.).
  • 24. Telcos. Industry-specific drivers for Hadoop: infrastructure under stress with rise of smart devices; 4G/LTE investment needs CapEx but revenue growth largely flat; cloud computing changing the game; increasing competition with non-telcos; sitting on a gold mine of data. Common use cases: understanding customer behavior AND context (e.g. location-based) in real time; packaging and selling data; Call Detail Record (CDR) & extended data record (XDR) analysis for service quality improvement & capacity planning/optimization; customer churn analysis & prevention.
  • 25. Healthcare & Pharma. Industry-specific drivers for Hadoop: sudden data deluge (Health Data Initiative (HDI) by US Govt.; digitization of health records and rise of sensor data; huge accumulation of R&D data); rising healthcare costs; payors shift to outcome-based payment models with providers as well as pharmaceutical companies. Common use cases: improved patient outcome tracking; optimized patient recruitment for drug trials; reduce drug modeling time; improve insurance claim validation accuracy.
  • 26. Manufacturing. Industry-specific drivers for Hadoop: proliferation of sensors; globalization of the supply chain; ongoing miniaturization of products. Common use cases: failure analysis to perform proactive maintenance; improving equipment quality by more frequent sample testing and rigorous prototype testing; supply chain optimization.
  • 27. Hadoop Summit: June 26-27, 2013, San Jose Convention Center. Co-hosted by Hortonworks & Yahoo! Theme: Enabling the Next Generation Enterprise Data Platform. 90+ sessions and 7 tracks. Community-focused event: sessions selected by a Conference Committee; Community Choice allowed the public to vote for sessions they want to see. Training classes offered pre-event: Apache Hadoop Essentials: A Technical Understanding for Business Users; Understanding Microsoft HDInsight and Apache Hadoop; Developing Solutions with Apache Hadoop – HDFS and MapReduce; Applying Data Science using Apache Hadoop. hadoopsummit.org
  • 28. Thank You. Follow us: @hortonworks. http://hortonworks.com/products/hortonworks-sandbox/
  • 29. © Hortonworks Inc. 2012 Similar solution architecture across use cases. SOURCE DATA. LOAD: SQOOP, FLUME, WebHDFS, NFS. Hadoop: BATCH (Map Reduce, PIG); STREAMING (STORM); INTERACTIVE (HIVE/SQL); ONLINE (HBASE); AMBARI; HCATALOG (table metadata); PIG (data processing); HIVE (data processing); YARN; compute & storage nodes. USE: DB, EDW, MPP.
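
The schema-on-read contrast from slide 11 (load the complete data, then "just read differently" when requirements change) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how Hive or HCatalog implement it: the sample records, the column names, and the helper `read_with_schema` are all hypothetical and do not come from the deck.

```python
import csv
import io

# Hypothetical raw clickstream records, stored exactly as they arrived.
# No schema is declared at load time (assumption for illustration).
RAW = "2013-06-26,alice,/home,120\n2013-06-26,bob,/cart,45\n"


def read_with_schema(raw, fields, casts=None):
    """Apply a schema at read time: name the columns, optionally cast values."""
    casts = casts or {}
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        row = dict(zip(fields, record))
        for name, cast in casts.items():
            row[name] = cast(row[name])
        rows.append(row)
    return rows


# First reading: interpret the last column as a duration in milliseconds.
visits = read_with_schema(RAW, ["date", "user", "page", "ms"], {"ms": int})

# A later requirement: only (user, page) pairs matter. The stored data is
# untouched; only the interpretation applied while reading changes.
pages = [
    (row["user"], row["page"])
    for row in read_with_schema(RAW, ["date", "user", "page", "ms"])
]
```

The point of the design is that `RAW` is never rewritten: a change in requirements means a new set of field names or casts passed at read time, whereas a schema-on-write store would need the table definition and the load step revisited.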