Page 1 © Hortonworks Inc. 2014, June 2014
We do Hadoop.
Data Lake for the Cloud
…Extending your Hadoop Implementation
Page 2 © Hortonworks Inc. 2014
Your speakers…
John O’Brien
Principal Analyst and CEO
Radiant Advisors
Bob Page
VP Partner Product Management
Hortonworks
Matt Winkler
Principal Program Manager
Microsoft
Page 3 © Hortonworks Inc. 2014
Poll: Where are you on your Hadoop Journey?
•  Researching our options
•  Currently Evaluating
•  Deep in a trial
•  What’s Hadoop?
Page 4 © Hortonworks Inc. 2014
Trends and drivers…
John O’Brien
Principal Analyst and CEO
Radiant Advisors
Page 5 © Hortonworks Inc. 2014
Leading Business Drivers and Trends
1.  Scale down operational infrastructure management costs
•  General evaluation of moving all on-premises systems to private, public, or hybrid cloud
•  Hadoop does not fit IT efficiency programs built on economies of scale and standardization
2.  Centralize Hadoop data management
•  Resolve costly data movement, duplication and latency between data centers
•  Cloud Data Lake Strategy for shared access across geographic regions
3.  Move the data store closer to data sources and users
•  Performance and costs (Internet/VPN, LAN Ethernet, InfiniBand)
•  Data sources are increasingly external to the company
4.  Ecosystem of strategic IT relationships
“Our sister organization just signed a great deal with Microsoft Azure, and we want to leverage shared services.”
Page 6 © Hortonworks Inc. 2014
Technical Drivers for Hadoop in the Cloud
1.  Elasticity – setting nominal resources and handling load volatility
2.  Flexibility – managing base workloads and handling others
3.  Scalability – whether on-premises infrastructure can handle scalability requirements
4.  Security – requirements span everything from Hadoop apps to networking
5.  Proximity – distance data travels impacts cost and performance
6.  Functionality – not all distributions are equal (Hive, HBase versions)
7.  Usability – existing internal skill sets with the OS and scripting
8.  Manageability – monitoring cloud and hybrid easily
Reference: Microsoft Big Data Solutions. Wiley 2014. Adam Jorgensen, James
Rowland-Jones, John Welch, Dan Clark, Christopher Price, Brian Mitchell.
Page 7 © Hortonworks Inc. 2014
Hadoop Operating Models and Maturity
1.  On-Premises Hadoop Clusters
•  Predefined balanced configurations with internal connectivity
•  May leverage private cloud architecture for elasticity
2.  Cloud-based Hadoop Clusters and Storage
•  Always-on Infrastructure-as-a-Service (IaaS) pricing model and workloads
•  On-demand Platform-as-a-Service (PaaS) pricing model and workloads
3.  Hybrid Hadoop Architectures
•  Affordable storage and access to second-class data
•  Separation of production Analytic Applications from temporary activities
•  Enabling on-premises clusters to efficiently meet the demands of volatility
Page 8 © Hortonworks Inc. 2014
Hybrid Cloud Architecture Driver #1
Driver: Lower cost through optimized data platform
•  Lower cost storage for lower value data needs (lower SLA)
•  Regulatory requirements for historical data
Online Transparent Archive:
•  Data policy driven by time, status and read-only state
•  10/90 or 10/100 data architecture to simplify data management
Online Backup and Business Continuity:
•  Hadoop has good built-in fault tolerance through multiple data copies
•  “Clusters” are single-location oriented and do not provide disaster recovery
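The online transparent archive policy above is driven by time, status, and read-only state. A minimal Python sketch of such a rule follows; it is an illustration, not Hortonworks' implementation. The `Partition` record, the `storage_tier` function, the one-year cutoff, the set of closed statuses, and the paths are all hypothetical, and the sketch assumes a partition moves to the archive tier only when all three conditions hold.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy values: the slide names the inputs (time, status,
# read-only state) but not the thresholds.
ARCHIVE_AFTER = timedelta(days=365)
CLOSED_STATUSES = {"closed", "settled"}


@dataclass
class Partition:
    """Hypothetical metadata for one dataset partition."""
    path: str
    last_modified: date
    status: str
    read_only: bool


def storage_tier(p: Partition, today: date) -> str:
    """Return 'archive' for old, closed, read-only partitions, else 'primary'."""
    old_enough = today - p.last_modified >= ARCHIVE_AFTER
    if old_enough and p.status in CLOSED_STATUSES and p.read_only:
        return "archive"  # candidate for lower-cost, lower-SLA storage
    return "primary"      # stays on the main cluster


if __name__ == "__main__":
    today = date(2014, 6, 1)
    parts = [
        Partition("/data/orders/2012/06", date(2012, 6, 30), "closed", True),
        Partition("/data/orders/2014/05", date(2014, 5, 31), "open", False),
    ]
    for part in parts:
        print(part.path, storage_tier(part, today))
```

Requiring all three conditions keeps the rule conservative: data that is still being written, or whose business status is still open, stays on the primary cluster.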
Page 9 © Hortonworks Inc. 2014
Hybrid Cloud Architecture Driver #2
Driver: Flexibility for on-demand and temporary needs
•  Workload and cluster management (Prioritize jobs)
•  Separate Production from Dev/Test and Discovery (mindset)
Discovery Sandboxes:
•  Loading external data into the cloud for evaluation is easier than loading it into the data center (network load, storage, security)
Proof of Concepts:
•  Verifying new technologies and analytic apps on a smaller subset of data
•  Going beyond exploring new data (not evaluating a Hadoop distribution)
Separate Environments for Analytic Applications:
•  Isolating SLA-driven operational applications from discovery work
Page 10 © Hortonworks Inc. 2014
Hybrid Cloud Architecture Driver #3
Driver: Need for temporary elasticity
•  On-premises clusters are typically configured for nominal workloads
•  Volatility requires on-demand temporary resources
Bursting:
•  Setting and managing ongoing nominal workloads with expected
volatility in data volumes (threshold)
Surging:
•  Maintaining performance levels during surges in event data volume
or user activity (dynamic)
Electric grids maintain the balance of dynamic energy generation with dynamic demand.
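The threshold idea behind bursting (an on-premises cluster sized for nominal load, with temporary cloud capacity added when load crosses a threshold) can be illustrated with a short capacity calculation. This is a minimal Python sketch under stated assumptions: demand and per-node capacity share one hypothetical unit, the 80% threshold is an arbitrary example value, and `cloud_nodes_needed` is a hypothetical helper, not part of HDP or Azure.

```python
import math


def cloud_nodes_needed(demand: float, nominal_nodes: int, per_node: float,
                       threshold: float = 0.8) -> int:
    """Return how many temporary cloud nodes to add for the current demand.

    demand        -- current workload, in the same (hypothetical) unit as per_node
    nominal_nodes -- size of the on-premises cluster
    per_node      -- capacity of one node, on-premises or cloud (assumed equal)
    threshold     -- fraction of on-premises capacity used before bursting
    """
    usable = nominal_nodes * per_node * threshold
    if demand <= usable:
        return 0  # the nominal workload fits on the on-premises cluster
    # Cover only the excess above the threshold, rounding up to whole nodes.
    return math.ceil((demand - usable) / per_node)


if __name__ == "__main__":
    for demand in (50, 100, 105):
        print(demand, cloud_nodes_needed(demand, nominal_nodes=10, per_node=10))
```

Re-running the same calculation at every monitoring interval is one way to approximate the dynamic behavior the slide calls surging, where capacity tracks demand rather than a fixed nominal baseline.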
Page 11 © Hortonworks Inc. 2014
Dig Deeper Considerations
1.  Network connectivity between corporate data centers and cloud
locations is often taken for granted, yet configuration stability
and latency have become obstacles.
2.  Unified Data Access can become an issue when federated access
involves extracting data out rather than pushing workloads into
Hadoop clusters.
3.  Hybrid Cloud Architectures vary between IaaS and PaaS implementations
of Hadoop. Understand the drivers for either Always-on IaaS or
On-Demand PaaS first, then adjust the hybrid architecture.
Page 12 © Hortonworks Inc. 2014
Key Takeaways
1.  Hadoop in the Cloud is driven first by business drivers and
then by feasibility assessments across a growing number of use cases,
architecture patterns and balance.
2.  Understand the different value propositions for Hadoop in the
Cloud with both IaaS and PaaS architectures as Cloud elasticity
comes in various forms.
3.  Strategic relationships play a significant role in determining Cloud
and Hybrid-Cloud Hadoop architectures.
Page 13 © Hortonworks Inc. 2014
Data Lake for the Cloud…
Bob Page
VP Partner Product Management
Hortonworks
Page 14 © Hortonworks Inc. 2014
Hadoop Deployments Start Small
[Figure axes: Scale, Scope. Labels: New Analytic Apps; New types of data; LOB-driven]
Page 15 © Hortonworks Inc. 2014
And Then Grow Into Data Lakes
[Figure: A Modern Data Architecture/Data Lake on Scale and Scope axes. Labels: New Analytic Apps; New types of data; LOB-driven; RDBMS; MPP; EDW; Governance & Integration; Security; Operations; Data Access; Data Management]
Data Lake: An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale, supporting multiple applications and workloads.
Page 16 © Hortonworks Inc. 2014
Example Applications on the Data Lake
Financial Services
•  New Account Risk Screens
•  Fraud Prevention
•  Trading Risk
•  Maximize Deposit Spread
•  Insurance Underwriting
•  Accelerate Loan Processing
Telecom
•  Call Detail Records (CDRs)
•  Infrastructure Investment
•  Next Product to Buy (NPTB)
•  Real-time Bandwidth Allocation
•  New Product Development
Retail
•  360° View of the Customer
•  Analyze Brand Sentiment
•  Localized, Personalized Promotions
•  Website Optimization
•  Optimal Store Layout
Healthcare
•  Genomic data for medical trials
•  Monitor patient vitals
•  Reduce re-admittance rates
•  Store medical research data
•  Recruit cohorts for pharmaceutical trials
Utilities, Oil & Gas
•  Smart meter stream analysis
•  Slow oil well decline curves
•  Optimize lease bidding
•  Compliance reporting
•  Proactive equipment repair
•  Seismic image processing
Public Sector
•  Analyze public sentiment
•  Protect critical networks
•  Prevent fraud and waste
•  Crowdsource reporting for repairs to infrastructure
•  Fulfill open records requests
Manufacturing
•  Supplier Consolidation
•  Supply Chain and Logistics
•  Assembly Line Quality Assurance
•  Proactive Maintenance
•  Crowdsourced Quality Assurance
Page 17 © Hortonworks Inc. 2014
Efficient Data Lakes can Span to the Cloud
1.  On-Premises: HDP on Windows or HDP on Linux, with full control of HW and software configs
2.  Cloud: your deployment of Hadoop (HDP on Windows or HDP on Linux) hosted as a VM in Azure
3.  On-Premises: Analytics Platform System, a turnkey Hadoop and relational warehouse appliance
4.  Cloud: HDInsight, a managed Hadoop service built on Azure storage
Enjoy cross-platform interoperability based on 100% open source HDP
Page 18 © Hortonworks Inc. 2014
…and Provide On-Premises and Cloud Interoperability
•  Deployment choice: run the same apps in the environment of your choice
•  Consistent management story
•  Co-locate Hadoop processing next to your apps, deployed on-premises or in the cloud
•  Leverage Azure for cloud hosting, Hadoop as a service, or as a destination for backup
[Diagram: On-premises or “private cloud”; Microsoft Analytics Platform System; Operational Tools; Microsoft Azure; Microsoft Applications; Azure Storage; Azure HDInsight]
Page 19 © Hortonworks Inc. 2014
Hybrid Hadoop Scenarios
Key Considerations:
•  Deployment Choice
–  Linux, Windows
–  On-Premises, Cloud, Hybrid
•  “Tethered” Clusters
–  Compatible services
–  An explicit “connection”
•  Synchronized Datasets
–  Efficient sharing & access
–  Governance & lineage
[Diagram scenarios: Develop/POC; Bursting; Backup/Archive; Production; Learn]
Page 20 © Hortonworks Inc. 2014
Hybrid Hadoop Scenarios:
Cloud Backup and Archive
Azure Blob storage as low-cost, offsite backup
•  Run HDP and HDInsight to power analytics on your data in the cloud
Automated data upload & backup
•  Use Falcon to schedule data load rules and push data based on business needs
Global aggregation
•  Capture data from data centers around the world
•  Run Hadoop local to a DC, or aggregate across DCs to query the entire dataset
Seamless transfer to other storage
•  Leverage Azure SQL DB & Azure storage as data sources or destinations
[Diagram: On-premises or “private cloud”; Microsoft Analytics Platform System; Microsoft Azure; Azure Storage; Azure HDInsight]
Page 21 © Hortonworks Inc. 2014
Hybrid Hadoop Scenarios:
App Development/POC
Develop new apps on 100%
interoperable infrastructure
•  Develop & test without pre-committing to
on-prem or cloud deployment
Create new development & test
environments on demand
•  Do development with predictable costs
De-risk application development
•  Protect production data & SLA workloads
from new dev errors and load spikes
Experiment with new types of data
to create new apps
•  Defer decisions on data value and
integration with the Data Lake
[Diagram: On-premises or “private cloud”; Microsoft Analytics Platform System; Microsoft Azure; Azure Storage; Azure HDInsight]
…
Page 22 © Hortonworks Inc. 2014
Hybrid Hadoop Scenarios:
Bursting
Handle peak workloads in 100%
interoperable environments
•  Run HDP and HDInsight to power analytics on your data in the cloud
•  Run the same application code
Make additional capacity available by separating jobs, e.g.:
•  Ad hoc from scheduled
•  Analytics from reporting
•  Recent data from archived data
•  ETL from aggregation
•  SLA from non-SLA
•  By department
•  By priority
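The job-separation criteria listed above can be expressed as a placement rule that decides which jobs leave the on-premises cluster during a peak. This Python sketch is illustrative only: the `Job` record and `route` function are hypothetical, it assumes a lower priority number means higher priority, and it assumes a policy of never bursting SLA-bound jobs. It does not show how jobs are actually submitted to HDP or HDInsight.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job descriptor using two of the slide's separation criteria."""
    name: str
    sla: bool        # SLA vs non-SLA
    ad_hoc: bool     # ad hoc vs scheduled
    priority: int    # assumed convention: lower number = higher priority


def route(job: Job, on_prem_saturated: bool, priority_cutoff: int = 2) -> str:
    """Return 'on-premises' or 'cloud' for a job under a simple bursting policy."""
    if job.sla:
        return "on-premises"  # protect SLA workloads on the production cluster
    if not on_prem_saturated:
        return "on-premises"  # no peak, so no need to burst
    if job.ad_hoc or job.priority > priority_cutoff:
        return "cloud"        # offload ad hoc or low-priority work during a peak
    return "on-premises"


if __name__ == "__main__":
    jobs = [
        Job("daily-revenue-report", sla=True, ad_hoc=False, priority=1),
        Job("clickstream-exploration", sla=False, ad_hoc=True, priority=3),
        Job("nightly-etl", sla=False, ad_hoc=False, priority=1),
    ]
    for j in jobs:
        print(j.name, route(j, on_prem_saturated=True))
```

Because the slide stresses that the same application code runs in both environments, a rule like this only has to choose a destination; the job itself does not change.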
[Diagram: On-premises or “private cloud”; Microsoft Analytics Platform System; Microsoft Azure; Azure Storage; Azure HDInsight]
…
Page 23 © Hortonworks Inc. 2014
Demo
Matt Winkler
Principal Program Manager
Microsoft
Page 24 © Hortonworks Inc. 2014
Story line: Leveraging Falcon to enable data movement to the cloud
[Diagram: On-Premises Hadoop cluster (HDP 2.1) running on CentOS with HDFS, YARN, Tez, Hive, MR, and Falcon; Microsoft Azure with Azure Storage, HDInsight, and a Hadoop cluster deployed to IaaS]
•  Leveraging Falcon to seamlessly move data to the cloud
•  Leveraging HDInsight to create a cluster on demand to process the same data with the same job
Page 25 © Hortonworks Inc. 2014
Wait for it….Wait for it…
Page 26 © Hortonworks Inc. 2014
Demo wrap-up…
Why Cloud?
•  Elasticity
•  Cost Optimization
•  Economic flexibility
•  Support for bursting workloads
•  Global footprint
Why On-Premises?
•  Compliance requirements
•  Specific control over hardware/networking
•  Integration requirements for additional apps to be close to cluster
Why Both?
•  Offsite backup
•  Dev/Test
•  Burst to Cloud
Page 27 © Hortonworks Inc. 2014
Next steps…
Industry-leading Hadoop Sandbox
•  Free download
•  Personal, portable Hadoop environment
Included Tutorials for Microsoft
•  How to Use Excel 2013 to Access Hadoop Data
•  How to Use Excel 2013 to Analyze Hadoop Data
•  How to Install and Configure the Hortonworks ODBC driver on Windows 7
Try Hadoop in the Cloud
•  Up and running in minutes
•  Spin up without hardware
Free Trial: www.windowsazure.com/bigdata
hortonworks.com/sandbox
Page 28 © Hortonworks Inc. 2014
Thank you
Time for Q&A

 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Data Lake for the Cloud: Extending your Hadoop Implementation

  • 1. Data Lake for the Cloud: Extending your Hadoop Implementation
      We do Hadoop. © Hortonworks Inc. 2014, June 2014
  • 2. Your speakers…
      • John O’Brien, Principal Analyst and CEO, Radiant Advisors
      • Bob Page, VP Partner Product Management, Hortonworks
      • Matt Winkler, Principal Program Manager, Microsoft
  • 3. Poll: Where are you on your Hadoop journey?
      • Researching our options
      • Currently evaluating
      • Deep in a trial
      • What’s Hadoop?
  • 4. Trends and drivers… John O’Brien, Principal Analyst and CEO, Radiant Advisors
  • 5. Leading Business Drivers and Trends
      1. Scale down operational infrastructure management costs
         • General evaluation of moving all on-premises systems to private, public, or hybrid cloud
         • Hadoop does not fit IT efficiency achieved through economies of scale and standards
      2. Centralize Hadoop data management
         • Resolve costly data movement, duplication, and latency between data centers
         • Cloud data lake strategy for shared access across geographic regions
      3. Move data stores closer to data sources and users
         • Performance and costs (Internet/VPN, LAN Ethernet, InfiniBand)
         • Data sources are increasingly external to the company
      4. Ecosystem of strategic IT relationships
         “Our sister organization just signed a great deal with Microsoft Azure and we want to leverage shared services.”
  • 6. Technical Drivers for Hadoop in the Cloud
      1. Elasticity – setting nominal resources and handling load volatility
      2. Flexibility – managing base workloads and handling others
      3. Scalability – can on-premises infrastructure handle scalability requirements?
      4. Security – requirements dictate choices from Hadoop apps down to networking
      5. Proximity – the distance data travels impacts cost and performance
      6. Functionality – not all distributions are equal (e.g., Hive and HBase versions)
      7. Usability – existing internal skill sets with the OS and scripting
      8. Manageability – monitoring cloud and hybrid deployments easily
      Reference: Adam Jorgensen, James Rowland-Jones, John Welch, Dan Clark, Christopher Price, Brian Mitchell. Microsoft Big Data Solutions. Wiley, 2014.
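The Proximity driver, together with the network options listed under the business drivers (Internet/VPN, LAN Ethernet, InfiniBand), comes down to simple arithmetic: transfer time is data volume divided by effective bandwidth. The sketch below is illustrative only; the link speeds in `LINKS_GBPS` and the 70% `efficiency` factor are hypothetical assumptions, and real throughput depends on the link, protocol, and contention.

```python
# Nominal link speeds in gigabits per second. These are illustrative
# assumptions, not measurements of any particular network.
LINKS_GBPS = {
    "internet_vpn": 0.1,
    "lan_ethernet": 10.0,
    "infiniband": 40.0,
}


def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move dataset_tb (decimal terabytes).

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = dataset_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)   # bits / (bits per second)
    return seconds / 3600


for name, gbps in LINKS_GBPS.items():
    print(f"{name}: {transfer_hours(10, gbps):.1f} hours for 10 TB")
```

Under these assumptions, 10 TB takes roughly 317 hours over the VPN link but about 3.2 hours over 10 Gb Ethernet, which is why moving the data store closer to its sources and users matters.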
  • 7. Hadoop Operating Models and Maturity
      1. On-premises Hadoop clusters
         • Predefined, balanced configurations with internal connectivity
         • May leverage private cloud architecture for elasticity
      2. Cloud-based Hadoop clusters and storage
         • Always-on Infrastructure-as-a-Service (IaaS) pricing model and workloads
         • On-demand Platform-as-a-Service (PaaS) pricing model and workloads
      3. Hybrid Hadoop architectures
         • Affordable storage of and access to second-class data
         • Separation of production analytic applications from temporary activities
         • Enabling on-premises clusters to efficiently meet the demands of volatility
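Choosing between the always-on IaaS and on-demand PaaS models is largely a utilization question. The sketch below computes how many busy hours per month make an on-demand cluster cost the same as an always-on one. The hourly rates are hypothetical placeholders, not Azure prices, and the model deliberately ignores storage, data egress, and cluster start-up time.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def always_on_cost(nodes: int, rate_per_node_hour: float) -> float:
    """Monthly cost of a cluster that runs around the clock."""
    return nodes * rate_per_node_hour * HOURS_PER_MONTH


def on_demand_cost(nodes: int, rate_per_node_hour: float, busy_hours: float) -> float:
    """Monthly cost of a cluster created only for busy_hours and then deleted."""
    return nodes * rate_per_node_hour * busy_hours


def break_even_hours(always_on_rate: float, on_demand_rate: float) -> float:
    """Busy hours per month at which both models cost the same per node.

    Below this figure on-demand is cheaper; above it always-on is cheaper.
    """
    return always_on_rate * HOURS_PER_MONTH / on_demand_rate


# Hypothetical rates: a discounted always-on rate and a higher on-demand rate.
print(break_even_hours(always_on_rate=0.50, on_demand_rate=0.80))
```

With those made-up rates the break-even point is roughly 456 busy hours a month, so a cluster that is busy only a few hours a day would favor the on-demand model.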
  • 8. Hybrid Cloud Architecture Driver #1
      Driver: Lower cost through an optimized data platform
         • Lower-cost storage for lower-value data (lower SLA)
         • Regulatory requirements for historical data
      Online transparent archive:
         • Data policy driven by time, status, and read-only state
         • 10/90 or 10/100 data architecture to simplify data management
      Online backup and business continuity:
         • Hadoop has good built-in fault tolerance with multiple data copies
         • “Clusters” are oriented to a single location and do not provide disaster recovery
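The online transparent archive is driven by a data policy based on time, status, and read-only state. A minimal sketch of such a policy follows, assuming a hypothetical `Partition` record, hypothetical status values ("open"/"closed"), and an assumed 90-day hot window; it only decides the tier and does not move any data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Partition:
    path: str
    last_modified: date
    status: str       # hypothetical business status, e.g. "open" or "closed"
    read_only: bool


def choose_tier(p: Partition, today: date, hot_days: int = 90) -> str:
    """Return "cloud_archive" for cold data and "on_prem" otherwise.

    A partition is archived only when all three policy inputs agree:
    it is older than hot_days, its status is "closed", and it is read-only.
    """
    age_days = (today - p.last_modified).days
    if age_days > hot_days and p.status == "closed" and p.read_only:
        return "cloud_archive"
    return "on_prem"


# Hypothetical partitions of an orders dataset.
partitions = [
    Partition("/data/orders/2014/01", date(2014, 1, 31), "closed", True),
    Partition("/data/orders/2014/05", date(2014, 5, 31), "open", False),
]
for p in partitions:
    print(p.path, choose_tier(p, today=date(2014, 6, 15)))
```

Requiring all three conditions is a conservative design choice: data stays on the faster tier unless every signal says it is cold.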
  • 9. Hybrid Cloud Architecture Driver #2
      Driver: Flexibility for on-demand and temporary needs
         • Workload and cluster management (prioritize jobs)
         • Separate production from dev/test and discovery (mindset)
      Discovery sandboxes:
         • Loading external data into the cloud for evaluation is easier than loading it into the data center (network load, storage, security)
      Proofs of concept:
         • Verifying new technologies and analytic apps on a smaller subset
         • Going beyond exploring new data (not evaluating a Hadoop distribution)
      Separate environment for analytic applications:
         • Isolating SLA-driven operational applications from discovery work
  • 10. Hybrid Cloud Architecture Driver #3
      Driver: Need for temporary elasticity
         • On-premises clusters are typically configured for nominal load
         • Volatility requires on-demand temporary resources
      Bursting:
         • Setting and managing ongoing nominal workloads with expected volatility in data volumes (threshold)
      Surging:
         • Maintaining performance levels during surges in event data volumes or user activity (dynamic)
      Electric grids maintain the balance between dynamic energy generation and dynamic demand.
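The slide distinguishes threshold-based bursting from dynamic surging. The sketch below contrasts the two as capacity rules; the demand "units", the 0.8 threshold, and the one-step trend projection are all hypothetical simplifications, not the behavior of any Hadoop or Azure autoscaler.

```python
import math


def burst_nodes(demand: float, nominal_capacity: float,
                threshold: float = 0.8, units_per_node: float = 1.0) -> int:
    """Bursting (threshold-based): once demand passes threshold * nominal
    capacity, send the excess to temporary cloud nodes so the on-premises
    cluster stays at or below the threshold."""
    trigger = threshold * nominal_capacity
    if demand <= trigger:
        return 0
    return math.ceil((demand - trigger) / units_per_node)


def surge_nodes(recent_demand: list[float], nominal_capacity: float,
                units_per_node: float = 1.0) -> int:
    """Surging (dynamic): project the next interval from the latest upward
    trend and provision cloud nodes for the projected excess."""
    if not recent_demand:
        return 0
    latest = recent_demand[-1]
    trend = latest - recent_demand[-2] if len(recent_demand) > 1 else 0.0
    projected = latest + max(trend, 0.0)   # only extrapolate growth
    excess = projected - nominal_capacity
    return max(0, math.ceil(excess / units_per_node))


print(burst_nodes(demand=90, nominal_capacity=100))        # expected, steady peak
print(surge_nodes([80, 100, 130], nominal_capacity=100))   # sudden spike
```

The bursting rule reacts only to the current level, which suits expected volatility; the surging rule looks at the rate of change, which is closer to the grid analogy of matching dynamic supply to dynamic demand.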
  • 11. Dig Deeper Considerations
      1. Network connectivity between corporate data centers and cloud locations is often taken for granted, yet configuration stability and latency have become obstacles.
      2. Unified data access can become an issue when federated access involves extracting data out rather than pushing workloads into Hadoop clusters.
      3. Hybrid cloud architectures vary between IaaS and PaaS implementations of Hadoop. Understand the drivers for either always-on IaaS or on-demand PaaS first, then adjust the hybrid architecture.
  • 12. Key Takeaways
      1. Hadoop in the cloud is driven first by business drivers and then by feasibility assessments across an increasing number of use cases, architecture patterns, and the balance between on-premises and cloud.
      2. Understand the different value propositions for Hadoop in the cloud with both IaaS and PaaS architectures, as cloud elasticity comes in various forms.
      3. Strategic relationships play a significant role in determining cloud and hybrid-cloud Hadoop architectures.
  • 13. Data Lake for the Cloud… Bob Page, VP Partner Product Management, Hortonworks
  • 14. Hadoop Deployments Start Small
      [Diagram: scope vs. scale; new analytic apps, new types of data, LOB-driven]
  • 15. And Then Grow Into Data Lakes
      [Diagram: scope vs. scale; a modern data architecture/data lake alongside RDBMS, MPP, and EDW, with governance & integration, security, operations, data access, and data management layers; new analytic apps, new types of data, LOB-driven]
      Data Lake: an architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale, supporting multiple applications and workloads.
  • 16. Example Applications on the Data Lake
      Financial Services
         • New Account Risk Screens
         • Fraud Prevention
         • Trading Risk
         • Maximize Deposit Spread
         • Insurance Underwriting
         • Accelerate Loan Processing
      Telecom
         • Call Detail Records (CDRs)
         • Infrastructure Investment
         • Next Product to Buy (NPTB)
         • Real-time Bandwidth Allocation
         • New Product Development
      Retail
         • 360° View of the Customer
         • Analyze Brand Sentiment
         • Localized, Personalized Promotions
         • Website Optimization
         • Optimal Store Layout
      Healthcare
         • Genomic data for medical trials
         • Monitor patient vitals
         • Reduce re-admittance rates
         • Store medical research data
         • Recruit cohorts for pharmaceutical trials
      Utilities, Oil & Gas
         • Smart meter stream analysis
         • Slow oil well decline curves
         • Optimize lease bidding
         • Compliance reporting
         • Proactive equipment repair
         • Seismic image processing
      Public Sector
         • Analyze public sentiment
         • Protect critical networks
         • Prevent fraud and waste
         • Crowdsource reporting for repairs to infrastructure
         • Fulfill open records requests
      Manufacturing
         • Supplier Consolidation
         • Supply Chain and Logistics
         • Assembly Line Quality Assurance
         • Proactive Maintenance
         • Crowdsourced Quality Assurance
  • 17. Efficient Data Lakes can Span to the Cloud
      On-premises:
         • HDP on Windows or HDP on Linux: full control of hardware and software configurations
         • Analytics Platform System: turnkey Hadoop and relational warehouse appliance
      Cloud:
         • HDP on Windows or HDP on Linux: your deployment of Hadoop hosted as a VM in Azure
         • HDInsight: managed Hadoop service built on Azure storage
      Enjoy cross-platform interoperability based on 100% open source HDP.
  • 18. …and Provide On-Premises and Cloud Interoperability
      • Deployment choice: run the same apps in the environment of your choice
      • Consistent management story
      • Co-locate Hadoop processing next to your apps, deployed on-premises or in the cloud
      • Leverage Azure for cloud hosting, Hadoop as a service, or as a destination for backup
      [Diagram: on-premises or “private cloud” (Microsoft Analytics Platform System, operational tools, Microsoft applications) connected to Microsoft Azure (Azure Storage, Azure HDInsight)]
  • 19. Hybrid Hadoop Scenarios
      Key considerations:
         • Deployment choice
            – Linux, Windows
            – On-premises, cloud, hybrid
         • “Tethered” clusters
            – Compatible services
            – An explicit “connection”
         • Synchronized datasets
            – Efficient sharing & access
            – Governance & lineage
      Scenarios: Develop/POC, Bursting, Backup/Archive, Production, Learn
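“Tethered” clusters only work if both sides run compatible services, and the earlier Functionality driver notes that Hive and HBase versions differ between distributions. The sketch below compares two hypothetical service inventories and flags services that are missing on one side or differ in major.minor version. Treating a matching major.minor as compatible is an assumption for illustration, not a Hortonworks or Microsoft compatibility rule.

```python
def major_minor(version: str) -> tuple[int, int]:
    """Parse "0.13.1" into (0, 13); a missing minor part counts as 0."""
    parts = version.split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def compatibility_report(on_prem: dict[str, str], cloud: dict[str, str]) -> dict[str, str]:
    """List services that are missing on one side or differ in major.minor."""
    issues = {}
    for service in sorted(set(on_prem) | set(cloud)):
        local, remote = on_prem.get(service), cloud.get(service)
        if local is None:
            issues[service] = "missing on-premises"
        elif remote is None:
            issues[service] = "missing in cloud"
        elif major_minor(local) != major_minor(remote):
            issues[service] = f"version mismatch: {local} vs {remote}"
    return issues


# Hypothetical service inventories for two tethered clusters.
on_prem = {"hive": "0.13.0", "hbase": "0.98.0", "falcon": "0.5.0"}
cloud = {"hive": "0.13.1", "hbase": "0.96.1"}
print(compatibility_report(on_prem, cloud))
```

Here the Hive patch-level difference passes, while the HBase minor-version gap and the missing Falcon service are reported.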
  • 20. Hybrid Hadoop Scenarios: Cloud Backup and Archive
      Azure blob storage as low-cost, offsite backup
         • Run HDP and HDInsight to power analytics on your data in the cloud
      Automated data upload & backup
         • Use Falcon to schedule data load rules and push data based on business needs
      Global aggregation
         • Capture data from data centers around the world
         • Run Hadoop local to a DC, or aggregate across DCs to query the entire dataset
      Seamless transfer to other storage
         • Leverage Azure SQL DB & Azure storage as sources or destinations for data
      [Diagram: on-premises or “private cloud” (Microsoft Analytics Platform System) connected to Microsoft Azure (Azure Storage, Azure HDInsight)]
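Falcon feeds describe dataset locations with time-based path templates such as `/data/clicks/${YEAR}/${MONTH}/${DAY}/${HOUR}`, and scheduled loads and replication then work on individual time instances of the feed. The plain-Python sketch below only illustrates that idea by listing the hourly directories a backup window would cover. It does not call any Falcon API, and the `/data/clicks` path and the window are hypothetical.

```python
from datetime import datetime, timedelta


def expand(template: str, ts: datetime) -> str:
    """Substitute Falcon-style time variables into a path template."""
    return (template.replace("${YEAR}", f"{ts.year:04d}")
                    .replace("${MONTH}", f"{ts.month:02d}")
                    .replace("${DAY}", f"{ts.day:02d}")
                    .replace("${HOUR}", f"{ts.hour:02d}"))


def hourly_instances(template: str, start: datetime, end: datetime) -> list[str]:
    """Return one expanded path per hour in the half-open window [start, end)."""
    paths = []
    ts = start
    while ts < end:
        paths.append(expand(template, ts))
        ts += timedelta(hours=1)
    return paths


for path in hourly_instances("/data/clicks/${YEAR}/${MONTH}/${DAY}/${HOUR}",
                             datetime(2014, 6, 1, 22), datetime(2014, 6, 2, 1)):
    print(path)
```

Because each instance maps to its own directory, a backup rule can push exactly the hours that closed since the last run instead of rescanning the whole dataset.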
  • 21. Hybrid Hadoop Scenarios: App Development/POC
      Develop new apps on 100% interoperable infrastructure
         • Develop & test without pre-committing to on-premises or cloud deployment
      Create new development & test environments on demand
         • Do development with predictable costs
      De-risk application development
         • Protect production data & SLA workloads from new dev errors and load spikes
      Experiment with new types of data to create new apps
         • Defer decisions on data value and integration with the Data Lake
      [Diagram: on-premises or “private cloud” (Microsoft Analytics Platform System) connected to Microsoft Azure (Azure Storage, Azure HDInsight)]
  • 22. Hybrid Hadoop Scenarios: Bursting
      Handle peak workloads in 100% interoperable environments
         • Run HDP and HDInsight to power analytics on your data in the cloud
         • Run the same application code
      Make additional capacity available by separating jobs, e.g.:
         • Ad hoc from scheduled
         • Analytics from reporting
         • Recent data from archived data
         • ETL from aggregation
         • SLA from non-SLA
         • By department
         • By priority
      [Diagram: on-premises or “private cloud” (Microsoft Analytics Platform System) connected to Microsoft Azure (Azure Storage, Azure HDInsight)]
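The bursting slide frees capacity by separating jobs along several axes. The sketch below turns three of them (ad hoc vs. scheduled, SLA vs. non-SLA, priority) into a single routing rule. The `Job` fields, the priority scale, and the precedence of the checks are hypothetical choices for illustration; this is not how any YARN scheduler or HDInsight feature decides placement.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    scheduled: bool   # False means ad hoc
    has_sla: bool
    priority: int     # hypothetical scale: higher is more important


def route(job: Job, on_prem_busy: bool, min_on_prem_priority: int = 5) -> str:
    """Decide where a job runs when capacity is tight.

    SLA-bound, scheduled, and high-priority jobs stay on-premises; other
    work bursts to the cloud only while the on-premises cluster is busy.
    """
    if not on_prem_busy:
        return "on_prem"
    if job.has_sla or job.scheduled or job.priority >= min_on_prem_priority:
        return "on_prem"
    return "cloud"


jobs = [
    Job("nightly_etl", scheduled=True, has_sla=True, priority=9),
    Job("analyst_exploration", scheduled=False, has_sla=False, priority=2),
]
for job in jobs:
    print(job.name, route(job, on_prem_busy=True))
```

Since the slide stresses running the same application code in both environments, the routing decision can stay this simple: only placement changes, not the job itself.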
  • 23. Demo: Matt Winkler, Principal Program Manager, Microsoft
  • 24. Storyline: Leveraging Falcon to enable data movement to the cloud
      [Diagram: on-premises Hadoop cluster (HDP 2.1) running on CentOS with HDFS, YARN, Tez, Hive, MR, and Falcon; Microsoft Azure with Azure Storage, HDInsight, and a Hadoop cluster deployed to IaaS]
         • Leveraging Falcon to seamlessly move data to the cloud
         • Leveraging HDInsight to create a cluster on demand to process the same data with the same job
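The demo storyline moves data to the cloud and then processes it with the same job on an on-demand HDInsight cluster. A recurring transfer like that only needs to copy what changed. The sketch below shows that incremental comparison on hypothetical file listings (path mapped to size and checksum); it is similar in spirit to `hadoop distcp -update`, which skips files whose size and checksum already match at the destination, but it is not Falcon's or DistCp's implementation.

```python
# Hypothetical listing format: path -> (size in bytes, checksum string).
Listing = dict[str, tuple[int, str]]


def files_to_copy(source: Listing, target: Listing) -> list[str]:
    """Return source paths that are missing from target or differ in
    size or checksum, in sorted order."""
    return sorted(path for path, meta in source.items() if target.get(path) != meta)


def stale_in_target(source: Listing, target: Listing) -> list[str]:
    """Return target paths that no longer exist at the source."""
    return sorted(set(target) - set(source))


on_prem = {"/logs/a": (100, "c1"), "/logs/b": (50, "c2"), "/logs/c": (70, "c3")}
cloud = {"/logs/a": (100, "c1"), "/logs/b": (50, "old"), "/logs/z": (10, "c9")}
print(files_to_copy(on_prem, cloud))
print(stale_in_target(on_prem, cloud))
```

Whether stale target files are deleted is a policy decision; for an offsite backup you might keep them, while for a mirrored working set you would remove them.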
  • 25. Wait for it… Wait for it…
  • 26. Demo wrap-up…
      Why cloud?
         • Elasticity
         • Cost optimization
         • Economic flexibility
         • Support for bursting workloads
         • Global footprint
      Why on-premises?
         • Compliance requirements
         • Specific control over hardware/networking
         • Integration requirements for additional apps to be close to the cluster
      Why both?
         • Offsite backup
         • Dev/Test
         • Burst to cloud
  • 27. Next steps…
      Industry-leading Hadoop Sandbox
         • Free download
         • Personal, portable Hadoop environment
      Included tutorials for Microsoft
         • How to Use Excel 2013 to Access Hadoop Data
         • How to Use Excel 2013 to Analyze Hadoop Data
         • How to Install and Configure the Hortonworks ODBC driver on Windows 7
      Try Hadoop in the cloud
         • Up and running in minutes
         • Spin up without hardware
      Free trial: www.windowsazure.com/bigdata
      hortonworks.com/sandbox
  • 28. Thank you. Time for Q&A