SlideShare una empresa de Scribd logo
1 de 23
EXTENDING YOUR HADOOP
IMPLEMENTATION TO THE CLOUD
Matt Winkler
Principal Lead Program Manager
Big Data @ Microsoft  We’re Hiring
@mwinkle
AGENDA
Decisions
IaaS
Cloud Storage
Hadoop as a Service
Hybrid Scenarios
Next Steps
WHY CLOUD?
Elasticity
Cost Optimization
Economic flexibility
Support for bursting workloads
Global footprint
WHY ON-PREMISES?
Compliance requirements
Specific control over hardware/networking
Integration requirements for additional apps to be close to cluster
WHEN CLOUD?
Data born in the cloud
Global apps
Satisfy geopolitical or compliance constraints
Dev/Test
Backup
Geo-Redundancy
Bursting to cloud
IAAS – RUN YOUR HADOOP IN THE CLOUD
IaaS offerings across the cloud providers offer:
 OS choice
 Node configuration
 Customized networking topology
 Repeatable, scriptable deployments
You still have to:
 Set up the cluster
 Manage data movement into the cluster
 Integrate with your other applications
 Manage patching and updates of OS and apps
 Obtain support and/or licenses
DEMO
Deploying a Hadoop Cluster to Azure
LEVERAGE CLOUD STORAGE FOR
FLEXIBILITY
Cloud storage enables economic flexibility, scale and rich features
 Size clusters independent of storage needs
 Clusters become stateless to operate across the data
 Price continues decreasing
 Geo-Redundancy allows for business continuity/disaster recover planning
CLOUD STORAGE USAGE PATTERNS
HDFS within the cluster
 Move data in from cloud storage on boot
 (optional) backup/age data to cloud storage
 (optional) move data out to cloud storage to rebuild cluster
Default file system using cloud storage connectors
 To Hadoop apps, they just see a path to data and most things “just work”
 Apps which rely specifically on HDFS may encounter compat issues
 The physics change in exchange for flexibility
LEVERAGING HADOOP AS A SERVICE
Hadoop Services
 Cluster creation on demand
 Default integration with cloud storage
 Integration across services and apps
 Higher level abstractions
 API set for integrating into apps
Azure HDInsight
 Clusters provisioned on top of Azure Blob storage
 Deploy clusters of any size
 Entire stack supported by Microsoft
Azure Active Directory
Service Bus
Scheduler
Multi-Factor
Authentication
Express Route
Azure SQL Database
Azure Web
Site
Some example services
DEMO
Getting Started with HDInsight
HYBRID SCENARIOS
Key scenarios
 Offsite backup
 Dev/Test
 Burst to Cloud
The decision is not an XOR, it’s on-premises AND cloud
Microsoft
Azure
Azure Storage
HDInsight (Hadoop)
Hadoop cluster
deployed to IaaS
DEMO
On-Premises
Hadoop Cluster (HDP 2.1)
Running on CentOS
HDFS
YARN
Tez
Hive
MR
Falcon
GETTING STARTED
Get started in the cloud (getting
started cards available @ the
Microsoft booth and up here at the
stage)
Create an HDInsight cluster, or try
out deploying a Hadoop cluster to
Azure
http://aka.ms/howtohdinsight
Falcon command line
Falcon configuration files
FALCON CONFIGURATION…
Register & schedule
Data being landed into
HDFS, on-prem
Syncing to blob store
New file @ 1:38pm
@ 2:01 pm
From HDI Cluster

Más contenido relacionado

La actualidad más candente

BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1
Milind gunjan
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Cloudera, Inc.
 

La actualidad más candente (20)

Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and Cloud
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructure
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.ir
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1
 
The Pandemic Changes Everything, the Need for Speed and Resiliency
The Pandemic Changes Everything, the Need for Speed and ResiliencyThe Pandemic Changes Everything, the Need for Speed and Resiliency
The Pandemic Changes Everything, the Need for Speed and Resiliency
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
Hadoop
HadoopHadoop
Hadoop
 
Alluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio Use Cases and Future Directions
Alluxio Use Cases and Future Directions
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Orchestrate a Data Symphony
Orchestrate a Data SymphonyOrchestrate a Data Symphony
Orchestrate a Data Symphony
 
Enabling Apache Spark for Hybrid Cloud
Enabling Apache Spark for Hybrid CloudEnabling Apache Spark for Hybrid Cloud
Enabling Apache Spark for Hybrid Cloud
 
Hadoop
HadoopHadoop
Hadoop
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 

Destacado

Forrester whos hot in bsm
Forrester whos hot in bsmForrester whos hot in bsm
Forrester whos hot in bsm
IBM_BSM
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
DataWorks Summit
 

Destacado (15)

Azure Machine Learning using R
Azure Machine Learning using RAzure Machine Learning using R
Azure Machine Learning using R
 
Azure Machine Learning 101
Azure Machine Learning 101Azure Machine Learning 101
Azure Machine Learning 101
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS Cloud
 
Forrester whos hot in bsm
Forrester whos hot in bsmForrester whos hot in bsm
Forrester whos hot in bsm
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
 
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
Hands-On with U-SQL and Azure Data Lake Analytics (ADLA)
 
The How and Why of Feature Engineering
The How and Why of Feature EngineeringThe How and Why of Feature Engineering
The How and Why of Feature Engineering
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
 
Azure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveAzure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep Dive
 
Improve the Development Process with DevOps Practices by Fedorov Vadim
Improve the Development Process with DevOps Practices by Fedorov VadimImprove the Development Process with DevOps Practices by Fedorov Vadim
Improve the Development Process with DevOps Practices by Fedorov Vadim
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature Engineering
 

Similar a Extending your Hadoop Implementation to the Cloud

Data Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudData Orchestration Platform for the Cloud
Data Orchestration Platform for the Cloud
Alluxio, Inc.
 
ICS-Azure Migrations & Application Modernization_V2.pptx
ICS-Azure Migrations & Application Modernization_V2.pptxICS-Azure Migrations & Application Modernization_V2.pptx
ICS-Azure Migrations & Application Modernization_V2.pptx
mustafa435048
 

Similar a Extending your Hadoop Implementation to the Cloud (20)

Cloud computing
Cloud computingCloud computing
Cloud computing
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
 
Introduction to SQL Azure
Introduction to SQL AzureIntroduction to SQL Azure
Introduction to SQL Azure
 
Agile Infrastructure with Windows Azure
Agile Infrastructure with Windows AzureAgile Infrastructure with Windows Azure
Agile Infrastructure with Windows Azure
 
Cloud1 Computing 01
Cloud1 Computing 01Cloud1 Computing 01
Cloud1 Computing 01
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
From limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyFrom limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiency
 
Data Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudData Orchestration Platform for the Cloud
Data Orchestration Platform for the Cloud
 
OpenStack and CloudForms Do's and Dont's
OpenStack and CloudForms Do's and Dont'sOpenStack and CloudForms Do's and Dont's
OpenStack and CloudForms Do's and Dont's
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
 
Dallas Breakfast Seminar
Dallas Breakfast SeminarDallas Breakfast Seminar
Dallas Breakfast Seminar
 
Intro to Windows Azure
Intro to Windows AzureIntro to Windows Azure
Intro to Windows Azure
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Microsoft-Azure-Overvi2222222222222ew.pptx
Microsoft-Azure-Overvi2222222222222ew.pptxMicrosoft-Azure-Overvi2222222222222ew.pptx
Microsoft-Azure-Overvi2222222222222ew.pptx
 
ICS-Azure Migrations & Application Modernization_V2.pptx
ICS-Azure Migrations & Application Modernization_V2.pptxICS-Azure Migrations & Application Modernization_V2.pptx
ICS-Azure Migrations & Application Modernization_V2.pptx
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
 
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud EraModernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 

Más de DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Extending your Hadoop Implementation to the Cloud

  • 1. EXTENDING YOUR HADOOP IMPLEMENTATION TO THE CLOUD Matt Winkler Principal Lead Program Manager Big Data @ Microsoft  We’re Hiring @mwinkle
  • 2. AGENDA Decisions IaaS Cloud Storage Hadoop as a Service Hybrid Scenarios Next Steps
  • 3. WHY CLOUD? Elasticity Cost Optimization Economic flexibility Support for bursting workloads Global footprint
  • 4. WHY ON-PREMISES? Compliance requirements Specific control over hardware/networking Integration requirements for additional apps to be close to cluster
  • 5. WHEN CLOUD? Data born in the cloud Global apps Satisfy geopolitical or compliance constraints Dev/Test Backup Geo-Redundancy Bursting to cloud
  • 6. IAAS – RUN YOUR HADOOP IN THE CLOUD IaaS offerings across the cloud providers offer:  OS choice  Node configuration  Customized networking topology  Repeatable, scriptable deployments You still have to:  Set up the cluster  Manage data movement into the cluster  Integrate with your other applications  Manage patching and updates of OS and apps  Obtain support and/or licenses
  • 7. DEMO Deploying a Hadoop Cluster to Azure
  • 8. LEVERAGE CLOUD STORAGE FOR FLEXIBILITY Cloud storage enables economic flexibility, scale and rich features  Size clusters independent of storage needs  Clusters become stateless to operate across the data  Price continues decreasing  Geo-Redundancy allows for business continuity/disaster recover planning
  • 9. CLOUD STORAGE USAGE PATTERNS HDFS within the cluster  Move data in from cloud storage on boot  (optional) backup/age data to cloud storage  (optional) move data out to cloud storage to rebuild cluster Default file system using cloud storage connectors  To Hadoop apps, they just see a path to data and most things “just work”  Apps which rely specifically on HDFS may encounter compat issues  The physics change in exchange for flexibility
  • 10. LEVERAGING HADOOP AS A SERVICE Hadoop Services  Cluster creation on demand  Default integration with cloud storage  Integration across services and apps  Higher level abstractions  API set for integrating into apps Azure HDInsight  Clusters provisioned on top of Azure Blob storage  Deploy clusters of any size  Entire stack supported by Microsoft Azure Active Directory Service Bus Scheduler Multi-Factor Authentication Express Route Azure SQL Database Azure Web Site Some example services
  • 12. HYBRID SCENARIOS Key scenarios  Offsite backup  Dev/Test  Burst to Cloud The decision is not an XOR, it’s on-premises AND cloud
  • 13. Microsoft Azure Azure Storage HDInsight (Hadoop) Hadoop cluster deployed to IaaS DEMO On-Premises Hadoop Cluster (HDP 2.1) Running on CentOS HDFS YARN Tez Hive MR Falcon
  • 14. GETTING STARTED Get started in the cloud (getting started cards available @ the Microsoft booth and up here at the stage) Create an HDInsight cluster, or try out deploying a Hadoop cluster to Azure http://aka.ms/howtohdinsight
  • 19. Data being landed into HDFS, on-prem
  • 21. New file @ 1:38pm