SlideShare una empresa de Scribd logo
1 de 19
Overview
1. Introduction to HPC
2. Hadoop (HDFS, MapReduce)
3. AWS toolkit (Amazon S3, Amazon EMR, Amazon Redshift)
4. Case study
Why?
Large data files from sequencers.
Computational bottleneck.
Processing time.
Data persistence and reliability.
Data security.
Bottlenecks in Genome Analysis
How?
Introduction
“High Performance Computing (HPC) most generally refers to the practice of
aggregating computing power in a way that delivers much higher performance
than one could get out of a typical desktop computer or workstation in order to
solve large problems in science, engineering, or business. ”
Dedicated supercomputer.
Commodity HPC cluster.
Grid computing.
HPC in cloud.
Forms of HPC
What?
Hadoop
Open source Java based framework for reliable, scalable and distributed
computing.
Doug Cutting and Mike Cafarella, 2006-08 in Yahoo!- inspired by Google (GFS)
in 2003.
Key Components
Hadoop Distributed File System (HDFS)
MapReduce
Hadoop
HDFS (Hadoop Distributed File System)
Data management layer
Master-Slave architecture
Fault Tolerant
Key Components:
NameNode
SecondaryNamenode
Hadoop- Continued
MapReduce
Mappers and Reducers
Batch oriented
Key Components
JobTracker
TaskTracker
Hadoop- Architecture
AWS ToolKit - Amazon Elastic MapReduce (EMR)
Managed Hadoop framework.
Runs almost all popular distributed frameworks such as Apache Spark, HBase,
Presto, and Flink.
Elastic.
Flexible Data storage (S3, HDFS, RedShift, Glacier, RDS).
Secure and reliable.
Full control and root access.
AWS ToolKit - Amazon EMR
aws emr create-cluster 
--name "demo" 
--release-label emr-4.5.0 
--instance-type m3.xlarge 
--instance-count 2 
--ec2-attributes KeyName=YOUR-AWS-SSH-KEY 
--use-default-roles 
--applications Name=Hive Name=Spark
aws emr create-cluster 
--name "Test cluster" 
--ami-version 2.4 
--applications Name=Hive Name=Pig 
--use-default-roles --ec2-attributes KeyName=myKey 
--instance-groups 
InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge 
InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge 
--steps Type=PIG,Name="Pig Program",ActionOnFailure=CONTINUE,
Args=[-f,s3://mybucket/scripts/pigscript.pig,-p, 
INPUT=s3://mybucket/inputdata/,-p, 
OUTPUT=s3://mybucket/outputdata/, 
$INPUT=s3://mybucket/inputdata/, 
$OUTPUT=s3://mybucket/outputdata/]
AWS ToolKit - Amazon S3 (Simple Storage Service)
Virtually infinite storage.
Single object size up to 5TB.
Why use S3?
Durable, Low Cost, Scalable, High Performance, Secure, Integrated, Easy to Use.
Decouple storage and computation resources.
HDFS requirements and implements EMRFS.
AWS ToolKit - Amazon Redshift
Fast, simple petabyte-scale data warehouse.
Use SQL query to interact.
Massively parallel.
Relational.
Architecture - Leader Node and Compute node.
Fast - 4 GB/sec/node.
Case Study- Rail RNA
Cloud-enabled spliced aligner that analyzes many samples at once.
Architecture - Amazon S3, Amazon EMR.
~50000 (from NCBI archive) human RNA sample using Rail-RNA - 150 Tbps.
Input to result - 2 weeks.
Cost- ~$1.40/sample.
Paper- Splicing across SRA.
Thank you

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

DevOps for Network Engineers
DevOps for Network EngineersDevOps for Network Engineers
DevOps for Network Engineers
 
Rancher 2.0 Technical Deep Dive
Rancher 2.0 Technical Deep DiveRancher 2.0 Technical Deep Dive
Rancher 2.0 Technical Deep Dive
 
Kubernetes Architecture
 Kubernetes Architecture Kubernetes Architecture
Kubernetes Architecture
 
Processing IoT Data from End to End with MQTT and Apache Kafka
Processing IoT Data from End to End with MQTT and Apache Kafka Processing IoT Data from End to End with MQTT and Apache Kafka
Processing IoT Data from End to End with MQTT and Apache Kafka
 
Introduction to OpenStack
Introduction to OpenStackIntroduction to OpenStack
Introduction to OpenStack
 
OpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image LifecycleOpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image Lifecycle
 
Kubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShift
Kubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShiftKubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShift
Kubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShift
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack Architecture
 
Helm - Application deployment management for Kubernetes
Helm - Application deployment management for KubernetesHelm - Application deployment management for Kubernetes
Helm - Application deployment management for Kubernetes
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
cloud computing architecture.pptx
cloud computing architecture.pptxcloud computing architecture.pptx
cloud computing architecture.pptx
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)
 
Red Hat OpenShift Container Platform Overview
Red Hat OpenShift Container Platform OverviewRed Hat OpenShift Container Platform Overview
Red Hat OpenShift Container Platform Overview
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
OpenShift Enterprise
OpenShift EnterpriseOpenShift Enterprise
OpenShift Enterprise
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
PCF Cloud-Native Workshop Slides
PCF Cloud-Native Workshop SlidesPCF Cloud-Native Workshop Slides
PCF Cloud-Native Workshop Slides
 
OpenShift Overview
OpenShift OverviewOpenShift Overview
OpenShift Overview
 
Api observability
Api observability Api observability
Api observability
 

Similar a High Performance Computing (HPC) in cloud

AWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMRAWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMR
Amazon Web Services
 
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Amazon Web Services
 
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
Amazon Web Services Korea
 

Similar a High Performance Computing (HPC) in cloud (20)

(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Big data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivBig data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel Aviv
 
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
My First Big Data Application
My First Big Data ApplicationMy First Big Data Application
My First Big Data Application
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache Spark
 
How containers helped a SaaS startup be developed and go live
How containers helped a SaaS startup be developed and go liveHow containers helped a SaaS startup be developed and go live
How containers helped a SaaS startup be developed and go live
 
AWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMRAWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMR
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
 
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutes
 
(BDT302) Big Data Beyond Hadoop: Running Mahout, Giraph, and R on Amazon EMR ...
(BDT302) Big Data Beyond Hadoop: Running Mahout, Giraph, and R on Amazon EMR ...(BDT302) Big Data Beyond Hadoop: Running Mahout, Giraph, and R on Amazon EMR ...
(BDT302) Big Data Beyond Hadoop: Running Mahout, Giraph, and R on Amazon EMR ...
 
Masterclass Live: Amazon EMR
Masterclass Live: Amazon EMRMasterclass Live: Amazon EMR
Masterclass Live: Amazon EMR
 
Data freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSData freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWS
 
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
 

Más de Accubits Technologies

Más de Accubits Technologies (6)

AI-powered real-time video analytics for Manufacturing sector
AI-powered real-time video analytics for Manufacturing sectorAI-powered real-time video analytics for Manufacturing sector
AI-powered real-time video analytics for Manufacturing sector
 
AI-powered real-time video analytics for defence sector
AI-powered real-time video analytics for defence sectorAI-powered real-time video analytics for defence sector
AI-powered real-time video analytics for defence sector
 
Blockchain and IoT For Supply Chain Traceability
Blockchain and IoT For Supply Chain TraceabilityBlockchain and IoT For Supply Chain Traceability
Blockchain and IoT For Supply Chain Traceability
 
ICOs : past, present and future
ICOs : past, present and futureICOs : past, present and future
ICOs : past, present and future
 
Blockchain in Bioinformatics
Blockchain in BioinformaticsBlockchain in Bioinformatics
Blockchain in Bioinformatics
 
Neural Networks - How do they work?
Neural Networks - How do they work?Neural Networks - How do they work?
Neural Networks - How do they work?
 

Último

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

Último (20)

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 

High Performance Computing (HPC) in cloud

  • 1.
  • 2. Overview 1. Introduction to HPC 2. Hadoop (HDFS, MapReduce) 3. AWS toolkit (Amazon S3, Amazon EMR, Amazon Redshift) 4. Case study
  • 4. Large data files from sequencers. Computational bottleneck. Processing time. Data persistence and reliability. Data security. Bottlenecks in Genome Analysis
  • 6. Introduction “High Performance Computing (HPC) most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business. ”
  • 7. Dedicated supercomputer. Commodity HPC cluster. Grid computing. HPC in cloud. Forms of HPC
  • 9. Hadoop Open source Java based framework for reliable, scalable and distributed computing. Doug Cutting and Mike Cafarella, 2006-08 in Yahoo!- inspired by Google (GFS) in 2003. Key Components Hadoop Distributed File System (HDFS) MapReduce
  • 10. Hadoop HDFS (Hadoop Distributed File System) Data management layer Master-Slave architecture Fault Tolerant Key Components: NameNode SecondaryNamenode
  • 11. Hadoop- Continued MapReduce Mappers and Reducers Batch oriented Key Components JobTracker TaskTracker
  • 13. AWS ToolKit - Amazon Elastic MapReduce (EMR) Managed Hadoop framework. Runs almost all popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink. Elastic. Flexible Data storage (S3, HDFS, RedShift, Glacier, RDS). Secure and reliable. Full control and root access.
  • 14. AWS ToolKit - Amazon EMR aws emr create-cluster --name "demo" --release-label emr-4.5.0 --instance-type m3.xlarge --instance-count 2 --ec2-attributes KeyName=YOUR-AWS-SSH-KEY --use-default-roles --applications Name=Hive Name=Spark
  • 15. aws emr create-cluster --name "Test cluster" --ami-version 2.4 --applications Name=Hive Name=Pig --use-default-roles --ec2-attributes KeyName=myKey --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --steps Type=PIG,Name="Pig Program",ActionOnFailure=CONTINUE, Args=[-f,s3://mybucket/scripts/pigscript.pig,-p, INPUT=s3://mybucket/inputdata/,-p, OUTPUT=s3://mybucket/outputdata/, $INPUT=s3://mybucket/inputdata/, $OUTPUT=s3://mybucket/outputdata/]
  • 16. AWS ToolKit - Amazon S3 (Simple Storage Service) Virtually infinite storage. Single object size up to 5TB. Why use S3? Durable, Low Cost, Scalable, High Performance, Secure, Integrated, Easy to Use. Decouple storage and computation resources. HDFS requirements and implements EMRFS.
  • 17. AWS ToolKit - Amazon Redshift Fast, simple petabyte-scale data warehouse. Use SQL query to interact. Massively parallel. Relational. Architecture - Leader Node and Compute node. Fast - 4 GB/sec/node.
  • 18. Case Study- Rail RNA Cloud-enabled spliced aligner that analyzes many samples at once. Architecture - Amazon S3, Amazon EMR. ~50000 (from NCBI archive) human RNA sample using Rail-RNA - 150 Tbps. Input to result - 2 weeks. Cost- ~$1.40/sample. Paper- Splicing across SRA.