CloudBurst
• CloudBurst: Highly Sensitive Short Read
  Mapping with MapReduce

• New parallel read-mapping algorithm
  optimized for mapping NGS data to the
  human genome and other reference
  genomes

• SNP discovery, genotyping, and personal
  genomics
CloudBurst
• It is modeled after the short read mapping
  program RMAP

• Reports either all alignments or the unambiguous
  best alignment for each read with any number of
  mismatches or differences

• This level of sensitivity could be prohibitively
  time-consuming, but CloudBurst uses the open-source
  Hadoop implementation of MapReduce to
  parallelize execution across multiple compute
  nodes.
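The two reporting modes mentioned above can be illustrated with a minimal sketch. It assumes alignments are already available as hypothetical `(read_id, position, differences)` tuples; the function name and tuple layout are illustrative and are not taken from CloudBurst itself. "Unambiguous best" is interpreted here as: the alignment with the fewest differences, kept only when no other alignment of the same read ties with it.

```python
from collections import defaultdict


def unambiguous_best(alignments):
    """Keep, per read, the single alignment with the fewest differences.

    `alignments` is an iterable of (read_id, position, differences) tuples
    (a hypothetical layout chosen for this sketch). Reads whose best score
    is shared by two or more alignments are dropped as ambiguous.
    """
    by_read = defaultdict(list)
    for read_id, position, differences in alignments:
        by_read[read_id].append((differences, position))

    best = {}
    for read_id, hits in by_read.items():
        hits.sort()  # fewest differences first
        if len(hits) == 1 or hits[0][0] < hits[1][0]:
            differences, position = hits[0]
            best[read_id] = (position, differences)
    return best


all_alignments = [
    ("r1", 100, 0), ("r1", 540, 2),  # r1: one clear winner
    ("r2", 30, 1), ("r2", 77, 1),    # r2: tie, so ambiguous
    ("r3", 12, 3),                   # r3: only one alignment
]
best_only = unambiguous_best(all_alignments)
```

Reporting all alignments corresponds to returning `all_alignments` unchanged; the filter only applies in the best-alignment mode.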
CloudBurst
• Running time
  – scales linearly with the number of reads mapped
  – with near linear speedup as the number of
    processors increases.


• CloudBurst reduces the running time from
  hours to mere minutes for typical jobs
  involving mapping of millions of short reads to
  the human genome.
Algorithm Overview
• CloudBurst uses seed-and-extend algorithms to
  map reads to a reference genome.

• Seed
  – For an r-bp read to align with at most k differences,
    the alignment must contain a region of length
    s = floor(r / (k+1)), called a seed, that exactly matches
    the reference.

• Extend
  – CloudBurst attempts to extend the alignment into an
    end-to-end alignment with at most k mismatches or
    differences
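The seed condition follows from the pigeonhole principle: if a read is cut into k+1 non-overlapping pieces and contains at most k differences, at least one piece must be free of differences. The sketch below illustrates this for mismatches only (no insertions or deletions). The helper names and the toy sequences are assumptions made for illustration, not part of CloudBurst.

```python
def seed_length(read_len: int, k: int) -> int:
    """Seed length s = floor(r / (k + 1)) for read length r and at most k differences."""
    if k < 0 or read_len < k + 1:
        raise ValueError("need k >= 0 and read_len >= k + 1")
    return read_len // (k + 1)


def non_overlapping_seeds(read: str, k: int):
    """Return the first k + 1 non-overlapping seeds as (offset, seed) pairs."""
    s = seed_length(len(read), k)
    return [(i * s, read[i * s:(i + 1) * s]) for i in range(k + 1)]


def mutate(seq: str, positions):
    """Introduce a substitution at each given position (toy helper)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Toy 36-bp reference window and a read with exactly 3 mismatches.
reference_window = "ACGT" * 9
k = 3
read = mutate(reference_window, [3, 20, 33])

mismatches = sum(a != b for a, b in zip(read, reference_window))

# Offsets of seeds that still match the reference exactly.
exact_seeds = [
    offset
    for offset, seed in non_overlapping_seeds(read, k)
    if reference_window[offset:offset + len(seed)] == seed
]
```

Here s = 36 // 4 = 9. The three mismatches fall in the seeds starting at offsets 0, 18, and 27, so the seed at offset 9 is guaranteed to survive intact and can anchor the extension step.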
Algorithm Overview
• CloudBurst uses the Hadoop implementation of
  MapReduce to catalog and extend the seeds

• Map phase emits
   – all length-s k-mers from the reference sequences
   – all non-overlapping length-s k-mers from the reads

• Shuffle phase
   – read and reference k-mers are brought together

• Reduce phase
   – the seeds are extended into end-to-end alignments
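The three phases above can be sketched as a single-process simulation. This is not Hadoop code and not CloudBurst's implementation: the function names, the tuple layout, and the toy sequences are all assumptions for illustration. The extension step counts mismatches only (no indels), and all reads are assumed to have the same length.

```python
from collections import defaultdict


def map_reference(ref_id, ref, s):
    """Map: emit every length-s k-mer of the reference with its offset."""
    for i in range(len(ref) - s + 1):
        yield ref[i:i + s], ("ref", ref_id, i)


def map_read(read_id, read, s):
    """Map: emit the non-overlapping length-s k-mers of a read."""
    for i in range(0, len(read) - s + 1, s):
        yield read[i:i + s], ("read", read_id, i)


def shuffle(pairs):
    """Shuffle: group all emitted values by their k-mer key."""
    groups = defaultdict(list)
    for kmer, value in pairs:
        groups[kmer].append(value)
    return groups


def reduce_seed(values, refs, reads, k):
    """Reduce: extend each shared seed into an end-to-end alignment."""
    ref_hits = [v for v in values if v[0] == "ref"]
    read_hits = [v for v in values if v[0] == "read"]
    for _, ref_id, ref_off in ref_hits:
        for _, read_id, read_off in read_hits:
            start = ref_off - read_off  # where the read would begin
            read, ref = reads[read_id], refs[ref_id]
            if start < 0 or start + len(read) > len(ref):
                continue
            window = ref[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= k:
                yield read_id, ref_id, start, mismatches


def align(refs, reads, k):
    read_len = len(next(iter(reads.values())))
    s = read_len // (k + 1)  # seed length
    pairs = []
    for ref_id, ref in refs.items():
        pairs.extend(map_reference(ref_id, ref, s))
    for read_id, read in reads.items():
        pairs.extend(map_read(read_id, read, s))
    found = set()  # a set drops alignments reached via several seeds
    for values in shuffle(pairs).values():
        found.update(reduce_seed(values, refs, reads, k))
    return sorted(found)


refs = {"chr": "TTGACCATGCAAGTCCGATA"}
reads = {
    "r1": "CCGTGCAAGTCC",  # chr[4:16] with one substitution
    "r2": "GGGGGGGGGGGG",  # shares no seed with the reference
}
alignments = align(refs, reads, k=1)
```

With k = 1 the seed length is 6. The second seed of `r1` matches the reference exactly at offset 10, which places the read at position 4 with one mismatch; `r2` never meets a reference k-mer in the shuffle, so no extension is attempted for it. Grouping by k-mer is what lets each reducer work independently, which is the property that allows the real job to be spread over many nodes.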
Algorithm Overview
Demo

See Getting Started.docx
Related Tools
• Bowtie: Ultrafast short read alignment
• SOAPsnp: Accurate SNP/consensus calling
• TopHat: RNA-Seq splice junction mapper
• Cufflinks: Isoform assembly, quantitation
• Hadoop: Open-source MapReduce
• CloudBurst: Sensitive MapReduce alignment
• Crossbow: Read mapping and SNP calling in the clouds
• Jnomics: Cloud-scale sequence analysis
• Contrail: Cloud-based de novo assembly
• Myrna: Cloud-scale differential expression of RNA-Seq
Q&A
Figure 1: A MapReduce approach for detecting genetic variants from high-throughput genome sequencing.

Source: http://www.nature.com/nbt/journal/v30/n3/fig_tab/nbt.2134_F1.html

Highly Sensitive Cloud-Based Read Mapping with CloudBurst
