SlideShare una empresa de Scribd logo
Embarrassingly Parallel
Problems
CS5225 Parallel and Concurrent Programming
Dilum Bandara
Dilum.Bandara@uom.lk
Some slides adapted from Dr. Srinath Perera
Embarrassingly Parallel Problems
 A.k.a. Delightfully Parallel Problems
 Can be easily parallelizable
 Usually use simple communication patterns
 Usually work without much communication
among each other
 Map-Reduce programming model provides a
powerful abstraction to handle embarrassingly
parallel problems
2
Map-Reduce
 Common pattern to solve parallel problems
 Based on 2 constructs from functional programming,
map & reduce
 Introduced by Google
 Dean et. al., “MapReduce: Simplified Data Processing
on Large Clusters,” OSDI, 2004
 Extensible for different applications
 Scale to very large number of nodes
 Hide details like failures from users
3
High-Order Functions
 Programming languages (e.g., Java) pass data
as parameters & results of functions
 Higher-order functions pass both data as well as
functions as parameters or results of functions
 E.g., Python, Ruby, JavaScript
 For example
def f(x):
return x + 3
def g(function, x):
return function(x) * function(x)
print g(f, 7) 4
Map-Reduce
 Accepts 2 functions as inputs
1. Map function
 Y fn1(X)
 Accepts input X & outputs another Y
2. Reduce function
 Z fn2(List<Y>)
 Accepts array of Y’s & returns another output Z
5
Map-Reduce (Contd.)
 Map-reduce support is provided by a function
like following
 Y map-reduce(mapfn, reducefn, List<X>)
 Map reduce implementation takes list of inputs
(list) & does following
 Apply map function to each entry in the list, which
emit (key, value) pairs
 Collect results, group them by keys, & then pass them
to reduce function as array
6
Map-Reduce (Contd.)
7
Source: www.datasciencecentral.com/profiles/blogs/practical-
illustration-of-map-reduce-hadoop-style-on-real-data
Map-Reduce for Word Counting
8
Source: http://xiaochongzhang.me/blog/?p=338
How to do this for a large dataset using a distributed system?
In Class Activity
1. Card sorting
2. Card sorting with 2 rounds
3. Identify missing cards
9
Inspired by Marcio Silva's “The MapReduce Card Game” at
http://blog.marciosilva.com/2012/10/the-mapreduce-card-game.html
Why Map-Reduce?
 Implementing same pattern in a distributed
system isn’t that easy
 Need to worry about communication, failures,
initialization, etc.
 MapReduce frameworks worry about all those
 You write map & reduce functions & call
framework
 It forces you to think parallel in design time
 It gives you a higher-level of abstraction to think in
 It’s very generic, & covers lot of usecases
 See http://wiki.apache.org/hadoop/PoweredBy
10
Map-Reduce Implementations
 Can be implemented in many ways
 In-memory implementation
 Distributed implementation
 Communication by messages
 Communication by file system
 Communication by databases
 Communication Requirements
 Need broadcast & reduce operations only
11
Map-Reduce with Hadoop
 Apache Hadoop is an implementation of Map-
reduce
 Handles all details about distributed execution
 You just have to give Map & Reduce functions
12
Map-Reduce Data Model
13
Source: http://slides.com/bearrito/pittsburgh-nosql-_-mapreduce#/
Map-Reduce Data Model (Cont.)
 Hadoop breaks input data into multiple data items by
new lines & runs map function once for each data item
 When executed, map function outputs (key, value) pairs
 Hadoop collects all (key, value) pairs generated by map
function, sorts them by the key, & groups values with the
same key together into groups
 For each distinct key, Hadoop runs reduce function once
while passing key & list of values for that key as input
 Reduce function outputs (key, value) pairs, & Hadoop
writes them to a file as final result
14
Execution on a Cluster/Cloud
15
Source: www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
MapReduce Execution
16
Source: Dean et. al.,
“MapReduce, OSDI, 2004
Designing Map-Reduce Applications
 You control task granularity by changing no of
map & reduce tasks
 How many map tasks?
 How many reduce tasks?
 Fine Grain  more parallelism  more
communication overhead and vise versa
 Usually frameworks handle load balancing &
failures
 If large number of maps are there, you need a
Combine Function as well
17
Examples
 Sorting
 How to sort an array of 1 million integers using
MapReduce?
 Inverted Index
 Normal index is a mapping from document to terms
 Inverted index is mapping from terms to documents
 If we have a million documents, how do we build a
inverted index using MapReduce?
 Frequency Distribution of Word Occurrences
 Count number of occurrences & build a histogram
18
Examples (Cont.)
 Stitch Imagery
 For Google maps, Google need to combine many
map data into a single set of data
 Business Intelligence
 A business want to create a graph of income
generated by each region & marketing money spend
on each region
19
Examples (Cont.)
 K-Means
 Assume you are given a list of earth quakes
coordinates happened in the world in last 50 years.
 You are asked to use K-Means Clustering algorithm
to find 10 locations around which those earth quakes
were located.
 K-Means starts with 10 random cluster locations.
 It proceeds iteratively, & at each iteration, it assigns each
data point (earth quake) to the closest cluster location
 At end of each iteration, it recalculates each cluster location
using mean of all data point coordinates assigned to that
location
 It stops when cluster locations doesn’t change after
recalculation 20
K-Means Algorithm
List kmeans(datapointsList , initialClustersList){
oldlocations = null;
newLocations = initialClustersList ;
while(oldlocations != newLocations){
for(d in datapointsList){
oldlocations = newLocations ;
newLocations = //recalculate locations
}
//assign d to closest location in newLocations
}
}
return newLocations ;
21

Más contenido relacionado

Similar a Embarrassingly/Delightfully Parallel Problems

Lecture 1 mapreduce
Lecture 1  mapreduceLecture 1  mapreduce
Lecture 1 mapreduce
Shubham Bansal
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
Sri Prasanna
 
Parallel Computing 2007: Overview
Parallel Computing 2007: OverviewParallel Computing 2007: Overview
Parallel Computing 2007: Overview
Geoffrey Fox
 
E031201032036
E031201032036E031201032036
E031201032036
ijceronline
 
Big data
Big dataBig data
Big data
rajsandhu1989
 
Map reduce
Map reduceMap reduce
Map reduce
xydii
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreading
coolmirza143
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
Jyotirmoy Dey
 
B04 06 0918
B04 06 0918B04 06 0918
MAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHAREMAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHARE
dharanis15
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
ivascucristian
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
ijwscjournal
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
Robert Grossman
 
Hadoop
HadoopHadoop
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
ijsrd.com
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Yahoo Developer Network
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
sreehari orienit
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
Abhishek Singh
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentation
Noha Elprince
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.com
softwarequery
 

Similar a Embarrassingly/Delightfully Parallel Problems (20)

Lecture 1 mapreduce
Lecture 1  mapreduceLecture 1  mapreduce
Lecture 1 mapreduce
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
Parallel Computing 2007: Overview
Parallel Computing 2007: OverviewParallel Computing 2007: Overview
Parallel Computing 2007: Overview
 
E031201032036
E031201032036E031201032036
E031201032036
 
Big data
Big dataBig data
Big data
 
Map reduce
Map reduceMap reduce
Map reduce
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreading
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
MAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHAREMAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHARE
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
Hadoop
HadoopHadoop
Hadoop
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentation
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.com
 

Más de Dilum Bandara

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Dilum Bandara
 
Time Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeTime Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in Practice
Dilum Bandara
 
Introduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCAIntroduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCA
Dilum Bandara
 
Introduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsIntroduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive Analytics
Dilum Bandara
 
Introduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresIntroduction to Concurrent Data Structures
Introduction to Concurrent Data Structures
Dilum Bandara
 
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixHard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Dilum Bandara
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with Hadoop
Dilum Bandara
 
Introduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersIntroduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale Computers
Dilum Bandara
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level Parallelism
Dilum Bandara
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching Techniques
Dilum Bandara
 
Data-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsData-Level Parallelism in Microprocessors
Data-Level Parallelism in Microprocessors
Dilum Bandara
 
Instruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesInstruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware Techniques
Dilum Bandara
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler Techniques
Dilum Bandara
 
CPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionCPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An Introduction
Dilum Bandara
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
Dilum Bandara
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCP
Dilum Bandara
 
Introduction to Content Delivery Networks
Introduction to Content Delivery NetworksIntroduction to Content Delivery Networks
Introduction to Content Delivery Networks
Dilum Bandara
 
Peer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingPeer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and Streaming
Dilum Bandara
 
Mobile Services
Mobile ServicesMobile Services
Mobile Services
Dilum Bandara
 
Wired Broadband Communication
Wired Broadband CommunicationWired Broadband Communication
Wired Broadband Communication
Dilum Bandara
 

Más de Dilum Bandara (20)

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Time Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeTime Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in Practice
 
Introduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCAIntroduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCA
 
Introduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsIntroduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive Analytics
 
Introduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresIntroduction to Concurrent Data Structures
Introduction to Concurrent Data Structures
 
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixHard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with Hadoop
 
Introduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersIntroduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale Computers
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level Parallelism
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching Techniques
 
Data-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsData-Level Parallelism in Microprocessors
Data-Level Parallelism in Microprocessors
 
Instruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesInstruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware Techniques
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler Techniques
 
CPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionCPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An Introduction
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCP
 
Introduction to Content Delivery Networks
Introduction to Content Delivery NetworksIntroduction to Content Delivery Networks
Introduction to Content Delivery Networks
 
Peer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingPeer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and Streaming
 
Mobile Services
Mobile ServicesMobile Services
Mobile Services
 
Wired Broadband Communication
Wired Broadband CommunicationWired Broadband Communication
Wired Broadband Communication
 

Último

Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
jpupo2018
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 

Último (20)

Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 

Embarrassingly/Delightfully Parallel Problems

  • 1. Embarrassingly Parallel Problems CS5225 Parallel and Concurrent Programming Dilum Bandara Dilum.Bandara@uom.lk Some slides adapted from Dr. Srinath Perera
  • 2. Embarrassingly Parallel Problems  A.k.a. Delightfully Parallel Problems  Can be easily parallelizable  Usually use simple communication patterns  Usually work without much communication among each other  Map-Reduce programming model provides a powerful abstraction to handle embarrassingly parallel problems 2
  • 3. Map-Reduce  Common pattern to solve parallel problems  Based on 2 constructs from functional programming, map & reduce  Introduced by Google  Dean et. al., “MapReduce: Simplified Data Processing on Large Clusters,” OSDI, 2004  Extensible for different applications  Scale to very large number of nodes  Hide details like failures from users 3
  • 4. High-Order Functions  Programming languages (e.g., Java) pass data as parameters & results of functions  Higher-order functions pass both data as well as functions as parameters or results of functions  E.g., Python, Ruby, JavaScript  For example def f(x): return x + 3 def g(function, x): return function(x) * function(x) print g(f, 7) 4
  • 5. Map-Reduce  Accepts 2 functions as inputs 1. Map function  Y fn1(X)  Accepts input X & outputs another Y 2. Reduce function  Z fn2(List<Y>)  Accepts array of Y’s & returns another output Z 5
  • 6. Map-Reduce (Contd.)  Map-reduce support is provided by a function like following  Y map-reduce(mapfn, reducefn, List<X>)  Map reduce implementation takes list of inputs (list) & does following  Apply map function to each entry in the list, which emit (key, value) pairs  Collect results, group them by keys, & then pass them to reduce function as array 6
  • 8. Map-Reduce for Word Counting 8 Source: http://xiaochongzhang.me/blog/?p=338 How to do this for a large dataset using a distributed system?
  • 9. In Class Activity 1. Card sorting 2. Card sorting with 2 rounds 3. Identify missing cards 9 Inspired by Marcio Silva's “The MapReduce Card Game” at http://blog.marciosilva.com/2012/10/the-mapreduce-card-game.html
  • 10. Why Map-Reduce?  Implementing same pattern in a distributed system isn’t that easy  Need to worry about communication, failures, initialization, etc.  MapReduce frameworks worry about all those  You write map & reduce functions & call framework  It forces you to think parallel in design time  It gives you a higher-level of abstraction to think in  It’s very generic, & covers lot of usecases  See http://wiki.apache.org/hadoop/PoweredBy 10
  • 11. Map-Reduce Implementations  Can be implemented in many ways  In-memory implementation  Distributed implementation  Communication by messages  Communication by file system  Communication by databases  Communication Requirements  Need broadcast & reduce operations only 11
  • 12. Map-Reduce with Hadoop  Apache Hadoop is an implementation of Map- reduce  Handles all details about distributed execution  You just have to give Map & Reduce functions 12
  • 13. Map-Reduce Data Model 13 Source: http://slides.com/bearrito/pittsburgh-nosql-_-mapreduce#/
  • 14. Map-Reduce Data Model (Cont.)  Hadoop breaks input data into multiple data items by new lines & runs map function once for each data item  When executed, map function outputs (key, value) pairs  Hadoop collects all (key, value) pairs generated by map function, sorts them by the key, & groups values with the same key together into groups  For each distinct key, Hadoop runs reduce function once while passing key & list of values for that key as input  Reduce function outputs (key, value) pairs, & Hadoop writes them to a file as final result 14
  • 15. Execution on a Cluster/Cloud 15 Source: www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
  • 16. MapReduce Execution 16 Source: Dean et. al., “MapReduce, OSDI, 2004
  • 17. Designing Map-Reduce Applications  You control task granularity by changing no of map & reduce tasks  How many map tasks?  How many reduce tasks?  Fine Grain  more parallelism  more communication overhead and vise versa  Usually frameworks handle load balancing & failures  If large number of maps are there, you need a Combine Function as well 17
  • 18. Examples  Sorting  How to sort an array of 1 million integers using MapReduce?  Inverted Index  Normal index is a mapping from document to terms  Inverted index is mapping from terms to documents  If we have a million documents, how do we build a inverted index using MapReduce?  Frequency Distribution of Word Occurrences  Count number of occurrences & build a histogram 18
  • 19. Examples (Cont.)  Stitch Imagery  For Google maps, Google need to combine many map data into a single set of data  Business Intelligence  A business want to create a graph of income generated by each region & marketing money spend on each region 19
  • 20. Examples (Cont.)  K-Means  Assume you are given a list of earth quakes coordinates happened in the world in last 50 years.  You are asked to use K-Means Clustering algorithm to find 10 locations around which those earth quakes were located.  K-Means starts with 10 random cluster locations.  It proceeds iteratively, & at each iteration, it assigns each data point (earth quake) to the closest cluster location  At end of each iteration, it recalculates each cluster location using mean of all data point coordinates assigned to that location  It stops when cluster locations doesn’t change after recalculation 20
  • 21. K-Means Algorithm List kmeans(datapointsList , initialClustersList){ oldlocations = null; newLocations = initialClustersList ; while(oldlocations != newLocations){ for(d in datapointsList){ oldlocations = newLocations ; newLocations = //recalculate locations } //assign d to closest location in newLocations } } return newLocations ; 21

Notas del editor

  1. Shovel example