Optimal Chain Matrix Multiplication: A Big Data Perspective
Presented By
Pollab Kumar Roy
pollabroy.242@gmail.com
STUDY AND REPORT
Presentation Outline
➢ Introduction
➢ Big Data Overview
• Definition
• The three Vs
• Applications
➢ Introduction to Hadoop
• Architecture
• How it works
• Advantages
➢ MapReduce
• What is MapReduce?
• The Algorithm
• Example Scenario
➢ HDFS
➢ Matrix Multiplication
➢ Multi-Way Join
➢ Proposed Work
➢ Conclusions
Dept. of ICT, MBSTU
Introduction
Matrix multiplication is widely used in graph algorithms, such as those that compute the transitive closure. MapReduce is well suited to implementing multi-way join operations on very large graphs and matrices.
➢ In this presentation we give an overview of Big Data, show how matrix multiplication is represented in a database, and examine parallel multi-way matrix joins in a database, with their benefits and limitations.
➢ We also propose a way to bring chain multiplication closer to optimal by using a raw join key.
Big Data Overview
Big data is a term that refers to data sets whose size, complexity, and rate of growth make them difficult to capture, manage, and process with conventional technologies.
➢ Big Data sources:
• Stock exchange data
• Social media data
• Black box data
Volume
The amount of data produced up to 2003 was 5 billion GB; the same amount was generated every two days in 2011 and every ten minutes in 2013.
Variety
Structured: relational data.
Semi-structured: XML data.
Unstructured: Word, PDF, text, media logs.
Velocity
Velocity refers to the pace at which data flows in from its sources, including human interaction.
Fig : The three dimensions of Big Data
Big Data Application Segments
• Analytics: predictive modeling, decision processing, behavior analysis, demographics
• Data warehouse, hosting, digitization/archive, backup
• Web 2.0
• Engineering collaboration: design optimization, process flow, fluid dynamics, 3D modeling
Introduction to Hadoop
➢ Hadoop: an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.
➢ The name comes from a toy elephant belonging to Doug Cutting's son.
➢ Hadoop architecture: two major layers.
• Processing layer: MapReduce
• Storage layer: Hadoop Distributed File System (HDFS)
Fig : Hadoop architecture: MapReduce (distributed computation), HDFS (distributed storage), YARN framework, Common utilities
Introduction to Hadoop (cont.)
➢ How Hadoop works: core tasks performed across a cluster of computers
• Data is divided into directories and files, and files are divided into fixed-size blocks (128 MB or 64 MB).
• The files are then distributed across the cluster nodes.
• HDFS supervises the processing.
• Blocks are replicated to tolerate hardware failure.
• Data are sorted between the map and reduce stages.
• The sorted data are sent to the specific node that will reduce them.
➢ Advantages:
• A low-cost alternative to building bigger servers.
• Fault tolerance and high availability.
• Dynamic clustering.
• Automatic data distribution.
• Open source.
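The block-splitting and replication steps listed under "How Hadoop works" reduce to simple arithmetic. The following is a minimal sketch, assuming a 128 MB block size and the HDFS default replication factor of 3 (in a real cluster both are configurable, through `dfs.blocksize` and `dfs.replication`); the function names are hypothetical and only illustrate the idea.

```python
# Assumed values: 128 MB blocks (the slide also mentions 64 MB) and a
# replication factor of 3. Both are cluster settings, not constants.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the sizes (in MB) of the blocks a file of the given size occupies.

    Every block is full except possibly the last one, which only holds
    the remaining bytes.
    """
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def raw_storage_mb(file_size_mb: int, replication: int = REPLICATION) -> int:
    """Total disk space used across the cluster once every block is replicated."""
    return file_size_mb * replication


if __name__ == "__main__":
    print(split_into_blocks(300))  # [128, 128, 44]
    print(raw_storage_mb(300))     # 900
```

A 300 MB file therefore becomes three blocks, and with three replicas each it consumes about 900 MB of raw cluster storage.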
MapReduce
➢ What is MapReduce: a processing technique and programming model for distributed computing; Hadoop's implementation is written in Java.
• Mapper
• Shuffle
• Reducer
• Java based
• Key-value pairs
MapReduce (cont.)
➢ The algorithm:
Fig : The MapReduce algorithm (mapper and reducer stages exchanging key-value pairs)
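Since the algorithm slide survives only as a figure, the mapper, shuffle, and reducer stages can be sketched as a single-process simulation. This is not Hadoop's Java API: `run_mapreduce` and its parameters are hypothetical names, and a real job runs mappers and reducers in parallel on different nodes.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable, Tuple

KV = Tuple[Hashable, Any]


def run_mapreduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[KV]],
    reducer: Callable[[Hashable, list], Iterable[KV]],
) -> list[KV]:
    """Simulate one MapReduce job in a single process.

    1. Map: each input record becomes zero or more (key, value) pairs.
    2. Shuffle/sort: values are grouped by key and the keys are sorted,
       as the framework does between the map and reduce stages
       (so keys must be mutually comparable).
    3. Reduce: each key with its list of values yields output pairs.
    """
    groups: dict = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)

    output: list[KV] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


if __name__ == "__main__":
    # Sum the values per key.
    readings = [("a", 1), ("b", 2), ("a", 3)]
    totals = run_mapreduce(
        readings,
        mapper=lambda kv: [kv],
        reducer=lambda key, values: [(key, sum(values))],
    )
    print(totals)  # [('a', 4), ('b', 2)]
```

The key-value pair is the only contract between the stages, which is what lets the framework move data between machines without knowing what the mapper and reducer compute.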
MapReduce (cont.)
➢ Word count example:
• Input: four lines, "Apple Orange Mango", "Orange Grapes Plum", "Apple Plum Mango", "Apple Apple Plum"
• Splitting: each line of the input file goes to an individual mapper
• Mapping: each mapper produces key-value pairs: (Apple,1) (Orange,1) (Mango,1) | (Orange,1) (Grapes,1) (Plum,1) | (Apple,1) (Plum,1) (Mango,1) | (Apple,1) (Apple,1) (Plum,1)
• Sorting and shuffling: Apple (1,1,1,1) | Grapes (1) | Mango (1,1) | Orange (1,1) | Plum (1,1,1)
• Reducing, final output: Apple,4 Grapes,1 Mango,2 Orange,2 Plum,3
Fig : Word count data flow
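The word count data flow can be reproduced with plain Python functions, one per stage. This is a sketch for illustration only: the function names are hypothetical, and each "mapper" here is just a function call rather than a separate Hadoop task.

```python
from collections import defaultdict

# The four input lines from the slide; each line goes to its own mapper.
LINES = [
    "Apple Orange Mango",
    "Orange Grapes Plum",
    "Apple Plum Mango",
    "Apple Apple Plum",
]


def map_line(line: str) -> list[tuple[str, int]]:
    """Mapping stage: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle/sort stage: collect the 1s for each word, ordered by word."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return dict(sorted(groups.items()))


def reduce_counts(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducing stage: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: list[str]) -> dict[str, int]:
    return reduce_counts(shuffle([map_line(line) for line in lines]))


if __name__ == "__main__":
    print(word_count(LINES))
    # {'Apple': 4, 'Grapes': 1, 'Mango': 2, 'Orange': 2, 'Plum': 3}
```

The printed totals match the final output shown on the slide.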
Hadoop Distributed File System (HDFS)
➢ HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
➢ Features:
• Suitable for distributed storage and processing
• Name Node
• Data Node
• Command interface for interacting with HDFS
• Streaming access to file system data
• Cluster status checks
Hadoop Distributed File System (cont.)
➢ Architecture: Data Node, Name Node, Block
Fig : HDFS architecture: the Name Node keeps metadata (name, replicas…, e.g. /home/foo/data, 3…); a client reads blocks from Data Nodes in Rack 1 and Rack 2, and blocks are replicated between Data Nodes
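The rack layout in the architecture figure matters because replicas are placed with racks in mind. The sketch below follows the default rule described in the HDFS architecture documentation for a replication factor of 3 when the writer runs on a Data Node: the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. It is a simplification under stated assumptions: the cluster layout and node names are hypothetical, and the real Name Node also weighs load, free space, and other factors.

```python
import random

# Hypothetical toy cluster: two racks with three Data Nodes each.
CLUSTER = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}


def rack_of(node: str, cluster: dict[str, list[str]]) -> str:
    """Return the rack that contains the given Data Node."""
    for rack, nodes in cluster.items():
        if node in nodes:
            return rack
    raise ValueError(f"unknown Data Node: {node}")


def place_replicas(writer: str, cluster: dict[str, list[str]], rng: random.Random) -> list[str]:
    """Choose three Data Nodes for one block (simplified rack-aware rule).

    Needs at least two racks, and at least two nodes in the chosen remote rack.
    """
    local_rack = rack_of(writer, cluster)
    remote_racks = [rack for rack in cluster if rack != local_rack]
    remote_rack = rng.choice(remote_racks)
    # Two distinct nodes from the same remote rack.
    second, third = rng.sample(cluster[remote_rack], 2)
    return [writer, second, third]


if __name__ == "__main__":
    rng = random.Random(42)
    # Name Node style metadata: block id -> Data Nodes holding a replica.
    block_map = {f"blk_{i}": place_replicas("dn1", CLUSTER, rng) for i in range(3)}
    for block, nodes in block_map.items():
        print(block, nodes)
```

Keeping two replicas in one remote rack limits cross-rack write traffic, while the copy on a second rack still survives the loss of an entire rack.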
Matrix Multiplication (via Multi-Way Join)
➢ Usage: widely used in many graph algorithms
• Transitive closure
• N-hop neighbors
➢ Join operation:
• Matrices A [p×q] and B [q×r]
• C [p×r] = A × B
• Each (i,k)-th element of C is $C_{ik} = \sum_{j=1}^{q} A_{ij} \times B_{jk}$
• A and B are represented in the database by relations R1 and R2 with attributes {row, col, val}
• A × B in terms of SQL:
SELECT R1.row, R2.col, SUM(R1.val * R2.val)
FROM R1, R2
WHERE R1.col = R2.row
GROUP BY R1.row, R2.col
Fig : Social network (users User_1 to User_7)
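The SQL query for A × B can be mirrored in a few lines of Python over (row, col, val) tuples, which makes the join condition and the grouping explicit. This is a sketch: it evaluates the query with a hash join on R1.col = R2.row, which is one possible strategy, not necessarily the plan a database would choose, and `multiply_relations` is a hypothetical name. Zero entries are simply absent from the relations, which is why this representation suits sparse graphs such as the social network in the figure.

```python
from collections import defaultdict

# A relation is a list of (row, col, val) tuples; missing tuples mean 0.
Relation = list[tuple[int, int, float]]


def multiply_relations(r1: Relation, r2: Relation) -> dict[tuple[int, int], float]:
    """Evaluate
        SELECT R1.row, R2.col, SUM(R1.val * R2.val)
        FROM R1, R2 WHERE R1.col = R2.row
        GROUP BY R1.row, R2.col
    using a hash join on the join attribute."""
    # Build side: index the R2 tuples by R2.row.
    r2_by_row: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for row, col, val in r2:
        r2_by_row[row].append((col, val))

    # Probe side: each R1 tuple meets every R2 tuple with R2.row == R1.col.
    result: dict[tuple[int, int], float] = defaultdict(float)
    for i, j, a in r1:
        for k, b in r2_by_row.get(j, []):
            result[(i, k)] += a * b  # GROUP BY (i, k) with SUM
    return dict(result)


if __name__ == "__main__":
    # Hypothetical directed graph: 1->2, 2->3, 3->1, 2->4.
    edges = [(1, 2, 1), (2, 3, 1), (3, 1, 1), (2, 4, 1)]
    # A x A lists the 2-hop neighbors of every node.
    print(multiply_relations(edges, edges))
    # {(1, 3): 1.0, (1, 4): 1.0, (2, 1): 1.0, (3, 2): 1.0}
```

Multiplying an adjacency matrix by itself in this way is exactly how the N-hop neighbor use case is served: each non-zero (i, k) entry says node k is reachable from node i in two hops.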
Matrix Multiplication (cont.)
Fig : Database representation
Matrix Multiplication (cont.)
➢ Chain-way join:
• Eq. (1): the typical method, serial two-way join (S2). Each two-way join runs as a separate MR job, so only intra-operation parallelism is exploited.
• Eq. (2): parallel two-way join (P2). Independent two-way joins run simultaneously, adding inter-operation parallelism.
• Eq. (3): parallel m-way join (PM). m matrices are joined in a single MR job.
$(((A \times B) \times C) \times D)$   (1)
$((A \times B) \times (C \times D))$   (2)
$(A \times B \times C \times D)$   (3)
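The three evaluation orders differ only in how operands are grouped, so their plans and the number of MR job iterations can be generated symbolically. This is a sketch under the assumption that one iteration is one round of MR jobs and that all groups within a round run at the same time; `s2_plan` and `grouped_plan` are hypothetical names. With m = 2 the grouped plan is P2, and with m > 2 it is PM.

```python
def s2_plan(names: list[str]) -> tuple[str, int]:
    """Serial two-way join (S2): left-deep plan, one MR job per join."""
    expr, jobs = names[0], 0
    for name in names[1:]:
        expr = f"({expr} * {name})"
        jobs += 1
    return expr, jobs


def grouped_plan(names: list[str], m: int) -> tuple[str, int]:
    """Join up to m neighbouring operands per MR iteration.

    All groups of one iteration are assumed to run simultaneously.
    m = 2 gives P2; m > 2 gives PM.
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    level, iterations = list(names), 0
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), m):
            group = level[start:start + m]
            # A lone leftover operand is carried to the next iteration unchanged.
            next_level.append(group[0] if len(group) == 1 else "(" + " * ".join(group) + ")")
        level = next_level
        iterations += 1
    return level[0], iterations


if __name__ == "__main__":
    print(s2_plan(list("ABCD")))          # ('(((A * B) * C) * D)', 3)
    print(grouped_plan(list("ABCD"), 2))  # ('((A * B) * (C * D))', 2)
    print(grouped_plan(list("ABCD"), 4))  # ('(A * B * C * D)', 1)
```

For five matrices the same functions give 4 iterations for S2, 3 for P2, and 2 for PM with m = 3, matching the counts quoted for the parallel m-way join.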
Matrix Multiplication (cont.)
➢ Parallel m-way join: number of MR job iterations for n = 5 matrices
• S2: n − 1 = 4
• P2: $\lceil \log_2 n \rceil = 3$
• PM (m = 3): $\lceil \log_m n \rceil = 2$
Input: relations M1, M2, …, Mn representing matrices; join width m (m ≥ 2)
LIST_Mnext ← M1, M2, …, Mn
while |LIST_Mnext| > 1 do
    LIST_Mleft ← empty; LIST_Mright ← empty
    for i = 1 to |LIST_Mnext| do
        if (i mod m) == 1 then
            add Mi to LIST_Mleft
            Mleft ← Mi
        else
            add Mi to LIST_Mright(Mleft)
        end if
    end for
    LIST_Mnext ← doMR-PM(LIST_Mleft, LIST_Mright)
end while
Fig : Algorithm for PM join (Mi denotes the i-th relation in LIST_Mnext)
Fig : Example of a parallel 3-way join with n = 5: the 1st MR job computes M1 × M2 × M3 and M4 × M5; the 2nd MR job joins these two intermediate results (labeled M1 and M4) to produce the result
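The PM loop can be exercised in a single process to check that grouping preserves the product and that the number of iterations equals $\lceil \log_m n \rceil$. This is a sketch under stated assumptions: `do_mr_pm` is a hypothetical stand-in for the slide's doMR-PM that simply multiplies each group in memory (a real PM job performs the m-way join inside MapReduce), and the matrices are small dense integer lists.

```python
from functools import reduce

Matrix = list[list[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense product; stands in for the work done inside one MR job."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b))) for k in range(len(b[0]))]
            for i in range(len(a))]


def do_mr_pm(lefts: list[Matrix], rights: list[list[Matrix]]) -> list[Matrix]:
    """Hypothetical doMR-PM: one job that multiplies every group
    (a left operand followed by its right operands) in parallel."""
    return [reduce(matmul, rights[g], lefts[g]) for g in range(len(lefts))]


def parallel_m_way(matrices: list[Matrix], m: int) -> tuple[Matrix, int]:
    """Follow the PM-join loop; return (product, number of MR jobs)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    next_list, jobs = list(matrices), 0
    while len(next_list) > 1:
        lefts, rights = [], []          # reset both lists every iteration
        for i, mat in enumerate(next_list, start=1):  # 1-based, as in the pseudocode
            if i % m == 1:
                lefts.append(mat)       # mat becomes the new Mleft
                rights.append([])
            else:
                rights[-1].append(mat)  # add to LIST_Mright(Mleft)
        next_list = do_mr_pm(lefts, rights)
        jobs += 1
    return next_list[0], jobs


def ceil_log(n: int, m: int) -> int:
    """Smallest t with m**t >= n, i.e. ceil(log_m n) without floating point."""
    t, reach = 0, 1
    while reach < n:
        reach *= m
        t += 1
    return t


EXAMPLE = [
    [[1, 1], [0, 1]],
    [[2, 0], [0, 1]],
    [[1, 0], [1, 1]],
    [[0, 1], [1, 0]],
    [[1, 2], [3, 4]],
]

if __name__ == "__main__":
    product, jobs = parallel_m_way(EXAMPLE, 3)
    print(jobs)                                 # 2
    print(product == reduce(matmul, EXAMPLE))   # True
```

Because groups are always contiguous and kept in order, only associativity is used, never commutativity, so the grouped evaluation yields the same matrix as the left-to-right chain.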
Matrix Multiplication (cont.)
➢ Efficiency of m-way join:
• Fewer MR job iterations
• Shorter execution time
➢ Limitations:
• Limited number of distinct join keys
• Greater network and sorting overhead
Fig : PM Join key
Future study and Proposed Work
➢ Future study:
• Amazon EC2
• Apache Whirr tools
• Converting larger graph datasets to matrices
• Hadoop and further papers
➢ Proposed work:
• PM with a raw (non-composite) join key.
• This change should reduce the number of duplicated records and increase the diversity of join keys.
• A MapReduce framework that does not perform sort operations in the mappers.
Conclusion
In this presentation, I explained how matrix multiplication can be expressed as a multi-way join operation, and described three types of algorithms: S2, P2, and PM.
A parallel m-way join can improve the performance of the matrix chain multiplication process.
However, using a composite key introduces a number of disadvantages, such as greater network and sorting overhead.
Finally, I proposed a parallel m-way join with a raw join key to bring the chain multiplication closer to optimal.

Más contenido relacionado

La actualidad más candente

Parallel sorting algorithm
Parallel sorting algorithmParallel sorting algorithm
Parallel sorting algorithmRicha Kumari
 
Matrix Multiplication(An example of concurrent programming)
Matrix Multiplication(An example of concurrent programming)Matrix Multiplication(An example of concurrent programming)
Matrix Multiplication(An example of concurrent programming)Pramit Kumar
 
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault ToleranceParallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault ToleranceUniversity of Technology - Iraq
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)swapnac12
 
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMDESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMsipij
 

La actualidad más candente (20)

Parallel sorting algorithm
Parallel sorting algorithmParallel sorting algorithm
Parallel sorting algorithm
 
Matrix Multiplication Report
Matrix Multiplication ReportMatrix Multiplication Report
Matrix Multiplication Report
 
Matrix Multiplication(An example of concurrent programming)
Matrix Multiplication(An example of concurrent programming)Matrix Multiplication(An example of concurrent programming)
Matrix Multiplication(An example of concurrent programming)
 
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault ToleranceParallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Chap12 slides
Chap12 slidesChap12 slides
Chap12 slides
 
Signal Processing Assignment Help
Signal Processing Assignment HelpSignal Processing Assignment Help
Signal Processing Assignment Help
 
Chap10 slides
Chap10 slidesChap10 slides
Chap10 slides
 
Chap9 slides
Chap9 slidesChap9 slides
Chap9 slides
 
Digital Signal Processing Assignment Help
Digital Signal Processing Assignment HelpDigital Signal Processing Assignment Help
Digital Signal Processing Assignment Help
 
Computer Science Assignment Help
Computer Science Assignment Help Computer Science Assignment Help
Computer Science Assignment Help
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
 
Chap11 slides
Chap11 slidesChap11 slides
Chap11 slides
 
Ijetr042170
Ijetr042170Ijetr042170
Ijetr042170
 
Digital Signal Processing Homework Help
Digital Signal Processing Homework HelpDigital Signal Processing Homework Help
Digital Signal Processing Homework Help
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Environmental Engineering Assignment Help
Environmental Engineering Assignment HelpEnvironmental Engineering Assignment Help
Environmental Engineering Assignment Help
 
MATLAB
MATLABMATLAB
MATLAB
 
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMDESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM
 

Similar a Optimal Chain Matrix Multiplication Big Data Perspective

Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfTSANKARARAO
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...cscpconf
 
MAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine LearningMAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine LearningGianvito Siciliano
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...miyurud
 
Parallel Machine Learning
Parallel Machine LearningParallel Machine Learning
Parallel Machine LearningJanani C
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processingjins0618
 
Shortest path estimation for graph
Shortest path estimation for graphShortest path estimation for graph
Shortest path estimation for graphijdms
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Derryck Lamptey, MPhil, CISSP
 
Jovian DATA: A multidimensional database for the cloud
Jovian DATA: A multidimensional database for the cloudJovian DATA: A multidimensional database for the cloud
Jovian DATA: A multidimensional database for the cloudBharat Rane
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467IJRAT
 
High Performance Computing for Satellite Image Processing and Analyzing – A ...
High Performance Computing for Satellite Image  Processing and Analyzing – A ...High Performance Computing for Satellite Image  Processing and Analyzing – A ...
High Performance Computing for Satellite Image Processing and Analyzing – A ...Editor IJCATR
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
FinalprojectpresentationSANTOSH WAYAL
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET Journal
 

Similar a Optimal Chain Matrix Multiplication Big Data Perspective (20)

Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
 
MAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine LearningMAD skills for analysis and big data Machine Learning
MAD skills for analysis and big data Machine Learning
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
 
Parallel Machine Learning
Parallel Machine LearningParallel Machine Learning
Parallel Machine Learning
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
Pregel
PregelPregel
Pregel
 
Shortest path estimation for graph
Shortest path estimation for graphShortest path estimation for graph
Shortest path estimation for graph
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 
Cross cloud map reduce for big data
Cross cloud map reduce for big dataCross cloud map reduce for big data
Cross cloud map reduce for big data
 
Eg4301808811
Eg4301808811Eg4301808811
Eg4301808811
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
 
Jovian DATA: A multidimensional database for the cloud
Jovian DATA: A multidimensional database for the cloudJovian DATA: A multidimensional database for the cloud
Jovian DATA: A multidimensional database for the cloud
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
 
High Performance Computing for Satellite Image Processing and Analyzing – A ...
High Performance Computing for Satellite Image  Processing and Analyzing – A ...High Performance Computing for Satellite Image  Processing and Analyzing – A ...
High Performance Computing for Satellite Image Processing and Analyzing – A ...
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
Finalprojectpresentation
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 
E031201032036
E031201032036E031201032036
E031201032036
 

Último

Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 

Último (20)

Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 

Optimal Chain Matrix Multiplication Big Data Perspective

  • 1. Optimal Chain Matrix Multiplication Big Data Perspective Presented By Pollab Kumar Roy pollabroy.242@gmail.com STUDY AND REPORT
  • 2. Presentation Outline  Introduction  Big Data Overview • Definition • Three V presentation • Application  Introduction to Hadoop • Architecture • How it works • Advantage  MapReduce • What is MapReduce? • The Algorithm • Example Scenario  HDFS  Matrix Multiplication  Multi Way Join  Proposed Work  Conclusions Dept. of ICT, MBSTU 2
  • 3. Introduction Matrix multiplication is widely used for many graph algorithms, such as those that calculate the transitive closure. MapReduce is good to implement multi way join operation for very large graphs and metrices.  In this presentation we will see Big Data overview. Matrix multiplication representation in database. Parallel multi way matrix join in database with benefit and limitation.  And a proposal for making chain multiplication more optimal with raw join key. Dept. of ICT, MBSTU 3
  • 4. Big Data Overview Big data is a term that refers to data sets whose size , complexity, and rate of growth make them difficult to be captured, managed, processed by conventional technologies.  Big Data Source : Dept. of ICT, MBSTU 4 Stock Exchange data Social Media data Black Box data
  • 5. Volume Till 2003 was 5 billion GB. Two days in 2011. Every ten minutes in 2013 Variety Structured: Relational data. Semi Structured: XML data. Unstructured: Word, PDF, Text, Media Logs. Velocity Big Data Velocity deals with the pace at which data flows in from sources and human interaction. The three dimensions of Big Data Dept. of ICT, MBSTU 5
  • 6. Big Data Application Segments Analytics Predictive Modeling Decision Processing Behavior Analysis Demographics Data Warehouse Hosting Digitization/archive Backup Web 2.0 Engineering Collaborating Design Optimization Process Flow Fluid Dynamics 3D Modeling Analytics Predictive Modeling Decision Processing Behavior Analysis Demographics Dept. of ICT, MBSTU 6
  • 7. Introduction to Hadoop  Hadoop: Apache open source framework written in java that allows distributed processing of large datasets across clusters of computers using simple programming models.  Doug Cutting son’s toy.  Hadoop Architecture : Two major layers. • Processing layer : MapReduce • Storage layer : Hadoop Distributed File System Dept. of ICT, MBSTU 7 MapReduce (Distributed Computation) HDFS (Distributed Storage) YARN Framework Common Utilities
  • 8. Introduction to Hadoop (cont.)  How Hadoop works : Core tasks across a cluster of computers • Data dividing into directories and files(128M/64M). • Files are then distributed across various cluster nodes. • HDFS, supervises the processing. • Blocks are replicated. • Performing sort between the map and reduce stages. • Sending the sorted data to a certain computer.  Advantage : • Low-cost alternative to build bigger servers. • Fault-tolerance and high availability. • Dynamic clustering. • Automatic data distribution and open source Dept. of ICT, MBSTU 8
  • 9. MapReduce  What is MapReduce : A processing technique and a program model for distributed computing based on java. • Mapper • Shuffle • Reducer • Java based • Key Value Dept. of ICT, MBSTU 9
  • 10. MapReduce (cont.)  The algorithm: Mapper Reducer Key Value Dept. of ICT, MBSTU 10
  • 11. MapReduce (cont.)  Word Count Example : Dept. of ICT, MBSTU 11 Apple Orange Mango Orange Grapes Plum Apple Orange Mango Orange Grapes Plum Apple Plum Mango Apple Apple Plum Apple Plum Mango Apple Apple Plum Apple,1 Orange ,1 Mango,1 Orange,1 Grapes ,1 Plum,1 Apple,1 Plum ,1 Mango,1 Apple,1 Apple ,1 Plum,1 Apple,1 Apple,1 Apple,1 Apple,1 Grapes ,1 Mango,1 Mango,1 Orange,1 Orange,1 Plum,1 Plum,1 Plum,1 Apple,4 Grapes,1 Mango,2 Orange,2 Plum,3 Apple,4 Grapes,1 Mango,2 Orange,2 Plum,3 input Files each line to individual mapper map key value splitting sort, shuffle Produce key value pairs Final output
  • 12. Hadoop Distributed File System (HDFS). HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework. Features: • Distributed storage and processing • NameNode • DataNode • An interface for interacting with HDFS • Streaming access to file-system data • Cluster status checks
  • 13. Hadoop Distributed File System (cont.). Architecture: DataNode, NameNode, Block. (Figure: the NameNode holds metadata (name, replicas, …), e.g. /home/foo/data, 3, …; the client reads blocks from the DataNodes; blocks are replicated across DataNodes in Rack 1 and Rack 2.)
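The NameNode's role as a pure metadata store can be sketched as a toy Python class. Everything here is hypothetical (`NameNode`, `place_block`, `get_block_locations`, the rack and node names); the placement rule follows the default three-replica policy described in the HDFS architecture documentation (one replica on the writer's node, the other two on different nodes of one remote rack), assuming the writer itself runs on a DataNode.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NameNode:
    """Toy NameNode: holds metadata only, never the block contents."""
    racks: Dict[str, List[str]]                                          # rack -> DataNode names
    files: Dict[str, int] = field(default_factory=dict)                  # path -> replication factor
    block_map: Dict[str, List[List[str]]] = field(default_factory=dict)  # path -> replicas per block

    def rack_of(self, node: str) -> str:
        return next(rack for rack, nodes in self.racks.items() if node in nodes)

    def place_block(self, writer: str) -> List[str]:
        """Pick 3 DataNodes: the writer's own node, then two distinct nodes on one other rack."""
        local_rack = self.rack_of(writer)
        remote_rack = random.choice([r for r in self.racks if r != local_rack])
        second, third = random.sample(self.racks[remote_rack], 2)
        return [writer, second, third]

    def create(self, path: str, num_blocks: int, writer: str) -> None:
        self.files[path] = 3  # replication factor fixed at 3 in this sketch
        self.block_map[path] = [self.place_block(writer) for _ in range(num_blocks)]

    def get_block_locations(self, path: str) -> List[List[str]]:
        """A client asks for these, then reads each block directly from one of its DataNodes."""
        return self.block_map[path]


nn = NameNode(racks={"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]})
nn.create("/home/foo/data", num_blocks=2, writer="dn1")
print(nn.files)  # {'/home/foo/data': 3}
for replicas in nn.get_block_locations("/home/foo/data"):
    print(replicas)  # writer first, then two rack2 nodes (the choice is random)
```

Keeping two replicas on one remote rack limits cross-rack write traffic, while the copy on the other rack still survives the loss of a whole rack.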
  • 14. Matrix Multiplication (via multi-way join). Usage: widely used in many graph algorithms, such as • transitive closure • N-hop neighbors. Join operation: • Matrices A (p×q) and B (q×r) • $C = A \times B$, of size p×r • Each $(i,k)$th element of C is $C_{ik} = \sum_{j=1}^{q} A_{ij} B_{jk}$ • A and B are stored as relations R1 and R2 in a database, each with attributes {row, col, val} • $A \times B$ in terms of SQL: SELECT R1.row, R2.col, SUM(R1.val * R2.val) FROM R1, R2 WHERE R1.col = R2.row GROUP BY R1.row, R2.col. (Figure: a social network of users User_1 to User_7.)
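The SQL query above can be run against an in-memory SQLite database through Python's standard `sqlite3` module. The helper names and the 2×2 example matrices are assumptions for illustration; `row` is double-quoted in the SQL only to be safe about keyword parsing.

```python
import sqlite3
from typing import Dict, List, Tuple


def to_relation(matrix: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Store a matrix as (row, col, val) tuples, 1-based, skipping zeros (sparse form)."""
    return [(i + 1, j + 1, v)
            for i, r in enumerate(matrix)
            for j, v in enumerate(r) if v != 0]


def multiply_via_sql(a: List[List[float]], b: List[List[float]]) -> Dict[Tuple[int, int], float]:
    con = sqlite3.connect(":memory:")
    for name, m in (("R1", a), ("R2", b)):
        con.execute(f'CREATE TABLE {name} ("row" INTEGER, col INTEGER, val REAL)')
        con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", to_relation(m))
    # The join on R1.col = R2.row pairs A_ij with B_jk; GROUP BY sums over j.
    query = '''
        SELECT R1."row", R2.col, SUM(R1.val * R2.val)
        FROM R1, R2
        WHERE R1.col = R2."row"
        GROUP BY R1."row", R2.col
    '''
    result = {(i, k): v for i, k, v in con.execute(query)}
    con.close()
    return result


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(multiply_via_sql(A, B))
```

For these inputs the result corresponds to C = [[19, 22], [43, 50]]. Because zero entries are never stored, any cell of C that receives no nonzero product simply has no row in the output.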
  • 15. Matrix Multiplication (cont.). Figure: database representation.
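  • 16. Matrix Multiplication (cont.). Chain-way join, illustrated for $A \times B \times C \times D$: • Eq. (1), the typical method, is the serial two-way join (S2): $A \times B \times C \times D = ((A \times B) \times C) \times D$. Each two-way join is a separate MR job, so the only parallelism is intra-operation. • Eq. (2) is the parallel two-way join (P2): $A \times B \times C \times D = (A \times B) \times (C \times D)$. $A \times B$ and $C \times D$ are computed simultaneously, giving inter-operation parallelism. • Eq. (3) is the parallel m-way join (PM): $A \times B \times C \times D$ is computed as a single m-way join.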
  • 17. Matrix Multiplication (cont.). Parallel m-way join. Number of MR jobs for the n = 5 example in the figure, with m = 3: • S2: n − 1 = 4 • P2: $\lceil \log_2 n \rceil = 3$ • PM: $\lceil \log_m n \rceil = 2$. Algorithm for PM join:
    Input: relations M1, M2, …, Mn representing matrices
    LIST_Mnext ← M1, M2, …, Mn
    while |LIST_Mnext| > 1 do
      LIST_Mleft ← empty; LIST_Mright ← empty
      for i = 1 to |LIST_Mnext| do   (Mi is the i-th element of LIST_Mnext)
        if (i mod m) == 1 then
          add Mi to LIST_Mleft
          Mleft ← Mi
        else
          add Mi to LIST_Mright(Mleft)
        end if
      end for
      LIST_Mnext ← doMR-PM(LIST_Mleft, LIST_Mright)
    end while
  (Figure: example of a parallel 3-way join on M1 to M5. The 1st MR job joins M1 × M2 × M3 and M4 × M5, keeping the results as M1 and M4; the 2nd MR job joins M1 × M4 to give the result.)
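The PM loop above can be simulated to check the job counts. This sketch assumes small dense Python lists, treats each pass of the `while` loop as one MR job (all groups of a pass run inside that job), and uses hypothetical helper names; it multiplies in memory rather than running MapReduce.

```python
from functools import reduce
from typing import List, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """One two-way join: the ordinary product of two dense matrices."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]


def serial_two_way(ms: List[Matrix]) -> Tuple[Matrix, int]:
    """S2: (((M1 x M2) x M3) x ...), one MR job per two-way join, n - 1 jobs."""
    return reduce(matmul, ms), len(ms) - 1


def parallel_m_way(ms: List[Matrix], m: int) -> Tuple[Matrix, int]:
    """PM (P2 when m == 2), for m >= 2: each pass splits the list into
    consecutive groups of m, mirroring the `i mod m == 1` test (positions
    1, m+1, 2m+1, ... start a new left relation). All groups of a pass count
    as one MR job; a group holding a single matrix passes through unchanged."""
    current, jobs = list(ms), 0
    while len(current) > 1:
        groups = [current[i:i + m] for i in range(0, len(current), m)]
        current = [reduce(matmul, g) for g in groups]
        jobs += 1
    return current[0], jobs


# Five arbitrary 2x2 matrices, matching the n = 5 example in the figure.
ms = [[[1, 2], [0, 1]], [[2, 0], [1, 1]], [[1, 1], [1, 0]],
      [[0, 1], [1, 1]], [[3, 0], [0, 2]]]

s2, s2_jobs = serial_two_way(ms)
p2, p2_jobs = parallel_m_way(ms, 2)
pm, pm_jobs = parallel_m_way(ms, 3)
assert s2 == p2 == pm             # same product: matrix multiplication is associative
print(s2_jobs, p2_jobs, pm_jobs)  # 4 3 2
```

The counts reproduce the slide: S2 needs n − 1 = 4 jobs, P2 needs ⌈log₂ 5⌉ = 3, and PM with m = 3 needs ⌈log₃ 5⌉ = 2, because each pass shrinks the list from c matrices to ⌈c/m⌉.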
  • 18. Matrix Multiplication (cont.). Efficiency of the m-way join: it reduces • the number of MR job iterations • the overall running time. Limitations: • the number of join keys grows, since PM uses a composite join key • greater network and sorting overhead. (Figure: PM join key.)
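The duplication behind the "greater network and sorting overhead" limitation can be made concrete with a sketch of a one-job 3-way chain join $A \times B \times C$. It assumes a composite reduce key $(j, k)$ made of both join attributes, which is one standard way to route a multi-way join in a single MapReduce job but not necessarily the exact scheme behind the slide's figure; all names are hypothetical, and the final per-$(i, l)$ summation is done locally for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rel = List[Tuple[int, int, int]]  # (row, col, val) tuples


def to_rel(matrix: List[List[int]]) -> Rel:
    """Dense matrix -> (row, col, val) tuples, 0-based, zeros skipped."""
    return [(i, j, v) for i, r in enumerate(matrix) for j, v in enumerate(r) if v]


def three_way_join(a: Rel, b: Rel, c: Rel, j_dim: int, k_dim: int):
    """Single-job 3-way chain join A x B x C keyed on the composite (j, k).

    B tuples already carry both j and k; A tuples lack k and C tuples lack j,
    so the mapper copies them to every possible value of the missing attribute.
    Returns the product as {(i, l): value} and the number of emitted pairs.
    """
    groups = defaultdict(lambda: {"A": [], "B": [], "C": []})
    emitted = 0
    # Map phase: route every tuple to the reduce key(s) (j, k) it belongs to.
    for i, j, v in a:
        for k in range(k_dim):
            groups[(j, k)]["A"].append((i, v))
            emitted += 1
    for j, k, v in b:
        groups[(j, k)]["B"].append(v)
        emitted += 1
    for k, l, v in c:
        for j in range(j_dim):
            groups[(j, k)]["C"].append((l, v))
            emitted += 1
    # Reduce phase: each (j, k) group yields partial products A_ij * B_jk * C_kl,
    # which are then summed per output cell (i, l).
    result: Dict[Tuple[int, int], int] = defaultdict(int)
    for group in groups.values():
        for bv in group["B"]:
            for i, av in group["A"]:
                for l, cv in group["C"]:
                    result[(i, l)] += av * bv * cv
    return dict(result), emitted


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 1], [1, 1]]
product, pairs = three_way_join(to_rel(A), to_rel(B), to_rel(C), j_dim=2, k_dim=2)
print(product)  # {(0, 0): 41, (0, 1): 41, (1, 0): 93, (1, 1): 93}
print(pairs)    # 20
```

With three dense 2×2 matrices, the 12 input tuples become 20 intermediate pairs; for dense n×n inputs the count is $2n^3 + n^2$, so the shuffle and sort grow much faster than the input itself.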
  • 19. Future Study and Proposed Work. Future study: • Amazon EC2 • Apache Whirr tools • converting larger graph datasets to matrices • further reading on Hadoop and related papers. Proposed work: • PM with the raw key. • This improvement should reduce the number of duplications and increase the diversity of the join key. • A MapReduce framework that does not perform sort operations in the mappers.
  • 20. Conclusion. In this presentation, I explained how the multiplication of a chain of matrices can be expressed as multi-way join operations, and the implementation of three types of algorithms: S2, P2, and PM. The parallel m-way join can improve the performance of matrix chain multiplication. However, using a composite key introduces a number of disadvantages, such as greater network and sorting overhead. Finally, I propose a parallel m-way join with a raw key to bring the chain multiplication closer to optimal.