SlideShare una empresa de Scribd logo
1 de 32
Big Data Project
on
Crystal Ball
Submitted By:
Sushil Sedai(984474)
Suvash Shah(984461)
Submitted to:
Prof. Prem Nair
Pair approach (Mapper) – pseudo
code
method map(docid id, doc d)
for each term w in doc d do
total = 0;
for each neighbor u in Neighbor(w) do
Emit(Pair(w, u), 1);
total++;
Emit(Pair(w, *), total);
Pair approach (Mapper) – Java
Code
Pair approach (Reducer) – pseudo
code
method reduce(Pair p, Iterable<Int> values)
if p.secondValue == *
if p.firstValue is new
currentvalue = p.firstvalue;
marginal = sum(values)
else
marginal += sum(values)
else
Emit(p, sum(values)/marginal);
Pair approach (Reducer) – Java
Code
Pair approach - input
Mapper1 input
18 29 12 34 79 18 56 12 34 92
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Pair approach – Output (Reducer1)
(10,12) 0.5
(10,34) 0.5
(12,10) 0.09090909090909091
(12,18) 0.09090909090909091
(12,34) 0.36363636363636365
(12,56) 0.18181818181818182
(12,79) 0.09090909090909091
(12,92) 0.18181818181818182
(18,12) 0.25
(18,29) 0.125
(18,34) 0.25
(18,56) 0.125
(18,79) 0.125
(18,92) 0.125
(29,10) 0.06666666666666667
(29,12) 0.26666666666666666
(29,18) 0.06666666666666667
(29,34) 0.26666666666666666
(29,56) 0.13333333333333333
(29,79) 0.06666666666666667
(29,92) 0.13333333333333333
(34,10) 0.08333333333333333
(34,12) 0.25
(34,18) 0.08333333333333333
(34,29) 0.08333333333333333
(34,56) 0.25
(34,79) 0.08333333333333333
(34,92) 0.16666666666666666
(56,10) 0.1
(56,12) 0.3
(56,29) 0.1
(56,34) 0.3
(56,92) 0.2
(92,10) 0.3333333333333333
(92,12) 0.3333333333333333
(92,34) 0.3333333333333333
Pair approach – Output (Reducer2)
(79,12) 0.2
(79,18) 0.2
(79,34) 0.2
(79,56) 0.2
(79,92)0.2
Stripe approach (Mapper) –
pseudo code
method map(docid id, doc d)
Stripe H;
for each term w in doc d do
clear(H);
for each neighbor u in Neighbor(w) do
if H.containsKey(u)
H{u} += 1;
else
H.add(u, 1);
Emit(w, H);
Stripe approach (Mapper) – Java
Code
Stripe approach (Reducer) –
pseudo code
total = 0;
method reduce(Text key, Stripe H [H1, H2, …])
total = sumValues(H);
for each Item h in H do
h.secondValue /= total;
Emit(key, H);
Stripe approach (Reducer) – Java
Code
Stripe appoach (Reducer) – Java
Code
Stripe approach – input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Stripe approach –
Output(Reducer1)
10 [ (34,0.5000) (12,0.5000) ]
12 [ (56,0.1818) (92,0.1818) (34,0.3636) (18,0.0909) (79,0.0909) (10,0.0909) ]
18 [ (56,0.1250) (92,0.1250) (34,0.2500) (79,0.1250) (29,0.1250) (12,0.2500) ]
29 [ (56,0.1333) (92,0.1333) (34,0.2667) (18,0.0667) (79,0.0667) (10,0.0667)
(12,0.2667) ]
34 [ (56,0.2500) (92,0.1667) (18,0.0833) (79,0.0833) (29,0.0833) (10,0.0833)
(12,0.2500) ]
56 [ (92,0.2000) (34,0.3000) (29,0.1000) (10,0.1000) (12,0.3000) ]
92 [ (34,0.3333) (10,0.3333) (12,0.3333) ]
Stripe approach –
Output(Reducer2)
79 [ (56,0.2000) (92,0.2000) (34,0.2000) (18,0.2000)
(12,0.2000) ]
Hybrid approach (Mapper) –
pseudo code
method map(docid id, doc d)
HashMap H;
for each term w in doc d do
for each neighbor u in Neighbor(w) do
if H.contains(Pair(w, u))
H{Pair(w, u)} += 1;
else
H.add(Pair(w, u));
for each Pair p in H do
Emit(p, H(p));
Hybrid approach (Mapper) – Java
Code
Hybrid approach (Reducer) –
pseudo codeprev = null;
HashMap H;
Method reduce(Pair p, Iterable<Int> values)
if p.firstValue != prev and not first
total = sumValues(H);
for each item h in H
h(prev.secondValue) /= total;
Emit(p.firstValue, H);
clear(H);
End if
prev = p.firstValue;
H.add(p.secondValue, sum(values));
Method close
//for last pair
total = sumValues(H);
for each item h in H
h(prev.secondValue) /= total;
Emit(p.firstValue, H);
Hybrid approach (Reducer) – Java
Code
Hybrid approach (Reducer) – Java
Code
Hybrid approach - Input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Hybrid approach –
Output(Reducer1)
10 (12,0.5) (34,0.5)
12 (10,0.09090909) (18,0.09090909) (34,0.36363637) (56,0.18181819) (79,0.09090909)
(92,0.18181819)
18 (12,0.25) (29,0.125) (34,0.25) (56,0.125) (79,0.125) (92,0.125)
29 (10,0.06666667) (12,0.26666668) (18,0.06666667) (34,0.26666668) (56,0.13333334)
(79,0.06666667) (92,0.13333334)
34 (10,0.083333336) (12,0.25) (18,0.083333336) (29,0.083333336) (56,0.25) (79,0.083333336)
(92,0.16666667)
56 (10,0.1) (12,0.3) (29,0.1) (34,0.3) (92,0.2)
92 (10,0.33333334) (12,0.33333334) (34,0.33333334)
Hybrid approach –
Output(Reducer2)
79 (12,0.2) (18,0.2) (34,0.2) (56,0.2) (92,0.2)
Comparison
Apache Spark
Write a java program on spark to calculate total number of
students in MUM coming in different entries.This program
should display total number student by country.
Spark - Java Code
Spark - input
2014 Feb Nepal 20
2014 Feb India 15
2014 Oct Italy 2
2014 July France 1
2015 Feb Nepal 10
2015 Feb India 25
2015 Oct Italy 7
Spark - Output
(France,1)
(Italy,9)
(Nepal,30)
(India,40)
Tools Used
• VMPlayer Pro 7
• cloudera-quickstart-vm-5.4.0-0-vmware
• EclipseVersion: Luna Service Release 2 (4.4.2)
• Windows 8.1
References
• http://glebche.appspot.com/static/hadoop-
ecosystem/mapreduce-job-java.html
• https://hadoopi.wordpress.com/2013/06/05/hadoop-
implementing-the-tool-interface-for-mapreduce-driver/
• http://www.bogotobogo.com/Hadoop/BigData_hadoop_
Apache_Spark.php
ThankYou

Más contenido relacionado

La actualidad más candente

OPTIMAL BINARY SEARCH
OPTIMAL BINARY SEARCHOPTIMAL BINARY SEARCH
OPTIMAL BINARY SEARCHCool Guy
 
Bellman Ford Routing Algorithm-Computer Networks
Bellman Ford Routing Algorithm-Computer NetworksBellman Ford Routing Algorithm-Computer Networks
Bellman Ford Routing Algorithm-Computer NetworksSimranJain63
 
Lecture 15 data structures and algorithms
Lecture 15 data structures and algorithmsLecture 15 data structures and algorithms
Lecture 15 data structures and algorithmsAakash deep Singhal
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojurePaul Lam
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithmguest862df4e
 
Monads from Definition
Monads from DefinitionMonads from Definition
Monads from DefinitionDierk König
 
Generative Adversarial Nets
Generative Adversarial NetsGenerative Adversarial Nets
Generative Adversarial NetsJinho Lee
 
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)Kobkrit Viriyayudhakorn
 
Trident International Graphics Workshop 2014 5/5
Trident International Graphics Workshop 2014 5/5Trident International Graphics Workshop 2014 5/5
Trident International Graphics Workshop 2014 5/5Takao Wada
 
Glm talk Tomas
Glm talk TomasGlm talk Tomas
Glm talk TomasSri Ambati
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavVyacheslav Arbuzov
 
Introduction to Haskell@Open Source Conference 2007 Hokkaido
Introduction to Haskell@Open Source Conference 2007 HokkaidoIntroduction to Haskell@Open Source Conference 2007 Hokkaido
Introduction to Haskell@Open Source Conference 2007 Hokkaidoikegami__
 
Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303Namgee Lee
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RHappy Garg
 
A Signature Scheme as Secure as the Diffie Hellman Problem
A Signature Scheme as Secure as the Diffie Hellman ProblemA Signature Scheme as Secure as the Diffie Hellman Problem
A Signature Scheme as Secure as the Diffie Hellman Problemvsubhashini
 

La actualidad más candente (20)

OPTIMAL BINARY SEARCH
OPTIMAL BINARY SEARCHOPTIMAL BINARY SEARCH
OPTIMAL BINARY SEARCH
 
Bellman Ford Routing Algorithm-Computer Networks
Bellman Ford Routing Algorithm-Computer NetworksBellman Ford Routing Algorithm-Computer Networks
Bellman Ford Routing Algorithm-Computer Networks
 
Sharbani bhattacharya sacta 2014
Sharbani bhattacharya sacta 2014Sharbani bhattacharya sacta 2014
Sharbani bhattacharya sacta 2014
 
Lecture 15 data structures and algorithms
Lecture 15 data structures and algorithmsLecture 15 data structures and algorithms
Lecture 15 data structures and algorithms
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojure
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithm
 
Monads from Definition
Monads from DefinitionMonads from Definition
Monads from Definition
 
Generative Adversarial Nets
Generative Adversarial NetsGenerative Adversarial Nets
Generative Adversarial Nets
 
R
RR
R
 
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
 
Trident International Graphics Workshop 2014 5/5
Trident International Graphics Workshop 2014 5/5Trident International Graphics Workshop 2014 5/5
Trident International Graphics Workshop 2014 5/5
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 
Glm talk Tomas
Glm talk TomasGlm talk Tomas
Glm talk Tomas
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Introduction to Haskell@Open Source Conference 2007 Hokkaido
Introduction to Haskell@Open Source Conference 2007 HokkaidoIntroduction to Haskell@Open Source Conference 2007 Hokkaido
Introduction to Haskell@Open Source Conference 2007 Hokkaido
 
Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303
 
Aaex2 group2
Aaex2 group2Aaex2 group2
Aaex2 group2
 
Numpy python cheat_sheet
Numpy python cheat_sheetNumpy python cheat_sheet
Numpy python cheat_sheet
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
A Signature Scheme as Secure as the Diffie Hellman Problem
A Signature Scheme as Secure as the Diffie Hellman ProblemA Signature Scheme as Secure as the Diffie Hellman Problem
A Signature Scheme as Secure as the Diffie Hellman Problem
 

Similar a CrystalBall - Compute Relative Frequency in Hadoop

Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and SparkCrystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and SparkJivan Nepali
 
module4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdfmodule4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdfShiwani Gupta
 
Q-Metrics in Theory and Practice
Q-Metrics in Theory and PracticeQ-Metrics in Theory and Practice
Q-Metrics in Theory and PracticeMagdi Mohamed
 
Q-Metrics in Theory And Practice
Q-Metrics in Theory And PracticeQ-Metrics in Theory And Practice
Q-Metrics in Theory And Practiceguest3550292
 
Sensors and Samples: A Homological Approach
Sensors and Samples:  A Homological ApproachSensors and Samples:  A Homological Approach
Sensors and Samples: A Homological ApproachDon Sheehy
 
Iaetsd vlsi implementation of gabor filter based image edge detection
Iaetsd vlsi implementation of gabor filter based image edge detectionIaetsd vlsi implementation of gabor filter based image edge detection
Iaetsd vlsi implementation of gabor filter based image edge detectionIaetsd Iaetsd
 
Digital Distance Geometry
Digital Distance GeometryDigital Distance Geometry
Digital Distance Geometryppd1961
 
Cg my own programs
Cg my own programsCg my own programs
Cg my own programsAmit Kapoor
 
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...Revolution Analytics
 
Dic rd theory_quantization_07
Dic rd theory_quantization_07Dic rd theory_quantization_07
Dic rd theory_quantization_07wtyru1989
 
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...Flink Forward
 
Algorithm Design and Analysis - Practical File
Algorithm Design and Analysis - Practical FileAlgorithm Design and Analysis - Practical File
Algorithm Design and Analysis - Practical FileKushagraChadha1
 
GraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkGraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkAshutosh Trivedi
 
Geolocation on Rails
Geolocation on RailsGeolocation on Rails
Geolocation on Railsnebirhos
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduceDavid Gleich
 
Vector Distance Transform Maps for Autonomous Mobile Robot Navigation
Vector Distance Transform Maps for Autonomous Mobile Robot NavigationVector Distance Transform Maps for Autonomous Mobile Robot Navigation
Vector Distance Transform Maps for Autonomous Mobile Robot NavigationJanindu Arukgoda
 
Graph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & TrendsGraph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & TrendsLuc Brun
 

Similar a CrystalBall - Compute Relative Frequency in Hadoop (20)

Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and SparkCrystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
 
module4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdfmodule4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdf
 
Q-Metrics in Theory and Practice
Q-Metrics in Theory and PracticeQ-Metrics in Theory and Practice
Q-Metrics in Theory and Practice
 
Q-Metrics in Theory And Practice
Q-Metrics in Theory And PracticeQ-Metrics in Theory And Practice
Q-Metrics in Theory And Practice
 
Jan 2012 HUG: RHadoop
Jan 2012 HUG: RHadoopJan 2012 HUG: RHadoop
Jan 2012 HUG: RHadoop
 
Sensors and Samples: A Homological Approach
Sensors and Samples:  A Homological ApproachSensors and Samples:  A Homological Approach
Sensors and Samples: A Homological Approach
 
Iaetsd vlsi implementation of gabor filter based image edge detection
Iaetsd vlsi implementation of gabor filter based image edge detectionIaetsd vlsi implementation of gabor filter based image edge detection
Iaetsd vlsi implementation of gabor filter based image edge detection
 
Digital Distance Geometry
Digital Distance GeometryDigital Distance Geometry
Digital Distance Geometry
 
Cg my own programs
Cg my own programsCg my own programs
Cg my own programs
 
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
 
Dic rd theory_quantization_07
Dic rd theory_quantization_07Dic rd theory_quantization_07
Dic rd theory_quantization_07
 
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
 
Algorithm Design and Analysis - Practical File
Algorithm Design and Analysis - Practical FileAlgorithm Design and Analysis - Practical File
Algorithm Design and Analysis - Practical File
 
GraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkGraphX and Pregel - Apache Spark
GraphX and Pregel - Apache Spark
 
Geolocation on Rails
Geolocation on RailsGeolocation on Rails
Geolocation on Rails
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduce
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
 
Vector Distance Transform Maps for Autonomous Mobile Robot Navigation
Vector Distance Transform Maps for Autonomous Mobile Robot NavigationVector Distance Transform Maps for Autonomous Mobile Robot Navigation
Vector Distance Transform Maps for Autonomous Mobile Robot Navigation
 
Graph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & TrendsGraph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & Trends
 

Último

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 

Último (20)

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 

CrystalBall - Compute Relative Frequency in Hadoop

  • 1. Big Data Project on Crystal Ball Submitted By: Sushil Sedai(984474) Suvash Shah(984461) Submitted to: Prof. Prem Nair
  • 2. Pair approach (Mapper) – pseudo code method map(docid id, doc d) for each term w in doc d do total = 0; for each neighbor u in Neighbor(w) do Emit(Pair(w, u), 1); total++; Emit(Pair(w, *), total);
  • 3. Pair approach (Mapper) – Java Code
  • 4. Pair approach (Reducer) – pseudo code method reduce(Pair p, Iterable<Int> values) if p.secondValue == * if p.firstValue is new currentvalue = p.firstvalue; marginal = sum(values) else marginal += sum(values) else Emit(p, sum(values)/marginal);
  • 5. Pair approach (Reducer) – Java Code
  • 6. Pair approach - input Mapper1 input 18 29 12 34 79 18 56 12 34 92 Mapper2 input 18 29 12 34 79 18 56 12 34 92
  • 7. Pair approach – Output (Reducer1) (10,12) 0.5 (10,34) 0.5 (12,10) 0.09090909090909091 (12,18) 0.09090909090909091 (12,34) 0.36363636363636365 (12,56) 0.18181818181818182 (12,79) 0.09090909090909091 (12,92) 0.18181818181818182 (18,12) 0.25 (18,29) 0.125 (18,34) 0.25 (18,56) 0.125 (18,79) 0.125 (18,92) 0.125 (29,10) 0.06666666666666667 (29,12) 0.26666666666666666 (29,18) 0.06666666666666667 (29,34) 0.26666666666666666 (29,56) 0.13333333333333333 (29,79) 0.06666666666666667 (29,92) 0.13333333333333333 (34,10) 0.08333333333333333 (34,12) 0.25 (34,18) 0.08333333333333333 (34,29) 0.08333333333333333 (34,56) 0.25 (34,79) 0.08333333333333333 (34,92) 0.16666666666666666 (56,10) 0.1 (56,12) 0.3 (56,29) 0.1 (56,34) 0.3 (56,92) 0.2 (92,10) 0.3333333333333333 (92,12) 0.3333333333333333 (92,34) 0.3333333333333333
  • 8. Pair approach – Output (Reducer2) (79,12) 0.2 (79,18) 0.2 (79,34) 0.2 (79,56) 0.2 (79,92)0.2
  • 9. Stripe approach (Mapper) – pseudo code method map(docid id, doc d) Stripe H; for each term w in doc d do clear(H); for each neighbor u in Neighbor(w) do if H.containsKey(u) H{u} += 1; else H.add(u, 1); Emit(w, H);
  • 10. Stripe approach (Mapper) – Java Code
  • 11. Stripe approach (Reducer) – pseudo code total = 0; method reduce(Text key, Stripe H [H1, H2, …]) total = sumValues(H); for each Item h in H do h.secondValue /= total; Emit(key, H);
  • 12. Stripe approach (Reducer) – Java Code
  • 13. Stripe appoach (Reducer) – Java Code
  • 14. Stripe approach – input Mapper1 input 34 56 29 12 34 56 92 10 34 12 Mapper2 input 18 29 12 34 79 18 56 12 34 92
  • 15. Stripe approach – Output(Reducer1) 10 [ (34,0.5000) (12,0.5000) ] 12 [ (56,0.1818) (92,0.1818) (34,0.3636) (18,0.0909) (79,0.0909) (10,0.0909) ] 18 [ (56,0.1250) (92,0.1250) (34,0.2500) (79,0.1250) (29,0.1250) (12,0.2500) ] 29 [ (56,0.1333) (92,0.1333) (34,0.2667) (18,0.0667) (79,0.0667) (10,0.0667) (12,0.2667) ] 34 [ (56,0.2500) (92,0.1667) (18,0.0833) (79,0.0833) (29,0.0833) (10,0.0833) (12,0.2500) ] 56 [ (92,0.2000) (34,0.3000) (29,0.1000) (10,0.1000) (12,0.3000) ] 92 [ (34,0.3333) (10,0.3333) (12,0.3333) ]
  • 16. Stripe approach – Output(Reducer2) 79 [ (56,0.2000) (92,0.2000) (34,0.2000) (18,0.2000) (12,0.2000) ]
  • 17. Hybrid approach (Mapper) – pseudo code method map(docid id, doc d) HashMap H; for each term w in doc d do for each neighbor u in Neighbor(w) do if H.contains(Pair(w, u)) H{Pair(w, u)} += 1; else H.add(Pair(w, u)); for each Pair p in H do Emit(p, H(p));
  • 18. Hybrid approach (Mapper) – Java Code
  • 19. Hybrid approach (Reducer) – pseudo codeprev = null; HashMap H; Method reduce(Pair p, Iterable<Int> values) if p.firstValue != prev and not first total = sumValues(H); for each item h in H h(prev.secondValue) /= total; Emit(p.firstValue, H); clear(H); End if prev = p.firstValue; H.add(p.secondValue, sum(values)); Method close //for last pair total = sumValues(H); for each item h in H h(prev.secondValue) /= total; Emit(p.firstValue, H);
  • 20. Hybrid approach (Reducer) – Java Code
  • 21. Hybrid approach (Reducer) – Java Code
  • 22. Hybrid approach - Input Mapper1 input 34 56 29 12 34 56 92 10 34 12 Mapper2 input 18 29 12 34 79 18 56 12 34 92
  • 23. Hybrid approach – Output(Reducer1) 10 (12,0.5) (34,0.5) 12 (10,0.09090909) (18,0.09090909) (34,0.36363637) (56,0.18181819) (79,0.09090909) (92,0.18181819) 18 (12,0.25) (29,0.125) (34,0.25) (56,0.125) (79,0.125) (92,0.125) 29 (10,0.06666667) (12,0.26666668) (18,0.06666667) (34,0.26666668) (56,0.13333334) (79,0.06666667) (92,0.13333334) 34 (10,0.083333336) (12,0.25) (18,0.083333336) (29,0.083333336) (56,0.25) (79,0.083333336) (92,0.16666667) 56 (10,0.1) (12,0.3) (29,0.1) (34,0.3) (92,0.2) 92 (10,0.33333334) (12,0.33333334) (34,0.33333334)
  • 24. Hybrid approach – Output(Reducer2) 79 (12,0.2) (18,0.2) (34,0.2) (56,0.2) (92,0.2)
  • 26. Apache Spark Write a java program on spark to calculate total number of students in MUM coming in different entries.This program should display total number student by country.
  • 27. Spark - Java Code
  • 28. Spark - input 2014 Feb Nepal 20 2014 Feb India 15 2014 Oct Italy 2 2014 July France 1 2015 Feb Nepal 10 2015 Feb India 25 2015 Oct Italy 7
  • 30. Tools Used • VMPlayer Pro 7 • cloudera-quickstart-vm-5.4.0-0-vmware • EclipseVersion: Luna Service Release 2 (4.4.2) • Windows 8.1