SlideShare una empresa de Scribd logo
1 de 35
Descargar para leer sin conexión
Apache Hama 
Big Data 2nd Generation, 
Big Compute and Big Insight! 
2014 Samsung Open Source Conference
Edward J. Yoon @ 
eddieyoon 
edwardyoon@apache.org
1. Big Compute Platform based on Apache Hama. 
2. Big Data Crowdsourcing Service: datacrowds.com
1. Big Data Trends 
2. What’s Hama? 
3. Future Architecture of Hama
Big Data Analytics 
● Large-scale unstructured data processing 
● Statistics and Data mining 
To mine the valuable insights
HDFS 
Data Gathering 
(Flume) 
SQL-on-Hadoop 
Pig, Hive, Tez, 
Impala, Presto, …, 
etc. 
MapReduce 
? 
Legacy DW 
or OLAP
HDFS 
Real-time Data 
Processing 
(Storm, .., etc) 
In-Memory or Message-Passing 
Spark, Hama, GraphLab, Giraph, Flink, …, etc
2. In-Memory 
and Message-Passing 
Spark, 
Hama, 
Giraph, 
Storm 
1. MapReduce 
and SQL-on-Hadoop 
Mahout, 
Pig, 
Hive, ... 
2007 2010 2014
● 1990 ~ : Web Documents 
Web 2.0 
Blog, Open API 
Smartphone 
Social Network 
● ~ 2014 : Responsive Apps for multi-devices
● 1990 ~ : Server/Web Hosting 
Google Apps 
Cloud Computing 
IaaS, PaaS, SaaS 
● ~ 2014 : Cloud/App Hosting
● 1990 ~ : Text processing and mining 
MapReduce 
SQL-on-Hadoop 
In-memory or Message-Passing 
● ~ 2014 : Matrix, Mining Networks and Graphs
Hama[hɑ́ːma] is a general-purpose 
BSP computing engine on Top of Hadoop
1 / 154
Apache Hama is 
listed on the Best Open 
Source Big Data tools, 
Bossie Awards 2013
Streaming Graph Machine Learning Incremental Learning 
Hama 
(General-purpose BSP) 
O O O O 
Spark (Databricks) 
(In-memory MapReduce) 
O O 
(GraphX) 
O X 
GraphLab 
(Asynchronous graph computing) 
X O O O 
(Limited) 
Giraph 
(BSP-based graph computing) 
X O X X
Think about the Spam Filtering of Google’s Gmail! 
Real-time Data 
Processing Apache Hama 
HDFS
M 
M 
M 
Twitter 
Twitter 
1. Streaming Job: 
Group1: 
Filter 
mentions 
M 
M 
Group2: 
Extract node and 
edge 
iteration 1 iteration 2 … 
G 
G 
G 
Twitter 
Another Example 
2. Graph Job: 
Streaming graph updates while 
computing 
Direct 
Network 
Transfer 
Multi-BSP Job on Apache Hama 
G 
G 
G 
…
Appendix. 
Why all platforms uses BSP-style for graph-parallel?
MR version: Shortest Path 
- A map task receives a node n as a key, and (D, points-to) 
as its value 
- D is the distance to the node from the start 
- points-to is a list of nodes reachable from n 
∀p ∈ points-to, emit (p, D+1) 
- Reduce task gathers possible distances to a given p and 
selects the minimum one
BSP version: Shortest Path 
1: (2, 4) 
2: (1, 3, 4) 
3: (2) 
4: (1, 2, 5, 6) 
5: (4) 
6: (4, 7) 
7: (6) 
1: (2, 4) D(0) 
2: (1, 3, 4) D(1) 
3: (2) D(2) 
4: (1, 2, 5, 6) D(1) 
5: (4) D(2) 
6: (4, 7) D(2) 
7: (6) D(3)
Why Google’s Pregel (graph) 
and DistBelief (deep learning) uses BSP-style?
Apache Hama’s Advanced Analytics Examples: 
● Sparse Matrix-Vector Multiplication 
● Semi-Clustering 
● K-means Clustering 
● Neural Networks 
● Gradient Descent 
● PageRank 
● Single Source Shortest Path 
● Bipartite Matching
Hama Supports: 
Hadoop 1.0 
Hama on Hadoop 2.0 YARN 
Hama on Apache Mesos
Apache Hama 
at Sogou
Sogou.com runs 7,200 cores Hama cluster for SiteRank. 
● SiteRank is the ranking generated by applying the 
classical PageRank algorithm to the graph of Web sites. 
● Dataset is about 400GB, contains 6 Billion edges.
The future features of Apache Hama: 
Kryo serialization, Rootbeer GPU acceleration (Martin 
Illecker, University of Innsbruck), …, etc.
References 
● Hama Website http://hama.apache.org/ 
● Scientific Computing in the Cloud with Apache Hadoop 
and Apache Hama on GPU by Martin Illecker
If you want to be one of us, be one of us. 
Thanks!

Más contenido relacionado

La actualidad más candente

Large Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache GiraphLarge Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache Giraph
sscdotopen
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
rantav
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processing
sscdotopen
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
Mahantesh Angadi
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
Dilip Reddy
 

La actualidad más candente (20)

A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Apache Giraph
Apache GiraphApache Giraph
Apache Giraph
 
Distributed deep learning
Distributed deep learningDistributed deep learning
Distributed deep learning
 
Tutorial5
Tutorial5Tutorial5
Tutorial5
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks
 
MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
 
Large Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache GiraphLarge Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache Giraph
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Map Reduce introduction
Map Reduce introductionMap Reduce introduction
Map Reduce introduction
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processing
 
Hadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveHadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspective
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Neo4j vs giraph
Neo4j vs giraphNeo4j vs giraph
Neo4j vs giraph
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Map reduce and Hadoop on windows
Map reduce and Hadoop on windowsMap reduce and Hadoop on windows
Map reduce and Hadoop on windows
 

Destacado

The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big data
Edward Yoon
 
Τα παιδιά ως κοινωνικοί ερευνητές. Οδηγός για δασκάλους και ...
Τα   παιδιά   ως   κοινωνικοί   ερευνητές.   Οδηγός   για   δασκάλους   και  ...Τα   παιδιά   ως   κοινωνικοί   ερευνητές.   Οδηγός   για   δασκάλους   και  ...
Τα παιδιά ως κοινωνικοί ερευνητές. Οδηγός για δασκάλους και ...
Αννα Παππα
 

Destacado (9)

The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big data
 
What's in your filter bubble? Or, how has the internet censored you today?
What's in your filter bubble? Or, how has the internet censored you today?What's in your filter bubble? Or, how has the internet censored you today?
What's in your filter bubble? Or, how has the internet censored you today?
 
Why Web 3.0?
Why Web 3.0?Why Web 3.0?
Why Web 3.0?
 
Τα παιδιά ως κοινωνικοί ερευνητές. Οδηγός για δασκάλους και ...
Τα   παιδιά   ως   κοινωνικοί   ερευνητές.   Οδηγός   για   δασκάλους   και  ...Τα   παιδιά   ως   κοινωνικοί   ερευνητές.   Οδηγός   για   δασκάλους   και  ...
Τα παιδιά ως κοινωνικοί ερευνητές. Οδηγός για δασκάλους και ...
 
Web 1.0 2.0 3.0 características, definiciones, ejemplos.
Web 1.0 2.0 3.0 características, definiciones, ejemplos.Web 1.0 2.0 3.0 características, definiciones, ejemplos.
Web 1.0 2.0 3.0 características, definiciones, ejemplos.
 
Web 3.0 Intro
Web 3.0 IntroWeb 3.0 Intro
Web 3.0 Intro
 
Web 3.0
Web 3.0Web 3.0
Web 3.0
 
Web 3.0 The Semantic Web
Web 3.0 The Semantic WebWeb 3.0 The Semantic Web
Web 3.0 The Semantic Web
 
Web 1.0, Web 2.0 & Web 3.0
Web 1.0, Web 2.0 & Web 3.0Web 1.0, Web 2.0 & Web 3.0
Web 1.0, Web 2.0 & Web 3.0
 

Similar a Apache Hama at Samsung Open Source Conference

Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
Mark Kromer
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
DataWorks Summit
 

Similar a Apache Hama at Samsung Open Source Conference (20)

Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)
 
Hadoop with Python
Hadoop with PythonHadoop with Python
Hadoop with Python
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto Comparition
 
Data Science
Data ScienceData Science
Data Science
 
Python in big data world
Python in big data worldPython in big data world
Python in big data world
 
Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
 
Hadoop trainingin bangalore
Hadoop trainingin bangaloreHadoop trainingin bangalore
Hadoop trainingin bangalore
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
 
Handling not so big data
Handling not so big dataHandling not so big data
Handling not so big data
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 

Más de Edward Yoon (11)

(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
 
K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링
 
차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introduction
 
Monitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsMonitoring and mining network traffic in clouds
Monitoring and mining network traffic in clouds
 
Apache hama 0.2-userguide
Apache hama 0.2-userguideApache hama 0.2-userguide
Apache hama 0.2-userguide
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time application
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
 
Understand Of Linear Algebra
Understand Of Linear AlgebraUnderstand Of Linear Algebra
Understand Of Linear Algebra
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And Hbase
 
Heart Proposal
Heart ProposalHeart Proposal
Heart Proposal
 

Último

VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 

Último (20)

Intro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdfIntro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 

Apache Hama at Samsung Open Source Conference

  • 1. Apache Hama Big Data 2nd Generation, Big Compute and Big Insight! 2014 Samsung Open Source Conference
  • 2. Edward J. Yoon @ eddieyoon edwardyoon@apache.org
  • 3. 1. Big Compute Platform based on Apache Hama. 2. Big Data Crowdsourcing Service: datacrowds.com
  • 4. 1. Big Data Trends 2. What’s Hama? 3. Future Architecture of Hama
  • 5. Big Data Analytics ● Large-scale unstructured data processing ● Statistics and Data mining To mine the valuable insights
  • 6. HDFS Data Gathering (Flume) SQL-on-Hadoop Pig, Hive, Tez, Impala, Presto, …, etc. MapReduce ? Legacy DW or OLAP
  • 7. HDFS Real-time Data Processing (Storm, .., etc) In-Memory or Message-Passing Spark, Hama, GraphLab, Giraph, Flink, …, etc
  • 8. 2. In-Memory and Message-Passing Spark, Hama, Giraph, Storm 1. MapReduce and SQL-on-Hadoop Mahout, Pig, Hive, ... 2007 2010 2014
  • 9. ● 1990 ~ : Web Documents Web 2.0 Blog, Open API Smartphone Social Network ● ~ 2014 : Responsive Apps for multi-devices
  • 10. ● 1990 ~ : Server/Web Hosting Google Apps Cloud Computing IaaS, PaaS, SaaS ● ~ 2014 : Cloud/App Hosting
  • 11. ● 1990 ~ : Text processing and mining MapReduce SQL-on-Hadoop In-memory or Message-Passing ● ~ 2014 : Matrix, Mining Networks and Graphs
  • 12. Hama[hɑ́ːma] is a general-purpose BSP computing engine on Top of Hadoop
  • 13.
  • 15. Apache Hama is listed on the Best Open Source Big Data tools, Bossie Awards 2013
  • 16.
  • 17. Streaming Graph Machine Learning Incremental Learning Hama (General-purpose BSP) O O O O Spark (Databricks) (In-memory MapReduce) O O (GraphX) O X GraphLab (Asynchronous graph computing) X O O O (Limited) Giraph (BSP-based graph computing) X O X X
  • 18. Think about the Spam Filtering of Google’s Gmail! Real-time Data Processing Apache Hama HDFS
  • 19. M M M Twitter Twitter 1. Streaming Job: Group1: Filter mentions M M Group2: Extract node and edge iteration 1 iteration 2 … G G G Twitter Another Example 2. Graph Job: Streaming graph updates while computing Direct Network Transfer Multi-BSP Job on Apache Hama G G G …
  • 20. Appendix. Why all platforms uses BSP-style for graph-parallel?
  • 21. MR version: Shortest Path - A map task receives a node n as a key, and (D, points-to) as its value - D is the distance to the node from the start - points-to is a list of nodes reachable from n ∀p ∈ points-to, emit (p, D+1) - Reduce task gathers possible distances to a given p and selects the minimum one
  • 22. BSP version: Shortest Path 1: (2, 4) 2: (1, 3, 4) 3: (2) 4: (1, 2, 5, 6) 5: (4) 6: (4, 7) 7: (6) 1: (2, 4) D(0) 2: (1, 3, 4) D(1) 3: (2) D(2) 4: (1, 2, 5, 6) D(1) 5: (4) D(2) 6: (4, 7) D(2) 7: (6) D(3)
  • 23. Why Google’s Pregel (graph) and DistBelief (deep learning) uses BSP-style?
  • 24.
  • 25. Apache Hama’s Advanced Analytics Examples: ● Sparse Matrix-Vector Multiplication ● Semi-Clustering ● K-means Clustering ● Neural Networks ● Gradient Descent ● PageRank ● Single Source Shortest Path ● Bipartite Matching
  • 26. Hama Supports: Hadoop 1.0 Hama on Hadoop 2.0 YARN Hama on Apache Mesos
  • 27.
  • 28. Apache Hama at Sogou
  • 29. Sogou.com runs 7,200 cores Hama cluster for SiteRank. ● SiteRank is the ranking generated by applying the classical PageRank algorithm to the graph of Web sites. ● Dataset is about 400GB, contains 6 Billion edges.
  • 30.
  • 31. The future features of Apache Hama: Kryo serialization, Rootbeer GPU acceleration (Martin Illecker, University of Innsbruck), …, etc.
  • 32.
  • 33.
  • 34. References ● Hama Website http://hama.apache.org/ ● Scientific Computing in the Cloud with Apache Hadoop and Apache Hama on GPU by Martin Illecker
  • 35. If you want to be one of us, be one of us. Thanks!