Project Progress
What we’ve been doing(1)
 • Hacking Hadoop API.
 • Writing different kinds of programs to
   understand it. (Not CV programs)
 • Adaboost
 • SIFT, SURF
 • Reading, Reading
Segmentation

Processing pipeline, in order:
 • Segmentation with overlap
 • Get SIFT/SURF descriptors for the partial segments
 • Reduce the number of descriptors by grouping them
 • Regions of interest (ROI): positive and negative
 • Count the frequency of occurrence of visual words
 • AdaBoost
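
Two of the pipeline steps above (segmentation with overlap, and counting visual-word frequencies) can be sketched in a few lines. This is only an illustration under assumptions: the function names, the square tile shape, and the nearest-centre assignment are hypothetical and not taken from the slides, and the descriptors are plain tuples rather than real SIFT/SURF output.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height)


def overlapping_tiles(width: int, height: int, tile: int, stride: int) -> List[Tile]:
    """Return square windows over an image; stride < tile makes them overlap.

    Assumption: windows that would cross the right/bottom border are skipped.
    """
    tiles = []
    for y in range(0, max(height - tile, 0) + 1, stride):
        for x in range(0, max(width - tile, 0) + 1, stride):
            tiles.append((x, y, tile, tile))
    return tiles


def nearest_word(descriptor: Sequence[float],
                 vocabulary: Sequence[Sequence[float]]) -> int:
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[k])),
    )


def word_histogram(descriptors: Sequence[Sequence[float]],
                   vocabulary: Sequence[Sequence[float]]) -> List[int]:
    """Count how often each visual word occurs among the descriptors."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(vocabulary))]


if __name__ == "__main__":
    print(overlapping_tiles(8, 4, tile=4, stride=2))
    toy_vocabulary = [(0.0, 0.0), (1.0, 1.0)]          # hypothetical 2-word vocabulary
    toy_descriptors = [(0.0, 0.1), (0.9, 1.0), (1.0, 1.1)]
    print(word_histogram(toy_descriptors, toy_vocabulary))
```

The resulting histogram is the kind of fixed-length feature vector that a classifier such as AdaBoost could take as input.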
Methodology

• For simplicity, assume the same image is
  stored on all slave nodes.
• Use ROI to run the algorithm.
• Hopefully this will make the “Reduce”
  step easier.
Map-Reduce???
• It’s just a framework
• You can also implement it by reading the
  paper[1]. :)
• Hadoop is one implementation. (Apache +
  Yahoo)
• Google’s implementation is not made
  public.
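
To illustrate the point that Map-Reduce is just a framework, here is a minimal single-process sketch of the map, group-by-key, and reduce phases. It is not Hadoop's API and not Google's implementation; the names map_reduce, word_mapper, and count_reducer are hypothetical, and everything runs in memory on one machine.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run mapper on every record, group values by key, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase
            groups[key].append(value)       # shuffle: group by key
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


def word_mapper(line: str) -> Iterable[Tuple[str, int]]:
    """Emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(key: str, values: List[int]) -> int:
    """Sum the counts emitted for one word."""
    return sum(values)


if __name__ == "__main__":
    print(map_reduce(["map reduce", "reduce again"], word_mapper, count_reducer))
```

A real framework adds what this sketch leaves out: distributing records across machines, moving grouped data between them, and recovering from failures.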
Map-Reduce for Machine Learning on Multi-core
Introduction

• Algorithms that fit the Statistical Query Model
  can be written in a certain “summation
  form”.
• Divide the data set into as many pieces as
  the number of cores.
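
A small sketch of what “summation form” means in practice: the quantity of interest is a sum over data points, so each piece of the data can be summed independently and the partial sums added afterwards. The example statistic (mean and variance) and all function names are assumptions chosen for illustration; the pieces are processed sequentially here, whereas each could be handed to its own core.

```python
from typing import List, Sequence, Tuple

Stats = Tuple[int, float, float]  # (count, sum of x, sum of x squared)


def split(data: Sequence[float], pieces: int) -> List[Sequence[float]]:
    """Split data into `pieces` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sums(chunk: Sequence[float]) -> Stats:
    """Mapper: the sums needed from one chunk only."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def combine(parts: Sequence[Stats]) -> Stats:
    """Reducer: add the partial sums component-wise."""
    return (
        sum(p[0] for p in parts),
        sum(p[1] for p in parts),
        sum(p[2] for p in parts),
    )


def mean_and_variance(data: Sequence[float], cores: int) -> Tuple[float, float]:
    """Population mean and variance computed from per-chunk partial sums."""
    n, s, ss = combine([partial_sums(c) for c in split(data, cores)])
    mean = s / n
    return mean, ss / n - mean * mean


if __name__ == "__main__":
    print(mean_and_variance([1.0, 2.0, 3.0, 4.0], cores=2))
```

The result does not depend on how many pieces the data is split into, which is what makes this form easy to spread over cores.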
Algorithms(1)
• Locally Weighted Linear Regression
• Naive Bayes
• Gaussian Discriminant Analysis
• k-means
• Logistic Regression
• Neural Network
Algorithms(2)

• Principal Components Analysis
• Independent Components Analysis
• Expectation Maximization
• Support Vector Machine
Example (LWLR)

• Divide the computation among different mappers; each mapper computes
  partial values of A and b over its share of the data.
• Two reducers sum up the partial values for A and b, and the solution is
  then computed from the totals.
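
The equations for A and b are not preserved in the text above. The sketch below assumes the usual weighted normal equations for locally weighted linear regression, A = Σ wᵢ xᵢ xᵢᵀ and b = Σ wᵢ xᵢ yᵢ, with the solution θ satisfying Aθ = b. The function names, the (x, y, w) sample layout, and the small Gaussian-elimination solver are all hypothetical helpers for illustration.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float, float]  # (features x, target y, weight w)
Matrix = List[List[float]]
Vector = List[float]


def map_partial(chunk: Sequence[Sample]) -> Tuple[Matrix, Vector]:
    """Mapper: partial A = sum w*x*x^T and partial b = sum w*x*y over one chunk."""
    n = len(chunk[0][0])
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for x, y, w in chunk:
        for r in range(n):
            b[r] += w * x[r] * y
            for c in range(n):
                A[r][c] += w * x[r] * x[c]
    return A, b


def sum_A(parts: Sequence[Tuple[Matrix, Vector]]) -> Matrix:
    """First reducer: add the partial A matrices element-wise."""
    n = len(parts[0][1])
    return [[sum(p[0][r][c] for p in parts) for c in range(n)] for r in range(n)]


def sum_b(parts: Sequence[Tuple[Matrix, Vector]]) -> Vector:
    """Second reducer: add the partial b vectors element-wise."""
    n = len(parts[0][1])
    return [sum(p[1][r] for p in parts) for r in range(n)]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A theta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    theta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * theta[c] for c in range(r + 1, n))
        theta[r] = (M[r][n] - tail) / M[r][r]
    return theta


def lwlr_solve(chunks: Sequence[Sequence[Sample]]) -> Vector:
    """Run the mappers over each chunk, reduce A and b, then solve for theta."""
    parts = [map_partial(chunk) for chunk in chunks]
    return solve(sum_A(parts), sum_b(parts))
```

For instance, with samples lying exactly on y = 1 + 2x (features written as [1, x]) and any positive weights, lwlr_solve should recover a θ close to [1, 2] however the samples are split into chunks.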
Experiment Result
• Used data sets from the UCI Machine Learning Repository
• Used only 2 cores.
• 1.9x faster
• 54x speedup on 64 cores.
• Speedup is achieved only by “throwing cores”
  at the problem
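
One way to read the two speedup figures above is as parallel efficiency, that is, speedup divided by the number of cores. The short sketch below does that arithmetic; the function name is hypothetical and the inputs are just the numbers quoted on this slide.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


if __name__ == "__main__":
    # (speedup, cores) pairs as quoted on the slide
    for speedup, cores in [(1.9, 2), (54.0, 64)]:
        eff = parallel_efficiency(speedup, cores)
        print(f"{cores} cores: {speedup}x speedup, efficiency {eff:.2f}")
```

An efficiency close to 1 means the speedup is nearly linear in the number of cores.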
