SlideShare a Scribd company logo
1 of 16
By :: Jaideep Katkar
Under the Guidance of :: Dr. Tran Thanh
GraphLab Overview
A New Framework For Parallel Machine
Learning
– high-level abstractions for machine
learning problems
– Shared-memory multiprocessor
– Assume no fault tolerance needed
– Concurrent access precessing models with
sequential-consistency guarantees
How GraphLab Works?
– Represent the user's data by a directed graph
– Each block of data is represented by a vertex
and a directed edge
– Shared data table
– User functions:
 Update: modify the vertex and edges state, read
only to shared table
 Fold: sequential aggregation to a key entry in the
shared table, modify vertex data
 Merge: Parallelize Fold function
 Apply: Finalize the key entry in the shared table
GAS Decomposition
GraphLab Toolkit
 Topic Modeling contains applications like LDA which can be used to
cluster documents and extract topical representations.
 Graph Analytics contains application like pagerank and triangle
counting which can be applied to general graphs to estimate
community structure.
 Clustering contains standard data clustering tools such as Kmeans
 Collaborative Filtering contains a collection of applications used to
make predictions about users interests and factorize large matrices.
 Graphical Models contains tools for making joint predictions about
collections of related random variables.
 Computer Vision contains a collection of tools for reasoning about
images.
Running GraphLab on EC2 Cluster
Requirements ::
• You should have Amazon EC2 account eligible to run on us-east-1a zone.
• Amazon AWS console your AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY (under your account name on the top right
corner-> security credentials -> access keys)
• You should have a keypair attached to the zone you are running on (in our
example us-east-1a)
• Install boto. This is the AWS Python client. To install, run: ‘sudo pip boto’.
• Download and install Graphlab as mentioned on next slides.
Satisfying Dependencies on Ubuntu
All the dependencies can be satisfied from the repository:
Below command will install gcc , jdk need to compile graphlab Programs:
Downloading GraphLab version 2.2
You can download GraphLab directly from our Github Repository.
Github also offers a zip download of the repository if you do not have
git. The git command line for cloning the repository is:
Compiling and Running Graphlab
In the graphlabapi directory, will create two sub-directories, release/ and
debug/ . cd into either of these directories and running make will build the
release or the debug versions respectively. Note that this will compile all of
GraphLab, including all toolkits.
Running Stochastic gradient descent (SGD) in
Collaborative Filtering toolkit
The collaborative filtering toolkit contains tools for computing a linear model
of the data, and predicting missing values based on this linear model. This is
useful when computing recommendations for users
http://docs.graphlab.org/collaborative_filtering.html
Running SGD for Netflix Data to predict
User Rating
Input File (Training) for Netflix Data
[User] [item] [rating]
1000 2 5.0
3 7 12.0
6 2 2.1
Creating Directory to load Netflix data
Command Line Arguments to Run SGD
--gamma=XX Gradient descent step size
--lambda=XX Gradient descent regularization
--step_dec=XX Multiplicative step decrease. Should be between 0.1
to 1. Default is 0.9.
--D=X Feature vector width. Common values are 20 - 150.
--max_iter=XX Max number of iterations
--maxval=XX Maximum allowed rating
--minval=XX Min allowed rating
--predictions=XX File name to write prediction to. Note that you will
need a user/item pair input file named something. predict to enable
predictions (see section: ratings).
--tol=XX Stop computation when absolute error of prediction is less
than tolerance. Default is 1e-3.
O/P file
SGD is a simple gradient descent algorithm. Prediction in SGD is
done as : r_ui = p_u * q_i Where r_ui is a scalar rating of user u to
item i, and p_u is the user feature vector of size D, q_i is the item
feature vector of size D and the product is a vector product.
Creating a GraphLab project
 Create a GraphLab project, simply create a sub-
directory in the graphlab/apps/ folder with your
project Name.
 For instance,
graphlab/apps/my_first_GraphLabProject.
 Create a text file called CMakeLists.txt with the
following contents ::
project(My_GraphLabProject)
add_graphlab_executable(my_first_GraphLabProject <ProgramName>.cpp)
Hello World in GraphLab
#include <graphlab.hpp>
using namespace graphlab;
#include <graphlab.hpp>
int main(int argc, char** argv)
{
graphlab::mpi_tools::init(argc, argv);
graphlab::distributed_control dc;
dc.cout() << "Hello World!n";
graphlab::mpi_tools::finalize();
}
• dc is the distributed communication layer which is needed by a number of
the core GraphLab objects, whether you are running distributed or not
• To create the program run the configure script, than run "make" in the
•debug/ release/ build folders. The program when executed, will print "Hello
World!".
Thank You
References ::
http://graphlab.com/community/events/conference14.html
http://graphlab.com/learn/notebooks/introduction_to_sframes.html
http://en.wikipedia.org/wiki/GraphLab
https://www.youtube.com/watch?v=lRN91_-hlkg
https://wiki.engr.illinois.edu/download/attachments/227740647/GraphLab
.pdf?version=1&modificationDate=1382500521000#page=1&zoom=auto,
0,280
http://arxiv.org/pdf/1204.6078v1.pdf
http://select.cs.cmu.edu/code/graphlab/doxygen/html/index.html

More Related Content

What's hot

A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat DetectionJen Aman
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O Sri Ambati
 
Parallel External Memory Algorithms Applied to Generalized Linear Models
Parallel External Memory Algorithms Applied to Generalized Linear ModelsParallel External Memory Algorithms Applied to Generalized Linear Models
Parallel External Memory Algorithms Applied to Generalized Linear ModelsRevolution Analytics
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...DB Tsai
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCAapo Kyrölä
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsJen Aman
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)Lynn Cherny
 
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defenseLarge-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defenseAapo Kyrölä
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduceVarad Meru
 
Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRDatabricks
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Spark Summit
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsAntonio Severien
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Ram Sriharsha
 
Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Ram Sriharsha
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXBenjamin Bengfort
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...Spark Summit
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Doug Needham
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkDB Tsai
 
GraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkGraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkAshutosh Trivedi
 

What's hot (20)

A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O
 
Parallel External Memory Algorithms Applied to Generalized Linear Models
Parallel External Memory Algorithms Applied to Generalized Linear ModelsParallel External Memory Algorithms Applied to Generalized Linear Models
Parallel External Memory Algorithms Applied to Generalized Linear Models
 
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable Statistics
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)
 
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defenseLarge-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 
Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkR
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
 
Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017Magellan FOSS4G Talk, Boston 2017
Magellan FOSS4G Talk, Boston 2017
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkX
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache Spark
 
GraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkGraphX and Pregel - Apache Spark
GraphX and Pregel - Apache Spark
 

Viewers also liked

Hakin9 nmap-ebook-ch1
Hakin9 nmap-ebook-ch1Hakin9 nmap-ebook-ch1
Hakin9 nmap-ebook-ch1Lalad
 
Graphlab under the hood
Graphlab under the hoodGraphlab under the hood
Graphlab under the hoodZuhair khayyat
 
Machine Learning in the Cloud with GraphLab
Machine Learning in the Cloud with GraphLabMachine Learning in the Cloud with GraphLab
Machine Learning in the Cloud with GraphLabDanny Bickson
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processingjins0618
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013Amazon Web Services
 
Jeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, AdaptrisJeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, AdaptrisMLconf
 
Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - GraphlabAmir Payberah
 
Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXAmir Payberah
 

Viewers also liked (10)

Hakin9 nmap-ebook-ch1
Hakin9 nmap-ebook-ch1Hakin9 nmap-ebook-ch1
Hakin9 nmap-ebook-ch1
 
Graphlab under the hood
Graphlab under the hoodGraphlab under the hood
Graphlab under the hood
 
Machine Learning in the Cloud with GraphLab
Machine Learning in the Cloud with GraphLabMachine Learning in the Cloud with GraphLab
Machine Learning in the Cloud with GraphLab
 
GraphLab
GraphLabGraphLab
GraphLab
 
PowerGraph
PowerGraphPowerGraph
PowerGraph
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
 
Jeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, AdaptrisJeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, Adaptris
 
Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - Graphlab
 
Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphX
 

Similar to CS267_Graph_Lab

R programming for data science
R programming for data scienceR programming for data science
R programming for data scienceSovello Hildebrand
 
Tech Talk - Overview of Dash framework for building dashboards
Tech Talk - Overview of Dash framework for building dashboardsTech Talk - Overview of Dash framework for building dashboards
Tech Talk - Overview of Dash framework for building dashboardsAppsilon Data Science
 
Polyline download and visualization over terrain models
Polyline download and visualization over terrain modelsPolyline download and visualization over terrain models
Polyline download and visualization over terrain modelsgraphitech
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfishFei Dong
 
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal Srivatsan Ramanujam
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsYash Khandelwal
 
Vipul divyanshu mahout_documentation
Vipul divyanshu mahout_documentationVipul divyanshu mahout_documentation
Vipul divyanshu mahout_documentationVipul Divyanshu
 
1 Project 2 Introduction - the SeaPort Project seri.docx
1  Project 2 Introduction - the SeaPort Project seri.docx1  Project 2 Introduction - the SeaPort Project seri.docx
1 Project 2 Introduction - the SeaPort Project seri.docxhoney725342
 
Reproducible Research in R and R Studio
Reproducible Research in R and R StudioReproducible Research in R and R Studio
Reproducible Research in R and R StudioSusan Johnston
 
Decrease build time and application size
Decrease build time and application sizeDecrease build time and application size
Decrease build time and application sizeKeval Patel
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKzmhassan
 
Gradle(the innovation continues)
Gradle(the innovation continues)Gradle(the innovation continues)
Gradle(the innovation continues)Sejong Park
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Rusif Eyvazli
 
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...InfluxData
 
Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBArangoDB Database
 

Similar to CS267_Graph_Lab (20)

R programming for data science
R programming for data scienceR programming for data science
R programming for data science
 
02 c++g3 d (1)
02 c++g3 d (1)02 c++g3 d (1)
02 c++g3 d (1)
 
Tech Talk - Overview of Dash framework for building dashboards
Tech Talk - Overview of Dash framework for building dashboardsTech Talk - Overview of Dash framework for building dashboards
Tech Talk - Overview of Dash framework for building dashboards
 
Polyline download and visualization over terrain models
Polyline download and visualization over terrain modelsPolyline download and visualization over terrain models
Polyline download and visualization over terrain models
 
GCF
GCFGCF
GCF
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfish
 
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithms
 
Vipul divyanshu mahout_documentation
Vipul divyanshu mahout_documentationVipul divyanshu mahout_documentation
Vipul divyanshu mahout_documentation
 
1 Project 2 Introduction - the SeaPort Project seri.docx
1  Project 2 Introduction - the SeaPort Project seri.docx1  Project 2 Introduction - the SeaPort Project seri.docx
1 Project 2 Introduction - the SeaPort Project seri.docx
 
Reproducible Research in R and R Studio
Reproducible Research in R and R StudioReproducible Research in R and R Studio
Reproducible Research in R and R Studio
 
Decrease build time and application size
Decrease build time and application sizeDecrease build time and application size
Decrease build time and application size
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
Gradle(the innovation continues)
Gradle(the innovation continues)Gradle(the innovation continues)
Gradle(the innovation continues)
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...
 
Potter’S Wheel
Potter’S WheelPotter’S Wheel
Potter’S Wheel
 
Vedic Calculator
Vedic CalculatorVedic Calculator
Vedic Calculator
 
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
 
Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDB
 
Final Algos
Final AlgosFinal Algos
Final Algos
 

Recently uploaded

Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Recently uploaded (20)

Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

CS267_Graph_Lab

  • 1. By :: Jaideep Katkar Under the Guidance of :: Dr. Tran Thanh
  • 2. GraphLab Overview A New Framework For Parallel Machine Learning – high-level abstractions for machine learning problems – Shared-memory multiprocessor – Assume no fault tolerance needed – Concurrent access precessing models with sequential-consistency guarantees
  • 3. How GraphLab Works? – Represent the user's data by a directed graph – Each block of data is represented by a vertex and a directed edge – Shared data table – User functions:  Update: modify the vertex and edges state, read only to shared table  Fold: sequential aggregation to a key entry in the shared table, modify vertex data  Merge: Parallelize Fold function  Apply: Finalize the key entry in the shared table
  • 5. GraphLab Toolkit  Topic Modeling contains applications like LDA which can be used to cluster documents and extract topical representations.  Graph Analytics contains application like pagerank and triangle counting which can be applied to general graphs to estimate community structure.  Clustering contains standard data clustering tools such as Kmeans  Collaborative Filtering contains a collection of applications used to make predictions about users interests and factorize large matrices.  Graphical Models contains tools for making joint predictions about collections of related random variables.  Computer Vision contains a collection of tools for reasoning about images.
  • 6. Running GraphLab on EC2 Cluster Requirements :: • You should have Amazon EC2 account eligible to run on us-east-1a zone. • Amazon AWS console your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (under your account name on the top right corner-> security credentials -> access keys) • You should have a keypair attached to the zone you are running on (in our example us-east-1a) • Install boto. This is the AWS Python client. To install, run: ‘sudo pip boto’. • Download and install Graphlab as mentioned on next slides.
  • 7. Satisfying Dependencies on Ubuntu All the dependencies can be satisfied from the repository: Below command will install gcc , jdk need to compile graphlab Programs: Downloading GraphLab version 2.2 You can download GraphLab directly from our Github Repository. Github also offers a zip download of the repository if you do not have git. The git command line for cloning the repository is:
  • 8. Compiling and Running Graphlab In the graphlabapi directory, will create two sub-directories, release/ and debug/ . cd into either of these directories and running make will build the release or the debug versions respectively. Note that this will compile all of GraphLab, including all toolkits.
  • 9. Running Stochastic gradient descent (SGD) in Collaborative Filtering toolkit The collaborative filtering toolkit contains tools for computing a linear model of the data, and predicting missing values based on this linear model. This is useful when computing recommendations for users http://docs.graphlab.org/collaborative_filtering.html
  • 10. Running SGD for Netflix Data to predict User Rating Input File (Training) for Netflix Data [User] [item] [rating] 1000 2 5.0 3 7 12.0 6 2 2.1 Creating Directory to load Netflix data
  • 11. Command Line Arguments to Run SGD --gamma=XX Gradient descent step size --lambda=XX Gradient descent regularization --step_dec=XX Multiplicative step decrease. Should be between 0.1 to 1. Default is 0.9. --D=X Feature vector width. Common values are 20 - 150. --max_iter=XX Max number of iterations --maxval=XX Maximum allowed rating --minval=XX Min allowed rating --predictions=XX File name to write prediction to. Note that you will need a user/item pair input file named something. predict to enable predictions (see section: ratings). --tol=XX Stop computation when absolute error of prediction is less than tolerance. Default is 1e-3.
  • 12.
  • 13. O/P file SGD is a simple gradient descent algorithm. Prediction in SGD is done as : r_ui = p_u * q_i Where r_ui is a scalar rating of user u to item i, and p_u is the user feature vector of size D, q_i is the item feature vector of size D and the product is a vector product.
  • 14. Creating a GraphLab project  Create a GraphLab project, simply create a sub- directory in the graphlab/apps/ folder with your project Name.  For instance, graphlab/apps/my_first_GraphLabProject.  Create a text file called CMakeLists.txt with the following contents :: project(My_GraphLabProject) add_graphlab_executable(my_first_GraphLabProject <ProgramName>.cpp)
  • 15. Hello World in GraphLab #include <graphlab.hpp> using namespace graphlab; #include <graphlab.hpp> int main(int argc, char** argv) { graphlab::mpi_tools::init(argc, argv); graphlab::distributed_control dc; dc.cout() << "Hello World!n"; graphlab::mpi_tools::finalize(); } • dc is the distributed communication layer which is needed by a number of the core GraphLab objects, whether you are running distributed or not • To create the program run the configure script, than run "make" in the •debug/ release/ build folders. The program when executed, will print "Hello World!".