Machine Learning on Graphs

Joseph Gonzalez
Co-Founder, GraphLab Inc.
joseph@graphlab.com
Postdoc, UC Berkeley AMPLab
jegonzal@eecs.berkeley.edu
Big Data
Graphs
More Signal, More Noise
Social Media

Science

Advertising

Web

Graphs encode relationships between:

People

Products
Ideas
Facts
Interests

Big: billions of vertices and edges & rich metadata
Facebook (10/2012): 1B users, 144B friendships
Twitter (2011): 15B follower edges
Graphs are Essential to Data Mining and Machine Learning



Identify influential people and information
Find communities
Understand people’s shared interests
Model complex data dependencies
Predicting User Behavior
[Figure: a social graph of users and their posts. A few users are labeled Liberal or Conservative; the remaining users are unknown (?).]
Conditional Random Field
Belief Propagation
  
Finding Communities
Count triangles passing through each vertex:
[Figure: example graph with vertices 1, 2, 3, and 4]
Measures “cohesiveness” of local community

Fewer Triangles
Weaker Community

More Triangles
Stronger Community
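A minimal sketch of per-vertex triangle counting, assuming an undirected graph given as a list of edge pairs. The function name `triangles_per_vertex` and the four-vertex example are illustrative and not taken from the slides.

```python
from itertools import combinations


def triangles_per_vertex(edges):
    """Count the triangles passing through each vertex of an undirected graph."""
    nbrs = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    counts = {}
    for v, adj in nbrs.items():
        # A triangle through v is a pair of v's neighbors that are adjacent.
        counts[v] = sum(1 for a, b in combinations(adj, 2) if b in nbrs[a])
    return counts


# Hypothetical example: triangle 1-2-3 plus a pendant edge 3-4.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
triangle_counts = triangles_per_vertex(example_edges)
```

Each count depends only on a vertex and its neighbors, which is why the computation fits the vertex-centric style discussed later in the deck.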
Recommending Products
Users


Ratings


Items
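Recommending Products
Low-Rank Matrix Factorization:
Netflix ratings matrix (Users × Movies) ≈ User Factors (U) × Movie Factors (M)
[Figure: bipartite graph with user factor vertices f(1), f(2) and movie factor vertices f(3), f(4), f(5), connected by rating edges r13, r14, r24, r25]
Iterate:
$$f[i] = \arg\min_{w \in \mathbb{R}^d} \sum_{j \in \mathrm{Nbrs}(i)} \left( r_{ij} - w^T f[j] \right)^2 + \|w\|_2^2$$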
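One way to sketch the per-vertex update is to solve its normal equations, (Σ f[j] f[j]^T + λI) w = Σ r_ij f[j]. The helper names `solve_linear` and `als_update`, the parameter `lam` (set to 1 to match the unweighted penalty on the slide), and the toy factors are assumptions made for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def als_update(ratings, nbr_factors, lam=1.0):
    """Return argmin_w sum_j (r_ij - w . f[j])^2 + lam * ||w||^2.

    ratings:     {j: r_ij} for the neighbors j of vertex i
    nbr_factors: {j: f[j]} current factor vectors of those neighbors
    """
    d = len(next(iter(nbr_factors.values())))
    A = [[lam if a == b else 0.0 for b in range(d)] for a in range(d)]
    rhs = [0.0] * d
    for j, r in ratings.items():
        f = nbr_factors[j]
        for a in range(d):
            rhs[a] += r * f[a]
            for b in range(d):
                A[a][b] += f[a] * f[b]
    return solve_linear(A, rhs)


# Hypothetical user 1 who rated movies 3 and 4 (r13 = 4, r14 = 2).
movie_factors = {3: [1.0, 0.0], 4: [0.0, 1.0]}
user_factor = als_update({3: 4.0, 4: 2.0}, movie_factors)
```

Alternating this update between user vertices and movie vertices gives the Alternating Least Squares scheme; each update reads only the factors of adjacent vertices.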
Identifying Leaders
$$R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j]$$
R[i]: rank of user i
Sum: weighted sum of neighbors’ ranks

Everyone starts with equal ranks
Update ranks in parallel 
Iterate until convergence
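The three steps above can be sketched as a synchronous loop. The slide does not define the weights, so this sketch assumes w_ji = 0.85 / out_degree(j); the function name `pagerank` and the example graphs are also illustrative.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iters=1000):
    """Iterate R[i] = (1 - damping) + sum_j w_ji * R[j] until ranks settle."""
    vertices = set(out_links) | {v for ts in out_links.values() for v in ts}

    # Assumed weights: w_ji = damping / out_degree(j) for each link j -> i.
    in_nbrs = {v: [] for v in vertices}
    for j, targets in out_links.items():
        for i in targets:
            in_nbrs[i].append((j, damping / len(targets)))

    ranks = {v: 1.0 for v in vertices}  # everyone starts with equal ranks
    for _ in range(max_iters):
        # All new ranks are computed from the old ranks (parallel update).
        new_ranks = {
            i: (1 - damping) + sum(w * ranks[j] for j, w in in_nbrs[i])
            for i in vertices
        }
        delta = max(abs(new_ranks[v] - ranks[v]) for v in vertices)
        ranks = new_ranks
        if delta < tol:  # iterate until convergence
            break
    return ranks


# Hypothetical examples: a 3-cycle, and two users who both follow "c".
cycle_ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
star_ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
```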
Graph-Parallel Algorithms

Model / Alg. State
Computation depends only on the neighbors
  
Many More Graph Algorithms
•  Collaborative Filtering
–  Alternating Least Squares
–  Stochastic Gradient Descent
–  Tensor Factorization
–  SVD

•  Structured Prediction
–  Loopy Belief Propagation
–  Max-Product Linear Programs
–  Gibbs Sampling

•  Semi-supervised ML
–  Graph SSL
–  CoEM

•  Graph Analytics
–  PageRank
–  Shortest Path
–  Triangle-Counting
–  Graph Coloring
–  K-core Decomposition
–  Personalized PageRank

•  Classification
–  Neural Networks
–  Lasso
…
How should we program graph-parallel algorithms?
Structure of Computation
Data-Parallel: Table (Row, Row, Row, Row) → Result
Graph-Parallel: Dependency Graph
How should we program graph-parallel algorithms?
“Think like a Vertex.”
- Pregel [SIGMOD’10]
The Graph-Parallel Abstraction
A user-defined Vertex-Program runs on each vertex
Graph constrains interaction along edges:
–  Using messages (e.g., Pregel [PODC’09, SIGMOD’10])
–  Through shared state (e.g., GraphLab [UAI’10, VLDB’12])










Parallelism: run multiple vertex programs simultaneously
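As an illustration of the message-based style, here is a small sketch of synchronous supersteps computing single-source shortest paths. It is not Pregel's actual API: the function name, the `inbox`/`outbox` dictionaries, and the example graph are assumptions, and edge weights are assumed non-negative.

```python
def pregel_shortest_paths(edges, source):
    """Message-passing shortest paths; edges is {u: [(v, weight), ...]}."""
    vertices = set(edges) | {v for nbrs in edges.values() for v, _ in nbrs}
    dist = {v: float("inf") for v in vertices}
    inbox = {source: [0.0]}  # messages to deliver in the next superstep

    while inbox:
        outbox = {}
        # Each vertex program reads only its own state and its messages,
        # so the vertices in one superstep could run simultaneously.
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                for nbr, weight in edges.get(v, []):
                    outbox.setdefault(nbr, []).append(best + weight)
        inbox = outbox  # barrier between supersteps
    return dist


# Hypothetical weighted graph.
example_graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
shortest = pregel_shortest_paths(example_graph, "a")
```

Interaction happens only along edges: a vertex never reads another vertex's state directly, it only receives what its neighbors sent in the previous superstep.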

The GraphLab Vertex Program	

Vertex Programs directly access adjacent vertices and edges	

GraphLab_PageRank(i)
  // Compute sum over neighbors
  total = 0
  foreach (j in neighbors(i)):
    total = total + R[j] * wji

  // Update the PageRank
  R[i] = 0.15 + total

  // Trigger neighbors to run again
  if R[i] not converged then
    signal nbrsOf(i) to be recomputed
  

[Figure: vertex 1 sums the contributions of its neighbors 2, 3, and 4, for example R[4] * w41]
Signaled vertices are recomputed eventually.
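A runnable sketch of the signaling idea, under stated assumptions: a FIFO queue stands in for the scheduler, weights are w_ji = 0.85 / out_degree(j), and a vertex signals the vertices it links to, since those are the ones whose sums depend on it. The function name `dynamic_pagerank` and the tolerance are illustrative, not GraphLab's API.

```python
from collections import deque


def dynamic_pagerank(out_links, damping=0.85, tol=1e-6):
    """Recompute a vertex only when one of its in-neighbors changed."""
    vertices = set(out_links) | {v for ts in out_links.values() for v in ts}
    out_nbrs = {v: list(out_links.get(v, [])) for v in vertices}
    in_nbrs = {v: [] for v in vertices}
    for j in vertices:
        for i in out_nbrs[j]:
            in_nbrs[i].append(j)

    rank = {v: 1.0 for v in vertices}
    updates = {v: 0 for v in vertices}
    queue = deque(sorted(vertices))  # every vertex is scheduled once
    queued = set(vertices)

    while queue:
        i = queue.popleft()
        queued.discard(i)
        # Compute sum over neighbors (assumed w_ji = damping / out_degree(j)).
        total = sum(rank[j] * damping / len(out_nbrs[j]) for j in in_nbrs[i])
        old = rank[i]
        rank[i] = (1 - damping) + total
        updates[i] += 1
        # Trigger neighbors to run again if R[i] has not converged.
        if abs(rank[i] - old) > tol:
            for k in out_nbrs[i]:
                if k not in queued:
                    queue.append(k)
                    queued.add(k)
    return rank, updates


# Hypothetical chain a -> b -> c.
chain_rank, chain_updates = dynamic_pagerank({"a": ["b"], "b": ["c"], "c": []})
```

On this chain every vertex is updated exactly once, because each vertex runs after the vertex it depends on. Work is spent only where ranks are still changing.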


Convergence of Dynamic PageRank
[Plot: Num-Vertices (log scale, 1 to 100,000,000) versus Number of Updates (0 to 70); lower is better]
51% updated only once!
  
Adaptive Belief Propagation
Challenge = Boundaries
[Figure panels: Graphical Model; Noisy “Sunset” Image; Cumulative Vertex Updates (ranging from Few Updates to Many Updates); Splash]
Algorithm identifies and focuses on hidden sequential structure

Graph-parallel Abstractions
Better for Machine Learning
Messaging: Synchronous
Shared State: Dynamic Asynchronous
  
Natural Graphs

Graphs derived from natural
phenomena

  
Properties of Natural Graphs

Regular Mesh

Natural Graph

Power-Law Degree Distribution

“Star Like” Motif
President Obama
Followers
Challenges of High-Degree Vertices
Sequentially process edges
Touches a large fraction of graph
CPU 1
CPU 2
Provably Difficult to Partition
  
Random Partitioning
•  GraphLab resorts to random (hashed) partitioning on natural graphs
Random placement is fast and easy to implement, but it cuts most of the edges:
Theorem 5.1. If vertices are randomly assigned to p machines, then the expected fraction of edges cut is
$$\mathbb{E}\left[\frac{|\text{Edges Cut}|}{|E|}\right] = 1 - \frac{1}{p}$$
For example, if just two machines are used, half of the edges will be cut, requiring order |E|/2 communication.
10 Machines → 90% of edges cut
100 Machines → 99% of edges cut!
[Figure: a graph split between Machine 1 and Machine 2]
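The 1 − 1/p figure can be checked empirically with a short simulation. This is a sketch under assumptions: the graph is a synthetic uniform random graph rather than a natural graph, and the names `cut_fraction` and `expected_cut_fraction` are illustrative.

```python
import random


def cut_fraction(edges, num_machines, seed=0):
    """Assign each vertex to a random machine; return the fraction of edges cut."""
    rng = random.Random(seed)
    machine = {}
    for u, v in edges:
        for x in (u, v):
            if x not in machine:
                machine[x] = rng.randrange(num_machines)
    cut = sum(1 for u, v in edges if machine[u] != machine[v])
    return cut / len(edges)


def expected_cut_fraction(num_machines):
    """Expected fraction of edges cut under random vertex placement."""
    return 1 - 1 / num_machines


# Synthetic graph (an assumption): 20,000 random edges over 2,000 vertices.
graph_rng = random.Random(1)
example_edges = []
while len(example_edges) < 20000:
    u, v = graph_rng.randrange(2000), graph_rng.randrange(2000)
    if u != v:
        example_edges.append((u, v))

measured_10 = cut_fraction(example_edges, 10)
measured_100 = cut_fraction(example_edges, 100)
```

The two endpoints of an edge land on the same machine with probability 1/p, so the measured fractions should sit close to 0.90 and 0.99 for 10 and 100 machines.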
Program For This
Run on This
[Figure: a vertex split across Machine 1 and Machine 2]
•  Split High-Degree vertices
•  New Abstraction → Equivalence on Split Vertices
A Common Pattern in Vertex Programs
GraphLab_PageRank(i)
  // Gather Information About Neighborhood
  // Compute sum over neighbors
  total = 0
  foreach (j in neighbors(i)):
    total = total + R[j] * wji

  // Update Vertex
  // Update the PageRank
  R[i] = 0.15 + total

  // Signal Neighbors & Modify Edge Data
  // Trigger neighbors to run again
  priority = |R[i] – oldR[i]|
  if R[i] not converged then
    signal neighbors(i) with priority
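The three annotated phases can be separated into methods, as in the following sketch. The class `PageRankGAS`, the driver `run_gas`, and the method name `combine` are hypothetical and do not reproduce PowerGraph's API; the driver is a single-machine stand-in that ignores priorities.

```python
from functools import reduce


class PageRankGAS:
    """PageRank vertex program split into gather, apply, and scatter."""

    def __init__(self, tol=1e-6):
        self.tol = tol

    def gather(self, rank_j, w_ji):
        # Gather: contribution of one in-neighbor.
        return rank_j * w_ji

    def combine(self, a, b):
        # Gathered values are merged with a commutative, associative sum.
        return a + b

    def apply(self, total):
        # Apply: update the vertex value from the combined sum.
        return 0.15 + total

    def scatter(self, old_rank, new_rank):
        # Scatter: decide whether to signal the neighbors.
        return abs(new_rank - old_rank) > self.tol


def run_gas(program, in_edges, ranks, max_sweeps=1000):
    """in_edges: {i: [(j, w_ji), ...]}; every vertex must be a key of ranks."""
    out_nbrs = {v: [] for v in ranks}
    for i, edges in in_edges.items():
        for j, _ in edges:
            out_nbrs[j].append(i)

    active = set(ranks)
    for _ in range(max_sweeps):
        if not active:
            break
        next_active = set()
        for i in sorted(active):
            partials = [program.gather(ranks[j], w) for j, w in in_edges.get(i, [])]
            total = reduce(program.combine, partials, 0.0)
            old = ranks[i]
            ranks[i] = program.apply(total)
            if program.scatter(old, ranks[i]):
                next_active.update(out_nbrs[i])
        active = next_active
    return ranks


# Hypothetical two-vertex cycle with w_ab = w_ba = 0.85.
gas_ranks = run_gas(
    PageRankGAS(),
    {"a": [("b", 0.85)], "b": [("a", 0.85)]},
    {"a": 0.0, "b": 0.0},
)
```

Because `combine` is commutative and associative, the gather phase can be evaluated as partial sums in any grouping, which is the property a distributed engine needs to split one vertex's edges across machines.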
  
	
  
  
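GAS Decomposition
Gather, Apply, Scatter
[Figure: vertex Y spans Machines 1 to 4, with one Master copy and three Mirrors. Gather: the machines compute partial sums Σ1, Σ2, Σ3, Σ4, which are combined as Σ = Σ1 + Σ2 + Σ3 + Σ4. Apply: Y is updated to Y’. Scatter: Y’ is present on all four machines.]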
  
Minimizing Communication in PowerGraph
Communication is linear in the number of machines each vertex spans
A vertex-cut minimizes machines each vertex spans
Percolation theory suggests that power law graphs have good vertex cuts. [Albert et al. 2000]
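To make "machines each vertex spans" concrete, the sketch below places edges on machines and measures the average span (often called the replication factor). Random edge placement is an assumption chosen for simplicity, not the placement strategy of PowerGraph, and the function name is illustrative.

```python
import random


def replication_factor(edges, num_machines, seed=0):
    """Place each edge on a random machine; return the mean machines per vertex."""
    rng = random.Random(seed)
    spans = {}  # vertex -> set of machines holding at least one of its edges
    for u, v in edges:
        m = rng.randrange(num_machines)  # the edge lives on exactly one machine
        spans.setdefault(u, set()).add(m)
        spans.setdefault(v, set()).add(m)
    return sum(len(s) for s in spans.values()) / len(spans)


# Hypothetical "star like" graph: hub 0 connected to 1,000 leaves.
star_edges = [(0, leaf) for leaf in range(1, 1001)]
star_rf = replication_factor(star_edges, num_machines=4)
```

In the star, no edge is ever cut: only the hub is replicated, on at most four machines, while every leaf stays on a single machine. The average span therefore stays close to 1, whereas a random vertex placement over four machines would be expected to cut about three quarters of the edges.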
Machine Learning and Data-Mining Toolkits
Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering
GraphLab2 System
MPI/TCP-IP, PThreads, HDFS, EC2 HPC Nodes
http://graphlab.org
Apache 2 License
PageRank on Twitter Follower Graph
Natural Graph with 40M Users, 1.4 Billion Links
[Bar chart: Runtime Per Iteration (axis 0 to 200) for Hadoop, GraphLab, Twister, Piccolo, and PowerGraph]
Order of magnitude by exploiting properties of Natural Graphs
Hadoop results from [Kang et al. '11]
Twister (in-memory MapReduce) [Ekanayake et al. '10]
GraphLab2 is Scalable
Yahoo Altavista Web Graph (2002):

One of the largest publicly available web graphs

1.4 Billion Webpages, 6.6 Billion Links


7 Seconds per Iter.
1B links processed per second
64 HPC Nodes
1024 Cores (2048 HT)
30 lines of user code
Topic Modeling
English language Wikipedia 
–  2.6M Documents, 8.3M Words, 500M Tokens

–  Computationally intensive algorithm
[Bar chart: Million Tokens Per Second (axis 0 to 160) for Smola et al. and PowerGraph]
Smola et al.: 100 Yahoo! Machines, specifically engineered for this task
PowerGraph: 64 cc2.8xlarge EC2 Nodes, 200 lines of code & 4 human hours
  
Triangle Counting on Twitter
40M Users, 1.4 Billion Links

Counted: 34.8 Billion Triangles

Hadoop [WWW’11]: 1536 Machines, 423 Minutes
64 Machines, 15 Seconds: 1000 x Faster
S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
By exploiting common patterns in graph data and computation:
New ways to represent real-world graphs
New ways to execute graph algorithms
Orders of magnitude improvements over existing systems
Possibility
Scalability
Usability
Exciting Time to Work in ML
:) Unique opportunities to change the world!!
With ML, I will cure cancer!!!
With ML I will find true love.
Why won’t ML read my mind???
:( Building scalable learning systems requires experts …
But…
Even basics of scalable ML can be challenging
ML key to any new service we want to build
6 months from prototype to production
State-of-art ML algorithms trapped in research papers
Goal of GraphLab 3:
Make large-scale machine learning accessible to all! :)
Adding a Python Layer
Python API
Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering
GraphLab2 System
MPI/TCP-IP, PThreads, EC2 HPC Nodes, HDFS
Learning ML with GraphLab Notebook
https://beta.graphlab.com/examples
Prototype to Production with Python GraphLab:
Easily install and prototype locally
Deploy to the cluster in one step
Learn: GraphLab Notebook
Prototype: pip install graphlab → local prototyping
Production: Same code scales to EC2 cluster
GraphLab Toolkits
Highly scalable, state-of-the-art machine learning straight from Python
Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering
Machine Learning on Graphs
partners@graphlab.com

NIPS Workshop on Big Learning: biglearn.org
Lake Tahoe, December 9th

Joseph Gonzalez
Co-Founder, GraphLab Inc.
joseph@graphlab.com

Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 




Joey Gonzalez, GraphLab, MLconf 2013

  • 1. Machine Learning on Graphs Joseph Gonzalez Co-Founder, GraphLab Inc. joseph@graphlab.com Postdoc, UC Berkeley AMPLab jegonzal@eecs.berkeley.edu
  • 2. Big Data, Graphs: More Signal, More Noise
  • 3. Social Media, Science, Advertising, Web. Graphs encode relationships between people, products, ideas, facts, and interests. Big: billions of vertices and edges, plus rich metadata. Facebook (10/2012): 1B users, 144B friendships. Twitter (2011): 15B follower edges.
  • 4. Graphs are Essential to Data Mining and Machine Learning. Identify influential people and information. Find communities. Understand people’s shared interests. Model complex data dependencies.
  • 5. Predicting User Behavior. [Figure: a social graph of users and their posts; a few users are labeled Liberal or Conservative and the remaining users are unknown (?).] Conditional Random Field. Belief Propagation.
  • 6. Finding Communities. Count triangles passing through each vertex. [Figure: example graph with vertices 1 to 4.] Measures “cohesiveness” of local community. Fewer triangles: weaker community. More triangles: stronger community.
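The per-vertex triangle count described on this slide can be sketched in a few lines of plain Python. This is only an illustration on a single machine, not GraphLab code; the function name `triangles_per_vertex` and the 4-vertex example graph are assumptions made for the sketch.

```python
from itertools import combinations

def triangles_per_vertex(edges):
    """Count, for every vertex, the triangles that pass through it."""
    # Build an undirected adjacency map: vertex -> set of neighbors.
    nbrs = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    counts = {}
    for v, adj in nbrs.items():
        # A triangle through v is a pair of v's neighbors that are adjacent.
        counts[v] = sum(1 for a, b in combinations(adj, 2) if b in nbrs[a])
    return counts

# Hypothetical example: triangle 1-2-3 plus a pendant edge 3-4.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
example_counts = triangles_per_vertex(example_edges)
```

Vertices with higher counts sit in more tightly knit neighborhoods, which is the "cohesiveness" signal the slide refers to.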
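  • 8. Recommending Products. Low-rank matrix factorization: the Netflix ratings matrix (users × movies) is approximated by the product of user factors (U) and movie factors (M). [Figure: bipartite graph with factors f(1) to f(5) on the vertices and ratings r13, r14, r24, r25 on the edges.] Iterate: $f[i] = \arg\min_{w \in \mathbb{R}^d} \sum_{j \in \mathrm{Nbrs}(i)} \left(r_{ij} - w^T f[j]\right)^2 + \|w\|_2^2$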
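The per-vertex minimization on this slide is a small regularized least-squares problem, which can be solved through its normal equations, (Σ_j f[j] f[j]^T + reg·I) w = Σ_j r_ij f[j]. The sketch below assumes that reading. The names `solve_linear` and `als_update` and the `reg` parameter are illustrative, with `reg=1.0` matching the unweighted penalty shown on the slide.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def als_update(ratings, factors, reg=1.0):
    """Return the new factor vector for one vertex.

    ratings: {neighbor j: r_ij}
    factors: {neighbor j: f[j] as a list of length d}
    """
    d = len(next(iter(factors.values())))
    # Left-hand side starts as reg * I; right-hand side starts at zero.
    a = [[reg if p == q else 0.0 for q in range(d)] for p in range(d)]
    b = [0.0] * d
    for j, r in ratings.items():
        f = factors[j]
        for p in range(d):
            b[p] += r * f[p]
            for q in range(d):
                a[p][q] += f[p] * f[q]
    return solve_linear(a, b)

# Hypothetical example: user 1 rated movies 3 and 4 (compare r13 and r14).
new_factor = als_update({3: 5.0, 4: 1.0}, {3: [1.0, 0.0], 4: [0.0, 1.0]})
```

Alternating this update between user vertices and movie vertices gives the alternating least squares scheme listed later in the deck.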
  • 10. Identifying Leaders. $R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j]$, where $R[i]$ is the rank of user $i$ and the sum is the weighted sum of the neighbors’ ranks. Everyone starts with equal ranks. Update ranks in parallel. Iterate until convergence.
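The three steps on this slide (equal starting ranks, parallel update, iterate until convergence) can be sketched as a synchronous loop. The function name, the `in_edges` layout, the tolerance, and the edge weight 0.85 in the example are assumptions for illustration; the slide does not specify how the weights w_ji are chosen.

```python
def pagerank_sync(in_edges, num_iters=50, tol=1e-6):
    """in_edges: {i: {j: w_ji}} maps each vertex to its weighted in-neighbors."""
    ranks = {i: 1.0 for i in in_edges}  # everyone starts with equal ranks
    for _ in range(num_iters):
        # Compute every new rank from the old ranks, as a parallel update would.
        new_ranks = {
            i: 0.15 + sum(w * ranks[j] for j, w in nbrs.items())
            for i, nbrs in in_edges.items()
        }
        change = max(abs(new_ranks[i] - ranks[i]) for i in ranks)
        ranks = new_ranks
        if change < tol:
            break  # converged
    return ranks

# Hypothetical two-vertex graph with a single edge a -> b of weight 0.85.
example_ranks = pagerank_sync({"a": {}, "b": {"a": 0.85}})
```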
  • 11. Graph-Parallel Algorithms. Model / algorithm state. Computation depends only on the neighbors.
  • 12. Many More Graph Algorithms
      – Collaborative Filtering: Alternating Least Squares, Stochastic Gradient Descent, Tensor Factorization, SVD
      – Structured Prediction: Loopy Belief Propagation, Max-Product Linear Programs, Gibbs Sampling
      – Semi-supervised ML: Graph SSL, CoEM
      – Graph Analytics: PageRank, Shortest Path, Triangle-Counting, Graph Coloring, K-core Decomposition, Personalized PageRank
      – Classification: Neural Networks, Lasso
      …
  • 13. How should we program graph-parallel algorithms?
  • 14. Structure of Computation. Data-parallel: a table of rows, each row mapped to a result. Graph-parallel: a dependency graph.
  • 15. How should we program graph-parallel algorithms? “Think like a Vertex.” - Pregel [SIGMOD’10]
  • 16. The Graph-Parallel Abstraction. A user-defined vertex-program runs on each vertex. The graph constrains interaction along edges: using messages (e.g., Pregel [PODC’09, SIGMOD’10]) or through shared state (e.g., GraphLab [UAI’10, VLDB’12]). Parallelism: run multiple vertex programs simultaneously.
  • 17. The GraphLab Vertex Program. Vertex programs directly access adjacent vertices and edges.
      GraphLab_PageRank(i)
        // Compute sum over neighbors
        total = 0
        foreach (j in neighbors(i)):
          total = total + R[j] * wji
        // Update the PageRank
        R[i] = 0.15 + total
        // Trigger neighbors to run again
        if R[i] not converged then
          signal nbrsOf(i) to be recomputed
    [Figure: vertex 4 sums contributions such as R[4] * w41 from its neighbors.] Signaled vertices are recomputed eventually.
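A runnable approximation of this vertex program, using a simple FIFO work queue in place of GraphLab's scheduler, might look as follows. It is a single-threaded sketch under stated assumptions: the names `pagerank_dynamic`, `in_edges`, and `tol` are illustrative, "not converged" is taken to mean the rank changed by more than `tol`, and signaling is modeled by re-enqueueing the vertices whose sum reads the updated rank.

```python
from collections import deque

def pagerank_dynamic(in_edges, tol=1e-6):
    """in_edges: {i: {j: w_ji}}. Returns (ranks, number of updates per vertex)."""
    # out_nbrs[j] lists the vertices whose rank depends on R[j].
    out_nbrs = {v: [] for v in in_edges}
    for i, nbrs in in_edges.items():
        for j in nbrs:
            out_nbrs[j].append(i)
    ranks = {v: 1.0 for v in in_edges}
    updates = {v: 0 for v in in_edges}
    queue = deque(in_edges)       # initially every vertex is scheduled
    scheduled = set(in_edges)
    while queue:
        i = queue.popleft()
        scheduled.discard(i)
        old = ranks[i]
        # Compute sum over neighbors and update the PageRank.
        ranks[i] = 0.15 + sum(w * ranks[j] for j, w in in_edges[i].items())
        updates[i] += 1
        # Trigger dependent vertices to run again only if R[i] moved.
        if abs(ranks[i] - old) > tol:
            for k in out_nbrs[i]:
                if k not in scheduled:
                    scheduled.add(k)
                    queue.append(k)
    return ranks, updates

# Hypothetical chain a -> b -> c with edge weight 0.85.
chain = {"a": {}, "b": {"a": 0.85}, "c": {"b": 0.85}}
chain_ranks, chain_updates = pagerank_dynamic(chain)
```

Because only vertices whose inputs changed are rescheduled, vertices in quiet parts of the graph are updated very few times.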
  • 18. Convergence of Dynamic PageRank. [Plot: number of vertices (log scale, 1 to 100,000,000) against number of updates (0 to 70).] 51% of vertices updated only once!
  • 19. Adaptive Belief Propagation. Challenge = boundaries. [Figure: graphical model for a noisy “Sunset” image and a map of cumulative vertex updates under Splash, ranging from few updates to many updates.] Algorithm identifies and focuses on hidden sequential structure.
  • 20. Graph-parallel Abstractions. Messaging: synchronous. Shared state: dynamic, asynchronous. Better for machine learning.
  • 21. Natural Graphs: graphs derived from natural phenomena
  • 22. Properties of Natural Graphs. Regular mesh versus natural graph. Power-law degree distribution.
  • 23. Power-Law Degree Distribution. “Star-like” motif, e.g., President Obama and his followers.
  • 24. Challenges of High-Degree Vertices. Sequentially process edges. Touches a large fraction of the graph. Provably difficult to partition.
  • 25. Random Partitioning. GraphLab resorts to random (hashed) partitioning on natural graphs. While fast and easy to implement, random placement cuts most of the edges. Theorem 5.1: if vertices are randomly assigned to $p$ machines, then the expected fraction of edges cut is $\mathbb{E}\left[\frac{|\text{Edges Cut}|}{|E|}\right] = 1 - \frac{1}{p}$. For example, if just two machines are used, half of the edges will be cut, requiring order $|E|/2$ communication. 10 machines → 90% of edges cut. 100 machines → 99% of edges cut!
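The 90% and 99% figures follow directly from the 1 − 1/p expression, and a small simulation gives a sanity check. The sketch below is illustrative only: the function names, the random test graph, and the seeds are assumptions, and a seeded random assignment stands in for hashing.

```python
import random

def expected_cut_fraction(p):
    """Expected fraction of edges cut when vertices are placed at random."""
    return 1.0 - 1.0 / p

def simulated_cut_fraction(edges, p, seed=0):
    """Assign each vertex to a random machine and measure the cut edges."""
    rng = random.Random(seed)
    machine = {}
    cut = 0
    for u, v in edges:
        for x in (u, v):
            if x not in machine:
                machine[x] = rng.randrange(p)
        if machine[u] != machine[v]:
            cut += 1  # the endpoints live on different machines
    return cut / len(edges)

# Hypothetical test graph: 20,000 random edges over 2,000 vertices.
_rng = random.Random(1)
demo_edges = [tuple(_rng.sample(range(2000), 2)) for _ in range(20000)]
demo_fraction = simulated_cut_fraction(demo_edges, p=10)
```

With p = 10 the simulated fraction should land close to the predicted 0.9, and the gap between prediction and simulation shrinks as the graph grows.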
  • 26. Program for this (a single high-degree vertex); run on this (the vertex split across Machine 1 and Machine 2). Split high-degree vertices. New abstraction → equivalence on split vertices.
  • 27. A Common Pattern in Vertex Programs
      GraphLab_PageRank(i)
        // Gather information about neighborhood: compute sum over neighbors
        total = 0
        foreach (j in neighbors(i)):
          total = total + R[j] * wji
        // Update vertex: update the PageRank
        R[i] = 0.15 + total
        // Signal neighbors & modify edge data: trigger neighbors to run again
        priority = |R[i] – oldR[i]|
        if R[i] not converged then
          signal neighbors(i) with priority
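  • 28. GAS Decomposition: Gather, Apply, Scatter. [Figure: one vertex split across Machines 1 to 4, with a master copy and three mirrors. Gather: each machine computes a partial sum Σ1 to Σ4, and these are added into Σ at the master. Apply: the master updates the vertex value Y to Y’. Scatter: Y’ is copied to the mirrors.]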
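For PageRank, the Gather, Apply, Scatter phases on this slide can be imitated in one process by keeping a separate edge list per "machine". This is a sketch of the pattern only, not the PowerGraph API; `gas_pagerank_step`, the `(j, i, w_ji)` edge layout, and the example weights are assumptions.

```python
def gas_pagerank_step(i, ranks, edge_partitions):
    """One Gather-Apply-Scatter step for vertex i.

    edge_partitions: one list per machine of (j, i, w_ji) in-edges stored there.
    """
    # Gather: each machine sums over the edges of i that it holds locally.
    partial_sums = [
        sum(w * ranks[j] for (j, dst, w) in part if dst == i)
        for part in edge_partitions
    ]
    # The partial sums are combined at the master copy of the vertex.
    total = sum(partial_sums)
    # Apply: the master computes the new vertex value.
    new_rank = 0.15 + total
    # Scatter: the new value is copied back to every machine holding i,
    # modeled here by writing it into the shared ranks dictionary.
    ranks[i] = new_rank
    return partial_sums, new_rank

# Hypothetical example: vertex 4 has in-edges split over two machines.
demo_ranks = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
demo_partitions = [[(1, 4, 0.25)], [(2, 4, 0.5), (3, 4, 0.125)]]
demo_partials, demo_new_rank = gas_pagerank_step(4, demo_ranks, demo_partitions)
```

Only one partial sum per machine crosses the network in the gather phase, so the cost of updating a vertex depends on how many machines it spans rather than on its degree.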
  • 29. Minimizing Communication in PowerGraph. Communication is linear in the number of machines each vertex spans. A vertex-cut minimizes the number of machines each vertex spans. Percolation theory suggests that power-law graphs have good vertex cuts. [Albert et al. 2000]
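To see why a vertex-cut suits star-like graphs, the sketch below places each edge on a machine and counts how many machines every vertex spans. Random edge placement is assumed here purely as the simplest choice; the function name and the star example are illustrative.

```python
import random

def vertex_cut_spans(edges, p, seed=0):
    """Place each edge on a random machine; return machines spanned per vertex."""
    rng = random.Random(seed)
    spans = {}
    for u, v in edges:
        m = rng.randrange(p)  # a vertex-cut assigns whole edges to machines
        spans.setdefault(u, set()).add(m)
        spans.setdefault(v, set()).add(m)
    return {v: len(ms) for v, ms in spans.items()}

# Hypothetical star graph: hub 0 connected to 1,000 leaves, on 4 machines.
star_edges = [(0, leaf) for leaf in range(1, 1001)]
star_spans = vertex_cut_spans(star_edges, p=4)
replication_factor = sum(star_spans.values()) / len(star_spans)
```

Every leaf lives on exactly one machine and the hub spans at most four, so the average number of machines per vertex stays close to 1 even though the hub has degree 1,000.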
  • 30. Machine Learning and Data-Mining Toolkits: Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering. GraphLab2 System: MPI/TCP-IP, PThreads, HDFS, EC2 HPC Nodes. http://graphlab.org, Apache 2 License.
  • 31. PageRank on Twitter Follower Graph. Natural graph with 40M users, 1.4 billion links. [Bar chart: runtime per iteration (axis 0 to 200) for Hadoop, GraphLab, Twister, Piccolo, and PowerGraph.] Order-of-magnitude improvement by exploiting properties of natural graphs. Hadoop results from [Kang et al. ’11]. Twister (in-memory MapReduce) [Ekanayake et al. ’10].
  • 32. GraphLab2 is Scalable. Yahoo Altavista Web Graph (2002): one of the largest publicly available web graphs, with 1.4 billion webpages and 6.6 billion links. 7 seconds per iteration. 1B links processed per second. 64 HPC nodes, 1024 cores (2048 HT). 30 lines of user code.
  • 33. Topic Modeling. English-language Wikipedia: 2.6M documents, 8.3M words, 500M tokens. Computationally intensive algorithm. [Bar chart: million tokens per second (axis 0 to 160). Smola et al.: 100 Yahoo! machines, specifically engineered for this task. PowerGraph: 64 cc2.8xlarge EC2 nodes, 200 lines of code & 4 human hours.]
  • 34. Triangle Counting on Twitter. 40M users, 1.4 billion links. Counted: 34.8 billion triangles. Hadoop [WWW’11]: 1536 machines, 423 minutes. 64 machines, 15 seconds: 1000x faster. S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11.
  • 35. By exploiting common patterns in graph data and computation: new ways to represent real-world graphs; new ways to execute graph algorithms. Orders-of-magnitude improvements over existing systems.
  • 37. Exciting Time to Work in ML :) Unique opportunities to change the world!! With ML, I will cure cancer!!! With ML I will find true love. Why won’t ML read my mind??? :( Building scalable learning systems requires experts …
  • 38. But… Even the basics of scalable ML can be challenging. ML is key to any new service we want to build. 6 months from prototype to production. State-of-the-art ML algorithms are trapped in research papers. Goal of GraphLab 3: make large-scale machine learning accessible to all! :)
  • 39. Adding a Python Layer. Python API on top of the toolkits: Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering. GraphLab2 System: MPI/TCP-IP, PThreads, EC2 HPC Nodes, HDFS.
  • 40. Learning ML with GraphLab Notebook: https://beta.graphlab.com/examples
  • 41. Prototype to Production with Python. GraphLab: easily install and prototype locally. Deploy to the cluster in one step.
  • 42. Learn: GraphLab Notebook. Prototype: pip install graphlab → local prototyping. Production: same code scales to EC2 cluster.
  • 43. GraphLab Toolkits. Highly scalable, state-of-the-art machine learning straight from Python: Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering.
  • 44. Machine Learning on Graphs partners@graphlab.com NIPS Workshop on Big Learning: biglearn.org Lake Tahoe, December 9th Joseph Gonzalez Co-Founder, GraphLab Inc. joseph@graphlab.com