SlideShare una empresa de Scribd logo
1 de 128
Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU
Thanks! ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Our goal: ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Graphs - why should we care? C. Faloutsos (CMU) Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01] Hadoop Summit '10
Graphs - why should we care? ,[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10 D 1 D N T 1 T M ... ...
Graphs - why should we care? ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Problem #1 - network and graph mining ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Problem #1 - network and graph mining ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Problem #1 - network and graph mining ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Graph mining ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Laws and patterns ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Solution# S.1 ,[object Object],C. Faloutsos (CMU) log(rank) log(degree) internet domains att.com ibm.com Hadoop Summit '10 -0.82
Solution# S.2: Eigen Exponent  E ,[object Object],C. Faloutsos (CMU) E = -0.48 Exponent = slope Eigenvalue Rank of decreasing eigenvalue May 2001 Hadoop Summit '10
Solution# S.2: Eigen Exponent  E ,[object Object],C. Faloutsos (CMU) E = -0.48 Exponent = slope Eigenvalue Rank of decreasing eigenvalue May 2001 Hadoop Summit '10
But: ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
More power laws: ,[object Object],C. Faloutsos (CMU) Web Site Traffic in-degree (log scale) Count (log scale) Zipf ``ebay’’ Hadoop Summit '10 users sites
epinions.com ,[object Object],C. Faloutsos (CMU) (out) degree count trusts-2000-people user Hadoop Summit '10
And numerous more ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Solution# S.3: Triangle ‘Laws’ ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Solution# S.3: Triangle ‘Laws’ ,[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Triangle Law: #S.3  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
Triangle Law: #S.3  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
Triangle Law: #S.4  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) SN Reuters Epinions X-axis: degree Y-axis: mean # triangles n  friends -> ~ n 1.6  triangles Hadoop Summit '10
Triangle Law: Computations  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? details Hadoop Summit '10
Triangle Law: Computations  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum (   i 3  ) (and, because of skewness, we only need  the top few eigenvalues! details Hadoop Summit '10
Triangle Law: Computations  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) 1000x+ speed-up, >90% accuracy details Hadoop Summit '10
EigenSpokes ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10 N N details
EigenSpokes ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) u1 u2 Hadoop Summit '10 1 st  Principal  component 2 nd  Principal  component
EigenSpokes ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10 u1 u2 90 o
EigenSpokes - pervasiveness ,[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes - explanation ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes - explanation ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes - explanation ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
EigenSpokes - explanation ,[object Object],[object Object],[object Object],[object Object],[object Object],spy plot of top 20 nodes C. Faloutsos (CMU) Hadoop Summit '10
Bipartite Communities! magnified bipartite community patents from same inventor(s) cut-and-paste bibliography! C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Observations on  weighted graphs? ,[object Object],C. Faloutsos (CMU) M. McGlohon, L. Akoglu, and C. Faloutsos  Weighted Graphs and Disconnected Components: Patterns and a Generator.   SIG-KDD  2008  Hadoop Summit '10
Observation W.1: Fortification ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Observation W.1: Fortification C. Faloutsos (CMU) More donors,  more $ ? $10 $5 Hadoop Summit '10 ‘ Reagan’ ‘ Clinton’ $7
Observation W.1: fortification: Snapshot Power Law ,[object Object],[object Object],Edges (# donors) In-weights ($) C. Faloutsos (CMU) Orgs-Candidates e.g. John Kerry,  $10M received, from 1K donors More donors,  even  more $ $10 $5 Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Problem: Time evolution ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.1 Evolution of the Diameter ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.1 Evolution of the Diameter ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.1 Diameter – “Patents” ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) time [years] diameter Hadoop Summit '10
T.2 Temporal Evolution of the Graphs ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.2 Temporal Evolution of the Graphs ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.2 Densification – Patent Citations ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) N(t) E(t) 1.66 Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
More on Time-evolving   graphs C. Faloutsos (CMU) M. McGlohon, L. Akoglu, and C. Faloutsos  Weighted Graphs and Disconnected Components: Patterns and a Generator.   SIG-KDD  2008  Hadoop Summit '10
Observation T.3: NLCC behavior ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Observation T.3: NLCC behavior ,[object Object],C. Faloutsos (CMU) IMDB CC size Time-stamp Hadoop Summit '10
Timing for Blogs ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? lag: days after post # in links 1 2 3 @t @t +  lag Hadoop Summit '10
T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? POWER LAW! Exponent? # in links ( log ) 1 2 3 days after post ( log ) Hadoop Summit '10
T.4 : popularity over time C. Faloutsos (CMU) ,[object Object],[object Object],[object Object],[object Object],[object Object],# in links ( log ) 1 2 3 -1.6 days after post ( log ) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
CenterPiece Subgraphs ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Ce nter- P iece  S ubgraph Discovery  [Tong+ KDD 06] Original Graph Q: Who is the most central node wrt the black nodes?  (e.g., master-mind criminal, common advisor/collaborator, etc) Input C. Faloutsos (CMU) Hadoop Summit '10 B A C
Ce nter- P iece  S ubgraph Discovery  [Tong+ KDD 06] Q: How to find hub for the query nodes? Input:  original graph Output:  CePS CePS Node C. Faloutsos (CMU) A: Combine proximity scores (RWR) Hadoop Summit '10 B A C B A C
CePS : Example (AND Query) ? C. Faloutsos (CMU) Hadoop Summit '10 ,[object Object],[object Object],[object Object]
CePS : Example (AND Query) C. Faloutsos (CMU) ,[object Object],[object Object],[object Object],Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
G raph X -Ray:  Fast Best-Effort Pattern Matching  in Large Attributed Graphs Hanghang Tong, Brian Gallagher,  Christos Faloutsos, Tina Eliassi-Rad KDD’07
Output Input Attributed Data Graph Query Graph Matching Subgraph Hadoop Summit '10 C. Faloutsos (CMU)
Effectiveness: star-query  Query  Result Hadoop Summit '10 C. Faloutsos (CMU)
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
OddBall: Spotting  A n o m a l i e s  in  Weighted Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University  School of Computer Science To appear in PAKDD 2010, Hyderabad, India
Main idea ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
What is an egonet? ego egonet C. Faloutsos (CMU) Hadoop Summit '10
Selected Features ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Near-Clique/Star Hadoop Summit '10 C. Faloutsos (CMU)
Near-Clique/Star C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
HADI for diameter estimation ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
[object Object],[object Object],???? ?? 19+? [Barabasi+]  (‘99, O(10 6 ) nodes) C. Faloutsos (CMU) Radius Count Hadoop Summit '10
[object Object],[object Object],???? C. Faloutsos (CMU) Radius Count Hadoop Summit '10 14 (dir.) ~7 (undir.) 19+? [Barabasi+]  (‘99, O(10 6 ) nodes)
[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10 Shape?
[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
C. Faloutsos (CMU) ,[object Object],[object Object],[object Object],Hadoop Summit '10
Radius Plot of  GCC  of YahooWeb. C. Faloutsos (CMU) Hadoop Summit '10
Running time -  Kronecker and Erdos-Renyi  Graphs with billions edges. details
Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
Generalized Iterated Matrix Vector Multiplication (GIMV) C. Faloutsos (CMU) PEGASUS: A Peta-Scale Graph Mining  System - Implementation and Observations .  U Kang, Charalampos E. Tsourakakis,  and Christos Faloutsos.  ( ICDM ) 2009, Miami, Florida, USA.  Best Application Paper (runner-up) .   Hadoop Summit '10
Generalized Iterated Matrix Vector Multiplication (GIMV) C. Faloutsos (CMU) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Matrix – vector Multiplication (iterated) Hadoop Summit '10 details
Example: GIM-V At Work ,[object Object],Size Count C. Faloutsos (CMU) Hadoop Summit '10
Example: GIM-V At Work ,[object Object],Size Count C. Faloutsos (CMU) Hadoop Summit '10 ~0.7B  singleton nodes
Example: GIM-V At Work ,[object Object],Size Count C. Faloutsos (CMU) Hadoop Summit '10
Example: GIM-V At Work ,[object Object],Size Count 300-size cmpt X 500. Why? 1100-size cmpt X 65. Why? C. Faloutsos (CMU) Hadoop Summit '10
Example: GIM-V At Work ,[object Object],Size Count suspicious financial-advice sites (not existing now) C. Faloutsos (CMU) Hadoop Summit '10
GIM-V At Work ,[object Object],[object Object],Stable tail slope after the gelling point C. Faloutsos (CMU) Hadoop Summit '10
Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
Triangles : Computations  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum (   i 3  ) (and, because of skewness, we only need  the top few eigenvalues! Mentioned already Hadoop Summit '10
Triangle Law: #1  [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Mentioned already Hadoop Summit '10
Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
Visualization: ShiftR ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
C. Faloutsos (CMU) Hadoop Summit '10
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Other topics - part#1 - tools ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Tensors ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU) keyword 1990 Author
Tensors ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU) keyword 1991 1992 1990 Author
Tensors ,[object Object],~ + PARAFAC tensor decomposition (generalization of SVD) Hadoop Summit '10 C. Faloutsos (CMU) keyword 1991 1992 1990 Author
Other topics – part#2 - generators ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Kronecker Product – a Graph ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Other topics - part#3 – virus propagation ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
More info ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
OVERALL CONCLUSIONS – low level: ,[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
OVERALL CONCLUSIONS – high level ,[object Object],[object Object],[object Object],[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
References ,[object Object],[object Object],C. Faloutsos (CMU) Hadoop Summit '10
References ,[object Object],C. Faloutsos (CMU) Hadoop Summit '10
Joint papers with LLNL ,[object Object],[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Joint papers with LLNL ,[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Joint papers with LLNL ,[object Object],[object Object],Hadoop Summit '10 C. Faloutsos (CMU)
Project info ,[object Object],C. Faloutsos (CMU) Akoglu,  Leman Chau,  Polo Kang, U McGlohon,  Mary Tsourakakis,  Babis Tong,  Hanghang Prakash, Aditya Hadoop Summit '10 Thanks to: Yahoo (M45 + gifts + data) NSF, LLNL, CTA-INARC, IBM, SPRINT, INTEL, HP

Más contenido relacionado

Similar a Mining Billion-node Graphs: Patterns, Generators and Tools__HadoopSummit2010

Similar a Mining Billion-node Graphs: Patterns, Generators and Tools__HadoopSummit2010 (20)

Visual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory ScienceVisual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory Science
 
Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...
Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...
Hadoop for Data Science: Moving from BI dashboards to R models, using Hive st...
 
Streamly: Concurrent Data Flow Programming
Streamly: Concurrent Data Flow ProgrammingStreamly: Concurrent Data Flow Programming
Streamly: Concurrent Data Flow Programming
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning"
 
XLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaXLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and Myria
 
Stream Reasoning : Where We Got So Far
Stream Reasoning: Where We Got So FarStream Reasoning: Where We Got So Far
Stream Reasoning : Where We Got So Far
 
Continuous Deep Analytics
Continuous Deep AnalyticsContinuous Deep Analytics
Continuous Deep Analytics
 
Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World Foster
 
Complex Models for Big Data
Complex Models for Big DataComplex Models for Big Data
Complex Models for Big Data
 
Clouds, Grids and Data
Clouds, Grids and DataClouds, Grids and Data
Clouds, Grids and Data
 
14 turing wics
14 turing wics14 turing wics
14 turing wics
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
Web Services Catalog
Web Services CatalogWeb Services Catalog
Web Services Catalog
 
Self-Similarity in Complex Networks
Self-Similarity in Complex NetworksSelf-Similarity in Complex Networks
Self-Similarity in Complex Networks
 
Systems-of-Systems in our Future?
Systems-of-Systems in our Future?Systems-of-Systems in our Future?
Systems-of-Systems in our Future?
 
Technology Disruption
Technology DisruptionTechnology Disruption
Technology Disruption
 
Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...
Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...
Bruce, T. R., and Richards, R. C. (2011). Adapting Specialized Legal Metadata...
 
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
 

Más de Yahoo Developer Network

Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
Yahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
Yahoo Developer Network
 

Más de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Mining Billion-node Graphs: Patterns, Generators and Tools__HadoopSummit2010

  • 1. Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU
  • 2.
  • 3.
  • 4.
  • 5. Graphs - why should we care? C. Faloutsos (CMU) Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01] Hadoop Summit '10
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Triangle Law: #S.3 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
  • 25. Triangle Law: #S.3 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Hadoop Summit '10
  • 26. Triangle Law: #S.4 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) SN Reuters Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~ n 1.6 triangles Hadoop Summit '10
  • 27. Triangle Law: Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? details Hadoop Summit '10
  • 28. Triangle Law: Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum (  i 3 ) (and, because of skewness, we only need the top few eigenvalues! details Hadoop Summit '10
  • 29. Triangle Law: Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) 1000x+ speed-up, >90% accuracy details Hadoop Summit '10
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40. Bipartite Communities! magnified bipartite community patents from same inventor(s) cut-and-paste bibliography! C. Faloutsos (CMU) Hadoop Summit '10
  • 41.
  • 42.
  • 43.
  • 44. Observation W.1: Fortification C. Faloutsos (CMU) More donors, more $ ? $10 $5 Hadoop Summit '10 ‘ Reagan’ ‘ Clinton’ $7
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55. More on Time-evolving graphs C. Faloutsos (CMU) M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008 Hadoop Summit '10
  • 56.
  • 57.
  • 58.
  • 59. T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? lag: days after post # in links 1 2 3 @t @t + lag Hadoop Summit '10
  • 60. T.4 : popularity over time C. Faloutsos (CMU) Post popularity drops-off – exponentially? POWER LAW! Exponent? # in links ( log ) 1 2 3 days after post ( log ) Hadoop Summit '10
  • 61.
  • 62.
  • 63.
  • 64. Ce nter- P iece S ubgraph Discovery [Tong+ KDD 06] Original Graph Q: Who is the most central node wrt the black nodes? (e.g., master-mind criminal, common advisor/collaborator, etc) Input C. Faloutsos (CMU) Hadoop Summit '10 B A C
  • 65. Ce nter- P iece S ubgraph Discovery [Tong+ KDD 06] Q: How to find hub for the query nodes? Input: original graph Output: CePS CePS Node C. Faloutsos (CMU) A: Combine proximity scores (RWR) Hadoop Summit '10 B A C B A C
  • 66.
  • 67.
  • 68.
  • 69. G raph X -Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad KDD’07
  • 70. Output Input Attributed Data Graph Query Graph Matching Subgraph Hadoop Summit '10 C. Faloutsos (CMU)
  • 71. Effectiveness: star-query Query Result Hadoop Summit '10 C. Faloutsos (CMU)
  • 72.
  • 73. OddBall: Spotting A n o m a l i e s in Weighted Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School of Computer Science To appear in PAKDD 2010, Hyderabad, India
  • 74.
  • 75. What is an egonet? ego egonet C. Faloutsos (CMU) Hadoop Summit '10
  • 76.
  • 77. Near-Clique/Star Hadoop Summit '10 C. Faloutsos (CMU)
  • 78. Near-Clique/Star C. Faloutsos (CMU) Hadoop Summit '10
  • 79.
  • 80. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87. Radius Plot of GCC of YahooWeb. C. Faloutsos (CMU) Hadoop Summit '10
  • 88. Running time - Kronecker and Erdos-Renyi Graphs with billions edges. details
  • 89. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
  • 90. Generalized Iterated Matrix Vector Multiplication (GIMV) C. Faloutsos (CMU) PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations . U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ( ICDM ) 2009, Miami, Florida, USA. Best Application Paper (runner-up) . Hadoop Summit '10
  • 91.
  • 92.
  • 93.
  • 94.
  • 95.
  • 96.
  • 97.
  • 98. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
  • 99. Triangles : Computations [Tsourakakis ICDM 2008] C. Faloutsos (CMU) But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum (  i 3 ) (and, because of skewness, we only need the top few eigenvalues! Mentioned already Hadoop Summit '10
  • 100. Triangle Law: #1 [Tsourakakis ICDM 2008] C. Faloutsos (CMU) ASN HEP-TH Epinions X-axis: # of Triangles a node participates in Y-axis: count of such nodes Mentioned already Hadoop Summit '10
  • 101. Outline – Algorithms & results C. Faloutsos (CMU) Hadoop Summit '10 Centralized Hadoop/PEGASUS Degree Distr. old old Pagerank old old Diameter/ANF old DONE Conn. Comp old DONE Triangles DONE Visualization STARTED
  • 102.
  • 103. C. Faloutsos (CMU) Hadoop Summit '10
  • 104.
  • 105.
  • 106.
  • 107.
  • 108.
  • 109.
  • 110.
  • 111.
  • 112.
  • 113.
  • 114.
  • 115.
  • 116.
  • 117.
  • 118.
  • 119.
  • 120.
  • 121.
  • 122.
  • 123.
  • 124.
  • 125.
  • 126.
  • 127.
  • 128.

Notas del editor

  1. Faloutsos
  2. Faloutsos et al 06/30/10
  3. Faloutsos
  4. Faloutsos
  5. Faloutsos
  6. Faloutsos
  7. Faloutsos
  8. Faloutsos
  9. Faloutsos
  10. Faloutsos
  11. Faloutsos
  12. Faloutsos
  13. Faloutsos
  14. Faloutsos
  15. Faloutsos
  16. Faloutsos
  17. A = U Sigma U^T
  18. A = U Sigma U^T vec{u}_1 vec{u}_i
  19. Faloutsos
  20. Faloutsos
  21. Faloutsos
  22. Faloutsos
  23. Faloutsos
  24. Faloutsos
  25. Faloutsos
  26. Faloutsos
  27. Faloutsos
  28. Faloutsos Diameter first, DPL second Check diameter formulas As the network grows the distances between nodes slowly grow
  29. Faloutsos Diameter first, DPL second Check diameter formulas As the network grows the distances between nodes slowly grow
  30. Faloutsos
  31. Faloutsos
  32. Faloutsos
  33. Faloutsos
  34. Faloutsos
  35. Faloutsos
  36. Faloutsos
  37. SOME OLD RULES
  38. SOME OLD RULES
  39. Faloutsos et al 06/30/10
  40. Faloutsos et al 06/30/10
  41. Faloutsos
  42. Faloutsos et al 06/30/10
  43. Faloutsos et al 06/30/10
  44. Faloutsos
  45. Faloutsos
  46. Faloutsos
  47. lambda_1 Faloutsos
  48. Faloutsos
  49. Faloutsos
  50. Faloutsos
  51. Faloutsos
  52. Faloutsos
  53. Faloutsos
  54. Faloutsos
  55. Faloutsos et al 06/30/10