SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
Scalable Algorithms for Analysis of Massive,
Streaming Graphs
Jason Riedy, Georgia Institute of Technology;
Henning Meyerhenke, Karlsruhe Inst. of Technology;
with David Bader, David Ediger, and others at GT

                                        15 February, 2012
Outline


       Motivation


       Technical
          Why analyze data streams?
          Overall streaming approach
          Clustering coefficients
          Connected components
          Common aspects and questions


       Session




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   2/29
Exascale Data Analysis


              Health care Finding outbreaks, population epidemiology
        Social networks Advertising, searching, grouping
               Intelligence Decisions at scale, regulating algorithms
       Systems biology Understanding interactions, drug design
               Power grid Disruptions, conservation
               Simulation Discrete events, cracking meshes

                     The data is full of semantically rich relationships.
                                  Graphs! Graphs! Graphs!



SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   3/29
Graphs are pervasive
           • Sources of massive data: petascale simulations, experimental
               devices, the Internet, scientific applications.
           • New challenges for analysis: data sizes, heterogeneity,
               uncertainty, data quality.


  Astrophysics                            Bioinformatics                          Social Informatics
                                          Problem Identifying target              Problem Emergent behavior,
  Problem Outlier detection
                                          proteins                                information spread
  Challenges Massive data
                                          Challenges Data                         Challenges New analysis,
  sets, temporal variation
                                          heterogeneity, quality                  data uncertainty
  Graph problems Matching,
                                          Graph problems Centrality,              Graph problems Clustering,
  clustering
                                          clustering                              flows, shortest paths




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                    4/29
These are not easy graphs.
                                Yifan Hu’s (AT&T) visualization of the Livejournal data set




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy        5/29
But no shortage of structure...




           Protein interactions, Giot et al., “A Protein
           Interaction Map of Drosophila melanogaster”,
                                                                        Jason’s network via LinkedIn Labs
           Science 302, 1722-1736, 2003.



           • Globally, there rarely are good, balanced separators in the
               scientific computing sense.
           • Locally, there are clusters or communities and many levels of
               detail.


SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                      6/29
Also no shortage of data...
       Existing (some out-of-date) data volumes
                 NYSE 1.5 TB generated daily into a maintained 8 PB archive
                Google “Several dozen” 1PB data sets (CACM, Jan 2010)
                   LHC 15 PB per year (avg. 21 TB daily)
                       http://public.web.cern.ch/public/en/lhc/
                       Computing-en.html
            Wal-Mart 536 TB, 1B entries daily (2006)
                  EBay 2 PB, traditional DB, and 6.5PB streaming, 17 trillion
                       records, 1.5B records/day, each web click is 50-150
                       details. http://www.dbms2.com/2009/04/30/
                       ebays-two-enormous-data-warehouses/
            Faceboot 845 M users... and growing.

           • All data is rich and semantic (graphs!) and changing.
           • Base data rates include items and not relationships.
SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   7/29
General approaches

           • High-performance static graph analysis
               • Develop techniques that apply to unchanging massive graphs.
               • Provides useful after-the-fact information, starting points.
               • Serves many existing applications well: market research, much
                 bioinformatics, ...
           • High-performance streaming graph analysis
               • Focus on the dynamic changes within massive graphs.
               • Find trends or new information as they appear.
               • Serves upcoming applications: fault or threat detection, trend
                 analysis, ...


                     Both very important to different areas.
                        Remaining focus is on streaming.
          Note: Not CS theory streaming, but analysis of streaming data.


SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   8/29
Why analyze data streams?

                                                    Data transfer
                                                        • 1 Gb Ethernet: 8.7TB daily at
    Data volumes
                                                            100%, 5-6TB daily realistic
              NYSE 1.5TB daily
                                                        • Multi-TB storage on 10GE: 300TB
                LHC 41TB daily                              daily read, 90TB daily write
        Facebook Who knows?                             • CPU ↔ Memory: QPI,HT:
                                                            2PB/day@100%

    Data growth
                                                    Speed growth
        • Facebook: > 2×/yr
                                                        • Ethernet/IB/etc.: 4× in next 2
        • Twitter: > 10×/yr
                                                            years. Maybe.
        • Growing sources:
                                                        • Flash storage, direct: 10× write,
           Bioinformatics,                                  4× read. Relatively huge cost.
           µsensors, security

SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy        9/29
Overall streaming approach




           Protein interactions, Giot et al., “A Protein
           Interaction Map of Drosophila melanogaster”,
                                                                        Jason’s network via LinkedIn Labs
           Science 302, 1722-1736, 2003.



       Assumptions
           • A graph represents some real-world phenomenon.
               • But not necessarily exactly!
               • Noise comes from lost updates, partial information, ...



SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                      10/29
Overall streaming approach




           Protein interactions, Giot et al., “A Protein
           Interaction Map of Drosophila melanogaster”,
                                                                        Jason’s network via LinkedIn Labs
           Science 302, 1722-1736, 2003.



       Assumptions
           • We target massive, “social network” graphs.
              • Small diameter, power-law degrees
              • Small changes in massive graphs often are unrelated.



SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                      11/29
Overall streaming approach




           Protein interactions, Giot et al., “A Protein
           Interaction Map of Drosophila melanogaster”,
                                                                        Jason’s network via LinkedIn Labs
           Science 302, 1722-1736, 2003.



       Assumptions
           • The graph changes, but we don’t need a continuous view.
               • We can accumulate changes into batches...
               • But not so many that it impedes responsiveness.



SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                      12/29
Difficulties for performance

         • What partitioning
             methods apply?
                 • Geometric? Nope.
                 • Balanced? Nope.
                 • Is there a single, useful
                    decomposition?
                        Not likely.
         • Some partitions exist, but
             they don’t often help
             with balanced bisection or
             memory locality.
         • Performance needs new
             approaches, not just
             standard scientific                                         Jason’s network via LinkedIn Labs
             computing methods.

SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                      13/29
STING’s focus

                            Control
             action                                prediction


                                                                               summary
          Source              data         Simulation / query                            Viz


           • STING manages queries against changing graph data.
                • Visualization and control often are application specific.
           • Ideal: Maintain many persistent graph analysis kernels.
                • Keep one current snapshot of the graph resident.
                • Let kernels maintain smaller histories.
                • Also (a harder goal), coordinate the kernels’ cooperation.
           • Gather data into a typed graph structure, STINGER.



SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy         14/29
STINGER
       STING Extensible Representation:




           • Rule #1: No explicit locking.
               • Rely on atomic operations.
           • Massive graph: Scattered updates, scattered reads rarely
               conflict.
           • Use time stamps for some view of time.


SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   15/29
Initial results

       Prototype STING and STINGER
       Monitoring the following properties:
           1   clustering coefficients,
           2   connected components, and
           3   community structure (in progress).

       High-level
           • Support high rates of change, over 10k updates per second.
           • Performance scales somewhat with available processing.
           • Gut feeling: Scales as much with sockets as cores.

                         http://www.cc.gatech.edu/stinger/


SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   16/29
Experimental setup
       Unless otherwise noted
                     Line             Model          Speed (GHz)              Sockets    Cores
                 Nehalem             X5570                  2.93                   2       4
                 Westmere           E7-8870                 2.40                   4      10

           • Westmere loaned by Intel (thank you!)
           • All memory: 1067MHz DDR3, installed appropriately
           • Implementations: OpenMP, gcc 4.6.1, Linux ≈ 3.0 kernel
           • Artificial graph and edge stream generated by R-MAT
               [Chakrabarti, Zhan, & Faloutsos].
                   • Scale x, edge factor f ⇒ 2x vertices, ≈ f · 2x edges.
                   • Edge actions: 7/8th insertions, 1/8th deletions
                   • Results over five batches of edge actions.
           • Caveat: No vector instructions, low-level optimizations yet.

SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy           17/29
Clustering coefficients
        • Used to measure “small-world-ness”
           [Watts & Strogatz] and potential                                              i       m
           community structure
        • Larger clustering coefficient ⇒ more                                                 v
           inter-connected
                                                                                         j       n
        • Roughly the ratio of the number of actual
           to potential triangles

           • Defined in terms of triplets.
           • i – v – j is a closed triplet (triangle).
           • m – v – n is an open triplet.
           • Clustering coefficient:
                                    # of closed triplets / total # of triplets
           • Locally around v or globally for entire graph.

SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy               18/29
Updating triangle counts

                  Given Edge {u, v } to be inserted (+) or deleted (-)
            Approach Search for vertices adjacent to both u and v , update
                     counts on those and u and v

       Three methods
         Brute force Intersect neighbors of u and v by iterating over each,
                     O(du dv ) time.
           Sorted list Sort u’s neighbors. For each neighbor of v , check if in
                       the sorted list.
       Compressed bits Summarize u’s neighbors in a bit array. Reduces
                   check for v ’s neighbors to O(1) time each.
                   Approximate with Bloom filters. [MTAAP10]
       All rely on atomic addition.


SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   19/29
Batches of 10k actions

                                                                                Brute force                              Bloom filter                                   Sorted list
                                                                 1.7e+04                  3.4e+04         8.8e+04                   1.2e+05        8.8e+04                        1.4e+05
                                                        105.5
                                                                 1.5e+04                                  1.3e+05                                  1.2e+05
                                                                5.1e+03                                  2.4e+04                                  2.2e+04
         Updates per seconds, both metric and STINGER




                                                                3.9e+03                                  2.0e+04                                  1.7e+04                         q        q
                                                                                                                                         q                                        q    q
                                                                                                                    q                                                             q
                                                                                                                                                                                  q
                                                                                                                                                                                  q
                                                            5                                                           q                                                q
                                                                                                                    q                                               q
                                                         10                                                    q                                           q
                                                                                                                                                               qq
                                                                                                                                                                q
                                                                                                                q                                               q
                                                                                                               q                                           q

                                                                                                               q
                                                                                                                                                       q

                                                                                                                                                   q                                            Machine
                                                        104.5                                                                                       q                                           a   4 x E7−8870

                                                                           q                                                                       q
                                                                                                                                                                                                a   2 x X5570

                                                                                                                                                   q
                                                                                                                                                   q


                                                                     q q
                                                                     q

                                                         104




                                                        103.5

                                                                 0         20      40         60    80     0        20       40     60       80    0            20           40       60   80
                                                                                                                 Threads
                                                                                                   Graph size: scale 22, edge factor 16




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                                                                                                                            20/29
Different batch sizes

                                                                                Brute force                                     Bloom filter                                    Sorted list
                                                        105.5

                                                         105                                                      q
                                                                                                                  q
                                                                                                                                                                    q
                                                                                                                                                                    q
                                                                                                                                                                  q q

                                                                                              q                            q                                            q                         q
         Updates per seconds, both metric and STINGER




                                                                                              q                                        q                                                  q




                                                                                                                                                                                                           100
                                                          4.5                                                                                    q
                                                        10                                                   q                                                                       q
                                                                      q                                                                     q                                             q
                                                                                                                                   q        q
                                                                q q
                                                         104     q

                                                                qq                                                     q
                                                          3.5                                 q   q
                                                        10
                                                        105.5

                                                         105                                                      q
                                                                                                                  q
                                                                                                                   q
                                                                                                                                                                  q                           q
                                                                                                                 q                                   q   q                                                         Machine




                                                                                                                                                                                                           1000
                                                        104.5                                                 q
                                                                                                                                                              q
                                                                                                                                                                                                                      4 x E7−8870
                                                                                                             q                                                q
                                                                                                                                                              q
                                                                                                                                                                                                                      2 x X5570
                                                         104    q
                                                                  q
                                                                q
                                                                q
                                                                q
                                                                q

                                                        103.5
                                                        105.5
                                                                                                                                                     q                                        q   q   q
                                                                                                                           q                                                                  q
                                                                                                                                                                                              q
                                                         105                                                      qq
                                                                                                                  q
                                                                                                                           q   q
                                                                                                                                                                    qqq
                                                                                                                                                                      q
                                                                                                                                                                    q q
                                                                                                                                                                            q    q

                                                                                                                 q




                                                                                                                                                                                                           10000
                                                                                                                                                                q
                                                          4.5                                                                                                 q
                                                        10                 q                                                                                  q
                                                                                                                                                               q

                                                                                                                                                              q
                                                                     q q
                                                         104

                                                        103.5
                                                                0          20      40     60          80     0             20          40       60       80   0         20           40       60      80
                                                                                                                    Threads
                                                                                                      Graph size: scale 22, edge factor 16




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                                                                                                                                              21/29
Different batch sizes: Reactivity

                                                                                    Brute force                               Bloom filter                                      Sorted list

                                                              100
                                                            10−0.5
         Seconds between updates, both metric and STINGER




                                                             10−1




                                                                                                                                                                                                           100
                                                                                                                                                                    q
                                                                           qq
                                                            10−1.5   q
                                                                     q
                                                                                                  q
                                                                                                  q
                                                                                                  q   q


                                                             10−2     q
                                                                     q q
                                                                                                                                 q        q
                                                                                                                                          q                                               q
                                                                       q                                                                                                             q
                                                                                                               q
                                                            10−2.5                                q
                                                                                                  q                      q           q
                                                                                                                                               q
                                                                                                                                                                        q                 q       q
                                                                                                                   q
                                                             10−3                                                  q                                            q
                                                                                                                                                                    q




                                                              100
                                                            10−0.5   q
                                                                     q
                                                                     q                                                                                                                                             Machine
                                                             10−1    qq




                                                                                                                                                                                                           1000
                                                                                                                                                            q
                                                                                                               q                                            q
                                                                                                                                                            q                                                         4 x E7−8870
                                                            10−1.5                                             q
                                                                                                                                                       q
                                                                                                                   q                               q
                                                             10−2
                                                                                                                    q
                                                                                                                    qq
                                                                                                                                                                q                             q                       2 x X5570
                                                            10−2.5
                                                             10−3

                                                              100        q q
                                                                                                                                                            q
                                                                               q                                                                            q
                                                            10−0.5                                                 q
                                                                                                                                                            q
                                                                                                                                                            q
                                                                                                                                                              q
                                                                                                                    q                                             q q
                                                             10−1                                                  qq                                             qqq




                                                                                                                                                                                                           10000
                                                                                                                         q   q                                      q       q    q
                                                                                                                         q                         q                                          q
                                                                                                                                                                                              q   q   q

                                                            10−1.5
                                                             10−2
                                                            10−2.5
                                                             10−3

                                                                     0         20      40     60          80   0         20          40       60       80   0           20           40       60      80
                                                                                                                    Threads
                                                                                                      Graph size: scale 22, edge factor 16




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                                                                                                                                              22/29
Connected components


           • Maintain a mapping from vertex to
               component.
           • Global property, unlike triangle
               counts
           • In “scale free” social networks:
                • Often one big component, and
                • many tiny ones.
           • Edge changes often sit within
               components.
           • Remaining insertions merge
               components.
           • Deletions are more difficult...



SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   23/29
Connected components


           • Maintain a mapping from vertex to
               component.
           • Global property, unlike triangle
               counts
           • In “scale free” social networks:
                • Often one big component, and
                • many tiny ones.
           • Edge changes often sit within
               components.
           • Remaining insertions merge
               components.
           • Deletions are more difficult...



SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   24/29
Connected components: Deleted edges

    The difficult case
        • Very few deletions
          matter.
        • Determining which
          matter may require a
          large graph search.
               • Re-running static
                 component
                 detection.
               • (Long history, see
                 related work in
                 [MTAAP11].)
        • Coping mechanisms:
            • Heuristics.
            • Second level of
              batching.

SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   25/29
Deletion heuristics



       Rule out effect-less deletions
           • Use the spanning tree by-product of static connected
               component algorithms.
           • Ignore deletions when one of the following occur:
               1 The deleted edge is not in the spanning tree.
               2 If the endpoints share a common neighbor∗ .
               3 If the loose endpoint can reach the root∗ .

           • In the last two (∗), also fix the spanning tree.

                                     Rules out 99.7% of deletions.




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   26/29
Connected components: Performance

                                                        106   2.4e+03 1.6e+04                                                  6.4e+03
                                                              3.2e+03 2.0e+03
                                                        105




                                                                                                                                         100
         Updates per seconds, both metric and STINGER




                                                        104

                                                        103



                                                        106   1.7e+04 7.7e+04                                                  1.4e+04
                                                              1.9e+04 2.0e+04
                                                        105                                                                                      Machine




                                                                                                                                         1000
                                                        104
                                                                                                                                                 a   4 x E7−8870
                                                                                                                                                 a   2 x X5570
                                                          3
                                                        10



                                                        106   5.8e+04 1.3e+05                                                  1.5e+05
                                                              5.5e+04 1.1e+05
                                                          5
                                                        10




                                                                                                                                         10000
                                                        104

                                                        103


                                                               12 4 6 8   12 16    24      32      40      48     56     64   72   80
                                                                                                Threads
                                                                                  Graph size: scale 22, edge factor 16




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                                                                             27/29
Common aspects
           • Each parallelizes sufficiently well over the affected vertices V ,
               those touched by new or removed edges.
           • Total amount of work is O(Vol(V )) = O(                                 v ∈V   deg(v )).
           • Our in-progress work on refining or re-agglomerating
               communities with updates also is O(Vol(V )).

           • How many interesting graph properties can be updated with
               O(Vol(V )) work?
           • Do these parallelize well?
           • The hidden constant and how quickly performance becomes
               asymptotic determines the metric update rate. What
               implementation techniques bash down the constant?
           • How sensitive are these metrics to noise and error?
           • How quickly can we “forget” data and still maintain metrics?

SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy                  28/29
Session outline


       Emergent Behavior Detection in Massive Graphs :
                  Nadya Bliss and Benjamin Miller, Massachusetts
                  Institute of Technology, USA
       Scalable Graph Clustering and Analysis with KDT :
                    John R. Gilbert and Adam Lugowski, University of
                    California, Santa Barbara, USA; Steve Reinhardt, Cray,
                    USA
       Multiscale Approach for Network Compression-friendly Ordering :
                    Ilya Safro, Argonne National Laboratory, USA; Boris
                    Temkin, Weizmann Institute of Science, Israel




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   29/29
Acknowledgment of support




SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy   30/29

Más contenido relacionado

Similar a SIAM PP 2012: Scalable Algorithms for Analysis of Massive, Streaming Graphs

SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive GraphsSIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive GraphsJason Riedy
 
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...Jason Riedy
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science James Hendler
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysisPoonam Kshirsagar
 
Social Network Applications
Social Network ApplicationsSocial Network Applications
Social Network ApplicationsJason Riedy
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Jason Riedy
 
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMETHE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMEGigaom
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsPerficient, Inc.
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET Journal
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forumChris Dwan
 
The Modern Columbian Exchange: Biovision 2012 Presentation
The Modern Columbian Exchange: Biovision 2012 PresentationThe Modern Columbian Exchange: Biovision 2012 Presentation
The Modern Columbian Exchange: Biovision 2012 PresentationMerck
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problemsinside-BigData.com
 
Large Graph Mining
Large Graph MiningLarge Graph Mining
Large Graph MiningSabri Skhiri
 
6.a survey on big data challenges in the context of predictive
6.a survey on big data challenges in the context of predictive6.a survey on big data challenges in the context of predictive
6.a survey on big data challenges in the context of predictiveEditorJST
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalIIIT Allahabad
 

Similar a SIAM PP 2012: Scalable Algorithms for Analysis of Massive, Streaming Graphs (20)

SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive GraphsSIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
 
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
 
Are you ready for BIG DATA?
Are you ready for BIG DATA?Are you ready for BIG DATA?
Are you ready for BIG DATA?
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
FR.pptx
FR.pptxFR.pptx
FR.pptx
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
 
Social Network Applications
Social Network ApplicationsSocial Network Applications
Social Network Applications
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
 
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMETHE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
 
The Modern Columbian Exchange: Biovision 2012 Presentation
The Modern Columbian Exchange: Biovision 2012 PresentationThe Modern Columbian Exchange: Biovision 2012 Presentation
The Modern Columbian Exchange: Biovision 2012 Presentation
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problems
 
Large Graph Mining
Large Graph MiningLarge Graph Mining
Large Graph Mining
 
6.a survey on big data challenges in the context of predictive
6.a survey on big data challenges in the context of predictive6.a survey on big data challenges in the context of predictive
6.a survey on big data challenges in the context of predictive
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
 

Más de Jason Riedy

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architecturesJason Riedy
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and EmusJason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...Jason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondJason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksJason Riedy
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateJason Riedy
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Jason Riedy
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsJason Riedy
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsJason Riedy
 

Más de Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 

Último

JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 

Último (20)

JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 

SIAM PP 2012: Scalable Algorithms for Analysis of Massive, Streaming Graphs

  • 1. Scalable Algorithms for Analysis of Massive, Streaming Graphs Jason Riedy, Georgia Institute of Technology; Henning Meyerhenke, Karlsruhe Inst. of Technology; with David Bader, David Ediger, and others at GT 15 February, 2012
  • 2. Outline Motivation Technical Why analyze data streams? Overall streaming approach Clustering coefficients Connected components Common aspects and questions Session SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 2/29
  • 3. Exascale Data Analysis Health care Finding outbreaks, population epidemiology Social networks Advertising, searching, grouping Intelligence Decisions at scale, regulating algorithms Systems biology Understanding interactions, drug design Power grid Disruptions, conservation Simulation Discrete events, cracking meshes The data is full of semantically rich relationships. Graphs! Graphs! Graphs! SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 3/29
  • 4. Graphs are pervasive • Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications. • New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality. Astrophysics Bioinformatics Social Informatics Problem Identifying target Problem Emergent behavior, Problem Outlier detection proteins information spread Challenges Massive data Challenges Data Challenges New analysis, sets, temporal variation heterogeneity, quality data uncertainty Graph problems Matching, Graph problems Centrality, Graph problems Clustering, clustering clustering flows, shortest paths SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 4/29
  • 5. These are not easy graphs. Yifan Hu’s (AT&T) visualization of the Livejournal data set SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 5/29
  • 6. But no shortage of structure... Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Jason’s network via LinkedIn Labs Science 302, 1722-1736, 2003. • Globally, there rarely are good, balanced separators in the scientific computing sense. • Locally, there are clusters or communities and many levels of detail. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 6/29
  • 7. Also no shortage of data... Existing (some out-of-date) data volumes NYSE 1.5 TB generated daily into a maintained 8 PB archive Google “Several dozen” 1PB data sets (CACM, Jan 2010) LHC 15 PB per year (avg. 21 TB daily) http://public.web.cern.ch/public/en/lhc/ Computing-en.html Wal-Mart 536 TB, 1B entries daily (2006) EBay 2 PB, traditional DB, and 6.5PB streaming, 17 trillion records, 1.5B records/day, each web click is 50-150 details. http://www.dbms2.com/2009/04/30/ ebays-two-enormous-data-warehouses/ Faceboot 845 M users... and growing. • All data is rich and semantic (graphs!) and changing. • Base data rates include items and not relationships. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 7/29
  • 8. General approaches • High-performance static graph analysis • Develop techniques that apply to unchanging massive graphs. • Provides useful after-the-fact information, starting points. • Serves many existing applications well: market research, much bioinformatics, ... • High-performance streaming graph analysis • Focus on the dynamic changes within massive graphs. • Find trends or new information as they appear. • Serves upcoming applications: fault or threat detection, trend analysis, ... Both very important to different areas. Remaining focus is on streaming. Note: Not CS theory streaming, but analysis of streaming data. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 8/29
  • 9. Why analyze data streams? Data transfer • 1 Gb Ethernet: 8.7TB daily at Data volumes 100%, 5-6TB daily realistic NYSE 1.5TB daily • Multi-TB storage on 10GE: 300TB LHC 41TB daily daily read, 90TB daily write Facebook Who knows? • CPU ↔ Memory: QPI,HT: 2PB/day@100% Data growth Speed growth • Facebook: > 2×/yr • Ethernet/IB/etc.: 4× in next 2 • Twitter: > 10×/yr years. Maybe. • Growing sources: • Flash storage, direct: 10× write, Bioinformatics, 4× read. Relatively huge cost. µsensors, security SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 9/29
  • 10. Overall streaming approach Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Jason’s network via LinkedIn Labs Science 302, 1722-1736, 2003. Assumptions • A graph represents some real-world phenomenon. • But not necessarily exactly! • Noise comes from lost updates, partial information, ... SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 10/29
  • 11. Overall streaming approach Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Jason’s network via LinkedIn Labs Science 302, 1722-1736, 2003. Assumptions • We target massive, “social network” graphs. • Small diameter, power-law degrees • Small changes in massive graphs often are unrelated. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 11/29
  • 12. Overall streaming approach Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Jason’s network via LinkedIn Labs Science 302, 1722-1736, 2003. Assumptions • The graph changes, but we don’t need a continuous view. • We can accumulate changes into batches... • But not so many that it impedes responsiveness. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 12/29
  • 13. Difficulties for performance • What partitioning methods apply? • Geometric? Nope. • Balanced? Nope. • Is there a single, useful decomposition? Not likely. • Some partitions exist, but they don’t often help with balanced bisection or memory locality. • Performance needs new approaches, not just standard scientific Jason’s network via LinkedIn Labs computing methods. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 13/29
  • 14. STING’s focus Control action prediction summary Source data Simulation / query Viz • STING manages queries against changing graph data. • Visualization and control often are application specific. • Ideal: Maintain many persistent graph analysis kernels. • Keep one current snapshot of the graph resident. • Let kernels maintain smaller histories. • Also (a harder goal), coordinate the kernels’ cooperation. • Gather data into a typed graph structure, STINGER. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 14/29
  • 15. STINGER STING Extensible Representation: • Rule #1: No explicit locking. • Rely on atomic operations. • Massive graph: Scattered updates, scattered reads rarely conflict. • Use time stamps for some view of time. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 15/29
  • 16. Initial results Prototype STING and STINGER Monitoring the following properties: 1 clustering coefficients, 2 connected components, and 3 community structure (in progress). High-level • Support high rates of change, over 10k updates per second. • Performance scales somewhat with available processing. • Gut feeling: Scales as much with sockets as cores. http://www.cc.gatech.edu/stinger/ SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 16/29
  • 17. Experimental setup Unless otherwise noted Line Model Speed (GHz) Sockets Cores Nehalem X5570 2.93 2 4 Westmere E7-8870 2.40 4 10 • Westmere loaned by Intel (thank you!) • All memory: 1067MHz DDR3, installed appropriately • Implementations: OpenMP, gcc 4.6.1, Linux ≈ 3.0 kernel • Artificial graph and edge stream generated by R-MAT [Chakrabarti, Zhan, & Faloutsos]. • Scale x, edge factor f ⇒ 2x vertices, ≈ f · 2x edges. • Edge actions: 7/8th insertions, 1/8th deletions • Results over five batches of edge actions. • Caveat: No vector instructions, low-level optimizations yet. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 17/29
  • 18. Clustering coefficients • Used to measure “small-world-ness” [Watts & Strogatz] and potential i m community structure • Larger clustering coefficient ⇒ more v inter-connected j n • Roughly the ratio of the number of actual to potential triangles • Defined in terms of triplets. • i – v – j is a closed triplet (triangle). • m – v – n is an open triplet. • Clustering coefficient: # of closed triplets / total # of triplets • Locally around v or globally for entire graph. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 18/29
  • 19. Updating triangle counts Given Edge {u, v } to be inserted (+) or deleted (-) Approach Search for vertices adjacent to both u and v , update counts on those and u and v Three methods Brute force Intersect neighbors of u and v by iterating over each, O(du dv ) time. Sorted list Sort u’s neighbors. For each neighbor of v , check if in the sorted list. Compressed bits Summarize u’s neighbors in a bit array. Reduces check for v ’s neighbors to O(1) time each. Approximate with Bloom filters. [MTAAP10] All rely on atomic addition. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 19/29
  • 20. Batches of 10k actions Brute force Bloom filter Sorted list 1.7e+04 3.4e+04 8.8e+04 1.2e+05 8.8e+04 1.4e+05 105.5 1.5e+04 1.3e+05 1.2e+05 5.1e+03 2.4e+04 2.2e+04 Updates per seconds, both metric and STINGER 3.9e+03 2.0e+04 1.7e+04 q q q q q q q q q 5 q q q q 10 q q qq q q q q q q q q Machine 104.5 q a 4 x E7−8870 q q a 2 x X5570 q q q q q 104 103.5 0 20 40 60 80 0 20 40 60 80 0 20 40 60 80 Threads Graph size: scale 22, edge factor 16 SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 20/29
  • 21. Different batch sizes Brute force Bloom filter Sorted list 105.5 105 q q q q q q q q q q Updates per seconds, both metric and STINGER q q q 100 4.5 q 10 q q q q q q q q q 104 q qq q 3.5 q q 10 105.5 105 q q q q q q q q Machine 1000 104.5 q q 4 x E7−8870 q q q 2 x X5570 104 q q q q q q 103.5 105.5 q q q q q q q 105 qq q q q qqq q q q q q q 10000 q 4.5 q 10 q q q q q q 104 103.5 0 20 40 60 80 0 20 40 60 80 0 20 40 60 80 Threads Graph size: scale 22, edge factor 16 SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 21/29
  • 22. Different batch sizes: Reactivity Brute force Bloom filter Sorted list 100 10−0.5 Seconds between updates, both metric and STINGER 10−1 100 q qq 10−1.5 q q q q q q 10−2 q q q q q q q q q q 10−2.5 q q q q q q q q q 10−3 q q q 100 10−0.5 q q q Machine 10−1 qq 1000 q q q q 4 x E7−8870 10−1.5 q q q q 10−2 q qq q q 2 x X5570 10−2.5 10−3 100 q q q q q 10−0.5 q q q q q q q 10−1 qq qqq 10000 q q q q q q q q q q q 10−1.5 10−2 10−2.5 10−3 0 20 40 60 80 0 20 40 60 80 0 20 40 60 80 Threads Graph size: scale 22, edge factor 16 SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 22/29
  • 23. Connected components • Maintain a mapping from vertex to component. • Global property, unlike triangle counts • In “scale free” social networks: • Often one big component, and • many tiny ones. • Edge changes often sit within components. • Remaining insertions merge components. • Deletions are more difficult... SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 23/29
  • 24. Connected components • Maintain a mapping from vertex to component. • Global property, unlike triangle counts • In “scale free” social networks: • Often one big component, and • many tiny ones. • Edge changes often sit within components. • Remaining insertions merge components. • Deletions are more difficult... SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 24/29
  • 25. Connected components: Deleted edges The difficult case • Very few deletions matter. • Determining which matter may require a large graph search. • Re-running static component detection. • (Long history, see related work in [MTAAP11].) • Coping mechanisms: • Heuristics. • Second level of batching. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 25/29
  • 26. Deletion heuristics Rule out effect-less deletions • Use the spanning tree by-product of static connected component algorithms. • Ignore deletions when one of the following occur: 1 The deleted edge is not in the spanning tree. 2 If the endpoints share a common neighbor∗ . 3 If the loose endpoint can reach the root∗ . • In the last two (∗), also fix the spanning tree. Rules out 99.7% of deletions. SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 26/29
  • 27. Connected components: Performance 106 2.4e+03 1.6e+04 6.4e+03 3.2e+03 2.0e+03 105 100 Updates per seconds, both metric and STINGER 104 103 106 1.7e+04 7.7e+04 1.4e+04 1.9e+04 2.0e+04 105 Machine 1000 104 a 4 x E7−8870 a 2 x X5570 3 10 106 5.8e+04 1.3e+05 1.5e+05 5.5e+04 1.1e+05 5 10 10000 104 103 12 4 6 8 12 16 24 32 40 48 56 64 72 80 Threads Graph size: scale 22, edge factor 16 SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 27/29
  • 28. Common aspects • Each parallelizes sufficiently well over the affected vertices V , those touched by new or removed edges. • Total amount of work is O(Vol(V )) = O( v ∈V deg(v )). • Our in-progress work on refining or re-agglomerating communities with updates also is O(Vol(V )). • How many interesting graph properties can be updated with O(Vol(V )) work? • Do these parallelize well? • The hidden constant and how quickly performance becomes asymptotic determines the metric update rate. What implementation techniques bash down the constant? • How sensitive are these metrics to noise and error? • How quickly can we “forget” data and still maintain metrics? SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 28/29
  • 29. Session outline Emergent Behavior Detection in Massive Graphs : Nadya Bliss and Benjamin Miller, Massachusetts Institute of Technology, USA Scalable Graph Clustering and Analysis with KDT : John R. Gilbert and Adam Lugowski, University of California, Santa Barbara, USA; Steve Reinhardt, Cray, USA Multiscale Approach for Network Compression-friendly Ordering : Ilya Safro, Argonne National Laboratory, USA; Boris Temkin, Weizmann Institute of Science, Israel SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 29/29
  • 30. Acknowledgment of support SIAM PP 2012—Scalable Algorithms for Analysis of Massive, Streaming Graphs—Jason Riedy 30/29