STINGER
Dynamic Graph Analysis
Contributors
• David Bader
• David Ediger
• Rob McColl
• Jason Riedy
• Kamesh Madduri
• Jason Poovey
Outline
• Motivation
• Dynamic Graph Basics
• What is STINGER?
• What can STINGER do?
• Why STINGER?
Big Data problems need Graph Analysis
• Health Care: finding outbreaks, population epidemiology
• Social Networks: advertising, searching, grouping, influence
• Intelligence: decisions at scale, regulating algorithms
• Systems Biology: understanding interactions, drug design
• Power Grid: disruptions, conversion
• Simulation: discrete events, cracking meshes
Graphs are pervasive
 • Graphs: things and relationships
    • Different kinds of things, different kinds of relationships, but graphs provide a
      framework for analyzing the relationships.
    • New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.


 • Astrophysics
    • Problem: Outlier detection
    • Challenges: Massive data sets, temporal variation
    • Graph Problems: matching, clustering
 • Bioinformatics
    • Problem: Identifying target proteins
    • Challenges: Data heterogeneity, quality
    • Graph Problems: centrality, clustering
 • Social Informatics
    • Problem: Emergent behavior, information spread
    • Challenges: New analysis, data uncertainty, scale
    • Graph Problems: clustering, flows, shortest paths
Data rates and volumes are immense
• Facebook:
  • ~1 billion users
  • average 130 friends
  • 30 billion pieces of content shared / month
• Twitter:
   • 500 million active users
   • 340 million tweets / day
• Internet – 100s of exabytes / year
   • 300 million new websites per year
   • 48 hours of video uploaded to YouTube per minute
   • 30,000 YouTube videos played per second
Our focus is streaming graphs
• As relationships change
  • Edges (relationships) are inserted, updated, and removed
  • Vertices (things) join and leave the network


• What are the effects?
  • On information flow
  • On community structure
  • On the integrity of data and structure


• Which actors and relationships are…
  • The key players and influencers in the change?
  • The anomalies and threats?
What is STINGER?
Spatio-Temporal Interaction Networks and Graphs Extensible Representation
D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, S. C. Poulos


• A scalable, high performance in-memory dynamic graph data
  structure
   •   Stores semantic and temporal information.
   •   Designed to be flexible and extensible.
   •   Intended to be useful for the entire “large graph” community.
   •   Permits good performance, accepting that no single structure is optimal for all cases.
   •   Assumes globally addressable memory access.
   •   Supports multiple, parallel readers and a single parallel writer.

• A software suite for dynamic graph analysis
  • Targets large shared-memory x86 and the Cray XMT
  • Written in C with OpenMP and XMT pragma support for parallelism
As a data structure
• Fast insertions, deletions, and updates:
 A data structure that grows and changes at the speed of the data.

• Edge and vertex types and weights:
 Represent complex relationships and multiple simultaneous networks.

• Filtering traversal mechanisms:
 Traverse serially or in parallel on specific edge types, time ranges,
 vertex sets, etc.

• Experimental workflow server:
 Multiple data streams and analytics with one persistent data structure.

• Experimental Java and Python bindings:
 Use productivity-oriented languages without sacrificing performance.
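
To make the typed edges and filtered traversal concrete, here is a minimal Python sketch. It is a toy structure written for illustration only: the class `TinyDynamicGraph`, its methods, and the field names are hypothetical and do not reflect STINGER's actual memory layout or API. It shows edges that carry a type, a weight, and two timestamps, plus a traversal that filters on edge type and time.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    dst: int
    etype: str
    weight: float
    first_seen: int  # timestamp of the first insertion
    last_seen: int   # timestamp of the most recent update


class TinyDynamicGraph:
    """Hypothetical toy structure; not the STINGER layout or API."""

    def __init__(self):
        # source vertex -> {(destination, edge type): Edge}
        self.adj = {}

    def insert_or_update(self, src, dst, etype, weight, ts):
        """Insert a new typed edge, or refresh weight and timestamp if present."""
        edges = self.adj.setdefault(src, {})
        edge = edges.get((dst, etype))
        if edge is None:
            edges[(dst, etype)] = Edge(dst, etype, weight, ts, ts)
        else:
            edge.weight = weight
            edge.last_seen = ts

    def remove(self, src, dst, etype):
        """Remove an edge; return True if it existed."""
        return self.adj.get(src, {}).pop((dst, etype), None) is not None

    def edges_of(self, src, etype=None, since=None):
        """Yield edges out of src, optionally filtered by type and recency."""
        for edge in self.adj.get(src, {}).values():
            if etype is not None and edge.etype != etype:
                continue
            if since is not None and edge.last_seen < since:
                continue
            yield edge


g = TinyDynamicGraph()
g.insert_or_update(0, 1, "follows", 1.0, ts=10)
g.insert_or_update(0, 2, "mentions", 1.0, ts=20)
g.insert_or_update(0, 1, "follows", 2.0, ts=30)  # update, not a second edge
recent = [e.dst for e in g.edges_of(0, since=25)]
```

Keying each adjacency entry by (destination, type) lets several relationship types coexist between the same pair of vertices, which is one simple way to hold multiple simultaneous networks in one structure.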
As an analysis package
• Streaming edge insertions and deletions:
  Performs new edge insertions, updates, and deletions in batches or individually.

• Streaming clustering coefficients:
  Tracks the local and global clustering coefficients of a graph under both edge insertions and deletions.

• Streaming connected components:
  Accurately tracks the connected components of a graph with insertions and deletions.

• Streaming community detection:
  Track and update the community structures within the graph as they change.

• Parallel agglomerative clustering:
  Find clusters that are optimized for a user-defined edge scoring function.

• Streaming Betweenness Centrality:
  Find the key points within information flows and structural vulnerabilities.

• K-core Extraction:
  Extract additional communities and filter noisy high-degree vertices.

• Classic breadth-first search:
  Performs a parallel breadth-first search of the graph starting at a given source vertex to find shortest paths.
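
As an illustration of the streaming clustering coefficient idea, the sketch below keeps per-vertex triangle counts up to date as edges arrive and leave, instead of recounting the whole graph. It is a serial Python toy, assuming an undirected simple graph; the class and method names are hypothetical and this is not the STINGER implementation. The key observation is that inserting or deleting edge (u, v) changes triangle counts only for u, v, and their common neighbors.

```python
from collections import defaultdict


class StreamingTriangles:
    """Toy incremental triangle counter for an undirected simple graph."""

    def __init__(self):
        self.nbrs = defaultdict(set)  # vertex -> set of neighbors
        self.tri = defaultdict(int)   # vertex -> triangles through it

    def insert(self, u, v):
        if u == v or v in self.nbrs[u]:
            return  # ignore self-loops and duplicate edges
        common = self.nbrs[u] & self.nbrs[v]
        for w in common:
            self.tri[w] += 1  # each common neighbor closes one new triangle
        self.tri[u] += len(common)
        self.tri[v] += len(common)
        self.nbrs[u].add(v)
        self.nbrs[v].add(u)

    def delete(self, u, v):
        if v not in self.nbrs[u]:
            return  # edge not present
        self.nbrs[u].discard(v)
        self.nbrs[v].discard(u)
        common = self.nbrs[u] & self.nbrs[v]
        for w in common:
            self.tri[w] -= 1  # each common neighbor loses one triangle
        self.tri[u] -= len(common)
        self.tri[v] -= len(common)

    def local_cc(self, v):
        """Local clustering coefficient: 2 * triangles / (deg * (deg - 1))."""
        d = len(self.nbrs[v])
        return 0.0 if d < 2 else 2.0 * self.tri[v] / (d * (d - 1))


st = StreamingTriangles()
for a, b in [(0, 1), (1, 2), (0, 2)]:
    st.insert(a, b)
cc_after_triangle = st.local_cc(0)
```

Each update costs one neighbor-set intersection rather than a full recount, which is why this style of update suits a stream of edge changes.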
How is the graph stored?
What can STINGER represent?
• Nearly any set of
  relationships
   •   Healthcare
   •   Social Networks
   •   Intelligence
   •   Systems biology
   •   Power grid
   •   Travel networks

• Example: Twitter
   • Users, hashtags, tweets as vertex types
   • Authorship, retweet, mentions, follows / followed by edge types


• Example: Work Environment
   • Users, PCs, printers, emails, URLs, files, etc. as vertex types
   • Email alias, from, to, access, logon/off, print, IM, etc. as edge types
What can STINGER do?
• Optimized to update at rates of over 3 million edges per second on
 graphs of one billion edges
  • D. Ediger, R. McColl, J. Riedy, and D. A. Bader, "STINGER: High Performance Data Structure for Streaming Graphs," The IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September 20-22, 2012. Best Paper Award.

RMAT – Recursive MATrix graph generator. RMAT(N) indicates 2^N vertices.
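
For readers unfamiliar with RMAT, the following Python sketch generates edges in the usual recursive-matrix way: for a scale N graph, each edge is placed by choosing one of four adjacency-matrix quadrants N times, which fixes one bit of the source and destination IDs per step. The quadrant probabilities used here (0.57, 0.19, 0.19, 0.05) are an assumption borrowed from common benchmark practice; the slides do not state which parameters were used, and the function names are hypothetical.

```python
import random


def rmat_edge(scale, rng, a=0.57, b=0.19, c=0.19):
    """Return one (src, dst) pair in a graph with 2**scale vertices.

    a, b, c are the probabilities of the top-left, top-right, and
    bottom-left quadrants; the bottom-right gets the remainder.
    These defaults are assumed, not taken from the slides.
    """
    src = dst = 0
    for _ in range(scale):
        r = rng.random()
        src <<= 1
        dst <<= 1
        if r < a:
            pass                # top-left: both bits stay 0
        elif r < a + b:
            dst |= 1            # top-right
        elif r < a + b + c:
            src |= 1            # bottom-left
        else:
            src |= 1            # bottom-right
            dst |= 1
    return src, dst


def rmat_edges(scale, n_edges, seed=0):
    """Generate n_edges RMAT edges with a reproducible seed."""
    rng = random.Random(seed)
    return [rmat_edge(scale, rng) for _ in range(n_edges)]


edges = rmat_edges(scale=10, n_edges=1000)
```

Skewing the quadrant probabilities toward one corner yields a few high-degree vertices and many low-degree ones, which is why RMAT graphs are a common stand-in for social-network-like inputs.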
What can STINGER do?
• Maintaining connected components in a graph of half a billion edges
  • Up to 1.26 million updates per sec.
  • 137x faster than recomputing.

• Scalable parallel streaming community detection
  • Built on parallel insert / delete mechanisms.

• Streaming approximate betweenness
  • Used to analyze influencers on Twitter during Hurricane Sandy over time.
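
The gap between updating and recomputing connected components can be illustrated with a small union-find sketch in Python. This is a deliberate simplification: it handles edge insertions only, whereas the slides describe tracking components under both insertions and deletions, which needs additional machinery not shown here. The class name and methods are hypothetical and are not STINGER's algorithm or API.

```python
class IncrementalComponents:
    """Insert-only connected components via union-find (toy sketch)."""

    def __init__(self, n_vertices):
        self.parent = list(range(n_vertices))
        self.count = n_vertices  # every vertex starts as its own component

    def find(self, v):
        """Return the component representative of v, with path halving."""
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def insert_edge(self, u, v):
        """Apply one edge insertion; return True if two components merged."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False  # edge falls inside an existing component
        self.parent[ru] = rv
        self.count -= 1
        return True


cc = IncrementalComponents(5)
merged = [cc.insert_edge(u, v) for u, v in [(0, 1), (1, 2), (0, 2)]]
```

Most insertions in a large graph land inside an existing component and change nothing, so each update touches only a handful of entries instead of traversing every edge again.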
What does STINGER not do?
• Does not provide all ACID properties
   • Why: Not intended to be the backing data store.
   • Why: Allows for greater ingest and processing speeds.
   • Alternative: Back STINGER ingest with an ACID DB
   • Alternative: STINGER does provide consistency, partial isolation


• No text-based query language, for now
   • Why: Currently, no language is general enough to describe most or all queries
   • Alternative: Filtering traversal APIs, unlimited query flexibility through code
   • Alternative: Productivity language bindings (Python, Java)


• No distributed / Hadoop-like cluster support
   • Why: Clusters are a good fit for ingest but a poor fit for streaming analysis, because random access is too slow
   • Alternative: Larger shared memory systems such as the Cray XMT and SGI UV systems
   • Alternative: Processing billion-edge graphs in shared memory on affordable Intel servers
   • Alternative: Extract key portions of the graph from a larger data store and perform fast
     in-memory processing in STINGER
What sizes, performance can it handle?
Desktop (Intel Core i7-2600, 16GB DDR3)
V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
1M     8M     22-14    1.184       0.316                      2.7M
2M     16M    22-14    2.384       0.75                       2.3M
4M     33M    22-14    4.768       2                          2.3M
8M     67M    24-14    9.536       5.36                       0.85M
4M     67M    24-14    7.984       3                          1.38M
4M     134M   24-14    14.336      5.7                        0.8M

Server (4x Opteron 6282, 256GB DDR3)
V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
16M    512M   25-14    60          13.7                       696K
16M    256M   25-14    24.6        9.82                       2.1M

Cray XMT2 (64 processors, 2TB DDR2)
V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
67M    512M   28-32    86          13.8                       3.3M
268M   4.3B   28-32    312         52.3                       2.34M

• The only limitation on size is system memory
   • Billions of vertices and edges are possible

• V vertices and E edges in each graph
   • E counts are undirected
   • STINGER stores both directions
• Config is STINGER-specific parameters
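
Because E counts undirected edges while both directions are stored, the number of stored edge records is twice E. The short Python sketch below works that through for a table row and derives a rough average memory cost per stored edge. It assumes decimal units (1M = 10^6 edges, 1GB = 10^9 bytes), which the slides do not specify, and the result folds vertex and bookkeeping overhead into the per-edge figure, so treat it as a coarse estimate. The function names are hypothetical.

```python
def stored_edges(undirected_edges):
    """Each undirected edge is stored once per direction."""
    return 2 * undirected_edges


def bytes_per_stored_edge(size_gb, undirected_edges):
    """Rough average footprint, assuming 1GB = 10**9 bytes (an assumption)."""
    return size_gb * 1e9 / stored_edges(undirected_edges)


# Server row: E = 512M undirected edges occupying 60GB.
records = stored_edges(512_000_000)
avg_bytes = bytes_per_stored_edge(60, 512_000_000)
```

Under these assumptions the 512M-edge server graph holds about 1.02 billion stored edge records.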
Why not existing technologies?
• Traditional SQL databases
   • Not structured to do any meaningful graph queries with any level of
     efficiency or timeliness

• Graph databases - mostly on-disk
  • Distributed disk can keep up with storing / indexing, but is simply too
    slow at random graph access to run analysis as the graph updates

• Hadoop and HDFS-based projects
  • Not really the right programming model for many structural queries
    over the entire graph; random access performance is poor

• Smaller graph libraries, processing tools
  • Can't scale, can't process dynamic graphs, and frequently lead to
    impossible visualization attempts
Who is GTRI?
• Georgia Tech Research Institute
  • Largest research entity at Georgia Institute of Technology
  • One of the world's premier university-based applied R&D
    organizations for 75 years
  • Non-profit with over 1,600 employees and 21 locations world-wide
  • Over $240 million per year of government and industry contracts


• Innovative Computing Division
 of the Cyber Technology and Information Security Lab
  • Dedicated to the application of practical HPC expertise and
    cutting-edge fundamental research to solve real-world problems
  • Experts in high-performance computing, algorithms, and big data
How can I start using STINGER?
• Information, code, help
   • http://cc.gatech.edu/stinger
   • robert.mccoll@gtri.gatech.edu


• Together, GTRI and Georgia Tech can offer
   • Consulting
     Understand how your organization can benefit from graph analytics.

  • Training
    Learn how to use graph analysis and apply STINGER to your data.

  • Implementation
    Customize and extend STINGER to suit your needs using our experts.

  • Research Expertise
    Connect with researchers on the cutting edge of big data to develop novel
    solutions to your open problems.

Dynamic Graph Analysis with STINGER

  • 2. Contributors • David Bader • David Ediger • Rob McColl • Jason Riedy • Kamesh Madduri • Jason Poovey
  • 3. Outline • Motivation • Dynamic Graph Basics • What is STINGER? • What can STINGER do? • Why STINGER?
  • 4. Big Data problems need Graph Analysis • Health Care: Finding outbreaks, population epidemiology • Social Networks: Advertising, searching, grouping, influence • Intelligence: Decisions at scale, regulating algorithms • Systems Biology: Understanding interactions, drug design • Power Grid: Disruptions, conversion • Simulation: Discrete events, cracking meshes
  • 5. Graphs are pervasive • Graphs: things and relationships • Different kinds of things, different kinds of relationships, but graphs provide a framework for analyzing the relationships. • New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality. • Astrophysics • Problem: Outlier detection • Challenges: Massive data sets, temporal variation • Graph Problems: matching, clustering • Bioinformatics • Problem: Identifying target proteins • Challenges: Data heterogeneity, quality • Graph Problems: Centrality, clustering • Social Informatics • Problem: Emergent behavior, information spread • Challenges: New analysis, data uncertainty, scale • Graph Problems: clustering, flows, shortest paths
  • 6. Data rates and volumes are immense • Facebook: • ~1 billion users • average 130 friends • 30 billion pieces of content shared / month • Twitter: • 500 million active users • 340 million tweets / day • Internet – 100s of exabytes / year • 300 million new websites per year • 48 hours of video uploaded to YouTube per minute • 30,000 YouTube videos played per second
  • 7. Our focus is streaming graphs • As relationships change • Edges (relationships) are inserted, updated, and removed • New vertices (things) join and leave the network • What are the effects? • On information flow • On community structure • On the integrity of data and structure • Which actors and relationships are… • The key players and influencers in the change? • The anomalies and threats?
  • 8. What is STINGER? • Spatio-Temporal Interaction Networks and Graphs Extensible Representation • D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, S. C. Poulos • A scalable, high performance in-memory dynamic graph data structure • Stores semantic and temporal information. • Designed to be flexible and extendable. • Be useful for the entire “large graph” community. • Permit good performance: No single structure is optimal for all. • Assume globally addressable memory access. • Support multiple, parallel readers and a single parallel writer. • A software suite for dynamic graph analysis • Targets large shared-memory x86 and the Cray XMT • Written in C with OpenMP and XMT pragma support for parallelism
  • 9. As a data structure • Fast insertions, deletions, and updates: A data structure that grows and changes at the speed of the data. • Edge and vertex types and weights: Represent complex relationships and multiple simultaneous networks. • Filtering traversal mechanisms: Traverse serially or in parallel on specific edge types, time ranges, vertex sets, etc. • Experimental workflow server: Multiple data streams and analytics with one persistent data structure. • Experimental Java and Python bindings: Use efficiency-oriented languages without sacrificing performance-oriented results.
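The features on this slide (typed, weighted edges with timestamps, plus traversal filtered by edge type and time range) can be illustrated with a minimal sketch. The class `TinyDynamicGraph`, the `Edge` fields, and all method names below are hypothetical and chosen for illustration only; they are not the STINGER API or its Python bindings, and the sketch makes no attempt at parallelism or at STINGER's memory layout. Accumulating the weight on a repeated insertion is a choice made for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edge:
    dst: int
    etype: str
    weight: float
    first_seen: int  # timestamp of the first insertion
    last_seen: int   # timestamp of the most recent update


class TinyDynamicGraph:
    """Hypothetical in-memory stand-in; not the STINGER API."""

    def __init__(self):
        # source vertex -> {(destination, edge type): Edge}
        self.adj = defaultdict(dict)

    def insert_edge(self, src, dst, etype, weight, ts):
        key = (dst, etype)
        edge = self.adj[src].get(key)
        if edge is None:
            self.adj[src][key] = Edge(dst, etype, weight, ts, ts)
        else:
            # Sketch's choice: accumulate weight and refresh the timestamp.
            edge.weight += weight
            edge.last_seen = ts

    def remove_edge(self, src, dst, etype):
        """Remove an edge; return True if it existed."""
        return self.adj[src].pop((dst, etype), None) is not None

    def edges_of(self, src, etype=None, since=None, until=None):
        """Yield edges of src, optionally filtered by type and time range."""
        for edge in self.adj.get(src, {}).values():
            if etype is not None and edge.etype != etype:
                continue
            if since is not None and edge.last_seen < since:
                continue
            if until is not None and edge.last_seen > until:
                continue
            yield edge


g = TinyDynamicGraph()
g.insert_edge(0, 1, "follows", 1.0, ts=10)
g.insert_edge(0, 2, "mentions", 1.0, ts=20)
g.insert_edge(0, 1, "follows", 1.0, ts=30)  # update of an existing edge
recent_follows = [e.dst for e in g.edges_of(0, etype="follows", since=25)]
```

Keying each adjacency entry by (destination, edge type) lets several relationship types coexist between the same pair of vertices, which is one simple way to model "multiple simultaneous networks" in a single structure.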
  • 10. As an analysis package • Streaming edge insertions and deletions: Performs new edge insertions, updates, and deletions in batches or individually. • Streaming clustering coefficients: Tracks the local and global clustering coefficients of a graph under both edge insertions and deletions. • Streaming connected components: Accurately tracks the connected components of a graph with insertions and deletions. • Streaming community detection: Track and update the community structures within the graph as they change. • Parallel agglomerative clustering: Find clusters that are optimized for a user-defined edge scoring function. • Streaming Betweenness Centrality: Find the key points within information flows and structural vulnerabilities. • K-core Extraction: Extract additional communities and filter noisy high-degree vertices. • Classic breadth-first search: Performs a parallel breadth-first search of the graph starting at a given source vertex to find shortest paths.
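As an illustration of the streaming clustering coefficient item, the sketch below keeps per-vertex triangle counts up to date under single edge insertions and deletions. It relies on a standard observation: inserting or deleting edge (u, v) changes the triangle count by the number of common neighbors of u and v. The class `StreamingTriangles` and its methods are hypothetical names for this sketch, which is serial and assumes a simple undirected graph; it does not reproduce STINGER's parallel, batched implementation.

```python
from collections import defaultdict


class StreamingTriangles:
    """Hypothetical helper: per-vertex triangle counts, simple undirected graph."""

    def __init__(self):
        self.nbrs = defaultdict(set)
        self.tri = defaultdict(int)

    def _apply(self, u, v, delta):
        # Each common neighbor w of u and v gains or loses the triangle {u, v, w}.
        common = self.nbrs[u] & self.nbrs[v]
        for w in common:
            self.tri[w] += delta
        self.tri[u] += delta * len(common)
        self.tri[v] += delta * len(common)

    def insert(self, u, v):
        if u == v or v in self.nbrs[u]:
            return  # ignore self-loops and duplicate edges
        self._apply(u, v, +1)
        self.nbrs[u].add(v)
        self.nbrs[v].add(u)

    def delete(self, u, v):
        if v not in self.nbrs[u]:
            return
        self.nbrs[u].discard(v)
        self.nbrs[v].discard(u)
        self._apply(u, v, -1)

    def local_cc(self, v):
        """Local clustering coefficient: triangles over possible neighbor pairs."""
        d = len(self.nbrs[v])
        return 0.0 if d < 2 else 2.0 * self.tri[v] / (d * (d - 1))


s = StreamingTriangles()
for u, v in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    s.insert(u, v)
```

Each update touches only the neighborhoods of the two endpoints, which is why tracking the metric incrementally can be much cheaper than recounting every triangle after each change.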
  • 11. How is the graph stored?
  • 12. What can STINGER represent? • Nearly any set of relationships • Healthcare • Social Networks • Intelligence • Systems biology • Power grid • Travel networks • Example: Twitter • Users, hashtags, tweets as vertex types • Authorship, retweet, mentions, follows / followed by edge types • Example: Work Environment • Users, PCs, printers, emails, URLs, files, etc. as vertex types • Email alias, from, to, access, logon/off, print, IM, etc. as edge types
  • 13. What can STINGER do? • Optimized to update at rates of over 3 million edges per second on graphs of one billion edges • D. Ediger, R. McColl, J. Riedy, and D.A. Bader, "STINGER: High Performance Data Structure for Streaming Graphs," The IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September 20-22, 2012. Best Paper Award. • RMAT – Recursive MATrix graph generator. RMAT(N) indicates 2^N vertices.
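For readers unfamiliar with the RMAT generator named on this slide, the following sketch produces edges for a graph with 2^N vertices by recursively choosing one of four quadrants of the adjacency matrix, one bit of the vertex IDs per level. The function names and the quadrant probabilities (0.57, 0.19, 0.19, with the remainder 0.05) are assumptions for illustration; the slides do not state which parameters were used in the benchmark.

```python
import random


def rmat_edge(scale, rng, a=0.57, b=0.19, c=0.19):
    """Return one (src, dst) pair for a graph with 2**scale vertices.

    a, b, c are the probabilities of the top-left, top-right, and
    bottom-left quadrants; the bottom-right gets the remainder.
    These defaults are assumed values, not taken from the slides.
    """
    src = dst = 0
    for _ in range(scale):
        r = rng.random()
        src <<= 1
        dst <<= 1
        if r < a:
            pass          # top-left: both new bits stay 0
        elif r < a + b:
            dst |= 1      # top-right
        elif r < a + b + c:
            src |= 1      # bottom-left
        else:
            src |= 1      # bottom-right
            dst |= 1
    return src, dst


def rmat_edges(scale, n_edges, seed=0):
    """Generate n_edges R-MAT edges with a seeded, reproducible generator."""
    rng = random.Random(seed)
    return [rmat_edge(scale, rng) for _ in range(n_edges)]


edges = rmat_edges(scale=10, n_edges=1000)  # RMAT(10): 2**10 = 1024 vertices
```

The skewed quadrant probabilities give a few vertices very high degree, which is why R-MAT graphs are a common stand-in for social-network-like inputs.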
  • 14. What can STINGER do? • Maintaining connected components in a graph of half a billion edges • Up to 1.26 million updates per sec. • 137x faster than recomputing. • Scalable parallel streaming community detection • Built on parallel insert / delete mechanisms. • Streaming approximate betweenness • Used to analyze influencers on Twitter during Hurricane Sandy over time.
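To make the idea of maintaining connected components incrementally concrete, here is a minimal union-find sketch that handles edge insertions only. This is a deliberate simplification and a different, simpler technique than the one the slide refers to: tracking components under deletions as well, as the slide describes, needs additional machinery that is not shown here. The class name `InsertOnlyComponents` is hypothetical.

```python
class InsertOnlyComponents:
    """Hypothetical union-find tracker; handles insertions only, not deletions."""

    def __init__(self):
        self.parent = {}
        self.count = 0  # number of components among vertices seen so far

    def find(self, v):
        if v not in self.parent:
            self.parent[v] = v
            self.count += 1
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def insert_edge(self, u, v):
        """Return True if the edge merged two components."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        self.parent[ru] = rv
        self.count -= 1
        return True


cc = InsertOnlyComponents()
cc.insert_edge(0, 1)
cc.insert_edge(2, 3)
```

An inserted edge can only merge components, so each update is nearly constant time; a deleted edge may or may not split a component, which is what makes the fully dynamic case harder than this sketch.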
  • 15. What does STINGER not do? • Does not provide all ACID properties • Why: Not intended to be the backing data store. • Why: Allows for greater ingest and processing speeds. • Alternative: Back STINGER ingest with an ACID DB • Alternative: STINGER does provide consistency, partial isolation • No text-based query language, for now • Why: Currently, no language is general enough to describe most or all queries • Alternative: Filtering traversal APIs, unlimited query flexibility through code • Alternative: Productivity language bindings (Python, Java) • No distributed / Hadoop-like cluster support • Why: Good fit for ingest, but poor for streaming analysis, random access is too slow • Alternative: Larger shared memory systems such as the Cray XMT and SGI UV systems • Alternative: Processing billion-edge graphs in shared memory on affordable Intel servers • Alternative: Extract key portions of the graph from a larger data store and perform fast in-memory processing in STINGER
  • 16. What sizes, performance can it handle?
      Server (4x Opteron 6282, 256GB DDR3)
        V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
        16M    512M   25-14    60          13.7                       696K
        16M    256M   25-14    24.6        9.82                       2.1M
      Desktop (Intel Core i7-2600, 16GB DDR3)
        V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
        1M     8M     22-14    1.184       0.316                      2.7M
        2M     16M    22-14    2.384       0.75                       2.3M
        4M     33M    22-14    4.768       2                          2.3M
        8M     67M    24-14    9.536       5.36                       0.85M
        4M     67M    24-14    7.984       3                          1.38M
        4M     134M   24-14    14.336      5.7                        0.8M
      Cray XMT2 (64 Processors, 2TB DDR2)
        V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
        67M    512M   28-32    86          13.8                       3.3M
        268M   4.3B   28-32    312         52.3                       2.34M
    • The only limitation on size is system memory • Billions of vertices and edges are possible • V vertices and E edges in each graph • E counts are undirected • STINGER stores both directions • Config is STINGER-specific parameters
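A rough back-of-the-envelope reading of the sizes on this slide: because E counts undirected edges and both directions are stored, the number of stored edges is 2E. The helper below is a hypothetical calculation, not part of STINGER; it assumes 1 GB = 10^9 bytes and 1M = 10^6, and it folds all per-vertex and structural overhead into the per-edge figure, so the result is only an approximate upper bound on the cost of a stored edge.

```python
def bytes_per_stored_edge(size_gb, undirected_edges):
    """Approximate bytes per stored (directed) edge, overhead included.

    Assumes 1 GB = 1e9 bytes; each undirected edge is stored twice.
    """
    return size_gb * 1e9 / (2 * undirected_edges)


# Server row: 512M undirected edges in 60 GB.
server = bytes_per_stored_edge(60, 512e6)
# Cray XMT2 row: 4.3B undirected edges in 312 GB.
cray = bytes_per_stored_edge(312, 4.3e9)
```

Under these assumptions the two rows work out to a few tens of bytes per stored edge, with the exact figure depending on the Config parameters and on how many vertices the graph has.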
  • 17. Why not existing technologies? • Traditional SQL databases • Not structured to do any meaningful graph queries with any level of efficiency or timeliness • Graph databases, mostly on-disk • Distributed disk can keep up with storing / indexing, but is simply too slow at random graph access to process the graph as it updates • Hadoop and HDFS-based projects • Not really the right programming model for many structural queries over the entire graph, and random access performance is poor • Smaller graph libraries, processing tools • Can't scale, can't process dynamic graphs, and frequently lead to impossible visualization attempts
  • 18. Who is GTRI? • Georgia Tech Research Institute • Largest research entity at Georgia Institute of Technology • One of the world's premier university-based applied R&D organizations for 75 years • Non-profit with over 1,600 employees and 21 locations world-wide • Over $240 million per year of government and industry contracts • Innovative Computing Division of the Cyber Technology and Information Security Lab • Dedicated to the application of practical HPC expertise and cutting-edge fundamental research to solve real-world problems • Experts in high-performance computing, algorithms, and big data
  • 19. How can I start using STINGER? • Information, code, help • http://cc.gatech.edu/stinger • robert.mccoll@gtri.gatech.edu • Together, GTRI and Georgia Tech can offer • Consulting: Understand how your organization can benefit from graph analytics. • Training: Learn how to use graph analysis and apply STINGER to your data. • Implementation: Customize and extend STINGER to suit your needs using our experts. • Research Expertise: Connect with researchers on the cutting edge of big data to develop novel solutions to your open problems.