
# Gephi, GraphX, and Giraph

Presentation by Doug Needham discussing Gephi, GraphX, and Giraph. Overview of Graph Theory and Network Analysis. Very high level.

### Gephi, GraphX, and Giraph

1. Graph Theory at Work (doug.needham@ilwllc.com)
2. - @dougneedham
   - Data guy: started as a DBA in the Marine Corps, evolved to architect, now an aspiring data scientist.
   - Oracle, SQL Server, Cassandra, Hadoop, MySQL.
   - I have a strong relational/traditional background.
   - Perpetual student.
   - Learning new things challenges our assumptions and forces us to take a new perspective on “old” problems. Eventually it may even show us that there is a better way to solve a problem.
3. - Stand back, we are going to talk about math!
   - Basically, we are talking about a bunch of dots joined together by lines.
   - Vertex: a dot on a graph.
   - Edge: a line connecting two vertices.
   - Triangle: 3 vertices, 3 edges.
   - Square: 4 vertices, 4 edges.
   - Open triangle: 3 vertices, 2 edges.
   - A lot of things are networks if you look at them the right way.
   - Mark Newman has given a number of really cool presentations about network analysis, available on YouTube.
   - https://www.youtube.com/watch?v=lETt7IcDWLI
4. - The Seven Bridges of Königsberg
   - Every tome on graph theory or network analysis devotes a small portion of its pages to the Seven Bridges of Königsberg.
   - If I don’t cover this with you, the gods of mathematics will strike me down and never allow me to do analysis again.
5. - Folks enjoyed their Sunday afternoon strolls across the bridges, and occasionally people would wonder whether a route existed that crossed every bridge exactly once.
   - Eventually Leonhard Euler was brought into the debate.
   - Euler used vertices to represent the land masses and edges (or arcs, at the time) to represent the bridges. He realized that because every land mass had an odd number of bridges, the problem was unsolvable: such a walk allows at most two vertices with an odd number of edges.
   - And here is the cool thing about mathematicians: if we tell you something is impossible, we have to tell you why in a way you can understand. In doing so, Euler also founded the branch of mathematics we now call graph theory.
   - http://en.wikipedia.org/wiki/Leonhard_Euler
6. - http://gephi.github.io/
   - From the website: “Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.”
   - To get your own data, go into Facebook and search for Netvizz. (You have to authorize it; you can de-authorize it later.)
   - Click the application.
   - Click “personal network”.
   - Click Start.
   - Download your GDF file.
   - Quick demo.
7. - Shortest path: how are two vertices connected?
   - What is a path?
   - Centrality
   - Transitivity
   - Homophily
   - Directed graphs, or digraphs
   - Contagion: how do things “spread” through a network?
   - Let’s rearrange things: how does the layout affect understanding?
   - This is not just data visualization; it can also be used for prediction.
   - https://www.youtube.com/watch?v=rwA-y-XwjuU
8. - GraphX requires Spark, which is not a bad deal.
   - Jump to demo.
   - http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html
9. I haven’t done as much with Giraph as I wanted to. Perhaps a later presentation will offer a more detailed example comparing GraphX with Giraph.
10. - Some time ago I started using graph models to understand metadata.
    - I came up with two types of graphs:
    - Data Structure Graph (DSG) Level 1: roughly like an Entity Relationship Diagram (ERD). Tables are vertices; foreign keys are edges.
    - Data Structure Graph (DSG) Level 2: each vertex is an application and each edge is a data transfer. Roughly equivalent to what we used to call data flow diagrams.
11. - A DSG Level 1 can show you which of your tables are going to have the most interesting query performance.
    - A DSG Level 2 can show you where the most work is going on in your enterprise.
12. - Network/graph analysis is cool.
    - It can show you some interesting things about your data.
    - Some things to consider:
    - Some thought needs to be put into how the raw data is organized for a graph analysis.
    - Directed graph, undirected graph, or bigraph? Some up-front setup work needs to be done.
    - Tools help with the detailed calculations and show the paths, walks, etc.
    - However, due thought should still go into planning a network analysis project.
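
Euler's argument on slide 5 can be checked in a few lines. The following is a minimal sketch, not part of the original talk: the land-mass labels `A` to `D` and the helper names are assumptions made for illustration. It counts the bridges touching each land mass and applies the parity rule for a connected graph, namely that a walk using every edge exactly once exists only when zero or two vertices have odd degree.

```python
from collections import Counter

# The seven bridges as edges between four land masses.
# Labels are illustrative: A = the central island, B = north bank,
# C = south bank, D = the eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges from the island to the north bank
    ("A", "C"), ("A", "C"),  # two bridges from the island to the south bank
    ("A", "D"),              # one bridge from the island to the eastern land mass
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many edge endpoints touch each vertex."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def odd_vertices(edges):
    """Return the vertices with an odd number of edges, sorted by label."""
    return sorted(v for v, d in degrees(edges).items() if d % 2 == 1)


def has_euler_walk(edges):
    """Parity test for a walk using every edge once (assumes a connected graph)."""
    return len(odd_vertices(edges)) in (0, 2)


print(dict(degrees(BRIDGES)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(odd_vertices(BRIDGES))    # ['A', 'B', 'C', 'D']
print(has_euler_walk(BRIDGES))  # False
```

All four land masses have odd degree, so the test fails, which matches the conclusion on the slide.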
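
Several of the terms from slides 3 and 7 (shortest path, centrality, transitivity, and the triangle versus open triangle distinction) can be made concrete on a toy graph. This is a rough sketch under stated assumptions: the five-vertex graph and all function names are made up for illustration, and tools such as Gephi compute these measures for you. Transitivity here is the fraction of connected triples (open or closed triangles) that are closed.

```python
from collections import deque
from itertools import combinations

# A made-up undirected graph: one triangle (a, b, c) with a tail c - d - e.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]


def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def degree_centrality(adj):
    """Degree of each vertex divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def transitivity(adj):
    """Closed triples divided by all connected triples centred on any vertex."""
    closed = 0
    triples = 0
    for nbrs in adj.values():
        for u, v in combinations(sorted(nbrs), 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


adj = build_adjacency(EDGES)
print(shortest_path(adj, "a", "e"))  # ['a', 'c', 'd', 'e']
print(degree_centrality(adj)["c"])   # 0.75
print(transitivity(adj))             # 0.5
```

Vertex `c` has the highest degree centrality because it joins the triangle to the tail, and half of the connected triples in this graph are closed.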
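
The Data Structure Graph Level 1 idea from slides 10 and 11 (tables as vertices, foreign keys as edges) can be sketched the same way. Everything below is hypothetical: the table names, the foreign keys, and the choice to rank tables by total degree are illustrative assumptions, not the method used in the talk. In practice the edge list would come from the database catalog.

```python
from collections import Counter

# Hypothetical schema: each pair is (referencing table, referenced table).
FOREIGN_KEYS = [
    ("orders", "customers"),
    ("order_items", "orders"),
    ("order_items", "products"),
    ("addresses", "customers"),
    ("shipments", "orders"),
]


def degree_table(foreign_keys):
    """Return {table: (times referenced by others, tables it references)}."""
    referenced_by = Counter()
    references = Counter()
    for child, parent in foreign_keys:
        references[child] += 1
        referenced_by[parent] += 1
    tables = set(references) | set(referenced_by)
    return {t: (referenced_by[t], references[t]) for t in tables}


def rank_by_total_degree(foreign_keys):
    """Tables ordered by total foreign-key connections, ties broken by name."""
    table = degree_table(foreign_keys)
    return sorted(table, key=lambda t: (-sum(table[t]), t))


print(degree_table(FOREIGN_KEYS)["orders"])  # (2, 1)
print(rank_by_total_degree(FOREIGN_KEYS))
# ['orders', 'customers', 'order_items', 'addresses', 'products', 'shipments']
```

In this made-up schema `orders` is the most connected table, so it is a reasonable first place to look for join-heavy queries.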