3. What are networks?
Networks (graphs) are sets of nodes (vertices) connected by edges (links,
ties, arcs).
Additional details
Whole vs. ego: whole networks have all
nodes within a natural boundary
(platform, organization, etc.). An ego
network has one node and all of its
immediate neighbors.
Edges can be directed or undirected and
weighted or unweighted
Additionally, networks may be multilayer
and/or multimodal.
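These distinctions map onto graph classes and edge attributes. A minimal sketch, assuming the NetworkX package introduced later in these slides; the names and weights are made up for illustration:

```python
import networkx as nx

# Undirected, unweighted: a tie has no direction and no strength
friends = nx.Graph()
friends.add_edges_from([("Alan", "Bob"), ("Bob", "Carol"), ("Carol", "Denise")])

# Directed, weighted: who mentions whom, and how often (illustrative values)
mentions = nx.DiGraph()
mentions.add_edge("Alan", "Bob", weight=3)
mentions.add_edge("Bob", "Alan", weight=1)

# Ego network: one node plus its immediate neighbors (radius 1)
ego = nx.ego_graph(friends, "Bob", radius=1)
print(sorted(ego.nodes()))
```

Note that the ego graph here also keeps any edges that run between the neighbors themselves.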
4. Why?
Characterize network structure
How far apart / well-connected are nodes?
Are some nodes at more important positions?
Is the network composed of communities?
How does network structure affect processes?
Information diffusion
Coordination/cooperation
Resilience to failure/attack
5. A network
First questions when approaching a network
What are edges? What are nodes?
What kind of network?
Inclusion/exclusion criteria
7. Python resources
tweepy: Package for the Twitter stream and search APIs (only Python 2.7 at
the moment)
Example code for the search and stream APIs, along with code to create a
mentions/retweet network, is at
https://github.com/computermacgyver/twitter-python
Two versions of Python:
2.7.x – many packages, but issues with non-English scripts
3.x – fewer packages, but excellent handling of international scripts
(Unicode)
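To give a feel for what building a mentions network involves, here is a rough sketch. It does not use tweepy or the repository code above: the `tweets` list is hypothetical stand-in data, and the `@name` regular expression is a simplifying assumption about how mentions appear in the text.

```python
import re
import networkx as nx

# Hypothetical records standing in for tweets collected elsewhere
tweets = [
    {"user": "alice", "text": "Great talk by @bob and @carol!"},
    {"user": "bob", "text": "Thanks @alice"},
    {"user": "alice", "text": "@bob see you tomorrow"},
]

MENTION = re.compile(r"@(\w+)")  # simplified pattern for @mentions

g = nx.DiGraph()
for tweet in tweets:
    author = tweet["user"]
    for mentioned in MENTION.findall(tweet["text"]):
        if g.has_edge(author, mentioned):
            g[author][mentioned]["weight"] += 1  # repeated mention
        else:
            g.add_edge(author, mentioned, weight=1)

for source, target, data in g.edges(data=True):
    print(source, "->", target, data["weight"])
```

Edges point from the author to the account mentioned, and the weight counts how often that happened.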
8. NetworkX
http://networkx.github.io/
Package to represent networks as Python objects
Convenient functions to add, delete, iterate nodes/edges
Functions to calculate network statistics (degree, clustering, etc.)
Easily generate comparison graphs based on statistical models
Visualization
Alternatives include igraph (available for Python and R)
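As an illustration of the comparison-graph idea, the sketch below generates a random graph with the same number of nodes and edges as an observed network and compares clustering. The karate club example graph and the fixed seed are choices made here for illustration, not part of the slides.

```python
import networkx as nx

observed = nx.karate_club_graph()  # small example network shipped with NetworkX
n = observed.number_of_nodes()
m = observed.number_of_edges()

# Random graph with the same number of nodes and edges
# (seed fixed only so the run is repeatable)
baseline = nx.gnm_random_graph(n, m, seed=42)

print("observed clustering:", nx.transitivity(observed))
print("baseline clustering:", nx.transitivity(baseline))
```

A single random graph is only one draw; in practice one would usually generate many and compare against the distribution.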
9. Gephi
Open-source, cross-platform GUI
Primary strength is to visualize networks
Basic statistical properties are also available
Alternatives include NodeXL, Pajek, GUESS, NetDraw, Tulip, and more
10. Network measures
With many nodes, visualizations are often difficult or impossible to interpret.
Statistical measures, however, can be very revealing.
Node-level
Degree (in, out): How many incoming/outgoing edges does a node have?
Centrality (next slide)
Constraint
Network-level
Components: Number of disconnected subsets of nodes
Density: (observed edges) / (maximum number of edges possible)
Clustering coefficient: (closed triplets) / (connected triplets)
Path length distribution
Distributions of node-level measures
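The measures above can be computed on a toy graph as follows. This is a sketch using NetworkX; the six-node graph is invented for illustration, and `nx.transitivity` is used as the closed-triplets ratio.

```python
from collections import Counter
import networkx as nx

# Toy graph: a triangle A-B-C, a pendant node D, and a separate pair E-F
g = nx.Graph()
g.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("E", "F")])

degree = dict(g.degree())                        # node-level: edges per node
components = nx.number_connected_components(g)   # disconnected subsets
density = nx.density(g)                          # observed / possible edges
clustering = nx.transitivity(g)                  # closed / connected triplets

# Path length distribution over unordered, reachable pairs of nodes
path_lengths = Counter()
for source, dists in dict(nx.shortest_path_length(g)).items():
    for target, d in dists.items():
        if source < target:
            path_lengths[d] += 1

print(degree, components, density, clustering, dict(path_lengths))
```

Pairs in different components have no path and so do not appear in the distribution.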
11. Centrality measures
Degree
Closeness: Inverse of the average geodesic (shortest-path) distance to ALL
other nodes. Informally, an indication of the ability of a node to diffuse a
property efficiently.
Betweenness: Number (or fraction) of shortest paths between other pairs of
nodes that pass through the node. Informally, betweenness is high if a node
bridges clusters.
Eigenvector: A weighted degree centrality (inbound links from highly
central nodes count more).
PageRank: Not strictly a centrality measure; similar to eigenvector
centrality, but modeled as a random walk with a teleportation parameter.
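To see how these measures can disagree, consider a sketch with two triangles joined by a bridging node. The graph is invented for illustration. PageRank (`nx.pagerank`) is left out only to keep the example light on dependencies, since it may need SciPy in recent NetworkX versions.

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("A", "B"), ("A", "C"), ("B", "C"),   # first triangle
    ("E", "F"), ("E", "G"), ("F", "G"),   # second triangle
    ("C", "D"), ("D", "E"),               # D bridges the two triangles
])

measures = {
    "degree": nx.degree_centrality(g),
    "closeness": nx.closeness_centrality(g),
    "betweenness": nx.betweenness_centrality(g),
    "eigenvector": nx.eigenvector_centrality(g),
}

# Report the top-scoring node for each measure
for name, scores in measures.items():
    top = max(scores, key=scores.get)
    print(name, top, round(scores[top], 3))
```

Node D has only two edges, yet every shortest path between the two triangles runs through it, so it scores highest on betweenness while C and E lead on degree.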
12. NetworkX: Nodes
import networkx as nx
g=nx.Graph() #A new (empty) undirected graph
g.add_node("Alan") #Add one new node
g.add_nodes_from(["Bob","Carol","Denise"])#Add three new nodes
#Nodes can have attributes
#(g.nodes replaces the older g.node, which recent NetworkX releases no longer provide)
g.nodes["Alan"]["gender"]="M"
g.nodes["Bob"]["gender"]="M"
g.nodes["Carol"]["gender"]="F"
g.nodes["Denise"]["gender"]="F"
for n in g:
    print("{0} has gender {1}".format(n,g.nodes[n]["gender"]))
13. NetworkX: Edges
#Interesting graphs have edges
g.add_edge("Alan","Bob") #Add one new edge
#Add two new edges
g.add_edges_from([["Carol","Denise"],["Carol","Bob"]])
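#Edge attributes (g[u][v] is the attribute dictionary of the edge u-v;
#the older g.edge[u][v] form is not available in recent NetworkX releases)
g["Alan"]["Bob"]["relationship"]="Friends"
g["Carol"]["Denise"]["relationship"]="Friends"
g["Carol"]["Bob"]["relationship"]="Married"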
#New edge with an attribute
g.add_edges_from([["Carol","Alan",
{"relationship":"Friends"}]])
14. NetworkX: Edges
#Iterate over all edges (g.edges() replaces the older g.edges_iter())
for n1,n2 in g.edges():
    print("{0} and {1} are {2}".format(n1,n2,
        g[n1][n2]["relationship"]))
16. NetworkX: Visualize or save
#Save g to the file my_graph.graphml in graphml format
#prettyprint will make it nice for a human to read
nx.write_graphml(g,"my_graph.graphml",prettyprint=True)
#Layout g with the Fruchterman-Reingold force-directed
#algorithm and save the result to my_graph.png
#with_labels will label each node with its id
import matplotlib.pyplot as plt
nx.draw_spring(g,with_labels=True)
plt.savefig("my_graph.png")
plt.clf() #Clear plot
17. NetworkX: Odds and ends
#Read a graph from the file my_graph.graphml in graphml format
g=nx.read_graphml("my_graph.graphml")
#Create an (empty) directed graph
g=nx.DiGraph()
See http://networkx.github.io/documentation/latest/reference/index.html
for many more commands. Note that some commands are available only for
directed graphs, and others only for undirected graphs.
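As one example of this restriction, `nx.is_connected` is defined for undirected graphs only. A small sketch of what happens on a directed graph, and two possible workarounds:

```python
import networkx as nx

dg = nx.DiGraph()
dg.add_edges_from([("Alan", "Bob"), ("Bob", "Carol")])

try:
    nx.is_connected(dg)  # defined for undirected graphs only
    undirected_only = False
except nx.NetworkXNotImplemented:
    undirected_only = True

# Directed graphs have their own notions of connectivity
weakly = nx.is_weakly_connected(dg)      # ignores edge direction
strongly = nx.is_strongly_connected(dg)  # requires paths in both directions

# Alternatively, convert to an undirected graph first
as_undirected = nx.is_connected(dg.to_undirected())

print(undirected_only, weakly, strongly, as_undirected)
```

Converting with `to_undirected()` discards edge direction, so whether that is appropriate depends on the question being asked.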
18. Resources
Newman, M.E.J., Networks: An Introduction
Kadushin, C., Understanding Social Networks: Theories, Concepts, and
Findings
De Nooy, W., et al., Exploratory Social Network Analysis with Pajek
Shneiderman, B., and Smith, M., Analyzing Social Media Networks with
NodeXL