1. Exploratory Social Network
Analysis with Pajek
Fundamentals in Social Network Analysis
by Wouter de Nooy, Andres Mrvar and Vladimir Batagelj
Slides created by Thomas Plotkowiak
26.08.2010
2. Agenda
1. Fundamentals
2. Attributes and Relations
3. Cohesion
4. Sentiments and Friendship
5. Affiliations
6. Core - Periphery
4. Fundamentals
Sociometry studies interpersonal relations. Society is not an
aggregate of individuals and their characteristics (as statisticians
assume) but a structure of interpersonal ties. Therefore, the
individual is not the basic social unit. The social atom consists
of an individual and his or her social, economic, or cultural ties.
Social atoms are linked into groups, and, ultimately, society
consists of interrelated groups.
5. Example of a Sociogram
Choices of twenty-six girls living in one dormitory at a New York state training school.
The girls were asked to choose the girls they liked best as their dining-table partners.
6. Exploratory Social Network Analysis
The main goal of social network analysis is detecting and
interpreting patterns of social ties among actors.
It consists of four parts:
1. The definition of a network
2. Network manipulation
3. Determination of structural features
4. Visual inspection
7. Network Definition
A graph is a set of vertices and a set of lines between pairs of
vertices.
A vertex is the smallest unit in a network. In SNA it represents
an actor (girl, organization, country…).
A line is a tie between two vertices in a network. In SNA it can
be any social relation.
A loop is a special kind of line, namely, a line that connects a
vertex to itself.
8. Network Definition II
A directed line is called an arc, whereas an undirected line is an
edge.
A directed graph or digraph contains one or more arcs. An
undirected graph contains no arcs (all of its lines are edges).
A simple directed graph contains no multiple arcs.
A simple undirected graph contains neither multiple edges nor
loops.
9. Network Definition III
A network consists of a graph and additional information on the
vertices or the lines of the graph.
10. Application
1. We use the computer program Pajek – Slovenian for spider –
to analyze and draw social networks (get it from
http://vlado.fmf.uni-lj.si/pub/networks/).
Number of vertices
Specific vertex and orientation
List of Arcs
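The three callouts above name the parts of a Pajek network file. As a minimal sketch, assuming the common plain-text layout (a `*Vertices n` header, one numbered and quoted vertex label per line, then an `*Arcs` or `*Edges` section with one line per tie), such a file could be read as follows. The function name `parse_pajek` and the sample data are hypothetical.

```python
def parse_pajek(text):
    """Read a minimal Pajek-style network string into (labels, arcs, edges)."""
    labels, arcs, edges = {}, [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section header such as "*Vertices 3", "*Arcs" or "*Edges"
            section = line.split()[0].lower()
            continue
        parts = line.split()
        if section == "*vertices":
            number = int(parts[0])
            # Label is the quoted text if present, else the second token
            if '"' in line:
                labels[number] = line.split('"')[1]
            else:
                labels[number] = parts[1] if len(parts) > 1 else str(number)
        elif section in ("*arcs", "*edges"):
            u, v = int(parts[0]), int(parts[1])
            value = float(parts[2]) if len(parts) > 2 else 1.0
            (arcs if section == "*arcs" else edges).append((u, v, value))
    return labels, arcs, edges


# Hypothetical three-vertex network
sample = """*Vertices 3
1 "Ada"
2 "Ben"
3 "Cleo"
*Arcs
1 2 1
2 3 1
3 1 2
"""
labels, arcs, edges = parse_pajek(sample)
```

Each arc is stored as (sender, receiver, line value), which mirrors the "specific vertex and orientation" idea of the file format.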
15. Automatic Drawing
• Layout by Energy: Move vertices to locations that minimize
the variation in line length. (Imagine that the lines are springs
pulling vertices together, though never too close.)
• Energy Layouts:
– Kamada-Kawai (computationally expensive)
– Fruchterman-Reingold (faster)
• Draw by Hand
18. Example – The world system
In 1974, Immanuel Wallerstein introduced the concept of a
capitalist world system, which came into existence in the
sixteenth century. This system is characterized by a world
economy that is stratified into a core, a semiperiphery, and a
periphery. Countries owe their wealth or poverty to their
position in the world economy. The core, Wallerstein argues,
exists because it succeeds in exploiting the periphery and, to a
lesser extent, the semiperiphery. The semiperiphery profits
from being an intermediary between the core and the
periphery.
Which countries belong to the core, semiperiphery or
periphery?
19. The world system network
• Network contains 80 countries with attributes:
• continent
• world system position in 1980
• gross domestic product per capita in U.S. dollars in 1995
• The arcs represent imports (of metal) into one country from
another.
20. Partition
A partition of a network is a classification or clustering of the
vertices in the network such that each vertex is assigned to
exactly one class or cluster.
24. Reduction of a Network
To extract a subnetwork from a network, select a subset of its
vertices and all lines that are only incident with the selected
vertices.
• Operations > Extract from Network (select class 6)
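The extraction rule above can be sketched in a few lines. This is an illustration only, not Pajek's implementation; the function name and the small network are hypothetical, and the partition is assumed to be a dictionary from vertex to class number.

```python
def extract_subnetwork(lines, partition, wanted_class):
    """Keep the vertices of one class and the lines running among them."""
    keep = {v for v, cls in partition.items() if cls == wanted_class}
    # A line survives only if both of its endpoints were selected
    sub_lines = [(u, v) for u, v in lines if u in keep and v in keep]
    return keep, sub_lines


# Hypothetical network: vertices 1-3 are in class 6, vertices 4-5 in class 1
lines = [(1, 2), (2, 3), (3, 4), (4, 5)]
partition = {1: 6, 2: 6, 3: 6, 4: 1, 5: 1}
vertices, sub_lines = extract_subnetwork(lines, partition, wanted_class=6)
```

The line (3, 4) is dropped because vertex 4 lies outside the selected class.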
26. Global View
To shrink a network, replace a subset of its vertices by one new
vertex that is incident to all lines that were incident with the
vertices of the subset in the original network.
1. Operations > Shrink Network
27. Contextual View
In a contextual view, all classes are shrunk except the one in
which you are particularly interested.
• Operations > Shrink Network (Don't shrink class 6)
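Shrinking for a global or contextual view can be sketched as follows. The sketch assumes a partition given as a dictionary and counts how many original lines collapse onto each new line; the naming scheme `#class<n>` for the new vertices is an invention for this example, not Pajek's labeling.

```python
def shrink(lines, partition, keep_class=None):
    """Replace every class (except keep_class) by a single new vertex."""
    def representative(v):
        cls = partition[v]
        # Vertices of the class of interest stay as they are
        return v if cls == keep_class else f"#class{cls}"

    shrunk = {}
    for u, v in lines:
        key = (representative(u), representative(v))
        # Lines inside a shrunk class become loops on the new vertex
        shrunk[key] = shrunk.get(key, 0) + 1
    return shrunk


lines = [(1, 2), (2, 3), (3, 4), (4, 1)]
partition = {1: 1, 2: 1, 3: 2, 4: 2}
global_view = shrink(lines, partition)                    # shrink everything
contextual_view = shrink(lines, partition, keep_class=2)  # keep class 2 intact
```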
28. Vectors and Coordinates Load & Edit
A vector assigns a numerical value to each vertex in a network.
• File > Vector > Read (.vec File)
• File > Vector > Edit
30. Vector Partition
• Vector > Make Partition > by Truncating (Abs)
• Vector > Make Partition by Intervals > First Threshold and
Step
• Vector > Make Partition by Intervals > Selected Thresholds
33. Network Analysis and Statistics
• Example: Crosstabulation of two partitions and some
measures of association between the classifications
represented by two partitions.
• Partition > Info > Cramer's V, Rajski
• Cramer's V measures the statistical dependence between two
classifications.
• Rajski's indices measure the degree to which the information in one
classification is preserved in the other classification.
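As an illustration of the crosstabulation idea, Cramer's V can be computed from two partitions with the usual chi-square formula, V = sqrt(chi2 / (n * (min(r, c) - 1))). This is a sketch under that textbook definition, not Pajek's code; the two partitions are assumed to be equally long lists of class labels, and the data are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(part_a, part_b):
    """Cramer's V for two classifications of the same vertices."""
    n = len(part_a)
    joint = Counter(zip(part_a, part_b))      # cells of the crosstabulation
    rows, cols = Counter(part_a), Counter(part_b)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


identical = cramers_v([1, 1, 2, 2], [1, 1, 2, 2])    # perfect dependence
unrelated = cramers_v([1, 1, 2, 2], [1, 2, 1, 2])    # no dependence
```

Rajski's indices are information-based and are not covered by this sketch.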
37. Cohesive Subgroups
Cohesive subgroups: We hypothesize that cohesive subgroups
are the basis for solidarity, shared norms, identity and
collective behavior. Perceived similarity, for instance,
membership of a social group, is expected to promote
interaction. We expect similar people to interact a lot, at least
more often than with dissimilar people. This phenomenon is
called homophily: "Birds of a feather flock together."
38. Example – Families in Haciendas (1948)
Each arc represents "frequent visits" from one family to another.
39. Density & Degree I
Density is the number of lines in a simple network, expressed as
a proportion of the maximum possible number of lines.
A complete network is a network with maximum density.
The degree of a vertex is the number of lines incident with it.
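A minimal sketch of both definitions follows. It assumes that loops are excluded, so the maximum number of lines is n(n-1) for arcs and n(n-1)/2 for edges; if loops are allowed, the maximum is larger. Function names and data are hypothetical.

```python
def density(n_vertices, lines, directed=True):
    """Number of lines as a proportion of the maximum possible (no loops)."""
    max_lines = n_vertices * (n_vertices - 1)
    if not directed:
        max_lines //= 2
    return len(lines) / max_lines


def degree(lines):
    """Number of lines incident with each vertex."""
    deg = {}
    for u, v in lines:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


triangle_plus_isolate = [(1, 2), (2, 3), (3, 1)]   # vertex 4 has no lines
d_undirected = density(4, triangle_plus_isolate, directed=False)  # 3 of 6
d_directed = density(4, triangle_plus_isolate, directed=True)     # 3 of 12
degrees = degree(triangle_plus_isolate)
```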
40. Density & Degree II
Two vertices are adjacent if they are connected by a line.
The indegree of a vertex is the number of arcs it receives.
The outdegree is the number of arcs it sends.
To symmetrize a directed network is to replace unilateral and
bidirectional arcs by edges.
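Indegree, outdegree and symmetrizing can be sketched on a list of arcs. The example is illustrative and assumes arcs are (sender, receiver) pairs; loops are dropped when symmetrizing, which is a simplification made for this sketch.

```python
def in_out_degree(arcs):
    """Count received arcs (indegree) and sent arcs (outdegree) per vertex."""
    indeg, outdeg = {}, {}
    for u, v in arcs:
        outdeg[u] = outdeg.get(u, 0) + 1
        indeg[v] = indeg.get(v, 0) + 1
    return indeg, outdeg


def symmetrize(arcs):
    """Replace unilateral and bidirectional arcs by a single edge."""
    return {frozenset((u, v)) for u, v in arcs if u != v}


arcs = [(1, 2), (2, 1), (2, 3)]
indeg, outdeg = in_out_degree(arcs)
edges = symmetrize(arcs)
```

The pair of arcs between vertices 1 and 2 collapses into one edge, so the symmetrized network has two edges.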
42. Computing Degree
• Net > Transform > Arcs → Edges > All
• Net > Partitions > Degree > {In, Out, All}
43. Components
A semiwalk from vertex u to vertex v is a sequence of lines such
that the end vertex of one line is the starting vertex of the next
line and the sequence starts at vertex u and ends at vertex v.
A walk is a semiwalk with the additional condition that none of
its lines is an arc of which the end vertex is the arc's tail.
Note that v5 v3 v4 v5 v3 is also a walk to v3.
44. Paths
A semipath is a semiwalk in which no vertex in between the first
and last vertex of the semiwalk occurs more than once.
A path is a walk in which no vertex in between the first and last
vertex of the walk occurs more than once.
45. Connectedness
A network is (weakly) connected if each pair of vertices is
connected by a semipath.
A network is strongly connected if each pair of vertices is
connected by a path.
This network is not connected
because v2 is isolated.
46. Connected Components
A (weak) component is a maximal (weakly) connected
subnetwork.
A strong component is a maximal strongly connected
subnetwork.
v1, v3, v4, v5 form a weak component.
v3, v4, v5 form a strong component.
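Weak and strong components can be found with simple reachability searches. The sketch below is one possible approach, not Pajek's algorithm. The arc list is an assumption chosen to agree with the captions (v2 isolated, v3, v4, v5 on a directed cycle, v1 attached by a single arc), since the figure itself is not reproduced here.

```python
def reachable(start, neighbors):
    """All vertices that can be reached from start via the given mapping."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in neighbors.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def components(vertices, arcs, strong=False):
    succ, pred = {}, {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)
    # For weak components the direction of arcs is ignored
    both = {v: succ.get(v, set()) | pred.get(v, set()) for v in vertices}
    found, assigned = [], set()
    for v in vertices:
        if v in assigned:
            continue
        if strong:
            # Mutually reachable: forward search intersected with backward search
            comp = reachable(v, succ) & reachable(v, pred)
        else:
            comp = reachable(v, both)
        found.append(comp)
        assigned |= comp
    return found


vertices = [1, 2, 3, 4, 5]
arcs = [(1, 3), (3, 4), (4, 5), (5, 3)]     # assumed arcs, see lead-in
weak = components(vertices, arcs)
strong = components(vertices, arcs, strong=True)
```

Note that this sketch also reports single vertices such as v1 and v2 as components of size one.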
48. Cliques and Complete Subnetworks
A clique is a maximal complete subnetwork containing three
vertices or more. (cliques can overlap)
v2, v4, v5 is not a clique.
v1, v6, v5 is a clique.
v2, v3, v4, v5 is a clique.
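Maximal complete subnetworks can be enumerated with the Bron-Kerbosch algorithm. This is a sketch of that well-known technique, which is not necessarily what Pajek uses. The edge list is an assumption built from the captions only: it makes v1, v5, v6 and v2, v3, v4, v5 complete.

```python
def maximal_cliques(adj, min_size=3):
    """Bron-Kerbosch enumeration of maximal complete subnetworks."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            # r cannot be extended any further, so it is maximal
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


edges = [(1, 5), (1, 6), (5, 6),
         (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
adj = {v: set() for v in range(1, 7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

cliques = maximal_cliques(adj)
```

The set v2, v4, v5 is complete but is not reported, because it is contained in the larger clique v2, v3, v4, v5. Vertex v5 belongs to both cliques, which shows how cliques overlap.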
49. n-Clique & n-Clan
n-Clique: A maximal subgraph in which every pair of vertices is at
distance n or less, with distance measured in the analyzed graph.
A clique is an n-clique with n=1.
n-Clan: An n-clique in which every pair of vertices is also at
distance n or less within the resulting subgraph itself.
2-Clique
2-Clan
51. k-Plexes
k-Plex: A k-plex is a maximal subgraph with g_s vertices
in which each vertex is connected to at least g_s - k vertices.
[Figure: example graph with vertices 1 to 6]
2-Plexes: 1234, 2345, 3456, 4561, 5612, 6123
In general, k-plexes are more robust than cliques and clans.
53. Overview of Group Concepts
• 1-Clique, 1-Clan and 1-Plex are identical
• An n-clan is always included in a higher order n-clique
Component
2-Clique
2-Clan
2-Plex
Clique
54. Finding Cliques
• Example: We are looking for occurrences of triads
• Nets > First Network, Second Network
• Nets > Fragment (1 in 2) > Find
The figure shows the hierarchy for the example of overlapping complete triads. There are five complete triads; each of the triads
is represented by a gray vertex. Each triad consists of three vertices.
55. Finding Social Circles
• Partitions > First, Second
• Partitions > Extract Second from First
We have found three social circles.
56. k-Cores
A k-core is a maximal subnetwork in which each vertex has at
least degree k within the subnetwork.
• Net > Components > {Strong, Weak}
57. k-Cores
k-cores are nested, which means that a vertex in a 3-core is also
part of a 2-core, but not all members of a 2-core belong to a
3-core.
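The k-core definition and its nesting can be illustrated by repeatedly removing vertices whose degree inside the remaining subnetwork is below k. This is a plain sketch for an undirected network, not Pajek's implementation; the six-vertex network is hypothetical.

```python
def k_core(adj, k):
    """Vertices that survive repeated removal of vertices with degree < k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            # Degree is counted within the surviving subnetwork only
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


# Vertices 1-4 are completely connected, 5 is tied to 1 and 4, 6 hangs on 5
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (5, 1), (5, 4), (6, 5)]
adj = {v: set() for v in range(1, 7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

core1, core2, core3 = k_core(adj, 1), k_core(adj, 2), k_core(adj, 3)
```

Every member of the 3-core is also in the 2-core, while vertex 5 is in the 2-core only.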
58. k-Cores Application
• k-cores help to detect cohesive subgroups by removing the
lowest k-cores from the network until the network breaks up
into relatively dense components.
• Net > Partitions > Core > {Input, Output, All}
61. Balance Theory
Fritz Heider (1946): A person (P) feels uncomfortable when he
or she disagrees with his or her friend (O) on a topic (X).
P feels an urge to change this imbalance. He can adjust his
opinion, change his affection for O, or convince himself that O is
not really opposed to X.
62. Signed Graphs
A signed graph is a graph in which each line carries either a
positive or a negative sign.
{O,P,X} form a cycle. All balanced cycles contain an even number
of negative lines or no negative lines at all.
63. Signed Graphs with Arcs
A cycle is a closed path.
A semicycle is a closed semipath.
A (semi-)cycle is balanced if it does not contain an uneven
number of negative arcs.
64. Balanced Networks
A signed graph is balanced if all of its (semi-)cycles are balanced.
A signed graph is balanced if it can be partitioned into two
clusters such that all positive ties are contained within the clusters
and all negative ties are situated between the clusters.
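The two-cluster criterion suggests a simple check: walk through the graph and put each neighbor in the same cluster for a positive line and in the other cluster for a negative line; any contradiction means the graph is not balanced. The sketch below assumes an undirected signed graph given as (u, v, sign) triples, and the P-O-X signs are hypothetical.

```python
def is_balanced(vertices, signed_lines):
    """Try to split the vertices into two clusters that satisfy balance."""
    adj = {v: [] for v in vertices}
    for u, v, sign in signed_lines:
        adj[u].append((v, sign))
        adj[v].append((u, sign))
    cluster = {}
    for start in vertices:
        if start in cluster:
            continue
        cluster[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v, sign in adj[u]:
                # Positive line: same cluster; negative line: other cluster
                expected = cluster[u] if sign > 0 else 1 - cluster[u]
                if v not in cluster:
                    cluster[v] = expected
                    stack.append(v)
                elif cluster[v] != expected:
                    return False, None
    return True, cluster


# P likes O, but they disagree on X: one negative line, not balanced
unbalanced, _ = is_balanced("POX", [("P", "O", 1), ("P", "X", 1), ("O", "X", -1)])
# P and O both reject X: two negative lines, balanced
balanced, clusters = is_balanced("POX", [("P", "O", 1), ("P", "X", -1), ("O", "X", -1)])
```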
65. Clusterability = Generalized Balance
A cycle or a semicycle is clusterable if it does not contain exactly
one negative arc.
A signed graph is clusterable if it can be partitioned into clusters
such that all positive ties are contained within clusters and all
negative ties are situated between clusters.
66. Example – Community in a New England
monastery
Options > Values of Lines > Similarities
Young Turks (1), Loyal Opposition (2), Outcasts (3), Interstitial Group (4)
67. Issues on Clustering
1. An optimization may find several solutions that fit equally well.
It is up to the researcher to select one or present all.
2. Unless a solution is known to be optimal, there is no guarantee
that no better solution exists.
3. Different starting options yield different results. (It is hard to
tell the exact number of clusters that will yield the lowest
error score.)
4. Negative arcs within a cluster are often tolerated less than positive
arcs between clusters.
70. Loading and Drawing Networks in Time
• Net > Transform > Generate in Time
• Draw > {Previous, Next}
Network consists of 3 choices, hence the bigger errors.
We see a tendency towards clusterability.
73. Example – Corporate interlocks in Scotland in
the beginning of the twentieth century (1904-5)
A fragment of the Scottish directorates network. Companies are classified according to
industry (oil & mining, railway, engineering...).
Directors (grey) and Firms (black)
74. Two-Mode and One-Mode Networks
In a one-mode network, each vertex can be related to each
other vertex.
In a two-mode network, vertices are divided into two sets and
vertices can only be related to vertices in the other set.
• The degree of a firm specifies the number of its multiple
directors, also known as the size of an event.
• The degree of a director equals the number of boards he sits
on, also known as the rate of participation of an actor.
• Also note that some measures must be computed
differently for two-mode networks.
75. Transforming two-mode networks into one-mode networks
Whenever two firms share a director in the two-mode network, there is a line between
them in the one-mode network.
76. Transforming two-mode networks into one-mode networks II
The events of the two-mode network are represented by lines and loops in the
one-mode network of actors. J. S. Tait meets W. Sanderson in board meetings of two
companies.
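The transformation can be sketched as counting, for every pair of actors, the number of events they share. The sketch leaves out loops, and the firm names and the third director are hypothetical; only the two directors' names come from the slide.

```python
from collections import Counter
from itertools import combinations


def project_onto_actors(affiliations):
    """affiliations maps each event (firm) to the set of its actors (directors).

    Returns line values: the number of events shared by each pair of actors.
    """
    line_values = Counter()
    for actors in affiliations.values():
        for a, b in combinations(sorted(actors), 2):
            line_values[(a, b)] += 1
    return line_values


boards = {
    "Firm A": {"J. S. Tait", "W. Sanderson"},
    "Firm B": {"J. S. Tait", "W. Sanderson", "Director X"},
}
line_values = project_onto_actors(boards)
```

To obtain the one-mode network of firms instead, the same function can be applied to a mapping from each director to the set of firms on whose boards he sits.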
77. Transforming two-mode networks into one-mode networks III
• Net > Transform > 2-Mode to 1-Mode {Rows,Columns}
• Net > Transform > 2-Mode to 1-Mode > Include Loops,
Multiple Lines
• Info > Network > Line Values
78. m-Slices
An m-slice is a maximal subnetwork containing the lines with a
multiplicity equal to or greater than m and the vertices incident
with these lines.
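Selecting the lines and vertices that make up the m-slices is a simple filter on line multiplicity. The sketch assumes line values stored in a dictionary keyed by vertex pairs; the firm names and values are hypothetical. Splitting the result into separate connected slices would require an additional component search, which is omitted here.

```python
def m_slice(line_values, m):
    """Lines with multiplicity >= m and the vertices incident with them."""
    lines = {pair: value for pair, value in line_values.items() if value >= m}
    vertices = {v for pair in lines for v in pair}
    return vertices, lines


line_values = {("A", "B"): 3, ("B", "C"): 1, ("C", "D"): 2}
vertices_2, lines_2 = m_slice(line_values, m=2)
```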
83. 6 – Center and Periphery
84. Example – Communication ties within a sawmill
H – Hispanic
E – English
M – Mill
P – Planer section
Y – Yard
Vertex labels indicate the ethnicity and the type of work of each employee; for example,
HP-10 is a Hispanic (H) working in the planer section (P).
85. Distance
• The larger the number of sources accessible to a person, the
easier it is to obtain information. Social ties constitute a social
capital that may be used to mobilize social resources.
• The simplest indicator of a vertex's centrality is the number of its
neighbors (its degree in a simple undirected network).
86. Degree centrality I
The degree centrality of a vertex is its degree.
Degree centralization of a network is the variation in the
degrees of vertices divided by the maximum degree variation
which is possible in a network of the same size.
88. Closeness Centrality
1. Closeness centrality: A person is central if he or she is
very close to all others with respect to the network relation.
Such a central position increases the efficiency with which
an actor can act in the network: the actor can receive and
spread information quickly.

$$ C_c(n_i) = \frac{g - 1}{\sum_{j=1}^{g} d(n_i, n_j)} $$

Here g is the number of vertices and d(n_i, n_j) is the distance between n_i and n_j.
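The closeness formula can be sketched with a breadth-first search that yields the distances from one vertex to all others. The sketch assumes an undirected, connected network stored as an adjacency dictionary; the three-vertex chain is hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, v):
    """(g - 1) divided by the sum of distances from v to all other vertices."""
    dist = bfs_distances(adj, v)
    if len(dist) < len(adj):
        raise ValueError("closeness is only defined for connected networks")
    return (len(adj) - 1) / sum(dist.values())


chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}   # a - b - c
middle = closeness(chain, "b")   # distances 1 and 1
end = closeness(chain, "a")      # distances 1 and 2
```

The middle vertex reaches everyone in one step and therefore gets the maximum value of 1.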
89. Closeness Centrality
[Figure: example network with vertices 1 to 11]

Distances from n3 to all other vertices:

ni   nj   d
3    1    1
3    2    1
3    4    1
3    5    1
3    6    2
3    7    2
3    8    3
3    9    4
3    10   5
3    11   5
Sum       25

$$ C_c(n_3) = \frac{11 - 1}{25} = 0.40 $$

Closeness centrality per vertex:

n    Cc
1    0.27
2    0.29
3    0.40
4    0.45
5    0.45
6    0.45
7    0.45
8    0.45
9    0.37
10   0.27

Caution: only symmetric ties and only connected networks can be considered here.
90. Centralization
1. Centralization ≠ centrality
2. Centralization is a structural property of the group, not a
relational property of individual actors.
3. Index of centralization: compute the differences between the
centrality of the most central actor and the centrality of every
other actor, then sum these differences over all other
actors.

$$ C = \frac{\sum_{i=1}^{g} \left[ C(n^*) - C(n_i) \right]}{g - 1} $$
91. Centralization
1. The index takes a high value only when exactly one
actor is central, not when several actors together form a
center.
2. Only a comparison of data on one group across
several points in time allows meaningfully interpretable
statements.
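The centralization index follows directly from a list of centrality scores. The sketch below uses the index exactly as written on the slide (sum of differences to the most central actor, divided by g - 1); the scores are hypothetical closeness values for a four-vertex star and a four-vertex ring.

```python
def centralization(scores):
    """Sum of differences to the most central actor, divided by g - 1."""
    best = max(scores.values())
    g = len(scores)
    return sum(best - c for c in scores.values()) / (g - 1)


# Star: the center has closeness 1.0, each leaf 3 / (1 + 2 + 2) = 0.6
star = {"center": 1.0, "x": 0.6, "y": 0.6, "z": 0.6}
# Ring: every vertex has the same closeness, 3 / (1 + 1 + 2) = 0.75
ring = {"a": 0.75, "b": 0.75, "c": 0.75, "d": 0.75}

star_index = centralization(star)
ring_index = centralization(ring)
```

The star, with exactly one central actor, scores higher than the ring, in which no actor stands out.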
92. Betweenness centrality
1. Betweenness centrality: Persons (cutpoints) who connect two
otherwise unconnected subpopulations are actors with a high
betweenness centrality. (Assumption: only the shortest
connections are used for communication.)
2. For each pair of actors j, k ≠ i, determine the proportion of all shortest paths connecting j and k
that pass through actor i. These proportions are then averaged over all pairs j, k ≠ i.

$$ C_b(n_i) = \frac{\sum_{j \neq k;\; i \neq j,k} g_{jk}(n_i) / g_{jk}}{(g - 1)(g - 2)} $$

Here g_{jk} is the number of shortest paths between j and k, and g_{jk}(n_i) is the number of those paths that pass through i.
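The betweenness formula can be sketched by counting shortest paths with breadth-first search. A shortest path from j to k passes through i exactly when d(j, i) + d(i, k) = d(j, k), and the number of such paths is the product of the path counts of the two halves. This is a straightforward illustration, not an efficient algorithm; the star network is hypothetical.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """Distances from source and the number of shortest paths to each vertex."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]   # every shortest path to u extends to v
    return dist, sigma


def betweenness(adj, i):
    g = len(adj)
    info = {v: shortest_path_counts(adj, v) for v in adj}
    dist_i, sigma_i = info[i]
    total = 0.0
    for j in adj:
        for k in adj:
            if j == k or i in (j, k):
                continue
            dist_j, sigma_j = info[j]
            if k not in dist_j or i not in dist_j or k not in dist_i:
                continue               # unreachable pairs contribute nothing
            if dist_j[i] + dist_i[k] == dist_j[k]:
                total += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return total / ((g - 1) * (g - 2))


star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
center = betweenness(star, "c")
leaf = betweenness(star, "x")
```

In the star, every shortest path between two leaves runs through the center, so the center reaches the maximum of 1 and the leaves score 0.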
93. Betweenness centrality
1. Caution: It is possible that some actors cannot be
reached themselves but can reach the others on their
own.

[Figure: example network with vertices 1 to 11]

Betweenness centrality per vertex:

Vertex   1     2     3     4     5     6     7     8     9     10    11
Cb       0     0     0.37  0.22  0.22  0.22  0.22  0.48  0.37  0     0
94. Degree Prestige
1. Prestige can be measured sensibly as the relative indegree of an
actor (degree prestige) [Wasserman and Faust 1994:
202].

$$ P_d(n_j) = \frac{x_{+j}}{g - 1}, \qquad x_{+j} = \sum_{i} x_{ij} $$

n_j: actor j
x_{ij}: matrix entry in row i, column j
g: number of vertices in the network
95. Prestige Example
[Figure: example network with vertices 1 to 11]
Prestige of vertex 3: $P_d(n_3) = x_{+3} / (11 - 1) = 2 / 10 = 0.2$
In general: Prestige is independent of group size, and its value lies between
0 and 1 (1 for the center of a star).
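Degree prestige can be sketched as a column sum of the adjacency matrix divided by g - 1. The matrix below is hypothetical: it only encodes the assumption that vertices 1 and 2 choose vertex 3, which gives vertex 3 an indegree of 2 in a network of 11 vertices. Indices in the code are zero-based.

```python
def degree_prestige(matrix, j):
    """Relative indegree of actor j: column sum divided by g - 1."""
    g = len(matrix)
    indegree = sum(matrix[i][j] for i in range(g) if i != j)
    return indegree / (g - 1)


g = 11
matrix = [[0] * g for _ in range(g)]
matrix[0][2] = 1   # vertex 1 chooses vertex 3 (assumed)
matrix[1][2] = 1   # vertex 2 chooses vertex 3 (assumed)

prestige_3 = degree_prestige(matrix, 2)
```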