Sylva workshop.gt that camp.2012
1. Social Network Analysis with Sylva
Juan Luis Suárez & Anabel Quan-Haase
Western University
2. Overview of Workshop
• General overview of the social network
approach
• Key terminology
• Uniqueness of collecting and analyzing
social network data
• Entering data into Sylva
• Importing/exporting data into Sylva
• Example I:
• Example II:
• Understanding limitations and problems
• Future Work and Gephi.org
3. What is SNA?
Social network analysis is focused on uncovering
the patterning of people’s interaction. … Network
analysts believe that how an individual lives
depends in large part on how that individual is
tied into the larger web of social connections.
Many believe, moreover, that the success or
failure of societies and organizations often
depends on the patterning of their internal
structure (Freeman, 1998, November 11).
5. What is Unique about SNA?
Social science research and theory tend to
focus on social actors’:
•attributes
•attitudes
•opinions
•behavior
Focus is on individual level of analysis,
less on network-structural level.
7. Key Terminology
• 1. Social structure
• 2. Social network
• 3. Nodes
• 4. Linkages/relations
• 5. Additional terms of relevance:
– Nodes & edges
– Directed graphs vs. undirected graphs
– Ego
– Alter
– Homophily
8. 1. Social Structure
• Sociological inquiry consists of understanding
the constraining influence of social structure
on social action
• But how do we study social structure?
Attributes Networks
9. 2. Social Network
Social Actors
Ties
Figure 2: Social Structure as Social Network
10. 3. Nodes
• The actors considered in a social network are
exclusively social (alternatively referred to as
agents, nodes, or social entities).
• These include individuals, organizations,
institutions, nations, or groups (Wasserman &
Faust, 1994).
11. Blurred Nodes
• Social actors can therefore be distinguished
from non-social actors – e.g., neurons
comprising a neural network.
• On occasion, the distinction between a social
and a non-social actor is not absolute. For
example, computer networks represent a
hybrid type of network.
12. Node Attributes
• Every single node can have one or more
attributes.
• These attributes describe the nodes and allow
researchers to conduct complex queries of the
database.
• Node attributes can include the time of
publication of a book, its length, the number
of authors, etc.
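As an illustration of attribute-based queries, here is a minimal sketch in plain Python. The node records, attribute names, and the `query` helper are all hypothetical; Sylva has its own interface for this, which is not shown here.

```python
# Hypothetical nodes: each node is a dict of attributes (all values are made up).
books = [
    {"id": "b1", "type": "book", "year": 1605, "pages": 863, "n_authors": 1},
    {"id": "b2", "type": "book", "year": 1651, "pages": 320, "n_authors": 1},
    {"id": "b3", "type": "book", "year": 1994, "pages": 825, "n_authors": 2},
]

def query(nodes, **conditions):
    """Return nodes whose attributes satisfy every condition (one predicate per attribute)."""
    return [n for n in nodes
            if all(attr in n and test(n[attr]) for attr, test in conditions.items())]

# "Books published before 1700 that are longer than 500 pages."
long_early_books = query(books, year=lambda y: y < 1700, pages=lambda p: p > 500)
print([n["id"] for n in long_early_books])
```

The more attributes each node carries, the more specific the questions that can be asked of the database.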
13. One-mode vs. Two-mode
• Most social network analysis methods allow only one type
of social actor (for instance, individuals or corporations) in
their analysis; these are referred to as one-mode networks
(Wasserman & Faust, 1994).
• However, methods exist which allow two different types of
social actors in their analysis; these are referred to as two-
mode networks. For instance, a study may simultaneously
analyze corporations and their directors.
• Two-mode networks may also include social actors from
distinct networks, for example, a network comprised of
adults and a network comprised of children.
• Two-mode networks allow for comparison between
different types and sets of social actors.
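The corporations-and-directors case can be sketched as follows. This is a minimal example with made-up data, assuming the common approach of projecting a two-mode network onto one of its modes; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode data: director -> corporations on whose boards they sit.
boards = {
    "director_A": {"Corp1", "Corp2"},
    "director_B": {"Corp2", "Corp3"},
    "director_C": {"Corp1", "Corp2", "Corp3"},
}

def project_onto_corporations(memberships):
    """One-mode projection: two corporations are tied once per shared director."""
    weights = defaultdict(int)
    for corps in memberships.values():
        for a, b in combinations(sorted(corps), 2):
            weights[(a, b)] += 1
    return dict(weights)

interlocks = project_onto_corporations(boards)
print(interlocks)
```

The resulting one-mode network of corporations can then be analyzed with standard one-mode methods.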
14. 4. Relationships
• Ties are links that connect social actors, and are
the main focus of social network analysis. Ties
are seen as “channels for transfer or ‘flow’ of
resources (either material or nonmaterial)”
(Wasserman & Faust, 1994, p. 4).
15. Simple Relationships
• Naturally occurring ties among social actors are inherently
complex and consist of numerous different interaction
activities.
• However, unlike ethnographers, network analysts do not
focus on the complexity of interactions among individuals
(Burt, 1983).
• Instead, social network analysts focus more on the pattern
of relations amongst individuals and to do so simplify the
inherent complexity of social relationships by categorizing
interactions into different broad types. The types can be
manifold. For example, a pair of social actors may have
friendship, working, cooperation, or citation ties.
16. 5. Additional Terms
• Directed graphs vs. undirected graphs
• Ego
• Alter
• Homophily
17. Types of Network Analysis
• Ego-centered/Socio-centered Social Networks
• Community-centered social networks
19. Actor-Level Centrality
• Actor level degree centrality: Degree
centrality measures the extent to which an
actor is linked to all of the other actors in the
network. Three different measures can be
distinguished: nodal degree, indegree, and
outdegree.
• Actor level closeness centrality: Closeness
measures the distance that an actor has to all
of the other actors in the network.
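A minimal sketch of these measures on a made-up network of four actors. Closeness is computed here as (n - 1) divided by the sum of shortest-path distances, which is one common normalization and an assumption of this example; it also assumes the network is connected.

```python
from collections import deque

# Hypothetical directed ties: (sender, receiver).
ties = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("dan", "ann")]
actors = sorted({a for tie in ties for a in tie})

# Outdegree counts ties sent; indegree counts ties received.
outdegree = {a: sum(1 for s, _ in ties if s == a) for a in actors}
indegree = {a: sum(1 for _, r in ties if r == a) for a in actors}

# Undirected view of the same ties, used for nodal degree and closeness.
neighbors = {a: set() for a in actors}
for s, r in ties:
    neighbors[s].add(r)
    neighbors[r].add(s)
degree = {a: len(neighbors[a]) for a in actors}

def distances_from(source):
    """Breadth-first search: shortest path length from source to each reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

def closeness(actor):
    """(n - 1) divided by the sum of distances; assumes a connected network."""
    dist = distances_from(actor)
    return (len(actors) - 1) / sum(dist.values())

for a in actors:
    print(a, degree[a], indegree[a], outdegree[a], round(closeness(a), 2))
```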
20. • Actor level betweenness centrality:
Betweenness measures the extent to which an
actor lies between two other actors and thus
facilitates/controls the flow of information.
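Betweenness can be sketched by counting, for each actor, the shortest paths between other pairs that pass through that actor. The example below uses Brandes-style accumulation on a small made-up undirected network; it is an illustration under those assumptions, not the routine used by any particular tool.

```python
from collections import deque

# Hypothetical undirected network: a - b - c, with d and e both attached to c.
neighbors = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}

def betweenness(graph):
    """Accumulate shortest-path dependencies from every source (unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                          # actors in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        n_paths = {v: 0 for v in graph}     # number of shortest paths from source
        dist = {v: -1 for v in graph}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

scores = betweenness(neighbors)
print(scores)
```

In this network, c lies between the most pairs, so information moving between the other actors most often has to pass through c.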
22. Network Level Centralization
• Cohesion Distance: measures the degree of separation
between actors in a network. It indicates how many
intermediaries lie between two actors, that is, how many
people stand between an actor and the person that actor
needs to talk to.
• Network Centralization: measures the number of
actors that are connected to each actor in the network.
The more connections among actors, the greater the
network centrality.
• Density: measures the degree of connection that exists
in a network. The more actors talk to each other, the
higher the density.
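A minimal sketch of the three network-level measures on a made-up star network. Two assumptions are made here: cohesion distance is taken as the mean shortest-path length over all pairs, and centralization is computed with Freeman's degree centralization formula, which may differ from the definition used in a given software package.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected "talks to" network: a star in which hub talks to everyone.
neighbors = {
    "hub": {"p1", "p2", "p3"},
    "p1": {"hub"},
    "p2": {"hub"},
    "p3": {"hub"},
}
n = len(neighbors)
n_ties = sum(len(v) for v in neighbors.values()) // 2

# Density: observed ties divided by the n * (n - 1) / 2 possible ties.
density = n_ties / (n * (n - 1) / 2)

def distance(source, target):
    """Shortest path length by breadth-first search (None if unreachable)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in neighbors[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None

# Cohesion distance: mean shortest path length over all pairs (connected network assumed).
pairs = list(combinations(sorted(neighbors), 2))
mean_distance = sum(distance(a, b) for a, b in pairs) / len(pairs)

# Freeman degree centralization: 1.0 for a perfect star, 0.0 when all degrees are equal.
degrees = [len(v) for v in neighbors.values()]
centralization = sum(max(degrees) - d for d in degrees) / ((n - 1) * (n - 2))

print(density, mean_distance, centralization)
```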
23. Measures of Centrality and Assumptions
Measure                  Level    Data Type          Symmetry/Asymmetry
Nodal Degree Centrality  Actor    Dichotomized (>5)  Symmetric (Maximum)
Indegree Centrality      Actor    Valued             Asymmetric
Outdegree Centrality     Actor    Valued             Asymmetric
Closeness Centrality     Actor    Dichotomized (>5)  Symmetric (Maximum)
Betweenness Centrality   Actor    Dichotomized (>5)  Symmetric (Maximum)
Network Cohesion         Network  Valued             Asymmetric
Network Centrality       Network  Dichotomized (>5)  Asymmetric
Network Density          Network  Dichotomized (>5)  Symmetric (Maximum)
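The data preparation implied by the table can be sketched as follows. This example assumes that "Symmetric (Maximum)" means taking the larger of the two directed values for each pair, and that "Dichotomized (>5)" means keeping a tie only where the value exceeds 5. The matrix is made up.

```python
# Hypothetical valued, asymmetric data: valued[i][j] = messages actor i sent to actor j.
actors = ["ann", "bob", "cat"]
valued = [
    [0, 8, 2],
    [3, 0, 6],
    [7, 1, 0],
]
size = len(actors)

# Symmetrize by maximum: the tie between i and j takes the larger of the two directions.
symmetric = [[max(valued[i][j], valued[j][i]) for j in range(size)] for i in range(size)]

# Dichotomize: keep a tie (1) only where the value is greater than 5.
THRESHOLD = 5
binary = [[1 if value > THRESHOLD else 0 for value in row] for row in symmetric]

for name, row in zip(actors, binary):
    print(name, row)
```

Indegree and outdegree would instead be computed on the original `valued` matrix, since they depend on direction.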
24. Uniqueness of Collecting and Analyzing
Social Network Data
• Relational data
• Boundary specification and sampling
• Interdependence of data points
• Query search
• Complexity of data collection
– Manually-harvested
– Data set
– Behavioral
– Self-report
25. Internet Resources of
Social Network Analysis
• Center for the Study of Group Processes
http://lime.weeg.uiowa.edu/~grpproc/
• INSNA International Network for Social Network Analysis
http://www.heinz.cmu.edu/project/INSNA/
• Barry Wellman’s Homepage
http://www.chass.utoronto.ca/~wellman/index.html
• CulturePlex
http://cultureplex.ca/
• Gephi.org
• NodeXL
http://nodexl.codeplex.com/
27. Limitations of
Social Network Analysis
• Boundary specification
• Data source
• Definition of social actors
• No distinct method
28. What is Sylva?
• A database management system
• Graph databases
• NoSQL database
• Built on top of Neo4j
29. Whose Needs Does Sylva Serve?
• Sylva requires no programming skills
• On-the-go modification of the schema
• Storing data in a graph form
• Work from the nodes or from the edges
• Collaborative platform
• Easy-to-use interface thanks to forms,
autocomplete, …
• Multiple visualizations
• Search and Query Engines
35. Creating a Schema on Sylva
(manually)
• New Type of Node (person)
• (2nd) New Type of Node (work)
• Relation
– Incoming or outgoing
– Allowed relationships
• (3rd) New Type of Node (institution)
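The idea of a schema with node types and allowed relationships can be sketched in plain Python. This is a stand-in for illustration only and is not Sylva's API; the relation names `wrote` and `member_of` and both helper functions are hypothetical.

```python
# A plain-Python stand-in for a schema; this is not Sylva's API.
node_types = {"person", "work", "institution"}

# Allowed relationships: (source type, relation name, target type). Names are hypothetical.
allowed_relationships = {
    ("person", "wrote", "work"),
    ("person", "member_of", "institution"),
}

def is_allowed(source_type, relation, target_type):
    """Check a proposed relationship against the schema before storing it."""
    if source_type not in node_types or target_type not in node_types:
        return False
    return (source_type, relation, target_type) in allowed_relationships

def incoming_relations(node_type):
    """Relations that point at this node type (the 'incoming' side)."""
    return sorted(r for s, r, t in allowed_relationships if t == node_type)

print(is_allowed("person", "wrote", "work"))   # an outgoing relation from person
print(is_allowed("work", "wrote", "person"))   # the reverse direction is not in the schema
print(incoming_relations("work"))
```

Declaring which relationships are allowed between which types is what lets a form-based interface offer only valid choices during data entry.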
37. Properties of Objects
• Data objects have properties
• A property is an attribute that defines certain
operations that can be performed on the
object
• We need properties to enter our data
48. Cuba’s Prominence: Modeling The
Latin American Afro in Topic Maps
• Objectives:
– locating the various nodes of bibliographic
production associated with the generation of an
image of the Latin-American Afro
– evaluating the causes that make certain nodes,
i.e., Cuba and various Cuban intellectuals, emerge
as key nodes in the network of production of Afro-
Latin American images
49. Cuba’s Prominence
• Methodology:
– a combination of traditional close-reading of texts
(extraction of nodes and relations) with
– graph analysis of the emerging network with the
PageRank algorithm
51. Measurements (Gephi)
• Closeness centrality: expresses how well connected an individual is to the whole
network. A high value in this measurement indicates better connectivity and thus
expresses the importance of the individual with respect to other elements in the
network.
• Betweenness centrality: indicates how important the individual is as a connection
and transference point within the network. A high value indicates that it is a topic
that is passed through in the communications (relationships) between the other
topics on the map.
• Modularity: is a coefficient that enables us to group together those nodes which
share connections and zones on the network, so that it divides the map into zones
whose nodes are densely related to one another.
• Influence between nodes: is an analysis which we shall carry out in the second
part of the article. It is based on the PageRank algorithm. This is the basic
algorithm on which the Google search engine was originally based for calculating
the importance of the pages that it returns after a search, and which it
used to order the results. Its basic idea is that a given node within a network
becomes important based on the importance of the nodes that relate to it or
that point to it.
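The idea behind PageRank can be sketched with a short power iteration. This is a minimal illustration on a made-up graph, assuming the conventional damping factor of 0.85; it is not the implementation used in Gephi or in the study.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration on a directed graph given as node -> list of nodes it points to."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share of rank.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                # A node passes its rank on in equal parts to the nodes it points to.
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                for w in nodes:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank

# Hypothetical topic map: topics a, b, and c all point to "hub".
links = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Here "hub" ranks first because every other node points to it, and "a" ranks second because the important node "hub" points to it.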
55. Sustaining a Global Community
• Henrich et al. [1] have shown that the existence of norms that
sustain fairness in exchanges among strangers is connected with
the diffusion of institutions such as market integration and the
participation in world religions.
• Their research confirms the hypothesis that modern world religions
may have contributed to the sustainability of large-scale societies
and large-scale interactions. We propose that art is another
institution that contributes to the rise and sustainability of
large-scale societies.
• We use the case of the formation of an artistic network of
paintings, schools, themes, genres, and artists whose development
goes along with the expansion and colonization of the Hispanic
Monarchy across America to show that this artistic network has a
presence in all political territories encompassing most ethnicities
and religions of indigenous origin.
56. Methodology
• The data set comprising the paintings from the Baroque period is
organized and stored in a web-based PostgreSQL database.
• The data includes more than 100,000 total topics (11,443 of them
are artworks). A distinctive feature of the information is that it is
organized around both text fields and ad-hoc descriptors that follow
the model of a formal ontology.
• For our study we have decided to model the data in one of the
possible networks, a network created from common descriptors as
weighted edges and artworks as nodes.
• Some pruning methods had to be applied in order to overcome
some of the shortcomings resulting from the millions of edges and
the excessive number of relational joins. We also split the dataset
into 12 sections, each covering a 25-year period, from 1550 to 1850 [4].
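The modeling steps above can be sketched on a toy example. The artworks and descriptors below are made up, and the pruning rule (dropping edges below a minimum weight) is an assumption of this sketch; the pruning methods actually used are not specified here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical artworks: id -> (year, set of descriptors).
artworks = {
    "w1": (1560, {"saint", "oil", "altar"}),
    "w2": (1570, {"saint", "oil"}),
    "w3": (1572, {"virgin", "oil"}),
    "w4": (1610, {"saint", "portrait"}),
}

def period_start(year, first=1550, span=25):
    """Start year of the 25-year section containing the given year."""
    return first + ((year - first) // span) * span

def build_networks(works, min_weight=2):
    """Per period: weighted edges between artworks sharing descriptors, pruned by weight."""
    by_period = defaultdict(list)
    for work_id, (year, _) in works.items():
        by_period[period_start(year)].append(work_id)
    networks = {}
    for start, ids in by_period.items():
        edges = {}
        for a, b in combinations(sorted(ids), 2):
            # Edge weight = number of descriptors the two artworks have in common.
            weight = len(works[a][1] & works[b][1])
            if weight >= min_weight:  # pruning: drop weak edges
                edges[(a, b)] = weight
        networks[start] = edges
    return networks

networks = build_networks(artworks)
print(networks)
```

Splitting by period before comparing pairs also keeps the number of pairwise comparisons manageable, since artworks are only compared within their own section.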
58. Research Questions
• Our research addresses the issue of the
sustainability of communities through the
existence of a flow of shared information.
• This question is of the utmost importance to
understand the formation and dynamics of
cultural groups and cultural areas.
• As important as the latter is the study of the
spatial and temporal dimensions of any given
political and cultural community, as this will shed
light on the cultural processes resulting from
previous and current waves of globalization.
59. Baroque Paintings in the Hispanic
World: A Network.
• The graph shows, for the first two periods of
our study, the growth of the saints-related
paintings (red cluster) as compared to the
decrease of the cluster with virgins (blue).
The portraits cluster (brown) stays more or less
the same size, but becomes more connected to
the saints cluster.
60. Clustering & Visualizations: Raw Graphs
1550-1575 1575-1600 1600-1625 1625-1650 1650-1675 1675-1700
1700-1725 1725-1750 1750-1775 1775-1800 1800-1825 1825-1850
http://zoom.it/vJVw#full
61. Further Work with Sylva
• Visualization of Schema
• Two Visualizations of Data:
– Node-centered
– Community centered
• Query System:
– Pattern-matching
– Traversals
• Need for multi-disciplinary teams
• Complexity of analysis
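The two query styles named above, pattern-matching and traversals, can be sketched on a small list of typed edges. The data and both helper functions are hypothetical and do not reflect Sylva's query system.

```python
from collections import deque

# Hypothetical typed edges: (source, relation, target).
edges = [
    ("person_1", "wrote", "work_1"),
    ("person_2", "wrote", "work_1"),
    ("person_2", "member_of", "inst_1"),
    ("person_3", "member_of", "inst_1"),
]

def match(pattern):
    """Pattern matching: None in the (source, relation, target) pattern is a wildcard."""
    return [e for e in edges
            if all(p is None or p == value for p, value in zip(pattern, e))]

def traverse(start, max_depth):
    """Traversal: nodes reachable from start within max_depth hops, ignoring direction."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue
        for s, _, t in edges:
            for a, b in ((s, t), (t, s)):
                if a == node and b not in seen:
                    seen[b] = seen[node] + 1
                    queue.append(b)
    return sorted(n for n in seen if n != start)

print(match((None, "wrote", "work_1")))   # who wrote work_1?
print(traverse("person_1", 2))            # what lies within two hops of person_1?
```

Pattern matching answers questions about the shape of a relationship, while a traversal answers questions about what can be reached from a starting node.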