Big Data Analytics- USE CASES SOLVED USING NETWORK ANALYSIS TECHNIQUES IN GEPHI

Big Data Analysis
Midterm Assignment
2017
KIRAN BABU MAMPILLY 2014 2036| MILI GUPTA 20142118| RUCHIKA SHARMA 20142019

Executive Summary
Through this report, we try to understand the different scenarios in which data
visualization techniques, particularly those in the Gephi software, can be used to better
understand the problem at hand.
The underlying premise is that, to undertake any research or analysis, we first need to
ascertain the problem. The research design and analysis can then be undertaken. During
the analysis, data visualization helps in understanding the issue more coherently. These
problems can pertain to any domain (marketing, finance, law, HR, liberal arts, etc.), and
data visualization software will be helpful in each.
This report helped us appreciate social analytics tools, which help people in various
domains understand the nature of their network structures and their relationships with
others in the field. We have used concepts such as centrality (betweenness, in-degree and
out-degree, eigenvector, etc.), reciprocity and clustering coefficient to better understand
the cases at hand.
In this report, we have mainly focused on a literature review of existing use cases of the
visualization task. We have worked on use cases covering varied uses of the social media site
Twitter in political, cultural and business contexts, and use by drug marketers and musicians,
among others.

Table of Contents
Executive Summary
Table of Contents
Use Cases
1. Microblogging of Twitter users in 2010 Swedish Election campaign (Larsson & Moe, 2012)
2. Who is on your sofa? TV audience communities and second screening social networks
(Doughty, 2012)
3. Before and After Series C funding – a network analysis of Domo (Koehler, 2014)
4. Drug Marketers Use Social Network Diagrams to Help Locate Influential Doctors
5. Using Twitter Data for Cruise Tourism Marketing and Research
6. Exploring characteristics of video consuming behavior in different social media
using K-pop videos (Yong Hwan Kim, 2014)
7. Mapping dynamic conversation networks on Twitter
8. Graphical visualization of analogous relationships of Raags (Kelkar, 2015)
9. Untangling the Social Network of Musicians (Focht, 2017)
10. Social Networks and Text Messaging in Public Health (Beck and Armbruster, 2014)
References

Use Cases
1. Microblogging of Twitter users in 2010 Swedish Election
campaign (Larsson & Moe, 2012)
1. Introduction
Twitter is one of the most popular and well-known social media sites, used for short posts
and status updates. Since the successful use of the internet during the 2008 US presidential
election, it has become important to analyse the role of social media in creating
opportunities for online campaigning and deliberation (Larsson & Moe, 2012). Microblogging
refers to making short and/or frequent posts (Elsevier Early Career Resources, 2012). The aim
of the study is to examine participation in political debate on Twitter, since new media are
often used to communicate and comment on political issues and to express support.
2. Theoretical foundation
In graph theory, centrality indicators identify the most important vertices within a graph.
These can be based on closeness, degree, etc. In this case we focused on the following:
➢ In-degree: the number of edges going into a node is known as the in-degree of the
corresponding node.
➢ Out-degree: the number of edges coming out of a node is known as the out-degree of the
corresponding node.
➢ Reciprocity: a two-way relationship between two nodes (Andrei Brodera, 2000)
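The three measures listed above can be illustrated with a minimal Python sketch. The edge list is hypothetical (it is not data from the study), and reciprocity is computed here as the share of directed edges whose reverse edge also exists, which is one common definition and an assumption on our part.

```python
from collections import Counter

# Hypothetical @-mention edges (sender, receiver); not data from the study.
edges = [
    ("anna", "party_a"),
    ("bjorn", "party_a"),
    ("party_a", "anna"),
    ("carl", "anna"),
]

# In-degree: edges pointing into a node; out-degree: edges leaving it.
in_degree = Counter(target for _, target in edges)
out_degree = Counter(source for source, _ in edges)

# Reciprocity: share of directed edges whose reverse edge also exists.
edge_set = set(edges)
reciprocated = sum(1 for s, t in edge_set if (t, s) in edge_set)
reciprocity = reciprocated / len(edge_set)

print(in_degree["party_a"], out_degree["party_a"], reciprocity)
```

In a tool such as Gephi, these values would typically drive node size (in-degree) and node colour (out-degree).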
3. Data collection
Data was collected via secondary sources. This report is limited to a literature review of a
use case already published in academic journals and other web sources. In the original study,
tweets were collected from one month prior to the election onwards. YourTwapperKeeper was
used to store the tweets.[1]
4. Methodology
Gephi is used to visualize data that can be represented as nodes and edges. This study
analyses a specific subset of the Twitter sphere, focusing on one type of use, i.e. political
communication. The paper (Larsson & Moe, 2012) also examines interaction between users
(in terms of volume and forms of use), who these users are, and how they relate (or not) to
each other. The Force Atlas layout is used in Gephi. It is suited to small and medium-sized
graphs and is adapted to qualitative interpretation of graphs.
[1] https://github.com/540co/yourTwapperKeeper
5. Findings
Figure 1 features many nodes, each representing a particular Twitter user. The colour of a
node represents the out-degree of the user: the darker the colour, the more @ messages the
user sent. Node size depends on in-degree: the larger the node, the more messages were
directed towards the user. Straight lines between nodes specify unidirectional communication,
while curved lines indicate reciprocity in exchanges of messages. (Larsson & Moe, 2012)
Figure 2 provides a network map of RT (retweet) activity, identifying the high-end users in
this regard. Each node in Figure 3 represents a user. The darker the colour of the node, the
more active the user is at retweeting the messages of others. Users who are often retweeted
are identified by larger node sizes. (Larsson & Moe, 2012)
6. Conclusion
This use of network analysis helps in identifying the people who mainly engage in retweeting
and those who are retweeted the most. Using the theoretical background of centrality in
networks, we are able to identify the key influencers on social media and thus know whom to
target during a political campaign. The most retweeted users would tend to be the opinion
leaders and would exert the most influence.
2. Who is on your sofa? TV audience communities and second
screening social networks (Doughty, 2012)
1. Introduction
Viewing second or third screens alongside television is becoming popular among audiences,
as such devices are affordable and pervasive. This use case explores the message activity of
viewers on the Twitter microblogging service. The network of connected viewers reveals the
different characteristics they possess and their motivations for using a second screen.
2. Theoretical foundation
➢ The measures of centrality play a major role in this use case as well. In-degree: the
number of edges going into a node is known as the in-degree of the corresponding node. This
helps in ascertaining which groups are highly connected and which are isolated.
➢ Clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster
together (P.W. Holland, 1971). For a node, it is the ratio of the number of edges among its
immediate neighbours to the maximum number of such edges that could exist.
➢ Reciprocity explains the two-way relationship between two nodes (Andrei Brodera, 2000).
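As a rough illustration of the clustering coefficient described above, the following Python sketch computes the local coefficient of a node in a small undirected network. The viewer names and the `adjacency` structure are hypothetical and serve only to show the ratio being calculated.

```python
from itertools import combinations

def local_clustering(adjacency, node):
    """Share of possible links among a node's neighbours that actually exist."""
    neighbours = adjacency[node] - {node}
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair can be linked
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)

# Hypothetical undirected mention network (viewer -> set of neighbours).
adjacency = {
    "v1": {"v2", "v3", "v4"},
    "v2": {"v1", "v3"},
    "v3": {"v1", "v2"},
    "v4": {"v1"},
}

for viewer in adjacency:
    print(viewer, round(local_clustering(adjacency, viewer), 2))
```

A node whose neighbours all know each other scores 1, while a node whose neighbours are otherwise unconnected scores 0.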
3. Data collection
The data was collected by the analysts from Twitter during the broadcast of two popular
shows, Strictly Come Dancing (prime time, celebrity oriented) and BBC Question Time (current
affairs show), using the Twitter streaming API.[2] The API collects data in either a sample
mode or a filter mode. The limitation of this method is that only tweets with the required
hashtags are sourced.
4. Methodology
The datasets were imported into Gephi. Nodes were connected by edges based on mentions in
tweets, whether of others or of themselves (the latter resulting in a self-loop). The OpenOrd
layout algorithm was used to bring out the clusters and community structures within the
network. The in-degree of each node was computed to better visualize the graph.
[2] https://developer.twitter.com/en/docs
5. Findings
Strictly Come Dancing
The in-degree was computed, and the larger nodes display the users with the most mentions or
retweets. There are also certain isolated nodes alongside nodes with greater in-degree.
#scd contains 3903 edges between 2895 nodes, of which 443 edges and 695 nodes are isolated,
meaning that these users use the hashtag of the show but do not mention anyone. The
reciprocity level in this case is very low, since it involves celebrities with a fan
following. (Doughty, 2012)
BBC Question Time
Similarly, using in-degree, the larger nodes in the visualization are the users who are most
mentioned or retweeted. #bbcqt contains 3769 edges between 3955 nodes, of which 2640 nodes
and 1913 edges are in isolated groupings. However, in this case the groupings are more
tightly connected, with a networked “core”. (Doughty, 2012)
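To make the two sets of figures easier to compare, the share of isolated nodes can be worked out from the counts reported above. This is only a small arithmetic sketch; the dictionary layout and variable names are our own.

```python
# Node counts as reported above (Doughty, 2012).
shows = {
    "#scd": {"nodes": 2895, "isolated_nodes": 695},
    "#bbcqt": {"nodes": 3955, "isolated_nodes": 2640},
}

# Share of nodes that sit in isolated positions or groupings for each hashtag.
isolated_share = {
    tag: counts["isolated_nodes"] / counts["nodes"] for tag, counts in shows.items()
}

for tag, share in isolated_share.items():
    print(f"{tag}: {share:.1%} of nodes are isolated")
```

Roughly a quarter of the #scd nodes are isolated, against about two thirds of the #bbcqt nodes.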
6. Conclusion
The two TV shows chosen display distinct characteristics.
1. Prime-time celebrity-oriented dance show: viewers connect with celebrities and stars of
the show using mentions and retweets. Reciprocation was low.
2. Late-night current affairs show: viewers engage in conversation, with higher rates of
reciprocation.
The two shows display different behaviours; however, the key point for marketers is that
people are turning to a second screen to experience deeper communal viewing with friends or
like-minded people. Advertisers should therefore focus on this behaviour.
3. Before and After Series C funding – a network analysis of
Domo (Koehler, 2014)
1. Introduction
This case refers to the company Domo,[3] which is primarily a Big Data company and works in
the field of network analysis of venture capital connections. The analysis of the company
was done when it received above-average funding in the form of Series C funding despite
being a relatively young company. To be precise, Domo received $125M from Greylock,
Fidelity, Morgan Stanley and Salesforce, among others.
The analysis was carried out by a data scientist to understand the effect on the network
structure after the announcement of this news. Domo came out as one of the best-connected
nodes, and the following analysis shows why.
2. Theoretical foundation
➢ Series C funding[4] could be used to buy another company. As the operation gets less risky,
more investors come into play. In Series C, groups such as hedge funds, investment banks,
private equity firms and big secondary market groups accompany the earlier investors.
➢ Betweenness centrality: it is equal to the number of shortest paths from all vertices to all
others that pass through a given node. It indicates the node’s centrality in a network (Andrei
Brodera, 2000).
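The betweenness centrality defined above can be sketched in plain Python. The code below follows Brandes' algorithm for unweighted, undirected graphs and returns unnormalised scores; the small investor network is hypothetical and is not Koehler's data. Gephi's own implementation may differ in normalisation.

```python
from collections import deque

def betweenness(adjacency):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search counting shortest paths from `source`.
        order = []
        preds = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical network: "hub" links two otherwise separate groups.
adjacency = {
    "a": {"b", "hub"}, "b": {"a", "hub"},
    "hub": {"a", "b", "c", "d"},
    "c": {"d", "hub"}, "d": {"c", "hub"},
}
scores = betweenness(adjacency)
print(scores)
```

Here every shortest path between the two groups runs through "hub", which is the kind of position the case attributes to Domo after the funding round.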
3. Data collection
This analysis focuses on a literature review of the established use case on Domo’s increasing
centrality. The data collection was carried out by the data scientist to capture how various
Big Data companies are connected to each other and to Series C funders.
4. Methodology
Gephi was used to plot the centrality of the various data science start-ups and companies
before and after the news of Domo’s financing was announced. This helped in understanding
how firms become more prominent in the network if they unexpectedly receive a substantial
amount of finance from big players.
[3] Domo
[4] SeriesCFunding
5. Findings
Before Series C Funding
The dark nodes are the more connected ones. The betweenness centrality is, on average,
similar across many popular nodes. (Koehler, 2014)
After Series C Funding
The dark nodes are the more connected ones. However, the large betweenness centrality of
Domo (green node) almost dwarfs the other nodes in the network. (Koehler, 2014)
6. Conclusion
The new funding round not only increases Domo’s centrality but also MongoDB’s, because of
the shared investors Salesforce, T. Rowe Price and Fidelity Investments (Koehler, 2014). It
can therefore be concluded that if a young firm invests sufficient time in marketing itself
by pitching to funders, it can grow to become one of the more sought-after and central
companies in the entire network. In this case, Domo became one of the most central Big Data
companies following the news of the finance it received from funders. This was a positive
sign for its future as a company.

4. Drug Marketers Use Social Network Diagrams to Help Locate
Influential Doctors (Activate Networks, 2013)
1. Introduction
This case is concerned with how social network diagrams obtained through data visualization
can help in pinpointing effective influencers in a pharmaceutical market. As in all other
industries, marketing plays a crucial role in the pharmaceutical industry, and network
analysis coupled with data visualization is often used to build a cogent marketing strategy
and ensure its success.
This case gives an example of how the consulting firm ‘Activate Networks’ creates social
network diagrams to help pharmaceutical companies identify and better target physicians with
the largest reach (Activate Networks, 2013). The network analysis done by the consulting firm
gives a clear picture of which physicians to target for maximum reach.
2. Data collection (source of data)
The data was collected by the consultancy firm ‘Activate Networks’. Although the exact
source of the data is not mentioned in the use case, it can be assumed that the data was taken
from a medical database. The data used is that of doctors based in a Northeastern U.S.
community who have prescribed, or are potential customers for, an oncology drug.
3. Methodology
After the data was obtained, it was filtered so that it contained only doctors based in a
Northeastern U.S. community who have prescribed, or are potential customers for, an oncology
drug. Once the data was cleaned, a network diagram was constructed with the help of data
visualization software such as Gephi.
Source: (Activate Networks, 2013)

The following conventions are used in the network diagram.
➢ Each circle represents one doctor.
➢ Orange dots refer to relevant specialists. Light blue dots refer to doctors who have
not prescribed the drug, and dark blue dots refer to doctors who have prescribed the drug.
➢ The size of a dot depends on the doctor’s prescribing volume for an oncology drug.
➢ The edges represent the connections between doctors (for example through common patients:
it is assumed that if one doctor prescribes the new drug, another doctor will come to know
about it from a common patient and thus do the same).
4. Findings
A marketer can infer a number of findings from this network visualization. Some of the
findings in this example are given below.
In one cluster of the original diagram there are many interrelated physicians, all with
similar volumes of prescription. In this case the marketer need not prioritize all the
doctors and can instead select a few. This saves cost while maintaining the same level of
reach.
Certain doctors in the diagram seem to form key links between two different clusters.
Marketers identify these ‘bridges’ as good targets, as they offer entry into both clusters.
Marketers also identify the nodes with higher ‘connectedness’, as targeting them can lead to
better marketing reach among the physicians’ group.
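The ideas of ‘bridges’ and ‘connectedness’ described above can be sketched in Python. The doctor network below is hypothetical, a bridge is taken here to be a doctor whose removal splits the network into more parts, and connectedness is approximated by degree. These are our assumptions for illustration and not necessarily the measures used by Activate Networks.

```python
def components(adjacency, skip=None):
    """Count connected components, optionally ignoring one node."""
    seen, count = set(), 0
    for start in adjacency:
        if start == skip or start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nxt in adjacency[node]:
                if nxt != skip and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count

# Hypothetical doctor network: two clusters joined through "dr_e".
adjacency = {
    "dr_a": {"dr_b", "dr_c"}, "dr_b": {"dr_a", "dr_c"},
    "dr_c": {"dr_a", "dr_b", "dr_e"},
    "dr_e": {"dr_c", "dr_f"},
    "dr_f": {"dr_e", "dr_g", "dr_h"},
    "dr_g": {"dr_f", "dr_h"}, "dr_h": {"dr_f", "dr_g"},
}

baseline = components(adjacency)
# A "bridge" doctor is one whose removal splits the network into more parts.
bridges = sorted(d for d in adjacency if components(adjacency, skip=d) > baseline)
# "Connectedness" is approximated here by degree (number of linked doctors).
most_connected = max(adjacency, key=lambda d: len(adjacency[d]))

print(bridges, most_connected)
```

In this toy network the doctors on the path between the two clusters are flagged as bridges, which matches the intuition that they offer entry into both groups.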
5. Conclusion
Network analysis and its visualization can greatly help marketers in the pharmaceutical
industry to better target their marketing promotions and interventions. By targeting the right
people and thereby maximizing the reach attained, pharmaceutical organizations can better
manage their marketing expenditure and cut marketing costs aggressively without
compromising on the output. We believe that the future belongs to those companies whose
marketing departments rely on tools such as those discussed in this case. Social network
analysis and its diagrams are a new source of competitive advantage for pharmaceutical
companies in terms of channelling their marketing efforts.

5. Using Twitter Data for Cruise Tourism Marketing and Research
(Seunghyun “Brian” Park, 2015)
1. Introduction
This research article is a good example of how network analysis can be used to strengthen
marketing efforts in the cruise tourism industry. Very limited research has been done in the
field of cruise tourism and its marketing. The case exemplifies the need for businesses to
recognize the large amount of data created on Facebook and Twitter (big data) and how these
data can be properly analyzed to identify problems as well as suggest solutions.
Twitter is selected in this research because it has drawn the attention of researchers and
helps address topics such as social networking, information diffusion, customer engagement,
and product promotion.
Network analysis and mapping is not the sole tool used in this research but is one that is
used with several others to arrive at a cohesive conclusion. Network analysis in this case
uses metrics such as eigenvector centrality, closeness centrality, and betweenness centrality,
which can be used to evaluate the importance of vertices (or users) in the network (Seunghyun
“Brian” Park, 2015). The research also uses cluster analysis to identify the subgroups inside
the network.
2. Data collection
Data was collected with the help of an online platform and the Twitter API. Twitter data
about cruise tours between May 2 and June 5, 2014 was collected. ScraperWiki, an online
platform, was used to collect and archive tweets during this period. ‘Cruise’-related
hashtags were used to extract tweets and metadata from the Twitter API. The extracted data
was combined and subjected to further cleaning. For example, some hashtags referred to a
different sense of ‘cruise’ (car cruise control, Tom Cruise, etc.); these had to be removed
from the data set. Also, only English tweets were retained for the study, which filtered out
about 4% of the initial data.
3. Methodology
The research used three different techniques for analysing the collected data: word
frequency analysis, content analysis, and network analysis. For the word frequency analysis,
tokenization was first performed to separate each tweet into discrete words. Other NLP
techniques (e.g. stop-word removal) were also applied to the data set.
For descriptive analysis, RapidMiner, an open source data analytics tool, was used to
break down the tweets and count the frequency of all words and hashtags in them.
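The word frequency step described above can be sketched in a few lines of Python. The tweets, the stop-word list and the `tokenize` helper are hypothetical stand-ins; the original study used RapidMiner, so this is only an approximation of the same idea.

```python
import re
from collections import Counter

# Hypothetical tweets and a tiny stop-word list, for illustration only.
tweets = [
    "Loving the sunset on our #cruise to the Bahamas",
    "Best cruise deals of the summer! #cruise #travel",
]
STOP_WORDS = {"the", "on", "our", "to", "of"}

def tokenize(text):
    """Lower-case a tweet and split it into words and hashtags."""
    return re.findall(r"#?\w+", text.lower())

# Count every token that is not a stop word across all tweets.
counts = Counter(
    token
    for tweet in tweets
    for token in tokenize(tweet)
    if token not in STOP_WORDS
)

print(counts.most_common(3))
```

Keeping the leading `#` lets hashtags and plain words be counted separately, which matters when hashtags are also the collection criterion.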
The third part of the analysis consisted of the network analysis, which was carried out in
Gephi. The network diagram obtained from Gephi shows the visualization of the entire data
set along with the major clusters.
Source: (Seunghyun “Brian” Park, 2015)
4. Findings
(To focus on the network analysis component, only the findings pertaining to the network
analysis part are described.)
The network analysis led to the identification of different clusters, mainly four of them:
celebrity centered, blogger centered, cruise ship company centered, and travel agency
centered. Further, eigenvector centrality, closeness centrality, and betweenness centrality
were used to identify the influencers or the most prominent entities within the clusters.
The findings are as follows:
➢ @TheCarlosPena, with 2.5 million followers, showed the highest centrality in the
celebrity centered cluster. The celebrity is relevant because the account posts cruise
pictures extensively and also uses hashtags extensively.
➢ @CruiseMiss, a blogger in the United Kingdom (UK) who shares cruise tourism-related
information and pictures, has the highest centrality in the bloggers’ cluster. Bloggers
had lower closeness centrality than celebrities but posted more content.
➢ @royalcarribean is a prominent account in the cruise ship company centered cluster even
though it has very few tweets. This is because of its high eigenvector centrality.
➢ @WaldoWorldTrvl has the highest centrality in the travel agency centered cluster.
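The comparison of closeness centrality between celebrities and bloggers in the findings above can be illustrated with a small Python sketch. The network is hypothetical and treated as undirected, and closeness is computed as the number of reachable nodes divided by the sum of shortest-path distances to them, which is one common definition.

```python
from collections import deque

def closeness(adjacency, node):
    """Closeness = reachable nodes / sum of shortest-path distances to them."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical retweet network, treated as undirected.
adjacency = {
    "celebrity": {"fan1", "fan2", "fan3", "blogger"},
    "fan1": {"celebrity"}, "fan2": {"celebrity"}, "fan3": {"celebrity"},
    "blogger": {"celebrity", "reader"},
    "reader": {"blogger"},
}

print(closeness(adjacency, "celebrity"), closeness(adjacency, "blogger"))
```

A higher value means the account is, on average, fewer steps away from everyone else, which is why closeness is read as an indicator of how quickly its tweets can spread.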
5. Conclusion
The research concludes that there are a number of subgroups on Twitter that draw on
different sources for information about cruise tourism. It suggests that marketers should
acknowledge the phenomenon of ‘celebrity practice’ on Twitter. Twitter has become a place
where celebrities develop in-depth relationships with their fan following, and targeting
these celebrities can make a difference. Although blogger tweets spread more slowly than
celebrity tweets (lower closeness centrality), bloggers were posting more cruise content than
celebrities and hence shouldn’t be overlooked when deciding on marketing interventions.
Maintaining strong ties with the identified bloggers can significantly contribute to brand
recognition of the cruises. The research also concludes that Twitter, a microblogging
platform, is and will continue to be important for the tourism and hospitality industry and
related research in the coming years.

6. Exploring characteristics of video consuming behavior in
different social media using K-pop videos (Yong Hwan Kim, 2014)
1. Introduction:
This research analyses how people consume web videos (cultural web videos) differently
through different media. The research uses network analysis and data visualization
extensively to arrive at its findings.
The genre of video selected for analysis in this case is K-pop videos (Korean popular song
videos), which are music videos originating from South Korea. These music videos are
gaining immense popularity. Not long ago, ‘Gangnam Style’ was at the pinnacle of fame, and
it remains one of the most viewed YouTube videos ever (2.9 billion views). The primary
objective of the study is to investigate diverse patterns in consuming cultural content
(K-pop videos in this case) depending on the social media used.
2. Theoretical foundation
An important theoretical foundation used in this case is uses and gratifications theory,
which is a useful way to interpret why people use media. The theory helps in finding the
relationship between the user and the media. According to the theory, users are active
audiences who use media to satisfy their motivations and select media depending on the
motives they want to fulfil.
3. Data collection
The YouTube data was collected with the help of the YouTube API, and the relevant tweets
were obtained by web crawling. To get the seed data from YouTube, the researchers used
queries that included k-pop, kpop, Korean pop, SM Entertainment, YG Entertainment and JYP
Entertainment (the last three are famous K-pop content producers). From the seed data, the
following variables of each video were also collected: video ID (URL), title, category and
uploader account name in the YouTube database.
The tweets were identified by web crawling, in which tweets containing the URL of a video
(identified from the YouTube API) were extracted. Three other methods were also used to
collect co-linked videos on Twitter, which eased data collection.
4. Methodology
The research was divided into four stages for ease of execution:
➢ Collection of seed data
➢ Collection of related and co-linked videos
➢ Pair and network generation
The suggestion list in YouTube was used to identify which video from the collected
data appears when another video from the data is viewed. The same was done for Twitter,
where a ‘co-linked’ pair implies an implicit linkage between two videos mentioned in
tweets of the same Twitter account. A ‘co-linked’ pair between videos X and Y is
produced when both are exported to tweets of a specific Twitter account.
➢ Multilateral analysis approaches
In the network analysis part of this stage, the objective was first to identify the core
nodes, since this allows meaningful conclusions to be drawn. To identify
the core nodes, the researchers adopted four centrality measures: degree centrality,
weighted degree centrality, betweenness centrality and PageRank. The degree
centrality of a node is defined as the number of edges that are adjacent to that node.
Weighted degree centrality is a variation of degree centrality, calculated by summing
the frequency of every node pair for a given node. Betweenness centrality denotes the
number of shortest paths passing through a particular node. PageRank measures the
importance of a node based on the ranks of the nodes that link to
it. (Yong Hwan Kim, 2014)
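The pair generation and the two degree measures described above can be sketched as follows. The account names and video IDs are hypothetical, and the sketch assumes that the weight of a co-linked pair is the number of accounts that tweeted both videos, which is our reading of "frequency of every node pair".

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: video IDs mentioned in tweets of each Twitter account.
videos_by_account = {
    "user1": {"vidA", "vidB", "vidC"},
    "user2": {"vidA", "vidB"},
    "user3": {"vidB", "vidD"},
}

# A co-linked pair (X, Y) is produced whenever one account tweeted both videos;
# the pair count records how many accounts produced that pair.
pair_counts = Counter(
    pair
    for videos in videos_by_account.values()
    for pair in combinations(sorted(videos), 2)
)

degree = Counter()           # number of distinct co-linked videos per video
weighted_degree = Counter()  # sum of pair frequencies per video
for (x, y), weight in pair_counts.items():
    for video in (x, y):
        degree[video] += 1
        weighted_degree[video] += weight

print(degree.most_common(), weighted_degree.most_common())
```

The resulting weighted pairs could then be exported as an edge list and loaded into a tool such as Gephi for layout and further centrality measures.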
5. Findings
The diagram is the network visualization of the YouTube network. The modularity algorithm
was used in this case. It identifies groups of nodes that are more densely connected to each
other than to the rest of the network, and so detects the community structure of the
network.
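To show what the modularity score measures, the sketch below evaluates Newman's modularity Q for a given split of a small undirected graph. The graph and the community labels are hypothetical. The sketch only scores a partition; it does not search for the best one, which is what a community detection routine such as the one in Gephi does.

```python
def modularity(edges, community):
    """Newman's modularity Q for an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # edges with both ends inside a community
    degree_sum = {}  # total degree of the nodes in a community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q sums, per community, the internal edge share minus its expected share.
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Hypothetical video network: two triangles joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
community = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
q = modularity(edges, community)
print(round(q, 3))
```

Putting all nodes into a single community gives Q = 0, so a clearly positive score indicates that the split captures real structure.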

Analysis of the network communities shows the structure of sub-communities in the entire
network. A large number of communities in a network implies that users are associated with a
variety of subjects or videos, whereas a small number of communities shows that users focus
on a particular topic or interest. Node labels such as ‘j7_ISP8Vc3o’ and ‘9bZkp7q1910’ are
IDs of YouTube videos, and the size of each node grows with its degree centrality score.
The edge weight denotes the occurrence frequency of each pair.
In the YouTube visualization, two main communities are identified. Community 1 has more
nodes than community 2, and the videos with high degree centrality are found only in
community 1. All videos that have high degree centrality in community 1 are related to
singers from the SM, JYP and YG entertainment companies. Consequently, the important nodes of
the YouTube network are found to be the videos of these three major entertainment
companies.
The second diagram is the network visualization of the Twitter data. In contrast, the
Twitter network has three major communities and two minor communities. The videos with high
degree centrality are spread across communities 1 and 2, and some of these videos are
related to singers who belong to other small or medium-sized entertainment companies.
6. Conclusion
The study concludes that videos on Twitter and YouTube are consumed by users in different
ways, and these differences are considered clues for tracking users’ socio-cultural
behaviours. The analysis indicated that people in the Twitter network consume more diverse
videos than people in the YouTube network (YouTube’s recommendation algorithm played a part
in this). Findings like these can be of significant use to marketers, as tracking the
socio-cultural behaviours of customers across different online media can help in choosing the
right channel to use for marketing communications. Gephi once again lets the modern
marketer carry out this data-heavy analysis.
7. Mapping dynamic conversation networks on Twitter (Bruns, 2011)
1. Introduction
The research sheds light on the need to map dynamic conversations on Twitter and how it can
be done. Twitter is a social medium that restricts posts to 140 characters. Users largely
work around this limitation through the effective use of hashtags (‘#’) and @replies, which
also help to sort content on Twitter. A user is able to contribute to a discussion with
other people even if he/she is not following them. This extensive use of hashtags and
@replies has aroused the curiosity of researchers, who analyze activity on Twitter during
public issues.
The paper uses millions of tweets posted in June 2010 about the leadership challenge against
Australian Prime Minister Kevin Rudd and his resignation as a reference to produce data
visualizations and derive insights. Although the research does not solve a specific
research problem, it serves the purpose of explaining the importance of Gephi as a data
visualization tool and how it can be used to analyze Twitter activity when a significant
event occurs in society.
2. Data collection
The Twitter data was bought from ‘Gnip’, the commercial data provider for Twitter. In this
specific use case, since tweets about the Kevin Rudd issue of June 2010 were needed, the web
service ‘twapperkeeper’ was used to get hashtag and keyword tweets.
The data set analyzed using Gephi was the @reply data from discussion of a purported
leadership challenge against the then Australian Prime Minister Kevin Rudd by his deputy
Julia Gillard, under the Twitter hashtag #spill (Australian political slang for such a
challenge), on the evening of 23 June 2010.
3. Methodology
Two kinds of analysis were done using Gephi. These can be explained using the network
visualization graphs.
Figure 1.1 shows the graph with node size showing the in-degree and node colour showing
the out-degree (yellow to red).

First (with node size set to indicate the number of @replies received), it is clear that a
large number of participants in the #spill conversation not only discuss the potential fate of
the Prime Minister, but do so while referring to him using his Twitter username
@KevinRuddPM rather than merely his name.
The second part of the research was to create a dynamic network graph. For this, the time
stamps of the tweets were included in the data. Another problem that needed to be addressed
was that the tweets came from various parts of the world and their time stamps were in
local time. Software was used to rectify this and bring all the time stamps to a standard
time. The dynamic graph was then created.
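The time stamp correction described above can be sketched in Python. The case does not name the software that was used, so this is only an illustration under the assumption that each tweet carries an ISO 8601 time stamp with its UTC offset; the tweets and field names are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical tweets with local time stamps that carry a UTC offset.
tweets = [
    {"user": "sydney_user", "created": "2010-06-23T19:30:00+10:00"},
    {"user": "london_user", "created": "2010-06-23T10:45:00+01:00"},
]

def to_utc(timestamp):
    """Parse an ISO 8601 time stamp with offset and convert it to UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc)

# Sort all tweets on one shared timeline before building the dynamic graph.
timeline = sorted(tweets, key=lambda t: to_utc(t["created"]))

for tweet in timeline:
    print(tweet["user"], to_utc(tweet["created"]).isoformat())
```

Once every edge has a comparable time stamp, a dynamic graph can show which @reply exchanges were active in each time window.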
4. Findings
From the first data visualization graph we can see that:
➢ The significant level of incoming @replies does not result in any responses from the
Prime Ministerial account; with node colour indicating the number of @replies sent,
@KevinRuddPM remains a pale yellow, indicating no activity.
➢ At the same time there are a number of highly active senders of @replies
(shown in red) who do not necessarily receive a significant number of answers to
their messages.
➢ Journalists @latikambourke and @renailemay are hubs in the network. They are
notable both as senders and as recipients (marked in red, with the highest centrality).
Their eigenvector centrality could also be high (not given in the case).
From the dynamic graph the following finding was made:
➢ Moving from figure (a) to (b), many participants became inactive as the
exchange of @replies between them stopped. Those who maintained
exchanges are connected by edges, while the others are shown as isolated nodes
(separated from the rest of the diagram). This shows the change over the period of
analysis.
5. Conclusion
Using the dynamic network visualizations produced in Gephi from the hashtag data and its
time stamps, we can trace the shifting roles played by individual participants over time, as
well as the response of the overall #hashtag community to new stimuli, such as the entry of
new participants or the availability of new information.
In a crisis or a social incident, we can understand who the key players on Twitter are, who
aggregates and disseminates information, and whose views and tweets influence others. Such
in-depth analyses, once thought to be beyond our reach, can now be done through Gephi and
its many capabilities, including dynamic network mapping. Like many other academic articles,
this research-backed article further confirms the role network analysis and visualization
can play in improving businesses as well as personal life.

8. Graphical visualization of analogous relationships of Raags (Kelkar, 2015)
1. Introduction:
The idea of visual representation appears in several concepts of Hindustani classical music
(HCM), ranging from paintings that depict raags to the association of raags with different
seasons; the theory of HCM has many multimodal ideas. The authors contribute a method of
analysing raag spaces from the point of view of tonality, and a method of creating
combinations of modes based on colour harmony and the colour wheel.
Hindustani music also has concepts of cadences, themes and variations that are comparable to
Western theory.
A raag can be thought of as a mode with a grammar, and its constituent notes are its most
important property. Thaats are small groups of similar raags.
2. Data Collection
The data set contains 163 commonly sung raags. The final set of descriptors contains all the
elements, classified under different thaats. The authors coded the notes as
S, r, R, g, G, m, M, P, d, D, n, N. They sourced this information from seminal, authoritative
books on Indian music theory.
3. Methodology:
After receiving the data, the authors treated every unique entity as an independent node of
the graph and created edges among these nodes. The result is a directed graph with edges
pointing outwards from the parent node to the attribute. The figure shows a total of
393 nodes and 7010 edges.

The following were considered while constructing the graph:
• Edges were computed between the participating notes in a raag, as well as the notes
found in the short descriptive phrase of the raag.
• If a node happens to be the tenor (vadi, the most important note, which is followed by
the samvadi), it is assigned a larger weight than an optional note (anuvadi).
• Weights are also assigned based on the degree of connectivity.
• The layouts used are Fruchterman-Reingold and ForceAtlas2; 10 clusters were obtained with
a resolution of 0.68 and a modularity of 0.185. The average degree of this network was 4.751
and the average weighted degree was 18.704, with a sparse graph density of 0.012.
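The construction rules and summary statistics above can be sketched on a toy example. The two
raags, their note sets and the weights 3.0 and 1.0 are invented for illustration; the sketch
also assumes that average degree means edges per node and that density is edges divided by
n(n-1), as is common for directed graphs.

```python
# Hypothetical mini data set; names, notes and vadi choices are illustrative only.
raags = {
    "raag_1": {"notes": ["S", "R", "G", "P", "D"], "vadi": "G"},
    "raag_2": {"notes": ["S", "r", "G", "m", "P"], "vadi": "P"},
}
VADI_WEIGHT = 3.0     # assumed weight for the most important note
DEFAULT_WEIGHT = 1.0  # assumed weight for every other note

# Directed, weighted edges pointing from the parent (raag) to its attributes (notes).
edges = {}
for raag, info in raags.items():
    for note in info["notes"]:
        edges[(raag, note)] = VADI_WEIGHT if note == info["vadi"] else DEFAULT_WEIGHT

nodes = set(raags) | {note for _, note in edges}
n, m = len(nodes), len(edges)

average_degree = m / n                             # edges per node
average_weighted_degree = sum(edges.values()) / n  # edge weight per node
density = m / (n * (n - 1))                        # share of possible directed edges

# Under these definitions, density equals average_degree / (n - 1).
density_from_reported = 4.751 / (393 - 1)
```

Applying the same relation to the reported figures, 4.751 / (393 - 1) is about 0.012, so the
reported average degree and graph density agree with each other under this definition.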
4. Findings:
• The nodes with the highest degree were located near the center, with the remaining
nodes spread radially outwards.
• Notes that are frequently used together cluster closer to each other.
• The parent thaats to which the raags belong also separate out at different angles around
the circle.
• The close neighbors of ‘Sa’ are ‘Pa’ and ‘Ma’, the fifth and fourth scale degrees
respectively, at least one of which must be present to complete a raag.
• Thaat clusters are formed around two pairs of notes that go together. Hindustani
classical music theory and the graph visualization show the same result.
• There is larger variance and diversity among the raags with flat notes, while those
whose tones lie on a major scale are huddled much closer together.
• It is possible to visualize which family is most prevalent.
5. Conclusion:
Network analysis and data visualization can be used to understand the most prevalent families,
raags and thaats. The study suggests that representing raags as graphs facilitates their
arrangement and grouping, and reveals similar properties of different nodes. Several
relationships can be derived from the final layout.

9. Untangling the Social Network of Musicians (Focht, 2017)
1. Introduction:
This research investigates the cultural contexts, circumstances, inspirational models
and different ways in which knowledge, experience and expertise have been transferred over
time. The aim of the project was to understand “creative transfer” within the field of music,
in order to understand the lasting significance of musical works and the relationships
between musicians. In print media, usually only a single relation between two musicians is
narrated, and only one of the two biographies is used to report the relationship.
2. Data Collection
Musicologists have examined relationships between musicians as recorded in print media,
resulting in a unique database that is valuable for all kinds of music professionals.
The BMLO (Bavarian Musicians Encyclopedia Online) provided the authors with information on
about 28,000 musicians.
3. Methodology
The objective of the visualization was to develop a graph design that makes the social
network of musicians visually accessible for the first time.
The resulting graph, created in Gephi, has a total of 1,420 connected components: the largest
connects 5,539 musicians, the second largest connects only 56 musicians, and 1,385 connected
components contain fewer than 10 musicians.
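How component counts and sizes like these are obtained can be sketched with a breadth-first
search over an undirected relation list. The musician identifiers and relations are
hypothetical, and the helper component_sizes is an illustration rather than the routine Gephi
uses internally.

```python
from collections import defaultdict, deque

def component_sizes(edges):
    """Return the sizes of all connected components, largest first."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, sizes = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# Hypothetical relations between musicians m1..m8.
relations = [("m1", "m2"), ("m2", "m3"), ("m3", "m4"), ("m5", "m6"), ("m7", "m8")]
sizes = component_sizes(relations)
small_components = sum(1 for s in sizes if s < 3)
```

On the toy data this yields one component of four musicians and two of two musicians each; the
same counting on the full data would give the figures quoted above.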
The preliminary step in generating the social network graph is filtering according to the
research question.
• Filtering can be done by relationship type(s).
• It is possible to focus exclusively on musicians with specific professions (e.g.,
instrumentalists).
The authors focused on the analysis of teacher-student relationships to investigate how
musical knowledge, experience and expertise have been transferred over time. After filtering,
the data contain 3,994 musicians; the largest connected component of this sub-network, the
research object of the musicologists, contains 2,769 teachers and students.
The following were taken care of while creating the graphs:
• Temporally aligned graph: The relations are analyzed chronologically from left to
right, which requires a layout with a temporal dimension. The authors therefore
applied a force-directed graph layout with fixed x-values representing time; each x-value
reflects the middle of a musician’s creative lifetime on a horizontal time axis. As a
result, the nodes spread vertically while the chronological order remains intact.
• Node grouping: Since the underlying research question investigates how musical
knowledge is transferred, the authors hid the nodes of musicians who never had the
role of a teacher; these musicians (students) were instead grouped with their teachers.
This design decision reduces the number of nodes to be displayed from 2,769 to 608.
• Node layout: To illustrate the significance and influence of these personalities,
node size reflects a teacher’s number of students, which makes teachers with many
students salient. By default, node labels are hidden, but for navigation purposes
a user-defined number of node labels with the musicians’ names can be shown on demand.
Either the most popular musicians or the teachers with the most students can be
highlighted.
• Interactivity: Hovering over a node shows the corresponding musician and two lists
of students (those who became teachers and those who did not) in a popup box.
Clicking a node highlights all connections to a teacher’s students who themselves
became teachers. This way, transfer paths of musical knowledge can be assembled
interactively.
• Musical profession analysis: Musicians in the graph can be selected to analyse the
evolution of musical professions. To this end, all musical professions of the selected
teachers’ students are listed by decreasing frequency.
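The node grouping, node sizing and fixed x-position rules above can be sketched as follows.
The teacher and student identifiers and the mid-career years are hypothetical, and the
variable names are chosen for this illustration only.

```python
# Hypothetical (teacher, student) relations.
teaches = [
    ("t1", "t2"), ("t1", "s1"), ("t1", "s2"),
    ("t2", "s3"), ("t2", "t3"),
    ("t3", "s4"),
]
# Hypothetical middle of each teacher's creative lifetime (year), used as a fixed x-value.
mid_year = {"t1": 1850, "t2": 1880, "t3": 1910}

teachers = {teacher for teacher, _ in teaches}
students_of = {teacher: [] for teacher in teachers}
for teacher, student in teaches:
    students_of[teacher].append(student)

# Node grouping: only musicians who taught are drawn as nodes.
visible_nodes = sorted(teachers)
# Edges are kept only where the student later became a teacher.
visible_edges = [(t, s) for t, s in teaches if s in teachers]
# Students who never taught are folded into their teacher's node.
grouped_students = {t: [s for s in students_of[t] if s not in teachers] for t in teachers}

# Node layout: size grows with the total number of students.
node_size = {t: len(students_of[t]) for t in teachers}
# Temporal alignment: x is fixed by time; only y would be left to the force-directed layout.
x_position = {t: mid_year[t] for t in visible_nodes}
```

On this toy input seven musicians collapse to three visible nodes, which mirrors the kind of
reduction reported above.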

4. Findings:
To illustrate the application of the graph, the example of two musicians is taken:
• Joseph Rheinberger and Wilhelm Sandberger are the teachers with the highest number of
students.
• The visualisation reflects the change in the students' musical profile from composition
to composition science.
• Musicologist is the most frequent musical profession, while composer is becoming less
frequent.
The authors infer that the shift in musical profession from composition to composition
science explains the increase in musicology.
5. Conclusion:
Through close collaboration with computer science and with tools like Gephi, the authors
produced a data visualization. Network analysis of it provides insights into the creative
transfer of musical knowledge and answers the question of how musical knowledge is
transferred through the social network of musicians.
Further data can be analyzed to understand emerging trends in musical professions.
The study also gives an idea of how the teacher-student relationship affects knowledge
sharing. The graph can be altered based on specific research questions.

10. Social Networks and Text Messaging in Public Health (Beck and Armbruster, 2014)
1. Introduction:
This project was carried out to understand how social media can be used to develop an optimal
strategy for designing disease prevention interventions. This project has helped researchers
to predict the H1N1 outbreak in the US in 2009.
The research mainly aims to show how actual social networks and social network analysis can be
incorporated into the design of behavioral prevention using text messaging.
2. Data Collection:
Data was collected by surveying the community members
in Hyderabad.
3. Methodology:
To identify opinion leaders to target for behavior change campaigns, the authors conducted a
survey in Hyderabad:
• The survey network was plotted in the data visualization tool Gephi.
• Opinion leaders were identified using network centrality measures, which include:
➢ Degree: the number of social contacts, a measure of popularity.
➢ Closeness centrality: those in the center of the network, with few hops
between them and everyone else in the target population.
➢ Betweenness centrality: those playing a broker or bridge role by connecting
different parts of the network.
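The three centrality measures listed above can be sketched in plain Python for a small
undirected, unweighted network. The contact network is hypothetical, closeness is computed as
(reachable nodes) / (sum of hop distances), and betweenness uses Brandes' accumulation scheme
without normalization; Gephi's own output may be scaled differently.

```python
from collections import deque

def centralities(adjacency):
    """Return (degree, closeness, betweenness) dicts for an undirected graph."""
    nodes = list(adjacency)
    degree = {v: len(adjacency[v]) for v in nodes}
    closeness = {}
    betweenness = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s: hop distances and shortest-path counts.
        dist = {s: 0}
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total else 0.0
        # Brandes back-propagation of path dependencies.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, closeness, betweenness

# Hypothetical contact network: two friend groups joined by the tie between c and d.
contacts = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
degree, closeness, betweenness = centralities(contacts)
```

In this toy network "c" and "d" score highest on all three measures, since they are the only
bridge between the two groups; in a real survey network the three rankings can differ, which
is why the measures are reported separately.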
4. Findings
• Contact tracing is used in network-based public health interventions. It is a
popular method of tracking cases of HIV, tuberculosis and STDs.
• Mobile phones have a penetration of 96% globally. Text messages are
primarily used for promotion and education on different health care issues.
• These interventions are either sent in bulk to a large number of users, or
volunteers/public health workers send individually tailored text messages.
Both methods have challenges in terms of richness and reach.
The authors used a simulation framework and network datasets, such as in Figure 1, to
compare the proposed P2P text messaging design against other intervention designs
currently used, such as bulk message interventions and personalized
interventions.

They believe that leveraging the peer distributor’s social network and phone, with the ability
to send personalised messages and then follow up on these messages with peers, will
increase the richness of the information conveyed, which in turn will increase acceptance
of the messages.
5. Conclusion
Social media has been used by governments to promote public health. Bulk messaging and
personalised messaging can yield results (high acceptability), but many people today find
them annoying. With a peer-to-peer network, messages can effectively reach the masses, since
a social network allows viral-like spread of an intervention among the target population and
is very cost effective.

References
(unknown), Institution - Activate Networks. (2013, 5 16). Business - Drug Marketers
use Social Network Diagrams to Help Locate Influential Donors. Retrieved from
New York Times:
http://www.nytimes.com/interactive/2013/05/16/business/PHARMA.html
Andrei Brodera, R. K. (2000). Graph structure in the Web. Elsevier, 309-320.
Bruns, A. (2011). Mapping Dynamic Conversation Networks on Twitter using Gawk
and Gephi. Information Communication and Society, 1323-1351.
Doughty, M. &. (2012). Who is on your sofa?: TV audience communities and second
screening social networks. Proceedings of the 10th European Conference on
Interactive TV and Video. EuroiTV'12 .
Elsevier Early Career Resources. (2012). How to use blogging and microblogging to
disseminate your research. Elsevier.
Koehler, B. (2014). Before and After Series C funding – a network analysis of Domo.
Beautiful Data.
Larsson, A. O., & Moe, H. (2012). Studying political microblogging: Twitter users in the
2010 Swedish election campaign. Sage Journals, 729-747.
P.W. Holland, S. L. (1971). Transitivity in structural models of small groups.
Comparative Group Studies, 107-124.
Seunghyun “Brian” Park, C. “. (2015). Using Twitter Data For Cruise Tourism Marketing
and Research. Journal of Travel & Tourism Marketing, 15.
Yong Hwan Kim, D. L. (2014). Exploring characteristics of video consuming behaviour
in different social media using K-pop videos. Journal of Information Science,
806-822.
