Big Data Analysis
Midterm Assignment
2017
USE CASES SOLVED USING NETWORK ANALYSIS
TECHNIQUES IN GEPHI.
KIRAN BABU MAMPILLY 2014 2036| MILI GUPTA 20142118| RUCHIKA SHARMA 20142019
Executive Summary
Through this report, we try to understand the different scenarios in which data visualization
techniques, particularly those using the Gephi software, can help in understanding the problem
at hand.
The underlying premise is that any research or analysis must begin by ascertaining the problem;
only then can the research design and analysis be undertaken. During the analysis, data
visualization helps in understanding the issue more coherently. These problems can pertain to
any domain (marketing, finance, law, HR, liberal arts, etc.), and data visualization software is
helpful in each of them.
This report helped us appreciate social analytics tools, which help people in various domains
understand the structure of their networks and their relationships with others in the field. We
have used concepts such as centrality (betweenness, in- and out-degree, eigenvector, etc.),
reciprocity and the clustering coefficient to understand the cases at hand.
In this report, we have mainly focused on a literature review of existing use cases of network
visualization. The use cases cover varied uses of the social media site Twitter in political,
cultural and business contexts, as well as uses by drug marketers and musicians, among others.
Table of Contents
Executive Summary
Table of Contents
Use Cases
1. Microblogging of Twitter users in 2010 Swedish Election campaign (Larsson & Moe, 2012)
2. Who is on your sofa? TV audience communities and second screening social networks (Doughty, 2012)
3. Before and After Series C funding – a network analysis of Domo (Koehler, 2014)
4. Drug Marketers Use Social Network Diagrams to Help Locate Influential Doctors
5. Using Twitter Data for Cruise Tourism Marketing and Research
6. Exploring characteristics of video consuming behavior in different social media using K-pop videos (Yong Hwan Kim, 2014)
7. Mapping dynamic conversation networks on Twitter
8. Graphical visualization of analogous relationships of Raags (Kelkar, 2015)
9. Untangling the Social Network of Musicians (Focht, 2017)
10. Social Networks and Text Messaging in Public Health (Beck and Armbruster, 2014)
References
Use Cases
1. Microblogging of Twitter users in 2010 Swedish Election
campaign (Larsson & Moe, 2012)
1. Introduction
Twitter is one of the most popular and well-known social media sites, used for short posts and
status updates. Ever since the successful use of the internet during the 2008 US presidential
elections, it has been important to analyse the role of social media in creating opportunities
for online campaigning and deliberation (Larsson & Moe, 2012). Microblogging refers to making
short and/or frequent posts (Elsevier Early Career Resources, 2012). Aim of study: to examine
participation in political debate on Twitter, since new media are used very often to
communicate and comment on political issues and to express support.
2. Theoretical foundation
In graph theory, centrality indicators identify the most important vertices within a graph.
These can be based on closeness, degree, etc. In this case we focused on the following:
➢ In-degree: the number of edges going into a node is known as the in-degree of the
corresponding node.
➢ Out-degree: the number of edges coming out of a node is known as the out-degree of the
corresponding node.
➢ Reciprocity: a two-way relationship between two nodes (Andrei Brodera, 2000)
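The three measures above can be illustrated with a minimal sketch in plain Python. The edge list is hypothetical (it is not data from Larsson & Moe), and reciprocity is computed here as the share of directed edges whose reverse edge also exists, which is one common definition and an assumption on our part.

```python
from collections import Counter

# Hypothetical directed @-message edges (sender, receiver); not data from the study.
edges = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("carol", "bob"),
    ("dave", "alice"),
]

def degree_counts(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1   # edge leaves src
        in_deg[dst] += 1    # edge arrives at dst
    return in_deg, out_deg

def reciprocity(edges):
    """Fraction of distinct directed edges whose reverse edge is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

in_deg, out_deg = degree_counts(edges)
print(in_deg.most_common(2), out_deg["carol"], reciprocity(edges))
```

In Gephi the same quantities drive the visual encoding described in this use case: node size follows in-degree and node colour follows out-degree.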
3. Data collection
Data was collected via secondary sources. This report is limited to a literature review of a
use case already published in academic journals and other web sources. In the original study,
tweets were collected from one month prior to the election onwards, and YourTwapperKeeper was
used to store the tweets.1
4. Methodology
Gephi is used to visualize data that can be represented as nodes and edges. This study analyses
a specific subset of the Twitter online sphere, focusing on one type of use, i.e. political
communication. The paper (Larsson & Moe, 2012) also examines the interaction between users (in
terms of volume and forms of use), who these users are, and how they relate (or not) to each
other. The Force Atlas layout in Gephi is used; it is scaled for small to medium-size graphs
and is adapted to qualitative interpretation of graphs.
1 https://github.com/540co/yourTwapperKeeper
5. Findings
Figure 1 features many nodes, each representing a particular Twitter user. The colour of a node
represents the outdegree of the user: the darker the colour, the more @ messages the user sent.
Node size depends on indegree: the larger the node, the more messages were directed towards the
user. Straight lines between nodes indicate unidirectional communication, while curved lines
indicate reciprocity in exchanges of messages. (Larsson & Moe, 2012)
Figure 2 provides a network map of RT (retweet) activity, identifying the high-end users in
this regard.
Each node in Figure 3 represents a user. The darker the colour of the node, the more active the
user is at retweeting the messages of others. Users who are often retweeted are identified by
larger node sizes. (Larsson & Moe, 2012)
6. Conclusion
This use of network analysis helps in identifying the people who mainly retweet others and
those who are retweeted the most. Using the theoretical background of centrality in networks,
we can identify the key influencers on social media and thus know whom to target during a
political campaign. The most retweeted users tend to be the opinion leaders and exert the most
influence.
2. Who is on your sofa? TV audience communities and second
screening social networks (Doughty, 2012)
1. Introduction
Using a second or third screen while watching television is becoming popular among audiences,
as such devices are affordable and pervasive. This use case explores message activity on the
Twitter microblogging service during broadcasts. The network of connected viewers reveals their
different characteristics and their motivations for using a second screen.
2. Theoretical foundation
➢ Measures of centrality play a major role in this use case as well. In-degree: the number of
edges going into a node is known as the in-degree of the corresponding node. This helps in
ascertaining which groups are highly connected and which are isolated.
➢ The clustering coefficient is a measure of the degree to which nodes in a graph tend to
cluster together (P.W. Holland, 1971). For a node, it is the ratio of the number of edges that
exist among its immediate neighbours to the maximum number of edges that could exist among them.
➢ Reciprocity describes a two-way relationship between two nodes (Andrei Brodera, 2000).
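As a rough illustration of the clustering coefficient defined above, the sketch below computes the local coefficient of each node in a small undirected graph. The edge list and node names are made up for the example, and treating mentions as undirected ties is a simplifying assumption.

```python
from itertools import combinations

# Hypothetical undirected ties between viewers; illustrative only.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("d", "e")]

def build_adjacency(edges):
    """Map each node to the set of its neighbours, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Edges present among a node's neighbours divided by the maximum possible."""
    neighbours = adj.get(node, set())
    k = len(neighbours)
    if k in (0, 1):
        return 0.0  # no pair of neighbours exists
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

adj = build_adjacency(edges)
coefficients = {n: local_clustering(adj, n) for n in adj}
print(coefficients)
```

Node "a" has three neighbours of which only one pair is linked, so its coefficient is 1/3, while "b" and "c" sit in a closed triangle and score 1.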
3. Data collection
The data was collected by the analysts during the broadcast of two popular shows, Strictly Come
Dancing (prime time, celebrity oriented) and BBC Question Time (current affairs show), from
Twitter using the Twitter Streaming API2. The API collects data in either a sample mode or a
filter mode. The limitation of this method is that only tweets containing the required hashtags
are captured.
4. Methodology
The datasets were imported into Gephi. Users (nodes) were connected by edges based on mentions
in tweets, whether of other users or of themselves (the latter resulting in a self-loop). The
OpenOrd layout algorithm was used to reveal the clusters and community structures within the
network. The in-degree of each node was computed to better visualize the graph.
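The edge-building step described above might look roughly like the following sketch. The tweets, handles and the regular expression are assumptions made for illustration; the original study's exact parsing rules are not given in the source.

```python
import re
from collections import Counter

# Hypothetical (author, text) pairs; handles and wording are invented.
tweets = [
    ("ann", "Loving the show tonight #scd"),
    ("ben", "@judge_x that score was harsh #scd"),
    ("cat", "RT @judge_x: what a routine! #scd"),
    ("dan", "@dan note to self: vote later #scd"),
]

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """One directed edge (author -> mentioned user) per @mention in each tweet."""
    edges = []
    for author, text in tweets:
        for target in MENTION.findall(text):
            edges.append((author.lower(), target.lower()))
    return edges

edges = mention_edges(tweets)
in_degree = Counter(dst for _, dst in edges)
self_loops = [(s, d) for s, d in edges if s == d]

# Authors who used the hashtag but neither mention anyone nor are mentioned.
connected = {user for edge in edges for user in edge}
isolated = sorted({author.lower() for author, _ in tweets} - connected)
print(in_degree, self_loops, isolated)
```

An edge list of this shape can be saved as a two-column CSV (Source, Target) for import into Gephi, where a layout such as OpenOrd is then applied.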
2 https://developer.twitter.com/en/docs
5. Findings
Strictly Come Dancing
The in-degree was computed, and the larger nodes represent users with the most mentions or
retweets. There are also certain isolated nodes alongside the nodes with greater in-degree.
#scd contains 3903 edges between 2895 nodes, out of which 443 edges and 695 nodes are isolated,
meaning these users use the hashtag of the show but do not mention anyone. The level of
reciprocity in this case is very low, since the network involves celebrities with a fan
following. (Doughty, 2012)
BBC Question Time
Similarly, using in-degree, the larger nodes in the visualization represent users who are
highly mentioned or retweeted. #bbcqt contains 3769 edges between 3955 nodes, out of which 2640
nodes and 1913 edges are in isolated groupings. However, in this case the groupings are more
tightly connected and form a networked “core” with each other. (Doughty, 2012)
6. Conclusion
The two TV shows chosen display distinct characteristics.
1. Prime-time celebrity-oriented dance show: viewers connect with the celebrities and stars of
the show using mentions and retweets. Reciprocation was low.
2. Late-night current affairs show: viewers engage in conversation with one another, with
higher rates of reciprocation.
The two shows elicit different behaviours; the point of interest for marketers, however, is
that people are turning to a second screen to experience deeper communal viewing with friends
or like-minded people. Advertisers should therefore take this into account.
3. Before and After Series C funding – a network analysis of
Domo (Koehler, 2014)
1. Introduction
This case concerns the company Domo3, which is primarily a Big Data company and is examined
here through a network analysis of venture capital connections. The analysis was done when the
company received above-average funding in the form of a Series C round despite being
relatively young. To be precise, Domo received $125M from Greylock, Fidelity, Morgan Stanley
and Salesforce, among others.
The analysis was carried out by a data scientist to understand the effect of this announcement
on the network structure. Domo turned out to be one of the best-connected nodes, and the
following analysis shows why.
2. Theoretical foundation
➢ Series C funding4 could be used to buy another company. As the operation gets less risky,
more investors come into play. In Series C, groups such as hedge funds, investment banks,
private equity firms and big secondary market groups accompany the earlier investors.
➢ Betweenness centrality: for a given node, it is the number of shortest paths between all
pairs of other vertices that pass through that node. It indicates how central the node is in
the network (Andrei Brodera, 2000).
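To make the betweenness definition concrete, here is a minimal sketch using Brandes' algorithm for an undirected, unweighted graph. The company and fund names are placeholders rather than Koehler's data, and the "before" and "after" edge lists are invented purely to show how a node that gains ties to otherwise separate investors can jump in betweenness.

```python
from collections import deque

def to_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm, undirected graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical company-investor ties; names are placeholders.
before = [("CoA", "Fund1"), ("CoB", "Fund1"), ("CoB", "Fund2"), ("CoC", "Fund2")]
after = before + [("NewCo", "Fund1"), ("NewCo", "Fund2"),
                  ("NewCo", "Fund3"), ("CoD", "Fund3")]

bc_before = betweenness(to_adjacency(before))
bc_after = betweenness(to_adjacency(after))
print(max(bc_before, key=bc_before.get), max(bc_after, key=bc_after.get))
```

In the toy "after" graph, NewCo is the only route to Fund3 and CoD, which is why its score overtakes every other node.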
3. Data collection
This analysis focuses on a literature review of the established use case on Domo’s increasing
centrality. The data was collected by the data scientist to capture how various Big Data
companies are connected to each other and to other Series C funders.
4. Methodology
Gephi was used to plot the centrality of various data science start-ups and companies before
and after the news of Domo’s financing was announced. This helped in understanding how firms
become more prominent in the network if they unexpectedly receive a substantial amount of
finance from big players.
3 Domo
4 SeriesCFunding
5. Findings
Before Series C Funding
The dark nodes are the more connected ones. On average, betweenness centrality is roughly the
same across many of the popular nodes. (Koehler, 2014)
After Series C Funding
The dark nodes are the more connected ones. However, the huge betweenness centrality of Domo
(green node) almost dwarfs the other nodes in the network. (Koehler, 2014)
6. Conclusion
The new funding round not only increases Domo’s centrality but also MongoDB’s, because of the
shared investors Salesforce, T. Rowe Price and Fidelity Investments (Koehler, 2014). It can
therefore be concluded that if a young firm invests sufficient time in pitching to funders, it
can grow to become a more sought-after and central company in the network. In this case, Domo
became one of the most central Big Data companies following the news of the finance it received
from funders, which is a positive sign for its future as a company.
4. Drug Marketers Use Social Network Diagrams to Help Locate
Influential Doctors (Activate Networks, 2013)
1. Introduction
This case concerns how social network diagrams obtained through data visualization can help in
pinpointing effective influencers in a pharmaceutical market. As in other industries, marketing
plays a crucial role in the pharmaceutical industry, and network analysis coupled with data
visualization is often used to build a cogent marketing strategy and ensure its success.
The case gives an example of how the consulting firm ‘Activate Networks’ creates social network
diagrams to help pharmaceutical companies better target the physicians with the largest reach
(Activate Networks, 2013). The network analysis done by the consulting firm gives a clear
picture of which physicians to target for maximum reach.
2. Data collection (source of data)
The data was collected by the consultancy firm ‘Activate Networks’. Although the exact
source of the data is not mentioned in the use case, it can be assumed that the data was taken
from a medical database. The data used is that of Doctors based in a Northeastern U.S.
community who have prescribed, or are potential customers for, an oncology drug.
3. Methodology
After the data was obtained, it was filtered so that it only contained doctors based in a
Northeastern U.S. community who have prescribed, or are potential customers for, an oncology
drug. Once the data was cleaned, a network diagram was constructed using data visualization
software such as Gephi.
[Figure: network diagram of the doctors. Source: (Activate Networks, 2013)]
The following conventions were used in developing the network diagram.
➢ Each circle represents one doctor.
➢ Orange dots are relevant specialists. Light blue dots are doctors who have not prescribed the
drug, and dark blue dots are doctors who have prescribed the drug.
➢ The size of a dot depends on the doctor's prescribing volume for the oncology drug.
➢ The edges represent connections between doctors (for example, through common patients; the
assumption is that if one doctor prescribes the new drug, another doctor will come to know
about it from a common patient and thus do the same).
4. Findings
A number of findings can be inferred from this network visualization by a marketer. Some of
them are given below.
In a cluster such as the one highlighted in the source diagram, there are many interrelated
physicians with similar prescription volumes. In this case the marketer need not prioritize all
the doctors and can instead select a few. This saves cost while maintaining the same level of
reach.
Certain doctors in the diagram form key links between two different clusters. Marketers
identify these ‘bridges’ as good targets because they offer entry into both clusters.
Marketers also identify nodes with higher ‘connectedness’, as targeting them can lead to better
marketing reach among the physician group.
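One way to approximate the ‘bridge’ and ‘connectedness’ ideas above is sketched below: a doctor is flagged as a bridge if removing that node splits the network into more pieces, and connectedness is taken as plain degree. Both choices are our assumptions; the source does not say how Activate Networks computes them, and the doctor IDs are invented.

```python
def to_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def components(adj, removed=None):
    """Count connected components, optionally pretending one node is absent."""
    seen, count = set(), 0
    for start in adj:
        if start == removed or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w != removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count

def bridge_nodes(adj):
    """Nodes whose removal increases the number of components (brute force)."""
    base = components(adj)
    return sorted(v for v in adj if components(adj, removed=v) > base)

# Hypothetical doctors: two tight clusters joined only through "dr_e".
edges = [
    ("dr_a", "dr_b"), ("dr_b", "dr_c"), ("dr_a", "dr_c"),
    ("dr_b", "dr_e"), ("dr_c", "dr_e"),
    ("dr_e", "dr_f"), ("dr_e", "dr_g"),
    ("dr_f", "dr_g"), ("dr_g", "dr_h"), ("dr_f", "dr_h"),
]
adj = to_adjacency(edges)
bridges = bridge_nodes(adj)
most_connected = max(adj, key=lambda v: len(adj[v]))
print(bridges, most_connected)
```

The brute-force check is fine for a community-sized network; for large graphs a linear-time articulation-point algorithm would be the usual choice.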
5. Conclusion
Network analysis and its visualization can greatly help marketers in the pharmaceutical
industry to target their promotions and interventions. By targeting the right people and
thereby maximizing reach, pharmaceutical organizations can better manage their marketing
expenditure and cut marketing costs aggressively without compromising on output. We believe
that the future belongs to companies whose marketing departments rely on tools such as those
discussed in this case. Social network analysis and its diagrams offer pharmaceutical companies
a new competitive advantage in channelling their marketing efforts.
5. Using Twitter Data for Cruise Tourism Marketing and Research
(Seunghyun “Brian” Park, 2015)
1. Introduction
This research article is a good example of how network analysis can be used to strengthen
marketing efforts in the cruise tourism industry. Very limited research has been done in the
field of cruise tourism and its marketing. The case illustrates the need for businesses to
recognise the large amount of data created on Facebook and Twitter (big data) and shows how
these data can be analysed to identify problems as well as suggest solutions.
Twitter was selected for this research because it has drawn the attention of researchers and
helps address topics such as social networking, information diffusion, customer engagement,
and product promotion.
Network analysis and mapping is not the sole tool used in this research; it is used alongside
several others to arrive at a cohesive conclusion. The network analysis uses metrics such as
eigenvector centrality, closeness centrality, and betweenness centrality, which can be used to
evaluate the importance of vertices (or users) in the network (Seunghyun “Brian” Park, 2015).
The research also uses cluster analysis to identify the subgroups inside the network.
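As a hedged illustration of two of the metrics named above, the sketch below computes closeness centrality with a breadth-first search and eigenvector centrality with a simple power iteration. The graph and account names are hypothetical, the graph is assumed to be connected and undirected, and the iteration count is an arbitrary choice.

```python
from collections import deque

# Hypothetical undirected ties between accounts; names are placeholders.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b"), ("c", "d")]

def to_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def closeness(adj, node):
    """(reachable nodes) / (sum of shortest-path distances); assumes a connected graph."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def eigenvector_centrality(adj, iterations=100):
    """Power iteration: a node scores highly when its neighbours score highly.
    Adding the node's own score (a shift by the identity) keeps the same ranking
    and avoids oscillation on bipartite graphs."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: score[v] + sum(score[w] for w in adj[v]) for v in adj}
        norm = max(new.values())
        score = {v: x / norm for v, x in new.items()}
    return score

adj = to_adjacency(edges)
close = {v: closeness(adj, v) for v in adj}
eig = eigenvector_centrality(adj)
print(close["hub"], max(eig, key=eig.get))
```

Closeness relates to how quickly content from an account can reach everyone else, which is how the paper interprets the difference between celebrity and blogger tweets.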
2. Data collection
Data was collected with the help of online platforms and the Twitter API. Twitter data about
cruise tours between May 2 and June 5, 2014 were collected. ScraperWiki, an online platform,
was used to collect and archive tweets during this period. Cruise-related hashtags were used to
extract tweets and metadata from the Twitter API.
The extracted data was combined and subjected to further cleaning. For example, some hashtags
referred to a different kind of cruise (car cruise control, Tom Cruise, etc.); these had to be
removed from the data set. In addition, only English tweets were retained for the study, which
filtered out about 4% of the initial data.
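The cleaning rules described above could be expressed along the following lines. The record layout (a "text" and a "lang" field), the exclusion phrases and the sample tweets are all assumptions for illustration; the study's actual filtering procedure is not detailed in the source.

```python
# Hypothetical tweet records; field names and contents are assumed.
tweets = [
    {"text": "Sunset from the deck #cruise", "lang": "en"},
    {"text": "Tom Cruise is back in cinemas #cruise", "lang": "en"},
    {"text": "My cruise control broke again #cruise", "lang": "en"},
    {"text": "Un crucero increíble #cruise", "lang": "es"},
]

# Phrases that signal a different sense of "cruise" (illustrative list).
EXCLUDE = ("tom cruise", "cruise control")

def keep(tweet):
    """Keep English tweets that do not match any off-topic phrase."""
    text = tweet["text"].lower()
    return tweet["lang"] == "en" and not any(phrase in text for phrase in EXCLUDE)

cleaned = [t for t in tweets if keep(t)]
print(len(tweets), len(cleaned))
```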
3. Methodology
The research used three different techniques for analysing the collected data: word frequency
analysis, content analysis, and network analysis. For the word frequency analysis, tokenization
was first performed to separate each tweet into discrete words. Other NLP techniques (e.g.
stop-word removal) were also applied to the data set.
For descriptive analysis, RapidMiner, an open source data analytics tool, was used to
disassemble and count the frequency of all texts and hashtags in the tweets.
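The word frequency step can be sketched in a few lines of Python. This is not the RapidMiner workflow used in the study; the tokenizer, the tiny stop-word list and the sample tweets are assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "on", "to", "and", "of", "in", "is", "our"}

def tokenize(text):
    """Lowercase a tweet and split it into word and hashtag tokens."""
    return re.findall(r"#?\w+", text.lower())

def word_frequencies(tweets):
    """Count tokens across all tweets after removing stop words."""
    counts = Counter()
    for text in tweets:
        counts.update(tok for tok in tokenize(text) if tok not in STOP_WORDS)
    return counts

tweets = [
    "Sailing on the Caribbean #cruise",
    "Our cruise to the Caribbean is booked! #cruise #travel",
]
freq = word_frequencies(tweets)
print(freq.most_common(3))
```

Keeping the leading "#" in hashtag tokens lets hashtags and ordinary words be counted separately, mirroring the study's separate counts of texts and hashtags.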
The third part of the analysis was the network analysis, which was carried out in Gephi. The
network diagram obtained from Gephi shows a visualization of the entire data set along with the
major clusters. Source: (Seunghyun “Brian” Park, 2015).
4. Findings
(To keep the focus on the network analysis component, only the findings pertaining to the
network analysis are described.)
The network analysis led to the surprising identification of four main clusters: celebrity
centered, blogger centered, cruise ship company centered, and travel agency centered.
Eigenvector centrality, closeness centrality, and betweenness centrality were then used to
identify the influencers, or the most prominent entities, within the clusters. The findings are
as follows:
➢ @TheCarlosPena, with 2.5 million followers, showed the highest centrality in the celebrity
cluster. The celebrity is relevant because of extensive posting of pictures taken on a cruise
and extensive use of hashtags.
➢ @CruiseMiss, a blogger in the United Kingdom (UK) who shares cruise tourism-related
information and pictures, has the highest centrality in the bloggers group. Bloggers had lower
closeness centrality than celebrities but more content.
➢ @royalcarribean is identified as the most active account although it has very few tweets;
this is attributed to its high eigenvector centrality. It is a prominent player in the cruise
ship company centered cluster.
➢ @WaldoWorldTrvl has the highest centrality in the travel agency centered cluster.
5. Conclusion
The research concludes that there are a number of subgroups on Twitter that turn to different
sources for information about cruise tourism. It suggests that marketers should acknowledge the
phenomenon of ‘celebrity practice’ on Twitter. Twitter has become a place where celebrities
develop in-depth relationships with their fans, and targeting these celebrities can make a
difference. Although blogger tweets spread less quickly than celebrity tweets (lower closeness
centrality), bloggers posted more cruise content than celebrities and hence should not be
overlooked when deciding on marketing interventions. Maintaining strong ties with the
identified bloggers can significantly contribute to brand recognition of the cruises. The
research also concludes that Twitter, a microblogging platform, is and will continue to be
important for the tourism and hospitality industry and related research in the coming years.
6. Exploring characteristics of video consuming behavior in
different social media using K-pop videos (Yong Hwan Kim, 2014)
1. Introduction:
This research analyses how people consume web videos (cultural web videos) differently through
different media. The research uses network analysis and data visualization extensively to
arrive at its findings.
The genre selected for analysis is K-pop videos (Korean popular song videos), which are music
videos originating from South Korea. These music videos are gaining immense popularity. Not
long ago, ‘Gangnam Style’ was at the pinnacle of fame, and it remains one of the most viewed
YouTube videos ever (2.9 billion views). The primary objective of the study is to investigate
diverse patterns in consuming cultural content (K-pop videos in this case) depending on the
social medium.
2. Theoretical foundation
An important theoretical foundation used in this case is uses and gratifications theory, a
useful approach for interpreting why people use media. The theory helps in finding the
relationship between the user and the media. According to the theory, users are active
audiences who use media to satisfy their motivations and select media depending on the motives
they want to fulfil.
3. Data collection (source of data)
The YouTube data was collected through the YouTube API, and the relevant tweets were obtained
by web crawling. To get the seed data from YouTube, the researchers used the queries k-pop,
kpop, Korean pop, SM Entertainment, YG Entertainment and JYP Entertainment (the last three are
famous K-pop content developers). For each seed video the following variables were also
collected: video ID (URL), title, category and uploader account name in the YouTube database.
The tweets were identified by web crawling, in which tweets containing the URL of a video
(identified from the YouTube API) were extracted. Three other methods were also used to collect
co-linked videos on Twitter, which eased data collection.
4. Methodology
The whole research was divided into four stages for ease of execution:
➢ Collection of seed data
➢ Collection of related and co-linked videos
➢ Pair and network generation
The suggestion list in YouTube was used to identify which video from the collected data appears
when another video from the data is viewed. The same was done for Twitter, where a ‘co-linked’
pair implies an implicit linkage between two videos mentioned in tweets of the same Twitter
account. A ‘co-linked’ pair between videos X and Y is produced when both are shared in tweets
of a specific Twitter account.
➢ Multilateral analysis approaches
In the network analysis part of this stage, the objective was first to identify the core nodes,
since this allows meaningful conclusions to be drawn. To identify the core nodes, the
researchers adopted four centrality measures: degree centrality, weighted degree centrality,
betweenness centrality and PageRank. The degree centrality of a node is defined as the number
of edges that are adjacent to that node. Weighted degree centrality is a variation of degree
centrality, calculated by summing the frequency of every node pair for a given node.
Betweenness centrality denotes the number of shortest paths passing through a particular node.
PageRank measures the importance of a node based on the number of its incoming links and the
ranks of the nodes they come from. (Yong Hwan Kim, 2014)
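The pair generation and the two degree measures described above can be sketched as follows. The account names and video IDs are hypothetical, and the sketch assumes each account's shared videos are already available as a list; it is not the researchers' actual pipeline.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mapping: Twitter account -> video IDs shared in its tweets.
shared = {
    "user1": ["vidA", "vidB", "vidC"],
    "user2": ["vidA", "vidB"],
    "user3": ["vidC"],
}

def co_linked_pairs(shared):
    """Count how often each unordered pair of videos is shared by the same account."""
    pairs = Counter()
    for videos in shared.values():
        for x, y in combinations(sorted(set(videos)), 2):
            pairs[(x, y)] += 1
    return pairs

def degree_and_weighted_degree(pairs):
    """Degree counts distinct partners; weighted degree sums pair frequencies."""
    degree, weighted = Counter(), Counter()
    for (x, y), freq in pairs.items():
        for node in (x, y):
            degree[node] += 1
            weighted[node] += freq
    return degree, weighted

pairs = co_linked_pairs(shared)
degree, weighted = degree_and_weighted_degree(pairs)
print(pairs, degree, weighted)
```

The pair counts correspond to edge weights in the resulting network, which is why the weighted degree of a video can exceed its plain degree.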
5. Findings
The diagram is the network visualization of the YouTube network. The modularity algorithm was
used in this case. The modularity algorithm identifies groups of nodes that are more similar to
each other than to other groups and optimizes the detection of the community structure in a
network.
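For intuition about what the modularity algorithm optimizes, the sketch below scores a given partition with Newman's modularity Q for an undirected, unweighted graph. It only evaluates a partition; it does not search for the best one as Gephi's community detection does. The graph and the community labels are invented for the example.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition.
    edges: list of (u, v) pairs; community: dict mapping node -> label."""
    m = len(edges)
    if m == 0:
        return 0.0
    degree = {}
    internal = {}  # number of edges with both ends in the same community
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            label = community[u]
            internal[label] = internal.get(label, 0) + 1
    total_degree = {}  # summed degree of the nodes in each community
    for node, k in degree.items():
        label = community[node]
        total_degree[label] = total_degree.get(label, 0) + k
    # Q = sum over communities of (internal edge share) - (expected share)
    return sum(
        internal.get(label, 0) / m - (total_degree[label] / (2 * m)) ** 2
        for label in total_degree
    )

# Two triangles of hypothetical videos joined by a single edge.
edges = [("v1", "v2"), ("v2", "v3"), ("v1", "v3"),
         ("v4", "v5"), ("v5", "v6"), ("v4", "v6"),
         ("v3", "v4")]
two_groups = {"v1": 0, "v2": 0, "v3": 0, "v4": 1, "v5": 1, "v6": 1}
one_group = {node: 0 for node in two_groups}

print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the toy graph into its two triangles gives a clearly positive score, while lumping everything into one community gives zero, which is the behaviour a community detection method exploits.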
Analysis of the network community structure shows the sub-communities in the entire network. A
large number of communities implies that users are associated with a variety of subjects or
videos, whereas a small number of communities shows that users focus on a particular topic or
interest. Node labels such as ‘j7_ISP8Vc3o’ and ‘9bZkp7q1910’ are IDs of YouTube videos, and
the size of each node grows with its degree centrality score. The edge denotes the occurrence
frequency of each pair.
Two main communities are identified in the YouTube network. Community 1 has more nodes than
community 2, and the videos with high degree centrality are found only in community 1. All
videos that have high degree centrality in community 1 are related to singers from the SM, JYP
and YG entertainment companies. Consequently, the important nodes of the YouTube network are
found to be the videos of these three major entertainment companies.
The second diagram is the network visualization of the Twitter data. In contrast to YouTube,
the Twitter network has three major communities and two minor communities. The videos with high
degree centrality are spread across communities 1 and 2, and some of these videos are related
to singers who belong to other small or medium-sized entertainment companies.
6. Conclusion
The study concludes that videos on Twitter and YouTube are consumed by users in different ways,
and these differences are treated as clues for tracking users’ socio-cultural behaviours. The
analysis showed that people in the Twitter network consume more diverse videos than people in
the YouTube network (YouTube’s recommendation algorithm played a part in this). Findings like
these can be of significant use to marketers, as tracking the socio-cultural behaviours of
customers in different online media can help in choosing the right channel for marketing
communications. Gephi once again lets the modern marketer carry out this data-heavy analysis.
7. Mapping dynamic conversation networks on Twitter (Bruns, 2011)
1. Introduction
The research highlights the need for mapping dynamic conversations on Twitter and how it can
be done. Twitter is a social medium which restricts posts to 140 characters. Users largely work
around this limitation through the effective use of hashtags (‘#’) and @replies, which has also
helped to sort content on Twitter. A user is able to contribute to a discussion and converse
with other people even without following them. This extensive use of hashtags and @replies has
aroused the curiosity of researchers who want to analyse activity on Twitter during any public
issue.
The paper uses millions of tweets posted in June 2010 concerning Australian Prime Minister
Kevin Rudd and his resignation as a reference for data visualization and insights. Although the
research does not solve a specific research problem, it explains the importance of Gephi as a
data visualization tool and how it can be used to analyse Twitter activity when a significant
event occurs in society.
2. Data collection (source of data)
The Twitter data was bought from ‘Gnip’, a commercial data provider for Twitter. In this
specific use case, since tweets about Kevin Rudd during the June 2010 events were needed, the
web service ‘twapperkeeper’ was used to get hashtag and keyword tweets.
The data set analysed using Gephi was the @reply data from discussion of a purported leadership
challenge against the then Australian Prime Minister Kevin Rudd by his deputy Julia Gillard,
under the Twitter hashtag #spill (Australian political slang for such a challenge), on the
evening of 23 June 2010.
3. Methodology
Two kinds of analysis were done using Gephi, which can be explained with reference to the
network visualization graphs.
Figure 1.1 shows the graph with node size indicating indegree and node colour indicating the
outdegree, i.e. the number of @replies sent (yellow to red).
First (with node size set to indicate the number of @replies received), it is obvious that a
large number of participants in the #spill conversation not only discuss the potential fate of
the Prime Minister, but do so while referring to him by his Twitter username @KevinRuddPM
rather than merely his name.
The second part of the research was to create a dynamic network graph. For this, the timestamps
of the tweets were included in the data. A further problem was that the tweets came from
various parts of the world and their timestamps were in local time. Software was used to
rectify this and bring all the timestamps to a standard time, after which the dynamic graph was
created.
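The timestamp normalisation described above can be illustrated with Python's standard datetime module. The source does not name the software the researchers used; the sketch assumes each timestamp is available as an ISO 8601 string carrying its local UTC offset, and the example values are invented.

```python
from datetime import datetime, timezone

# Hypothetical tweet timestamps with local UTC offsets (assumed input format).
local_stamps = [
    "2010-06-23T19:05:00+10:00",
    "2010-06-23T10:05:00+01:00",
    "2010-06-23T05:04:30-04:00",
]

def to_utc(stamp):
    """Parse an offset-aware ISO 8601 timestamp and convert it to UTC."""
    return datetime.fromisoformat(stamp).astimezone(timezone.utc)

# Once everything is in UTC, tweets can be ordered on a single timeline.
utc_sorted = sorted(to_utc(s) for s in local_stamps)
print([t.isoformat() for t in utc_sorted])
```

With all timestamps on one timeline, each edge can be given start and end times for use in Gephi's dynamic graph features.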
4. Findings
From the first visualization we can see that:
➢ The significant level of incoming @replies does not result in any responses from the Prime
Ministerial account; with node colour indicating the number of @replies sent, @KevinRuddPM
remains a pale yellow, indicating no activity.
➢ At the same time there are a number of highly active senders of @replies (shown in red) who
do not necessarily receive a significant number of answers to their messages.
➢ Journalists @latikambourke and @renailemay are hubs in the network. They are notable both as
senders and as recipients (marked in red, with the highest centrality). Their eigenvector
centrality could also be high (not given in the case).
From the dynamic graph the following finding was made:
➢ Moving from figure (a) to (b), we find that many participants became inactive as the exchange
of @replies between them stopped. Those who maintained exchanges remain connected by edges,
while the others are shown as distinct nodes (separated from the rest of the diagram). This
shows the change over the period of analysis.
5. Conclusion
Using the dynamic network visualizations produced in Gephi from the hashtag data and its
timestamps, we can find out the shifting roles played by individual participants over time, as
well as the response of the overall #hashtag community to new stimuli, such as the entry of new
participants or the availability of new information.
In a crisis or a social incident, we can understand who the key players on Twitter are, who
aggregates and disseminates information, and whose views and tweets influence others. In-depth
analyses of this kind, once thought to be beyond our reach, can now be done through Gephi and
its many capabilities, including dynamic network mapping. Like many other academic studies,
this research further confirms the role network analysis and visualization can play in
improving businesses as well as personal life.
8. Graphical visualization of analogous relationships of Raags
(Kelkar, 2015)
1. Introduction:
The idea of visual representation appears in several concepts of Hindustani classical music
(HCM), from paintings that describe raags to the reflection of different seasons in raags;
the theory of HCM has many multimodal ideas. The authors contribute a method of
analysing raag spaces from the point of view of tonality, and a method of creating
combinations of modes based on colour harmony and the colour wheel.
Hindustani music also has concepts of cadences, themes and variation that are
comparable to Western theory.
A raag is thought of as a mode with a grammar, and its constituent notes are its most
important property. Thaats are small groups of similar raags.
2. Data Collection
The data set contains 163 commonly sung raags. The final set of descriptors contains all the
elements, classified under different thaats. The authors coded the notes as
S, r, R, g, G, m, M, P, d, D, n, N. They sourced this information from seminal, authoritative
books on Indian music theory.
3. Methodology:
After receiving the data, the authors treated every unique entity as an independent node
of the graph and created edges among these nodes. They created a directed graph
with edges directed outwards from the parent node to the attribute. The figure shows a total
of 393 nodes and 7010 edges.
The following points were considered while constructing the graph:
• Edges were computed between the participating notes of each raag, as well as the notes
found in the short description phrase of a raag.
• If a node happens to be the tenor (vadi, the most important note, which is followed by
the samvadi), it is assigned a larger weight than an optional note (anuvadi).
• Weights are also assigned based on the degree of connectivity.
• The layouts used were Fruchterman-Reingold and ForceAtlas2; 10 clusters were obtained
with a resolution of 0.68 and a modularity of 0.185. The average degree of the network
was 4.751 and the average weighted degree 18.704; the graph is sparse, with a density
of 0.012.
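The relationship between the reported average degree and the reported density can be checked with a short calculation. This is a sketch under two assumptions that the report does not state: that density is defined for a directed graph without self-loops as m / (n(n - 1)), and that the average degree of a directed graph is reported as edges divided by nodes. It uses only the node count and the average degree given above.

```python
def directed_density(node_count: int, edge_count: float) -> float:
    """Density of a directed graph without self-loops: m / (n * (n - 1))."""
    return edge_count / (node_count * (node_count - 1))

nodes = 393              # node count reported above
average_degree = 4.751   # average degree reported above

# Assumption: average degree of a directed graph = edges / nodes,
# so the edge count implied by the average degree is average_degree * nodes.
implied_edges = average_degree * nodes

density = directed_density(nodes, implied_edges)
print(round(density, 3))  # 0.012
```

Under these assumptions the density is simply the average degree divided by (n - 1), which reproduces the reported value of 0.012 and shows how few of the possible connections are actually present.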
4. Findings:
• The nodes with the highest degree were located near the centre, with the other nodes
spreading radially outwards.
• Notes that are frequently used together cluster closer to each other.
• The parent thaats to which the raags belong separate out at different angles around
the circle.
• The close neighbours of ‘Sa’ are ‘Pa’ and ‘Ma’, the fifth and fourth scale degrees
respectively, one of which must be present to complete a raag.
• Thaat clusters are formed around two pairs of notes that go together. Hindustani
classical music theory and the graph visualization show the same result.
• There is greater variance and diversity among the raags with flat notes, while those
whose tones lie on a major scale are huddled much closer together.
• It is possible to visualize which family is most prevalent.
5. Conclusion:
Network analysis and data visualization can be used to understand the most prevalent
families, raags and thaats. The study suggests that representing raags as graphs facilitates
their arrangement and grouping and reveals similar properties of different nodes. Several
relationships can be derived from the final layout.
9. Untangling the Social Network of Musicians (Focht, 2017)
1. Introduction:
This research investigates the cultural contexts, circumstances, inspirational models
and different ways in which knowledge, experience and expertise have been transferred
over time.
The aim of the project was to understand “creative transfer” within the field of music, in
order to understand the lasting significance of musical works and the relationships
between musicians. In print media, usually only a single relation between two musicians
is narrated, and only one of the two biographies is used to report the relationship.
2. Data Collection
Musicologists have examined relationships between musicians as reported in print media,
which resulted in a unique database that is valuable for all kinds of music professionals.
The BMLO (Bavarian Musicians Encyclopedia Online) provided the authors with information
on about 28,000 musicians.
3. Methodology
The objective of the visualization was to develop a graph design that makes the social
network of musicians visually accessible for the first time.
The resulting graph, created in Gephi, has a total of 1,420 connected components: the
largest connects 5,539 musicians, the second largest connects only 56 musicians, and
1,385 components contain fewer than 10 musicians.
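The component counts above come from splitting the network into connected components, that is, groups of musicians who are linked to each other directly or indirectly. The following is a small sketch of that idea using breadth-first search. It is not Gephi's implementation, and the edge list with placeholder names is invented.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    component.add(other)
                    queue.append(other)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Invented relationships between placeholder musicians
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("G", "H")]
components = connected_components(edges)
print([len(c) for c in components])  # [4, 2, 2]
```

Sorting the components by size makes it easy to pick out the largest one, which is the part of the network that the analysis concentrates on.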
The preliminary step in generating the social network graph is filtering according to the
research question.
• Filtering can be done by relationship type(s).
• It is possible to focus exclusively on musicians with specific professions
(e.g., instrumentalists).
The authors focused on the analysis of teacher-student relationships to investigate how
musical knowledge, experience and expertise have been transferred over time. After
filtering the data, 3,994 musicians remain; the largest connected component of this
sub-network, which is the research object of the musicologists, contains 2,769 teachers
and students.
The following design decisions were made while creating the graph:
• Temporally aligned graph: The relations are analysed chronologically from left to
right, which requires a layout with a temporal dimension. The authors therefore
applied a force-directed graph layout with fixed x-values to represent time; each
x-value reflects the middle of a musician’s creative lifetime on a horizontal time axis.
As a result, the nodes spread vertically while the chronological order remains intact.
• Node grouping: Since the underlying research question investigates how musical
knowledge is transferred, the authors hid the nodes of musicians who never had the
role of a teacher; these musicians (students) were instead grouped with their teachers.
This design decision reduces the number of nodes to be displayed from 2,769 to 608.
• Node layout: To illustrate the significance and influence of these personalities, the
size of a node reflects the number of students of the teacher, which makes
teachers with many students salient. By default, node labels are hidden, but for
navigation purposes, a user-defined number of node labels with the musicians’ names
can be shown on demand. Either the most popular musicians or the teachers with the
most students can be highlighted.
• Interactivity: Hovering over a node shows the corresponding musician and two lists
of students (those who became teachers and those who did not) in a popup box.
Clicking a node highlights all connections to a teacher’s students who themselves
became teachers. In this way, transfer paths of musical knowledge can be assembled
interactively.
• Musical profession analysis: Musicians in the graph can be selected in order to
analyse the evolution of musical professions. For this purpose, all musical
professions of the selected teachers’ students are listed by decreasing frequency.
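The node grouping and node sizing decisions above can be sketched in a few lines of Python. This is an illustration of the idea only, not the authors' implementation. The teacher-student pairs, the placeholder names and the function group_students are invented.

```python
from collections import defaultdict

def group_students(teacher_student_pairs):
    """Collapse students who never taught into their teachers' nodes."""
    students_of = defaultdict(set)
    for teacher, student in teacher_student_pairs:
        students_of[teacher].add(student)
    teachers = set(students_of)

    # Visible nodes are teachers; node size is the number of their students
    node_size = {t: len(s) for t, s in students_of.items()}
    # Visible edges link a teacher to students who became teachers themselves
    edges = sorted(
        (t, s) for t, studs in students_of.items() for s in studs if s in teachers
    )
    # Students who never taught are hidden and only listed under their teacher
    hidden = {t: sorted(s - teachers) for t, s in students_of.items()}
    return node_size, edges, hidden

# Invented data: T* placeholders taught someone, s* placeholders never taught
pairs = [("T1", "T2"), ("T1", "s1"), ("T1", "s2"),
         ("T2", "s3"), ("T2", "T3"), ("T3", "s4")]

node_size, edges, hidden = group_students(pairs)
print(node_size)     # {'T1': 3, 'T2': 2, 'T3': 1}
print(edges)         # [('T1', 'T2'), ('T2', 'T3')]
print(hidden["T1"])  # ['s1', 's2']
```

In this toy example seven musicians collapse to three visible nodes, the same kind of reduction as the drop from 2,769 to 608 nodes reported above, and the remaining edges trace the paths along which knowledge was passed from teacher to teacher.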
4. Findings:
To illustrate the application of the graph, the example of two musicians is taken:
• Joseph Rheinberger and Wilhelm Sandberger are the teachers with the highest number
of students.
• The visualisation reflects a change in the profile of students from composition to
composition science.
• Musicologist is the most frequent musical profession, while composer is becoming
less frequent.
The authors infer that the increase in musicology is due to the growing shift in musical
profession from composition to composition science.
5. Conclusion:
Through close collaboration with computer science, and with tools such as Gephi, the
authors produced a data visualization. Network analysis of it has provided insights into
the creative transfer of musical knowledge. It also answered the question of how musical
knowledge is transferred through the social network of musicians.
Further data can be analysed to understand emerging trends in musical professions.
The study also gave an idea of how the teacher-student relationship affects knowledge
sharing. The graph can be altered to suit specific research questions.
10. Social Networks and Text Messaging in Public Health (Beck
and Armbruster, 2014)
1. Introduction:
This project aims to understand how social media can be used to develop an optimal
strategy for designing interventions to prevent diseases. The project has helped
researchers to predict the H1N1 outbreak in the US in 2009.
The research mainly aims to show how actual social networks and social network analysis
can be incorporated into the design of behavioral prevention using text messaging.
2. Data Collection:
Data was collected by surveying community members in Hyderabad.
3. Methodology:
To identify opinion leaders to target in behavior change campaigns, the authors conducted
a survey in Hyderabad:
• The survey data were plotted in the data visualization tool Gephi.
• Opinion leaders were identified using network centrality measures, which include:
➢ Degree: the number of social contacts, a measure of popularity
➢ Closeness centrality: those at the center of the network, with few hops
between them and everyone else in the target population
➢ Betweenness centrality: those playing a broker and bridge role by connecting
different parts of the network.
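The three centrality measures listed above can be made concrete with a small pure-Python sketch. It is an illustration on an invented six-person network, not the authors' analysis and not Gephi's implementation, and normalisation conventions may differ from the values Gephi reports. Degree and closeness are normalised by n - 1; betweenness is left as a raw count of shortest paths passing through a node, computed with breadth-first search and Brandes-style accumulation.

```python
from collections import deque

def centralities(graph):
    """Degree, closeness and betweenness for an undirected, connected graph."""
    n = len(graph)
    degree = {v: len(graph[v]) / (n - 1) for v in graph}
    closeness = {}
    betweenness = {v: 0.0 for v in graph}

    for source in graph:
        # Breadth-first search: distances and shortest-path counts from source
        dist = {source: 0}
        paths = {v: 0 for v in graph}
        paths[source] = 1
        preds = {v: [] for v in graph}
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        closeness[source] = (n - 1) / sum(dist.values())

        # Accumulation: push path dependencies back towards the source
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                betweenness[w] += dependency[w]

    # Each undirected pair was counted once from each endpoint
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, closeness, betweenness

# Invented network: two friend groups (A, B, C) and (D, E, F) joined by C and D
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
degree, closeness, betweenness = centralities(graph)
for person in graph:
    print(person, round(degree[person], 2), round(closeness[person], 2),
          betweenness[person])
```

In this toy network C and D score highest on all three measures: they have the most contacts, the fewest hops to everyone else, and they are the only bridge between the two groups, which is the broker role that makes someone a candidate opinion leader.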
4. Findings
• Contact testing is used in network-based public health interventions. It is a
popular method of tracking cases of HIV, tuberculosis and STDs.
• Mobile phones have a penetration of 96% globally. Text messages are
primarily used for promotion and education on different health care issues.
• These interventions are either sent in bulk to a large number of users, or
volunteers and public health workers send individually tailored text messages.
Both methods have challenges in terms of richness and reach.
The authors used a simulation framework and network datasets, such as the one in Figure 1,
to compare the proposed P2P text messaging design against other intervention designs
currently used, such as bulk message interventions and personalized
interventions.
They believe that leveraging the peer-distributor’s social network and phone, with the
ability to send personalised messages and then follow up on these messages with peers,
will increase the richness of the information conveyed, which will in turn increase
acceptance of the messages.
5. Conclusion
Social media has been used by governments to promote public health. Although the bulk
messaging and personalised messaging discussed by the authors can yield results (high
acceptability), people today find such messages annoying. With a peer-to-peer network,
the message can reach the masses effectively, because a social network allows a
viral-like spread of the intervention among the target population and is very cost
effective.
References
(unknown), Institution - Activate Networks. (2013, 5 16). Business - Drug Marketers
use Social Network Diagrams to Help Locate Influential Donors. Retrieved from
New York Times:
http://www.nytimes.com/interactive/2013/05/16/business/PHARMA.html
Andrei Brodera, R. K. (2000). Graph structure in the Web. Elsevier, 309-320.
Bruns, A. (2011). Mapping Dynamic Conversation Networks on Twitter using Gawk
and Gephi. Information Communication and Society, 1323-1351.
Doughty, M. &. (2012). Who is on your sofa?: TV audience communities and second
screening social networks. Proceedings of the 10th European Conference on
Interactive TV and Video. EuroiTV'12 .
Elsevier Early Career Resources. (2012). How to use blogging and microblogging to
disseminate your research. Elsevier.
Koehler, B. (2014). Before and After Series C funding – a network analysis of Domo.
Beautiful Data.
Larsson, A. O., & Moe, H. (2012). Studying political microblogging: Twitter users in the
2010 Swedish election campaign. Sage Journals, 729-747.
P.W. Holland, S. L. (1971). Transitivity in structural models of small groups.
Comparative Group Studies, 107-124.
Seunghyun “Brian” Park, C. “. (2015). Using Twitter Data For Cruise Tourism Marketing
and Research. Journal of Travel & Tourism Marketing, 15.
Yong Hwan Kim, D. L. (2014). Exploring characteristics of video consuming behaviour
in different social media using K-pop videos. Journal of Information Science,
806-822.

Más contenido relacionado

La actualidad más candente

INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
dannyijwest
 
A Novel Frame Work System Used In Mobile with Cloud Based Environment
A Novel Frame Work System Used In Mobile with Cloud Based EnvironmentA Novel Frame Work System Used In Mobile with Cloud Based Environment
A Novel Frame Work System Used In Mobile with Cloud Based Environment
paperpublications3
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
butest
 
Oxford Digital Humanities Summer School
Oxford Digital Humanities Summer SchoolOxford Digital Humanities Summer School
Oxford Digital Humanities Summer School
Scott A. Hale
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networks
Anvardh Nanduri
 

La actualidad más candente (19)

Current trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networksCurrent trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networks
 
tweet segmentation
tweet segmentation tweet segmentation
tweet segmentation
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
A Novel Frame Work System Used In Mobile with Cloud Based Environment
A Novel Frame Work System Used In Mobile with Cloud Based EnvironmentA Novel Frame Work System Used In Mobile with Cloud Based Environment
A Novel Frame Work System Used In Mobile with Cloud Based Environment
 
LPM: A DISTRIBUTED ARCHITECTURE AND ALGORITHMS FOR LOCATION PRIVACY IN LBS
LPM: A DISTRIBUTED ARCHITECTURE AND ALGORITHMS FOR LOCATION PRIVACY IN LBSLPM: A DISTRIBUTED ARCHITECTURE AND ALGORITHMS FOR LOCATION PRIVACY IN LBS
LPM: A DISTRIBUTED ARCHITECTURE AND ALGORITHMS FOR LOCATION PRIVACY IN LBS
 
Pilkada DKI 2017 Social Network Model (Early Report)
Pilkada DKI 2017 Social Network Model (Early Report)Pilkada DKI 2017 Social Network Model (Early Report)
Pilkada DKI 2017 Social Network Model (Early Report)
 
Dissemination of Awareness Evolution “What is really going on?” Pilkada 2015 ...
Dissemination of Awareness Evolution “What is really going on?” Pilkada 2015 ...Dissemination of Awareness Evolution “What is really going on?” Pilkada 2015 ...
Dissemination of Awareness Evolution “What is really going on?” Pilkada 2015 ...
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Identifying ghost users using social media metadata - University College London
Identifying ghost users using social media metadata - University College LondonIdentifying ghost users using social media metadata - University College London
Identifying ghost users using social media metadata - University College London
 
POLITICAL OPINION ANALYSIS IN SOCIAL NETWORKS: CASE OF TWITTER AND FACEBOOK
POLITICAL OPINION ANALYSIS IN SOCIAL  NETWORKS: CASE OF TWITTER AND FACEBOOK POLITICAL OPINION ANALYSIS IN SOCIAL  NETWORKS: CASE OF TWITTER AND FACEBOOK
POLITICAL OPINION ANALYSIS IN SOCIAL NETWORKS: CASE OF TWITTER AND FACEBOOK
 
27
2727
27
 
Big Data Social Network Analysis
Big Data Social Network AnalysisBig Data Social Network Analysis
Big Data Social Network Analysis
 
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatramanOdsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
 
Oxford Digital Humanities Summer School
Oxford Digital Humanities Summer SchoolOxford Digital Humanities Summer School
Oxford Digital Humanities Summer School
 
Ppt
PptPpt
Ppt
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networks
 
Q046049397
Q046049397Q046049397
Q046049397
 
Scraping and Clustering Techniques for the Characterization of Linkedin Profiles
Scraping and Clustering Techniques for the Characterization of Linkedin ProfilesScraping and Clustering Techniques for the Characterization of Linkedin Profiles
Scraping and Clustering Techniques for the Characterization of Linkedin Profiles
 

Similar a Big Data Analytics- USE CASES SOLVED USING NETWORK ANALYSIS TECHNIQUES IN GEPHI

Similar a Big Data Analytics- USE CASES SOLVED USING NETWORK ANALYSIS TECHNIQUES IN GEPHI (20)

NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
 
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
 
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...
 
Knime social media_white_paper
Knime social media_white_paperKnime social media_white_paper
Knime social media_white_paper
 
Sub1557
Sub1557Sub1557
Sub1557
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Graph-based Analysis and Opinion Mining in Social Network
Graph-based Analysis and Opinion Mining in Social NetworkGraph-based Analysis and Opinion Mining in Social Network
Graph-based Analysis and Opinion Mining in Social Network
 
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKINFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
 
nm
nmnm
nm
 
JFrank_1
JFrank_1JFrank_1
JFrank_1
 
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
 
THE SURVEY OF SENTIMENT AND OPINION MINING FOR BEHAVIOR ANALYSIS OF SOCIAL MEDIA
THE SURVEY OF SENTIMENT AND OPINION MINING FOR BEHAVIOR ANALYSIS OF SOCIAL MEDIATHE SURVEY OF SENTIMENT AND OPINION MINING FOR BEHAVIOR ANALYSIS OF SOCIAL MEDIA
THE SURVEY OF SENTIMENT AND OPINION MINING FOR BEHAVIOR ANALYSIS OF SOCIAL MEDIA
 
An improvised model for identifying influential nodes in multi parameter soci...
An improvised model for identifying influential nodes in multi parameter soci...An improvised model for identifying influential nodes in multi parameter soci...
An improvised model for identifying influential nodes in multi parameter soci...
 
Analyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsAnalyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-Tweets
 
E017433538
E017433538E017433538
E017433538
 
F017433947
F017433947F017433947
F017433947
 
Big social data analytics - social network analysis
Big social data analytics - social network analysis Big social data analytics - social network analysis
Big social data analytics - social network analysis
 
Groundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitterGroundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitter
 
Associating events with people on social networks using a priori
Associating events with people on social networks using a prioriAssociating events with people on social networks using a priori
Associating events with people on social networks using a priori
 
Framework for opinion as a service on review data of customer using semantics...
Framework for opinion as a service on review data of customer using semantics...Framework for opinion as a service on review data of customer using semantics...
Framework for opinion as a service on review data of customer using semantics...
 

Último

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 

Último (20)

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 

Big Data Analytics- USE CASES SOLVED USING NETWORK ANALYSIS TECHNIQUES IN GEPHI

  • 1. Big Data Analysis Midterm Assignment 2017 USE CASES SOLVED USING NETWORK ANALYSIS TECHNIQUES IN GEPHI. KIRAN BABU MAMPILLY 2014 2036| MILI GUPTA 20142118| RUCHIKA SHARMA 20142019
  • 2. Executive Summary Through this report, we try to understand the different scenarios in which data visualization techniques, particularly those in the Gephi software, can be used to better understand the problem at hand. The fundamental premise is that, to undertake any research or analysis, we first need to ascertain the problem. Then the research design and analysis can be undertaken. During the analysis, data visualization helps in understanding the issue more coherently. These problems can pertain to any domain (Marketing, Finance, Law, HR, liberal arts, etc.), and data visualization software will still be helpful. This report helped us appreciate social analytics tools, which help people in various domains understand the nature of their network structures and their relationships with others in the field. We have used concepts such as centrality (betweenness, in- and out-degree, eigenvector, etc.), reciprocity and clustering coefficient to understand the cases at hand better. In this report, we have mainly focused on a literature review of existing visualization use cases. We have worked on use cases pertaining to the varied use of the social media site Twitter in political, cultural and business contexts, as well as use cases involving drug marketers and musicians, among others.
  • 3. Table of Content Executive Summary. Table of Content. Use Cases: 1. Microblogging of Twitter users in 2010 Swedish Election campaign (Larsson & Moe, 2012); 2. Who is on your sofa? TV audience communities and second screening social networks (Doughty, 2012); 3. Before and After Series C funding – a network analysis of Domo (Koehler, 2014); 5. Using Twitter Data for Cruise Tourism Marketing and Research; 6. Exploring characteristics of video consuming behavior in different social media using K-pop videos (Yong Hwan Kim, 2014); 7. Mapping dynamic conversation networks on Twitter; 8. Graphical visualization of analogous relationships of Raags (Kelkar, 2015); 9. Untangling the Social Network of Musicians (Focht, 2017); 10. Social Networks and Text Messaging in Public Health (Beck and Armbruster, 2014). References.
  • 4. Use Cases 1. Microblogging of Twitter users in 2010 Swedish Election campaign (Larsson & Moe, 2012) 1. Introduction Twitter is one of the most popular and well-known social media sites, used for short posts and status updates. Ever since the successful use of the internet during the 2008 US presidential elections, it has been important to analyse the role of social media in creating opportunities for online campaigning and deliberation. (Larsson & Moe, 2012) Microblogging refers to making short and/or frequent posts. (Elsevier Early Career Resources, 2012) Aim of study: to study participation in political debate on Twitter, since new media are used very often to communicate and comment on political issues and to express support. 2. Theoretical foundation (if any) In graph theory, centrality indicators identify the most important vertices within a graph. These can be based on closeness, degree, etc. In this case we focused on the following: ➢ In-degree: the number of edges going into a node. ➢ Out-degree: the number of edges coming out of a node. ➢ Reciprocity: a two-way relationship between two nodes. (Andrei Brodera, 2000) 3. Data collection Data was collected via secondary sources. This report is limited to a literature review of a use case already established in academic journals and other web sources. Data was collected from one month prior to the election onwards. YourTwapperKeeper was used to store the tweets (footnote 1). 4. Methodology Gephi data visualization is used on data that can be represented as nodes and edges. This study analyses a specific subset of the Twitter online sphere, focusing on one type of use, i.e. political communication. The paper (Larsson & Moe, 2012) also attempts to characterise interaction between users (in terms of volume and forms of use), and who these users are (Footnote 1: https://github.com/540co/yourTwapperKeeper)
  • 5. and how they relate (or not) to each other. The Force Atlas layout in Gephi is used; it is scaled for small to medium-size graphs and is suited to qualitative interpretation of graphs. 5. Findings Figure 1 features many nodes, each representing a particular Twitter user. The colour of a node represents the out-degree of the user: the darker the colour, the more @ messages the user sent. Node size depends on in-degree: the larger the node, the more messages were directed towards the user. Straight lines between nodes indicate unidirectional communication, while curved lines indicate reciprocity in exchanges of messages. (Larsson & Moe, 2012) Figure 2 provides a network map of RT (retweet) activity, identifying the high-end users in this regard. Each node in Figure 3 represents a user. The darker the colour of the node, the more active the user is at retweeting the messages of others. Users who are often retweeted are identified by larger node sizes. (Larsson & Moe, 2012) 6. Conclusion This use of network analysis helps us identify the importance of people who mainly retweet and of others who are retweeted the most. Using the theoretical background of centrality in networks, we are able to analyse the key influencers on social media, and thus
  • 6. we know whom to target during a political campaign. The most retweeted users tend to be the opinion leaders and exert the most influence. 2. Who is on your sofa? TV audience communities and second screening social networks (Doughty, 2012) 1. Introduction Using a second or third screen while watching television is becoming popular among audiences, as such devices are affordable and pervasive. This use case explores message activity on the Twitter microblogging service during broadcasts. The network of connected viewers reveals their different characteristics and their motivations for using a second screen. 2. Theoretical foundation ➢ The measures of centrality play a major role in this use case as well. In-degree is the number of edges going into a node. This helps in identifying groups that are highly connected or isolated. ➢ Clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. (P.W. Holland, 1971) For a node, it is the ratio of the number of edges that exist among its immediate neighbours to the maximum number of such edges that could exist. ➢ Reciprocity describes a two-way relationship between two nodes. (Andrei Brodera, 2000) 3. Data collection The data was collected by the analysts during the broadcasts of two popular shows, Strictly Come Dancing (prime time, celebrity oriented) and BBC Question Time (current affairs show), from Twitter using the Twitter streaming API (footnote 2). The API delivers data in either a sample mode or a filter mode. The limitation of this method is that only tweets containing the required hashtags are collected. 4. Methodology The datasets were imported into Gephi. Nodes were connected by edges based on users' mentions of others or of themselves in tweets (the latter resulting in self-loops). The OpenOrd network visualization algorithm was used to bring out the clusters and community structures within the network.
“In-degree” of each node was established to better visualize the graph. 2 https://developer.twitter.com/en/docs
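The measures named in the theoretical foundations above (in-degree, out-degree, reciprocity, clustering coefficient) can be sketched in a few lines of plain Python. This is only an illustration on a made-up mention network; the user names, the edge list and the function names are assumptions and do not come from either study, and Gephi's own implementations may differ in detail (for example in how self-loops are treated).

```python
from collections import defaultdict

# Hypothetical toy data: each pair (a, b) means "a mentioned b" in a tweet.
edges = [("ann", "bob"), ("bob", "ann"), ("cat", "ann"),
         ("cat", "bob"), ("dan", "ann")]


def degree_counts(edge_list):
    """Return (in_degree, out_degree) dictionaries for a directed edge list."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edge_list:
        outdeg[src] += 1
        indeg[dst] += 1
    return dict(indeg), dict(outdeg)


def reciprocity(edge_list):
    """Share of directed edges whose reverse edge also exists (self-loops ignored)."""
    edge_set = {(a, b) for a, b in edge_list if a != b}
    if not edge_set:
        return 0.0
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)


def local_clustering(edge_list, node):
    """Local clustering coefficient of `node`, treating edges as undirected."""
    nbrs = defaultdict(set)
    for a, b in edge_list:
        if a != b:
            nbrs[a].add(b)
            nbrs[b].add(a)
    neighbours = sorted(nbrs[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    # Count the edges that actually exist among the node's neighbours.
    links = sum(1 for i, u in enumerate(neighbours)
                for v in neighbours[i + 1:] if v in nbrs[u])
    return links / (k * (k - 1) / 2)


in_degree, out_degree = degree_counts(edges)
print(in_degree, out_degree)
print(reciprocity(edges))
print(local_clustering(edges, "ann"))
```

In a celebrity-oriented network one would expect a few accounts with high in-degree and low reciprocity, which is the pattern the second use case reports for the prime-time show.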
  • 7. 5. Findings Strictly Come Dancing: In-degree was computed, and the larger nodes represent the users with the most mentions or retweets. Alongside nodes with high in-degree there are also isolated nodes. #scd contains 3903 edges between 2895 nodes, of which 443 edges and 695 nodes are isolated, meaning those users used the show's hashtag but did not mention anyone. Reciprocity in this case is very low, since the network involves celebrities with fan followings. (Doughty, 2012) BBC Question Time: Similarly, with in-degree determining size, the larger nodes in the visualization are the highly mentioned or retweeted users. #bbcqt contains 3769 edges between 3955 nodes, of which 2640 nodes and 1913 edges are in isolated groupings. However, in this case the groupings are more tightly connected to each other and form a networked “core”. (Doughty, 2012) 6. Conclusion The two TV shows chosen display distinct characteristics. 1. Prime-time celebrity-oriented dance show: viewers connect with the celebrities and stars of the show using mentions and retweets. Reciprocation was low. 2. Late-night current affairs show: viewers engage in conversation, with higher rates of reciprocation.
  • 8. The two shows display different behaviours; however, the takeaway for marketers is that people are turning to a second screen to experience deeper communal viewing with friends or like-minded people. Advertisers should therefore take this into account. 3. Before and After Series C funding – a network analysis of Domo (Koehler, 2014) 1. Introduction This case concerns the company Domo (footnote 3), which is primarily a Big Data company and works in the field of network analysis of venture capital connections. The analysis was done when the company received above-average funding in the form of Series C funding despite being a relatively young company. To be precise, Domo received $125M from Greylock, Fidelity, Morgan Stanley and Salesforce, among others. The analysis was carried out by a data scientist to understand the effect on the network structure after the announcement of this news. Domo turned out to be one of the best-connected nodes, and the following analysis shows why. 2. Theoretical foundation ➢ Series C funding (footnote 4) could be used to buy another company. As the operation gets less risky, more investors come into play. In Series C, groups such as hedge funds, investment banks, private equity firms and big secondary market groups join the previously mentioned investors. ➢ Betweenness centrality: for a node, it is the number of shortest paths between all pairs of vertices that pass through that node. It indicates the node's centrality in a network. (Andrei Brodera, 2000) 3. Data collection This analysis focuses on a literature review of the established use case for Domo's increasing centrality. The data collection was carried out by the data scientist to obtain the connections between various Big Data companies, and between those companies and Series C funders. 4.
Methodology Gephi was used to plot the centrality of the various data science start-ups and companies before and after the news of Domo's financing was announced. This helped in (Footnotes: 3 Domo; 4 SeriesCFunding)
  • 9. understanding how firms become more popular if they unexpectedly receive a substantial amount of financing from big players. 5. Findings Before Series C funding: The dark nodes are the more connected ones. On average, betweenness centrality is similar across many popular nodes. (Koehler, 2014) After Series C funding: The dark nodes are the more connected ones. However, Domo's (green node) huge betweenness centrality almost dwarfs the other nodes in the network. (Koehler, 2014) 6. Conclusion The new funding round not only increases Domo's centrality but also MongoDB's, because of the shared investors Salesforce, T. Rowe Price and Fidelity Investments. (Koehler, 2014) It can therefore be concluded that if a young firm invests sufficient time in marketing itself, in terms of pitching to funders, it can grow to become a more sought-after and central company in the entire network. In this case, Domo became one of the most central Big Data companies following the news of the financing it received from funders. This was a positive sign for its future as a company.
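To make the before-and-after comparison in this use case concrete, the sketch below computes betweenness centrality on a small, invented investor network with a breadth-first variant of Brandes' algorithm. The node names (`fund_x`, `startup_c`, and so on) and both edge lists are hypothetical and are not Koehler's data; the graph is treated as unweighted and undirected, which is an assumption about how such a funding network might be modelled.

```python
from collections import defaultdict, deque


def build_adjacency(edge_list):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                                # nodes in order of discovery
        preds = {v: [] for v in adj}              # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)             # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):                 # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical network before a funding round: startup_c has a single investor.
before_edges = [("fund_x", "startup_a"), ("fund_x", "startup_b"),
                ("fund_y", "startup_b"), ("fund_y", "startup_c")]
# After the round, startup_c gains two more investors.
after_edges = before_edges + [("fund_x", "startup_c"), ("fund_z", "startup_c")]

before = betweenness(build_adjacency(before_edges))
after = betweenness(build_adjacency(after_edges))
print(before["startup_c"], after["startup_c"])
```

In this toy network, the two new investor links move `startup_c` from the edge of the graph onto many shortest paths, which is the kind of jump in betweenness the use case describes for Domo.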
  • 10. 4. Drug Marketers Use Social Network Diagrams to Help Locate Influential Doctors ((unknown), Institution - Activate Networks, 2013) 1. Introduction This case is concerned with how social network diagrams obtained through data visualization can help pinpoint effective influencers in a pharmaceutical market. As in other industries, marketing plays a crucial role in the pharmaceutical industry, and network analysis coupled with data visualization is often used to build a cogent marketing strategy and ensure its success. This case gives an example of how the consulting firm ‘Activate Networks’ creates social network diagrams to help pharmaceutical companies identify and better target the physicians with the largest reach. ((unknown), Institution - Activate Networks, 2013) The network analysis done by the consulting firm gives a clear picture of which physicians to target for maximum reach. 2. Data collection (source of data) The data was collected by the consultancy firm ‘Activate Networks’. Although the exact source of the data is not mentioned in the use case, it can be assumed that it was taken from a medical database. The data covers doctors based in a Northeastern U.S. community who have prescribed, or are potential customers for, an oncology drug. 3. Methodology After the data was obtained, it was filtered so that it contained only doctors based in a Northeastern U.S. community who have prescribed, or are potential customers for, an oncology drug. Once the data was cleaned, a network diagram was constructed using data visualization software such as Gephi. Source: ((unknown), Institution - Activate Networks, 2013)
  • 11. The following conventions were used in developing the network diagram. ➢ Each circle represents one doctor. ➢ Orange dots are relevant specialists. Light blue dots are doctors who have not prescribed the drug, and dark blue dots are doctors who have prescribed it. ➢ The size of a dot depends on the doctor's prescribing volume for the oncology drug. ➢ The edges represent connections between doctors (for example, through common patients; the assumption is that if one doctor prescribes the new drug, another doctor will come to know about it from a common patient and do the same). 4. Findings A marketer can infer a number of findings from this network visualization. Some of them are given below. In a cluster like the one on the left of the diagram, there are many interrelated physicians with similar prescription volumes. In this case the marketer need not prioritize all the doctors and can instead select a few. This saves cost while maintaining the same level of reach. Certain doctors in the diagram form key links between two different clusters. Marketers identify these ‘bridges’ as good targets, as they offer entry into both clusters. Marketers also identify the nodes with higher ‘connectedness’, as targeting them can lead to better marketing reach among the physician group. 5. Conclusion Network analysis and its visualization can greatly help marketers in the pharmaceutical industry to better target their marketing promotions and interventions. By targeting the right people and thereby maximizing reach, pharmaceutical organizations can better manage their marketing expenditure and cut marketing costs aggressively without compromising on output. We believe that the future belongs to those companies whose marketing departments rely on tools such as those discussed in this case.
Social Network analysis and its diagrams are the new competitive advantage for pharmaceutical companies of the future in terms of channelizing their marketing efforts.
  • 12. 5. Using Twitter Data for Cruise Tourism Marketing and Research (Seunghyun “Brian” Park, 2015) 1. Introduction This research article is a good example of how network analysis can be used to strengthen marketing efforts in the cruise tourism entertainment industry. Very limited research has been done in the field of cruise tourism and its marketing. The case illustrates the need for businesses to recognise the large amount of data created on Facebook and Twitter (big data) and how this data can be analysed to identify problems as well as suggest solutions. Twitter was selected for this research because it has drawn the attention of researchers and helps address topics such as social networking, information diffusion, customer engagement, and product promotion. Network analysis and mapping is not the sole tool used in this research; it is used alongside several others to arrive at a cohesive conclusion. Network analysis in this case uses metrics such as eigenvector centrality, closeness centrality, and betweenness centrality, which can be used to evaluate the importance of vertices (or users) in the network (Seunghyun “Brian” Park, 2015). The research also uses cluster analysis to identify the subgroups inside the network. 2. Data collection Data was collected with the help of online platforms as well as the Twitter API. Twitter data about cruise tours between May 2 and June 5, 2014 was collected. ScraperWiki, an online platform, was used to collect and archive tweets during this period. ‘Cruise’-related hashtags were used to extract tweets and metadata from the Twitter API. The extracted data was combined and subjected to further cleaning. For example, some hashtags referred to a different sense of ‘cruise’ (car cruise control, Tom Cruise, etc.); these had to be removed from the data set. Also, only English tweets were retained for the study, which filtered out about 4% of the initial data. 3.
Methodology The research used three techniques to analyse the collected data: word frequency analysis, content analysis, and network analysis. For the word frequency analysis, tokenization was first performed to separate each tweet into discrete words. Other NLP techniques (e.g. stop-word removal) were also applied to the data set.
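The word frequency step described above (tokenize each tweet, drop stop words, count what remains) can be illustrated with the standard library alone. The tweets and the very short stop-word list below are invented for the example, and the regular expression is only one plausible tokenizer; the original study used its own tooling and word lists.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a much fuller one.
STOP_WORDS = {"a", "an", "the", "on", "in", "to", "of", "and", "is", "my"}

# Hypothetical tweets standing in for the collected data.
tweets = [
    "Loving the sunset on my #cruise",
    "Booked a #cruise to the Bahamas",
]


def tokenize(tweet):
    """Lower-case a tweet and split it into words, keeping # and @ prefixes."""
    return re.findall(r"[#@]?\w+", tweet.lower())


def word_frequencies(tweet_list, stop_words=STOP_WORDS):
    """Count tokens across all tweets after removing stop words."""
    counts = Counter()
    for tweet in tweet_list:
        counts.update(tok for tok in tokenize(tweet) if tok not in stop_words)
    return counts


freq = word_frequencies(tweets)
print(freq.most_common(3))
```

Keeping the `#` prefix lets hashtags be counted separately from ordinary words, which matters when hashtags are also the collection criterion.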
  • 13. For descriptive analysis, RapidMiner, an open source data analytics tool, was used to break down the tweets and count the frequency of all words and hashtags in them. The third part, the network analysis, was carried out in Gephi. The network diagram obtained from Gephi visualizes the entire data set along with the major clusters. Source: (Seunghyun “Brian” Park, 2015). 4. Findings (To keep the focus on network analysis, only the findings pertaining to that component are described.) The network analysis led to the surprising identification of four main clusters: celebrity centered, blogger centered, cruise ship company centered, and travel agency centered. Further, eigenvector centrality, closeness centrality, and betweenness centrality were used to identify the influencers, i.e. the most prominent entities, within the clusters. The findings are as follows:
  • 14. ➢ @TheCarlosPena, with 2.5 million followers, showed the highest centrality in the celebrity-centered cluster. The celebrity is relevant because the account extensively posts pictures from cruises and makes heavy use of hashtags. ➢ @CruiseMiss, a blogger in the United Kingdom (UK) who shares cruise tourism-related information and pictures, has the highest centrality in the blogger group. Bloggers had lower closeness centrality than celebrities but posted more content. ➢ @royalcarribean is the most prominent account in the cruise ship company centered cluster despite having very few tweets, because of its high eigenvector centrality. ➢ @WaldoWorldTrvl has the highest centrality in the travel agency centered cluster. 5. Conclusion The research concludes that Twitter contains a number of subgroups that turn to different sources for information about cruise tourism. It suggests that marketers should acknowledge the phenomenon of ‘celebrity practice’ on Twitter. Twitter has become a place where celebrities develop in-depth relationships with their followers, and targeting these celebrities can make a difference. Although blogger tweets spread more slowly than celebrity tweets (lower closeness centrality), bloggers posted more cruise content than celebrities and hence should not be overlooked when deciding on marketing interventions. Maintaining strong ties with the identified bloggers can significantly contribute to brand recognition of the cruises. The research also concludes that Twitter, a microblogging platform, is and will continue to be important for the tourism and hospitality industry and related research in the coming years.
  • 15. 6. Exploring characteristics of video consuming behavior in different social media using K-pop videos (Yong Hwan Kim, 2014) 1. Introduction: This research analyses how people consume cultural web videos differently through different media. It uses network analysis and data visualization extensively to arrive at its findings. The genre selected for analysis is K-pop videos (Korean popular song videos), music videos originating from South Korea. These music videos are gaining immense popularity. Not long ago ‘Gangnam Style’ was at the pinnacle of fame, and it remains one of the most viewed YouTube videos ever (2.9 billion views). The primary objective of the study is to investigate diverse patterns in consuming cultural content (K-pop videos in this case) across different social media. 2. Theoretical foundation An important theoretical foundation in this case is uses and gratifications theory, a useful framework for interpreting why people use media. The theory helps in finding the relationship between the user and the media. According to the theory, users are active audiences who use media to satisfy their motivations and select media depending on the motives they want to fulfil. 3. Data collection (source of data) The video data was collected with the help of the YouTube API, and the relevant tweets were obtained by web crawling. To get the seed data from YouTube, the researchers used queries including k-pop, kpop, Korean pop, SM Entertainment, YG Entertainment and JYP Entertainment (the last three are famous K-pop content producers). For each video in the seed data, the following variables were also collected: video ID (URL), title, category and uploader account name in the YouTube database. The tweets were identified by web crawling, in which tweets containing the URL of a video (identified from the YouTube API) were extracted.
There were also three other methods used to collect co-linked videos on Twitter, which eased data collection. 4. Methodology The research was divided into four stages for ease of execution: ➢ Collection of seed data
  • 16. ➢ Collection of related and co-linked videos ➢ Pair and network generation The suggestion list in YouTube was used to identify which video from the collected data appears when another video from the data is viewed. The same was done for Twitter, where a ‘co-linked’ pair implies an implicit linkage between two videos mentioned in tweets of the same Twitter account. A ‘co-linked’ pair between videos X and Y is produced when both are exported to tweets of a specific Twitter account. ➢ Multilateral analysis approaches In the network analysis part of this stage, the objective was first to identify the core nodes, since this allows a meaningful conclusion to be drawn from them. To identify the core nodes, the researchers adopted four centrality measures: degree centrality, weighted degree centrality, betweenness centrality and PageRank. The degree centrality of a node is defined as the number of edges that are adjacent to that node. Weighted degree centrality is a variation of degree centrality, calculated by summing the frequency of every node pair for a given node. Betweenness centrality denotes the number of shortest paths passing through a particular node. PageRank measures the importance of a node based on the ranks of the nodes that link to it. (Yong Hwan Kim, 2014) 5. Findings The diagram is the network visualization of the YouTube network. The modularity algorithm was used in this case. It identifies groups of nodes that are more similar to each other than to nodes in other groups, optimizing the detection of community structure in the network.
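The pair-generation rule and two of the centrality measures described above can be sketched as follows. The mapping of accounts to video IDs is invented, the function names are hypothetical, and this is only one reading of the description (a pair's frequency is the number of accounts that tweeted both videos), not the authors' actual pipeline.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: Twitter account -> set of video IDs linked in its tweets.
videos_by_account = {
    "user_1": {"vid_A", "vid_B", "vid_C"},
    "user_2": {"vid_A", "vid_B"},
    "user_3": {"vid_C"},
}


def colink_pairs(account_videos):
    """Count, for every pair of videos, how many accounts tweeted both."""
    pairs = Counter()
    for videos in account_videos.values():
        # sorted() gives each unordered pair a single canonical key.
        for x, y in combinations(sorted(videos), 2):
            pairs[(x, y)] += 1
    return pairs


def degree_and_weighted_degree(pairs):
    """Degree = number of distinct partners; weighted degree = sum of pair frequencies."""
    degree, weighted = Counter(), Counter()
    for (x, y), freq in pairs.items():
        for node in (x, y):
            degree[node] += 1
            weighted[node] += freq
    return degree, weighted


pairs = colink_pairs(videos_by_account)
degree, weighted = degree_and_weighted_degree(pairs)
print(pairs)
print(degree, weighted)
```

The resulting `pairs` counter is effectively a weighted edge list, which could be written to CSV and imported into Gephi as an undirected graph.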
  • 17. Analysis of the network communities shows the structure of sub-communities in the entire network. A large number of communities implies that users are associated with a variety of subjects or videos, whereas a small number of communities shows that users focus on a particular topic or interest. Node labels such as ‘j7_ISP8Vc3o’ and ‘9bZkp7q1910’ are IDs of YouTube videos, and the size of each node increases with its degree centrality. The edge weight denotes the occurrence frequency of each pair. Two main communities are identified in the YouTube network. Community 1 has more nodes than community 2, and the videos with high degree centrality are found only in community 1. All videos with high degree centrality in community 1 are related to singers from the SM, JYP and YG entertainment companies. Consequently, the important nodes of the YouTube network are the videos of these three major entertainment companies. This is the network visualization of the Twitter data. In contrast, the Twitter network has three major communities and two minor communities. The videos with high degree centrality are spread across communities 1 and 2, and some of these videos are related to singers who belong to other small or medium-sized entertainment companies. 6. Conclusion The study concludes that videos on Twitter and YouTube are consumed by users in different ways, and these differences are treated as clues for tracking users’ socio-cultural behaviours. The research found that people in the Twitter network consume more diverse videos than people in the YouTube network (YouTube’s recommendation algorithm played a part in this). Findings like these can be of significant use to marketers, as tracking the socio-cultural behaviours of customers on different online media can help in choosing the
  • 18. right channel for marketing communications. Gephi once again enables the modern marketer to carry out this data-heavy analysis. 7. Mapping dynamic conversation networks on Twitter (Bruns, 2011) 1. Introduction The research highlights the need to map dynamic conversations on Twitter and shows how it can be done. Twitter is a social medium that restricts posts to 140 characters. Users largely work around this limitation through the effective use of hashtags (‘#’) and @replies, which also helps to sort content on Twitter. A user is able to contribute to a discussion with other people even without following them. This extensive use of hashtags and @replies has aroused the curiosity of researchers who want to analyse what happens on Twitter during a public issue. The paper analyses millions of tweets from June 2010 concerning the challenge against Australian Prime Minister Kevin Rudd and his resignation, and uses them as a reference for data visualization and for deriving insights. Although the research does not solve a specific research problem, it explains the importance of Gephi as a data visualization tool and how it can be used to analyse Twitter activity when a significant event occurs in society. 2. Data collection (source of data) The Twitter data was bought from ‘Gnip’, the commercial data provider for Twitter. In this specific use case, since tweets about the Kevin Rudd issue of June 2010 were needed, the web service ‘twapperkeeper’ was used to obtain hashtag and keyword tweets. The data set analysed using Gephi consisted of @reply data from the discussion of a purported leadership challenge against the then Australian Prime Minister Kevin Rudd by his deputy Julia Gillard, under the Twitter hashtag #spill (Australian political slang for such a challenge), on the evening of 23 June 2010. 3. Methodology Two kinds of analysis were done using Gephi.
These can be explained using the network visualization graphs. Figure 1.1 shows the graph with node size indicating in-degree and node colour indicating the number of @replies sent (yellow to red).
  • 19. First (with node size set to indicate the number of @replies received), it is obvious that a large number of participants in the #spill conversation not only discuss the potential fate of the Prime Minister, but do so while referring to him by his Twitter username @KevinRuddPM rather than merely his name (Figure 1.1). The second part of the research was to create a dynamic network graph. For this, the time stamps of the tweets were included in the data. Another problem that needed to be addressed was that the tweets came from various parts of the world and the time stamps were
  • 20. according to the local time. Software was used to rectify this and bring all the time stamps to a standard time. The dynamic graph was then created. 4. Findings From the first visualization we can see that: ➢ The significant level of incoming @replies does not result in any responses from the Prime Ministerial account; with node colour indicating the number of @replies sent, @KevinRuddPM remains a pale yellow, indicating no activity. ➢ At the same time, there are a number of very active senders of @replies (shown in red) who do not necessarily receive a significant number of answers to their messages. ➢ Journalists @latikambourke and @renailemay are hubs in the network. They are notable both as senders and as recipients (marked in red, with the highest centrality). Their eigenvector centrality could also be high (not given in the case). From the dynamic graph the following finding was made: ➢ Moving from figure (a) to (b), many participants became inactive as their exchanges via @replies stopped. The ones that maintained exchanges are connected by edges, while the others are shown as distinct nodes (separated from the rest of the diagram). This shows the change over the period of analysis. 5. Conclusion From the dynamic network visualizations produced in Gephi from the hashtag data and its time stamps, we can identify the shifting roles played by individual participants over time, as well as the response of the overall #hashtag community to new stimuli, such as the entry of new participants or the availability of new information. In a crisis or a social incident, we can understand who the key players on Twitter are, who aggregates and disseminates information, and whose views and tweets influence others.
All these in-depth analyses, once thought to be beyond our reach, can now be done with Gephi and its many capabilities, including dynamic network mapping. Like much other academic work, this research-backed article further confirms the role that network analysis and visualization can play in improving businesses as well as personal life.
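The time-stamp correction described in the methodology can be illustrated with a short sketch. The case does not name the software that was used, so the function below is only an assumed equivalent: it converts local time stamps with a known UTC offset to UTC and orders the tweets chronologically. The senders, time stamps and offsets are invented sample values, not data from the study.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_stamp: str, utc_offset_hours: float) -> datetime:
    """Convert a naive local time stamp to a timezone-aware UTC datetime."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = datetime.strptime(local_stamp, "%Y-%m-%d %H:%M:%S")
    return local_time.replace(tzinfo=local_zone).astimezone(timezone.utc)


# Hypothetical tweets: (sender, local time stamp, sender's UTC offset in hours).
tweets = [
    ("user_a", "2010-06-23 21:30:00", 10),
    ("user_b", "2010-06-23 12:45:00", 1),
    ("user_c", "2010-06-23 07:40:00", -4),
]

# Once every stamp is on the same clock, the tweets can be ordered in time.
normalized = sorted((to_utc(stamp, offset), sender) for sender, stamp, offset in tweets)
for moment, sender in normalized:
    print(moment.isoformat(), sender)
```

Only after this step do the intervals of a dynamic graph line up, because an edge observed at 21:30 in one time zone may in fact precede one observed at 12:45 in another.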
8. Graphical visualization of analogous relationships of Raags (Kelkar, 2015)

1. Introduction:
The idea of visual representation appears in several concepts of Hindustani classical music (HCM), from paintings that describe raags to the association of raags with different seasons; the theory of HCM has many multimodal ideas. The authors contribute a method of analysing raag spaces from the point of view of tonality, and a method of creating combinations of modes from the point of view of colour harmony and the colour wheel. Hindustani music also has concepts of cadences, themes and variation that are comparable to Western theory. A raag is thought of as a mode with a grammar, and its constituent notes are its most important property. Thaats are small groups of similar raags.

2. Data Collection
The data set contains 163 commonly sung raags. The final set of descriptors contains all the elements, classified under different thaats. The authors coded the notes as S, r, R, g, G, m, M, P, d, D, n, N. They sourced this information from seminal, authoritative books on Indian music theory.

3. Methodology:
After receiving the data, the authors treated every unique entity as an independent node and created edges among these nodes. The result is a directed graph with edges pointing outwards from each parent node to its attributes. The figure shows a total of 393 nodes and 7010 edges.
The following points were considered while constructing the graph:
• Edges were computed between the participating notes of each raag, as well as the notes found in the short descriptive phrase of the raag.
• If a note happens to be the tenor (vadi, the most important note, which is followed by the samvadi), it is assigned a larger weight than an optional note (anuvadi).
• Weights are also assigned based on the degree of connectivity.
• The layouts used are Fruchterman-Reingold and ForceAtlas2; the graph was partitioned into 10 clusters at a resolution of 0.68, with a modularity of 0.185. The average degree of the network is 4.751 and the average weighted degree is 18.704; the graph is sparse, with a density of 0.012.

4. Findings:
• The nodes with the highest degree are located near the centre, with the remaining nodes spreading radially outwards.
• Clusters of notes come closer together based on how frequently they are used together.
• The parent thaats to which raags belong separate out at different angles around the circle.
• The close neighbours of ‘Sa’ are ‘Pa’ and ‘Ma’, the fifth and fourth scale degrees respectively, one of which must be present to complete a raag.
• Thaat clusters form around two pairs of notes that go together. Hindustani classical music theory and the graph visualization show the same results.
• There is larger variance and diversity among the raags with flat notes, while those whose tones lie on a major scale are huddled much closer together.
• It is possible to visualize which family is most prevalent.

5. Conclusion:
Network analysis and data visualization can be used to understand the most prevalent families, raags and thaats. The study suggests that representing raags as graphs facilitates arranging and grouping them and identifying similar properties of different nodes. Several relationships can be derived from the final layout.
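The network statistics reported in the methodology can be cross-checked with a little arithmetic. The sketch below assumes the graph is treated as directed, with average degree taken as edges divided by nodes and density as edges divided by n(n - 1); these conventions are an assumption made here, not something the case states. Under them, an average degree of 4.751 on 393 nodes implies a density of about 0.012, which agrees with the reported value.

```python
def directed_density(node_count: int, average_degree: float) -> float:
    """Density of a directed graph, derived from its average degree.

    Assumes average_degree = edges / nodes, so the implied edge count is
    average_degree * node_count, and density = edges / (n * (n - 1)).
    """
    implied_edges = average_degree * node_count
    return implied_edges / (node_count * (node_count - 1))


density = directed_density(node_count=393, average_degree=4.751)
print(round(density, 3))  # 0.012
```

A density this low means that only about 1.2% of all possible directed links are present, which is why the layout separates into visible clusters rather than a single dense ball.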
9. Untangling the Social Network of Musicians (Focht, 2017)

1. Introduction:
This research investigates the cultural contexts, circumstances, inspirational models and different ways in which knowledge, experience and expertise have been transferred over time. The aim of the project was to understand “creative transfer” within the field of music, in order to understand the lasting significance of musical works and the relationships between musicians. In print media, usually only a single relation between two musicians is narrated, and only one of the two biographies is used to report the relationship.

2. Data Collection
Musicologists have examined relationships between musicians in print media, resulting in a unique database that is valuable for all kinds of music professionals. BMLO (Bavarian Musicians Encyclopedia Online) provided the authors with information on about 28,000 musicians.

3. Methodology
The objective of the visualization was to develop a graph design that makes the social network of musicians visually accessible for the first time. The resulting graph, created in Gephi, has a total of 1420 components: the largest connects 5539 musicians, the second largest connects only 56 musicians, and 1385 connected components contain fewer than 10 musicians.
The preliminary step in generating the social network graph is filtering according to the research question.
• Filtering can be done by relationship type(s).
• It is possible to focus exclusively on musicians with specific professions (e.g., instrumentalists).
The authors focused on teacher-student relationships to investigate how musical knowledge, experience and expertise have been transferred over time. After filtering, the sub-network contains 3,994 musicians; its largest connected component, the research object of the musicologists, contains 2,769 teachers and students.
The following points were taken care of while creating the graphs:
• Temporally aligned graph: The relations are to be read chronologically from left to right, so a layout with a temporal dimension was needed. The authors applied a force-directed graph layout with fixed x-values representing time; each x-value reflects the middle of a musician’s creative lifetime on a horizontal time axis. As a result, the nodes spread vertically while the chronological order remains intact.
• Node grouping: Since the underlying research question investigates how musical knowledge is transferred, the authors hid the nodes of musicians who never had the role of a teacher; these musicians (students) were grouped with their teachers instead. This design decision reduces the number of nodes to be displayed from 2,769 to 608.
• Node layout: To illustrate the significance and influence of these personalities, node size reflects a teacher’s number of students, which makes teachers with many students salient. By default, node labels are hidden, but for navigation purposes a user-defined number of node labels with the musicians’ names can be shown on demand. Either the most popular musicians or the teachers with the most students can be highlighted.
• Interactivity: Hovering over a node shows the corresponding musician and two lists of students (those who became teachers and those who did not) in a popup box. Clicking a node highlights all connections to a teacher’s students who themselves became teachers. This way, transfer paths of musical knowledge can be assembled interactively.
• Musical profession analysis: Musicians in the graph can be selected to analyse the evolution of musical professions. For the selection, all musical professions of the teachers’ students are listed by decreasing frequency.
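The node grouping and node sizing decisions above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors’ implementation: the teacher-student pairs are invented names rather than BMLO data, and all variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (teacher, student) relations; invented names, not BMLO data.
teaches = [
    ("Anna", "Bert"), ("Anna", "Clara"), ("Anna", "Dora"),
    ("Bert", "Emil"), ("Clara", "Emil"), ("Clara", "Fritz"),
]

students_of = defaultdict(set)
for teacher, student in teaches:
    students_of[teacher].add(student)

teachers = set(students_of)  # musicians who taught at least once
everyone = teachers | {student for _, student in teaches}

# Only teachers stay visible; node size is the number of students taught.
node_size = {t: len(students_of[t]) for t in teachers}

# Students who never taught are hidden and grouped with their teachers.
grouped = {t: sorted(students_of[t] - teachers) for t in teachers}

# Edges that remain visible: teacher -> student who became a teacher.
visible_edges = sorted((t, s) for t, s in teaches if s in teachers)

print(f"{len(everyone)} musicians reduced to {len(teachers)} visible nodes")
print(node_size, grouped, visible_edges, sep="\n")
```

In this toy example six musicians collapse into three visible nodes, the same kind of reduction the case reports when going from 2,769 to 608 nodes.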
4. Findings:
To illustrate the application of the graph, the examples of two musicians are taken.
• Joseph Rheinberger and Wilhelm Sandberger are the teachers with the highest number of students.
• The visualization reflects a change in the musical profile of students from composition to composition science.
• Musicologist is the most frequent musical profession, while composer is becoming less frequent. The authors infer that the growing number of shifts in musical profession from composition to composition science explains the increase in musicology.

5. Conclusion:
Through close collaboration with computer science and tools like Gephi, the authors produced a data visualization. Network analysis of it provides insights into the creative transfer of musical knowledge and answers the question of how musical knowledge is transferred through the social network of musicians. Further data can be analysed to understand emerging trends in musical professions. The study also gives an idea of how the teacher-student relationship affects knowledge sharing. The graph can be altered to suit specific research questions.
10. Social Networks and Text Messaging in Public Health (Beck and Armbruster, 2014)

1. Introduction:
This project was undertaken to understand how social media can be used to develop an optimal strategy for designing disease-prevention interventions. It helped researchers to predict the H1N1 outbreak in the US in 2009. The research mainly aims to show how actual social networks and social network analysis can be incorporated into the design of behavioural prevention using text messaging.

2. Data Collection:
Data was collected by surveying community members in Hyderabad.

3. Methodology:
To identify opinion leaders to target in behaviour-change campaigns, the authors conducted a survey in Hyderabad.
• The survey data was plotted in the data visualization tool Gephi.
• Opinion leaders were identified using network centrality measures, which include:
➢ Degree: the number of social contacts, a measure of popularity.
➢ Closeness centrality: those at the centre of the network, with few hops between them and everyone else in the target population.
➢ Betweenness centrality: those playing a broker or bridge role by connecting different parts of the network.

4. Findings
• Contact tracing is used in network-based public health interventions. It is a popular method of tracking cases of HIV, tuberculosis and STDs.
• Mobile phones have a penetration of 96% globally. Text messages are primarily used for promotion and education on various health care issues.
• These interventions are either sent in bulk to a large number of users, or volunteers and public health workers send individually tailored text messages.
Both methods have challenges in terms of richness and reach. The authors used a simulation framework and network datasets, such as the one in Figure 1, to compare the proposed peer-to-peer (P2P) text messaging design against intervention designs currently in use, such as bulk message interventions and personalized interventions.
They believe that leveraging the peer distributor’s social network and phone, with the ability to send personalised messages and then follow up on them with peers, will increase the richness of the information conveyed, which in turn will increase acceptance of the messages.

5. Conclusion
Social media has been used by governments to promote public health. Although bulk messaging and personalised messaging can yield results (high acceptability), people of this generation find them annoying. With a peer-to-peer network, the message can reach the masses effectively, because a social network allows a viral-like spread of the intervention among the target population and is very cost-effective.
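The three centrality measures named in the methodology for finding opinion leaders can be illustrated on a toy network. The sketch below is a plain-Python version for a small undirected, connected graph, using breadth-first search and Brandes-style path counting; it is not the authors’ survey data or Gephi’s own implementation, and the contact network and node names are invented for illustration.

```python
from collections import deque


def centralities(adjacency):
    """Degree, closeness and betweenness for an undirected, connected graph
    with at least two nodes, given as {node: set_of_neighbours}."""
    nodes = list(adjacency)
    degree = {v: len(adjacency[v]) for v in nodes}
    closeness = {}
    betweenness = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search from source, counting shortest paths.
        dist = {source: 0}
        paths = {v: 0 for v in nodes}
        paths[source] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Closeness: number of other nodes divided by total hops to them.
        closeness[source] = (len(nodes) - 1) / sum(dist.values())
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                betweenness[w] += delta[w]
    # Each unordered pair was counted from both of its end points.
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, closeness, betweenness


# Hypothetical contact network: two friend groups linked only through "X".
contacts = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "X"},
    "X": {"C", "D"},
    "D": {"X", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
degree, closeness, betweenness = centralities(contacts)
for person in contacts:
    print(person, degree[person], round(closeness[person], 3), betweenness[person])
```

In this example "X" has only two contacts, yet it has the highest closeness and betweenness because every message between the two groups must pass through it. This is the broker or bridge role described above, and it shows why degree alone may not be enough to choose peer distributors.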
References
Activate Networks. (2013, May 16). Business - Drug Marketers use Social Network Diagrams to Help Locate Influential Donors. Retrieved from New York Times: http://www.nytimes.com/interactive/2013/05/16/business/PHARMA.html
Andrei Brodera, R. K. (2000). Graph structure in the Web. Elsevier, 309-320.
Bruns, A. (2011). Mapping Dynamic Conversation Networks on Twitter using Gawk and Gephi. Information Communication and Society, 1323-1351.
Doughty, M. &. (2012). Who is on your sofa?: TV audience communities and second screening social networks. Proceedings of the 10th European Conference on Interactive TV and Video. EuroiTV'12.
Elsevier Early Career Resources. (2012). How to use blogging and microblogging to disseminate your research. Elsevier.
Koehler, B. (2014). Before and After Series C funding – a network analysis of Domo. Beautiful Data.
Larsson, A. O., & Moe, H. (2012). Studying political microblogging: Twitter users in the 2010 Swedish election campaign. Sage Journals, 729-747.
P.W. Holland, S. L. (1971). Transitivity in structural models of small groups. Comparative Group Studies, 107-124.
Seunghyun “Brian” Park, C. “. (2015). Using Twitter Data For Cruise Tourism Marketing and Research. Journal of Travel & Tourism Marketing, 15.
Yong Hwan Kim, D. L. (2014). Exploring characteristics of video consuming behaviour in different social media using K-pop videos. Journal of Information Science, 806-822.