NETWORK ANALYSIS IN VARIOUS DOMAINS
Analyzing 10 Use Cases Solved Using Network Analysis Techniques in Gephi
Divita Madaan
Rubin Pipal
Vyusti Channa
Table of Contents
Understanding the Workforce (HR)
The Social Network Data & Interest Graph Data (Marketing through social media)
Drug Marketers Help Locate Influential Doctors (Marketing)
Clamping down on Review Fraud (Sales)
Growth in Civic Tech (Analytics)
Visual analysis of complex networks for Business Intelligence
Geographical Network analysis
Mapping dynamic conversation networks on Twitter using Gawk and Gephi (Analytics)
Capitalistic_Network (International Business)
SMEs' Use of Data Spatialization (International Business)
References
(1) Understanding the Workforce
►Social Network Analysis as a Tool for Understanding the Workforce
Introduction
In 2014, the North Superior Workforce Planning Board (NSWPB) launched a project to develop
strategies for better connecting the supply of skills in the local labour force with local
employer demand.
Social Network Analysis was introduced as a possible tool for mapping a complex social system
such as the labour supply. The analysis would be carried out in Gephi, both to visualize
workers as individual agents interconnected with one another in dynamic patterns that shift
with changes in training and work experience, and to visualize the labour supply as one
holistic talent pool in the region.
Theoretical foundation
The possibility of representing the labor supply as a complex system grew out of
complexity theory. Complexity theory, or complexity science as
some are beginning to refer to it (Mitchell, 2009; Johnson, 2007), is the study of phenomena
which emerge from a collection of interacting objects. It is an emerging area of theory and
research that spans disciplines such as mathematics, physics, biology, cognitive science, and
economics. Because there are such diverse perspectives involved, the phenomena being
studied are variously called complex systems, dynamical systems, non-linear systems, and
complex adaptive systems, each with slightly different connotations.
The theory within complexity science that has most relevance to this project is complex
adaptive systems theory, as it is associated with the study of living systems such as
ecologies, economies, and other human systems (Holland, 1992). In complex adaptive
systems, the individual components in the system display the capacity to adapt and learn.
They constantly vary the “rules” of interaction that they have with other components in the
system so as to seek the best possible outcomes. Because all the other components in the
system are also constantly adapting and varying their rules of interaction, the emergent
whole is dynamic and constantly shifting. One consequence of this is that the emergent
behaviour of the whole system is usually far from optimal. Indeed, there is no “optimal” end
state because complex adaptive systems are always exhibiting new collective behaviour as
they evolve (Holland, 1992).
Data collection
To keep pace with the changes in the workforce planning sector that followed the formation
of the LEPC (“Local Employment Planning Council,” n.d.) model, a survey was developed and
distributed via a link to SurveyMonkey (“SurveyMonkey - Free online survey software and
questionnaire tool,” n.d.) that was emailed to 31 potential respondents on December 7, 2015.
These respondents included 15 NSWPB board members, 3 staff, and 14 members of the HR
Strategy Steering Committee (a group formed under NSWPB). This network was chosen because
examining it at this time was valuable and because it was an accessible sample. The intention
is to analyze these networks periodically as the LEPC model takes shape. Only 25 responses
had been received by January 24, 2016. For a larger sample, social network surveys would be
set up using specialized online survey tools such as Network Genie.
After going through an instruction and welcome screen, respondents were asked to type their
name and were then shown a list of the names of all the 31 people in the network. For each
name they were asked to indicate on a scale of 0 (never) to 4 (very often) how frequently they
interact with this person on
(i) matters relating to workforce planning
(ii) matters relating to community outreach and engagement.
Following these social network questions, there were 5 additional questions in which
respondents were asked to rate their satisfaction with the performance of the NSWPB on
dimensions of action that relate directly to the stated goals of the LEPC.
Methodology
The collected responses were then downloaded from SurveyMonkey to MS Excel, where they were
transformed into a format readable by Gephi. First, each participant was assigned an ID
number. Two files were then prepared:
(i) A node file containing the ID numbers linked with respondent names
(ii) An edge file containing the following columns:
(a) The rater’s ID number (source)
(b) The ID number of the person being rated (target)
(c) The rating itself, called weight A, which could range from 0 to 4
(d) The rating from the other direction (the initial target person’s rating of the
source person), called weight B, placed in the column beside weight A
An average of these two weights was then calculated using MS Excel’s average function. For
those pairings in which both raters indicated a response of ‘0’ (reflecting that they never
discussed matters relating to that particular question), the pairing was deleted. Averages
which ended in .5 were rounded up. The same procedure was used for the second question.
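The averaging step described above was done in MS Excel; the following is only a minimal Python sketch of the same rules (average the two directions, drop pairs where both ratings are 0, round .5 upwards). The sample ratings, the function names, and the choice to treat a missing rating as 0 are assumptions made for illustration. The `Source`, `Target` and `Weight` column headers are the ones Gephi's spreadsheet importer recognises for edge tables.

```python
import csv
import io
import math


def build_edges(ratings):
    """Collapse directed 0-4 ratings into one averaged edge per pair.

    ratings maps (rater_id, rated_id) -> rating.
    Assumption: a missing rating in one direction is treated as 0.
    """
    edges = []
    seen = set()
    for (source, target), weight_a in sorted(ratings.items()):
        pair = frozenset((source, target))
        if source == target or pair in seen:
            continue
        seen.add(pair)
        weight_b = ratings.get((target, source), 0)
        if weight_a == 0 and weight_b == 0:
            continue  # both said "never": the pairing is deleted
        average = (weight_a + weight_b) / 2
        weight = math.floor(average + 0.5)  # averages ending in .5 round up
        edges.append((source, target, weight))
    return edges


def to_gephi_csv(edges):
    """Write the edges as CSV text with Gephi's edge-table headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical ratings for three participants.
ratings = {
    (1, 2): 3, (2, 1): 4,   # average 3.5 -> 4
    (1, 3): 0, (3, 1): 0,   # never interact -> dropped
    (2, 3): 1, (3, 2): 2,   # average 1.5 -> 2
}
edges = build_edges(ratings)
print(to_gephi_csv(edges))
```

Running the same function once per survey question would give the two separate edge files that the text describes.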
The node file and edge file for the two questions were separately uploaded into Gephi.
Network layout algorithms were tested, and the Fruchterman Reingold algorithm was chosen
for its ability to bring the network layout into equilibrium. With this algorithm, nodes
(people) who are more connected are positioned in the centre of the network map, while
people who are less connected sit on the periphery of the map.
Nodes and edges (the lines representing the relationship between two people) were first
assigned a color as shown in the snapshots below. Node and edge labels (the latter showing the
value of the rating) were also applied. Nodes were sized according to what is called their
betweenness centrality, which means that the larger the node, the more central a role that
person plays in connecting the network.
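Gephi computes betweenness centrality internally. As a rough illustration of what the number means, here is a small sketch using Brandes' algorithm for an unweighted, undirected graph, followed by a linear mapping from score to node size. The four-node example graph, the function names and the size range of 10 to 50 are hypothetical and are not taken from the study.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm; adjacency maps each node to a set of neighbours."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # Every undirected pair was visited from both endpoints.
    return {node: value / 2 for node, value in centrality.items()}


def scale_sizes(scores, smallest=10.0, largest=50.0):
    """Map scores linearly onto a node-size range (hypothetical range)."""
    top = max(scores.values()) or 1.0
    return {n: smallest + (largest - smallest) * s / top for n, s in scores.items()}


# Hypothetical network: B sits between three people who do not know each other.
adjacency = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
scores = betweenness_centrality(adjacency)
sizes = scale_sizes(scores)
print(scores, sizes)
```

In this toy network B lies on every shortest path between the other three people, so it receives the only non-zero score and the largest node.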
Findings
1. Network Mapping
Social networks were mapped and explored using filtering to examine how the maps change
when considering the sub-groups within the network (board, HR strategy steering committee,
staff), as well as the different weights of the relationships (people connected with very
often vs. often vs. sometimes vs. rarely). When a given node is clicked, the network for that
individual is accentuated within the overall network, making the strength and character of an
individual’s connections, and their positioning within the overall network, more salient.
The following images capture a series of network maps:
Annemarie from Board, Roger from HR Strategy Steering Committee, Frank & Dave from
both HR Strategy Steering Committee & Board and Madge & Karls from Staff are the people
who have the highest interactions on matters relating to the workforce.
Annemarie & Michelle from Board, Yolanda from HR Strategy Steering Committee, Frank &
Dave from both HR Strategy Steering Committee & Board and Madge & Karls from Staff are
the people who have the highest interactions on matters relating to community engagement.
2. Network Statistics
The software also produces a number of potentially meaningful network statistics. These
statistics were examined in relation to the whole network shown in Figures #7 and #8 above,
and are described below:
Table 1: Network Statistics, explanation and results

Statistic: Average Path Length
Explanation: Tells us how far apart, on average, any two nodes in the network are, measured
along the shortest path between them. The distance between two nodes that are directly
connected is counted as 1.
Results and Interpretation: 1.30 for Q#1 and 1.25 for Q#2.

Statistic: Betweenness Centrality
Explanation: Tells us how often a node is on the shortest path between two other nodes in a
network. This statistic also helps to identify nodes that are connectors.
Results and Interpretation: The nodes in Weight 1 and Weight 2 were sized according to their
betweenness centrality in order to distinguish those nodes that played important roles in
connecting the network.

Statistic: Graph Density
Explanation: Measures how close a network is to complete. It is the ratio of the number of
actual edges (connecting lines) to the number of possible edges. When all possible
connections are in place, graph density is equal to 1.
Results and Interpretation: Q#1: 70%; Q#2: 75%. Note that the higher the density of the
network, the more rigid the network is considered to be. Therefore, a density of
approximately 70-75% is good.

Statistic: Average Degree
Explanation: Tells how many edges are connected to each node on average. This tells you how
connected an individual is within the network.
Results and Interpretation: Node degrees for both questions ranged from 11 to 30. In a
network of 31 nodes, 30 is the highest number of connections (edges) possible. Q#1 had an
Average Degree of 20.8; Q#2 had an Average Degree of 22.4. It was meaningful to note which
respondents had the highest (30) and lowest (11) numbers of connections. This distribution
was interpretable.

Statistic: Clustering Coefficient
Explanation: Measures how complete the neighbourhood of an individual node is. For example,
the neighbourhood of Julie is all of the people (nodes) connected to Julie. If all of the
people in her neighbourhood are also connected to each other, then the clustering
coefficient is equal to ‘1’. If the people Julie is connected with have no connections with
each other, then the coefficient is ‘0’.
Results and Interpretation: The Average Clustering Coefficient statistic is the average of
all clustering coefficients in the entire network. The Average Clustering Coefficient for
Q#1 was .82 and for Q#2 was .85.
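The statistics in Table 1 come from Gephi's built-in reports. To make the definitions concrete, the sketch below computes four of them by hand for a tiny undirected network. Only the name Julie is borrowed from the table's example; the other names, the graph itself and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def graph_density(adjacency):
    """Actual edges divided by possible edges in an undirected graph."""
    n = len(adjacency)
    edges = sum(len(nb) for nb in adjacency.values()) / 2
    return edges / (n * (n - 1) / 2)


def average_degree(adjacency):
    """Mean number of edges attached to a node."""
    return sum(len(nb) for nb in adjacency.values()) / len(adjacency)


def clustering_coefficient(adjacency, node):
    """Share of a node's neighbour pairs that are themselves connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)


def average_path_length(adjacency):
    """Mean shortest-path distance over all connected ordered pairs."""
    total, pairs = 0, 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in distance:
                    distance[nb] = distance[node] + 1
                    queue.append(nb)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs


# Hypothetical neighbourhood: Julie knows three people, two of whom know each other.
adjacency = {
    "Julie": {"Ann", "Bob", "Cal"},
    "Ann": {"Julie", "Bob"},
    "Bob": {"Julie", "Ann"},
    "Cal": {"Julie"},
}
print(graph_density(adjacency))                    # 4 of 6 possible edges
print(average_degree(adjacency))                   # 8 edge ends over 4 nodes
print(clustering_coefficient(adjacency, "Julie"))  # 1 of 3 neighbour pairs linked
print(average_path_length(adjacency))
```

In an undirected network, density equals the average degree divided by the number of other nodes, so the reported figures can be cross-checked against each other: 20.8 / 30 ≈ 0.69 for Q#1 and 22.4 / 30 ≈ 0.75 for Q#2, in line with the stated 70% and 75%.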
We find that the maps are most meaningful when they are interacted with, dynamically. Still
images of the maps, as shown above, do not fully capture Gephi’s potential to conceptualize
the meaning of the various patterns of connection.
Conclusion
This social network analysis could prove highly valuable in understanding the workforce as a
complex adaptive system. The larger the network being analyzed, the more names one must
answer each question for. Alternatively, one could ask respondents to list the names of the
people they know or work with, and ask them to respond for each of these names. With a
larger network, the individual names associated with each node become less important
compared to the overall patterns revealed in the network.
Completion of the survey: a limitation
(i) Only 25 responses were received out of 31
(ii) Time required and tedium of responding to one or two questions relating to all
the people in one’s known network
(iii) The larger the social network being analyzed, the longer it takes respondents
to fill out the social network survey, as they must answer the question for each
other person
► Develop a user-friendly interface for the survey
► Choose the questions very carefully, with only one or two questions per relationship
Future Scope
The HR Strategy Steering Committee has discussed various possibilities, including an
interactive app through which respondents could maintain a profile and set of connections,
and be able to observe and interact with their map. The development or identification of such
an interface is critical to expanding social network analysis to larger samples of the
workforce.
(2) The Social Network Data & Interest Graph Data
►The Social Network Data & Interest Graph Data will power next-generation shopping.
Introduction
Social network data represents the online virtual network of friends, family and personal
profile information. Interest graph data brings together, in a network, the people who share
the same interests and are connected by the products and collections they share.
It has been said that the combination of social network data and interest graph data will
power and fuel shopping in the coming generation. Early analysis of this unstructured big
data, in the form of the social network data and interest graph data that together
constitute the Brand Interest graph, is what will deliver the benefit from it.
Data Collection/ Findings
In the figures given above, we can see a cluster of dots of different sizes, called nodes,
and lines connecting those dots, called edges. The red or pink dots represent people, while
the green and light green dots represent products. The edges represent friendships, follows
between people and products, or social expressions on the products. Figure 1 shows the
interest graph data before it has been analyzed using methods of social analytics, so it is
scattered and unorganized. Nothing can be inferred from the data in this form, because no
relation or dependency among the data variables can be established.
Figure 2 represents a small section of the interest graph data after it has been analyzed
using social analytics. The nodes are of different sizes, which symbolizes the popularity of
the person or product represented by each node. The nodes are also set in different
segments, which shows that people with similar product interests and social groups cluster
together.
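The text does not say which social analytics method produced the node sizes and segments. As a loose illustration only, the sketch below uses the number of connections (degree) as a stand-in for popularity and connected components as a stand-in for segments. Both choices, and all names and data, are assumptions rather than the method used for the figures.

```python
from collections import defaultdict

# Hypothetical interest-graph edges: person-product follows and one friendship.
edges = [
    ("ana", "sneakers"), ("ben", "sneakers"), ("cy", "sneakers"),
    ("ana", "ben"),
    ("dee", "teapot"), ("eli", "teapot"),
]

adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# Popularity proxy: how many connections each person or product has.
popularity = {node: len(neighbours) for node, neighbours in adjacency.items()}


def segments(adjacency):
    """Group nodes into connected components, a crude stand-in for segments."""
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


print(popularity)
print(segments(adjacency))
```

On real data a community detection method would usually replace the connected-components step, since a large interest graph tends to be one connected piece.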
Conclusion
Thus we can deduce that social profile data can be used to understand people’s online
behavior with regard to products and other people. Through social network analytics using
Gephi we can identify the segments, the key products that influence those people, and the
people who influence each segment. This information can further help the user decide various
marketing factors, such as what to market in which segment, whom to target, which key
influencers can help with marketing, what the price should be, and what the demographics are.
(3) Drug Marketers Help Locate Influential Doctors
►Drug Marketers Use Social Network Diagrams to Help Locate Influential Doctors
Introduction
The figures given above are social network diagrams, which identify relationships among
doctors, between patients and doctors, and prescribing histories. The diagrams show doctors
in the north-eastern U.S. with patients in common. The survey tracks the uptake and
popularity of an oncology drug among doctors and how frequently they prescribe it. The
doctors included in the case are those who have prescribed, or are potential customers for,
the oncology drug.
Data Collection/Findings
In the figures, doctors are represented by nodes. Light blue nodes show doctors who have not
prescribed the drug being surveyed, while dark blue nodes show doctors who have prescribed
it. The size of a node depicts the volume of the oncology drug prescribed by that doctor: the
bigger the node, the higher the volume. Red nodes depict doctors who specialize in this
particular field but have not prescribed the surveyed drug, which makes them the most
important doctors for marketers to sway. The analysis distributes the doctors into clusters,
which represent segments of interrelated physicians at similar prescribing levels, making it
easier for marketers to target a whole segment at once. It also highlights the physicians
who act as links or bridges between different clusters, making them important targets as
well. An edge between two nodes means that the doctors share more than a certain number of
patients, i.e. patients suffering from a similar condition, who could be potential users of
the drug and hence drive the doctors to prescribe it. The analysis also helps to identify
doctors who may not be high-volume prescribers of the drug but who hold a significantly
central position, which enables them to influence even the bigger nodes around them.
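The rule that two doctors are linked when they share more than a certain number of patients can be sketched as follows. The visit records, the identifiers and the threshold value are hypothetical; the source does not state the threshold used in the diagrams.

```python
from collections import defaultdict
from itertools import combinations


def shared_patient_edges(visits, min_shared):
    """Link two doctors when they share more than min_shared patients.

    visits is an iterable of (doctor, patient) pairs.
    Returns {(doctor_a, doctor_b): number_of_shared_patients}.
    """
    doctors_by_patient = defaultdict(set)
    for doctor, patient in visits:
        doctors_by_patient[patient].add(doctor)

    shared = defaultdict(int)
    for doctors in doctors_by_patient.values():
        for a, b in combinations(sorted(doctors), 2):
            shared[(a, b)] += 1

    return {pair: count for pair, count in shared.items() if count > min_shared}


# Hypothetical records: dr_a and dr_b share two patients, dr_b and dr_c share one.
visits = [
    ("dr_a", "p1"), ("dr_b", "p1"),
    ("dr_a", "p2"), ("dr_b", "p2"),
    ("dr_b", "p3"), ("dr_c", "p3"),
]
print(shared_patient_edges(visits, min_shared=1))
```

With a threshold of 1, only the pair sharing two patients is kept as an edge; lowering it to 0 would also link dr_b and dr_c.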
Conclusion
Hence, from this use case we can infer that social network diagrams can be of great help to
drug marketers: they can narrow their focus to certain influential and well-connected doctors
or segments rather than approaching every doctor individually, and still market their new
drugs efficiently. The diagrams allow them to target the most significant market and the
best potential targets for prescribing their drug.
(4) Clamping down on Review Fraud
Introduction
Every day a number of product reviews are added on e-commerce websites such as eBay,
Amazon, Snapdeal and Flipkart. These reviews are a useful tool for reassuring customers
about a product’s quality, authenticity, price and after-sales service, and they also let
customers know if a product is not up to the mark and should be avoided. But these days
there is a new trend among some of the retailers selling on these online shopping sites.
These fraudulent retailers manipulate user-generated reviews in their favor by creating
false reviews, which mislead customers into buying their substandard products or
misrepresent their competitors. This is not only illegal but also a huge hassle for
customers. False reviews put the website in jeopardy, erode customer trust and damage the
integrity of the data on which the brands are built. Apart from the damage to their
reputation, the websites and retailers also lose a considerable amount of revenue.
Data Collection/Findings
In the given diagram the reviews and their associated information are depicted by nodes.
Each review node is linked to three nodes of associated information: the business or
retailer reviewed, the IP address used to post the review, and the device used. The nodes
are linked by blue and red edges. A blue edge represents a default link, while a red edge
marks a link to information that is suspected of being fraudulent. Reviews that have already
been declared fraudulent and removed are depicted by a ghosted red “x” node.
[1]
[2]
In the first diagram above we can see that a single IP address has been used to post about
seven reviews from different devices. Following this analysis, three of the seven reviews
were marked as fraud and removed. Because they share the same IP address and were posted in
a similar timeframe, suspicion extends to the other four reviews as well, making them liable
to scrutiny. In another scenario (second diagram), one device has posted eight ‘0’-rating
reviews of a business/retailer through a number of different IP addresses, hence drawing
suspicion.
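Both patterns described above (one IP address with many devices, one device with many IP addresses) amount to grouping reviews by one attribute and checking how varied the other is. The sketch below shows that idea on made-up records. The field names, the threshold and the IP addresses (taken from a range reserved for documentation) are all hypothetical, and a flag here only means "worth a closer look".

```python
from collections import defaultdict


def flag_shared(reviews, key, other, min_reviews):
    """Flag values of `key` that carry many reviews with differing `other` values."""
    groups = defaultdict(list)
    for review in reviews:
        groups[review[key]].append(review)

    flagged = {}
    for value, group in groups.items():
        distinct_other = {review[other] for review in group}
        if len(group) >= min_reviews and len(distinct_other) > 1:
            flagged[value] = sorted(review["id"] for review in group)
    return flagged


# Hypothetical review records.
reviews = [
    {"id": "r1", "ip": "192.0.2.1", "device": "d1"},
    {"id": "r2", "ip": "192.0.2.1", "device": "d2"},
    {"id": "r3", "ip": "192.0.2.1", "device": "d3"},
    {"id": "r4", "ip": "192.0.2.2", "device": "d9"},
    {"id": "r5", "ip": "192.0.2.3", "device": "d9"},
    {"id": "r6", "ip": "192.0.2.4", "device": "d9"},
    {"id": "r7", "ip": "192.0.2.5", "device": "d7"},
]

# One IP address used from several devices.
print(flag_shared(reviews, key="ip", other="device", min_reviews=3))
# One device posting through several IP addresses.
print(flag_shared(reviews, key="device", other="ip", min_reviews=3))
```

A fuller check would also look at posting times, since the text notes that the suspicious reviews arrived within a similar timeframe.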
Another way of scrutinizing reviews is to build a visual map of each reviewer’s account or
profile.
In the diagram provided above we can see that one account has given ‘5’-rating reviews to a
lot of different businesses, which draws attention to it and suggests that it may belong to
an astroturfing network. Astroturfing networks are those that post reviews for businesses in
exchange for money or fees.
Focusing on the center of the same diagram, we can observe that one business has received
the lowest ratings from accounts that have no other activity. This could also be a
potential indicator of fraud.
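The two account-level patterns (an account that gives top ratings to many businesses, and accounts whose only activity is a single lowest rating) can be expressed as simple rules. This is a sketch under stated assumptions: a 0 to 5 rating scale, a cut-off of three businesses, and invented account and business names.

```python
from collections import defaultdict


def suspicious_accounts(reviews, top=5, bottom=0, min_businesses=3):
    """Return (possible promoters, single-review low raters).

    reviews is an iterable of (account, business, rating) tuples.
    """
    by_account = defaultdict(list)
    for account, business, rating in reviews:
        by_account[account].append((business, rating))

    promoters, one_shot_low_raters = [], []
    for account, items in sorted(by_account.items()):
        businesses = {business for business, _ in items}
        if all(rating == top for _, rating in items) and len(businesses) >= min_businesses:
            promoters.append(account)
        if len(items) == 1 and items[0][1] == bottom:
            one_shot_low_raters.append(account)
    return promoters, one_shot_low_raters


# Hypothetical activity log.
reviews = [
    ("acct_1", "cafe", 5), ("acct_1", "gym", 5), ("acct_1", "spa", 5),
    ("acct_2", "cafe", 0),
    ("acct_3", "cafe", 0),
    ("acct_4", "gym", 4), ("acct_4", "spa", 2),
]
print(suspicious_accounts(reviews))
```

Here acct_1 matches the first pattern, while acct_2 and acct_3 match the second; acct_4 has mixed activity and is not flagged.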
Conclusion
Thus, from the given examples, we can state that visualizing reviews and their associated
information provides a fast and intuitive way to digest large amounts of review and rating
data. This improves the quality and credibility of the reviews, which are a very important
factor in customer decision-making, drives sales for the website, and maintains goodwill by
earning the customer’s confidence.
(5) Growth in Civic Tech
Introduction
Civic tech, or civic technology, is technology that enables governments and other
organizations to engage the public and strengthen their ties with it, primarily through
communication and through improvements to infrastructure for the public good.
In this case, we assess trends in this domain by identifying and studying the companies
that sit at the intersection of technology, open government and citizen engagement.
Data Collection/ Methodology
Built in partnership with the Knight Foundation, using research from Quid, the tool
showcases 209 civic tech organizations, their relationships to one another, and the balance of
public and private investments they received between January 2011 and May 2013.
Findings
This looks at the network interaction of the community as a whole in relation to
organizations within that geographical space.
This looks at the network interaction of the organization Airbnb Inc. and its behavior as a
node in this network.
This looks at the network interaction of the government as an open source, and the methods
that it is using to assess the data to establish the network relationships.
This diagram depicts the nodes and their various branches which collectively form the
network.
(6) Visual Analysis of Complex Networks for Business Intelligence
Introduction
Social network analysis is vital for understanding consumer behavior. It studies not only
the connections between individuals but also the connections between groups and
organizations. It helps companies gain information for brand management and for analyzing
the return on investment of communication campaigns.
In this case, a paper presented at the 13th International Conference on Information
Visualization illustrated this methodology with an example of social network analysis
applied to web data from the e-diaspora research project. The aim of the project was to
study the usage of the web by migrant communities. Such platforms help identify which
companies could expand their consumer base, and in which areas, in order to capture a
larger share of the market.
Methodology
The data was analysed using Gephi. The analysis was performed on a network of websites
from the Moroccan diaspora.
Findings
The data was loaded into Gephi and the ForceAtlas layout was applied to get an overview of
the network structure, see Figure 2 (a). The network is clearly divided into two main
clusters of nodes (on the bottom-left and on the top-right), with a few nodes connecting
these clusters. To validate this observation, the Louvain modularity algorithm
(resolution=1) was applied. It automatically detects non-overlapping communities, which are
then represented with different colors. Intuitively, it shows how the network is divided
naturally into groups of nodes with dense connections within each group and sparser
connections between different groups. We see in Figure 2 (b) that the left-hand cluster is
clearly detected. Sub-clusters are also detected in the right-hand cluster (the resolution
parameter may be modified to find different sub-clusters); however, the Louvain algorithm
provides no justification for the existence of these clusters. Indeed, the algorithm may
detect communities in networks with no community structure, which is one of its limits.
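The Louvain method searches for a partition that maximises modularity. The sketch below does not implement that search; it only computes the standard modularity score of a given partition of an undirected, unweighted graph, which is the quantity Louvain tries to increase. The example graph (two triangles joined by one link) and all names are hypothetical.

```python
def modularity(edges, community):
    """Modularity of a partition.

    edges is a list of (a, b) pairs; community maps node -> community label.
    Q = sum over communities of (intra_edges / m) - (degree_sum / (2 * m)) ** 2
    """
    m = len(edges)
    intra = {}
    degree_sum = {}
    for a, b in edges:
        for node in (a, b):
            label = community[node]
            degree_sum[label] = degree_sum.get(label, 0) + 1
        if community[a] == community[b]:
            label = community[a]
            intra[label] = intra.get(label, 0) + 1
    return sum(
        intra.get(label, 0) / m - (total / (2 * m)) ** 2
        for label, total in degree_sum.items()
    )


# Hypothetical network: two triangles joined by a single bridging link.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

print(modularity(edges, two_groups))  # clearly positive: dense inside, sparse between
print(modularity(edges, one_group))   # zero: no division at all
```

The split into two triangles scores higher than lumping everything together, which is the sense in which a detected community has dense connections inside and sparser connections outside.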
The next step is to explain why these clusters exist, and why some nodes act as bridges
between them. The authors therefore studied the correlation between node properties and
visual patterns. They first mapped the property called website category to node colors, see
Figure 2 (c). The left-hand cluster corresponds very clearly to websites classified as blogs
(in blue). This trivial grouping supports the hypothesis that blogs tend to be more
connected to other blogs than to the remainder of the websites. However, there is no trivial
grouping for the right-hand cluster, so they then mapped the property of website main
language to node colors, see Figure 2 (d). The websites of both the left-hand and right-hand
clusters are mostly written in French (in blue), but the clusters also contain some websites
written in English (in red). A sub-cluster (in red) in the right cluster is also confirmed;
it corresponds to the red cluster detected by the Louvain algorithm. Finally, one of the
websites connecting the two clusters is written in English, and it is connected to the other
websites in English. Hence, this observation supports the hypotheses that the existence of
hyperlinks between websites is correlated to website language, and that the salient website
seems to play a key role for websites written in English.
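The authors judged the link between language and hyperlinks visually, by colouring nodes. One simple numerical companion to that visual check, offered here purely as an illustration and not as the paper's method, is to compare the share of links that join same-language sites with the share expected if pairs of sites were picked at random. The site names, languages and links below are invented.

```python
from collections import Counter


def same_attribute_share(edges, attribute):
    """Fraction of edges whose two endpoints have the same attribute value."""
    same = sum(1 for a, b in edges if attribute[a] == attribute[b])
    return same / len(edges)


def random_pair_share(attribute):
    """Chance that two distinct nodes picked at random share the attribute."""
    n = len(attribute)
    counts = Counter(attribute.values())
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Hypothetical websites and their main language.
language = {"f1": "fr", "f2": "fr", "f3": "fr", "f4": "fr", "e1": "en", "e2": "en"}
links = [("f1", "f2"), ("f2", "f3"), ("f3", "f4"), ("f1", "f3"), ("e1", "e2"), ("f4", "e1")]

print(same_attribute_share(links, language))  # observed share of same-language links
print(random_pair_share(language))            # baseline for random pairs
```

An observed share well above the baseline is consistent with the hypothesis that hyperlinks follow language, although it does not by itself establish the cause.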
Conclusion
Gephi can be used to generate relevant hypotheses for the study of social networks.
Functions such as finding central nodes, along with other advanced queries, help us
visualize rather complex situations, which could help businesses break boundaries.
(7) Geographical network analysis
Introduction
Geospatial BI relies on integrating Geographic Information Systems (GIS) with BI
technologies. In this case, Gephi was used as a medium for basic network and geospatial
analytics. If node properties include latitude and longitude coordinates, geographical
projections can be applied using the GeoLayout plugin.
Methodology
Gephi and its various plugins have been used in order to identify nodes, thus finding their
associated geographical coordinates. Networks can then be exported to KMZ files for further
analysis on GIS software using the ExportToEarth plugin.
A Mercator projection was applied to the network of Twitter users who tweeted about COFA
Online Gateway, an Australian platform for teaching e-learning. The tweets were collected
manually using the Twitter Timeline and search engine from October 26, 2011 to January 11,
2012, and users were geolocated manually. A network of these users was then built, in which
a link exists when a user mentions another user in a tweet during this period.
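In Gephi the projection is handled by the GeoLayout plugin. For readers who want to see what a Mercator projection does to a node's coordinates, here is a minimal sketch of the standard spherical Mercator formula. The function name, the unit radius and the approximate city coordinates are illustrative assumptions, not values from the study.

```python
import math


def mercator(latitude_deg, longitude_deg, radius=1.0):
    """Spherical Mercator: x = R * lon, y = R * ln(tan(pi/4 + lat/2))."""
    lat = math.radians(latitude_deg)
    lon = math.radians(longitude_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# Approximate coordinates, used only as sample node positions.
sydney = mercator(-33.87, 151.21)
london = mercator(51.51, -0.13)
print(sydney, london)
```

Southern-hemisphere nodes get negative y values and northern ones positive, and the stretching grows towards the poles, which is why the formula is undefined at a latitude of exactly 90 degrees.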
Findings
The network was visualized using Gephi (Figure 3). The users are mostly located in
south-east Australia, Great Britain and the United States, but the dataset showed no visual
evidence that users who are spatially close mention each other more. Those conducting the
research also collected tweet timestamps to study the evolution of the network.
Conclusion
It is evident that Gephi can be used in a number of domains. Gephi aids with easy
visualization of complex networks.
(8) How long is a Tweet?
► Mapping dynamic conversation networks on Twitter using Gawk and Gephi
Introduction
Twitter is the second largest social media platform after Facebook. Despite that, very little
research has gone into understanding and analyzing this particular platform. One well-known
phenomenon on this platform is the #hashtag: a # followed by the word that the user intends
to highlight.
Methodology
The web service Twapperkeeper (TK) has been the preferred tool for capturing #hashtags or
keywords in recent times. However, certain regulations have been altered, which means
that the data from Twitter accounts is no longer downloadable and TK acts only as a means of
functionality. Instead, yourTwapperKeeper (yTK) can be used, as it does not face the same
restrictions. It provides all the information needed to analyse and form networks
of these #hashtags. In this case, the authors used Gephi, as well as Gawk, to get around the
restrictions faced by other open-source tools such as the ones stated above.
Findings
As shown in the diagram above, the analysis assesses the handle @KevinRuddPM rather than
just his name. Unsurprisingly, the significant level of incoming @replies does not result in
any responses from the Prime Ministerial account; with node colour indicating the amount of
@replies sent, @KevinRuddPM remains a pale yellow, indicating no activity. Conversely, there
are also a number of very highly active senders of @replies (shown in red), who do not
necessarily also receive a significant number of answers to their messages. Most
interestingly, perhaps, it is possible to identify a handful of participants who are notable
both as senders and as recipients of @replies (acting as central hubs in the network); most
notably, these include journalists @latikambourke and @renailemay (indicating, incidentally,
the continued importance of mainstream media sources even in social media environments).
In the analysis, the authors are merely interested in standard @replies rather than retweets.
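The authors processed their data with Gawk before loading it into Gephi. The Python sketch below only illustrates the underlying idea of counting @replies sent and received while ignoring retweets. The handles and tweet texts are invented, and treating any tweet that starts with "RT " as a retweet is a simplifying assumption.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def reply_counts(tweets):
    """Count @replies sent and received per handle, skipping retweets.

    tweets is an iterable of (sender, text) pairs.
    """
    sent, received = Counter(), Counter()
    for sender, text in tweets:
        if text.startswith("RT "):
            continue  # assumption: old-style retweets start with "RT "
        for handle in MENTION.findall(text):
            target = handle.lower()
            if target != sender.lower():
                sent[sender.lower()] += 1
                received[target] += 1
    return sent, received


# Hypothetical tweets using invented handles.
tweets = [
    ("journo_a", "@pm_account any comment on the vote?"),
    ("journo_b", "@pm_account @journo_a thoughts on this?"),
    ("journo_a", "@journo_b agreed"),
    ("fan", "RT @journo_a: breaking news"),
]
sent, received = reply_counts(tweets)
print(sent)
print(received)
```

In this toy data pm_account receives @replies but sends none, while the two journalist accounts both send and receive, mirroring the roles described in the findings. The (sender, target) pairs could equally be written out as an edge list for Gephi.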
Conclusion
The dynamic network visualizations which our approach enables us to generate are
significant in their own right; depending on the nature of the #hashtag data to be visualized,
and on the period for which data are available, they enable us to highlight the shifting roles
played by individual participants over time, as well as the response of the overall #hashtag
community to new stimuli – such as the entry of new participants or the availability of new
information. Additionally, this sort of analysis allows us to identify the kind of
handles that need to be engaged in order to make a campaign go viral, especially in the case
of a political campaign (this comment is restricted to the example analysed). However,
similar analysis can be performed for other situations which require a large mass movement,
such as a change in certain legislation.
(9) Capitalistic_Network
Introduction
This use case covers 147 firms from around the world that dominate the world economy. These
firms belong to various sectors, but the majority of them are in finance or real estate.
Together they control 40 percent of the total wealth of the world’s transnational companies.
Data Collection/Findings
Ilka Kopplin and the weekly newspaper DIE ZEIT have researched the data for the use case.
This data has been sourced from James Glattfelder, Stefania Vitali, Stefano Battiston and
Chair of Systems Design, ETH Zurich.
Visualisation of the data shows the different firms, denoted by nodes of different colours,
while the edges show the links between those firms and their interests in each other, that
is, how they are connected to one another. The nodes are of different sizes, which depict the
performance of each company: the bigger the node, the more profitable the company.
Conclusion
This visualisation helps us to clearly understand the role that giant firms play in the
world economy and how they shape and influence it. It also helps us to follow the ties a firm
has with other firms, to understand the relationships it shares with them, including some
firms of minority interest, and to see how these relationships help in building the world
economy.
(10) SMEs' Use of Data Spatialization
►SMEs' Innovation and Export Capabilities: Identification and Characterization of a
Common Space Using Data Spatialization
Introduction
Small and Medium Enterprises (SMEs) have limited resources and find it difficult to
mobilize what they need to develop innovations and to succeed in international markets.
It therefore becomes imperative to identify a common space of capabilities that an SME
has to mobilize as a priority in order to create value simultaneously in terms of
innovation and export.
Theoretical foundation
The study of the link between innovation and export in the context of SMEs represents a very
important research area in the current scientific literature (Love and Roper, 2015). This
paradigm is supported by two theories: self-selection (Boso et al., 2013; Monreal-Pérez et al.,
2012; Raymond and St-Pierre, 2013) and learning-by-exporting (Golovko and Valentini,
2014; Kafouros et al., 2008). These theories demonstrate respectively that innovation has a
positive impact on export and vice versa. The most widely accepted theory seems to be
self-selection, according to which innovation can be considered a necessary but not
sufficient condition for export.
Few documents concern SMEs in particular but some make comparisons between SMEs and
large companies. Certain authors validate the self-selection theory, according to which
innovation has a positive impact on the international performances of companies. Others
support the learning-by-exporting theory, which considers that the knowledge and the
acquired experiences on the international markets improve the innovation capability of
companies. And finally, certain studies consider that innovation and export have a mutual
positive impact, in the form of a virtuous circle. The empirical validation of these theories
is mainly made by analyzing data from existing surveys (Spanish Business Strategy Survey,
SBSS; Product Development Survey, PDS) (Love and Roper, 2001).
The human aspects play an extremely important role in innovation activities (Rodríguez and
Hechanova, 2014) as well as in export activities (Alaoui and Makrini, 2014). This is
especially the case in the context of SMEs, because the manager is generally omnipresent and
sometimes the only decision-maker (Child and Hsieh, 2014).
Data collection (source of data)
The presence or absence of these capabilities within SMEs is significant for analyzing their
global capability to innovate and to export. The capabilities were identified in the literature,
in terms of innovation and export respectively. They were then validated through a series of
interviews with business managers and experts in the domain followed by a similarity
analysis highlighting the joint capabilities between innovation and export.
Data spatialization was used because the results of the similarity analysis were difficult to
exploit, given the large amount of data they represent. It was used to represent visually
the existing similarities between the joint capabilities which were identified.
Methodology
Gephi offers several force-based layout algorithms; it was therefore necessary to select the
algorithm most suited to the study.
Table 1: Overview of the force-based algorithms available on Gephi software, inspired
by Jacomy et al. (2014)
After some comparisons, the ForceAtlas2 algorithm was chosen as it seemed better suited to
the visualization of clusters, which is relevant to this study. This data visualization
methodology also has the particular advantage of showing each datum only once, which
facilitates the interpretation of the results.
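To give a feel for why a force-based layout pulls clusters apart, the sketch below implements a heavily simplified force-directed loop. It borrows only the two force shapes described for ForceAtlas2 by Jacomy et al. (2014): repulsion between every pair of nodes scaled by (degree + 1) of each node and divided by their distance, and attraction along each edge proportional to its length. Gravity, adaptive speed and the Barnes-Hut approximation are left out, so this is an illustration under stated assumptions and not Gephi's implementation. The function name, parameters and toy graph are hypothetical.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, step=0.05, max_move=0.5, seed=0):
    """Simplified force-directed layout; returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair: (deg_a + 1) * (deg_b + 1) / distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = (degree[a] + 1) * (degree[b] + 1) / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction along each edge, proportional to the edge length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += dx
            force[a][1] += dy
            force[b][0] -= dx
            force[b][1] -= dy
        # Move each node along its net force, capping the displacement.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > 0.0:
                move = min(step * mag, max_move)
                pos[n][0] += fx / mag * move
                pos[n][1] += fy / mag * move
    return {n: tuple(p) for n, p in pos.items()}


# Toy graph: two triangles joined by a single bridge edge (c, d).
layout_nodes = ["a", "b", "c", "d", "e", "f"]
layout_edges = [("a", "b"), ("b", "c"), ("a", "c"),
                ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
positions = force_layout(layout_nodes, layout_edges)
for name, (x, y) in positions.items():
    print(name, round(x, 2), round(y, 2))
```

On a graph like this, nodes of the same triangle should end up closer to one another than to nodes of the other triangle, which is the visual cue used to read clusters in the layout.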
Findings
The findings of the data spatialization are presented in Figure 1. Gephi identified nine groups
separated by "modularity class." The capabilities in the same "modularity class" can be
considered as a dimension common to both innovation and export activities.
Figure 1: Data Spatialization: clustering of similar capabilities
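The "modularity class" grouping rests on a modularity score, which rewards partitions whose groups contain more internal edges than would be expected by chance. The sketch below computes that score for a given partition of an undirected, unweighted graph without self-loops. It is an illustration on an invented toy graph: Gephi's Modularity statistic searches for a high-scoring partition (using the Louvain method), whereas this function only evaluates a partition that is supplied to it.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (node, node) pairs; community: {node: group label}.
    Q = sum over groups of (internal edges / m) - (degree sum / 2m) ** 2.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the group
    degree_sum = defaultdict(int)  # total degree of the group's nodes
    for a, b in edges:
        degree_sum[community[a]] += 1
        degree_sum[community[b]] += 1
        if community[a] == community[b]:
            internal[community[a]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph: two triangles joined by one bridge edge (invented data).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}

print(modularity(toy_edges, two_groups))
print(modularity(toy_edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the property a modularity-based grouping exploits.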
The similarity analysis relying on the frame of reference describing separately the innovation
and export capabilities highlighted the pairs of practices which can be considered as similar,
and the intensity of this similarity.
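One way to picture how pairwise similarities become a network is to treat each sufficiently similar pair as a weighted edge. The sketch below does this for a small symmetric similarity matrix. The capability labels, scores and threshold are invented for illustration, and the thresholding rule is an assumption rather than the procedure used in the study.

```python
def similarity_edges(labels, matrix, threshold):
    """Return (label_i, label_j, score) for each pair scoring >= threshold.

    matrix is assumed symmetric, so only the upper triangle is scanned.
    """
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if matrix[i][j] >= threshold:
                edges.append((labels[i], labels[j], matrix[i][j]))
    return edges


# Hypothetical capabilities and similarity scores (not taken from the study).
capabilities = ["technology watch", "market watch", "project planning", "budgeting"]
scores = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.7],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.7, 0.6, 1.0],
]

for source, target, weight in similarity_edges(capabilities, scores, 0.6):
    print(source, target, weight)
```

The resulting triples can serve as an edge list in which the score plays the role of the edge weight, that is, the intensity of the similarity.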
Using Gephi with the ForceAtlas2 algorithm reveals clusters of capabilities which can be
considered as similar (Table 5).
Table 5: Interpretation of the data spatialization results
The common dimensions identified concern the acquisition and the capitalization of
information (Groups 2 and 5), the management of internal and external skills (Groups 1 and
4), the management of projects and resources (Group 7), and the strategy (Group 6) based on
data spatialization methodology. Two additional groups appear: the management of
intellectual property and the diffusion of the corporate culture (human and cultural aspects).
This analysis provides a considerable degree of precision in characterizing the common
space between innovation and export capabilities.
Conclusion
This analysis has led to the characterization of a common space between innovation and
export, composed of several dimensions including capabilities associated with both activities.
Therefore, innovation and export must be considered as two complementary activities,
integrating an interface representing the capabilities which an SME has to mobilize as a
priority to create value simultaneously in terms of innovation and export. These capabilities
will further allow for the mobilization of common resources, common skills and common
knowledge, and make it possible to minimize the effort associated with creating a virtuous
circle of innovation and export, supported by a value-creating common interface.
References
Bruns, A. (2012). How Long Is a Tweet? Mapping Dynamic Conversation Networks on
Twitter Using Gawk and Gephi. Information, Communication & Society, 15(9), 1323–
1351. https://doi.org/10.1080/1369118X.2011.635214
Capitalist_Network — Information is Beautiful Awards. (n.d.). Retrieved September 30,
2017, from https://www.informationisbeautifulawards.com/showcase/104-
capitalist_network
Clamping down on review fraud. (2015, April 15). Retrieved September 30, 2017, from
https://cambridge-intelligence.com/clamping-down-on-review-fraud/
Drug Marketers Use Social Network Diagrams to Help Locate Influential Doctors - Graphic -
NYTimes.com. (n.d.). Retrieved September 30, 2017, from
http://www.nytimes.com/interactive/2013/05/16/business/PHARMA.html
Enjolras, M., Camargo, M., & Schmitt, C. (2016). SMEs’ Innovation and Export Capabilities:
Identification and Characterization of a Common Space Using Data Spatialization.
Journal of Technology Management & Innovation, 11(2), 56–69.
https://doi.org/10.4067/S0718-27242016000200006
Fathom | Trends in Civic Tech. (n.d.). Retrieved September 30, 2017, from
https://fathom.info/civictech
FINAL_Social_Network_Analysis_as_a_Tool_for_Understanding_the_Workforce4.pdf.
(n.d.). Retrieved from
https://www.nswpb.ca/application/files/4414/6427/7989/FINAL_Social_Network_Analysis_as_a_Tool_for_Understanding_the_Workforce4.pdf
Heymann, S., & Grand, B. L. (2013). Visual Analysis of Complex Networks for Business
Intelligence with Gephi. In 2013 17th International Conference on Information
Visualisation (pp. 307–312). https://doi.org/10.1109/IV.2013.39
Local Employment Planning Council. (n.d.). Retrieved September 30, 2017, from
http://www.localemploymentplanning.ca/about
Social Graph: Network Graph and Interest Graph - Servicios de Consultoría Social Commerce.
(n.d.). Retrieved September 30, 2017, from
http://www.socialtocommerce.com/2523/social-network-data-interest-graph-data-will-power-next-generation-shopping/
SurveyMonkey - Free online survey software and questionnaire tool. (n.d.). Retrieved
September 30, 2017, from https://www.surveymonkey.com/