SlideShare una empresa de Scribd logo
1 de 8
Descargar para leer sin conexión
Topic-Based Social Network Analysis for Virtual
                         Communities of Interests in the Dark Web

                                                   Gaston L’Huillier                            Hector Alvarez
                                                    University of Chile                        University of Chile
                                                     Republica 701                              Republica 701
                                                     Santiago, Chile                            Santiago, Chile
                                               glhuilli@dcc.uchile.cl                     halvarez@ing.uchile.cl
                                                Sebastián A. Ríos                             Felipe Aguilera
                                                    University of Chile                        University of Chile
                                                     Republica 701                           Blanco Encalada 2120
                                                     Santiago, Chile                            Santiago, Chile
                                                  srios@dii.uchile.cl                      faguiler@dcc.uchile.cl

      ABSTRACT                                                                            1. INTRODUCTION
      The study of extremist groups and their interaction is a
      crucial task in order to maintain homeland security and                             “The process of radicalization continues in a hostile
      peace. Tools such as social networks analysis and text min-                         physical environment, but it is enabled by the Internet,
      ing have contributed to their understanding in order to de-                         resulting in a disconnected, decentralized social structure”.
      velop counter-terrorism applications. This work addresses                           – Marc Sageman [20]
      the topic-based community key-members extraction problem,                              Today, it is no mystery that terrorists and cyber-criminals
      for which our method combines both text mining and so-                              are looking for uncovered means to seek peers with the same
      cial network analysis techniques. This is achieved by first                          interests such as internet forums, blogs, and social networks,
      applying latent Dirichlet allocation to build two topic-based                       where they can share and comment their feelings and inter-
      social networks in online forums: one social network oriented                       ests with other that support their cause. In this sense, Dark
      towards the thread creator point-of-view, and the other is                          Web1 has brought endless potential to achieve the coordi-
      oriented towards the repliers of the overall forum. Then,                           nation, propaganda delivery, and other unwanted interac-
      by using different network analysis measures, topic-based                            tions between extremists groups. However, web information
      key members are evaluated using as benchmark a social net-                          can be retrieved for further analysis. Many researchers have
      work built a plain representation of the network of posts.                          been wondering whether the leaders of the Dark Web can
      Experiments were successfully performed using an English                            be identified by their interaction with other members.
      language based forum available in the Dark Web portal.                                 The Dark Web is conformed by Virtual Communities of
                                                                                          Interests (VCoI) [9, 16] are communities which members are
      Categories and Subject Descriptors                                                  reunited by their shared interests in a topic, like fan clubs
                                                                                          of artists. Therefore, a key aspects when studying VCoI is
      H.2.8 [Database Management]: Database Applications—                                 to achieve a complete understanding on what are the main
      Data Mining; K.4.1 [Computing Milieux]: Computers                                   interests of the community. Only then, it will be possible to
      and Society—Public Policy Issues                                                    obtain a better insight of community’s social aspects, leading
                                                                                          to a better identification of the key members (i.e. Opinion
      General Terms                                                                       Leaders), density reduction, among other benefits.
                                                                                             Different approaches have been previously developed for
      Dark Web, Social Network Analysis, Text Mining                                      Social Networks, Virtual Communities [11], and Virtual Com-
                                                                                          munities of Practices [4, 17]. To the best of our knowledge,
      Keywords                                                                            none Latent Semantic Analysis (LSA) for enhancing the So-
                                                                                          cial Network Analysis (SNA) has ever been developed in
      Terrorism Informatics, Terrorism knowledge portals, Latent
                                                                                          Dark Web portals before. However, we think the applica-
      Dirichlet Allocation, Virtual Communities of Interest
                                                                                          tion of these techniques may enhance greatly SNA results.
                                                                                          This work’s main contribution is the hybrid approach us-
                                                                                          ing topic-models and SNA for enhancing the extraction of
                                                                                          key-members in a VCoI.
      Permission to make digital or hard copies of all or part of this work for
      personal or classroom use is granted without fee provided that copies are              This paper is structured as follows: In section 2, previous
      not made or distributed for profit or commercial advantage and that copies           work on SNA and text mining for key members identification
      bear this notice and the full citation on the first page. To copy otherwise, to      is presented, as well as previous work on the Dark Web fo-
      republish, to post on servers or to redistribute to lists, requires prior specific   rums analysis. Then, in section 3 the proposed methodology
      permission and/or a fee.                                                            1
      ISI-KDD 2010 , July 25, 2010, Washington, D.C., USA.                                  Internet-based forums or platforms for terrorists and cyber-
      Copyright 2010 ACM ISBN 978-1-4503-0223-4/10/07 ...$10.00.                          criminals.




SIGKDD Explorations                                                     Volume 12, Issue 2                                                          Page 66
to solve the topic-based community key members extraction        [29], aims towards the capture of domestic Web forums and
      problem and main contribution of this work is described. In      the creation of a social network mapping to identify their
      section 4 the experimental setup is detailed, and its results    structure and cluster affinities. Finally, in [28] a complete
      are discussed. Finally in section 5, the main conclusions of     framework on how to build a portal of Dark Web forums is
      this paper and future work are presented.                        presented, where data collection and integration, as well as
                                                                       its visualization and open access, are the main contributions
      2. PREVIOUS WORK                                                 of the authors.
                                                                          Previous work on key-members identification on the Dark
        Since Dark Web is a VCoI, there are different goals as-
                                                                       Web [24], describes how several centrality measures can be
      sociated to their members’ objectives [7]. In this case, the
                                                                       used to identify different key-members of a social network.
      support of the community with an online forum where its
                                                                       Here, the degree, betweenness, and closeness measures [6]
      anonymity, ubiquitous, and free-of-speech makes the perfect
                                                                       have been described as tools to characterize key-members of
      environment to share fundamentalism and terrorism propa-
                                                                       a given social network. It is important to highlight that this
      ganda. Therefore, a method to recognize the underlying
                                                                       approach has been traditionally used in previous work.
      members’ objectives is needed in order to identify threats or
                                                                          Latent semantic analysis has been previously applied in
      security issues. Topic-based SNA is a way to track, in some
                                                                       different Dark Web applications, such as [5] where Latent
      degree, topics on members’ thread, which can be related to
                                                                       Semantic Indexing was used to link nodes to certain topics
      members’ goals.
                                                                       in the network construction. Furthermore, in [26], an latent
        In the following, previous work on topic-based SNA is pre-
                                                                       Dirichlet allocation (LDA) based web crawling framework is
      sented, as well as related work on Dark Web Social Network
                                                                       proposed to discover different topics from Dark Web forum
      Analysis applications.
                                                                       cites, but not further social network analysis applications
      2.1 Topic-Based Social Network Analysis                          with these findings. Thus, leaving without measuring many
                                                                       social aspects from the Dark Web social structure.
        SNA [22] helps to understand relationships in a given com-
                                                                          In terms of text mining in Dark Web forums [2], appli-
      munity analyzing its graph representation. Users are seen as
                                                                       cations have been oriented to model assessment and feature
      nodes and relations among users are seen as arcs. This way,
                                                                       selection in order to improve the classification of messages
      several techniques have been proposed to extract key mem-
                                                                       that contains sensitive information on extremists’ opinions
      bers [12], classify users according his relevance within the
                                                                       and sentiments. Also, authorship analysis [1] has been previ-
      community [14], discovering and describing resulting sub-
                                                                       ously applied in order to analyze the groups’ authorship ten-
      communities [10], amongst other applications. However, all
                                                                       dencies, and more important, tackling the anonymity prob-
      these approaches leave aside the meaning of relationships
                                                                       lem associated to this type of virtual communities.
      among users. Therefore, analysis based only on reply of
      mails or posts to measure relationships’ strongness or weak-
      ness it is not a good indicator.                                 3. PROPOSED METHOD
        McCallum et al. in [11] described how to determine roles          The main question of this work is how to enhance the
      and topics in a text-based social networks by building Author-   key-members discovery based on a topic-based social net-
      Recipient-Topic (ART) and Role-Author-Recipient-Topic            work structure. The first step is to obtain a reduced/filtered
      (RART) models. Furthermore, in Pathak et al. [13], a com-        representation of the inner social community. However, this
      munity based topic-Model integrated social network anal-         representation must be created in such way that informa-
      ysis technique (Community-Author-Recipient-Topic model           tion contained is better to discover key-members (aligned to
      or CART) is proposed to extract communities from a emails        community’s goals and members who produce interaction).
      corpus based on the topics covered by different members of        The second step is to apply a core members’ algorithm to
      the overall network. These approaches novelty is the use of      obtain the key-members of a virtual community based on
      data mining on text from the social network to perform SNA       their respective topics of interest. As a result, we obtain the
      to study Roles or Sub-groups extraction.                         rank of the complete community members where for each
                                                                       community interesting topic, we are able to reveal members
      2.2 Social Network Analysis on the Dark Web                      that can be considered as experts or key on that topics.
         As described by Xu & Chen in [25], the topology of dark          We must mention the difference between a key-member
      networks share different properties with other types of net-      and a highly-radical member, since key members have sev-
      works, where small-world structures are determined by the        eral characteristics that define them. Firstly, a key member
      information flow properties, characterized by short average       may be highly-radical member in a given topic or not. Sec-
      path and a high clustering coefficient. In this sense, so-         ondly, he/she may increase the interaction in the community
      cial network analysis tools used in other virtual communi-       because he/she points out interesting messages, which pro-
      ties applications [19], could be useful to analyze this kind     duce replies from different level of members. We will focus
      of structures. They combine concepts obtained from com-          on key-members of any kind, afterwards, depending on the
      munities’ administrators into a concept-based text mining,       topic it would be possible to classify him/her as a radical
      which allow to obtain interesting results in terms of pur-       member or not. Of course, it is possible to automate the
      pose accomplishment in a Virtual Community of Practice           process of radical member discovery, however, more work
      (VCoP).                                                          is needed to avoid subjectivity. Furthermore, accusing or
         In Reid et al. [18], Web forums where analyzed in or-         pointing such label on a person is not a simple matter and
      der to determine whether a given community has been in-          implications are far beyond this work.
      volved with terrorist presence by using automated and semi-         We defined a key-member as a person totally aligned with
      automated procedures to gather information and analyze it.       the VCoIs’ goals and topics. Thus, producing contents which
      Other applications, such as the proposed by Zhou et al. in       are very relevant to satisfy other members’ interests. The




SIGKDD Explorations                                     Volume 12, Issue 2                                                       Page 67
only way to measure a key-member as defined, is using an hy-                   where p(zs |θ) can be represented by the random variable
                                                                                                                                     i
      brid approach of SNA combined with latent semantic-based                  θi , such that topic zs is presented in document i (zs = 1).
      text mining. This way, we can discover threatening topics,                A final expression can be deduced by integrating equation 2
      and perform specific analysis on each one.                                 over the random variable θ and summing over topics z ∈ T .
                                                                                Given this, the marginal distribution of a message can be
      3.1 Basic Notation                                                        defined as follows:
         Let us introduce some concepts. In the following, let V
      a vector of words that defines the vocabulary to be used.                                 Z            S
                                                                                                            Y X
                                                                                                                                          !
      We will refer to a word w, as a basic unit of discrete data,                                                                s
                                                                                 p(w|α, β) =       p(θ|α)               p(zs |θ)p(w |zs , β) dθ (3)
      indexed by {1, ..., |V|}. A post message is a sequence of S                                           s=1 zs ∈T
      words defined by w = (w1 , ..., wS ), where ws represents the
      sth word in the message. Finally, a corpus is defined by a                   The final goal of LDA is to estimate previously described
      collection of P post messages denoted by C = (w1 , ..., w|P| ).           distributions to build a generative model for a given cor-
         A vectorial representation of the posts corpus is given by             pus of messages. There are several methods developed for
      TF-IDF= (mij ), i ∈ {1, . . . , |V|} and j ∈ {1, . . . , |P|} , where     making inference over these probability distributions such as
      mij is the weight associated to whether a given word is more              variational expectation-maximization [3], a variational dis-
      important than another one in a document. The mij weights                 crete approximation of equation 3 empirically used by [23],
      considered in this research is defined as an improvement of                and by a Gibbs sampling Markov chain Monte Carlo model
      the tf-idf term [21] (term frequency times inverse document               efficiently implemented and applied by [15].
      frequency), defined by
                                                                                3.3 Network Configuration
                                                      „         «                  To build the social network, the members’ interaction
                               nij                        |C|
                       mij = P|V|             × log                      (1)    must be taken into consideration. In general, members’ ac-
                                        nkj               ni                    tivity is followed according to its participation on the forum.
                                  k=1
                                                                                Likewise, participation appears when a member post in the
         where nij is the frequency of the ith word in the j th doc-
                                                                                community. Because the activity of the VCoI is described
      ument and ni is the number of documents containing word
                                                                                according members’ participation, the network will be con-
      i. The tf-idf term is a weighted representation of the im-
                                                                                figured according to the following: Nodes will be the VCoI
      portance of a given word in a document that belongs to a
                                                                                members, and arcs will represent interaction between them.
      collection of documents. The term frequency (TF) indicates
                                                                                How to link the members and how to measure their interac-
      the weight of each word in a document, while the inverse
                                                                                tions to complete the network is our main concern.
      document frequency (IDF) states whether the word is fre-
                                                                                   In this work, will be describe two VCoIs’ network repre-
      quent or uncommon in the document, setting a lower or
                                                                                sentation according the following replying schema of mem-
      higher weight respectively.
                                                                                bers:
      3.2 Topic Modeling
                                                                                  1. Creator-oriented Network: When a member cre-
         A topic model can be considered as a probabilistic model                    ate a thread, every reply will be related to him/her.
      that relates documents and words through variables which                       This network representation is the less dense network
      represent the main topics inferred from the text itself. In                    (density is measured in terms of the number of arcs
      this context, a document can be considered as a mixture of                     that the network have).
      topics, represented by probability distributions which can
      generate the words in a document given these topics. The                    2. Last Reply-oriented Network: Every reply of a
      inferring process of the latent variables, or topics, is the key               thread will be a response of the last post. This network
      component of this model, whose main objective is to learn                      representation has a middle density.
      from text data the distribution of the underlying topics in a
      given corpus of text documents.                                              In figure 1 the latter two approaches of network conver-
         A main topic model is the Latent Dirichlet Allocation                  sion of the forum is presented. In figure 1, arcs represents
      (LDA) [3]. LDA is a Bayesian model where latent topics                    members’ replies and nodes represent the users who made
      of documents are inferred from estimated probability dis-                 the posts. In our first approach, the weight of arcs will be a
      tributions over the training dataset. The key idea of LDA,                counter of how many times a given member replies to other
      is that every topic is modeled as a probability distribution              one.
      over the set of words represented by the vocabulary (w ∈ V),
      and every document as a probability distribution over a set
      of topics (T ). These distributions are sampled from multi-
      nomial Dirichlet distributions.
         For LDA, given the smoothing parameters β and α, and
      a joint distribution of a topic mixture θ, the idea is to de-
      termine the probability distribution to generate from a set
      of topics T , a message composed by a set of S words w
      (w = (w1 , ..., wS )),

                                          S
                                          Y                                     Figure 1: Two different network models to represent
              p(θ, z, w|α, β) = p(θ|α)          p(zs |θ)p(ws |zs , β)    (2)
                                                                                a given forum interaction.
                                          s=1




SIGKDD Explorations                                                 Volume 12, Issue 2                                                        Page 68
In order to consider the reply of members according to                 Algorithm 1 Initialize Semantic Weights Matrix
      the community purpose (for any of these configurations),                   Input: V (Vocabulary)
      and to filter noisy posts, a topic-based message reduction is              Input: P (Posts)
      performed. According to previous network configurations,                   Input: k (Number of Topics)
      the topic-based filtering method is used in order to remove                Output: Semantic Weights Matrix SWM[|P|, k]
      all replies that are not according to the posts’ topic. Here,              1: TF-IDF[|P|, |V|] (Eq. 1)
      by using LDA, a list of keywords for each topic is deter-                  2: SM[k V] ← Build SM (semantic matrix) according to Top-
      mined, and later, used to build the final version of the social                ics (Sec. 3.2)
      network.                                                                   3: SWM[|P|, k] ← TF-IDF × SMT
      3.4 Topic-based Network Filtering
         Previous work [19] brings a method to evaluate commu-                  Algorithm 2 Creator-oriented Network
      nity goals accomplishment. In this work we will use this                  Input: P (Posts)
      approach to classify the members’ posts according VCoIs’                  Output: Network Gc = (N , A)
      goals. These goals are defined as a set of terms, which are                 1: Build SWM according to Algorithm 1
      composed by a set of keywords or statements in natural lan-                2: Initialize N = {}, A = {}
      guage.                                                                     3: for each i ∈ P do
         The idea is to compare with euclidean distance two mem-                 4:   N ←N ∪i
      bers’ posts. If the distance is over a certain threshold θ, an             5: end for
      interaction will be considered between them. We support                    6: for each i ∈ P.creator do
      the idea that this will help us to avoid irrelevant interac-               7:   for each j ∈ {i.replies}, i = j do
      tions. For example, in a VCoI with k goals (or topics), let                8:      if dm (Pi , Pj ) ≥ θ then
      T Bj a post of user j that is a reply to post of user i (T Bi ).           9:         ai,j ← ai,j + 1
      The distance between them will be calculated with equa-                   10:         A ← A ∪ ai,j
      tion 4.                                                                   11:      end if
                                                                                12:    end for
                                        P                                       13: end for
                                            gik gjk
                   dm (T Bi , T Bj ) = qP k P                            (4)
                                            2        2
                                         k gik    k gjk
                                                                                post username i, the arc weight ai,j is increased according
         Where gik is the score of topic k in post of user i. It is             to its number of all repliers j whose messages’ distance are
      clear that the distance exists only if T Bj is a reply to T Bi .          greater or equal than a threshold θ.
      After that, the weight of arc ai,j is calculated according to
      equation 5.                                                               Algorithm 3 Reply-oriented Network
                                   X                                            Input: {V, P, k}
                    ai,j =                        d(T Bi , T Bj )        (5)    Output: Network Gc = (N , A)
                                    i,j                                          1: . . . (as presented in algorithm 2)
                             dm (T Bi ,T Bj )≥θ
                                                                                 2: for each i ∈ P do
        We used this weight in both configurations previously de-                 3:     for each j ∈ {i.reply}, i = j do
      scribed (Creator-oriented and Last Reply-oriented). After-                 4:        if dm (Pi , Pj ) ≥ θ then
      wards, we applied HITS [8] to find the key members on the                   5:          ai,j ← ai,j + 1
      different network configurations.                                            6:          A ← A ∪ ai,j
                                                                                 7:        end if
      3.5 Network Construction                                                   8:     end for
        First, as a baseline proceedure, Algorithm 1 shows how                   9: end for
      to determine the Semantic Weights Matrix, that assigns an
      score to every post according to all topics considered for the
      network construction. In this case, the algorithm intially
      determines the TF-IDF matrix according to Equation 1, the
                                                                                4. EXPERIMENTAL SETUP AND RESULTS
      a semantic matrix SM according to topic-based text mining                    The proposed methodology was applied in the Ansar1 En-
      described in Section 3.2 and Section 3.1 respectively. Finally,           glish language based forum, available on the Dark Web por-
      the matrix multiplication between TF-IDF and SM defines the                tal2 , for which examples of the topic extraction methodol-
      SWM matrix.                                                               ogy, network construction, and key-members using the HITS
        Algorithm 2 presents the pseudo-code on the graph Gc =                  algorithm were determined.
      (N , A) is build by using the Creator-oriented network. First,               In the following, an analysis of topics extracted using
      the SWM matrix is build according to Algorithm 1. Then,                   LDA (described in Section 3.2) is presented. Then, the net-
      considering all of the posts P, the network is built following            work topology construction by using both Reply-oriented
      the structure presented in Figure 1 (a). This is, for each post           and Creator-oriented structures for the whole period is de-
      creator i, the arc weight ai,j is increased according to the              scribed. Finally, key-members where determined using HITS
      number of repliers j, that their messages’ distance greater               (both authority and hub scores), whose results where com-
      or equal than the threshold θ.                                            pared against different topologies of the social network.
        Algorithm 3, as well as the latter, defines in the first place            2
                                                                                 http://128.196.40.222:8080/CRI_Indexed_new/login.jsp   [last
      the SWM matrix. Then, according to in Figure 1 (b), for each              accessed 21-10-2010]




SIGKDD Explorations                                                 Volume 12, Issue 2                                                  Page 69
4.1 Dark Web evaluation results and discus-
          sions
        In this section, results obtained for the Dark Web Ansar1
      forum are presented and discussed accordingly. Firstly, the
      definition of topics using graphical models are presented as
      well as brief analysis on main extracted topics. Secondly,
      different network representations are displayed as well as
      benchmark network evaluations. Finally, HITS algorithm
      results for authority and hub scores and discussion are pre-
      sented.

      4.1.1    Topic Extraction
        There are 14 months (Dec. 2008 - Jan. 2010) of data             Figure 2: Social network visualization (a) without
      available. Posts were created by 376 members and extracted        topic-based filtering and (b) topic-based filtering,
      topics where realized over 29.057 posts P and 103.791 words       and (c) topic-based filtering for “Local Conflict” (us-
      in the vocabulary V by using a C++ Gibbs sampling-based           ing creator network representation)
      implementation of LDA3 previously described in Section 3.2.
        Topics extracted where evaluated according to the com-
      pactness of each topic, by the average shortest distance (ASD)    use HITS to analyze the information on this graph, the re-
      among the top 30 words in the topic [27]. The ASD between         sults are that key-members correspond to the three admin-
      two keywords wi and wj is determined by,                          istrators of the community which can also be seen in the
                                                                        graph. Therefore, the huge amount of posts from these mem-
                                                                        bers makes all other interactions look rather small, making
                           max{log(Ni ), log(Nj )} − log(Ni,j )
         ASD(wi , wj ) =                                          (6)   useless the traditional way for key-members detecton. We
                           log(|P| − min{log(Ni ), log(Nj )})           can correct this situation by using eather concept-based or
         where Ni is the number of posts that contains wi , Ni,j is     Topic-based filtering.
      the number of posts that contain both of them, and |P| is            In Figure 2 (b) and (c) we can observe that previously
      the total number of posts in the corpus. By using equation 6      stated three centers are gone, and it is much more diffuse
      [27], a representative number of topics was determined, by        to observe centers of interaction from the graphs. However,
      evaluating the ASD number for k ∈ [10, 50], where 47 was          making an analysis to discover which are the key members
      the number of topics with lowest ASD. Then, topics were           now is faster and provide better results. In fact, when ap-
      manually grouped into concepts categories, used for social        pliying HITS, it is possible to figure out other members,
      network analysis.                                                 rather than administrators, as key-members of the commu-
         In Table 1, the overall concept proposed is “Local Con-        nity. The comparison may be seen in Table 4.
      flicts” as for its main topics are related to terrorist and           More specifically, for “Local Conflicts” topics, in Figure 3
      counter-terrorism activities over different localities, such as    and Figure 4, the visualization for both Reply-oriented and
      Russia, Irak, Somalia, India, and Afghanistan.                    Creator-oriented social networks are presented respectively
         Table 2 is “War on Terror”, as its main topics are related     with their top 8 users using HITS hub score.
      to U.S. activities and relations towards terrorism activities,
      where the proposed topic names are “War”, “American Sol-
      dier”, “Imprisonment”, “Obama”, and “Military Operations”.
         Finally, as shown in Table 3, the overall concept proposed
      is “Recruiting” where topics such as “Religious Propaganda”,
      “Religious Conflicts”, “Ansar ”, “Family”, and “Ideology” are
      included.

      4.1.2    Topic-Based Social Network Visualization
         In Figure 2 (a) the whole corpus social network is pre-
      sented without any filtering methods. This can be inter-
      preted as the complete network built by using the complete
      repliers structure, where the edge ai,j between user i and j
      is defined for every interaction between users. Then in Fig-
      ure 2 (b) and (c) we present the same graphs but filtered
      using topic-based methods. In this case, it is possible to
      see a great density reduction. This suggests that visaliza-       Figure 3: Visualization of the complete “local con-
      tion techniques, such as graphs, can provide better visual        flict” network and their top 8 users in the Reply-
      information to analysts. Other great benefit, is that, since       oriented network for the complete period of the fo-
      we have lesser dense graph, SNA algorithms, such as HITS,         rum activity, using the hub score from HITS.
      PageRank, among others, run in shorter times.
         Furthermore, in Figure 2 (a), it is possible to observe very     In Figure 3 (a) all interactions presented for the “Local
      clearly three centers of interaction (or participation). If we    Conflicts” topic for Reply-oriented network configuration are
      3                                                                 depicted, where there is no relevant information that could
        http://gibbslda.sourceforge.net/ [Last accessed 21-10-
      2010]                                                             be infered or deduced directly. However, by listing top key-




SIGKDD Explorations                                       Volume 12, Issue 2                                                    Page 70
Table 1: Ten most relevant words with their respective conditional probabilities for five topics associated
      between them to the “Local Conflicts” concept from the Ansar1 forum.
                  Topic 1               Topic 8            Topic 13              Topic 17                Topic 19
                  “Eastern Europe”      “Irak Conflicts”    “Somalia Conflicts”    “India Conflicts”        “Afghanistan Conflicts”
                  russian (0.0289)      iraq (0.0885)      somalia (0.0517)      india (0.0334)          afghanistan (0.1023)
                  russia (0.0272)       iraqi (0.0566)     somali (0.0302)       indian (0.0274)         afghan (0.0760)
                  chechnya (0.0211)     baghdad (0.0457)   govern (0.0269)       kashmir (0.0224)        taliban (0.0676)
                  caucasus (0.0180)     mosul (0.0197)     mogadishu (0.0217)    pakistan (0.0180)       nato (0.0327)
                  kill (0.0145)         sunni (0.0151)     shabaab (0.0164)      isi (0.0147)            troop (0.0321)
                  chechen (0.0137)      citi (0.0143)      fight (0.0158)         mumbai (0.0121)         forc (0.0241)
                  oper (0.0127)         forc (0.0125)      islamist (0.0152)     attack (0.0106)         kabul (0.0197)
                  ingushetia (0.0104)   sourc (0.0115)     islam (0.0134)        let (0.0095)            insurg (0.0145)
                  north (0.0096)        aswat (0.0115)     shabab (0.0124)       arrest (0.0088)         karzai (0.0133)
                  moscow (0.0093)       shiit (0.0086)     town (0.0124)         lashkar (0.0087)        elect (0.0130)




      Table 2: Ten most relevant words with their respective conditional probabilities for five topics associated
      between them to the “War on Terror” concept from the Ansar1 forum.
                   Topic 3              Topic 21               Topic 24              Topic 25            Topic 31
                   ”War”                ”American Soldier”     ”Imprisonment”        ”Obama”             ”Military Operations”
                   peopl (0.0114)       islam (0.1238)         prison (0.0194)       presid (0.0146)     cia (0.0130)
                   war (0.0096)         emir (0.0964)          court (0.0137)        obama (0.0143)      report (0.0112)
                   countri (0.0092)     afghanistan (0.0811)   case (0.0104)         govern (0.0113)     million (0.0107)
                   american (0.0091)    provinc (0.0524)       charg (0.0099)        countri (0.0103)    work (0.0103)
                   world (0.0062)       destruct (0.0248)      releas (0.0092)       offici (0.0094)       compani (0.0100)
                   forc (0.0060)        american (0.0158)      detaine (0.0083)      secur (0.0092)      money (0.0095)
                   america (0.0058)     kill (0.0145)          tortur (0.0080)       militari (0.0082)   oper (0.0093)
                   one (0.0055)         soldier (0.0129)       alleg (0.0072)        year (0.0080)       blackwat (0.0082)
                   govern (0.0055)      tank (0.0104)          investig (0.0069)     minist (0.0079)     use (0.0077)
                   militari (0.0050)    enemi (0.0096)         guantanamo (0.0061)   state (0.0077)      state (0.0071)




      Table 3: Ten most relevant words with their respective conditional probabilities for five topics associated
      between them to the “Recruiting” concept from the Ansar1 forum.
                 Topic 0                      Topic 36                Topic 39                Topic 40            Topic 37
                 “Religious Propaganda”       “Religious Conflicts”    “Ansar propaganda”      “Family”            “Ideology”
                 islam (0.0270)               mosqu (0.0267)          ansar (0.0940)          women (0.0272)      muslim (0.0475)
                 movement (0.0247)            muslim (0.0254)         wmv (0.0419)            school (0.0160)     islam (0.0471)
                 allah (0.0217)               protest (0.0163)        info (0.0204)           children (0.0157)   law (0.0116)
                 youth (0.0217)               anti (0.0109)           jihad (0.0188)          year (0.0149)       rule (0.0088)
                 media (0.0211)               peopl (0.0096)          final (0.0184)           old (0.0149)        scholar (0.0087)
                 apost (0.0184)               bodi (0.0089)           ansarnet (0.0140)       woman (0.0126)      say (0.0086)
                 mujahideen (0.0172)          prayer (0.0085)         showthread (0.0132)     men (0.0124)        issu (0.0070)
                 crusad (0.0169)              ramadan (0.0077)        issu (0.0120)           man (0.0121)        group (0.0069)
                 god (0.0149)                 christian (0.0072)      video (0.0115)          girl (0.0109)       state (0.0068)
                 state (0.0133)               grave (0.0069)          media (0.0086)          student (0.0103)    call (0.0065)




      Table 4: Top 9 members ordered by HITS’s hub score over the social network for the complete topics
      evaluation for Creator-oriented network.
                          Complete Network Topic-based (All topics) Topic-based (“Local Conflicts”)
                           User      Hub     User        Hub         User            Hub
                           user1    0.6959  user2       0.0725      user2           0.0785
                           user2    0.1544  user3       0.0383      user3           0.0331
                           user3    0.1496  user4       0.0342      user4           0.0331
                          user24    0.0000  user21      0.0280      user20          0.0289
                          user237   0.0000  user20      0.0259      user5           0.0248
                          user53    0.0000  user8       0.0238      user40          0.0247
                          user36    0.0000  user5       0.0217      user6           0.0208
                          user231   0.0000  user40      0.0217      user13          0.0207
                          user14    0.0000  user13      0.0197      user21          0.0207



      members using computed HITS hub score, it is possible to              are presented from previously described network configura-
      make decissions towards who’s who in this topic-based net-            tion, where members with higher score have stronger arcs
      work. This can be visualized in Figure 3 (b) top 8 hub score          between each other.




SIGKDD Explorations                                        Volume 12, Issue 2                                                        Page 71
5. CONCLUSION
                                                                           We propose to combine traditional SNA with data min-
                                                                        ing techniques in order to produce results which enable to
                                                                        measure social aspects which are not considered by apply-
                                                                        ing SNA alone. This way, we can obtain results closer to
                                                                        reality when performing further analysis like experts detec-
                                                                        tion, sub-groups detection, centrality measures, among other
                                                                        measures, on any social network, virtual community, VCoP,
                                                                        VCoI, and many other human-based interactions.
                                                                           We used our approach to study a VCoI called the Dark
                                                                        Web, for which the Ansar1 forum was used. This forums was
                                                                        collected from Dec 2008 to Jan 2010, and was created by 376
                                                                        members in 29057 posts. By applying latent Dirichlet alloca-
                                                                        tion (LDA) and using average shortest distances (ASD), we
                                                                        were capable to obtain 47 topics using the closes 30 words on
      Figure 4: Visualization of the complete “local con-               each topic. Afterwards, 11 topics were discarded by inspec-
      flict” network and their top 8 users in the Creator-               tion. Finally, we used two different topology representations
      oriented network for the complete period of the fo-               of the forum: a creator-oriented and last reply-oriented net-
      rum activity, using the hub score from HITS.                      works.
                                                                           We showed in which ways SNA alone could be very poor
                                                                        in order to detect key-members which are specifically talking
      4.1.3    Hub Based Social Network Analysis                        about certain subjects. However, when combining a topic-
        In Table 5, the top 5 users are listed (with their respective   based text mining approach with SNA we outperforms SNA
      hub score) for the Creator-oriented network and last Reply-       alone to discover VCoIs’ key members, for which a better
      oriented networks for all topics. In this case, it can be seen    analysis of social networks can be performed.
      that in none of both cases, the users were repeated, which           In our experiments, we draw a complete 14 months social
      states that the topic-based network structure has completely      network which is quite dense and performed SNA. The in-
      different properties.                                              formation obtained was useless as its visual representation
                                                                        does not helps to identify any patterns. However, using our
                                                                        method, we were able to focus on a specific group of top-
      Table 5: Top 5 members ordered by HITS’s hub                      ics to create a topic-based network, which provides specific
      score over the social network for the complete topics             information since it has a lesser density. Then, by using
      evaluation for both Creator-oriented network (type                HITS, key members can be extracted from both network
      1 ) and last Reply-oriented network (type 2 ).                    configurations, and results also are radically different.
                All topics (type 1 ) All topics (type 2 )                  We can say that our proposal can lead to much better re-
                  user39 (0.07985)        user2 (0.2666)
                                                                        sults that applying SNA alone. However, as future work the
                  user70 (0.07985)       user4 (0.2067)
                                                                        overall evaluation of the top ranked members by the HITS
                  user40 (0.07985)       user43 (0.1965)
                                                                        algorithm can be potentiated by experts interviews. Also, by
                  user16 (0.07985)       user13 (0.1781)
                                                                        combining other SNA based measures and techniques, such
                  user53 (0.07985)        user3 (0.1753)
                                                                        as betweenness centrality scores, weighted-HITS, and HITS
                                                                        authoritative scores, the key-members could be identified
                                                                        with a higher performance. Besides, simple visual inspec-
                                                                        tion of topic-based SNA allow to view a perfect sub-group
                                                                        with key members.
                                                                           We can say that we have successfully tested our approach
      Table 6: Top 5 members ordered by HITS’s hub                      and we have shown how to apply it into the Dark Web to
      score over the social network for the “Local Conflict”             discover potential groups of topics (e.g. potential homeland
      concept for both Creator-oriented network (type 1 )               security threats), and key-members that potentiate these
      and Replier-oriented network (type 2 ).                           topics.
        “Local Conflicts” (type 1 ) “Local Conflicts” (type 2 )
                user2 (0.3757)                user2 (0.3001)
                                                                        6. ACKNOWLEDGEMENTS
               user13 (0.2333)               user43 (0.2221)
               user43 (0.2125)                user3 (0.2203)
                                                                          Authors would like to thank the continuous support of
               user25 (0.2055)                user4 (0.2142)
                                                                        “Instituto Sistemas Complejos de Ingenieria” (ICM: P-05-
               user24 (0.1937)               user13 (0.1955)
                                                                        004- F, CONICYT: FBO16); Initiation into Research Fund-
                                                                        ing, project code 11090188, entitled “Semantic Web Min-
                                                                        ing Techniques to Study Enhancements of Virtual Com-
                                                                        munities”; The Social Network Analysis Research Group
         In Table 6, top 5 members are listed, but for the “Local       (sna.dii.uchile.cl); and the Web Intelligence Research
      Conflicts” topic-filtered social network. In this case, the in-     Group (wi.dii.uchile.cl).
      formation that Creator-oriented and last Reply-oriented top
      lists highlights, is that forum members are repeated, show-
      ing that the “Local Conflicts” social network is specially built   7. REFERENCES
      by posts of specific users (like “user2” and “user13”).             [1] A. Abbasi and H. Chen. Applying authorship analysis




SIGKDD Explorations                                        Volume 12, Issue 2                                                   Page 72
to extremist-group web forum messages. IEEE               [20] M. Sageman. A strategy for fighting international
             Intelligent Systems, 20(5):67–75, 2005.                        islamist terrorists. ANNALS of the American Academy
       [2]   A. Abbasi, H. Chen, and A. Salem. Sentiment analysis           of Political and Social Science, 618(1):223–231, 2008.
             in multiple languages: Feature selection for opinion      [21] G. Salton, A. Wong, and C. S. Yang. A vector space
             classification in web forums. ACM Trans. Inf. Syst.,            model for automatic indexing. Commun. ACM, Vol.
             26(3):1–34, 2008.                                              18(11):613–620, 1975.
       [3]   D. Blei, A. Ng, and M. Jordan. Latent dirichlet           [22] S. Wasserman and K. Faust. Social Network Analysis:
             allocation. The Journal of Machine Learning . . . , Jan        Methods and Applications. 1994.
             2003.                                                     [23] D. Xing and M. Girolami. Employing latent dirichlet
       [4]   A. Bourhis, L. Dub´, and R. Jacob. . . . The success of
                                    e                                       allocation for fraud detection in telecommunications.
             virtual communities of practice: The leadership factor.        Pattern Recognition Letters, Vol. 28(13):1727–1734,
             The Electronic Journal of Knowledge . . . , Jan 2005.          2007.
       [5]   R. B. Bradford. Application of latent semantic            [24] J. Xu and H. Chen. Crimenet explorer: a framework
             indexing in generating graphs of terrorist networks.           for criminal network knowledge discovery. ACM
             pages 674–675, 2006.                                           Transactions on Information Systems (TOIS), Jan
       [6]   L. C. Freeman. Centrality in social networks                   2005. Aqui sale bien el blockmodeling.
             conceptual clarification. Social Networks, 1(3):215 –      [25] J. Xu and H. Chen. The topology of dark networks.
             239, 1978.                                                     Commun. ACM, 51(10):58–65, 2008.
       [7]   W. Kim, O.-R. Jeong, and S.-W. Lee. On social web         [26] L. Yang, F. Liu, J. Kizza, and R. Ege. Discovering
             sites. Information Systems, 35(2):215–236, 2010.               topics from dark websites. In CICS ’09: IEEE
       [8]   J. M. Kleinberg. Authoritative sources in a                    Symposium on Computational Intelligence in Cyber
             hyperlinked environment. J. ACM, 46(5):604–632,                Security., pages 175–179. IEEE, 2009.
             1999.                                                     [27] L. Yang, F. Liu, J. Kizza, and R. Ege. Discovering
       [9]   M. Kosonen. Knowledge sharing in virtual                       topics from dark websites. Computational Intelligence
             communities – a review of the empirical research. Int.         in Cyber Security, 2009. CICS ’09. IEEE Symposium
             J. Web Based Communities, 5(2):144–163, 2009.                  on, pages 175 – 179, 2009.
      [10]   H. Kwak, Y. Choi, Y.-H. Eom, H. Jeong, and S. Moon.       [28] Y. Zhang, S. Zeng, L. Fan, Y. Dang, C. A. Larson,
             Mining communities in networks: a solution for                 and H. Chen. Dark web forums portal: searching and
             consistency and its evaluation. pages 301–314, 2009.           analyzing jihadist forums. pages 71–76, 2009.
      [11]   A. McCallum, X. Wang, and A. Corrada-Emmanuel.            [29] Y. Zhou, E. Reid, J. Qin, H. Chen, and G. Lai. Us
             Topic and role discovery in social networks with               domestic extremist groups on the web: Link and
             experiments on enron and academic email. Journal of            content analysis. IEEE Intelligent Systems,
             Artificial . . . , Jan 2007.                                    20(5):44–51, 2005.
      [12]   R. D. Nolker and L. Zhou. Social computing and
             weighting to identify member roles in online
             communities. Web Intelligence, IEEE / WIC / ACM
             International Conference on, 0:87–93, 2005.
      [13]   N. Pathak, C. Delong, A. Banerjee, and K. Erickson.
             Social topic models for community extraction. Aug
             2008.
      [14]   U. Pfeil and P. Zaphiris. Investigating social network
             patterns within an empathic online community for
             older people. Computers in Human Behavior,
             25(5):1139–1155, 2009.
      [15]   X. H. Phang and C. Nguyen. Gibbslda++, 2008.
      [16]   C. E. Porter. A typology of virtual communities: A
             multi-disciplinary foundation for future research.
             Journal of Computer-Mediated Communication,
             10(1):00, 2004.
      [17]   G. Probst and S. Borzillo. Why communities of
             practice succeed and why they fail. European
             Management Journal, 26(5):335–347, 2008.
      [18]   E. Reid, J. Qin, Y. Zhou, G. Lai, M. Sageman,
             G. Weimann, and H. Chen. Collecting and analyzing
             the presence of terrorists on the web: A case study of
             jihad websites. pages 402–411, 2005.
      [19]   S. A. R´ F. Aguilera, and L. Guerrero. Virtual
                      ıos,
             communities of practice’s purpose evolution analysis
             using a concept-based mining approach.
             Knowledge-Based and Intelligent Information and
             Engineering Systems, 2:480–489, 2009.




SIGKDD Explorations                                      Volume 12, Issue 2                                                  Page 73

Más contenido relacionado

La actualidad más candente

Existing Research and Future Research Agenda
Existing Research and Future Research AgendaExisting Research and Future Research Agenda
Existing Research and Future Research AgendaMatthew Rowe
 
Musings at the Crossroads of Digital Libraries, Information Retrieval, and Sc...
Musings at the Crossroads of Digital Libraries, Information Retrieval, and Sc...Musings at the Crossroads of Digital Libraries, Information Retrieval, and Sc...
Musings at the Crossroads of Digital Libraries, Information Retrieval, and Sc...Guillaume Cabanac
 
Audience management in social media: Affordances, cultural differences, and i...
Audience management in social media: Affordances, cultural differences, and i...Audience management in social media: Affordances, cultural differences, and i...
Audience management in social media: Affordances, cultural differences, and i...Jan Schmidt
 
Fusion of Bandwidth on Demand and Virtual Organizations
Fusion of Bandwidth on Demand and Virtual OrganizationsFusion of Bandwidth on Demand and Virtual Organizations
Fusion of Bandwidth on Demand and Virtual OrganizationsEd Dodds
 
Fusion of bandwidth on demand and virtual organizations
Fusion of bandwidth on demand and virtual organizationsFusion of bandwidth on demand and virtual organizations
Fusion of bandwidth on demand and virtual organizationsHarold Teunissen
 
The Connected Republic 2.0: New Possibilities & New Value for the Public Sector
The Connected Republic 2.0: New Possibilities & New Value for the Public SectorThe Connected Republic 2.0: New Possibilities & New Value for the Public Sector
The Connected Republic 2.0: New Possibilities & New Value for the Public Sectortheconnectedrepublic
 
Torlina integrationofknowledge-2004
Torlina integrationofknowledge-2004Torlina integrationofknowledge-2004
Torlina integrationofknowledge-2004Daniel Policarpo
 
Crisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 AgeCrisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 AgeAxel101
 
Distributed cognition
Distributed cognitionDistributed cognition
Distributed cognitionHongbo Zhang
 
The research of cognitive communication networks
The research of cognitive communication networksThe research of cognitive communication networks
The research of cognitive communication networksWatcharanon Over
 
Towards enhanced user interaction to qualify web resources for higher-layered...
Towards enhanced user interaction to qualify web resources for higher-layered...Towards enhanced user interaction to qualify web resources for higher-layered...
Towards enhanced user interaction to qualify web resources for higher-layered...Monika Steinberg
 
Workshop distributed cognition
Workshop distributed cognitionWorkshop distributed cognition
Workshop distributed cognitionKai Pata
 
Network analysis in the social sciences
Network analysis in the social sciencesNetwork analysis in the social sciences
Network analysis in the social sciencesJorge Pacheco
 
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...Sebastian Dennerlein
 
Dh2012 enriching digital libraries contents with pundit system
Dh2012 enriching digital libraries contents with pundit systemDh2012 enriching digital libraries contents with pundit system
Dh2012 enriching digital libraries contents with pundit systemMarco Grassi
 
Entrepreneuring In Social Networks
Entrepreneuring In Social NetworksEntrepreneuring In Social Networks
Entrepreneuring In Social Networksvia fCh
 
Defining the IT artefact in social media for eParticipation: An Ensemble view
Defining the IT artefact in social media for eParticipation: An Ensemble viewDefining the IT artefact in social media for eParticipation: An Ensemble view
Defining the IT artefact in social media for eParticipation: An Ensemble viewMarius Rohde Johannessen
 
Distributed Cognition and The Social Web
Distributed Cognition and The Social WebDistributed Cognition and The Social Web
Distributed Cognition and The Social WebBrynn Evans
 
SCOReD-UniTEN 2010 Managing Personal Knowledge
SCOReD-UniTEN 2010 Managing Personal KnowledgeSCOReD-UniTEN 2010 Managing Personal Knowledge
SCOReD-UniTEN 2010 Managing Personal KnowledgeShahrinaz Ismail
 

La actualidad más candente (20)

Existing Research and Future Research Agenda
Existing Research and Future Research AgendaExisting Research and Future Research Agenda
Existing Research and Future Research Agenda
 
Musings at the Crossroads of Digital Libraries, Information Retrieval, and Sc...
Musings at the Crossroads of Digital Libraries, Information Retrieval, and Sc...Musings at the Crossroads of Digital Libraries, Information Retrieval, and Sc...
Musings at the Crossroads of Digital Libraries, Information Retrieval, and Sc...
 
The WeKnowIt Project
The WeKnowIt ProjectThe WeKnowIt Project
The WeKnowIt Project
 
Audience management in social media: Affordances, cultural differences, and i...
Audience management in social media: Affordances, cultural differences, and i...Audience management in social media: Affordances, cultural differences, and i...
Audience management in social media: Affordances, cultural differences, and i...
 
Fusion of Bandwidth on Demand and Virtual Organizations
Fusion of Bandwidth on Demand and Virtual OrganizationsFusion of Bandwidth on Demand and Virtual Organizations
Fusion of Bandwidth on Demand and Virtual Organizations
 
Fusion of bandwidth on demand and virtual organizations
Fusion of bandwidth on demand and virtual organizationsFusion of bandwidth on demand and virtual organizations
Fusion of bandwidth on demand and virtual organizations
 
The Connected Republic 2.0: New Possibilities & New Value for the Public Sector
The Connected Republic 2.0: New Possibilities & New Value for the Public SectorThe Connected Republic 2.0: New Possibilities & New Value for the Public Sector
The Connected Republic 2.0: New Possibilities & New Value for the Public Sector
 
Torlina integrationofknowledge-2004
Torlina integrationofknowledge-2004Torlina integrationofknowledge-2004
Torlina integrationofknowledge-2004
 
Crisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 AgeCrisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 Age
 
Distributed cognition
Distributed cognitionDistributed cognition
Distributed cognition
 
The research of cognitive communication networks
The research of cognitive communication networksThe research of cognitive communication networks
The research of cognitive communication networks
 
Towards enhanced user interaction to qualify web resources for higher-layered...
Towards enhanced user interaction to qualify web resources for higher-layered...Towards enhanced user interaction to qualify web resources for higher-layered...
Towards enhanced user interaction to qualify web resources for higher-layered...
 
Workshop distributed cognition
Workshop distributed cognitionWorkshop distributed cognition
Workshop distributed cognition
 
Network analysis in the social sciences
Network analysis in the social sciencesNetwork analysis in the social sciences
Network analysis in the social sciences
 
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
 
Dh2012 enriching digital libraries contents with pundit system
Dh2012 enriching digital libraries contents with pundit systemDh2012 enriching digital libraries contents with pundit system
Dh2012 enriching digital libraries contents with pundit system
 
Entrepreneuring In Social Networks
Entrepreneuring In Social NetworksEntrepreneuring In Social Networks
Entrepreneuring In Social Networks
 
Defining the IT artefact in social media for eParticipation: An Ensemble view
Defining the IT artefact in social media for eParticipation: An Ensemble viewDefining the IT artefact in social media for eParticipation: An Ensemble view
Defining the IT artefact in social media for eParticipation: An Ensemble view
 
Distributed Cognition and The Social Web
Distributed Cognition and The Social WebDistributed Cognition and The Social Web
Distributed Cognition and The Social Web
 
SCOReD-UniTEN 2010 Managing Personal Knowledge
SCOReD-UniTEN 2010 Managing Personal KnowledgeSCOReD-UniTEN 2010 Managing Personal Knowledge
SCOReD-UniTEN 2010 Managing Personal Knowledge
 

Destacado

The Dark Web
The Dark WebThe Dark Web
The Dark WebJan Siy
 
Children and Online Safety
Children and Online SafetyChildren and Online Safety
Children and Online Safetyalandang
 
The Dark Side Of The Web
The Dark Side Of The WebThe Dark Side Of The Web
The Dark Side Of The Webmshin
 
The dark web darwin de leon
The dark web   darwin de leonThe dark web   darwin de leon
The dark web darwin de leonDarwin de Leon
 
The dark web
The dark webThe dark web
The dark webBella M
 
The Cloud Economy: 11 Essential Trends About How Companies Connect to Each Ot...
The Cloud Economy: 11 Essential Trends About How Companies Connect to Each Ot...The Cloud Economy: 11 Essential Trends About How Companies Connect to Each Ot...
The Cloud Economy: 11 Essential Trends About How Companies Connect to Each Ot...Skyhigh Networks
 
Deep web (amatuer level)
Deep web (amatuer level)Deep web (amatuer level)
Deep web (amatuer level)Ali Saif Mirza
 
Deep Dark Web - How to get inside?
Deep Dark Web - How to get inside?Deep Dark Web - How to get inside?
Deep Dark Web - How to get inside?Anshu Prateek
 
The Dark web - Why the hidden part of the web is even more dangerous?
The Dark web - Why the hidden part of the web is even more dangerous?The Dark web - Why the hidden part of the web is even more dangerous?
The Dark web - Why the hidden part of the web is even more dangerous?Pierluigi Paganini
 
Deep Web
Deep WebDeep Web
Deep WebSt John
 

Destacado (17)

The Dark Web
The Dark WebThe Dark Web
The Dark Web
 
Children and Online Safety
Children and Online SafetyChildren and Online Safety
Children and Online Safety
 
The Dark Side Of The Web
The Dark Side Of The WebThe Dark Side Of The Web
The Dark Side Of The Web
 
The dark web darwin de leon
The dark web   darwin de leonThe dark web   darwin de leon
The dark web darwin de leon
 
Dark web
Dark webDark web
Dark web
 
The dark web
The dark webThe dark web
The dark web
 
European Union
European UnionEuropean Union
European Union
 
The Cloud Economy: 11 Essential Trends About How Companies Connect to Each Ot...
The Cloud Economy: 11 Essential Trends About How Companies Connect to Each Ot...The Cloud Economy: 11 Essential Trends About How Companies Connect to Each Ot...
The Cloud Economy: 11 Essential Trends About How Companies Connect to Each Ot...
 
The Dark Web
The Dark WebThe Dark Web
The Dark Web
 
The Dark Side of the Web
The Dark Side of the WebThe Dark Side of the Web
The Dark Side of the Web
 
Deep web (amatuer level)
Deep web (amatuer level)Deep web (amatuer level)
Deep web (amatuer level)
 
The Dark Web
The Dark WebThe Dark Web
The Dark Web
 
The dark web
The dark webThe dark web
The dark web
 
Deep Dark Web - How to get inside?
Deep Dark Web - How to get inside?Deep Dark Web - How to get inside?
Deep Dark Web - How to get inside?
 
The Deep and Dark Web
The Deep and Dark WebThe Deep and Dark Web
The Deep and Dark Web
 
The Dark web - Why the hidden part of the web is even more dangerous?
The Dark web - Why the hidden part of the web is even more dangerous?The Dark web - Why the hidden part of the web is even more dangerous?
The Dark web - Why the hidden part of the web is even more dangerous?
 
Deep Web
Deep WebDeep Web
Deep Web
 

Similar a topic based sna

Network Structure For Social Network
Network Structure For Social NetworkNetwork Structure For Social Network
Network Structure For Social NetworkVaibhav Sathe
 
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital businessStrategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital businessMarco Brambilla
 
Introduction to Social Computing - Book Chapter
Introduction to Social Computing - Book ChapterIntroduction to Social Computing - Book Chapter
Introduction to Social Computing - Book ChapterZaffar Ahmed Shaikh
 
Workshopvin2 A Socio Legal View On Virtual Individual Networks
Workshopvin2 A Socio Legal View On Virtual Individual NetworksWorkshopvin2 A Socio Legal View On Virtual Individual Networks
Workshopvin2 A Socio Legal View On Virtual Individual Networksimec.archive
 
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...Rick Vogel
 
Summary of my Doctoral Research, Interests
Summary of my Doctoral Research, InterestsSummary of my Doctoral Research, Interests
Summary of my Doctoral Research, InterestsMeena Nagarajan
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeHarish Vaidyanathan
 
SMiLE project poster - Digital Literacies Conference - #caasoton #sotonde
SMiLE project poster - Digital Literacies Conference - #caasoton #sotondeSMiLE project poster - Digital Literacies Conference - #caasoton #sotonde
SMiLE project poster - Digital Literacies Conference - #caasoton #sotondeNicole Beale
 
Decentralized social network
Decentralized social networkDecentralized social network
Decentralized social networkRichard Philip
 
Citation Hisseine, M.A.; Chen, D.;Yang, X. The Applicatio.docx
Citation Hisseine, M.A.; Chen, D.;Yang, X. The Applicatio.docxCitation Hisseine, M.A.; Chen, D.;Yang, X. The Applicatio.docx
Citation Hisseine, M.A.; Chen, D.;Yang, X. The Applicatio.docxrichardnorman90310
 
2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networks2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networksMarc Smith
 
Summer Social Webshop: Technology-Mediated Social Participation
Summer Social Webshop: Technology-Mediated Social ParticipationSummer Social Webshop: Technology-Mediated Social Participation
Summer Social Webshop: Technology-Mediated Social ParticipationUniversity of Maryland
 
Survey of data mining techniques for social
Survey of data mining techniques for socialSurvey of data mining techniques for social
Survey of data mining techniques for socialFiras Husseini
 
Niche online social networks
Niche online social networksNiche online social networks
Niche online social networksCarlos Osorio
 

Similar a topic based sna (20)

Network Structure For Social Network
Network Structure For Social NetworkNetwork Structure For Social Network
Network Structure For Social Network
 
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital businessStrategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
 
Introduction to Social Computing - Book Chapter
Introduction to Social Computing - Book ChapterIntroduction to Social Computing - Book Chapter
Introduction to Social Computing - Book Chapter
 
Workshopvin2 A Socio Legal View On Virtual Individual Networks
Workshopvin2 A Socio Legal View On Virtual Individual NetworksWorkshopvin2 A Socio Legal View On Virtual Individual Networks
Workshopvin2 A Socio Legal View On Virtual Individual Networks
 
Jx2517481755
Jx2517481755Jx2517481755
Jx2517481755
 
Jx2517481755
Jx2517481755Jx2517481755
Jx2517481755
 
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
 
Summary of my Doctoral Research, Interests
Summary of my Doctoral Research, InterestsSummary of my Doctoral Research, Interests
Summary of my Doctoral Research, Interests
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of Life
 
SMiLE project poster - Digital Literacies Conference - #caasoton #sotonde
SMiLE project poster - Digital Literacies Conference - #caasoton #sotondeSMiLE project poster - Digital Literacies Conference - #caasoton #sotonde
SMiLE project poster - Digital Literacies Conference - #caasoton #sotonde
 
sm@jgc Session Three
sm@jgc Session Threesm@jgc Session Three
sm@jgc Session Three
 
Decentralized social network
Decentralized social networkDecentralized social network
Decentralized social network
 
Citation Hisseine, M.A.; Chen, D.;Yang, X. The Applicatio.docx
Citation Hisseine, M.A.; Chen, D.;Yang, X. The Applicatio.docxCitation Hisseine, M.A.; Chen, D.;Yang, X. The Applicatio.docx
Citation Hisseine, M.A.; Chen, D.;Yang, X. The Applicatio.docx
 
2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networks2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networks
 
2_Doc5_2.pdf
2_Doc5_2.pdf2_Doc5_2.pdf
2_Doc5_2.pdf
 
Summer Social Webshop: Technology-Mediated Social Participation
Summer Social Webshop: Technology-Mediated Social ParticipationSummer Social Webshop: Technology-Mediated Social Participation
Summer Social Webshop: Technology-Mediated Social Participation
 
Q046049397
Q046049397Q046049397
Q046049397
 
Survey of data mining techniques for social
Survey of data mining techniques for socialSurvey of data mining techniques for social
Survey of data mining techniques for social
 
Creating Knowledge Sharing Networks
Creating Knowledge Sharing NetworksCreating Knowledge Sharing Networks
Creating Knowledge Sharing Networks
 
Niche online social networks
Niche online social networksNiche online social networks
Niche online social networks
 

Más de Ajay Ohri

Introduction to R ajay Ohri
Introduction to R ajay OhriIntroduction to R ajay Ohri
Introduction to R ajay OhriAjay Ohri
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RAjay Ohri
 
Social Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionSocial Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionAjay Ohri
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for freeAjay Ohri
 
Install spark on_windows10
Install spark on_windows10Install spark on_windows10
Install spark on_windows10Ajay Ohri
 
Ajay ohri Resume
Ajay ohri ResumeAjay ohri Resume
Ajay ohri ResumeAjay Ohri
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientistsAjay Ohri
 
National seminar on emergence of internet of things (io t) trends and challe...
National seminar on emergence of internet of things (io t)  trends and challe...National seminar on emergence of internet of things (io t)  trends and challe...
National seminar on emergence of internet of things (io t) trends and challe...Ajay Ohri
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessAjay Ohri
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data ScienceAjay Ohri
 
Software Testing for Data Scientists
Software Testing for Data ScientistsSoftware Testing for Data Scientists
Software Testing for Data ScientistsAjay Ohri
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in PythonAjay Ohri
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen OomsAjay Ohri
 
Using R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsUsing R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsAjay Ohri
 
Kush stats alpha
Kush stats alpha Kush stats alpha
Kush stats alpha Ajay Ohri
 
Analyze this
Analyze thisAnalyze this
Analyze thisAjay Ohri
 

Más de Ajay Ohri (20)

Introduction to R ajay Ohri
Introduction to R ajay OhriIntroduction to R ajay Ohri
Introduction to R ajay Ohri
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Social Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionSocial Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 Election
 
Pyspark
PysparkPyspark
Pyspark
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for free
 
Install spark on_windows10
Install spark on_windows10Install spark on_windows10
Install spark on_windows10
 
Ajay ohri Resume
Ajay ohri ResumeAjay ohri Resume
Ajay ohri Resume
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientists
 
National seminar on emergence of internet of things (io t) trends and challe...
National seminar on emergence of internet of things (io t)  trends and challe...National seminar on emergence of internet of things (io t)  trends and challe...
National seminar on emergence of internet of things (io t) trends and challe...
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data Science
 
Tradecraft
Tradecraft   Tradecraft
Tradecraft
 
Software Testing for Data Scientists
Software Testing for Data ScientistsSoftware Testing for Data Scientists
Software Testing for Data Scientists
 
Craps
CrapsCraps
Craps
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in Python
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen Ooms
 
Using R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsUsing R for Social Media and Sports Analytics
Using R for Social Media and Sports Analytics
 
Kush stats alpha
Kush stats alpha Kush stats alpha
Kush stats alpha
 
Analyze this
Analyze thisAnalyze this
Analyze this
 

Último

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 

Último (20)

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 

topic based sna

  • 1. Topic-Based Social Network Analysis for Virtual Communities of Interests in the Dark Web Gaston L’Huillier Hector Alvarez University of Chile University of Chile Republica 701 Republica 701 Santiago, Chile Santiago, Chile glhuilli@dcc.uchile.cl halvarez@ing.uchile.cl Sebastián A. Ríos Felipe Aguilera University of Chile University of Chile Republica 701 Blanco Encalada 2120 Santiago, Chile Santiago, Chile srios@dii.uchile.cl faguiler@dcc.uchile.cl ABSTRACT 1. INTRODUCTION The study of extremist groups and their interaction is a crucial task in order to maintain homeland security and “The process of radicalization continues in a hostile peace. Tools such as social networks analysis and text min- physical environment, but it is enabled by the Internet, ing have contributed to their understanding in order to de- resulting in a disconnected, decentralized social structure”. velop counter-terrorism applications. This work addresses – Marc Sageman [20] the topic-based community key-members extraction problem, Today, it is no mystery that terrorists and cyber-criminals for which our method combines both text mining and so- are looking for uncovered means to seek peers with the same cial network analysis techniques. This is achieved by first interests such as internet forums, blogs, and social networks, applying latent Dirichlet allocation to build two topic-based where they can share and comment their feelings and inter- social networks in online forums: one social network oriented ests with other that support their cause. In this sense, Dark towards the thread creator point-of-view, and the other is Web1 has brought endless potential to achieve the coordi- oriented towards the repliers of the overall forum. Then, nation, propaganda delivery, and other unwanted interac- by using different network analysis measures, topic-based tions between extremists groups. However, web information key members are evaluated using as benchmark a social net- can be retrieved for further analysis. Many researchers have work built a plain representation of the network of posts. been wondering whether the leaders of the Dark Web can Experiments were successfully performed using an English be identified by their interaction with other members. language based forum available in the Dark Web portal. The Dark Web is conformed by Virtual Communities of Interests (VCoI) [9, 16] are communities which members are Categories and Subject Descriptors reunited by their shared interests in a topic, like fan clubs of artists. Therefore, a key aspects when studying VCoI is H.2.8 [Database Management]: Database Applications— to achieve a complete understanding on what are the main Data Mining; K.4.1 [Computing Milieux]: Computers interests of the community. Only then, it will be possible to and Society—Public Policy Issues obtain a better insight of community’s social aspects, leading to a better identification of the key members (i.e. Opinion General Terms Leaders), density reduction, among other benefits. Different approaches have been previously developed for Dark Web, Social Network Analysis, Text Mining Social Networks, Virtual Communities [11], and Virtual Com- munities of Practices [4, 17]. To the best of our knowledge, Keywords none Latent Semantic Analysis (LSA) for enhancing the So- cial Network Analysis (SNA) has ever been developed in Terrorism Informatics, Terrorism knowledge portals, Latent Dark Web portals before. However, we think the applica- Dirichlet Allocation, Virtual Communities of Interest tion of these techniques may enhance greatly SNA results. This work’s main contribution is the hybrid approach us- ing topic-models and SNA for enhancing the extraction of key-members in a VCoI. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are This paper is structured as follows: In section 2, previous not made or distributed for profit or commercial advantage and that copies work on SNA and text mining for key members identification bear this notice and the full citation on the first page. To copy otherwise, to is presented, as well as previous work on the Dark Web fo- republish, to post on servers or to redistribute to lists, requires prior specific rums analysis. Then, in section 3 the proposed methodology permission and/or a fee. 1 ISI-KDD 2010 , July 25, 2010, Washington, D.C., USA. Internet-based forums or platforms for terrorists and cyber- Copyright 2010 ACM ISBN 978-1-4503-0223-4/10/07 ...$10.00. criminals. SIGKDD Explorations Volume 12, Issue 2 Page 66
  • 2. to solve the topic-based community key members extraction [29], aims towards the capture of domestic Web forums and problem and main contribution of this work is described. In the creation of a social network mapping to identify their section 4 the experimental setup is detailed, and its results structure and cluster affinities. Finally, in [28] a complete are discussed. Finally in section 5, the main conclusions of framework on how to build a portal of Dark Web forums is this paper and future work are presented. presented, where data collection and integration, as well as its visualization and open access, are the main contributions 2. PREVIOUS WORK of the authors. Previous work on key-members identification on the Dark Since Dark Web is a VCoI, there are different goals as- Web [24], describes how several centrality measures can be sociated to their members’ objectives [7]. In this case, the used to identify different key-members of a social network. support of the community with an online forum where its Here, the degree, betweenness, and closeness measures [6] anonymity, ubiquitous, and free-of-speech makes the perfect have been described as tools to characterize key-members of environment to share fundamentalism and terrorism propa- a given social network. It is important to highlight that this ganda. Therefore, a method to recognize the underlying approach has been traditionally used in previous work. members’ objectives is needed in order to identify threats or Latent semantic analysis has been previously applied in security issues. Topic-based SNA is a way to track, in some different Dark Web applications, such as [5] where Latent degree, topics on members’ thread, which can be related to Semantic Indexing was used to link nodes to certain topics members’ goals. in the network construction. Furthermore, in [26], an latent In the following, previous work on topic-based SNA is pre- Dirichlet allocation (LDA) based web crawling framework is sented, as well as related work on Dark Web Social Network proposed to discover different topics from Dark Web forum Analysis applications. cites, but not further social network analysis applications 2.1 Topic-Based Social Network Analysis with these findings. Thus, leaving without measuring many social aspects from the Dark Web social structure. SNA [22] helps to understand relationships in a given com- In terms of text mining in Dark Web forums [2], appli- munity analyzing its graph representation. Users are seen as cations have been oriented to model assessment and feature nodes and relations among users are seen as arcs. This way, selection in order to improve the classification of messages several techniques have been proposed to extract key mem- that contains sensitive information on extremists’ opinions bers [12], classify users according his relevance within the and sentiments. Also, authorship analysis [1] has been previ- community [14], discovering and describing resulting sub- ously applied in order to analyze the groups’ authorship ten- communities [10], amongst other applications. However, all dencies, and more important, tackling the anonymity prob- these approaches leave aside the meaning of relationships lem associated to this type of virtual communities. among users. Therefore, analysis based only on reply of mails or posts to measure relationships’ strongness or weak- ness it is not a good indicator. 3. PROPOSED METHOD McCallum et al. in [11] described how to determine roles The main question of this work is how to enhance the and topics in a text-based social networks by building Author- key-members discovery based on a topic-based social net- Recipient-Topic (ART) and Role-Author-Recipient-Topic work structure. The first step is to obtain a reduced/filtered (RART) models. Furthermore, in Pathak et al. [13], a com- representation of the inner social community. However, this munity based topic-Model integrated social network anal- representation must be created in such way that informa- ysis technique (Community-Author-Recipient-Topic model tion contained is better to discover key-members (aligned to or CART) is proposed to extract communities from a emails community’s goals and members who produce interaction). corpus based on the topics covered by different members of The second step is to apply a core members’ algorithm to the overall network. These approaches novelty is the use of obtain the key-members of a virtual community based on data mining on text from the social network to perform SNA their respective topics of interest. As a result, we obtain the to study Roles or Sub-groups extraction. rank of the complete community members where for each community interesting topic, we are able to reveal members 2.2 Social Network Analysis on the Dark Web that can be considered as experts or key on that topics. As described by Xu & Chen in [25], the topology of dark We must mention the difference between a key-member networks share different properties with other types of net- and a highly-radical member, since key members have sev- works, where small-world structures are determined by the eral characteristics that define them. Firstly, a key member information flow properties, characterized by short average may be highly-radical member in a given topic or not. Sec- path and a high clustering coefficient. In this sense, so- ondly, he/she may increase the interaction in the community cial network analysis tools used in other virtual communi- because he/she points out interesting messages, which pro- ties applications [19], could be useful to analyze this kind duce replies from different level of members. We will focus of structures. They combine concepts obtained from com- on key-members of any kind, afterwards, depending on the munities’ administrators into a concept-based text mining, topic it would be possible to classify him/her as a radical which allow to obtain interesting results in terms of pur- member or not. Of course, it is possible to automate the pose accomplishment in a Virtual Community of Practice process of radical member discovery, however, more work (VCoP). is needed to avoid subjectivity. Furthermore, accusing or In Reid et al. [18], Web forums where analyzed in or- pointing such label on a person is not a simple matter and der to determine whether a given community has been in- implications are far beyond this work. volved with terrorist presence by using automated and semi- We defined a key-member as a person totally aligned with automated procedures to gather information and analyze it. the VCoIs’ goals and topics. Thus, producing contents which Other applications, such as the proposed by Zhou et al. in are very relevant to satisfy other members’ interests. The SIGKDD Explorations Volume 12, Issue 2 Page 67
  • 3. only way to measure a key-member as defined, is using an hy- where p(zs |θ) can be represented by the random variable i brid approach of SNA combined with latent semantic-based θi , such that topic zs is presented in document i (zs = 1). text mining. This way, we can discover threatening topics, A final expression can be deduced by integrating equation 2 and perform specific analysis on each one. over the random variable θ and summing over topics z ∈ T . Given this, the marginal distribution of a message can be 3.1 Basic Notation defined as follows: Let us introduce some concepts. In the following, let V a vector of words that defines the vocabulary to be used. Z S Y X ! We will refer to a word w, as a basic unit of discrete data, s p(w|α, β) = p(θ|α) p(zs |θ)p(w |zs , β) dθ (3) indexed by {1, ..., |V|}. A post message is a sequence of S s=1 zs ∈T words defined by w = (w1 , ..., wS ), where ws represents the sth word in the message. Finally, a corpus is defined by a The final goal of LDA is to estimate previously described collection of P post messages denoted by C = (w1 , ..., w|P| ). distributions to build a generative model for a given cor- A vectorial representation of the posts corpus is given by pus of messages. There are several methods developed for TF-IDF= (mij ), i ∈ {1, . . . , |V|} and j ∈ {1, . . . , |P|} , where making inference over these probability distributions such as mij is the weight associated to whether a given word is more variational expectation-maximization [3], a variational dis- important than another one in a document. The mij weights crete approximation of equation 3 empirically used by [23], considered in this research is defined as an improvement of and by a Gibbs sampling Markov chain Monte Carlo model the tf-idf term [21] (term frequency times inverse document efficiently implemented and applied by [15]. frequency), defined by 3.3 Network Configuration „ « To build the social network, the members’ interaction nij |C| mij = P|V| × log (1) must be taken into consideration. In general, members’ ac- nkj ni tivity is followed according to its participation on the forum. k=1 Likewise, participation appears when a member post in the where nij is the frequency of the ith word in the j th doc- community. Because the activity of the VCoI is described ument and ni is the number of documents containing word according members’ participation, the network will be con- i. The tf-idf term is a weighted representation of the im- figured according to the following: Nodes will be the VCoI portance of a given word in a document that belongs to a members, and arcs will represent interaction between them. collection of documents. The term frequency (TF) indicates How to link the members and how to measure their interac- the weight of each word in a document, while the inverse tions to complete the network is our main concern. document frequency (IDF) states whether the word is fre- In this work, will be describe two VCoIs’ network repre- quent or uncommon in the document, setting a lower or sentation according the following replying schema of mem- higher weight respectively. bers: 3.2 Topic Modeling 1. Creator-oriented Network: When a member cre- A topic model can be considered as a probabilistic model ate a thread, every reply will be related to him/her. that relates documents and words through variables which This network representation is the less dense network represent the main topics inferred from the text itself. In (density is measured in terms of the number of arcs this context, a document can be considered as a mixture of that the network have). topics, represented by probability distributions which can generate the words in a document given these topics. The 2. Last Reply-oriented Network: Every reply of a inferring process of the latent variables, or topics, is the key thread will be a response of the last post. This network component of this model, whose main objective is to learn representation has a middle density. from text data the distribution of the underlying topics in a given corpus of text documents. In figure 1 the latter two approaches of network conver- A main topic model is the Latent Dirichlet Allocation sion of the forum is presented. In figure 1, arcs represents (LDA) [3]. LDA is a Bayesian model where latent topics members’ replies and nodes represent the users who made of documents are inferred from estimated probability dis- the posts. In our first approach, the weight of arcs will be a tributions over the training dataset. The key idea of LDA, counter of how many times a given member replies to other is that every topic is modeled as a probability distribution one. over the set of words represented by the vocabulary (w ∈ V), and every document as a probability distribution over a set of topics (T ). These distributions are sampled from multi- nomial Dirichlet distributions. For LDA, given the smoothing parameters β and α, and a joint distribution of a topic mixture θ, the idea is to de- termine the probability distribution to generate from a set of topics T , a message composed by a set of S words w (w = (w1 , ..., wS )), S Y Figure 1: Two different network models to represent p(θ, z, w|α, β) = p(θ|α) p(zs |θ)p(ws |zs , β) (2) a given forum interaction. s=1 SIGKDD Explorations Volume 12, Issue 2 Page 68
  • 4. In order to consider the reply of members according to Algorithm 1 Initialize Semantic Weights Matrix the community purpose (for any of these configurations), Input: V (Vocabulary) and to filter noisy posts, a topic-based message reduction is Input: P (Posts) performed. According to previous network configurations, Input: k (Number of Topics) the topic-based filtering method is used in order to remove Output: Semantic Weights Matrix SWM[|P|, k] all replies that are not according to the posts’ topic. Here, 1: TF-IDF[|P|, |V|] (Eq. 1) by using LDA, a list of keywords for each topic is deter- 2: SM[k V] ← Build SM (semantic matrix) according to Top- mined, and later, used to build the final version of the social ics (Sec. 3.2) network. 3: SWM[|P|, k] ← TF-IDF × SMT 3.4 Topic-based Network Filtering Previous work [19] brings a method to evaluate commu- Algorithm 2 Creator-oriented Network nity goals accomplishment. In this work we will use this Input: P (Posts) approach to classify the members’ posts according VCoIs’ Output: Network Gc = (N , A) goals. These goals are defined as a set of terms, which are 1: Build SWM according to Algorithm 1 composed by a set of keywords or statements in natural lan- 2: Initialize N = {}, A = {} guage. 3: for each i ∈ P do The idea is to compare with euclidean distance two mem- 4: N ←N ∪i bers’ posts. If the distance is over a certain threshold θ, an 5: end for interaction will be considered between them. We support 6: for each i ∈ P.creator do the idea that this will help us to avoid irrelevant interac- 7: for each j ∈ {i.replies}, i = j do tions. For example, in a VCoI with k goals (or topics), let 8: if dm (Pi , Pj ) ≥ θ then T Bj a post of user j that is a reply to post of user i (T Bi ). 9: ai,j ← ai,j + 1 The distance between them will be calculated with equa- 10: A ← A ∪ ai,j tion 4. 11: end if 12: end for P 13: end for gik gjk dm (T Bi , T Bj ) = qP k P (4) 2 2 k gik k gjk post username i, the arc weight ai,j is increased according Where gik is the score of topic k in post of user i. It is to its number of all repliers j whose messages’ distance are clear that the distance exists only if T Bj is a reply to T Bi . greater or equal than a threshold θ. After that, the weight of arc ai,j is calculated according to equation 5. Algorithm 3 Reply-oriented Network X Input: {V, P, k} ai,j = d(T Bi , T Bj ) (5) Output: Network Gc = (N , A) i,j 1: . . . (as presented in algorithm 2) dm (T Bi ,T Bj )≥θ 2: for each i ∈ P do We used this weight in both configurations previously de- 3: for each j ∈ {i.reply}, i = j do scribed (Creator-oriented and Last Reply-oriented). After- 4: if dm (Pi , Pj ) ≥ θ then wards, we applied HITS [8] to find the key members on the 5: ai,j ← ai,j + 1 different network configurations. 6: A ← A ∪ ai,j 7: end if 3.5 Network Construction 8: end for First, as a baseline proceedure, Algorithm 1 shows how 9: end for to determine the Semantic Weights Matrix, that assigns an score to every post according to all topics considered for the network construction. In this case, the algorithm intially determines the TF-IDF matrix according to Equation 1, the 4. EXPERIMENTAL SETUP AND RESULTS a semantic matrix SM according to topic-based text mining The proposed methodology was applied in the Ansar1 En- described in Section 3.2 and Section 3.1 respectively. Finally, glish language based forum, available on the Dark Web por- the matrix multiplication between TF-IDF and SM defines the tal2 , for which examples of the topic extraction methodol- SWM matrix. ogy, network construction, and key-members using the HITS Algorithm 2 presents the pseudo-code on the graph Gc = algorithm were determined. (N , A) is build by using the Creator-oriented network. First, In the following, an analysis of topics extracted using the SWM matrix is build according to Algorithm 1. Then, LDA (described in Section 3.2) is presented. Then, the net- considering all of the posts P, the network is built following work topology construction by using both Reply-oriented the structure presented in Figure 1 (a). This is, for each post and Creator-oriented structures for the whole period is de- creator i, the arc weight ai,j is increased according to the scribed. Finally, key-members where determined using HITS number of repliers j, that their messages’ distance greater (both authority and hub scores), whose results where com- or equal than the threshold θ. pared against different topologies of the social network. Algorithm 3, as well as the latter, defines in the first place 2 http://128.196.40.222:8080/CRI_Indexed_new/login.jsp [last the SWM matrix. Then, according to in Figure 1 (b), for each accessed 21-10-2010] SIGKDD Explorations Volume 12, Issue 2 Page 69
  • 5. 4.1 Dark Web evaluation results and discus- sions In this section, results obtained for the Dark Web Ansar1 forum are presented and discussed accordingly. Firstly, the definition of topics using graphical models are presented as well as brief analysis on main extracted topics. Secondly, different network representations are displayed as well as benchmark network evaluations. Finally, HITS algorithm results for authority and hub scores and discussion are pre- sented. 4.1.1 Topic Extraction There are 14 months (Dec. 2008 - Jan. 2010) of data Figure 2: Social network visualization (a) without available. Posts were created by 376 members and extracted topic-based filtering and (b) topic-based filtering, topics where realized over 29.057 posts P and 103.791 words and (c) topic-based filtering for “Local Conflict” (us- in the vocabulary V by using a C++ Gibbs sampling-based ing creator network representation) implementation of LDA3 previously described in Section 3.2. Topics extracted where evaluated according to the com- pactness of each topic, by the average shortest distance (ASD) use HITS to analyze the information on this graph, the re- among the top 30 words in the topic [27]. The ASD between sults are that key-members correspond to the three admin- two keywords wi and wj is determined by, istrators of the community which can also be seen in the graph. Therefore, the huge amount of posts from these mem- bers makes all other interactions look rather small, making max{log(Ni ), log(Nj )} − log(Ni,j ) ASD(wi , wj ) = (6) useless the traditional way for key-members detecton. We log(|P| − min{log(Ni ), log(Nj )}) can correct this situation by using eather concept-based or where Ni is the number of posts that contains wi , Ni,j is Topic-based filtering. the number of posts that contain both of them, and |P| is In Figure 2 (b) and (c) we can observe that previously the total number of posts in the corpus. By using equation 6 stated three centers are gone, and it is much more diffuse [27], a representative number of topics was determined, by to observe centers of interaction from the graphs. However, evaluating the ASD number for k ∈ [10, 50], where 47 was making an analysis to discover which are the key members the number of topics with lowest ASD. Then, topics were now is faster and provide better results. In fact, when ap- manually grouped into concepts categories, used for social pliying HITS, it is possible to figure out other members, network analysis. rather than administrators, as key-members of the commu- In Table 1, the overall concept proposed is “Local Con- nity. The comparison may be seen in Table 4. flicts” as for its main topics are related to terrorist and More specifically, for “Local Conflicts” topics, in Figure 3 counter-terrorism activities over different localities, such as and Figure 4, the visualization for both Reply-oriented and Russia, Irak, Somalia, India, and Afghanistan. Creator-oriented social networks are presented respectively Table 2 is “War on Terror”, as its main topics are related with their top 8 users using HITS hub score. to U.S. activities and relations towards terrorism activities, where the proposed topic names are “War”, “American Sol- dier”, “Imprisonment”, “Obama”, and “Military Operations”. Finally, as shown in Table 3, the overall concept proposed is “Recruiting” where topics such as “Religious Propaganda”, “Religious Conflicts”, “Ansar ”, “Family”, and “Ideology” are included. 4.1.2 Topic-Based Social Network Visualization In Figure 2 (a) the whole corpus social network is pre- sented without any filtering methods. This can be inter- preted as the complete network built by using the complete repliers structure, where the edge ai,j between user i and j is defined for every interaction between users. Then in Fig- ure 2 (b) and (c) we present the same graphs but filtered using topic-based methods. In this case, it is possible to see a great density reduction. This suggests that visaliza- Figure 3: Visualization of the complete “local con- tion techniques, such as graphs, can provide better visual flict” network and their top 8 users in the Reply- information to analysts. Other great benefit, is that, since oriented network for the complete period of the fo- we have lesser dense graph, SNA algorithms, such as HITS, rum activity, using the hub score from HITS. PageRank, among others, run in shorter times. Furthermore, in Figure 2 (a), it is possible to observe very In Figure 3 (a) all interactions presented for the “Local clearly three centers of interaction (or participation). If we Conflicts” topic for Reply-oriented network configuration are 3 depicted, where there is no relevant information that could http://gibbslda.sourceforge.net/ [Last accessed 21-10- 2010] be infered or deduced directly. However, by listing top key- SIGKDD Explorations Volume 12, Issue 2 Page 70
  • 6. Table 1: Ten most relevant words with their respective conditional probabilities for five topics associated between them to the “Local Conflicts” concept from the Ansar1 forum. Topic 1 Topic 8 Topic 13 Topic 17 Topic 19 “Eastern Europe” “Irak Conflicts” “Somalia Conflicts” “India Conflicts” “Afghanistan Conflicts” russian (0.0289) iraq (0.0885) somalia (0.0517) india (0.0334) afghanistan (0.1023) russia (0.0272) iraqi (0.0566) somali (0.0302) indian (0.0274) afghan (0.0760) chechnya (0.0211) baghdad (0.0457) govern (0.0269) kashmir (0.0224) taliban (0.0676) caucasus (0.0180) mosul (0.0197) mogadishu (0.0217) pakistan (0.0180) nato (0.0327) kill (0.0145) sunni (0.0151) shabaab (0.0164) isi (0.0147) troop (0.0321) chechen (0.0137) citi (0.0143) fight (0.0158) mumbai (0.0121) forc (0.0241) oper (0.0127) forc (0.0125) islamist (0.0152) attack (0.0106) kabul (0.0197) ingushetia (0.0104) sourc (0.0115) islam (0.0134) let (0.0095) insurg (0.0145) north (0.0096) aswat (0.0115) shabab (0.0124) arrest (0.0088) karzai (0.0133) moscow (0.0093) shiit (0.0086) town (0.0124) lashkar (0.0087) elect (0.0130) Table 2: Ten most relevant words with their respective conditional probabilities for five topics associated between them to the “War on Terror” concept from the Ansar1 forum. Topic 3 Topic 21 Topic 24 Topic 25 Topic 31 ”War” ”American Soldier” ”Imprisonment” ”Obama” ”Military Operations” peopl (0.0114) islam (0.1238) prison (0.0194) presid (0.0146) cia (0.0130) war (0.0096) emir (0.0964) court (0.0137) obama (0.0143) report (0.0112) countri (0.0092) afghanistan (0.0811) case (0.0104) govern (0.0113) million (0.0107) american (0.0091) provinc (0.0524) charg (0.0099) countri (0.0103) work (0.0103) world (0.0062) destruct (0.0248) releas (0.0092) offici (0.0094) compani (0.0100) forc (0.0060) american (0.0158) detaine (0.0083) secur (0.0092) money (0.0095) america (0.0058) kill (0.0145) tortur (0.0080) militari (0.0082) oper (0.0093) one (0.0055) soldier (0.0129) alleg (0.0072) year (0.0080) blackwat (0.0082) govern (0.0055) tank (0.0104) investig (0.0069) minist (0.0079) use (0.0077) militari (0.0050) enemi (0.0096) guantanamo (0.0061) state (0.0077) state (0.0071) Table 3: Ten most relevant words with their respective conditional probabilities for five topics associated between them to the “Recruiting” concept from the Ansar1 forum. Topic 0 Topic 36 Topic 39 Topic 40 Topic 37 “Religious Propaganda” “Religious Conflicts” “Ansar propaganda” “Family” “Ideology” islam (0.0270) mosqu (0.0267) ansar (0.0940) women (0.0272) muslim (0.0475) movement (0.0247) muslim (0.0254) wmv (0.0419) school (0.0160) islam (0.0471) allah (0.0217) protest (0.0163) info (0.0204) children (0.0157) law (0.0116) youth (0.0217) anti (0.0109) jihad (0.0188) year (0.0149) rule (0.0088) media (0.0211) peopl (0.0096) final (0.0184) old (0.0149) scholar (0.0087) apost (0.0184) bodi (0.0089) ansarnet (0.0140) woman (0.0126) say (0.0086) mujahideen (0.0172) prayer (0.0085) showthread (0.0132) men (0.0124) issu (0.0070) crusad (0.0169) ramadan (0.0077) issu (0.0120) man (0.0121) group (0.0069) god (0.0149) christian (0.0072) video (0.0115) girl (0.0109) state (0.0068) state (0.0133) grave (0.0069) media (0.0086) student (0.0103) call (0.0065) Table 4: Top 9 members ordered by HITS’s hub score over the social network for the complete topics evaluation for Creator-oriented network. Complete Network Topic-based (All topics) Topic-based (“Local Conflicts”) User Hub User Hub User Hub user1 0.6959 user2 0.0725 user2 0.0785 user2 0.1544 user3 0.0383 user3 0.0331 user3 0.1496 user4 0.0342 user4 0.0331 user24 0.0000 user21 0.0280 user20 0.0289 user237 0.0000 user20 0.0259 user5 0.0248 user53 0.0000 user8 0.0238 user40 0.0247 user36 0.0000 user5 0.0217 user6 0.0208 user231 0.0000 user40 0.0217 user13 0.0207 user14 0.0000 user13 0.0197 user21 0.0207 members using computed HITS hub score, it is possible to are presented from previously described network configura- make decissions towards who’s who in this topic-based net- tion, where members with higher score have stronger arcs work. This can be visualized in Figure 3 (b) top 8 hub score between each other. SIGKDD Explorations Volume 12, Issue 2 Page 71
  • 7. 5. CONCLUSION We propose to combine traditional SNA with data min- ing techniques in order to produce results which enable to measure social aspects which are not considered by apply- ing SNA alone. This way, we can obtain results closer to reality when performing further analysis like experts detec- tion, sub-groups detection, centrality measures, among other measures, on any social network, virtual community, VCoP, VCoI, and many other human-based interactions. We used our approach to study a VCoI called the Dark Web, for which the Ansar1 forum was used. This forums was collected from Dec 2008 to Jan 2010, and was created by 376 members in 29057 posts. By applying latent Dirichlet alloca- tion (LDA) and using average shortest distances (ASD), we were capable to obtain 47 topics using the closes 30 words on Figure 4: Visualization of the complete “local con- each topic. Afterwards, 11 topics were discarded by inspec- flict” network and their top 8 users in the Creator- tion. Finally, we used two different topology representations oriented network for the complete period of the fo- of the forum: a creator-oriented and last reply-oriented net- rum activity, using the hub score from HITS. works. We showed in which ways SNA alone could be very poor in order to detect key-members which are specifically talking 4.1.3 Hub Based Social Network Analysis about certain subjects. However, when combining a topic- In Table 5, the top 5 users are listed (with their respective based text mining approach with SNA we outperforms SNA hub score) for the Creator-oriented network and last Reply- alone to discover VCoIs’ key members, for which a better oriented networks for all topics. In this case, it can be seen analysis of social networks can be performed. that in none of both cases, the users were repeated, which In our experiments, we draw a complete 14 months social states that the topic-based network structure has completely network which is quite dense and performed SNA. The in- different properties. formation obtained was useless as its visual representation does not helps to identify any patterns. However, using our method, we were able to focus on a specific group of top- Table 5: Top 5 members ordered by HITS’s hub ics to create a topic-based network, which provides specific score over the social network for the complete topics information since it has a lesser density. Then, by using evaluation for both Creator-oriented network (type HITS, key members can be extracted from both network 1 ) and last Reply-oriented network (type 2 ). configurations, and results also are radically different. All topics (type 1 ) All topics (type 2 ) We can say that our proposal can lead to much better re- user39 (0.07985) user2 (0.2666) sults that applying SNA alone. However, as future work the user70 (0.07985) user4 (0.2067) overall evaluation of the top ranked members by the HITS user40 (0.07985) user43 (0.1965) algorithm can be potentiated by experts interviews. Also, by user16 (0.07985) user13 (0.1781) combining other SNA based measures and techniques, such user53 (0.07985) user3 (0.1753) as betweenness centrality scores, weighted-HITS, and HITS authoritative scores, the key-members could be identified with a higher performance. Besides, simple visual inspec- tion of topic-based SNA allow to view a perfect sub-group with key members. We can say that we have successfully tested our approach Table 6: Top 5 members ordered by HITS’s hub and we have shown how to apply it into the Dark Web to score over the social network for the “Local Conflict” discover potential groups of topics (e.g. potential homeland concept for both Creator-oriented network (type 1 ) security threats), and key-members that potentiate these and Replier-oriented network (type 2 ). topics. “Local Conflicts” (type 1 ) “Local Conflicts” (type 2 ) user2 (0.3757) user2 (0.3001) 6. ACKNOWLEDGEMENTS user13 (0.2333) user43 (0.2221) user43 (0.2125) user3 (0.2203) Authors would like to thank the continuous support of user25 (0.2055) user4 (0.2142) “Instituto Sistemas Complejos de Ingenieria” (ICM: P-05- user24 (0.1937) user13 (0.1955) 004- F, CONICYT: FBO16); Initiation into Research Fund- ing, project code 11090188, entitled “Semantic Web Min- ing Techniques to Study Enhancements of Virtual Com- munities”; The Social Network Analysis Research Group In Table 6, top 5 members are listed, but for the “Local (sna.dii.uchile.cl); and the Web Intelligence Research Conflicts” topic-filtered social network. In this case, the in- Group (wi.dii.uchile.cl). formation that Creator-oriented and last Reply-oriented top lists highlights, is that forum members are repeated, show- ing that the “Local Conflicts” social network is specially built 7. REFERENCES by posts of specific users (like “user2” and “user13”). [1] A. Abbasi and H. Chen. Applying authorship analysis SIGKDD Explorations Volume 12, Issue 2 Page 72
  • 8. to extremist-group web forum messages. IEEE [20] M. Sageman. A strategy for fighting international Intelligent Systems, 20(5):67–75, 2005. islamist terrorists. ANNALS of the American Academy [2] A. Abbasi, H. Chen, and A. Salem. Sentiment analysis of Political and Social Science, 618(1):223–231, 2008. in multiple languages: Feature selection for opinion [21] G. Salton, A. Wong, and C. S. Yang. A vector space classification in web forums. ACM Trans. Inf. Syst., model for automatic indexing. Commun. ACM, Vol. 26(3):1–34, 2008. 18(11):613–620, 1975. [3] D. Blei, A. Ng, and M. Jordan. Latent dirichlet [22] S. Wasserman and K. Faust. Social Network Analysis: allocation. The Journal of Machine Learning . . . , Jan Methods and Applications. 1994. 2003. [23] D. Xing and M. Girolami. Employing latent dirichlet [4] A. Bourhis, L. Dub´, and R. Jacob. . . . The success of e allocation for fraud detection in telecommunications. virtual communities of practice: The leadership factor. Pattern Recognition Letters, Vol. 28(13):1727–1734, The Electronic Journal of Knowledge . . . , Jan 2005. 2007. [5] R. B. Bradford. Application of latent semantic [24] J. Xu and H. Chen. Crimenet explorer: a framework indexing in generating graphs of terrorist networks. for criminal network knowledge discovery. ACM pages 674–675, 2006. Transactions on Information Systems (TOIS), Jan [6] L. C. Freeman. Centrality in social networks 2005. Aqui sale bien el blockmodeling. conceptual clarification. Social Networks, 1(3):215 – [25] J. Xu and H. Chen. The topology of dark networks. 239, 1978. Commun. ACM, 51(10):58–65, 2008. [7] W. Kim, O.-R. Jeong, and S.-W. Lee. On social web [26] L. Yang, F. Liu, J. Kizza, and R. Ege. Discovering sites. Information Systems, 35(2):215–236, 2010. topics from dark websites. In CICS ’09: IEEE [8] J. M. Kleinberg. Authoritative sources in a Symposium on Computational Intelligence in Cyber hyperlinked environment. J. ACM, 46(5):604–632, Security., pages 175–179. IEEE, 2009. 1999. [27] L. Yang, F. Liu, J. Kizza, and R. Ege. Discovering [9] M. Kosonen. Knowledge sharing in virtual topics from dark websites. Computational Intelligence communities – a review of the empirical research. Int. in Cyber Security, 2009. CICS ’09. IEEE Symposium J. Web Based Communities, 5(2):144–163, 2009. on, pages 175 – 179, 2009. [10] H. Kwak, Y. Choi, Y.-H. Eom, H. Jeong, and S. Moon. [28] Y. Zhang, S. Zeng, L. Fan, Y. Dang, C. A. Larson, Mining communities in networks: a solution for and H. Chen. Dark web forums portal: searching and consistency and its evaluation. pages 301–314, 2009. analyzing jihadist forums. pages 71–76, 2009. [11] A. McCallum, X. Wang, and A. Corrada-Emmanuel. [29] Y. Zhou, E. Reid, J. Qin, H. Chen, and G. Lai. Us Topic and role discovery in social networks with domestic extremist groups on the web: Link and experiments on enron and academic email. Journal of content analysis. IEEE Intelligent Systems, Artificial . . . , Jan 2007. 20(5):44–51, 2005. [12] R. D. Nolker and L. Zhou. Social computing and weighting to identify member roles in online communities. Web Intelligence, IEEE / WIC / ACM International Conference on, 0:87–93, 2005. [13] N. Pathak, C. Delong, A. Banerjee, and K. Erickson. Social topic models for community extraction. Aug 2008. [14] U. Pfeil and P. Zaphiris. Investigating social network patterns within an empathic online community for older people. Computers in Human Behavior, 25(5):1139–1155, 2009. [15] X. H. Phang and C. Nguyen. Gibbslda++, 2008. [16] C. E. Porter. A typology of virtual communities: A multi-disciplinary foundation for future research. Journal of Computer-Mediated Communication, 10(1):00, 2004. [17] G. Probst and S. Borzillo. Why communities of practice succeed and why they fail. European Management Journal, 26(5):335–347, 2008. [18] E. Reid, J. Qin, Y. Zhou, G. Lai, M. Sageman, G. Weimann, and H. Chen. Collecting and analyzing the presence of terrorists on the web: A case study of jihad websites. pages 402–411, 2005. [19] S. A. R´ F. Aguilera, and L. Guerrero. Virtual ıos, communities of practice’s purpose evolution analysis using a concept-based mining approach. Knowledge-Based and Intelligent Information and Engineering Systems, 2:480–489, 2009. SIGKDD Explorations Volume 12, Issue 2 Page 73