SlideShare una empresa de Scribd logo
1 de 56
Part I → Part II → Part III → Part IV → Part V
           Influence Maximization




1
Word-of-mouth (WoM) effect in social
    networks             xphone is good

                        xphone is good


                                                    xphone is good
                               xphone is good


                                                              xphone is good
    xphone is good
                                          xphone is good



       Word-of-mouth (viral) marketing is believed to be a promising
        marketing strategy.
       Increasing popularity of online social networks may enable large
        scale viral marketing
2
Outline
       Diffusion/propagation models and the
        Influence Maximization (IM) problem

       The theory: greedy methods to optimize
        submodular functions

       Scalable influence maximization



3
Diffusion/Propagation Models
    and the Influence Maximization
              (IM) Problem




4
The first definition of IM problem: A
    Markov random fields formulation
       Each node 𝑖 has random variable 𝑋 𝑖 ,
        indicating bought the product or not,
         𝑿 = 𝑋1 , … , 𝑋 𝑛

       Markov random field formation: 𝑋 𝑖 depends
        on its neighbors’ actions 𝑵 𝑖 ⊆ 𝑿

       Marketing action 𝑴 = {𝑀1 , … , 𝑀 𝑛 }

       Problem: find a choice of 𝑴 that maximizes
        the revenue obtained from the result of 𝑿
5                               [Domingos & Richardson KDD 2001, 2002]
Discrete diffusion models
       Static social network: 𝐺 = (𝑉, 𝐸)
        ◦ 𝑉: set of nodes, representing individuals
        ◦ 𝐸: set of edges, representing directed influence
          relationships

       Influence diffusion process
        ◦ Seed set 𝑺: initial set of nodes selected to start the
          diffusion
        ◦ Node activations: Nodes are activated starting from
          the seed nodes, in discrete steps and following
          certain stochastic models
        ◦ Influence spread 𝝈(𝑺): expected number of
          activated nodes when the diffusion process starting
          from the seed set 𝑆 ends
6
Major stochastic diffusion models
       Independent cascade (IC) model

       Linear threshold (LT) model

       General threshold model

       Others
        ◦ Voter model
        ◦ Heat diffusion model


7
Independent cascade model
    •   Each edge (𝑢, 𝑣) has
        a propagation
        probability 𝑝(𝑢, 𝑣)
    •   Initially some seed
        nodes are activated
    •   At each step 𝑡, each
                                             0.3
        node 𝑢 activated at
                                             0.1
        step 𝑡 − 1 activates
        its neighbor 𝑣 with
        probability 𝑝(𝑢, 𝑣)
    •   once activated, stay
        activated
8                              [Kempe, Kleinberg and Tardos, KDD 2003]
Linear threshold model
       Each edge (𝑢, 𝑣) has weight                             0.6
                                                                                   0.5
         𝑤 𝑢, 𝑣 :
        ◦ when      𝑢, 𝑣 ∉ 𝐸, 𝑤 𝑢, 𝑣 = 0
        ◦   𝑢   𝑤 𝑢, 𝑣 ≤ 1
       Each node 𝑣 selects a
        threshold 𝜃 𝑣 ∈ [0,1] uniformly            0.3                                   0.3
        at random
       Initially some seed nodes
        are activated                      0.4                        0.3
       At each step, node                                            0.1    0.7
         𝑣 checks if the weighted
        sum of its active neighbors is
        greater than its threshold 𝜃 𝑣 ,
        if so 𝑣 is activated                              0.2                            0.8
                                                 0.3
       once activated, stay
        activated

9                                                  [Kempe, Kleinberg and Tardos, KDD 2003]
Influence maximization
        Given a social network, a diffusion model
         with given parameters, and a number 𝑘,
         find a seed set 𝑆 of at most 𝑘 nodes such
         that the influence spread of 𝑆 is maximized.

        May be further generalized:
         ◦ Instead of k, given a budget constraint and
           each node has a cost of being selected as a
           seed
         ◦ Instead of maximizing influence spread,
           maximizing a (submodular) function of the set of
           activated nodes
10
Key takeaways for diffusion models
     and IM definition
        stochastic diffusion models
         ◦ IC model reflects simple contagion
         ◦ LT model reflects complex contagion (activation
           needs social affirmation from multiple sources
           [Centola and Macy, AJS 2007])

        maximization objective focuses on expected
         influence spread
         ◦ others to be considered later



11
Time for
       Theory:
      Optimizing
     Submodular
      Functions
                   Credit: Rico3244 @ DeviantArt




12
Hardness of influence maximization
        Influence maximization under both IC
         and LT models are NP hard
         ◦ IC model: reduced from k-max cover
           problem
         ◦ LT model: reduced from vertex cover
           problem

        Need approximation algorithms


13
Optimizing submodular functions
        Sumodularity of set
         functions 𝑓: 2V → 𝑅
         ◦ for all 𝑆 ⊆ 𝑇 ⊆ 𝑉, all 𝑣 ∈ 𝑉 ∖ 𝑇,    𝑓(𝑆)
           𝑓 𝑆∪ 𝑣   − 𝑓 𝑆 ≥ 𝑓 𝑇∪ 𝑣     − 𝑓(𝑇)
         ◦ diminishing marginal return
         ◦ an equivalent form: for all
           𝑆, 𝑇 ⊆ 𝑉
                                                       |𝑆|
           𝑓 𝑆∪ 𝑇 + 𝑓 𝑆∩ 𝑇 ≤ 𝑓 𝑆 + 𝑓 𝑇

        Monotonicity of set
         functions 𝑓: for all 𝑆 ⊆ 𝑇 ⊆ 𝑉,
                    𝑓 𝑆 ≤ 𝑓(𝑇)
14
Example of a submodular function
     and its maximization problem
        set coverage
         ◦ each entry 𝑢 is a subset of
                                                       elements
           some base elements                  sets
         ◦ coverage 𝑓 𝑆 = | 𝑢∈𝑆 𝑢 |
         ◦ 𝑓 𝑆 ∪ 𝑣 − 𝑓 𝑆 : additional              𝑆
           coverage of v on top of 𝑆
                                               𝑇
        𝑘-max cover problem
         ◦ find 𝑘 subsets that maximizes   𝑣
           their total coverage
         ◦ NP-hard
         ◦ special case of IM problem in
           IC model
15
Greedy algorithm for submodular
     function maximization
      1: initialize 𝑆 = ∅ ;
      2: for 𝑖 = 1 to 𝑘 do
      3:     select
              𝑢 = argmax 𝑤∈𝑉∖𝑆 [𝑓 𝑆 ∪ 𝑤   − 𝑓(𝑆))]
      4:    𝑆 = 𝑆 ∪ {𝑢}
      5: end for
      6: output 𝑆



16
Property of the greedy algorithm
        Theorem: If the set function 𝑓 is
         monotone and submodular with 𝑓 ∅ =
         0, then the greedy algorithm achieves
         (1 − 1/𝑒) approximation ratio, that is, the
         solution 𝑆 found by the greedy algorithm
         satisfies:
                       1
         ◦ 𝑓 𝑆 ≥ 1−        max 𝑆 ′⊆𝑉,   𝑆′   =𝑘   𝑓(𝑆 ′ )
                       𝑒




17               [Nemhauser, Wolsey and Fisher, Mathematical Programming, 1978]
Submodularity of influence
     diffusion models
        Based on equivalent live-edge graphs



                  0.3

                  0.1




          diffusion dynamic            random live-edge graph:
                                       edges are randomly removed
         Pr(set A is activated given   Pr(set A is reachable from S in
         seed set S)                   random live-ledge graph)
18
(Recall) active node set via IC
     diffusion process
     •   yellow node set is
         the active node
         set after the
         diffusion process in
         the independent
         cascade model          0.3

                                0.1




19
Random live-edge graph for the IC
     model and its reachable node set
        random live-edge
         graph in the IC model
         ◦ each edge is
           independently
           selected as live with its
           propagation
           probability

        yellow node set is the        0.3
         active node set
         reachable from the            0.1
         seed set in a random
         live-edge graph

        Equivalence is
         straightforward

20
(Recall) active node set via LT
     diffusion process
                                                                  0.5
        yellow node set                        0.6


         is the active
         node set after                                                 0.3
                                    0.3
         the diffusion
         process in the     0.4                       0.3

         linear threshold                             0.1   0.7

         model
                                          0.2                           0.8
                                  0.3




21
Random live-edge graph for the LT
     model and its reachable node set
        random live-edge
         graph in the LT model
         ◦ each node select at
           most one incoming
           edge, with probability
           proportional to its
           weight

        yellow node set is the
         active node set
         reachable from the         0.3
         seed set in a random
         live-edge graph            0.1

        equivalence is based
         on uniform threshold
         selection from [0,1],
         and linear weight
         addition


22
Submodularity of influence
     diffusion models (cont’d)
     •   Influence spread of seed set 𝑆, 𝜎(𝑆):
                     𝜎 𝑆 =    𝐺 𝐿 Pr   𝐺 𝐿 |𝑅 𝑆, 𝐺 𝐿 |,
         •    𝐺 𝐿 : a random live-edge graph
         •   Pr 𝐺 𝐿 : probability of 𝐺 𝐿 being generated
         •    𝑅(𝑆, 𝐺 𝐿 ): set of nodes reachable from 𝑆 in 𝐺 𝐿

     •   To prove that 𝜎 𝑆 is submodular, only need to
         show that 𝑅 ⋅, 𝐺 𝐿 is submodular for any 𝐺 𝐿
         •   sumodularity is maintained through linear
             combinations with non-negative coefficients

23
Submodularity of influence
     diffusion models (cont’d)
     •   Submodularity of
          𝑅 ⋅, 𝐺 𝐿
         •   for any 𝑆 ⊆ 𝑇 ⊆ 𝑉,
              𝑣 ∈ 𝑉 ∖ 𝑇,
         •   if 𝑢 is reachable from 𝑣     𝑆         𝑇
             but not from 𝑇, then
         •    𝑢 is reachable from 𝑣 but                         𝑣
             not from 𝑆
         •   Hence, 𝑅 ⋅, 𝐺 𝐿 is                         𝑢
             submodular

     •   Therefore, influence                 marginal contribution
         spread 𝜎 𝑆 is                            of 𝑣 w.r.t. 𝑇
         submodular in both IC
         and LT models
24
General threshold model
     •   Each node 𝑣 has a threshold function
                          𝑓𝑣 : 2 𝑉 → [0,1]
     •   Each node 𝑣 selects a threshold 𝜃 𝑣 ∈ [0,1]
         uniformly at random
     •   If the set of active nodes at the end of step
          𝑡 − 1 is 𝑆, and 𝑓𝑣 𝑆 ≥ 𝜃 𝑣 , 𝑣 is activated at step 𝑡
     •   reward function 𝑟(𝐴(𝑆)): if 𝐴(𝑆) is the final set of
         active nodes given seed set 𝑆, 𝑟(𝐴(𝑆)) is the
         reward from this set
     •   generalized influence spread:
                   𝜎 𝑆 = 𝐸[𝑟 𝐴 𝑆 ] [Kempe, Kleinberg and Tardos, KDD 2003]
25
IC and LT as special cases of
     general threshold model
     •   LT model
         •   𝑓𝑣 𝑆 = 𝑢∈𝑆 𝑤(𝑢, 𝑣)
         •   𝑟(𝑆) = |𝑆|

     •   IC model
         •   𝑓𝑣 𝑆 = 1 −   𝑢∈𝑆(1 −   𝑝 𝑢, 𝑣 )
         •   𝑟(𝑆) = |𝑆|




26
Submodularity in the general
     threshold model
        Theorem [Mossel & Roch STOC 2007]:
         ◦ In the general threshold model,
         ◦ if for every 𝑣 ∈ 𝑉, 𝑓𝑣 (⋅) is monotone and
           submodular with 𝑓𝑣 ∅ = 0,
         ◦ and the reward function 𝑟(⋅) is monotone
           and submodular,
         ◦ then the general influence spread function
            𝜎 ⋅ is monotone and submodular.

        Local submodularity implies global
         submodularity
27
The greedy approximation
     algorithm for the IM problem
                  1
     •       1−     −   𝜀 -approximation greedy algorithm
                  𝑒
         •   Use general greedy algorithm framework
         •   However, need evaluation of influence spread
             𝜎 𝑆 , which is shown to be #P-hard
         •   Use Monte Carlo simulations to estimate 𝜎 𝑆 to
             arbitrary accuracy with high probability
                                                           1
         •   For any 𝜀 > 0, there exists 𝛾 > 0, s.t. 1   −     − 𝜀 -
                                                           𝑒
             approximation can be achieved with 1 − 𝛾
             approximate values of 𝜎 𝑆

28
Performance of greedy algorithm
          weighted cascade model                     linear threshold model




 • coauthorship graph from arxiv.org, high energy physics section,
   n=10748, m53000
 • allow parallel edges, 𝑐 𝑢,𝑣 = number of edges between u and v
                                           1 𝑐 𝑢,𝑣
 • weighted cascade: 𝑝 𝑢, 𝑣 = 1 − 1 −      𝑑𝑣
 • linear threshold: 𝑤 𝑢, 𝑣 = 𝑐 𝑢,𝑣 /𝑑 𝑣
29                                         [Kempe, Kleinberg and Tardos, KDD 2013]
Key takeaways for theory support
        submodularity of diffusion models
         ◦ diminishing return
         ◦ shared by many models, but some model
           extensions may not be submodular (see Part
           IV)

        submodular function maximization
         ◦ greedy hill-climbing algorithm
         ◦ approximation ratio (1 − 1/𝑒)

30
Scalable Influence Maximization




31
Theory versus Practice
                      ...the trouble about arguments is,
                      they ain't nothing but theories,
                      after all, and theories don't prove
                      nothing, they only give you a place
                      to rest on, a spell, when you are
                      tuckered out butting around and
                      around trying to find out something
                      there ain't no way to find out...
                      There's another trouble about
                      theories: there's always a hole in
                      them somewheres, sure, if you
                      look close enough.
                      - “Tom Sawyer Abroad”, Mark Twain




32
Inefficiency of greedy algorithm
        It take days to find 50 seeds for a 30K node
         graph

        Too many evaluations of influence spread
         ◦ Given a graph of 𝑛 nodes, each round needs
           evaluation of 𝑛 influence spreads, totally 𝑂(𝑛𝑘)
           evaluations

        Each evaluation of influence spread is hard
         ◦ Exact computation is #P-hard, for both IC and LT
           models [Chen et al. KDD/ICDM 2010]
         ◦ Monte-Carlo simulation is very slow
33
Scalable Influence Maximization
        Reduction on the number of influence
         spread evaluations.

        Batch computation of influence spread

        Scalable heuristics for influence spread
         computation



34
Lazy forward optimization
        Exploiting submodularity, significantly reduce # influence
         spread evaluations
         ◦ 𝑆 𝑡−1 − seed set selected after round 𝑡 − 1
         ◦ 𝑣 𝑡 − selected as seed in round 𝑡: 𝑆 𝑡 = 𝑆 𝑡−1 ∪ {𝑣 𝑡 }
         ◦ 𝑢 is not a seed yet, 𝑢’s marginal gain 𝑀𝐺 𝑢 𝑆 𝑡 = 𝜎(𝑆 𝑡 ∪ {𝑢}) −
            𝜎(𝑆 𝑡 )
         ◦ by submodularlity, 𝑀𝐺 𝑢 𝑆 𝑡 ≤ 𝑀𝐺(𝑢|𝑆 𝑡−1 )
         ◦ This implies, if 𝑀𝐺 𝑢 𝑆 𝑡−1 ≤ 𝑀𝐺(𝑣|𝑆 𝑡 ) for some node 𝑣, then no
           need to evaluate 𝑀𝐺 𝑢 𝑆 𝑡 in round 𝑡 + 1.
         ◦ Can be implemented efficiently using max-heap
            take the top of the heap, if it has 𝑀𝐺 for the current round, then it is the
             new seed;
            else compute its 𝑀𝐺, and re-heapify
        Often, top element after round 𝑘 − 1 remains top in round
         𝑘.
        Up to 700 X speedup
35                                                              [Leskovec et al., KDD 2007]
CELF++
        With each heap node 𝑢, further maintain
         ◦ 𝑢. 𝑚𝑔1 = 𝑀𝐺(𝑢|𝑆), 𝑆 = current seed set.
         ◦ 𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 = node with max. MG in the current
           iteration, among nodes seen before 𝑢;
         ◦ 𝑢. 𝑚𝑔2 = 𝑀𝐺(𝑢|𝑆 ∪ {𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡});
         ◦ 𝑢. 𝑓𝑙𝑎𝑔 = iteration where 𝑢. 𝑚𝑔1 was last updated;

         𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 chosen in current iteration  no
         need to compute 𝑀𝐺(𝑢|𝑆 ∪ {𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}) in next
         iteration.
         𝑀𝐺(𝑢|𝑆 ∪ {𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}) can be computed
         efficiently along with 𝑀𝐺(𝑢|𝑆) in one run of MC.
36                                           [Goyal, Lu and L., WWW 2011]
CELF++ (contd.)
     •   Pick the top node 𝑢 in the heap
     •   𝑢. 𝑓𝑙𝑎𝑔 = |𝑆|  𝑢. mg1 is correct 𝑀𝐺, thus 𝑢 is the best pick
         of current itn; add 𝑢 to 𝑆; remove from 𝑄;
     •   else, if 𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 = 𝑙𝑎𝑠𝑡_𝑠𝑒𝑒𝑑 & 𝑢. 𝑓𝑙𝑎𝑔 = |𝑆| − 1
         𝑢. 𝑚𝑔1  𝑢. 𝑚𝑔2 (no need to compute 𝑀𝐺(𝑢|𝑆 +
         𝑙𝑎𝑠𝑡_𝑠𝑒𝑒𝑑))




37
Batch computation of
     influence spread
        Based on equivalence to reachable set in
         live-edge graphs
         ◦ Generate a random live-edge graph
         ◦ batch computation of size of reachable set of
           all nodes [Cohen, JCSS 1997]
         ◦ repeat above to increase accuracy

        In conflict with lazy-forward optimization
         ◦ MixedGreedy: In the first round uses reachability-
           based batch computation; in remaining rounds,
           use lazy forward optimization
38                                   [Chen, Wang and Yang, KDD 2009]
Scalable heuristics for
     influence spread computation
        MIA: for IC model [Chen, Wang and
         Wang, KDD 2010]
        LDAG: for LT model [Chen, Yuan and
         Zhang, ICDM 2010]
        Features:
         ◦ Focus on a local region
         ◦ Find a graph spanner allow efficient
           computation
         ◦ Batch update of marginal influence spread
39
Maximum Influence Arborescence
     (MIA) Heuristic
        Local influence
         regions of node 𝑣
         ◦ For every nodes 𝑢, find
           the maximum influence          𝑢
           path (MIP) from 𝑢 to 𝑣,
           ignore it if Pr(𝑝𝑎𝑡ℎ) < 𝜆 (𝜆
           is a small threshold value)
         ◦ all MIPs to 𝑣 form its             0.3
           maximum
           influence in-                      0.1
                                                    𝑣
           arborescence (MIIA)
         ◦ MIIA can be efficiently
           computed
         ◦ influence to 𝑣 is
           computed over MIIA

40
MIA Heuristic III: Computing
     Influence through the MIA
     structure
        Recursive computation of activation probability 𝑎𝑝(𝑢)
         of a node 𝑢 in its in-arborescence, given a seed set 𝑆
         (𝑁 𝑖𝑛 (𝑢) is the in-neighbors of 𝑢 in its in-arborescence)




41
MIA Heuristic IV: Efficient updates
     on incremental activation
     probabilities
                                                𝑀𝐼𝐼𝐴(𝑣)
        𝑢 is the new seed in 𝑀𝐼𝐼𝐴(𝑣)
        Naive update: for each candidate   𝑢
          𝑤, redo the computation in the
         previous page to compute 𝑤’s
         marginal influence to 𝑣                          𝑤
                         2
         ◦   𝑂( 𝑀𝐼𝐼𝐴 𝑣       )
        Fast update: based on linear
         relationship of activation
         probabilities between any node
          𝑤 and root 𝑣, update marginal
         influence of all 𝑤’s to 𝑣 in two           𝑣
         passes
         ◦   𝑂(|𝑀𝐼𝐼𝐴(𝑣)|)


42
Experiment Results on MIA heuristic
          Influence spread vs. seed set                            running time
          size                                                                103 times
                                                                              speed up




                                    close to Greedy,
                                 49% better than Degree,
                                    15% better than
                                    DegreeDiscount




      Experiment setup:
      •   35k nodes from coauthorship graph in physics archive
      •   influence probability to a node 𝑣 = 1 / (# of neighbors of 𝑣)
      •   running time is for selecting 50 seeds
43
Time for some simplicity




44
The SimPath Algorithm
        Vertex Cover                          Look ahead
        Optimization                          optimization
     Improves the efficiency                Improves the efficiency
       in the first iteration             in the subsequent iterations

           In lazy forward manner, in each
          iteration, add to the seed set, the
             node providing the maximum
                marginal gain in spread.
     [Goyal, Lu, & L. ICDM 2011]
                                     Simpath-Spread
45
                                   Compute marginal gain by
                                    enumerating simple paths
Estimating Spread in SimPath (2)
        Thus, the spread of a node can be
         computed by enumerating simple paths
         starting from the node.
                                        0.4                Influence of x on z
                        x                              z Influence of x on y
                                  0.3         0.2
                                                        Influence of x on x itself
                            0.1                     0.5
                                        y                  Total influence of
                                                           node x is 1.96

            Influence Spread of node x is
            = 1 + (0.3 + 0.4 * 0.5) + (0.4 + 0.3 * 0.2) = 1.96
46
Estimating Spread in SimPath (3)


                                                           Total influence of the
                                                           seed set {x, y} is 2.6
                                                           Influence of node y in a
                                          0.4              subgraph that does not
                          x                              z
                                                0.2
                                                           contain x
                                    0.3
                              0.1                     0.5
                                                           Influence of node x in a
                                          y                subgraph that does not
                                                           contain y
      Let the seed set S = {x,y}, then influence spread of S is
       (S )   V  y ( x)   V  x ( y)  1  0.4  1  0.2  2.6
47
Estimating Spread in SimPath (4)

                              Enumerating all simple
                                 paths is #P hard


       Thus, influence can be estimated
         by enumerating all simple paths
      On slightly starting from the seed set.
different subgraphs

               The majority of influence flows in a
 48
                     small neighborhood.
Look Ahead Optimization (1/2)
        As the seed set grows, the time spent in
         estimating spread increases.
         ◦ More paths to enumerate.

        A lot of paths are repeated though.

        The optimization avoids this repetition
         intelligently.

        A look ahead parameter ‘l’.
49
Look Ahead Optimization (2/2)
     l = 2 here              Seed Set Si after iteration i
 Let y and x be
 prospective seeds
 from CELF queue




                     y

                                                             ....
                     x
              A lot of paths are enumerated repeatedly
                  1. Compute spread achieved by S+y
                  2. Compute spread achieved by S+x


50
SimPath vs LDAG




51
Other means of speedup
        Community-based approach
              [Wang et al. KDD 2010]




        Network sparsification
                [Mathioudakis et al, KDD 2011]




        Simulated Annealing
                [Jiang et al. AAAI 2011]


52
Community-based influence
     maximization
        Use influence parameters to partition graph
         into communities
         ◦ different from pure structure-based partition

        Influence maximization within each
         community

        Dynamic programming for selecting top
         seeds

        Orthogonal with scalable influence
         computation algorithms
53                                             [Wang et al. KDD 2010]
Sparsification of influence networks
        Remove less important edges for
         influence propagation

        Use action traces to guide graph
         trimming

        Orthogonal to scalable influence
         computation algorithms


54                                [Mathioudakis et al, KDD 2011]
Simulated annealing
        Following simulated annealing
         framework
         ◦ not going through all nodes to find a next
           seed as in greedy
         ◦ find a random replacement (guided by
           some heuristics)

        Orthogonal to scalable influence
         computation algorithms

55
Key takeaways for scalable
     influence maximization
        tackle from multiple angles
         ◦ reduce #spread evaluations: lazy-forward
         ◦ scale up spread computation:
           use local influence region (LDAG, MIA, SimPath)
           use efficient graph structure (trees, DAGs)
           model specific optimizations (simple path
            enumerations)
         ◦ graph sparsifications, community partition, etc.

        upshot: can scale up to graphs with millions
         of nodes and edges, on a single machine
56

Más contenido relacionado

La actualidad más candente

Community Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief OverviewCommunity Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief OverviewSatyaki Sikdar
 
Probabilistic Relational Models for Link Prediction Problem
Probabilistic Relational Models for Link Prediction ProblemProbabilistic Relational Models for Link Prediction Problem
Probabilistic Relational Models for Link Prediction ProblemSina Sajadmanesh
 
Community detection in graphs
Community detection in graphsCommunity detection in graphs
Community detection in graphsNicola Barbieri
 
Multilayer tutorial-netsci2014-slightlyupdated
Multilayer tutorial-netsci2014-slightlyupdatedMultilayer tutorial-netsci2014-slightlyupdated
Multilayer tutorial-netsci2014-slightlyupdatedMason Porter
 
Overlapping community detection survey
Overlapping community detection surveyOverlapping community detection survey
Overlapping community detection survey煜林 车
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep LearningRayKim51
 
Dropout as a Bayesian Approximation
Dropout as a Bayesian ApproximationDropout as a Bayesian Approximation
Dropout as a Bayesian ApproximationSangwoo Mo
 
Wayfair's Data Science Team and Case Study: Uplift Modeling
Wayfair's Data Science Team and Case Study: Uplift ModelingWayfair's Data Science Team and Case Study: Uplift Modeling
Wayfair's Data Science Team and Case Study: Uplift ModelingPatricia Stichnoth
 
Temporal network epidemiology: Subtleties and algorithms
Temporal network epidemiology: Subtleties and algorithmsTemporal network epidemiology: Subtleties and algorithms
Temporal network epidemiology: Subtleties and algorithmsPetter Holme
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
Community Detection with Networkx
Community Detection with NetworkxCommunity Detection with Networkx
Community Detection with NetworkxErika Fille Legara
 
Manifold learning with application to object recognition
Manifold learning with application to object recognitionManifold learning with application to object recognition
Manifold learning with application to object recognitionzukun
 

La actualidad más candente (20)

Community Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief OverviewCommunity Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief Overview
 
Lec 4,5
Lec 4,5Lec 4,5
Lec 4,5
 
06 Community Detection
06 Community Detection06 Community Detection
06 Community Detection
 
Sampling Distributions and Estimators
Sampling Distributions and Estimators Sampling Distributions and Estimators
Sampling Distributions and Estimators
 
Probabilistic Relational Models for Link Prediction Problem
Probabilistic Relational Models for Link Prediction ProblemProbabilistic Relational Models for Link Prediction Problem
Probabilistic Relational Models for Link Prediction Problem
 
Community detection in graphs
Community detection in graphsCommunity detection in graphs
Community detection in graphs
 
Multilayer tutorial-netsci2014-slightlyupdated
Multilayer tutorial-netsci2014-slightlyupdatedMultilayer tutorial-netsci2014-slightlyupdated
Multilayer tutorial-netsci2014-slightlyupdated
 
Overlapping community detection survey
Overlapping community detection surveyOverlapping community detection survey
Overlapping community detection survey
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep Learning
 
Dropout as a Bayesian Approximation
Dropout as a Bayesian ApproximationDropout as a Bayesian Approximation
Dropout as a Bayesian Approximation
 
Wayfair's Data Science Team and Case Study: Uplift Modeling
Wayfair's Data Science Team and Case Study: Uplift ModelingWayfair's Data Science Team and Case Study: Uplift Modeling
Wayfair's Data Science Team and Case Study: Uplift Modeling
 
Fp growth
Fp growthFp growth
Fp growth
 
Temporal network epidemiology: Subtleties and algorithms
Temporal network epidemiology: Subtleties and algorithmsTemporal network epidemiology: Subtleties and algorithms
Temporal network epidemiology: Subtleties and algorithms
 
Decision trees
Decision treesDecision trees
Decision trees
 
Propensity Score Matching Methods
Propensity Score Matching MethodsPropensity Score Matching Methods
Propensity Score Matching Methods
 
Goodness of Fit Notation
Goodness of Fit NotationGoodness of Fit Notation
Goodness of Fit Notation
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Community Detection with Networkx
Community Detection with NetworkxCommunity Detection with Networkx
Community Detection with Networkx
 
Two Means, Independent Samples
Two Means, Independent SamplesTwo Means, Independent Samples
Two Means, Independent Samples
 
Manifold learning with application to object recognition
Manifold learning with application to object recognitionManifold learning with application to object recognition
Manifold learning with application to object recognition
 

Destacado

Kdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiKdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiLaks Lakshmanan
 
Kdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iKdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iLaks Lakshmanan
 
Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivLaks Lakshmanan
 
Time Critical Influence Maximization
Time Critical Influence MaximizationTime Critical Influence Maximization
Time Critical Influence MaximizationWei Lu
 
Measuring academic influence: Not all citations are equal
Measuring academic influence: Not all citations are equalMeasuring academic influence: Not all citations are equal
Measuring academic influence: Not all citations are equalAndre Vellino
 
Usage-Based vs. Citation-Based Recommenders in a Digital Library
Usage-Based vs. Citation-Based Recommenders in a Digital LibraryUsage-Based vs. Citation-Based Recommenders in a Digital Library
Usage-Based vs. Citation-Based Recommenders in a Digital LibraryAndre Vellino
 
Digital Brand Monitoring and Management
Digital Brand Monitoring and ManagementDigital Brand Monitoring and Management
Digital Brand Monitoring and ManagementRob Garner
 
Pubcon Google+ Google Plus and Author Markup Panel by Rob Garner
Pubcon Google+ Google Plus and Author Markup Panel by Rob GarnerPubcon Google+ Google Plus and Author Markup Panel by Rob Garner
Pubcon Google+ Google Plus and Author Markup Panel by Rob GarnerRob Garner
 
Least Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social NetworksLeast Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social NetworksNatasha Mandal
 
Big-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsBig-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsLaks Lakshmanan
 
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionThe Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionCarlos Castillo (ChaTo)
 
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphDr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphCarlos Castillo (ChaTo)
 
Social Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsSocial Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsCarlos Castillo (ChaTo)
 
Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Carlos Castillo (ChaTo)
 
Information Verification During Natural Disasters
Information Verification During Natural DisastersInformation Verification During Natural Disasters
Information Verification During Natural DisastersCarlos Castillo (ChaTo)
 
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 Social Media News Communities: Gatekeeping, Coverage, and Statement Bias Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
Social Media News Communities: Gatekeeping, Coverage, and Statement BiasMounia Lalmas-Roelleke
 
Keynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationKeynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationCarlos Castillo (ChaTo)
 
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...IIIT Hyderabad
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaMuhammad Imran
 
Chapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsChapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsDiarta
 

Destacado (20)

Kdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiKdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-ii
 
Kdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iKdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-i
 
Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-iv
 
Time Critical Influence Maximization
Time Critical Influence MaximizationTime Critical Influence Maximization
Time Critical Influence Maximization
 
Measuring academic influence: Not all citations are equal
Measuring academic influence: Not all citations are equalMeasuring academic influence: Not all citations are equal
Measuring academic influence: Not all citations are equal
 
Usage-Based vs. Citation-Based Recommenders in a Digital Library
Usage-Based vs. Citation-Based Recommenders in a Digital LibraryUsage-Based vs. Citation-Based Recommenders in a Digital Library
Usage-Based vs. Citation-Based Recommenders in a Digital Library
 
Digital Brand Monitoring and Management
Digital Brand Monitoring and ManagementDigital Brand Monitoring and Management
Digital Brand Monitoring and Management
 
Pubcon Google+ Google Plus and Author Markup Panel by Rob Garner
Pubcon Google+ Google Plus and Author Markup Panel by Rob GarnerPubcon Google+ Google Plus and Author Markup Panel by Rob Garner
Pubcon Google+ Google Plus and Author Markup Panel by Rob Garner
 
Least Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social NetworksLeast Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social Networks
 
Big-O(Q) Social Network Analytics
Big-O(Q) Social Network AnalyticsBig-O(Q) Social Network Analytics
Big-O(Q) Social Network Analytics
 
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query SuggestionThe Effects of Time on Query Flow Graph-based Models for Query Suggestion
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
 
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graphDr. Searcher and Mr. Browser: A unified hyperlink-click graph
Dr. Searcher and Mr. Browser: A unified hyperlink-click graph
 
Social Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of NewsSocial Media News Mining and Automatic Content Analysis of News
Social Media News Mining and Automatic Content Analysis of News
 
Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...Characterizing the Life Cycle of Online News Stories Using Social Media React...
Characterizing the Life Cycle of Online News Stories Using Social Media React...
 
Information Verification During Natural Disasters
Information Verification During Natural DisastersInformation Verification During Natural Disasters
Information Verification During Natural Disasters
 
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 Social Media News Communities: Gatekeeping, Coverage, and Statement Bias Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
Social Media News Communities: Gatekeeping, Coverage, and Statement Bias
 
Keynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open InvitationKeynote talk: Big Crisis Data, an Open Invitation
Keynote talk: Big Crisis Data, an Open Invitation
 
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
TweetCred: Real-Time Credibility Assessment of 
 Content on Twitter @ Socinfo...
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social Media
 
Chapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing CommunicationsChapter 17 Designing And Managing Integrated Marketing Communications
Chapter 17 Designing And Managing Integrated Marketing Communications
 

Similar a Maximizing Influence in Social Networks

Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningVahid Mirjalili
 
08 neural networks
08 neural networks08 neural networks
08 neural networksankit_ppt
 
Multilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLPMultilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLPAbdullah al Mamun
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningTapas Majumdar
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraJason Riedy
 
Auto encoders in Deep Learning
Auto encoders in Deep LearningAuto encoders in Deep Learning
Auto encoders in Deep LearningShajun Nisha
 
Deep learning study 2
Deep learning study 2Deep learning study 2
Deep learning study 2San Kim
 
Introduction to Neural Network
Introduction to Neural NetworkIntroduction to Neural Network
Introduction to Neural NetworkOmer Korech
 
Neural Networks. Overview
Neural Networks. OverviewNeural Networks. Overview
Neural Networks. OverviewOleksandr Baiev
 
Deep Learning Module 2A Training MLP.pptx
Deep Learning Module 2A Training MLP.pptxDeep Learning Module 2A Training MLP.pptx
Deep Learning Module 2A Training MLP.pptxvipul6601
 
Profit Maximization over Social Networks
Profit Maximization over Social NetworksProfit Maximization over Social Networks
Profit Maximization over Social NetworksWei Lu
 
(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_Kim(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_KimSundong Kim
 
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...Taiji Suzuki
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningSungchul Kim
 
Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with PerformersJoonhyung Lee
 

Similar a Maximizing Influence in Social Networks (20)

Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep Learning
 
08 neural networks
08 neural networks08 neural networks
08 neural networks
 
Multilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLPMultilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLP
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learning
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
Auto encoders in Deep Learning
Auto encoders in Deep LearningAuto encoders in Deep Learning
Auto encoders in Deep Learning
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Deep learning study 2
Deep learning study 2Deep learning study 2
Deep learning study 2
 
Introduction to Neural Network
Introduction to Neural NetworkIntroduction to Neural Network
Introduction to Neural Network
 
Neural Networks. Overview
Neural Networks. OverviewNeural Networks. Overview
Neural Networks. Overview
 
Deep Learning Module 2A Training MLP.pptx
Deep Learning Module 2A Training MLP.pptxDeep Learning Module 2A Training MLP.pptx
Deep Learning Module 2A Training MLP.pptx
 
UNIT 5-ANN.ppt
UNIT 5-ANN.pptUNIT 5-ANN.ppt
UNIT 5-ANN.ppt
 
Pro max icdm2012-slides
Pro max icdm2012-slidesPro max icdm2012-slides
Pro max icdm2012-slides
 
Profit Maximization over Social Networks
Profit Maximization over Social NetworksProfit Maximization over Social Networks
Profit Maximization over Social Networks
 
(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_Kim(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_Kim
 
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
[ICLR2021 (spotlight)] Benefit of deep learning with non-convex noisy gradien...
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
Deep learning
Deep learningDeep learning
Deep learning
 
Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with Performers
 

Último

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Último (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Maximizing Influence in Social Networks

  • 1. Part I → Part II → Part III → Part IV → Part V Influence Maximization 1
  • 2. Word-of-mouth (WoM) effect in social networks xphone is good xphone is good xphone is good xphone is good xphone is good xphone is good xphone is good  Word-of-mouth (viral) marketing is believed to be a promising marketing strategy.  Increasing popularity of online social networks may enable large scale viral marketing 2
  • 3. Outline  Diffusion/propagation models and the Influence Maximization (IM) problem  The theory: greedy methods to optimize submodular functions  Scalable influence maximization 3
  • 4. Diffusion/Propagation Models and the Influence Maximization (IM) Problem 4
  • 5. The first definition of IM problem: A Markov random fields formulation  Each node 𝑖 has random variable 𝑋 𝑖 , indicating bought the product or not, 𝑿 = 𝑋1 , … , 𝑋 𝑛  Markov random field formation: 𝑋 𝑖 depends on its neighbors’ actions 𝑵 𝑖 ⊆ 𝑿  Marketing action 𝑴 = {𝑀1 , … , 𝑀 𝑛 }  Problem: find a choice of 𝑴 that maximizes the revenue obtained from the result of 𝑿 5 [Domingos & Richardson KDD 2001, 2002]
  • 6. Discrete diffusion models  Static social network: 𝐺 = (𝑉, 𝐸) ◦ 𝑉: set of nodes, representing individuals ◦ 𝐸: set of edges, representing directed influence relationships  Influence diffusion process ◦ Seed set 𝑺: initial set of nodes selected to start the diffusion ◦ Node activations: Nodes are activated starting from the seed nodes, in discrete steps and following certain stochastic models ◦ Influence spread 𝝈(𝑺): expected number of activated nodes when the diffusion process starting from the seed set 𝑆 ends 6
  • 7. Major stochastic diffusion models  Independent cascade (IC) model  Linear threshold (LT) model  General threshold model  Others ◦ Voter model ◦ Heat diffusion model 7
  • 8. Independent cascade model • Each edge (𝑢, 𝑣) has a propagation probability 𝑝(𝑢, 𝑣) • Initially some seed nodes are activated • At each step 𝑡, each 0.3 node 𝑢 activated at 0.1 step 𝑡 − 1 activates its neighbor 𝑣 with probability 𝑝(𝑢, 𝑣) • once activated, stay activated 8 [Kempe, Kleinberg and Tardos, KDD 2003]
  • 9. Linear threshold model  Each edge (𝑢, 𝑣) has weight 0.6 0.5 𝑤 𝑢, 𝑣 : ◦ when 𝑢, 𝑣 ∉ 𝐸, 𝑤 𝑢, 𝑣 = 0 ◦ 𝑢 𝑤 𝑢, 𝑣 ≤ 1  Each node 𝑣 selects a threshold 𝜃 𝑣 ∈ [0,1] uniformly 0.3 0.3 at random  Initially some seed nodes are activated 0.4 0.3  At each step, node 0.1 0.7 𝑣 checks if the weighted sum of its active neighbors is greater than its threshold 𝜃 𝑣 , if so 𝑣 is activated 0.2 0.8 0.3  once activated, stay activated 9 [Kempe, Kleinberg and Tardos, KDD 2003]
  • 10. Influence maximization  Given a social network, a diffusion model with given parameters, and a number 𝑘, find a seed set 𝑆 of at most 𝑘 nodes such that the influence spread of 𝑆 is maximized.  May be further generalized: ◦ Instead of k, given a budget constraint and each node has a cost of being selected as a seed ◦ Instead of maximizing influence spread, maximizing a (submodular) function of the set of activated nodes 10
  • 11. Key takeaways for diffusion models and IM definition  stochastic diffusion models ◦ IC model reflects simple contagion ◦ LT model reflects complex contagion (activation needs social affirmation from multiple sources [Centola and Macy, AJS 2007])  maximization objective focuses on expected influence spread ◦ others to be considered later 11
  • 12. Time for Theory: Optimizing Submodular Functions Credit: Rico3244 @ DeviantArt 12
  • 13. Hardness of influence maximization  Influence maximization under both IC and LT models are NP hard ◦ IC model: reduced from k-max cover problem ◦ LT model: reduced from vertex cover problem  Need approximation algorithms 13
  • 14. Optimizing submodular functions  Sumodularity of set functions 𝑓: 2V → 𝑅 ◦ for all 𝑆 ⊆ 𝑇 ⊆ 𝑉, all 𝑣 ∈ 𝑉 ∖ 𝑇, 𝑓(𝑆) 𝑓 𝑆∪ 𝑣 − 𝑓 𝑆 ≥ 𝑓 𝑇∪ 𝑣 − 𝑓(𝑇) ◦ diminishing marginal return ◦ an equivalent form: for all 𝑆, 𝑇 ⊆ 𝑉 |𝑆| 𝑓 𝑆∪ 𝑇 + 𝑓 𝑆∩ 𝑇 ≤ 𝑓 𝑆 + 𝑓 𝑇  Monotonicity of set functions 𝑓: for all 𝑆 ⊆ 𝑇 ⊆ 𝑉, 𝑓 𝑆 ≤ 𝑓(𝑇) 14
  • 15. Example of a submodular function and its maximization problem  set coverage ◦ each entry 𝑢 is a subset of elements some base elements sets ◦ coverage 𝑓 𝑆 = | 𝑢∈𝑆 𝑢 | ◦ 𝑓 𝑆 ∪ 𝑣 − 𝑓 𝑆 : additional 𝑆 coverage of v on top of 𝑆 𝑇  𝑘-max cover problem ◦ find 𝑘 subsets that maximizes 𝑣 their total coverage ◦ NP-hard ◦ special case of IM problem in IC model 15
  • 16. Greedy algorithm for submodular function maximization 1: initialize 𝑆 = ∅ ; 2: for 𝑖 = 1 to 𝑘 do 3: select 𝑢 = argmax 𝑤∈𝑉∖𝑆 [𝑓 𝑆 ∪ 𝑤 − 𝑓(𝑆))] 4: 𝑆 = 𝑆 ∪ {𝑢} 5: end for 6: output 𝑆 16
  • 17. Property of the greedy algorithm  Theorem: If the set function 𝑓 is monotone and submodular with 𝑓 ∅ = 0, then the greedy algorithm achieves (1 − 1/𝑒) approximation ratio, that is, the solution 𝑆 found by the greedy algorithm satisfies: 1 ◦ 𝑓 𝑆 ≥ 1− max 𝑆 ′⊆𝑉, 𝑆′ =𝑘 𝑓(𝑆 ′ ) 𝑒 17 [Nemhauser, Wolsey and Fisher, Mathematical Programming, 1978]
  • 18. Submodularity of influence diffusion models  Based on equivalent live-edge graphs 0.3 0.1 diffusion dynamic random live-edge graph: edges are randomly removed Pr(set A is activated given Pr(set A is reachable from S in seed set S) random live-ledge graph) 18
  • 19. (Recall) active node set via IC diffusion process • yellow node set is the active node set after the diffusion process in the independent cascade model 0.3 0.1 19
  • 20. Random live-edge graph for the IC model and its reachable node set  random live-edge graph in the IC model ◦ each edge is independently selected as live with its propagation probability  yellow node set is the 0.3 active node set reachable from the 0.1 seed set in a random live-edge graph  Equivalence is straightforward 20
  • 21. (Recall) active node set via LT diffusion process 0.5  yellow node set 0.6 is the active node set after 0.3 0.3 the diffusion process in the 0.4 0.3 linear threshold 0.1 0.7 model 0.2 0.8 0.3 21
  • 22. Random live-edge graph for the LT model and its reachable node set  random live-edge graph in the LT model ◦ each node select at most one incoming edge, with probability proportional to its weight  yellow node set is the active node set reachable from the 0.3 seed set in a random live-edge graph 0.1  equivalence is based on uniform threshold selection from [0,1], and linear weight addition 22
  • 23. Submodularity of influence diffusion models (cont’d) • Influence spread of seed set 𝑆, 𝜎(𝑆): 𝜎 𝑆 = 𝐺 𝐿 Pr 𝐺 𝐿 |𝑅 𝑆, 𝐺 𝐿 |, • 𝐺 𝐿 : a random live-edge graph • Pr 𝐺 𝐿 : probability of 𝐺 𝐿 being generated • 𝑅(𝑆, 𝐺 𝐿 ): set of nodes reachable from 𝑆 in 𝐺 𝐿 • To prove that 𝜎 𝑆 is submodular, only need to show that 𝑅 ⋅, 𝐺 𝐿 is submodular for any 𝐺 𝐿 • sumodularity is maintained through linear combinations with non-negative coefficients 23
  • 24. Submodularity of influence diffusion models (cont’d) • Submodularity of 𝑅 ⋅, 𝐺 𝐿 • for any 𝑆 ⊆ 𝑇 ⊆ 𝑉, 𝑣 ∈ 𝑉 ∖ 𝑇, • if 𝑢 is reachable from 𝑣 𝑆 𝑇 but not from 𝑇, then • 𝑢 is reachable from 𝑣 but 𝑣 not from 𝑆 • Hence, 𝑅 ⋅, 𝐺 𝐿 is 𝑢 submodular • Therefore, influence marginal contribution spread 𝜎 𝑆 is of 𝑣 w.r.t. 𝑇 submodular in both IC and LT models 24
  • 25. General threshold model • Each node 𝑣 has a threshold function 𝑓𝑣 : 2 𝑉 → [0,1] • Each node 𝑣 selects a threshold 𝜃 𝑣 ∈ [0,1] uniformly at random • If the set of active nodes at the end of step 𝑡 − 1 is 𝑆, and 𝑓𝑣 𝑆 ≥ 𝜃 𝑣 , 𝑣 is activated at step 𝑡 • reward function 𝑟(𝐴(𝑆)): if 𝐴(𝑆) is the final set of active nodes given seed set 𝑆, 𝑟(𝐴(𝑆)) is the reward from this set • generalized influence spread: 𝜎 𝑆 = 𝐸[𝑟 𝐴 𝑆 ] [Kempe, Kleinberg and Tardos, KDD 2003] 25
  • 26. IC and LT as special cases of general threshold model • LT model • 𝑓𝑣 𝑆 = 𝑢∈𝑆 𝑤(𝑢, 𝑣) • 𝑟(𝑆) = |𝑆| • IC model • 𝑓𝑣 𝑆 = 1 − 𝑢∈𝑆(1 − 𝑝 𝑢, 𝑣 ) • 𝑟(𝑆) = |𝑆| 26
  • 27. Submodularity in the general threshold model  Theorem [Mossel & Roch STOC 2007]: ◦ In the general threshold model, ◦ if for every 𝑣 ∈ 𝑉, 𝑓𝑣 (⋅) is monotone and submodular with 𝑓𝑣 ∅ = 0, ◦ and the reward function 𝑟(⋅) is monotone and submodular, ◦ then the general influence spread function 𝜎 ⋅ is monotone and submodular.  Local submodularity implies global submodularity 27
  • 28. The greedy approximation algorithm for the IM problem 1 • 1− − 𝜀 -approximation greedy algorithm 𝑒 • Use general greedy algorithm framework • However, need evaluation of influence spread 𝜎 𝑆 , which is shown to be #P-hard • Use Monte Carlo simulations to estimate 𝜎 𝑆 to arbitrary accuracy with high probability 1 • For any 𝜀 > 0, there exists 𝛾 > 0, s.t. 1 − − 𝜀 - 𝑒 approximation can be achieved with 1 − 𝛾 approximate values of 𝜎 𝑆 28
  • 29. Performance of greedy algorithm weighted cascade model linear threshold model • coauthorship graph from arxiv.org, high energy physics section, n=10748, m53000 • allow parallel edges, 𝑐 𝑢,𝑣 = number of edges between u and v 1 𝑐 𝑢,𝑣 • weighted cascade: 𝑝 𝑢, 𝑣 = 1 − 1 − 𝑑𝑣 • linear threshold: 𝑤 𝑢, 𝑣 = 𝑐 𝑢,𝑣 /𝑑 𝑣 29 [Kempe, Kleinberg and Tardos, KDD 2013]
  • 30. Key takeaways for theory support  submodularity of diffusion models ◦ diminishing return ◦ shared by many models, but some model extensions may not be submodular (see Part IV)  submodular function maximization ◦ greedy hill-climbing algorithm ◦ approximation ratio (1 − 1/𝑒) 30
  • 32. Theory versus Practice ...the trouble about arguments is, they ain't nothing but theories, after all, and theories don't prove nothing, they only give you a place to rest on, a spell, when you are tuckered out butting around and around trying to find out something there ain't no way to find out... There's another trouble about theories: there's always a hole in them somewheres, sure, if you look close enough. - “Tom Sawyer Abroad”, Mark Twain 32
  • 33. Inefficiency of greedy algorithm  It take days to find 50 seeds for a 30K node graph  Too many evaluations of influence spread ◦ Given a graph of 𝑛 nodes, each round needs evaluation of 𝑛 influence spreads, totally 𝑂(𝑛𝑘) evaluations  Each evaluation of influence spread is hard ◦ Exact computation is #P-hard, for both IC and LT models [Chen et al. KDD/ICDM 2010] ◦ Monte-Carlo simulation is very slow 33
  • 34. Scalable Influence Maximization  Reduction on the number of influence spread evaluations.  Batch computation of influence spread  Scalable heuristics for influence spread computation 34
  • 35. Lazy forward optimization  Exploiting submodularity, significantly reduce # influence spread evaluations ◦ 𝑆 𝑡−1 − seed set selected after round 𝑡 − 1 ◦ 𝑣 𝑡 − selected as seed in round 𝑡: 𝑆 𝑡 = 𝑆 𝑡−1 ∪ {𝑣 𝑡 } ◦ 𝑢 is not a seed yet, 𝑢’s marginal gain 𝑀𝐺 𝑢 𝑆 𝑡 = 𝜎(𝑆 𝑡 ∪ {𝑢}) − 𝜎(𝑆 𝑡 ) ◦ by submodularlity, 𝑀𝐺 𝑢 𝑆 𝑡 ≤ 𝑀𝐺(𝑢|𝑆 𝑡−1 ) ◦ This implies, if 𝑀𝐺 𝑢 𝑆 𝑡−1 ≤ 𝑀𝐺(𝑣|𝑆 𝑡 ) for some node 𝑣, then no need to evaluate 𝑀𝐺 𝑢 𝑆 𝑡 in round 𝑡 + 1. ◦ Can be implemented efficiently using max-heap  take the top of the heap, if it has 𝑀𝐺 for the current round, then it is the new seed;  else compute its 𝑀𝐺, and re-heapify  Often, top element after round 𝑘 − 1 remains top in round 𝑘.  Up to 700 X speedup 35 [Leskovec et al., KDD 2007]
  • 36. CELF++  With each heap node 𝑢, further maintain ◦ 𝑢. 𝑚𝑔1 = 𝑀𝐺(𝑢|𝑆), 𝑆 = current seed set. ◦ 𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 = node with max. MG in the current iteration, among nodes seen before 𝑢; ◦ 𝑢. 𝑚𝑔2 = 𝑀𝐺(𝑢|𝑆 ∪ {𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}); ◦ 𝑢. 𝑓𝑙𝑎𝑔 = iteration where 𝑢. 𝑚𝑔1 was last updated;  𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 chosen in current iteration  no need to compute 𝑀𝐺(𝑢|𝑆 ∪ {𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}) in next iteration.  𝑀𝐺(𝑢|𝑆 ∪ {𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡}) can be computed efficiently along with 𝑀𝐺(𝑢|𝑆) in one run of MC. 36 [Goyal, Lu and L., WWW 2011]
  • 37. CELF++ (contd.) • Pick the top node 𝑢 in the heap • 𝑢. 𝑓𝑙𝑎𝑔 = |𝑆|  𝑢. mg1 is correct 𝑀𝐺, thus 𝑢 is the best pick of current itn; add 𝑢 to 𝑆; remove from 𝑄; • else, if 𝑢. 𝑝𝑟𝑒𝑣_𝑏𝑒𝑠𝑡 = 𝑙𝑎𝑠𝑡_𝑠𝑒𝑒𝑑 & 𝑢. 𝑓𝑙𝑎𝑔 = |𝑆| − 1 𝑢. 𝑚𝑔1  𝑢. 𝑚𝑔2 (no need to compute 𝑀𝐺(𝑢|𝑆 + 𝑙𝑎𝑠𝑡_𝑠𝑒𝑒𝑑)) 37
  • 38. Batch computation of influence spread  Based on equivalence to reachable set in live-edge graphs ◦ Generate a random live-edge graph ◦ batch computation of size of reachable set of all nodes [Cohen, JCSS 1997] ◦ repeat above to increase accuracy  In conflict with lazy-forward optimization ◦ MixedGreedy: In the first round uses reachability- based batch computation; in remaining rounds, use lazy forward optimization 38 [Chen, Wang and Yang, KDD 2009]
  • 39. Scalable heuristics for influence spread computation  MIA: for IC model [Chen, Wang and Wang, KDD 2010]  LDAG: for LT model [Chen, Yuan and Zhang, ICDM 2010]  Features: ◦ Focus on a local region ◦ Find a graph spanner allow efficient computation ◦ Batch update of marginal influence spread 39
  • 40. Maximum Influence Arborescence (MIA) Heuristic  Local influence regions of node 𝑣 ◦ For every nodes 𝑢, find the maximum influence 𝑢 path (MIP) from 𝑢 to 𝑣, ignore it if Pr(𝑝𝑎𝑡ℎ) < 𝜆 (𝜆 is a small threshold value) ◦ all MIPs to 𝑣 form its 0.3 maximum influence in- 0.1 𝑣 arborescence (MIIA) ◦ MIIA can be efficiently computed ◦ influence to 𝑣 is computed over MIIA 40
  • 41. MIA Heuristic III: Computing Influence through the MIA structure  Recursive computation of activation probability 𝑎𝑝(𝑢) of a node 𝑢 in its in-arborescence, given a seed set 𝑆 (𝑁 𝑖𝑛 (𝑢) is the in-neighbors of 𝑢 in its in-arborescence) 41
  • 42. MIA Heuristic IV: Efficient updates on incremental activation probabilities 𝑀𝐼𝐼𝐴(𝑣)  𝑢 is the new seed in 𝑀𝐼𝐼𝐴(𝑣)  Naive update: for each candidate 𝑢 𝑤, redo the computation in the previous page to compute 𝑤’s marginal influence to 𝑣 𝑤 2 ◦ 𝑂( 𝑀𝐼𝐼𝐴 𝑣 )  Fast update: based on linear relationship of activation probabilities between any node 𝑤 and root 𝑣, update marginal influence of all 𝑤’s to 𝑣 in two 𝑣 passes ◦ 𝑂(|𝑀𝐼𝐼𝐴(𝑣)|) 42
  • 43. Experiment Results on MIA heuristic Influence spread vs. seed set running time size 103 times speed up close to Greedy, 49% better than Degree, 15% better than DegreeDiscount Experiment setup: • 35k nodes from coauthorship graph in physics archive • influence probability to a node 𝑣 = 1 / (# of neighbors of 𝑣) • running time is for selecting 50 seeds 43
  • 44. Time for some simplicity 44
  • 45. The SimPath Algorithm Vertex Cover Look ahead Optimization optimization Improves the efficiency Improves the efficiency in the first iteration in the subsequent iterations In lazy forward manner, in each iteration, add to the seed set, the node providing the maximum marginal gain in spread. [Goyal, Lu, & L. ICDM 2011] Simpath-Spread 45 Compute marginal gain by enumerating simple paths
  • 46. Estimating Spread in SimPath (2)  Thus, the spread of a node can be computed by enumerating simple paths starting from the node. 0.4 Influence of x on z x z Influence of x on y 0.3 0.2 Influence of x on x itself 0.1 0.5 y Total influence of node x is 1.96 Influence Spread of node x is = 1 + (0.3 + 0.4 * 0.5) + (0.4 + 0.3 * 0.2) = 1.96 46
  • 47. Estimating Spread in SimPath (3) Total influence of the seed set {x, y} is 2.6 Influence of node y in a 0.4 subgraph that does not x z 0.2 contain x 0.3 0.1 0.5 Influence of node x in a y subgraph that does not contain y Let the seed set S = {x,y}, then influence spread of S is  (S )   V  y ( x)   V  x ( y)  1  0.4  1  0.2  2.6 47
  • 48. Estimating Spread in SimPath (4) Enumerating all simple paths is #P hard Thus, influence can be estimated by enumerating all simple paths On slightly starting from the seed set. different subgraphs The majority of influence flows in a 48 small neighborhood.
  • 49. Look Ahead Optimization (1/2)  As the seed set grows, the time spent in estimating spread increases. ◦ More paths to enumerate.  A lot of paths are repeated though.  The optimization avoids this repetition intelligently.  A look ahead parameter ‘l’. 49
  • 50. Look Ahead Optimization (2/2) l = 2 here Seed Set Si after iteration i Let y and x be prospective seeds from CELF queue y .... x A lot of paths are enumerated repeatedly 1. Compute spread achieved by S+y 2. Compute spread achieved by S+x 50
  • 52. Other means of speedup  Community-based approach [Wang et al. KDD 2010]  Network sparsification [Mathioudakis et al, KDD 2011]  Simulated Annealing [Jiang et al. AAAI 2011] 52
  • 53. Community-based influence maximization  Use influence parameters to partition graph into communities ◦ different from pure structure-based partition  Influence maximization within each community  Dynamic programming for selecting top seeds  Orthogonal with scalable influence computation algorithms 53 [Wang et al. KDD 2010]
  • 54. Sparsification of influence networks  Remove less important edges for influence propagation  Use action traces to guide graph trimming  Orthogonal to scalable influence computation algorithms 54 [Mathioudakis et al, KDD 2011]
  • 55. Simulated annealing  Following simulated annealing framework ◦ not going through all nodes to find a next seed as in greedy ◦ find a random replacement (guided by some heuristics)  Orthogonal to scalable influence computation algorithms 55
  • 56. Key takeaways for scalable influence maximization  tackle from multiple angles ◦ reduce #spread evaluations: lazy-forward ◦ scale up spread computation:  use local influence region (LDAG, MIA, SimPath)  use efficient graph structure (trees, DAGs)  model specific optimizations (simple path enumerations) ◦ graph sparsifications, community partition, etc.  upshot: can scale up to graphs with millions of nodes and edges, on a single machine 56