A Novel Target Marketing Approach based on Influence Maximization

Motivation
• “Businesses on Facebook and Twitter are reaching only 2% of
their fans and only 0.07% of follower actually interact with
their post.” – Forrester Study, Nov. 17, 2014
• Local business owners need to target people nearby in order to
increase footfall
• Traditional methods of marketing like leafleting are inefficient
• “82% people check review online before spending money on
product/service” – Nielsen Study, July 1, 2013
• Local businesses can use online review websites like Yelp,
Zomato to target customers effectively.

Problem Statement
• “To develop a novel approach for identifying influential customers for target
marketing through influence maximization.”
Objectives
Fig. 1

Influence Maximization
• It is the problem of finding K vertices in a graph such that, under a given diffusion model, the
expected number of vertices influenced by those K vertices (referred to as the influence spread) is
as large as possible
• The Independent Cascade (IC) model is the simplest diffusion model. If j is a neighbor of i
then the probability of j being activated by i is:
Eq. 5
[Diagram: an edge from node i to node j, labeled pij and wij]
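
The Independent Cascade process described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the graph layout (a dict mapping each node to `{neighbor: probability}`), the function names and the toy graph are all assumptions made for the example.

```python
import random

def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade diffusion and return the activated nodes.

    graph maps each node i to a dict {j: p_ij} of propagation probabilities
    (an assumed layout, chosen for this sketch).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for i in frontier:
            for j, p_ij in graph.get(i, {}).items():
                # A newly activated node i gets one chance to activate j.
                if j not in active and rng.random() < p_ij:
                    active.add(j)
                    next_frontier.append(j)
        frontier = next_frontier
    return active

def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Monte-Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs

# Toy graph with deterministic edges so the result is easy to check by hand.
toy_graph = {
    "a": {"b": 1.0, "c": 0.0},
    "b": {"d": 1.0},
    "c": {},
    "d": {},
}
print(estimate_spread(toy_graph, ["a"], runs=10))
```

In the toy graph, a always activates b, b always activates d, and c is never reached, so every run activates exactly a, b and d.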

Existing Work
• Kempe et al. were the first to study influence maximization as an optimization problem
• They proved it to be NP-hard and gave a Greedy algorithm that is slow in practice
• GeneralGreedy repeats k rounds: in the ith round, select a node v that provides the largest
increase in influence spread
• In each round influence spread is calculated by Monte-Carlo simulations.
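
The round-by-round selection above can be sketched as follows. This is a simplified illustration under assumptions: the function names, the dict-of-dicts graph layout, the number of simulations and the toy graph are hypothetical, and no optimization is attempted.

```python
import random

def ic_spread(graph, seeds, runs, rng):
    """Average number of nodes activated over `runs` Independent Cascade runs."""
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            i = frontier.pop()
            for j, p in graph.get(i, {}).items():
                # Each activated node tries each outgoing edge exactly once.
                if j not in active and rng.random() < p:
                    active.add(j)
                    frontier.append(j)
        total += len(active)
    return total / runs

def general_greedy(graph, k, runs=200, seed=0):
    """In each of k rounds, add the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {j for nbrs in graph.values() for j in nbrs}
    seeds = []
    for _ in range(min(k, len(nodes))):
        best_node, best_spread = None, -1.0
        for v in sorted(nodes - set(seeds)):
            spread = ic_spread(graph, seeds + [v], runs, rng)
            if spread > best_spread:
                best_node, best_spread = v, spread
        seeds.append(best_node)
    return seeds

# Two hubs with deterministic edges: h1 reaches 3 nodes, h2 reaches 2.
toy = {
    "h1": {"a": 1.0, "b": 1.0, "c": 1.0},
    "h2": {"d": 1.0, "e": 1.0},
}
print(general_greedy(toy, 2, runs=5))
```

Every candidate is re-simulated in every round, which is why the method is slow on large graphs.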

Cont’d
• Chen et al. developed NewGreedyIC, an improved Greedy algorithm
• NewGreedyIC also runs Monte-Carlo simulations, but in each iteration it generates a random
graph G’ by randomly removing edges from the existing graph G. The graph processed in that
iteration is therefore smaller, which makes the method faster than GeneralGreedy
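
The edge-removal idea can be illustrated with a short sketch. It assumes each edge is kept with its propagation probability and removed otherwise, and it shows only the sampling step, not the full NewGreedyIC algorithm; all names and the toy graph are hypothetical.

```python
import random
from collections import deque

def sample_live_graph(graph, rng):
    """Build G' by keeping each edge (i, j) with probability p_ij."""
    return {i: [j for j, p in nbrs.items() if rng.random() < p]
            for i, nbrs in graph.items()}

def reachable(live, seeds):
    """Nodes reachable from the seeds in the sampled graph (breadth-first search)."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        i = queue.popleft()
        for j in live.get(i, []):
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def spread_via_live_graphs(graph, seeds, samples=200, seed=0):
    """Estimate spread as the average reachable-set size over sampled graphs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        total += len(reachable(sample_live_graph(graph, rng), seeds))
    return total / samples

toy = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}}
print(spread_via_live_graphs(toy, ["a"], samples=10))
```

Once G’ is sampled, the spread of a seed set is just the number of nodes reachable from it, so no per-edge coin flips are needed during the traversal.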

Cont’d
• Chen et al. also proposed a more efficient DegreeDiscount method
• The DegreeDiscount method doesn’t run Monte-Carlo simulations; it uses a degree discount
heuristic, which assumes that the spread increases with the degree of a node.
• It discounts the degree of a node by one if any of its neighbors has already been
selected into the set of active nodes.
• It is 6 times faster than NewGreedyIC and gives an influence spread slightly lower than
NewGreedyIC.
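[Figure: degree discount example around node A, with neighbor degrees 3, 5, 6 in the first panel and 2, 4, 5 in the second]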
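
The discount-by-one rule described above can be sketched as below. The sketch assumes the discount is applied once per selected neighbor; Chen et al. call that simple rule SingleDiscount, and their DegreeDiscountIC uses a finer discount formula that is not shown here. The function name and the toy network are hypothetical.

```python
def discount_by_one(adj, k):
    """Pick k seeds by degree, lowering a node's degree by one per selected neighbor.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    seeds = []
    for _ in range(min(k, len(adj))):
        candidates = sorted(v for v in adj if v not in seeds)
        # max() returns the first maximal element, so ties break alphabetically.
        best = max(candidates, key=degree.__getitem__)
        seeds.append(best)
        for u in adj[best]:
            if u not in seeds:
                degree[u] -= 1  # u already has a selected neighbor
    return seeds

# A and B share neighbors; F sits in a separate part of the network.
adj = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
    "F": {"G", "H", "I"},
    "G": {"F"},
    "H": {"F"},
    "I": {"F"},
}
print(discount_by_one(adj, 2))
```

B and F both start with degree 3, but once A is selected B's degree is discounted to 2, so F is chosen as the second seed.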

Inspiration from existing work
• The DegreeDiscount method eliminates the need for Monte-Carlo simulations by using a degree
heuristic.
• This reduces the running time many times over compared to NewGreedy.

Research Gap
• DegreeDiscount doesn’t take into account the overlap between the spreads of two influential nodes
• Because of this overlap, the total influence spread is less than the sum of their individual influence spreads
• Our novel algorithm adds as the kth node the node that maximizes the difference between the spread of the already selected k-1 nodes and that of the k nodes after the addition
• In the figure, C-A has a larger difference in spread than B-A.
[Figure: overlapping spreads of nodes A, B and C]
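
The overlap argument can be written out explicitly. The notation below is an assumption introduced for illustration: R(v) denotes the set of nodes influenced by v, R(S) the set influenced by a seed set S, and Δ the gain from adding one node.

```latex
% Two spreads that overlap add up to less than the sum of their sizes:
|R(A) \cup R(B)| = |R(A)| + |R(B)| - |R(A) \cap R(B)| \le |R(A)| + |R(B)|

% Gain from adding node v to the already selected set S:
\Delta(v \mid S) = |R(S \cup \{v\})| - |R(S)|
```

A node whose spread lies mostly inside R(S) has a small gain even if its individual spread is large, which is the case the selection rule is meant to penalize.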

Our Approach: ANIM (A Novel Influence Maximization approach)
Fig. 5

Yelp Dataset Description
Name      Attributes
User      {user_id, name, review_count, average_stars, friends, yelping_since, elite}
Business  {business_id, name, review_count, stars, address, latitude, longitude, categories}
Review    {user_id, business_id, text, stars, votes}

Entity                         Count
Users                          252,898
User-user edges (friendship)   955,999
Businesses                     42,153
Reviews                        1,125,458

Data and Preprocessing
• The semi-structured data obtained from Yelp is stored in a document-oriented database.
• Preprocessing is done to clean the data.
• The social network is formed from users who have reviewed similar nearby businesses.
• Users are represented as nodes in the network, and two nodes are joined by an edge only if
they are friends.
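
The network construction above might look like the following sketch. It assumes user and review records are dicts carrying the `user_id`, `friends` and `business_id` fields from the dataset description; the function names and the sample records are hypothetical, and choosing which businesses count as similar and nearby is left to an upstream step.

```python
def reviewers_of(reviews, business_ids):
    """Users who reviewed at least one of the given businesses."""
    wanted = set(business_ids)
    return {r["user_id"] for r in reviews if r["business_id"] in wanted}

def build_friend_graph(users, selected_ids):
    """Undirected friendship graph restricted to the selected users."""
    selected = set(selected_ids)
    adj = {uid: set() for uid in selected}
    for user in users:
        uid = user["user_id"]
        if uid not in selected:
            continue
        for friend in user.get("friends", []):
            # Join two nodes only if they are friends and both are selected.
            if friend in selected and friend != uid:
                adj[uid].add(friend)
                adj[friend].add(uid)
    return adj

users = [
    {"user_id": "u1", "friends": ["u2", "u3"]},
    {"user_id": "u2", "friends": ["u1"]},
    {"user_id": "u3", "friends": ["u1"]},
]
reviews = [
    {"user_id": "u1", "business_id": "b1", "stars": 5},
    {"user_id": "u2", "business_id": "b2", "stars": 2},
    {"user_id": "u3", "business_id": "b9", "stars": 4},
]
nearby = reviewers_of(reviews, ["b1", "b2"])
print(build_friend_graph(users, nearby))
```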

Edge weight calculation in network
• The weight of an edge between two users X and Y is calculated from two components, w1 and w2.
• w1 is the normalized count of mutual friends between X and Y,
where nx and ny are the lists of friends of user X and user Y.
• w2 signifies the similarity in opinion of user X and user Y.
• Xpos is the set of businesses that X rated positively; Xneg is the set of businesses that X rated
negatively.
• We consider a rating of 3 or below a negative review, and a rating of 4 or above a positive
review.
Equations: Eq. 7, Eq. 8, Eq. 9
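
The two weight components can be illustrated with a hedged sketch. The slide's equations are not reproduced in this text, so every formula below is an assumption: w1 is taken as a Jaccard ratio of the friend lists, w2 as the share of commonly rated businesses on which both users agree, and the final weight as a plain average. Only the rating thresholds (3 or below negative, 4 or above positive) come from the text above.

```python
def mutual_friend_weight(n_x, n_y):
    """w1: mutual friends divided by the union of both friend lists (assumed Jaccard)."""
    n_x, n_y = set(n_x), set(n_y)
    union = n_x | n_y
    return len(n_x & n_y) / len(union) if union else 0.0

def split_by_rating(ratings):
    """Split {business_id: stars} into (positive, negative) sets."""
    pos = {b for b, s in ratings.items() if s >= 4}   # 4 or above: positive
    neg = {b for b, s in ratings.items() if s <= 3}   # 3 or below: negative
    return pos, neg

def opinion_similarity(ratings_x, ratings_y):
    """w2: fraction of commonly rated businesses where X and Y agree (assumed form)."""
    x_pos, x_neg = split_by_rating(ratings_x)
    y_pos, y_neg = split_by_rating(ratings_y)
    common = (x_pos | x_neg) & (y_pos | y_neg)
    if not common:
        return 0.0
    agree = (x_pos & y_pos) | (x_neg & y_neg)
    return len(agree) / len(common)

def edge_weight(w1, w2):
    """Combine both components; a plain average is an assumption."""
    return (w1 + w2) / 2

w1 = mutual_friend_weight(["a", "b", "c"], ["b", "c", "d"])
w2 = opinion_similarity({"b1": 5, "b2": 2, "b3": 4},
                        {"b1": 4, "b2": 5, "b3": 5, "b4": 1})
print(w1, w2, edge_weight(w1, w2))
```

In the example the users share 2 of 4 distinct friends and agree on 2 of the 3 businesses they both rated.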

Propagation probability calculation in network
• The propagation probability of an edge going from u to v is calculated from the quantities below.
• The strength of an edge between u and v is the average of the influence of u and the influence of v.
• For popularity we used two attributes of the user, reviewCount and averageStars.
• The clustering value is defined as the closeness of a node to a cluster of highly interconnected
nodes; C(v) denotes the clustering value of a node v.
• Quartiles were used for normalization.
Equations: Eq. 10, Eq. 11, Eq. 15, Eq. 16, Eq. 17
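
Two of the quantities above can be sketched in code. The formula for C(v) is not reproduced in this text, so the sketch assumes the standard local clustering coefficient (the fraction of pairs of a node's neighbors that are themselves connected); the edge strength follows the averaging rule stated above. The function names, the influence values and the toy graph are hypothetical.

```python
def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are directly connected.

    Assumed stand-in for C(v); adj maps each node to its set of neighbors.
    """
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count each unordered neighbor pair once via the a < b check.
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2 * links / (k * (k - 1))

def edge_strength(influence, u, v):
    """Strength of edge (u, v): the average of the two endpoints' influence."""
    return (influence[u] + influence[v]) / 2

# Triangle a-b-c with an extra node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(local_clustering(adj, "a"), local_clustering(adj, "b"))
print(edge_strength({"a": 0.5, "b": 0.25}, "a", "b"))
```

Node a has three neighbors of which only the pair (b, c) is connected, giving 1/3, while both neighbors of b are connected to each other, giving 1.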

Our novel approach: spreadHeuristicIC Algorithm
• The proposed algorithm is a greedy algorithm.
• It iteratively finds a node and adds it to the set S of top-K influential nodes.
• While adding the kth node to set S, it finds the node that maximizes the difference between the
spread of the already selected k-1 nodes and the spread of set S after adding that kth node.
[Figure: nodes A, B and C]
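
The selection rule above can be illustrated with a small sketch. It is not the presenters' spreadHeuristicIC listing: it assumes the coverage (set of influenced nodes) of every node has already been computed and is passed in as a dict, and the function name and the example sets are hypothetical.

```python
def select_by_marginal_coverage(coverage, k):
    """Greedily pick k nodes, each time maximizing the newly covered nodes.

    coverage maps each candidate node to the set of nodes it influences.
    Returns the chosen seeds and the union of their coverage.
    """
    covered = set()
    seeds = []
    for _ in range(min(k, len(coverage))):
        best, best_gain = None, -1
        for v in sorted(coverage):
            if v in seeds:
                continue
            # Difference between the spread with v added and the current spread.
            gain = len(covered | coverage[v]) - len(covered)
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
        covered |= coverage[best]
    return seeds, covered

# B has a larger individual spread than C, but it overlaps heavily with A.
coverage = {
    "A": {1, 2, 3, 4, 5, 6},
    "B": {4, 5, 6, 7, 8},
    "C": {9, 10, 11},
}
seeds, covered = select_by_marginal_coverage(coverage, 2)
print(seeds, len(covered))
```

After A is selected, B would add only 2 new nodes while C adds 3, so C is chosen even though B's own spread is larger.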

Complexity Analysis
• The loop at line 3 takes O(V) steps, and line 4 takes O(T) time, where T is the time to
compute the coverage of a node in the graph G. T is O(IE), where I is the number
of simulations for the Independent Cascade model and E is the number of edges in graph G.
• For lines 7-9, the complexity of each line is O(V lg V) when sorting is used for the union operation.
• So the overall complexity of the algorithm is O(K(VIE + V lg V)).
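
The arithmetic behind the overall bound can be spelled out as follows, assuming the per-round costs stated above and K rounds of selection:

```latex
% Cost of one round: coverage for each of the V nodes, plus the union step
T_{\text{round}} = V \cdot O(IE) + O(V \lg V) = O(VIE + V \lg V)

% K rounds in total
T_{\text{total}} = K \cdot T_{\text{round}} = O\big(K(VIE + V \lg V)\big)
```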

Experiments and Results
• We conducted experiments with our algorithm and various other algorithms (e.g., the Degree
Discount, Single Discount and General Greedy algorithms) on Yelp’s network.
• We find that the Spread Heuristic based algorithm achieves a higher influence spread than
the other algorithms. The ranking based on influence spread comes out to be:
spreadHeuristicIC > newGreedyIC > degreeDiscountIC > random

Cont’d
Fig. 7 and Fig. 9: Influence spread (y-axis) versus K (x-axis) for G with n=1617, E=2058 and for G with n=4292, E=8147.
Algorithms in the legend: degreeDiscountIC, degreeDiscountIC2, degreeDiscountStar, degreeHeuristic, degreeHeuristic2, singleDiscount, highestDegree, newGreedyIC, randomHeuristic, spreadHeuristic.

Cont’d
Fig. 8 and Fig. 10: Running time in seconds (y-axis) versus K (x-axis) for spreadHeuristic on G with n=1617, E=2058 and on G with n=4292, E=8147.

Conclusion
• The project largely met its initial aims and objectives.
• In our experiments on Yelp’s network, our algorithm achieved a higher influence spread than the
existing influence maximization algorithms we compared it with.
• We developed a dashboard for businesses to visualize the influential users and their
spread among the people nearby.

References
[1] M.E.J. Newman, M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69 (2) (2004) 026113.
[2] Blondel, Vincent D., et al. "Fast unfolding of communities in large networks." Journal of Statistical Mechanics: Theory and Experiment 2008.10 (2008): P10008.
[3] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. S. Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 420–429, 2007.
[4] “Yelp Dataset,” https://www.yelp.com/dataset_challenge/dataset.
[5] D. Kempe, J. Kleinberg, E. Tardos. Maximizing the Spread of Influence through a Social Network. Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
[6] M. Richardson, P. Domingos. Mining Knowledge-Sharing Sites for Viral Marketing. Eighth Intl. Conf. on Knowledge Discovery and Data Mining, 2002.
[7] J. Goldenberg, B. Libai, E. Muller. Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth. Marketing Letters 12:3 (2001), 211-223.
[8] M. Granovetter. Threshold models of collective behavior. American Journal of Sociology, vol. 83, no. 6, pp. 1420-1443, May 1978.
[9] Chen, Wei, Yajun Wang, and Siyu Yang. "Efficient influence maximization in social networks." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
[10] Kempe, David, Jon Kleinberg, and Éva Tardos. "Maximizing the spread of influence through a social network." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.
[11] Wang, Yu, et al. "Community-based greedy algorithm for mining top-k influential nodes in mobile social networks." Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2010.
[12] Saito, Kazumi, Ryohei Nakano, and Masahiro Kimura. "Prediction of information diffusion probabilities for independent cascade model." Knowledge-Based Intelligent Information and Engineering Systems. Springer Berlin Heidelberg, 2008.
[13] Newman, Mark EJ. "Analysis of weighted networks." Physical Review E 70.5 (2004): 056131.
