Instagram is a growing platform for users to share media reflecting their interests, such as food, travel and fashion. It is also heavily used by marketers and influencers to reach potential audiences by advertising their content. The number of likes a post receives reflects the social reputation of the user, and in some cases social media influencers with a large reach are compensated by marketers to promote products. This has led users to artificially bolster the likes they receive in order to project an inflated social worth. Our analysis of over 20,000 likes spanning 18,091 posts from 1,000 users reveals that fake engagement on Instagram has distinct features. In this study, we build an automated mechanism to detect fake likes on Instagram and to estimate the true reach of a user. We detect fake likes with a precision of 92%. We further validate the effectiveness of our model in an in-the-wild study, labelling 134,021 like instances and identifying 8,845 previously unknown fake likes.
6. Why Fake Likes?
- ‘Influencers’ are compensated on engagement: likes and comments
- Incentive to artificially inflate engagement metrics by purchasing likes, like markets or like back networks
- Inflated like counts fool potential brands or advertisers into hiring ‘unworthy’ Influencers
8. Core Thesis Question
- How do we automatically detect fraudulent likes on Instagram?
Organic Likes
- Likers who engage with content
- Genuine reach
Inorganic Likes
- Likers bought from marketplaces
- Artificial reach
- Understanding properties of genuine liking behaviour B : {b1, b2, …, bn}
- Reducing the effect of likes which do not match B
9. Thesis Outline
- Research Aim
- Data Collection
- Analysis of Fake Likes
- Machine Learning Classifier to Detect Fake Likes
- Estimating Reach of Users
- Conclusion
10. What is a Like Instance?
- Given a poster S whose post p has been liked by liker L,
we define a like instance as the tuple (L, p, S)
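The tuple (L, p, S) maps naturally onto a small immutable record. The sketch below is only an illustration: the field names and the example identifiers are assumptions, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikeInstance:
    """One like, following the slide's tuple (L, p, S)."""
    liker: str   # L: account that liked the post
    post: str    # p: identifier of the liked post
    poster: str  # S: account that published the post


# Hypothetical identifiers, for illustration only.
instance = LikeInstance(liker="liker_01", post="post_42", poster="poster_07")
```

Making the record frozen keeps it hashable, so like instances can be stored in sets or used as dictionary keys when attaching features or labels.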
11. Research Aim
- Find the features of liker L, post p and poster S that determine the probability of liker L genuinely liking that particular post p.
- Identify the true reach of a poster by determining the fake likes received on the posted content.
12. Possible Reasons for Genuine Liking
- Homepage: followees’ posts
- Explore: Instagram’s recommendations
- Likes of followees
13. Possible Reasons for Genuine Liking
Explore:
- Based on photos you liked
- Based on people you follow
- Similar to accounts you interact with
14. Possible Reasons For Genuine Liking
- Poster is a followee
- Poster is a followee of a followee
- Topical interests in common
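The three reasons above can be read as boolean checks on a follow graph and on per-account topic sets. The sketch below assumes a hypothetical representation (a dict mapping each account to the set of accounts it follows, and a dict of topic sets); the slides do not say how these were stored or computed.

```python
from typing import Dict, Set


def genuine_like_signals(
    liker: str,
    poster: str,
    follows: Dict[str, Set[str]],
    topics: Dict[str, Set[str]],
) -> Dict[str, bool]:
    """Evaluate the three listed reasons for one liker/poster pair."""
    followees = follows.get(liker, set())
    return {
        # Poster is a followee of the liker.
        "poster_is_followee": poster in followees,
        # Poster is followed by someone the liker follows.
        "poster_is_followee_of_followee": any(
            poster in follows.get(f, set()) for f in followees
        ),
        # Liker and poster share at least one topic.
        "topics_in_common": bool(
            topics.get(liker, set()) & topics.get(poster, set())
        ),
    }


# Toy data with hypothetical accounts.
follows = {"ana": {"bo"}, "bo": {"cy"}}
topics = {"ana": {"food"}, "cy": {"food", "travel"}}
signals = genuine_like_signals("ana", "cy", follows, topics)
```

In this toy graph "ana" does not follow "cy" directly but reaches her through "bo", and the two share the topic "food".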
15. How to get Fake Likes
- Marketplaces
- Like Back collusion networks
- Link Farming hashtags
- Bots
16. Architecture Diagram
[Diagram. Inputs: 1) Liker meta and last 18 posts, 2) Poster meta and last 18 posts, 3) Post meta. Training Data (Fake Likes and Other Likes) → Features → Machine Learning Model. Random unknown Likes → Features → Machine Learning Model → Fake / Not Fake. Remaining labels: α, 1 - α]
17. Data Collection: Fake Likes
[Pipeline diagram. Labels: Instagram Featured users; Snowball Sample to 1M; Honeypot; victim? / not victim?; Random sample of 500; Purchased Fake Likes; Fake Likes 1: Likes given by Honeypot victims; Fake Likes 2; Likes on videos with views = 0; Other Likes]
18. Data Collection: Fake Likes
- Honeypots to trap fake likers bought through a service
- If a user falls for the honeypot, we monitor their liking behaviour
20. Data Collection: Other Likes
[Data collection pipeline diagram repeated; the branch for users who are not honeypot victims leads through a random sample of 500 to Other Likes]
21. Data Collection: Other Likes
- Randomly sample 500 users from 1M users who are not
honeypot victims
        #Likes   #Posts   #Likers   #Posters
Fake    10,417   8,408    500       7,715
Other   11,810   11,644   500       7,631
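The sampling step behind these two classes can be sketched as follows. This is a minimal illustration that assumes the set of honeypot victims is already known; the function name, the fixed seed, and the toy user ids are hypothetical, and the thesis' actual sampling procedure is not given in the slides.

```python
import random
from typing import Iterable, List, Set, Tuple


def split_and_sample(
    users: Iterable[str],
    honeypot_victims: Set[str],
    k: int = 500,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split users by honeypot-victim status and sample up to k from each side."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    users = sorted(set(users))
    victims = [u for u in users if u in honeypot_victims]
    non_victims = [u for u in users if u not in honeypot_victims]
    fake_likers = rng.sample(victims, min(k, len(victims)))
    other_likers = rng.sample(non_victims, min(k, len(non_victims)))
    return fake_likers, other_likers


# Toy population standing in for the snowball sample.
population = [f"user_{i}" for i in range(2000)]
victim_set = set(population[:700])
fake_likers, other_likers = split_and_sample(population, victim_set)
```

Sampling the same number of likers from each side matches the 500 likers per class shown in the table.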
22. Thesis Outline
- Research Aim
- Data Collection
- Analysis of Fake Likes
- Machine Learning Classifier to Detect Fake Likes
- Estimating Reach of Users
- Conclusion
23. Understanding Fake Likes
- Hypotheses indicative of fake liking behaviour
- Validate with a two-sample Kolmogorov-Smirnov (KS) test
- Network effect:
- Liker is follower of poster
- Liker is follower of follower of poster
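The two-sample KS test compares the empirical distributions of one feature across the two classes; its statistic is the largest gap between the two empirical CDFs. The sketch below computes only that statistic in plain Python, on made-up values of a binary "liker is follower of poster" feature. The data are hypothetical, and the p-value is omitted; in practice a library routine such as `scipy.stats.ks_2samp` returns both.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Largest absolute difference between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The CDFs only change at observed values, so checking those is enough.
    for x in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical feature values: 1 if the liker follows the poster, else 0.
fake_likes = [0] * 9 + [1] * 1
other_likes = [0] * 4 + [1] * 6
d_follow = ks_statistic(fake_likes, other_likes)
```

A statistic near 0 means the feature looks the same in both classes; a large value suggests the feature separates fake from other likes.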
24. Liker is Follower of Poster
- Green edges: liker relationship
- Red edges: liker-follower relationship
- Other likes have a higher proportion of follower-likers
[Figure panels: Other Likes, Fake Likes]
28. Extracting Topics
- Bio, post text and post image
- Wikification and DenseCap for images
[Figure panels: Image topics, Post caption topics]
29. Interest Overlap
- A user will like a post if she shares topical interests with the post
- Affinity: non-commutative
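The slides do not define the affinity metric, so the sketch below does not reproduce it. It only illustrates what "non-commutative" means for a set-overlap measure, using a hypothetical stand-in (`directed_overlap`) next to the symmetric Jaccard distance; the topic sets are made up.

```python
from typing import Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Symmetric: 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def directed_overlap(source: Set[str], target: Set[str]) -> float:
    """Hypothetical asymmetric measure: share of source topics found in target."""
    if not source:
        return 0.0
    return len(source & target) / len(source)


# Made-up topic sets for one liker and one post.
liker_topics = {"food", "travel"}
post_topics = {"food", "travel", "fashion", "pets"}

jaccard_lp = jaccard_distance(liker_topics, post_topics)
jaccard_pl = jaccard_distance(post_topics, liker_topics)
overlap_lp = directed_overlap(liker_topics, post_topics)
overlap_pl = directed_overlap(post_topics, liker_topics)
```

Swapping the arguments leaves the Jaccard distance unchanged but changes the directed overlap, which is the property the slide calls non-commutative.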
30. Affinity
- Affinity outperforms Jaccard distance in terms of discernibility
- Post image topics are strong indicators of genuine liking
31. - Unlike traditional distance metrics, our metric captures semantic relationships between entities
- 90% of other likes have an average affinity of 0.5
- 90% of fake likes have an average affinity of 0.74
32. Other Features
- Celebrities tend to get more likes (engagement)
- Genuine likers will keep coming back - repeated likers
- Link Farming hashtags: #like4like, #l4l, #like2follow
- Topical hashtags
- Posting activity of liker (Badri et al, CIKM’16) and poster
- Profile picture of liker: egghead profiles (cheap to
create)
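As one example of turning these cues into features, the hashtag items can be counted directly from a caption. The sketch below uses the three link-farming tags named on the slide; treating every other hashtag as "topical" is a simplifying assumption, and the feature names are hypothetical.

```python
import re
from typing import Dict

# Link-farming tags listed on the slide (compared case-insensitively).
LINK_FARMING_TAGS = {"like4like", "l4l", "like2follow"}
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_features(caption: str) -> Dict[str, int]:
    """Count link-farming and remaining hashtags in a post caption."""
    tags = [t.lower() for t in HASHTAG_RE.findall(caption)]
    n_farming = sum(1 for t in tags if t in LINK_FARMING_TAGS)
    return {
        "n_hashtags": len(tags),
        "n_link_farming": n_farming,
        # Assumption: anything that is not a link-farming tag counts as topical.
        "n_topical": len(tags) - n_farming,
    }


features = hashtag_features("Sunset at the beach #travel #L4L #like4like")
```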
33. Automatic Detection of Fake Likes
- Using features described and a set of ML classifiers
- Fake likes : Other likes ratio → 1:2
- SVM RBF kernel gives best performance
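The slide gives the class ratio but not how it was obtained. One possible way to reach a 1:2 ratio is to subsample whichever class is over-represented; the sketch below does that, with a hypothetical function name, a fixed seed, and toy lists sized like the Fake and Other like counts from the data collection table.

```python
import random
from typing import List, Sequence, Tuple


def sample_to_ratio(
    fake: Sequence[int],
    other: Sequence[int],
    fake_parts: int = 1,
    other_parts: int = 2,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Subsample both classes so that fake : other == fake_parts : other_parts."""
    rng = random.Random(seed)
    # Largest unit size that both classes can support at the requested ratio.
    unit = min(len(fake) // fake_parts, len(other) // other_parts)
    return (
        rng.sample(list(fake), unit * fake_parts),
        rng.sample(list(other), unit * other_parts),
    )


# Toy stand-ins sized like the reported like counts (10,417 fake, 11,810 other).
fake_ids = list(range(10_417))
other_ids = list(range(11_810))
fake_train, other_train = sample_to_ratio(fake_ids, other_ids)
```

With these counts the other class is kept whole and the fake class is cut to half its size, which is only one of several ways the stated ratio could have been produced.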
34. Classification Model
- Performance:
  Class   Precision   Recall   F1-score
  0       0.93        0.96     0.945
  1       0.895       0.825    0.86
  total   0.92        0.925    0.92
- Manual inspection of 100 false negatives: 70 of them had high topical overlap
- Liker interest set was small: a limitation of the affinity metric
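The per-class F1-scores can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The sketch below recomputes them for classes 0 and 1; the "total" row is left out because the slides do not say how it was averaged.

```python
from typing import Dict, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per class, copied from the slide.
reported: Dict[str, Tuple[float, float, float]] = {
    "0": (0.93, 0.96, 0.945),
    "1": (0.895, 0.825, 0.86),
}

recomputed = {label: f1_score(p, r) for label, (p, r, _) in reported.items()}
```

Both recomputed values agree with the slide to within rounding.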
35. In the Wild Experiment
- Random sample of 134,669 like instances
- Categorize posts into: food, fashion, outdoors, merchandise, people, gadgets, pets, captioned
- We find 8,557 fake likes
- Manually analyze 100 of these and find 78 to be fake
36. Thesis Outline
- Research Aim
- Data Collection
- Analysis of Fake Likes
- Machine Learning Classifier to Detect Fake Likes
- Estimating Reach of Users
- Conclusion
37. Reach Estimation
- Enable advertisers to make better decisions
- Reduce the effect of fake likes a poster may have received
- Measure deviation in reach
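The slides do not give the formula for deviation in reach. A minimal sketch, assuming the projected reach is the raw like count, the estimated reach is the count after removing likes flagged as fake, and the deviation is the flagged fraction, could look like this. All three definitions and the key names are assumptions made for illustration.

```python
from typing import Dict, Sequence, Union


def reach_deviation(flagged_fake: Sequence[bool]) -> Dict[str, Union[int, float]]:
    """Summarise one poster's likes; True marks a like flagged as fake."""
    total = len(flagged_fake)
    if total == 0:
        return {"projected_reach": 0, "estimated_reach": 0, "deviation": 0.0}
    n_fake = sum(flagged_fake)
    return {
        "projected_reach": total,            # assumed: raw like count
        "estimated_reach": total - n_fake,   # assumed: likes not flagged as fake
        "deviation": n_fake / total,         # assumed: fraction flagged as fake
    }


# Hypothetical poster with 100 likes, 25 of them flagged by the classifier.
summary = reach_deviation([True] * 25 + [False] * 75)
```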
38. Who receives fake likes?
- Users posting about merchandise, outdoors (including travel posts) and people (posts containing faces) have the highest deviation from the projected reach.
39. Who receives fake likes?
- Categories highlighted: merchandise, outdoors (including travel posts) and people
- Most posters do not have high deviation, while some users have very high deviation
40. Do Popular Users have more Fake Likes?
- No: users with lower follower counts, who may be trying to gain a following, have higher deviation
- ‘Micro Influencers’ have higher deviation
41. Conclusion
- Automated method to detect fake like instances
- Performs well at identifying unseen fake likes on Instagram
- Find true reach of a user
- Helps advertisers and brands identify users with genuine,
meaningful reach
42. Challenges, Limitations and Future Work
- Availability of labeled data; approximations using honeypot
- Data collection constraints; integrate network features
- Improve affinity; improve precision (dynamic features)
- Fine-grained topical recommendations for brands and advertisers
43. Acknowledgement
- Anupama Aggarwal, PhD Scholar, IIIT Delhi
- Committee members
- Srishti Gupta, Divyansh Agarwal, Neha Jawalkar, Sonu
Gupta, Kushagra Bhargava
- Siddharth Singh, Shiven Mian
- Members of Precog
- Family and friends