Detecting Fake Profiles On Online Matrimony
Vaibhav Garg
Dr. Ponnurangam Kumaraguru (Chair)
linkedin.com/in/vaibhav-garg-0a708899
facebook.com/in/vaibhav.garg.104203
@rk_check
Thesis Committee
◆ Dr. Arun Balaji Buduru, IIIT Delhi
◆ Dr. Siddhartha Asthana, United Health Group (Optum)
◆ Dr. Ponnurangam Kumaraguru, IIIT Delhi
Core Thesis Question
How can we automatically detect fake profiles on online matrimony platforms?
Demo
* Due to the company's privacy policy, we cannot give a demo on the company's actual portal.
Outline
◆ About Online Matrimony
◆ About the Data
◆ Characteristics of a fake profile
◆ Using only Behaviour Trends
◆ Using Behaviour, Edit and Profile Information
◆ Incorporating Community features
◆ Feature Engineering: Proposed Full length feature vector
◆ Final Results
◆ Conclusion
[Figure: user journey on an online matrimony portal: Register → Suggested → View Profile → Start Conversation]
About the Data
◆ To study the problem, we used data from India's leading matrimony website.
◆ Ground truth: 5,40,737 (540,737) genuine profiles and a much smaller number of fake profiles.
◆ Categorical attributes: age, body type, caste, city, country, education, height, income, manglik, marital status, mother tongue, occupation, religion.
Categorical Data
Attribute       | Number of Categories | Example Categories
Caste           | 470                  | Hindu: Arora, Hindu: Aggarwal, Hindu: Brahmin, etc.
Height          | 37                   | 5’0, 5’1, 5’2, 5’3, etc.
Income          | 25                   | Rs 0-1 Lakh, Rs 1-2 Lakh, etc.
Mother Tongue   | 42                   | Telugu, Bengali, Hindi-Delhi, etc.
Occupation      | 69                   | Doctor, Analyst, IT-Engineer, etc.
Religion        | 10                   | Hindu, Muslim, Christian, etc.
Body Type       | 4                    | Slim, Average, Athletic, Heavy
Country         | 214                  | India, Afghanistan, Australia, etc.
City            | 3683                 | Delhi, UP, Ahmedabad, etc.
Manglik         | 2                    | Manglik, Non-Manglik
Marital Status  | 4                    | Never Married, Divorcee, Separated, Widowed
Education       | 53                   | B.A, B.Com, B.Tech, etc.
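As a rough illustration of how these attributes could become one-hot profile features, the sketch below gives each attribute a contiguous slice of one vector, sized by the category counts in the table. The attribute keys, the integer category indices, and the helper names are hypothetical; age is left out because the table gives no category count for it.

```python
import numpy as np

# Category counts copied from the table above (age omitted: no count given).
CATEGORY_COUNTS = {
    "caste": 470, "height": 37, "income": 25, "mother_tongue": 42,
    "occupation": 69, "religion": 10, "body_type": 4, "country": 214,
    "city": 3683, "manglik": 2, "marital_status": 4, "education": 53,
}

def build_offsets(counts):
    """Assign each attribute a contiguous slice; return (offsets, total length)."""
    offsets, start = {}, 0
    for attr, n in counts.items():
        offsets[attr] = start
        start += n
    return offsets, start

def one_hot_profile(profile, counts=CATEGORY_COUNTS):
    """profile maps attribute -> integer category index in [0, count).
    The label-to-index mapping is assumed to exist elsewhere (hypothetical)."""
    offsets, dim = build_offsets(counts)
    vec = np.zeros(dim, dtype=np.float32)
    for attr, idx in profile.items():
        if not 0 <= idx < counts[attr]:
            raise ValueError(f"{attr} index {idx} out of range")
        vec[offsets[attr] + idx] = 1.0
    return vec

v = one_hot_profile({"religion": 0, "marital_status": 0, "city": 12})
print(v.shape, int(v.sum()))  # (4613,) 3
```

With all twelve listed attributes, the one-hot part alone has 4613 dimensions, dominated by the 3683 city categories.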
Behaviour Heterogeneity
[Figure: attribute categories (C1 to C8) to which a genuine profile and a fake profile send interests]
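One hedged way to put a number on behaviour heterogeneity is the Shannon entropy of the categories a user sends interests to. This is an illustrative measure of our own, not necessarily one used in the thesis, and the category labels in the example are made up.

```python
import math
from collections import Counter

def interest_entropy(target_categories):
    """Shannon entropy (bits) of the categories a user sent interests to.

    0 means every interest went to one category; larger values mean the
    interests are spread over many categories (more heterogeneous behaviour).
    """
    counts = Counter(target_categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

genuine = ["C1", "C2", "C3", "C1", "C2", "C3", "C1", "C2"]  # few categories
fake = ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8"]     # many categories
print(round(interest_entropy(genuine), 3), interest_entropy(fake))  # 1.561 3.0
```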
Inconsistent Edits
Edit Done After 4 Days of Registration
Profile Inconsistency
Behavioural Trend for Caste Attribute
Experiment on 100 fake and 100 genuine profiles belonging to the Aggarwal community
Behavioural Trend for Marital Status Attribute
Experiment on 100 fake and 100 genuine profiles belonging to the Never Married community
Static Windows
A user's activity during the first 8 days (Day 0 to Day 7) is split into 16 static windows of 12 hours each:
◆ 0th window: first 12 hours of Day 0
◆ 1st window: last 12 hours of Day 0
◆ … (Day 1 to Day 6)
◆ 14th window: first 12 hours of Day 7
◆ 15th window: last 12 hours of Day 7
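The static windowing can be sketched as follows, assuming windows are measured from the exact registration timestamp (rather than from calendar days) and that the events being counted are interests sent. `static_window_index` and `window_counts` are hypothetical helpers.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=12)  # width of one static window
NUM_WINDOWS = 16              # 8 days x 2 half-day windows (indices 0..15)

def static_window_index(registered_at, event_at):
    """Index (0..15) of the 12-hour window an event falls in, counted from the
    registration time; None if the event lies outside the first 8 days."""
    offset = event_at - registered_at
    if offset < timedelta(0):
        return None
    idx = offset // WINDOW  # timedelta // timedelta gives an int
    return idx if idx < NUM_WINDOWS else None

def window_counts(registered_at, event_times):
    """Number of events (e.g. interests sent) in each static window."""
    counts = [0] * NUM_WINDOWS
    for t in event_times:
        idx = static_window_index(registered_at, t)
        if idx is not None:
            counts[idx] += 1
    return counts

reg = datetime(2019, 1, 1, 9, 0)
events = [reg + timedelta(hours=h) for h in (1, 13, 30, 190, 200)]
print(window_counts(reg, events))
# [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

The event at hour 200 falls after the 8-day horizon and is dropped, which mirrors the drawback that a user must be 8 days old before this approach can score them.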
Static Windows and Feature Generation
Which Model to Choose?
Model Architecture
[Figure: model architecture, mapping the input features to the output]
Offline Results on Behaviour Features
Confusion Matrix | Predicted Fake | Predicted Clean
Actual Fake      | 2953           | 852
Actual Clean     | 168            | 17799
The results above are obtained on 3805 fake profiles and 17967 clean profiles.
Drawback: a user must have been on the portal for 8 days before this approach can evaluate them.
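Precision, recall, and accuracy follow directly from this confusion matrix, treating fake as the positive class. A minimal check:

```python
def metrics(tp, fn, fp, tn):
    """Precision, recall and accuracy for the positive (fake) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, accuracy

# Counts from the confusion matrix above.
p, r, a = metrics(tp=2953, fn=852, fp=168, tn=17799)
print(f"precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")
# precision=0.946 recall=0.776 accuracy=0.953
```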
LIVE Results: True Positives
LIVE Results: False Negatives
Edit and profile features need to be incorporated!
Outline
◆ About Online Matrimony
◆ About the Data
◆ Characteristics of a fake profile
◆ Using only Behaviour Trends
◆ Using Behavior, Edit and Profile Information
◆ Incorporating Community features
◆ Feature Engineering: Proposed Full length feature vector
◆ Final Results
◆ Conclusion
Edit Summary for Mother Tongue Attribute
Experiment on 100 fake and 100 genuine profiles that registered with the Hindi-UP category
Edit Summary for Income Attribute
Experiment on 100 fake and 100 genuine profiles that registered with the Rs 5-7.5 Lakh category
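Edit features store the proportion of time a profile spent in each category of an attribute. A minimal sketch, assuming an edit history of (timestamp, category) pairs whose first entry is the value set at registration; the helper name and the example history are hypothetical. Passing a window's start and end times gives per-window proportions.

```python
def time_in_category(edits, start, end):
    """Fraction of the interval [start, end) during which the attribute held
    each category.

    edits: (timestamp, category) pairs sorted by time; the first pair is the
    value set at registration (its timestamp should be <= start).
    """
    totals = {}
    for i, (t, cat) in enumerate(edits):
        seg_start = max(t, start)
        next_t = edits[i + 1][0] if i + 1 < len(edits) else end
        seg_end = min(next_t, end)
        if seg_end > seg_start:  # skip segments outside the interval
            totals[cat] = totals.get(cat, 0) + (seg_end - seg_start)
    span = end - start
    return {cat: dur / span for cat, dur in totals.items()}

# Hypothetical mother-tongue history, timestamps in hours since registration.
history = [(0, "Hindi-UP"), (30, "Hindi-Delhi"), (90, "Hindi-UP")]
print(time_in_category(history, 0, 100))
# {'Hindi-UP': 0.4, 'Hindi-Delhi': 0.6}
```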
Concept of Dynamic Windows
◆ User's active lifetime on the portal = T seconds
◆ User's total initiates = N
◆ Number of windows selected = W
0th window: time period of the first N/W initiates
1st window: time period of the next N/W initiates
…
Last window: time period of the last N/W initiates
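One consistent way to write these windows formally, assuming the N initiate timestamps are sorted as $t_1 \le \dots \le t_N$ (seconds since registration, with $N \ge W$), is to let each window end at its last initiate and stretch the final window to $T$, so that the window durations add up to the active lifetime. Where exactly each window starts and ends is our assumption, not taken from the slides.

```latex
\begin{aligned}
\mathcal{I}_w &= \left\{ k : \left\lfloor \tfrac{wN}{W} \right\rfloor < k \le \left\lfloor \tfrac{(w+1)N}{W} \right\rfloor \right\}, \qquad w = 0, \dots, W-1 \\
\tau_0 &= 0, \qquad \tau_w = t_{\lfloor wN/W \rfloor} \;\; (0 < w < W), \qquad \tau_W = T \\
\Delta_w &= \tau_{w+1} - \tau_w, \qquad \sum_{w=0}^{W-1} \Delta_w = \tau_W - \tau_0 = T
\end{aligned}
```

Here $\mathcal{I}_w$ is the set of initiates in window $w$ (about $N/W$ of them) and $\Delta_w$ is that window's duration.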
Feature Designing
◆ Profile Features: one-hot vector of the profile attributes
◆ Behaviour Features: in each dynamic time window, each feature stores the proportion of initiates sent to a particular category of an attribute
◆ Edit Features: in each dynamic time window, each feature stores the proportion of time the user's profile held a particular category of an attribute
◆ Other Raw Features: in each window, we also store the total interests sent and the duration of that window
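A sketch of the behaviour features and the two raw features (interests sent and window duration) for a single attribute. It assumes timestamps in seconds since registration, integer category indices, and windows that each close at their last initiate with the final window ending at T; the function name is hypothetical, and in practice one such block would be computed per attribute.

```python
import numpy as np

def dynamic_window_features(times, cats, num_cats, W, T):
    """Behaviour + raw features for ONE attribute over W dynamic windows.

    times: sorted initiate timestamps (seconds since registration), N >= W
    cats:  category index (0..num_cats-1) of each initiate's recipient
    T:     user's active lifetime on the portal, in seconds
    Returns W blocks of [category proportions..., initiates, duration].
    """
    times = np.asarray(times, dtype=float)
    cats = np.asarray(cats, dtype=int)
    n = len(times)
    if n < W or len(cats) != n:
        raise ValueError("need len(cats) == len(times) >= W")
    if cats.min() < 0 or cats.max() >= num_cats:
        raise ValueError("category index out of range")
    bounds = [(w * n) // W for w in range(W + 1)]  # about N/W initiates each
    blocks, prev_end = [], 0.0
    for w in range(W):
        lo, hi = bounds[w], bounds[w + 1]  # non-empty because n >= W
        counts = np.bincount(cats[lo:hi], minlength=num_cats).astype(float)
        props = counts / counts.sum()  # share of initiates per category
        end = T if w == W - 1 else times[hi - 1]  # window closes at last initiate
        blocks.append(np.concatenate([props, [counts.sum(), end - prev_end]]))
        prev_end = end
    return np.concatenate(blocks)

f = dynamic_window_features([10, 20, 30, 40, 50, 60], [0, 0, 1, 2, 2, 2],
                            num_cats=3, W=2, T=100)
print(f.reshape(2, -1))
```

In the example, the first window covers initiates 1 to 3 (two to category 0, one to category 1) over 30 seconds, and the second covers initiates 4 to 6 (all to category 2) over the remaining 70 seconds.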
Feature Designing
[Figure: feature vectors of the 0th window through the Nth window, concatenated (+) into one vector]
Experimenting with the Number of Dynamic Windows
No. of Windows | Precision | Recall | Accuracy
5              | 0.170     | 0.510  | 0.8830
4              | 0.192     | 0.635  | 0.8891
3              | 0.230     | 0.780  | 0.8977
2              | 0.242     | 0.804  | 0.8975
1              | 0.266     | 0.866  | 0.8972
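Feature Selection on Best Model
Method                          | Precision | Recall | Accuracy
Best Model                      | 0.266     | 0.866  | 0.8972
Best Model + Feature Selection  | 0.269     | 0.894  | 0.9083
Criterion used: $\dfrac{H_{\text{fake}} - H_{\text{clean}}}{H_{\text{fake}}}$, where $H_{\text{fake}}$ and $H_{\text{clean}}$ are the entropy for fake and for clean profiles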
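A minimal sketch of this selection criterion, assuming each feature is a proportion in [0, 1] whose entropy is measured over a 10-bin histogram, separately for fake and clean profiles, and that the features with the largest absolute criterion are kept. The binning, the ranking rule, and the zero score for features that are constant among fake profiles (where the ratio is undefined) are all assumptions.

```python
import numpy as np

def entropy(values, bins=10):
    """Shannon entropy (bits) of one feature's values, histogrammed on [0, 1]."""
    hist, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    if hist.sum() == 0:
        return 0.0
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def selection_scores(X_fake, X_clean, bins=10):
    """Per-feature criterion (H_fake - H_clean) / H_fake.
    Features that are constant among fake profiles (H_fake = 0) score 0."""
    scores = np.zeros(X_fake.shape[1])
    for j in range(X_fake.shape[1]):
        h_fake = entropy(X_fake[:, j], bins)
        h_clean = entropy(X_clean[:, j], bins)
        if h_fake > 0:
            scores[j] = (h_fake - h_clean) / h_fake
    return scores

def select_top_k(X_fake, X_clean, k, bins=10):
    """Indices of the k features with the largest |criterion| (assumed rule)."""
    scores = selection_scores(X_fake, X_clean, bins)
    return np.argsort(-np.abs(scores))[:k]
```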
Precision is still low!
Affinity Features along with Behaviour Features
◆ The affinity score between two categories i and j is the likelihood that a user with category i sends interests to a user with category j.
◆ When combined with behaviour features, affinity scores compare how a user is expected to behave with how he/she actually behaves on the platform.
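A sketch of how affinity scores and the expected-versus-actual comparison might be computed for one attribute. The add-alpha smoothing, the L1 distance, and the function names are hypothetical choices, not taken from the slides.

```python
import numpy as np

def affinity_matrix(sender_cats, receiver_cats, num_cats, alpha=1.0):
    """A[i, j]: estimated likelihood that a user in category i sends an
    interest to a user in category j, from platform-wide interest logs.
    alpha is add-alpha smoothing (an assumption) so unseen pairs are not zero."""
    counts = np.full((num_cats, num_cats), float(alpha))
    np.add.at(counts, (np.asarray(sender_cats), np.asarray(receiver_cats)), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def affinity_features(user_cat, user_receiver_cats, A):
    """Expected distribution (row of A for the user's own category), the
    user's actual distribution of interests, and the L1 distance between them."""
    num_cats = A.shape[0]
    actual = np.bincount(user_receiver_cats, minlength=num_cats) / len(user_receiver_cats)
    expected = A[user_cat]
    return np.concatenate([expected, actual, [np.abs(actual - expected).sum()]])

# Toy log: category-0 users always contact category 1, and vice versa.
A = affinity_matrix([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], num_cats=2, alpha=0.0)
print(affinity_features(0, [0, 0], A))  # this user behaves against expectation
```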
Affinity Features
Proposed Full Length Feature Vector
Profile Features + Behaviour Features in Time Windows + Affinity Features + Edit Features in Time Windows
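The concatenation itself is straightforward; the hypothetical helper below also records which slice of the final vector each feature group occupies, which can help when inspecting a model's inputs. The group names and toy values are illustrative only.

```python
import numpy as np

def full_feature_vector(groups):
    """groups: dict name -> 1-D array-like, in concatenation order.
    Returns (vector, slices) where slices[name] locates that group."""
    parts, slices, start = [], {}, 0
    for name, arr in groups.items():
        arr = np.ravel(np.asarray(arr, dtype=np.float32))
        slices[name] = slice(start, start + arr.size)
        start += arr.size
        parts.append(arr)
    return np.concatenate(parts), slices

vec, where = full_feature_vector({
    "profile": [1, 0, 0],      # one-hot profile attributes
    "behaviour": [0.5, 0.5],   # per-window behaviour proportions
    "affinity": [0.2],         # affinity features
    "edit": [1, 0],            # per-window edit proportions
})
print(vec.shape, where["edit"])
```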
Final Model Architecture
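The architecture diagram is not recoverable from this text, so the sketch below swaps in a linear autoencoder fitted in closed form: with linear units and squared-error loss, the optimal encoder and decoder project onto the top-k principal components, which an SVD finds directly. It assumes the model is fitted on clean profiles only and that a high reconstruction error marks a profile as likely fake; `LinearAutoencoder` and `k` are hypothetical, and the synthetic data stand in for real feature vectors.

```python
import numpy as np

class LinearAutoencoder:
    """Linear stand-in for an autoencoder, fitted in closed form with an SVD."""

    def __init__(self, k):
        self.k = k  # bottleneck size (hypothetical hyperparameter)

    def fit(self, X_clean):
        self.mean_ = X_clean.mean(axis=0)
        # Rows of vt are principal directions, ordered by explained variance.
        _, _, vt = np.linalg.svd(X_clean - self.mean_, full_matrices=False)
        self.components_ = vt[: self.k]  # shared encoder/decoder weights, (k, d)
        return self

    def reconstruct(self, X):
        z = (X - self.mean_) @ self.components_.T  # encode to k dimensions
        return z @ self.components_ + self.mean_   # decode back to d dimensions

    def score(self, X):
        """Squared reconstruction error per profile; higher = more anomalous."""
        return ((X - self.reconstruct(X)) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))
clean = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 10))
odd = rng.normal(size=(20, 10))  # profiles that do not fit the clean pattern
model = LinearAutoencoder(k=2).fit(clean)
print(model.score(clean).mean() < model.score(odd).mean())  # True
```

A non-linear autoencoder replaces the single projection with trained encoder and decoder networks, but the scoring idea (reconstruction error) stays the same.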
Final Results
Method                           | Precision | Recall | Accuracy
Proposed Features + Autoencoder  | 0.341     | 0.902  | 0.9176
The product team required 25% precision at 60% recall.
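To check a detector against a requirement such as 25% precision at 60% recall, one can sweep the score threshold. The hypothetical helper below picks, among all thresholds that reach the target recall, the one with the highest precision; it assumes higher scores mean more likely fake and that fake profiles are labelled 1. The scores and labels in the example are made up.

```python
import numpy as np

def threshold_for_recall(scores, labels, target_recall):
    """Return (threshold, precision, recall) for the cut-off that maximises
    precision subject to recall >= target_recall, flagging score >= threshold
    as fake (label 1). Returns None if the target recall is unreachable."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")
    s, y = scores[order], labels[order]
    tp = np.cumsum(y)                        # fakes caught in the top i+1
    flagged = np.arange(1, len(s) + 1)       # profiles flagged in the top i+1
    precision = tp / flagged
    recall = tp / y.sum()
    valid_cut = np.append(s[1:] != s[:-1], True)  # cut only between distinct scores
    ok = valid_cut & (recall >= target_recall)
    if not ok.any():
        return None
    best = np.flatnonzero(ok)[np.argmax(precision[ok])]
    return float(s[best]), float(precision[best]), float(recall[best])

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 1]
print(threshold_for_recall(scores, labels, 0.60))
# (0.7, 0.6666666666666666, 0.6666666666666666)
```

The requirement is met when the returned precision is at least 0.25.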
Conclusion
◆ We first studied the differences in behaviour, profile, and edit patterns between genuine and fake users.
◆ We incorporated these characteristics as features computed over dynamic time windows.
◆ We then trained an autoencoder model to detect fake profiles on online matrimony platforms.
Real World Impact
[Figure panels: Week 1, Week 2, Week 3, Week 4]
Limitations and Future Work
◆ More training samples for the autoencoder could improve generalisation.
◆ We detected fake profiles using categorical attributes only; text spamming can be explored in future work.
Acknowledgement
◆ Committee Members
◆ Hunny, Adhish from InfoEdge India Ltd.
◆ Members of Precog family
◆ Family and friends
Thanks!
vaibhav17064@iiitd.ac.in
50

Más contenido relacionado

La actualidad más candente

Face recognition ppt
Face recognition pptFace recognition ppt
Face recognition ppt
Santosh Kumar
 

La actualidad más candente (20)

1. introduction to cyber security
1. introduction to cyber security1. introduction to cyber security
1. introduction to cyber security
 
Cyber crime ppt new
Cyber crime ppt newCyber crime ppt new
Cyber crime ppt new
 
Introduction to cyber security
Introduction to cyber security Introduction to cyber security
Introduction to cyber security
 
Deep fake
Deep fakeDeep fake
Deep fake
 
Cyber safety and cyber security
Cyber safety and cyber securityCyber safety and cyber security
Cyber safety and cyber security
 
Cyber security
Cyber securityCyber security
Cyber security
 
Cyber security
Cyber securityCyber security
Cyber security
 
Face recognition ppt
Face recognition pptFace recognition ppt
Face recognition ppt
 
Introduction to Cybersecurity Fundamentals
Introduction to Cybersecurity FundamentalsIntroduction to Cybersecurity Fundamentals
Introduction to Cybersecurity Fundamentals
 
Cyber security awareness presentation
Cyber security awareness  presentationCyber security awareness  presentation
Cyber security awareness presentation
 
cybersecurity- A.Abutaleb
cybersecurity- A.Abutalebcybersecurity- A.Abutaleb
cybersecurity- A.Abutaleb
 
Cyber attacks
Cyber attacks Cyber attacks
Cyber attacks
 
Face Recognition Technology
Face Recognition TechnologyFace Recognition Technology
Face Recognition Technology
 
Cyber Security
Cyber SecurityCyber Security
Cyber Security
 
Cyber crime ppt
Cyber crime  pptCyber crime  ppt
Cyber crime ppt
 
Cyber crime
Cyber crimeCyber crime
Cyber crime
 
Online privacy & security
Online privacy & securityOnline privacy & security
Online privacy & security
 
Chapter2 the need to security
Chapter2 the need to securityChapter2 the need to security
Chapter2 the need to security
 
Cyber security standards
Cyber security standardsCyber security standards
Cyber security standards
 
Cybersecurity Fundamentals | Understanding Cybersecurity Basics | Cybersecuri...
Cybersecurity Fundamentals | Understanding Cybersecurity Basics | Cybersecuri...Cybersecurity Fundamentals | Understanding Cybersecurity Basics | Cybersecuri...
Cybersecurity Fundamentals | Understanding Cybersecurity Basics | Cybersecuri...
 

Similar a Detecting Fake Profiles On Online Matrimony

Simran confidentiality protection in crowdsourcing
Simran confidentiality protection in crowdsourcingSimran confidentiality protection in crowdsourcing
Simran confidentiality protection in crowdsourcing
IIIT Hyderabad
 
[PPT] _ Unit 2 _ 9.0 _ Domain Specific IoT _Home Automation.pdf
[PPT] _ Unit 2 _ 9.0 _ Domain Specific IoT _Home Automation.pdf[PPT] _ Unit 2 _ 9.0 _ Domain Specific IoT _Home Automation.pdf
[PPT] _ Unit 2 _ 9.0 _ Domain Specific IoT _Home Automation.pdf
Selvaraj Seerangan
 

Similar a Detecting Fake Profiles On Online Matrimony (20)

Odata V4 : The New way to REST for Your Applications
Odata V4 : The New way to REST for Your Applications Odata V4 : The New way to REST for Your Applications
Odata V4 : The New way to REST for Your Applications
 
Predictive Maintenance - Predict the Unpredictable
Predictive Maintenance - Predict the UnpredictablePredictive Maintenance - Predict the Unpredictable
Predictive Maintenance - Predict the Unpredictable
 
Machine learning specialist ver#4
Machine learning specialist ver#4Machine learning specialist ver#4
Machine learning specialist ver#4
 
Power of Flows and Prepare for Salesforce Admin Certification
Power of Flows and Prepare for Salesforce Admin CertificationPower of Flows and Prepare for Salesforce Admin Certification
Power of Flows and Prepare for Salesforce Admin Certification
 
Presentation
PresentationPresentation
Presentation
 
Flipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 reprise
 
[Pinto] Is my SharePoint Development team properly enlighted?
[Pinto] Is my SharePoint Development team properly enlighted?[Pinto] Is my SharePoint Development team properly enlighted?
[Pinto] Is my SharePoint Development team properly enlighted?
 
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 -  Automated ML - Panagiotis PapaemmanouilGDG DEvFest Hellas 2020 -  Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
 
Login & Registration defect taxonomy v1.0
Login & Registration defect taxonomy v1.0Login & Registration defect taxonomy v1.0
Login & Registration defect taxonomy v1.0
 
Hitachi ID Identity Manager
Hitachi ID Identity ManagerHitachi ID Identity Manager
Hitachi ID Identity Manager
 
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
 
Ria Sankar on Building AI Products
Ria Sankar on Building AI ProductsRia Sankar on Building AI Products
Ria Sankar on Building AI Products
 
Simran confidentiality protection in crowdsourcing
Simran confidentiality protection in crowdsourcingSimran confidentiality protection in crowdsourcing
Simran confidentiality protection in crowdsourcing
 
[PPT] _ Unit 2 _ 9.0 _ Domain Specific IoT _Home Automation.pdf
[PPT] _ Unit 2 _ 9.0 _ Domain Specific IoT _Home Automation.pdf[PPT] _ Unit 2 _ 9.0 _ Domain Specific IoT _Home Automation.pdf
[PPT] _ Unit 2 _ 9.0 _ Domain Specific IoT _Home Automation.pdf
 
Chapter-5.pdf
Chapter-5.pdfChapter-5.pdf
Chapter-5.pdf
 
Demise of test scripts rise of test ideas
Demise of test scripts rise of test ideasDemise of test scripts rise of test ideas
Demise of test scripts rise of test ideas
 
Chapter 5 IoT Design methodologies
Chapter 5 IoT Design methodologiesChapter 5 IoT Design methodologies
Chapter 5 IoT Design methodologies
 
A Busy Lawyer’s Guide to Managing Documents and Court Forms
A Busy Lawyer’s Guide to Managing Documents and Court FormsA Busy Lawyer’s Guide to Managing Documents and Court Forms
A Busy Lawyer’s Guide to Managing Documents and Court Forms
 
Testistanbul 2016 - Keynote: "Why Automated Verification Matters" by Kristian...
Testistanbul 2016 - Keynote: "Why Automated Verification Matters" by Kristian...Testistanbul 2016 - Keynote: "Why Automated Verification Matters" by Kristian...
Testistanbul 2016 - Keynote: "Why Automated Verification Matters" by Kristian...
 
Phishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdfPhishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdf
 

Más de IIIT Hyderabad

Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake News
IIIT Hyderabad
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
IIIT Hyderabad
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...
IIIT Hyderabad
 
Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...
IIIT Hyderabad
 
A Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian Languages
IIIT Hyderabad
 

Más de IIIT Hyderabad (20)

Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayResponsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
 
International Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesInternational Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success stories
 
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasResponsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
 
Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake News
 
#ChatGPT #ResponsibleAI
#ChatGPT #ResponsibleAI#ChatGPT #ResponsibleAI
#ChatGPT #ResponsibleAI
 
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyData Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
 
How to Write a (Good) Research Paper
How to Write a (Good) Research Paper How to Write a (Good) Research Paper
How to Write a (Good) Research Paper
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasData Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBias
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...
 
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayPrivacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
Leveraging Social Media for Financial Advice
Leveraging Social Media for Financial AdviceLeveraging Social Media for Financial Advice
Leveraging Social Media for Financial Advice
 
Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...
 
A Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian Languages
 

Último

VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Último (20)

CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 

Detecting Fake Profiles On Online Matrimony

  • 1. Detecting Fake Profiles On Online Matrimony Vaibhav Garg Dr. Ponnurangam Kumaraguru (Chair) linkedin.com/in/vaibhav-garg- 0a708899 facebook.com/in/vaibhav.gar g.104203 @rk_check
  • 2. 2 Thesis Committee ◆ Dr. Arun Balaji Buduru, IIIT Delhi ◆ Dr. Siddhartha Asthana, United Health Group (Optum) ◆ Dr. Ponnurangam Kumaraguru, IIIT Delhi
  • 3. 3
  • 4. 4 Core Thesis Question How to automatically detect fake profiles on online matrimony ?
  • 5. 5 Demo * Due to the privacy policy of the company, we can not give demo on the actual company’s portal.
  • 6. Outline ◆ About Online Matrimony ◆ About the Data ◆ Characteristics of a fake profile ◆ Using only Behaviour Trends ◆ Using Behavior, Edit and Profile Information ◆ Incorporating Community features ◆ Feature Engineering: Proposed Full length feature vector ◆ Final Results ◆ Conclusion 6
  • 8. Outline ◆ About Online Matrimony ◆ About the Data ◆ Characteristics of a fake profile ◆ Using only Behaviour Trends ◆ Using Behavior, Edit and Profile Information ◆ Incorporating Community features ◆ Feature Engineering: Proposed Full length feature vector ◆ Final Results ◆ Conclusion 8
  • 9. 9 About the Data ◆ To dig into the problem, we chose a use case of India’s leading matrimony website ◆ Ground Truth: 5,40,737 genuine profiles and very less number of fake profiles. ◆ Data of Categorical Attributes : age, body type, caste, city, country, education, height, income, manglik, marital status, mother tongue, occupation, religion.
  • 10. Categorical Data 10 Attribute Number of Categories Different Categories Caste 470 Hindu: Arora, Hindu: Aggarwal, Hindu: Brahmin etc. Height 37 5’0, 5’1, 5’2, 5’3 etc. Income 25 Rs. 0 - 1 Lakh, Rs 1-2 Lakh etc Mother Tongue 42 Telugu, Bengali, Hindi-Delhi etc. Occupation 69 Doctor, Analyst, IT-Engineer etc.
  • 11. Categorical Data 11 Attribute Number of Categories Different Categories Religion 10 Hindu, Muslim, Christian etc. Body Type 4 Slim, Average, Athletic, Heavy Country 214 India, Afghanistan, Australia etc. City 3683 Delhi, UP, Ahmedabad etc. Manglik 2 Manglik, Non-Manglik
  • 12. Categorical Data 12 Attribute Number of Categories Different Categories Marital Status 4 Never Married, Divorcee, Separated and Widowed Education 53 B.A, B.Com, B.Tech etc.
  • 13. Outline ◆ About Online Matrimony ◆ About the Data ◆ Characteristics of a fake profile ◆ Using only Behaviour Trends ◆ Using Behavior, Edit and Profile Information ◆ Incorporating Community features ◆ Feature Engineering: Proposed Full length feature vector ◆ Final Results ◆ Conclusion 13
  • 15. 15 Inconsistent Edits Edit Done After 4 Days of Registration
  • 17. Outline ◆ About Online Matrimony ◆ About the Data ◆ Characteristics of a fake profile ◆ Using only Behaviour Trends ◆ Using Behavior, Edit and Profile Information ◆ Incorporating Community features ◆ Feature Engineering: Proposed Full length feature vector ◆ Final Results ◆ Conclusion 17
  • 18. 18 Behavioural Trend for Caste Attribute Experimented on 100 fake and 100 genuine profiles belonging to Aggarwal Community
  • 19. 19 Behavioural Trend for Marital Status Attribute Experimented on 100 fake and 100 genuine profiles belonging to Non Married Community
  • 20. 20 Static Windows User’s First 8 days Activity First 12 hours Day 0 … . . . . . 0th window 1st window Day 0 Activity Day 1 Activity Day 6 Activity Day 7 Activity … . . . . . Last 12 hours Day 0 First 12 hours Day 7 Last 12 hours Day 7 15th window 16th window
  • 21. Static Windows and Feature Generation
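The static windowing described for slide 20 can be sketched in Python as below. This is a minimal sketch, not the thesis code: it assumes each activity event is a (timestamp, category) pair and that the registration time is known, and the names `static_window_index` and `bucket_events` are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=12)   # each static window covers 12 hours
NUM_DAYS = 8                   # only Day 0 to Day 7 are used
NUM_WINDOWS = NUM_DAYS * 2     # 16 windows, indexed 0..15


def static_window_index(registered_at, event_at):
    """Return the 12-hour window index of an event, or None outside the first 8 days."""
    offset = event_at - registered_at
    if offset < timedelta(0):
        return None
    idx = offset // WINDOW       # timedelta // timedelta gives an int
    return idx if idx < NUM_WINDOWS else None


def bucket_events(registered_at, events):
    """Group (timestamp, category) events into the 16 static windows."""
    windows = [[] for _ in range(NUM_WINDOWS)]
    for ts, category in events:
        idx = static_window_index(registered_at, ts)
        if idx is not None:
            windows[idx].append(category)
    return windows


reg = datetime(2019, 1, 1)
print(static_window_index(reg, reg + timedelta(days=7, hours=13)))  # last half of Day 7
```

Because every window has a fixed length, a user cannot be scored until all 16 windows have elapsed, which is the 8-day drawback noted on slide 24.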
  • 22. Which Model to Choose?
  • 24. Offline Results on Behaviour Features
    Confusion Matrix | Predicted Fake | Predicted Clean
    Actual Fake | 2953 | 852
    Actual Clean | 168 | 17799
    These results were obtained on 3805 fake and 17967 clean profiles.
    Drawback: a user must have been on the portal for 8 days before this approach can assess the profile.
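Precision, recall and accuracy follow directly from the slide 24 confusion matrix, treating fake as the positive class. The helper below is a hypothetical illustration; the derived values are computed here and are not reported on the slide.

```python
def metrics(tp, fn, fp, tn):
    """Precision, recall and accuracy for the positive (fake) class."""
    precision = tp / (tp + fp)   # share of flagged profiles that are truly fake
    recall = tp / (tp + fn)      # share of fake profiles that were flagged
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, accuracy


# Counts from slide 24: rows are actual classes, columns are predictions.
p, r, a = metrics(tp=2953, fn=852, fp=168, tn=17799)
print(f"precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")
# precision=0.946 recall=0.776 accuracy=0.953
```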
  • 25. Live Results: True Positives
  • 26. Live Results: False Negatives. Edit and profile features need to be incorporated!
  • 28. Edit Summary for the Mother Tongue Attribute: experiment on 100 fake and 100 genuine profiles that registered with the Hindi-UP category
  • 29. Edit Summary for the Income Attribute: experiment on 100 fake and 100 genuine profiles that registered with the Rs 5-7.5 Lakh category
  • 30. Concept of Dynamic Windows: let the user's active lifetime on the portal be T seconds, the user's total number of initiates be N, and the chosen number of windows be W. The 0th window spans the time period of the first N/W initiates, the 1st window spans the time period of the next N/W initiates, and so on until the last window, which spans the time period of the last N/W initiates.
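A minimal sketch of the dynamic windowing, assuming a user's initiates are available as (timestamp, category) pairs; `dynamic_windows` and `window_duration` are hypothetical names. Each window receives roughly N/W consecutive initiates, so windows are short when the user is very active and long when the user is idle, unlike the fixed 12-hour static windows.

```python
def dynamic_windows(events, num_windows):
    """Split a user's N initiates into `num_windows` (W) consecutive windows.

    `events` is a list of (timestamp, category) pairs. Each window holds about
    N/W initiates; integer slicing spreads any remainder across the windows.
    """
    if num_windows < 1:
        raise ValueError("num_windows must be at least 1")
    events = sorted(events, key=lambda e: e[0])   # order initiates by time
    n = len(events)
    return [events[w * n // num_windows:(w + 1) * n // num_windows]
            for w in range(num_windows)]


def window_duration(window):
    """Time period covered by one window: last initiate minus first initiate."""
    return window[-1][0] - window[0][0] if window else 0


events = [(t, "Hindi-UP" if t % 2 else "Hindi-Delhi") for t in range(10)]
windows = dynamic_windows(events, 3)
print([len(w) for w in windows])   # [3, 3, 4]
```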
  • 31. Feature Designing ◆ Profile Features: a one-hot vector of the profile attributes. ◆ Behaviour Features: in each dynamic time window, each feature stores the proportion of initiates sent to a particular category of an attribute. ◆ Edit Features: in each dynamic time window, each feature stores the proportion of time the user's profile spent in a particular category of an attribute. ◆ Other Raw Features: in each window, we also store the total number of interests sent and the time duration of that window.
  • 32. Feature Designing: the feature vectors of the individual windows, from the 0th window to the last window, are concatenated.
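The per-window features of slides 31 and 32 can be sketched as follows for a single attribute. Assumptions not taken from the slides: timestamps are plain numbers of seconds, a window's time period runs from its first to its last initiate, and the user's edit history is a time-sorted list of (timestamp, category) values whose first entry is the value chosen at registration. All function names are hypothetical.

```python
from collections import Counter


def behaviour_features(window_events, categories):
    """Proportion of the window's initiates sent to each category."""
    counts = Counter(cat for _, cat in window_events)
    total = len(window_events)
    return [counts[c] / total if total else 0.0 for c in categories]


def edit_features(edit_history, start, end, categories):
    """Proportion of [start, end] during which the user's own attribute held each category."""
    spent = dict.fromkeys(categories, 0.0)
    for i, (ts, cat) in enumerate(edit_history):
        # this value stays in place until the next edit (or the window end)
        nxt = edit_history[i + 1][0] if i + 1 < len(edit_history) else end
        overlap = min(nxt, end) - max(ts, start)
        if overlap > 0 and cat in spent:
            spent[cat] += overlap
    length = end - start
    # a window with a single initiate has zero length, so its edit features are zero
    return [spent[c] / length if length > 0 else 0.0 for c in categories]


def window_vector(window_events, edit_history, categories):
    """Behaviour + edit + raw features (interests sent, duration) for one window."""
    start, end = window_events[0][0], window_events[-1][0]
    raw = [len(window_events), end - start]
    return (behaviour_features(window_events, categories)
            + edit_features(edit_history, start, end, categories) + raw)


def concat_windows(windows, edit_history, categories):
    """Concatenate per-window vectors from the 0th window to the last window."""
    vec = []
    for w in windows:
        vec += window_vector(w, edit_history, categories)
    return vec


cats = ["A", "B"]
print(window_vector([(0, "A"), (4, "B"), (10, "A")], [(0, "A"), (6, "B")], cats))
```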
  • 33. Experimenting with the Number of Dynamic Windows
    No. of Windows | Precision | Recall | Accuracy
    5 | 0.170 | 0.510 | 0.8830
    4 | 0.192 | 0.635 | 0.8891
    3 | 0.230 | 0.780 | 0.8977
    2 | 0.242 | 0.804 | 0.8975
    1 | 0.266 | 0.866 | 0.8972
  • 34. Feature Selection on the Best Model
    Method | Precision | Recall | Accuracy
    Best Model | 0.266 | 0.866 | 0.8972
    Best Model + Feature Selection | 0.269 | 0.894 | 0.9083
    Criterion used: ((Entropy for fake) - (Entropy for clean)) / (Entropy for fake)
    Precision is still low!
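One reading of the slide 34 selection criterion, (H_fake - H_clean) / H_fake, is sketched below. It assumes H is the Shannon entropy of a feature's discretised values among fake or among clean training profiles; the slides do not state how entropy is computed or how the score is thresholded, so the hypothetical `selection_score` helper only computes the score.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of a feature's values."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def selection_score(fake_values, clean_values):
    """(H_fake - H_clean) / H_fake for one feature (assumed reading of the slide)."""
    h_fake = entropy(fake_values)
    if h_fake == 0:
        return float("nan")   # undefined when the feature is constant over fake profiles
    return (h_fake - entropy(clean_values)) / h_fake


# A feature that varies for fake profiles but is constant for clean ones scores 1.0.
print(selection_score([0, 1, 0, 1], [0, 0, 0, 0]))
```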
  • 36. Affinity Features along with Behaviour Features ◆ The affinity score between two categories i and j is the likelihood that a person in category i sends an interest to a user in category j. ◆ When combined with behaviour features, affinity scores compare how a user is expected to behave with how he or she actually behaves on the platform.
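A minimal sketch of affinity scores, assuming the likelihood is estimated as a relative frequency over the platform-wide interest log. The slides do not say how expected and actual behaviour are compared; the signed per-category difference in the hypothetical `affinity_gap` helper is one illustrative choice.

```python
from collections import Counter, defaultdict


def affinity_matrix(interests):
    """affinity[i][j]: share of interests from category-i senders that go to category-j users.

    `interests` is an iterable of (sender_category, recipient_category) pairs.
    """
    counts = defaultdict(Counter)
    for i, j in interests:
        counts[i][j] += 1
    affinity = {}
    for i, row in counts.items():
        total = sum(row.values())
        affinity[i] = {j: c / total for j, c in row.items()}
    return affinity


def affinity_gap(user_category, sent_categories, affinity, categories):
    """Actual minus expected share of a user's interests for each recipient category."""
    sent = Counter(sent_categories)
    total = len(sent_categories)
    expected = affinity.get(user_category, {})
    return [(sent[c] / total if total else 0.0) - expected.get(c, 0.0)
            for c in categories]


log = [("Aggarwal", "Aggarwal"), ("Aggarwal", "Aggarwal"),
       ("Aggarwal", "Brahmin"), ("Brahmin", "Brahmin")]
aff = affinity_matrix(log)
print(affinity_gap("Aggarwal", ["Brahmin", "Brahmin"], aff, ["Aggarwal", "Brahmin"]))
```

A user whose interests match the community's typical pattern gets gaps near zero; large gaps mark behaviour that departs from what the profile's own category would predict.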
  • 39. Proposed Full-length Feature Vector: Profile Features + Behaviour Features in time windows + Affinity Features + Edit Features in time windows
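The concatenation on slide 39 can be sketched as below, assuming the behaviour, affinity and edit features have already been computed as flat lists per window. The attribute schema and the function names `one_hot` and `full_feature_vector` are hypothetical; the important point is that the schema order must be fixed so every profile maps to the same vector layout.

```python
def one_hot(value, categories):
    """One-hot encode a single categorical profile attribute."""
    return [1.0 if value == c else 0.0 for c in categories]


def full_feature_vector(profile, schema, behaviour_windows, affinity_feats, edit_windows):
    """Profile + behaviour (per window) + affinity + edit (per window), in slide order.

    profile: {attribute: value}; schema: {attribute: [categories]} in a fixed order.
    behaviour_windows / edit_windows: one feature list per time window.
    """
    vec = []
    for attr, cats in schema.items():
        vec += one_hot(profile.get(attr), cats)
    for w in behaviour_windows:
        vec += w
    vec += affinity_feats
    for w in edit_windows:
        vec += w
    return vec


schema = {"religion": ["Hindu", "Muslim"], "manglik": ["Manglik", "Non-Manglik"]}
profile = {"religion": "Muslim", "manglik": "Manglik"}
print(full_feature_vector(profile, schema, [[0.5, 0.5]], [0.1], [[1.0, 0.0]]))
```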
  • 42. Final Results
    Method | Precision | Recall | Accuracy
    Proposed Features + Autoencoder | 0.341 | 0.902 | 0.9176
    The product team had asked for 25% precision at 60% recall!
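The slides do not describe the autoencoder's architecture or training setup. A common way to use an autoencoder for this kind of detection, assumed here, is to train it only on genuine profiles and flag profiles whose reconstruction error exceeds a threshold taken from genuine-profile errors. The NumPy sketch below uses a hypothetical one-hidden-layer network (`TinyAutoencoder`) and a percentile threshold (`flag_fakes`); both are illustrative choices, not the thesis model.

```python
import numpy as np


class TinyAutoencoder:
    """One hidden layer: tanh encoder, linear decoder, trained on squared error."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.b2 = np.zeros(n_features)

    def _forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)   # encoded representation
        return H, H @ self.W2 + self.b2      # reconstruction

    def fit(self, X, epochs=500, lr=0.05):
        """Full-batch gradient descent on genuine-profile feature vectors X."""
        n = len(X)
        for _ in range(epochs):
            H, X_hat = self._forward(X)
            # gradient of squared error summed over features, averaged over rows
            d_out = 2.0 * (X_hat - X) / n
            d_hidden = (d_out @ self.W2.T) * (1.0 - H ** 2)   # back through tanh
            self.W2 -= lr * (H.T @ d_out)
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * (X.T @ d_hidden)
            self.b1 -= lr * d_hidden.sum(axis=0)
        return self

    def reconstruction_error(self, X):
        """Mean squared reconstruction error per row (profile)."""
        _, X_hat = self._forward(X)
        return ((X - X_hat) ** 2).mean(axis=1)


def flag_fakes(model, X_clean_ref, X_new, percentile=95):
    """Flag rows whose error exceeds the given percentile of genuine-profile errors."""
    threshold = np.percentile(model.reconstruction_error(X_clean_ref), percentile)
    return model.reconstruction_error(X_new) > threshold
```

Raising the percentile trades recall for precision, which is the knob a team would tune to meet a target such as the one on slide 42.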
  • 44. Conclusion ◆ We first studied how behaviour, profile and edit patterns differ between genuine and fake users. ◆ We encoded these characteristics as features using dynamic time windows. ◆ We then trained an autoencoder model to detect fake profiles on online matrimony.
  • 47. Limitations and Future Work ◆ More training samples for the autoencoder could improve generalisation. ◆ We detected fake profiles using categorical attributes only; text spamming can be explored in future work.
  • 48. Acknowledgement ◆ Committee Members ◆ Hunny, Adhish from InfoEdge India Ltd. ◆ Members of the Precog family ◆ Family and friends