DETECTING AUTOMATICALLY MANAGED
ACCOUNTS IN ONLINE SOCIAL NETWORKS:
GRAPH EMBEDDING APPROACH
Ilia Karpov (karpovilia@gmail.com)
Ekaterina Glazkova (catherine.glazkova@gmail.com)
Moscow, 2020
BOT ACCOUNT EXAMPLES
Catch Me if You Can
Detecting Automatically Managed Accounts in OSN
DATA COLLECTION
Defining Bot account
A bot is an account created and used to generate profit for its owner by violating the rules of a social network through automated methods.
Existing approaches*
1. Manual annotation
2. Suspended users lists
3. Honeypots
* F. Morstatter et al.: A New Approach to Bot Detection: Striking the Balance between Precision and Recall (2016)
Proposed approach
1. Account exchanges monitoring
2. Suspended users lists
3. Induction based search**
CLASSIFICATION PROBLEM
Profile features
CLASSIFICATION PROBLEM
Profile model
Selected static features
• country_id
• personal_people_main
• city_title
• sex
• personal_langs
• counters_gifts
• mobile_phone
• counters_pages
• personal_alcohol
• is_closed
• last_seen_platform
• home_phone
• relation_partner_first_name
• relation
• counters_followers
• domain
• occupation_id
• counters_subscriptions
• personal_smoking
• movies
• occupation_name
• counters_photos
• counters_videos
• city_id
• bdate
• university
• counters_audios
• last_seen_time
• faculty
• counters_user_photos
• counters_groups
• has_photo
Selected network features
• friend_id
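The profile fields listed here mix counters, optional free-form fields, and boolean flags. A minimal sketch of how such a profile might be turned into a numeric static-feature vector is given below. The field names come from the slide; the encoding choices (log-scaled counters, presence flags for optional fields) and the helper name `encode_profile` are illustrative assumptions, not the authors' preprocessing.

```python
import math

# Field names are taken from the slide; how each one is encoded is an assumption.
COUNTER_FIELDS = ["counters_followers", "counters_photos", "counters_groups"]
PRESENCE_FIELDS = ["mobile_phone", "home_phone", "university", "bdate"]
BOOLEAN_FIELDS = ["has_photo", "is_closed"]


def encode_profile(profile: dict) -> list:
    """Map a raw profile dict to a fixed-length numeric vector."""
    vector = []
    # Counters are heavy-tailed, so compress them with log(1 + x).
    for name in COUNTER_FIELDS:
        vector.append(math.log1p(profile.get(name) or 0))
    # Optional fields: record only whether the user filled them in.
    for name in PRESENCE_FIELDS:
        vector.append(1.0 if profile.get(name) else 0.0)
    # Boolean flags are passed through as 0/1.
    for name in BOOLEAN_FIELDS:
        vector.append(1.0 if profile.get(name) else 0.0)
    return vector


# Hypothetical profile with most fields missing.
example = {"counters_followers": 0, "counters_photos": 9, "has_photo": 1, "bdate": "1.1.1990"}
features = encode_profile(example)
```

Missing fields are treated as zero or absent, which keeps the vector length fixed regardless of how much of the profile a user has filled in.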
EMBEDDING GENERATION
Node2Vec
* A. Grover, J. Leskovec: node2vec: Scalable Feature Learning for Networks (2016)
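Node2Vec builds embeddings from second-order biased random walks controlled by a return parameter p and an in-out parameter q. The sketch below shows one such walk on an unweighted graph, assuming the graph is given as a dict of neighbour sets; the function name `node2vec_walk` and the toy graph are hypothetical, and the subsequent skip-gram training step is omitted.

```python
import random


def node2vec_walk(adj, start, length, p, q, rng):
    """One second-order biased random walk on an unweighted graph.

    adj maps each node to the set of its neighbours.
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(adj[current])
        if not neighbours:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # The first step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)   # return to the previous node
            elif candidate in adj[previous]:
                weights.append(1.0)       # stay at distance 1 from it
            else:
                weights.append(1.0 / q)   # move outward, to distance 2
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy friendship graph with hypothetical ids.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
walk = node2vec_walk(adj, start=1, length=10, p=0.5, q=2.0, rng=random.Random(0))
```

A small p makes the walk more likely to step back to the node it just left, while a small q pushes it toward nodes farther from the previous one.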
EMBEDDING GENERATION
Attri2Vec
* Zhang et al.: Attributed network embedding via subspace discovery (2019)
CLASSIFICATION PROBLEM
LogReg Classification ROC AUC based on N2V embedding

Sophisticated accounts
          p = 0.25  p = 0.5  p = 1  p = 2  p = 4
q = 0.25  0.727     0.823    0.751  0.753  0.793
q = 0.5   0.750     0.795    0.796  0.806  0.754
q = 1     0.771     0.804    0.765  0.788  0.772
q = 2     0.747     0.742    0.808  0.764  0.779
q = 4     0.776     0.724    0.745  0.709  0.793

Technical accounts
          p = 0.25  p = 0.5  p = 1  p = 2  p = 4
q = 0.25  0.856     0.814    0.804  0.823  0.780
q = 0.5   0.787     0.768    0.813  0.799  0.822
q = 1     0.863     0.812    0.847  0.829  0.808
q = 2     0.821     0.931    0.776  0.793  0.848
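Each cell in these tables is a ROC AUC score for one (p, q) pair. As a reminder of what the metric measures, the sketch below computes ROC AUC directly as the fraction of (bot, non-bot) pairs that the classifier ranks correctly, and enumerates the p and q values from the table headers. The function name `roc_auc` and the toy scores are assumptions for illustration; a library implementation would normally be used instead.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one example of each class")
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# The walk parameter values that appear in the table headers.
P_VALUES = Q_VALUES = [0.25, 0.5, 1, 2, 4]
grid = [(p, q) for q in Q_VALUES for p in P_VALUES]

# Toy labels (1 = bot) and classifier scores, for illustration only.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of examples, so it is only suitable for small illustrations.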
CLASSIFICATION PROBLEM
Classifiers evaluation
• Support Vector Classifier (SVC)
• Random Forest (RF)
• Logistic Regression (LogReg)
Model results
Classification ROC AUC
           Technical accounts  Sophisticated accounts
Attri2Vec  0.988               0.684
Node2Vec   0.93                0.87
Static     0.85                0.81
N2V + SF   0.934               0.91
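The "N2V + SF" row corresponds to classifiers trained on Node2Vec embeddings concatenated with static features. A minimal sketch of that setup follows, assuming each account already has an embedding and a static-feature vector. Logistic regression is one of the classifiers named on the slide, but the plain gradient-descent trainer, the function names, and the toy data here are illustrative assumptions standing in for a library implementation.

```python
import math


def concat_features(embedding, static):
    """N2V + SF: append the static features to the node embedding."""
    return list(embedding) + list(static)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_proba(model, x):
    """Predicted probability that the account described by x is a bot."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Toy data: 2-d "embeddings" plus 1 static feature; label 1 = bot.
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
static = [[1.0], [1.0], [0.0], [0.0]]
labels = [1, 1, 0, 0]
rows = [concat_features(e, s) for e, s in zip(embeddings, static)]
model = train_logreg(rows, labels)
scores = [predict_proba(model, x) for x in rows]
```

In practice the two feature blocks usually live on different scales, so standardising them before concatenation is a reasonable extra step.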
CLASSIFICATION PROBLEM
Comparison with existing approaches
                   Technical accounts  Sophisticated accounts
AUC ROC            0.988               0.867
Zegzhda et al.     ---                 0.73
Skorniakov et al.  ---                 0.820
Contributions
• Two bot detection datasets with anonymised data *
• More than 80 network embedding training runs with different parameters.
• Classifiers on embeddings obtained with network embedding.
• Classifiers based on static features.
• Classifiers on the concatenation of static features and embeddings.
* https://github.com/karpovilia/botdetection
FUTURE RESEARCH
• use of text embeddings: a significant share of artificial accounts promote goods or disseminate information, and their texts can be used for classification;
• a significant number of accounts hide their friend lists but leave their groups visible, which allows a user to be modeled as a node of a bipartite graph;
• modeling the network as a temporal network is of interest, taking into account characteristics such as the joint appearance of accounts on the network.
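The bipartite idea can be sketched as follows: users and groups form the two node sets, and an edge links a user to each group they belong to, so accounts with hidden friend lists can still be related through shared groups. The function names and the (user_id, group_id) pairs below are hypothetical; this is one possible representation, not a method taken from the slides.

```python
from collections import defaultdict


def build_bipartite(memberships):
    """Return user -> groups and group -> users adjacency maps."""
    user_to_groups = defaultdict(set)
    group_to_users = defaultdict(set)
    for user, group in memberships:
        user_to_groups[user].add(group)
        group_to_users[group].add(user)
    return user_to_groups, group_to_users


def shared_group_counts(user, user_to_groups, group_to_users):
    """Count, for every other user, the number of groups shared with `user`."""
    counts = defaultdict(int)
    for group in user_to_groups[user]:
        for other in group_to_users[group]:
            if other != user:
                counts[other] += 1
    return dict(counts)


# Hypothetical (user_id, group_id) pairs.
memberships = [("u1", "g1"), ("u1", "g2"), ("u2", "g1"), ("u2", "g2"), ("u3", "g2")]
user_to_groups, group_to_users = build_bipartite(memberships)
neighbours = shared_group_counts("u1", user_to_groups, group_to_users)
```

The shared-group counts give a weighted user-user projection of the bipartite graph, which could then be fed to the same embedding methods used for the friendship graph.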
Questions?
Ilia Karpov (karpovilia@gmail.com)
Ekaterina Glazkova (catherine.glazkova@gmail.com)


Editor's notes

  1. Artificial accounts distort the popularity of groups, spread fake news, and are used for fraudulent activities.
  2. We are going to analyse both network and static features.