Clustering Internet users based on their
       behavior towards banner ads




Despina Stamkou
stamkou@kth.se




                                           14 Feb 2011
Agenda

    Introduction

    Theoretical Background

    Method

    Results

    Analysis

    Conclusions

    Future Work
Introduction
       :: Background

Marketing is an exchange process of values between
              companies and customers
                                                           (Kotler, Armstrong, Wong and Saunders, 2010)




                            Online Marketing


             [2nd position on Advertisement Investment]
                                               (Orbit Scripts, 2011)
Introduction
        :: Background
 Online Advertisements are promoted through Web Sites (Publishers)


                                       The goal is to motivate the internet
                                        users to click on the online
                                        advertisements


                                       Users with similar profiles click on similar
                                        online advertisements
                                         (Giuffrida et al. 2001)




                                       Users are more likely to click on
                                        personalised advertisements
                                        compared to non-personalised ads
                                        (automatic optimisation)
Introduction
            :: Background

 Automatic Optimisation Mechanism for personalised online advertisements

 [Diagram]
     Web Site (publisher)
          Advertisement Placement

     AdNetwork (company between publishers and clients)
          Client’s Advertisements: Advertisement 1, Advertisement 2, Advertisement 3, …, Advertisement N

     Automatic optimisation mechanism
Introduction
      :: Problem Statement

• Problem
AdNetworks need to develop an intelligent automatic optimisation logic
        To keep a competitive position in the online marketing business area


• Goal
Evaluate well-known grouping algorithms
         To use the best-performing one for the automatic optimisation logic


• Purpose
To prove that the performance success of the dominant algorithm is data-independent
Introduction
       :: Method & Material



• Literature Study
      - Background knowledge on clustering
      - Identify algorithms with significant clustering performance


• Empirical Part
    - Compare the identified algorithms
Introduction
      :: Significance


 • Automatic optimisation can increase the revenues of an
   AdNetwork


 • The thesis topic is part of the automatic optimisation project at
   Tradedoubler and will use data from that AdNetwork


 • Each AdNetwork has different data but can benefit from the
   conclusions


 • The conclusions will reinforce the data-independence of the
   dominant clustering algorithm
Introduction
      :: Limitations


• Only two clustering algorithms are examined


• The number of clusters is predefined


• The data set has a specific dimensionality and is not publicly available


• The data set represents an instance of the users’ behaviour for a
  specific period
Theoretical Background
                  :: Classification vs Clustering

             Data mining is the process of discovering knowledge from data sources (Bing Liu, 2006)




    Supervised Classification (Classification)
        We know the class labels and the number of classes
        Example labels: 1. dark blue, 2. light green, 3. dark orange, …, n. pink
        Groups users with the exact same characteristics
        Impossible to predict future actions

    Unsupervised Classification (Clustering)
        We do not know the class labels and may not know the number of classes
        Example labels: 1. ???, 2. ???, 3. ???, …, ?. ???
        Groups users with similar characteristics
        Opportunity to predict future actions
Theoretical Background
              :: Selecting the clustering method

    Clustering
        Non-Exclusive: data objects belong to one or more clusters
        Exclusive: data objects belong to only one cluster
            Partitional
            Hierarchical
                Agglomerative
                Divisive
Theoretical Background
      :: Related Research

• The most recent related studies were selected for examination (2011)


• These studies compared the clustering performance of the best-performing
  algorithms from past related studies


• The K-means algorithm was used as a baseline


• The algorithms were examined with a predefined number of clusters


• Performance was measured through a fitness function
Theoretical Background
      :: Selecting the algorithms


• Particle Swarm Optimisation (PSO) & K-means


• K-means as a baseline


• PSO because it outperformed the rest of the clustering algorithms


• Limited studies around PSO


• Interesting to evaluate PSO performance with the available data set from
  Tradedoubler and reinforce the data-independence
Method
         :: Data Selection


      • Data set consists of real transactions within Tradedoubler’s AdNetwork

      • 254,046 rows

      • Sampling by time period: 1 month

      • Information columns:

      Group                         Column        Description
      Advertisement Campaign info   PROGRAM_ID    ID of the campaign to which the banner belongs
                                    WEBSITE_ID    ID of the website from which the action was generated
                                    BANNER_ID     ID of the banner with which the user interacted
                                    EVENT_ID      ID of the event: Click or Sale
      Internet user info            USER_AGENT    Visitor’s web browser agent and operating system
                                    TIMESTAMP     Time the transaction was made
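
As a rough illustration of the columns listed above, one row could be held in a small record type and turned into a numeric vector. The class name, field types, sample encoding and the `to_vector` helper are assumptions made for this sketch; the slides do not state how rows were converted into data vectors.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """One row of the data set, using the column names from the slide."""
    program_id: int      # campaign to which the banner belongs
    website_id: int      # website from which the action was generated
    banner_id: int       # banner with which the user interacted
    event_id: str        # "click" or "sale" (string encoding is assumed)
    user_agent: str      # browser agent and operating system
    timestamp: datetime  # time the transaction was made


def to_vector(row: Transaction, agent_codes: dict) -> list:
    """Hypothetical encoding of a row as a numeric vector.

    User agents are mapped to integer codes in order of first appearance
    and the timestamp is reduced to the hour of day. This is only one of
    many possible encodings, not the one used in the thesis.
    """
    agent_code = agent_codes.setdefault(row.user_agent, len(agent_codes))
    event_code = 1.0 if row.event_id == "sale" else 0.0
    return [
        float(row.program_id),
        float(row.website_id),
        float(row.banner_id),
        event_code,
        float(agent_code),
        float(row.timestamp.hour),
    ]
```

The shared `agent_codes` dictionary keeps the code for a given user agent stable across rows.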
Method
     :: Evaluation Criteria
Clustering evaluation is a complex and difficult problem (Liu, 2006)


Types of evaluation
     • External
          With readable and meaningful data, without numbers


      • Indirect
          With an external application that tests the results


      • Internal
          With any distance comparison function
Method
      :: Fitness Function
The fitness function used gives the sum, over all clusters, of the maximum
distance of each cluster from a data object:



The smaller the sum, the better the clustering algorithm performs
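
The formula itself did not survive on this slide, so the following is only a sketch of one reading of the description: for each cluster, take the largest distance between its centroid and the data objects assigned to it, then add these values up. The use of Euclidean distance, the nearest-centroid assignment and the function names are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fitness(centroids, vectors):
    """Sum over clusters of the largest centroid-to-member distance.

    Each vector is assigned to its nearest centroid (an assumption).
    A cluster with no members contributes 0. Smaller values are better.
    """
    worst = [0.0] * len(centroids)
    for v in vectors:
        dists = [euclidean(c, v) for c in centroids]
        j = dists.index(min(dists))          # nearest centroid
        worst[j] = max(worst[j], dists[j])   # track the farthest member
    return sum(worst)
```

With centroids at (0, 0) and (10, 0) and members whose farthest distances are 3 and 5, the function returns 8.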
Method
    :: Alternative Fitness Function


• Sum of the average distance between the centroid and the data
  vectors


• Sum of the minimum distance between data objects that belong to
  different clusters


The fitness function selected for this study has been used in related research
for the same purpose and with the same algorithms as the current study, and
was therefore preferred over the alternatives
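
The two alternatives listed above could be sketched as follows. Both functions are illustrative readings of the bullet points, not the definitions from the thesis: the function names, the Euclidean distance and the choice to sum the separation measure over pairs of clusters are assumptions. Note that the second measure describes separation, so a larger value would indicate better-separated clusters.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sum_of_average_distances(centroids, vectors, labels):
    """Sum over clusters of the mean centroid-to-member distance.

    labels[i] is the index of the cluster that vectors[i] belongs to.
    Empty clusters are skipped.
    """
    totals = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for v, j in zip(vectors, labels):
        totals[j] += euclidean(centroids[j], v)
        counts[j] += 1
    return sum(t / n for t, n in zip(totals, counts) if n > 0)


def sum_of_minimum_separations(vectors, labels):
    """Sum over pairs of clusters of the smallest member-to-member distance."""
    clusters = {}
    for v, j in zip(vectors, labels):
        clusters.setdefault(j, []).append(v)
    total = 0.0
    for a, b in combinations(sorted(clusters), 2):
        total += min(euclidean(p, q) for p in clusters[a] for q in clusters[b])
    return total
```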
Results
     :: Methods, Tools and Time

   • Programs developed in Perl and parameterized
     for the multidimensional data set


   • Both algorithms ran for 10 different values of K:
     5, 10, 15, 20, 25, 30, 35, 40, 45 and 50


   • Operating system: Ubuntu Linux
     Hardware characteristics:
     RAM: 3 GB, processor: Intel Core Duo at 2.26 GHz


   • The ratio of execution times between the algorithms was approximately 1:4;
     K-means ran for 1.5 hours in total and PSO for 7 hours
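
The experiment loop for the baseline can be sketched as below. The original programs were written in Perl and are not shown in the slides, so this Python version is only an illustration: it uses textbook K-means with random initial centroids, Euclidean distance and a maximum-distance fitness, all of which are assumptions, as are the function names.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids, v):
    """Index of the centroid closest to vector v."""
    return min(range(len(centroids)), key=lambda j: euclidean(centroids[j], v))


def k_means(vectors, k, rng, max_iter=100):
    """Textbook K-means: assign to nearest centroid, recompute means, repeat."""
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids


def max_distance_fitness(centroids, vectors):
    """Sum over clusters of the largest centroid-to-member distance."""
    worst = [0.0] * len(centroids)
    for v in vectors:
        j = nearest(centroids, v)
        worst[j] = max(worst[j], euclidean(centroids[j], v))
    return sum(worst)


def sweep(vectors, ks=range(5, 55, 5), seed=0):
    """Run K-means once per K (5, 10, ..., 50 by default) and record fitness."""
    rng = random.Random(seed)
    return {k: max_distance_fitness(k_means(vectors, k, rng), vectors) for k in ks}
```

`vectors` must be a list with at least as many rows as the largest K, because the initial centroids are sampled from the data without replacement.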
Results
     :: Performance Chart
Analysis
     :: Performance Comparison

     PSO >> K-means                       Why?

  • Both algorithms calculate the next position of the clusters
    and keep moving them within the search space until
    their position no longer changes, but…

    …PSO evaluates each next position in the space by using
    an internal fitness method

    …This method keeps a memory of the previous fitness value
    of each cluster and compares it with the fitness of the new
    position

    …A decision is then made whether to keep the new position
    or return the cluster to the previous one
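
The memory mechanism described above resembles the personal-best bookkeeping of standard global-best PSO, sketched below for clustering with each particle holding a candidate set of K centroids. This is a textbook variant, not the thesis implementation: in it the remembered best position steers later moves rather than the particle being moved back, and the slides do not say which of the two the thesis does. The parameter values (`w`, `c1`, `c2`), the fitness function and all names are assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fitness(centroids, vectors):
    """Sum over clusters of the largest centroid-to-member distance."""
    worst = [0.0] * len(centroids)
    for v in vectors:
        dists = [euclidean(c, v) for c in centroids]
        j = dists.index(min(dists))
        worst[j] = max(worst[j], dists[j])
    return sum(worst)


def pso_cluster(vectors, k, particles=10, iterations=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO; each particle is a candidate set of k centroids."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Positions start at randomly chosen data vectors, velocities at zero.
    pos = [[list(v) for v in rng.sample(vectors, k)] for _ in range(particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(particles)]
    best_pos = [[c[:] for c in p] for p in pos]   # personal best positions
    best_fit = [fitness(p, vectors) for p in pos]  # remembered fitness values
    g = best_fit.index(min(best_fit))
    g_pos, g_fit = [c[:] for c in best_pos[g]], best_fit[g]  # global best

    for _ in range(iterations):
        for i in range(particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (best_pos[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (g_pos[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            new_fit = fitness(pos[i], vectors)
            # Compare with the remembered fitness; the new position replaces
            # the particle's best only if it improves on it.
            if new_fit < best_fit[i]:
                best_fit[i] = new_fit
                best_pos[i] = [c[:] for c in pos[i]]
                if new_fit < g_fit:
                    g_fit, g_pos = new_fit, [c[:] for c in pos[i]]
    return g_pos, g_fit
```

Because every particle evaluates the fitness over the whole data set at every iteration, the cost per iteration is roughly the number of particles times that of one K-means assignment pass, which is consistent with PSO taking longer to run.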
Analysis
     :: Similarity Evaluation


• In a basic external evaluation of a small sample of data vectors,
  similarities were traced as a proof of concept that homogeneous
  users had been grouped within the same clusters


• Although external evaluation is not used as an argument for the
  final conclusions, it gives some confidence that the clustering
  algorithms were developed properly
Analysis
           :: Limitations


• The fitness function is the main evaluation method
     - Combining it with indirect evaluation would give more accurate conclusions


• Fitness was measured for a defined number of clusters
     - Hypothetically, PSO would continue to perform well for higher values of K
       Yet this is not proved by the experiments


• The basic external evaluation should not be taken as a criterion for the performance
  of the algorithms, but rather as an indication that the algorithms were
  implemented correctly
Conclusions




• The experiments reinforce the superiority of PSO in terms of clustering performance
  regardless of the nature and the dimensionality of the data
     - Important fact: the data come from real-life transactions



• Indication that the higher the number of clusters, the better the resulting fitness for PSO
     - This implies additional processing effort and memory use
       The best number of clusters can be defined based on processing time and fitness
Future Work

• Compare different hybrids of PSO without a predefined number of clusters

• Develop the personalised mechanism to propose relevant advertisements




    Inside a cluster:
        Subgroup 1: has seen Advertisement A; show Advertisement B
        Subgroup 2: has seen Advertisement B; show Advertisement A
        Subgroup 3: has seen Advertisement A and Advertisement B; show an advertisement from a neighbour cluster




• Users’ actions will define the performance: indirect method of evaluation
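
The subgroup rule proposed on this slide could be sketched as a small function. The function name, its arguments and the idea of taking the first unseen advertisement are hypothetical; the slide only describes the two-advertisement case.

```python
def choose_advertisement(seen, cluster_ads, neighbour_ads):
    """Pick the next advertisement for a user inside a cluster.

    seen           -- set of advertisements the user has already seen
    cluster_ads    -- advertisements associated with the user's cluster
    neighbour_ads  -- advertisements associated with a neighbour cluster
    """
    # Subgroups 1 and 2: show the advertisement of the cluster not yet seen.
    unseen = [ad for ad in cluster_ads if ad not in seen]
    if unseen:
        return unseen[0]
    # Subgroup 3: everything in the cluster was seen, so fall back to an
    # advertisement from a neighbour cluster (None if nothing is left).
    unseen_neighbour = [ad for ad in neighbour_ads if ad not in seen]
    return unseen_neighbour[0] if unseen_neighbour else None
```

For a cluster with advertisements A and B and a neighbour cluster with C, a user who has seen A is shown B, a user who has seen B is shown A, and a user who has seen both is shown C.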
Thank you!




                   Questions / Comments




References

Kotler, P., Armstrong, G., Wong, V. and Saunders, J., 2010. Principles of Marketing, 5th edition. New Jersey: Pearson Education, p. 7.

Giuffrida, G., Reforgiato, D., Tribulato, G. and Zabra, C., 2001. A Banner Recommendation System Based on Web Navigation History. Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium, Paris.

Liu, B., 2006. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Chicago: Springer, p. 6.

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Thesis Presentation

  • 1. Clustering Internet users based on their behavior towards banner ads. Despina Stamkou, stamkou@kth.se, 14 Feb 2011
  • 2. Agenda: Introduction; Theoretical Background; Method; Results; Analysis; Conclusions; Future Work
  • 3. Introduction :: Background. Marketing is a process of exchanging value between companies and customers (Philip, Armstrong, Wong and Saunders, 2010). Online marketing holds the 2nd position in advertisement investment (Orbit Scripts, 2011).
  • 4. Introduction :: Background. Online advertisements are promoted through web sites (publishers). The goal is to motivate internet users to click on the online advertisements. Users with similar profiles click on similar online advertisements (Giuffrida et al. 2001). Users are more likely to click on personalised advertisements than on non-personalised ones (automatic optimisation).
  • 5. Introduction :: Background. Automatic optimisation mechanism for personalised online advertisements. [Diagram: web site (publisher) with an advertisement placement; AdNetwork (the company between publishers and clients); the client's advertisements 1, 2, 3, … N; automatic optimisation mechanism.]
  • 6. Introduction :: Problem Statement. Problem: AdNetworks need to develop an intelligent automatic optimisation logic to keep a competitive position in the online marketing business. Goal: evaluate well-known grouping algorithms and use the best-performing one for the automatic optimisation logic. Purpose: to prove that the performance success of the dominant algorithm is data-independent.
  • 7. Introduction :: Method & Material. Literature study: background knowledge on clustering; identify algorithms with significant clustering performance. Empirical part: compare the identified algorithms.
  • 8. Introduction :: Significance. Automatic optimisation can increase the revenues of an AdNetwork. The thesis topic is part of the automatic optimisation project at Tradedoubler and uses data from that AdNetwork. Each AdNetwork has different data but can benefit from the conclusions. The conclusions will reinforce the data-independence of the dominant clustering algorithm.
  • 9. Introduction :: Limitations. Only two clustering algorithms are examined. The number of clusters is predefined. The data set has a specific dimensionality and is not publicly available. The data set represents a snapshot of the users' behaviour over a specific period.
  • 10. Theoretical Background :: Classification vs Clustering. Data mining is the process of discovering knowledge from data sources (Bing Liu, 2006). Supervised classification (classification): we know the class labels and the number of classes; groups users with the exact same characteristics; impossible to predict future actions. Unsupervised classification (clustering): we do not know the class labels and may not know the number of classes; groups users with similar characteristics; opportunity to predict future actions.
  • 11. Theoretical Background :: Selecting the clustering method. Clustering is either exclusive (a data object belongs to only one cluster) or non-exclusive (a data object belongs to one or more clusters). Methods are further divided into partitional and hierarchical, the latter into agglomerative and divisive.
  • 12. Theoretical Background :: Related Research. The most recent related studies were selected for examination (2011). These studies compared the clustering performance of the best-performing algorithms from earlier related studies. K-means was used as a baseline. The algorithms were examined with a predefined number of clusters. Performance was measured through a fitness function.
  • 13. Theoretical Background :: Selecting the algorithms. Particle Swarm Optimisation (PSO) and K-means: K-means as a baseline; PSO because it outperformed the other clustering algorithms. Studies on PSO are limited. It is interesting to evaluate PSO performance with the data set available from Tradedoubler and to reinforce the data-independence.
  • 14. Method :: Data Selection. The data set consists of real transactions within Tradedoubler's AdNetwork: 254,046 rows, sampled by time period (1 month). Information columns: PROGRAM_ID (ID of the campaign to which the banner belongs); WEBSITE_ID (ID of the website from which the action was generated); BANNER_ID (ID of the banner with which the user interacted); EVENT_ID (ID of the event: click or sale); USER_AGENT (visitor's web browser agent and operating system info); TIMESTAMP (time the transaction was made). The slide groups these columns under "Advertisement Campaign info" and "Internet user".
  • 15. Method :: Evaluation Criteria. Clustering evaluation is a complex and difficult problem (Liu, 2006). Types of evaluation: external, with readable and meaningful (non-numeric) data; indirect, with an external application that tests the results; internal, with any distance comparison function.
  • 16. Method :: Fitness Function. The fitness function used sums, over all clusters, the maximum distance between each cluster and a data object [equation not recoverable from the transcript]. The smaller the sum, the better the clustering algorithm performs.
  • 17. Method :: Alternative Fitness Functions. Sum of the average distances between the centroid and the data vectors. Sum of the minimum distances between data objects that belong to different clusters. The fitness function selected for this study has been used in related research for the same purpose and with the same algorithms, and was therefore preferred over the alternatives.
  • 18. Results :: Methods, Tools and Time. The programs were developed in Perl and parameterized for the multidimensional data set. Both algorithms ran for 10 different values of K: 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50. Operating system: Linux Ubuntu. Hardware: 3 GB RAM, Intel Core Duo processor at 2.26 GHz. The ratio of execution times was approximately 1:4: K-means ran for 1.5 hours in total and PSO for 7 hours.
  • 19. Results :: Performance Chart
  • 20. Analysis :: Performance Comparison. PSO >> K-means. Why? Both algorithms calculate the next position of the clusters and keep moving them within the search space until their position no longer changes, but PSO evaluates each next position in the space using an internal fitness method. This method keeps a memory of the previous fitness value of each cluster and compares it with the fitness of the new position. A decision is then made whether to keep the new position or return the cluster to the previous one.
  • 21. Analysis :: Similarity Evaluation. Through a basic external evaluation of a small sample of data vectors, similarities were traced to support the concept that homogeneous users had been grouped within the same clusters. Although external evaluation is not used as an argument for the final conclusions, it still gives confidence that the clustering algorithms were developed properly.
  • 22. Analysis :: Limitations. The fitness function is the main evaluation method; combining it with indirect evaluation would give more accurate conclusions. Fitness was measured for a defined number of clusters; hypothetically PSO would continue to perform well for higher values of K, yet this is not proved by the experiments. The basic external evaluation should not be taken as a criterion for the performance of the algorithms but rather as a check that the algorithms were most likely implemented correctly.
  • 23. Conclusions. The experiments reinforce the superiority of PSO in terms of performance regardless of the nature and the dimensionality of the data. Important fact: the data come from real-life transactions. There is an indication that the higher the number of clusters, the better the resulting fitness for PSO; this implies additional processing effort and memory use. The best number of clusters can be defined based on processing time and fitness.
  • 24. Future Work. Compare different hybrids of PSO without a predefined number of clusters. Develop the personalised mechanism to propose relevant advertisements. Inside a cluster: Subgroup 1 has seen Advertisement A, so show Advertisement B; Subgroup 2 has seen Advertisement B, so show Advertisement A; Subgroup 3 has seen Advertisement A and Advertisement B, so show an advertisement from a neighbour cluster. Users' actions will define the performance: an indirect method of evaluation.
  • 25. Thank you! Questions / Comments. References: Philip, K., Armstrong, G., Wong, V. and Saunders, J., 2010. Principles of Marketing, 5th edition. New Jersey: Pearson Education, p.7. Giuffrida, G., Reforgiato, D., Tribulato, G. and Zabra, C., 2001. A Banner Recommendation System Based on Web Navigation History. Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium, Paris. Liu, B., 2006. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Chicago: Springer, p.6.
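
The fitness function described on slide 16, and the first alternative on slide 17, might be sketched as follows. This is only one reading of the slide text: the equation itself is not preserved in the transcript, and the thesis programs were written in Perl, so Python is a substitution here. The names `max_distance_fitness` and `mean_distance_fitness`, the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_distance_fitness(centroids, points, labels):
    """Sum over clusters of the largest centroid-to-member distance.

    Lower values mean tighter clusters under this reading of slide 16.
    """
    worst = [0.0] * len(centroids)
    for point, label in zip(points, labels):
        d = euclidean(point, centroids[label])
        if d > worst[label]:
            worst[label] = d
    return sum(worst)


def mean_distance_fitness(centroids, points, labels):
    """Sum over clusters of the average centroid-to-member distance (slide 17)."""
    totals = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for point, label in zip(points, labels):
        totals[label] += euclidean(point, centroids[label])
        counts[label] += 1
    # Empty clusters contribute nothing to the sum.
    return sum(t / c for t, c in zip(totals, counts) if c)


# Toy example: two clusters with two members each (made-up numbers).
centroids = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (0.0, 2.0), (10.0, 3.0), (11.0, 0.0)]
labels = [0, 0, 1, 1]

print(max_distance_fitness(centroids, points, labels))   # 2 + 3
print(mean_distance_fitness(centroids, points, labels))  # 1.5 + 2
```

Both functions take the same inputs, so either could be swapped into a comparison loop without changing the clustering code.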
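
The experimental loop on slide 18, in which the K-means baseline is run for K = 5, 10, …, 50 and scored with the fitness function, could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the data are random two-dimensional points rather than the Tradedoubler transactions (which are not public), the K-means variant is plain Lloyd iteration with randomly sampled initial centroids, and the function names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd-style K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable, so stop early
            break
        labels = new_labels
    return centroids, labels


def max_distance_fitness(centroids, points, labels):
    """Sum over clusters of the largest centroid-to-member distance."""
    worst = [0.0] * len(centroids)
    for point, label in zip(points, labels):
        worst[label] = max(worst[label], euclidean(point, centroids[label]))
    return sum(worst)


# Synthetic stand-in data (assumption): 300 random points in a 100 x 100 square.
data_rng = random.Random(42)
points = [(data_rng.uniform(0, 100), data_rng.uniform(0, 100)) for _ in range(300)]

results = {}
for k in range(5, 55, 5):  # K = 5, 10, ..., 50 as on slide 18
    centroids, labels = kmeans(points, k)
    results[k] = max_distance_fitness(centroids, points, labels)
    print(k, round(results[k], 2))
```

Fixing the seeds keeps repeated runs comparable; on real data one would likely average over several random initialisations per K.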
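
The decision described on slide 20 (remember the previous fitness, evaluate the new position, then keep it or return to the previous one) can be illustrated with a deliberately simplified accept-or-revert loop. This is not a full PSO: there is no swarm, no velocity and no global best, and the random perturbation is an assumption standing in for the PSO position update. All names and data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fitness(centroids, points):
    """Assign points to nearest centroid, then sum the per-cluster max distance."""
    worst = [0.0] * len(centroids)
    for p in points:
        dists = [euclidean(p, c) for c in centroids]
        nearest = dists.index(min(dists))
        worst[nearest] = max(worst[nearest], dists[nearest])
    return sum(worst)


def accept_or_revert(centroids, points, steps=200, step_size=1.0, seed=0):
    """Move one centroid at a time; keep the move only if fitness improves."""
    rng = random.Random(seed)
    current = [list(c) for c in centroids]
    remembered = fitness(current, points)  # memory of the previous fitness
    history = [remembered]
    for _ in range(steps):
        i = rng.randrange(len(current))
        previous = current[i][:]
        # Propose a new position (random nudge; a stand-in for the PSO update).
        current[i] = [x + rng.uniform(-step_size, step_size) for x in previous]
        candidate = fitness(current, points)
        if candidate < remembered:
            remembered = candidate   # keep the new position
        else:
            current[i] = previous    # return to the previous position
        history.append(remembered)
    return current, remembered, history


# Two synthetic blobs (assumption), with deliberately poor starting centroids.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
points += [(data_rng.gauss(8, 1), data_rng.gauss(8, 1)) for _ in range(50)]
start = [(3.0, 3.0), (5.0, 5.0)]

final, best, history = accept_or_revert(start, points)
print(round(history[0], 3), "->", round(best, 3))
```

Because a move is kept only when it lowers the remembered fitness, the recorded history can never increase, which is the property the slide attributes to the memory-based comparison.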