3. Introduction
:: Background
Marketing is a process of exchanging value between
companies and customers
(Kotler, Armstrong, Wong and Saunders, 2010)
Online Marketing
[2nd position in advertising investment]
(Orbit Scripts, 2011)
4. Introduction
:: Background
Online Advertisements are promoted through Web Sites (Publishers)
The goal is to motivate the internet
users to click on the online
advertisements
Users with similar profiles click on similar
online advertisements
(Giuffrida et al., 2011)
Users are more likely to click on
personalised advertisements
compared to non-personalised ads
(automatic optimisation)
5. Introduction
:: Background
Automatic Optimisation Mechanism for personalised online advertisements
AdNetwork: company between Web Site publishers and AdNetwork clients
[Diagram: the automatic optimisation mechanism chooses which of the client's advertisements (Advertisement 1, Advertisement 2, Advertisement 3, …, Advertisement N) fills the publisher's Advertisement Placement]
6. Introduction
:: Problem Statement
Problem
AdNetworks need to develop intelligent automatic optimisation logic
To keep a competitive position in the online marketing business area
Goal
Evaluate well-known grouping algorithms
To use the best-performing one for the automatic optimisation logic
Purpose
To show that the performance advantage of the dominant algorithm is data-independent
7. Introduction
:: Method & Material
Literature Study
Background Knowledge on clustering
Identify algorithms with significant clustering performance
Empirical Part
Compare the identified algorithms
8. Introduction
:: Significance
Automatic optimisation can increase the revenues of an
AdNetwork
The thesis topic is part of the automatic optimisation project at
Tradedoubler and uses data from that specific AdNetwork
Each AdNetwork has different data but can benefit from the
conclusions
The conclusions will reinforce the data-independence of the
dominant clustering algorithm
9. Introduction
:: Limitations
Only two clustering algorithms are examined
The number of clusters is predefined
The data set has a specific dimensionality and is not publicly available
The data set represents a snapshot of the users' behaviour over a
specific period
10. Theoretical Background
:: Classification vs Clustering
Data mining is the process of discovering knowledge from data sources (Liu, 2006)
Supervised Classification (Classification)
We know the class labels and the number of classes
Example labels: 1. dark blue, 2. light green, 3. dark orange, …, n. pink
Groups users with the exact same characteristics
Impossible to predict future actions
Unsupervised Classification (Clustering)
We do not know the class labels and may not know the number of classes
Example labels: 1. ???, 2. ???, 3. ???, …, ?. ???
Groups users with similar characteristics
Opportunity to predict future actions
11. Theoretical Background
:: Selecting the clustering method
Clustering
  Non-Exclusive: a data object belongs to one or more clusters
  Exclusive: a data object belongs to only one cluster
    Partitional
    Hierarchical
      Agglomerative
      Divisive
12. Theoretical Background
:: Related Research
The most recent related studies (2011) were selected for examination
These studies compared the clustering performance of the best-performing
algorithms from earlier related studies
The K-means algorithm was used as a baseline
The algorithms were examined with a predefined number of clusters
Performance was measured through a fitness function
13. Theoretical Background
:: Selecting the algorithms
Particle Swarm Optimisation (PSO) & K-means
K-means as a baseline
PSO because it outperformed the rest of the clustering algorithms
Few studies exist on PSO
Interesting to evaluate PSO performance with the available data set from
Tradedoubler and reinforce the data-independence
14. Method
:: Data Selection
The data set consists of real transactions within Tradedoubler's AdNetwork
254,046 rows
Sampling by time period: 1 month
Information columns:
Advertisement Campaign info
  PROGRAM_ID   ID of the campaign to which the banner belongs
  WEBSITE_ID   ID of the website from which the action was generated
  BANNER_ID    ID of the banner with which the user interacted
  EVENT_ID     ID of the event: Click or Sale
Internet user info
  USER_AGENT   Visitor's web browser agent and operating system
  TIMESTAMP    Time the transaction was made
15. Method
:: Evaluation Criteria
Clustering evaluation is a complex and difficult problem (Liu, 2006)
Types of evaluation
External
With readable and meaningful data, without numbers
Indirect
With an external application which will test the results
Internal
With any distance comparison function
16. Method
:: Fitness Function
The fitness function used is the sum, over all clusters, of the maximum
distance between a cluster's centroid and the data objects assigned to it.
The smaller the sum, the better the clustering algorithm performs
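A minimal Python sketch of this fitness function follows. It is an illustration only, not the thesis implementation: the Euclidean distance, the nearest-centroid assignment, the function names and the choice to let an empty cluster contribute 0 are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_distance_fitness(centroids, points, distance=euclidean):
    """Sum, over clusters, of the largest centroid-to-member distance.

    Each point is assigned to its nearest centroid. Lower values are better.
    An empty cluster contributes 0 (an assumption of this sketch).
    """
    worst = [0.0] * len(centroids)
    for p in points:
        dists = [distance(p, c) for c in centroids]
        j = dists.index(min(dists))          # index of the nearest centroid
        worst[j] = max(worst[j], dists[j])   # track the farthest member per cluster
    return sum(worst)
```

With centroids (0, 0) and (10, 0) and points (1, 0), (3, 0) and (10, 2), the sketch returns 5.0: 3 for the first cluster plus 2 for the second.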
17. Method
:: Alternative Fitness Function
Sum of the average distances between each centroid and its data
vectors
Sum of the minimum distances between data objects that belong to
different clusters
The fitness function selected for this study has been used in related research
for the same purpose and with the same algorithms, and was therefore
preferred over the alternatives
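The two alternative fitness functions listed above could be sketched in Python as follows. The slide does not define them precisely, so this is one possible reading: the Euclidean metric, the nearest-centroid assignment, summing the minimum distance over every pair of clusters, and all function names are assumptions.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(centroids, points):
    """Group points by nearest centroid; returns one list of points per centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        dists = [euclidean(p, c) for c in centroids]
        clusters[dists.index(min(dists))].append(p)
    return clusters


def average_distance_fitness(centroids, points):
    """Sum over clusters of the mean centroid-to-member distance (empty clusters skipped)."""
    total = 0.0
    for c, members in zip(centroids, assign(centroids, points)):
        if members:
            total += sum(euclidean(p, c) for p in members) / len(members)
    return total


def min_separation_fitness(centroids, points):
    """Sum over pairs of non-empty clusters of the smallest distance between their members."""
    clusters = [m for m in assign(centroids, points) if m]
    return sum(
        min(euclidean(p, q) for p in a for q in b)
        for a, b in combinations(clusters, 2)
    )
```

The second function compares every pair of objects in different clusters, so its cost grows quadratically with the number of data objects.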
18. Results
:: Methods Tools and Time
Programs developed in Perl and parameterized
for the multidimensional data set
Both algorithms ran for 10 different values of K:
5, 10, 15, 20, 25, 30, 35, 40, 45 and 50
Operating system: Ubuntu Linux
Hardware characteristics:
RAM: 3 GB, processor: Intel Core Duo at 2.26 GHz
The ratio of execution times (K-means : PSO) was approximately 1:4;
in total, K-means ran for 1.5 hours and PSO for 7 hours
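A run over the ten values of K could be organised as in the sketch below. The original programs were written in Perl and are not shown in the slides, so this Python version is illustrative only: the textbook K-means loop, the random initialisation from the data, the `sweep` helper and all parameter names are assumptions.

```python
import math
import random
import time


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=100, seed=0):
    """Textbook K-means: assign points to the nearest centroid, then recompute means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [euclidean(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # An empty cluster keeps its previous centroid.
        new = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new == centroids:  # no centroid moved: converged
            break
        centroids = new
    return centroids


def sweep(points, ks, cluster_fn=kmeans):
    """Run cluster_fn once per K; return (K, centroids, elapsed seconds) tuples."""
    results = []
    for k in ks:
        start = time.perf_counter()
        centroids = cluster_fn(points, k)
        results.append((k, centroids, time.perf_counter() - start))
    return results
```

The values of K used in the experiments correspond to `range(5, 55, 5)`; passing a different `cluster_fn` would time another algorithm on the same values.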
20. Analysis
:: Performance Comparison
PSO >> K-means. Why?
Both algorithms calculate the next position of the clusters
and keep moving them within the search space until
their positions no longer change, but…
…PSO evaluates each next position in the space by using
an internal fitness method
…This method keeps a memory of the previous fitness value
of each cluster and compares it with the fitness of the new
position
…A decision is then made whether to keep the new position
or return the cluster to the previous one
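The memory mechanism described above can be illustrated with a textbook global-best PSO, sketched below in Python. This is not the thesis implementation: each particle is assumed to be one candidate set of K centroids, the inertia and acceleration values (`w`, `c1`, `c2`) are illustrative, and the fitness is the max-distance sum with a Euclidean metric. Textbook PSO also does not literally move a particle back; it stores the best position found so far and pulls the particle toward it.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fitness(centroids, points):
    """Sum over clusters of the largest centroid-to-member distance (lower is better)."""
    worst = [0.0] * len(centroids)
    for p in points:
        dists = [euclidean(p, c) for c in centroids]
        j = dists.index(min(dists))
        worst[j] = max(worst[j], dists[j])
    return sum(worst)


def pso_cluster(points, k, n_particles=10, iterations=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over candidate centroid sets; returns (centroids, fitness)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle holds k centroids, initialised on randomly chosen data points.
    pos = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    best_pos = [[c[:] for c in p] for p in pos]      # per-particle memory
    best_fit = [fitness(p, points) for p in pos]
    g = best_fit.index(min(best_fit))
    g_pos, g_fit = [c[:] for c in best_pos[g]], best_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (best_pos[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (g_pos[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = fitness(pos[i], points)
            if f < best_fit[i]:
                # Remember the new position only if it improves on the stored fitness.
                best_fit[i] = f
                best_pos[i] = [c[:] for c in pos[i]]
                if f < g_fit:
                    g_fit, g_pos = f, [c[:] for c in pos[i]]
    return g_pos, g_fit
```

Every fitness evaluation scans the whole data set for every particle, which is consistent with PSO needing noticeably more running time than K-means.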
21. Analysis
:: Similarity Evaluation
A basic external evaluation on a small sample of data vectors traced
similarities, supporting the concept that homogeneous users were
grouped within the same clusters
Although external evaluation is not used as an argument for the
final conclusions, it gives confidence that the clustering
algorithms were developed properly
22. Analysis
:: Limitations
The fitness function is the main evaluation method
Combining it with indirect evaluation would give more accurate conclusions
Fitness was measured for a defined number of clusters
Hypothetically, PSO would continue to perform well for higher values of K,
but the experiments do not prove this
The basic external evaluation should not be taken as a criterion for the performance
of the algorithms; rather, it indicates that the algorithms were most likely
implemented correctly
23. Conclusions
The experiments reinforce the superiority of PSO in terms of performance,
regardless of the nature and dimensionality of the data
Important fact: the data come from real-life transactions
Indication that the higher the number of clusters, the better the resulting fitness for PSO
This implies additional processing effort and memory use
The best number of clusters can be defined based on processing time and fitness
24. Future Work
Compare different hybrids of PSO without a predefined number of clusters
Develop the personalised mechanism to propose relevant advertisements
Inside a Cluster:
Subgroup 1: has seen Advertisement A; show Advertisement B
Subgroup 2: has seen Advertisement B; show Advertisement A
Subgroup 3: has seen Advertisement A and Advertisement B; show Advertisement from neighbour cluster
Users' actions will define the performance: indirect method of evaluation
25. Thank you!
Questions / Comments
References
Kotler, P., Armstrong, G., Wong, V. and Saunders, J., 2010. Principles of Marketing, 5th edition. New Jersey: Pearson Education, p.7
Giuffrida, G., Reforgiato, D., Tribulato, G. and Zarba, C., 2011. A Banner Recommendation System Based on Web Navigation History.
Computational Intelligence and Data Mining (CIDM), 2011 IEEE Symposium, Paris
Liu, B., 2006. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Chicago: Springer, p.6