The classification of time series data is a challenge common to all data-driven fields. However, there is no consensus on which techniques are most effective for grouping unlabeled time-ordered data, because a successful classification of time series patterns depends on the goal and the domain of interest; that is, it is application-dependent.
In this article, we study free-to-play game data. In this domain, clustering similar time series is increasingly important because of the large amount of data collected by current mobile and web applications. We evaluate which methods accurately cluster time series from mobile games, focusing on player behavior data. We identify and validate several aspects of the clustering: the similarity measures and the representation techniques used to reduce the high dimensionality of time series. As a robustness test, we compare various temporal datasets of player activity from two free-to-play video games.
With these techniques we extract temporal patterns of player behavior that are relevant for evaluating game events and for game-business diagnosis. Our experiments provide intuitive visualizations to validate the clustering results and to determine the optimal number of clusters. Additionally, we assess the common characteristics of players belonging to the same group. This study improves our understanding of player dynamics and churn behavior.
Discovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data [IEEE CIG 2016]
1. Discovering Playing Patterns:
Time Series Clustering of Free-To-Play Game Data
Alain Saas, Anna Guitart and África Periáñez (Silicon Studio)
IEEE CIG 2016 Santorini
21 September, 2016
2. About us
• Who are we?
◦ Game studio and graphics middleware company based in Tokyo
◦ Research project to provide game data science as a service
◦ Goals: predict player behavior, scale to big data, and provide intuitive visualization of results
• Which data?
◦ RPG free-to-play games
◦ Time series (TS) of two games
◦ TS of in-app purchases and activity (behavioral data)
3. Challenge
Unsupervised clustering of Time Series of player activity
• Why?
◦ discover temporal player patterns
◦ evaluation of game events and business diagnosis
◦ assess common characteristics of players belonging to the same cluster
• How?
1. representation techniques: reducing the high dimensionality of TS
2. similarity measures for free-to-play game data
3. hierarchical clustering
4. visual validation of the results
5. Similarity measures
Dynamic Time Warping
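$$\mathrm{DTW}(X, Y) = \min_{r \in \mathcal{M}} \sum_{m=1}^{M} \left| x_{i_m} - y_{j_m} \right|$$
where $r = ((i_1, j_1), \dots, (i_M, j_M))$ is a warping path aligning X with Y and $\mathcal{M}$ is the set of all such paths.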
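The DTW minimization is usually computed by dynamic programming in O(N·M) time rather than by enumerating every warping path. A minimal Python sketch, assuming the absolute-difference local cost from the formula and the standard step pattern (match, insertion, deletion) with no warping window; the function name `dtw_distance` is ours, not from the slides:

```python
def dtw_distance(x, y):
    """DTW between two series, with |x_i - y_j| as the local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances while y stays on the same point
                cost[i][j - 1],      # y advances while x stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

With `x = [0, 0, 5, 0, 0, 0]` and `y = [0, 0, 0, 5, 0, 0]`, `dtw_distance(x, y)` returns `0.0` because the peak is aligned despite the one-day shift, whereas a point-by-point sum of absolute differences gives 10.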
Correlation-based measure
$$\mathrm{COR}(X, Y) = \frac{\sum_{n=1}^{N} (x_n - \bar{X})(y_n - \bar{Y})}{\sqrt{\sum_{n=1}^{N} (x_n - \bar{X})^2}\,\sqrt{\sum_{n=1}^{N} (y_n - \bar{Y})^2}}$$
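A direct transcription of COR into Python, plus one way to turn it into a dissimilarity for clustering. The mapping sqrt(2 * (1 - COR)) is an assumption on our part: it is a common choice that gives 0 for perfectly correlated series and 2 for perfectly anti-correlated ones, but the slides do not say which mapping was used. Function names are illustrative.

```python
import math

def cor(x, y):
    """Pearson correlation COR(X, Y) between two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mx) ** 2 for a in x))
           * math.sqrt(sum((b - my) ** 2 for b in y)))
    if den == 0:
        raise ValueError("COR is undefined for a constant series")
    return num / den

def cor_dissimilarity(x, y):
    """Assumed mapping from correlation to a dissimilarity in [0, 2]."""
    # max() guards against tiny negative values caused by floating-point rounding
    return math.sqrt(max(0.0, 2.0 * (1.0 - cor(x, y))))
```

Because COR ignores offset and scale, `[1, 2, 3]` and `[2, 4, 6]` are at dissimilarity (almost exactly) 0, which is the "similar geometric and synchronous profiles" behavior; a single outlier, however, can shift the means and the correlation noticeably.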
Temporal Correlation and Raw Values Behaviors measure
$$\mathrm{CORT}(X, Y) = \frac{\sum_{n=1}^{N-1} (x_{n+1} - x_n)(y_{n+1} - y_n)}{\sqrt{\sum_{n=1}^{N-1} (x_{n+1} - x_n)^2}\,\sqrt{\sum_{n=1}^{N-1} (y_{n+1} - y_n)^2}}$$
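CORT is the same computation as COR applied to first differences, without subtracting a mean. The slide title refers to combining temporal correlation with raw-value behavior; one published way to do so is Chouakria-Douzal and Nagabhushan's adaptive index, which scales a raw-value distance by φ_k(CORT) = 2 / (1 + exp(k · CORT)). The sketch below assumes that form, uses Euclidean distance for the raw values, and picks k = 2 purely for illustration; the slides give neither the exact combination nor k.

```python
import math

def cort(x, y):
    """Temporal correlation CORT(X, Y): correlation of successive differences (uncentred)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if den == 0:
        raise ValueError("CORT is undefined for a constant series")
    return num / den

def cort_dissimilarity(x, y, k=2.0):
    """Assumed adaptive index: phi_k(CORT) times the Euclidean distance of raw values."""
    phi = 2.0 / (1.0 + math.exp(k * cort(x, y)))  # in (0, 2); 1 when CORT == 0
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return phi * euclid
```

With k = 0, φ equals 1 and the index reduces to the plain Euclidean distance; larger k shrinks the distance between series that rise and fall together (CORT near 1) and enlarges it for series that move in opposite directions (CORT near -1).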
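Complexity-Invariant Distance measure
$$\mathrm{CID}(X, Y) = \mathrm{dist}(X, Y) \cdot \mathrm{CF}(X, Y)$$
where CF is the complexity correction factor
$$\mathrm{CF}(X, Y) = \frac{\max(\mathrm{CE}(X), \mathrm{CE}(Y))}{\min(\mathrm{CE}(X), \mathrm{CE}(Y))}$$
and CE is the complexity estimate
$$\mathrm{CE}(X) = \sqrt{\sum_{n=1}^{N-1} (x_n - x_{n+1})^2}$$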
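A sketch of CID that uses Euclidean distance as dist(X, Y); this is an assumption, since the slides leave dist unspecified. CF cannot be computed when a series is constant (CE = 0), so the handling of that case here (CF = 1 when both series are flat, infinite distance when only one is) is our own convention, not part of the definition.

```python
import math

def complexity_estimate(x):
    """CE(X): square root of the summed squared successive differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x[1:])))

def cid(x, y):
    """Complexity-invariant distance, with Euclidean distance as dist(X, Y)."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    ce_x, ce_y = complexity_estimate(x), complexity_estimate(y)
    if ce_x == 0 and ce_y == 0:
        cf = 1.0  # both series flat: treat them as equally complex (our convention)
    elif min(ce_x, ce_y) == 0:
        return math.inf  # only one series flat: CF is unbounded (our convention)
    else:
        cf = max(ce_x, ce_y) / min(ce_x, ce_y)
    return dist * cf
```

For example, `[0, 1, 0, 1]` and `[0, 2, 0, 2]` are a Euclidean distance of √2 apart, but the second series is twice as "complex" (CE of √12 versus √3), so CF = 2 and CID ≈ 2.83.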
6. Similarity measure comparison
[Figure panels: Euclidean vs. Correlation; Correlation vs. Complexity-Invariant Distance; Dynamic Time Warping vs. Correlation; Correlation vs. Discrete Wavelet Transform]
7. Comparison of clustering methods
• DTW: Dynamic Time Warping
◦ similar player profiles with a shift on the time axis
◦ different patterns but at different scales
• DWT: Discrete Wavelet Transform
◦ dimensionality reduction
◦ captures the frequency content of the series
• SAX: Symbolic Aggregate Approximation
◦ parameters w (word length) and a (alphabet size)
• COR: Correlation
◦ similar geometric and synchronous profiles
◦ sensitive to noisy data and outliers
• CORT: Temporal Correlation
◦ similar to COR, but computed on successive differences, so it compares the temporal behavior (increases and decreases) of the series
• CID: Complexity-Invariant Distance
◦ similar complexity patterns
◦ good for sparse time series
• COR+trend: correlation with trend extraction
◦ addresses COR's sensitivity to noise
◦ does not work well with sparse time series
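Two of the techniques in the list are representations that reduce a series to far fewer values before comparison. The sketch below shows SAX with word length w and alphabet size a (z-normalization, Piecewise Aggregate Approximation, then equiprobable Gaussian breakpoints) and a DWT reduction that keeps only the approximation coefficients. The Haar wavelet, the requirement that lengths divide evenly, and all function names are assumptions for illustration; the slides do not state which wavelet or which SAX settings were used.

```python
import math
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev

def paa(x, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal-length frames."""
    if len(x) % w:
        raise ValueError("this sketch assumes len(x) is a multiple of w")
    step = len(x) // w
    return [fmean(x[i:i + step]) for i in range(0, len(x), step)]

def sax(x, w, a):
    """SAX word of length w over the first a lowercase letters (2 <= a <= 26)."""
    mu, sigma = fmean(x), pstdev(x)
    z = [(v - mu) / sigma for v in x] if sigma > 0 else [0.0] * len(x)
    # a - 1 breakpoints cut N(0, 1) into a equiprobable regions
    cuts = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    return "".join("abcdefghijklmnopqrstuvwxyz"[bisect_right(cuts, s)] for s in paa(z, w))

def haar_approx(x, levels):
    """DWT reduction: keep only the Haar approximation coefficients after `levels` steps."""
    coeffs = [float(v) for v in x]
    for _ in range(levels):
        if len(coeffs) % 2:
            raise ValueError("len(x) must be divisible by 2 ** levels")
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs
```

For instance, `sax(list(range(8)), w=4, a=4)` turns a steadily rising 8-day series into the word `"abcd"`, and each level of `haar_approx` halves the number of values kept.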
8. Hierarchical clustering
Agglomerative Ward method: at each step, merges the pair of clusters that leads to the minimum increase in total within-cluster variance
Linkage methods:
• Single linkage
• Complete linkage
• Average linkage
• Centroid method
• Ward method
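Ward's criterion can be applied to any precomputed dissimilarity matrix (for example, one built from COR-trend or CID) through the Lance-Williams update formula. This is a minimal pure-Python sketch, not the authors' implementation: it squares the input dissimilarities first, as R's `hclust(method = "ward.D2")` does, so that for Euclidean input each merge is the one with the smallest increase in within-cluster variance. For non-Euclidean measures such as COR or CID, that variance interpretation no longer holds exactly. The function name `ward_clusters` is hypothetical.

```python
def ward_clusters(dist, k):
    """Cut a Ward hierarchy into k clusters; dist is a symmetric n x n dissimilarity matrix."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of series")
    # Squared dissimilarities between clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}
    next_id = n
    while len(members) > k:
        i, j = min(d, key=d.get)  # the merge with the smallest Ward cost
        ni, nj, dij = len(members[i]), len(members[j]), d[(i, j)]
        for c, mc in members.items():
            if c in (i, j):
                continue
            nc = len(mc)
            dci = d[(min(c, i), max(c, i))]
            dcj = d[(min(c, j), max(c, j))]
            # Lance-Williams update for Ward's method
            d[(c, next_id)] = ((ni + nc) * dci + (nj + nc) * dcj - nc * dij) / (ni + nj + nc)
        # drop every pair involving the two clusters that were just merged
        for key in [key for key in d if i in key or j in key]:
            del d[key]
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    labels = [0] * n
    for label, ids in enumerate(sorted(members.values(), key=min)):
        for idx in ids:
            labels[idx] = label
    return labels
```

For four players whose one-dimensional profiles sit at 0, 1, 10 and 11 (absolute differences as dissimilarities), `ward_clusters(dist, 2)` returns `[0, 0, 1, 1]`. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` would normally be used; its documentation notes that Ward is only correctly defined for Euclidean distances.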
9. Our data
Time series measured per user per day.
Game activity (behavioral data):
• Time: the amount of time spent in the game
• Sessions: the total number of playing sessions
• Actions: the total number of actions performed
In-app sales:
• Purchase: the total amount of in-app purchases
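A sketch of turning raw session logs into the per-user, per-day series listed above. The record layout `(user_id, day, seconds, actions)` and the function name are hypothetical; the slides do not describe the raw log format. Days without activity stay at zero, which is what makes such series sparse, and purchases could be aggregated the same way from a transaction log.

```python
from collections import defaultdict

def daily_series(sessions, n_days):
    """Aggregate session records into per-user, per-day Time, Sessions and Actions series.

    sessions: iterable of (user_id, day, seconds, actions) tuples, where `day` is the
    0-based day index within the observation window (hypothetical record layout).
    """
    def empty():
        return {"time": [0] * n_days, "sessions": [0] * n_days, "actions": [0] * n_days}

    series = defaultdict(empty)
    for user, day, seconds, actions in sessions:
        if not 0 <= day < n_days:
            continue  # ignore records outside the observation window
        s = series[user]
        s["time"][day] += seconds
        s["sessions"][day] += 1
        s["actions"][day] += actions
    return dict(series)
```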
10. Data selection, constraints
Time series: multi-dimensional data ⇒ selection of a period P
• our data contain weekly game events
• period P of length 21 days
• played time → active users: connected at least 6 days out of 7 each week
• purchases → paying users: at least one purchase in period P
• players alive during period P
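The selection constraints can be expressed as simple per-player filters over the 21-day period. This sketch assumes that a "connection" on a given day means nonzero played time that day, that the 6-of-7 rule applies to each of the three weeks separately, and that a player is "alive" during P if first seen no later than its first day and last seen no earlier than its last day. All three readings, and the function names, are our assumptions.

```python
PERIOD_DAYS = 21  # length of period P
WEEK_DAYS = 7
MIN_DAYS_PER_WEEK = 6

def is_active_user(daily_time):
    """Active: nonzero played time on at least 6 of 7 days in each week of P (assumed reading)."""
    if len(daily_time) != PERIOD_DAYS:
        raise ValueError("expected one value per day of period P")
    weeks = [daily_time[s:s + WEEK_DAYS] for s in range(0, PERIOD_DAYS, WEEK_DAYS)]
    return all(sum(t > 0 for t in week) >= MIN_DAYS_PER_WEEK for week in weeks)

def is_paying_user(daily_purchases):
    """Paying: at least one purchase during period P."""
    return any(p > 0 for p in daily_purchases)

def is_alive(first_seen, last_seen, period_start, period_end):
    """Alive during P, read here as: seen on or before P starts and on or after P ends."""
    return first_seen <= period_start and last_seen >= period_end
```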
11. Datasets and tests
| Game | Data | Technique | Clusters | Date range |
|---|---|---|---|---|
| Age of Ishtaria | Daily played time | COR-trend | 8 | Oct 2014 - Jan 2016 |
| Age of Ishtaria | Daily purchases | CID | 5 | Oct 2014 - Jan 2016 |
| Grand Sphere | Daily played time | COR-trend | 8 | Jun 2015 - Mar 2016 |
12. Clustering time series of time played
1. representation method: trend extraction
2. similarity measure: correlation
3. hierarchical clustering: Ward method
4. validation of results: visualization with heatmap (raw data)
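The first two steps of this pipeline can be sketched as follows. The slides do not say how the trend was extracted; a 7-day moving average is assumed here because it smooths out weekly cycles, which fits the weekly game events in the data. The correlation dissimilarity uses the assumed mapping sqrt(2 * (1 - COR)). The resulting matrix is what the Ward step would consume; function names are hypothetical.

```python
import math

def moving_average_trend(x, window=7):
    """Trend as a moving average over `window` days (window=7 is an assumption).
    Only fully covered positions are kept, so the output has len(x) - window + 1 points."""
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and len(x)")
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if den == 0:
        raise ValueError("correlation is undefined for a constant trend")
    return num / den

def cor_trend_matrix(series, window=7):
    """Pairwise COR-trend dissimilarities between equal-length daily series."""
    trends = [moving_average_trend(s, window) for s in series]
    n = len(trends)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(max(0.0, 2.0 * (1.0 - pearson(trends[i], trends[j]))))
            dist[i][j] = dist[j][i] = d
    return dist
```

For three 21-day series where the second doubles the first and the third reverses it, the matrix has 0 between the first two and 2 between the first and the third, reflecting that COR-trend compares the shape of the smoothed profiles rather than their level.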
14. Clustering time series of time played
The method is also able to extract differentiated patterns, as in Age of Ishtaria.
15. Clustering time series of purchases
1. similarity measure: complexity-invariant distance
2. hierarchical clustering: Ward method
3. validation of results: visualization with heatmap (raw data)
16. Summary and Next Steps
• Unsupervised clustering of time series data from two free-to-play games
• Evaluate several similarity measures and representation methods
• Extract meaningful behavioral patterns of players
• Assess the impact of weekly game events
• Discover hidden playing dynamics regarding purchases and time played
• Feature for churn prediction
• Event recommender
• Cluster-level behavior