In recent years, context awareness has become a reality in real-world applications. However, building comprehensive context recognition systems able to recognize both low- and high-level context information remains a challenge. In this paper, we discuss environment recognition as a means to address the issue of recognizing a high-level user context: social activity. In many countries, bars, pubs and similar establishments are among the main places where social engagement takes place, and thus we propose recognizing these types of environments, using data collected from mobile device sensors, as a proxy for inferring social activity. For this purpose, we discuss the common defining characteristics of these establishments and the sensors we will use to recognize them. After that, we introduce the design of our system. Finally, we present the preliminary evaluation carried out to assess the validity of our proposal.
Facing up social activity recognition using smartphone sensors
UCAmI
2015
DeustoTech-Deusto Institute of Technology, University of Deusto
http://www.morelab.deusto.es
December 2, 2015
Facing up social activity recognition using smartphone sensors
Pablo Curiel, Ivan Pretel, Ana B. Lago
Conclusion
► Findings
► The preliminary results obtained seem promising regarding the recognition of new locations for the same user.
► However, generalization to new users seems to be more troublesome.
► Future work
► New data collection campaign involving more users, in order to better study these aspects
► Study which is the most descriptive value for each feature (mean, median, standard deviation, minimum and maximum)
► Search for better recognition results with separate classes for each type of bar-like environment, as this could potentially enable better capture of the particular characteristics of each of these environments.
Pablo Curiel, Ivan Pretel, Ana B. Lago
pcuriel@deusto.es, ivan.pretel@deusto.es, anabelen.lago@deusto.es
All rights to the images are reserved by the original owners*; the rest of the content is licensed under a Creative Commons BY-SA 3.0 license.
*
• http://mami.uclm.es/ucami-iwaal-amihealth-2015
• https://flic.kr/p/enRrs9
• https://www.iconfinder.com/yudha_ap
• https://www.iconfinder.com/iconsets/stash
• https://www.iconfinder.com/DemSt
• https://www.iconfinder.com/paomedia
• https://flic.kr/p/eD7GR
• https://flic.kr/p/8G1yiU
Speaker notes
This presentation consists of four main sections.
First, I will introduce the motivation and the main aim of this work, which is recognizing bar-like environments.
Second, I will explain the approach we followed to achieve this aim.
Next, the evaluation of our approach.
And finally, the conclusions and future work.
Several works address the problem of recognizing user context.
They relied on ad-hoc architectures, either equipping spaces with a sensing infrastructure or attaching sensors to the human body.
However, current smartphones are equipped with a wide variety of sensors,
like accelerometer, gyroscope, geomagnetic sensor, luminosity sensor, microphone or GPS among others.
Consequently, they are an ideal replacement for those early ad-hoc sensing stations.
As a result, the context recognition area has greatly benefited from mobile technologies.
In addition, smartphones are part of our daily lives and we carry them with us everywhere, at all times.
For this reason, context information is especially important in the mobile computing area, where users' context and needs change rapidly.
In fact, in recent years context awareness has become a reality in real-world mobile applications.
In particular, simple context information like location is commonly used in commercial applications like Foursquare, to suggest interesting venues nearby, or Twitter, to show which topics are trending in each user's location.
More recently, more complex context recognition has been gaining presence in everyday products. For instance, several physical activity tracking applications are very popular nowadays. They are capable of distinguishing a number of activities like walking, running, cycling or climbing stairs.
However, much complex or high-level context information does not follow a clear or recognizable pattern that can be interpreted using low-level sensor data.
This is the case of many high-level user activities like cooking or reading a newspaper; or user environments like home, workplace, bar or public transport.
In this paper we address the issue of environment recognition as a means to tackle a high-level user activity: socialization.
Detecting when users are engaged in social contexts would enable services like "social reminders".
It would also enable products like coupon or discount applications, and marketing campaigns aimed at attracting the current clients' friends or at presenting promotions at the most appropriate moment.
However, directly inferring a social interaction using smartphone sensors is not possible.
For this reason, in the present work we address this issue by means of environment recognition: detecting when a user is in a bar, pub, restaurant or similar establishments. In many countries, like Spain, these kinds of establishments are one of the main places of socialization. Therefore we consider that recognizing these kinds of venues is useful.
For the task of deciding which data to use for recognizing bar-like establishments we considered the main characteristics that all of them share.
In general, we can describe them as:
- Noisy places with continuous murmur, music playing or TV on among other noises.
- People either sitting or standing, depending on the kind of establishment, but usually in stationary positions.
- Low-light locations, especially dark in the case of pubs, for instance, and less dark in others like restaurants. But in general terms, they can be described as artificially lit places.
Consequently, we will use audio, acceleration and luminosity as data sources for the recognition task.
In order to train and test classification algorithms, we first need an annotated dataset of a diverse list of bar-like establishments and other non-bar environments.
For this purpose, we developed an Android application, used by two users, which gathers:
- Audio root mean square power (RMS) and decibels (dB), from the microphone.
- 3-axial acceleration, from the acceleration, gyroscope and geomagnetic sensors.
- Luminosity, from the luminosity sensor.
- And finally, screen status, provided by the Android framework, which is used for transforming the luminosity data.
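The notes do not give the exact formulas, but RMS power and its decibel conversion are standard definitions; a minimal Python sketch (the `audio_features` helper is hypothetical, assuming normalized samples in [-1, 1]):

```python
import math

def audio_features(samples):
    """Compute RMS power and a decibel value for one frame of
    normalized audio samples (floats in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Decibels relative to full scale; guard against log(0) for silence.
    db = 20 * math.log10(rms) if rms > 0 else float("-inf")
    return rms, db
```

A full-scale square wave would give 0 dB; quieter frames give increasingly negative values.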
Once we capture the raw data using the Android application we must process it before feeding the classification algorithms with the generated dataset.
First we carry out a data fusion process where data coming from the different sensors is combined at a constant and uniform sampling rate.
Second, we make some transformations to improve the properties of the data for the classification task.
Finally, we extract the final features that will be used in the classifiers.
Not all sensors are capable of providing samples at the same rate and offering synchronized times for all of them.
Since the devices used for data capturing were used normally while data gathering was running, some sensors suffered occasional increases, delays or halts in their sampling rates.
Starting from the 50 Hz sample rate requested by default, we fuse data at this constant rate and also at 20, 10, 5, 2 and 1 Hz.
Linear interpolation is used to compute the missing values.
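The fusion step can be sketched in Python (the original pipeline is not published; `fuse_to_rate` is a hypothetical helper built on NumPy's linear interpolation):

```python
import numpy as np

def fuse_to_rate(timestamps, values, rate_hz, t_start, t_end):
    """Resample an irregularly sampled signal onto a uniform time grid
    at `rate_hz`, filling missing values by linear interpolation."""
    grid = np.arange(t_start, t_end, 1.0 / rate_hz)
    return grid, np.interp(grid, timestamps, values)
```

Running this once per sensor stream with the same grid yields time-aligned columns that can be combined into a single dataset.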
In this step we apply transformations to the raw data variables in order to generate new ones.
Acceleration is processed to generate both a linear (with no gravity component) and an earth-coordinate version of it. In addition, these data are augmented with the acceleration vector norm.
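The vector norm augmentation is straightforward; a minimal sketch (the `augment_with_norm` name is an assumption, and the linear/earth-coordinate conversions, which rely on Android's sensor fusion, are omitted):

```python
import math

def augment_with_norm(ax, ay, az):
    """Append the Euclidean norm of the acceleration vector to a
    3-axis sample, as an orientation-independent magnitude feature."""
    return ax, ay, az, math.sqrt(ax * ax + ay * ay + az * az)
```

The norm is useful because it does not depend on how the phone is oriented in the user's pocket.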
Regarding audio data, it is too noisy in its raw form. Thus, we also generated filtered versions of the sound variables using a low-pass filter.
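The notes do not specify which low-pass filter was used; as an illustration, a first-order exponential smoother, one of the simplest choices (`alpha` is a hypothetical smoothing parameter controlling the cut-off):

```python
def low_pass(signal, alpha=0.1):
    """First-order (exponential) low-pass filter: each output sample
    is a weighted blend of the new input and the previous output."""
    out = []
    prev = signal[0]
    for x in signal:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out
```

Smaller `alpha` values smooth more aggressively, at the cost of lagging behind sudden changes in the sound level.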
Concerning luminosity data, it exposes a remarkable issue.
Due to normal operation of the mobile phone while data was captured, it was placed inside pockets for long periods, resulting in repeated zero values which do not correspond to the true luminosity of the environment.
To tackle this problem, we process the luminosity data and fill those zero values with the closest sample observed while the screen was turned on.
Additionally, luminosity data follows a heavily skewed, long-tail distribution, which can be tricky for some classifiers. Thus we also generated a log-transformed version of this variable, which has a more normal shape.
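A sketch of both luminosity transformations, assuming the fill policy is "nearest sample taken while the screen was on" (the `fix_luminosity` helper and its exact policy are assumptions):

```python
import math

def fix_luminosity(lux, screen_on):
    """Replace zero lux readings (phone in a pocket) with the nearest
    reading taken while the screen was on, then log-transform."""
    # Indices of trustworthy samples: screen on and a non-zero reading.
    valid = [i for i, (v, s) in enumerate(zip(lux, screen_on)) if s and v > 0]
    fixed = []
    for i, v in enumerate(lux):
        if v == 0 and valid:
            j = min(valid, key=lambda k: abs(k - i))  # nearest valid sample
            v = lux[j]
        fixed.append(v)
    # log1p stays defined at zero and compresses the long right tail.
    return [math.log1p(v) for v in fixed]
```

The log transform brings the occasional very bright readings closer to the bulk of the dim indoor values.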
The final step before feeding the data into a classifier is feature extraction.
First we grouped data into the labeled environments and after that we split data into window frames to compute these features.
For each window frame, we compute the mean, median, standard deviation, minimum and maximum values of all the variables.
These aggregated measures make up our features for the classification task.
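The windowing and feature extraction described above can be sketched as follows (non-overlapping windows are assumed; the notes do not state the overlap policy):

```python
import numpy as np

def window_features(signal, window_len):
    """Split a 1-D signal into non-overlapping windows and compute the
    five summary statistics used as classification features."""
    feats = []
    for start in range(0, len(signal) - window_len + 1, window_len):
        w = np.asarray(signal[start:start + window_len])
        feats.append({
            "mean": w.mean(), "median": np.median(w),
            "std": w.std(), "min": w.min(), "max": w.max(),
        })
    return feats
```

Applying this to every fused variable and concatenating the dictionaries gives one feature vector per window.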
With this evaluation we study the best combination of parameters to detect bar-like environments.
We split the 100 hours captured with the Nexus 4 into 70 hours for training and 30 for testing.
The 20 hours of data captured with the HTC were dedicated to the test set, in order to evaluate how well the classifiers are able to generalize to different users and devices.
Regarding classifier training, we used a 10-repetition 5-fold cross validation.
Additionally, we trained four different classifiers for a more exhaustive comparison: a random forest, a support vector machine (SVM), k-Nearest Neighbours (k-NN) and a Naive Bayes classifier.
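The evaluation itself was run in R with caret; purely as an illustration of the resampling scheme, a repeated k-fold index generator in pure Python (names and details are assumptions):

```python
import random

def repeated_kfold(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross
    validation: each repeat reshuffles the data into k folds."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    for _ in range(n_repeats):
        rng.shuffle(idx)
        fold = n_samples // n_folds
        for k in range(n_folds):
            test = idx[k * fold:(k + 1) * fold]
            train = [i for i in idx if i not in set(test)]
            yield train, test
```

With 5 folds and 10 repeats, each classifier is fitted 50 times, and the scores are averaged to reduce the variance of a single split.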
There are several configuration parameters to study:
the best features to use
the most suitable window sizes
and the classifier performance decay with decreasing sensor sampling rates.
We measure this performance using recall, specificity, area under the ROC curve (AUC) and accuracy.
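Recall, specificity and accuracy follow directly from the confusion matrix; a minimal sketch (AUC is omitted, as it requires classifier scores rather than hard labels; assumes both classes appear in `y_true`):

```python
def binary_metrics(y_true, y_pred):
    """Recall, specificity and accuracy for binary labels
    (1 = bar-like environment, 0 = other environment)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "recall": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }
```

Reporting recall and specificity together matters here because the classes (bar vs. non-bar time) are unlikely to be balanced.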
For all these evaluation tasks we used R version 3.1. For training and testing classifiers we used the caret package (version 6.0-41) and, more specifically, the randomForest package (version 4.6-7) for random forests, kernlab [4] (version 0.9-20) for the SVM, the built-in class package for k-NN and the klaR [15] package (version 0.6-12) for the Naive Bayes classifier.
Focusing on the feature comparison
With acceleration features we compared two aspects: whether adding the vector norm is useful, and whether linear or earth acceleration leads to better results than base acceleration.
- In general, the vector norm is a useful feature to add to the 3-axis acceleration.
For random forest, SVM and k-NN it leads to better classification results for the three types of acceleration, but the improvement is in general subtle (around 1%).
- Regarding the comparison of the three types of acceleration, the results are more varied.
For all classifiers, there is no significant difference between linear and earth acceleration.
Comparing base acceleration with the transformed versions, both random forest and SVM show better results (up to 4% better).
With audio features we also compared two aspects:
- Comparing the two feature types, dBs are significantly better than RMS, except for the random forest, which shows no difference between the two. The improvement ranges from 4% to 9% for the SVM, from 6% to 15% for k-NN and from 2% to 8% for Naive Bayes.
- In the case of filtering, the differences are smaller.
Except for k-NN, which works better with the unfiltered version in the case of RMS, the filtered feature leads to better results.
-With luminosity features we studied if the two applied transformations are useful.
- Although neither the log transformation nor the zero-fixed version improves results on its own, combining both transformations yields a substantial improvement, the largest for Naive Bayes.
The last step in feature comparison is studying the contribution of the features extracted from each sensor to the classifier performance.
This was done training the classifiers with the best performing feature of each sensor.
Then, classifiers were trained excluding the features captured by each sensor.
The results are the following.
Audio features are the most important: without them, performance declines by a significant 15% to 20% across the four classifiers.
Acceleration features are less important, but their contribution nevertheless ranges from 1% to 10%.
In contrast, luminosity features are only useful for SVM and Naive Bayes.
Having selected the best performing features, we used them to compare window sizes.
Although the best performing window size varies for each classifier, a common pattern can be seen: as expected, the smaller the window size, the worse the results.
Considering each classifier independently, random forest shows the best performance for 240 second windows.
The average performance loss for smaller windows is around 2%.
In the case of the SVM, the best performing window is of 120 seconds.
For the k-NN classifier, the best is a 180-second window.
Finally, Naive Bayes stands out with 240 second windows.
Later, we studied how decreasing the sampling rate of the sensors impacts classifier performance.
As expected, smaller window sizes suffer more than bigger ones when this parameter is decreased.
Lastly, having selected both the best performing features and the best window sizes, we can compare how well each classifier performs the recognition task.
As can be observed, the best performing classifier is the SVM, which outperforms the others in recall, AUC and accuracy.
Only random forest beats SVM in specificity.
After studying the best performing configurations, we selected the SVM with linear acceleration, filtered dBs and log-transformed fixed luminosity as the best classifier.
With the first test set (Nexus 4) we obtain satisfactory results. However, the results for the HTC test set are much less satisfactory.
The reason for these results is that, as can be observed, only the "Bar" class is successfully classified.
Seeing this, we tried training the classifier with this second test data to evaluate its performance.
In this case, results were much more satisfactory.
This means that the proposed system is at least capable of generalizing to new environments captured by the same user and device.