The graphical user interface (GUI) of an interactive system is nowadays the most frequently used interaction modality. While its contents are highly important, its Look and Feel is an equally essential factor in GUI quality, shaped by several determinants such as, but not limited to, aesthetics, pleasurability, and fun. GUI aesthetics is therefore a promising element to focus on in order to facilitate communication between device and user. This raises the question: “Is it possible to evaluate the quality of a GUI by estimating its aesthetics through a series of measurable geometric metrics?” This paper suggests possible directions for addressing this question. First, it introduces a simplified model of GUI aesthetics that captures aesthetic aspects and region-related metrics. Second, it defines a methodology for evaluating GUI aesthetics based on this model. Finally, it puts forward a model-based implementation of the methodology in the form of a web service tool for metric-based evaluation of GUIs, and discusses the results of a survey on users’ aesthetic perceptions.
This paper was presented at the IEEE Eighth International Conference on Research Challenges in Information Science, RCIS'2014 (Marrakech, May 28-30, 2014).
Towards an Evaluation of Graphical User Interfaces Aesthetics based on Metrics
1. Towards an Evaluation of Graphical User Interfaces Aesthetics based on Metrics
Mathieu Zen & Jean Vanderdonckt
Louvain School of Management
Université catholique de Louvain
2. Outline
• Context and motivations
• Related Literature
• Metrics-based Model proposal
• INFRARED: A Method For UI Evaluation
• QUESTIM: Tool Implementation
• Pilot Study on Users’ Perceptions
3. Context and motivations
“Not only in work definitely undertaken from artistic
impulse, but in all the products of his industry, in
his choice of locality, […] his clothes and his
implements, man shows that he is affected by
appearance, by something that causes him
pleasure over and above the immediate utility of
object.”
The Origin of the Aesthetic Emotion
Felix Clay, 1908
4. Context and motivations
• Average bounce rate: 40%
• 60% of our brain is dedicated to vision
• Design appreciation is the first interaction with a UI and takes less than half a second
• Cisco’s white paper forecasts 1.4 mobile devices per capita by 2018
7. Problem statement
‘How to evaluate the quality of a User Interface?’
– Qualitative evaluation process
• Long and cumbersome
• Final (performed at the end of development)
– Quantitative evaluation process
• Quick
• Objective
Central statement
Central statement State of the Art Model Tool Validation
8. Central statement
• Research question:
‘How to model a quantitative evaluation of GUI aesthetics?’
• Methodology
– Systematic Literature Review (SLR)
– Metrics-based model for GUI evaluation
– Method implementation
– Empirical validation
10. State of the Art
• A qualitative approach
The power of the center - ARNHEIM - 1983
11. State of the Art
• A qualitative approach – VISUAL TECHNIQUE
12. State of the Art
• Quantifying aesthetics
13. State of the Art
• Quantifying aesthetics – AESTHETICS METRIC
14. State of the Art
• Shortcomings
– No validation of the metrics
– Many competing formulas
– Many interpretations
• Two main issues with metrics:
– REPRESENTATIVENESS
– RELEVANCE
15. Metrics Proposal
• Set of metrics
Balance
Symmetry
Density
Complexity
Proportion
Regularity
Spacing
Economy
Homogeneity
Rhythm
Concentricity
Alignment
…
16. Metrics Proposal
• New metrics
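(Figure: concentricity metric, with d̄ the average distance from the objects’ centers to the UI center and d_diag the distance from a corner to the center)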
25. Validation
• Two main issues:
– REPRESENTATIVENESS
e.g. Does the formula for balance represent human perception of balance?
– RELEVANCE
e.g. Does the balance property of a UI enhance the perception of its aesthetics?
26. Pilot Study
• Experiment with 25 subjects
• Undergraduate and postgraduate students
• 4 UIs
• 12 visual techniques
• 5-point Likert scale
GOAL: Compare users’ perceptions with the values given by the metrics
REPRESENTATIVENESS
29. Pilot Study
• Hypotheses
– H1: There is a perfect similarity between the metric scores and users’ reviews.
– H2: There is a relative similarity between the metric scores and users’ reviews.
30. Pilot Study
• H1: One-sample Wilcoxon signed-rank test
31. Pilot Study
• H2: Matched-pairs Wilcoxon signed-rank test
32. Pilot Study
• H2: Friedman two-way ANOVA by ranks
33. Pilot Study
• H2: Metrics ranking
34. Pilot Study
• Discussion
– 4/12 representative metrics
– Study at a larger scale is needed (only 4 UIs)
– Subjective nature of the UI grid
35. Conclusions
• Qualitative evaluation
Long and subjective process, performed at the final stage
• Quantitative evaluation
Quick and objective feedback
• Metrics relevance and representativeness
36. Future work
• QUESTIM – Automatic region detection
• Further empirical validation
• Relevance
Investigate professional designers’ methods to improve GUI aesthetics
• Representativeness
Reproduce pilot study at a larger scale
The title of this paper is “A Metrics-Based Methodology for Evaluating Graphical User Interfaces of Information Systems”.
Here is the plan of this presentation. I’ll first introduce the context and motivations of the work. Then I’ll present the related literature on UI aesthetics evaluation. Next, I propose a metrics-based model and a method for UI evaluation called INFRARED. Based on this method, we also implemented a software tool called QUESTIM, and I’ll show a short video of how it can be used. Finally, I present a pilot study on users’ perceptions, which forms the validation part of the work.
So let’s start with the context and motivations. I chose to open with this quote from Felix Clay, a philosopher who wrote an article about the origin of the aesthetic emotion.
In a nutshell, he says that man, in his choice of products, locality and clothes, shows that he is affected by appearance.
This also holds for HCI. Here are some statistics that point in this direction.
First, there is the bounce rate, which is often used in web analytics: the percentage of people who arrive on your site and leave without visiting a second page. According to Google Analytics, the average bounce rate is 40%. People may leave your page because they think they won’t find what they want, because they have already found it, or simply because they find the page unattractive.
Another statistic, from neuroscience, is that 60% of our brain is dedicated to vision.
Researchers in HCI have also reported that design appreciation is the first interaction users have with a UI and that this interaction takes less than half a second.
Finally, Cisco’s white paper, a forecast report on technology, predicts that by 2018 the number of mobile devices will exceed the world’s population. The presence of interfaces in people’s everyday lives is not going to decrease.
All these examples, and many others, argue for focusing on improving UI quality, and one way to do so is to improve aesthetics.
But how can we evaluate UI aesthetics?
To make this context concrete, here are some examples of poorly built website interfaces.
The first one is the website of a national union for lab and scientific workers. As you can see, there are too many items, poorly arranged, and there is a lack of structure, organization and alignment of objects.
The second one is my favorite, not only because it is worse than the previous one but mainly because it is the website of a web design company.
Here the organization of objects is better, but the choice of colors and background misleads users, who can no longer distinguish the interactive elements.
The problem I’m trying to address through my research is: “How is it possible to evaluate the quality of a User Interface?”
Two approaches are possible:
- The first is a qualitative evaluation process, which can be long and cumbersome because it requires gathering many reviews and conducting qualitative surveys by questionnaire. Such surveys are also usually conducted as a final step in UI development.
- The second is a quantitative evaluation process based on metrics, which can provide the designer with quick and objective feedback.
The latter approach is the one we will explore further in our work.
Therefore, the research question is: “How to model a quantitative evaluation of GUI aesthetics?”
The methodology I used is the following:
- First, I conduct a systematic literature review (SLR) and define the concepts of visual techniques and aesthetic metrics based on the literature.
- Then, I propose a metrics-based model for GUI evaluation.
- After this, we propose a method for GUI aesthetics evaluation and a tool that implements it.
- Finally, we empirically validate the method, mainly through experiments with users.
The scope of the research is the following:
- Some properties are part of the quality standard, including usability and functionality.
- Usability itself is divided into several properties, such as social acceptance and physical acceptance.
- The main factor contributing to social acceptance is aesthetics.
- Exploring UI aesthetics, we find that it is mainly linked to the arrangement of UI objects, the color sets used, and the complexity of shapes.
- Focusing on the arrangement of objects, we explore different properties and techniques such as balance, symmetry, density, etc.
- Each technique can have one or several dedicated metrics, and each metric can lead to various interpretations.
In the literature, there are two main approaches to evaluating aesthetics.
The first is a qualitative approach, as set out by Rudolf Arnheim, an art theorist, with this design of a plain black disk in an empty square. Arnheim says “this disk is centered”: we obviously do not need to measure the distance between the left border and the center of the disk to say that it is centered.
This is mainly because humans have a sense of balance: they can feel when something is not in equilibrium.
This sense of balance can be adapted to UI design as a visual technique. Indeed, these examples show that the interface on the left is more balanced than the one on the right because its objects are evenly distributed across its four quadrants.
When it comes to quantifying aesthetics, the first formula that comes to mind is the Golden Section. It is a proportion regarded as the number that ideally represents beauty through order, and it may thus have been used by famous artists, sculptors and architects over the centuries.
As shown in this picture, the Parthenon was presumably built according to this proportion.
The first problem is that it has never been convincingly demonstrated that the Golden Section is a condition for beauty.
The second problem is that many works have been wrongly considered as following this rule: people have attributed a “golden” property to some artworks, although this may not have been their authors’ intention.
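The Golden Section can be stated precisely: two lengths a > b are in golden proportion when a/b = (a + b)/a = φ, which gives φ² = φ + 1 and φ = (1 + √5)/2 ≈ 1.618. The sketch below computes φ and a hypothetical helper, `golden_deviation` (not part of the paper or of QUESTIM), measuring how far a frame’s aspect ratio is from it. It only illustrates the arithmetic; given the two problems above, a small deviation says nothing about beauty.

```python
import math

# Golden ratio: the positive root of phi**2 = phi + 1
PHI = (1 + math.sqrt(5)) / 2


def golden_deviation(width: float, height: float) -> float:
    """Relative distance between a frame's aspect ratio and PHI (hypothetical helper).

    The longer side is divided by the shorter one, so orientation does not matter.
    Returns 0.0 for a frame in exact golden proportion.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = max(width, height) / min(width, height)
    return abs(ratio - PHI) / PHI


print(f"{PHI:.4f}")  # 1.6180
```

A square frame, for instance, deviates from φ by (φ - 1)/φ ≈ 0.38.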
For UIs, there are also metrics and formulas that have been defined for quantifying aesthetics. This is the case for the balance formula and the 12 other metrics defined by Ngo.
For the sake of presentation, I chose not to show the complete algorithm for computing the balance of an interface here. Basically, it calculates the weight of the objects in the four quadrants, derives a value for the vertical balance and one for the horizontal balance, and then obtains the balance metric from these two values.
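As an illustration only, here is a minimal Python sketch of a balance metric in the style commonly attributed to Ngo et al. It assumes that an object’s weight is its area times the distance from its center to the relevant central axis of the frame, that objects are split into left/right and top/bottom halves by their centers, and that the two components are combined as 1 - (|BM_vertical| + |BM_horizontal|)/2. The `Region` class and the function names are hypothetical, and the exact variant used in the paper or in QUESTIM may differ.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical rectangular region: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    @property
    def center(self) -> tuple:
        return (self.x + self.w / 2, self.y + self.h / 2)

    @property
    def area(self) -> float:
        return self.w * self.h


def _normalized_difference(a: float, b: float) -> float:
    """(a - b) / max(|a|, |b|), taken as 0 when both weights are 0."""
    largest = max(abs(a), abs(b))
    return 0.0 if largest == 0 else (a - b) / largest


def balance(regions: list, frame_w: float, frame_h: float) -> float:
    """Balance in [0, 1]; 1 means the weights cancel out on both axes."""
    cx, cy = frame_w / 2, frame_h / 2
    w_left = w_right = w_top = w_bottom = 0.0
    for r in regions:
        ox, oy = r.center
        # Weight = area x distance from the object's center to the central axis;
        # an object centered exactly on an axis has zero weight for that axis.
        lr_weight = r.area * abs(ox - cx)
        tb_weight = r.area * abs(oy - cy)
        if ox < cx:
            w_left += lr_weight
        else:
            w_right += lr_weight
        if oy < cy:
            w_top += tb_weight
        else:
            w_bottom += tb_weight
    bm_vertical = _normalized_difference(w_left, w_right)
    bm_horizontal = _normalized_difference(w_top, w_bottom)
    return 1 - (abs(bm_vertical) + abs(bm_horizontal)) / 2
```

For example, `balance([Region(10, 10, 20, 20), Region(70, 70, 20, 20)], 100, 100)` returns 1.0, while a single object in the top-left corner yields 0.0.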
I also show another existing formula, which computes balance from the UI’s center of mass. This illustrates that, as mentioned earlier, a visual technique can be represented by several aesthetic metrics.
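The center-of-mass formulation can be sketched in the same spirit. The normalization here is an assumption chosen for illustration, not the exact formula on the slide: the distance between the area-weighted centroid of the objects and the frame center is divided by the half-diagonal and subtracted from 1. Regions are plain `(x, y, width, height)` tuples assumed to lie inside the frame, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height), top-left origin


def center_of_mass_balance(regions: List[Rect], frame_w: float, frame_h: float) -> float:
    """1 minus the offset of the area-weighted centroid from the frame center,
    normalized by the half-diagonal (assumed normalization).
    1.0 means the centroid sits exactly at the center of the frame."""
    total_area = sum(w * h for _, _, w, h in regions)
    if total_area <= 0:
        raise ValueError("regions must have a positive total area")
    mx = sum(w * h * (x + w / 2) for x, _, w, h in regions) / total_area
    my = sum(w * h * (y + h / 2) for _, y, w, h in regions) / total_area
    offset = math.hypot(mx - frame_w / 2, my - frame_h / 2)
    half_diagonal = math.hypot(frame_w / 2, frame_h / 2)
    return 1 - offset / half_diagonal
```

Unlike the quadrant-weight sketch, this version only looks at where the overall mass ends up, so two layouts with very different distributions can receive the same score; this is one reason different balance formulas can disagree.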
Considering the state of the art, we perceive some shortcomings that need to be addressed.
First, metrics exist, but none has been validated to a significant extent.
Second, there are sometimes many different formulas and interpretations for the same technique.
It would therefore be worthwhile to further validate these metrics and to find the formula that best represents what humans perceive.
Regarding this perception, we identified two main issues with metrics that need to be tackled:
The representativeness of the metric, namely: how well does the metric represent what humans perceive?
The relevance of the metric, namely: does an acceptable value of the metric enhance the overall aesthetics of the UI?
Here we define the set of metrics we currently consider in our work. The list is not exhaustive: others could be found in the literature, and new metrics can also be defined.
This is the case for concentricity and alignment; alignment, for instance, had been defined as a visual technique but never linked to a formula of its own.
The concentricity metric is one of the new metrics we propose. It measures whether UI objects are gathered in the center or rather kept in the corners.
The formula is the ratio between the average distance from the objects’ centers to the UI center (d̄) and the distance from a corner to the center (d_diag), that is, concentricity = d̄ / d_diag.
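A minimal Python sketch of the concentricity ratio as described above. It assumes rectangular regions given as `(x, y, width, height)` tuples with a top-left origin and an unweighted average over objects (object areas are ignored, as the description suggests); the function name is hypothetical.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height), top-left origin


def concentricity(regions: List[Rect], frame_w: float, frame_h: float) -> float:
    """d_bar / d_diag: mean distance from object centers to the UI center,
    divided by the distance from a corner to the UI center."""
    if not regions:
        raise ValueError("at least one region is required")
    cx, cy = frame_w / 2, frame_h / 2
    d_bar = sum(math.hypot(x + w / 2 - cx, y + h / 2 - cy)
                for x, y, w, h in regions) / len(regions)
    d_diag = math.hypot(cx, cy)  # corner-to-center distance
    return d_bar / d_diag
```

Values near 0 indicate objects gathered around the center; values near 1 indicate objects pushed toward the corners. A single 20×20 object in the top-left corner of a 100×100 frame, for example, gives 0.8.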
In a second step, we propose INFRARED, a method for evaluating GUI aesthetics in four steps:
- First, the user selects a UI.
- Then, he draws regions of interest on top of this UI.
- Metrics are automatically computed and presented in a metrics report.
- The user can then redesign the UI according to the acceptable values.
The last three steps are iterative: the user can repeat them until the metrics reach an acceptable level.
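The iterative part of INFRARED (compute metrics, report, redesign) can be sketched as a simple loop. This is a hypothetical illustration: `compute_metrics` and `redesign` stand in for QUESTIM’s metric computation and for the designer’s manual changes (including redrawing regions), and the acceptable ranges are inputs the caller must supply, since the method does not fix them here.

```python
from typing import Any, Callable, Dict, Tuple

Metrics = Dict[str, float]
Ranges = Dict[str, Tuple[float, float]]


def out_of_range(metrics: Metrics, acceptable: Ranges) -> Metrics:
    """Metrics whose value lies outside its acceptable [low, high] range."""
    return {name: value for name, value in metrics.items()
            if name in acceptable
            and not acceptable[name][0] <= value <= acceptable[name][1]}


def infrared_iterate(layout: Any,
                     compute_metrics: Callable[[Any], Metrics],
                     redesign: Callable[[Any, Metrics], Any],
                     acceptable: Ranges,
                     max_rounds: int = 10) -> Tuple[Any, Metrics, int]:
    """Repeat the report/redesign cycle until every metric is acceptable
    or max_rounds redesigns have been made.
    Returns the final layout, its metrics and the number of redesigns."""
    metrics = compute_metrics(layout)              # step 3: metrics report
    failing = out_of_range(metrics, acceptable)
    rounds = 0
    while failing and rounds < max_rounds:
        # Step 4: redesign; the returned layout carries redrawn regions (step 2)
        layout = redesign(layout, failing)
        metrics = compute_metrics(layout)
        failing = out_of_range(metrics, acceptable)
        rounds += 1
    return layout, metrics, rounds
```

The `max_rounds` cap reflects that, in practice, the designer decides when to stop; the loop only makes the stopping condition explicit.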
Here is the proposed model on which the method is based. Its main elements are:
the GUI, which consists of a set of regions. These regions serve as the basis for computing the metrics, which are presented in the form of a report.
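The model can be sketched as a few Python data classes: a `GUI` holding a list of `Region`s, metric functions that map a GUI to a value, and a `MetricsReport` that flags metrics outside their acceptable range. All names are hypothetical. The `density` function follows the form commonly cited from Ngo et al. (optimal when the regions cover half of the frame) and assumes non-overlapping regions; treat it as an assumption rather than QUESTIM’s implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    """A rectangular region of interest drawn on the GUI (top-left x, y; width w; height h)."""
    x: float
    y: float
    w: float
    h: float


@dataclass
class GUI:
    """A GUI frame together with the regions drawn on top of it."""
    width: float
    height: float
    regions: List[Region] = field(default_factory=list)


@dataclass
class MetricsReport:
    """Metric values plus the acceptable [low, high] range for each metric."""
    values: Dict[str, float]
    acceptable: Dict[str, Tuple[float, float]]

    def failing(self) -> List[str]:
        """Names of the metrics whose value lies outside its acceptable range."""
        return [name for name, v in self.values.items()
                if name in self.acceptable
                and not self.acceptable[name][0] <= v <= self.acceptable[name][1]]


def density(gui: GUI) -> float:
    """1 - 2 * |0.5 - covered area / frame area|; assumes non-overlapping regions."""
    covered = sum(r.w * r.h for r in gui.regions)
    return 1 - 2 * abs(0.5 - covered / (gui.width * gui.height))


def build_report(gui: GUI,
                 metrics: Dict[str, Callable[[GUI], float]],
                 acceptable: Dict[str, Tuple[float, float]]) -> MetricsReport:
    """Compute every metric on the GUI's regions and wrap the values in a report."""
    return MetricsReport({name: f(gui) for name, f in metrics.items()}, acceptable)
```

Keeping metrics as plain functions of the GUI makes it easy to add new ones, such as concentricity, without changing the report.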
We implemented the method in a tool called QUESTIM for Quality Estimator Tool based on Metrics.
Here is a one-minute video showing how the tool can be used to evaluate UI aesthetics.
The last part of our presentation concerns validation. We previously identified two main issues:
The first was REPRESENTATIVENESS: does the formula for balance represent human perception of balance?
And the second was RELEVANCE: does the balance property of a UI enhance the perception of its aesthetics?
We conducted a first pilot study whose goal was to compare users’ perceptions with the values given by the metrics. It is therefore an attempt to tackle the representativeness issue.
The experiment involved 25 subjects, undergraduate and postgraduate students. We asked them to score 4 UIs on 12 visual techniques using a 5-point Likert scale.
Here are the 4 selected interfaces: the portal of the Belgian government, the main menu of the game GTR2, the interface of a bank ATM, and the former homepage of the UCL website.
We also defined a grid of regions on each UI in order to compute the metrics, but the subjects were shown the original UIs.
We defined two hypotheses:
The first assumes a perfect, or exact, similarity between the metric scores and users’ reviews.
The second assumes a relative similarity between the metric scores and users’ reviews.
For hypothesis 1, we conducted a one-sample Wilcoxon signed-rank test and found that only the median balance score for the Belgium UI significantly matched the value of the metric. As expected, it was therefore not possible to find an exact match between users’ scores and the metrics.
We then proceeded with a matched-pairs Wilcoxon signed-rank test. We test whether the difference between the balance scores of two UIs is equal to 0, which would mean that respondents consider the two UIs equally balanced.
For example, we test whether the balance of the ATM UI equals the balance of the BELGIUM UI and find that it does not. The statistic is even negative, meaning that subjects found BELGIUM more balanced than ATM.
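Both tests reduce to the same Wilcoxon signed-rank procedure applied to differences. For H1, the differences are each subject’s score minus the metric value. This requires first mapping the metric onto the Likert scale, for example 1 + 4m for a metric m in [0, 1], which is our assumption. For H2, the differences are each subject’s score for one UI minus their score for another. The stdlib-only sketch below uses the large-sample normal approximation with tie correction and drops zero differences. With 25 subjects an exact test, or `scipy.stats.wilcoxon`, may be preferable, and we do not know which variant the authors used. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def wilcoxon_signed_rank(differences: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Wilcoxon signed-rank test, normal approximation with tie correction.

    Zero differences are discarded. Returns (z, p_value, w_plus), where w_plus is
    the sum of the ranks of the positive differences.
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    if n == 0:
        return 0.0, 1.0, 0.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of equal |d| values and give each the average (1-based) rank
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if variance <= 0:
        return 0.0, 1.0, w_plus
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, w_plus


def one_sample_test(scores: Sequence[float], metric_on_likert: float):
    """H1: do the users' scores differ from the (rescaled) metric value?"""
    return wilcoxon_signed_rank([s - metric_on_likert for s in scores])


def matched_pairs_test(scores_a: Sequence[float], scores_b: Sequence[float]):
    """H2: per-subject differences between two UIs' scores (a minus b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("each subject needs a score for both UIs")
    return wilcoxon_signed_rank([a - b for a, b in zip(scores_a, scores_b)])
```

With the differences taken as ATM minus BELGIUM, a negative z means subjects scored ATM lower, which matches the reading of the statistic in the ATM/BELGIUM example.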
We also conducted a Friedman two-way ANOVA by ranks and obtained this chart, which summarizes the results. We can see, for example, that users rated the balance of the ATM low, while they rated the balance of the UCL website high.
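A stdlib-only sketch of the Friedman test, assuming a table `scores[s][j]` holding subject s’s Likert score for UI j. Scores are ranked within each subject (average ranks for ties), the statistic is corrected for ties, and the p-value comes from the chi-square distribution with k - 1 degrees of freedom. The mean ranks it returns are one plausible basis for a chart like the one described; the function names are hypothetical, and `scipy.stats.friedmanchisquare` offers a library alternative.

```python
import math
from typing import List, Sequence, Tuple


def _ranks_within(row: Sequence[float]) -> Tuple[List[float], int]:
    """Average ranks (1 = lowest score) inside one subject's row, plus the sum of t^3 - t over ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    tie_sum = 0
    i = 0
    while i < len(row):
        j = i
        while j + 1 < len(row) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    return ranks, tie_sum


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-square variable with integer df >= 1, via the
    recurrence SF(df + 2) = SF(df) + (x/2)^(df/2) e^(-x/2) / Gamma(df/2 + 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        sf, k = math.exp(-x / 2), 2
    else:
        sf, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        sf += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return sf


def friedman(scores: Sequence[Sequence[float]]) -> Tuple[float, float, List[float]]:
    """scores[s][j] = Likert score of subject s for UI j.
    Returns (tie-corrected Q statistic, p-value, mean rank of each UI)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    tie_total = 0
    for row in scores:
        ranks, ties = _ranks_within(row)
        tie_total += ties
        for j, r in enumerate(ranks):
            rank_sums[j] += r
    mean_ranks = [r / n for r in rank_sums]
    correction = 1 - tie_total / (n * k * (k * k - 1))
    if correction == 0:  # every subject gave the same score to all UIs
        return 0.0, 1.0, mean_ranks
    q = (12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)) / correction
    return q, chi2_sf(q, k - 1), mean_ranks
```

Tie correction matters here because, on a 5-point scale, a subject will often give the same score to several UIs.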
With these results, we were able to rank the UIs for each specific technique (here, balance) and compare this with the ranking given by the metrics.
We see here that UCL is the most balanced UI according to both people and the metric, and that ATM is the least balanced according to both. We cannot compare GTR2 and BELGIUM, as no significant result allows us to say that users found one more balanced than the other.
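The comparison of rankings can be expressed as a check over pairs. Only pairs of UIs that users distinguished significantly (for instance, through the matched-pairs tests) are compared with the metric’s ordering, and undistinguished pairs such as GTR2 and BELGIUM are simply left out. This is a hypothetical sketch, not the authors’ procedure; ties in the metric values are counted as disagreements.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def compare_with_metric(metric_values: Dict[str, float],
                        user_pairs: Iterable[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Split significant user judgments into those the metric agrees with and those it does not.

    Each pair is (rated_higher, rated_lower) by users. Pairs that users did not
    distinguish significantly should not be passed in. A tie in the metric counts
    as a disagreement.
    """
    agree, disagree = [], []
    for higher, lower in user_pairs:
        if metric_values[higher] > metric_values[lower]:
            agree.append((higher, lower))
        else:
            disagree.append((higher, lower))
    return agree, disagree
```

Working with pairs rather than a full ranking avoids forcing an order on UIs that users did not actually distinguish.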
We found some main points to discuss.
- First, the pilot study gives some interesting results, showing that 4 metrics were relatively representative of users’ perceptions: balance, equilibrium, density and economy.
- A pilot study is, however, not sufficient for a significant validation. The study should be repeated at a larger scale, with more than 4 UIs and more subjects.
- We also question the subjective nature of the UI grid defined earlier. Objective rules are needed for defining regions on top of a UI.
We reach the end of the presentation.
Our conclusions are the following:
Qualitative evaluation is often a long and subjective process that needs many reviews to be relevant. It is also often restricted to evaluating the final UI before it is released.
We therefore propose a metrics-based quantitative evaluation method and model, implemented in a tool that can provide designers with quick and objective feedback and could be adapted to each stage of UI design.
We also tackled the research question and found two main issues to be solved regarding metrics evaluation: relevance and representativeness.
As future work, we mainly note:
- The need to define rules for detecting UI objects before computing metrics. Another interesting improvement to the tool would be an automatic detection feature for regions of interest.
There is also a need for further empirical validation.
First, to tackle metrics relevance, we could for example investigate the methods and principles professional designers use to improve GUI aesthetics.
Then, for the representativeness issue, we could reproduce the pilot study at a larger scale to validate the metrics more strongly.
Thank you for attending my presentation. I’ll be glad to answer your questions and to discuss the orientation of my research with you.