Position Paper For CHI 2009 Workshop: Mobile User Experience Research: Challenges, Methods & Tools

Sven Körber
SirValUse Consulting GmbH
Schlossstraße 8g, 22041 Hamburg, Germany
[email protected]
+49 (0) 40 68 28 27-37

ABSTRACT
The author shares techniques for evaluating mobile user interfaces efficiently in a commercial user experience agency environment. He endorses the System Usability Scale (SUS) and product reaction cards, and stresses the importance of describing quantitative metrics obtained from small-sample tests with confidence intervals to guard against misinterpretation of study data. This position paper also contains a short discussion of LEOtrace mobile, an innovative means of collecting usage data on smartphones transparently in the field.
AUTHOR KEYWORDS
SUS, product reaction cards, confidence intervals, LEOtrace mobile

ACM Classification Keywords
H.5.2 [User Interfaces]: Evaluation/Methodology
INTRODUCTION
I currently hold the position of Director Customer Experience at SirValUse Consulting GmbH [8], a customer experience agency located in Hamburg, Munich and Beijing. SirValUse, Europe's largest independent user experience consulting agency, has been analyzing and optimizing electronic interfaces in the dimensions of usability, utility, design and emotional relationship to the brand since its foundation in May 2000. I started as a consultant in 2002 and am currently leading a team of eight consultants in Hamburg, managing SirValUse's global accounts from the mobile handset and operator businesses. The needs and wants of the internal stakeholders I cater to on the client side vary depending on the maturity level of their organizations: from those accustomed to routinely subcontracting user experience work, with a clear focus and an established customer orientation, to those who draw on our expertise to clarify objectives, find matching methods and introduce more user-centered design processes.

My environment is determined by a high frequency of individual projects that are commissioned on an ad-hoc basis. Many evaluation projects have to be concluded in about four to six weeks, slightly longer when there is an international angle.

Many of our clients decide to evaluate their products in multiple countries at the same time, mostly in Europe, but more and more in China and India, too. We cover international testing by leveraging a network of agencies we co-founded in 2005, the UXAlliance [13].

METHODS THAT HAVE WORKED WELL

Generally, we have been successful in addressing our clients' research questions by drawing on composite study designs [4] built from standard methods such as the following:
• heuristic evaluations and cognitive walkthroughs
• quantitative usability tests in the lab to benchmark handsets against one another and to observe usage
• interviews in the lab based on semi-structured interview guides to probe deeply into participants' motivations and attitudes
• field trials combined with online diaries to understand the context of use and capture feedback with a high degree of ecological validity
I would like to focus on three specific points that have helped us in addressing the needs of our clients from the mobile industry.
Product reaction cards
To get good overall qualitative feedback on a device or service, we recommend this method from Microsoft's desirability toolkit [1]. It is a simple two-step card sorting exercise using a stack of 115 positive and negative terms and is geared towards eliciting the participant's subjective evaluation of a device, website or mobile service. It takes about ten minutes to complete and results in five self-selected core terms that lend themselves to being explored in a short interview segment. In my experience, this method provides a welcome change of pace after a usability test and yields a greater diversity of feedback than most other interviewing techniques, especially with less eloquent participants. I find it easy to prepare and to report on.

Reporting confidence intervals to guard against over-interpretation of data from small-sample studies
Since the cost of user experience testing scales with sample size, some of our clients demand setups with a low number of participants, for example n=8 per country or even fewer. This would not be much of a problem if the focus were simply to identify the most important usability problems; according to the calculator available at [7], one can expect to uncover about 95% of all problems in an interface with a typical problem occurrence rate of 0.3 and n=8 participants. However, we are often asked to also report descriptive statistics of task completion rates, time on task and scaled satisfaction based on the results of such small-sample studies, in a context of benchmarking multiple services/devices against one another. One should bear in mind that such statistics will most likely be quoted without most of their context inside the organizations we report to, and used by stakeholders looking for highly simplified statements. It is our responsibility not to simplify results beyond the point of uselessness, and to quantify our uncertainty with precision. In such cases, reporting just averages is not enough to describe the nature of the findings adequately. I would thus like to emphasize the importance of reporting confidence intervals for all statistical measures, as suggested in Tullis's excellent UPA 2008 presentation [11]. This might be glaringly obvious to everyone inside the scientific community, but I have seen many bad reports lacking such precision.
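As a minimal illustration of this practice (not the specific procedure from [11]), the following Python sketch computes an adjusted-Wald confidence interval for a task completion rate and a t-based interval for a mean satisfaction score. The sample size of eight and all metric values are invented, and SciPy is assumed to be available.

```python
import math
from scipy import stats

def adjusted_wald_ci(successes, n, confidence=0.95):
    """Adjusted-Wald (Agresti-Coull style) interval for a completion rate,
    which behaves better than the plain Wald interval at small n."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

def t_ci(values, confidence=0.95):
    """t-based interval for the mean of a small sample (e.g. SUS or time on task)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    half_width = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1) * sd / math.sqrt(n)
    return mean, mean - half_width, mean + half_width

# Hypothetical data from one n=8 benchmarking cell (purely illustrative):
print(adjusted_wald_ci(successes=6, n=8))                       # completion rate 6/8
print(t_ci([72.5, 80.0, 65.0, 90.0, 77.5, 60.0, 85.0, 70.0]))   # satisfaction scores
```

With n=8 the resulting intervals are wide, which is exactly the uncertainty that should accompany any quoted average.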
SUS questionnaire
To capture overall satisfaction with the usability of a mobile device, there are a number of questionnaires one could draw upon, and there is even an integrated mobile phone questionnaire, the MPUQ [6]. While the MPUQ is very comprehensive, we needed something a little simpler that also works with smaller samples. Hence, we selected DEC's System Usability Scale (SUS) [3] and have since used it for dozens of mobile phone evaluations. It is a ten-item instrument resulting in a single combined score ranging from 0 to 100, with 100 being the perfect score. The literature suggests that it is applicable to studies with small samples [9]. In our own studies, the questionnaire's results have always been in line with findings obtained via other methods, e.g. success rates, the number and severity of issues uncovered, and the verbal reactions of participants, so I regard it as a useful tool for measuring subjective satisfaction.

Since there is little data about how SUS ratings are distributed for walk-up-and-use, quantified tests of mobile phones, I would like to share the following chart. The scores were obtained in a number of different benchmarking studies, with different, simple tasks, participants, countries and handsets, and in different language versions of the SUS. For confidentiality reasons, I cannot share any specifics about the handsets under scrutiny. However heterogeneous this aggregated view might be, I think it can still help the workshop participants ascertain that any score above 75 on the SUS scale is probably a good effort on the part of the device manufacturer. In a similar comparison of SUS scores from the web domain [12], scores in the 71–80 range were most frequent; I attribute this discrepancy to the fact that web user interaction styles are much more coherent than in the mobile space, and the tasks might thus have been easier for participants to complete there.
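For completeness, the standard SUS scoring rule from [3] can be written down in a few lines: odd-numbered items contribute their rating minus 1, even-numbered items contribute 5 minus their rating, and the sum is multiplied by 2.5 to arrive at the 0–100 score. The sample responses below are invented.

```python
def sus_score(responses):
    """Compute the 0-100 SUS score from ten Likert ratings (1-5),
    following Brooke's original scoring rule [3]."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten ratings between 1 and 5")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,... vs. items 2,4,6,...
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# One participant's (invented) ratings for items 1-10:
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```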
DISCONTINUED METHODS
What has not worked so well for us so far in the context of benchmarking mobile handsets is using time on task in walk-up-and-use tests. Only rarely could we design in the time for participants to complete each task twice so that we could compare the two runs and look at learnability (as suggested in the very helpful article by Berkun [2]). It would be interesting to hear which efficiency metrics other practitioners use successfully to evaluate mobile device usability.

ISSUES AND UNSOLVED PROBLEMS
Most of the issues I personally face are not so much related to the methods we employ, but more to getting buy-in for enforcing methodological consistency in multi-phased study designs. It appears to be hard for many organizations to agree on clear objectives and then stick to a certain study design for several waves, e.g. when tracking performance across product versions. I have also yet to meet a mobile client who has clearly stated performance goals for their product/service.
It also appears hard to get buy-in for the idea that attempting to benchmark tasks on a mobile phone leads to certain trade-offs in the design of the tasks themselves, e.g. that they need to have a clear start and end point and that linking random events together does not make for a useful overall task. Much too often, the resulting compromises are detrimental to the usefulness of the quantitative results.
I would welcome any discussions about strategies to address these two points in the workshop.
NEW TOOLS AND METHODS
I would like to highlight LEOtrace mobile – a research system to log and analyze user activity on smartphones running Windows Mobile 5&6, Symbian S60 v2&3 or RIM OS (4.2.0 and higher).
This application is currently under development in cooperation with our sister company Nurago and will allow us to deploy an over-the-air meter that automatically reports usage statistics back to a central log file repository on a daily basis. This will provide continuous, naturalistic usage data and will help uncover hidden usage patterns.
In addition to the logging component that runs in the background, invisible to the user, LEOtrace mobile will also contain a module that allows us to automatically trigger short questionnaires based on events occurring on the smartphone. Such events include starting/exiting an application, sending/receiving messages, visiting a specific URL, and many more.
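LEOtrace mobile is proprietary and still under development, so the sketch below is not its implementation; it only illustrates the general idea of event-triggered questionnaires, with invented event names, an invented cool-down rule and invented questions.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TriggerRule:
    event_name: str          # e.g. "app_exit", "message_sent" (hypothetical names)
    questions: list[str]
    cooldown_s: int = 3600   # don't re-ask the same questions within an hour
    last_fired: float = field(default=0.0)

class EventTriggeredSurvey:
    """Toy model of an event log that can fire short in-context questionnaires."""

    def __init__(self, rules):
        self.rules = rules
        self.log = []  # (timestamp, event_name, details)

    def record(self, event_name, **details):
        now = time.time()
        self.log.append((now, event_name, details))
        for rule in self.rules:
            if rule.event_name == event_name and now - rule.last_fired >= rule.cooldown_s:
                rule.last_fired = now
                self.prompt(rule.questions)

    def prompt(self, questions):
        # A real meter would render a native dialog on the handset;
        # here we simply print the questions.
        for q in questions:
            print("SURVEY:", q)

# Example: ask two quick questions whenever the (hypothetical) browser is closed.
meter = EventTriggeredSurvey([
    TriggerRule("app_exit", ["Which task did you just try to accomplish?",
                             "How satisfied are you with the result? (1-5)"])
])
meter.record("app_exit", app="browser")
```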
SPECIAL INTERESTS FOR THE WORKSHOP
I would like to discuss with the other participants of the workshop how to make the most of activity logs obtained in the real world and how to distill actionable information from these.
It would be especially interesting for me if we could discuss how to mine behavioral t-patterns from a stream of time-stamped sequential events [5].
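To make this concrete, here is a deliberately simplified sketch in the spirit of t-pattern detection [5], not the full Theme algorithm: it counts how often one event type follows another within a fixed critical interval and flags unusually frequent ordered pairs as candidate patterns. The event stream and the interval length are invented.

```python
from collections import Counter
from itertools import combinations

def candidate_t_patterns(events, max_gap_s=30.0, min_count=3):
    """Very rough first pass at t-pattern mining: count ordered pairs (A, B)
    where B occurs within max_gap_s seconds after A, and keep frequent pairs.
    The full Theme algorithm additionally tests the interval statistically and
    builds patterns hierarchically; this sketch only finds candidate pairs."""
    pair_counts = Counter()
    for (t1, a), (t2, b) in combinations(events, 2):  # events sorted by time
        if 0 < t2 - t1 <= max_gap_s:
            pair_counts[(a, b)] += 1
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_count]

# Invented, time-stamped usage events (seconds since session start, event type):
stream = sorted([
    (1.0, "unlock"), (3.0, "open_messaging"), (9.0, "send_sms"),
    (62.0, "unlock"), (65.0, "open_messaging"), (70.0, "send_sms"),
    (120.0, "unlock"), (124.0, "open_messaging"), (131.0, "send_sms"),
])
print(candidate_t_patterns(stream, max_gap_s=10.0))
```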
REFERENCES
1. Benedek, J., Miner, T. (2002): Measuring Desirability: New methods for evaluating desirability in a usability lab setting. In Proceedings of UPA 2002, Orlando, July 8-12. Online: http://www.microsoft.com/usability/UEPostings/DesirabilityToolkit.doc
2. Berkun, S. (2003): The art of usability benchmarking. Online: http://www.scottberkun.com/essays/27-the-art-of-usability-benchmarking/
3. Brooke, J. (1996): SUS: a "quick and dirty" usability scale. In P. W. Jordan, B. Thomas, B. A. Weerdmeester & A. L. McClelland (eds.), Usability Evaluation in Industry. London: Taylor and Francis. Online: http://www.usabilitynet.org/trump/documents/Suschapt.doc
4. Lesemann, E., Woletz, N., Koerber, S. (2007): Combining methods to evaluate mobile usability. In MobileHCI '07: Proceedings of the 9th International Conference on Human Computer Interaction with Mobile Devices and Services, pp. 444-447.
5. Noldus Theme website: http://www.noldus.com/human-behavior-research/products/theme
6. Ryu, Y. S., Smith-Jackson, T. L. (2006): Reliability and Validity of the Mobile Phone Usability Questionnaire (MPUQ). Journal of Usability Studies, Volume 2, Issue 1, November 2006, pp. 39-53. Online: http://www.upassoc.org/upa_publications/jus/2006_november/ryu_smithjackson_mobile_phone_questionnaire.html
7. Sauro, J. (2006): Sample Size Calculator for Discovering Problems in a User Interface. Measuring Usability: Quantitative Usability, Statistics & Six Sigma. Website: http://www.measuringusability.com/problem_discovery.php
8. SirValUse Consulting GmbH website (in English): http://www.sirvaluse.de/index.php?id=1&L=1
9. Tullis, T. S., Stetson, J. N. (2004): A Comparison of Questionnaires for Assessing Website Usability. Usability Professionals' Association Conference. Online: http://home.comcast.net/~tomtullis/publications/UPA2004TullisStetson.pdf
10. Tullis, T. S., Albert, B. (2008): Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics. Website: http://www.measuringuserexperience.com/index.htm
11. Tullis, T. S., Albert, B. (2008): Tips and Tricks for Measuring the User Experience (PDF). Presentation at the UPA Boston Usability and User Experience 2008 Conference. Online: http://www.measuringuserexperience.com/Tips&Tricks-Boston-UPA-2008.pdf
12. http://www.measuringuserexperience.com/SUSscores.xls
13. User Experience Alliance website: http://www.uxalliance.com/