USER INTERFACE DESIGN UPDATE - SEPTEMBER, 2001

Insights from Human Factors International, Inc. (HFI), providing consulting and training in software ergonomics. (http://www.humanfactors.com/)

Every month HFI reviews the most useful developments in UI research from major conferences and publications.

In this issue:

Dr. Bob Bailey asks -- How reliable is usability performance testing?

The Ergonomic Pragmatist, Dr. Eric Schaffer, gives practical advice.

Bob Bailey, Ph.D., Chief Scientist for HFI

Rolf Molich of DialogDesign in Denmark published two articles (Molich et al., 1998; Molich et al., 1999) over the past three years that helped us better understand the limitations of even our best usability testing method -- performance testing.

He and his colleagues did a comparative evaluation of usability tests by having four commercial usability labs carry out tests on the same commercially available calendar program. The purpose of the comparative evaluation was to observe the different ways in which independent laboratories conducted usability tests. The testers independently performed usability tests that each involved about five typical users, and then prepared a test report. Their results showed that some labs found few usability problems (4), while others found many (98).

Only one problem was found by all four teams, and over 90% of the problems found by each team were found only by that team.

Molich and his colleagues conducted a follow-up to the first test to determine if the results were unique or could be replicated. In the second study, seven different professional usability labs and two university student teams independently carried out usability tests of a well-known Web site -- hotmail.com. They each prepared and submitted their standard test report. Again, their results showed that some labs found few problems (10), while others found many (150).

The results from the first study were, indeed, replicated. Again, there seemed to be little consistency across testing organizations. Over half (55%) of the problems found by each team were found only by that team.

More recently, Martin Kessner (Kessner, 2000; Kessner et al., 2001) from Carleton University in Ottawa had six usability testing teams conduct usability tests on a prototype of a system.

He attempted to improve the agreement of the testing teams by (a) testing a prototype that had not yet been used by actual users, (b) limiting the issues to be evaluated to five questions specified by designers, (c) focusing exclusively on usability issues (excluding all marketing and other issues), (d) having two evaluators group similar observations into categories of problems that were essentially the same, and (e) using only professional usability teams (no student teams).

From the original total of 117 potential "usability problems" reported by all the testing teams, the evaluators excluded 31 as non-usability problems. They then combined similar problems and ended up with a final number of 36 unique usability problems. Consistent with the first two studies, none of the problems was found by every team, and a large proportion of the problems (44%) were found by one team only.
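To make the arithmetic concrete: excluding 31 of the 117 reports leaves 86, which the evaluators merged into 36 unique problems, and 44% of 36 is roughly 16 problems reported by a single team. The sketch below shows one way such overlap figures could be computed once similar reports have been grouped. It is not the procedure used in the studies; the function name, team names, and problem identifiers are all hypothetical.

```python
from collections import Counter


def overlap_summary(findings):
    """Summarize agreement among testing teams.

    `findings` maps a team name to the set of problem identifiers that
    team reported, after similar reports have been merged into one
    identifier. All names and data here are illustrative assumptions.
    """
    # Count how many teams reported each unique problem.
    counts = Counter(p for problems in findings.values() for p in problems)
    total = len(counts)
    n_teams = len(findings)
    single = sum(1 for c in counts.values() if c == 1)
    by_all = sum(1 for c in counts.values() if c == n_teams)
    return {
        "unique_problems": total,
        "found_by_one_team": single,
        "found_by_all_teams": by_all,
        "share_found_by_one_team": single / total if total else 0.0,
    }


# Hypothetical data, not taken from any of the studies discussed here.
example = {
    "team_a": {"P1", "P2", "P3"},
    "team_b": {"P2", "P4"},
    "team_c": {"P2", "P3", "P5"},
}
summary = overlap_summary(example)
print(summary)
```

A high share of problems found by one team only, combined with few or no problems found by all teams, is the pattern all three studies reported.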

When considering the five specific questions that designers wanted answered, there was moderate agreement among the teams on two questions, and low agreement on the other three.

Taken together, the findings of these three studies show that there is considerable need for improvement in the usability testing process. Contrary to what some would like us to believe, effective usability testing is extremely difficult to do well. As a discipline, we need fewer "discount" methods, and more research-based, truly valid methods for finding true usability problems.

These findings show that even experienced usability professionals have difficulty in identifying usability problems. Should designers trust all observations made by usability professionals? With this much variability in performance testing results, should Web site designers trust any observations made by usability professionals?

Usability professionals do not let clients drop off a prototype Web site with the request to find as many problems as possible; and professional designers do not take seriously the never-ending list of "problems" identified by someone who has a usability lab with fancy video equipment. Any amateur with a conference room and a couple of subjects can use a performance test to find all kinds of so-called "usability problems." Some do not even need the test subjects -- they can find a multitude of "problems" just by staring at a Web site and fiddling with the links.

I agree with Kessner et al. (2001): the one thing that will most likely reduce the large-scale disagreements among usability testers is to have designers specify precisely the usability questions they have. Ideally, these questions will include the maximum allowable time for task completion, and a clear definition of success for each task. The true usability professional can then effectively use a performance test to identify those usability problems that most need finding and fixing.
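As a rough illustration of what "specifying precisely" might look like, the sketch below records a maximum allowable time and a success definition for one task, then scores observed sessions against them. Every name, threshold, and observation is a hypothetical assumption made for illustration; none of it comes from Kessner et al. or from HFI practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCriterion:
    """A designer-specified usability question for one task (hypothetical)."""
    task: str
    max_seconds: float          # maximum allowable time for task completion
    success_definition: str     # what counts as success, written in plain words


def passes(criterion, completed, seconds):
    """A session passes only if the user succeeded within the time limit."""
    return completed and seconds <= criterion.max_seconds


def success_rate(criterion, observations):
    """Fraction of (completed, seconds) observations that pass the criterion."""
    if not observations:
        return 0.0
    passed = sum(1 for completed, seconds in observations
                 if passes(criterion, completed, seconds))
    return passed / len(observations)


# Hypothetical criterion and hypothetical test sessions.
criterion = TaskCriterion(
    task="Add an appointment to the calendar",
    max_seconds=90.0,
    success_definition="Appointment appears on the correct date and time",
)
sessions = [(True, 45.0), (True, 120.0), (False, 60.0), (True, 88.5)]
rate = success_rate(criterion, sessions)
print(f"{criterion.task}: {rate:.0%} of sessions met the criterion")
```

With criteria fixed in advance like this, different testing teams are at least scoring the same tasks against the same yardstick.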

For a complete list of references for this newsletter, go to: http://www.humanfactors.com/library/sep01.asp

The Ergonomic Pragmatist Eric Schaffer, Ph.D., CPE, Founder and CEO of HFI

While this compilation should give every developer or human factors professional a jolt, I am not surprised by these findings. For 25 years I have routinely interviewed people who said they were human factors specialists because "they were humans." Conversely, I have also interviewed hundreds of heavily credentialed human factors professionals who understood the usability technology but had no practical sense for what would be important. When testing, it is essential to test a set of scenarios that fit closely with the business imperative of the site. It is also necessary to test with a reasonable number of subjects (maybe 20, not just 5). It is also essential to select your usability staff or consultants with care.

Usability testing is an essential part of our craft. We must gather data well and use that to establish a good initial design. Then, just as a potter shapes a vessel, we progressively use the results to adjust our design. Sometimes we must rethink the whole structure. Sometimes the adjustment of a single word affects the success rate enormously. Even with the most highly trained, intuitive, and savvy professional, interface design is by nature an iterative process.

Two weeks ago I ran a usability test on a financial planning package. Over 80% of the users stopped at a question about their "income range." They did not understand why they would use it, even though there was an explanation just above that said it was to allow calculation of tax rate. Most said they would not proceed. We switched the question from "Income Range" to "Tax Bracket" (showing the related income ranges). Problem gone. How could we NOT take advantage of usability testing results like that?

3-day Annual User Interface Update Seminar presented by Dr. Robert Bailey. http://www.humanfactors.com/training/annualupdate.asp.

REGISTER for Bob's UI Update Seminar: (only two left in 2001)

Chicago - October 24 - 26 https://www.humanfactors.com/training/registration/AUregister10.asp

Seattle - November 7 - 9 https://www.humanfactors.com/training/registration/AUregister12.asp

Suggestions, comments, questions? HFI editors at mailto:hfi@humanfactors.com.

Want past issues? http://www.humanfactors.com/library/pastissues.asp

Subscribe? - http://www.humanfactors.com/library/subscribe.asp

Do NOT want this newsletter? E-mail mailto:unsubscribe@humanfactors.com with a Subject of: "Unsubscribe Newsletter"

Last update: February 07, 2003