Heuristic Evaluation

A usability evaluation method in which one or more reviewers, preferably experts, compare a software, documentation, or hardware product to a list of design principles (commonly referred to as heuristics) and identify where the product does not follow those principles.

Originators/Popularizers

In discussing heuristic evaluation, many books written for usability practitioners reference work by Nielsen or Nielsen and Molich (Preece, Rogers, and Sharp, 2002; Dumas and Redish, 1999; Mayhew, 1999).

While the Nielsen and Molich heuristics (or one of the modified versions developed by Nielsen with other colleagues) are commonly thought of as THE heuristics, design guidelines had been proposed long before 1990. Cheriton (1976) proposed a list of guidelines for the design of interfaces for time-sharing systems. GOMS, proposed by Card, Moran, and Newell (1983), models the user’s cognitive processes, using operator times that are derived from the results of human performance research. Norman (1983a, 1983b) proposed some basic design rules for interface design. Ben Shneiderman first presented his “Eight Golden Rules of Interface Design” in the 1987 edition of his book Designing the User Interface (Shneiderman, 1987).

Nielsen and Molich themselves place their usability heuristics within a larger context. “Ideally people would conduct such evaluations according to certain rules, such as those listed in typical guidelines documents (for example, Smith and Mosier, 1986)…most people probably perform heuristic evaluation on the basis of their own intuition and common sense” (Nielsen and Molich, 1990). Nielsen and Molich cut “the complexity of the rule base by two orders of magnitude [from a typical guidelines document] by relying on a small set of heuristics such as the nine basic usability principles” (Molich and Nielsen, 1990). This account differs from that of Molich and Dumas, who describe heuristic evaluation as a subset of expert reviews (Molich and Dumas, 2005). In any event, while there may not be a clear linear relationship between heuristic evaluation, guidelines documents, and expert reviews, they are certainly related.

Authoritative References

Nielsen, J. (1989). Usability engineering at a discount. In G. Salvendy & M.J. Smith (Eds.), Designing and using human-computer interfaces and knowledge based systems (pp. 394-401). Amsterdam, The Netherlands: Elsevier Science Publishers, B.V.

Molich, R. & Nielsen, J. (1990). Improving a human-computer dialogue. Communications of the ACM, 33(3), 338-348.

Nielsen, J. & Molich, R. (1990). Heuristic evaluation of user interfaces. Proceedings of the SIGCHI conference on human factors in computing systems: Empowering people, Seattle, WA, USA, April 1990, 249-256.

Nielsen, J. (1992). Finding usability problems through heuristic evaluation. Proceedings of the SIGCHI conference on human factors in computing systems, Monterey, CA, USA, 1992, 373-380.

Published Studies

Cockton, G., Lavery, D., & Woolrych, A. (2003). Inspection-based methods. In J.A. Jacko & A. Sears (Eds.), The human-computer interaction handbook (pp. 1118-1138). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.

Molich, R. & Dumas, J.S. (2005). A comparison of usability testing and expert reviews. In preparation.

Related Subjects

  • Scenarios and Fast Iteration: A discount usability method used to create prototypes that reduce the level of functionality and the number of features: the prototype works only when the user follows a previously planned script (Nielsen, 1989).
  • Simplified Thinking Aloud: A discount usability method used to conduct small scale usability tests. Nielsen recommends three participants. According to his research, three participants will turn up about half of the usability problems (Nielsen, 1989). Another difference between simplified thinking aloud studies and traditional usability studies is that data analysis is based solely on the facilitator’s notes (Nielsen, 1989). Because there is no need for recording equipment (and, therefore, no time spent reviewing session recordings) the simplified approach is less expensive than traditional studies (Nielsen, 1989).
  • Participatory Heuristic Evaluation: This method extends the heuristic inspection by adding several new heuristics and actual users to the evaluation team (Muller, Matheson, Page, & Gallup, 1995).
  • Cognitive Walkthrough: Similar to heuristic evaluation in that usability is evaluated without feedback from representative users. The cognitive walkthrough was proposed by Lewis, Polson, Wharton, and Rieman (1990) as a response to the difficulty encountered by usability practitioners in applying cognitive models to user interface design. The method is based on the model of exploratory learning developed by Polson and Lewis, which assumes that the user makes selections based on the expectation that the action will help the user achieve his/her current goal (Lewis et al., 1990). The person responsible for creating one aspect of a system presents the proposed design to a group of peers (for example, a software engineer presenting to other software engineers), who evaluate the design by following one or more specific user tasks (Wharton, Rieman, Lewis, and Polson, 1994). The designer uses a list of questions that replicates the user’s thought process as the user tries to achieve a goal (Lewis et al., 1990). The presentation includes the interface’s design specifications, a task scenario, descriptions of the anticipated users, a description of the conditions under which the software will be used, and task-based procedures (Wharton et al., 1994). There are several versions of the cognitive walkthrough method, with successive versions focused on making the method more usable for practitioners who are not well-versed in cognitive science.
  • Summative Usability Testing: An evaluation method that produces direct feedback from representative users. The user’s performance and behavior is observed and recorded in a controlled environment -- the usability laboratory (Preece, Rogers, & Sharp, 2002). The users perform tasks that reflect the way they would use the product in a “real life” situation (Preece, Rogers, & Sharp, 2002). As a rule, usability tests measure performance in terms of success rates, number of errors and time to complete specific tasks (Preece, Rogers, & Sharp, 2002). Some summative tests compare the results with a usability specification that details what is considered acceptable usability for a product. For an overview of usability testing, refer to A Practical Guide to Usability Testing (Dumas & Redish, 1993).
  • Guidelines Review: Similar to heuristic evaluation in that usability is evaluated without feedback from representative users. An evaluation method in which the evaluator checks the system for conformance with a list of standing guidelines (Shneiderman, 1998). [Nielsen and Molich (1990) refer to guideline documents in their explanation of the heuristic evaluation method.] A complete list of guidelines can include hundreds or thousands of items, and may cover the following: (1) words and icons, (2) screen-layout issues, (3) input and output devices, (4) action sequences, (5) training (Shneiderman, 1998). For an example of a guidelines document, refer to http://www.hcibib.org/sam/index.html (Smith and Mosier, 1986).
  • GOMS Analysis: Similar to heuristic evaluation in that usability is evaluated without feedback from representative users. GOMS analysis is based on the prevailing theory in cognitive psychology, which views the human brain as an information processor with limited capacity (Card, Moran, and Newell, 1983). GOMS is one of the few widely known theory-based methods available for evaluating the usability of computer interfaces (John and Kieras, 1996). The components of the GOMS model describe the cognitive processing that takes place as the user completes a task: a set of Goals, a set of Operators the user must execute, a set of Methods (procedures) for accomplishing a goal, and a set of Selection rules for selecting among the available methods (Card, Moran, and Newell, 1983). Time estimates are then made at the Operator level (Card, Moran, & Newell, 1983). Analysis is based on comparison of total operator times aggregated at the Method, Selection rule, and Goal level (John, 1996). For an example of operators and times for the GOMS Keystroke Level Model, refer to Kieras (1993). A minimal worked sketch of a Keystroke-Level Model time estimate follows this list.
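
To make the operator-level time estimates mentioned above concrete, here is a minimal sketch of a Keystroke-Level Model calculation in Python. The operator times are approximate, commonly cited values, and the task breakdown is a hypothetical illustration rather than an example taken from Card, Moran, and Newell (1983) or Kieras (1993).

    # Minimal sketch of a GOMS Keystroke-Level Model (KLM) time estimate.
    # Operator times (seconds) are approximate, commonly cited values;
    # the task breakdown below is a hypothetical illustration.

    KLM_OPERATOR_TIMES = {
        "K": 0.28,  # press a key (average skilled typist)
        "P": 1.10,  # point with the mouse to a target on the screen
        "B": 0.10,  # press or release a mouse button
        "H": 0.40,  # home hands between keyboard and mouse
        "M": 1.35,  # mentally prepare for the next action
    }

    def klm_estimate(operators):
        """Sum the operator times for a sequence such as ['M', 'P', 'B', 'B']."""
        return sum(KLM_OPERATOR_TIMES[op] for op in operators)

    # Hypothetical task: click a menu item, then type a five-character code.
    task = ["M", "P", "B", "B"] + ["H"] + ["M"] + ["K"] * 5

    if __name__ == "__main__":
        print(f"Estimated task time: {klm_estimate(task):.2f} seconds")  # 5.80 seconds

Summing operator times in this way is the essence of the comparison described above: alternative designs are modeled as different operator sequences, and the aggregated times are compared.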

Detailed description

Benefits

Heuristic evaluation falls within the category of usability engineering methods known as Discount Usability Engineering (Nielsen, 1989). The primary benefits of these methods are that they are less expensive than other usability engineering methods and require fewer resources (Nielsen, 1989). The immediate beneficiaries are the stakeholders responsible for producing the product: a heuristic evaluation costs less than other forms of usability evaluation, which reduces the cost of the project. Of course, the users also benefit from a more usable product.

Advantages

  • Inexpensive relative to other evaluation methods (Nielsen & Molich, 1990).
  • Intuitive, and easy to motivate potential evaluators to use the method (Nielsen & Molich, 1990).
  • Advanced planning not required (Nielsen & Molich, 1990).
  • Evaluators do not have to have formal usability training. In their study, Nielsen and Molich used professional computer programmers and computer science students (Nielsen & Molich, 1990; Nielsen, 1992).
  • Can be used early in the development process (Nielsen & Molich, 1990).
  • Faster turnaround time than laboratory testing (Kantner & Rosenbaum, 1997).

Disadvantages

  • As originally proposed by Nielsen and Molich, the evaluators would have knowledge of usability design principles, but were not usability experts (Nielsen & Molich, 1990). However, Nielsen subsequently showed that usability experts would identify more issues than non-experts, and “double experts” – usability experts who also had expertise with the type of interface (or the domain) being evaluated – identified the most issues (Nielsen, 1992). Such double experts may be hard to come by, especially for small companies (Nielsen, 1992).
  • Individual evaluators identify a relatively small number of usability issues (Nielsen & Molich, 1990). Multiple evaluators are recommended since a single expert is likely to find only a small percentage of problems, and the results from the multiple evaluators must be aggregated (Nielsen & Molich, 1990).
  • Heuristic evaluations and other discount methods may not identify as many usability issues as other usability engineering methods, for example, usability testing (Nielsen, 1989).
  • Heuristic evaluation may identify more minor issues and fewer major issues than would be identified in a think-aloud usability test (Jeffries and Desurvire, 1992).
  • Heuristic reviews may not scale well for complex interfaces (Slavkovic & Cross, 1999). In complex interfaces, a small number of evaluators may not find a majority of the problems in an interface and may miss some serious problems.
  • Does not always readily suggest solutions for usability issues that are identified (Nielsen & Molich, 1990).
  • Biased by the preconceptions of the evaluators (Nielsen & Molich, 1990).
  • As a rule, the method will not create “eureka moments” in the design process (Nielsen & Molich, 1990).
  • In heuristic evaluations, the evaluators only emulate the users – they are not the users themselves. Actual user feedback can only be obtained from laboratory testing (Kantner and Rosenbaum, 1997) or by involving users in the heuristic evaluation (Muller, Matheson, Page, & Gallup, 1995).
  • Heuristic evaluations may be prone to reporting false alarms – reported problems that are not actual usability problems in practice (Jeffries, 1994).

Note

The original justifying assumptions about the heuristic evaluation method presented by Nielsen (1989) and Nielsen and Molich (1990) were: the method is relatively inexpensive, evaluators do not have to be usability experts, there would be evaluations by “several” evaluators, and that when results are aggregated, the evaluators will find “most” of the issues identified by more expensive methods.

The literature presents mixed messages about the relative advantages and disadvantages of the heuristic review method. For example, two papers written in the early 1990s (Jeffries, Miller, Wharton, and Uyeda, 1991; Desurvire, Kondziela, and Atwood, 1992) compared the effectiveness of different usability evaluation methods, including heuristic reviews.

Jeffries et al. found that heuristic reviews identified more usability issues than the other methods used in their study (usability test, guidelines review, and cognitive walkthrough) when the results of all evaluators were aggregated (Jeffries et al., 1991). However, their definition of heuristic evaluation differed from the method described by Nielsen and Molich. They used experienced usability professionals: “UI specialists study the interface in depth and look for properties they know, from experience, will lead to usability problems” (emphasis added) (Jeffries et al., 1991). Further, while heuristic evaluation identified more issues than usability testing, heuristic evaluation identified more minor issues and usability testing identified more major issues (Jeffries et al., 1991).

Desurvire et al. (1992) showed that experts identified more usability issues than non-experts. This finding supports the results reported by Nielsen (1992): usability experts identify more issues than non-experts, and double experts (usability experts who are also domain experts) find more issues than usability experts alone. Jeffries and Desurvire (1992) point out that to realize the full benefit of a heuristic review the evaluators should all be experts, and that employing multiple experts increases the cost of the review.

Cockton and Woolrych (2002) reviewed “discount” usability methods from the perspective of a cost benefit analysis. They point out that actual user problems result from a complex interaction between the user and the system (Cockton & Woolrych, 2002). They believe that discount methods, including heuristic reviews, are too simple to accurately evaluate this interaction (Cockton & Woolrych, 2002). They concluded that these methods are so prone to error that the potential costs far outweigh the benefits (Cockton & Woolrych, 2002). Cockton and Woolrych (2002) recommend that these methods “should be cleared off the HCI store’s shelves.”

Molich and Dumas (2005) reviewed the results of Comparative Usability Evaluation 4 (CUE-4). In the study, 17 teams of usability professionals evaluated the reservation system for the Hotel Pennsylvania (www.hotelpenn.com) (Molich and Dumas, 2005). Nine of the teams performed usability tests and eight teams performed expert reviews (Molich and Dumas, 2005). While only one expert review team used a heuristic review as described by Nielsen and Molich (1990), Molich and Dumas’s general conclusions about how expert reviews compared to usability testing are of interest.

  • There were no false alarms, which contradicts a common belief that expert reviews will produce more false alarms than usability tests (Molich & Dumas, 2005).
  • Expert reviews may be more efficient than usability tests, in terms of number of issues found as a function of resources expended (Molich & Dumas, 2005).
  • Expert reviews identify the same proportion of major and minor problems as usability tests (Molich & Dumas, 2005), which contradicts earlier studies.

Cost-Effectiveness (ROI)

Jeffries and Desurvire (1992) point out that, if the results reported by Nielsen (1992) and Desurvire, Kondziela, and Atwood (1992) hold (that experts or double experts will find more issues than non-experts), then heuristic evaluation becomes more of a Cadillac method than a discount method.

As noted above, Cockton and Woolrych (2002) reviewed “discount” usability methods from the perspective of cost-benefit analysis. They argue that actual user problems result from a complex interaction between the user and the system, that discount methods, including heuristic reviews, are too simple to accurately evaluate this interaction, and that these methods are so prone to error that the potential costs far outweigh the benefits (Cockton and Woolrych, 2002).

How To

Appropriate Uses

Heuristic evaluation can be used throughout the design lifecycle, at any point where it is desirable to evaluate the usability of a product or product component. Of course, the closer the evaluation is to the end of the design lifecycle, the more it resembles traditional quality assurance and the less it resembles usability evaluation. So, as a matter of practicality, if the method is going to have an impact on the design of the interface (i.e., the usability issues are to be resolved before release), the earlier in the lifecycle the review takes place, the better. Specifically, heuristic reviews can be used as part of requirements gathering (to evaluate the usability of the current/early versions of the interface), competitive analysis (to evaluate your competitors to find their strengths and weaknesses), and prototyping (to evaluate versions of the interface as the design evolves).

Nielsen and Molich described heuristic evaluation as “an informal method of usability analysis where a number of evaluators are presented with an interface design and asked to comment on it” (Nielsen & Molich, 1990). In this paper, they presented nine usability heuristics:

  • Simple and natural dialog
  • Speak the user’s language
  • Minimize user memory load
  • Be consistent
  • Provide feedback
  • Provide clearly marked exits
  • Provide shortcuts
  • Good error messages
  • Prevent errors

This list and its later versions (for example, Nielsen, 1994; Nielsen, Bush, Dayton, Mond, Muller, & Root, 1992) are commonly used by many practitioners as the basic heuristics for product evaluation. However, there are other published lists of heuristics available, including Shneiderman’s eight golden rules of interface design (Shneiderman, 1998), Gerhardt-Powals’ research-based guidelines (Gerhardt-Powals, 1996), and Kamper’s lead, follow, and get out of the way principles and heuristics (Kamper, 2002).

Heuristic evaluation is not limited to one of the published lists of heuristics. The list of heuristics can be as long as the evaluators deem appropriate for the task at hand. For example, you can develop a specialized list of heuristics for specific audiences, like senior citizens, children, or disabled users, based on a review of the literature.
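
As a small illustration of this point, the sketch below (Python, with names invented for this example) keeps the chosen heuristics as plain data so that a base list such as Nielsen and Molich's nine principles can be extended into a specialized list for a specific audience. Nothing about this structure or the added heuristics is prescribed by the sources cited here.

    # Minimal sketch: evaluation heuristics kept as data so the list can be
    # extended or replaced for a particular audience or product.
    # The names and the added heuristics are illustrative assumptions.

    NIELSEN_MOLICH_1990 = [
        "Simple and natural dialog",
        "Speak the user's language",
        "Minimize user memory load",
        "Be consistent",
        "Provide feedback",
        "Provide clearly marked exits",
        "Provide shortcuts",
        "Good error messages",
        "Prevent errors",
    ]

    # A hypothetical specialized list for an evaluation aimed at senior citizens.
    SENIOR_AUDIENCE = NIELSEN_MOLICH_1990 + [
        "Use large, high-contrast text",
        "Avoid time-limited interactions",
    ]

    if __name__ == "__main__":
        for number, heuristic in enumerate(SENIOR_AUDIENCE, start=1):
            print(f"{number}. {heuristic}")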

Procedure

  1. Decide which aspects of a product and what tasks you want to review. For most products, you cannot review the entire user interface so you need to consider what type of coverage will provide the most value.
  2. Decide which heuristics will be used.
  3. Select a team of three to five evaluators (you can have more, but the time to aggregate and interpret the results will increase substantially) and give them some basic training on the principles and process.
  4. Create a list of representative tasks for the application or component you are evaluating. You might also describe the primary and secondary users of your product if the team is not familiar with the users.
  5. Ask each evaluator to perform the representative tasks individually and list where the product violates one or more heuristics. After the evaluators work through the tasks, ask them to review any other user interface objects that were not involved directly in the tasks and note violations of heuristics. You may also ask evaluators to rate how serious the violations would be from the users’ perspective.
  6. Compile the individual evaluations and ratings of seriousness.
  7. Categorize and report the findings so they can be presented effectively to the product team.

Participants and Other Stakeholders

The basic heuristic inspection does not involve users of the product under consideration. As originally proposed by Nielsen and Molich (1990), the heuristic review method was intended for use by people with no formal training or expertise in usability. However, Nielsen (1992) and Desurvire, Kondziela, and Atwood (1992) found that usability experts find more issues than non-experts. For some products, a combination of usability practitioners and domain experts is recommended.

The stakeholders are those who will benefit from the cost savings that may be realized from using “discount” (i.e., low-cost) usability methods. These stakeholders may include the owners and management of the company producing the product and the users who will purchase the product.

Materials Needed

  • A list of heuristics with a brief description of each heuristic.
  • A list of tasks and/or the components of the product that you want inspected (for example, for a major Web site, you might designate 10 tasks, plus 10 pages that you want reviewed).
  • Access to the specification, screen shots, prototypes, or working product.
  • A standard form for recording violations of the heuristics (a minimal sketch of such a form appears after this list).
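
As noted in the last item above, evaluators need a consistent way to record violations. The sketch below shows, in Python, one hypothetical shape for such a record; the field names and the 1-4 severity scale are assumptions made for this illustration, not a standard defined by the cited sources.

    # Minimal sketch of a record for one heuristic violation.
    # Field names and the severity scale are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class Violation:
        evaluator: str    # who reported the issue
        location: str     # screen, page, or component where it occurs
        heuristic: str    # the heuristic that is violated
        description: str  # what the evaluator observed
        severity: int     # e.g., 1 (cosmetic) to 4 (catastrophic)

    example = Violation(
        evaluator="Evaluator A",
        location="Checkout page",
        heuristic="Good error messages",
        description="The card-declined message does not say how to recover.",
        severity=3,
    )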

Who Can Facilitate

Heuristic evaluations are generally organized by a usability practitioner who introduces the method and the principles, though, with some training, other members of a product team could facilitate.

Common Problems

  • Insufficient resources (too few evaluators) are committed to the evaluation. As a result, major usability issues may be overlooked.
  • Evaluators do not fully understand the heuristics.
  • Evaluators may report problems at different levels of granularity (for example, “The error messages are bad” versus “Error message 27 does not state how to resolve this problem”).
  • Some organizations find heuristic evaluation so appealing that they become reluctant to use other methods, such as usability testing or participatory design.

Data Analysis Approach

The data are collected in a list of usability problems and issues. Analysis can include assignment of severity codes and recommendations for resolving the usability issues. The problems should be organized in a way that is efficient for the people who will be fixing the problems.
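
As a minimal sketch of this compilation step, assuming violation records with the fields shown under Materials Needed above, the Python below merges findings reported by several evaluators for the same location and heuristic and orders the merged issues by average severity. The merge key and the averaging rule are assumptions for illustration, not part of the method as published.

    # Minimal sketch: merge findings from several evaluators and rank by severity.
    # The merge key (location + heuristic) and the averaging rule are assumptions.

    from collections import defaultdict
    from statistics import mean

    def aggregate(findings):
        """findings: records with evaluator, location, heuristic, description,
        and severity attributes (compare the form sketch above)."""
        groups = defaultdict(list)
        for f in findings:
            groups[(f.location, f.heuristic)].append(f)

        merged = []
        for (location, heuristic), items in groups.items():
            merged.append({
                "location": location,
                "heuristic": heuristic,
                "evaluators": sorted({f.evaluator for f in items}),
                "avg_severity": mean(f.severity for f in items),
                "descriptions": [f.description for f in items],
            })

        # Most severe issues first, so the product team sees them first.
        return sorted(merged, key=lambda issue: issue["avg_severity"], reverse=True)

    if __name__ == "__main__":
        from types import SimpleNamespace as Record  # stand-in for the form record above
        sample = [
            Record(evaluator="A", location="Checkout page", heuristic="Good error messages",
                   description="Card-declined message gives no recovery steps.", severity=3),
            Record(evaluator="B", location="Checkout page", heuristic="Good error messages",
                   description="The error text is vague.", severity=4),
        ]
        for issue in aggregate(sample):
            print(issue["avg_severity"], issue["location"], issue["heuristic"])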

Next Steps

Discuss the usability issues with the product team. Track what problems are fixed, deferred, and viewed as “not a problem” by the product team.

Special Considerations

Costs and Scalability

People and Equipment

There is no special equipment required for a heuristic evaluation, other than the computer or other hardware (PDA, cell phone, etc.) used to run the application. The cost will reflect the number of evaluators, their level of usability and domain expertise, and the amount of time they put into the evaluation. As originally proposed by Nielsen and Molich (1990), the heuristic review method was intended for use by people with no formal training or expertise in usability. However, Nielsen (1992) and Desurvire, Kondziela, and Atwood (1992) found that usability experts would find more issues than non experts, and Nielsen (1992) found that double experts (evaluators with usability expertise and expertise in the domain in which the software is used) will find more issues than usability experts. Short training sessions on the list of heuristics may add some costs, but make for more effective evaluations.

Time

Molich and Dumas (2005) reported that expert reviews (which included one heuristic review) conducted for the CUE-4 study took significantly less time than usability tests, and that the expert reviews identified the same number and quality of issues.

Accessibility Considerations

None (unless the list of heuristics will be used to evaluate accessibility).

International Considerations

None (unless the list of heuristics will be used to evaluate localization issues).

Ethical and Legal Considerations

For mission-critical products, the heuristic evaluation should not be the sole evaluation method for examining potential usability problems. More formal methods including summative testing and formal user interface inspections may be required to examine subtle interactions between the user, task, and product that would create serious errors.

Facts

Lifecycle: Analysis, Evaluation
Released: 2007-01