Heuristic Evaluation
Facts:
- Also called: Heuristic Review, Discount Usability Engineering, Usability Evaluation, User Interface Inspection, Expert Review
- Lifecycle stages: All
A usability evaluation method in which one or more reviewers, preferably experts, compare a software, documentation, or hardware product to a list of design principles (commonly referred to as heuristics) and list where the product does not follow those principles.
Appropriate Uses
Heuristic evaluation can be used throughout the design lifecycle, at any point where it is desirable to evaluate the usability of a product or product component. The closer the evaluation is to the end of the design lifecycle, however, the more it resembles traditional quality assurance and the less it resembles usability evaluation. As a practical matter, if the method is to have an impact on the design of the interface (i.e., the usability issues are to be resolved before release), the earlier in the lifecycle the review takes place, the better. Specifically, heuristic reviews can be used as part of requirements gathering (to evaluate the usability of current or early versions of the interface), competitive analysis (to evaluate competitors’ products to find their strengths and weaknesses), and prototyping (to evaluate versions of the interface as the design evolves).
Nielsen and Molich described heuristic evaluation as “an informal method of usability analysis where a number of evaluators are presented with an interface design and asked to comment on it” (Nielsen & Molich, 1990). In this paper, they presented nine usability heuristics:
- Simple and natural dialog
- Speak the user’s language
- Minimize user memory load
- Be consistent
- Provide feedback
- Provide clearly marked exits
- Provide shortcuts
- Good error messages
- Prevent errors
This list and later versions (for example, Nielsen, 1994; Nielsen, Bush, Dayton, Mond, Muller, & Root, 1992) are commonly used by practitioners as the basic heuristics for product evaluation. However, other published lists of heuristics are available, including Shneiderman’s eight golden rules of interface design (Shneiderman, 1998), Gerhardt-Powals’ research-based guidelines (Gerhardt-Powals, 1996), and Kamper’s lead, follow, and get out of the way principles and heuristics (Kamper, 2002).
Heuristic evaluation is not limited to one of the published lists of heuristics. The list of heuristics can be as long as the evaluators deem appropriate for the task at hand. For example, you can develop a specialized list of heuristics for specific audiences, like senior citizens, children, or disabled users, based on a review of the literature.
Procedure
- Decide which aspects of a product and which tasks you want to review. For most products, you cannot review the entire user interface, so you need to consider what type of coverage will provide the most value.
- Decide which heuristics will be used.
- Select a team of three to five evaluators (you can have more, but the time to aggregate and interpret the results will increase substantially) and give them some basic training on the principles and process.
- Create a list of representative tasks for the application or component you are evaluating. You might also describe the primary and secondary users of your product if the team is not familiar with the users.
- Ask each evaluator to perform the representative tasks individually and list where the product violates one or more heuristics. After the evaluators work through the tasks, ask them to review any other user interface objects that were not involved directly in the tasks and note violations of heuristics. You may also ask evaluators to rate how serious the violations would be from the users’ perspective.
- Compile the individual evaluations and ratings of seriousness.
- Categorize and report the findings so they can be presented effectively to the product team.
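The compilation step above can be sketched in code. The following Python is a minimal illustration, not part of the method as published. It assumes each evaluator's findings are recorded with a location, the heuristic violated, and a severity rating on a hypothetical 0 to 4 scale. The `Finding` record, its field names, and the `compile_findings` helper are invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Finding:
    evaluator: str   # who reported the problem
    location: str    # screen, dialog, or component (hypothetical naming)
    heuristic: str   # which heuristic is violated
    severity: int    # assumed scale: 0 (not a problem) to 4 (severe)


def compile_findings(findings):
    """Merge findings that share a location and heuristic.

    Returns a list of dicts sorted by mean severity, highest first.
    """
    grouped = defaultdict(list)
    for f in findings:
        grouped[(f.location, f.heuristic)].append(f)

    compiled = []
    for (location, heuristic), group in grouped.items():
        compiled.append({
            "location": location,
            "heuristic": heuristic,
            "evaluators": sorted({f.evaluator for f in group}),
            "mean_severity": mean(f.severity for f in group),
        })
    compiled.sort(key=lambda row: row["mean_severity"], reverse=True)
    return compiled


# Illustrative data only; these are not results from a real evaluation.
findings = [
    Finding("A", "Login form", "Good error messages", 3),
    Finding("B", "Login form", "Good error messages", 4),
    Finding("B", "Search page", "Provide feedback", 2),
]
report = compile_findings(findings)
for row in report:
    print(row)
```

Matching on an exact location and heuristic is a simplification. Evaluators often describe the same problem in different words or at different levels of detail, so in practice someone usually has to merge duplicates by hand before a tally like this is meaningful.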
Participants and Other Stakeholders
The basic heuristic inspection does not involve users of the product under consideration. As originally proposed by Nielsen and Molich (1990), the heuristic review method was intended for use by people with no formal training or expertise in usability. However, Nielsen (1992) and Desurvire, Kondziela, and Atwood (1992) found that usability experts find more issues than non-experts. For some products, a combination of usability practitioners and domain experts is recommended.
The stakeholders are those who will benefit from the cost savings that may be realized from using a “discount” (i.e., low-cost) usability method. These stakeholders may include the ownership and management of the company producing the product and the users who will purchase the product.
Materials Needed
- A list of heuristics with a brief description of each heuristic.
- A list of tasks and/or the components of the product that you want inspected (for example, for a major Web site, you might designate 10 tasks, plus 10 pages that you want reviewed).
- Access to the specification, screen shots, prototypes, or working product.
- A standard form for recording violations of the heuristics.
Who Can Facilitate
Heuristic evaluations are generally organized by a usability practitioner who introduces the method and the principles, though with some training, other members of a product team could facilitate.
Common Problems
- Insufficient resources (too few evaluators) are committed to the evaluation. As a result, major usability issues may be overlooked.
- Evaluators do not fully understand the heuristics.
- Evaluators may report problems at different levels of granularity (for example, “The error messages are bad” versus “Error message 27 does not state how to resolve this problem”).
- Some organizations find heuristic evaluation such a popular method that they are reluctant to use other methods like usability testing or participatory design.
Data Analysis Approach
The data are collected in a list of usability problems and issues. Analysis can include assignment of severity codes and recommendations for resolving the usability issues. The problems should be organized in a way that is efficient for the people who will be fixing the problems.
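One way to organize the problem list for the people who will fix it is to group problems by product component and order each group by severity. The Python below is a minimal sketch under stated assumptions: the three severity codes, the component names, and the `organize_by_component` helper are hypothetical, and a real project would define its own scale and categories.

```python
from itertools import groupby

# Hypothetical severity codes; a real project would define its own scale.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}

# Illustrative problem list, not data from a real evaluation.
problems = [
    {"component": "Checkout", "severity": "minor",
     "issue": "Inconsistent button labels"},
    {"component": "Search", "severity": "critical",
     "issue": "No feedback while results load"},
    {"component": "Checkout", "severity": "critical",
     "issue": "Error message does not say how to recover"},
]


def organize_by_component(problems):
    """Group problems by component, most severe first within each group."""
    ordered = sorted(
        problems,
        key=lambda p: (p["component"], SEVERITY_ORDER[p["severity"]]),
    )
    # groupby only merges adjacent items, so the sort above is required.
    return {
        component: list(group)
        for component, group in groupby(ordered, key=lambda p: p["component"])
    }


by_component = organize_by_component(problems)
for component, items in by_component.items():
    print(component)
    for p in items:
        print(f"  [{p['severity']}] {p['issue']}")
```

Grouping by component is only one choice. If a different team owns each heuristic area (for example, a documentation team owns error messages), grouping by heuristic may serve the product team better.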
Next Steps
Discuss the usability issues with the product team. Track what problems are fixed, deferred, and viewed as “not a problem” by the product team.
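Tracking can be as simple as a table of issue identifiers and outcomes. The Python sketch below is one possible arrangement. The issue IDs, the `Status` enumeration, and the extra "open" state for issues not yet discussed are all assumptions made for this example.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    OPEN = "open"                    # assumed extra state for undecided issues
    FIXED = "fixed"
    DEFERRED = "deferred"
    NOT_A_PROBLEM = "not a problem"


# Hypothetical issue IDs mapped to the product team's decision.
tracker = {
    "HE-01": Status.FIXED,
    "HE-02": Status.DEFERRED,
    "HE-03": Status.NOT_A_PROBLEM,
    "HE-04": Status.FIXED,
    "HE-05": Status.OPEN,
}


def summarize(tracker):
    """Count issues per status so follow-up discussions have a baseline."""
    counts = Counter(tracker.values())
    return {status.value: counts.get(status, 0) for status in Status}


summary = summarize(tracker)
print(summary)
```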
Read More About It
Originators/Popularizers
In discussing heuristic evaluation, many books written for usability practitioners reference work by Nielsen or by Nielsen and Molich (Preece, Rogers, and Sharp, 2002; Dumas and Redish, 1999; Mayhew, 1999).
While the Nielsen and Molich heuristics (or one of the modified versions developed by Nielsen with other colleagues) are commonly thought of as THE heuristics, design guidelines had been proposed long before 1990. Cheriton (1976) proposed a list of guidelines for the design of interfaces for time-sharing systems. GOMS, proposed by Card, Moran, and Newell (1983), models the user’s cognitive processes, using operator times that are derived from the results of human performance research. Norman (1983a, 1983b) proposed some basic design rules for interface design. Ben Shneiderman first presented his “Eight Golden Rules of Interface Design” in the 1987 edition of his book Designing the User Interface (Shneiderman, 1987).
Nielsen and Molich themselves place their usability heuristics within a larger context. “Ideally people would conduct such evaluations according to certain rules, such as those listed in typical guidelines documents (for example, Smith and Mosier, 1986)…most people probably perform heuristic evaluation on the basis of their own intuition and common sense” (Nielsen and Molich, 1990). Nielsen and Molich cut “the complexity of the rule base by two orders of magnitude [from a typical guidelines document] by relying on a small set of heuristics such as the nine basic usability principles” (Molich and Nielsen, 1990). This description disagrees with the description provided by Molich and Dumas, in which heuristic evaluation is described as a subset of expert reviews (Molich and Dumas, 2005). In any event, while there may not be a clear linear relationship between heuristic evaluation, guidelines documents, and expert reviews, they are certainly related.
Authoritative References
Nielsen, J. (1989). Usability engineering at a discount. In G. Salvendy & M.J. Smith (Eds.), Designing and using human-computer interfaces and knowledge based systems (pp. 394-401). Amsterdam, The Netherlands: Elsevier Science Publishers, B.V.
Molich, R. & Nielsen, J. (1990). Improving a human computer dialogue. Communications of the ACM. 33(3). 338-348.
Nielsen, J. & Molich, R. (1990). Heuristic evaluation of user interfaces. Proceedings of the SIGCHI conference on human factors in computing systems: Empowering people. Seattle, WA, USA. April, 1990. 249-256.
Nielsen, J. (1992). Finding usability problems through heuristic evaluation. Proceedings of the SIGCHI conference on human factors in computing systems. Monterey, CA, USA, 1992. 373-380.
Published Studies
Cockton, G., Lavery, D., & Woolrych, A. (2003). Inspection-based methods. In J.A. Jacko & A. Sears (Eds.), The human-computer interaction handbook (pp. 1118-1138). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Molich, R. & Dumas, J.S. (2005). A comparison of usability testing and expert reviews. In preparation.
Related Subjects
- Scenarios and Fast Iteration: A discount usability method, used to create prototypes that reduce the level of functionality and the number of features by designing the prototype to work only when the user follows a previously planned script (Nielsen, 1989).
- Simplified Thinking Aloud: A discount usability method used to conduct small scale usability tests. Nielsen recommends three participants. According to his research, three participants will turn up about half of the usability problems (Nielsen, 1989). Another difference between simplified thinking aloud studies and traditional usability studies is that data analysis is based solely on the facilitator’s notes (Nielsen, 1989). Because there is no need for recording equipment (and, therefore, no time spent reviewing session recordings) the simplified approach is less expensive than traditional studies (Nielsen, 1989).
- Participatory Heuristic Evaluation: This method extends the heuristic inspection by adding several new heuristics and actual users to the evaluation team (Muller, Matheson, Page, & Gallup, 1995).
- Cognitive Walkthrough: Similar to heuristic evaluation in that usability is evaluated without feedback from representative users. The cognitive walkthrough was proposed by Lewis, Polson, Wharton, and Rieman (1990) as a response to the difficulty encountered by usability practitioners in applying cognitive models to user interface design. The method is based on the model of exploratory learning developed by Polson and Lewis, which assumes that the user makes selections based on the expectation that the action will help the user achieve his/her current goal (Lewis et al., 1990). The person responsible for creating one aspect of a system presents the proposed design to a group of peers (for example, a software engineer presenting to other software engineers), who evaluate the design by following one or more specific user tasks (Wharton, Rieman, Lewis, and Polson, 1994). The designer uses a list of questions that replicates the user’s thought process as the user tries to achieve a goal (Lewis et al., 1990). The presentation includes the interface’s design specifications, a task scenario, descriptions of the anticipated users, a description of the conditions under which the software will be used, and task-based procedures (Wharton et al., 1994). There are several versions of the cognitive walkthrough method, with successive versions focused on making the method more usable for practitioners who are not well-versed in cognitive science.
- Summative Usability Testing: An evaluation method that produces direct feedback from representative users. The user’s performance and behavior are observed and recorded in a controlled environment, the usability laboratory (Preece, Rogers, & Sharp, 2002). The users perform tasks that reflect the way they would use the product in a “real life” situation (Preece, Rogers, & Sharp, 2002). As a rule, usability tests measure performance in terms of success rates, number of errors, and time to complete specific tasks (Preece, Rogers, & Sharp, 2002). Some summative tests compare the results with a usability specification that details what is considered acceptable usability for a product. For an overview of usability testing, refer to A Practical Guide to Usability Testing (Dumas & Redish, 1993).
- Guidelines Review: Similar to heuristic evaluation in that usability is evaluated without feedback from representative users. An evaluation method in which the evaluator checks the system for conformance with a list of standing guidelines (Shneiderman, 1998). [Nielsen and Molich (1990) refer to guideline documents in their explanation of the heuristic evaluation method.] A complete list of guidelines can include hundreds or thousands of items, and may cover the following: (1) words and icons, (2) screen-layout issues, (3) input and output devices, (4) action sequences, (5) training (Shneiderman, 1998). For an example of a guidelines document, refer to http://www.hcibib.org/sam/index.html (Smith and Mosier, 1986).
- GOMS Analysis: Similar to heuristic evaluation in that usability is evaluated without feedback from representative users. GOMS analysis is based on the prevailing theory in cognitive psychology, which views the human brain as an information processor with limited capacity (Card, Moran, and Newell, 1983). GOMS is one of the few widely known theory-based methods available for evaluating the usability of computer interfaces (John and Kieras, 1996). The GOMS model has four components: a set of Goals, a set of Operators the user must execute, a set of Methods (procedures) for accomplishing a goal, and a set of Selection rules for selecting among the available methods. Together they describe the cognitive processing that takes place as the user completes a task (Card, Moran, and Newell, 1983). Time estimates are then made at the Operator level (Card, Moran, and Newell, 1983). Analysis is based on comparison of total operator times aggregated at the Method, Selection rule, and Goal level (John, 1996). For an example of operators and times for the GOMS Keystroke Level Model, refer to Kieras (1993).
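The operator-level time estimates described in the GOMS Analysis item above can be illustrated with a small Keystroke Level Model calculation. The Python below is a sketch, not a validated model. The operator times are commonly quoted approximate values and are treated here as assumptions, and the two example methods are hypothetical. Consult a primary source such as Card, Moran, and Newell (1983) or Kieras (1993) before relying on specific numbers.

```python
# Approximate operator times in seconds (assumed values for illustration).
OPERATOR_TIMES = {
    "K": 0.2,    # press a key or button (assumes a skilled typist)
    "P": 1.1,    # point with a mouse at a target on screen
    "H": 0.4,    # move a hand between keyboard and mouse
    "M": 1.35,   # mentally prepare for the next action
}


def estimate_time(sequence):
    """Sum operator times for one method, given as a list of operator codes."""
    return sum(OPERATOR_TIMES[op] for op in sequence)


# Two hypothetical methods for the same goal (for example, saving a file).
menu_method = ["H", "M", "P", "K", "M", "P", "K"]   # mouse through a menu
shortcut_method = ["M", "K", "K"]                   # keyboard shortcut

menu_time = estimate_time(menu_method)
shortcut_time = estimate_time(shortcut_method)

print(f"Menu method:     {menu_time:.2f} s")
print(f"Shortcut method: {shortcut_time:.2f} s")
```

Comparing the two totals is the kind of Method-level comparison the analysis relies on: the method with the lower summed operator time is predicted to be faster for a practiced user.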


