Usability Testing

Usability testing involves observing users while they perform tasks with a hardware or software system.

The product may be a paper sketch, a wireframe, a storyboard, a display mock-up, a product in development, a working prototype, or a completed product. Usability testing can also be conducted on competitive products to understand their strengths and weaknesses.

A usability test can be a formative evaluation, which is conducted early in the design process to find problems and improve the product, or a summative evaluation, which is conducted to validate the design against specific goals.

Testing involves recruiting targeted users as test participants and asking those users to complete a set of tasks. A test facilitator conducts the testing according to a test protocol, while the test sessions are typically recorded by a video operator, an automated testing tool, or both.

Usability testing should be conducted with participants who are representative of the real or potential users of the system. For some tests, users must have certain domain, product and application-specific knowledge and experience.

Usability testing consists of five primary phases:

Planning
Pretest or pilot
Test sessions
Post-test or debrief
Analysis, interpretation, and presentation of the results.

These phases are described in the How To and Data Analysis and Reporting sections below.

Detailed description

Benefits, Advantages and Disadvantages

Advantages

You can get feedback that reveals possible design flaws and other issues.
You can get reliable measures of usability (see summative usability testing).
Experienced test facilitators can elicit feedback from users to help understand why they had problems.
Low and medium-fidelity prototypes are cost-effective to test.
It is easy to have the project manager and developers as observers.
You can produce video clips from test sessions to show problems.

Disadvantages

Not all problems will be found with small samples of users.
You may not have access to users that match the user profile.
Not all tasks may be "right" for all users.
Lab testing takes users away from their natural work environment.
Technical setup may be complex and require domain experts and additional time for setup and debugging.
An inexperienced facilitator can influence the results by using too many hints, asking biased questions, or providing nonverbal cues about the tasks.

Appropriate Uses

Major usability problems are identified that may not be revealed by less formal testing, including problems related to the specific skills and expectations of the users.
Measures can be obtained for the users' effectiveness, efficiency and satisfaction.

How To

Planning

Write a usability test plan to define the goals, users, tasks, procedures, test setup, data collection and reporting requirements.
Select the most important tasks and user groups to be tested. Tasks can be chosen based on what features are available for testing, frequency of use, criticality, and other factors.
Recruit users who are representative of each user group. The number of users will depend on your goals (finding problems versus comparing performance to benchmarks), the impact of the product on users, and other factors. For formative testing, there is much debate about how many participants should be in a usability test, with numbers ranging from 5 to more than 50.
Produce task scenarios and input data and write instructions for the user (tell the user what to achieve, not how to do it). You can also create basic tasks which you customize for each participant, or allow participants to select their own tasks.
Plan the test sessions allowing time for giving instructions, running the test, answering a questionnaire, and post-test interview. Test sessions longer than about 2 hours will require dedicated participants.
Invite members of the product team to observe the sessions if possible. An alternative is to videotape the sessions, and show edited clips of the usability problems to team members who could not attend the sessions.
For formative evaluation, the facilitator will normally be with the user to prompt and question when necessary. For summative evaluation, the facilitator is generally observing from another room so as not to interfere with the participant's work.
Prepare additional written test materials including informed consent (for participation, for recording), pre- and post-test questionnaires, and any observer data recording sheets.

Pretest or Pilot

Conduct pilot tests with internal users to debug instructions and tasks, verify that the hardware and software are working, and determine if there is adequate time for the session.
Resolve any technical or logistical problems with the test plan and setup. Fix any problems with written test materials.
Finalize the schedule and send it to all the observers.

Running sessions

Welcome the user, have them sign the informed consent form(s) and any nondisclosure agreements (NDAs) if needed, and have them fill out the pretest questionnaire, which can be used to verify screening information and gather additional background information.

Let the user know about any test observers.
Give the task instructions and let the user know how their questions will be handled.
Observe the user working through the tasks. Do not give hints or assistance unless necessary.
Time each task, if this is part of the session protocol.
At the end of the session, ask the user to complete a satisfaction questionnaire.
Interview the user to confirm they are representative of the intended user group, to gain general opinions, and to ask about specific problems encountered, if this is part of the session protocol.
Assess the results of the task for accuracy and completeness, if this is part of the session protocol.

Post-test or Debrief

Ask the user if they would like to meet the observers and ask questions.
After the user leaves, the test team, including observers, discusses what was observed.

Variations

Summative usability testing is used to obtain measures to establish a usability benchmark or to compare results with usability requirements.

Group usability testing involves several to many participants individually, but simultaneously, performing tasks, with one to several testers observing and interacting with participants (Journal of Usability Studies, Volume 2, Issue 3, May 2007, pp. 133-144).

Remote evaluation may be set up with a portable test lab. This setup enables more users to be tested in their natural work environment. It also means testing can be done at user conferences and customer sites, as well as part of beta test programs.

Variations may also include:

Varying the order in which tasks are presented to users.
Basing design decisions on testing with as few as one user (RITE method).
Allowing users to self-report.
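
One simple way to vary task order is to rotate the task list so that, across participants, every task appears in every position equally often. The following is a minimal sketch under that assumption; the function names and task names are hypothetical, and a rotation balances position effects only, not carry-over effects between specific pairs of tasks.

```python
def rotated_task_orders(tasks):
    """Return one task order per rotation (a cyclic Latin square).

    Across the returned orders, each task appears exactly once in
    every position.
    """
    if not tasks:
        raise ValueError("at least one task is required")
    n = len(tasks)
    return [[tasks[(start + i) % n] for i in range(n)] for start in range(n)]


def order_for_participant(tasks, participant_index):
    """Assign rotations round-robin by zero-based participant index."""
    orders = rotated_task_orders(tasks)
    return orders[participant_index % len(orders)]


if __name__ == "__main__":
    # Hypothetical task names, for illustration only.
    tasks = ["search", "checkout", "edit profile"]
    for participant in range(4):
        print(participant, order_for_participant(tasks, participant))
```

With more participants than tasks, the rotations simply repeat, so each order is used about equally often.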

Participants and Other Stakeholders

The people primarily involved in usability testing include:

Test facilitator who conducts the pilot and test sessions.
Test participants
Test observers
Test monitor who operates recording equipment and may also take notes

Materials Needed

For some testing there will not be any technical requirements, just written test materials. In general, the materials needed to run a usability test include:

The system (paper sketch, model, display mockup, software, website)
Physical or portable test lab (camera setup, observation room)
Written test materials (informed consent, questionnaires, task scenarios, observation data sheets)
Technical setup (servers, "live" or simulated test data)
Connections for remote observers

Who Can Facilitate

An experienced test facilitator is someone who is:

Knowledgeable about the system and the tasks being tested.
Able to avoid giving unnecessary hints.
Able to develop rapport with all kinds of people.
Flexible and organized.
An active listener.

Common Problems

Test participants do not truly match the user profile in the test plan.
Insufficient number of participants to draw conclusions from.
Incidence of Hawthorne Effect in test participants (see below).
Unsure how to handle "outliers" or problems noted by only one user.
Hints given that interfere with users completing tasks on their own.
Glitches in test setup (e.g. server goes down, missing simulated data).
Problems with recording equipment.

Opinions vary on the number of participants that should be recruited for a usability test, from as few as 1 to as many as 15. It is better to perform multiple usability tests with fewer users each time rather than a single test late in the development lifecycle.
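
One way to reason about sample size when the goal is finding problems is the cumulative problem-discovery model, in which the expected proportion of problems seen at least once by n participants is 1 - (1 - p)^n. The sketch below assumes every problem has the same probability p of being encountered by a single participant and that participants are independent, which real studies rarely satisfy; the function names and the value of p are illustrative inputs, not recommendations.

```python
def proportion_found(p, n):
    """Expected share of problems seen at least once by n participants.

    p is the assumed probability that one participant encounters a
    given problem (same for every problem; a simplifying assumption).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def participants_needed(p, target):
    """Smallest n for which proportion_found(p, n) reaches the target."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must be strictly between 0 and 1")
    n = 0
    while proportion_found(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical detection probability, for illustration only.
    for n in (1, 3, 5, 10, 15):
        print(n, round(proportion_found(0.3, n), 2))
```

The model also illustrates why several small tests can be preferable to one large test: the gain from each additional participant shrinks as n grows.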

The Hawthorne effect refers to a danger that participants in any human-centered study may exhibit atypically high levels of performance simply because they are aware that they are being studied.

"Usability studies and the Hawthorne Effect," Journal of Usability Studies, Volume 2, Issue 3, May 2007, pp. 145-154.

The Hawthorne effect can be (mis)used as a basis on which to criticize the validity of human-centered studies, including usability studies. Therefore, it is important that practitioners are able to defend themselves against such criticism. A wide variety of defenses are possible, depending on which interpretation of the Hawthorne effect is adopted. To make an informed decision as to which interpretation to adopt, practitioners should be aware of the whole story regarding this effect.

Data Analysis and Reporting

Produce a list of usability problems, categorized by importance, and an overview of the types of problems encountered.
Arrange a meeting with the project manager and developer to discuss whether and how each problem can be fixed.
If measures have been taken, summarize the results of the satisfaction questionnaire, task time and effectiveness (accuracy and completeness) measures.

The type of objective and subjective data collected during testing may include:

Ability and time to complete a task.
Sequence and number of steps to complete a task.
Types and numbers of errors.
Number of repeated errors.
Number of design issues.
Ratings of ease of performing a task.
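
As an illustration, measures like these can be summarized per task once each attempt is logged. This is a minimal sketch that assumes one record per participant per task; the record fields, the rating scale, and the choice of median time over completed attempts are assumptions made for the example, not a prescribed format.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class TaskAttempt:
    participant: str
    task: str
    completed: bool
    seconds: float
    errors: int
    ease_rating: int  # assumed scale, e.g. 1 (very hard) to 7 (very easy)


def summarize(attempts, task):
    """Summarize effectiveness, efficiency and satisfaction for one task."""
    rows = [a for a in attempts if a.task == task]
    if not rows:
        raise ValueError(f"no attempts recorded for task {task!r}")
    done = [a for a in rows if a.completed]
    return {
        "attempts": len(rows),
        "completion_rate": len(done) / len(rows),
        # Time is summarized over completed attempts only.
        "median_seconds_completed": median([a.seconds for a in done]) if done else None,
        "total_errors": sum(a.errors for a in rows),
        "mean_ease_rating": mean([a.ease_rating for a in rows]),
    }


if __name__ == "__main__":
    # Hypothetical session data, for illustration only.
    log = [
        TaskAttempt("P1", "checkout", True, 30.0, 0, 6),
        TaskAttempt("P2", "checkout", False, 90.0, 3, 2),
        TaskAttempt("P3", "checkout", True, 50.0, 1, 4),
    ]
    print(summarize(log, "checkout"))
```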

Categories of usability problems include:

Preventing users from performing tasks (dead ends, lack of functionality).
Slowing users down (lack of feedback, items not in expected places, terminology not understood).
Increasing user's workload (recall required from multiple screens, typing rather than selecting).
Inconsistencies (use of color, layout of information).
Insufficient error handling (hard to correct errors, missing undo function, cryptic error messages).

Severity rankings can also be assigned to each problem. These rankings can be determined by how frequently a problem occurs, the impact of the problem, and the persistence of the problem.
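
The three factors can be combined into a single ranking in many ways. The sketch below assumes each factor is rated on a 1 to 3 scale and that the ratings are simply added; both the scale and the additive scoring are hypothetical choices for illustration, not a standard scheme.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    description: str
    frequency: int    # 1 = seen by few participants, 3 = seen by most
    impact: int       # 1 = minor annoyance, 3 = prevents task completion
    persistence: int  # 1 = overcome once learned, 3 = recurs every time

    def severity(self) -> int:
        """Sum of the three ratings (assumed scoring rule), from 3 to 9."""
        for name in ("frequency", "impact", "persistence"):
            value = getattr(self, name)
            if value not in (1, 2, 3):
                raise ValueError(f"{name} must be 1, 2 or 3, got {value!r}")
        return self.frequency + self.impact + self.persistence


def rank_problems(problems):
    """Return problems ordered from most to least severe."""
    return sorted(problems, key=lambda p: p.severity(), reverse=True)


if __name__ == "__main__":
    # Hypothetical findings, for illustration only.
    findings = [
        Problem("Label wording unclear", 1, 1, 1),
        Problem("No way back from payment page", 3, 3, 2),
        Problem("Search results load slowly", 2, 2, 2),
    ]
    for problem in rank_problems(findings):
        print(problem.severity(), problem.description)
```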

If a full report is required, the Common Industry Format provides a good structure. There is a detailed example of a usability report using the Common Industry Format.

Next Steps

Collect feedback from users after release to inform any redesign.
Determine need to test with more users.
Determine what design issues cut across related product lines.

Special Considerations

Costs and Scalability

People and Equipment

The costs for usability tests vary depending upon what type of prototype is being tested. For traditional lab testing the costs include:

The recruiting cost per participant.
The cost of payment or incentives for test participation.
Travel costs if conducting testing in multiple sites.
Equipment costs for a portable lab. This is a one-time cost, however.
The cost of any full transcription of test sessions.
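
These cost items can be combined into a rough budget estimate. The sketch below is only an illustration: the function name, parameter names, and all amounts are hypothetical, and it treats the portable lab as a one-time cost that is excluded once the equipment is already owned.

```python
def estimate_test_cost(
    participants,
    recruiting_per_participant,
    incentive_per_participant,
    travel_total=0.0,
    transcription_total=0.0,
    equipment_one_time=0.0,
    equipment_already_owned=False,
):
    """Rough total cost of one round of lab testing (illustrative only)."""
    if participants < 0:
        raise ValueError("participants must be non-negative")
    per_participant = recruiting_per_participant + incentive_per_participant
    # The portable lab is paid for once, so later rounds exclude it.
    equipment = 0.0 if equipment_already_owned else equipment_one_time
    return (
        participants * per_participant
        + travel_total
        + transcription_total
        + equipment
    )


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    first_round = estimate_test_cost(
        8, 100.0, 50.0, travel_total=400.0, equipment_one_time=2000.0
    )
    later_round = estimate_test_cost(
        8, 100.0, 50.0, travel_total=400.0,
        equipment_one_time=2000.0, equipment_already_owned=True,
    )
    print(first_round, later_round)
```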

Time

The time needed to meet with the project team, develop the test plan, and run pilot tests varies.
The time to run the test sessions varies depending upon the nature and scope of the test plan.

Ethical and Legal Considerations

Informed consent forms are needed for participation and recording.

Facts

Lifecycle: Evaluation

Sources and contributors:

Cathy Herzon, Eva Kaniasty, Karen Shor, Nigel Bevan (incorporating material from UsabilityNet)

Released: 2011-05
