Wizard of Oz

User-based evaluation of unimplemented technology where, generally unknown to the user, a human or team simulates some or all of the system's responses.

The technique has often been used to explore design and usability with speech systems, natural language applications, command languages, imaging systems, and pervasive computing applications.

The originator, J.F. Kelley, explains: "The term Wizard of Oz (originally Oz Paradigm) has come into common usage in the fields of Experimental Psychology, Human Factors, Ergonomics and Usability Engineering to describe a testing or iterative design methodology wherein an experimenter (the "Wizard"), in a laboratory setting, simulates the behavior of a theoretical intelligent computer application (often by going into another room and intercepting all communications between participant and system). Sometimes this is done with the participant's a-priori knowledge and sometimes it is a low-level deceit employed to manage the participant's expectations and encourage natural behaviors (though always, I would hope, with appropriate disclosure during the debriefing part of the experiments)."

Detailed description

Benefits, Advantages and Disadvantages

Benefits

The Wizard of Oz technique can provide valuable information on which to base future designs. It can be used to:

Gather actual human responses to an interaction that does not yet exist
Test the interaction of a device before building a functional (and possibly expensive) model
- test which input techniques and sensing mechanisms best represent the interaction (so that subsequent effort developing or adapting sensing technologies is appropriately directed)
- test design of feedback through output technologies such as speech synthesis
- test heuristic algorithms that determine how to produce outputs for ambiguous human inputs
Find out the kinds of problems people will have with the devices and techniques
Investigate aspects of the product's form, such as:
- visual affordance (whether the product shows how it can be used)
- linguistic affordance (which words should be used in prompts to be understood in given contexts)
- para-linguistic affordance (which feedback users interpret in which way; e.g. blinking LEDs read as a confused facial expression)

Advantages

You can test future technologies without building an expensive prototype, or can "fill in" functionality that is not yet ready for a prototype.
Rapid iterations, particularly minor changes in wording or call flow, are immediately testable.
Allows the system to be evaluated at an early stage in the design process.
Provides a unique insight into the user's actions, gained from 'interacting' with the user during the evaluation.
Colleagues who play the "wizard" can learn about how users interact with computer systems. [But see Disadvantages below.]

Disadvantages

Wizard simulations require significant training so the wizard can respond in a way that is credible.
- Involving and training a Wizard is an additional resource cost.
It is difficult for wizards to provide consistent responses across sessions.
- Thus, a proper 'program', or 'behavior instruction', should be prepared and given to the wizard.
- This 'behavior instruction' should not try to describe every single reaction; it should cover predictable, typical situations and guide the session toward answering the target questions.
If a research team member plays the role of Wizard, there is a risk that they will improvise beyond the programmed behavior.
- To avoid this risk, recruit someone to play the wizard who can be instructed [programmed] with simple rules.
Computers respond differently than humans, so the wizard needs to match how a computer might respond (for example, the Wizard should not make typing errors).
Playing the wizard can be exhausting, so the wizard's reactions may change over time, mainly because of cognitive fatigue.
It is difficult to evaluate systems with a large graphical interface element.
This approach does not uncover errors that arise as a result of system performance and recognition rates (unless these are specifically simulated), so it is more effective in revealing problems than in predicting real-world usability.
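
Where recognition errors do need to be simulated, one possible approach is to pass the wizard's transcription through a small error-injection step before the system "responds". The following is a minimal sketch under stated assumptions, not a prescribed implementation: the function name simulate_recognition, the four-word command vocabulary, and the 80% accuracy figure are all hypothetical and chosen only for illustration.

```python
import random


def simulate_recognition(heard, vocabulary, accuracy, rng):
    """Return what the simulated recognizer "understood".

    With probability `accuracy` the wizard's transcription is passed
    through unchanged; otherwise another vocabulary item is substituted
    to mimic a misrecognition.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if rng.random() < accuracy:
        return heard
    alternatives = [word for word in vocabulary if word != heard]
    # With nothing to confuse the input with, pass it through unchanged.
    return rng.choice(alternatives) if alternatives else heard


# Hypothetical command set for a simulated voice-controlled music player.
VOCABULARY = ["play", "pause", "next", "previous"]

# A fixed seed lets the same error pattern be replayed in later sessions.
rng = random.Random(7)
heard_commands = ["play", "next", "pause", "play"]
recognized = [
    simulate_recognition(command, VOCABULARY, 0.8, rng)
    for command in heard_commands
]
```

Substituting a different in-vocabulary word is only one kind of error; a real recognizer may also reject input or time out, so the error model should be adjusted to whatever the target technology is expected to do.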

Cost-Effectiveness

Wizard of Oz testing is a highly cost-effective way to compare multiple designs, because none of them has to be fully implemented before it is tested.

Appropriate Uses

This technique can be used to test device concepts and techniques and suggested functionality before it is implemented. For example, this technique can be used to simulate a caller-system interaction. The user experience is similar to interacting with a functioning interactive voice response (IVR) system.

The Wizard of Oz technique can provide valuable information on which to base future designs. It can:

Gather information about the nature of the interaction
Test which input techniques and sensing mechanisms best represent the interaction (so that subsequent effort developing or adapting sensing technologies is appropriately directed)
Test the interaction of a device before building a functional model
Find out the kinds of problems people will have with the devices and techniques
Investigate aspects of the product's form, such as visual affordance (whether the product shows how it can be used)

How To

Procedure

The wizard sits in a back room, observes the user's actions, and simulates the system's responses in real-time. For input device testing the wizard will typically watch live video feeds from cameras trained on the participant's hand(s), and simulate the effects of the observed manipulations. Often users are unaware (until after the experiment) that the system was not real.

The wizard has to be able to discern the user's input quickly and accurately, which is easiest for simple voice input or hand movements. The output must also be sufficiently simple that the "wizard" can simulate or create it in real time.
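
One way to keep the output simple enough to produce in real time is to script it in advance, so that the wizard selects a response with a single short key rather than typing it. The sketch below illustrates the idea; the Rule structure, the key codes, and the example responses for a voice-controlled music player are hypothetical and would be replaced by the rules of the system being simulated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    key: str        # short code the wizard types
    situation: str  # when the rule applies
    response: str   # exact output sent to the participant


# Hypothetical rules for a simulated voice-controlled music player.
RULES = [
    Rule("p", "user asks to play music", "Playing your music."),
    Rule("s", "user asks to stop", "Stopped."),
    Rule("r", "input was unclear",
         "Sorry, I did not understand. Please say that again."),
]
FALLBACK_KEY = "r"


def build_lookup(rules):
    """Index the rules by key, rejecting ambiguous duplicate keys."""
    lookup = {}
    for rule in rules:
        if rule.key in lookup:
            raise ValueError(f"duplicate key: {rule.key!r}")
        lookup[rule.key] = rule
    return lookup


def respond(lookup, key, fallback_key=FALLBACK_KEY):
    """Return the scripted response for `key`.

    Unknown keys fall back to the "unclear input" rule, so the wizard
    cannot improvise a response that is not in the script.
    """
    rule = lookup.get(key.strip().lower(), lookup[fallback_key])
    return rule.response


LOOKUP = build_lookup(RULES)
```

Because every response is a fixed string, the participant sees identical wording for the same situation in every session, and the output contains no typing errors.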

The basic Wizard of Oz procedure involves the following steps:

Develop a simulated user interface for the target technology.
Develop a detailed test plan with the instructions for the facilitator, wizard, participants and other staff. Determine if you need to set any expectations about the simulation's "performance" so participants are prepared for sub-par performance.
Recruit users who meet the appropriate user profile, and try to cover the range of users within the target population.
Prepare realistic task scenarios for the evaluation.
Develop a procedure where the wizard can respond to input from a participant.
Train the wizard.
Design the instructions for the study so that the participant knows that they are working with an early prototype and that performance is not "optimized" yet.
Conduct pilot tests to refine the procedure and give the wizard some practice. Make any changes to the procedures and test plan.
Ensure recording facilities are available and functioning.
Conduct each session. The facilitator instructs the user to work through the allocated tasks interacting and responding to the system as appropriate.
Conduct a debriefing of the participants. Obtain feedback on the "performance of the wizard system". Tell the users about the wizard and explain why you couldn't tell them earlier.
Collate, analyze, and summarize the data from the study. Consider the themes and severity of the problems identified.
Summarize design implications and recommendations for improvements and feed them back to the design team. Video recordings can support this.
Where necessary refine the prototype and repeat the above process.
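
The collation step above, which considers the themes and severity of the problems identified, can be supported by a simple tally. The following sketch assumes each observation has been coded by hand as a (participant, theme, severity) tuple; the 1 to 4 severity scale (4 being most severe), the theme names, and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical coded observations: (participant, theme, severity 1-4).
OBSERVATIONS = [
    ("P1", "prompt wording", 2),
    ("P2", "prompt wording", 3),
    ("P2", "response delay", 1),
    ("P3", "prompt wording", 3),
    ("P3", "error recovery", 4),
]


def summarize(observations):
    """Return (theme, participants affected, worst severity) rows.

    Rows are ordered by worst severity, then by how many participants
    were affected, then alphabetically by theme.
    """
    participants = defaultdict(set)
    worst = defaultdict(int)
    for participant, theme, severity in observations:
        participants[theme].add(participant)
        worst[theme] = max(worst[theme], severity)
    rows = [
        (theme, len(participants[theme]), worst[theme])
        for theme in participants
    ]
    rows.sort(key=lambda row: (-row[2], -row[1], row[0]))
    return rows


for theme, affected, severity in summarize(OBSERVATIONS):
    print(f"{theme}: {affected} participant(s), worst severity {severity}")
```

Counting distinct participants rather than raw observations keeps one talkative participant from inflating a theme; whether severity or frequency should rank first is a judgment call for the study team.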

Special Considerations

Ethical and Legal Considerations

The Wizard of Oz method can involve a low level of deception: the participants are led to believe that they are using a working system rather than a simulation controlled by an expert, the wizard. According to the ASA Code of Ethics: "When deception is an integral feature of the design and conduct of research, (researchers) attempt to correct any misconception that research participants may have no later than at the conclusion of the research."

This view of research ethics is not universal in the field of usability testing, but it should be taken more seriously, especially when the usability test could lead participants to form mistaken expectations about the state of the technology.

Facts

Lifecycle: Design
