SIGCHI Conference Paper Format


How we make sense of instructions

David Kirsh
UCSD Dept of Cognitive Science
La Jolla, CA 92093-0515
[email protected]

ABSTRACT

Instructions such as manuals, recipes, game rules etc. pose a special challenge for sense makers because the reader must situate those instructions in a specific context of activity (debugging, cooking, game playing). We studied instructions for making origami and noted that instructions contained a variety of pointers that were not about folding but rather about such things as orienting the paper to ease folding. The instructions themselves, moreover, represented the actions to be performed – typically folding – in diverse ways. Finally we noted that subjects invariably perform a variety of non-folding actions that help them understand instructions rather than advance them in the pragmatics of folding. These include making gestures, asking for clarification, and registering paper and instruction. We also explored the utility of a general design principle for creating better-designed instructions and better-designed interactive artifacts. This principle states that the right information or cues should be in the right form, the right place, presented at the right time and at the right pace.

INTRODUCTION

Following instructions, whether illustrated, spoken or textual, is a learned activity that requires situating representations in a context of activity. Manuals, recipes, how to’s, instructions for filling in forms, for playing games, assembling furniture and equipment, are familiar examples of information rich resources. Making sense of such resources is frequently non trivial and involves a deep and interactive process of interpretation that in many respects showcases processes of sense making that apply outside the instruction domain.

Failure to make sense of instructions is closely allied with failure in usability. A well-designed VCR, so the story goes, should be easy for a user to program without instructions or at least with only minimal instruction. Perhaps so: guidance by cue instead of sentence is often easier – discursive cognition is not involved. But whether guidance comes in the form of language or nonlinguistically via cues, prompts and contextual feedback the question remains: how do people make sense of what they are being directed to do? How do they selectively attend, interpret and probe the abundance of data present in their environments and tie it to the instructions they are reading?

In this paper I will discuss some basic concepts related to situated understanding as these bear on a study we undertook to explore how people make sense of instructions for making origami. Our original motive in studying origami (and some other areas such as recipe use in cooking) was to look for principles that could guide elearning of practical skills and tasks. The promise of elearning is that one day it will be possible to deliver instruction in situ, where people work and play, both customized to the specifics of the context of activity and the idiosyncrasies and expertise of the learner. This is not about to happen until we understand a few of the fundamentals of how people make sense of what they are told, what they read, or see. Good elearning should reduce the sense making problem students encounter. It should be easier to make sense of the demands of task and procedure. This is not the same problem facing analysts who confront a huge onslaught of information. But it is an ecologically natural problem we tackle every time we are told to make sense of a manual or instruction package in order to get something done. Sometimes this means searching and reviewing elements in a corpus (manual), but more often it means interpreting what a statement or illustration means in an embodied, at hand manner.

A second and related motive in these studies was to explore how instructions can be designed to be more effective. An easy mnemonic of good design is that for a given activity A an artifact or environment E is well-designed if E supplies the:

• right information (cues, constraints, affordances, objects);
• in the right form;
• at the right time;
• in the right place;
• at the right pace.

Whether for learning, or just getting things done, we want environments that provide us with the information we need, where, when, and in the right form and at the right pace to help us reach our goals. Curiously, this is essentially the inverse of the sense making problem.

Here is an analogy to show why design and sense making are interrelated. Imagine someone who wants to build a laser-based display that paints realistic scenes directly on the retina to trick someone into thinking they are seeing a live scene. When well designed, each time a user’s eyes move, whether by saccade or voluntary action, the part of the scene that would ordinarily be observed is projected, making the two, fake and real scenes indistinguishable. Could such a device be designed without knowing how the visual system probes and then actively constructs scenes? I think not. The designers have to know too much about what the eye-brain will accept as natural. For instance, what aspects of a scene must be simultaneously present for peripheral vision? When there is rapid movement the periphery plays a more important role. Accordingly, the retinal display problem is the inverse of the visual scene problem: establish a coherent scene from well chosen but brief fragments. Solve one and you are close to solving the other because the same questions arise.

The same logic applies to instructions: if we discover how people make sense of procedural instructions, we should be able to design instructions more effectively, making it easier for them to interpret instructions in terms of their current situation. Before presenting our findings on origami let me press this point further by showing how an elaboration of the key design principle tells us something about sense making.

ELEMENTS OF THE KEY DESIGN PRINCIPLE

When is information in the right form?

Designers want to provide information that is relevant, easy to understand and quick to apply.
Sense makers want to determine what information is relevant, what it means and how it can be applied to specific concerns. From a theoretical point of view, a representation R is easier to understand if the information it encodes is more readily extractable given the knowledge and computational methods available to a user. The information is more explicitly encoded. The numeral ‘5’ is a more explicit encoding of the number 5 than the symbolic representation ‘$\sqrt[4]{625}$’ because it is computationally closer. A typical user needs to compute less to recover 5: the one is a pure associative retrieval of the meaning of ‘5’, the other requires transformation of the symbol ‘$\sqrt[4]{625}$’ until an associative retrieval can be made. The same applies to language, illustrations and other representations. For instance, the sentence ‘That is something up with which I will not put’ is a less explicit encoding of its content than ‘I won’t put up with that’. See [3,4].

By definition, users find it easier to make sense of information that is encoded more explicitly. This does not mean they can derive long distance implications more easily, but they can readily extract the core or local meaning which can then be used to facilitate long distance inferences. This applies to visualizations too. The better a visualization is, the more explicitly it encodes the information that users want. The salient elements are on the surface, readily grasped. If R is meant to convey some content C – the sense to be extracted – R should be designed to explicitly encode C. To determine how readily subjects can extract C from R it is necessary to know the cognitive processes they use to interpret R. This concern with finding the most efficient or explicit form information can be presented in (given a chosen medium) is one aspect of the sense-making problem. Any progress on this problem would set the groundwork for sense making aids that would help sense makers by presenting information in ways that facilitate their interpretation.

When is information presented at the right time?

Timing of information is important because cognition unfolds in time and so its trajectory is affected by the arrival time of new information. It is well known that a learning engine will discover a target function faster if it is fed a diet of examples in the ‘right’ order [1]. Feedback can be used to falsify or correct hypotheses. This idea was presaged by Vygotsky in his discussion of the zone of proximal development. He observed that learning and understanding proceed fastest when new ideas, challenges to previous views, etc., come in the right order. A student of math learns fastest when new problems are just within reach; if they are too hard they are unattainable. Sense making aids for open information spaces need heuristics to control when new information is to be revealed and when it is to be concealed. They can be further enhanced with heuristics that periodically test for confirmation biases, rigid framing, and other decision biases.

When is information presented at the right pace?

Pace matters because information will be wasted or misused if its speed or volume is too great. A presentation that exceeds the speed limit, or with too much content, will fail to be fully understood; music that is played too quickly will lose its intended musical meaning. Acceptable pace, however, varies with topic and user, since expertise alters perceived bandwidth – as the familiar accounts of chess perception have shown. One person’s 100 bit structure is another’s 5 bit structure. Effective sense making involves prudent scheduling based on an analysis of how hard information elements are to understand and integrate. Since there are both objective and personal factors driving comprehension and integration rates, heuristics for pace will be hard to set automatically.
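To make the scheduling idea concrete, here is a minimal sketch of a pace heuristic. It is an illustration only, not a method used in the study: the names `Element`, `schedule`, `chunking` and `bits_per_second` are hypothetical, and the difficulty figures are invented. It assumes each information element has an estimated cost in bits for a novice, that expertise scales this cost down through chunking (so a 100 bit structure can become a 5 bit one), and that a reader absorbs bits at a fixed rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    label: str
    novice_bits: float  # assumed difficulty estimate for an unpractised reader


def schedule(elements, chunking, bits_per_second):
    """Return (label, start_time, dwell_time) for each element, in order.

    chunking is the fraction of novice_bits the reader still has to process:
    1.0 for a novice, 0.05 for an expert who sees 100 bits as 5.
    """
    if not 0 < chunking <= 1:
        raise ValueError("chunking must be in (0, 1]")
    if bits_per_second <= 0:
        raise ValueError("bits_per_second must be positive")
    clock = 0.0
    plan = []
    for element in elements:
        # Dwell time grows with perceived difficulty and shrinks with bandwidth.
        dwell = element.novice_bits * chunking / bits_per_second
        plan.append((element.label, clock, dwell))
        clock += dwell
    return plan


# Invented difficulty estimates, for illustration only.
steps = [
    Element("valley fold", 20.0),
    Element("flip paper", 5.0),
    Element("squash fold", 100.0),
]
novice_plan = schedule(steps, chunking=1.0, bits_per_second=5.0)
expert_plan = schedule(steps, chunking=0.05, bits_per_second=5.0)
```

The same instruction set yields very different presentation schedules for the two readers. Both parameters would have to be estimated per person and per topic, which is one way of seeing why such heuristics are hard to set automatically.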

When is information presented at the right place?

Place matters because information is easier to use if it is ‘at hand’. Labels for buttons should be on or close to the buttons they label. Instructions for use should be on or close to where they will be acted on. There is a cost to looking away from one’s current visual focus to retrieve the next piece of data. The search process is a distraction. Who wants to look in a manual to find how to modify something? Some of this cost can be corrected for by reducing the time or cognitive effort associated with search. Nonetheless, a well-designed environment will place information where it will be of most use. In sense making, subjects need to be mindful of how they arrange the display of resources. This, in itself, is a topic of inquiry. But whether for design purposes or to facilitate sense making, information should be where we want it, when we want it, in the form we want it and at the pace we want it.

Now to the origami study.

MAKING SENSE OF ORIGAMI INSTRUCTIONS

The core insight – and a bias coming into the study – is that people do many more things when they perform a task than just task-advancing actions narrowly construed [5]. When following instructions we predicted and observed a variety of non task-advancing actions, many of which were concerned with making sense of the instructions. If someone is given directions in a mall, for instance, they need to tie the directions to landmarks or cues. ‘Turn right at Macy’s, go past the cell phone kiosk, and when you see the William’s and Sonoma store look for a stairwell and go down one flight.’ They anchor the directions to elements in the world [2]. The concreteness of directions tends to make them more memorable and easier to follow than more abstract or general instructions – ‘turn right 50 yds from here, go more or less straight for 200 yds then take the first stairs you see, down to level B’. The tying of instructions to attributes, structures or objects in the environment is part of the process of situating instructions. It is invariably non-trivial.

The problem of anchoring instructions is usually easier than the general sense-making problem because the instruction set is circumscribed (it is a closed set) and the domain it maps to is known or mostly known. There are, however, aspects of the problem that defy this simple view and lead to conclusions that are more general.

The first issue is about the domain elements explicitly referred to in the instruction set. In origami, is it the structure or the activity that is referenced? It would be natural to suppose that instructions are about actions: do this, do that. But in fact in surveying dozens of origami instruction sets we found a split: some instructions defined actions by showing origami shapes before the action and after it, while others explicitly represented the action to be performed. See figure 2a, an instruction for a simple valley fold using before and after images. Here the structures are explicit and the action implicit. In figure 2c, the actions are explicitly marked in a before image showing arrows where the folding action is to be performed.

The second issue concerns the range of actions referred to. It would be natural, once again, to suppose that instructions would identify task-advancing actions. In fact we found a variety of actions unconnected to folding. Sometimes the instructions advised the user to flip the paper over, blow into it or pull on it. Because these are non-folding (or unfolding) actions, they do not designate necessary steps in building the structure. But such actions may make it easier to fold or easier to write instructions for folding. An extreme case of instructions that designate actions not directly connected with advancing the task is found in most IKEA instruction packets for assembling furniture. Inevitably, such instructions begin by asking the user to spread out all the parts, then to lay them out on the ground in a manner that encodes the assembly plan. This intelligent use of space [6] prepares both agent and environment. It reduces the need to consult the instructions because when well done the next step can be inferred from the layout of the parts. See figure 1. It is a form of preparation or information structuring of the environment.

Figure 1. Laying out boards in this pattern is the first step in the instructions in assembling an IKEA desk. The parts are encoded in a 2D representation that allows the user to infer the 3D structure by a simple transformation. The topology is preserved. Laying out in this way is a form of preparation or information structuring.

The third and perhaps most interesting issue concerns what people do when trying to make sense of instructions. This is part of the activity of origami making, yet rarely discussed. The function of many of these actions is to help users interpret instructions. In our 20 hours of origami observations we noted a variety of registration actions, gestures, partial try-outs, shrugging, muttering, asking advice or clarification, looking up terms, and more. At first, these might be thought to be epiphenomena, not actually part of the origami activity – a view partly justified by our observation that better players perform fewer of these actions than novices. But when better players are challenged or given complex instructions these ‘superfluous’ actions recur, suggesting that they are part of the interpretation process, omitted once practice leads to chunking and more automatic behavior.

Though each of these actions deserves analysis – especially if we hope to develop principles for personalizing instructions – I close with discussion of a single type: registration.

Figure 2 panels: a. Implicit fold; b. Explicit non-fold; c. Explicit fold + shape word; d. Intermediate stages.

Figure 2. We see here a few illustrations that show some of the different ways instructions are given for origami. 2a illustrates a fold by showing before and after structures; we call this an implicit representation of a fold because it does not directly refer to an action. 2b is an explicit non-fold, since it has an explicit symbol for flipping the paper over. 2c is an explicit representation of a fold using a symbol for folding and language to describe the fold. Many explicit folds are shown without language. The final figure, 2d, is an implicit representation of a fold with intermediate stages shown.

REGISTRATION

Registration refers to the process of aligning a representation of the world with the physical world. If the system using the representation is appropriately registered with the physical world then inferences based on the representation provide reliable directives for actions in the world. There is a surprising range of actions that people perform to a) bring themselves into alignment with their task environments, b) maintain their alignment, and c) test that they are correctly aligned.

In the case of maps, people may start by tying a few symbolic features of a map – a building corner, a kiosk, street name or intersection – with landmarks they see. This is the initial process of alignment. But it is piecemeal. It must be repeated for other features. Accordingly, people will often use those easy correspondences as anchors and re-orient their map completely, thereby making it easier to maintain those correspondences, and easier to interpret new correspondences. To make sure the re-orientation is right – the testing phase – they typically check a few other symbolic features or shapes to see that the relational structure on the map mirrors the relational structure in the world. Pointing is another way of establishing correspondence, but it does not create the kind of holistic alignment that re-orientation does.

This highlights why wholesale registration is so powerful: it reduces the cost of translating between the outside world and the inside representation by bringing the two systems into alignment. The analogy is with two representational systems that can express the same truths. One can either show that for any given statement $S_{1i}$ true in one system $L_1$ there is an equivalent statement $S_{2j}$ true in $L_2$, or one can prove of an axiomatization of $L_1$ that there is an equivalent axiomatization in $L_2$, thereby proving at once that anything true in $L_1$ is also true in $L_2$. Wholesale registration is this sort of global realignment. In maps, it is easy to achieve; in instructions, far less so. It is a core sense making process, that of situating an abstract representation in context.

CONCLUSION

I discussed two aspects of sense making: 1) design guidelines for making instructions and other media more readily understandable and usable, and 2) types of sense making activity we found in a controlled study of college students following origami instructions. It was argued that searching for principles that guide designers in their effort to create easily understood instruction sets is one route into the general problem of sense making: both designer and sense making theorist want to understand what makes one type of representation – whether text, video, illustration, or image – easier or harder to understand. Five factors were discussed: right (content or cues, form, timing, place, pace). Beyond these factors another element at play in sense making, at least of instruction sets, is how easy or hard it is for subjects to contextualize instructions in terms of their current activity. One might expect that subjects who successfully follow instructions would read them and then do them, step by step. But we found in our study of origami that subjects perform a range of comprehension specific actions that help them to make sense of instructions. In addition to following instructions they do things to help them situate them.

REFERENCES

1. Elman, J.L. Finding structure in time. Cognitive Science, 14:179-211, 1990.

2. Fauconnier, G. & Turner, M. The Way We Think: Conceptual Blending and the Mind’s Hidden Complexities. Basic Books, 2002.

3. Kirsh, D. When is Information Explicitly Represented? The Vancouver Studies in Cognitive Science, pp. 340-365, 1990. Re-issued Oxford University Press, 1992.

4. Kirsh, D. Implicit and Explicit Representation. Encyclopedia of Cognitive Science, 2003.

5. Kirsh, D. Multi-tasking and Cost Structure: Implications for Design. In Proceedings of the Twenty-seventh Annual Conference of the Cognitive Science Society, B. G. Bara, L. Barsalou, & M. Bucciarelli (eds.). Mahwah, NJ: L. Erlbaum Associates, 2005.

6. Kirsh, D. The Intelligent Use of Space. Artificial Intelligence, 73(1-2):31-68, 1995.