How Interaction Improves Sense Making

David Kirsh
Dept of Cognitive Science, UCSD
[email protected]

They interactively coordinate their projections – their simulation of what if’s – with the board as they find it. Why does the board help them project? How?

ABSTRACT

How, when and why do people interact with their environment when making sense of situations, diagrams, illustrations, and problems? To study this I looked at how people project structure onto geometric drawings, visual proofs, and games like tic tac toe. I also analyzed why people move scrabble tiles in word generation tasks. Projection is a special capacity, similar to perception, but freer. Projection requires external structure to anchor it, and it can be improved by providing additional structure. Much of our interactivity during sense making and problem solving involves a cycle of projecting then creating structure. This is highly sensitive to the differential costs of doing things in the head vs. in the world. I explored this cost factor through a study of interactivity in scrabble.


The theme unifying these two questions is my real concern: how, when and why do people interact with their environment when making sense of situations and solving problems. I present a truncated account of what I believe is a key – perhaps the key – interactive process in reasoning and sense making: the project-create-project cycle. I believe that this cycle lies at the heart of much sense making, especially problem oriented sense making. It lies, as well, at the heart of most planning and tangible reasoning. The analysis is fine grained.

In videographic studies of people understanding such things as illustrations, models and diagrams we found that subjects typically find ways of interacting with at-hand tools and resources – often in creative ways – to help them make sense of those targets. Sometimes these sense-making actions are as simple as gesturing or pointing with hand, body or instrument, muttering while looking, marking or note taking, or shifting the orientation of the target. Sometimes they involve talking with others. When tools are placed near subjects – manipulable things such as rulers, pencils, and physical parts of models – we found subjects regularly using those as ‘things to think with’. They use those resources to create or augment local structure to facilitate projection and mental experimentation.

This is the heart of the project-create-project cycle: use what is perceived to help you do what you can in your head (namely, try to understand things by projecting possibilities or somehow augmenting what you see), then externalize part of that mentally projected structure so that you free up cognitive resources. This process of externalization simultaneously changes the stimulus to make it easier to project even deeper. If tools make it easier to externalize what you are thinking, then tools are used. This cycle of projecting, externalizing, then projecting again continues as long as subjects stay focused, though as with any exploratory or epistemic process a subject may soon loop, get stuck or run out of novel projections. Let me define the terms.

Author Keywords

Projection, interactivity, sense making, visual thinking.

ACM Classification Keywords

INTRODUCTION

I ask two questions that underpin the activity of making sense of situations, diagrams, illustrations, and problems:

1. Why does moving scrabble tiles turn out to be a good strategy, typically yielding better performance in a word generation task than leaving tiles in place? Scrabble, like chess, is a search problem. But unlike chess, where only novices benefit from moving pieces, scrabble is a game where interactivity during the search process seems to help. Why?

2. Why does staring at a chess board or a tic tac toe board or a geometric proof typically yield better performance than looking at the board or proof and then closing one’s eyes and just thinking? When subjects consider possible moves in a chess game, one popular account is that they are searching a problem space; they are exploring a purely mental representation of the game’s current state, as well as the possible actions available and their consequences. This way of speaking leaves unexplained the relation between the physical board that is perceived and the mental process of searching an internal representation. The two might be uncoupled. Studies have shown that, in fact, masters rarely need the cognitive support provided by a physical board. They can do all the work in their heads. So a purely mental representation seems apt for them. But less expert players do benefit from a board’s presence.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI 2009, April 4–9, 2009, Boston, MA, USA. Copyright 2009 ACM 978-1-60558-246-7/08/04…$5.00


Projection is a way of ‘seeing’ something extra in the thing present. It is a way of augmenting the observed thing. In contrast to perception, which is concerned with seeing what is present, projection is concerned with seeing what is not present but might be. In Figure 1 two rather different illustrations are displayed. The first, a cartoon (Figure 1a), requires subjects to interpret the symbolic meaning of the key elements. The image must be recast as a ‘keyframe’ in a narrative invented by the reader, in this case, a narrative of retirees watching helplessly as their pension money is lost forever to inflation. In Figure 1b we see a geometric illustration that is meant to help a subject prove implicit truths about that structure: that the triangle is equilateral and that the line extending from A through D will bisect BC. Both illustrations involve projection of meaning, though the geometric problem also requires projection of constructions. I will be concerned in this paper with the second type of sense making: a form of visual projection onto existing structure that typically results in adding new structure to the target.

Figure 1. Two illustrations: 1a is a narrative illustration requiring the viewer to make a sensible story out of the image. 1b is a geometric figure with constructions described in the givens of the problem. Subjects use the diagram to draw inferences and make allowable augmentations (new constructions). These typically arise initially in the form of projections.

Projection is closely related to perception and imagination. The three lie on a continuum of stimulus dependence. Perception is strongly dependent on the physical stimulus it is about. You cannot see what is not there. Projection is dependent on present stimuli but much less so. You can project structures or dynamic elements onto external things. But there has to be something there to project onto, something to anchor them. Imagination, by contrast, is completely stimulus independent. It is a flight of fancy unanchored to the structure or details of the current situation.

Externalization is a way of taking information or mental structure generated by an agent and transforming it into epistemically useful structure in the environment. Externalizations are everywhere: annotations, notes, constructions in geometry, gestures, utterances, encoding order in layout, etc. Often the action of externalizing alters the information or projection in useful ways. This is a key factor in thinking with things, in knowing what you are thinking by seeing what you are saying, and so on. Externalizations may leave persistent traces, as in annotations or rearrangements, or they may be present only during the externalization process, as when someone gestures or talks while thinking. Moreover, although externalizations always serve an epistemic function, they may have pragmatic consequences too. A chess move is at once an externalization of an inner projection and a move in the game. The final thing to note is that there are other actions that change the environment in epistemically useful ways that are not externalizations: registration of maps, turning on the news channel, etc. These are actions that alter the epistemic landscape of activity but they do not bear the right relation to internal activity to qualify as externalization.

WHY DOES EXTERNAL STRUCTURE HELP PERFORMANCE?

To answer the question of why staring at a tic tac toe board usually facilitates performance we ran a few simple experiments, video’ed of course, to see how behavior and cognitive strategy differ when a board is present from when it is not. We used a 3 by 3 tic tac toe board first, then we scaled the game to a 4 by 4 board.

Procedure: Moves were identified by calling out numbers 1-9, which corresponded to cells on the 3 by 3 board. Subjects were given an initial training period during which they mastered the translation of number to position on the board. Both the subject and the opponent identified moves by calling out the number corresponding to their chosen move.
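The call-out procedure can be made concrete with a small sketch. The paper does not say how the numbers were laid out on the board, so the row-major numbering (left to right, top to bottom), the convention that X moves first, and the function names `cell_to_position`, `winning_lines` and `winner` are all assumptions made here for illustration. The same code covers the 3 by 3 and the 4 by 4 game by changing `size`.

```python
from __future__ import annotations

from typing import Iterable, Optional


def cell_to_position(cell: int, size: int = 3) -> tuple[int, int]:
    """Translate a called-out cell number (1..size*size) to a zero-based (row, col).

    Assumes row-major numbering, which the paper does not specify.
    """
    if not 1 <= cell <= size * size:
        raise ValueError(f"cell must be in 1..{size * size}, got {cell}")
    return divmod(cell - 1, size)


def winning_lines(size: int = 3) -> list[frozenset[int]]:
    """All full rows, columns and the two diagonals, as sets of cell numbers."""
    rows = [frozenset(r * size + c + 1 for c in range(size)) for r in range(size)]
    cols = [frozenset(r * size + c + 1 for r in range(size)) for c in range(size)]
    diag = frozenset(i * size + i + 1 for i in range(size))
    anti = frozenset(i * size + (size - 1 - i) + 1 for i in range(size))
    return rows + cols + [diag, anti]


def winner(called: Iterable[int], size: int = 3) -> Optional[str]:
    """Replay alternating call-outs (X first, by assumption) and report the winner."""
    marks: dict[str, set[int]] = {"X": set(), "O": set()}
    lines = winning_lines(size)
    for turn, cell in enumerate(called):
        cell_to_position(cell, size)  # validates the called number
        player = "X" if turn % 2 == 0 else "O"
        marks[player].add(cell)
        if any(line <= marks[player] for line in lines):
            return player
    return None
```

Nothing in this sketch holds the board visually; the whole game state is the list of called numbers, which is roughly the situation a subject faces in the blank page condition.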

Table 1. Performance was best when subjects were able to project onto a blank tic tac toe board rather than imagining or projecting onto a blank sheet of paper. A table with an X and O was better for some than the simple table condition, but not for most subjects. The three conditions are shown on the right. These are pilot data with an N of 4.

Results. As shown in Table 1, the time to make a move was shortest in the table condition, significantly shorter (p=.04) than with a blank page. The time to move when a table with an X and O was in front of a subject was also better than in the no table condition (i.e. the blank page), though less significantly so. The value of the X and O varied among subjects, some finding it useful at first, though most eventually found it a distraction. There were more memory errors (incorrect recollection of placements) in the no table condition, but not significantly so.

Why should a blank table help? It may seem obvious that a blank table would be an aid to memory, since without a table an agent would have to remember the structure of the table as well as the values in cells. So having a table in perceptible form ought to reduce memory load. Unfortunately not all subjects found the table helpful and the explanation itself is not as straightforward as it seems. The number of states in a game is neither reduced nor increased by the lines of a board that define the boundaries of the cells. A 3 by 3 matrix is an abstract structure that is not made simpler by providing a visual representation of the matrix. Thus, the three different data structures:

• ((x, o, x), (x, o, x), (o, x, o))
• (x, o, x, x, o, x, o, x, o)
• the visible X and O matrix

are all informationally equivalent. Why prefer the visible one? If we had a good answer to this question we would be well on the way to a theory of visual thinking! A quick answer, though, is that projecting information or structure S onto an external structure or object O is much like perceiving O with S, the additional structure, already there. Since perception is a powerful process distinct from working memory (WM), any way of pushing informational load into perception is a way of reducing the burden on WM. To defend this view obviously requires more space than I have. But I suggest that projection is continuous with perception because perception already contains a component of ‘seeing the future’. For instance, when we see an object we do not literally see the back of the object, but knowing the object has a back is part of our perceptual experience, as phenomenologists have pointed out. This way of thinking has been articulated at a philosophical level by O’Regan and Noë and is sometimes called ‘enactive’ perception. Before it can be entertained as a serious theory it needs articulation in a more mechanistic computational form.

But again, a short answer can be given if we treat perception as a process that involves instantiating a sensation (however it does that), and simultaneously activating or priming a constellation of related sensations: the ones we would experience if we were to move to the right or left, manipulate the object, saccade to the top, and so on. Projection is like perception understood in this special way because it is a process of increasing the priming level of some of the things we would see if we were to act in certain ways. That is, because we are able to modify object O by adding S to it, we have a weakly primed version of O as modified by S already in mind. If we had a strong intention to add S the perceptual outcome O+S would become more strongly primed. Projection is a way of increasing the level of that weakly primed possible world without having the intention to change the world yet. It lets us entertain what the world would be like if we did act to make it so.

The 3 by 3 version of tic tac toe confirmed our conjecture that the table condition would help. But it was not true for people who are powerful imagers. They found the table to be a distraction; they did best with a blank page. But everyone has a visualization limit. To test whether there is a threshold at which the urge to use structure and also to externalize becomes overwhelming, even for good imagers, we ran a second experiment in which we scaled up tic tac toe to a 4 by 4 board. One might think that playing this game without marks on the board to hold state would be nearly impossible, but in fact, humans are ingenious at finding ways of overcoming internal state limitations. They invent methods of reducing the overall cost of performing a task, especially when the alternative is failure.

Procedure: Cells are identified with numbers 1-16 and learned beforehand. The goal of this enlarged game of tic tac toe is to be the first to get four in a row.

Results: Unlike the 3 by 3, all subjects in the 4 by 4 reported that external supports were helpful. But their empirical performance did not always confirm this claim. As shown in Table 2, overall performance in the table condition was significantly better than in the others. But we observed considerable variance in table performance among some subjects. During the debriefing interview several reported that they used a different strategy for the table condition than the one they used for the blank page condition. When they had no table present they felt overwhelmed and invariably played defensively, using a strategy of blocking the opponent as quickly as possible. In the table condition, however, these strong visualizers initially believed they could retain enough state in mind to play offensively as well. This was overly optimistic and they usually made several wrong moves and lost. After a few such losses they shifted to a defensive strategy and the table condition soon became their best condition.
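The claim in this section that a nested tuple, a flat tuple and a visible X and O matrix are informationally equivalent can be checked mechanically: each form converts to the others without loss. The sketch below is illustrative only. The function names are hypothetical, and the "visible" matrix is approximated by a plain text rendering, which is an assumption; a printed grid is of course not the same thing as a board in front of a subject.

```python
from __future__ import annotations

Nested = tuple[tuple[str, ...], ...]
Flat = tuple[str, ...]


def flatten(nested: Nested) -> Flat:
    """Nested rows -> one flat sequence, read row by row."""
    return tuple(cell for row in nested for cell in row)


def nest(flat: Flat, size: int) -> Nested:
    """Flat sequence -> nested rows of the given width."""
    if len(flat) != size * size:
        raise ValueError("flat board does not match the requested size")
    return tuple(tuple(flat[r * size:(r + 1) * size]) for r in range(size))


def render(nested: Nested) -> str:
    """Nested rows -> a text stand-in for the visible matrix."""
    return "\n".join(" ".join(row) for row in nested)


def parse(text: str) -> Nested:
    """Text matrix -> nested rows (inverse of render)."""
    return tuple(tuple(line.split()) for line in text.splitlines())


NESTED: Nested = (("x", "o", "x"), ("x", "o", "x"), ("o", "x", "o"))
FLAT: Flat = ("x", "o", "x", "x", "o", "x", "o", "x", "o")

# Every conversion round-trips, so no encoding carries more information.
assert flatten(NESTED) == FLAT
assert nest(FLAT, 3) == NESTED
assert parse(render(NESTED)) == NESTED
```

The round trips succeed, which is exactly why equivalence of information cannot explain the preference for the visible form. Any difference has to come from the cost of using each form, not from what it contains.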

Table 2. In the 4 by 4 game, having a table to work with becomes more valuable, especially for good visualizers.


In the 4 by 4 condition we observed more gesture and manual gymnastics than in the 3 by 3. It was apparent that the more difficulty a subject had with the tic tac toe task, even with the help of the table, the more likely they were to externalize state information to help them out. When a table was present they could be far more effective in finding a coding scheme. For example, subject M found a clever way of placing his fingers on the cells and the lines between the cells in the 4 by 4 table to encode more than 10 cells’ worth of information. Obviously he would have had far more difficulty encoding this information without the table there to ‘lean’ on, since he would have had to project a visual structure ‘under’ his fingers on the page. There is much more to be said here concerning the nature of coding with hands and gesture and the timing of these interactions. But it may be more worthwhile to tie this study back to the question of how people make sense of diagrams such as visual proofs and illustrations of mechanical systems.

WHY DOES INTERACTION IMPROVE PERFORMANCE?

To study why interaction, especially interaction that is tied to externalization, improves performance, we have been running a few rather different experiments: one on word generation, scrabble style, another on how people play with physical models, such as levers, to arrive at an understanding of how they work, and a third on tangrams. I will briefly discuss the scrabble experiment. In this experiment subjects were shown 7 scrabble tiles in a Flash implementation and asked to call out words with 3 or more letters. The conditions were:

1. No hands: subject cannot touch, point at or move the tiles.

2. Spacebar: hitting the spacebar causes a random reordering of the tiles.

3. Hands: subject can freely move the tiles on a 2D surface.
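The difference between the Spacebar and Hands conditions can be sketched in a few lines. This is not the Flash implementation used in the study; the function names `spacebar_shuffle` and `move_tile` are hypothetical, and the rack is simplified to a one-dimensional row even though the Hands condition allowed movement on a 2D surface.

```python
import random
from typing import List, Optional


def spacebar_shuffle(tiles: List[str], rng: Optional[random.Random] = None) -> List[str]:
    """Spacebar condition: one key press yields a random reordering.

    The subject makes no decision about where any tile goes.
    """
    rng = rng or random.Random()
    reordered = list(tiles)  # copy so the caller's rack is untouched
    rng.shuffle(reordered)
    return reordered


def move_tile(tiles: List[str], src: int, dst: int) -> List[str]:
    """Hands condition (simplified to a row): pick up one tile and place it elsewhere.

    The subject must choose both the tile and its destination.
    """
    reordered = list(tiles)
    tile = reordered.pop(src)
    reordered.insert(dst, tile)
    return reordered


rack = list("NATGRES")
print("".join(spacebar_shuffle(rack, random.Random(0))))  # some permutation of the rack
print("".join(move_tile(rack, 3, 0)))                     # moves the G to the front
```

The contrast to notice is in the arguments: `move_tile` needs a source and a destination, which stand for the planning a subject has to do, while `spacebar_shuffle` needs nothing beyond the key press.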

The spacebar condition was added because we wanted to test whether removing the cognitive effort associated with thinking about how to rearrange the tiles and reducing the physical cost of manipulating them would result in increased tile movement. We had two empirical conjectures, 1 and 2, and a third, deeper claim:

1. Performance: Spacebar > Hands > No Hands.

2. There will be more occasions of tile shuffling in Spacebar than in Hands.

3. At a deeper level, self-cueing and opportunism are the primary advantages of moving tiles.

Figure 2a.

Figure 2b.

Consider the visual proof, shown in Figure 2a, of a sum that converges to 1. As subjects immerse themselves in the proof they typically recreate a progression of cuts. They see that the operation of halving a square whose sides are 1 by 1, and then halving the remainder (whether it be a square or a rectangle), is a recursive process that will never yield a structure larger than the original square. This quasi-simulation of cutting is a form of projection. This is further evidence that without a theory of projection we cannot understand visual thinking.

Turning to Figure 2b, look at the figure and ask yourself whether more force or less force is required to lift the load when the fulcrum is moved closer to the foot. How did you find the answer? By mentally moving the fulcrum and then simulating the consequences for the foot? This projected animation cannot be perception because, presumably, perception requires that what you see is, in fact, there to be seen. You cannot see the future. Yet there is something perception-like in this projected imagery even if it is not nearly as vivid as perception or even a mental image. Again projection seems to lie at the heart of our sense making here. Can we hope for a deep theory of sense making without a good theory of projection?
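The progression of cuts that subjects recreate for Figure 2a can be written out as a short derivation. This is a sketch under the assumption that each cut removes exactly half of what remains of the unit square; the symbols $a_k$ (area of the $k$-th piece), $S_n$ (area cut so far) and $R_n$ (area remaining) are introduced here for illustration and do not appear in the original.

```latex
\begin{align*}
a_k &= \frac{1}{2^{k}}, \qquad k = 1, 2, 3, \dots \\
S_n &= \sum_{k=1}^{n} a_k = \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^{n}} = 1 - \frac{1}{2^{n}}, \\
R_n &= 1 - S_n = \frac{1}{2^{n}}, \\
\lim_{n \to \infty} S_n &= 1 \quad \text{since } R_n \to 0 .
\end{align*}
```

Because $R_n$ is always positive, $S_n$ never exceeds 1, which is the algebraic counterpart of seeing that the pieces never outgrow the original square.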

Table 3.

As shown in Table 3, quantitative results confirmed our empirical predictions (i.e. 1 and 2). However the story is more complicated for the third claim. To determine if opportunism and self-cueing are the driving factors we need to know more about the relative costs of doing something in the physical world versus doing it solely in mind. We need a cost metric that will let us predict whether someone will perform an action in the world when they might just as well do the same thing exclusively in their head. Accordingly, we spent most of our effort studying the time course of movements using a form of statistical analysis based on phonotactics. Phonotactics is the study of the probability of word generation based on assumptions about phonemic and lexical frequency. To these underlying frequency measures we added a third metric: permutational cost. The words that come to mind should be related to phonemic and lexical costs, but the words or letter sequences that an agent creates through movement should be related as well to permutational cost.

The basic idea is that the cost structure of operating on letters in the world is different from the cost structure of operating on letters in the head. In the world, actions that change letter order operate on letters. But in the head, mental actions operate on lexical and phonemic representations as well as perhaps on letter ordering through some sort of projection or imagination. To be sure, one can visually project letter reorganization on what is outside, but equally one can mentally operate on the sequence of sounds the letters can make, either by externalizing sounds or through inner speech. And one can also operate on the lexical elements: the bigrams, trigrams, tetragrams and words that are internally represented somehow. Since mental letter shifting qua letter permutation is a form of projection, the external structure will affect the difficulty of that projection. Change the letter sequence that is visible on the outside and the cost of projection (the cost of searching different parts of the search space) should also change. The phonemic and lexical implications of this outside change will ramify through the internal system. The reason this is potentially explanatory of interactive behavior is that the cost of shifting letters outside is quite different from the cost of shifting letters mentally.

Figure 3.

As shown in Figure 3, a phonemic measure defines the closeness of different letter sequences in a phonemic attractor landscape based on sound closeness and sound frequencies. ‘Gnat’ and ‘nat’ are very close phonologically (i.e. phonemically). So a person sounding out words might strike upon ‘gnat’ after saying ‘nat’. A lexical measure defines closeness in a lexical landscape based on bigram, trigram and word frequencies and letter closeness. Because ‘gn’ is rarely the first two letters of a word, someone thinking in terms of letter clusters is unlikely to try discovering words that begin with ‘gn’ or that combine ‘gn’ at the front with other clusters. The permutational distance measures the number of tile movements, or the physical effort or time involved in movement. This distance is not great between ‘nat’ and ‘gnat’ since they are just one letter apart. So someone who is a compulsive letter mover might move a ‘g’ to the front of ‘nat’. Given that these three metrics generate very different distances for letter sequences, what is close in one may be far in another. Note how the situation is different for ‘rat’ and ‘art’. Permutationally they are close, but phonemically they are fairly far apart. It is a good thing ‘ra’ and ‘ar’ are both high frequency bigrams. If a subject is shown ‘rat’ they will soon find ‘art’ and vice versa.
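The permutational measure is described only informally, as the number of tile movements. One way to make the ‘nat’/‘gnat’ and ‘rat’/‘art’ examples concrete is to count the tiles that must be picked up and placed to turn the visible sequence into the target, which equals the target length minus the longest common subsequence. This formalization, and the names `lcs_length` and `permutational_distance`, are assumptions made for illustration, not the metric used in the study.

```python
from functools import lru_cache


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Tiles on this subsequence can stay where they are, in order.
    """
    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + go(i + 1, j + 1)
        return max(go(i + 1, j), go(i, j + 1))

    return go(0, 0)


def permutational_distance(shown: str, target: str) -> int:
    """Hypothetical tile-move count: tiles of the target not already in place order."""
    return len(target) - lcs_length(shown, target)


for shown, target in [("nat", "gnat"), ("rat", "art"), ("gnat", "tang")]:
    print(shown, "->", target, permutational_distance(shown, target))
```

Under this count both pairs from the text are one move apart, while a full reversal such as ‘gnat’ to ‘tang’ takes three. The count ignores how far each tile travels, so it captures only the number of movements and not the physical effort or time.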

Proof that the inside and outside are subject to different permutation costs comes from looking at the marginal cost of moving letters. In the physical world the marginal cost of moving the fifth letter in a five-letter shift is no greater than the marginal cost of moving the fourth letter in a four-letter shift. Moving the fifth letter takes about as long as moving the fourth letter. The critical factor is spatial: how far the letter has to be moved. By contrast, the marginal cost of mentally projecting new sequences tends to vary with the number of changes in the sequence. People tend to search the space of bigrams, trigrams, tetragrams and words that are lexically and phonemically close. This is certainly not always the case. It is easy to jump from one letter sequence to another that is far away in lexical or phonemic space provided the new word is a high frequency word. But distance is a clear explanatory factor. In general, it is harder to systematically explore letter sequences that are farther from the currently visible sequence than to explore sequences that are close. See Figure 3.

The upshot is that we would predict that an agent might stumble on sequences by simple physical rearrangement that would be hard to find mentally. Mental rearrangements will follow least energy paths in a lexical or phonological landscape. But least energy paths in the physical world will tend to be quite different. The result will be both self-cueing and opportunistic discovery of new words. And as the cost of shifting letters declines, as it does in the spacebar condition, the number of new words should go up because more parts of the search space are being exposed at low cost. This once again reaffirms the core design principle that reshaping the cost structure of activity, in this case reducing the cost of rearranging letters, can have a major impact on apparent creativity and task performance.
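The idea that cheap rearrangement exposes more of the search space can be illustrated with a toy model. Here a word counts as "exposed" only when it appears as a contiguous run in the visible arrangement, which is a deliberately crude stand-in for self-cueing and says nothing about what a subject would actually notice. The lexicon is a hypothetical five-word list, not data from the study, and the function names are illustrative.

```python
from __future__ import annotations

import random


def words_exposed(arrangement: str, lexicon: set[str], min_len: int = 3) -> set[str]:
    """Lexicon words that appear as contiguous runs of at least min_len tiles."""
    found: set[str] = set()
    n = len(arrangement)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            run = arrangement[start:end]
            if run in lexicon:
                found.add(run)
    return found


def exposed_over_shuffles(tiles: str, lexicon: set[str], presses: int,
                          rng: random.Random) -> set[str]:
    """Union of words exposed by the starting rack and by each random reordering."""
    current = list(tiles)
    seen = words_exposed("".join(current), lexicon)
    for _ in range(presses):
        rng.shuffle(current)  # one low-cost reordering, as in the spacebar condition
        seen |= words_exposed("".join(current), lexicon)
    return seen


TOY_LEXICON = {"gnat", "nat", "rat", "art", "tan"}  # hypothetical word list

print(words_exposed("gnatres", TOY_LEXICON))
print(exposed_over_shuffles("gnatres", TOY_LEXICON, presses=50, rng=random.Random(1)))
```

Because the result is a running union, it can only grow as `presses` increases, which mirrors the prediction that lowering the cost of rearrangement should raise the number of words found.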
