Contrasting Neurocomputational Models of Value-based Decision Making

Konstantinos Tsetsos

Supervisor: Marius Usher

Master by Research
School of Psychology, Birkbeck College, University of London
2008

Abstract

Decision theorists have long relied on static, heuristic-based theories of choice to explain paradoxical instances of human value-based decision behaviour. Recently, a new approach to modelling choice behaviour, the dynamical neurocomputational framework, has gained recognition owing to its descriptive and explanatory superiority over traditional theories of decision making. Neurocomputational models strive to establish detailed links between biology and cognition in a way that is consistent with established principles of neural information processing. Nevertheless, building theories within the same framework does not always imply convergence on a common set of principles, and indeed the class of neurocomputational theories of value-based choice shows significant variability in the assumptions on which each model is based. The main aim of this project is to qualitatively discriminate between two prominent dynamical theories of multi-attribute choice: the Leaky Competing Accumulators model and Decision Field Theory. While both theories simultaneously explain a class of decision anomalies known as contextual preference reversals, they do so by drawing on diverging principles. The discrimination between the two theories will be based on computational explorations which evaluate each theory's mechanistic assumptions through the predictions they generate. The conclusions of the study are expected to form a set of testable hypotheses, which will in turn inform future experimental studies aimed at identifying the common set of principles that mediate human choice behaviour.


Acknowledgements

This dissertation would not have been possible without the endless help and support of my supervisor, Marius Usher. Over the last year, he has always been there to patiently guide, encourage and inspire me. Many thanks go also to Nick Chater for the useful discussions which have critically shaped my understanding of the important issues in decision-making research. Finally, I would like to thank Neil Stewart for providing me with unpublished material on Decision by Sampling, and Jerome Busemeyer for his fruitful comments on my work.


Declaration

I declare that this thesis was composed by myself and that the work contained herein is my own, except where explicitly stated otherwise in the text. This work has not been submitted for any other degree or professional qualification except as specified.

(Konstantinos Tsetsos)


Table of Contents

List of Figures
List of Tables

1 Introduction
  1.1 From Rational Choice Theory to Neuroeconomics
  1.2 Trends in Theorizing Decision Behaviour
  1.3 Outline of the Project
    1.3.1 Motivation
    1.3.2 Main Target and Aims
    1.3.3 Structure of the Project

2 Background
  2.1 Multi-attribute Decisions and Paradoxical Effects
  2.2 Traditional Theories of Multi-attribute Choice
    2.2.1 Elimination by Aspects
    2.2.2 Context-dependent Preferences
  2.3 Dynamical Theories of Multi-attribute Choice
    2.3.1 Dynamical Theories under the Neurocomputational Framework
    2.3.2 Decision Field Theory
    2.3.3 Leaky Competing Accumulators
    2.3.4 The Echo Model
    2.3.5 Decision by Sampling
    2.3.6 Discussion on Dynamical Theories of Choice

3 Exploring Core Mechanisms in DFT and LCA
  3.1 Introduction and Methodology
  3.2 Investigating Inhibition Functions in DFT
    3.2.1 Motivation
    3.2.2 Piece-wise Linear Inhibition Functions
    3.2.3 Sigmoid Inhibition Functions
    3.2.4 Conclusions
  3.3 Investigating Inhibition and Loss-aversion in LCA
    3.3.1 Motivation
    3.3.2 Exploring the level of global inhibition
    3.3.3 Varying the value function
    3.3.4 Conclusions
  3.4 Simpler Versions of LCA
    3.4.1 Motivation
    3.4.2 LCA with one reference point
    3.4.3 A Hybrid Model
  3.5 Summary of the Chapter

4 The Predictive Power of DFT and LCA
  4.1 Introduction
  4.2 The dependency of the compromise effect on the separation between options
  4.3 Reference effects and unavailable options
  4.4 Extending the models beyond three alternatives
  4.5 Summary of the Chapter

5 General Discussion
  5.1 Introduction
  5.2 Local versus Global Inhibition
  5.3 Stability in DFT
  5.4 Linear versus Non-Linear Dynamics
  5.5 Emerging versus Explicit Loss Aversion
  5.6 Summary of the Project and Future Work

Bibliography

List of Figures

2.1 A typical 2-dimensional choice space.
2.2 Illustration of the decision process for a choice among three actions.
2.3 Connectionist network for Decision Field Theory and a 2-D choice space.
2.4 Neural network for LCA and a typical asymmetric value function.
2.5 The connectionist network for the ECHO model.
2.6 Schematic representation of the choice spaces for the reversal effects in DbS.
3.1 Illustration of the effect parameterization method.
3.2 Connectionist network for DFT and a 2-D choice space.
3.3 Experimenting with linear inhibition functions. a) Various alternatives in the 2-D choice space. b) Linear inhibition functions with different slopes.
3.4 Sigmoidal distance-inhibition functions.
3.5 Color plot for the preferences of sigmoidal functions.
3.6 Plot for the attraction and similarity effect for sigmoidal functions.
3.7 The magnitude of the compromise effect for sigmoidal functions.
3.8 Function and predictions for a non-monotonic asymmetric inhibitory function.
3.9 Asymmetric value function for LCA.
3.10 Color plot for the preferences when varying inhibition in LCA.
3.11 Contrast between P(A) and P(B) when varying inhibition for LCA.
3.12 The magnitude of the compromise effect when varying global inhibition for LCA.
3.13 Symmetric value function.
3.14 Color plot for the preferences for a symmetric value function.
3.15 Contrast between P(A) and P(B) for a symmetric value function.
3.16 The magnitude of the compromise effect for a symmetric value function.
3.17 Various asymmetric value functions for LCA.
3.18 Color plot for the preferences for various value functions.
3.19 Contrast between P(A) and P(B) for various value functions.
3.20 The magnitude of the compromise effect for various value functions.
3.21 Symmetric value function and contour plot of 2-D utility for gains.
3.22 Reference points for the effects.
3.23 Results for LCA for one point of reference.
4.1 Investigating the effect of the distance between the extreme options on the compromise effect.
4.2 The magnitude of the compromise effect as a function of the distance between the extreme options.
4.3 Depiction of the unavailability paradigm and experimental results.
4.4 Single trial activations for DFT in the unavailability paradigm.
4.5 Single trial activations and probability of choosing the correct pattern as a function of the negative valence, when a third dimension is introduced.
4.6 Performance of DFT in a triadic choice setup.
4.7 Performance of DFT and LCA in a choice setup with four options similar to the attraction effect.
4.8 Performance of DFT in a choice setup with the four options forming a polygon.
5.1 Performance of DFT in a triadic choice setup under stable parameterization.

List of Tables

3.1 DFT choice and magnitude of reversal effects for linear inhibition functions.
4.1 DFT and LCA on reference effects.
4.2 DFT on reference effects with a third dimension called availability.

Chapter 1 Introduction

1.1 From Rational Choice Theory to Neuroeconomics

A core topic of interest in the cognitive sciences is the study of the mind and brain mechanisms which mediate choice. The ubiquity of choice processes across the domains of cognition allows human decision behaviour to be classified into two distinct categories: perceptual choices, encountered in psychophysical paradigms, and motivational value-based decisions, taken in daily life. Perceptual decision making has been studied within an experimental framework by considering neurophysiological data, behavioural accuracy and response times in psychophysical paradigms. Prevalent models in the class of perceptual choice are neuron-like computational models which address the process through which the observed choice behaviour emerges. Preferential decision making, being distinct from perceptual choice, was traditionally studied within the field of economics. Neo-classical economists relied on concepts of value maximization (i.e., expected utility theory) to conceptualise decision making from a normative perspective. However, the violations of the axioms of rational choice theory, encountered repeatedly in actual decision behaviour, motivated descriptive theories of choice and the advent of behavioural economics.

Behavioural economists attempted to describe decision paradoxes as deviations from the rational-man assumption. Although appealing and often enlightening, such approaches are usually expressed in the form of disparate heuristics, each one accounting for specialized empirical phenomena. Until recently, attempts to capture decision anomalies under a single mathematical framework were rare and only partially successful (Tversky, 1972; Tversky and Simonson, 1993). However, recent neurophysiological studies in behaving animals (Shadlen and Newsome, 2001; Glimcher, 2003; Sugrue et al., 2004) gave rise to the development of unified computational theories of value-based decision making (Roe et al., 2001; Guo and Holyoak, 2002; Usher and McClelland, 2004; Stewart et al., 2006). Such theories borrow assumptions from perceptual choice and provide computational accounts (i.e., neurocomputational models) of the process, and the neural machinery, by which value-based decisions are issued. This line of research, lying at the intersection of the neural and behavioural sciences, leads to the emerging and promising field of neuroeconomics.

1.2 Trends in Theorizing Decision Behaviour

In cognitive science, three levels of cognitive theory have been identified, consistent with the mind-computer analogy (Marr, 1982). At the highest level, theories capture the abstract goals which a system attempts to achieve. In analogy to the digital computer, this level of analysis describes the abstract computational goal of a program without concern for algorithmic details. At an intermediate level, these algorithmic details (i.e., how the top-level goals are achieved) are taken into account, while at the lowest level of analysis the physical implementation of the intermediate level is considered.

These three levels of analysis can be set alongside the two dominant theories of mind in cognitive science: symbolicism and connectionism. From the symbolic perspective, the mind is a symbol manipulator and processing takes place as a sequence of serial, rule-governed steps. The symbolic approach can be adopted when constructing theories at the computational and algorithmic levels. Contrary to the sequential symbol-manipulation assumption, connectionists hold the view that computation in cognition arises as a property of highly connected networks of nodes which are analogous to neurons in the brain. According to connectionism, processing is dynamical and parallel. Connectionist systems are ideal for modelling in detail the dynamic processes which mediate the top-level goal of the system (i.e., the algorithmic level) along with the underlying neural structure (i.e., the implementational level).

In preferential choice research, much attention has been given to the highest level of analysis, which aims merely to identify the abstract goals of the decision maker. Theories at this level are formulated as ad-hoc, fast and simple verbal rules (i.e., heuristics at the symbolic level), each specialized for a particular subset of the behavioural data. Chomsky (1965) and Seidenberg (1993) discussed the value of developing explanatory theories which account for phenomena in terms of a small set of independently motivated principles (e.g., theories in the connectionist framework), in contrast with "descriptively adequate" theories. The latter generalize over the patterns of empirical data (i.e., heuristics) and, only if they are made explicit enough, can be formalized in computational explanatory models. Although heuristics provide good abstract descriptions of the empirical phenomena at the highest level of analysis, they do not explain how the latter arise from deeper principles. Attempts to transform heuristics into explanatory theories of choice can proceed in two ways: either by assuming that the mind exploits a toolbox of disparate, specialized heuristics (Gigerenzer and Todd, 1999) or by formalizing them in rigorous mathematical frameworks. However, a theory of choice which assumes a toolbox of disparate heuristics is meta-descriptive rather than explanatory. Additionally, such a theory is based on evolutionary assumptions which are not testable. Most importantly, and in contrast to mathematical accounts, it lacks predictive power, since it does not formally state under which conditions each ad-hoc heuristic rule is applied (Chater, 2001). On the other hand, theories of choice which incorporate the generalizations described by heuristics into a rigorous mathematical framework can be descriptively adequate, explanatory and predictive.[1]

In the domain of mathematical theories, decision scientists have long relied on algebraic, deterministic theories with static parametrization (Tversky, 1972; Tversky and Simonson, 1993). Although these static theories provide a fair first-order approximation of decision behaviour, they fail to account fully for the variability of preferences across different choice contexts (Tversky, 1972; Huber et al., 1982; Simonson, 1989). Moreover, being inherently static, such theories cannot describe the relation between preference and deliberation time. These descriptive and explanatory limitations of the traditional algebraic theories of choice have motivated alternative approaches based on the processing assumptions of perceptual choice, namely the noisy accumulation of information (i.e., evidence in favour of each option) across time. The dynamical, stochastic processing assumption for value-based choice has been instantiated in neurocomputational or connectionist models (Roe et al., 2001; Guo and Holyoak, 2002; Usher and McClelland, 2004). The neurocomputational approach has accounted for a class of decision anomalies in the domain of multi-attribute decision making (Tversky, 1972; Huber et al., 1982; Simonson, 1989) which had resisted explanation within the framework of existing theories. Thus, on top of providing a more detailed, lower-level process analysis, dynamical theories also offer a more precise description of the empirical data than algebraic and heuristic theories do.

Dynamical theories of choice combine existing concepts and models under a single dynamical framework in a neuron-like architecture. They must not be viewed in separation from static, algebraic theories, as they are built upon principles introduced by the latter.[2] Nevertheless, it could be argued that the second-order approximation of choice behaviour which the dynamical framework provides, at the cost of extra complexity, is useless and premature in the absence of adequate behavioural data. However, what prevented experimentalists from gathering more empirical data was precisely the lack of testable hypotheses (i.e., the limited predictive power of existing theories) and the weakness of traditional theories in accounting for existing empirical phenomena. Neurocomputational models seem capable of restoring the balance between theoretical and experimental work in the field, accounting for the existing data and providing novel, experimentally testable predictions, heralding the advent of a unified theory of value-based choice.

[1] Tversky was the first to combine these two lines of research by formalizing insights from heuristics in mathematical process models (Tversky, 1972; Tversky and Simonson, 1993).
[2] For example, in the LCA model (Usher and McClelland, 2004) loss-aversion is implemented explicitly as an asymmetry in the value function, as in Prospect Theory (Kahneman and Tversky, 1979) and later in the Context-dependent Advantage model (Tversky and Simonson, 1993).

1.3 Outline of the Project

1.3.1 Motivation

Neurocomputational models strive to establish detailed links between biology and cognition in a way that is consistent with established principles of neural information processing. Nevertheless, the process of making a unified theory of choice does not always imply convergence on a common set of principles, and indeed in the class of neurocomputational theories of value-based choice there is significant variability in the assumptions on which each model is based. The present project will focus on discriminating two state-of-the-art models of preferential decision making: the Leaky Competing Accumulator (hereafter LCA) theory (Usher and McClelland, 2004) and the extension of the Decision Field Theory (hereafter DFT) framework (Busemeyer and Townsend, 1993) to multi-attribute, multi-alternative choices (Roe et al., 2001).

The two theories share many common assumptions but diverge on crucial points. In particular, both LCA and DFT conceptualise choice as a dynamical, stochastic diffusion process. Noisy evidence in favour of each option is imperfectly integrated across time (i.e., subject to leak), while alternative options compete with each other. A decision is issued either once the evidence in favour of an alternative exceeds a predefined threshold (i.e., an internal stopping rule) or after a specified period of time (i.e., an external stopping rule). Despite the processing similarities between the two models, there are several differences. While DFT is entirely linear, the LCA framework makes a set of non-linear assumptions. The first non-linearity concerns the dynamics of the model: while DFT does not impose any constraint on the values of the preference states, LCA forces them to be non-negative. This constraint is biological in nature, since it implies that the preference states correspond to neural firing rates, which must be non-negative.
The second non-linearity is introduced in the value function, with disadvantages having a larger impact than advantages. This loss-aversion property is not implemented directly in DFT, where it instead appears to emerge from the dynamics of the model.

Intriguingly, in spite of the fact that LCA and DFT diverge at crucial points, both account for the similarity, attraction and compromise effects, three empirical phenomena in the domain of multi-attribute choice which cannot be explained simultaneously under any other mathematical framework. The failure of algebraic theories with static parameterization to explain these phenomena suggests that answers must be sought in the dynamics of the brain. However, since each theory attributes the effects to different mechanisms, further research is required in order to evaluate each model's account of the phenomena and to distil a common set of principles that produce them. In this direction, the present project aims to provide a theoretical and critical analysis of DFT and LCA. The parameterization which enables each model to account for the contextual preference reversals will be used to generate qualitative predictions for each theory under various situations. These predictions can form hypotheses for behavioural experiments, which will be the next crucial step in accepting or rejecting each theory's mechanistic assumptions. The present work is therefore a starting point towards propounding a single, unified computational theory of value-based decision making.
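To make the shared dynamics concrete, the following sketch simulates a race between leaky competing accumulators with global inhibition, a non-negativity constraint and an external stopping rule. It is a minimal illustration only: the function name and all parameter values (leak, inhibition, noise, number of steps) are assumptions for this sketch, not the parameterization used in this project.

```python
import random

def lca_trial(inputs, leak=0.1, inhibition=0.2, noise=0.1, steps=200, seed=None):
    """One trial of a simplified leaky-competing-accumulator race.

    inputs: mean momentary evidence for each option (illustrative values).
    Each accumulator leaks, is inhibited by its competitors, receives
    Gaussian noise, and is truncated at zero (the non-negativity assumption).
    """
    rng = random.Random(seed)
    acts = [0.0] * len(inputs)
    for _ in range(steps):
        total = sum(acts)
        new_acts = []
        for i, a in enumerate(acts):
            # input - leak - inhibition from the *other* accumulators + noise
            da = inputs[i] - leak * a - inhibition * (total - a) + rng.gauss(0.0, noise)
            new_acts.append(max(0.0, a + da))  # truncation at zero
        acts = new_acts
    # external stopping rule: choose the most active accumulator at the deadline
    return max(range(len(acts)), key=lambda i: acts[i])

# Illustrative run: option 0 has slightly stronger evidence, so it wins most trials
wins = sum(lca_trial([0.55, 0.45], seed=s) == 0 for s in range(200))
print(wins / 200)
```

Because inhibition here exceeds leak, small early differences are amplified into a winner-take-all pattern, which is the kind of non-linear competitive dynamic discussed above.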

1.3.2 Main Target and Aims

The ultimate target of the project is to discriminate LCA and DFT qualitatively. This will be achieved by comparing them theoretically (i.e., identifying the core mechanisms of each theory) and in terms of the predictions they provide. The theoretical analysis of DFT will focus on the competition it introduces between alternative options. The DFT authors assume that this competition is proportional to the distance between the options: the more similar two options are, the more they inhibit each other. Although this assumption is a cornerstone of DFT, an explicit inhibitory mechanism has not yet been specified. Thus, in the first set of computational investigations, various inhibitory functions will be tried and assessed with respect to their predictions. Although the insufficiency of behavioural data does not allow for strong conclusions at this point,[3] we will focus our attempts on specifying a distance-dependent inhibitory function which satisfies the minimum requirement of reproducing the three empirical phenomena under all the conditions specified in their definitions.

This minimum requirement will be the baseline in the next set of investigations, concerning LCA. Contrary to DFT, LCA assumes not distance-dependent but global competition between alternative options. Although the mechanism of competition is quite straightforward in LCA, theoretical issues arise in the implementation of the value function, for which the LCA authors assume a non-linearity that weighs disadvantages more heavily than advantages. At this stage we will explore whether LCA requires this asymmetry in order to account for the three empirical phenomena.

After exploring the mechanisms of each theory separately, attempts will focus on evaluating the predictive power of the models under circumstances which can be experimentally tested. In particular, predictions will be gathered concerning:

• The dependency of the compromise effect on the separation between options.
• Reference effects in a paradigm where the subject's chosen option is rendered unavailable and a fast new decision is elicited.
• Decisions with more than three options in the choice set.

The theoretical analysis and the evaluation of each model's predictive power are expected to designate hypotheses and aims for future computational and experimental work in the domain of multi-attribute, multi-alternative value-based choices.

[3] For example, there are no empirical data about the nature of the effects, e.g., the dependency of the magnitude of the effects on the similarity between the alternative options.

1.3.3 Structure of the Project

The project will be structured across the following sections:

• Background – In this chapter the similarity, compromise and attraction effects will be presented, along with existing algebraic and neurocomputational theories of multi-attribute decision making.
• Exploring core mechanisms in DFT and LCA – First, the basic method which parameterizes the effects across the choice set will be presented and explained. Next, the mechanisms of inhibition and loss-aversion will be explored for DFT and LCA respectively. In addition, an alternative simplified version of LCA with one reference point, along with a hybrid model of LCA and DbS, will be tested.
• The predictive power of DFT and LCA – In this section LCA and DFT will be compared with respect to the qualitative predictions they provide across three different paradigms (i.e., the dependency of the compromise effect on the separation of the options; reference effects and unavailability; choice among multiple options).
• General Discussion – A critical analysis of the main findings will be provided.
• Conclusions and Future Work – The limitations of the project will be discussed along with a summary of the main findings. Additionally, directions for future computational and experimental work will be given.

Chapter 2 Background

2.1 Multi-attribute Decisions and Paradoxical Effects

Value-based decision making is a complex process even in simple decision situations. Contrary to perceptual choices, where only perceptual information about the stimuli is required, in preferential decisions the outcome of each alternative is evaluated and matched to internal motivations. The current project will focus on decisions in which each alternative is described and characterized on multiple attributes, called multi-attribute, multi-alternative decision making. In a typical multi-attribute preferential choice problem, the decision maker is asked to choose among alternatives which vary on several dimensions. For instance, an individual faces the problem of choosing between two cars which differ in terms of two attributes, economy and quality (figure 2.1). Since economy and quality are expressed in different currencies, it is natural to assume that they are integrated in terms of an internal currency, or value, before any decision is reached. If we assume that there is a fixed algorithm which maps each attribute onto value, then it is reasonable to expect the decision maker's preferences over the alternatives to be stable and well-ordered. Contrary to this intuition, however, empirical data on human decision behaviour suggest the opposite: preferences are not stable, since they are subject to changes in the description and the context of the decision problem.

For example, changes in the context of the decision problem lead to a set of decision anomalies known as contextual preference reversals. In such situations the preferences between two options are modified by the introduction of a third option. Three main instances of contextual preference reversals have been identified: the attraction, the similarity and the compromise effect. In order to outline each phenomenon we resort to the multi-attribute choice set of figure 2.1, concerning the evaluation of two automobiles, A and B, in terms of economy and quality. For this example the decision maker is assumed to assign equal importance to both attributes. Option A is high in quality but low in economy, while B has exactly the opposite properties. With just A and B in the choice set, the decision maker is indifferent between the two options, since their additive utilities are equal.
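The additive-utility tie between A and B can be checked with a few lines of code. The attribute values below are hypothetical (the text fixes only the qualitative layout: A is high in quality but low in economy, B the mirror image), as is the weighting scheme.

```python
# Hypothetical (economy, quality) values; only the qualitative layout
# matters: A trades economy for quality relative to B.
options = {"A": (1.0, 3.0), "B": (3.0, 1.0)}
weights = (0.5, 0.5)  # the decision maker weighs both attributes equally

def additive_utility(name):
    economy, quality = options[name]
    return weights[0] * economy + weights[1] * quality

print(additive_utility("A"), additive_utility("B"))  # equal utilities -> indifference
```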

Figure 2.1: Various options in the 2-dimensional choice space of a typical multi-attribute, multi-alternative choice problem (axes: Economy and Quality; options A, B, C, D and S shown).

The attraction effect (Simonson, 1989) arises when a third automobile D, which is close to A but worse than it on both dimensions, is introduced: the choice preference then shifts in favour of A and against B. Formally:

P(A|{A, B}) = P(B|{A, B}),
P(A|{A, B, D}) > P(B|{A, B, D}).

When we introduce an option S which is similar to B and neither dominated by nor dominating it (i.e., better on one dimension but worse on the other), the similarity effect (Tversky, 1972) is encountered: the choice is biased in favour of the dissimilar option A, while B and S split their wins:

P(A|{A, B}) = P(B|{A, B}),
P(A|{A, B, S}) > P(B|{A, B, S}).

Finally, the compromise effect (Huber et al., 1982) occurs when an option C that falls between A and B on both dimensions is introduced. While in the binary choice the decision maker is indifferent between A and C, the compromise effect predicts that the introduction of option B into the choice set will shift preference in favour of C and against A:

P(A|{A, C}) = P(C|{A, C}),
P(A|{A, B, C}) < P(C|{A, B, C}).

All the phenomena described above are paradoxical: from a normative perspective, the introduction of an option into the choice set should not affect the preferences among the existing alternatives. Beyond these three contextual preference reversals, two other cases relevant to multi-attribute choice can be identified as violating the rationality assumption: reference effects and loss-aversion. Reference effects suggest that the decision between two options depends on a point of reference, which is either the decision maker's current situation or a desirable future state. For instance, suppose the decision maker owns car D in figure 2.1 and is asked to exchange it for either car A or car B. It has been shown experimentally (Knetsch, 1989) that the decision maker will choose the option which is more similar to her current situation (i.e., the reference point). Finally, loss-aversion refers to the tendency of humans and behaving animals to maintain their current state when given the opportunity to exchange it for an equally beneficial one. For the example in figure 2.1, this implies that subjects possessing car A would not exchange it for B and vice versa, while subjects possessing nothing would choose either A or B with equal probability. Loss-aversion has been conceptualized descriptively by Kahneman and Tversky in their prospect theory (Kahneman and Tversky, 1979), with an asymmetric value function which is steeper in the domain of losses than in the domain of gains.
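The attraction and compromise conditions above can be stated as simple predicates over choice probabilities. The probability values in the example call are invented purely to illustrate the inequalities; they are not empirical estimates.

```python
def attraction_effect(p_binary, p_triadic):
    """P(A|{A,B}) = P(B|{A,B}) and P(A|{A,B,D}) > P(B|{A,B,D})."""
    return p_binary["A"] == p_binary["B"] and p_triadic["A"] > p_triadic["B"]

def compromise_effect(p_binary, p_triadic):
    """P(A|{A,C}) = P(C|{A,C}) and P(A|{A,B,C}) < P(C|{A,B,C})."""
    return p_binary["A"] == p_binary["C"] and p_triadic["A"] < p_triadic["C"]

# Made-up choice shares consistent with each effect:
print(attraction_effect({"A": 0.5, "B": 0.5}, {"A": 0.55, "B": 0.40, "D": 0.05}))  # True
print(compromise_effect({"A": 0.5, "C": 0.5}, {"A": 0.30, "B": 0.30, "C": 0.40}))  # True
```

The similarity effect has the analogous form, with the two similar options splitting their shares to the benefit of the dissimilar one.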
With the currently possessed option serving as the reference point and losses looming larger than gains, people prefer to retain their status quo. Prospect theory's account, apart from being descriptive, might also be explanatory. This must be investigated at the neural level by testing the hypothesis of an asymmetry in neural activity as a built-in attribute of the system which encodes value.[1]

[1] An explanation for loss-aversion has been given at a higher level in the Decision by Sampling theory (Stewart et al., 2006), where it is attributed to the asymmetry in the natural real-world distributions of gains and losses.


Intriguingly, contextual preference reversals and reference effects eluded an explanation within the same mathematical framework for years. Algebraic/heuristic theories with static parameterization, such as Elimination by Aspects (Tversky, 1972) and the Context-dependent Preferences framework (Tversky and Simonson, 1993), were only partially successful in accounting for the empirical phenomena. These phenomena were captured under a single mathematical framework only by dynamical theories of choice. In the next section both algebraic and neurocomputational theories are discussed with respect to the reversal effects.

2.2 Traditional Theories of Multi-attribute Choice

2.2.1 Elimination by Aspects

According to the Elimination by Aspects (hereafter EBA) theory (Tversky, 1972), the decision process involves the sequential application of a simple non-compensatory (i.e., considering one attribute at a time) strategic rule: at each step one attribute is probabilistically chosen according to its importance, and any option which is disadvantageous on this attribute is eliminated. If two or more options survive after applying the rule with the most important attribute, then further attributes are considered until one alternative remains. The EBA framework can directly account only for the similarity effect. Assume an initial binary choice set which consists of options A and B (figure 2.1). According to the EBA heuristic, and since the decision maker assigns equal importance to both dimensions, A and B are chosen equiprobably. In the triadic choice set, if economy is chosen as the most important attribute, then options B and S survive the first elimination. Thus, with the introduction of S, B and S split their wins and the share of B is reduced below that of A. The attraction and compromise effects, on the other hand, cannot be explained under the EBA framework. For the case of the attraction effect, the addition of a decoy option D (figure 2.1) does not change the shares of options A and B. For the compromise situation, EBA does not in the first place seem able to explain how the middle option C would be favored by the introduction of B, as there will always be an option better than C on either dimension. However, if we modify the algorithm so as, instead of keeping the most advantageous option, to eliminate the most disadvantageous one on the active dimension, then the alternation between dimensions will allow only the compromise option to escape elimination. The incapacity of EBA to account for the attraction effect motivated the Context-dependent Preferences theory, which is described next.
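The elimination heuristic just described can be sketched directly in code. The option coordinates and attribute weights below are illustrative stand-ins for the choice set of figure 2.1 (S is deliberately a near-clone of B), not values taken from the text:

```python
import random

def eba_choice(options, attr_weights, rng=random):
    """One run of Elimination by Aspects: pick an attribute with probability
    proportional to its importance, drop every option that is worse than the
    best survivor on it, and repeat until one alternative remains."""
    survivors = dict(options)              # name -> {attribute: value}
    attrs = list(attr_weights)
    while len(survivors) > 1 and attrs:
        # sample an attribute according to importance, without replacement
        total = sum(attr_weights[a] for a in attrs)
        r, acc = rng.random() * total, 0.0
        for a in attrs:
            acc += attr_weights[a]
            if r <= acc:
                attr = a
                break
        attrs.remove(attr)
        best = max(v[attr] for v in survivors.values())
        survivors = {n: v for n, v in survivors.items() if v[attr] == best}
    return rng.choice(sorted(survivors))   # remaining ties broken at random

# Similarity-effect set: S is nearly a clone of B (good economy, poor quality).
options = {"A": {"econ": 1, "qual": 3},
           "B": {"econ": 3, "qual": 1},
           "S": {"econ": 3, "qual": 1}}
counts = {n: 0 for n in options}
rng = random.Random(0)
for _ in range(10000):
    counts[eba_choice(options, {"econ": 0.5, "qual": 0.5}, rng)] += 1
print(counts)   # B and S split the economy wins; A keeps all the quality wins
```

When quality is sampled first only A survives, while economy lets B and S share the wins, so A's share roughly doubles that of either similar option.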

2.2.2 Context-dependent Preferences

In the context-dependent advantage model (Tversky and Simonson, 1993), the evaluation of each option treats all the other alternatives in the choice set as reference points. On top of assuming a reference frame for each option, Simonson and Tversky introduced an asymmetric, S-shaped value function which incorporates the principle of loss-aversion by being steeper in the domain of losses. The combination of these two principles enables the context-dependent advantage model to account for the compromise and attraction effects. The attraction effect occurs because the relative advantage of A over D and B is larger than the relative advantage of B over D and A (figure 2.1). Analytically we have: P(A|A, B, D) = v(A) + θ[R(A, B) + R(A, D)] and P(B|A, B, D) = v(B) + θ[R(B, A) + R(B, D)], with v the additive utility function, R the relative advantage of one option over another, and θ a positive weight. Since v(A) = v(B), the comparison reduces to the relative-advantage sums. Writing each relative advantage as the ratio of advantages to advantages plus disadvantages, with the disadvantages multiplied by a constant κ > 1 to implement the loss-aversion attribute, the above relationships can be rewritten as follows:

P(A|A, B, D) = (A_econ − B_econ) / [(A_econ − B_econ) + κ(B_qual − A_qual)] + [(A_econ − D_econ) + (A_qual − D_qual)] / [(A_econ − D_econ) + (A_qual − D_qual)]
             = (A_econ − B_econ) / [(A_econ − B_econ) + κ(B_qual − A_qual)] + 1

(A dominates D, so its relative advantage over D carries no disadvantage term and the second ratio equals 1), and

P(B|A, B, D) = (B_qual − A_qual) / [(B_qual − A_qual) + κ(A_econ − B_econ)] + (B_qual − D_qual) / [(B_qual − D_qual) + κ(D_econ − B_econ)].

Options A and B are symmetric; thus we can define a quantity α: α = |A_qual − B_qual| = |A_econ − B_econ|.


Accordingly, the preferences for A and B will be:

P(A|A, B, D) = α / (α + κα) + 1,

and

P(B|A, B, D) = α / (α + κα) + (B_qual − D_qual) / [(B_qual − D_qual) + κ(D_econ − B_econ)].

Thus:

P(A|A, B, D) > P(B|A, B, D) ⇔ (B_qual − D_qual) / [(B_qual − D_qual) + κ(D_econ − B_econ)] < 1,

or κ(D_econ − B_econ) > 0. Since κ > 1, the above relationship holds for D_econ > B_econ, i.e., for the addition of any option which is a decoy to A and not to B. Essentially, the attraction effect occurs because option B, being the dissimilar alternative, is penalized more by the steepness of the value function in the domain of losses. The same explanation applies to the compromise effect. In brief, the decision maker is indifferent between A and C. By adding alternative B to the option set, the relative advantage of C over A and B becomes larger than the relative advantage of A over B and C. This extremeness aversion is due to the asymmetry in the value function, which penalizes the more distant option A more heavily. In the original paper (Tversky and Simonson, 1993) the context-dependent advantage model was not applied to the similarity effect. However, the DFT authors have shown that the context-dependent advantage framework cannot explain the similarity effect (Roe et al., 2001).[2] Therefore the three empirical phenomena can only be partially captured, by two distinct and non-overlapping theories. Although Tversky's theories did not provide a unified explanation of the behavioural data, the principles of EBA and the context-dependent advantage model inspired a class of theories under the dynamical framework, which are outlined next.

[2] The compromise effect requires a strict convexity in the value function. This means that a disadvantage along a given attribute has more impact than the respective advantage, and that disadvantages grow faster than the respective advantages. It can be shown that the similarity effect cannot be reproduced with a strictly convex value function; in that case, contrary to the similarity effect, the introduction of option S would increase the share of B against A (figure 2.1).
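As a numerical sanity check of the attraction-effect derivation above, the relative-advantage comparison can be computed for made-up attribute values. The function below implements the advantage / (advantage + κ·disadvantage) ratio used in the equations; the coordinates and κ = 2 are chosen purely for illustration:

```python
def rel_adv(x, y, kappa=2.0):
    """Relative advantage of x over y: summed advantages divided by
    advantages plus loss-aversion-weighted (kappa > 1) disadvantages."""
    adv = sum(max(xi - yi, 0.0) for xi, yi in zip(x, y))
    dis = kappa * sum(max(yi - xi, 0.0) for xi, yi in zip(x, y))
    if adv + dis == 0.0:
        return 0.5                      # identical options: a tie
    return adv / (adv + dis)

def scores(choice_set):
    """Sum of the relative advantages of each option over all the others."""
    return {n: sum(rel_adv(v, w) for m, w in choice_set.items() if m != n)
            for n, v in choice_set.items()}

A, B, D = (3.0, 1.0), (1.0, 3.0), (2.5, 0.5)   # (economy, quality); A dominates D
binary = scores({"A": A, "B": B})
triadic = scores({"A": A, "B": B, "D": D})
print(binary)    # A and B are tied in the binary set
print(triadic)   # with the decoy added, A pulls ahead of B
```

The dominated decoy contributes a full unit of relative advantage to A, while B's comparison against D is discounted by the κ-weighted economy disadvantage, reproducing the inequality derived above.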

2.3 Dynamical Theories of Multi-attribute Choice

2.3.1 Dynamical Theories under the Neurocomputational Framework

Dynamical theories of value-based choice are built on principles from two main areas of research: perceptual choice, with its processing assumptions, and cognitive psychology, with the descriptive insights mediated by heuristic/mathematical theories of choice. According to the dynamical approach, preferential choice behavior emerges directly from a set of neural information processing principles. It is interesting to see under which assumptions, and with what kind of models, a higher-level cognitive process is linked to brain mechanisms. The main underlying intuition of perceptual choice theories is that decisions are mediated by the stochastic accumulation of perceptual information. It has been shown that behavioral accuracies and response times in perceptual paradigms can be well accounted for by models which assume two important components: neural decay and lateral inhibition (Usher and McClelland, 2001). Neural decay, or leak, implies that the integration of information is imperfect, while lateral inhibition refers to the competition between the units which represent the outcome of each alternative. While perceptual choice is driven by perceptual information alone, preferential decision making requires matching each alternative's potential outcome to an internal motivational system. This process seems at first sight too complex to be captured by simple principles of information processing, as it involves an interplay with other higher-level functions (e.g., judgment, reasoning, planning). Nevertheless, when the outcomes of each alternative are explicitly and deterministically described to the decision maker, preferential decision making can be regarded as analogous to a perceptual discrimination task. This simplification enables us to delineate the process of value-based deliberation as the integration, over time, of a stream of comparisons among the options' evaluations on the attributes.
This sequential sampling of information is not assumed to be performed consciously by the natural cognizer. The required computations are realized at the neurophysiological level, and the dynamical theories are abstract representations of the neural dynamics.[3] The process of value-based choice is best instantiated in dynamical connectionist, or neurocomputational, models. Although connectionism is often regarded as synonymous with the class of models developed by the parallel-distributed processing group (Rumelhart and McClelland, 1986), the term is used here in its broader sense, which includes all artificial neural network models. It is useful to discriminate between the tasks that PDP networks are ideal for and value-based decision making. Essentially, PDP models are committed to a reverse engineering approach. According to the latter, the network can be trained "innocent of engineering prejudices" and can reveal to the modeler how a task could be performed, rather than the modeler directing the model to a particular solution (Churchland and Sejnowski, 1994). Contrary to this task-learning-oriented approach, the neurocomputational framework for value-based choice does not include concepts of adaptation (i.e., changing weight strengths through a supervised or unsupervised learning algorithm) or self-organization; it is the implementation of a set of a priori and biologically informed (i.e., supported by evidence from neurophysiology) processing assumptions in recurrent artificial neural networks. Another fundamental issue which has to be clarified is the concept of representation in the neurocomputational framework. For PDP connectionists, a concept, idea or object is coded in a distributed manner by a large population of neurons and is definitely not captured by a single node or symbol. At the other extreme, symbolists allow themselves the flexibility of chaining together sequences of operations and arbitrary symbol binding. In between these two viewpoints, the neurocomputational framework adopts both distributed and content-specific, or higher-level, neural representations. Overlap in the activation patterns produced by different inputs is allowed, but the representations of distinct alternatives are treated as non-overlapping. Thus an output node in a neurocomputational model can represent an abstract concept such as preference, but this representation does not rely on implausible symbolic machinery; although for modeling reasons it is treated as a single variable, preference is implied to be underlaid by potentially overlapping neuronal populations which encode it.

[3] These dynamical theories must be distinguished from the theory of mind known as dynamicism (van Gelder, 1998). Van Gelder's dynamicism overstates the fact that cognition is dynamic. He attempts to subordinate symbolicism and connectionism under the dynamicism framework, without addressing the question of how a dynamical system could give rise to representational behavior and thus human-style intelligence (Eliasmith, 1998). The dynamical hypothesis is a high-level description of the behavior of connectionist models rather than a new paradigm for studying cognition (Eliasmith, 1997).


Having placed the dynamical theories of choice within clear theoretical boundaries, we next describe the two prominent neurocomputational models of value-based choice which can each account for the three empirical phenomena under a single framework: Decision Field Theory (Roe et al., 2001) and the Leaky Competing Accumulators (Usher and McClelland, 2004). Although the current project focuses on discriminating DFT and LCA qualitatively, it is interesting to incorporate principles from alternative approaches into a broader theoretical discussion. Thus, on top of outlining DFT and LCA, the ECHO model (Guo and Holyoak, 2002) and the Decision by Sampling theory (Stewart et al., 2006) will also be discussed.

2.3.2 Decision Field Theory

Decision Field Theory (DFT) is one of the prominent process theories of decision making, whose great asset is that it is computationally explicit and makes clear behavioral predictions (Busemeyer and Townsend, 1993). DFT has been applied to a broad class of value-based choice problems, such as decision making under uncertainty (Busemeyer and Townsend, 1993), multi-attribute decisions (Diederich, 1997) and, recently, multi-alternative choices (Roe et al., 2001).

Figure 2.2: Illustration of the decision process for a choice among three actions with internal stopping rule.

DFT belongs to the general class of sequential sampling models of cognition. At each moment in time the decision maker focuses on different aspects of the problem by considering the various payoffs of each prospect, which produces an affective reaction (i.e., a valence) to each alternative. These affective reactions are integrated across time into preference states at each moment of the deliberation process, and a choice is issued either after some predefined number of time steps (i.e., an external stopping rule) or once the preference state of an option exceeds a threshold (i.e., an internal stopping rule, as in figure 2.2). DFT shares many properties with traditional models of choice: it is built on the EBA (Tversky, 1972) assumption that the decision maker randomly samples one attribute at a time, and on the principle of the context-dependent advantage model (Tversky and Simonson, 1993) that choice is driven by valences, which are computed as contrasts between the options. However, DFT departs from the context-dependent advantage framework in assuming a linear value function instead of an asymmetric one. Thus in DFT the loss-aversion property is not implemented directly in the value function but is hypothesized to emerge from a combination of linear dynamics and distance-dependent competition between the alternatives. DFT is instantiated in a linear neural network consisting of four layers, as in figure 2.3(a). The attention of the decision maker switches stochastically across dimensions (D1, D2) according to a Bernoulli process.

Figure 2.3: Connectionist network for Decision Field Theory and a 2-D choice space.

The connectivity between the first two layers is given by a 2×3 matrix m_ij, which encodes the 2-D characterization of each alternative in the D1-D2 space (figure 2.3(b)). Thus the units in the second layer compute a weighted utility, U_i, for each alternative:

U_i(t) = Σ_{j=1,2} W_j(t) · m_ij + ε_i(t),    (2.1)

where ε_i is a noise term reflecting the probability of attending irrelevant dimensions. At the third layer the valence of each option, i.e., the contrast between the weighted value of the option and the average weighted value of the other options, is computed as follows:

v_i(t) = U_i(t) − Σ_{k≠i} U_k(t) / (n − 1).    (2.2)

As the attention switches between the attributes, the valences vacillate between positive and negative values. Finally, the fourth layer computes the preferences. This layer includes a connectivity matrix, s, with distance-dependent bi-directional inhibition (i.e., the more similar two alternatives are, the more they inhibit each other) and a diagonal, self-connectivity coefficient for recurrent excitation:

P_i(t + h) = v_i(t) + Σ_j s_ij · P_j(t).    (2.3)
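Equations 2.1-2.3 can be turned into a small simulation sketch. Everything numerical below — the option coordinates, the entries of the s-matrix, the noise level and the deliberation horizon — is an illustrative assumption rather than a published DFT parameterization; the point is only that distance-dependent inhibition combined with linear dynamics reproduces the attraction effect:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-D characterizations (economy, quality): A and B trade off,
# D is a decoy close to A but dominated by it.
M = np.array([[3.0, 1.0],    # A
              [1.0, 3.0],    # B
              [2.5, 0.5]])   # D
# Distance-dependent lateral inhibition with self-excitation on the diagonal:
# the nearby pair A-D is coupled more strongly than the distant pairs.
S = np.array([[ 0.90, -0.01, -0.04],
              [-0.01,  0.90, -0.01],
              [-0.04, -0.01,  0.90]])
n, steps, trials = 3, 200, 1000

wins = np.zeros(n, dtype=int)
for _ in range(trials):
    P = np.zeros(n)
    for _ in range(steps):
        w = rng.integers(2)                          # Bernoulli attention switch
        U = M[:, w] + 0.1 * rng.standard_normal(n)   # eq. 2.1, noise as epsilon
        v = U - (U.sum() - U) / (n - 1)              # eq. 2.2: valences
        P = v + S @ P                                # eq. 2.3: preference states
    wins[np.argmax(P)] += 1                          # external stopping rule
print(wins / trials)   # attraction effect: A's share exceeds B's; D rarely wins
```

With an internal stopping rule the inner loop would instead run until max(P) crossed a threshold. The mechanism is visible in the update: the dominated decoy accumulates negative preference, which the strong negative A-D connection converts into a boost for A.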

Although it has not yet been explicitly described by a specific function, the inhibition mechanism between the options in the s-matrix is a fundamental assumption in DFT, since it plays a major role in accounting for two of the three contextual reversal effects. The local inhibition between nearby alternatives (A and D in figure 2.3(b)) mediates the attraction effect. The similarity between options A and D results in their being strongly coupled by inhibitory connections, while the inferiority of option D on both dimensions results in its taking negative valences. Thus, bearing negative activations, option D boosts the preference of option A by passing its negative activation values through a negative connection (i.e., negated inhibition is excitation). The compromise effect is mediated by the inhibition between the compromise option C and the nearer options A and B, which is stronger than the inhibition between the more distant A and B themselves (figure 2.3(b)). Under the DFT framework, options A and B bear the same inhibitory connections with the compromise option C, as they are equidistant from it. Thus their activations will be correlated in time (i.e., the same inhibition from a common option) and they will split their wins, making the compromise option stand out and take a larger share of choices. While the DFT account of the attraction and compromise effects requires a distance-dependent inhibitory mechanism, this is not essential for the explanation of the similarity effect. The latter occurs due to the stochastic alternation between dimensions, as in EBA (Tversky, 1972), which results in the similar options splitting their shares. In summary, the explanatory power of DFT relies on two pre-existing principles: the stochastic switching of attention across attributes and the evaluation of each option relative to the others. The novel assumption which DFT introduces is the distance-dependent inhibitory mechanism between alternative options. This principle is not included in the similar, but in crucial respects diverging, LCA framework, which is outlined next.

2.3.3 Leaky Competing Accumulators

The Leaky Competing Accumulators (LCA) theory of preferential decision making (Usher and McClelland, 2004) arises from the LCA framework of perceptual choice (Usher and McClelland, 2001). As in DFT, the process of decision making is assumed to emerge from the imperfect integration, towards a response criterion, of a momentary preference for each choice alternative. The stochastic scanning of dimensions and the evaluation of each alternative with reference to all the other options in the choice set are the principles common to DFT and LCA. However, the theories diverge in their treatment of the competition between the alternatives, in their dynamics, and in their implementation of the loss-aversion property. Concerning inhibition, while DFT assumes distance-dependent inhibitory interactions between alternatives, LCA is based on global, constant inhibition. Additionally, and contrary to the linear dynamics of DFT where activations are unconstrained, LCA truncates negative activations to zero. This constraint is consistent with many theories of perceptual choice, where the propagation of negative activations via inhibitory links has computationally undesired consequences. Finally, similarly to the context-dependent advantage model, the LCA framework incorporates the loss-aversion property explicitly, by introducing a non-linear, asymmetric value function which weighs disadvantages more than advantages (figure 2.4(b)). LCA theory is instantiated in a four-layered connectionist network, as in figure 2.4(a). At each time step the attention of the decision maker is focused randomly on one dimension. Preprocessing takes place in the first three layers: at the first layer the 2-D characterization of each alternative in the choice space (figure 2.3(b)) is encoded, while at the second layer a weighted utility for each option is computed with respect to the active dimension. At the third layer each option is evaluated with reference to the others, as follows:

I_1 = V(d_12) + V(d_13) + I_0,    (2.4)

I_2 = V(d_21) + V(d_23) + I_0,    (2.5)

I_3 = V(d_31) + V(d_32) + I_0.    (2.6)

Figure 2.4: a) Neural network for LCA for two attributes and three options. b) A typical asymmetric value function.

In the above equations, d_ij is the advantage or disadvantage of option i relative to option j on the active dimension; V is a non-linear value function; I_0 is a positive constant which promotes the alternatives in the choice space and ensures that activations will not get stuck at negative values and thus be truncated to zero. A typical asymmetric value function is depicted in figure 2.4(b) and is described as follows:

V(x) = z, if x > 0;  V(x) = −(z + z²), if x < 0;  where z = log(1 + |x|).    (2.7)

The preprocessing stage in the first three layers creates the inputs to the leaky competing accumulators at the fourth layer. The activations of the latter are given by the following equation:

A_i(t + 1) = λ A_i(t) + (1 − λ)[I_i(t) − β Σ_{j≠i} A_j(t) + ξ_i(t)].    (2.8)

In the above equation, λ is the leak (neural decay), β is the global mutual inhibition, ξ is normally distributed noise with zero mean and SD = σ, and the I_i are the inputs computed at the preprocessing stage (equations 2.4-2.6).
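Equations 2.4-2.8, with the value function of equation 2.7, admit a similar simulation sketch. The option coordinates, λ, β, σ, I_0 and the horizon below are illustrative assumptions, not fitted parameters; the run shows the asymmetric value function producing the attraction effect:

```python
import numpy as np

rng = np.random.default_rng(1)

def V(x):
    """Asymmetric value function of eq. 2.7: losses loom larger than gains."""
    z = np.log(1.0 + np.abs(x))
    return np.where(x >= 0, z, -(z + z**2))

# Illustrative (economy, quality) coordinates; the decoy D is close to A.
M = np.array([[3.0, 1.0],    # A
              [1.0, 3.0],    # B
              [2.5, 0.5]])   # D
n, lam, beta, sigma, I0 = 3, 0.94, 0.2, 0.3, 1.0
steps, trials = 200, 1000

wins = np.zeros(n, dtype=int)
for _ in range(trials):
    act = np.zeros(n)
    for _ in range(steps):
        w = rng.integers(2)                       # attend one dimension at random
        d = M[:, w][:, None] - M[:, w][None, :]   # pairwise (dis)advantages d_ij
        I = V(d).sum(axis=1) + I0                 # eqs. 2.4-2.6 (V(0) = 0 for self)
        noise = sigma * rng.standard_normal(n)
        act = lam * act + (1 - lam) * (I - beta * (act.sum() - act) + noise)  # eq. 2.8
        act = np.maximum(act, 0.0)                # truncate negative activations
    wins[np.argmax(act)] += 1
print(wins / trials)   # attraction effect: A beats B; the dominated D trails
```

Here the effect comes entirely from the value function: B suffers two large, super-linearly penalized disadvantages (against A and D), while A suffers only one.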


The principles on which LCA relies in order to account for the contextual preference reversals had been encountered in previous models of choice, but they had never before been combined under a unified framework. The attraction and compromise effects are attributed to the loss-aversion property, as in the context-dependent advantage model, whereas the similarity effect is a result of the stochastic alternation of attention across attributes, as in EBA and DFT. In particular, the attraction effect is obtained because the dissimilar option B has two large disadvantages, relative to A and D, while option A has only one large disadvantage, relative to B (figure 2.3(b)). Accordingly, the compromise effect occurs because the middle option C has two small disadvantages relative to the extreme options A and B, while the latter each have one small disadvantage relative to the compromise option and one large disadvantage relative to each other. Finally, the similarity effect is a result of the correlation in the activations of the similar options: the preferences for the similar options rise together when their common supporting dimension is attended and fall together when the non-supporting dimension is scanned. Thus, when the attention-switching mechanism favors the similar options they will be chosen, but they will split their shares, whereas the dissimilar option, being anti-correlated with the others, will not split its wins. Consequently, the accounts which LCA provides for the attraction and compromise effects diverge from those given by DFT. While LCA is based on the explicit implementation of the loss-aversion property, DFT combines distance-dependent inhibition and linear dynamics to explain the phenomena. This differentiation in the core principles of the two theories allows for discriminating them on the basis of the distinct predictions their different mechanistic assumptions provide.
Before discussing theoretical issues and ways to discriminate the two prominent neurocomputational theories of value-based choice, it is useful to present two alternative theories under the dynamical framework; the ECHO model (Guo and Holyoak, 2002) and Decision by Sampling (Stewart et al., 2006).

2.3.4 The ECHO Model

The explanatory power of DFT and LCA renders them the salient approaches in the domain of dynamical theories of value-based decision making. However, Holyoak and Simon (Holyoak and Simon, 1999) and Guo and Holyoak (Guo and Holyoak, 2002) have proposed an alternative theory of choice which accounts for two of the three contextual preference reversals: the similarity and the attraction effect. According to the ECHO model, the decision paradoxes can be understood as value maximization constrained by categorization processes. The ECHO theory is instantiated in the symbolic connectionist network of figure 2.5. A special node at the first layer, called the external driver, becomes active when a decision problem is presented. It is directly connected to the attribute nodes at the second layer with constant weights. The attribute nodes are connected to the alternative nodes with bidirectional excitatory links, which allow activations to pass back and forth between the second and third layers. The weights between the second and third layers correspond to the values of the options on each attribute. The alternative nodes at the third layer compete via global lateral inhibition. The model runs in an iterative manner: the decision process is initiated with the activation of the driver node, which injects constant activation into the attribute nodes. The latter activate the alternative nodes, which send positive feedback back to the second layer. The dynamics of the system are non-linear, since activations are kept bounded between zero and one. A decision is issued once the changes in the activations fall below a threshold (i.e., the network settles).

Figure 2.5: The connectionist network for the ECHO model.

The ECHO model accounts for the similarity and the attraction effects by considering a sequential two-stage process. At the first stage the two similar options are processed together, and their final activation states are used as the initial activation states at the second stage, where all three alternatives are compared. The rationale is that the similar options are grouped together in a manner akin to perceptual categorization, while the processing assumption is that a within-group binary comparison precedes the overall triadic comparison. For the similarity effect, the binary comparison results in the two similar options competing with each other and ending up with equal activations lower than the baseline.[4] The low activations of the similar options are carried over to the second stage, where the dissimilar option joins the comparison at the baseline initial activation. As a result, the dissimilar option ends up with a higher choice probability than the two similar alternatives. The same two-stage process is employed in the explanation of the attraction effect. The two similar options are grouped together, and the first-stage comparison results in the target option bearing a higher activation than the baseline, due to its superiority over the decoy. The advantage of the target option is carried over to the triadic comparison and results in its having a higher choice probability than the other options in the choice space. The ECHO model makes the novel prediction that as one option becomes dominant during deliberation, this dominance will enhance the activation of the attributes which are favored by the dominant alternative, due to the positive feedback from the third to the second layer of the network. This prediction suggests that the evaluation of the attributes becomes biased over time in a "winner-take-all" fashion. The hypothesis has been tested experimentally by asking decision makers to evaluate the importance of each attribute at several points of the decision process. Indeed, it was found that the ratings increase for those attributes that are favored by the dominating alternative during the deliberation process (Simon et al., 2004).
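The two-stage carry-over idea can be illustrated with a toy competitive network. This is a loose sketch, not the published ECHO implementation: the attribute layer and driver are collapsed into fixed external inputs, and the update rule and all constants are invented for illustration:

```python
import numpy as np

def settle(e, a0, gamma=0.1, kappa=1.0, tol=1e-6, max_iter=10000):
    """Iterate a toy competitive network until the activations settle.
    e: external input per alternative; a0: initial activations.
    Leaky self-excitation plus global lateral inhibition, clipped to [0, 1]."""
    a = np.array(a0, dtype=float)
    for _ in range(max_iter):
        inhibition = a.sum() - a                   # input from the competitors
        new_a = np.clip((1 - gamma) * a + gamma * (e - kappa * inhibition),
                        0.0, 1.0)
        if np.abs(new_a - a).max() < tol:          # the network has settled
            return new_a
        a = new_a
    return a

# Illustrative inputs (summed attribute support): A and B are matched,
# the decoy D is dominated by A.
e_A, e_B, e_D = 1.2, 1.2, 0.9
baseline = 0.5

# Stage 1: the similar pair A, D is compared on its own.
a_A, a_D = settle(np.array([e_A, e_D]), [baseline, baseline])
# Stage 2: B joins at baseline; A and D carry over their stage-1 states.
final = settle(np.array([e_A, e_B, e_D]), [a_A, baseline, a_D])
print(final)   # A keeps its stage-1 head start over the otherwise-equal B
```

Starting the triadic comparison from an all-baseline state would leave A and B tied; the within-group stage is what breaks the symmetry in favor of the target.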
While LCA and DFT essentially restate, under a dynamical framework, existing principles drawn from previous descriptive models (e.g., EBA and the context-dependent advantage model), the ECHO model combines the rationality assumption with simple psychological principles, suggesting that the decision process is value maximization constrained by categorization. Although the ECHO model does not account for the compromise effect, it provides interesting insights. In particular, it stresses that choice is driven by similarity, treated as perceptual grouping, in a two-stage process.

[4] At the beginning of each processing stage, the options that have not previously been preprocessed bear equal initial baseline activations.


The hypothesis that similarity modulates decision behavior is also emphasized in DFT, through a different mechanism, namely distance-dependent inhibition. In LCA the same concept is captured by constant inhibition together with the asymmetry of the value function: the more dissimilar two options are, the more they are penalized by the value function (i.e., lower levels of activation) and thus the less they compete. The idea of similarity driving choice as a perceptual process (i.e., categorization) is also applicable in the Decision by Sampling framework (Stewart et al., 2006), which is reviewed in the next section.

2.3.5 Decision by Sampling

The Decision by Sampling (hereafter DbS) theory (Stewart et al., 2006) attempts to explain decision behavior by assuming that choice emerges from a limited set of cognitive tools. Contrary to the traditional descriptive theories of decision making, which are modifications of the normative theory, in DbS there are no underlying psychoeconomic scales. Instead, the subjective value of an attribute is its rank within a decision sample consisting of attribute values both present in the decision context and drawn from memory. Thus the subjective value of a target is constructed online using simple cognitive processes such as binary comparisons and frequency accumulation. Drawing on simple psychological principles, DbS accounts for a set of decision phenomena such as loss-averse value functions, hyperbolic temporal discounting and the overestimation of small probabilities. Being descriptively adequate and explanatorily robust in several domains of value-based choice, the DbS core principles seem to provide a promising framework for explaining the contextual preference reversals in multi-attribute, multi-alternative choice. The authors of DbS have described a model of the time course of multi-attribute choice which combines the principles of their theory with leaky competing accumulators in a two-stage decision process (N. Stewart, personal communication). The first stage involves a perceptual categorization of the options, with the more similar alternatives clustered together (an approach alike to that of the ECHO model). This perceptual grouping separates processing into within-group and between-group comparisons. In between-group comparisons, the options that are clustered together are treated as one option.


At the second stage a sequential sampling process, similar to LCA and DFT, is employed. At each time step the attention switches stochastically across dimensions and between binary comparisons; thus at each moment only one pair of options is compared with respect to the active attribute. The leaky competing accumulators, analogous to the fourth layer of the connectionist network in figure 2.4(a), encode the preference state for each alternative by integrating, with decay, the frequency of successful comparisons for each option. Note that each successful comparison for an option increases its counter by one, while the counter of the option which was found disadvantageous is not affected. As in LCA, the leaky competing accumulators interact via constant inhibitory links (i.e., global inhibition). Attention is hypothesized to switch faster across dimensions than between binary comparisons, while the latter are more frequent for options which are more similar. For the similarity effect (figure 2.6(a)), options A and S will be grouped together, since they are very similar. Comparisons within the group give options A and S equal choice probabilities, since each option is better than the other on one dimension only. However, during between-group comparisons (i.e., A-S against B), options A and S split their shares, since they are processed as a group and not separately (i.e., A and S are indistinguishably good on the Y dimension), thereby favoring the dissimilar option B.

Figure 2.6: Choice set for a) the similarity effect, b) the attraction effect and c) the compromise effect.

A similar explanation applies to the attraction effect (figure 2.6(b)): options A and D are grouped together, and within-group comparisons favor option A in both dimensions. During between-group comparisons the grouped options A and D split their victories. Nevertheless, in the overall evaluation option A carries over an advantage from the within-group comparisons and will finally beat option B.


For the compromise effect (figure 2.6(c)), none of the options is significantly more similar to another and thus the perceptual preprocessing stage will not return any groups. Three binary comparisons will take place during the second stage of the process: A with B, A with C and B with C. As mentioned above, the more similar two options are, the more frequently their corresponding binary comparison takes place. Therefore the comparisons A-C and B-C will be more frequent and the compromise option C will be favored, since it will be considered more often. For example, suppose the A-C and B-C comparisons each occur 40% of the time, and the A-B comparison only 20%. Each option in a pair will be advantageous during half of the relevant comparisons. Thus option C will have an overall choice probability of 20% + 20% = 40%, while each of the options A and B will take an overall share of 20% + 10% = 30%. The accounts which DbS provides for the similarity and the attraction effects derive from the principle of perceptual categorization during the preprocessing stage. Similarly to the ECHO model, the grouped options carry over their relative advantages/disadvantages to the triadic overall evaluation. The second important mechanism in the DbS instantiation for multi-attribute choice is that the more similar two options are, the more frequent their binary comparisons will be. Consequently the hypothesis that similarity mediates choice, which is captured by other mechanisms in DFT, LCA and the ECHO model, is conceptualized in DbS as distance-dependent frequency of binary comparisons. In summary, all four theories which were outlined above share the same hypothesis that choice is driven by similarity.
Nevertheless this concept is implemented by different mechanisms in each theory’s model: DFT relies on distance-dependent inhibition; LCA is based on global inhibition and a loss-averse value function; the ECHO model assumes the processing of the grouped options first, in a two-stage decision process; while DbS combines perceptual categorization with similarity-dependent frequencies of the binary comparisons. Further theoretical and experimental research is required in order to evaluate each theory’s mechanisms. The next section discusses possible ways to qualitatively discriminate the models on the basis of the different predictions each one provides.
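The frequency arithmetic for the compromise effect can be checked directly. The 40/40/20 split is the text's illustrative assumption; each member of a pair wins half of that pair's comparisons:

```python
# Attention frequencies for the three binary comparisons (illustrative split)
freq = {("A", "C"): 0.4, ("B", "C"): 0.4, ("A", "B"): 0.2}

share = {"A": 0.0, "B": 0.0, "C": 0.0}
for (x, y), f in freq.items():
    share[x] += f / 2       # each option wins half of that pair's comparisons
    share[y] += f / 2

# share -> C: 0.2 + 0.2 = 0.40; A and B: 0.2 + 0.1 = 0.30 each
```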


2.3.6 Discussion on Dynamical Theories of Choice

Scientific explanation is thought to have two main aims: ”successful predictions of the behavior of things in experimentally defined conditions, and theoretical representation of the causal structure of the world from which behavior follows” (Hesse, 1967). Thus in evaluating a theory of cognition, it is important to assess both its predictive power and its representation. A common approach in evaluating the predictions of a cognitive theory is to adjust the corresponding computational model to behave similarly to the natural cognizer in a particular respect and, once resemblance is achieved, to apply the model to psychological tests for which it was not specifically designed. That way it can be tested whether the theory is generic enough or merely tailored to account for a particular subset of behavior. The evaluation of a theory’s representation of cognitive mechanisms can also be performed on the basis of the behavior of the model. This is because we usually lack empirical and neurophysiological data which would allow ruling out a priori particular cognitive architectures. Therefore we assume that the more accurate a theory’s predictions are, the closer to the natural system its mechanistic assumptions will be. Thus the core instrument in our attempt to contrast theories of choice will be the evaluation of the predictions each one provides. The absence of adequate behavioral data concerning contextual preference reversals urges us to focus in the first place only on the descriptively adequate theories. The ECHO model is ruled out, as with its current configuration it cannot predict the compromise effect. This is not the case with DbS, which seems able to account for the three effects simultaneously. However the DbS instantiation for multi-attribute choice has not yet been formalized mechanistically.
For example, the mechanism of perceptual grouping (i.e., how close two options must be in order to be clustered together) and the mechanism of distance-dependent frequencies of the binary comparisons (i.e., which function characterizes them) have not been explicitly specified. Keeping in mind the promising framework which DbS provides for future computational investigations, we focus on discriminating the two descriptively adequate and computationally realizable theories: DFT and LCA. As we have discussed above, DFT is totally linear and closed form and relies on distance-dependent inhibition, while LCA is


non-linear and encompasses the loss-aversion property explicitly. This divergence in the mechanistic assumptions of the two theories has been the subject of a heated debate. Next we run through each side’s theoretical arguments. The linear dynamics of DFT, with the loss-aversion property emerging from distance-dependent inhibition and negative activations, motivated LCA authors to develop their alternative theory. As Usher and McClelland note in their original paper (Usher and McClelland, 2004), the DFT mechanisms cannot provide a unified account of all the situations in which loss-aversion is found in decision making, while the propagation of negative activations can have undesired computational consequences. Concerning different instances of loss-aversion in behavior, three cases relevant to multi-attribute choice can be identified (Kahneman and Tversky, 1991): cases in which subjects prefer improvements rather than trade-offs, situations in which decision makers prefer small advantages/disadvantages to large ones, and finally cases where individuals prefer to retain their status quo rather than exchange it for an equally beneficial situation. All three instances of loss-aversion can be grounded in a loss-averse value function, as LCA authors suggest. DFT accounts for the improvement-instead-of-trade-offs effect by considering it an instance of the attraction effect: the current situation which is about to be abandoned is allowed to interact with the active options and, since it is dominated by the close improved option, it boosts the latter via negated inhibition. This mechanism alone cannot account for the status quo effect. The currently owned option is equidistant from the other available alternative, and the two objects inhibit each other equally. The way that DFT authors accounted for this effect (Busemeyer and Townsend, 1993) is by assuming a higher initial preference state for the currently possessed option.
However this extra assumption is not clearly justifiable unless one introduces a mechanism similar to explicit loss-aversion. Apart from this, in paradigms where the subjects are given time to deliberate, the initial advantage of the possessed option will be absorbed and lost in the dynamic leaky accumulation of information. Finally, the preference for small changes instead of large ones cannot be accommodated in DFT. The possessed option S in that case is located at one extreme of the choice space, in a choice set which consists of A, S, B, and is set unavailable (figure 2.3(b)). It has been shown experimentally that the decision maker picks the alternative B which is closest to the previously owned option (Usher et al., 2008). It is not clear


that the previously possessed option S is active in the decision process (i.e., if it is not, the same problems as in the status quo case emerge), but let us consider that it is; since the unavailable option S is not dominated by the closest option B, it will not bear negative valences and consequently it will not boost the activation of B. On top of that, options B and S will be correlated in time as in the similarity effect, thereby favoring the choice of the distant option A. Apart from suggesting that the loss-aversion property is not fully captured in DFT, LCA authors envisage that further problems might arise from keeping the dynamics totally linear. Allowing the propagation of both positive and negative activations, and thereby having a closed-form, linear model, is very convenient since it enables us to obtain predictions using linear algebra rather than time-consuming simulations. However, the propagation of negative activations might have deleterious consequences in cases where more than three options are present in the choice set. In that situation DFT has no mechanism to rule out the uninformative options (i.e., those with very low additive utilities) and as a consequence a cascade of negative activations transmitted via inhibitory links will blow up the system. As opposed to DFT, LCA authors propose that negative activations are truncated to zero and thus uninformative options are ruled out at an early stage of the decision process. There is also a biological motivation for keeping activations strictly positive, since the latter are assumed to correspond to neural firing rates. The final point of criticism from LCA authors towards DFT concerns the treatment of the distance dependency. In their original paper (Roe et al., 2001), Roe and colleagues do not specify an explicit function for the inhibitory mechanism. Instead they avoid overspecification by stating that such a function should decrease proportionally to the distance between options.
Thus, in reproducing the effects, they defined one value for the inhibitory link between distant options and another for close ones. This means that options A and C in the compromise effect have the same inhibitory connection as options B and S in the similarity case, since both pairs are treated as consisting of nearby options (figure 2.3(b)). This has specific behavioral implications: in a choice set consisting of A, B and C, options A and C are conceived to be as similar as B and S are in a choice set consisting of A, B and S. In raising the point of explicitness in distance dependency, LCA authors do not attempt to reject DFT on that basis. They rather emphasize the necessity and value of introducing an explicit inhibition function into DFT.


In summary, LCA authors criticize DFT from three different viewpoints. Firstly, the emergent loss-aversion which DFT suggests covers only a subset of the loss-aversion encountered in behavior. Secondly, the transmission of negative activations through inhibitory links can have uncontrolled computational consequences, and finally, the similarity-dependent inhibition mechanism is described rather than explicitly specified. These issues were addressed by the DFT authors in a theoretical reply (Busemeyer et al., 2005). Concerning loss-aversion, DFT authors point out that there is no logical inconsistency between their model and loss-averse behavior. One could code the inputs to the DFT system as positive or negative with respect to some reference point and apply a loss-aversion type of transformation to these inputs. Nevertheless this statement has to be grounded theoretically; one cannot have the freedom to code inputs at will in order to accommodate paradoxical effects; instead, coding must be the result of applying a consistent and well-specified mechanism. DFT authors describe a computational model which can predict reversals between buying and selling prices in decision making under risk (Busemeyer and Johnson, 2004). This type of reversal can be viewed as a special case of the status quo effect. However, the computational model that is adopted aims to find selling and buying prices for a given gamble using a search process based on Markov chain theory. Hence there is no apparent overlap between this model, tailored to account for preference reversals under risk, and DFT for multi-attribute choice. The status quo effect for multi-attribute, multi-alternative choice still resists an explanation under the DFT framework. This is not the case with the instance of loss-aversion in which preferences are stronger for small changes rather than large ones.
Busemeyer and Johnson (Busemeyer and Johnson, 2004) have suggested that DFT can accommodate this situation by introducing a third attribute called availability to distinguish the reference from the available options. By being inherently unavailable, the reference would have a lower overall valence than the other alternatives, becoming dominated and boosting the similar alternative by negated inhibition. Thus by this extra assumption, DFT authors accommodate this type of loss-averse behavior, as a special case of the attraction effect. Further experimental research needs to be done in order to assess whether the reference interacts with the active alternatives in the decision process. It is theoretically clear though that while LCA explains the attraction effect as a direct derivative of the loss-aversion


property, DFT subordinates loss-aversion instances in behavior as special cases of the attraction effect. The next point which the DFT authors discuss is the linear dynamics of their model, which allow the propagation of negative activations through inhibitory links. Busemeyer and colleagues focus on providing a biologically consistent justification of the negative states, as activation that is suppressed below the baseline firing rate of a neural unit. The negated inhibition encountered in their model can be viewed as analogous to the phenomenon of disinhibition, in which the suppression of inhibitory units can force the previously inhibited units to fire above the baseline rate. Although this justification is reasonable, the linearity of DFT imposes problems of biological plausibility. Consider an inhibited neural unit which fires in the presence of a particular input. We assume that the firing rate of the unit will be below the baseline (i.e., negative in DFT) on account of the inhibition it receives. Now if the inhibiting neuron is suppressed, then the firing rate of the unit potentially exceeds the baseline rate. However, the firing rate of the disinhibited unit cannot be higher than its firing rate when no inhibitory connection with the inhibiting neuron exists at all. In other words, the boost which the previously inhibited neuron gets from disinhibition must be bounded from above. In DFT the linearity of the model does not set such an upper boundary. The existence of neurons or neuronal populations (i.e., consistent with the representations in DFT, where preference states stand for large overlapping neuronal populations) which, through disinhibition, potentially fire at infinitely large rates is biologically implausible. One could argue that DFT captures mathematically the concept of disinhibition in order to approximate behavior, without aiming to provide precise or plausible neural accounts.
Even in this case, the linear dynamics of a model which conceptualizes disinhibition can have undesired computational consequences in cases where activations approach infinity. DFT would then be criticized not on the basis of biological plausibility but on its descriptive inadequacy, since the dynamical system will blow up, providing inconsistent predictions. Although this issue has not been thoroughly investigated, DFT authors recognize that the linear approximation which their model attempts will at some point reach its limits, and that they will then need to provide a non-linear version of DFT. The final point of criticism which LCA authors raised questions the reasonableness
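The blow-up concern can be illustrated with a toy linear system. This is our own construction with illustrative parameters, not taken from either paper: one clearly disadvantaged option goes negative and, through the inhibitory links, pumps the others up; when leak plus inhibition exceeds one, the difference modes of the system diverge, whereas truncating activations at zero, as LCA does, removes the cascade.

```python
import numpy as np

n, leak, beta = 4, 0.95, 0.08                 # illustrative parameters
inputs = np.array([1.0, 1.0, 1.0, -2.0])      # one clearly uninformative option

def simulate(truncate, steps=300):
    x = np.zeros(n)
    for _ in range(steps):
        # leaky accumulation with global inhibition from the other units
        x = leak * x - beta * (x.sum() - x) + inputs
        if truncate:
            x = np.maximum(x, 0.0)            # LCA-style non-linearity
    return x

linear = simulate(truncate=False)             # activations grow without bound
lca = simulate(truncate=True)                 # activations stay bounded
```

Here the difference modes have effective gain leak + beta = 1.03 > 1, so the linear run diverges while the truncated run converges.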


of the selection of inhibitory parameters in DFT. As we mentioned above, although the distance between the similar option and its target is smaller than the distance between the compromise option and its targets, the inhibitory connections have the same strengths for both effects. On that point DFT authors mention: ”Why did we keep the parameters constant across applications? Our purpose was to make it perfectly clear that we could reproduce all three findings with exactly the same parameters. If we are free to adjust the parameters in an appropriate manner for each application, then it becomes even easier for DFT to produce the desired results” (Busemeyer et al., 2005). However, it is inconsistent to describe a distance-dependent inhibition mechanism and to keep the same parameterization for different distances. DFT authors imply that alternatively they could adjust the parameters freely and at will in order to accommodate each effect. Obviously this would be of little value and would not overcome the problem of lacking an explicit similarity-dependent competition function. Without implying that there cannot be such a function capturing the three effects simultaneously, LCA authors simply stress the necessity of stating explicitly the mechanisms on which DFT relies. This explicitness becomes necessary, as using uniform inhibition values introduces considerable model freedom which encumbers the qualitative discrimination between DFT and LCA. Thus the aim of the first set of computational investigations which follow will be to specify the class of distance-dependent inhibition functions which enable the DFT model to account for the behavioral data. After that, the loss-averse value function in the LCA model will be explored. Having examined the core mechanisms of the two models, we will next evaluate them with respect to the predictions they provide. This evaluation will help us to identify the limits of each theory and to motivate future experimental research towards resolving which set of principles mediates contextual preference reversals: intrinsic loss-aversion and non-linear dynamics, or emergent loss-aversion from linear dynamics.

Chapter 3 Exploring Core Mechanisms in DFT and LCA

3.1 Introduction and Methodology

Cognition is so complex that simulation models are often the only source of empirical verification for a theory, and help to guide the theory’s evolution by organizing the data and motivating further experiments (Churchland and Sejnowski, 1994). If the simulation behaves similarly to a natural cognizer, the underlying model and theory are considered to be good. This is the rationale which we will follow in evaluating DFT and LCA. Both theories are computationally viable; nevertheless further investigations are required in order to crystallize a basic mechanism for each one, namely distance-dependent inhibition for DFT and the loss-averse value function for LCA. Having explicit instantiations of the theories, we can next gather predictions which will drive their qualitative discrimination. The absence of adequate behavioral data concerning the nature of the reversal effects makes the search for ”optimized” mechanisms for the theories quite hard. Nevertheless we can reject instances of the mechanisms we investigate if the latter cannot capture all three effects simultaneously. This baseline criterion motivates a methodology which parameterizes the effects across the choice space. The virtue of a parametric investigation is twofold: firstly, it provides a graphical depiction of the places in the choice space where the effects take place (i.e., the extent of the effects under several instances of the theories), and secondly, it provides predictions concerning


the magnitude of the effects as a function of the separation between the alternatives. Predictions for each instance of DFT and LCA will be gathered parametrically as follows: two options (e.g., A and B) remain constant, while a third option (e.g., C) is moved across the two-dimensional space (Fig. 3.1). We output results only for C not much above the diagonal between A and B, since above it option C is always chosen. In the example used to outline the parametric method, the subject of evaluation is the distance-dependent inhibition function for DFT, which is illustrated in figure 3.1(a).

Figure 3.1: a) Sigmoidal inhibitory function. b) Color plot for choice preference: for each point, the RGB color scale is determined by the choice probabilities (red: A; green: B; blue: C). c) Gray-scale plot showing the boost that option A receives relative to B as a result of introducing option C: P(A|A,B,C) − P(B|A,B,C). d) Plot showing the magnitude of the compromise effect as option C moves along the diagonal trade-off line between A and B. Red line: P(C|A,B,C)/(P(A|A,B,C)+P(C|A,B,C)) − 0.5. Green line: P(C|A,B,C)/(P(B|A,B,C)+P(C|A,B,C)) − 0.5.


For each unique combination of A, B and C the preferences for each option are recorded. Two alternative ways of expressing the output of the parametric investigation are used. In figure 3.1(b) each point of the two-dimensional space is colored according to the RGB scale, where red is the preference for A, green the preference for B and blue the preference for C in the triadic choice. For example, a point colored towards red suggests that the introduction of option C at this particular point favors the preference for option A. We observe (figure 3.1(b)) that the area below and left of option A (3,1) is colored red, implying an attraction effect (i.e., adding the decoy C near A enhances the preference for A). In order to better illustrate the magnitude of the attraction and similarity effects, we use a second output diagram (figure 3.1(c)), which depicts the boost that A receives as a result of the introduction of C into the choice set: P(A|A,B,C) − P(B|A,B,C). Here we use a gray scale, where brighter points correspond to a stronger enhancement of the preference for A by the introduction of C. In addition to the attraction effect, figure 3.1(c) also clearly illustrates the similarity effect as a thin white line close to option B (1,3) and adjacent to the diagonal (i.e., the introduction of an option C similar to, neither dominating nor dominated by, option B boosts the preference for the dissimilar option A). The final output diagram examines the compromise effect, for C-alternatives chosen on the diagonal line between A and B (figure 3.1(d)). In this diagram, the magnitude of the compromise effect relative to A (red line: P(C|A,B,C)/(P(A|A,B,C)+P(C|A,B,C)) − 0.5) and relative to B (green line: P(C|A,B,C)/(P(B|A,B,C)+P(C|A,B,C)) − 0.5) is plotted as a function of the location of C on the diagonal. Positive values for both measures correspond to locations where the introduction of option C generates a compromise effect. In this example, the compromise effect occurs in a narrow area around the centre of the diagonal line.
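The parametric sweep can be sketched as follows. The fixed positions of A and B and the grid resolution are illustrative, and `choice_probabilities` is a placeholder for a full DFT or LCA simulation; here a softmax over additive utilities stands in for it, purely so the sketch runs end to end:

```python
import numpy as np

A, B = np.array([3.0, 1.0]), np.array([1.0, 3.0])   # fixed options

def choice_probabilities(a, b, c):
    """Placeholder for a model run: softmax over additive utilities."""
    u = np.array([a.sum(), b.sum(), c.sum()])
    e = np.exp(u - u.max())
    return e / e.sum()

# Sweep option C over a grid of the two-dimensional attribute space and map
# the triadic choice probabilities to an RGB color (red: A, green: B, blue: C)
grid = np.linspace(0.0, 4.0, 41)
rgb = np.zeros((len(grid), len(grid), 3))
for i, quality in enumerate(grid):
    for j, economy in enumerate(grid):
        rgb[i, j] = choice_probabilities(A, B, np.array([economy, quality]))
```

Each pixel is then a probability triple and the array can be rendered directly, e.g. with `plt.imshow(rgb, origin="lower")`.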

3.2 Investigating Inhibition Functions in DFT

3.2.1 Motivation

An important feature of the DFT account of the reversal effects is the mechanism of local, or distance-dependent, competition: the inhibition between alternatives is variable and depends on the similarity between the objects. Until now, an explicit inhibition function that parametrically quantifies the distance dependency has not been tested. Instead the DFT model has set up the values of the inhibitory connections between the alternatives separately, for a number of separate choice situations. While the choice of these values is not inconsistent, it is of interest to explore a unified parametrization of the inhibition function and its consequences for the choice patterns predicted by DFT. In principle, local inhibition is not required for the explanation of the similarity effect. The latter is accounted for on the basis of the stochastic alternation between dimensions and could potentially be reproduced with global inhibition only. Large inhibition between the pair B-S would damage the popularity of the dissimilar A, since it would tend to anti-correlate the activations of B and S, which are similar in both dimensions (figure 3.2(b)). This is not the case for the compromise and the attraction effects. The attraction effect requires high coupling between the decoy D and the target A, while option B must not interact with the A-D pair. Preliminary investigations on the compromise effect suggest that the inhibition between options A-C and B-C must be relatively high, while A and B ideally should not interact. This is because the higher the inhibition between the compromise and the extreme objects, the higher the correlation between A and B, which mediates the effect, will be.

Figure 3.2: Connectionist network for DFT and a 2-D choice space.

However, the distance between A-C and B-C is much larger than the distance between B and S in the similarity effect, even though the competition in the compromise case has to be larger than that in the similarity case. As shown by Roe et al. (2001), it was possible to find a single set of inhibition values that satisfies all three effects simultaneously. This corresponds to an inhibition function that has the shape of a step function of the

type:

    β(x) = β0,  if x < x0
           β1,  if x ≥ x0                                        (3.1)

(the values used in Roe et al. (2001) are β0 = .025, β1 = .0001, x0 > 1.5). Finally, it is important to note the role of the additive noise at the third layer of the network, where the valences are computed, for DFT’s account of the effects (figure 3.2(a)). More noise damages the similarity effect, since it anti-correlates the similar options, but it favors the reproduction of the compromise effect, pushing the extreme options to rise and fall together across time. Thus the noise parameter has an impact on the effects similar to that of inhibition. DFT could explain the effects more easily by freely adjusting the additive noise in each situation. This would give the model more flexibility, but a noise parameter which changes across different instances of the choice set is hardly justifiable. On top of this, it is impossible to gain insight into the inhibition mechanism without keeping the noise parameter fixed. Consequently, in the following investigations for DFT we set the noise parameter equal to 0.2. Additionally, the self-connection coefficient, which characterizes the neural leak in the accumulation of preferences, is set to 0.95, similar to the value of 0.94 used in Roe et al. (2001). Given these assumptions, a natural function for the lateral inhibitory links between different objects is sought in the next sections.
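A minimal sketch of the accumulation process with the step inhibition function (3.1) is given below. It is illustrative only: the contrast computation is simplified relative to the full valence layers of DFT, the option coordinates are ours, and the threshold is set to x0 = 1.5 for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

M = np.array([[3.0, 1.0],    # option A: (economy, quality)
              [1.0, 3.0],    # option B
              [2.0, 2.0]])   # option C, the compromise

def beta(d, b0=0.025, b1=0.0001, x0=1.5):
    """Step inhibition (3.1) as a function of inter-option distance."""
    return b0 if d < x0 else b1

n = len(M)
S = np.zeros((n, n))         # feedback matrix: leak on the diagonal,
for i in range(n):           # distance-dependent inhibition off it
    for j in range(n):
        d = np.linalg.norm(M[i] - M[j])
        S[i, j] = 0.95 if i == j else -beta(d)

P = np.zeros(n)
for _ in range(1000):
    w = rng.integers(0, 2)                    # attention samples one dimension
    v = M[:, w] - M[:, w].mean()              # simplified contrast (advantage)
    P = S @ P + v + rng.normal(0.0, 0.2, n)   # leaky, competing accumulation
```

With these coordinates the A-C and B-C distances fall below the threshold (strong inhibition, 0.025) while the A-B distance falls above it (0.0001), matching the pattern the step function is meant to produce.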

3.2.2 Piece-wise Linear Inhibition Functions

We start with the simplest type of decreasing functions of distance, which are piece-wise linear. The nine functions explored are illustrated in figure 3.3(b) and span a range of distances determined by the diagonal in figure 3.3(a). The inhibition functions were selected as follows. The first series of three functions (1a, 2a, 3a, solid lines) reflects global inhibition of different magnitudes. The series (1a, 1b, 1c) are inhibition functions that start from the same magnitude (at x = 0.8) and decrease with different slopes. For 1b the inhibition becomes very small at distance |A − B|, while for 1c the inhibition becomes very small already at a distance of |A − C|. The series (2a, 2b, 2c) and (3a, 3b, 3c) are analogous in slope to (1a, 1b, 1c) but have a lower starting magnitude of inhibition.
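The nine functions can be written down directly. The intercepts and slopes are those listed in Table 3.1; the truncation at zero is our own assumption (a negative inhibition strength is not meaningful):

```python
def make_linear(intercept, slope):
    """Piece-wise linear inhibition, truncated at zero."""
    return lambda x: max(0.0, intercept - slope * x)

inhibition = {
    "1a": make_linear(0.08, 0.0),   "1b": make_linear(0.08, 0.02),
    "1c": make_linear(0.08, 0.08),  "2a": make_linear(0.06, 0.0),
    "2b": make_linear(0.06, 0.015), "2c": make_linear(0.06, 0.06),
    "3a": make_linear(0.04, 0.0),   "3b": make_linear(0.04, 0.01),
    "3c": make_linear(0.04, 0.04),
}
```

For example, at the A-B distance (about 2.83 for the coordinates used here) the constant function 1a still inhibits at full strength, while 1b has nearly vanished and 1c has reached zero.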


Figure 3.3: Experimenting with linear inhibition functions. a) Various alternatives in the 2-D choice space. b) Linear inhibition functions with different slopes.

The DFT model with distance-dependency determined by each of these nine functions was simulated for binary and trinary choices [(A, B, D); (A, B, C); (A, S, B)], and the choice probabilities, as well as the magnitudes of the three reversal effects, are given in Table 3.1.

                             Similarity                   Attraction                   Compromise
Function                 P(A)  P(B)  P(S)  Effect    P(A)  P(B)  P(D)  Effect    P(A)  P(B)  P(C)  Effect
(1a) f(x) = 0.08         0.46  0.19  0.35   0.20     0.50  0.50  0.00   0.00     0.43  0.46  0.10  -0.30
(1b) f(x) = 0.08-0.02x   0.03  0.47  0.50  -0.42     1.00  0.00  0.00   0.50     0.30  0.27  0.43   0.09
(1c) f(x) = 0.08-0.08x   0.14  0.40  0.45  -0.24     1.00  0.00  0.00   0.50     0.44  0.45  0.10  -0.30
(2a) f(x) = 0.06         0.45  0.19  0.36   0.18     0.51  0.49  0.00   0.02     0.49  0.41  0.10  -0.30
(2b) f(x) = 0.06-0.015x  0.28  0.35  0.37  -0.06     1.00  0.00  0.00   0.50     0.35  0.40  0.24  -0.10
(2c) f(x) = 0.06-0.06x   0.37  0.26  0.37   0.09     0.96  0.04  0.00   0.46     0.49  0.41  0.10  -0.34
(3a) f(x) = 0.04         0.46  0.19  0.35   0.20     0.49  0.51  0.00  -0.01     0.47  0.43  0.10  -0.30
(3b) f(x) = 0.04-0.01x   0.45  0.23  0.32   0.16     0.82  0.17  0.00   0.33     0.42  0.40  0.16  -0.22
(3c) f(x) = 0.04-0.04x   0.45  0.21  0.33   0.18     0.76  0.28  0.00   0.22     0.47  0.43  0.10  -0.30

Table 3.1: DFT choice probabilities and magnitudes of the reversal effects for linear inhibition functions; positive numbers correspond to reversal effects in the predicted direction, while negative values correspond to results against the predicted direction.

We observe, first, that both the attraction effect and the compromise effect rule out the constant functions (i.e., global inhibition). Second, we observe that the similarity effect requires a low level of local inhibition (i.e., it disappears under 1b, 1c and 2b, all of which have a local inhibition larger than .05). Third, we observe that the compromise effect requires a relatively high level of local inhibition (1b), so that the inhibition is still large enough at intermediate distances (A-C), but not at the larger ones separating the extreme options (A-B). These observations are consistent with the ideas that the similarity effect


is ideally reproduced under global inhibition (i.e., so that the correlation between similar objects is maintained), while the compromise effect requires high inhibition between the extremes and the compromise option, and no competition between the extreme objects (i.e., so that the correlational mechanism between the extreme alternatives works in favour of the popularity of the compromise object). None of the linear functions was able to capture the three phenomena simultaneously. Consequently other, non-linear inhibitory functions have to be tested. The tension between the compromise and similarity effects suggests that a natural function which keeps the inhibition constant within a radius and then allows it to decay can potentially account for the three reversal effects simultaneously. Indeed, in preliminary investigations we found that the distance dependency of the competition is adequately captured by sigmoidal functions, which resemble the properties of the step function (3.1) used initially in DFT for multi-attribute choice (Roe et al., 2001). In the next section we use the parametric method to explore different sigmoidal inhibition functions.

3.2.3 Sigmoid Inhibition Functions

We present here the investigation of the DFT account of the reversal effects under sigmoidal inhibition functions. This class of functions has the property of keeping the inhibition at a constant value when the distance between the options is below a threshold, and of decreasing the inhibition magnitude relatively steeply once this threshold is exceeded. The sigmoid threshold and slope were varied according to:

f(x) = 0.042 \cdot \frac{1}{1 + e^{20(x-1.5)}} \qquad (3.2)

f(x) = 0.042 \cdot \frac{1}{1 + e^{20(x-2.4)}} \qquad (3.3)

f(x) = 0.042 \cdot \frac{1}{1 + e^{20(x-2.8)}} \qquad (3.4)

f(x) = 0.042 \cdot \frac{1}{1 + e^{1.8(x-2.5)}} \qquad (3.5)
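For concreteness, the family (3.2)-(3.5) can be written as a single parameterised function. The following Python sketch is illustrative only (function and variable names are ours, not the thesis simulation code); it evaluates the inhibition well below and well above the threshold:

```python
import math

def sigmoid_inhibition(distance, slope, threshold, peak=0.042):
    """Distance-dependent inhibition: roughly constant at `peak` for
    distances below `threshold`, decaying sigmoidally beyond it."""
    return peak / (1.0 + math.exp(slope * (distance - threshold)))

# The four parameterisations of equations (3.2)-(3.5)
functions = [
    (20.0, 1.5),  # (3.2): early decay
    (20.0, 2.4),  # (3.3): decay just below |A-B| = 2.82
    (20.0, 2.8),  # (3.4): late decay
    (1.8, 2.5),   # (3.5): shallow slope
]

for slope, threshold in functions:
    near = sigmoid_inhibition(0.5, slope, threshold)  # similar options
    far = sigmoid_inhibition(4.0, slope, threshold)   # dissimilar options
    print(f"slope={slope}, c={threshold}: near={near:.4f}, far={far:.4f}")
```

For the sharp sigmoids, the inhibition stays essentially at its ceiling of 0.042 below the threshold and collapses to zero above it, which is what makes them behave like the original step function.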

The results, reported in figures 3.5 and 3.6, indicate that all four sigmoids account successfully for the attraction and similarity effects. The predictions across the different

Figure 3.4: Sigmoidal distance-inhibition functions. The maximal inhibition (0.042) was set so as not to abolish the similarity effect.

functions are similar for the similarity effect. But as we can see in figure 3.6, increasing the sigmoid threshold, i.e., delaying the decay of the inhibition function, restricts the attraction effect to the lower right corner.

Figure 3.5: Color plot of the preferences. Each point corresponds to RGB(prefA, prefB, prefC). a) Function (3.2), a=-20, c=1.5. b) Function (3.3), a=-20, c=2.4. c) Function (3.4), a=-20, c=2.8. d) Function (3.5), a=-1.8, c=2.5.

In figure 3.7 we can see that the shallow sigmoid function (3.5) is the only one that totally fails to reproduce the compromise effect. It seems that a late decline point or a shallow slope does not provide the appropriate level of inhibition between the compromise and the extreme options, and thus damages the compromise effect. Function (3.2), which starts to decay quite early (i.e., at distance 1.5), results in

Figure 3.6: Contrast between P(A) and P(B), or the boost that A gets from the introduction of C: P(A|A,B,C) − P(B|A,B,C). a) Function (3.2), a=-20, c=1.5. b) Function (3.3), a=-20, c=2.4. c) Function (3.4), a=-20, c=2.8. d) Function (3.5), a=-1.8, c=2.5.

a compromise effect which is balanced but restricted to the centre of the diagonal. The two remaining functions, (3.3) and (3.4), both predict a rather broad compromise effect, spread over several points of the diagonal line. Such breadth is desirable, since it is unlikely that the effect occurs only in a narrow area around the centre of the diagonal. Function (3.4) predicts that the effect disappears at the centre of the line, whereas function (3.3) gives a relatively balanced prediction for the compromise effect. Consequently the latter sigmoid function, with a decline point at around 2.4 (note |A − B| = 2.82), is the one which accounts satisfactorily for all three phenomena. Other, non-monotonic functions were tested as well, providing poor results. An indicative such function, along with the corresponding predictions, is given below:

f(x) = \frac{0.5x + 1.9}{60}\, e^{-0.00001\, x^{13}} \qquad (3.6)
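A small sketch implementing our reading of equation (3.6); the function name and the probe distances are illustrative assumptions, not taken from the simulations:

```python
import math

def nonmonotonic_inhibition(distance):
    """Asymmetric, non-monotonic inhibition of equation (3.6): grows
    roughly linearly for moderate distances, then collapses sharply."""
    return (0.5 * distance + 1.9) / 60.0 * math.exp(-1e-5 * distance ** 13)

print(nonmonotonic_inhibition(0.5))  # weaker for similar options
print(nonmonotonic_inhibition(2.0))  # stronger at moderate distances
print(nonmonotonic_inhibition(4.0))  # essentially zero for the extremes
```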

The point of using non-monotonic functions is that we want to start from low inhibition, in order to maintain the similarity effect, and to increase the inhibition at moderate distances, in order to facilitate the compromise effect. Although this approach is ad hoc and tailored to the two effects, we can see that while the compromise effect is reproduced successfully, there is a counter-intuitive prediction (i.e., the blob below option B in figure 3.8(c)) which interferes with the attraction and similarity effects.

Figure 3.7: The magnitude of the compromise effect as option C moves on the diagonal. a) Function (3.2), a=-20, c=1.5. b) Function (3.3), a=-20, c=2.4. c) Function (3.4), a=-20, c=2.8. d) Function (3.5), a=-1.8, c=2.5.

Figure 3.8: Function and predictions for a non-monotonic asymmetric inhibitory function (3.6). a) Function (3.6). b) Color plot which outlines the similarity and attraction effects. c) Contrast between P(A) and P(B) after the introduction of a third option C. d) Compromise effect on the diagonal.

3.2.4 Conclusions

The main finding of the current investigation is that a sigmoid inhibition function must be sharp enough to resemble the shape of a step function in order to accommodate both the similarity and compromise effects. This is consistent with the single set of values that the DFT authors originally used. Thus the general description of the distance dependency of the competition must be narrowed to the class of sigmoid functions tried here, since other types of functions, such as linear or exponential ones, fail to account for the effects.

We also found that the decay point of the sigmoid function must be larger than the distance |A − C|, in order to maintain high enough inhibition between the compromise and the extreme alternatives. Accordingly, at distance |A − B| the inhibition must approach zero: the two extreme objects must not interact. Consequently a sigmoid function which starts to decrease sharply after the point |A − C| can account for the compromise effect. Such a function is also able to reproduce the similarity and attraction effects.

For the similarity effect, the starting point of the inhibition function is the property which matters most. A high starting point would damage the effect, while a low one would favor it. Thus the sigmoid function must start from a point which is not so high as to damage the similarity effect, but at the same time not so low as to hurt the compromise effect. In our investigation this point was found to be around 0.042, for a fixed noise parameter equal to 0.2. An increase in the value of the noise would be compensated by proportionally lowering the starting point of the sigmoid.

Finally, the attraction effect requires a strong inhibitory coupling between the target and the decoy, for the phenomenon of boosting through negated inhibition to take place, and no interaction between the aforementioned pair and the distant, dissimilar option.
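These constraints can be checked numerically on the successful sigmoid (3.3). Here the compromise-extreme distance |A − C| is taken to be half of |A − B| = 2.82, which is an illustrative assumption rather than a value taken from the simulations:

```python
import math

def sigmoid_inhibition(distance, slope=20.0, threshold=2.4, peak=0.042):
    # Equation (3.3), the function found to capture all three effects
    return peak / (1.0 + math.exp(slope * (distance - threshold)))

d_ab = 2.82        # distance between the two extreme options
d_ac = d_ab / 2.0  # assumed compromise-extreme distance on the diagonal

# Compromise effect: near-maximal inhibition between compromise and
# extremes, near-zero inhibition between the extremes themselves.
print(sigmoid_inhibition(d_ac))  # close to the ceiling of 0.042
print(sigmoid_inhibition(d_ab))  # close to zero
```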
It was found that a sigmoid function with a starting point at 0.042 provides strong enough links between the decoy and the target for the attraction effect to be present to a large extent.

Unfortunately, there are so far no experimental studies which examine the nature


of the preference reversals in a parametric setup like the one we used in our computational investigations. Thus the refined predictions we provided about the magnitude of the effects across the choice space cannot be fully interpreted in the absence of relevant behavioral data. Nevertheless, the parametric method helped us rule out inhibition functions (e.g., the asymmetric, non-monotonic function) on the basis of the counter-intuitive or inadequate predictions they provided in specific areas of the choice space. Therefore the parametric method proved to be a useful instrument for refining the inhibition mechanism of DFT, while the detailed predictions it provided can motivate future experimental studies. In the following section we will rely on the same methodology in order to explore core mechanisms in LCA.

3.3 Investigating Inhibition and Loss-aversion in LCA

3.3.1 Motivation

The loss-averse, asymmetric value function is the mechanism which allows LCA to account for the compromise and attraction effects. In terms of importance, the asymmetric value function is analogous to the similarity-dependent inhibition mechanism in DFT. However, as we discussed in the previous section, the competition mechanism in DFT is subject to various restrictions concerning the exact form of the distance dependency. Therefore in this section we will ask whether the value function in LCA is subject to restrictions other than some form of asymmetry between gains and losses, in order to explain the three preference reversals. The broader aim of this exploration is to compare the two theories in terms of the universality of the set of principles they rely on in order to explain behavior.

For DFT, the very specific sigmoid shape that the inhibition function must take in order to capture the effects can undermine the generality of the theory as currently formulated. This is because the principle of distance-dependent competition implies that any monotonically decreasing function should explain the effects; however, as we found, only a particular subset of this class of functions (i.e., sharp sigmoid functions) validates the theory. On that basis a reformulation of DFT should be possible. Instead of describing the inhibition mechanism as local and distance dependent, it would be more accurate to define it as global within a particular radius, with the latter being determined at an earlier perceptual preprocessing stage (i.e., options that


are conceived as belonging to the same category compete with each other, while distant options are treated as non-competitors). On the other hand, LCA relies on the principles of intrinsic loss aversion and global competition between all the options in the choice space. These two principles seem generic enough to constitute a robust theory: explicit loss aversion can potentially be implemented by any asymmetric value function, while global competition is anticipated to have the same impact within a broad range of values. Thus in this section we will evaluate the global inhibition and loss-aversion mechanisms in LCA using the parametric method. We expect the variability of the predictions across different configurations of LCA to be low enough to characterize LCA as robust and accurately formulated.

3.3.2 Exploring the level of global inhibition

At this stage the level of global inhibition between alternative options is varied in the range [0.35, 0.85]. The rest of the LCA parameters are defined as follows: the noise parameter equals 0.2, the neural leak equals 0.95 and I0 equals 0.75. The value function that was used, following the original model (Usher and McClelland, 2004), is depicted in figure 3.9 and described by the following equation:

V(x) = \begin{cases} z & \text{if } x > 0 \\ -(z + z^2) & \text{if } x < 0 \end{cases}, \qquad z = \log(1 + |x|). \qquad (3.7)
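A minimal sketch of the value function (3.7); the function name is ours, and the probe values are illustrative:

```python
import math

def lca_value(x):
    """Asymmetric value function (3.7): logarithmic in the gain domain,
    with an extra quadratic penalty in the loss domain (loss aversion)."""
    z = math.log(1.0 + abs(x))
    return z if x > 0 else -(z + z ** 2)

# A loss looms larger than an equal-sized gain:
print(lca_value(2.0))   # gain of 2
print(lca_value(-2.0))  # loss of 2 has a larger (negative) magnitude
```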

Figure 3.9: Asymmetric value function (3.7).

As figure 3.11 suggests, the predictions that LCA gives concerning the attraction and similarity effects are relatively stable. The attraction effect is stronger closer and down

Figure 3.10: Color plot of the preferences. a) Inhibition equal to 0.35. b) Inhibition equal to 0.55. c) Inhibition equal to 0.65. d) Inhibition equal to 0.85.
to A, and as option C moves further away it diminishes. The similarity effect seems stronger to the left of and above B. The graded predictions concerning the magnitude of the attraction and similarity effects can be attributed to the specific non-linear value function we used; in the DFT parametric investigations, by contrast, the linearity of the value function along with the sharpness of the distance-dependent inhibition gave dichotomous predictions, uniform in magnitude, for the two effects.

While the attraction and similarity effects are stable across different levels of global inhibition, the compromise effect is damaged at higher levels of competition, as suggested in figure 3.12. While lower inhibition (figure 3.12(a)) creates a strong and smooth compromise effect, stronger inhibition makes it noisier and less symmetric. Note also that for inhibition equal to 0.85 the compromise effect is not predicted at all. In summary, it is found that for inhibition within [0.35, 0.75] LCA can robustly account for the three contextual phenomena: low inhibition facilitates the compromise effect, while too high inhibition destroys it. In the next section symmetric and asymmetric value functions are tested.
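For reference, one update step of a generic LCA network with parameter values like those quoted above can be sketched as follows. This is a standard textbook form of the LCA dynamics, not the thesis simulation code: the interpretation of the leak as the proportion of activation retained per step, and the inputs standing for I0 plus the momentary valences, are assumptions of the sketch.

```python
import random

def lca_step(activations, inputs, leak=0.95, beta=0.55, noise_sd=0.2):
    """One update of a generic Leaky Competing Accumulators network.
    `beta` is the global inhibition weight; every unit inhibits all
    the others. Activations are truncated at zero (non-linearity)."""
    total = sum(activations)
    new = []
    for a, inp in zip(activations, inputs):
        drive = inp - beta * (total - a) + random.gauss(0.0, noise_sd)
        new.append(max(leak * a + drive, 0.0))
    return new

random.seed(1)
acts = [0.0, 0.0, 0.0]
for _ in range(100):
    acts = lca_step(acts, inputs=[1.0, 1.0, 0.8])
print(acts)  # three non-negative activations after 100 steps
```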

Figure 3.11: Contrast between P(A) and P(B), or the boost that A gets from the introduction of C: P(A|A,B,C) − P(B|A,B,C). a) Inhibition equal to 0.35. b) Inhibition equal to 0.55. c) Inhibition equal to 0.65. d) Inhibition equal to 0.85.

Figure 3.12: The magnitude of the compromise effect as option C moves on the diagonal. a) Inhibition equal to 0.35. b) Inhibition equal to 0.55. c) Inhibition equal to 0.65. d) Inhibition equal to 0.85.


3.3.3 Varying the value function

In this section various value functions are tested. The same parameters as in the previous section are used, while the level of inhibition is set to 0.5. The first value function to be tested is the symmetric one. The point of this investigation is to confirm, across the choice space, that it is the loss-averse property of LCA which enables it to account for the compromise and attraction effects. The equation of the symmetric function, along with the predictions concerning the three effects, is given below:

f(x) = \begin{cases} \log(1 + x) & \text{if } x > 0 \\ -\log(1 - x) & \text{if } x < 0 \end{cases} \qquad (3.8)
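A sketch of the symmetric function (3.8), illustrating why it removes loss aversion (the function name is ours):

```python
import math

def symmetric_value(x):
    """Symmetric value function (3.8): diminishing returns on both
    sides, with gains and losses weighted equally (no loss aversion)."""
    return math.log(1.0 + x) if x > 0 else -math.log(1.0 - x)

# Equal-sized gains and losses cancel exactly, so the compromise option
# earns no net advantage from having only small disadvantages:
print(symmetric_value(2.0) + symmetric_value(-2.0))  # 0.0
```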

Figure 3.13: Symmetric value function (3.8).

Figure 3.14: Color plot of the preferences. Each point corresponds to RGB(prefA, prefB, prefC). a) Asymmetric value function (3.7). b) Symmetric value function (3.8).

Figure 3.15(b) suggests that the symmetric value function predicts a very weak attraction effect and a strong similarity effect, as opposed to the asymmetric value function in figure 3.15(a), which captures similarity and attraction simultaneously. Additionally, the symmetric value function totally fails to reproduce the compromise effect (figure

Figure 3.15: Contrast between P(A) and P(B), or the boost that A gets from the introduction of C: P(A|A,B,C) − P(B|A,B,C). a) Asymmetric value function (3.7). b) Symmetric value function (3.8).

Figure 3.16: The magnitude of the compromise effect as option C moves on the diagonal, from the upper left corner to the lower right corner. Red line: P(C|A,B,C)/(P(A|A,B,C)+P(C|A,B,C)) − 0.5. Green line: P(C|A,B,C)/(P(B|A,B,C)+P(C|A,B,C)) − 0.5. a) Asymmetric value function (3.7). b) Symmetric value function (3.8).

3.16(b)). Consequently it is confirmed computationally that the explanatory power of LCA lies in the asymmetry of the value function (i.e., the explicit implementation of loss aversion). The results also stress the analogy between distance-dependent inhibition in DFT and asymmetry of the value function in LCA: the absence of distance-dependent inhibition (i.e., global inhibition) in DFT facilitates the similarity effect, and accordingly the absence of asymmetry in the value function of LCA enhances the similarity effect, as we can see in figure 3.15(b).

Having exhibited the importance of asymmetry in the value function for LCA, we next proceed to test various asymmetric value functions of varying steepness. Note that equation (3.9) is the one which was introduced in prospect theory (Kahneman and Tversky, 1979). Note also that changing the value function requires a change in I0, the constant positive activation we introduce in order to prevent the valences from getting stuck at negative values, and thus the activations from remaining at zero. Therefore the values of I0 that were used in each case are also given. The value functions are given below.

Figure 3.17: Various asymmetric value functions (3.9)-(3.12).

f(x) = \begin{cases} x^{0.88} & \text{if } x > 0 \\ -2.25(-x)^{0.88} & \text{if } x < 0 \end{cases}, \qquad I_0 = 1.75. \qquad (3.9)

f(x) = \begin{cases} z & \text{if } x > 0 \\ -3z & \text{if } x < 0 \end{cases}, \qquad z = \log(1 + |x|),\ I_0 = 1.5. \qquad (3.10)

f(x) = \begin{cases} z & \text{if } x > 0 \\ -4z & \text{if } x < 0 \end{cases}, \qquad z = \log(1 + |x|),\ I_0 = 2. \qquad (3.11)

f(x) = \begin{cases} x^{1/2} & \text{if } x > 0 \\ -3(-x)^{1/2} & \text{if } x < 0 \end{cases}, \qquad I_0 = 3.75. \qquad (3.12)
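A sketch of the four value functions as reconstructed above. Note that the exact form of (3.12) and the pairing of the I0 values with the functions are our reading of a garbled source and should be treated as assumptions; the key property, that every variant penalizes losses more heavily than it rewards gains, holds in all cases:

```python
import math

def v_prospect(x):  # (3.9), as in prospect theory; I0 = 1.75
    return x ** 0.88 if x > 0 else -2.25 * (-x) ** 0.88

def v_log3(x):      # (3.10): z for gains, -3z for losses; I0 = 1.5
    z = math.log(1.0 + abs(x))
    return z if x > 0 else -3.0 * z

def v_log4(x):      # (3.11): z for gains, -4z for losses; I0 = 2
    z = math.log(1.0 + abs(x))
    return z if x > 0 else -4.0 * z

def v_sqrt(x):      # (3.12), as reconstructed; I0 = 3.75
    return x ** 0.5 if x > 0 else -3.0 * (-x) ** 0.5

# Every variant is loss averse: |v(-x)| > v(x) for x > 0
print([round(abs(v(-1.5)) / v(1.5), 2)
       for v in (v_prospect, v_log3, v_log4, v_sqrt)])
```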

Figure 3.18: Color plot of the preferences. Each point corresponds to RGB(prefA, prefB, prefC). a) Value function as in prospect theory (3.9). b) Value function (3.10). c) Value function (3.11). d) Value function (3.12).

The results in figures 3.18, 3.19 and 3.20 suggest that different asymmetric value functions return uniform predictions. Therefore the loss-aversion property is not subject to restrictions concerning the exact type of asymmetry; any type of asymmetry suffices for the three effects to be reproduced.

At this point it is useful to discuss the role of the I0 parameter in the accountability of LCA and whether it poses problems for the generality of the theory. Busemeyer and colleagues (Busemeyer et al., 2005) note that the I0 parameter, though crucial since it prevents the inputs from becoming negative, is problematic because it depends on the total loss-aversion contribution (i.e., loss aversion summed across all the alternatives). They argue that if the contributions from loss aversion increase, then I0 needs to be increased to overcome the total loss-aversion contribution. They finally suggest that keeping I0 fixed will affect the model's accountability across a wide variety of choice sets.

During our investigations of global inhibition, I0 was kept fixed at a value of 0.75 and the three effects were successfully predicted across the whole choice space. However, when we tried different asymmetric value functions we needed to specify a different I0 value for each function (i.e., keeping I0 = 0.75 for all the functions proved to


be problematic for the compromise effect). This was necessary because value functions with stronger loss aversion (i.e., a higher degree of asymmetry) require higher constant activation in the inputs in order to prevent them from being negative. An appropriate I0 value can be derived directly from the properties (i.e., the slopes) of each value function. Once determined, it remains fixed across the whole choice space and does not have to be manipulated for different choice sets in order to capture the effects, as the DFT authors anticipated.

Figure 3.19: Contrast between P(A) and P(B), or the boost that A gets from the introduction of C: P(A|A,B,C) − P(B|A,B,C). a) Value function as in prospect theory (3.9). b) Value function (3.10). c) Value function (3.11). d) Value function (3.12).

3.3.4 Conclusions

In this section we assessed the predictive robustness of LCA for various levels of inhibition and different value functions. It was found that inhibition within a broad range (i.e., [0.35, 0.75]) and any type of asymmetry, provided that the value of I0 is tuned accordingly, return similar patterns of predictions. A novel prediction, as compared to DFT, is the gradation in the magnitude of the attraction effect as the separation between the target and the decoy increases (figure 3.11 as opposed to figure 3.6).

Figure 3.20: The magnitude of the compromise effect as option C moves on the diagonal. a) Value function as in prospect theory (3.9). b) Value function (3.10). c) Value function (3.11). d) Value function (3.12).

This gradation was not observable in DFT for two possible reasons. First, the linearity of the value function and the sharpness of the inhibition mechanism dichotomize the choice space into areas where the attraction effect occurs and areas where it does not. A second possible explanation is that, on account of the linear dynamics, the decoy option attains infinitely large activations (i.e., the system blows up) and as a consequence the attraction effect is indistinguishably high across the whole relevant area. The latter hypothesis will be investigated in the following chapters, where the undesired computational consequences of retaining linear dynamics will be explored.

Another important finding of the current section is that a symmetric logarithmic (i.e., diminishing-returns) value function totally fails to reproduce the compromise effect. This motivates us to consider simpler versions of LCA where, instead of comparing all options with each other, we construct a single reference point from the choice set each time. Such a model with a logarithmic, symmetric value function is anticipated to be able to reproduce the compromise effect. This prospect is explored in the forthcoming section.

3.4 Simpler Versions of LCA

3.4.1 Motivation

In the previous section we observed how important the loss-aversion function is for the ability of LCA to account for the compromise effect. Essentially, the loss-aversion property results in the compromise option having two small disadvantages, while the extreme objects each bear one large and one small disadvantage, which amounts to a larger penalty under the asymmetry of the value function. In the default version of LCA, each option is evaluated with respect to a reference frame consisting of all the other options in the set. However, let us consider that each option is evaluated with respect to a single point of reference, e.g., the neutral (0, 0) point. Under this configuration a symmetric, logarithmic value function should be able to account for the compromise effect. Such a function starts linear and then saturates (figure 3.21(a)). In the context of LCA, at each moment only one dimension is evaluated, and the total additive utility is the sum of two logarithmic functions, one for each dimension. That way the options placed in the middle of the choice set bear a higher additive utility than the extreme options. The contour plot in figure 3.21(b) depicts equal-preference curves in the x1−x2 continuum; it confirms that the compromise option (0.5, 0.5) has a much better utility than the (1, 0) object. In other words, a logarithmic value function captures the concept of extremeness aversion in multi-attribute choice. This fact motivates us to present an alternative version of LCA with only one reference point. In preliminary simulations we tried several reference points, such as the neutral point, the reference defined by the minimum in each dimension of the choice set, and the reference defined as the centre of gravity of the option set.
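The extremeness-aversion argument can be verified with a two-line calculation: under a symmetric logarithmic value function evaluated from the neutral point, the balanced option (0.5, 0.5) has higher additive utility than the extreme option (1, 0). The function name below is ours:

```python
import math

def additive_log_utility(x1, x2):
    # Additive utility under a symmetric logarithmic value function,
    # evaluated from the neutral (0, 0) reference point
    return math.log(1.0 + x1) + math.log(1.0 + x2)

compromise = additive_log_utility(0.5, 0.5)  # 2*log(1.5), about 0.81
extreme = additive_log_utility(1.0, 0.0)     # log(2), about 0.69
print(compromise > extreme)  # True
```

The logarithm saturates for large gains, so spreading the same total across both dimensions yields more utility than concentrating it in one.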
While all the reference points which were tested successfully reproduced the compromise effect, some of them returned poor results for the attraction effect, for the following reason. With a reference point equal to the neutral (0, 0), the target and the dissimilar option have equal utilities, since they are symmetric. This situation could change only if we moved the reference close to the decoy option; that way the target, being compared to its similar decoy, would have a larger additive utility and would finally be more popular than the distant object. Thus LCA with a symmetric value function would be optimized for the three effects if the reference point were constructed adaptively during a perceptual preprocessing stage, dependent on the choice set. Nevertheless this is a rather ad-hoc approach, hardly justifiable and tailored to accommodate the three effects.

Figure 3.21: a) Symmetric, logarithmic value function. b) Combined 2D utility function for gains (x1 > 0, x2 > 0).

3.4.2 LCA with one reference point

Let us consider that the reference point is constructed adaptively. A plausible assumption when setting the reference point would be to relate it to the object which has the lowest additive utility in the choice set; that way we hypothesize that the worst option becomes salient in a preprocessing stage. But such an approach would be problematic when all three options lie on the indifference curve, namely when they have equal additive utilities. We present a preprocessing algorithm which constructs a reference point each time as follows:

1. Calculate the additive utilities of the options;
2. If all objects have the same utility then
3.     reference = (min(d_a), min(d_b));
4. else W = the object with the minimum utility;
5.     reference = (W_i/k, W_j/k);
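The rule above can be sketched in Python as follows. The additive utilities, the tie test, and the scaling constant k are our reading of the pseudo-code and should be treated as assumptions of the sketch:

```python
def reference_point(options, k=2.0, tol=1e-9):
    """Preprocessing sketch: `options` are (economy, quality) pairs,
    and utilities here are simply additive."""
    utilities = [a + b for a, b in options]
    if max(utilities) - min(utilities) < tol:
        # All options equally good: fall back to the dimension minima
        return (min(a for a, _ in options), min(b for _, b in options))
    # Otherwise anchor the reference towards the worst (decoy-like) option
    worst = options[utilities.index(min(utilities))]
    return (worst[0] / k, worst[1] / k)

# Attraction-effect configuration: decoy D is dominated by target A
A, B, D = (0.8, 0.4), (0.4, 0.8), (0.6, 0.3)
print(reference_point([A, B, D]))  # lies between neutral (0, 0) and the decoy
```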

Upon presentation of the choice set, the decision maker searches for a salient, poor object. If there is none, the reference point is set to her current situation, which is the neutral point. But if there is a worst object in terms of additive utility, the reference point is set between it and the neutral point. The latter situation is


applicable to the attraction effect, where the point of reference is located on the line defined between the zero point and the decoy object (figure 3.22(b)).

Figure 3.22: a) Reference point for the similarity effect (neutral). b) Reference point for the attraction effect (between neutral and decoy). c) Reference point for the compromise effect (neutral).

We simulated LCA using the above approach, with a symmetric value function, global inhibition equal to 0.4, noise parameter equal to 0.2 and I0 = 0.2. The results of the parametric investigation are as follows:

Figure 3.23: a) Color plot of the preferences for one reference point. b) Contrast between P(A) and P(B) for one point of reference. c) The magnitude of the compromise effect for one point of reference.

As we can observe in figure 3.23, all three effects are captured satisfactorily. We get a broad similarity effect and an attraction effect which is graded as a function of the separation between the decoy and the target (figure 3.23(b)), while the compromise effect is broad and well balanced (figure 3.23(c)). It is interesting to see how the effects are reproduced under a symmetric, logarithmic value function. As we mentioned in the introduction, the compromise effect occurs because the value of the compromise object is higher than the value of the extreme alternatives on account


of the non-linearity of the value function. The similarity effect is due to the stochastic alternation of the dimensions, which forces the similar options to be correlated in time and to split their shares. The attraction effect can be understood in terms of additive utility: with respect to the reference point (figure 3.22(b)), the target option A is advantageous in both dimensions, while the dissimilar option B bears a large advantage in one attribute and a small disadvantage in the other. Thus option A has a larger additive utility than B, on account of the saturation of the value function for large gains and the small disadvantage of B in one dimension. This approach was designed with the goal of reproducing the three effects simultaneously. It remains unclear whether there exists only one reference point and whether it changes adaptively. This motivates us to present a hybrid version of LCA and DbS which utilizes the symmetric value function without making strong and implausible claims about the position of the reference point in the choice space.

3.4.3 A Hybrid Model

The main finding of the previous investigation is that it is possible to capture the three effects using alternative versions of LCA and a symmetric value function. However, the approach with a changing reference point presented above is artificial and tailored to optimize the effects. Additionally, we do not want to introduce unrealistic assumptions into the LCA framework, since until now it has proven robust and well-defined. Any modification of the LCA framework should improve its descriptive adequacy without abolishing its explanatory power (i.e., a well-defined set of principles which mediates choice). Thus there is no point in abandoning the asymmetric function while at the same time introducing a heuristic-like and computationally complex mechanism1 which freely defines the reference point depending on the choice set. But why should we abandon the assumption of an asymmetric value function, especially when we do not have enough evidence that it is not an intrinsic property of the neural system which encodes value? Essentially, what we are attempting is to provide a higher-level explanation of the loss-aversion property as emerging from the cognitive architecture which drives choice. A characteristic such explanation is given under the

1 A sequence of if-then-else rules might look simple enough at first. However, when it comes to implementation, it bears more computational complexity than the additive/multiplicative operations of a neural network.


DbS (Stewart et al., 2006) framework, where loss-aversion is attributed to the asymmetry in the real-world distribution of losses and gains. This motivates us to combine LCA and DbS under a hybrid scheme. As we reviewed earlier, DbS models multi-attribute choice as a two-stage process. In the first, perceptual stage grouping takes place; at the decision stage, choice is mediated by a combination of stochastic alternation between dimensions, binary comparisons (one at each moment) and leaky competing accumulation of each option's successes. The clustering process, along with the similarity-dependent frequencies of the binary comparisons, accounts for the attraction and similarity effects, while in the compromise effect no grouping takes place and the middle option dominates because it participates in more comparisons (i.e., it is considered more often). The grouping assumption poses a problem for the implementation of the model: we have to define how close two options must be in order to form a group. Additionally, the fact that the comparisons are binary prevents DbS from making graded predictions on the magnitude of the effects in a parametric setup. It is possible, though, to overcome these mechanistic problems by introducing some assumptions from the LCA framework. In particular, we maintain the leaky competing accumulation scheme as it was described in DbS, but instead of grouping the options and making binary comparisons, we introduce a symmetric value function. At each moment one pair of options is considered, with more similar options being compared more often (i.e., the comparison frequency remains similarity dependent). For each pair, the relative advantages of the options on the active dimension are evaluated through the value function. Essentially only advantages are taken into account, since a disadvantage does not bring about a change in the preference state of an alternative.
Thus, instead of sending to the accumulators only whether an option is better than another, we output how much better it is. It is interesting to describe how this approach accounts for the effects. In the similarity case, the comparison between the similar options will give a small advantage to either option, splitting their shares. The comparisons with the dissimilar option will give a 50% share to it (it wins on its advantageous dimension) and 25% to each of the similar options, since attention fluctuates across dimensions and pairs (i.e., A-S, A-B). Thus we retain the concept of stochastic alternation between dimensions, which causes the similar alternatives to split their shares. The more frequent comparisons between the
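A minimal sketch of one trial of this hybrid scheme follows. All parameter values, the tanh form of the symmetric value function, and the inverse-distance similarity weighting are illustrative assumptions, not the fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)

def v(x, c=3.0):
    """Symmetric, saturating value function (assumed form: tanh)."""
    return np.tanh(c * x)

def hybrid_trial(options, steps=300, leak=0.05, inh=0.2, noise=0.1):
    """One trial of the hybrid LCA/DbS scheme. Pairs of options are
    sampled with similarity-dependent frequency; on the attended
    dimension only the advantage of the better option is accumulated,
    passed through the symmetric value function v. Parameter values
    are illustrative."""
    opts = np.asarray(options, float)
    n = len(opts)
    act = np.zeros(n)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # more similar pairs are compared more often
    w = np.array([1.0 / (0.1 + np.linalg.norm(opts[i] - opts[j]))
                  for i, j in pairs])
    w /= w.sum()
    for _ in range(steps):
        i, j = pairs[rng.choice(len(pairs), p=w)]
        d = rng.integers(opts.shape[1])          # attended dimension
        adv = opts[i, d] - opts[j, d]
        inp = np.zeros(n)
        if adv > 0:
            inp[i] = v(adv)                      # only advantages count
        elif adv < 0:
            inp[j] = v(-adv)
        act += inp - leak * act - inh * (act.sum() - act)
        act += noise * rng.standard_normal(n)
        act = np.maximum(act, 0)                 # LCA truncation at zero
    return act
```

On a compromise set such as A = (1, 3), B = (3, 1), C = (2, 2), C tends to accumulate the most activation across trials, because the more frequent A-C and B-C comparisons feed it more often once v saturates.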


similar options will not compensate, since during such comparisons the objects cancel each other out. The same description applies to the attraction effect. The comparisons between the similar options will always favor the target. Nevertheless, the relative advantage which the target will gain and carry over depends on its distance from the decoy. Thus we envisage that if we place the decoy very close to the target, the latter will get a very small advantage and will not dominate the dissimilar option. On the other hand, if the decoy is not so close, a clear advantage will be given to the target and the attraction effect will occur. This is an interesting prediction, since it suggests that only clear decoys will boost the popularity of the target. Alternatively, we could introduce a value function which saturates very early; that way the difference between small and large advantages would be eliminated. For the compromise effect, the fact that the compromise option participates in more comparisons (i.e., A-C and B-C are more frequent than A-B) suffices to boost its shares. Note, however, that in this version the comparisons are not binary, and consequently the value function must map the advantage of, e.g., C over A to a value close to that of the advantage of A over B. This is essential: a linear value function would weigh the advantage of one extreme over the other as twice the advantage of the compromise over the first extreme, and the relative overall advantage that the compromise gains from participating in more comparisons would be abolished. Thus the value function must be adjusted to start saturating at least at distance |A − C|. This requirement is quite similar to the distance-dependent inhibitory function in DFT, where the sigmoid must be high enough at distance |A − C| and zero at |A − B|.
Thus the idea of an emergent loss-aversion is grounded in both cases by under-processing the pairs of distant objects, either by preventing them from competing (i.e., DFT) or by assuming that they are subject to less frequent comparisons while the value function bears diminishing returns (i.e., the hybrid model). Therefore we could predict that a cognitive explanation for the loss-aversion property might lie in the assumption that dissimilar comparisons are under-considered, while similar options receive more processing. This explanation must be regarded as a higher-level description of a single neural system which encodes gains and losses asymmetrically. Presumably people do not engage much in dissimilar comparisons because the latter are more frustrating, since large disadvantages loom much larger than the corresponding advantages. More computational work must be done concerning the hybrid model we discussed above. For example, we should question what a continuous value function adds to the model. It might be possible to quantify the effects in the 2-d choice space using only binary comparisons (i.e., the original DbS version) and a continuous, distance-dependent function to finely determine the frequency of each binary comparison. Experimental studies which examine the level of gradation, if any, in the magnitude of the effects across the 2-d space will define accurately what should be expected from a computational model.

3.5 Summary of the Chapter

In the first section of the chapter we explored the inhibition mechanism in DFT, motivated by the fact that in the original version of the model the mechanism was described as distance dependent, yet the same set of inhibition values was explicitly assigned in all situations. The computational simulations suggested that the inhibition mechanism must be subject to several constraints: the inhibition function has to resemble a step function (i.e., a sharp sigmoid), and the point after which the function decays must be derived from the range of the attribute values. The fact that the inhibition mechanism is characterized by a step function is not a problem in itself. However, the description of distance-dependent competition in DFT is too general and inaccurate. Inhibition should be formulated as global within a radius, which can be linked to a concept of perceptual grouping at a preprocessing stage. In the next section of the chapter we evaluated the robustness of the LCA model. Several levels of inhibition were tried along with different asymmetric value functions, and the results of the parametric investigations across all these configurations were quite stable. This suggests an advantage of LCA over DFT, since the principles of the former are well-defined and generic (i.e., asymmetry of any type in the value function), whereas the assumptions of the latter need to be restated. Finally, in the third section of the chapter we questioned whether LCA can account for the effects using a symmetric function. This was possible in a version of LCA with one reference point which changes depending on the choice set. Nevertheless, we recognize that this approach is quite ad hoc and hardly justifiable or linked to behavior. Thus we


attempted to combine the concept of the symmetric value function with the promising DbS framework, proposing in that way an interesting prospect of unifying the two models. However, the absence of behavioral data concerning the magnitude of the effects in the choice space renders the selection of the appropriate set of principles from each model uncertain. It seems that the field has reached its limits: before we can revise the theories, we need more empirical data.

Chapter 4 The Predictive Power of DFT and LCA

4.1 Introduction

So far we have explored core mechanisms in DFT and LCA, assessing them on the basis of the predictions they provide in a parametric setup. This methodology allowed us to refine the DFT inhibition mechanism and to confirm the robustness of LCA. However, the lack of parametric experimental data prevents us from discriminating between the two models based only on the predictions they produce across the whole choice space. Thus we need additional paradigms for which each model gives different qualitative predictions. A first such paradigm is the relationship between the magnitude of the compromise effect and the separation of the extreme objects. We recall that DFT explains the compromise effect as a result of the correlation between the extreme objects, due to both of them competing via local inhibition with the compromise alternative. The anticorrelations between the extreme options and the compromise one are driven by distance-dependent inhibition: the stronger the local inhibition, the higher the magnitude of the effect. Therefore the effect should decrease as we increase the separation of the extreme alternatives. Note, however, that this expectation rests on a distance-decreasing inhibitory mechanism. In our explorations we restated this mechanism as constant within a radius and inactive outside it. It is interesting to test computationally what DFT will predict for this paradigm with a sharp sigmoid inhibition function. On the other hand, LCA exploits the loss-averse value function in order to explain the


compromise effect. The asymmetry of the function increases with distance. Consequently, separating the two extremes further will penalize them more, enhancing the compromise effect. The different qualitative predictions which each scheme provides can be tested experimentally in a relevant paradigm. Thus, in the first place, we will provide some insights about what we expect from each model using computational simulations. A relevant experiment which tests the correlational hypothesis has already been performed (Usher et al., 2008). The decision maker was presented with a choice set similar to the one used for the compromise effect. When the subject chose one extreme object, the latter was rendered unavailable and a quick second choice was elicited. According to the correlational hypothesis, the next highest activation would belong to the other extreme alternative. However, the experimental results suggested that the option chosen next is the compromise option. Essentially this is consistent with loss-averse behavior, which suggests that people prefer small changes over large ones. Thus, assuming that the unavailable option is the point of reference, the closest option, the compromise, will be chosen. LCA, which encompasses an asymmetric value function, can fit the experimental results and account for this instance of loss-aversion. On the other hand, the correlational hypothesis of DFT predicts the opposite choice pattern (i.e., the distant option is chosen). However, the DFT authors have suggested an alternative approach in which a third dimension called unavailability is introduced (Busemeyer and Johnson, 2004). That way DFT accounts for the data and captures this instance of loss-averse behavior. We confirm this computationally and discuss how justifiable the introduction of a third dimension is. Finally, we question whether the linear dynamics of DFT impose problems on the performance of the model.
As discussed in previous chapters, DFT has no mechanism for ruling uninformative objects out of the competition. This might have deleterious consequences in cases where many uninformative options are present in the choice set. Their activations will grow boundlessly negative, and the transmission of the latter via inhibitory links is expected to blow up the system. Thus we contrast the two models in paradigms where more than three options are presented in the choice set.


4.2 The dependency of the compromise effect on the separation between options

The parametric characterization of trinary choice was made in the previous chapter for a third alternative C added to a standard pair A and B, which are situated at fixed points on the diagonal (i.e., the indifference curve). In this section we carry out a complementary investigation which keeps the compromise option C (2,2) constant, while varying the distances between C and the extreme objects in a symmetric fashion (figure 4.1(a)). In figure 4.2 we report the magnitude of the compromise effect, as a function of the distance between the extremes, for both DFT and LCA.

Figure 4.1: a) Investigating the effect of the distance between the extreme options on the compromise effect. b) Sigmoid function (3.3).

Figure 4.2: The magnitude of the compromise effect as a function of the distance between the extreme options. a) For DFT with a sigmoid inhibition function (3.3). b) For LCA with an asymmetric, loss-averse value function (3.9).

Figure 4.2(a) shows that the compromise effect in DFT occurs within a particular distance range. The latter is determined by the exact shape of the sigmoid function, so that


the effect occurs only when the distance between the compromise and the extremes is within the sigmoid threshold, so that there is enough inhibition between the middle and the extreme objects. The sharp decline of the inhibition function (i.e., its step-like shape) results in the effect being relatively flat and very slightly decreasing within the radius in which it appears. On the other hand, figure 4.2(b) suggests that under the LCA scheme the compromise effect starts to emerge after a certain distance, and its magnitude increases with the separation between the extreme alternatives and the compromise. Nevertheless, after a point the magnitude of the effect starts to decrease. This reflects the properties of the asymmetric value function: the slope in the domain of losses increases with distance, which results in the extreme options being more penalised and the compromise effect being enhanced. After a point, however, the value function in the negative domain starts to saturate, and this decreasing slope eliminates the effect. The above results are not conclusive, since we do not have empirical data derived from a similar paradigm. However, we can see that LCA provides a clear prediction on the nature of the effect: the larger the separation between the extremes, the stronger the effect. On the other hand, DFT with distance-decreasing inhibition was expected to predict the opposite pattern. Nonetheless, we have shown that not any decreasing inhibition function can accommodate the three effects under DFT, but only a sharp sigmoid one. Restricted to this particular inhibition function, DFT predicts a flat compromise effect confined within a range. Although this must be confirmed experimentally, it is unlikely that the compromise effect occurs suddenly within a narrow radius, with no gradation in its magnitude.
Thus it seems that the sharp inhibition function, which we concluded to be the only way to account for the effects using distance-dependent competition, significantly restricts the predictive power of DFT. In the next section we present experimentally informed arguments against the correlational explanation which DFT provides for the compromise effect.
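The rise-then-fall profile that LCA predicts can be reproduced with a toy calculation. The value function below is not equation (3.9) but a stand-in with the same qualitative properties: saturating, with losses weighted more than gains by an assumed factor lam.

```python
import numpy as np

def v(x, lam=2.0):
    """Assumed loss-averse value function: a saturating core (tanh)
    with losses weighted lam times more than gains."""
    return np.where(x >= 0, np.tanh(x), lam * np.tanh(x))

def compromise_advantage(d, lam=2.0):
    """Net input advantage of the compromise C = (2, 2) over the
    extreme A = (2-d, 2+d), summing v over both dimensions and both
    pairwise comparisons (LCA-style relative advantages)."""
    # C vs A and C vs B: attribute differences (+d, -d) in each pair
    i_c = 2 * (v(d, lam) + v(-d, lam))
    # A vs C: (-d, +d); A vs B: (-2d, +2d)
    i_a = (v(-d, lam) + v(d, lam)) + (v(-2 * d, lam) + v(2 * d, lam))
    return i_c - i_a

ds = np.linspace(0.01, 6, 400)
adv = np.array([compromise_advantage(d) for d in ds])
```

Algebraically the advantage reduces to (lam − 1)(tanh 2d − tanh d): it grows while the loss slope is still steepening with distance, then vanishes once the function saturates on both scales, mirroring the rise-then-fall of the LCA curve.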

4.3 Reference effects and unavailable options

An instance of loss-averse behavior suggests that small changes are preferred over large ones. Thus when a choice is made with respect to a point of reference, the option


that is closer to it is more likely to be chosen. While LCA can predict this behavior due to its explicit implementation of loss-aversion, DFT with its existing mechanisms cannot. Consider a triadic choice set with two extremes and one compromise option (figure 4.3). When one extreme (i.e., option A) in the triadic choice set is chosen, it is rendered unavailable and the decision maker is asked to choose between the two remaining objects under time pressure. This paradigm has been tested experimentally, and the subjects were found to choose the option which is closest to their point of reference, i.e., the compromise option C in figure 4.3 (Usher et al., 2008).

Figure 4.3: a) When option A is chosen it becomes unavailable, and options B and C compete for a short period until a choice is issued. b) Experimental results in the paradigm depicted in a).

As we have already seen, DFT employs a correlational mechanism under a compromise-effect setup. The two extreme objects become correlated in time due to their local inhibition with the middle option, and as a result the latter emerges. However, consider a trial which favours the correlated extreme objects, in which one of them is chosen. The option next highest in activation will be the other extreme object, and it will therefore be chosen next. The above paradigm, with the chosen extreme being set unavailable and a fast next choice being elicited, was simulated in LCA and DFT. In both cases the unavailability of the chosen option is implemented by taking it out of the competition. The results for the two models are presented below:

Table 4.1: DFT and LCA on reference effects when the chosen option (i.e., point of reference) is taken out of the competition.

        Triadic choice set                              Binary choice set
Model   P(A)            P(B)            P(C)            P(A)            P(B)   P(C)
DFT     0.292 ± 0.014   0.2931 ± 0.014  0.407 ± 0.014   0.8068 ± 0.022  —      0.1932 ± 0.022
LCA     0.2782 ± 0.02   0.297 ± 0.02    0.4421 ± 0.02   0.321 ± 0.04    —      0.6729 ± 0.04

Figure 4.4: Activations for a single trial (A: chosen extreme; B: second extreme; C: compromise) after setting the chosen extreme option unavailable.

As we can see in figure 4.4, due to the correlational mechanism of DFT, the option next highest in activation is the other extreme object. Apparently DFT with its existing mechanisms is unable to accommodate this instance of loss-averse behavior within its framework. However, the DFT authors have proposed an alternative implementation of the situation (Busemeyer and Johnson, 2004). According to it, a new dimension called availability is introduced. The chosen option is set unavailable but is allowed to compete with the others. Additionally, being inherently unavailable, it bears negative valence due to its low attribute value on the availability dimension. That way we have another instance of the attraction effect: the unavailable option boosts the activation of the compromise object by transmitting its negative activations via inhibitory links, and thus the correct pattern is predicted:

Table 4.2: DFT on reference effects when the availability dimension is introduced.

        Triadic choice set                              Binary choice set
Model   P(A)            P(B)            P(C)            P(A)            P(B)   P(C)
DFT     0.292 ± 0.014   0.2931 ± 0.014  0.407 ± 0.014   0.178 ± 0.022   —      0.821 ± 0.022
LCA     0.2782 ± 0.02   0.297 ± 0.02    0.4421 ± 0.02   0.321 ± 0.04    —      0.6729 ± 0.04

As figure 4.5(a) suggests, the compromise option is chosen for two reasons: first, due to the negative activations of the nearby unavailable option, and second, because the correlational mechanism drives the other extreme negative. Figure 4.5(b) shows the probability of predicting the right pattern as a function of the negative valence of the unavailable option.

Figure 4.5: a) Activations for a single trial after setting the chosen extreme option unavailable. Unavailability is implemented by introducing a third dimension. b) Probability of choosing the compromise option as a function of the inhibition in the valence of the chosen extreme.

We can see that the higher the degree of unavailability, the more likely the compromise option is to be chosen. However, the availability dimension should take only binary values: an object either is or is not available at any given moment. Although the introduction of the third dimension provides the desired predictions, it is still counter-intuitive to allow an unavailable option to participate as an active object in the decision process. A way to test this experimentally is as follows. Consider a choice set with three options, as in the attraction effect case. During deliberation we announce that the decoy is unavailable. According to the availability assumption, the decoy option will acquire an even more negative valence. Thus the popularity of the target option should be higher compared to the case where the decoy remains available. The unavailability hypothesis is also hard to justify. If unavailable options were allowed to compete with the active ones, the choice set would be infinitely large, containing all the objects that had been considered in the past. This motivates us to assess the performance of DFT in choice sets with more than three options. Due to the linear dynamics of the model, it is possible that undesired computational consequences will occur in several situations. If this is true, then it is questionable whether the third-dimension assumption can be supported under the DFT framework.
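The availability-dimension variant can be sketched with a bare-bones DFT update (P ← SP + valence). The coordinates, inhibition values and noise level below are illustrative assumptions for the sketch, not the parameters used in the reported simulations:

```python
import numpy as np

rng = np.random.default_rng(1)

def dft_unavailable(steps=200, inh_close=0.04, inh_far=0.001, noise=0.1):
    """Bare-bones DFT update for the unavailability paradigm.
    Options (rows): A = chosen extreme, now unavailable; B = second
    extreme; C = compromise. The third attribute column is availability
    (0 for A, 1 for B and C). All values are illustrative."""
    M = np.array([[1.0, 3.0, 0.0],    # A (unavailable)
                  [3.0, 1.0, 1.0],    # B
                  [2.0, 2.0, 1.0]])   # C
    S = np.eye(3)
    for i, j, g in [(0, 2, inh_close), (1, 2, inh_close), (0, 1, inh_far)]:
        S[i, j] -= g                  # symmetric inhibitory coupling
        S[j, i] -= g
    P = np.zeros(3)
    for _ in range(steps):
        d = rng.integers(3)                       # attended dimension
        val = M[:, d] - M[:, d].mean()            # mean-centred valences
        P = S @ P + val + noise * rng.standard_normal(3)
    return P
```

With the availability attribute attended a third of the time, A accumulates negative valence which, through the strong A-C inhibitory link, feeds C; C ends up dominating B, reproducing the attraction-like boost of the compromise (and illustrating, at the same time, the unbounded growth of the linear system).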

4.4 Extending the models beyond three alternatives

The main characteristic of DFT, in comparison to LCA, is that it is a linear dynamical model. A consequence of this linearity is that the units can bear both negative and positive activations. Theoretical debates have been raised on whether this linearity is problematic or justifiable (Usher and McClelland, 2004; Busemeyer et al., 2005). Particularly questionable is the transmission of negative activations through inhibitory links leading to excitation. This phenomenon is regarded as analogous to disinhibition, during which the suppression of an inhibiting neuron releases the previously inhibited unit and boosts its activation. However, excitation through disinhibition should be bounded by the firing rates the unit attains when it is not connected to other inhibiting neurons. Thus it is theoretically implausible to hypothesize that a theory of choice is an unbounded linear system. In this section we set aside the theoretical arguments and investigate whether the linearity, combined with distance-dependent inhibition, affects the choice patterns in DFT. Usher and McClelland (2004) were the first to argue that the unconstrained activations may have deleterious consequences in cases where many non-informative options are present in the choice set. Poor options will end up with very negative activations, which they will send through inhibitory connections as a cascade of positive activations (i.e., excitation) to the other competing options, with unpredictable results. DFT has no mechanism to prevent the dynamical system from blowing up. On the other hand, LCA truncates negative activations to zero, ruling out the disadvantageous, non-informative options at an early stage. The first computational investigation examined DFT with an external stopping rule in a triadic choice set. The distance dependency was chosen as a steep sigmoid, consistent with the values used by Roe et al. (2001).1
The choice set is depicted in figure 4.6(a) and consists of three options, ordered so that B is the best and A the worst, with C in the middle. The DFT choice probability (over 1000 simulation trials) is plotted in figure 4.6(b) as a function of processing time (0-500 time steps). This is done by computing, for each time step, the fraction of trials in which the corresponding option has the highest activation.

1 inh = 0.042 for the close options and inh = 0.001 for distant options. Noise was set to 0.2 instead of the 1 used in Roe et al. (2001), in order to compensate for setting inhibition higher than in the original paper (i.e., inh = 0.042 instead of inh = 0.03).
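The linear DFT update used here (P ← SP + valence, with a step-like inhibition matrix built from the footnoted values) can be sketched as follows. The option coordinates are illustrative assumptions chosen so that B dominates on additive utility while A sits just below C:

```python
import numpy as np

rng = np.random.default_rng(2)

def dft_poor_decoy(steps=500, noise=0.2):
    """Linear DFT with step-like distance-dependent inhibition
    (inh = 0.042 for the close pair A-C, 0.001 otherwise). Coordinates
    are illustrative: B clearly dominates on additive utility, while
    A is a poor decoy sitting just below C."""
    M = np.array([[1.0, 0.8],    # A: worst, decoy to C
                  [1.2, 1.0],    # C: middle
                  [2.5, 2.5]])   # B: best, but distant from A and C
    G = np.array([[0.0,   0.042, 0.001],
                  [0.042, 0.0,   0.001],
                  [0.001, 0.001, 0.0]])
    S = np.eye(3) - G
    P = np.zeros(3)
    for _ in range(steps):
        d = rng.integers(2)                       # attended dimension
        val = M[:, d] - M[:, d].mean()            # mean-centred valences
        P = S @ P + val + noise * rng.standard_normal(3)
    return P  # activations ordered (A, C, B)
```

Under these assumptions, C's activation is pumped through the strong A-C link by A's increasingly negative state and eventually dwarfs B's, while the system as a whole grows without bound, mirroring the counter-intuitive reversal reported for figure 4.6.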

Figure 4.6: a) The choice set suggests that option B has much higher additive utility. b) Probability of choice for the three options in DFT. c) Activations in a single trial for DFT.

As we see in figure 4.6(b), we obtain a very counter-intuitive prediction. After approximately 200 time steps option C emerges and finally beats option B, although the latter is superior by definition. This prediction results from the fact that option B is distant from the others, so the inhibitory connections it bears with them are weak (i.e., inh = 0.001), while C and A interact through stronger inhibitory links (i.e., inh = 0.042). Option A is a decoy to C, and as a result it bears negative activations which feed option C with excitation through the inhibitory connections (i.e., as in the attraction effect); in the long run, after integration, this renders C the dominant option (see also figure 4.6(c) for single-trial activations). The second problematic case for DFT involves a situation with four choice alternatives, illustrated in figure 4.7(a). The configuration is similar to that of the attraction effect, except that a second decoy D is inserted close to the C decoy. In this case a distance-dependent inhibitory function was assumed (i.e., function (3.3), as in figure 4.1(b)). According to the distance-dependent function, option B will not interact with the other options (i.e., very low inhibition), while the pairs A-C and C-D will interact (A and D have very low inhibition as well). Although A and B are symmetric and placed on the indifference curve, option A should win in the presence of decoy options similar to it, as the attraction effect suggests. Nevertheless, counter-intuitively again, DFT predicts that the decoy option C is preferred to both A and B (figure 4.7(b)). The activations from a single trial in figure 4.7(c) show, first, that the system blows up due to negative feedback. More importantly, the activation of option C is boosted by D, which is constantly negative, and after some point C bears positive activations which inhibit option A rather than boosting it as they should.
Figure 4.7: a) The choice set implies that option A should be chosen. b) Probability of choice for the four options in DFT. c) Activations in a single trial for DFT. d) Probability of choice for the four options in LCA.

In other words, placing a decoy option (D) next to another decoy (C) boosts

the second decoy and destroys the occurrence of the attraction effect in the dominant option (A). For consistency we present the results for the same configuration for LCA (figure 4.7(d)). First we observe that the poor objects are ruled out and only the two with equal additive utilities compete (i.e., A and B). Finally option A wins on account of its having decoy options placed underneath it, like in the attraction effect. Finally a third case, involving 4-alternatives illustrated in figure 4.8(a) is examined. Option A is the best in both dimensions while B and C are equal to A in one dimension but much worse in the other. The four options form a polygon and the same distant dependent inhibitory function as in the above case is used (i.e.,function (3.3) as in figure 4.1(b)). The distances between alternatives designated by the sides of the polygon, result in much higher inhibition than between alternatives separated by the diagonal. Consequently, the non-interacting options are A-D and B-C. Note that coordinates of D are slightly lower that the minimal values of C and B, i.e., D is diagonally below the location of a perfect square that links A, B, and C. The simulation results (figure 4.8(b)), show that now B and C are preferred, although A is, by definition, superior to both of

Figure 4.8: a) The choice set implies that option A should be chosen. b) Probability of choice for the four options in DFT. c) Activations in a single trial. d) Probability of choice when the polygon in a) is a perfect square. e) Probability of choice for the four options in LCA.

them. To explain this result, we should remember that at each time step one dimension is attended. With respect to the D1 dimension, A and B are equal and C is slightly better than D. Consequently the most negative valence will be assigned to D, which is the worst with respect to dimension D1. C and D act as decoys to A and B respectively, but as D is more negative than C (i.e., as it is worse than C in dimension D1) it will enhance B's activation more than C will do for A. The same happens when dimension D2 (quality) is attended, with B and C swapping roles. Consequently, the reason that B and C are finally chosen is that at each time step they have a "better decoy" (i.e., option D) than option A has. Single-trial activations are given in figure 4.8(c). Note that the single-trial activations are given for the first 150 time steps, when B and C start to become dominant, as after that point the activations start to increase exponentially. The validity of this explanation is confirmed by considering the same choice set but with the four alternatives forming a perfect square. In that situation D is equal to C with respect to the D1 dimension and equal to B with respect to the D2 dimension. Consequently at each time step A will be boosted equiprobably by a decoy, either B or C. But the fact that A is high in both dimensions renders it the winner, as shown in figure 4.8(d). Thus such a slight change in the coordinates of option D can change the output of the DFT model. In figure 4.8(e) we present the corresponding results for
LCA. We can see that the object with the highest additive utility is always chosen. These instabilities are an outcome of the distance dependent inhibition and linearity assumptions in the DFT theory. The sharp sigmoid which characterizes the inhibition mechanism dichotomizes the choice space: we have either pairs which compete strongly or pairs with no interaction at all. This sharpness, as we have already seen, is necessary in order to accommodate the three effects; combined with unbounded dynamics, however, it produces unstable predictions at several points of the choice space. Thus we can argue that DFT, with its current mechanisms, is tailored to account for the reversal effects but is not generic enough to provide reasonable predictions under all possible conditions. The redefinition of the DFT mechanisms is important, especially as its counterpart model, LCA, is far more robust and less problematic.
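The contrast between the two update rules can be sketched in a few lines. This is an illustrative sketch, not the implementation or parameterization used in our simulations: the connection weights, inputs and inhibition values below are hypothetical.

```python
import numpy as np

def dft_step(P, V, S):
    """One linear DFT update: preferences are unbounded, so negative
    activations transmitted through inhibitory links act as excitation
    (the source of the decoy effects discussed above)."""
    return S @ P + V

def lca_step(x, I, leak=0.95, inh=0.2):
    """One LCA update with global inhibition: activations are truncated
    at zero, so poor options drop out of the competition early."""
    x_new = leak * x + I - inh * (np.sum(x) - x)
    return np.maximum(x_new, 0.0)
```

A poor option in LCA is clamped at zero and stops feeding the others, whereas in DFT a strongly negative preference keeps injecting excitation into its near neighbor at every step.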

4.5 Summary of the Chapter

In this chapter we judged the predictive adequacy of LCA and DFT under three experimentally testable paradigms. In the first section we assessed the performance of the two models in quantifying the compromise effect as a function of the separation between the extreme objects. While LCA made a monotonic (up to some range of distance) prediction about the magnitude of the effect, DFT produced it abruptly, in a very narrow area of the choice space, as a consequence of the sharp sigmoid inhibition function it employs. Although we do not have empirical data with which to rule out this strange prediction, we consider it unlikely for the compromise effect to appear suddenly after a particular distance. This counterintuitive prediction would be avoided if the inhibition function were smoother and decreased linearly or exponentially. However, such functions have proved inadequate in accounting for the three contextual reversals. In the second part of the chapter we studied an instance of loss-averse behavior, exploiting a paradigm in which the chosen option is announced to be unavailable and a fast second decision is elicited. LCA correctly predicts that the next selected option will be the one closest to the alternative chosen previously, due to the asymmetric value function it employs. On the other hand, DFT under the assumption that unavailable options are eliminated cannot produce the correct choice pattern. In order to do so, the DFT authors proposed an extra dimension called availability and
managed to subordinate this instance of loss-aversion as a special case of the attraction effect. Although this assumption seems somewhat unrealistic, we cannot rule it out a priori without experimental verification. However, the availability assumption makes a strong claim about unavailable options actively participating in the decision process. This led us to the third and final section of the chapter, where we assessed the performance of DFT and LCA in choice sets with three or more options including poor and uninformative ones (e.g., unavailable options). What we argued in the third chapter is that the combination of linear dynamics and distance dependent inhibition can have undesired computational consequences. Indeed, it was found that the choice patterns which DFT produces under several situations are wrong. This is a direct consequence of the linearity of the model; activations can grow infinitely large and the balance of the dynamic system is violated. LCA provides us with an ideal counterexample: bounding the dynamics enables the model to discard the poor objects at an early stage of the process, leaving only the strong ones to compete. Apart from the linear dynamics, a crucial problematic factor is the sigmoid inhibitory function. Its sharpness, which is necessary in order to account for the contextual effects, essentially imposes either global or no inhibition between pairs of objects. Thus by definition the inhibition mechanism can isolate pairs of options whose activations will evolve independently of the others across time. Consequently DFT has the disadvantage of dichotomizing the choice space, making it discontinuous. As a result, options with low additive utility are predicted to dominate. In summary, we brought out that DFT has a set of problematic mechanisms, some of which can be refined. For example, the dynamics of the model can be made non-linear by setting a lower, even negative, activation limit.
Besides, the DFT authors have recognized that such a shift would be essential when their model reaches its limits. However, a more crucial point, which we cannot see how to change, is the distance dependent inhibition mechanism. Theoretically, an inhibition that decreases smoothly with distance can guarantee that the model will behave reasonably. However, the only function which can account for the empirical data is a step-like function, which in turn creates a cascade of problems. This allows us to conclude that DFT with its current formalization lacks robustness, as it is tailored to a subset of the decision behavior. On the other hand, the LCA framework can predict the effects based on a simple set of principles, namely the stochastic alternation between the dimensions and any loss-averse value function. Therefore the alleged advantage of DFT, producing loss-aversion indirectly instead of incorporating it a priori, becomes questionable in the face of its descriptive inadequacy. All these issues will be discussed in the next chapter, informed also by the DFT authors' fruitful reactions to some of the issues touched upon so far.

Chapter 5 General Discussion

5.1 Introduction

During this project we investigated computationally the core mechanisms of DFT and LCA in an attempt to qualitatively discriminate them. To this end, diverging predictions from each model were gathered. The lack of relevant experimental data prevents us from being conclusive regarding the superiority of one theory over the other. Nevertheless, the LCA framework was found to be more robust, especially when the choice set is extended to more than three options. In this chapter we will present succinctly and critically the diverging sets of principles of the two theories; global versus local inhibition, issues of stability in the two dynamic systems, linear versus non-linear dynamics, and finally emerging versus explicit loss aversion are the topics we will discuss. Interestingly, our line of argument will meet the DFT authors' fruitful comments and recent reactions regarding our work (J. Busemeyer, personal communication).

5.2 Local versus Global Inhibition

The role of competition in the two models has already been discussed. To sum up, DFT encompasses distance dependent inhibition while LCA relies on constant inhibition between the alternatives. But why does DFT need similarity dependent competition between the alternatives, while LCA does not? Inhibition in the two theories cannot be considered in isolation from the rest of the assumptions each one involves. Essentially, DFT is based on distance dependent inhibition so as to compensate for not using an asymmetric, loss-averse value function. In other words, loss-aversion in DFT is mediated by the inhibitory mechanism and the linear dynamics, while in LCA it is hard-wired. The distance dependent inhibition mechanism in DFT was the target of our computational investigations (chapter 3), since it had not been explicitly formalized in the original paper (Roe et al., 2001). We found that DFT needs inhibition characterized by a steep sigmoid function in order to reproduce simultaneously the similarity, the attraction and the compromise effect. Global inhibition would fail to predict the attraction and compromise effects, while a linearly or exponentially decreasing inhibition function would not be able to accommodate the compromise effect under the DFT scheme. The steepness of the inhibition function which captures the three effects allows us to restate the competition mechanism as being global within a radius and non-existent outside it. Having refined the inhibition mechanism in DFT, we were able to test it computationally under a set of situations. During the parametric investigations, DFT with inhibition defined by a steep sigmoid was found to dichotomize the choice space. Keeping the two symmetric options fixed on the indifference line and varying the position of the third option, we found that there are two main areas: one in which the introduction of the third option boosts the activation of the first fixed alternative (i.e., attraction for option A) and another which favors the choice probability of the second fixed object (i.e., attraction for option B).
Additionally, the magnitude of the attraction effect as a function of the separation between the decoy and the target was found to be relatively flat, a direct consequence of the sharpness of the inhibition function (i.e., attraction occurs within a radius where the inhibition between the decoy and the target is constantly high enough, and it disappears outside the radius). The second case for which the inhibition mechanism provided strange predictions arose when the dependency of the magnitude of the compromise effect on the separation between the extreme alternatives was examined. For this paradigm DFT predicted that the compromise effect will occur abruptly within a particular range, which is again a result of the global (within a radius only) inhibition it employs. The absence of relevant behavioral data did not allow for strong conclusions at this stage, although it is unlikely that the compromise effect appears and disappears suddenly without any gradation in its magnitude.
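The shape of the inhibition function we ended up with can be illustrated by a generic steep sigmoid of distance; the constants below (maximum inhibition, radius and steepness) are illustrative stand-ins, not the fitted values of function (3.3).

```python
import math

def sigmoid_inhibition(d, w_max=0.042, d0=1.0, beta=20.0):
    """Distance-dependent inhibition: approximately constant (w_max)
    for distances below the radius d0 and near zero beyond it.
    The steepness beta controls how sharply the choice space is
    dichotomized; all constants here are illustrative."""
    return w_max / (1.0 + math.exp(beta * (d - d0)))
```

With a large beta the function behaves as "global within a radius, zero outside it": inhibition is essentially flat for d < d0 and vanishes just past it, which is exactly what makes the magnitude of the attraction effect flat inside the radius and absent outside it.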


A third case in which the steep sigmoid inhibition mechanism proved clearly problematic involves situations with poor objects (i.e., with low additive utility) in the choice set. There were paradigms under which DFT predicted that an object with low overall utility would dominate. Without doubt, and without the need of experimental verification, this prediction is wrong. Again, such counterintuitive predictions are direct consequences of the sharp sigmoid function. The latter segments the choice space and allows pairs of objects to evolve without affecting, or being affected by, the activations of other alternatives. That way the option with the highest additive utility can be dominated by other, worse alternatives (e.g., when a form of the attraction effect occurs within the interacting pair), as it is isolated from its competitors by the sharpness of the sigmoid inhibition function. Contrary to DFT, LCA provides robust predictions for the aforementioned situations. In particular, the predictions that LCA provides are graded in a parametric setup and correct for choice sets which involve non-informative options. This robustness cannot be attributed to the global inhibition mechanism that LCA employs, but to the explicit loss-aversion, which in turn renders a local competition mechanism unnecessary. While an inhibition mechanism defined by a steep sigmoid function is problematic, it is worthwhile to question whether it is the only possible route for DFT. Jerome Busemeyer (personal communication) calls attention to the fact that the noise parameter can play an important role in allowing inhibitory functions other than the steep sigmoid to account for the three effects simultaneously. We recall that the noise parameter was held fixed for both models during our investigations.
This was done for both practical and theoretical reasons: firstly, variable noise can cancel out the effects of other mechanisms on the choice patterns, and secondly, there is no rationale for why noise should change across different choice sets. In preliminary investigations we concluded that in DFT, changing the noise can have similar results to changing the level of inhibition in the similarity and the compromise effects. For both effects, the account which DFT provides is correlational. The similarity effect occurs because the similar options are correlated, as they are favorable on the same attribute, given the stochastic alternation of the dimensions; the compromise effect is a result of the correlation of the two extreme objects due to local inhibition. For the similarity effect, lowering the inhibition or the level of noise will enable the similar options to be even more correlated (i.e., less variability in their activation patterns). On the other hand, the compromise effect is enhanced with more
local inhibition or more noise. This happens because the source of correlation between the two extreme alternatives is that they interact via equally strong inhibitory links with the compromise option. Adding more noise to the extreme options would further restrict their tendency to anti-correlate due to the stochastic alternation of dimensions and the fact that they are favorable on different dimensions. Since noise and inhibition are functionally overlapping, having a separate distance dependent mechanism which controls the value of the noise parameter would allow for potentially infinite combinations of feasible noise and inhibition functions which capture the three effects simultaneously. Such a possibility would give DFT extra degrees of freedom. This would improve its descriptive power (i.e., the more degrees of freedom a model has, the more easily it can be fitted to data), but it would undermine its explanatory power, as there is no obvious justification for having higher noise in the compromise effect and lower in the similarity case.¹

5.3 Stability in DFT

The problematic performance of DFT in choice sets which include uninformative options motivated the DFT authors to raise the issue of stability in their linear dynamic system. In their original paper, the parameterization they used was constrained so as not to violate the stability of the system. For DFT to be stable, the maximum eigenvalue of the connectivity matrix S has to be lower than one. In practice, an unstable instance of DFT does not settle to its steady states; it blows up, and the activations grow infinitely large (either negative or positive). While the inhibitory links between different options are defined by the inhibition function, the elements on the diagonal of the S matrix (the self-excitation coefficients), whose values correspond to the neural leak or decay, are set independently. During our investigations the value of the neural leak for both LCA and DFT was set to 0.95. Given the sigmoid inhibition function we ended up with, this value of the decay factor can lead DFT to instability. Indeed, in the case we examined in chapter 4, where the option with the highest additive utility loses in a choice set consisting of three alternatives, the parameterization is unstable (figure 5.1(a)).

¹ The ameliorative role of noise in the neural system is undoubted. Additionally, there is evidence that the nervous system tunes the levels of background noise depending on the input it receives. However, in the case we are discussing, variable noise is not introduced for biological plausibility but as an ad hoc manipulation which, in the presence of LCA with fixed noise, seems redundant.

Figure 5.1: a) A choice set which leads DFT to instability. b) Probability of choice for the three options of a). c) Changing the location of option C under stability. A red point corresponds to instances where option C wins.

A way to avoid instability is to lower the self-excitation coefficient from 0.95 to 0.94. Under this configuration, and using the same sigmoid inhibition function, we investigate whether strange predictions can occur under stability. We move option C from A up to the point before it starts interacting with option B, and we annotate in figure 5.1(c) with red the points where placing option C counter-intuitively gives it the highest choice probability. As we can see in figure 5.1(c), there are places under stable parameterization in which option C can dominate option B, although the latter has a higher additive utility. As we have already explained, this happens due to the sharpness of the inhibition function, which segments the choice space and allows pairs of objects to evolve independently of other alternatives in the choice set. In the example we are studying, options A and C do not interact with B, and the decoy option A boosts the activation of C, as in the attraction effect, giving it the highest choice probability. Thus, although instability helps such strange predictions to occur, since the system blows up and the activations grow very large, the very explanation of these paradoxical predictions lies in the steepness of the sigmoid inhibition function. We can understand analytically why these strange predictions result from the inhibition function and not from instability. Under stable parameterization the steady state in which the system settles is P = V · S⁻¹, where V is the matrix of valences and S is the connectivity matrix. Essentially, what we have is V_B > V_C > V_A, and the multiplication of the valences with the inverted connectivity matrix returns P_C > P_B > P_A. The reason why this reversal takes place lies in the connectivity matrix, namely in the inhibition function. Under instability the connectivity matrix is simply not invertible, and that is why the activations grow infinitely large, thereby amplifying the strange predictions. Thus there is no causal relationship between instability and paradoxical predictions. One could argue that lowering the neural decay further would restrict the strange results, as it would reduce the impact which the inverted connectivity matrix has on the steady preference states. However, the leaky accumulation of information is an essential property of DFT (e.g., in parametric computational investigations a leak value equal to 0.90 prevents DFT from reproducing the contextual preference reversals). Therefore suppressing the importance of inhibition might restrict the unexpected predictions, but it would also abolish the descriptive adequacy of DFT; responsible for the problematic performance of DFT is the particular shape of the inhibition function. Keeping the parameters stable in DFT is useful, since only in that way can we obtain predictions using linear algebra rather than time-consuming simulations. However, this seems impossible when more than three options are present in the choice set. In that case there will be more high-valued elements in the connectivity matrix (i.e., more pairs will interact via inhibitory links), and the constraint that the maximum eigenvalue of the matrix be less than one will be violated in most situations. Therefore we question the value of keeping DFT linear while at the same time being restricted to a stable parameterization. While stability does not prevent the model from providing counterintuitive predictions, a lower boundary on the activations might do so. In the next section we discuss the theoretical implications of a non-linear version of DFT.
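The eigenvalue criterion, and the blow-up it guards against, can be checked directly on a simplified linear recursion of the same form. The two-option connectivity matrices below are illustrative examples, not the parameterization of figure 5.1.

```python
import numpy as np

def is_stable(S):
    """DFT is stable when the largest absolute eigenvalue of the
    connectivity matrix is below one."""
    return np.max(np.abs(np.linalg.eigvals(S))) < 1.0

def iterate(S, V, steps=500):
    """Iterate the linear recursion P <- S P + V from zero: under
    stability the preferences settle; otherwise they grow without bound."""
    P = np.zeros(len(V))
    for _ in range(steps):
        P = S @ P + V
    return P

# Self-excitation 0.94 with weak inhibition: all eigenvalues below one.
S_ok = np.array([[0.94, -0.04], [-0.04, 0.94]])
# Self-excitation 0.95 with stronger inhibition: maximum eigenvalue 1.01.
S_bad = np.array([[0.95, -0.06], [-0.06, 0.95]])
```

With `S_ok` the iteration converges to a fixed point; with `S_bad` the component along the unstable eigenvector is multiplied by 1.01 at every step and the activations diverge, exactly the blow-up described in the text.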

5.4 Linear versus Non-Linear Dynamics

LCA provides us with an ideal example of the advantage of a theory of choice adopting non-linear dynamics. As we saw in chapter 4, the non-linearity of LCA allows the model to discard the disadvantageous alternatives at an early processing stage. This becomes possible as the latter tend to have negative activation values, which under the LCA scheme are truncated to zero, so that they stop interacting with the other objects. The lack of such a mechanism in DFT is partly responsible for the undesired computational consequences in cases where non-informative options are present in the choice set. However, the implementation of non-linearity in DFT is different from the one under the LCA framework. This is because the lowest activation boundary cannot be zero, as the principle of negative activations transmitted via inhibitory links is what enables DFT to account for the attraction effect. Thus the activation boundary in DFT must be set to a negative activation threshold, which adds to the model an extra parameter, probably dependent on the number of alternatives in the choice set. But most importantly, the DFT authors are not clear on how options which exceed the negative threshold will be treated. In the original paper, Roe and Busemeyer described a non-linear version of DFT in order to demonstrate how a strategy-switching mechanism could be implemented in their theory (Roe et al. (2001), p. 385). In particular, they argued that decreasing the negative threshold is equivalent to a compensatory strategy where comparisons are made across attributes, whereas increasing the negative threshold implies switching to a non-compensatory strategy where comparisons are made within an attribute. The second strategy would be implemented by slowing down the stochastic alternation between dimensions and eliminating the objects that are disadvantageous on the active aspect and hence exceed the lowest activation boundary. However, it is not straightforward how this elimination of the disadvantageous objects would be realized. Under the LCA scheme, the options whose activations exceed the lowest boundary are taken out of the competition. In DFT, though, negative activations are an important attribute of the theory, and thus there are two possibilities: either the eliminated options are taken out of the competition, or their activations are maintained at the negative threshold while the objects remain active in the decision process.
The first scenario is plausible enough; however, it conflicts with the concept of the extra dimension called availability which the DFT authors introduced in order to explain an instance of loss-averse behavior. We recall that, according to the availability dimension hypothesis, objects that are set unavailable remain active in the choice set and are allowed to interact with the available alternatives. Consequently, for consistency, this should also be the case for the alternatives that are eliminated during the deliberation process. Nevertheless, if the eliminated objects are allowed to remain active in the choice process (with even lower valences, due to their having the minimum possible value in the availability dimension), they would still be able to excite their nearby objects via inhibitory links. This situation, combined with a sharp sigmoid inhibition function which isolates pairs of objects, would still produce undesired results. Therefore the non-linearity in the dynamics of DFT cannot be useful unless existing principles (i.e., the availability dimension hypothesis) are restated. Additionally, we cannot tell with certainty whether an elimination mechanism, even properly implemented, would be adequate to prevent DFT from returning unexpected predictions, since the latter are mainly attributable to the distance dependent inhibition mechanism. Thus, even when the clearly disadvantageous objects are ruled out of the competition, the introduction of a decoy option very close to the target (i.e., such that the decoy does not exceed the negative threshold) can still boost the latter's activation and give it the highest choice probability, even if the choice set involves a third, distant, non-interacting option which has the highest overall utility. To sum up, the problematic behavior of DFT derives from the combination of distance dependent inhibition and linear dynamics. Making the dynamics non-linear addresses only one side of the problem and is thus not a radical solution. However, we would welcome a non-linear version of DFT, since it would add biological plausibility to the theory. As we have already discussed, the phenomenon of disinhibition, which DFT captures mathematically through negated inhibition, is meant to be bounded. Having appealed to biology, we turn our discussion to the next and final section, where the nature of loss-aversion, and how a neurocomputational model should implement it, is debated with reference to a recent neurophysiological study.
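The two truncation schemes can be written side by side. The LCA bound at zero follows the model as described; the negative threshold theta for DFT is a hypothetical stand-in for the unresolved non-linear variant discussed above.

```python
import numpy as np

def truncate_lca(x):
    """LCA: activations are bounded below at zero, so disadvantaged
    options are effectively removed from the competition."""
    return np.maximum(x, 0.0)

def truncate_dft(x, theta=-1.0):
    """A hypothetical non-linear DFT variant: the lower bound must be
    negative, because negative activations transmitted through
    inhibitory links are what produce the attraction effect. How the
    options pinned at theta should then be treated is the open
    question discussed above; theta is an illustrative value."""
    return np.maximum(x, theta)
```

Note that an option pinned at theta still carries a negative activation, so it keeps exciting its near neighbors through the inhibitory links, which is why this bound alone does not remove the undesired interactions.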

5.5 Emerging versus Explicit Loss Aversion

The computational explorations so far have brought out the importance of the loss-averse value function in the descriptive adequacy of the LCA model. While explicit loss-aversion is a cornerstone in LCA for explaining the compromise and the attraction effect, DFT compensates for the absence of built-in loss aversion by relying on linear dynamics and distance dependent inhibition. It is questionable, though, whether linear dynamics and similarity dependent competition are equivalent to explicit loss-aversion. As we have already discussed, DFT cannot account for one instance of loss-averse behavior, the status quo effect, in which the decision maker retains her current status rather than exchanging it for an equally beneficial situation. In addition to this, DFT introduces a strong assumption according to which inactive objects are allowed to participate in the decision process, in order to explain why people prefer small changes to large ones. In principle, DFT attempts to embrace loss-averse behaviors as special instances of the attraction effect, while LCA exploits the property of loss-aversion in order to explain the latter. Busemeyer and colleagues note for LCA that "no explanation is given regarding the behavioral emergence of loss aversion from underlying neurophysiology" (Busemeyer et al., 2006). They thereby claim that loss aversion emerging from a set of other principles is inherently valuable. For argument's sake we will not dwell on the fact that DFT explains part of loss-averse behavior based on a set of mechanisms which create a sequence of undesired side effects. Instead we will question whether loss aversion emerges from underlying neurophysiology or is hard-wired in the neural system. A consensus on the nature of loss aversion has not yet been reached, and thus we cannot a priori rule out a theory based on the way it models loss aversion. Although the nature of the loss aversion property is still an open issue, a recent neurophysiological study on the neural basis of loss aversion in decision making under risk supports the hypothesis that there is a single system in the brain which encodes subjective value asymmetrically, weighing disadvantages more than advantages (Tom et al., 2007). In spite of the fact that this study favors the theories of choice which view loss-aversion as intrinsic (Kahneman and Tversky, 1979; Tversky and Simonson, 1993; Usher and McClelland, 2004), it would be superficial to discard DFT because it attempts to capture it indirectly.
In principle, the finding that loss aversion is hard-wired in the brain does not rule out explanations of the property at a cognitive level. For example, in Decision by Sampling (Stewart et al., 2006), loss aversion is attributed to the asymmetry of the real-world distributions of gains and losses. In the current project we also attempted to describe a hybrid model between LCA and DbS, which is based on a symmetric value function and instead appeals to a mechanism which favors the processing of the more similar options. Presumably, the explanation of the latter mechanism lies in an asymmetry at the neural level: people prefer to compare more similar objects, since processing dissimilar ones yields more negative feelings (i.e., thinking of large disadvantages and large advantages might be more painful and cognitively effortful than thinking of small ones). Thus, instead of incorporating loss aversion in the value function, we can encompass it in a similarity dependent mechanism which characterizes the degree of processing. In summary, neurophysiological data alone will never be enough to allow for conclusive judgments about the cognitive architecture of a theory. That is why, in cognitive science, theoretical analyses are driven by the assumption that the more accurate a theory's predictions are, the closer to the natural system its mechanistic assumptions will be. In that context, we appeal to the robustness of LCA as a consequence of any type of asymmetry in its value function, as opposed to the problematic performance of DFT under several situations. In other words, we have more reasons (i.e., biologically informed and based on the robustness of LCA) to regard loss aversion as hard-wired in the brain rather than emerging from independent principles (i.e., disinhibition, which in DFT is mathematically captured by distance dependent negated inhibition).
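As a concrete illustration of the claim that any type of asymmetry in the value function suffices, here is a minimal loss-averse value function; the coefficient lam below is illustrative, not a value fitted in LCA.

```python
def asymmetric_value(x, lam=2.0):
    """A generic loss-averse value function: losses (x < 0) are
    weighted more heavily than gains (lam > 1). The exact function
    used in LCA differs in detail; lam = 2.0 is illustrative."""
    return x if x >= 0 else lam * x
```

Any lam > 1 makes a disadvantage of a given size loom larger than an equal-sized advantage, which is the only property of the value function that our robustness results actually depend on.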

5.6 Summary of the Project and Future Work

During the final chapter of the project we summarized the main arguments for and against the mechanistic assumptions of the two prominent theories of multi-attribute choice. DFT with its linear dynamics and distant dependent inhibition was found to be problematic and tailored to a subset of the decision behavior as opposed to the far more robust non-linear, with global inhibition LCA which encompasses loss-aversion explicitly. If we had to distill the main theoretical difference between the two schemes then we would choose the way loss-aversion is implemented as the most important. And it is important because it is the main reason why LCA is robust while its absence in DFT forces its authors to resort to a set of mechanisms which were proved to be doubtful. At this point it is interesting to recall how we reached to the above conclusions, by outlining the course of the project. In the third chapter we explored the core mechanisms of both the theories. While for LCA its existing principles had already been well-defined and generic enough in DFT we identified a lack of explicitness in the way the distance dependent inhibition mechanism was characterized. Therefore the aim of the first set of our computational investigations was to specify a class or classes of inhibition functions which could enable DFT to capture the three contextual preference reversals simultaneously. Interestingly it was found that the inhibition mechanism in DFT has to be restricted by the shape of a steep sigmoid function. In other words

Chapter 5. General Discussion


inhibition under the DFT framework must remain global within a radius and zero outside it, rather than generally decreasing with distance as initially proposed. On the other hand, LCA in its original specification was found to be robust: under any type of asymmetry in the value function, and with global inhibition within a broad range of values, LCA provided stable predictions. Having refined the mechanisms of DFT and confirmed that those of LCA were well specified, we proceeded to the fourth chapter, where predictions for specific paradigms were gathered for each theory. The first paradigm quantified the dependency of the magnitude of the compromise effect on the separation between the two extreme options. Under LCA this relationship was predicted to be monotonic up to a particular range of distances, whereas in DFT the effect was entirely determined by the shape of the sigmoid inhibition function and hence appeared and disappeared quite abruptly. The lack of relevant behavioral data did not allow us to make strong claims; we were, however, able to observe that the predictive power of DFT is restricted by the very specific properties of its inhibition function. The next paradigm concerned an instance of loss-averse behavior in which small changes are preferred over large ones. LCA was able to fit the experimental data using its default assumptions. DFT, on the other hand, predicted the correct choice patterns only after adopting extra assumptions: the introduction of a third dimension, called availability, and the allowance of interactions between available and unavailable options during deliberation. Although these assumptions enabled DFT to account for the data, we criticized them as ad hoc and tailored to this specific instance of loss aversion.
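The constraint on DFT's inhibition function identified above (roughly constant within a radius, near zero beyond it) can be illustrated with a steep sigmoid. The function and parameter names below (`w_max`, `r`, `steepness`) are our own illustrative choices, not part of the published model:

```python
import math

# Illustrative sketch (our own parameterization) of the steep sigmoid
# inhibition profile that our simulations suggest DFT requires: inhibition
# is roughly constant for option pairs closer than a radius r in attribute
# space and falls to approximately zero beyond it.

def sigmoid_inhibition(distance, w_max=0.05, r=1.0, steepness=20.0):
    """Inhibitory weight between two options as a function of their distance."""
    return w_max / (1.0 + math.exp(steepness * (distance - r)))
```

With a large `steepness` the profile approximates a step: near `w_max` inside the radius and near zero outside it, which is exactly the "global within a radius, zero outside" restriction described in the text.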
The unavailability hypothesis motivated us to test the two theories in circumstances where the choice set involves uninformative options. Interestingly, DFT was found to provide wrong predictions when poor options are present in the choice set. This is a direct consequence of the shape of its inhibition mechanism, which segments the choice space and allows pairs of options to evolve independently of the others. Thus the unavailability assumption, which holds that unavailable options participate actively as poor objects in the decision process, is strongly doubted on the basis of the paradoxical predictions DFT provides in such situations. The DFT authors have suggested two possible ways of improving the quality of their theory's predictions: treating noise as a free parameter and introducing non-linear dynamics.


While with a variable noise parameter DFT could avoid the specific dichotomizing inhibitory mechanism, this would abolish its explanatory power, since such a manipulation optimizes the theory without justifying the parameter's existence. The second way the DFT authors suggested for preventing their model from returning counterintuitive predictions was the introduction of a negative activation threshold below which options are eliminated. However, according to the unavailability hypothesis, the eliminated options should still be able to interact actively with the other options in the decision process. But even if the unavailability hypothesis were abandoned, a non-linear version of DFT would not avoid the wrong predictions, since their principal cause is the distance-dependent inhibition mechanism combined with the lack of explicit loss aversion. Although the explicit implementation of loss aversion was shown to provide robust predictions and is further corroborated by neurophysiological data, we remain open to new cognitive architectures, or improved versions of DFT, which encompass the loss-aversion property indirectly. Thus we define the aim of future computational studies as attempting to avoid the explicit implementation of loss aversion in models which are at least as robust as LCA. Such models can be based on combinations of existing theories into new ones (e.g., the hybrid model we discussed in this project) or on alternative versions of existing theories, such as a non-linear DFT. While we recognize the importance of theoretical studies in conceptualizing choice behavior, we have to admit that during this project we found ourselves in the awkward situation of having far more predictions than experimental data. An obvious priority for future studies in the field of multi-attribute choice should therefore be to restore the balance between the computational and experimental aspects of the problem.
To this end, we propose behavioral experiments which will quantify the effects parametrically across the choice space under psychophysical setups. We strongly believe that novel experimental data are the key factor which will integrate the existing, and in places overlapping, theoretical principles into a unified theory of multi-attribute, multi-alternative preferential choice.
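As a concrete reference point for such future simulation work, the non-linear LCA dynamics discussed throughout this chapter can be sketched in a few lines. The parameter values below are illustrative, not fitted to any data, and the implementation is a minimal sketch rather than the exact model specification:

```python
import random

# Minimal sketch (our own, with illustrative parameter values) of the
# non-linear LCA dynamics: input, leak, global inhibition from all
# competitors, additive noise, and truncation of activations at zero.
# The truncation is the non-linearity that distinguishes LCA from
# linear DFT.

def lca_step(x, inputs, leak=0.1, beta=0.1, noise_sd=0.02, dt=0.1):
    """One Euler update of the activation vector x given momentary inputs."""
    total = sum(x)
    updated = []
    for xi, ii in zip(x, inputs):
        dx = ii - leak * xi - beta * (total - xi)   # input, leak, inhibition
        xi = xi + dt * dx + noise_sd * random.gauss(0.0, 1.0) * dt ** 0.5
        updated.append(max(0.0, xi))                # activations stay >= 0
    return updated

# Example: iterate from rest; the option with the strongest input tends to
# dominate, while weak options are pinned at the zero bound.
x = [0.0, 0.0, 0.0]
for _ in range(200):
    x = lca_step(x, [1.0, 0.5, 0.2])
```

Because activations cannot go negative, a poor option stops inhibiting its competitors once it reaches the bound, which is one way of seeing why the non-linear scheme avoids the paradoxical regimes discussed above.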

Bibliography

Busemeyer, J., Jessup, R., Johnson, J., and Townsend, J. (2006). Building bridges between neural models and complex decision making behaviour. Neural Networks, 19(8):1047–1058.
Busemeyer, J. and Johnson, J. (2004). Computational models of decision making. In Handbook of Judgment and Decision Making, pages 133–154.
Busemeyer, J. and Townsend, J. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100(3):432–459.
Busemeyer, J., Townsend, J., Diederich, A., and Barkan, R. (2005). Contrast effects or loss aversion? Comment on Usher and McClelland (2004). Psychological Review, 112(1):253–255.
Chater, N. (2001). How smart can simple heuristics be? Behavioral and Brain Sciences, 23(5):745–746.
Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.
Churchland, P. and Sejnowski, T. (1994). The Computational Brain. MIT Press.
Diederich, A. (1997). Dynamic stochastic models for decision making under time constraints. Journal of Mathematical Psychology, 41(3):260–274.
Eliasmith, C. (1997). Computation and dynamical models of mind. Minds and Machines, 7(4):531–541.
Eliasmith, C. (1998). Dynamical models and van Gelder's dynamicism: Two different things. Commentary on van Gelder, T., The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21:616–665.
Gigerenzer, G. and Todd, P. (1999). Précis of Simple Heuristics that Make Us Smart. Behavioral and Brain Sciences, 22(5).
Glimcher, P. (2003). Decisions, Uncertainty, and the Brain: The Science of Neuroeconomics. MIT Press (Bradford Books).
Guo, F. and Holyoak, K. (2002). Understanding similarity in choice behavior: A connectionist model. In Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society, pages 393–398.


Hesse, M. (1967). Models and analogy in science. The Encyclopedia of Philosophy, 5:354–359.
Holyoak, K. and Simon, D. (1999). Bidirectional reasoning in decision making by constraint satisfaction. Journal of Experimental Psychology: General, 128:3–31.
Huber, J., Payne, J. W., and Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9(1):90–98.
Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263–291.
Kahneman, D. and Tversky, A. (1991). Loss aversion in riskless choice: A reference-dependent model. Quarterly Journal of Economics, 106(4):1039–1061.
Knetsch, J. (1989). The endowment effect and evidence of nonreversible indifference curves. American Economic Review, 79(5):1277–1284.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Henry Holt and Co., New York, NY.
Roe, R., Busemeyer, J., and Townsend, J. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108(2):370–392.
Rumelhart, D. and McClelland, J. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. MIT Press, Cambridge, MA.
Seidenberg, M. (1993). Connectionist models and cognitive theory. Psychological Science, 4(4):228–235.
Shadlen, M. and Newsome, W. (2001). Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology, 86(4):1916–1936.
Simon, D., Krawczyk, D., and Holyoak, K. (2004). Construction of preferences by constraint satisfaction. Psychological Science, 15(5):331–336.
Simonson, I. (1989). Choice based on reasons: The case of attraction and compromise effects. Journal of Consumer Research, 16(2):158.
Stewart, N., Chater, N., and Brown, G. (2006). Decision by sampling. Cognitive Psychology, 53(1):1–26.
Sugrue, L., Corrado, G., and Newsome, W. (2004). Matching behavior and the representation of value in the parietal cortex. Science, 304(5678):1782–1787.
Tom, S., Fox, C., Trepel, C., and Poldrack, R. (2007). The neural basis of loss aversion in decision-making under risk. Science, 315(5811):515.


Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79(4):281–299.
Tversky, A. and Simonson, I. (1993). Context-dependent preferences. Management Science, 39(10):1179–1189.
Usher, M., Elhalal, A., and McClelland, J. (2008). The neurodynamics of choice, value-based decisions, and preference reversal. In The Probabilistic Mind: Prospects for Bayesian Cognitive Science, pages 277–302.
Usher, M. and McClelland, J. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108(3):550–592.
Usher, M. and McClelland, J. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111(3):757–769.
van Gelder, T. (1998). The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21(5):615–628.
