
Habilitation à Diriger des Recherches in Mathematics
Université Pierre et Marie Curie

Presented by Rida LARAKI

Thesis title:
Jeux, Optimisation, Contrôle, Algorithmes, Rationalité et Choix Social
(Games, Optimization, Control, Algorithms, Rationality and Social Choice)

Defended on 30 November 2011.

Jury:
Hedy Attouch, Université de Montpellier II, Reviewer.
Pierre Cardaliaguet, Université Paris 9 Dauphine, Examiner.
Roberto Cominetti, Universidad de Chile (Santiago), Reviewer.
Olivier Gossner, CNRS and Paris School of Economics, Examiner.
Sergiu Hart, Hebrew University of Jerusalem, Reviewer.
Jean-François Mertens, Université Catholique de Louvain, Examiner.
Sylvain Sorin, Université Paris 6 Jussieu, President.
Wieslaw Zielonka, Université Paris 7 Diderot, Examiner.

To my children, Carlonn and Raoul.

All my gratitude to my two friends, co-authors, and models of scientific excellence. Sylvain Sorin was my doctoral and habilitation advisor. He instilled in me his love for the beauty and variety of the mathematical tools of game theory. Michel Balinski was the architect of my entry into the CNRS and passed on to me his passion for the applications of mathematics, in particular to social choice.

I thank my other friends and co-authors, who taught me a great deal and helped me move forward in difficult moments: Philippe Bich, Pierre Cardaliaguet, Olivier Gossner, Andrew Jennings, Jean-Bernard Lasserre, Panayotis Mertikopoulos, Jérôme Renault, Eilon Solan, Tristan Tomala, Christina Pawlowitsch, William Sudderth and Nicolas Vieille.

A special thought for the participants of the Paris game theory seminar: those with whom I have co-organized it for many years (Frédéric Koessler, Tristan Tomala and Yannick Viossat), the young co-organizers who agreed to take over and who bring freshness (Vianney Perchet and Guillaume Vigeral), the young researchers without whom the seminar would not be as warm (Mario Bravo, Fabien Gensbittel, Marie Laclau, Xiaoxi Li, Pablo Maldonado Lopez, Miquel Oliu Barton, Xavier Venel and Cheng Wan), and all the others I have forgotten to mention. Thanks to Marc Quincampoix, Sylvain Sorin and Tristan Tomala for having launched the GDR Jeux and for organizing so many high-quality events.

Many thanks to all my friends and colleagues: (1) at the Laboratoire d'Économétrie of the École Polytechnique, thanks to whom I learned the foundations of economics, a field that never ceases to inspire my research: Vincent Renard and Claude Henry (for unforgettable memories of the rue Descartes), Jean-Pierre Ponssard (with whom I taught at the master's level and co-supervised a doctoral student, Claudia Saavedra), Yukio Koriyama and Jorgen Weibull (with whom I teach game theory), Marie-Laure Allain, Patricia Crifo and Jean-François Laslier (with whom I co-directed several economics working groups at ENSAE), Pierre Cahuc and Pierre Picard (with whom I learned microeconomics); and (2) at the Laboratoire Combinatoire et Optimisation of Paris 6, who welcomed me and with whom I had enriching mathematical discussions over coffee: Eric Balandraud, Jérôme Bolte, Thierry Champion, Hélène Frankowska, Benjamin Girard, Frédéric Meunier, Oana Sylvia Serea and Alain Plagne. I have forgotten many others; may they forgive me.

Thanks to the CNRS for recruiting me; without that, I probably could not have worked so serenely on long-term projects. All my gratitude to the members of my jury, who honor me with their presence and their judgment.

Contents

1 Summary of Work and Publications
  1.1 Repeated Games and Continuous Time
  1.2 Stopping Games in Continuous Time
  1.3 Regularity in Optimization and Control
  1.4 Developing Algorithms
  1.5 Strategic Rationality: New Concepts
  1.6 Social Choice: New Model and Method
  1.7 Publications

2 Repeated Games and Continuous Time
  2.1 Preliminaries
    2.1.1 Discounted Stochastic Games
    2.1.2 Repeated Games
    2.1.3 General Evaluation
    2.1.4 Classical Approaches
    2.1.5 New Approaches
  2.2 Repeated Games with Incomplete Information
    2.2.1 Discounted Games
    2.2.2 Finitely Repeated Games and General Evaluation
  2.3 Splitting Games
    2.3.1 Discounted Games
    2.3.2 Finitely Repeated Games and General Evaluation
  2.4 Absorbing Games
    2.4.1 Discounted Games
    2.4.2 Finitely Repeated Games and General Evaluation
  2.5 The Dual of a Game with Incomplete Information
    2.5.1 The Dual Game
    2.5.2 The Associated Differential Game
  Bibliography

3 Stopping Games in Continuous Time
  3.1 Deterministic Stopping Games
    3.1.1 Model
    3.1.2 Results
  3.2 Stochastic Stopping Games
    3.2.1 Model
    3.2.2 Results
  Bibliography

4 Regularity in Optimization and Control
  4.1 Convexification Operator
    4.1.1 Representation Formulas
    4.1.2 Preserving Continuity
    4.1.3 Preserving Lipschitz Continuity
  4.2 MDP Operators
    4.2.1 Gambling Houses
    4.2.2 Red-and-Black
    4.2.3 Maximal Houses
    4.2.4 Preserving Continuity
    4.2.5 Preserving Lipschitz Continuity
    4.2.6 Preserving Hölder Continuity
  4.3 Splitting Operator
  4.4 Informationally Optimal Correlation Systems
    4.4.1 Properties
    4.4.2 Characterization
  Bibliography

5 Developing Algorithms
  5.1 Generalized Moment Problem
    5.1.1 Primal and Dual GMP
    5.1.2 Dual Relaxations
    5.1.3 Primal Relaxations
    5.1.4 The Main Result
  5.2 Convexification Operator
  5.3 MinMax of Rational Functions and Applications
    5.3.1 Finite Games
    5.3.2 Loomis Games
    5.3.3 Absorbing Games
  5.4 Generalized Polynomial Games
  Bibliography

6 Strategic Rationality: New Concepts
  6.1 Relaxed Equilibria in Discontinuous Games
    6.1.1 Model
    6.1.2 Reny Equilibria
    6.1.3 Quasi-Equilibrium
    6.1.4 Byproducts
  6.2 Robust Rationalizability
    6.2.1 Robust Best Responses
    6.2.2 Iterating Eliminations
  6.3 Coalitional Equilibria
    6.3.1 A Fixed Point Theorem
    6.3.2 Existence
  Bibliography

7 Social Choice: New Model and Method
  7.1 Traditional Model
  7.2 Practice in Skating
    7.2.1 Condorcet's and Arrow's Paradoxes
    7.2.2 Strategic Manipulation
    7.2.3 Meaningfulness
  7.3 A More Realistic Model
  7.4 Majority Judgment: Description
    7.4.1 Small Jury
    7.4.2 Large Electorates
  7.5 Majority Judgment: Salient Properties
    7.5.1 Eliciting Honesty
    7.5.2 Meaningfulness
    7.5.3 Resisting Manipulation
    7.5.4 Majority and Consensus
    7.5.5 Equilibria and Condorcet Consistency
    7.5.6 Honest Equilibria
  Bibliography

Chapter 1

Summary of Work and Publications

As a mathematician and modeler, I seek to solve problems of strategic interaction among several decision makers using mathematical tools in general and game theory in particular. My motivations come from the social sciences, economics, biology, and operations research. Using a variety of mathematical tools (optimization, optimal control, probability, convex analysis, geometry, topology), I have studied several models, such as games with incomplete information, repeated and stochastic games, and games in discrete and continuous time. I have used, applied, or extended concepts of rationality and equilibrium. I introduced the concept of a common language in collective choice and studied some of its properties using measurement theory. I have developed a new theory of social choice, from which results a new method for electing and ranking competitors by a jury or an electorate: majority judgment, which I have tested by organizing experiments.[1]

1.1 Repeated Games and Continuous Time

De Meyer and Rosenberg[2] (1999) conjectured the existence of a formal link between the asymptotic study of zero-sum repeated games with incomplete information on one side and differential games. Sorin[3] conjectured that the link is even tighter: there should be a single methodology with which all dynamic games can be studied. My first research article[4] resolved the De Meyer-Rosenberg conjecture. It led me to initiate, in a series of articles, the variational approach, which also supported Sorin's conjecture. The variational approach both proves the existence of the asymptotic value in repeated and stochastic games and characterizes it by variational inequalities, which attempt to extend viscosity solutions to control problems whose dynamics do not necessarily follow a differential equation. Such a characterization also makes it possible, as viscosity solutions do in differential games, to construct numerical approximation schemes.

I applied this approach to repeated games with incomplete information on both sides[5] and then to splitting games[6] (stochastic games in which the players partially control a martingale). Pursuing these ideas, the variational approach allowed me to give a new proof of the existence of the asymptotic value, together with an explicit formula for it, in absorbing stochastic games.[7]

The variational approach had the unfortunate limitation of applying only to the restricted class of discounted games as the discount rate tends to 0 (which amounts to studying the Abel mean). After several attempts to extend it to finitely repeated games as the number of repetitions tends to infinity (the Cesàro mean), we recently succeeded, in collaboration with Pierre Cardaliaguet and Sylvain Sorin,[8] in generalizing the variational approach to any way of evaluating the stream of payoffs of the repeated game. It suffices to embed the repeated game in a limit game in continuous time and to interpret each payoff evaluation of the repeated game as a particular discretization of that continuous-time game. The existence of the asymptotic value then reduces to the convergence of the discretization schemes of the continuous-time game as the discretization step tends to zero. Surprisingly, the convergence proofs are short, rather elegant, and make explicit use of viscosity solutions.

This work finally gives a complete answer to the problem that has occupied me since my Ph.D. thesis and unifies existing results from several fields (repeated games, stochastic games, and differential games).

[1] My articles can be found on my website: https://sites.google.com/site/ridalaraki/
[2] De Meyer B. and D. Rosenberg (1999). "Cav u" and the Dual Game. Mathematics of Operations Research, 24, 619-626.
[3] In seminars and personal communications during my doctoral thesis, which he supervised (1997-2000).
[4] Laraki R. (2002). Repeated Games with Lack of Information on One Side: The Dual Differential Approach. Mathematics of Operations Research, 27, 419-440.
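For small examples, the discounted value of a stochastic game can be computed by iterating the Shapley operator. The sketch below is only an illustration of that classical fixed-point computation (not of the variational approach itself); it restricts attention to 2x2 stage games, solved in closed form, and uses the Big Match as a test case, whose discounted value is known to be 1/2.

```python
# Value iteration with the Shapley operator for a two-player zero-sum
# discounted stochastic game. Illustration only: stage games are 2x2 and
# solved in closed form. Example: the Big Match (value 1/2).

def val2x2(m):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:                        # pure saddle point
        return maximin
    return (a * d - b * c) / (a + d - b - c)      # mixed value

def discounted_values(payoff, trans, beta, iters=500):
    """Iterate v <- val[(1 - beta) g + beta P v] state by state."""
    n = len(payoff)
    v = [0.0] * n
    for _ in range(iters):
        v = [val2x2([[(1 - beta) * payoff[s][i][j]
                      + beta * sum(p * v[t] for t, p in trans[s][i][j].items())
                      for j in range(2)] for i in range(2)])
             for s in range(n)]
    return v

# Big Match: state 0 is active; Top absorbs (to the payoff-1 or payoff-0
# state depending on the column), Bottom keeps the game in state 0.
g = [[[1, 0], [0, 1]],        # state 0: stage payoffs
     [[1, 1], [1, 1]],        # state 1: absorbed at payoff 1
     [[0, 0], [0, 0]]]        # state 2: absorbed at payoff 0
tr = [[[{1: 1.0}, {2: 1.0}], [{0: 1.0}, {0: 1.0}]],
      [[{1: 1.0}] * 2] * 2,
      [[{2: 1.0}] * 2] * 2]
v = discounted_values(g, tr, beta=0.9)
print(v[0])   # close to 0.5, the known discounted value of the Big Match
```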

1.2 Stopping Games in Continuous Time

I became interested in the class of stopping and timing games, motivated by their many applications in economics, biology, and finance, among others. The model is very simple to describe: several players each choose when to act (modeled as a stopping time). The payoff (in the stochastic case) is a random variable that depends on the smallest stopping time and on the identity of the players who stopped first.

With Eilon Solan, we proved in 2005[9] that every zero-sum stopping game satisfying the minimal condition that the payoff processes are right-continuous and uniformly integrable admits an (approximate) Nash equilibrium in mixed strategies. We have just extended this result to two-player non-zero-sum games.[10] In addition, in collaboration with Eilon Solan and Nicolas Vieille, we studied[11] this problem in the deterministic setting for an arbitrary number of players. We show that for two players an approximate subgame-perfect equilibrium always exists. For three or more players, a Nash equilibrium, even an approximate one, need not exist (which is very surprising). We also characterized several classes, relevant for applications, in which a subgame-perfect, stationary, symmetric, or Markovian Nash equilibrium exists.

[5] Laraki R. (2001). Variational Inequalities, System of Functional Equations and Incomplete Information Repeated Games. SIAM Journal on Control and Optimization, 40, 516-524.
[6] Laraki R. (2001). The Splitting Game and Applications. International Journal of Game Theory, 30, 359-376.
[7] Laraki R. (2010). Explicit Formulas for Repeated Games with Absorbing States. International Journal of Game Theory, 39, 53-70.
[8] Cardaliaguet P., R. Laraki and S. Sorin (2011). A Continuous Time Approach for the Asymptotic Value in Two-Person Zero-Sum Repeated Games. Cahier du Laboratoire d'Econométrie de l'École Polytechnique, 2011-11. To appear in SIAM Journal on Control and Optimization.
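In discrete time and finite horizon, the value of a zero-sum stopping (Dynkin) game can be computed by backward induction: at each date the players face a 2x2 matrix game between "stop" and "continue." The sketch below is a toy discretization under assumed payoff data (a(t) if player 1 stops first, b(t) if player 2 stops first, c(t) if both stop simultaneously), not the continuous-time model of the papers.

```python
# Backward induction for a finite-horizon, discrete-time, zero-sum
# stopping (Dynkin) game. Payoffs to player 1: a(t) if 1 stops first,
# b(t) if 2 stops first, c(t) if both stop at t; 'terminal' if no one stops.

def val2x2(m):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (p, q), (r, s) = m
    maximin = max(min(p, q), min(r, s))
    minimax = min(max(p, r), max(q, s))
    if maximin == minimax:                        # pure saddle point
        return maximin
    return (p * s - q * r) / (p + s - q - r)      # mixed value

def stopping_game_value(a, b, c, horizon, terminal):
    """v(t) = val [[c(t), a(t)], [b(t), v(t+1)]]  (rows: P1 stop/continue)."""
    v = terminal
    for t in reversed(range(horizon)):
        v = val2x2([[c(t), a(t)], [b(t), v]])
    return v

# Toy data: stopping first yields 1 to player 1, being preempted yields 0,
# simultaneous stopping splits the difference; the value is 1/2 at every date.
v0 = stopping_game_value(a=lambda t: 1.0, b=lambda t: 0.0,
                         c=lambda t: 0.5, horizon=10, terminal=0.0)
print(v0)   # 0.5
```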

1.3 Regularity in Optimization and Control

The study of repeated games raised a crucial question in convex analysis that had remained unanswered since Kruskal[12] posed it in 1969. The problem concerns the preservation of regularity properties of a real-valued function defined on a compact convex set X when passing to its convex envelope on X. Kruskal showed (in dimension 3) that the convex envelope of a continuous function can be discontinuous. The reason is that the set of extreme points of X need not be closed. Since then, the following question had remained open: under what topological condition on the geometry of X is the convex envelope of every continuous function also continuous?

I was able to show[13] that, in finite dimension, continuity is preserved if and only if the limit (in the Hausdorff sense) of any sequence of faces of X is itself a face of X. In particular, I show that closedness of the set of extreme points is a necessary and sufficient condition in dimension 3, while it is necessary but not sufficient in higher dimensions. This explains Kruskal's counterexample. For Lipschitz continuity in finite dimension, I show that it is preserved if and only if X is a polytope. I also studied the preservation of the Lipschitz constant and show that it is preserved when X is a product of simplices; the converse implication is still an open problem. It can, however, be shown that the Lipschitz constant cannot be preserved for every polytope.

The convexification operator can be seen as a special case of the value operator of a stochastic control problem (equivalent to the gambling-house model introduced by Dubins and Savage in 1965). This led us, in collaboration with William Sudderth, to extend[14] the preceding results. We give necessary and sufficient conditions on the data of a discrete-time MDP (Markov Decision Process), with finite or infinite horizon, for the value function to be regular (we treat continuity as well as the Lipschitz and Hölder properties). We then apply our results to the optimal stopping problem and to a class of MDPs (casinos) that includes some discrete-time models of financial mathematics, as well as to the asymptotic study of splitting games.

Finally, in collaboration with Olivier Gossner and Tristan Tomala, we introduce a new concept of optimal correlation[15] given an information structure. This work is closely related to Shannon's information theory. We then apply it to find explicit formulas for the asymptotic value of certain repeated games with imperfect observation.

[9] Laraki R. and E. Solan (2005). Stopping Games in Continuous Time. SIAM Journal on Control and Optimization, 43, 1913-1922.
[10] Laraki R. and E. Solan (2010). Equilibrium in Two-Player Non-Zero-Sum Dynkin Games in Continuous Time. Preprint, arXiv:1009.5627v1.
[11] Laraki R., E. Solan and N. Vieille (2005). Continuous-Time Games of Timing. Journal of Economic Theory, 120, 206-238.
[12] Kruskal J. B. (1969). Two Convex Counterexamples: A Discontinuous Envelope Function and a Nondifferentiable Nearest-Point Mapping. Proceedings of the American Mathematical Society, 23, 697-703.
[13] Laraki R. (2004). On the Regularity of the Convexification Operator on a Compact Set. Journal of Convex Analysis, 11, 209-234.
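In one dimension the convexification operator can be visualized directly: on a sampled function it reduces to the lower convex hull of the graph. The sketch below is a discrete 1D illustration only (the regularity pathologies above arise in dimension 3 and higher), computed with a monotone-chain scan.

```python
# Lower convex envelope of a function sampled on a grid: the envelope's
# graph is the lower convex hull of the sample points (monotone-chain scan).

def lower_hull(points):
    """Vertices of the lower convex hull of points, sorted by x."""
    hull = []
    for x, y in sorted(points):
        # pop the last vertex while it lies above the chord to the new point
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (x - x1) * (y2 - y1) <= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull

def envelope_at(hull, x):
    """Piecewise-linear interpolation along the hull."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (x - x1) * (y2 - y1) / (x2 - x1)
    raise ValueError("x outside the sampled range")

# Double-well example f(x) = (x^2 - 1)^2 on [-1.5, 1.5]: the convex
# envelope vanishes on [-1, 1] and coincides with f outside that interval.
f = lambda x: (x * x - 1) ** 2
xs = [i / 100 for i in range(-150, 151)]
hull = lower_hull((x, f(x)) for x in xs)
print(envelope_at(hull, 0.0))   # close to 0: the envelope is flat on [-1, 1]
```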

1.4 Developing Algorithms

I became interested in algorithmic and numerical aspects in collaboration with Jean-Bernard Lasserre. In 2001 he introduced new tools in global optimization combining semidefinite programming, moment theory, and semialgebraic geometry. My first collaboration with Lasserre develops an algorithm that computes the convex envelope[16] of a function. This problem is known to be hard: to compute the convex envelope of a function at a single point, one must know the function everywhere. To our knowledge, no previous algorithm uniformly approximates a convex envelope by a sequence of convex functions. We propose such an algorithm for a large class of problems (any rational function defined on a compact semialgebraic set). Each step of the computation reduces to solving an SDP (semidefinite programming) problem, solvable in polynomial time.

In a finite game, a mixed strategy profile (i.e., probability distributions over pure strategies) is a Nash equilibrium if no player benefits from deviating unilaterally. This is by far the most important concept in game theory, so it is essential to develop efficient algorithms to compute or approximate one mixed equilibrium, or all of them. Until now, almost all algorithms for computing mixed Nash equilibria of a finite game have been based on homotopy techniques.[17] These algorithms are known to be exponential, converge only generically, and select a single equilibrium of the game. With Lasserre, we show[18] that several problems in game theory can be approximated by a hierarchy of SDP relaxations. In particular, this always allows one to approximate a Nash equilibrium of the game and, under a rank condition that was satisfied in all our random simulations, to compute all Nash equilibria of the game. Finally, we extend the generalized moment problem algorithm to the more general, nontrivial setting of zero-sum polynomial games (the non-zero-sum case remains an open problem).

[14] Laraki R. and W. D. Sudderth (2004). The Preservation of Continuity and Lipschitz Continuity by Optimal Rewards Operators. Mathematics of Operations Research, 29, 672-685.
[15] Gossner O., R. Laraki and T. Tomala (2009). Informationally Optimal Correlation. Mathematical Programming B, special issue in honor of Alfred Auslender, 116, 147-172.
[16] Laraki R. and J.-B. Lasserre (2008). Computing Uniform Convex Approximations for Convex Envelopes and Convex Hull. Journal of Convex Analysis, 11, 635-654.
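Verifying the equilibrium conditions is much easier than computing equilibria: a mixed profile of a bimatrix game is a Nash equilibrium if and only if no player gains from a pure deviation. The sketch below is a plain verification routine (unrelated to the SDP hierarchy of the papers), checked on the uniform profile of Rock-Paper-Scissors.

```python
# Nash equilibrium check for a finite two-player game, with A and B the
# payoff matrices of players 1 and 2. A mixed profile is an equilibrium iff
# no player has a profitable PURE deviation (a profitable mixed deviation
# is a convex combination of pure ones, so checking pure deviations suffices).

def payoff(M, x, y):
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def is_nash(A, B, x, y, eps=1e-9):
    u1, u2 = payoff(A, x, y), payoff(B, x, y)
    best1 = max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))
    best2 = max(sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y)))
    return best1 <= u1 + eps and best2 <= u2 + eps

# Rock-Paper-Scissors (zero-sum): uniform play is the unique equilibrium.
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
B = [[-a for a in row] for row in A]
uniform = [1 / 3] * 3
print(is_nash(A, B, uniform, uniform))     # True
print(is_nash(A, B, [1, 0, 0], uniform))   # False: Paper beats pure Rock
```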

1.5 Strategic Rationality: New Concepts

In many applications, such as auctions, Bertrand competition, or stochastic games, the classical fixed point theorems do not apply to prove the existence of an equilibrium or an ε-equilibrium because the games in question are discontinuous. Yet these games do admit equilibria or approximate equilibria. Why? Reny (1999) proposed an explanation in a celebrated article. In collaboration with Philippe Bich, we have just generalized[19] Reny's results and unified most of the subsequent results on the existence of Nash equilibria in discontinuous or non-quasi-concave games. This was made possible by introducing a new notion of relaxed equilibrium. The underlying concept of decision and rationality, which we generalize to several players, is trivial and intuitive in the single-decision-maker case: a rational player facing a problem in which the global optimum is not attained will play close to the optimum and expect a payoff close to the optimal one. How does this generalize to several players while maintaining an equilibrium property? This line of work thus relaxes individual rationality.

In the opposite direction, I introduced a new concept that strengthens individual rationality, which I call robust rationalizability. It is based on the following idea: in a finite game, a player who is uncertain about what the other players will do keeps only those strategies that are best responses to a large (i.e., open) set of mixed strategies. This concept is axiomatized, studied, and compared with earlier concepts in the literature.[20]

In the same spirit, I became interested in the stability of equilibrium under deviations by coalitions. The two important equilibrium notions in game theory are Nash's (requiring stability against any deviation by a single player) and Aumann's (requiring stability against any deviation by a coalition of players). Proving existence for each type of equilibrium requires completely different conditions and mathematical tools. I have just established a new fixed point theorem and introduced a new equilibrium notion[21] (coalitional equilibrium) that generalizes both notions at once, unifying the existence conditions into a single one. A coalitional equilibrium requires stability against any deviation by an admissible coalition of players. The set of admissible coalitions is exogenous (in a political competition, for example, one may assume that the far left and the far right cannot form a coalition). For Nash equilibrium, the set of admissible coalitions is the set of all singletons of players; for Aumann's strong equilibrium, it is the set of all coalitions of players.

[17] Herings P. J.-J. and R. Peeters (2009). Homotopy Methods to Compute Equilibria in Game Theory. To appear in Economic Theory.
[18] Laraki R. and J.-B. Lasserre (2010). Semidefinite Programming for Min-Max Problems and Games. Mathematical Programming A, online.
[19] Bich P. and R. Laraki (2011). Relaxed Equilibria in Discontinuous Games. Preprint.
[20] Laraki R. (2011). Robust Rationalizability, Strategic Induction and Equilibrium Refinement. Preprint.
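Robust rationalizability is defined through best responses and iterated elimination. As a classical baseline that the concept refines, here is a sketch of iterated elimination of pure strategies strictly dominated by another pure strategy (robust rationalizability itself eliminates more, using open sets of beliefs).

```python
# Iterated elimination of strictly dominated pure strategies in a
# two-player finite game, with A and B the payoff matrices of players 1
# and 2. Restricted to pure-by-pure domination for simplicity; dominance
# by mixed strategies would eliminate (weakly) more.

def iterated_strict_dominance(A, B):
    rows, cols = list(range(len(A))), list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        for i in rows[:]:   # player 1: drop rows strictly dominated by another row
            if any(all(A[k][j] > A[i][j] for j in cols) for k in rows if k != i):
                rows.remove(i)
                changed = True
        for j in cols[:]:   # player 2: drop columns strictly dominated by another column
            if any(all(B[i][k] > B[i][j] for i in rows) for k in cols if k != j):
                cols.remove(j)
                changed = True
    return rows, cols

# Prisoner's dilemma (strategies: 0 = cooperate, 1 = defect):
# defection strictly dominates, so only (defect, defect) survives.
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
print(iterated_strict_dominance(A, B))   # ([1], [1])
```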

1.6

Choix Social: Modèle et Méthode Nouveaux

Le problème fondamental de la théorie du choix social est d’amalgamer les «opinions», «évaluations» ou «ordres de préférences» de plusieurs «juges» ou «électeurs» pour arriver à une «opinion», «évaluation» ou «ordre de préférence» du jury ou de l’électorat. Le modèle traditionnel imagine que chaque juge ou électeur a une «liste de préférence» où le premier candidat (ou alternatif) sur sa liste est son préféré, le deuxième son préféré si le premier n’est pas classé premier par la collectivité, et ainsi de suite. En vote, chaque mode de scrutin est une règle de jeu pour élire un président, un député ou un représentant. L’étude formelle des modes de scrutins est une tradition française qui a commencé un peu avant la révolution française par les mathématiciens et académicens des sciences, Condorcet, Laplace et Borda. Condorcet a montré dans son fameux livre, publié en 1785, qu’il est possible que la société préfère à la majorité le candidat A au candidat B, le candidat B au candidat C et le candidat C au candidat A. C’est le célèbre paradoxe de Condorcet. Dans l’ignorance des travaux de Condorcet, Kenneth Arrow a réintroduit en 1951 le modèle de Condorcet en supposant que chaque votant doit soumettre un rangement de tous les candidats (du pire au meilleur) et le but est de chercher un mécanisme qui agrège ces votes et range les candidats du meilleur au pire, le premier du rangement étant le gagnant. Arrow a montré qu’il n’existe aucun bon mécanisme: ou bien une règle n’est pas transitive (elle est sujette au paradoxe de Condorcet) ou bien elle est telle que le retrait d’un perdant peut changer le gagnant (elle est sujette au paradoxe d’Arrow). L’élection présidentielle de 2002 en France est une parfaite illustration du paradoxe d’Arrow. Le retrait de Jean-Pierre Chévènement ou de Christiane Taubirat aurait pu qualifier Lionel Jospin au second tour de l’élection. De plus, Lionel Jospin aurait pu battre Jacques Chirac (selon plusieurs sondages de l’époque). 
Ou encore, le retrait de Ralph 21

Laraki R. (2009). Coalitional Equilibria of Strategic Games. Cahier du Laboratoire d’Econométrie de l’Ecole Polytechnique. Une nouvelle version est disponible sur demande.

Choix Social: Modèle et Méthode Nouveaux

7

Nader in the USA in 2000 could have allowed Albert Gore to win Florida's electoral votes and hence to win the American presidential election against George W. Bush. These two examples (and many others, for instance in figure skating) show that these two paradoxes are a serious possibility and that a remedy is needed. Arrow received the Nobel Prize in 1972 for his celebrated impossibility theorem, but also for his contributions to general equilibrium theory. His famous book, published in 1951, founded social choice theory, which has become the theoretical basis of political and normative economics. Other impossibility theorems were established later. The most important is that of Gibbard and Satterthwaite.22 It states that there is no voting method under which voting sincerely is a dominant strategy for every voter. The 2007 French presidential election is a perfect illustration of this phenomenon: in reaction to what happened in 2002, many voters cast a "useful" vote, having understood that their interests were better served by voting for a major candidate than for their preferred minor candidate. These impossibility theorems foster the idea that no "good" method for electing or ranking exists.

I began working with Michel Balinski just after joining the CNRS in 2001, when I took part in organizing a voting experiment during the 2002 presidential election in Orsay. The experiment was a full-scale test of "approval voting". Following a special report in Pour la Science, a renowned oenologist (Jacques Blouin) contacted us to ask our opinion of the methods used by juries of oenologists to rank wines. These methods are not covered by the traditional social choice framework: judges are not asked to rank the candidates from best to worst; instead, each judge is asked to evaluate every characteristic (taste, color, smell, etc.) on the scale (excellent, very good, good, average, mediocre, bad). We then studied several applications in detail (figure skating, gymnastics, diving, ski jumping, wine and music competitions) to understand what links them and what separates them. This led us to develop a new theory of social choice: majority judgment.23

In the new paradigm, we assume the existence of an additional ingredient that makes it possible to measure the merits of the competitors or candidates: a common scale of evaluations (like the T's in Télérama or the Michelin stars). We call it a "common language" because the scale must be absolute and understood in roughly the same way by all voters or judges. It is somewhat like the [0,20] scale used in France to grade pupils and students. By contrast, ranking, as in the model of Arrow and Condorcet, is relative: the rank of a candidate in a judge's ranking changes when a candidate is added or removed. This modeling of the problem removes the impossibilities of the classical theory. We show that majority judgment is the unique rule that avoids the Condorcet and Arrow paradoxes and best resists the various strategic manipulations. Finally, we tested majority judgment in the first round of the 2007 presidential election in three polling stations in Orsay, in the Socialist primaries, and in a wine competition in Bordeaux in 2008.24 Majority judgment has also been applied successfully (1) in 2009 to award a major journalism prize25 by the Nieman Foundation of Harvard University, (2) in 2008 to recruit a maître de conférences in mathematics at the Université de Montpellier II, and (3) in 2010 to elect a member of the British Academy in political science.

22 Gibbard A. (1973). "Manipulation of voting schemes: a general result." Econometrica, 41, 587-601. Satterthwaite M. A. (1973). "Strategy-proofness and Arrow's conditions: existence and correspondence theorems for voting procedures and social welfare functions." Journal of Economic Theory, 10, 187-217.
23 Balinski M. and R. Laraki (2010). Majority Judgement: Measuring Ranking and Electing. MIT Press.
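Concretely, majority judgment asks each voter for a grade in the common language and ranks candidates by their successive median grades. A minimal sketch in Python (the profile below is purely illustrative, and the tie-breaking shown, repeatedly removing one median grade, is the simplest form of the rule):

```python
# Illustrative sketch of majority judgment: each voter grades each candidate
# in a common language; candidates are ranked by successive (lower) medians.
GRADES = ["Bad", "Mediocre", "Average", "Good", "Very Good", "Excellent"]
RANK = {g: i for i, g in enumerate(GRADES)}

def majority_value(grades):
    """Successive median grades (as indices in GRADES), best first."""
    pool = sorted(grades, key=RANK.get)                   # worst -> best
    out = []
    while pool:
        out.append(RANK[pool.pop((len(pool) - 1) // 2)])  # lower median
    return out

def mj_winner(ballots):
    """ballots: candidate -> list of grades. Lexicographic comparison of
    majority values is exactly the majority-judgment ranking."""
    return max(ballots, key=lambda c: majority_value(ballots[c]))

votes = {  # hypothetical profile, not data from the experiments above
    "A": ["Good", "Good", "Average", "Excellent", "Mediocre"],
    "B": ["Very Good", "Average", "Average", "Good", "Bad"],
}
# candidate A has majority grade "Good", candidate B only "Average"
```

Note how the common scale makes the comparison absolute: adding or removing a third candidate cannot change the grades, hence cannot change how A and B compare.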

1.7 Publications

Scientific Articles

1. The Splitting Game and Applications. International Journal of Game Theory, 30, 359-376 (2001).
2. Variational Inequalities, System of Functional Equations and Incomplete Information Repeated Games. SIAM Journal on Control and Optimization, 40, 516-524 (2001).
3. Repeated Games with Lack of Information on One Side: the Dual Differential Approach. Mathematics of Operations Research, 27, 419-440 (2002).
4. On the Regularity of the Convexification Operator on a Compact Set. Journal of Convex Analysis, 11, 209-234 (2004).
5. The Preservation of Continuity and Lipschitz Continuity by Optimal Rewards Operators. Mathematics of Operations Research, 29, 672-685, with W. D. Sudderth (2004).
6. Continuous-Time Games of Timing. Journal of Economic Theory, 120, 206-238, with E. Solan and N. Vieille (2005).
7. Stopping Games in Continuous Time. SIAM Journal on Control and Optimization, 43, 1913-1922, with E. Solan (2005).

24 Citadelles du vin.
25 Louis Lyons Award for Conscience and Integrity in Journalism.


8. A Theory of Measuring, Electing and Ranking. Proceedings of the National Academy of Sciences USA, 104, 8720-8725, with M. Balinski (2007).
9. Computing Uniform Convex Approximations for Convex Envelopes and Convex Hull. Journal of Convex Analysis, 635-654, with J.-B. Lasserre (2008).
10. Monotone Incompatibility Between Electing and Ranking. Economics Letters, 105, 145-147, with M. Balinski and A. Jennings (2009).
11. Informationally Optimal Correlation. Mathematical Programming B, 116, 147-172, with O. Gossner and T. Tomala (2009).
12. Explicit Formulas for Repeated Games with Absorbing States. International Journal of Game Theory, 39, 53-70 (2010).
13. Semidefinite Programming for Min-Max Problems and Games. Mathematical Programming A, with J.-B. Lasserre, published online (2010).
14. Election by Majority Judgement: Experimental Evidence. Chapter in the book In Situ and Laboratory Experiments on Electoral Law Reform: French Presidential Elections, edited by Bernard Dolez, Bernard Grofman and Annie Laurent. Springer, with M. Balinski (2010).
15. A Continuous Time Approach for the Asymptotic Value in Two-Person Zero-Sum Repeated Games. Cahier du Laboratoire d'Econométrie de l'Ecole Polytechnique, 2011-11. To appear in SIAM Journal on Control and Optimization, with P. Cardaliaguet and S. Sorin.

Books

1. Majority Judgement: Measuring Ranking and Electing. MIT Press, with M. Balinski (2010).
2. Bases Mathématiques de la Théorie des Jeux. Lecture notes, Editions de l'Ecole Polytechnique, with J. Renault and S. Sorin (2010).
3. Théorie des Jeux: Introduction à la Théorie des Jeux Répétés. Journées Mathématiques XUPS, Editions de l'Ecole Polytechnique, editors Nicole Berline, Alain Plagne and Claude Sabbah, with J. Renault and T. Tomala (2006).
4. Théorie des Jeux. Lecture notes, Editions de l'Ecole Polytechnique, with S. Zamir (2003).


Working Papers

1. Le vote par Assentiment : une Expérience. Cahier du Laboratoire d'Econométrie de l'Ecole Polytechnique, with M. Balinski, J.-F. Laslier and K. Van der Straeten (2003).
2. Coalitional Equilibria of Strategic Games. Cahier du Laboratoire d'Econométrie de l'Ecole Polytechnique (2009). A new version is available upon request.
3. Equilibrium in Two-Player Non-Zero-Sum Dynkin Games in Continuous Time. Preprint arXiv:1009.5627v1, with E. Solan (2010). Under revision.
4. Irreversible Games with Incomplete Information: The Asymptotic Value. Cahier du Laboratoire d'Econométrie de l'Ecole Polytechnique (2010).
5. Judge: Don't Vote. Cahier du Laboratoire d'Econométrie de l'Ecole Polytechnique, with M. Balinski (2010). Under revision.

Articles in Progress

1. Relaxed Equilibria in Discontinuous Games. In progress with P. Bich; available upon request (2011).
2. Robust Rationalizability, Strategic Induction and Equilibrium Refinement. Available upon request (2011).
3. Advances in Zero-Sum Repeated Games. In progress with S. Sorin. To appear in Handbook of Game Theory, IV.

Articles for the General Public

1. Le pouvoir des votes. Pour la Science, with M. Balinski (April 2002).
2. Expérience Electorale du Vote par Assentiment. Pour la Science, with M. Balinski, J.-F. Laslier and K. Van der Straeten (June 2002).
3. Le Dilemme du Vote Utile. Le Monde, with M. Balinski (March 30, 2007).
4. Le Jugement Majoritaire : l'Expérience d'Orsay. Commentaire, 30(118), 413-420, with M. Balinski (Summer 2007).
5. Une des Racines du Mal Socialiste. Libération, with M. Balinski (November 26, 2008).
6. David Gale in Paris. Games and Economic Behavior, 66, 594-597, with M. Balinski and S. Sorin (2009).
7. PS : Le Jugement Majoritaire, Meilleur Système de Primaires. Rue 89, with M. Balinski (August 30, 2010).

Chapter 2

Repeated Games and Continuous Time

"When people interact, they have usually interacted in the past, and expect to do so again in the future. It is this ongoing element that is studied by the theory of repeated games." 1

Aumann and Maschler introduced repeated games with incomplete information in 1966-1968. Their approach was revolutionary in many respects and has generated a huge and deep literature, some of which is described below. One of the main contributions was the conceptual distinction between the several ways of evaluating the stream of payoffs in long interactions.

The finitely repeated game Γn has "a definite duration, n, on which the players can base their strategies. Indeed, optimal strategies in Γn may be quite different for different n". On the other hand, in the infinitely repeated game Γ∞ "the strategies are by definition independent of n. Thus, Γ∞ reflects properties of the game Γn that hold "uniformly" in the duration n [...] By using an optimal strategy in Γ∞ (if there is one), a player guarantees in one fell swoop that in sufficiently long finite truncation, the outcome will not be appreciably worse in each Γn." 2

If vn denotes the value of the finitely repeated game Γn, lim_{n→∞} vn "tells the analyst something about repetitions that are "long", without his having to know how long. But lim vn is only meaningful as the limit of values of games whose duration is precisely known to the players [...] To analyze a situation in which the players themselves know only that the game is "long" [...], Γ∞ is the appropriate model." 3

Let v∞ denote the uniform value of the repeated game (i.e. the value of Γ∞, if it exists). In addition to v∞ and lim vn, "there are two specific models of "long" repetitions that warrant discussion. The first is the limit of the values vλ of the discounted game Γλ as the discount rate goes to zero [...] this is conceptually closer to lim vn than to v∞ [...]
[and] most of the above discussion applies when Γλ is substituted for Γn [...] On the other hand, discounted games Γλ are like Γ∞ – and unlike Γn – in that they have no fixed, commonly known last stage [...] [and] it admits [optimal] strategies with some kind of

1 Aumann and Maschler (1995), page xi.
2 Aumann and Maschler (1995), page 131.
3 Aumann and Maschler (1995), page 132.


stationarity property." 4

For repeated games with incomplete information on one side, Aumann and Maschler proved, using martingale tools, that v∞ exists, so that the asymptotic values lim vn and lim vλ exist and are equal to v∞. More importantly, they provided an explicit formula for the common value (their famous Cav(u) theorem) and showed that v∞ does not always exist for repeated games with incomplete information on both sides. Mertens and Zamir (1971) proved the existence of the asymptotic values lim vn = lim vλ in repeated games with incomplete information on both sides and provided an elegant system of functional equations that characterizes the common limit. The proof is very involved and uses martingale tools as well as convex analysis.

Aumann and Maschler extended the existence of v∞ and its characterization to repeated games with incomplete information on one side, imperfect monitoring and state-dependent signaling (that is, the players do not fully observe the past moves of their opponents but only a signal that may depend, deterministically or stochastically, on the true state and the last moves). To study repeated games with symmetric incomplete information on both sides, imperfect monitoring and state-dependent public signaling, Kohlberg and Zamir (1974) reduced the existence of the uniform value in the deterministic case to the study of an absorbing game Γ∗. Combined with the result of Kohlberg (1974), this implies the existence of the uniform value.

Repeated games with absorbing states, in short absorbing games, are stochastic games in which only one state is non-absorbing. Stochastic games are repeated games in which a state variable follows a Markov chain controlled by the actions of the players. Shapley (1953) introduced the two-player zero-sum model with finitely many states and actions (the finite model). He proved the existence of the value vλ of the λ-discounted game by introducing a dynamic programming principle (the Shapley operator).
The idea of the Kohlberg and Zamir (1974) reduction is simple: each time an informative pair of actions is played, the identity of the true state is revealed (i.e. the game is absorbed). "At about this time, it was realized that the games Γ∗ are particular instances of "stochastic games" in the sense of Shapley [...] Motivated by the above application to repeated games, Bewley and Kohlberg managed to prove that lim vn exists for all stochastic games [...] But though they tried hard, and obtained important partial results, they were unable to prove that v∞ exists for all stochastic games. This difficult problem was finally solved (positively) by Mertens and Neyman (1981)." 5

The Kohlberg and Zamir reduction has been extended by Neyman and Sorin (1998) to establish the existence of uniform equilibria in multi-player repeated games with symmetric incomplete information and non-deterministic public signaling. Using an operator approach, Kohlberg (1974) proved the existence of the uniform (and hence asymptotic) value in any finite absorbing game with full monitoring. The operator approach uses the additional information obtained from the derivative of the Shapley operator at λ = 0 to deduce the existence of lim vλ and its characterization

4 Aumann and Maschler (1995), page 139.
5 Aumann and Maschler (1995), page 217.

via variational inequalities. Rosenberg and Sorin (2001) extended the Kohlberg operator approach for the asymptotic values to a large class of stochastic games that includes (1) compact and separately-continuous absorbing games6 and (2) repeated games with incomplete information on both sides.

An algebraic approach allowed Bewley and Kohlberg (1976a, 1976b) to prove the existence of the asymptotic values lim vλ = lim vn in every finite stochastic game. Unlike the one for lim vλ, the proof for lim vn is quite involved. The breakthrough came when Mertens and Neyman (1981) proved the existence of the uniform value v∞ in every finite stochastic game with full monitoring.

To study long interactions, fixed-point theorems are in general not sufficient and more sophisticated methods need to be devised. Proving the existence of the asymptotic value or of the uniform value is an important theoretical contribution, but finding an explicit formula or a variational characterization linking the data of the game to its value (as did Aumann-Maschler (1995) and Mertens-Zamir (1971)) allows numerical computations and enables the study of how changes in the underlying data affect the value of the game. Unfortunately, very few repeated games admit an explicit formula or a variational characterization for the asymptotic or uniform values.

Inspired by differential game theory, I introduced in a series of papers the variational approach for discounted stochastic games.7 The approach allows us to prove the existence of lim vλ and characterizes the limit explicitly or through variational inequalities. Only recently, the approach has been extended by Cardaliaguet, Laraki and Sorin (2011) to prove the existence of lim vn and, more generally, the existence of the asymptotic value for any evaluation of the stream of payoffs. This chapter is a synthesis of these papers and shows that the same tool can be used to study many repeated game models.
More precisely, techniques which are typical of continuous-time games (viscosity solutions) can be used to prove the convergence of vλ as well as the convergence of vn. The originality of our approach is that it provides the same proof for both cases. It also allows us to handle general decreasing evaluations of the stream of stage payoffs. Our approach is illustrated by three classical problems: (1) repeated games with incomplete information on both sides, first analyzed by Mertens-Zamir (1971), (2) splitting games, introduced by Laraki (2001a,b), and (3) absorbing games, studied in particular by Kohlberg (1974). The spirit of the approach is similar to the one (1) in Laraki (2002), where it is shown that the dual of a repeated game with incomplete information on one side is a discretization of a differential game (resolving De Meyer and Rosenberg's conjecture), and (2) in Vieille (1992), where it is shown, again using differential games, that every set is either weakly approachable or weakly excludable. This chapter is based on the papers Laraki (2001a, 2001b, 2002, 2010) and Cardaliaguet, Laraki and Sorin (2011).

6 Meaning that action sets are compact metric and payoff and transition functions are separately continuous.
7 Laraki (2001a, 2001b, 2010).

2.1 Preliminaries

2.1.1 Discounted Stochastic Games

A stochastic game is a repeated game where the state changes from stage to stage according to a transition depending on the current state and the moves of the players. We consider the two-person zero-sum case. The game is specified by a state space Ω, move sets I and J, a transition probability ρ from I × J × Ω to ∆(Ω) and a payoff function g from I × J × Ω to ℝ. All sets A under consideration are finite and ∆(A) denotes the set of probability measures on A.

Inductively, at stage n = 1, ..., knowing the past history hn = (ω1, i1, j1, ..., i_{n−1}, j_{n−1}, ωn), player 1 chooses in ∈ I and player 2 chooses jn ∈ J. The new state ω_{n+1} ∈ Ω is drawn according to the probability distribution ρ(in, jn, ωn). The triplet (in, jn, ω_{n+1}) is publicly announced and the situation is repeated. The payoff at stage n is gn = g(in, jn, ωn) and the total payoff is the discounted sum $\sum_n \lambda(1-\lambda)^{n-1} g_n$. This discounted game has a value vλ (Shapley, 1953).

The Shapley operator T(λ, ·) associates to a function f on ℝ^Ω the function
$$\mathbf{T}(\lambda,f)(\omega)=\operatorname{val}_{\Delta(I)\times\Delta(J)}\Big[\lambda g(x,y,\omega)+(1-\lambda)\sum_{\tilde\omega}\rho(x,y,\omega)(\tilde\omega)\,f(\tilde\omega)\Big]\qquad(2.1)$$
where $g(x,y,\omega)=\mathbf{E}_{x,y}\,g(i,j,\omega)=\sum_{i,j}x_i y_j\,g(i,j,\omega)$ is the multilinear extension of g(·, ·, ω) (and similarly for ρ(·, ·, ω)), and val is the value operator
$$\operatorname{val}_{\Delta(I)\times\Delta(J)}=\max_{x\in\Delta(I)}\min_{y\in\Delta(J)}=\min_{y\in\Delta(J)}\max_{x\in\Delta(I)}.$$
The Shapley operator T(λ, ·) is well defined from ℝ^Ω to itself and its unique fixed point is vλ.
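Since T(λ, ·) is a (1 − λ)-contraction, vλ can be approximated by iterating the operator from any starting function. A minimal numerical sketch, assuming NumPy and SciPy are available (the inner matrix-game values are obtained by linear programming, and the encoding of g and ρ below is our own illustrative convention, not notation from the text):

```python
# Sketch: approximating v_lambda by iterating the Shapley operator (2.1).
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game A (row player maximizes), by LP."""
    m, n = A.shape
    # variables (x_1..x_m, v): maximize v subject to x^T A[:, j] >= v for
    # all j and x a probability vector; linprog minimizes, so c = (0,..,0,-1).
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v - x^T A[:, j] <= 0
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                            # sum_i x_i = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[-1]

def shapley_vlambda(g, rho, lam, iters=200):
    """g[w]: payoff matrix in state w; rho[w][i, j]: law of the next state.
    T(lam, .) is a (1 - lam)-contraction, so iteration converges to v_lam."""
    v = np.zeros(len(g))
    for _ in range(iters):
        v = np.array([matrix_game_value(lam * g[w] + (1 - lam) * rho[w] @ v)
                      for w in range(len(g))])
    return v
```

For a single non-absorbing state with matching-pennies payoffs the iteration returns vλ = 0 for every λ; on the Big Match, a classical absorbing game, it returns 1/2.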

2.1.2 Repeated Games

A recursive structure leading to an equation similar to the previous one (2.1) holds in general for repeated games, described as follows. M is a parameter space and g a function from I × J × M to ℝ. For each m ∈ M this defines a two-person zero-sum game with action spaces I and J for player 1 and player 2 respectively and payoff function g. The initial parameter m1 is chosen at random and the players receive some initial information about it, say a1 (resp. b1) for player 1 (resp. player 2). This choice is performed according to some initial probability π on A × B × M, where A and B are the signal sets of the two players.

At each stage n, player 1 (resp. 2) chooses an action in ∈ I (resp. jn ∈ J). This determines a stage payoff gn = g(in, jn, mn), where mn is the current value of the parameter. Then a new value of the parameter is selected and the players get some information about it. This is generated by a map ρ from I × J × M to probabilities on A × B × M. Hence at stage n a triple (an+1, bn+1, mn+1) is chosen


according to the distribution ρ(in, jn, mn). The new parameter is mn+1, and the signal an+1 (resp. bn+1) is transmitted to player 1 (resp. player 2). Note that each signal may reveal some information about the previous choice of actions (in, jn) and about both the previous (mn) and the new (mn+1) values of the parameter. Stochastic games correspond to public signals including the parameter. Incomplete information games correspond to an absorbing transition on the parameter (which thus remains fixed), no further information on the parameter being given to the players after the initial one.

Mertens, Sorin and Zamir (1994) associate to each general repeated game G an auxiliary stochastic game Γ having the same discounted value and satisfying a recursive equation of the type (2.1). However, the strategies in the two games differ.

In repeated games with incomplete information on both sides, M is a product space K × L, π is a product probability p × q with p ∈ P = ∆(K), q ∈ Q = ∆(L), and in addition a1 = k and b1 = ℓ. Given the parameter m = (k, ℓ), each player knows his own component and holds a prior on the other player's component. From stage 1 on, the parameter is fixed and the information of the players after stage n is an+1 = bn+1 = {in, jn}. The auxiliary stochastic game Γ corresponding to the recursive structure can be taken as follows: the "state space" Ω is P × Q, interpreted as the space of beliefs on the true parameter; X = ∆(I)^K and Y = ∆(J)^L are the type-dependent mixed action sets of the players; g is extended to X × Y × M by $g(p,q,x,y)=\sum_{k,\ell}p^k q^\ell\,g(k,\ell,x^k,y^\ell)$. Given (p, q, x, y), let $x(i)=\sum_k x_i^k p^k$ be the probability of action i and p(i) be the conditional probability on K given the action i, explicitly $p^k(i)=\frac{p^k x_i^k}{x(i)}$ (and similarly for y and q).
In this framework the Shapley operator is defined on the set F of continuous concave-convex functions on P × Q:
$$\mathbf{T}(\lambda,f)(p,q)=\operatorname{val}_{X\times Y}\Big\{\lambda g(p,q,x,y)+(1-\lambda)\sum_{i,j}x(i)y(j)\,f(p(i),q(j))\Big\}\qquad(2.2)$$
and vλ(p, q) is the unique fixed point of T(λ, ·) on F. These relations are due to Aumann and Maschler (1966-68) and Mertens and Zamir (1971).

2.1.3 General Evaluation

The basic formula expressing the discounted value as a fixed point of the Shapley operator follows:
$$v_\lambda = \mathbf{T}(\lambda, v_\lambda). \qquad (2.3)$$
It can be extended to games with alternative evaluations of the stream of payoffs {gn}. For example, the n-stage game with payoff defined by the Cesàro mean $\frac{1}{n}\sum_{m=1}^{n} g_m$ has a


value vn and the recursive formula for these values is obtained similarly as
$$v_n = \mathbf{T}\Big(\frac{1}{n}, v_{n-1}\Big)$$
with obviously v0 = 0.

More generally, to any probability measure µ on ℕ* is associated a general evaluation of the stream of payoffs, defined as follows. The corresponding payoff in the game is $\sum_n \mu_n g_n$. Note that µ induces a partition Π = {tn} of [0, 1] with $t_0 = 0,\ t_n = \sum_{m=1}^{n}\mu_m, \dots$, and thus the repeated game is naturally represented as a game played between times 0 and 1, where the actions are constant on each subinterval (t_{n−1}, t_n), whose length µn is the weight of stage n in the original game. Let vΠ be its value. The corresponding recursive equation is now
$$v_\Pi = \operatorname{val}\{t_1 g_1 + (1-t_1)\,\mathbf{E}\,v_{\Pi_{t_1}}\}$$
where Π_{t_1} is the normalization on [0, 1] of Π restricted to the interval [t_1, 1]. If one defines VΠ(tn) as the value of the game starting at time tn, i.e. with evaluation µ_{n+m} for the payoff g_m at stage m, one obtains the alternative recursive formula
$$V_\Pi(t_n) = \operatorname{val}\{(t_{n+1}-t_n)\,g_{n+1} + \mathbf{E}\,V_\Pi(t_{n+1})\}. \qquad (2.4)$$
The stationarity properties of the game induce time homogeneity
$$V_\Pi(t_n) = (1-t_n)\,V_{\Pi_{t_n}}(0) \qquad (2.5)$$
where, as above, Π_{t_n} stands for the normalization of Π restricted to the interval [t_n, 1]. By taking the linear extension of VΠ(tn) we define, for every partition Π, a function VΠ(t) on [0, 1].

Lemma 2.1 Assume that the sequence µn is decreasing. Then VΠ is C-Lipschitz in t, where C is a uniform bound on the payoffs in the game.
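The recursion vn = T(1/n, v_{n−1}) can be run directly. As an illustration, take the Big Match, the classical absorbing game of Blackwell and Ferguson with payoffs [[1*, 0*], [0, 1]] in the non-absorbing state, where starred entries absorb: every n-stage value equals 1/2. The encoding below, which folds absorbing payoffs into the matrix entries, is our own sketch, not notation from the text:

```python
# Sketch: n-stage values of the Big Match via v_n = T(1/n, v_{n-1}), v_0 = 0.
def val2x2(a, b, c, d):
    """Value of the 2x2 zero-sum game [[a, b], [c, d]] (row maximizes)."""
    lo = max(min(a, b), min(c, d))       # maxmin over pure strategies
    hi = min(max(a, c), max(b, d))       # minmax over pure strategies
    if lo == hi:                          # pure saddle point
        return lo
    return (a * d - b * c) / (a + d - b - c)  # completely mixed value

def big_match_vn(n):
    v = 0.0                               # v_0 = 0
    for m in range(1, n + 1):
        lam = 1.0 / m                     # v_m = T(1/m, v_{m-1})
        v = val2x2(1.0,                   # (Top, Left): absorbed at payoff 1
                   0.0,                   # (Top, Right): absorbed at payoff 0
                   (1 - lam) * v,         # (Bottom, Left): payoff 0, stay
                   lam + (1 - lam) * v)   # (Bottom, Right): payoff 1, stay
    return v
```

A short induction confirms the computation: if v = 1/2 then the stage matrix is [[1, 0], [(1−λ)/2, λ + (1−λ)/2]], whose mixed value is again 1/2 for every λ.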

2.1.4 Classical Approaches

We consider now the asymptotic behavior of vn as n goes to ∞, or of vλ as λ goes to 0. For games with incomplete information on one side, the first results proving the existence of lim_{n→∞} vn and lim_{λ→0} vλ are due to Aumann and Maschler (1966-1968), including, in addition, an identification of the limit as Cav_{∆(K)} u. Here
$$u(p)=\operatorname{val}_{\Delta(I)\times\Delta(J)}\sum_{k}p^k\,g(x,y,k)$$
is the value of the one-shot non-revealing game, where the informed player does not use his information, and Cav_C is the concavification operator: given φ, a real bounded function defined on a convex set C, Cav_C(φ) is the smallest function greater than φ and concave on C.

Extensions of these results to games with lack of information on both sides were achieved by Mertens and Zamir (1971). In addition they identified the limit as the unique solution of the system of implicit functional equations with unknown φ:
$$\varphi(p,q)=\operatorname{Cav}_{p\in\Delta(K)}\min\{\varphi,u\}(p,q), \qquad (2.6)$$
$$\varphi(p,q)=\operatorname{Vex}_{q\in\Delta(L)}\max\{\varphi,u\}(p,q). \qquad (2.7)$$
Here again u stands for the value of the non-revealing game:
$$u(p,q)=\operatorname{val}_{\Delta(I)\times\Delta(J)}\sum_{k,\ell}p^k q^\ell\,g(x,y,k,\ell)$$
and we will write MZ for the corresponding operator:
$$\varphi = \mathbf{MZ}(u). \qquad (2.8)$$
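When |K| = 2, the prior p lives on a segment, so Cav_{∆(K)} u reduces to the upper concave envelope of a function on [0, 1], which can be computed on a grid by an upper-convex-hull pass over its graph. A minimal sketch, assuming NumPy; the input function below is an arbitrary stand-in, not a game-derived u:

```python
# Sketch: upper concave envelope (Cav) of a function on [0, 1], computed as
# the upper hull of its graph (Andrew's monotone chain, upper part only).
import numpy as np

def cav(xs, us):
    """Concave envelope of the points (xs[i], us[i]); xs must be increasing."""
    hull = []  # vertices of the upper hull, left to right
    for pt in zip(xs, us):
        while len(hull) >= 2:
            (x0, y0), (x1, y1) = hull[-2], hull[-1]
            # pop hull[-1] while the turn is not strictly concave (right turn)
            if (x1 - x0) * (pt[1] - y0) - (y1 - y0) * (pt[0] - x0) >= 0:
                hull.pop()
            else:
                break
        hull.append(pt)
    hx, hy = zip(*hull)
    return np.interp(xs, hx, hy)  # envelope evaluated back on the grid
```

For a convex input such as (p − 1/2)², the envelope is the chord between the endpoints (here the constant 1/4); a function that is already concave is returned unchanged.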

As for stochastic games, the existence of lim_{λ→0} vλ in the finite case (Ω, I, J finite) is due to Bewley and Kohlberg (1976), using algebraic arguments: the Shapley fixed-point equation can be written as a finite set of polynomial equalities and inequalities in the variables {λ, xλ(ω), yλ(ω), vλ(ω); ω ∈ Ω}, so it defines a semi-algebraic set in some Euclidean space ℝ^N; hence, by projection, vλ has an expansion in Puiseux series of λ. The existence of lim_{n→∞} vn is then obtained by an algebraic comparison argument (Bewley and Kohlberg (1976)). The asymptotic values for specific classes of absorbing games with incomplete information are studied in Sorin (1984, 1985); see also Mertens, Sorin and Zamir (1994).

2.1.5 New Approaches

Starting with Rosenberg and Sorin (2001), the operator approach is a relatively new technique from which several existence results for the asymptotic value have been obtained. It is based on the Shapley operator. In the same spirit, I initiated the variational approach for discounted games and used it to solve many classes of games; see Laraki (2001a, 2001b, 2010). The analysis of the asymptotic behavior is simpler for discounted games because of their stationarity: vλ is a fixed point, as in (2.3).

Recently, Cardaliaguet, Laraki and Sorin extended the variational approach to more general evaluations of the stream of stage payoffs, including the limit of the Cesàro means. Each evaluation of the stream of payoffs is interpreted as a discretization of an underlying continuous-time game. It is proved for several classes of games (incomplete information, splitting, absorbing) that the values of the discretized continuous-time game converge uniformly as the mesh of the discretization goes to zero. The basic recursive structure is used to formulate variational inequalities that


have to be satisfied by any accumulation point of the sequence of values. Then an ad hoc comparison principle allows one to prove uniqueness, hence convergence. This technique is a simple transposition to discrete games of the numerical schemes used to approximate the value function of differential games via viscosity-solution arguments, as developed in Barles-Souganidis (1991). The main difference is that, in our case, the limit equation is singular and does not satisfy the conditions usually required to apply the comparison principles.

2.2 Repeated Games with Incomplete Information

Let us briefly recall the structure of repeated games with incomplete information: at the beginning of the game the pair (k, ℓ) is chosen at random according to some product probability p ⊗ q where p ∈ P = ∆(K) and q ∈ Q = ∆(L). Player 1 is informed about k while player 2 is informed about ℓ. At each stage n of the game, player 1 (resp. player 2) chooses a mixed strategy xn ∈ X = ∆(I)^K (resp. yn ∈ Y = ∆(J)^L). This determines a payoff g(xn, yn, p, q). In the discounted case, the total payoff is given by $\sum_n \lambda(1-\lambda)^{n-1} g(x_n,y_n,p,q)$ and we denote by vλ(p, q) the corresponding value. In this framework the Shapley operator is defined on the set F of continuous concave-convex functions on P × Q:
$$\mathbf{T}(\lambda,f)(p,q)=\operatorname{val}_{X\times Y}\Big\{\lambda g(p,q,x,y)+(1-\lambda)\sum_{i,j}x(i)y(j)\,f(p(i),q(j))\Big\}\qquad(2.9)$$
where, given (x, y, p, q), $x(i)=\sum_k x_i^k p^k$ is the probability of action i and p(i) is the conditional probability on K given the action i, namely $p^k(i)=\frac{p^k x_i^k}{x(i)}$ (and similarly for y and q). Recall that vλ(p, q) is the unique fixed point of T(λ, ·) on F ((1), (15)).
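The splitting of the prior in (2.9) is plain Bayesian updating: the posterior p(i) is computed from the prior and the type-dependent mixed actions, and the posteriors average back to the prior (the martingale of beliefs). A minimal sketch with hypothetical data:

```python
# Sketch of the belief splitting in (2.9): prior p on K, type-dependent
# mixed action x (a distribution over actions I for each type k).
def split(p, x):
    """Return total action probabilities x(i) and posteriors p(.|i)."""
    actions = next(iter(x.values())).keys()
    xbar = {i: sum(p[k] * x[k][i] for k in p) for i in actions}
    post = {i: {k: p[k] * x[k][i] / xbar[i] for k in p}
            for i in actions if xbar[i] > 0}
    return xbar, post

prior = {"k1": 0.5, "k2": 0.5}           # hypothetical two-type example
action = {"k1": {"a": 1.0, "b": 0.0},    # a fully revealing strategy
          "k2": {"a": 0.0, "b": 1.0}}
xbar, post = split(prior, action)        # posteriors are degenerate here
```

With a non-revealing x (the same distribution for every type), split returns p(i) = p for every action i, which is exactly the condition defining NR_X(p) below.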

2.2.1 Discounted Games

We now describe the analysis in the discounted case, following Laraki (2001b). Note that the family of functions {vλ(p, q)} is C-Lipschitz continuous, where C is a uniform bound on the payoffs, hence relatively compact. To prove convergence it is enough to show that there is only one accumulation point (for uniform convergence on P × Q). Observe that by (2.3) any accumulation point w of the family {vλ} satisfies w = T(0, w), i.e. it is a fixed point of the projective operator, see Sorin (2002). Explicitly here:
$$\mathbf{T}(0,w)=\operatorname{val}_{X\times Y}\Big\{\sum_{i,j}x(i)y(j)\,w(p(i),q(j))\Big\}=\operatorname{val}_{X\times Y}\ \mathbf{E}_{x,y,p,q}\,w(\tilde p,\tilde q),$$
where $\tilde p=(p^k(i))$ and $\tilde q=(q^\ell(j))$.

Let S be the set of fixed points of T(0, ·) and S0 ⊂ S the set of accumulation points of the family {vλ}. Given w ∈ S0, we denote by X(p, q, w) ⊆ ∆(I)^K = X the set of optimal strategies for player 1 (resp. Y(p, q, w) ⊆ ∆(J)^L = Y for player 2) in the projective game


with value T(0, w) at (p, q). A strategy x ∈ X of player 1 is called non-revealing at p, written x ∈ NR_X(p), if p̃ = p a.s. (i.e. p(i) = p for all i ∈ I with x(i) > 0), and similarly for y ∈ Y. The value of the non-revealing game satisfies
$$u(p,q)=\operatorname{val}_{NR_X(p)\times NR_Y(q)}\ g(x,y,p,q). \qquad (2.10)$$
A subset of strategies is non-revealing if all its elements are non-revealing.

Lemma 2.2 Any w ∈ S0 must satisfy D1 and D2, where:
• D1: If X(p, q, w) ⊂ NR_X(p) then w(p, q) ≤ u(p, q).
• D2: If Y(p, q, w) ⊂ NR_Y(q) then w(p, q) ≥ u(p, q).

Proof. Consider a family {vλn} converging to w and xn ∈ X optimal for T(λn, vλn)(p, q), see (2.2). Fix j ∈ J. Jensen's inequality applied to (2.2) leads to
$$v_{\lambda_n}(p,q)\ \le\ \lambda_n\,g(p,q,x_n,j)+(1-\lambda_n)\,v_{\lambda_n}(p,q),\qquad\forall j\in J.$$
Thus vλn(p, q) ≤ g(p, q, xn, j). If x̄ ∈ X is an accumulation point of the family {xn}, then x̄ is still optimal in T(0, w)(p, q). Since, by assumption, X(p, q, w) ⊂ NR_X(p), x̄ is non-revealing, and therefore one obtains, as λn goes to 0: w(p, q) ≤ g(p, q, x̄, j) for all j ∈ J. So, by (2.10),
$$w(p,q)\ \le\ \max_{x\in NR_X(p)}\ \min_{j\in J}\ g(p,q,x,j)\ =\ u(p,q).$$

Furthermore, a comparison principle is established. This idea goes back to Mertens and Zamir (1971).

Lemma 2.3 Let w1 and w2 be in S satisfying D1 and D2 respectively. Then w1 ≤ w2.

Consequently:

Proposition 2.4 lim_{λ→0} vλ exists and is the unique function that satisfies D1 and D2.

This variational characterization can be shown to be equivalent to the Mertens-Zamir system.

2.2.2 Finitely Repeated Games and General Evaluation

P We now turn to the finitely repeated game: recall that the payoff is given by n1 nk=1 g(p, q, xk , yk ). We denote by vn the value of this game. We have the recursive formula: " # 1 1 X 1 vn (p, q) = max min g(x, y, p, q) + (1 − ) x(i)y(j)vn−1 (p(i), q(j)) = T( , vn−1 ). x∈X y∈Y n n i,j n

(2.11) Given an integer n, let Π be the uniform partition of [0, 1] with mesh and write simply Wn for the associate function VΠ . Hence Wn (1, p, q) := 0 and for m = 0, ..., n − 1, , p, q) satisfies: Wn ( m n 1 n

Wn

m n



, p, q = max

min

x∈∆(I)K y∈∆(J)L

"

X 1 m+1 g(p, q, x, y) + x(i)y(j)Wn ( , p(i), q(j)) n n i,j

#

(2.12) Note that = 1− vn−m (p, q, ω) and if Wn converges uniformly to W , vn converges uniformly to some function v with W (t, p, q) = (1 − t) v(p, q). Let T be the set of real continuous functions W on [0, 1] × P × Q such that for all t ∈ [0, 1], W (t, ·, ·) ∈ S. X(t, p, q, W ) is the set of optimal strategies for Player 1 in T(0, W (t, ·, ·)) and Y(t, p, q, W ) is defined accordingly. Let T0 be the set of accumulation points of the family {Wn } for the uniform convergence. Wn ( m , p, q, ω) n

m n



Lemma 2.5 $\mathcal{T}_0 \neq \emptyset$ and $\mathcal{T}_0 \subset \mathcal{T}$.

We now define two properties for a function $W \in \mathcal{T}$ and a $C^1$ test function $\phi : [0,1] \to \mathbb{R}$.
• P1: If $t \in [0,1)$ is such that $\mathbf{X}(t,p,q,W)$ is non-revealing and $W(\cdot,p,q) - \phi(\cdot)$ has a global maximum at $t$, then $u(p,q) + \phi'(t) \geq 0$.
• P2: If $t \in [0,1)$ is such that $\mathbf{Y}(t,p,q,W)$ is non-revealing and $W(\cdot,p,q) - \phi(\cdot)$ has a global minimum at $t$, then $u(p,q) + \phi'(t) \leq 0$.

Lemma 2.6 Any $W \in \mathcal{T}_0$ satisfies P1 and P2.

Note that this result is the variational counterpart of Lemma 2.2. To understand the originality of the approach, the proof is presented below.

Proof. Let $t \in [0,1)$, $p$ and $q$ be such that $\mathbf{X}(t,p,q,W)$ is non-revealing and $W(\cdot,p,q) - \phi(\cdot)$ admits a global maximum at $t$. Adding the function $s \mapsto (s-t)^2$ to $\phi$ if necessary, we can assume that this global maximum is strict. Let $W_{\varphi(n)}$ be a sequence converging uniformly to $W$. Define $\theta(n) \in \{0,\dots,\varphi(n)-1\}$ such that $W_{\varphi(n)}(\cdot,p,q) - \phi(\cdot)$ reaches its global maximum on the set $\{\frac{m}{\varphi(n)},\ m = 0,\dots,\varphi(n)-1\}$ at $\frac{\theta(n)}{\varphi(n)}$. Since $t$ is a strict maximum, one has $\frac{\theta(n)}{\varphi(n)} \to t$ as $n \to \infty$. From (2.12):
$$W_{\varphi(n)}\Big(\frac{\theta(n)}{\varphi(n)}, p, q\Big) = \max_{x\in X}\min_{y\in Y}\Big[\frac{1}{\varphi(n)}\, g(x,y,p,q) + \sum_{i,j} x(i)y(j)\, W_{\varphi(n)}\Big(\frac{\theta(n)+1}{\varphi(n)}, p(i), q(j)\Big)\Big]$$

Let $x_n \in X$ be optimal for player 1 in the above formula and let $j \in J$ be any (non-revealing) pure action of player 2. Then:
$$W_{\varphi(n)}\Big(\frac{\theta(n)}{\varphi(n)}, p, q\Big) \leq \frac{1}{\varphi(n)}\, g(x_n, j, p, q) + \sum_i x_n(i)\, W_{\varphi(n)}\Big(\frac{\theta(n)+1}{\varphi(n)}, p_n(i), q\Big)$$
By concavity of $W_{\varphi(n)}$ with respect to $p$, we have
$$\sum_{i\in I} x_n(i)\, W_{\varphi(n)}\Big(\frac{\theta(n)+1}{\varphi(n)}, p_n(i), q\Big) \leq W_{\varphi(n)}\Big(\frac{\theta(n)+1}{\varphi(n)}, p, q\Big),$$
hence:
$$0 \leq g(x_n, j, p, q) + \varphi(n)\Big[W_{\varphi(n)}\Big(\frac{\theta(n)+1}{\varphi(n)}, p, q\Big) - W_{\varphi(n)}\Big(\frac{\theta(n)}{\varphi(n)}, p, q\Big)\Big].$$
Since $\frac{\theta(n)}{\varphi(n)}$ achieves the global maximum of $W_{\varphi(n)}(\cdot,p,q) - \phi(\cdot)$ on $\{\frac{m}{\varphi(n)},\ m = 0,\dots,\varphi(n)-1\}$, one has:
$$W_{\varphi(n)}\Big(\frac{\theta(n)+1}{\varphi(n)}, p, q\Big) - W_{\varphi(n)}\Big(\frac{\theta(n)}{\varphi(n)}, p, q\Big) \leq \phi\Big(\frac{\theta(n)+1}{\varphi(n)}\Big) - \phi\Big(\frac{\theta(n)}{\varphi(n)}\Big)$$
so that:
$$0 \leq g(x_n, j, p, q) + \varphi(n)\Big[\phi\Big(\frac{\theta(n)+1}{\varphi(n)}\Big) - \phi\Big(\frac{\theta(n)}{\varphi(n)}\Big)\Big].$$
Since $X$ is compact, one can assume without loss of generality that $\{x_n\}$ converges to some $x$. Note that $x$ belongs to $\mathbf{X}(t,p,q,W)$ and, so, is non-revealing. Thus, passing to the limit one obtains: $0 \leq g(x,j,p,q) + \phi'(t)$. Since this inequality holds for every $j \in J$, we also have:
$$\min_{j\in J}\, g(x,j,p,q) + \phi'(t) \geq 0.$$
Taking the maximum with respect to $x \in NR_X(p)$ gives the desired result: $u(p,q) + \phi'(t) \geq 0$.

The comparison principle in this case is given by the next result.

Lemma 2.7 Let $W_1$ and $W_2$ be in $\mathcal{T}$, satisfying P1 and P2 respectively. Suppose also that both satisfy:
• P3: $W_1(1,p,q) \leq W_2(1,p,q)$ for any $(p,q) \in \Delta(K)\times\Delta(L)$.
Then $W_1 \leq W_2$ on $[0,1]\times\Delta(K)\times\Delta(L)$.

To understand the originality of the approach, the proof is presented below.

Proof. We argue by contradiction, assuming that
$$\max_{t\in[0,1],\, p\in P,\, q\in Q} \big[W_1(t,p,q) - W_2(t,p,q)\big] = \delta > 0.$$
Then, for $\varepsilon > 0$ sufficiently small,
$$\delta(\varepsilon) := \max_{t\in[0,1],\, s\in[0,1],\, p\in P,\, q\in Q} \Big[W_1(t,p,q) - W_2(s,p,q) - \frac{(t-s)^2}{2\varepsilon} + \varepsilon s\Big] > 0. \tag{2.13}$$

Moreover $\delta(\varepsilon) \to \delta$ as $\varepsilon \to 0$. We claim that there is $(t_\varepsilon, s_\varepsilon, p_\varepsilon, q_\varepsilon)$, point of maximum in (2.13), such that $\mathbf{X}(t_\varepsilon, p_\varepsilon, q_\varepsilon, W_1)$ is non-revealing for player 1 and $\mathbf{Y}(s_\varepsilon, p_\varepsilon, q_\varepsilon, W_2)$ is non-revealing for player 2. Let $(t_\varepsilon, s_\varepsilon, p'_\varepsilon, q'_\varepsilon)$ be a maximum point of (2.13) and $C(\varepsilon)$ be the set of maximum points in $P\times Q$ of the function $(p,q) \mapsto W_1(t_\varepsilon, p, q) - W_2(s_\varepsilon, p, q)$. This is a compact set. Let $(p_\varepsilon, q_\varepsilon)$ be an extreme point of the convex hull of $C(\varepsilon)$. By Caratheodory's theorem, this is also an element of $C(\varepsilon)$. Let $x_\varepsilon \in \mathbf{X}(t_\varepsilon, p_\varepsilon, q_\varepsilon, W_1)$ and $y_\varepsilon \in \mathbf{Y}(s_\varepsilon, p_\varepsilon, q_\varepsilon, W_2)$. Since $W_1$ and $W_2$ are in $\mathcal{T}$, we have:
$$W_1(t_\varepsilon, p_\varepsilon, q_\varepsilon) - W_2(s_\varepsilon, p_\varepsilon, q_\varepsilon) \leq \sum_{i,j} x_\varepsilon(i)\, y_\varepsilon(j)\, \big[W_1(t_\varepsilon, p_\varepsilon(i), q_\varepsilon(j)) - W_2(s_\varepsilon, p_\varepsilon(i), q_\varepsilon(j))\big].$$
By optimality of $(p_\varepsilon, q_\varepsilon)$, one deduces that, for every $i$ and $j$ with $x_\varepsilon(i) > 0$ and $y_\varepsilon(j) > 0$, $(p_\varepsilon(i), q_\varepsilon(j)) \in C(\varepsilon)$. Since $(p_\varepsilon, q_\varepsilon) = \sum_{i,j} x_\varepsilon(i) y_\varepsilon(j)\, (p_\varepsilon(i), q_\varepsilon(j))$ and $(p_\varepsilon, q_\varepsilon)$ is an extreme point of the convex hull of $C(\varepsilon)$, one concludes that $(p_\varepsilon(i), q_\varepsilon(j)) = (p_\varepsilon, q_\varepsilon)$ for all $i$ and $j$: $x_\varepsilon$ and $y_\varepsilon$ are non-revealing. Therefore we have constructed $(t_\varepsilon, s_\varepsilon, p_\varepsilon, q_\varepsilon)$ as claimed. Finally we note that $t_\varepsilon < 1$ and $s_\varepsilon < 1$ for $\varepsilon$ sufficiently small, because $\delta(\varepsilon) > 0$ and $W_1(1,p,q) \leq W_2(1,p,q)$ for any $(p,q) \in P\times Q$ by P3.
Since the map $t \mapsto W_1(t, p_\varepsilon, q_\varepsilon) - \frac{(t-s_\varepsilon)^2}{2\varepsilon}$ has a global maximum at $t_\varepsilon$ and since $\mathbf{X}(t_\varepsilon, p_\varepsilon, q_\varepsilon, W_1)$ is non-revealing for player 1, condition P1 implies that
$$u(p_\varepsilon, q_\varepsilon) + \frac{t_\varepsilon - s_\varepsilon}{\varepsilon} \geq 0. \tag{2.14}$$
In the same way, since the map $s \mapsto W_2(s, p_\varepsilon, q_\varepsilon) + \frac{(t_\varepsilon - s)^2}{2\varepsilon} - \varepsilon s$ has a global minimum at $s_\varepsilon$ and since $\mathbf{Y}(s_\varepsilon, p_\varepsilon, q_\varepsilon, W_2)$ is non-revealing for player 2, we have by condition P2 that
$$u(p_\varepsilon, q_\varepsilon) + \frac{t_\varepsilon - s_\varepsilon}{\varepsilon} + \varepsilon \leq 0.$$
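The test-function computations behind the last two inequalities can be spelled out; this is our rewriting of the standard viscosity-solution step, with the notation of P1 and P2:

```latex
% For P1: t \mapsto W_1(t,p_\varepsilon,q_\varepsilon) - \phi_1(t) is maximal at t_\varepsilon with
\phi_1(t) = \frac{(t-s_\varepsilon)^2}{2\varepsilon},
\qquad \phi_1'(t_\varepsilon) = \frac{t_\varepsilon - s_\varepsilon}{\varepsilon},
% so P1 yields u(p_\varepsilon,q_\varepsilon) + (t_\varepsilon-s_\varepsilon)/\varepsilon \ge 0, which is (2.14).
% For P2: s \mapsto W_2(s,p_\varepsilon,q_\varepsilon) - \phi_2(s) is minimal at s_\varepsilon with
\phi_2(s) = -\frac{(t_\varepsilon-s)^2}{2\varepsilon} + \varepsilon s,
\qquad \phi_2'(s_\varepsilon) = \frac{t_\varepsilon - s_\varepsilon}{\varepsilon} + \varepsilon,
% and P2 yields u(p_\varepsilon,q_\varepsilon) + (t_\varepsilon-s_\varepsilon)/\varepsilon + \varepsilon \le 0.
```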


This latter inequality contradicts (2.14). We can now conclude the convergence of $v_n$. It is interesting to observe, at this point, that the original proof of Mertens and Zamir is based on completely different techniques and is much longer. To our knowledge, this is the shortest convergence proof of $\lim_{n\to\infty} v_n$. Also, the approach is in the same spirit as the one used in differential game theory to prove the existence of the value and to characterize it through viscosity solutions.

Proposition 2.8 $W_n$ converges uniformly to the unique point $W \in \mathcal{T}$ that satisfies the variational inequalities P1 and P2 and the terminal condition $W(1,p,q) = 0$.

Consider now an arbitrary evaluation probability $\mu$ on $\mathbb{N}^*$ with $\mu_n \geq \mu_{n+1}$, inducing the partition $\Pi$. Let $V_\Pi(t_k, p, q)$ be the value of the game starting at time $t_k$. One has $V_\Pi(1, p, q) := 0$ and
$$V_\Pi(t_n, p, q) = \max_{x\in X}\min_{y\in Y}\Big[\mu_{n+1}\, g(x,y,p,q) + \sum_{i,j} x(i)y(j)\, V_\Pi(t_{n+1}, p(i), q(j))\Big]. \tag{2.15}$$
Moreover $V_\Pi$ belongs to $\mathcal{F}$ and is $C$-Lipschitz in $(p,q)$. Lemma 2.1 then implies that any family of values $V_{\Pi(m)}$ associated to partitions $\Pi(m)$ with $\mu_1(m) \to 0$ as $m \to \infty$ has an accumulation point. Denote by $\mathcal{T}_1 \subset \mathcal{T}$ the set of such accumulation points. It is easily shown that Lemma 2.6 extends in a natural way. Consequently:

Proposition 2.9 $V_{\Pi(m)}$ converges uniformly to the unique point $V \in \mathcal{T}$ that satisfies the variational inequalities P1 and P2.

2.3 Splitting Games

We consider now the framework of splitting games. Let $P$ and $Q$ be two simplices (or products of simplices) of some finite dimensional spaces, and $H$ a $C$-Lipschitz function from $P\times Q$ to $\mathbb{R}$. The corresponding Shapley operator is defined on continuous real functions $f$ on $P\times Q$ by
$$\mathbf{T}(\lambda, f)(p,q) = \operatorname{val}_{\mu\in M_P^p \times \nu\in M_Q^q} \int_{P\times Q} \big[\lambda H(p',q') + (1-\lambda) f(p',q')\big]\, \mu(dp')\,\nu(dq')$$
where $M_P^p$ stands for the set of Borel probabilities on $P$ with expectation $p$ (and similarly for $M_Q^q$). The associated repeated game is played as follows: at stage $n+1$, knowing the state $(p_n, q_n)$, player 1 (resp. player 2) chooses $\mu_{n+1} \in M_P^{p_n}$ (resp. $\nu_{n+1} \in M_Q^{q_n}$). A new state $(p_{n+1}, q_{n+1})$ is selected according to these distributions and the stage payoff is $H(p_{n+1}, q_{n+1})$. We denote by $V_\lambda$ the value of the discounted game and by $V_n$ the value of the finitely repeated game.


2.3.1 Discounted Games

This section is based on the results in Laraki (2001a,b). The next regularity properties are proved in Chapter 4. Let $\mathcal{G}$ be the set of $C$-Lipschitz functions that are concave-convex on $P\times Q$.

Lemma 2.10 The Shapley operator $\mathbf{T}(\lambda, \cdot)$ maps $\mathcal{G}$ into itself and $V_\lambda(p,q)$ is the only fixed point of $\mathbf{T}(\lambda, \cdot)$ in $\mathcal{G}$.

The corresponding projective operator is the splitting operator $\Psi$:
$$\Psi(f)(p,q) = \operatorname{val}_{\mu\in M_P^p \times \nu\in M_Q^q} \int_{P\times Q} f(p',q')\, \mu(dp')\,\nu(dq') \tag{2.16}$$

and we denote again by $S$ the set of its fixed points. Given $W \in S$, $\mathbf{P}(p,q,W) \subset M_P^p$ denotes the set of optimal strategies of player 1 in (2.16) for $\Psi(W)(p,q)$. We say that $\mathbf{P}(p,q,W)$ is non-revealing if it is reduced to $\delta_p$, the Dirac mass at $p$. We use the symmetric notation $\mathbf{Q}(p,q,W)$ and terminology for player 2. We define two properties for functions in $S$ (that we denote again by D1 and D2).
• D1: If $\mathbf{P}(p,q,W)$ is non-revealing, then $W(p,q) \leq H(p,q)$.
• D2: If $\mathbf{Q}(p,q,W)$ is non-revealing, then $W(p,q) \geq H(p,q)$.
A convergence result similar to the one of the last section is obtained.

Proposition 2.11 $V_\lambda$ converges uniformly to the unique point $V \in S$ that satisfies the variational inequalities D1 and D2.

2.3.2 Finitely Repeated Games and General Evaluation

Recall the recursive formula defining by induction the value of the $n$-stage game $v_n \in \mathcal{G}$, using Lemma 2.10:
$$v_n(p,q) = \operatorname{val}_{M_P^p \times M_Q^q} \int_{P\times Q} \Big[\frac{1}{n} H(p',q') + \Big(1-\frac{1}{n}\Big) v_{n-1}(p',q')\Big]\mu(dp')\,\nu(dq') = \mathbf{T}\Big(\frac{1}{n}, v_{n-1}\Big). \tag{2.17}$$
For each integer $n$, let $W_n(1,p,q) := 0$ and for $m = 0,\dots,n-1$ define $W_n(\frac{m}{n}, p, q)$ inductively as follows:
$$W_n\Big(\frac{m}{n}, p, q\Big) = \operatorname{val}_{M_P^p \times M_Q^q} \int_{P\times Q} \Big[\frac{1}{n} H(p',q') + W_n\Big(\frac{m+1}{n}, p', q'\Big)\Big]\mu(dp')\,\nu(dq'). \tag{2.18}$$
By induction we have $W_n(\frac{m}{n}, p, q) = \big(1-\frac{m}{n}\big)\, v_{n-m}(p,q)$. Note that $W_n$ is the function on $[0,1]\times P\times Q$ associated to the uniform partition of mesh $\frac{1}{n}$.


Lemma 2.12 $W_n$ is Lipschitz continuous (uniformly in $n$) on $\{\frac{m}{n},\ m \in \{0,\dots,n\}\} \times P \times Q$.

Let $\mathcal{T}$ be the set of real continuous functions $W$ on $[0,1]\times P\times Q$ such that for all $t\in[0,1]$, $W(t,\cdot,\cdot) \in S$. $\mathbf{P}(t,p,q,W)$ is defined as $\mathbf{P}(p,q,W(t,\cdot,\cdot))$ and $\mathbf{Q}(t,p,q,W)$ as $\mathbf{Q}(p,q,W(t,\cdot,\cdot))$. Let $\mathcal{T}_0$ be the set of uniform accumulation points of the family $W_n$. Using (2.18), we have that $\mathcal{T}_0 \subset \mathcal{T}$. We introduce two properties for a function $W \in \mathcal{T}$ and any $C^1$ test function $\phi : [0,1] \to \mathbb{R}$ (that we denote again by P1 and P2).
• P1: If, for some $t \in [0,1)$, $\mathbf{P}(t,p,q,W)$ is non-revealing and $W(\cdot,p,q) - \phi(\cdot)$ has a global maximum at $t$, then $H(p,q) + \phi'(t) \geq 0$.
• P2: If, for some $t \in [0,1)$, $\mathbf{Q}(t,p,q,W)$ is non-revealing and $W(\cdot,p,q) - \phi(\cdot)$ has a global minimum at $t$, then $H(p,q) + \phi'(t) \leq 0$.
A proof similar to the one given above yields the following result.

Lemma 2.13 Any $W \in \mathcal{T}_0$ satisfies P1 and P2.

A similar comparison principle is given now.

Lemma 2.14 Let $W_1$ and $W_2$ be in $\mathcal{T}$, satisfying P1 and P2 respectively. Suppose also that both satisfy:
• P3: $W_1(1,p,q) \leq W_2(1,p,q)$ for any $(p,q) \in P\times Q$.
Then $W_1 \leq W_2$ on $[0,1]\times P\times Q$.

This implies the convergence of $W_n$ (and so $V_n$):

Proposition 2.15 $W_n$ converges uniformly to the unique point $W \in \mathcal{T}$ that satisfies the variational inequalities P1 and P2 and the terminal condition $W(1,p,q) = 0$.

The proof and the result extend straightforwardly to any decreasing evaluation of the payoffs.

2.4 Absorbing Games

An absorbing game is a stochastic game where only one state is non-absorbing. In the other states one can assume that the payoff is constant (equal to the value); thus the game is defined by the following elements: two finite sets $I$ and $J$, two (payoff) functions $f$, $g$ from $I\times J$ to $[-1,1]$ and a function $p$ from $I\times J$ to $[0,1]$.

The repeated game with absorbing states is played in discrete time as follows. At stage $m = 1, 2, \dots$ (if absorption has not yet occurred) player 1 chooses $i_m \in I$ and, simultaneously, player 2 chooses $j_m \in J$: (i) the payoff at stage $m$ is $f(i_m, j_m)$; (ii) with probability $1 - p(i_m, j_m)$ absorption occurs and the payoff in all future stages $n > m$ is $g(i_m, j_m)$; and (iii) with probability $p(i_m, j_m)$ the situation is repeated at stage $m+1$. Recall that the asymptotic analysis for these games is due to Kohlberg (1974).

2.4.1 Discounted Games

The $\lambda$-discounted game has a value, $v_\lambda$. Using the Shapley operator, $v_\lambda$ is the unique real number in $[-1,1]$ satisfying
$$v_\lambda = \max_{x\in\Delta(I)}\min_{j\in J}\, \big[\lambda f(x,j) + (1-\lambda)\, p(x,j)\, v_\lambda + (1-\lambda)\, f^*(x,j)\big], \tag{2.19}$$

where $p^*(i,j) = 1 - p(i,j)$ and $f^*(i,j) = p^*(i,j)\, g(i,j)$, and any map $\varphi : I\times J \to \mathbb{R}$ is extended linearly to $\mathbb{R}^I \times \mathbb{R}^J$ as follows: $\varphi(\alpha,\beta) = \sum_{i\in I,\, j\in J} \alpha^i \beta^j \varphi(i,j)$. A simple computation implies that the payoff $r(\lambda, x, y)$ induced by the stationary strategies $x \in \Delta(I)$ and $y \in \Delta(J)$ is
$$r(\lambda, x, y) = \frac{\lambda f(x,y) + (1-\lambda)\, f^*(x,y)}{\lambda\, p(x,y) + p^*(x,y)} \tag{2.20}$$
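Formula (2.20) can be checked from the recursive structure of the play; the following short derivation is our rewriting of the "simple computation" alluded to above:

```latex
% One-step equation: today's normalized payoff plus the continuation,
% which is r again if no absorption and g(x,y) forever if absorbed:
r \;=\; \lambda f(x,y) \;+\; (1-\lambda)\big[\, p(x,y)\, r \;+\; p^*(x,y)\, g(x,y) \big]
% Collect r and use f^*(x,y) = p^*(x,y)\, g(x,y):
\iff\; r\,\big[1-(1-\lambda)\, p(x,y)\big] \;=\; \lambda f(x,y) + (1-\lambda) f^*(x,y)
% Finally 1-(1-\lambda)p = (1-p) + \lambda p = p^* + \lambda p, which gives (2.20):
\iff\; r \;=\; \frac{\lambda f(x,y) + (1-\lambda) f^*(x,y)}{\lambda\, p(x,y) + p^*(x,y)}.
```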

so that
$$v_\lambda = \max_{x\in\Delta(I)}\min_{y\in\Delta(J)}\, r(\lambda, x, y). \tag{2.21}$$
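As a numerical illustration (ours, not part of the original text), the fixed-point characterization (2.19) can be solved by iteration: $v_\lambda$ is the value of a one-shot matrix game whose entries depend on $v_\lambda$ itself, and this map is a contraction with factor at most $1-\lambda$. The sketch below does this for the Big Match, a standard absorbing game used here as an example, for which $v_\lambda = 1/2$ for every $\lambda$:

```python
def val_2x2(m):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))   # maxmin in pure strategies
    upper = min(max(a, c), max(b, d))   # minmax in pure strategies
    if lower == upper:                  # pure saddle point
        return lower
    return (a * d - b * c) / (a + d - b - c)  # mixed value

def v_lambda(lam, f, g, p, tol=1e-12):
    """Fixed point of the Shapley equation (2.19) for a 2x2 absorbing game.

    f: stage payoff, g: absorbing payoff, p: probability of NON-absorption.
    """
    v = 0.0
    while True:
        # One-shot game M(v)_{ij} = lam*f + (1-lam)*(p*v + (1-p)*g)
        m = [[lam * f[i][j] + (1 - lam) * (p[i][j] * v
              + (1 - p[i][j]) * g[i][j]) for j in range(2)] for i in range(2)]
        w = val_2x2(m)
        if abs(w - v) < tol:
            return w
        v = w

# Big Match: the Top row absorbs (p = 0) with payoff equal to the stage
# payoff; the Bottom row is non-absorbing (p = 1).
f = [[1, 0], [0, 1]]
g = [[1, 0], [0, 0]]   # g is irrelevant on the non-absorbing row
p = [[0, 0], [1, 1]]
print(v_lambda(0.1, f, g, p))   # approximately 0.5, for every discount factor
```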

The next result by Laraki (2010) identifies the limit as the value of a one-shot game on $(\Delta(I)\times\mathbb{R}^I_+)\times(\Delta(J)\times\mathbb{R}^J_+)$ with payoff
$$A(x,\alpha,y,\beta) = \frac{f^*(x,y)}{p^*(x,y)}\, \mathbf{1}_{\{p^*(x,y)>0\}} + \frac{f(x,y) + f^*(\alpha,y) + f^*(x,\beta)}{1 + p^*(\alpha,y) + p^*(x,\beta)}\, \mathbf{1}_{\{p^*(x,y)=0\}}.$$

Proposition 2.16 (Laraki (2010)) $v_\lambda$ converges, as $\lambda$ goes to zero, to
$$v := \operatorname{val}_{(\Delta(I)\times\mathbb{R}^I_+)\times(\Delta(J)\times\mathbb{R}^J_+)}\, A(x,\alpha,y,\beta). \tag{2.22}$$

Because of the simplicity of the proof, it is provided below.

Proof. Let $w = \lim_{n\to\infty} v_{\lambda_n}$ be an accumulation point of $\{v_\lambda\}$ and consider an optimal stationary strategy $x(\lambda_n)$ of player 1 for $v_{\lambda_n}$ in (2.21). Thus, for every $y \in \Delta(J)$ and $\beta \in \mathbb{R}^J_+$ one has, using homogeneity:
$$v_{\lambda_n} \leq \frac{\lambda_n f(x(\lambda_n), y + \lambda_n\beta) + (1-\lambda_n)\, f^*(x(\lambda_n), y + \lambda_n\beta)}{\lambda_n p(x(\lambda_n), y + \lambda_n\beta) + p^*(x(\lambda_n), y + \lambda_n\beta)}. \tag{2.23}$$


By compactness of $\Delta(I)$, we can assume that $x(\lambda_n) \to x$.

Case 1: $p^*(x,y) > 0$. Letting $\lambda_n$ go to zero in (2.23) implies
$$w \leq \frac{f^*(x,y)}{p^*(x,y)}.$$

Case 2: $p^*(x,y) = 0$. Let $\alpha(\lambda_n) = \Big(\frac{x^i(\lambda_n)}{\lambda_n}\Big)_{i\in I} \in \mathbb{R}^I_+$. Hence, from equation (2.23), and because $p(x,y) = 1$,
$$w \leq \liminf_{n\to\infty}\, \frac{f(x,y) + f^*(x,\beta) + (1-\lambda_n)\, f^*(\alpha(\lambda_n), y)}{1 + p^*(x,\beta) + p^*(\alpha(\lambda_n), y)}. \tag{2.24}$$
Since $f^*$ and $p^*$ are linear in $y$ and $J$ is finite, for any $\varepsilon > 0$, there is $N(\varepsilon)$ such that, for every $y \in \Delta(J)$,
$$w \leq \frac{f(x,y) + f^*(x,\beta) + f^*(\alpha(\lambda_{N(\varepsilon)}), y)}{1 + p^*(x,\beta) + p^*(\alpha(\lambda_{N(\varepsilon)}), y)} + \varepsilon.$$
Hence there exists $(x,\alpha)$ such that for any $(y,\beta)$, $w \leq A(x,\alpha,y,\beta) + \varepsilon$. Consequently, $w \leq \sup_{\Delta(I)\times\mathbb{R}^I_+}\inf_{\Delta(J)\times\mathbb{R}^J_+} A(x,\alpha,y,\beta)$ and the result follows by symmetry.

2.4.2 Finitely Repeated Games and General Evaluation

The values $\{v_n\}_{n=1,\dots}$ of the finitely repeated games satisfy:
$$v_n = \max_{x\in\Delta(I)}\min_{y\in\Delta(J)}\Big[\frac{1}{n}\, f(x,y) + \frac{n-1}{n}\, p(x,y)\, v_{n-1} + \frac{n-1}{n}\, f^*(x,y)\Big],$$
with $v_0 = 0$. For each integer $n$, define a function $W_n$ on $[0,1]$ as follows: $W_n(1) = 0$ and for $m = 0,\dots,n-1$, $W_n(\frac{m}{n})$ is specified inductively by:
$$W_n\Big(\frac{m}{n}\Big) = \max_{x\in\Delta(I)}\min_{y\in\Delta(J)}\Big[\frac{1}{n}\, f(x,y) + p(x,y)\, W_n\Big(\frac{m+1}{n}\Big) + \frac{n-m-1}{n}\, f^*(x,y)\Big].$$
By induction, $W_n(\frac{m}{n}) = \big(1-\frac{m}{n}\big)\, v_{n-m}$. Extend $W_n(\cdot)$ to $[0,1]$ by linear interpolation. Consequently, $W_n(\cdot)$ is a $C$-Lipschitz function and if $W_n$ converges uniformly to some function $W$, $v_n$ converges to $W(0)$ and $W(t) = (1-t)\, W(0)$. The projective operator is $\Phi(v) = \max_{x\in\Delta(I)}\min_{y\in\Delta(J)}\big(p(x,y)\, v + f^*(x,y)\big)$ and $S$ is the set of its fixed points. As usual the set $S_0$ of accumulation points of $\{v_n\}$ is included in $S$.
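The backward recursion above is straightforward to run numerically. The following sketch (an illustration we add, not from the original text) computes $v_n$ for the Big Match, where the recursion returns $1/2$ at every horizon:

```python
def val_2x2(m):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:                        # pure saddle point
        return lower
    return (a * d - b * c) / (a + d - b - c)  # mixed value

def v_n(n, f, g, p):
    """n-stage value of a 2x2 absorbing game via the backward recursion.

    f: stage payoff, g: absorbing payoff, p: probability of non-absorption;
    recall f*(i,j) = (1 - p(i,j)) g(i,j).
    """
    v = 0.0                                    # v_0 = 0
    for k in range(1, n + 1):
        m = [[f[i][j] / k + (k - 1) / k * (p[i][j] * v
              + (1 - p[i][j]) * g[i][j]) for j in range(2)] for i in range(2)]
        v = val_2x2(m)
    return v

# Big Match again: Top absorbs, Bottom does not.
f = [[1, 0], [0, 1]]
g = [[1, 0], [0, 0]]
p = [[0, 0], [1, 1]]
print([v_n(n, f, g, p) for n in (1, 2, 10, 50)])  # each value is close to 1/2
```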

Define the Hamiltonian $H : [0,1]\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$ as follows. It is the value of a zero-sum game where the strategies of Player 1 are of the form $(x,\alpha) \in \Delta(I)\times\mathbb{R}^I_+$, the strategies of Player 2 are $(y,\beta) \in \Delta(J)\times\mathbb{R}^J_+$, and the payoff is given by:
$$h(t,a,b,x,\alpha,y,\beta) = \Big(\frac{(1-t)\, f^*(x,y)}{p^*(x,y)} - a\Big)\mathbf{1}_{\{p^*(x,y)>0\}} + \frac{f(x,y) + (1-t) f^*(\alpha,y) + (1-t) f^*(x,\beta) - \big[p^*(\alpha,y) + p^*(x,\beta)\big]\, a + b}{1 + p^*(\alpha,y) + p^*(x,\beta)}\,\mathbf{1}_{\{p^*(x,y)=0\}}.$$
According to Proposition 2.16, this game has a value. Let $U$ be in $S$. The variational characterization for this class uses the following properties: for all $t \in [0,1)$ and any $C^1$ function $\phi : [0,1] \to \mathbb{R}$:
• R1: If $U(\cdot) - \phi(\cdot)$ admits a global maximum at $t$, then $H(t, U(t), \phi'(t)) \geq 0$.
• R2: If $U(\cdot) - \phi(\cdot)$ admits a global minimum at $t$, then $H(t, U(t), \phi'(t)) \leq 0$.

Lemma 2.17 Any accumulation point $U(\cdot)$ of $W_n(\cdot)$ satisfies R1 and R2.

The comparison principle for this class is the next result.

Lemma 2.18 Let $U_1$ and $U_2$ be two Lipschitz functions with $U_i(t) \in S$ for all $t \in [0,1]$, satisfying R1 and R2 respectively, and
• R3: $U_1(1) \leq U_2(1)$.
Then $U_1 \leq U_2$ on $[0,1]$.

Corollary 2.19 $v_n$ converges to $v$.

The proof and the result extend straightforwardly to any decreasing evaluation of the payoffs.

2.5 The Dual of a Game with Incomplete Information

2.5.1 The Dual Game

Consider a two-person zero-sum game with incomplete information on one side defined by sets of actions $S$ and $T$, a finite parameter space $K$, a probability measure $p \in P = \Delta(K)$ and, for each $k$, a real payoff function $G^k$ on $S\times T$. Assume $S$ and $T$ are convex and, for each $k$, $G^k$ bounded and bilinear on $S\times T$. The game is played as follows: $k \in K$ is selected according to $p$ and told to player 1 (the maximizer) while player 2 only knows $p$. In normal form, Player 1 chooses $s = \{s^k\}$ in $S^K$, Player 2 chooses $t$ in $T$ and the payoff is $G^p(s,t) = \sum_k p^k G^k(s^k, t)$. Let $\underline{v}(p) = \sup_{S^K}\inf_T G^p(s,t)$ and $\overline{v}(p) = \inf_T \sup_{S^K} G^p(s,t)$. Then both are concave in $p$ on $P$ (Sorin 2002).


Following De Meyer (1996a,b), one introduces for each $z \in \mathbb{R}^K$ the "dual game" $G^*(z)$, where player 1 chooses $k$ and plays $s$ in $S$ while player 2 plays $t$ in $T$, and the payoff is $h[z](k,s;t) = G^k(s,t) - z^k$. Define by $\underline{w}(z)$ and $\overline{w}(z)$ the corresponding maxmin and minmax. One has:

Theorem 2.20 (De Meyer (1996a,b), Sorin (2002)) The following duality relations hold:
$$\underline{w}(z) = \max_{p\in\Delta(K)}\, \{\underline{v}(p) - \langle p, z\rangle\} \tag{2.25}$$
$$\underline{v}(p) = \inf_{z\in\mathbb{R}^K}\, \{\underline{w}(z) + \langle p, z\rangle\} \tag{2.26}$$
$$\overline{w}(z) = \max_{p\in\Delta(K)}\, \{\overline{v}(p) - \langle p, z\rangle\} \tag{2.27}$$
$$\overline{v}(p) = \inf_{z\in\mathbb{R}^K}\, \{\overline{w}(z) + \langle p, z\rangle\} \tag{2.28}$$
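The duality relations (2.25)–(2.28) are Fenchel-type conjugacies, which can be checked numerically on a grid. The sketch below is our illustration (the concave function $v$ is an arbitrary example, not taken from the text); it uses $K = \{1,2\}$, identifies $p$ with the scalar weight on state 1, and reduces $z = (z^1, z^2)$ to the single variable $d = z^1 - z^2$, since $\langle p, z\rangle = z^2 + p\,d$ and the $z^2$ terms cancel in the round trip:

```python
def v(p):
    """A concave 'value' function on [0,1] (illustrative choice)."""
    return p * (1 - p)

ps = [k / 200 for k in range(201)]        # grid on Delta(K), K = 2
ds = [-1 + k / 200 for k in range(401)]   # grid of slopes d = z^1 - z^2

# Dual value as in (2.25), reduced to the variable d: w(d) = max_p [v(p) - p d].
w = {d: max(v(q) - q * d for q in ps) for d in ds}

# Primal recovered as in (2.26): v(p) = inf_d [w(d) + p d].
p0 = 0.5
v_back = min(w[d] + p0 * d for d in ds)
print(v(p0), v_back)   # both approximately 0.25
```

On a grid the biconjugate recovers the concave $v$ exactly at grid points where the optimal slope lies in the grid; here $d = 0$ is optimal at $p_0 = 1/2$.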

Consider now a game with incomplete information on one side and recall the recursive formula for $G_n$:
$$(n+1)\, v_{n+1}(p) = \max_{x\in X^K}\min_{y\in Y}\Big\{\sum_k p^k x^k G^k y + n \sum_i \hat{x}(i)\, v_n(p(i))\Big\} \tag{2.29}$$
Given $G_n$, let us consider the dual game $G^*_n$ and its value $w_n$. One has
$$w_n(z) = \max_{p\in\Delta(K)}\, \{v_n(p) - \langle p, z\rangle\}$$
which leads De Meyer to the recursive equation in the dual game:
$$(n+1)\, w_{n+1}(z) = \min_{y\in Y}\max_{i\in I}\ n\, w_n\Big(\frac{n+1}{n}\, z - \frac{1}{n}\, G_i y\Big). \tag{2.30}$$

2.5.2 The Associated Differential Game

The second advantage of dealing with (2.30) rather than with (2.29) is that the state variable evolves smoothly from $z$ to $z + \frac{1}{n}(z - G_i y)$ while the martingale $p(i)$ could have jumps. Laraki (2002) proved that $w_n$ may be seen as the value of the time discretization with mesh $\frac{1}{n}$ of a differential game on $[0,1]$ with dynamics $\zeta(t) \in \mathbb{R}^K$ given by:
$$\frac{d\zeta}{dt} = x_t G y_t, \qquad \zeta(0) = -z,$$
$x_t \in X$, $y_t \in Y$, and terminal payoff $\max_k \zeta^k(1)$. Basic results on differential games of fixed duration show that the game starting at time $t$ from state $\zeta$ has a value $\varphi(t,\zeta)$, which is the only viscosity solution of the following Hamilton-Jacobi equation with boundary condition:
$$\frac{\partial\varphi}{\partial t} + u(D\varphi) = 0, \qquad \varphi(1,\zeta) = \max_k \zeta^k. \tag{2.31}$$
Hence $\varphi(0,-z) = \lim_{n\to\infty} w_n(z) = w(z)$. Using Hopf's representation formula, one obtains:
$$\varphi(1-t, \zeta) = \sup_{a\in\mathbb{R}^K}\ \inf_{b\in\mathbb{R}^K}\ \Big\{\max_k b^k + \langle a, \zeta - b\rangle + t\, u(a)\Big\}$$
and finally $w(z) = \sup_{p\in\Delta(K)} \{u(p) - \langle p, z\rangle\}$. Hence $\lim v_\lambda = \lim v_n = \operatorname{Cav}_{\Delta(K)} u$, by taking the Fenchel conjugate. Moreover, this is true for any general evaluation of payoffs. An alternative identification of the limit is through variational inequalities, by translating in the primal the viscosity properties in the dual in terms of local sub- and superdifferentials. This leads exactly to the properties P1 and P2 described in Section 2.2. This shows that the variational approach in the primal game (as described in Sections 2.2, 2.3 and 2.4) is the dual analog of the viscosity approach in the dual.
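For $K = \{1,2\}$, the limit $\operatorname{Cav}_{\Delta(K)} u$ is the concave envelope of $u$ on $[0,1]$, which is easy to compute on a grid: the envelope at a grid point is the best value achievable by splitting $p$ between two grid points (Aumann–Maschler splitting). Below is a small sketch we add as an illustration; the non-concave $u$ is an arbitrary example standing in for the value of the non-revealing game:

```python
def cav_on_grid(u):
    """Concave envelope of the points (k/(n-1), u[k]) on [0,1], O(n^3)."""
    n = len(u)
    cav = list(u)
    for k in range(n):
        best = u[k]
        for i in range(k + 1):            # left grid point of the split
            for j in range(k, n):         # right grid point of the split
                if i == j:
                    continue
                lam = (j - k) / (j - i)   # weight on point i
                best = max(best, lam * u[i] + (1 - lam) * u[j])
        cav[k] = best
    return cav

ps = [k / 50 for k in range(51)]
u = [abs(q - 0.5) for q in ps]   # convex, non-concave: Cav u = 1/2 everywhere
cav = cav_on_grid(u)
print(cav[25])                   # value at p = 0.5, approximately 0.5
```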

Bibliography

[1] Aumann R.J. and M. Maschler (1995). Repeated Games with Incomplete Information, M.I.T. Press.
[2] Barles G. and P.E. Souganidis (1991). Convergence of Approximation Schemes for Fully Nonlinear Second Order Equations. Asymptotic Analysis, 4, 3, 271-283.
[3] De Meyer B. (1996a). Repeated Games and Partial Differential Equations. Mathematics of Operations Research, 21, 209-236.
[4] De Meyer B. (1996b). Repeated Games, Duality and the Central Limit Theorem. Mathematics of Operations Research, 21, 237-251.
[5] Bewley T. and E. Kohlberg (1976a). The Asymptotic Theory of Stochastic Games. Mathematics of Operations Research, 1, 197-208.
[6] Bewley T. and E. Kohlberg (1976b). The Asymptotic Solution of a Recursion Equation Occurring in Stochastic Games. Mathematics of Operations Research, 1, 321-336.
[7] Cardaliaguet P., R. Laraki and S. Sorin (2011). A Continuous Time Approach for the Asymptotic Value in Two-Person Zero-Sum Repeated Games. Cahier du Laboratoire d'Econométrie de l'Ecole Polytechnique, 2011-11. To appear in SIAM Journal on Control and Optimization.
[8] Kohlberg E. (1974). Repeated Games with Absorbing States. Annals of Statistics, 2, 724-738.
[9] Kohlberg E. and S. Zamir (1974). Repeated Games of Incomplete Information: The Symmetric Case. Annals of Statistics, 2, p. 1040.


[10] Laraki R. (2001a). The Splitting Game and Applications. International Journal of Game Theory, 30, 359-376.
[11] Laraki R. (2001b). Variational Inequalities, System of Functional Equations, and Incomplete Information Repeated Games. SIAM Journal on Control and Optimization, 40, 516-524.
[12] Laraki R. (2002). Repeated Games with Lack of Information on One Side: the Dual Differential Approach. Mathematics of Operations Research, 27, 419-440.
[13] Laraki R. (2010). Explicit Formulas for Repeated Games with Absorbing States. International Journal of Game Theory, 39, 53-69.
[14] Mertens J.-F. and A. Neyman (1981). Stochastic Games. International Journal of Game Theory, 10, 53-66.
[15] Mertens J.-F. and S. Zamir (1971). The Value of Two-Person Zero-Sum Repeated Games with Lack of Information on Both Sides. International Journal of Game Theory, 1, 39-64.
[16] Mertens J.-F., S. Sorin and S. Zamir (1994). Repeated Games. CORE DP 9420-22.
[17] Neyman A. and S. Sorin (1998). Equilibria in Repeated Games with Incomplete Information: The General Symmetric Case. International Journal of Game Theory, 27, 201-210.
[18] Rosenberg D. (2000). Zero-Sum Absorbing Games with Incomplete Information on One Side: Asymptotic Analysis. SIAM Journal on Control and Optimization, 39, 557-597.
[19] Rosenberg D. and S. Sorin (2001). An Operator Approach to Zero-Sum Repeated Games. Israel Journal of Mathematics, 121, 221-246.
[20] Shapley L.S. (1953). Stochastic Games. Proceedings of the National Academy of Sciences of the U.S.A., 39, 1095-1100.
[21] Sorin S. (1984). "Big Match" with Lack of Information on One Side, Part I. International Journal of Game Theory, 13, 201-255.
[22] Sorin S. (1985). "Big Match" with Lack of Information on One Side, Part II. International Journal of Game Theory, 14, 173-204.
[23] Sorin S. (2002). A First Course on Zero-Sum Repeated Games. Springer.
[24] Sorin S. (2005). New Approaches and Recent Advances in Two-Person Zero-Sum Repeated Games. Advances in Dynamic Games, A. Nowak and K. Szajowski (eds.), Annals of the ISDG, 7, Birkhauser, 67-93.
[25] Vieille N. (1992). Weak Approachability. Mathematics of Operations Research, 17, 781-791.

Chapter 3
Stopping Games in Continuous Time

Many economic and political interactions revolve around timing. A well-known example is the class of war of attrition games, in which the decision of each player is when to quit, and the game ends in the victory of the player who held on longer. These games were introduced by Maynard Smith (1974), and later analyzed by a number of authors such as Bulow and Klemperer (1999). Another important class of timing games are preemption games, in which each player prefers to stop first. The analysis is then sensitive to the specification of the payoff in case the two players stop simultaneously; see Fudenberg and Tirole (1985, 1991). Yet another class of timing games consists of duel games. These are two-player zero-sum games. In the simplest version, both players are endowed with one bullet, and have to choose when to fire. As time proceeds, the two players get closer and the accuracy of their shooting improves. These games are similar to preemption games in the sense that a player who decides to act may be viewed as preempting her opponent. However, as opposed to preemption games, in duel games a player has no guarantee that firing first would result in a victory. We refer the reader to Karlin (1959) for a detailed presentation of duel games, and to Radzik and Raghavan (1994) for an updated survey.

Dynkin (1969) introduced stopping games as a variation of optimal stopping problems. In Dynkin's setup, two players observe the realization of a payoff process in discrete time. Once one of the players decides to stop, player 2 pays player 1 the amount indicated by the payoff process. However, at every given stage only one of the players is allowed to stop; the identity of that player is governed by another process. The strategic choice of each player is the choice of his stopping time. Dynkin proved that those games admit a value. Dynkin's seminal paper was extended in various directions. Lepeltier and Maingueneau (1984) studied the problem in continuous time.

The first section addresses the question of existence of equilibrium in multiplayer deterministic stopping games (i.e. timing games). The second section is concerned with the stochastic version of the model. This chapter is based on the papers by Laraki and Solan (2005, 2010) and Laraki, Solan and Vieille (2005).


3.1 Deterministic Stopping Games

3.1.1 Model

A timing game $\Gamma$ is given by:
• A finite set of players $I$, and a discount rate $\delta_i \in \mathbb{R}_+$ for each player $i \in I$.
• For every non-empty subset $S \subseteq I$, a continuous and bounded function $u_S : [0,\infty) \to \mathbb{R}^I$, with the interpretation that $u_S(t)$ is the payoff vector if the players in $S$ – called the leaders – are the first to act, and they do so at time $t$.
In addition, player $i$'s time-preferences are described by the discount rate $\delta_i$.

A plan of action (or plan, in short) of player $i$ is simply to act at time $t_i$, namely an element of $[0,\infty]$, where the alternative $t_i = \infty$ corresponds to never acting. Such a time does not define a strategy in the usual sense, since it does not prescribe what to do if the game were to start after $t_i$. Given a pure plan profile $(t_i)_{i\in I}$, we let $\theta := \min_{i\in I} t_i$ denote the terminal time, and $S_* := \{i \in I \mid t_i = \theta\}$ be the coalition of leaders. The payoff $g^i((t_j)_j)$ to player $i$ is $e^{-\delta_i\theta}\, u^i_{S_*}(\theta)$ if $\theta < \infty$ – i.e., if the game terminates in finite time – and 0 otherwise, where $(t_j)_j = (t_1,\dots,t_I)$.

In most timing games of economic interest, the players incur costs, or receive profits, prior to the end of the game, and the discounted sum of profits/costs up to $t$ is bounded as a function of $t$. This reduces to the case under study here by deducting/adding the total cost/profit up to time $t$ from the discounted $u_S(t)$. Hence, our standing assumption that $g^i = 0$ if $\theta = \infty$ is a normalization convention, and entails no loss of generality.

A mixed plan for player $i$ is a probability distribution $\sigma^i$ over the set $[0,\infty]$. The expected payoff given a plan profile $\sigma = (\sigma^i)_{i\in I}$ is:
$$\gamma^i_0(\sigma) = \mathbb{E}_{\otimes_{i\in I}\sigma^i}\big[g^i(t_1,\dots,t_I)\big]. \tag{3.1}$$

The subscript indicates that payoffs are discounted back to time zero. We denote by $\gamma^i_t(\sigma) = e^{\delta_i t}\,\gamma^i_0(\sigma)$ the expected payoff discounted to time $t$. For every $t \geq 0$, the subgame that starts at time $t$ is the game of timing $\Gamma_t$ with player set $I$, where the payoff function when coalition $S$ terminates is $u'_S(s) = u_S(t+s)$. Thus, payoffs are evaluated at time $t$.

Definition 3.1 A strategy of player $i$ is a function $\hat{\sigma}^i : t \mapsto \sigma^i_t$ that assigns to each $t \geq 0$ a mixed plan $\sigma^i_t$ that satisfies
• Properness: $\sigma^i_t$ assigns probability one to $[t,\infty]$.

• Consistency: for every $0 \leq t < s$ and every Borel set $A \subseteq [s,\infty]$, one has $\sigma^i_t(A) = \big(1 - \sigma^i_t([t,s))\big)\,\sigma^i_s(A)$.


The properness condition asserts that $\sigma^i_t$ is a mixed plan in the subgame that starts at time $t$: the probability that player $i$ acts before time $t$ is 0. The consistency condition asserts that as long as a plan does not act with probability 1, later strategies can be calculated by Bayes' rule. Given a strategy profile $\hat{\sigma} = (\hat{\sigma}^i)$, a player $i \in I$ and a time $t \in \mathbb{R}_+$, we denote by $\gamma^i_t(\hat{\sigma}) := \gamma^i_t(\sigma_t)$ the payoff induced by $\hat{\sigma}$ in the subgame starting at time $t$. A Markov strategy is a strategy that depends only on payoff-relevant past events, see Maskin and Tirole (2001). In the context of timing games, this requirement is expressed as follows. A real number $T \in \mathbb{R}_+$ is a period of the game if $u_S(t+T) = u_S(t)$ for every $t \in \mathbb{R}_+$ and every $S \subseteq I$. A strategy profile $\sigma$ is Markov if, for every $t \in \mathbb{R}_+$ and every $i \in I$, the mixed plan $\sigma^i_{t+T}$ is obtained from $\sigma^i_t$ by translation: for each Borel set $A \subseteq \mathbb{R}_+$, one has $\sigma^i_t(A) = \sigma^i_{t+T}(A + T)$.
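To make the payoff definition concrete, here is a small sketch (our own illustration, not from the text) that evaluates $g^i$ for a profile of pure plans: it finds the terminal time $\theta$, the leader set $S_*$, and applies discounting; the war-of-attrition payoffs used in the example are hypothetical:

```python
import math

def timing_payoff(plans, u, deltas):
    """Payoff vector g((t_j)_j) of a timing game under pure plans.

    plans:  dict player -> acting time (math.inf = never act)
    u:      function (leaders frozenset, time) -> dict player -> payoff
    deltas: dict player -> discount rate delta_i >= 0
    """
    theta = min(plans.values())                    # terminal time
    if math.isinf(theta):                          # nobody ever acts:
        return {i: 0.0 for i in plans}             # normalization g^i = 0
    leaders = frozenset(i for i, t in plans.items() if t == theta)
    payoffs = u(leaders, theta)
    return {i: math.exp(-deltas[i] * theta) * payoffs[i] for i in plans}

# A toy two-player war of attrition: the player(s) who quit first (the
# leaders) get 0, a surviving player gets 1 at the quitting time.
def u(S, t):
    return {i: (0.0 if i in S else 1.0) for i in (1, 2)}

print(timing_payoff({1: 2.0, 2: math.inf}, u, {1: 0.1, 2: 0.1}))
```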

3.1.2 Results

Let $\varepsilon > 0$ be given. A profile of mixed plans is a Nash $\varepsilon$-equilibrium if no player can profit more than $\varepsilon$ by deviating to any other mixed strategy. Equivalently, no player can profit more than $\varepsilon$ by deviating to a pure plan. A profile of strategies $\hat{\sigma} = (\sigma_t)_{t\geq 0}$ is a subgame-perfect $\varepsilon$-equilibrium if for every $t \geq 0$, the profile $\sigma_t$ is a Nash $\varepsilon$-equilibrium in the subgame that starts at time $t$ (when payoffs are discounted to time $t$).

Theorem 3.2 Every two-player discounted game of timing admits a Markov subgame-perfect $\varepsilon$-equilibrium, for every $\varepsilon > 0$. If $\delta_i = 0$ for some $i$, the game admits a Nash $\varepsilon$-equilibrium, for each $\varepsilon > 0$.

The proof works as follows. Given $n \in \mathbb{N}$, we consider the version of the timing game that terminates at time $t_n$ with a payoff of zero if no player acted before. In this game with finite horizon, we define inductively, for $0 \leq k < n$, a strategy profile $\hat{\sigma}_k(n)$ over the time interval $[t_k, t_{k+1})$. We prove that the profile obtained by concatenating the profiles $\hat{\sigma}_k(n)$ is a subgame-perfect $\varepsilon$-equilibrium in the game with finite horizon. Next, we let $n$ go to $\infty$. We observe that, for fixed $k$, the sequence $(\hat{\sigma}_k(n))_n$ takes only finitely many values, so that by a diagonal extraction argument a limit $\hat{\sigma}$ of $\hat{\sigma}(n)$ exists. This limit is our candidate for a subgame-perfect $\varepsilon$-equilibrium.

Some classes of timing games of specific interest have been studied. We analyze games with cumulative payoffs, defined by the property that for $i \in S$, the payoff $u^i_S(t)$ does not depend on which other player(s) happen to act at time $t$. Formally, $u^i_S(t) = u^i_{\{i\}}(t)$ for every player $i$ and every subset $S$ that contains $i$. This class includes games in which each player receives a stream of payoffs until he/she exits from the game (and the game proceeds with the remaining players).

Theorem 3.3 Every game of timing with cumulative payoffs has a subgame-perfect $\varepsilon$-equilibrium, for each $\varepsilon > 0$. Moreover,

• there is such a profile in which symmetric players play the same strategy;¹
• there is a Markov subgame-perfect $\varepsilon$-equilibrium, provided not all functions $u_S(\cdot)$, $S \subseteq I$, are constant.

The proof works as follows. Let $\Gamma$ be a game with cumulative payoffs. Fix a strictly increasing sequence $(s_n)$ with $s_0 = 0$ and $\lim_{n\to\infty} s_n = \infty$, chosen so that each payoff function $u_S$ varies only slightly on every interval $[s_n, s_{n+1})$.

Finally, we prove that under somewhat restrictive assumptions, the existence of an $\varepsilon$-equilibrium implies the existence of an equilibrium.

Theorem 3.5 Let $I$ be a finite set of players, let $u_S(\cdot)$ be a constant function for each $\emptyset \neq S \subseteq I$, and let $\delta_i = 0$ for each $i \in I$. If the game of timing $(I, (u_S)_S)$ has an $\varepsilon$-equilibrium for each $\varepsilon > 0$, then it also has a 0-equilibrium.

¹Players $i$ and $j$ are symmetric if (i) $u^i_S = u^j_S$ for every $S$ that either contains both $i$ and $j$, or none of them, (ii) $u^i_{S\cup\{i\}} = u^j_{S\cup\{j\}}$ for every $S$ that contains neither $i$ nor $j$, and (iii) $\delta_i = \delta_j$.

In particular, combined with Theorem 3.2, Theorem 3.5 implies that every two-player, constant-payoff, undiscounted game of timing has a mixed Nash equilibrium. This equilibrium existence result is not standard. It does not follow from the famous existence


result due to Reny (1999). Chapter 6, where Reny's result is extended, describes new equilibrium existence results which apply in this context.

We end with an example of a three-player zero-sum game of timing with constant payoffs that has no $\varepsilon$-equilibrium.² For every $i \in I$ and every $t \in \mathbb{R}_+$: $u^i_{\{i\}}(t) = 1$, $u^{i+1}_{\{i\}}(t) = 0$, $u^{i+2}_{\{i\}}(t) = -1$, $u^i_{\{i,i+1\}}(t) = 0$, $u^{i+1}_{\{i,i+1\}}(t) = -1$, $u^{i+2}_{\{i,i+1\}}(t) = 1$ and $u_{\{1,2,3\}}(t) = 0$. The game is described by the following matrices, in which players 1, 2 and 3 choose respectively a row, a column and a matrix (the empty cell corresponds to the case where no player acts):

Player 3: Don't Act
                 Don't Act     Act
    Don't Act       –        −1, 1, 0
    Act          1, 0, −1    0, −1, 1

Player 3: Act
                 Don't Act     Act
    Don't Act    0, −1, 1    1, 0, −1
    Act          −1, 1, 0     0, 0, 0

Figure 1

We assume that the three players have the same discount rate $\delta \geq 0$. The value of $\delta$ plays no role in the analysis. In particular, we allow for the possibility that $\delta = 0$, allowing in effect for the case of an undiscounted game. We prove that this game has no $\varepsilon$-equilibrium, provided $\varepsilon > 0$ is small enough. It is interesting to recall that three-player games of timing in discrete time do have a subgame-perfect equilibrium (see Fink (1964), Solan (1999)). Thus, this example stands in sharp contrast with known results in discrete time.

We first verify that this game has no (exact) equilibrium. Let $\sigma$ be a plan profile. If $\sigma$ is an equilibrium, the probability that the game terminates at time 0 is below one. Otherwise, at least one player, say player 1, would act with probability one at time 0. By the equilibrium condition, player 2 would act with probability 0: given that player 1 acts, acting is a strictly dominated action for player 2. Hence, player 3 would act with probability one at time 0, and player 1 would find it optimal not to act at time 0 – a contradiction. Next, given that the game does not terminate at time 0, each player $i$ can get a payoff arbitrarily close to one by acting "immediately" after time 0, that is, by acting at time $t > 0$, where $t$ is sufficiently small so that the probability that $\sigma^{i+1}$ or $\sigma^{i+2}$ act in the time interval $(0,t]$ is arbitrarily small. Thus, the continuation equilibrium payoff of each player must be at least one – a contradiction to the zero-sum property. Hence $\sigma$ is not an equilibrium.

We now prove that the game has no $\varepsilon$-equilibrium. For every $w \in [-1,1]^3$ let $G(w)$ be the one-shot game with payoff matrix as in Figure 1, where the payoff if no player acts is $w$.
The result of the previous paragraph can be rephrased as follows: for every w ∈ [−1, 1]³ with Σ_{i=1}^3 w^i = 0, the probability that the game terminates at time 0, under any Nash equilibrium of G(w), is strictly less than 1. Since the correspondence that assigns to each w ∈ [−1, 1]³ and every ε > 0 the set of ε-equilibria of the game G(w) has a closed graph, there exists ε > 0 such that for every w ∈ [−1, 1]³ with Σ_{i=1}^3 w^i = 0, the probability that the game terminates at time 0, under any ε-equilibrium of G(w), is

² Here addition is understood modulo 3.


Chapter 3. Stopping Games in Continuous Time

strictly less than 1 − 2ε. Let σ be an ε-equilibrium of the timing game. In particular, the probabilities σ^i({0}) of acting at time zero form an ε-equilibrium of the game G(w), taking for w the continuation payoff vector in the game. Since the game is zero-sum, the continuation payoff at time 0 of at least one player is non-positive. As argued above, by acting right after time 0, this player can improve his payoff by almost 1 if the game does not terminate at time 0. By the previous paragraph, this event has probability at least 2ε, hence the deviation improves his payoff by more than ε, a contradiction.

3.2 Stochastic Stopping Games

3.2.1 Model

Let (Ω, A, P) be a probability space, and let F = (Ft)t≥0 be a filtration in continuous time that satisfies "the usual conditions". That is, F is right continuous, and F0 contains all P-null sets: for every B ∈ A with P(B) = 0 and every A ⊆ B, one has A ∈ F0. All stopping times in the sequel are w.r.t. the filtration F. Denote F∞ := ∪t≥0 Ft and assume without loss of generality that F∞ = A. Hence (Ω, A, P) is a complete probability space.
Let (Xi, Yi, Zi)i=1,2 be uniformly bounded F-adapted real-valued processes,³ and let (ξi)i=1,2 be two bounded real-valued F∞-measurable functions. In the sequel we will assume that the processes (Xi, Yi, Zi)i=1,2 are right continuous. The process Xi represents the payoff to player i if player 1 stops before player 2, the process Yi the payoff to player i if player 2 stops before player 1, the process Zi the payoff to player i if the two players stop simultaneously, and the function ξi the payoff to player i if no player ever stops.

Definition 3.6 A two-player nonzero-sum Dynkin game over (Ω, A, P, F) with payoffs (Xi, Yi, Zi, ξi)i=1,2 is the game with player set N = {1, 2}, where the set of pure strategies of each player is the set of stopping times, and the payoff function of player i ∈ {1, 2} is:

γi(λ1, λ2) := E[ Xi(λ1) 1{λ1<λ2} + Yi(λ2) 1{λ2<λ1} + Zi(λ1) 1{λ1=λ2<∞} + ξi 1{λ1=λ2=∞} ],   (3.2)

where λ1 and λ2 are the stopping times chosen by the two players respectively. The game is zero-sum if X1 + X2 = Y1 + Y2 = Z1 + Z2 = ξ1 + ξ2 = 0.
In non-cooperative game theory, a randomized strategy is a probability distribution over pure strategies, with the interpretation that at the outset of the game the player randomly chooses a pure strategy according to the probability distribution given by the randomized strategy, and uses it along the game. In the setup of Dynkin games in

³ Our results hold for the larger class of payoff processes of class (D) defined by Dellacherie and Meyer (1975, §II-18). This class contains in particular integrable processes.


continuous time, a randomized strategy is a randomized stopping time, which is defined as follows.

Definition 3.7 A randomized stopping time for player i is a measurable function ϕi : [0, 1] × Ω → [0, +∞] such that the function ϕi(r, ·) : Ω → [0, +∞] is a stopping time for every r ∈ [0, 1] (see Aumann (1964)). Here the interval [0, 1] is endowed with the Borel σ-field.

For strategically equivalent definitions of randomized stopping times, see Touzi and Vieille (2002). The interpretation of Definition 3.7 is that player i chooses r ∈ [0, 1] according to the uniform distribution and then stops at the stopping time ϕi(r, ·). Throughout the paper, the symbols λ, µ and τ stand for stopping times, and ϕ and ψ stand for randomized stopping times. The expected payoff to player i that corresponds to a pair of randomized stopping times (ϕ1, ϕ2) is:

γi(ϕ1, ϕ2) := ∫_{[0,1]²} γi(ϕ1(r, ·), ϕ2(s, ·)) dr ds,   i = 1, 2.

A pair of randomized stopping times (ϕ∗1, ϕ∗2) is an ε-equilibrium if no player can profit more than ε by deviating from ϕ∗i.

Definition 3.8 Let ε ≥ 0. A pair of randomized stopping times (ϕ∗1, ϕ∗2) is an ε-equilibrium if for every two randomized stopping times ϕ1, ϕ2 the following inequalities hold:

γ1(ϕ1, ϕ∗2) ≤ γ1(ϕ∗1, ϕ∗2) + ε,   (3.3)

and

γ2(ϕ∗1, ϕ2) ≤ γ2(ϕ∗1, ϕ∗2) + ε.   (3.4)

Because of the linearity of the payoff function, Eqs. (3.3) and (3.4) hold for every randomized stopping time ϕ1 and ϕ2 respectively as soon as they hold for all non-randomized stopping times.
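The double integral defining γi(ϕ1, ϕ2) can be approximated by averaging over a grid of (r, s) ∈ [0, 1]². The sketch below does this on a trivial probability space with illustrative constant payoff processes X1 ≡ 1, Y1 ≡ 0, Z1 ≡ 1/2 (assumptions chosen so that the exact value, 1/2, is known) and the randomized stopping times ϕ1(r) = r, ϕ2(s) = s.

```python
# One-shot payoff of player 1 for pure stopping times t1, t2, following the
# structure of formula (3.2) on a trivial probability space. The constant
# processes X1 = 1, Y1 = 0, Z1 = 0.5 are assumptions of this example.
def stage_payoff(t1, t2, X1=lambda t: 1.0, Y1=lambda t: 0.0, Z1=lambda t: 0.5):
    if t1 < t2:
        return X1(t1)   # player 1 stops first
    if t2 < t1:
        return Y1(t2)   # player 2 stops first
    return Z1(t1)       # simultaneous stop

def gamma1(phi1, phi2, n=300):
    """Average stage_payoff(phi1(r), phi2(s)) over a midpoint grid of [0, 1]^2,
    approximating the double integral defining gamma_i(phi1, phi2)."""
    grid = [(i + 0.5) / n for i in range(n)]
    return sum(stage_payoff(phi1(r), phi2(s)) for r in grid for s in grid) / n**2

# Both players stop at a uniform random time on [0, 1]: phi_i(r) = r.
value = gamma1(lambda r: r, lambda s: s)
print(value)  # → 0.5 : each player stops first half the time; grid ties are worth Z1 = 0.5
```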

3.2.2 Results

Suppose that a player wants to stop at the stopping time λ, but would like to mask the exact time at which he stops (for example, so that the other player cannot stop at the very same moment as he does). To this end, he can stop at a randomly chosen time in a small interval [λ, λ + δ]; since the payoff processes are right continuous, he will not lose (or gain) much relative to stopping at time λ. This leads us to the following class of simple randomized stopping times.

Definition 3.9 A randomized stopping time ϕ is simple if there exist a stopping time λ and an Fλ-measurable function δ ≥ 0 such that for every r ∈ [0, 1] one has ϕ(r, ·) = λ + rδ. The stopping time λ is called the basis of ϕ, and the function δ is called the delay of ϕ.
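The point of the definition is that, when the payoff processes are right continuous, delaying the stop by at most δ changes the payoff by little. A minimal numerical sketch, with an illustrative deterministic right-continuous payoff X(t) = exp(−t) (an assumption of the example) and an opponent who never stops:

```python
import math

# E[X(phi(r, .))] for the simple randomized stopping time phi(r, .) = lam + r*delta,
# with r uniform on [0, 1], evaluated by a midpoint rule; X(t) = exp(-t) is an
# illustrative right-continuous payoff, not taken from the text.
def expected_payoff(lam, delta, X=lambda t: math.exp(-t), n=10_000):
    if delta == 0.0:
        return X(lam)
    return sum(X(lam + (i + 0.5) / n * delta) for i in range(n)) / n

lam = 1.0
errors = {d: abs(expected_payoff(lam, d) - math.exp(-lam)) for d in (0.5, 0.1, 0.01)}
print(errors)  # the loss relative to stopping exactly at lam vanishes with the delay
```

For this smooth payoff the gap shrinks roughly linearly in δ; right continuity alone guarantees that it vanishes as δ → 0.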


Since ϕ(r, ·) ≥ λ and ϕ(r, ·) is Fλ-measurable, by Dellacherie and Meyer (1975, §IV-56), ϕ(r, ·) is a stopping time for every r ∈ [0, 1]. Consequently, ϕ is indeed a randomized stopping time. We now state our main result.

Theorem 3.10 Every two-player nonzero-sum Dynkin game with right-continuous and uniformly bounded payoff processes admits an ε-equilibrium in simple randomized stopping times, for every ε > 0. Moreover, the delay of the simple randomized stopping times that constitute the ε-equilibrium can be arbitrarily small.

Theorem 3.10 was proved by Laraki and Solan (2005) for two-player zero-sum games. Our proof uses ε-equilibria in zero-sum games to construct an ε-equilibrium in the nonzero-sum game. Under additional conditions on the payoff processes, the ε-equilibrium is given in non-randomized stopping times.

Theorem 3.11 Under the assumptions of Theorem 3.10, if Z1(t) ∈ co{X1(t), Y1(t)} and Z2(t) ∈ co{X2(t), Y2(t)} for every t ≥ 0, where coA is the convex hull of the set A, then the game admits an ε-equilibrium in non-randomized stopping times, for every ε > 0.

Hamadène and Zhang (2010) proved the existence of a 0-equilibrium in non-randomized stopping times under stronger conditions than those in Theorem 3.11, using the notion of the Snell envelope of a process and backward stochastic differential equations.

Bibliography

[1] Aumann R.J. (1964). Mixed and Behavior Strategies in Infinite Extensive Games, in Advances in Game Theory, M. Dresher, L.S. Shapley and A.W. Tucker (eds), Annals of Mathematics Study 52, Princeton University Press.
[2] Billingsley P. (1995). Probability and Measure. Wiley.
[3] Bulow J. and P. Klemperer (1999). The Generalized War of Attrition. American Econ. Rev., 89, 175-189.
[4] Dellacherie C. and P.-A. Meyer (1975). Probabilités et Potentiel, Chapitres I à IV, Hermann.
[5] Dynkin E.B. (1969). Game Variant of a Problem on Optimal Stopping. Soviet Math. Dokl., 10, 270-274.
[6] Fine C.H. and L. Li (1989). Equilibrium Exit in Stochastically Declining Industries. Games Econ. Behav., 1, 40-59.


[7] Fink A.M. (1964). Equilibrium in a Stochastic n-Person Game. J. Sci. Hiroshima Univ., 28, 89-93.
[8] Fudenberg D. and J. Tirole (1985). Preemption and Rent Equalization in the Adoption of New Technology. Rev. Econ. Stud., LII, 383-401.
[9] Fudenberg D. and J. Tirole (1991). Game Theory. MIT Press.
[10] Hamadène S. and J. Zhang (2010). The Continuous Time Nonzero-Sum Dynkin Game Problem and Application in Game Options. SIAM J. Control Optim., 48, 3659-3669.
[11] Hendricks K., A. Weiss and C. Wilson (1988). The War of Attrition in Continuous Time with Complete Information. Int. Econ. Rev., 29, 663-680.
[12] Karlin S. (1959). Mathematical Methods and Theory in Games, Programming and Economics, Vol. 2. Reading, Massachusetts: Addison-Wesley.
[13] Laraki R. and E. Solan (2005). The Value of Zero-Sum Stopping Games in Continuous Time. SIAM J. Control Optim., 43, 1913-1922.
[14] Laraki R., E. Solan and N. Vieille (2005). Continuous-Time Games of Timing. Journal of Economic Theory, 120, 206-238.
[15] Lepeltier J.P. and M.A. Maingueneau (1984). Le Jeu de Dynkin en Théorie Générale sans l'Hypothèse de Mokobodsky. Stochastics, 13, 25-44.
[16] Maskin E. and J. Tirole (2001). Markov Perfect Equilibrium I. Observable Actions. Journal of Economic Theory, 100, 191-219.
[17] Maynard Smith J. (1974). The Theory of Games and the Evolution of Animal Conflicts. Journal of Theoretical Biology, 47, 209-221.
[18] Radzik T. and T.E.S. Raghavan (1994). Duels. In Aumann R.J. and Hart S. (eds.), Handbook of Game Theory with Economic Applications, 2, 761-768.
[19] Reny P.J. (1999). On the Existence of Pure and Mixed Strategy Nash Equilibria in Discontinuous Games. Econometrica, 67, 1029-1056.
[20] Solan E. (1999). Three-Person Absorbing Games. Math. Oper. Res., 24, 669-698.
[21] Touzi N. and N. Vieille (2002). Continuous-Time Dynkin Games with Mixed Strategies. SIAM J. Control Optim., 41, 1073-1088.

Chapter 4. Regularity in Optimization and Control

A very natural quest in mathematics is to find regularity conditions on a problem that imply regularity conclusions about its solution. We saw in Chapter 2 that the asymptotic analysis of zero-sum repeated games requires the discounted values to be equicontinuous. For this to be true, one needs the Shapley operator to preserve equicontinuity (for example, by associating to any Lipschitz function with constant C a Lipschitz function with the same constant). The same question may then be asked when there is only one player instead of two. The Shapley operator of a one-player splitting game is just the convexification operator. Furthermore, an elegant way to study regularity in MDP problems (i.e. one-player repeated games) is via gambling houses, which can embed any MDP problem, as shown in Maitra and Sudderth (1996).
In the first section, necessary and sufficient conditions for the convexification operator on a compact set to preserve continuity and Lipschitz continuity are established. In the second section, the results are extended to gambling houses (and so to any MDP). The regularity results can easily be extended to the two-player framework, as shown in Laraki (2001a), (2001b) for the splitting game (third section). The last section is independent of the rest: it introduces the notion of optimal correlation systems, characterizes them in small dimensions, and applies them to some repeated games with signals.
This chapter is mainly based on the papers Laraki (2004) and Laraki and Sudderth (2004). The third section is based on Laraki (2001a,b). The last section is based on Gossner, Laraki and Tomala (2009).

4.1 Convexification Operator

Throughout this section, E is assumed to be a Hausdorff locally convex topological vector space and X is a convex, compact and metrizable subset of E. Sometimes, E will be supposed to be finite dimensional and/or normed. The convexification operator associates to a real-valued bounded function f on X the largest convex function on X smaller than f. The convex envelope of f is usually denoted by coX(f) in optimization and VexX(f) in game theory (see Chapter 2).


Kruskal (1969) gives an example of a 3-dimensional compact set X and a continuous function f on X for which coX(f) is discontinuous. This is due to the fact that the set of extreme points of X is not closed. Hence, a natural question arises: under which topological condition on the geometry of X does coX(f) have the same regularity properties as f?
The operator coX(·) preserves continuity at x if the image of any continuous function on X is continuous at x. For a metric d on X, the convexification operator uniformly preserves d-Lipschitz continuity if there exists a constant ρ > 0 such that the image of any d-Lipschitz function with constant 1 is d-Lipschitz with constant at most ρ. Finally, the convexification operator exactly preserves d-Lipschitz continuity if the image of any d-Lipschitz function with constant 1 is d-Lipschitz with constant 1.
X is faces-closed if the Hausdorff limit of any Hausdorff-convergent sequence of faces of X is a face of X. X is a polytope if it is the convex envelope of finitely many points. X is a simplex if it is a polytope and its extreme points are affinely independent.
The main results when E is a finite dimensional normed space are:
• The preservation of continuity by coX is equivalent to X being faces-closed.
• The uniform preservation of Lipschitz continuity by coX is equivalent to X being a polytope.
• Being a simplex is sufficient for the existence of a norm on E for which coX exactly preserves Lipschitz continuity.

4.1.1 Representation Formulas

Denote the set of finite-support probability measures on X by ∆∗X. This is the set of all probability measures σ of the form σ = Σ_{i=1}^m αi δxi, where αi ≥ 0, Σ_{i=1}^m αi = 1, xi ∈ X, and δxi denotes the Dirac mass at xi. For f in B(X) (the set of bounded real-valued functions on X) and σ = Σ_{i=1}^m αi δxi ∈ ∆∗X, define

⟨σ, f⟩ := Σ_{i=1}^m αi f(xi).

The barycenter (or resultant) r(σ) ∈ X of σ = Σ_{i=1}^m αi δxi ∈ ∆∗X is

r(σ) := Σ_{i=1}^m αi xi,

hence r(·) defines a function from ∆∗X to X. Finally, the set of finite-support probability measures that are centered at x is

∆∗X(x) := r^{−1}(x) = {σ ∈ ∆∗X : r(σ) = x}.

It is easy to show that, for any function f in B(X) and x in X,

coX(f)(x) = inf_{σ ∈ ∆∗X(x)} ⟨σ, f⟩.

The cone of bounded σ-additive positive Borel measures on X is denoted by M+(X), and the real vector space of bounded signed Borel measures on X is M(X) := M+(X) − M+(X). Finally, denote the set of Borel probability measures on X by ∆X and the set of continuous functions on X by C(X). Note that M(X) is the dual of C(X), where for σ ∈ M(X) and f ∈ C(X) the duality bracket is

⟨σ, f⟩ := ∫_X f(x) σ(dx).

Recall that the weak* topology is the coarsest topology on M(X) for which the maps σ → ⟨σ, f⟩, f ∈ C(X), are continuous. Hence ∆X is compact, convex and metrizable for the weak* topology. For x in X, let ∆X(x) denote the closure of ∆∗X(x) with respect to the weak* topology.

Definition 4.1 The correspondence y → ∆X(y) from X to ∆X is called the splitting correspondence.

Taking the closure in the first representation formula leads to the following one. For any function f in C(X) and any x in X:

coX(f)(x) = min_{σ ∈ ∆X(x)} ⟨σ, f⟩.

Thus, the control formulation leads us to guess that the convexification operator preserves continuity if and only if the set of constraints (the splitting correspondence) is continuous for some suitable topology. Once this result is established, it is then used to derive a more intuitive result in finite dimensional spaces, that explains and clarifies Kruskal’s example.
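In dimension one the first representation formula can be implemented directly: on a grid, coX(f) is the lower convex hull of the points (xi, f(xi)), and the optimal splitting of x is a two-point measure supported on hull vertices. A sketch using Andrew's monotone-chain lower hull (an implementation choice, not taken from the text):

```python
def lower_convex_hull(points):
    """Lower hull of planar points sorted by x (Andrew's monotone chain)."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop hull[-1] if it lies on or above the segment hull[-2] -> p
            if (x2 - x1) * (p[1] - y1) - (p[0] - x1) * (y2 - y1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def convex_envelope(f, a=0.0, b=1.0, n=400):
    """Grid approximation of co_X(f) on X = [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    hull = lower_convex_hull([(x, f(x)) for x in xs])
    def co_f(x):
        # piecewise-linear interpolation along the lower hull
        for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
            if x1 <= x <= x2:
                return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
        return f(x)
    return co_f

f = lambda x: min((x - 0.25) ** 2, (x - 0.75) ** 2)   # double well on [0, 1]
co_f = convex_envelope(f)
print(co_f(0.5))  # → ~0 : x = 1/2 is split optimally between the two minima
```

Here the splitting measure at x = 1/2 is (δ_{1/4} + δ_{3/4})/2, which is exactly a minimizer in the representation formula.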

4.1.2 Preserving Continuity

Let Y denote some metrizable compact topological space. Let N be the set of strictly increasing sequences from the set of integers into itself. The sequence {y′n}n is a subsequence of {yn}n if there exists ϕ ∈ N such that {y′n}n = {yϕ(n)}n.

Definition 4.2 (Attouch (1984)) Let {Yn }n be a sequence of subsets of Y.


The Kuratowski upper limit of {Yn}n is

K-lim sup_n Yn := { y ∈ Y : ∃ϕ ∈ N : yϕ(n) ∈ Yϕ(n), lim_{n→∞} yϕ(n) = y }.

The Kuratowski lower limit of {Yn}n is

K-lim inf_n Yn := { y ∈ Y : ∃(yn)n∈N, yn ∈ Yn, lim_{n→∞} yn = y }.

The sequence {Yn}n Kuratowski-converges if

K-lim sup_n Yn = K-lim inf_n Yn,

and in such a case the Kuratowski limit is denoted K-lim_n Yn.
Let d be some metric on Y compatible with its topology. Recall that, given two closed and non-empty sets A and B in Y, their Hausdorff distance is defined by

D(A, B) := max{ max_{a∈A} d(a, B), max_{b∈B} d(b, A) }.
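For finite sets, the Hausdorff distance is a direct transcription of the definition; a minimal sketch:

```python
def hausdorff(A, B, d=lambda a, b: abs(a - b)):
    """D(A, B) = max( max_{a in A} d(a, B), max_{b in B} d(b, A) )
    for finite non-empty sets of reals (d is the usual distance)."""
    dist_to = lambda p, S: min(d(p, q) for q in S)
    return max(max(dist_to(a, B) for a in A),
               max(dist_to(b, A) for b in B))

print(hausdorff({0.0, 1.0}, {0.0, 1.0, 2.0}))  # → 1.0 (the point 2 is far from {0, 1})
```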

Because Y is assumed compact, the Kuratowski and Hausdorff notions of set convergence coincide (Klein and Thompson (1984)). The closed and open segments between x and y are respectively [x, y] := {λx + (1 − λ)y : λ ∈ [0, 1]} and ]x, y[ := [x, y] \ {x, y}. Let us introduce some geometric definitions.

Definition 4.3 A point x in X is an extreme point of X (or belongs to E(X)) if there are no points x1 and x2 in X such that x ∈ ]x1, x2[.

Definition 4.4 A subset F of X is a face of X if it is convex and if for any σ in ∆∗X such that r(σ) ∈ F, the (finite) support S(σ) of σ is included in F.

Definition 4.5 The set X is faces-closed if for any Kuratowski (= Hausdorff) convergent sequence of faces of X the limit is also a face of X.

A correspondence y → G(y) from X to ∆X is continuous at x if for any sequence {xn}n in X converging to x, the sequence of sets {G(xn)}n Kuratowski-converges to G(x).

Definition 4.6 X is splitting-continuous at x ∈ X if y → ∆X(y) is continuous at x.


The representation formula suggests that splitting continuity is sufficient for the preservation of continuity by coX. In fact we have much more:

Theorem 4.7
• coX preserves continuity at x if and only if X is splitting-continuous at x.
• If X is everywhere splitting-continuous, then it is faces-closed.
• If X is finite dimensional, then it is everywhere splitting-continuous if and only if it is faces-closed.
• In dimension 4, there exist sets X which are extreme-points-closed but not faces-closed.

4.1.3 Preserving Lipschitz Continuity

Let d be some metric on X.

Definition 4.8 X is d-splitting-Lipschitz if there exists ρ > 0 such that for any (x, y) in X × X and any Σ_{i=1}^m αi δxi in ∆∗X(x), there exist m points y1, ..., ym in X such that Σ_{i=1}^m αi δyi belongs to ∆∗X(y) and Σ_{i=1}^m αi d(xi, yi) ≤ ρ d(x, y).

Proposition 4.9 If X is d-splitting-Lipschitz and if f is d-Lipschitz with constant L, then coX(f) is d-Lipschitz with constant at most ρ × L.

Proof. Let x, y be in X and suppose that coX(f)(x) = lim_{n→∞} Σ_{i=1}^{mn} αi^n f(xi^n) with Σ_{i=1}^{mn} αi^n δ_{xi^n} ∈ ∆∗X(x). Since X is d-splitting-Lipschitz with constant ρ, there exists a sequence of vectors (yi^n)_{i=1}^{mn} such that Σ_{i=1}^{mn} αi^n δ_{yi^n} ∈ ∆∗X(y) and Σ_{i=1}^{mn} αi^n d(xi^n, yi^n) ≤ ρ d(x, y). Thus,

coX(f)(y) − coX(f)(x) = coX(f)(y) − lim_{n→∞} Σ_{i=1}^{mn} αi^n f(xi^n)
  = lim_{n→∞} [ coX(f)(y) − Σ_{i=1}^{mn} αi^n f(xi^n) ]
  ≤ lim sup_{n→∞} [ Σ_{i=1}^{mn} αi^n f(yi^n) − Σ_{i=1}^{mn} αi^n f(xi^n) ]
  ≤ lim sup_{n→∞} [ Σ_{i=1}^{mn} αi^n |f(xi^n) − f(yi^n)| ]
  ≤ L lim sup_{n→∞} [ Σ_{i=1}^{mn} αi^n d(xi^n, yi^n) ]
  ≤ ρ × L × d(x, y).


Thus a natural question arises: which sets are splitting-Lipschitz for some norm? To answer this question, the following non-trivial lemma is crucial. We believe that it may have many other applications.
Let Y be a measurable space and denote by M(Y) the set of σ-additive bounded Borel measures on Y and by M+(Y) the set of σ-additive bounded positive measures on Y. Any η ∈ M(Y) can be decomposed uniquely as the difference of two measures η+ and η− in M+(Y) with disjoint supports (the Hahn-Jordan decomposition theorem, Cohn (1980)). The total variation norm of η ∈ M(Y) is defined by ‖η‖M(Y) := η+(Y) + η−(Y).

Lemma 4.10 Let µ ∈ ∆(Y) and (αk, µk)_{k=1}^m be such that, for all k = 1, ..., m:
(i) µk ∈ ∆(Y);
(ii) αk ≥ 0, Σ_{k=1}^m αk = 1 and Σ_{k=1}^m αk µk = µ.

Then, for every ν ∈ ∆(Y), there exist ν1, ..., νm such that, for all k = 1, ..., m:
(i') νk ∈ ∆(Y);
(ii') ν = Σ_{k=1}^m αk νk; and
(iii') ‖µ − ν‖M(Y) = Σ_{k=1}^m αk ‖µk − νk‖M(Y).

Our result is in fact more precise: it constructs ν1, ..., νm explicitly. This lemma, together with a result of Walkup and Wets (1969), implies the following result.
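For measures with finite support, the conclusion of the lemma can be checked constructively. The sketch below uses one natural construction (an assumption of this illustration, not necessarily the explicit construction referred to above): each µk gives up its proportional share of (µ − ν)+, and the freed mass is redistributed along (µ − ν)−; properties (i')-(iii') then hold on the example, with (iii') an equality.

```python
def tv(eta):
    """Total variation norm of a signed measure given as a finite dict."""
    return sum(abs(v) for v in eta.values())

def split_measures(alphas, mus, nu):
    """Given mu = sum_k alphas[k]*mus[k] and a target nu, build nu_k with
    nu = sum_k alphas[k]*nu_k and ||mu - nu|| = sum_k alphas[k]*||mu_k - nu_k||."""
    mu = {}
    for a, muk in zip(alphas, mus):
        for y, v in muk.items():
            mu[y] = mu.get(y, 0.0) + a * v
    delta = {y: mu.get(y, 0.0) - nu.get(y, 0.0) for y in set(mu) | set(nu)}
    dplus = {y: v for y, v in delta.items() if v > 0}    # (mu - nu)+
    dminus = {y: -v for y, v in delta.items() if v < 0}  # (mu - nu)-
    mass = sum(dplus.values())                           # = total mass of (mu - nu)-
    if mass == 0.0:
        return [dict(muk) for muk in mus]
    nus = []
    for muk in mus:
        # mu_k gives up its proportional share of (mu - nu)+ ...
        rho = {y: v * muk.get(y, 0.0) / mu[y] for y, v in dplus.items()}
        freed = sum(rho.values())
        nuk = {y: muk.get(y, 0.0) - rho.get(y, 0.0) for y in set(muk) | set(rho)}
        # ... and receives the same mass back, spread along (mu - nu)-
        for y, v in dminus.items():
            nuk[y] = nuk.get(y, 0.0) + freed * v / mass
        nus.append(nuk)
    return nus

alphas, mus = [0.5, 0.5], [{0: 1.0}, {1: 1.0}]   # mu = (delta_0 + delta_1) / 2
nu = {0: 0.25, 2: 0.75}
nus = split_measures(alphas, mus, nu)
print(nus)  # two probability measures mixing back to nu, with (iii') an equality
```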

Proposition 4.11 If X is a polytope then it is splitting-Lipschitz for some norm on Vect(X). If X is a simplex then there exists a norm on Vect(X) for which it is splitting-Lipschitz with constant ρ = 1.

Thus, being a polytope implies the uniform preservation of Lipschitz continuity, and being a simplex implies the exact preservation of Lipschitz continuity. Our main result is more precise:

Theorem 4.12
• If E is a finite dimensional normed space, coX uniformly preserves Lipschitz continuity if and only if the set X is a polytope.
• X being a polytope is not sufficient for the exact preservation of Lipschitz continuity by coX for some norm.
• If X is a product of simplices, coX(·) exactly preserves Lipschitz continuity for some norm on Vect(X).

4.2 MDP Operators

4.2.1 Gambling Houses

A gambling problem in the sense of Dubins & Savage (1965) has three ingredients: a state space or fortune space X, a gambling house Γ, and a utility function u. Here we assume that X is a Borel subset of a complete, separable metric space; so, in particular, X is separable metric. The gambling house Γ is a function that assigns to each x ∈ X a nonempty set Γ(x) of probability measures defined on the Borel subsets of X. Let ∆(X) be the set of all Borel probability measures on X and endow ∆(X) with the usual weak* topology. Then we can view Γ as corresponding to the set {(x, γ) : γ ∈ Γ(x)} ⊆ X × ∆(X). We assume that this set is a Borel subset of the product space X × ∆(X). The utility function u is a mapping from X to the real numbers, with the usual interpretation that u(x) represents the value to a player of each state x ∈ X. In this section we will always assume that u is bounded and Borel measurable; usually, we will assume that u is continuous.
A strategy σ is a sequence σ0, σ1, ... such that σ0 ∈ ∆(X) and, for n ≥ 1, σn is a universally measurable function from X^n into ∆(X). A strategy σ is available in Γ at x if σ0 ∈ Γ(x) and σn(x1, ..., xn) ∈ Γ(xn) for every n ≥ 1 and (x1, x2, ..., xn) ∈ X^n. Each strategy σ determines a unique probability measure, also denoted by σ, on the Borel subsets of the history space H = X^N, where N is the set of positive integers and H is given the product topology. Let X1, X2, ... be the coordinate process on H. Then, under σ, X1 has distribution σ0 and, for n ≥ 1, Xn+1 has conditional distribution σn(x1, x2, ..., xn) given X1 = x1, ..., Xn = xn.
We will concentrate on leavable gambling problems, in which a player chooses a time to stop playing in addition to a strategy. A stop rule is a universally measurable function t from H into {0, 1, ...} such that whenever t(h) = n and h and h′ agree in their first n coordinates, then t(h′) = n. (In particular, if t(h) = 0 for some h, then t is identically 0.)
A player with initial fortune x selects a strategy σ available at x and a stop rule t. The player's expected reward is then

∫ u(Xt) dσ,   where X0 = x.

The optimal reward function or réduite is, for x ∈ X,

U(x) = (Ru)(x) = sup ∫ u(Xt) dσ,

where the supremum is taken over all strategies σ available at x and all stop rules t. For n ≥ 1,


the n-day optimal reward function Un(x) = (Rn u)(x) is defined in the same way, except that stop rules are restricted to satisfy t ≤ n. We sometimes write RΓ and RΓn for the operators R and Rn, respectively, to show their dependence on the gambling house Γ. The one-day operator G = GΓ is defined by

(Gu)(x) = sup{ ∫ u dγ : γ ∈ Γ(x) },   x ∈ X.

The n-day rewards Rn u can be calculated by backward induction using G:

R1 u = u ∨ Gu,   Rn+1 u = u ∨ G(Rn u)   (4.1)

for n ≥ 1, where a ∨ b is the maximum of a and b. Furthermore,

Rn u ≤ Rn+1 u   (4.2)

and

Ru = lim_n Rn u.   (4.3)

(See Dubins & Savage (1965) or Maitra & Sudderth (1996)). Thus the operators Rn and R are completely determined by the operator G. In the sequel we find conditions on Γ that imply that the operators G, Rn and R map the set of continuous (respectively, Lipschitz continuous) functions into itself. The gambling model described above is leavable in the sense that the player is allowed to stop the game at any time. Such models essentially include positive Markov decision processes as is explained in Example 3.3.5 of Maitra and Sudderth (1996).

4.2.2 Red-and-Black

Consider a gambling problem with fortune space X = [0, ∞) and think of states as being cash. Suppose that at each fortune x ≥ 0, the gambler can stake any amount s in her possession. The gambler wins back the stake s and an equal amount more with probability w, but loses the stake with probability 1 − w, where 0 < w < 1. Thus the gambling house Γw is given by

Γw(x) = { wδ(x + s) + (1 − w)δ(x − s) : 0 ≤ s ≤ x },   x ∈ X,

where δ(y) denotes the Dirac mass at y. Suppose further that the gambler has utility function

u(x) = x for 0 ≤ x ≤ 1,   and   u(x) = 1 for x > 1.

MDP Operators

51

The function u is continuous and even Lipschitz continuous. Is the same true for the optimal reward function Ru? The answer depends on the parameter w.
If w ≤ 1/2, then, under every strategy σ available in Γw at x, the process Xn of successive fortunes is a nonnegative supermartingale that begins at X0 = x. By the optional sampling theorem (see for example Maitra and Sudderth (1996)),

∫ u(Xt) dσ ≤ ∫ Xt dσ ≤ x,

for every stop rule t. Hence, Ru(x) ≤ x and, obviously, Ru(x) ≤ 1. So Ru(x) ≤ u(x). On the other hand, the gambler can attain u(x) by stopping immediately. Therefore, for w ≤ 1/2, Ru = u and Ru is Lipschitz continuous. In fact, as will be seen below, Ru is Lipschitz continuous for every Lipschitz continuous utility function u.
Now suppose that w > 1/2 and x > 0. Then, by staking a small proportion of her fortune at every stage, the gambler can reach a fortune of at least 1 with probability 1. Hence, Ru(x) = 1 for x > 0. Obviously, Ru(0) = 0 because the gambler can stake only 0 at fortune 0. So, for w > 1/2, Ru is not continuous. Why?
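Both regimes can be explored numerically; the grid, stake discretization, stake fraction and horizon below are assumptions of the sketch. Backward induction (4.1) on a grid reproduces Ru = u when w ≤ 1/2, while for w > 1/2 a proportional-staking simulation shows that every x > 0 reaches fortune 1, so Ru jumps at 0. (The grid computation itself cannot exhibit the jump, precisely because it forbids arbitrarily small proportional stakes.)

```python
import random

def value_iteration(w, h=0.02, xmax=2.0, n_iter=100):
    """Backward induction R_{n+1}u = u v G(R_n u) (formula (4.1)) for the
    red-and-black house on the grid {0, h, ..., xmax}, grid-multiple stakes."""
    m = round(xmax / h)
    u = [min(i * h, 1.0) for i in range(m + 1)]
    R = u[:]
    for _ in range(n_iter):
        G = [max(w * R[min(i + s, m)] + (1 - w) * R[i - s] for s in range(i + 1))
             for i in range(m + 1)]
        R = [max(a, b) for a, b in zip(u, G)]
    return u, R

u, R = value_iteration(w=0.4)
gap = max(abs(a - b) for a, b in zip(u, R))
print(gap)  # ~0 (up to rounding): for w <= 1/2, Ru = u, so Ru stays Lipschitz

def reaches_one(x, w, fraction=0.02, max_steps=20_000, rng=random.Random(0)):
    """Superfair case: stake a fixed small fraction of the current fortune.
    (The default rng is created once, so repeated calls share one stream.)"""
    for _ in range(max_steps):
        if x >= 1.0:
            return True
        s = fraction * x
        x = x + s if rng.random() < w else x - s
    return False

hits = sum(reaches_one(0.1, w=0.6) for _ in range(100))
print(hits)  # all (or nearly all) of the 100 paths reach fortune 1: Ru(x) = 1 for x > 0
```

Proportional staking keeps the fortune strictly positive, which is exactly what the grid model cannot imitate near 0; this is where the discontinuity of Ru at 0 comes from.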

4.2.3 Maximal Houses

Let C(X) be the collection of all bounded, continuous functions u : X → R. It can easily happen that different gambling houses determine the same operators G, Rn, and R on the domain C(X). In this section, we characterize the houses Γ1 and Γ2 associated with a given house Γ that are the largest possible houses such that GΓ1 = GΓ and RΓ2 = RΓ on C(X). For A ⊆ ∆(X), let co(A) be the convex hull of A and let c̄o(A) be the closure of co(A) in the usual weak* topology on ∆(X).

Theorem 4.13 The largest house Γ1 such that GΓ1 u = GΓ u for all u ∈ C(X) is the house c̄oΓ defined by (c̄oΓ)(x) = c̄o(Γ(x)) for all x ∈ X.

To describe the maximal house for the réduite operator RΓ, we first introduce the house Γc. For each x, Γc(x) is the set of all possible distributions of Xt, where t is a bounded stop rule, under a strategy σ available at x in the original house Γ. (We make the convention that X0 = x.) The one-day operator GΓc for the house Γc agrees with the réduite operator RΓ on bounded, Borel measurable functions and in particular on C(X). This yields:

Theorem 4.14 The largest house Γ2 such that RΓ2 u = RΓ u for all u ∈ C(X) is the house c̄oΓc defined by (c̄oΓc)(x) = c̄o(Γc(x)) for all x ∈ X.

4.2.4 Preserving Continuity

The operator GΓ preserves continuity at x ∈ X if, for all u ∈ C(X), the function GΓ u is continuous at x. The preservation of continuity by the operators RΓ and RΓn is defined


in the same way. Also the preservation of upper semicontinuity and lower semicontinuity by these operators is defined similarly.
The key to understanding when GΓ preserves continuity is the notion of Kuratowski convergence of sets and the related notion of Kuratowski continuity of a set-valued function such as a gambling house (see the last section).

Theorem 4.15 Let x ∈ X and assume that there is a compact subset C of ∆(X) such that Γ(y) ⊆ C for all y in some neighborhood of x.
(a1) If the mapping Γ is Kuratowski-upper semicontinuous at x, then GΓ preserves upper semicontinuity at x.
(a2) If the mapping Γ is Kuratowski-lower semicontinuous at x, then GΓ preserves lower semicontinuity at x.
(b1) If GΓ preserves upper semicontinuity at x and C is convex, then c̄oΓ is Kuratowski-upper semicontinuous at x.
(b2) If GΓ preserves lower semicontinuity at x and C is convex, then c̄oΓ is Kuratowski-lower semicontinuous at x.

Proof. (Part a1) Let u : X → R be bounded and upper semicontinuous, and suppose that the mapping Γ is K-upper semicontinuous at x. Let us prove that GΓ u is upper semicontinuous at x. To do so, let xn → x and suppose that (GΓ u)(xn) converges to some α (up to a subsequence). Take ε > 0 and let γn ∈ Γεu(xn), where

Γεu(y) := { γ ∈ Γ(y) : ∫ u dγ ≥ (GΓ u)(y) − ε }.

Since, for n large, Γ(xn) is contained in the compact set C, we can assume that γn converges (up to a subsequence) to some γ0. Then γ0 must be in Γ(x) because the mapping Γ is assumed to be K-upper semicontinuous at x. Hence, by Fatou's lemma and the fact that u is upper semicontinuous,

lim sup (GΓ u)(xn) ≤ lim sup ∫ u dγn + ε ≤ ∫ u dγ0 + ε ≤ (GΓ u)(x) + ε.

Since ε > 0 is arbitrary, α ≤ (GΓ u)(x). It follows that GΓ u is upper semicontinuous at x.
(Part b1) By contradiction, suppose that c̄oΓ is not K-upper semicontinuous at x. Then there exists xn → x such that K-lim sup [c̄oΓ(xn)] is not included in c̄oΓ(x). Since for n large the sets Γ(xn) are contained in the convex, compact set C, so are the sets c̄oΓ(xn). By using P3 above and passing to a subsequence, we can assume that K-lim c̄oΓ(xn) := D exists but is not included in c̄oΓ(x). Hence, there exists γ ∈ D that is not in c̄oΓ(x). Since c̄oΓ(x)


is convex and closed, we deduce, by a separation theorem, that there exists u ∈ C(X) such that ∫ u dγ > sup_{τ∈c̄oΓ(x)} ∫ u dτ. Next, by the definition of Kuratowski convergence, there exist γn ∈ c̄oΓ(xn) such that γn → γ. Hence

lim sup (Gc̄oΓ u)(xn) ≥ lim ∫ u dγn = ∫ u dγ > (Gc̄oΓ u)(x),

that is, Gc̄oΓ u is not upper semicontinuous at x.
An immediate corollary is the following:

Corollary 4.16 Let x ∈ X and suppose Γc(y) is contained in a compact subset C of ∆(X) for all y in some neighborhood of x.
(a) If Γc is Kuratowski-continuous at x, then RΓ preserves continuity at x.
(b) If RΓ preserves continuity at x and C is convex, then c̄oΓc is Kuratowski-continuous at x.

This corollary is unsatisfactory because it may be difficult to determine whether Γc or c̄oΓc is Kuratowski-continuous. We will return in the next section to the question of when RΓ preserves continuity. Another easy consequence of the recursive equation is:

Corollary 4.17 Let x ∈ X and suppose that Γ(y) is contained in a compact convex subset of ∆(X) for all y in some neighborhood of x. If Γ is Kuratowski-continuous at x, then RΓn preserves continuity at x for every n ∈ N.

The Kuratowski-continuity of Γ is not sufficient to imply that the operator RΓ for the infinite horizon problem preserves continuity. The red-and-black houses Γw are easily seen to be Kuratowski-continuous, but, as was shown, they do not preserve continuity if w > 1/2. We need to go further...

4.2.5 Preserving Lipschitz Continuity

Let d be the metric for the metric space X. Recall that a function u : X → R is Lipschitz with constant α > 0 if |u(x) − u(y)| ≤ α d(x, y) for all x and y in X. Let L(α) be the set of all such functions. In this section, we seek conditions on Γ to guarantee that GΓ(u) is Lipschitz continuous whenever u is. First, we define, for γ, γ̃ ∈ ∆(X),

λ(γ, γ̃) = sup_{u∈L(1)} ( ∫ u dγ − ∫ u dγ̃ )   (4.4)
         = sup_{u∈L(1)} | ∫ u dγ − ∫ u dγ̃ |.   (4.5)


(The function λ is the Kantorovich metric on ∆(X).)

Definition 4.18 For ρ > 0, the gambling house Γ is said to satisfy condition Λ(ρ) if, for all x, y ∈ X,

sup_{γ ∈ Γ(x)} inf_{γ̃ ∈ Γ(y)} λ(γ, γ̃) ≤ ρ d(x, y).

The first part of the next theorem is very close to results of Hinderer (1995) on Markov decision processes.

Theorem 4.19 Let α and ρ be positive numbers.
(a) If Γ satisfies Λ(ρ) and u ∈ L(α), then GΓ u ∈ L(αρ).

(b) Assume that coΓ(x) is compact for all x ∈ X. If GΓ u ∈ L(ρ) for all u ∈ L(1), then coΓ satisfies Λ(ρ).

Proof. For (a), let ε > 0, x, y ∈ X, and u ∈ L(1). Choose γ ∈ Γ(x) such that

(GΓ u)(x) ≤ ∫ u dγ + ε.

Then

(GΓ u)(x) − (GΓ u)(y) ≤ ∫ u dγ − (GΓ u)(y) + ε
  = inf_{γ̃ ∈ Γ(y)} ( ∫ u dγ − ∫ u dγ̃ ) + ε
  ≤ sup_{γ ∈ Γ(x)} inf_{γ̃ ∈ Γ(y)} ( ∫ u dγ − ∫ u dγ̃ ) + ε
  ≤ ρ d(x, y) + ε.

So (GΓ u)(x) − (GΓ u)(y) ≤ ρ d(x, y) and, by symmetry, we can interchange x and y. Therefore, GΓ u ∈ L(ρ) when u ∈ L(1). For u ∈ L(α), we have u/α ∈ L(1) and

|(GΓ u)(x) − (GΓ u)(y)| = α |(GΓ(u/α))(x) − (GΓ(u/α))(y)| ≤ αρ d(x, y).

Hence, GΓ u ∈ L(αρ). For the proof of (b), assume that Γ = coΓ. There is no harm done, since we know GΓ = GcoΓ by Theorem 4.13. Assume also that GΓ u ∈ L(ρ) for all u ∈ L(1). By contradiction, suppose that Γ does not satisfy Λ(ρ). Then, for some ε > 0, x, y ∈ X and γ ∈ Γ(x),

inf_{γ̃ ∈ Γ(y)} λ(γ, γ̃) > ρ d(x, y) + ε.


Now

inf_{γ̃ ∈ Γ(y)} λ(γ, γ̃) = inf_{γ̃ ∈ Γ(y)} sup_{u ∈ L(1)} ( ∫ u dγ − ∫ u dγ̃ )
                        = sup_{u ∈ L(1)} inf_{γ̃ ∈ Γ(y)} ( ∫ u dγ − ∫ u dγ̃ )

by Sion's (1958) min-max theorem, since

ϕ(u, γ̃) = ∫ u dγ − ∫ u dγ̃

is linear in u, continuous in γ̃, and Γ(y) is compact and convex. Therefore, there exists u ∈ L(1) such that

ρ d(x, y) + ε/2 < inf_{γ̃ ∈ Γ(y)} ( ∫ u dγ − ∫ u dγ̃ ) = ∫ u dγ − (GΓ u)(y) ≤ (GΓ u)(x) − (GΓ u)(y),

so GΓ u ∉ L(ρ), a contradiction.

The recursive equation implies the following.

Theorem 4.20 Let α and ρ be positive numbers. Assume u ∈ L(α) and that Γ satisfies Λ(ρ).
(a) If ρ > 1, then RΓ^n u ∈ L(α ∨ ρ^n) for all n ∈ N.
(b) If ρ ≤ 1, then RΓ^n u ∈ L(α) for all n ∈ N, RΓ u ∈ L(α), and RΓ^n u converges uniformly to RΓ u on compact subsets of X.

Theorem 4.20 gives a sufficient condition for the réduite operator to preserve Lipschitz continuity, and also the Lipschitz constant.

Corollary 4.21 Let α > 0 and u ∈ L(α). If Γ satisfies Λ(1), then RΓ u ∈ L(α).

The same condition is also sufficient for the preservation of continuity when X is compact.

Corollary 4.22 Assume X is compact and Γ satisfies Λ(1). Then RΓ preserves continuity.

Proof. Let u ∈ C(X). It follows from the Stone-Weierstrass theorem that there is a sequence u_n of Lipschitz continuous functions that converges uniformly to u. By Theorem 4.20, RΓ u_n ∈ C(X) for all n. Also, it is straightforward to check that RΓ u_n converges uniformly to RΓ u. Hence RΓ u ∈ C(X).

Let Γw, 0 < w < 1, be the red-and-black gambling house. It is easy to check that Γw satisfies Λ(1) if and only if w ≤ 1/2. Now we understand why continuity is not preserved when w > 1/2!

4.2.6

Preserving Hölder Continuity

Let d be the metric for the metric space X. Recall that a function u : X → R is δ-Hölder with constant α > 0 if |u(x) − u(y)| ≤ α d^δ(x, y) for all x and y in X. Let Lδ(α) be the set of all such functions. By analogy with the function λ, define, for γ, γ̃ ∈ ∆(X),

λδ(γ, γ̃) = sup_{u ∈ Lδ(1)} ( ∫ u dγ − ∫ u dγ̃ ) = sup_{u ∈ Lδ(1)} | ∫ u dγ − ∫ u dγ̃ |.   (4.6)

Definition 4.23 For ρ > 0, the gambling house Γ is said to satisfy condition Λδ,β(ρ) if, for all x, y ∈ X,

sup_{γ ∈ Γ(x)} inf_{γ̃ ∈ Γ(y)} λδ(γ, γ̃) ≤ ρ d^β(x, y).

Here is a generalization of Theorem 4.19 to the preservation of Hölder continuity. (The proof is the same.)

Theorem 4.24 Let α and ρ be positive numbers.
(a) If Γ satisfies Λδ,β(ρ) and u ∈ Lδ(α), then GΓ u ∈ Lβ(αρ).
(b) Assume that coΓ(x) is compact for all x ∈ X. If GΓ u ∈ Lβ(ρ) for all u ∈ Lδ(1), then coΓ satisfies Λδ,β(ρ).

4.3

Splitting Operator

Consider two state spaces X and Y, assumed to be compact subsets of a complete, separable metric space. Let ∆(X) denote the set of Borel probability measures on X, and similarly for ∆(Y); both are endowed with the weak* topology. Let ∆x(X) denote the set of probabilities centered at x, and similarly for ∆y(Y). Let u be a continuous function from X × Y to R. Recall that the splitting game is played as follows: at stage n + 1, knowing the state (x_n, y_n), player 1 (resp. player 2) chooses µ_{n+1} ∈ ∆_{x_n}(X) (resp. ν_{n+1} ∈ ∆_{y_n}(Y)). A new state (x_{n+1}, y_{n+1}) is selected according to these distributions and the stage payoff is u(x_{n+1}, y_{n+1}).


The Shapley splitting operator is defined on continuous real functions f on X × Y by

T(λ, f)(x, y) = val_{µ ∈ ∆x(X), ν ∈ ∆y(Y)} ∫_{X×Y} [λ u(x′, y′) + (1 − λ) f(x′, y′)] µ(dx′) ν(dy′).

We denote by Vλ the value of the discounted splitting game. It is the unique fixed point of the Shapley splitting operator. The next regularity properties are established in Laraki (2001a,b) and are a direct consequence of the regularity results proved for the convexification operator.

Lemma 4.25 If X and Y are products of simplices, and if u is 1-Lipschitz, the Shapley splitting operator preserves Lipschitz continuity exactly (i.e., it associates to each 1-Lipschitz function a 1-Lipschitz function). If X and Y are splitting-continuous and u is continuous, then the Shapley splitting operator preserves continuity (i.e., it associates to any continuous function a continuous function).

Consequently, when X and Y are products of simplices and u is Lipschitz, the family of discounted values Vλ is equi-Lipschitz. Thus, the variational approach (developed in Chapter 2) allows Laraki (2001b) to deduce that Vλ converges uniformly to the unique continuous solution of the Mertens-Zamir system. This shows, in particular, that the Mertens-Zamir system admits a solution under these assumptions, that this solution is unique, and that it is a Lipschitz function.

However, when X and Y are splitting-continuous and u is continuous, the discounted values Vλ are continuous but not necessarily equi-continuous. A more difficult proof based on hypo-convergence and epi-convergence nevertheless allows Laraki (2001a) to show that Vλ converges uniformly to the unique continuous solution of the Mertens-Zamir system. This shows, in particular, that the Mertens-Zamir system admits a solution under these assumptions, that this solution is unique, and that it is continuous.

When u is not continuous, or X or Y is not splitting-continuous, it is an open question whether the Mertens-Zamir system admits at least one u.s.c.-l.s.c. solution and whether this solution is unique. In an ongoing work, in collaboration with Jérôme Renault, gambling houses are extended to a two-player zero-sum framework. It is shown that, under conditions that guarantee equi-continuity of the discounted values, among others, the asymptotic value exists and is characterized by a system of functional equations that extends the Mertens-Zamir system.

4.4

Informationally Optimal Correlation Systems

This work is motivated by a paper of Gossner and Tomala (2007) that explicitly characterizes the minmax in repeated games with imperfect monitoring using tools from Shannon information theory. This section develops new tools that allow one to compute analytically the minmax value in 3-player 2 × 2 × 2 games.


Let N = {1, . . . , n} be a finite team of players and A^i a finite set of actions for player i ∈ N. A mixed strategy for player i is a probability distribution x^i on A^i, and we let X^i = ∆(A^i) be the set of probability distributions on A^i. We let A = ∏_{i∈N} A^i be the set of action profiles and X^N = ∆(A) the set of (correlated) probability distributions on A. We also let X = ⊗_{i∈N} X^i be the set of independent probability distributions on A; i.e., a distribution D is in X if there exist x^1 ∈ X^1, . . . , x^n ∈ X^n such that D(a) = ∏_i x^i(a^i) for each a; we then write D = ⊗_i x^i ∈ ∆(A).

We describe how correlation of actions is obtained. A finite random variable k with law p = (p_k)_k is drawn and announced to each player in the team and to no one else. Then each player chooses an action, possibly at random. We think of k as common information shared by the team's members which is secret for an external observer. For example, k can be the result of secret communication within the team, or it can be provided by a correlation device. Conditioning the mixed strategies on the value of k, the team can generate every distribution of actions of the form

D = Σ_k p_k ⊗_i x_k^i

where, for each k, x_k^i ∈ X^i. The distribution D can thus be seen as the belief of the external observer about the action profile played by the team. Note that the random variable k intervenes in the decomposition through its law only, and in fact only through the distribution it induces on mixed strategies. We thus define a correlation system as follows:

Definition 4.26 A correlation system Z is a distribution with finite support on X:

Z = Σ_{k=1}^K p_k δ_{⊗_i x_k^i},

where p_k ≥ 0 for each k, Σ_k p_k = 1, x_k^i ∈ X^i for each i, and δ_{⊗_i x_k^i} stands for the Dirac measure at ⊗_i x_k^i.

The distribution of actions induced by Z is D(Z) = Σ_k p_k ⊗_i x_k^i, an element of X^N.

We measure the randomness of correlation systems using the information-theoretic notion of entropy. Let x be a finite random variable with law p; the entropy of x is

H(x) = E[− log p(x)] = − Σ_x p(x) log p(x),

where 0 log 0 = 0 and the logarithm is in base 2. H(x) is nonnegative and depends only on p; we shall thus also denote it H(p). Let (x, y) be a pair of finite random variables with joint law p. Define the conditional entropy of x given y by

H(x|y) = − Σ_{x,y} p(x, y) log p(x|y).

Entropy satisfies the following chain rule: H(x, y) = H(y) + H(x|y). In the case of a binary distribution (p, 1 − p) we let

h(p) := H(p, 1 − p) = −p log p − (1 − p) log(1 − p).

The uncertainty of an observer regarding the action profile of the team is the result of


two effects: (1) team players condition their actions on the random variable k; (2) conditional on the value of k, team players use mixed actions x_k^i. We measure the uncertainty generated by the team itself by the expected entropy of ⊗_i x_k^i.

Definition 4.27 Let Z = Σ_{k=1}^K p_k δ_{⊗_i x_k^i} be a correlation system. The expected entropy of Z is

J(Z) = Σ_k p_k H(⊗_i x_k^i).

Example. Consider a two-player team, with two actions for each player: A^1 = A^2 = {G, H}. We identify a mixed strategy for player i with the probability it puts on G. A distribution D ∈ X^{12} is denoted

D = ( d1 d2 ; d3 d4 ),

where d1 denotes the probability of the team's action profile (G, G), d2 the probability of (G, H), etc. The distribution

D = ( 1/2 0 ; 0 1/2 )

can be uniquely decomposed as a convex combination of independent distributions as follows: D = (1/2)(1 ⊗ 1) + (1/2)(0 ⊗ 0). A correlation system Z such that D(Z) = D is thus uniquely defined: Z = (1/2)δ_{1⊗1} + (1/2)δ_{0⊗0}, i.e. the players flip a fair coin and play (G, G) if heads and (H, H) if tails. Then, given the value of k, the strategies used are pure, thus J(Z) = Σ_k p_k H(⊗_i x_k^i) = 0.

By contrast, the distribution

D′ = ( 1/3 1/3 ; 0 1/3 )

can be obtained by several correlation systems. For example, D′ = D(Z) for the following Z's:
• Z1 = (1/3)δ_{1⊗1} + (1/3)δ_{1⊗0} + (1/3)δ_{0⊗0}.
• Z2 = (2/3)δ_{1⊗(1/2)} + (1/3)δ_{0⊗0}.
• Z3 = (1/2)δ_{1⊗(2/3)} + (1/2)δ_{(1/3)⊗0}.

Under Z1, the players play pure strategies conditional on the value of k, thus J(Z1) = 0. Under Z2, J(Z2) = Σ_k p_k H(⊗_i x_k^i) = (2/3) H(1/2, 1/2) = 2/3. Under Z3, J(Z3) = Σ_k p_k H(⊗_i x_k^i) = H(1/3, 2/3). One then gets J(Z3) > J(Z2) > J(Z1). The question is how to generate D′ with maximal expected entropy. It turns out that Z3 is optimal for D′ in this sense. This leads to the following definition.

Definition 4.28 Given D ∈ X^N, a correlation system Z is informationally optimal for D if:
1. D(Z) = D;
2. For every Z′ such that D(Z′) = D, J(Z′) ≤ J(Z).
In other words, Z is a solution of the optimization problem

max_{Z : D(Z)=D} J(Z).   (P_D)

A correlation system Z is informationally optimal if it is informationally optimal for D(Z).
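The computations in the example above are easy to reproduce. The following sketch (with hypothetical helper names h, J and D of my own) uses the fact that the entropy of a product distribution x ⊗ y is the sum of the binary entropies of its marginals.

```python
from math import log2

def h(p):
    """Binary entropy h(p) = -p log p - (1-p) log(1-p), with 0 log 0 = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def J(Z):
    """Expected entropy of Z = [(p_k, (x_k, y_k)), ...];
    the entropy of the product x ⊗ y is h(x) + h(y)."""
    return sum(pk * (h(x) + h(y)) for pk, (x, y) in Z)

def D(Z):
    """Induced distribution (d1, d2, d3, d4) on {GG, GH, HG, HH}."""
    d = [0.0] * 4
    for pk, (x, y) in Z:
        d[0] += pk * x * y
        d[1] += pk * x * (1 - y)
        d[2] += pk * (1 - x) * y
        d[3] += pk * (1 - x) * (1 - y)
    return d

Z1 = [(1/3, (1, 1)), (1/3, (1, 0)), (1/3, (0, 0))]
Z2 = [(2/3, (1, 1/2)), (1/3, (0, 0))]
Z3 = [(1/2, (1, 2/3)), (1/2, (1/3, 0))]

# All three systems induce D' = (1/3, 1/3, 0, 1/3) ...
for Z in (Z1, Z2, Z3):
    assert all(abs(a - b) < 1e-9 for a, b in zip(D(Z), [1/3, 1/3, 0, 1/3]))
# ... but with strictly increasing expected entropy: 0 < 2/3 < h(1/3).
assert J(Z1) == 0.0 and abs(J(Z2) - 2/3) < 1e-9 and abs(J(Z3) - h(1/3)) < 1e-9
assert J(Z3) > J(Z2) > J(Z1)
```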


4.4.1

Properties

It is easy to prove that for every D ∈ X^N, there exists Z optimal for D with finite support of cardinality at most ∏_i |A^i| + 1. Some additional properties of the value of (P_D) are given now.

Proposition 4.29
1. The mapping ϕ : D ↦ value of (P_D) is the smallest concave function on X^N whose restriction ϕ|_X to X is pointwise (weakly) greater than the entropy function, i.e. ϕ(⊗_i x^i) ≥ H(⊗_i x^i) for each ⊗_i x^i ∈ X.
2. ϕ is continuous on X^N.
3. For each D, ϕ(D) ≤ H(D). Furthermore, ϕ(D) = H(D) iff D is a product distribution.

The set of optimal correlation systems possesses a kind of consistency property. Roughly, one cannot find, in the support of an optimal system, a sub-system which is not optimal. In geometric terms, if we denote by Z the set of all correlation systems and by F(Z) the minimal geometric face of the convex set Z containing Z, then the following lemma states that if Z is optimal, then any correlation system that belongs to F(Z) is also optimal.

Lemma 4.30 If Z is informationally optimal and supp Z′ ⊆ supp Z, then Z′ is also informationally optimal. In particular, if Z = Σ_{k=1}^K p_k δ_{⊗_i x_k^i} is informationally optimal, then for any k1 and k2 in {1, . . . , K} such that p_{k1} + p_{k2} > 0,

(p_{k1}/(p_{k1} + p_{k2})) δ_{⊗_i x_{k1}^i} + (p_{k2}/(p_{k1} + p_{k2})) δ_{⊗_i x_{k2}^i}

is informationally optimal.

4.4.2

Characterization

We characterize informationally optimal correlation systems for two-player teams where each team player possesses two actions. We assume from now on that A^1 = A^2 = {G, H}. We identify a mixed strategy x (resp. y) of player 1 (resp. 2) with the probability of playing G, i.e. with a number in the interval [0, 1]. We denote distributions D ∈ X^{12} by

D = ( d1 d2 ; d3 d4 ),

where d1 denotes the probability of the team's action profile (G, G), d2 the probability of (G, H), etc. The following theorem shows that the informationally optimal correlation system associated to any D is unique, contains at most two elements in its support, can be easily


computed for a given distribution, and that the set of informationally optimal correlation systems admits a simple parametrization.

Theorem 4.31 For every D ∈ X^{12}, there exists a unique Z^D which is informationally optimal for D. Moreover:
• If det(D) = 0, Z^D = δ_{x⊗y} where x = d1 + d2 and y = d1 + d3.
• If det(D) < 0, Z^D = p δ_{x⊗y} + (1 − p) δ_{y⊗x}, where x and y are the two solutions of the second-degree polynomial equation X² − (2d1 + d2 + d3)X + d1 = 0 and

p = (y − (d1 + d2)) / (y − x).

• If det(D) > 0, Z^D = p δ_{(1−x)⊗y} + (1 − p) δ_{(1−y)⊗x}, where x and y are the two solutions of the second-degree polynomial equation X² − (2d3 + d4 + d1)X + d3 = 0 and

p = (y − (d3 + d4)) / (y − x).

Surprisingly, the proof is quite involved. These results may be applied to compute analytically the asymptotic value in some repeated games with imperfect monitoring.
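Theorem 4.31 is straightforward to implement. The sketch below (my own illustrative code, with a hypothetical helper optimal_system) recovers, for the distribution D′ of the earlier example (for which det(D′) = 1/9 > 0), the optimal system Z3 = (1/2)δ_{1⊗(2/3)} + (1/2)δ_{(1/3)⊗0}.

```python
import numpy as np

def optimal_system(d1, d2, d3, d4):
    """Informationally optimal system Z^D of Theorem 4.31,
    returned as a list [(p_k, (x_k, y_k)), ...]."""
    det = d1 * d4 - d2 * d3
    if abs(det) < 1e-12:
        return [(1.0, (d1 + d2, d1 + d3))]
    if det < 0:
        # x, y solve X^2 - (2 d1 + d2 + d3) X + d1 = 0.
        x, y = sorted(np.roots([1.0, -(2 * d1 + d2 + d3), d1]).real)
        p = (y - (d1 + d2)) / (y - x)
        return [(p, (x, y)), (1 - p, (y, x))]
    # det > 0: x, y solve X^2 - (2 d3 + d4 + d1) X + d3 = 0.
    x, y = sorted(np.roots([1.0, -(2 * d3 + d4 + d1), d3]).real)
    p = (y - (d3 + d4)) / (y - x)
    return [(p, (1 - x, y)), (1 - p, (1 - y, x))]

# D' from the example: det(D') = 1/9 > 0, and Z^{D'} is exactly Z3.
Z = optimal_system(1/3, 1/3, 0.0, 1/3)
(p, (a, b)), (q, (c, e)) = Z
assert abs(p - 1/2) < 1e-9 and abs(q - 1/2) < 1e-9
assert abs(a - 1) < 1e-9 and abs(b - 2/3) < 1e-9   # atom 1 ⊗ 2/3
assert abs(c - 1/3) < 1e-9 and abs(e - 0) < 1e-9   # atom 1/3 ⊗ 0
```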

Bibliography

[1] Attouch, H. (1984). Variational Convergence for Functions and Operators, Pitman Publishing Limited, London.
[2] Choquet, G. (1969). Integration and Topological Vector Spaces, Lectures on Analysis, Volume 1, W. A. Benjamin, Inc.
[3] Cohn, D. L. (1980). Measure Theory, Birkhäuser.
[4] Dubins, L. E., and L. J. Savage (1965). Inequalities for Stochastic Processes, McGraw-Hill. 2nd edition (1976), Dover.
[5] Gossner, O., R. Laraki and T. Tomala (2009). Informationally Optimal Correlation. Mathematical Programming, Ser. B, 116, 147-172.


[6] Gossner, O. and T. Tomala (2007). Secret Correlation in Repeated Games with Imperfect Monitoring, Mathematics of Operations Research, 32, 413-424.
[7] Hinderer, K. (1995). Lipschitz Continuous Markov Decision Processes, unpublished lecture notes, Applied Probability Conference, Atlanta.
[8] Klein, E., and A. C. Thompson (1984). Theory of Correspondences, Wiley-Interscience Publication.
[9] Kruskal, J. B. (1969). Two Convex Counterexamples: a Discontinuous Envelope Function and a Nondifferentiable Nearest-Point Mapping, Proceedings of the American Mathematical Society, 23, 697-703.
[10] Laraki, R. (2001a). The Splitting Game and Applications. International Journal of Game Theory, 30, 359-376.
[11] Laraki, R. (2001b). Variational Inequalities, System of Functional Equations, and Incomplete Information Repeated Games. SIAM Journal on Control and Optimization, 40, 516-524.
[12] Laraki, R. (2004). On the Regularity of the Convexification Operator on a Compact Set, Journal of Convex Analysis, 11, 209.
[13] Laraki, R. and W. D. Sudderth (2004). The Preservation of Continuity and Lipschitz Continuity by Optimal Reward Operators, Mathematics of Operations Research, 29, 672-685.
[14] Maitra, A. P., and W. D. Sudderth (1996). Discrete Gambling and Stochastic Games, Springer-Verlag.
[15] Rockafellar, R. T. (1970). Convex Analysis, Princeton University Press, Princeton, New Jersey.
[16] Sion, M. (1958). On General Minmax Theorems. Pacific Journal of Mathematics, 8, 171-176.
[17] Walkup, D. W. and R. J.-B. Wets (1969). A Lipschitzian Characterization of Convex Polyhedra, Proceedings of the American Mathematical Society, 23, 167-173.

Chapter 5

Developing Algorithms

The convexification operator plays an important role in game theory, the calculus of variations and many other branches of mathematics. Computing the convex envelope co(f) of a function f at some point x is a challenging and difficult problem, since it requires knowledge of the function everywhere. To the best of our knowledge, there is no algorithm that approximates co(f) uniformly by convex functions (except in the simpler univariate case). For instance, for a function f on a bounded domain Ω, Brighi and Chipot (1994) propose triangulation methods, provide piecewise degree-1 polynomial approximations fh ≥ f, and derive estimates of fh − co(f) (where h measures the size of the mesh).

Nash equilibrium is perhaps the most central concept in noncooperative game theory. Hence, it is crucial to develop algorithms that compute, for any finite game, at least one Nash equilibrium, and if possible all Nash equilibria. Lemke and Howson (1964) provide a famous algorithm for two-player games. The algorithm has been extended to multi-player games by various authors. Herings and Peeters (2009) show that all existing algorithms are homotopy-based, that they converge only generically, and that they select only one Nash equilibrium (and not all Nash equilibria). Recently, Savani and von Stengel (2006) proved that the Lemke-Howson algorithm may take exponential time. Daskalakis, Goldberg and Papadimitriou (2009) proved that solving a 3-player finite game numerically is PPAD-complete. The result has been extended to 2-player games by Chen and Deng (2009).

During the last ten years, Jean-Bernard Lasserre has developed new tools of global optimization that have many applications in diverse branches of mathematics. A brief description is provided in the first section. The other two sections adapt or extend Lasserre's techniques to solve some problems related to my research agenda (such as the two problems cited above on convexification and computation of Nash equilibria). This chapter is based on the papers Laraki and Lasserre (2008, 2010).

5.1

Generalized Moment Problem

In this section, we present some fundamental results that may be found in Lasserre (2010).


5.1.1


Primal and Dual GMP

Let K be a Borel subset of R^n and let M(K)_+ be the positive cone of finite Borel measures µ on K. Given a set of indices Γ, a set of reals {γ_k, k ∈ Γ}, and universally integrable functions f, h_k : K → R, k ∈ Γ, the generalized moment problem (GMP) is defined as follows:

ρ = sup_{µ ∈ M(K)_+} { ∫_K f dµ : ∫_K h_k dµ ≦ γ_k, k ∈ Γ }   (Primal GMP)

where ≦ stands for either the inequality ≤ or the equality =. Let Γ_+ ⊆ Γ stand for the set of indices k for which the constraint is an inequality ≤. Problems involving the GMP arise in many applications (see Lasserre (2010)). For example, consider the problem of finding a global maximum of a function f over a set K. This is a particular instance of the GMP where Γ contains only one element k, and the constraint is ∫_K dµ = 1, stating that µ must be a probability measure on K. In classical optimization, without assuming concavity of f and convexity of K, it is not possible to compute the global maximum of f efficiently.

The GMP is an infinite-dimensional linear problem. Its dual is:

ρ′ = inf_{λ=(λ_k) ∈ R^Γ} { Σ_{k∈Γ} λ_k γ_k : (1) Σ_{k∈Γ} λ_k h_k(x) ≧ f(x), ∀x ∈ K; (2) λ_k ≥ 0, ∀k ∈ Γ_+ }   (Dual GMP)

Weak duality implies that when both problems are feasible, ρ ≤ ρ′. Moreover, if Slater's condition is satisfied (the interior of the set of constraints in the dual is nonempty), there is no duality gap (that is, ρ = ρ′) and the GMP has an optimal solution (the supremum is achieved). It may be shown that there is no duality gap when K is compact, f and h_k, k ∈ Γ, are continuous, and h_k > 0 on K for some k. Without additional assumptions, the GMP and its dual are very difficult to solve. Constraint (1) in the dual GMP requires that a certain function be nonnegative on K. This kind of constraint is manageable if one assumes polynomial structure.
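When K is replaced by a finite grid, a measure becomes a nonnegative weight vector and the primal GMP becomes an ordinary linear program. The sketch below (an illustration on a made-up univariate instance, not from the text) recovers the global maximum of f as the value of the single-constraint GMP described above.

```python
import numpy as np
from scipy.optimize import linprog

# Global maximization of f(x) = x^3 - 3x over K = [-1, 2] as a GMP:
# sup ∫ f dµ over probability measures, i.e. the single constraint ∫ dµ = 1.
grid = np.linspace(-1.0, 2.0, 2001)
f = grid**3 - 3 * grid

# After discretization a measure is a weight vector w >= 0 on the grid;
# linprog minimizes, so we minimize -∫ f dµ.
res = linprog(-f, A_eq=np.ones((1, grid.size)), b_eq=[1.0], bounds=(0, None))

# The optimal µ is a Dirac mass at an argmax of f: here f(-1) = f(2) = 2.
assert abs(-res.fun - 2.0) < 1e-9
```

The optimal weight vector is supported on the maximizers of f, which is the discrete analogue of the measure-theoretic statement above.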

5.1.2

Dual Relaxations

Let R[x] be the ring of real polynomials in the variables x = (x1, . . . , xn). A polynomial p ∈ R[x] is a sum of squares (in short, s.o.s.) if p(x) = Σ_{i∈I} p_i(x)² for some finite family of polynomials p_i. Denote by Σ[x] ⊂ R[x] the cone of polynomials that are sums of squares. For any two real symmetric matrices A, B, let ⟨A, B⟩ = trace(AB). For a multi-index α ∈ N^n, let |α| = Σ_{i=1}^n α_i, and let (x^α)_{α∈N^n} be the canonical basis of monomials. Denote by v_d(x) = (x^α)_{|α|≤d} the column vector of all monomials of degree at most d. It has dimension s(d) = C(n+d, d). These monomials form the canonical basis of the vector space R[x]_d of polynomials of degree at most d. Finally, let ‖x‖ denote the Euclidean norm of


x ∈ R^n, and let A^t be the transpose of a matrix A. An easy but important result follows: if a nonnegative polynomial has a sum-of-squares representation, it is possible to compute such a representation using semidefinite optimization methods.

Proposition 5.1 A polynomial p ∈ R[x]_{2d} has a sum-of-squares decomposition if and only if there exists a real symmetric positive semidefinite matrix Q ∈ R^{s(d)×s(d)} such that p(x) = v_d^t(x) Q v_d(x).

So, given an s.o.s. polynomial p ∈ R[x]_{2d}, the identity p(x) = v_d^t(x) Q v_d(x) for all x provides linear equations that the coefficients of the matrix Q must satisfy. Thus, if

v_d(x) v_d^t(x) = Σ_{α∈N^n} B_α x^α

for some real symmetric matrices B_α, then checking whether p(x) = Σ_α p_α x^α is an s.o.s. reduces to solving the following semidefinite optimization problem:

Find Q ∈ R^{s(d)×s(d)} such that Q = Q^t, Q ⪰ 0, ⟨Q, B_α⟩ = p_α, ∀α ∈ N^n.

Given that there exist efficient algorithms to solve such semidefinite optimization problems, if a polynomial admits a sum-of-squares representation, then this representation can be computed efficiently. For what class of semi-algebraic sets K is it easy to test whether a polynomial is positive? K ⊂ R^n is a basic semi-algebraic set if it is defined as

K := {x ∈ R^n : g_j(x) ≥ 0, j = 1, . . . , m},   (5.1)

for some polynomials {g_j} ⊂ R[x]. K satisfies Putinar's property if there exists u ∈ R[x] of the form

u = σ_0 + Σ_{j=1}^m σ_j g_j   (5.2)

for some s.o.s. polynomials (σ_j)_{j=0}^m ⊂ Σ[x] such that {x : u(x) ≥ 0} is compact. Obviously, Putinar's property implies compactness of K. Putinar's property holds if the level set {x : g_j(x) ≥ 0} is compact for some j, or if all g_j are affine and K is compact (in which case K is a polytope). In case it is not satisfied, and if for some M > 0, known in advance, ‖x‖² ≤ M whenever x ∈ K, then it suffices to add the redundant quadratic constraint g_{m+1}(x) := M − ‖x‖² ≥ 0 to the definition of K. The importance of Putinar's property comes from the following result.

Theorem 5.2 (Putinar (1993)) If K satisfies Putinar's property and p ∈ R[x] is strictly positive on K, then p can be written in the form of u in (5.2).


This leads, for each integer d ∈ N, to the following relaxation of the dual GMP:

ρ′_d = inf_{λ=(λ_k)∈R^Γ, {f_j ∈ Σ[x]}_{j=0}^m} { Σ_{k∈Γ} λ_k γ_k : (1) Σ_{k∈Γ} λ_k h_k − f = f_0 + Σ_{j=1}^m f_j g_j; (2) λ_k ≥ 0, ∀k ∈ Γ_+; (3) deg(f_j) ≤ 2(d − v_j), j = 0, . . . , m }   (Dual relaxation)

where, depending on the parity of g_j, 2v_j or 2v_j − 1 is the degree of g_j and d ≥ max_j v_j. This is a semidefinite program.

Example. Let p(x) = x1³ − x1² + 2x1x2 − x2² + x2³ and K := {x ∈ R² : g1(x) = x1 ≥ 0, g2(x) = x2 ≥ 0, g3(x) = x1 + x2 − 1 ≥ 0}. To check whether p ≥ 0 on K, let us try to write p = p0 + Σ_{j=1}^3 p_j g_j where each p_j ∈ Σ[x] has degree 2. That is, we want to find symmetric positive semidefinite matrices Q_j such that p_j = (1, x1, x2) Q_j (1, x1, x2)^t and p = p0 + Σ_{j=1}^3 p_j g_j. Solving the SDP feasibility problem gives

Q0 = 0,  Q1 = ( 0 0 0 ; 0 0 0 ; 0 0 1 ),  Q2 = ( 0 0 0 ; 0 1 0 ; 0 0 0 ),  Q3 = ( 0 0 0 ; 0 1 −1 ; 0 −1 1 ),

and so p(x) = x2² x1 + x1² x2 + (x1 − x2)²(x1 + x2 − 1). This proves that p ≥ 0 on K.
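The certificate in this example can be checked numerically. The sketch below verifies that each Q_j is positive semidefinite (so each p_j is a sum of squares) and that the identity p = p0 + Σ_j p_j g_j holds at random points.

```python
import numpy as np

rng = np.random.default_rng(0)

p = lambda x1, x2: x1**3 - x1**2 + 2 * x1 * x2 - x2**2 + x2**3
g = [lambda x1, x2: x1, lambda x1, x2: x2, lambda x1, x2: x1 + x2 - 1]

Q = [np.zeros((3, 3)),                                     # Q0 = 0
     np.array([[0, 0, 0], [0, 0, 0], [0, 0, 1.0]]),        # p1 = x2^2
     np.array([[0, 0, 0], [0, 1.0, 0], [0, 0, 0]]),        # p2 = x1^2
     np.array([[0, 0, 0], [0, 1.0, -1], [0, -1, 1.0]])]    # p3 = (x1 - x2)^2

# Each Q_j is PSD, so each p_j = v^t Q_j v is a sum of squares.
assert all(np.linalg.eigvalsh(Qj).min() >= -1e-12 for Qj in Q)

# Check the identity p = p0 + sum_j p_j g_j on random points.
for x1, x2 in rng.random((100, 2)):
    v = np.array([1.0, x1, x2])
    rhs = v @ Q[0] @ v + sum((v @ Q[j] @ v) * g[j - 1](x1, x2) for j in (1, 2, 3))
    assert abs(p(x1, x2) - rhs) < 1e-10
```

Since each weight p_j is a square and each g_j is nonnegative on K, the identity immediately certifies p ≥ 0 on K.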

5.1.3

Primal Relaxations

A polynomial p(x) = Σ_α p_α x^α ∈ R[x]_d can be written as p(x) = ⟨p, v_d(x)⟩, where p = (p_α) is its sequence of coefficients in the canonical monomial basis (x^α). Hence, we can identify a polynomial with its vector of coefficients. For y = (y_α), let L_y : R[x] → R be the linear functional

L_y : f = Σ_{α∈N^n} f_α x^α ↦ Σ_{α∈N^n} f_α y_α,  f ∈ R[x].

Given y, the moment matrix M_d(y) of order d associated with y has its rows and columns indexed by (x^α), and its (α, β)-entry is defined as

M_d(y)(α, β) = L_y(x^{α+β}) = y_{α+β},  |α|, |β| ≤ d.

Equivalently,

M_d(y) = L_y(v_d(x) v_d(x)^t) = Σ_{α∈N^n} y_α B_α

for real symmetric matrices (B_α) of appropriate dimensions. In fact, M_d(y) defines a bilinear form ⟨·, ·⟩_y on R[x]_d as follows:


⟨p, q⟩_y = L_y(pq) = p^t M_d(y) q,

where p and q denote the coefficient vectors of p and q, respectively. For example,

M_1(y) = ( y00 y10 y01 ; y10 y20 y11 ; y01 y11 y02 ).

Similarly, given y = (y_α) and θ ∈ R[x] (θ = Σ_γ θ_γ x^γ), the localizing matrix M_d(θ, y) of order d associated with y and θ has its rows and columns indexed by α, and its (α, β)-entry is defined by

M_d(θ, y)(α, β) = L_y(x^{α+β} θ(x)) = Σ_γ θ_γ y_{γ+α+β},  |α|, |β| ≤ d,

or, equivalently, M_d(θ, y) = Σ_{α∈N^n} y_α B_α^θ for real symmetric matrices (B_α^θ) of appropriate dimensions. In fact, M_d(θ, y) defines a bilinear form ⟨·, ·⟩_θ on R[x]_d as follows:

⟨p, q⟩_θ = L_y(θ p q) = p^t M_d(θ, y) q.

For example, if θ(x) = a − x1² − x2², then

M_1(θ, y) = ( a y00 − y20 − y02,  a y10 − y30 − y12,  a y01 − y21 − y03 ;
              a y10 − y30 − y12,  a y20 − y40 − y22,  a y11 − y31 − y13 ;
              a y01 − y21 − y03,  a y11 − y31 − y13,  a y02 − y22 − y04 ).

One says that y = (y_α) has a representing measure supported on K if there is some finite Borel measure µ on K such that

y_α = ∫_K x^α dµ(x),  ∀α ∈ N^n.
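For a concrete measure, the moment and localizing matrices can be assembled directly from its moments. The following sketch (an illustration with µ the uniform distribution on [0,1]² and θ(x) = 2 − x1² − x2², both my choices) builds M1(y) and M1(θ, y) and confirms that both are positive semidefinite, as they must be when y admits a representing measure supported on {θ ≥ 0}.

```python
import numpy as np

# Moments y_{ij} = ∫ x1^i x2^j dµ of µ = uniform on [0,1]^2: y_{ij} = 1/((i+1)(j+1)).
y = lambda i, j: 1.0 / ((i + 1) * (j + 1))

# Moment matrix M1(y), rows/columns indexed by the monomials (1, x1, x2).
M1 = np.array([[y(0, 0), y(1, 0), y(0, 1)],
               [y(1, 0), y(2, 0), y(1, 1)],
               [y(0, 1), y(1, 1), y(0, 2)]])

# Localizing matrix M1(θ, y) for θ(x) = a − x1² − x2² with a = 2, so that
# supp µ = [0,1]² ⊂ {θ ≥ 0}; entries follow the formula L_y(x^{α+β} θ).
a = 2.0
T1 = np.array([[a*y(0,0)-y(2,0)-y(0,2), a*y(1,0)-y(3,0)-y(1,2), a*y(0,1)-y(2,1)-y(0,3)],
               [a*y(1,0)-y(3,0)-y(1,2), a*y(2,0)-y(4,0)-y(2,2), a*y(1,1)-y(3,1)-y(1,3)],
               [a*y(0,1)-y(2,1)-y(0,3), a*y(1,1)-y(3,1)-y(1,3), a*y(0,2)-y(2,2)-y(0,4)]])

# Both matrices are Gram matrices of (1, x1, x2) w.r.t. dµ and θ dµ, hence PSD.
assert np.linalg.eigvalsh(M1).min() > 0
assert np.linalg.eigvalsh(T1).min() > 0
```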

Observe that if y admits such a representation measure µ then for every p : Z t p Md (y)p =Ly (pp) = p2 dµ

R and so Md (y)  0. Similarly, pt Md (θ, y)p = θp2 dµ and so Md (θ, y)  0 whenever µ has its support contained in {x : θ(x) ≥ 0}. The dual formulation of Putinar’s theorem is the following result.

Theorem 5.3 (Putinar (1993)) If K satisfies Putinar’s property then y = (yα ) admits


a representing measure on K if and only if

M_d(y) ⪰ 0,  M_d(g_j, y) ⪰ 0,  j = 1, . . . , m;  d = 0, 1, . . .   (5.3)

This leads to the following relaxation of the primal GMP:

ρ_d = sup_y { L_y(f) : L_y(h_k) ≦ γ_k, k ∈ Γ;  M_d(y) ⪰ 0;  M_{d−v_j}(g_j, y) ⪰ 0, j = 1, . . . , m }   (Relaxed primal)

where, depending on the parity, 2v_j or 2v_j − 1 is the degree of g_j and d ≥ max_j v_j. This is again a semidefinite program.

5.1.4

The Main Result

Theorem 5.4 (Lasserre (2010)) Let {f, (h_k), (g_j)} ⊂ R[x]. Suppose that h_k > 0 for some k, that K is a basic semi-algebraic set satisfying Putinar's property, and that ρ < +∞. Then:
(a) ρ = ρ′: there is no duality gap.
(b) ρ_d ↓ ρ and ρ′_d ↓ ρ′ as d goes to infinity.
(c) If the relaxed primal problem has a solution y and the rank condition

rank M_s(y) = rank M_{s−v}(y)

is satisfied for some s ≤ d, with v = max_j v_j, then ρ_d = ρ and the GMP has a finitely supported optimal solution.

To use this theorem in the various applications, one must check that its conditions hold.

5.2

Convexification Operator

Recall that the convex envelope of a real-valued bounded function f : R^n → R is the largest convex function on R^n majorized by f. Let D ⊂ R^n be compact, and denote by:
- K, the convex hull of D; by Carathéodory's theorem, K is convex and compact;
- C(D), the space of real-valued continuous functions on D;
- M(D), its topological dual, the space of finite signed Borel measures on D, endowed with the weak* topology;
- M_+(D) ⊂ M(D), the cone of positive Borel measures;
- ∆(D) ⊂ M_+(D), the set of Borel probability measures on D;


- for f in C(D), fe, its extension to R^n, that is,

fe(x) := f(x) on D, and +∞ on R^n \ D.   (5.4)

Note that fe is lower semi-continuous (l.s.c.) and so admits a minimum;

- for f in C(D), co(f), the convex envelope of fe, i.e. the greatest convex function smaller than fe.

Note that the vector spaces M(D) and C(D) are in duality via the pairing

⟨σ, f⟩ := ∫_D f dσ,  σ ∈ M(D), f ∈ C(D).

With f ∈ C(D) and x ∈ K = co(D), consider the infinite-dimensional linear program (LP)

LP_x :  inf_{σ ∈ M_+(D)} ⟨σ, f⟩  s.t.  ⟨σ, y_i⟩ = x_i, i = 1, . . . , n;  ⟨σ, 1⟩ = 1,   (5.5)

where y_i ∈ R[y] is the natural projection on the i-th variable, which associates to every x ∈ R^n the value y_i(x) = x_i. The optimal value is denoted by inf LP_x, and by min LP_x if the infimum is attained. Then the convex envelope co(f) of fe is given by

co(f)(x) = min LP_x for x ∈ K, and +∞ for x ∈ R^n \ K,

so that the domain of co(f) is K.

Consider the class of rational functions f on a compact semi-algebraic set D ⊂ R^n. In that case, it may be shown that the problem LP_x can be written as a special case of the generalized moment problem, and so we can provide an algorithm for computing convex and uniform approximations of the convex envelope co(f). More precisely:
- (a) We provide a sequence of convex functions {f_d} and show that it converges to co(f), uniformly on any compact subset of K on which co(f) is continuous.
- (b) We prove that, at each point x ∈ R^n, computing f_d(x) reduces to solving a semidefinite program (the relaxed primal).
- (c) For every x ∈ int K, the relaxed dual SDP is shown to be solvable, and any optimal solution provides an element of the subdifferential ∂f_d(x) at the point x ∈ int K.
- (d) For every basic semi-algebraic set D, we provide a monotone decreasing sequence of convex sets that converges to K. Checking whether x ∉ K reduces to solving finitely many SDPs.
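On a finite discretization of D, the program LP_x is an ordinary linear program over the weights of σ. The sketch below (an illustrative univariate instance of my own, solved with scipy's LP solver rather than the SDP machinery of this section) computes co(f)(x) this way.

```python
import numpy as np
from scipy.optimize import linprog

# D = grid on [-1.5, 1.5], f(y) = (y^2 - 1)^2; its convex envelope is 0 on
# [-1, 1] and coincides with f for |y| >= 1, where f is already convex.
D = np.linspace(-1.5, 1.5, 301)
f = (D**2 - 1) ** 2

def co_f(x):
    # LP_x: minimize <sigma, f> over sigma in M_+(D)
    # subject to <sigma, y> = x and <sigma, 1> = 1.
    A_eq = np.vstack([D, np.ones_like(D)])
    return linprog(f, A_eq=A_eq, b_eq=[x, 1.0], bounds=(0, None)).fun

assert abs(co_f(0.0)) < 1e-7                       # mixes Dirac masses at -1 and 1
assert abs(co_f(1.2) - (1.2**2 - 1) ** 2) < 1e-3   # envelope equals f near 1.2
```

The optimal σ is supported on at most n + 1 points of D, which is the LP counterpart of Carathéodory's theorem mentioned above.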


5.3


MinMax of Rational Functions and Applications

Let K ⊂ Rn be the basic semi-algebraic set K := {x ∈ Rn : gj (x) ≥ 0,

j = 1, . . . , p}

(5.6)

for polynomials (gj ) ⊂ R[x], and let fi = pi /qi be rational functions, i = 0, 1, . . . , m, with pi , qi ∈ R[x]. We assume that: • K satisfies Putinar’s property and, • qi > 0 on K for every i = 0, . . . , m. Consider the following minimization problem denoted by MRF (for maximum of rational functions): ρ := min{f0 (x) + max fi (x) : x ∈ K },

(5.7)

ρ = min{f0 (x) + z : z ≥ fi (x), i = 1, . . . , m, x ∈ K }.

(5.8)

MRF :

x

i=1,...,m

or, equivalently, MRF :

x,z

Let $\widehat K \subset \mathbb{R}^{n+1}$ be the basic semi-algebraic set
$$
\widehat K := \{(x,z) \in \mathbb{R}^n \times \mathbb{R} \ :\ x \in K,\ z\,q_i(x) - p_i(x) \ge 0,\ i = 1,\dots,m\} \tag{5.9}
$$
and consider the new infinite-dimensional optimization problem
$$
\mathbf{P}:\quad \hat\rho := \min_{\mu} \Big\{ \int_{\widehat K} (p_0 + z\,q_0)\, d\mu \ :\ \int_{\widehat K} q_0\, d\mu = 1,\ \mu \in M(\widehat K) \Big\}. \tag{5.10}
$$

Problem (5.10) is a particular instance of the GMP, so it can be solved by the Lasserre hierarchy of SDP relaxations. The next sections show that several solution concepts of static and dynamic finite games reduce to solving the MRF problem (5.7).
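Before any relaxation machinery, the MRF formulation (5.7) can be sanity-checked by brute force on a dense grid. A minimal sketch with made-up rational data (all functions below are our own illustrative choices, with each $q_i > 0$ on K):

```python
# Brute-force baseline for MRF (5.7): sample K densely and take the minimum of
# f0 + max_i fi. Purely illustrative; the text's method is the GMP/SDP approach.
import numpy as np

xs = np.linspace(-2.0, 2.0, 4001)        # K = [-2, 2]
f0 = xs**2 / (1.0 + xs**2)               # f0 = p0/q0 with q0 > 0 on K
f1 = (xs - 1.0) / (xs**2 + 2.0)          # f1 = p1/q1 with q1 > 0 on K
f2 = (1.0 - xs) / (xs**2 + 2.0)
rho = float(np.min(f0 + np.maximum(f1, f2)))
print(rho)                                # strictly between 0 and 1/2 here
```

For this data the objective equals 1/2 at both x = 0 and x = 1 and dips below 1/2 in between, so the grid minimum lands strictly inside (0, 1/2).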

5.3.1 Finite Games

A finite game is a tuple $(N, \{S^i\}_{i=1,\dots,N}, \{g^i\}_{i=1,\dots,N})$ where $N$ is the set (and, by abuse of notation, the number) of players, $S^i$ is the finite set of pure strategies of player $i$, and $g^i : S \to \mathbb{R}$ is the payoff function of player $i$, where $S := S^1 \times \dots \times S^N$. The set
$$
\Delta^i = \Big\{ \big(p^i(s^i)\big)_{s^i \in S^i} \ :\ p^i(s^i) \ge 0,\ \sum_{s^i \in S^i} p^i(s^i) = 1 \Big\}
$$
of probability distributions over $S^i$ is called the set of mixed strategies of player $i$. Notice that $\Delta^i$ is a compact basic semi-algebraic set. If each player $j$ chooses the mixed strategy $p^j(\cdot)$, the vector $p = (p^1,\dots,p^N) \in \Delta := \Delta^1 \times \dots \times \Delta^N$ is called a profile of mixed strategies, and the expected payoff of player $i$ is
$$
g^i(p) = \sum_{s=(s^1,\dots,s^N)\in S} p^1(s^1)\cdots p^i(s^i)\cdots p^N(s^N)\, g^i(s).
$$
This is just the multilinear extension of $g^i$. For a player $i$ and a profile $p$, let $p^{-i}$ be the profile of the players other than $i$: that is, $p^{-i} = (p^1,\dots,p^{i-1},p^{i+1},\dots,p^N)$. Let $S^{-i} = S^1 \times \dots \times S^{i-1} \times S^{i+1} \times \dots \times S^N$ and define
$$
g^i(s^i, p^{-i}) = \sum_{s^{-i} \in S^{-i}} p^1(s^1)\cdots p^{i-1}(s^{i-1})\, p^{i+1}(s^{i+1})\cdots p^N(s^N)\, g^i(s),
$$

where $s^{-i} := (s^1,\dots,s^{i-1},s^{i+1},\dots,s^N) \in S^{-i}$. A profile $p_0$ is a Nash equilibrium if and only if, for all $i = 1,\dots,N$ and all $s^i \in S^i$, $g^i(p_0) \ge g^i(s^i, p_0^{-i})$, or equivalently if
$$
p_0 \in \arg\min_{p \in \Delta} \ \max_{i=1,\dots,N} \ \max_{s^i \in S^i} \ \big( g^i(s^i, p^{-i}) - g^i(p) \big). \tag{5.11}
$$
Since each finite game admits at least one Nash equilibrium, the optimal value of the min-max problem (5.11) is zero. Notice that (5.11) is a particular instance of the MRF problem (5.7) (with $K = \Delta$, which obviously satisfies Putinar's property, and with $q_i = 1$ for every $i = 0,\dots,m$). Hence, by solving a hierarchy of SDP relaxations, one can approximate the value of the min-max problem as closely as desired and always extract an approximate Nash equilibrium. In addition, if the rank condition is satisfied at some relaxation, then one may extract all Nash equilibria of the game. In all our random numerical simulations the rank condition is satisfied, which suggests that it holds generically. Another important concept in game theory is the min-max payoff $v^i$, which plays an important role in repeated games:
$$
v^i = \min_{p^{-i} \in \Delta^{-i}} \ \max_{s^i \in S^i} \ g^i(s^i, p^{-i}),
$$
where $\Delta^{-i} = \Delta^1 \times \dots \times \Delta^{i-1} \times \Delta^{i+1} \times \dots \times \Delta^N$. This is again a particular instance of the MRF problem.
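For two players, the min-max payoff $v^1$ is an ordinary linear program (no SDP needed), which also gives a cheap consistency check for the MRF reduction. A hedged sketch; matching pennies is our illustrative game:

```python
# v = min over player 2's mixed strategies q of max_{s1} (G q)_{s1},
# written as the LP: minimize t s.t. G q <= t 1, q in the simplex.
import numpy as np
from scipy.optimize import linprog

def minmax_payoff(G):
    m, n = G.shape
    c = np.r_[np.zeros(n), 1.0]                   # variables (q, t); minimize t
    A_ub = np.c_[G, -np.ones(m)]                  # (G q)_{s1} - t <= 0
    A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)  # sum(q) = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.fun

G = np.array([[1.0, -1.0], [-1.0, 1.0]])          # matching pennies
print(minmax_payoff(G))                           # 0: player 2 mixes (1/2, 1/2)
```

With three or more players the minimized expression is multilinear in the opponents' strategies, and this LP shortcut no longer applies; that is where the MRF/SDP machinery matters.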

5.3.2 Loomis Games

Loomis (1946) extended the min-max theorem to zero-sum games whose payoff is a ratio of two multilinear functions. His model may be extended to N-player games as follows; our extension is justified by the next section. Associated with each player $i \in N$ are two functions $g^i : S \to \mathbb{R}$ and $f^i : S \to \mathbb{R}$, where $f^i > 0$ and $S := S^1 \times \dots \times S^N$. With notation as in the last section, let their


multilinear extensions to $\Delta$ still be denoted by $g^i$ and $f^i$. That is, for $p \in \Delta$, let
$$
g^i(p) = \sum_{s=(s^1,\dots,s^N)\in S} p^1(s^1)\cdots p^i(s^i)\cdots p^N(s^N)\, g^i(s),
$$

and similarly for $f^i$.

Definition 5.5 A Loomis game is defined as follows. The strategy set of player $i$ is $\Delta^i$ and, if the profile $p \in \Delta$ is chosen, his payoff is $h^i(p) = \frac{g^i(p)}{f^i(p)}$.

We show the following new lemma.

Lemma 5.6 A Loomis game admits a Nash equilibrium.

Proof. Note that each payoff function is quasi-concave in $p^i$ (and also quasi-convex, so that it is quasi-linear). Indeed, if $h^i(p_1^i, p^{-i}) \ge \alpha$ and $h^i(p_2^i, p^{-i}) \ge \alpha$ then, for any $\delta \in [0,1]$, $g^i(\delta p_1^i + (1-\delta)p_2^i, p^{-i}) \ge f^i(\delta p_1^i + (1-\delta)p_2^i, p^{-i})\,\alpha$, so that $h^i(\delta p_1^i + (1-\delta)p_2^i, p^{-i}) \ge \alpha$. One can now apply Glicksberg's (1952) theorem, because the strategy sets are compact and convex and the payoff functions are quasi-concave and continuous.

Corollary 5.7 A profile $p_0 \in \Delta$ is a Nash equilibrium of a Loomis game if and only if
$$
p_0 \in \arg\min_{p \in \Delta} \ \max_{i=1,\dots,N} \ \max_{s^i \in S^i} \ \big( h^i(s^i, p^{-i}) - h^i(p) \big). \tag{5.12}
$$

This is a particular instance of the MRF problem.
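The quasi-linearity argument in the proof of Lemma 5.6 can be checked numerically: for a ratio $h^i = g^i/f^i$ of multilinear functions with $f^i > 0$, the value at any mixture of two strategies of player $i$ lies between the values at the endpoints. A small sketch on random two-player instances (all data random, purely illustrative):

```python
# Each instance: 3x3 payoff matrices Gm (for g) and Fm > 0 (for f), a fixed
# opponent mixture q, and two mixed strategies p1, p2 for player 1.
import numpy as np

def quasilinear_holds(seed):
    rng = np.random.default_rng(seed)
    Gm = rng.normal(size=(3, 3))             # g^1(s^1, s^2)
    Fm = rng.uniform(0.5, 2.0, size=(3, 3))  # f^1 > 0 entrywise
    q = rng.dirichlet(np.ones(3))            # opponent's mixed strategy
    p1, p2 = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
    h = lambda p: (p @ Gm @ q) / (p @ Fm @ q)
    d = rng.uniform()
    hm = h(d * p1 + (1 - d) * p2)            # value at the mixture
    return min(h(p1), h(p2)) - 1e-9 <= hm <= max(h(p1), h(p2)) + 1e-9

assert all(quasilinear_holds(s) for s in range(100))
print("quasi-linearity verified on 100 random instances")
```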

5.3.3 Absorbing Games

This subclass of stochastic games was introduced by Kohlberg (1974). The following formulas are established in Laraki (2010); they show that absorbing games can be reduced to Loomis games. An N-player finite absorbing game is defined as follows. As above, there are N finite sets $(S^1,\dots,S^N)$. Two functions $g^i : S \to \mathbb{R}$ and $f^i : S \to \mathbb{R}$ are associated to each player $i \in \{1,\dots,N\}$, together with a probability transition function $q : S \to [0,1]$. The game is played in discrete time as follows. At each stage $t = 1, 2, \dots$, if the game has not reached an absorbing state before that stage, each player $i$ chooses (simultaneously) at random an action $s^i_t \in S^i$. If the profile $s_t = (s^1_t,\dots,s^N_t)$ is chosen, then:
(i) the payoff of player $i$ at stage $t$ is $g^i(s_t)$;
(ii) with probability $1 - q(s_t)$ the game is terminated (absorption has occurred) and each player $i$ gets the payoff $f^i(s_t)$ at every subsequent stage; and


(iii) with probability $q(s_t)$ the game continues (the situation is repeated at stage $t+1$).

Consider the $\lambda$-discounted game $G(\lambda)$ ($0 < \lambda < 1$). If the payoff of player $i$ at stage $t$ is $r^i(t)$, then his $\lambda$-discounted payoff in the game is $\sum_{t=1}^{\infty} \lambda (1-\lambda)^{t-1} r^i(t)$. Hence, each player optimizes his expected $\lambda$-discounted payoff.

Let $\tilde g^i = g^i \cdot q$ and $\tilde f^i = f^i \cdot (1-q)$, and extend $\tilde g^i$, $\tilde f^i$ and $q$ multilinearly to $\Delta$ (as above for Nash and Loomis games). A profile $p \in \Delta$ is a stationary equilibrium of the absorbing game if (1) each player $i$ plays the i.i.d. mixed strategy $p^i$ at each stage $t$ until the game reaches an absorbing state, and (2) this is optimal for him in the discounted absorbing game when the other players do not deviate.

It may be shown that a profile $p_0 \in \Delta$ is a stationary equilibrium of the absorbing game if and only if it is a Nash equilibrium of the Loomis game with payoff functions
$$
p \mapsto \frac{\lambda \tilde g^i(p) + (1-\lambda)\tilde f^i(p)}{\lambda q(p) + (1 - q(p))}, \qquad i = 1,\dots,N.
$$
It may also be shown that the min-max payoff of a discounted absorbing game satisfies
$$
v^i = \min_{p^{-i} \in \Delta^{-i}} \ \max_{s^i \in S^i} \ \frac{\lambda \tilde g^i(s^i, p^{-i}) + (1-\lambda)\tilde f^i(s^i, p^{-i})}{\lambda q(s^i, p^{-i}) + (1 - q(s^i, p^{-i}))}.
$$
Hence solving a discounted absorbing game is equivalent to solving a Loomis game.
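A numerical sanity check of the Loomis-type ratio, for a single player and illustrative numbers. Under our reading of the normalization implied by $\tilde g = g \cdot q$ and $\tilde f = f \cdot (1-q)$, the discounted payoff of a stationary strategy $p$ is the fixed point of $v \mapsto \lambda \tilde g(p) + (1-\lambda)(q(p)v + \tilde f(p))$, and that fixed point is exactly the ratio above:

```python
# One player, made-up data: compare value iteration for the recursion
# v = lam*gt + (1-lam)*(qp*v + ft) with the closed form
# (lam*gt + (1-lam)*ft) / (lam*qp + 1 - qp) from the text.
import numpy as np

g = np.array([1.0, 0.4]); f = np.array([0.2, 0.9]); q = np.array([0.9, 0.5])
p = np.array([0.7, 0.3]); lam = 0.1
gt = p @ (g * q)            # multilinear extension of g~ = g*q
ft = p @ (f * (1 - q))      # multilinear extension of f~ = f*(1-q)
qp = p @ q

v = 0.0
for _ in range(2000):       # contraction with factor (1-lam)*qp < 1
    v = lam * gt + (1 - lam) * (qp * v + ft)
closed = (lam * gt + (1 - lam) * ft) / (lam * qp + 1 - qp)
print(abs(v - closed))      # essentially 0
```

Note that the denominator $\lambda q(p) + 1 - q(p)$ equals $1 - (1-\lambda)q(p)$, the normalizing factor of the recursion.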

5.4 Generalized Polynomial Games

In this section, the generalized moment problem approach is extended to two-player zero-sum polynomial games. This model was proposed by Dresher, Karlin and Shapley (1950) in the single-variate case, and we extend it here to the multi-variate case. Let $K_1 \subset \mathbb{R}^{n_1}$ and $K_2 \subset \mathbb{R}^{n_2}$ be two closed basic semi-algebraic sets:
$$
K_1 := \{x \in \mathbb{R}^{n_1} : g_j(x) \ge 0,\ j = 1,\dots,m_1\} \tag{5.13}
$$
$$
K_2 := \{z \in \mathbb{R}^{n_2} : h_k(z) \ge 0,\ k = 1,\dots,m_2\} \tag{5.14}
$$
for some polynomials $(g_j) \subset \mathbb{R}[x_1,\dots,x_{n_1}]$ and $(h_k) \subset \mathbb{R}[z_1,\dots,z_{n_2}]$. Let $\Delta(K_i)$ be the set of Borel probability measures on $K_i$, $i = 1,2$, and consider the following min-max problem:
$$
\mathbf{P}:\quad J^* = \min_{\mu \in \Delta(K_1)} \ \max_{\nu \in \Delta(K_2)} \ \int_{K_1}\!\int_{K_2} p(x,z)\, d\mu(x)\, d\nu(z) \tag{5.15}
$$
for some polynomial $p \in \mathbb{R}[x,z]$.


If $K_1$ and $K_2$ are compact, it is well known from Sion's (1958) minmax theorem that
$$
J^* = \min_{\mu \in \Delta(K_1)} \ \max_{\nu \in \Delta(K_2)} \ \int_{K_1}\!\int_{K_2} p(x,z)\, d\mu(x)\, d\nu(z)
    = \max_{\nu \in \Delta(K_2)} \ \min_{\mu \in \Delta(K_1)} \ \int_{K_1}\!\int_{K_2} p(x,z)\, d\nu(z)\, d\mu(x). \tag{5.16}
$$

Let $\mu^*$ and $\nu^*$ be optimal strategies of players 1 and 2, respectively. For the single-variate case $n_1 = n_2 = 1$, Parrilo (2006) showed that $J^*$ is the optimal value of a single semidefinite program. We provide below an extension to the multi-variate case. The price to pay for this extension is that the single semidefinite program must be replaced with a hierarchy of semidefinite programs of increasing size. Moreover, contrary to the polynomial optimization case as described for the GMP, proving convergence of this hierarchy is more delicate, because one has (simultaneously, in the same SDP) moment matrices of increasing size and an s.o.s. representation of some polynomial in Putinar's form (5.2) with increasing degree bounds on the s.o.s. weights. In particular, the convergence is no longer monotone. Setting $n_1 = n_2 = 1$ in this extension, one retrieves the original result of Parrilo (2006).

With $p \in \mathbb{R}[x,z]$ as in (5.15), write
$$
p(x,z) = \sum_{\alpha \in \mathbb{N}^{n_2}} p_\alpha(x)\, z^\alpha \quad\text{with}\quad p_\alpha(x) = \sum_{\beta \in \mathbb{N}^{n_1}} p_{\alpha\beta}\, x^\beta, \qquad |\alpha| \le d_z, \tag{5.17}
$$
where $d_z$ is the total degree of $p$ regarded as a polynomial in $\mathbb{R}[z]$; accordingly, set $p_{\alpha\beta} := 0$ for every $\beta \in \mathbb{N}^{n_1}$ whenever $|\alpha| > d_z$. Let $r_j := \lceil \deg g_j / 2 \rceil$ for every $j = 1,\dots,m_1$, and consider the following semidefinite program:
$$
\begin{array}{rl}
\min\limits_{y,\lambda,Z^k} & \lambda\\[4pt]
\text{s.t.} & \lambda\, 1_{\{\alpha=0\}} - \sum\limits_{\beta \in \mathbb{N}^{n_1}} p_{\alpha\beta}\, y_\beta = \langle Z^0, B_\alpha\rangle + \sum\limits_{k=1}^{m_2} \langle Z^k, B^{h_k}_\alpha\rangle, \quad |\alpha| \le 2d\\[4pt]
& M_d(y) \succeq 0;\quad M_{d-r_j}(g_j, y) \succeq 0,\ j = 1,\dots,m_1;\quad y_0 = 1\\[4pt]
& Z^k \succeq 0,\quad k = 0,1,\dots,m_2,
\end{array} \tag{5.18}
$$
where $y$ is a finite sequence indexed in the canonical basis $(x^\alpha)$ of $\mathbb{R}[x]_{2d}$. Denote by $\lambda_d^*$


its optimal value. In fact, with $h_0 \equiv 1$ and $p_y \in \mathbb{R}[z]$ defined by
$$
z \mapsto p_y(z) := \sum_{\alpha \in \mathbb{N}^{n_2}} \Big( \sum_{\beta \in \mathbb{N}^{n_1}} p_{\alpha\beta}\, y_\beta \Big) z^\alpha, \tag{5.19}
$$
the semidefinite program (5.18) has the equivalent formulation:
$$
\begin{array}{rl}
\min\limits_{y,\lambda,\sigma_k} & \lambda\\[4pt]
\text{s.t.} & \lambda - p_y(\cdot) = \sum\limits_{k=0}^{m_2} \sigma_k h_k\\[4pt]
& M_d(y) \succeq 0;\quad M_{d-r_j}(g_j, y) \succeq 0,\ j = 1,\dots,m_1;\quad y_0 = 1\\[4pt]
& \sigma_k \in \Sigma[z],\ \deg \sigma_k + \deg h_k \le 2d,\quad k = 0,1,\dots,m_2,
\end{array} \tag{5.20}
$$
where the first constraint should be understood as an equality of polynomials. Observe that for any admissible solution $(y,\lambda)$ and $p_y$ as in (5.19),
$$
\lambda \ \ge\ \max_z \{\, p_y(z) : z \in K_2 \,\}. \tag{5.21}
$$

Similarly, with $p$ as in (5.15), write
$$
p(x,z) = \sum_{\alpha \in \mathbb{N}^{n_1}} \hat p_\alpha(z)\, x^\alpha \quad\text{with}\quad \hat p_\alpha(z) = \sum_{\beta \in \mathbb{N}^{n_2}} \hat p_{\alpha\beta}\, z^\beta, \qquad |\alpha| \le d_x, \tag{5.22}
$$
where $d_x$ is the total degree of $p$ seen as a polynomial in $\mathbb{R}[x]$; accordingly, set $\hat p_{\alpha\beta} := 0$ for every $\beta \in \mathbb{N}^{n_2}$ whenever $|\alpha| > d_x$.

Let $l_k := \lceil \deg h_k / 2 \rceil$ for every $k = 1,\dots,m_2$, and with
$$
x \mapsto \hat p_y(x) := \sum_{\alpha \in \mathbb{N}^{n_1}} \Big( \sum_{\beta \in \mathbb{N}^{n_2}} \hat p_{\alpha\beta}\, y_\beta \Big) x^\alpha, \tag{5.23}
$$
consider the following semidefinite program (with $g_0 \equiv 1$):


$$
\begin{array}{rl}
\max\limits_{y,\gamma,\sigma_j} & \gamma\\[4pt]
\text{s.t.} & \hat p_y(\cdot) - \gamma = \sum\limits_{j=0}^{m_1} \sigma_j g_j\\[4pt]
& M_d(y) \succeq 0;\quad M_{d-l_k}(h_k, y) \succeq 0,\ k = 1,\dots,m_2;\quad y_0 = 1\\[4pt]
& \sigma_j \in \Sigma[x],\ \deg \sigma_j + \deg g_j \le 2d,\quad j = 0,1,\dots,m_1,
\end{array} \tag{5.24}
$$
where $y$ is a finite sequence indexed in the canonical basis $(z^\alpha)$ of $\mathbb{R}[z]_{2d}$. Denote by $\gamma_d^*$ its optimal value. In fact, (5.24) is the dual of the semidefinite program (5.18). Observe that for any admissible solution $(y,\gamma)$ and $\hat p_y$ as in (5.23),
$$
\gamma \ \le\ \min_x \{\, \hat p_y(x) : x \in K_1 \,\}. \tag{5.25}
$$

Theorem 5.8 Let $\mathbf{P}$ be the min-max problem defined in (5.15), and assume that $K_1$ and $K_2$ satisfy Putinar's property. Let $\lambda_d^*$ and $\gamma_d^*$ be the optimal values of the semidefinite programs (5.20) and (5.24), respectively. Then $\lambda_d^* \to J^*$ and $\gamma_d^* \to J^*$ as $d \to \infty$.

We also have a rank condition to detect whether finite convergence has occurred.

Theorem 5.9 Let $\mathbf{P}$ be the min-max problem defined in (5.15). Let $\lambda_d^*$ be the optimal value of the semidefinite program (5.20), and suppose that, with $r := \max_{j=1,\dots,m_1} r_j$, the condition
$$
\operatorname{rank} M_{d-r}(y) = \operatorname{rank} M_d(y) \ (=: s_1) \tag{5.26}
$$
holds at an optimal solution $(y, \lambda, \sigma_k)$ of (5.20). Let $\gamma_t^*$ be the optimal value of the semidefinite program (5.24), and suppose that, with $r' := \max_{k=1,\dots,m_2} l_k$, the condition
$$
\operatorname{rank} M_{t-r'}(y') = \operatorname{rank} M_t(y') \ (=: s_2) \tag{5.27}
$$
holds at an optimal solution $(y', \gamma, \sigma_j)$ of (5.24). If $\lambda_d^* = \gamma_t^*$, then $\lambda_d^* = \gamma_t^* = J^*$, and an optimal strategy for player 1 (resp. player 2) is a probability measure $\mu^*$ (resp. $\nu^*$) supported on $s_1$ points of $K_1$ (resp. $s_2$ points of $K_2$).
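The SDP hierarchy of Theorems 5.8-5.9 is the method of the text; as a crude independent baseline (our own illustration), one can discretize $K_1$ and $K_2$ so that (5.15) becomes a finite matrix game solvable by LP. For $p(x,z) = (x-z)^2$ on $[-1,1]^2$, the value is $J^* = 1$: player 1 plays the Dirac measure at 0 and player 2 mixes equally on $\pm 1$.

```python
# Grid approximation of the polynomial game (5.15) as a finite matrix game.
import numpy as np
from scipy.optimize import linprog

xs = np.linspace(-1.0, 1.0, 41)                  # grid for K1 = K2 = [-1, 1]
A = (xs[:, None] - xs[None, :])**2               # A[i, j] = p(x_i, z_j)

m, n = A.shape
c = np.r_[np.zeros(m), 1.0]                      # variables (mu, t); minimize t
A_ub = np.c_[A.T, -np.ones(n)]                   # (A^T mu)_j <= t for each z_j
A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)     # mu is a probability vector
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)])
print(res.fun)                                   # close to J* = 1
```

Since the grid contains 0 and ±1, the discretized value is exactly 1 here; in general a grid only brackets $J^*$, with no certificate of convergence, which is precisely what the SDP hierarchy provides.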

Bibliography

[1] Borgs C., J. Chayes, N. Immorlica, A.T. Kalai, V. Mirrokni and C. Papadimitriou (2009). The Myth of the Folk Theorem. To appear in Games and Economic Behavior.


[2] Brighi B. and M. Chipot (1994). Approximated Convex Envelope of a Function. SIAM J. Num. Anal. 31, 128-148.
[3] Choquet G. (1969). Integration and Topological Vector Spaces, Lectures on Analysis, Volume 1, W. A. Benjamin, Inc.
[4] Daskalakis C., P.W. Goldberg and C.H. Papadimitriou (2009). The Complexity of Computing a Nash Equilibrium. To appear in SIAM J. Comput.
[5] Chen X. and X. Deng (2009). Settling the Complexity of Two-Player Nash Equilibrium. To appear in J. ACM.
[6] Dresher M., S. Karlin and L.S. Shapley (1950). Polynomial Games, in Contributions to the Theory of Games, Annals of Mathematics Studies 24, Princeton University Press, pp. 161-180.
[7] Glicksberg I. (1952). A Further Generalization of the Kakutani Fixed Point Theorem with Applications to Nash Equilibrium Points. Proc. Amer. Math. Soc. 3, 170-174.
[8] Herings P.J.-J. and R. Peeters (2009). Homotopy Methods to Compute Equilibria in Game Theory. To appear in Economic Theory.
[9] Kohlberg E. (1974). Repeated Games with Absorbing States. Ann. Stat. 2, 724-738.
[10] Laraki R. and J.-B. Lasserre (2008). Computing Uniform Convex Approximations for Convex Envelopes and Convex Hulls. Journal of Convex Analysis, 3, 635-654.
[11] Laraki R. and J.-B. Lasserre (2010). Semidefinite Programming for Min-Max Problems and Games. Mathematical Programming A. Published online.
[12] Laraki R. (2010). Explicit Formulas for Repeated Games with Absorbing States. International Journal of Game Theory, 39, 53-69.
[13] Lasserre J.B. (2001). Global Optimization with Polynomials and the Problem of Moments. SIAM J. Optim. 11, 796-817.
[14] Lasserre J.B. (2008). A Semidefinite Programming Approach to the Generalized Problem of Moments. Math. Program. B 112, 65-92.
[15] Lasserre J.B. (2009). Moments and Sums of Squares for Polynomial Optimization and Related Problems. J. Global Optim. 45, 39-61.
[16] Lasserre J.B. (2010). Moments, Positive Polynomials and Their Applications. Imperial College Press, London.
[17] Lemke C.E. and J.T. Howson (1964). Equilibrium Points of Bimatrix Games. J. SIAM, 12, 413-423.


[18] Loomis L.H. (1946). On a Theorem of von Neumann. Proc. Nat. Acad. Sci. 32, 213-215.
[19] Nash J.F. (1950). Equilibrium Points in N-Person Games. Proc. Nat. Acad. Sci., 36, 48-49.
[20] Parrilo P.A. (2006). Polynomial Games and Sum of Squares Optimization. Proceedings of the 45th IEEE Conference on Decision and Control, 2855-2860.
[21] Parrilo P. and B. Sturmfels (2003). Minimizing Polynomial Functions. Proceedings of the DIMACS Workshop on Algorithmic and Quantitative Aspects of Real Algebraic Geometry in Mathematics and Computer Science (March 2001), (eds. S. Basu and L. Gonzalez-Vega), American Mathematical Society, pp. 83-100.
[22] Putinar M. (1993). Positive Polynomials on Compact Semi-Algebraic Sets. Ind. Univ. Math. J. 42, 969-984.
[23] Rockafellar R.T. (1970). Convex Analysis. Princeton University Press, Princeton, New Jersey.
[24] Savani R. and B. von Stengel (2006). Hard-to-Solve Bimatrix Games. Econometrica, 74, 397-429.
[25] Sion M. (1958). On General Minmax Theorems. Pacific Journal of Mathematics, 8, 171-176.

Chapter 6

Strategic Rationality: New Concepts

Some games admit too many Nash equilibria, and one would like stability criteria to select among them. For instance, many game theorists have suggested that only the equilibria that survive a process of iterated elimination of weakly dominated strategies should be considered. Others have suggested requiring robustness against deviations by coalitions rather than by single players. On the other hand, games such as undiscounted stochastic games, Bertrand competition, or timing games are discontinuous and may fail to admit a Nash equilibrium. In that case, one looks for approximate equilibria by allowing ǫ-optimality rather than exact optimality. Thus three questions arise naturally: (1) when does a discontinuous game admit an approximate equilibrium? (2) what is the maximal set of strategies that may be iteratively eliminated in a game without contradicting the basic requirements of strategic rationality? (3) how can the notions of Nash equilibrium and strong equilibrium be extended, and under which conditions do the new equilibrium concepts exist? This chapter proposes answers to these questions. The first section investigates some relaxed equilibrium notions, their existence in discontinuous games, and their relation to the existing literature. In the second section, I define a tight procedure of iterated elimination of dominated strategies in any finite normal-form game that preserves at least one backward-induction equilibrium of any underlying extensive-form game. In the last section, I introduce the concept of coalitional equilibrium and provide conditions for its existence. This chapter discusses work in progress.¹ The first section is based on the paper by Bich and Laraki (2011), the second on Laraki (2011), and the third on Laraki (2009).

6.1 Relaxed Equilibria in Discontinuous Games

Nash equilibrium is perhaps the most important solution concept in game theory. It has been applied in a large variety of domains, including economics, political science, computer science, and biology. For its existence, Glicksberg's theorem requires the payoff functions to be continuous. However, in many applications (auctions, pre-emption games, wars of attrition, or symmetric Bertrand competition games) payoffs are discontinuous, and yet a mixed or pure Nash equilibrium still exists. This led Reny (1999) to investigate conditions under which a discontinuous game admits a Nash equilibrium. His seminal paper has inspired a large and growing literature: Barelli and Soza (2009), Bich (2010), Carmona (2009, 2010), McLennan et al. (2009), Reny (2009, 2010). On the other hand, there are many classes of games in which a Nash equilibrium fails to exist but an approximate equilibrium (i.e. a limit point of ǫ-equilibria as ǫ > 0 goes to zero) does exist. This is the case for two-player stochastic games (Vieille (2000a, b)) and for asymmetric Bertrand competition games in pure strategies. However, only very few papers provide interesting conditions for the existence of an approximate equilibrium (see for example Prokopovych (2008)). More importantly still, even approximate equilibria fail to exist in many usual classes of games, such as three-player timing games (Laraki, Solan and Vieille (2005), see Chapter 3) and two-player Hotelling games. In fact, the existence question is an open problem for three-player undiscounted stochastic games (a well-known specialist conjectures nonexistence). Consequently, it is desirable to define relaxed concepts of equilibrium which exist in any discontinuous game. The main result of this section is the existence of a quasi equilibrium for every game with compact and convex strategy sets, without any assumption on the payoffs. Roughly speaking, if the payoffs are quasiconcave, then there is a strategy profile x together with a limit payoff vector u such that no player i can deviate to obtain a payoff strictly above ui, even if the other players slightly perturb their strategies.

¹ Working paper versions are available on demand.
Moreover, it is shown that almost all existing results in the literature may be derived easily from our general existence result. The main idea is illustrated by three examples below.

Example 1. Consider a one-player game where X = [0, 1] and u : [0, 1] → R is defined as follows: u(x) = 2x for every x ∈ [0, 1[ and u(1) = 0. Our relaxed equilibrium concept provides the output (x = 1, u = 2), which is interpreted as follows: a rational player will play some strategy close to x = 1 and at equilibrium will obtain a payoff close to u = 2.

Example 2. In a Bertrand duopoly with two firms i = 1, 2 choosing prices pi ∈ [0, a] (a > 0), with linear demand a − min(p1, p2) and marginal costs c1 < c2 < (a + c1)/2, there is no pure Nash equilibrium (assuming that the firm that charges the lower price supplies the entire market). Our relaxed equilibrium concept provides the output (p1 = c2, p2 = c2, u1 = (a − c2)(c2 − c1), u2 = 0). This corresponds to the situation where firm 1 obtains the whole market by choosing p1 slightly below c2, with a payoff close to (a − c2)(c2 − c1), and firm 2 choosing p2 = c2 with a zero payoff. This may be related to two other methods which have been proposed to circumvent the non-existence of a Nash equilibrium in this game. The first consists in assuming that there is a minimal monetary unit δ > 0 (so that the strategy sets are finite); then the strategy profile (c2 − δ, c2) is a Nash equilibrium of this discretized game for δ > 0 small enough, associated with the payoff vector ((c2 − δ − c1)(a − c2 + δ), 0). The second consists in changing the sharing rule when p1 = p2, by assuming that firm 2 does not produce and that firm 1 gets all the demand: in this case, (c2, c2) is a Nash equilibrium of the modified game with the associated payoff vector ((a − c2)(c2 − c1), 0).

Example 3. Consider the following location game (the California-Oregon psychologists game) from Simon and Zame (1990). The interval [0, 4] represents an interstate highway. The strategy space of player 1, the psychologist from California, is X = [0, 3] (the Californian stretch of the highway), and the strategy space of player 2 (the psychologist from Oregon) is Y = [3, 4] (the Oregon part of the highway). The payoff function of player 1 is u1(x, y) = (x + y)/2 if x < y and u1(3, 3) = 2, and the payoff function of player 2 is u2(x, y) = 4 − u1(x, y). This game is discontinuous and has no Nash equilibrium, but (3, 3) is an approximate equilibrium. Simon and Zame have shown how one can restore the existence of a Nash equilibrium by a new (endogenous) sharing rule: applied to this example, it says that if one modifies the payoffs at (3, 3), giving 3 to player 1 and 1 to player 2 (instead of 2 to each), then (3, 3) is a Nash equilibrium. For this game, our relaxed equilibrium concept provides exactly the equilibrium of Simon and Zame. Moreover, as far as we know, no general existence theorem for an approximate equilibrium can be applied to such a game (for example, Prokopovych (2008) cannot be applied, because this game is not payoff secure). As an application of our main existence result, we provide a sufficient condition for the existence of an approximate equilibrium that can be applied to the third example.
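The discretization remedy in Example 2 is easy to verify by brute force. The sketch below assumes a 50/50 tie-splitting rule (the text leaves the tie rule open) and checks that (c2 − δ, c2) is a Nash equilibrium of the finite game; the numbers and the binary-exact δ are our own choices.

```python
# Finite Bertrand duopoly with a minimal monetary unit delta.
import numpy as np

a, c1, c2, delta = 10.0, 2.0, 4.0, 0.25     # c1 < c2 < (a + c1)/2 holds
prices = np.arange(0.0, a + delta, delta)   # finite strategy set, step delta

def profit(i, p1, p2):
    p, c = (p1, c1) if i == 1 else (p2, c2)
    lower = p1 < p2 if i == 1 else p2 < p1
    if lower:
        return (p - c) * (a - p)            # lower price serves the whole market
    if p1 == p2:
        return (p - c) * (a - p) / 2.0      # tie: split demand (our assumption)
    return 0.0                              # higher price sells nothing

p1s, p2s = c2 - delta, c2
best1 = max(profit(1, p, p2s) for p in prices)
best2 = max(profit(2, p1s, p) for p in prices)
print(best1 == profit(1, p1s, p2s), best2 == profit(2, p1s, p2s))  # True True
```

Firm 1's best reply against p2 = c2 is indeed the grid price just below c2, with payoff (c2 − δ − c1)(a − c2 + δ); firm 2 can do no better than zero.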

6.1.1 Model

We consider games in strategic form with a finite set of players N (which also denotes the cardinality of N). The pure strategy set $X_i$ of each player $i \in N$ is a non-empty, compact and convex subset of a (not necessarily Hausdorff) topological vector space. Each player $i \in N$ has a bounded payoff function $u_i : X = \Pi_{i\in N} X_i \to \mathbb{R}$. A game G is a pair $G = ((X_i)_{i\in N}, (u_i)_{i\in N})$; a game G satisfying the above assumptions is called a compact game. For every $x \in X$ and every $i \in N$, we let $x_{-i} = (x_j)_{j\ne i}$ and $X_{-i} = \Pi_{j\ne i} X_j$. For every $x$ (resp. $x_i$, $x_{-i}$) in $X$ (resp. $X_i$, $X_{-i}$), let $\mathcal{V}(x)$ (resp. $\mathcal{V}(x_i)$, $\mathcal{V}(x_{-i})$) and $\mathcal{K}(x)$ (resp. $\mathcal{K}(x_i)$, $\mathcal{K}(x_{-i})$) denote the set of neighborhoods and the set of compact neighborhoods of $x$ (resp. $x_i$, $x_{-i}$). The game G is quasiconcave if for every player $i \in N$ and every strategy $x_{-i} \in X_{-i}$, the mapping $u_i(\cdot, x_{-i})$, defined on $X_i$, is quasiconcave. The game G is continuous if for every player $i$, $u_i$ is a continuous mapping, where X is endowed with the product


topology. For every game G, $\Gamma$ denotes the closure of the graph of the players' payoffs: $\Gamma = \mathrm{cl}\{(x, u(x)) : x \in X\}$. Note that $\Gamma$ is a compact subset of $X \times \mathbb{R}^N$ when G is compact. For every $x \in X$ and every player $i$, define the lower semicontinuous regularization of $u_i$ with respect to $x_{-i}$:
$$
\underline{u}_i(x_i, x_{-i}) = \liminf_{x'_{-i} \to x_{-i}} u_i(x_i, x'_{-i}) = \sup_{V \in \mathcal{V}(x_{-i})} \ \inf_{x'_{-i} \in V} u_i(x_i, x'_{-i}).
$$
Finally, for every $i = 1,\dots,N$ and every $x \in X$, define the quasi-concave envelope of $u_i(\cdot, x_{-i})$ with respect to $x_i$ by
$$
\mathrm{co}(u_i)(x) = \sup \big\{ \min\{u_i(y_k, x_{-i})\}_{k=1}^{n} \big\},
$$
where the supremum is taken over all $n \in \mathbb{N}^*$ and all families $\{y_1,\dots,y_n\}$ of $X_i$ such that $x_i \in \mathrm{co}\{y_1,\dots,y_n\}$. Hence, the game G is quasiconcave if and only if $\mathrm{co}(u_i) = u_i$ for every $i \in N$.

6.1.2 Reny Equilibria

The following definition can easily be derived from Reny (1999) by taking the contrapositive of the better-reply security property.

Definition 6.1 A Reny equilibrium of a game $G = ((X_i)_{i\in N}, (u_i)_{i\in N})$ is a pair $(x, u) \in \Gamma$ such that
$$
\forall i \in N, \quad \sup_{d_i \in X_i} \underline{u}_i(d_i, x_{-i}) \ \le\ u_i.
$$
This concept has not been introduced explicitly before. We believe it is important because it is non-vacuous in all compact and quasiconcave games, it permits a deeper understanding of Reny's results, and it provides an interesting outcome even for games with no Nash equilibrium. Recall, from Reny (1999), that player $i$ can secure a payoff strictly above $u_i \in \mathbb{R}$ at $x \in X$ if there exists $d_i \in X_i$ such that $\underline{u}_i(d_i, x_{-i}) > u_i$ (or, equivalently, there exist ǫ > 0 and an open neighborhood $V_x$ of $x$ such that $u_i(d_i, x'_{-i}) > u_i + ǫ$ for every $x' \in V_x$). Thus, $(x, u) \in \Gamma$ is a Reny equilibrium if no player can secure a payoff strictly above $u_i$. A fundamental observation is that such outcomes always exist.

Theorem 6.2 For every quasiconcave and compact game, there exists a Reny equilibrium, and the set of Reny equilibria is closed.

Proof. We first give an elementary proof, which relies on Reny's result stating that any better-reply secure game admits a Nash equilibrium. Recall that a game G is better-reply secure if for every strategy profile $x = (x_i)_{i\in N} \in X$ which is not a Nash


equilibrium, some player can secure a payoff strictly above $u_i$. Thus, G is better-reply secure if and only if, for every Reny equilibrium $(x, u)$, $x$ is a Nash equilibrium. Now, to prove Theorem 6.2, remark that if G admits a Nash equilibrium $x$, then $(x, u(x))$ is a Reny equilibrium. Conversely, if there is no Nash equilibrium, then by Reny's theorem (every compact, quasiconcave and better-reply secure game admits a Nash equilibrium), G cannot be better-reply secure: this yields formally the existence of a Reny equilibrium $(x, u) \in \Gamma$.

In the paper, a short proof of Theorem 6.2 is provided without making use of Reny's existence result. Our proof is almost standard, in the sense that it principally uses the Browder-Fan fixed point theorem and a new selection lemma. As a matter of fact, this proof yields the existence of a quasi equilibrium, which is a strict refinement of a Reny equilibrium.

Remarks:
1. If $x$ is a Nash equilibrium, then $(x, u(x))$ is a Reny equilibrium.
2. If $x$ is an approximate equilibrium (i.e. a limit point of ǫ-equilibria), then $(x, u)$ is a Reny equilibrium for some $u \in \mathbb{R}^N$. Indeed, take $(x^n)$, a sequence of $1/n$-equilibria converging to $x$. By compactness of $\Gamma$, one can suppose, without loss of generality (passing to a subsequence), that $(x^n, u(x^n))$ converges to some $(x, u) \in \Gamma$. Taking the limit in $u_i(x^n) \ge u_i(d_i, x^n_{-i}) - \frac{1}{n} \ge \underline{u}_i(d_i, x^n_{-i}) - \frac{1}{n}$, and using the lower semicontinuity of $\underline{u}_i$ in $x_{-i}$, one obtains that $(x, u)$ is a Reny equilibrium.
3. If G is better-reply secure, then for every Reny equilibrium $(x, u)$, $x$ is a Nash equilibrium.
4. For a one-player game $(X_1, u_1)$, $(x, u) \in X \times \mathbb{R}$ is a Reny equilibrium if and only if there exists a sequence $(x^n)$ in X converging to $x$ such that $u_1(x^n)$ converges to $u = \sup_{x \in X_1} u_1(x)$.
5. Example 2 (Bertrand duopoly): the only Reny equilibrium is $(p_1 = c_2, p_2 = c_2, u_1 = (a - c_2)(c_2 - c_1), u_2 = 0)$. This game is not better-reply secure (it admits no Nash equilibrium), but $(c_2, c_2)$ is the only approximate equilibrium.
6. Example 3 (California-Oregon psychologists): the only Reny equilibrium is $(x = 3, y = 3, u_1 = 3, u_2 = 1)$. As in the previous example, this game is not better-reply secure (it has no Nash equilibrium), and $(3, 3)$ is the only approximate equilibrium.

The above examples show that the notion of Reny equilibrium encompasses several situations, depending on whether there exists a Nash or an approximate equilibrium (i.e. an ǫ-equilibrium for every ǫ > 0 small enough). Moreover, by analogy with the class of better-reply secure games, it is natural to define a class of games for which each Reny equilibrium yields an approximate equilibrium.


Definition 6.3 A game $G = ((X_i)_{i\in N}, (u_i)_{i\in N})$ is weakly better-reply secure if, for every strategy profile $x = (x_i)_{i\in N} \in X$ which is not an approximate Nash equilibrium and every $(x, u) \in \Gamma$, there exists a player $i$ such that
$$
\sup_{d_i \in X_i} \underline{u}_i(d_i, x_{-i}) \ >\ u_i.
$$

Theorem 6.4 Every compact, quasiconcave and weakly better-reply secure game G admits an approximate Nash equilibrium.

Proof. Consider a Reny equilibrium $(x, u) \in \Gamma$ of G. Since G is weakly better-reply secure, $x$ is an approximate Nash equilibrium.

This result encompasses Prokopovych's existence result, which we now state.

Corollary 6.5 (Prokopovych 2008) Let G be a compact and quasiconcave game admitting a strong equilibrium $(\bar x, \bar u)$ such that:
(i) G is payoff secure at $\bar x$, i.e. for every $\varepsilon > 0$ and every player $i$, there exist $d_i \in X_i$ and a neighborhood $V_{\bar x_{-i}}$ of $\bar x_{-i}$ such that $u_i(d_i, x'_{-i}) \ge u_i(\bar x) - \varepsilon$ for every $x'_{-i} \in V_{\bar x_{-i}}$.
(ii) For every $i \in N$, $\sup_{d_i \in X_i} u_i(d_i, x_{-i})$ is continuous at $\bar x_{-i}$.
Then G admits an approximate equilibrium.

We now provide an example of a two-player timing game in mixed strategies which is weakly better-reply secure, is not better-reply secure, and does not satisfy the assumptions of Prokopovych (2008).

Example 4. Consider the following discounted zero-sum timing game with two players: each player's strategy set is $[0, 1]$. Let the payoff function of player 1 be $u_1(x, y) = e^{-\min\{x,y\}}$ for every $y \ne x$, with $u_1(x, x) = 0$, and let G be the mixed extension of this game. To see that this game is not better-reply secure, take the sequence of mixed strategies $(\sigma_1^n, \sigma_2^n)$, where $\sigma_1^n$ is uniform on $[0, 1/n]$ and $\sigma_2^n = 0$. Then $(\sigma_1^n, \sigma_2^n, u_1(\sigma_1^n, \sigma_2^n), u_2(\sigma_1^n, \sigma_2^n))$ tends to $(0, 0, 1, -1)$, which is a Reny equilibrium, but $(0, 0)$ is not a Nash equilibrium. To prove that G is weakly better-reply secure, suppose $(\sigma_1, \sigma_2, u_1, u_2) \in \Gamma$ and $(\sigma_1, \sigma_2)$ is not an approximate equilibrium:
• If $u_1 < 1$, then $\sup_{d_1 \in \Delta_1} \underline{u}_1(d_1, \sigma_2) = 1 > u_1$ (for example, take $d_1$ uniform on $[0, 1]$).
• If $u_1 = 1$, then $u_2 = -1$ and $\sigma_1 = 0$; but then $(\sigma_1, \sigma_2)$ is an approximate equilibrium (consider the sequence $\sigma_1^n$ uniform on $[0, 1/n]$ and $\sigma_2^n = \sigma_2$), which is a contradiction.
To conclude, observe that this game is not payoff secure: for example, at x = (0, 0), player 2 cannot secure a payoff above −ǫ for every ǫ > 0.

6.1.3 Quasi equilibrium

We now refine the notion of Reny equilibrium, in order to encompass several recent Nash existence results for discontinuous games and to get rid of the quasiconcavity assumption. First, let us slightly extend the regularization procedure used previously. In the following, for every $x \in X$, let $D(x)$ be the set of closed-graph multi-valued mappings $\Phi$ with non-empty values, from some neighborhood $V_x^\Phi$ of $x$ to $X$; let $\Phi_i$ be the projection of $\Phi$ on $X_i$ ($\forall i \in N$), and let $\mathrm{Gr}(\Phi_i) = \{(x, d_i) \in V_x^\Phi \times X_i \ |\ d_i \in \Phi_i(x)\}$ be the graph of $\Phi_i$. For every $\bar x \in X$, $\Phi \in D(\bar x)$, $x \in V_{\bar x}^\Phi$ and $d_i \in \Phi_i(x)$, define $u_i^\Phi(d_i, x_{-i})$ as the lower semicontinuous regularization,² at $(d_i, x_{-i})$, of the restriction of $u_i$ to the closure of the set $\{(d'_i, x'_{-i}) : (x', d'_i) \in \mathrm{Gr}(\Phi_i)\}$, and let $\mathrm{co}^{\Phi_i(x)}(u_i)(x)$ be the quasi-concave envelope of the restriction of $u_i(\cdot, x_{-i})$ to $\Phi_i(x) \subset X_i$. Note that if $d \in X$ and $\Phi(x) = d$ for every $x$ in some neighborhood of $\bar x$, then $u_i^\Phi(d_i, x_{-i}) = \underline{u}_i(d_i, x_{-i})$ is the lower semicontinuous regularization of $u_i$ with respect to $x_{-i}$. The next theorem, which can be applied to any compact game, refines the existence of a Reny equilibrium.

Theorem 6.6 For every compact game, there exists a quasi equilibrium $(x, u) \in \Gamma$, defined as follows: for every neighborhood $V \times U$ of $(x, u) \in X \times \mathbb{R}^N$ and every closed-graph multi-valued mapping $\Phi$ from $V$ to $X$, there exists $(z, u(z)) \in V \times U$ such that, for every player $i$, there exists $(x', d_i) \in \mathrm{Gr}(\Phi_i)$ with
$$
u_i^{\Phi}(d_i, x'_{-i}) \ \le\ \mathrm{co}^{\Phi_i(z)}(u_i)(z).
$$
If G is quasiconcave, one obtains (taking constant mappings $\Phi$ in the above theorem):

Corollary 6.7 For every quasiconcave and compact game, there exists a strong Reny equilibrium $(x, u) \in \Gamma$, defined as follows: for every $d \in X$ and every neighborhood $V \times U$ of $(x, u) \in X \times \mathbb{R}^N$, there exists $(z, u(z)) \in V \times U$ such that, for every $i$, there exists $x' \in V$ with $\underline{u}_i(d_i, x'_{-i}) \le u_i(z)$.
Remark 6.8 If $G$ is a compact and quasiconcave game, then the "regularized" game $\underline{G} = ((X_i)_{i=1}^N, (\underline{u}_i)_{i=1}^N)$ is also compact and quasiconcave. So, from Corollary 6.7, there exists $(x, u) \in \Gamma$ such that for every $d \in X$ and every neighborhood $V \times U$ of $(x, u) \in X \times \mathbb{R}^N$, there exists $(z, u(z)) \in V \times U$ such that for every $i$, there exists $x' \in V$ with $\underline{u}_i(d_i, x'_{-i}) \leq \underline{u}_i(z)$, which improves Corollary 6.7.

6.1.4 Byproducts

Reny (2009). Reny (2009) recently improved his main existence result of (1999): he stated the existence of a Nash equilibrium for the larger class of compact and quasiconcave games with the lower single-deviation property, but has not yet provided a proof of it.

^2 More explicitly, $u_i^{\Phi}(d_i, x_{-i}) = \sup_{V' \in \mathcal{V}(x, d_i)} \; \inf_{(x', d_i') \in Gr(\Phi_i) \cap V'} u_i(d_i', x'_{-i})$.
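This lower semicontinuous regularization can be illustrated numerically. Below is a minimal sketch (the example function, the grid step and the radii are my assumptions, not from the text): on a grid, approximate $\underline{u}(x) = \sup_{V \in \mathcal{V}(x)} \inf_{x' \in V} u(x')$ by taking infima over shrinking neighborhoods.

```python
# Numerical sketch of the lower semicontinuous (lsc) regularization
# u_(x) = sup over neighborhoods V of x of inf over V of u:
# take the infimum over the radius-r ball, then the sup as r shrinks.

def u(x):
    # An illustrative payoff with a downward jump at x = 0.5.
    return 1.0 if x < 0.5 else 0.0

def lsc_reg(u, x, radii=(0.2, 0.1, 0.05, 0.01), step=0.001):
    vals = []
    for r in radii:
        pts = [x + k * step for k in range(-int(r / step), int(r / step) + 1)]
        vals.append(min(u(p) for p in pts))   # inf over the radius-r ball
    return max(vals)                          # sup over shrinking neighborhoods

print(lsc_reg(u, 0.5))  # 0.0: the regularization takes the lower value at the jump
print(lsc_reg(u, 0.3))  # 1.0: away from the jump, u is already lsc
```

At the discontinuity point the regularization selects the lower limit, which is exactly the role $\underline{u}_i$ plays in the existence results above.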


Chapter 6. Strategic Rationality: New Concepts

Definition 6.9 A game $G = ((X_i)_{i=1}^N, (u_i)_{i=1}^N)$ has the lower single-deviation property if whenever $x = (x_1, \ldots, x_N) \in X$ is not a Nash equilibrium, there exist $d \in X$ and a neighborhood $V$ of $x$ such that for every $z \in V$, there exists a player $i$ such that

$$\forall x' \in V, \quad u_i(d_i, x'_{-i}) > u_i(z).$$

Clearly, if $G$ has the lower single-deviation property, then every strong Reny equilibrium is a Nash equilibrium.

McLennan et al. (2009). In a different direction, McLennan et al. (2009) extended Reny's existence result for better-reply secure games, proving the existence of a Nash equilibrium for "restrictionally secure" compact games, which can be defined as follows:^3

Definition 6.10 A game $G$ is restrictionally secure if for every $i \in N$ there exists a multi-valued mapping $\chi_i$ from $X$ to $X_i$ such that: for every $x$ such that $[x]$ (the closure of $\{x\}$) does not contain any Nash equilibrium, there exist, for every $i \in N$, a finite family $(F_k)_{k=1,\ldots,K}$ of closed sets whose union is a neighborhood of $x$, a neighborhood $U$ of $x$, a finite subset $D$ of $\Pi_{i \in N}\, \chi_i(x)$, and $(m_i)_{i \in N} \in \mathbb{R}^N$ such that the two following conditions hold:

(1) $\forall i \in N$, $\forall d(k) \in D$, $\forall x' \in F_k$: $u_i(d_i(k), x'_{-i}) \geq m_i$;

(2) $\forall z \in U$, $\exists i_0$: $m_{i_0} > co^{\chi_{i_0}(x)}(u_{i_0})(z)$, where $co^{\chi_{i_0}(x)}(u_{i_0})(z)$ is the quasiconcave envelope of the restriction of $u_{i_0}(\cdot, z_{-i_0})$ to $\chi_{i_0}(x)$.

To prove that every restrictionally secure game has a Nash equilibrium, suppose $G$ is restrictionally secure without any Nash equilibrium. Let $x \in X$ be a quasi equilibrium and let $(\chi_i)_{i=1}^N$, $(F_k)_{k=1,\ldots,K}$ and $D$ satisfy conditions (1) and (2) above. Define the closed-graph deviation map $\Phi$ on $\cup_{k=1}^K F_k$ as follows: for every $x' \in \cup_{k=1}^K F_k$, $\Phi(x') = \{d(k) \in D : x' \in F_k\}$. Then condition (1) implies $u_i^{\Phi}(d_i', x'_{-i}) \geq m_i$ for every $(x', d_i') \in Gr(\Phi_i)$ and every player $i = 1, \ldots, N$. Consequently, the quasi equilibrium condition yields a contradiction with (2).

Barelli and Soza (2009). In a recent paper, Barelli and Soza (2009) push this last idea further: in their securization assumption, the secure deviation $d_i$ can change "nicely" with the other players' perturbations (not necessarily within a finite set).
They prove the existence of Nash equilibria for the class of generalized better-reply secure compact and quasiconcave games.

^3 This definition is equivalent to the one in McLennan et al. (2009).


Definition 6.11 A game $G$ is generalized better-reply secure if whenever $(x, u) \in \Gamma$ and $x$ is not a Nash equilibrium, then there is a player $i$, a neighborhood $U$ of $x_{-i}$, a closed-graph multi-valued mapping $\Phi$ from $U$ to $X_i$ with non-empty convex values, and $\alpha_i > u_i$, such that $u_i(x') > \alpha_i$ for every $x' \in Gr(\Phi_i)$.

The following corollary of Theorem 6.6 clearly implies the existence of a Nash equilibrium in quasiconcave and generalized better-reply secure games. Thus, it proves that the result of Barelli and Soza (2009) holds on every (potentially non-Hausdorff) topological vector space. Recall that, for every strategy profile $x$, every player $i \in N$ and every neighborhood $U$ of $x_{-i}$, $W_U(x)$ denotes the set of closed-graph multi-valued mappings from $U$ to $X_i$, with non-empty convex values, such that $x \in Gr(\Phi_i)$.

Corollary 6.12 For every compact and quasiconcave game $G$, a strong Reny equilibrium $(x, u) \in \Gamma$ satisfies, for every $i \in N$:
$$u_i \;\geq\; \sup_{U_{-i} \in \mathcal{V}(x_{-i})} \; \sup_{\Phi_i \in W_{U_{-i}}(x)} \; \inf_{x' \in Gr(\Phi_i)} u_i(x') \;=:\; \underline{\underline{u}}_i(x).$$

Carmona (2010). Lastly, Carmona (2010) proposes a synthesis of several of the works above. He proves that compact, quasiconcave, weakly* better-reply secure games in locally convex metric topological vector spaces admit a Nash equilibrium, where weak* better-reply security is defined hereafter.^4 For every strategy profile $x$, player $i \in N$ and neighborhood $U$ of $x_{-i}$, define $W_U(x)$ as the set of closed-graph multi-valued mappings from $U$ to $X_i$, with non-empty convex values, such that $x \in Gr(\Phi_i)$. For every $i$ and $x$, define
$$\underline{\underline{u}}_i(x) = \sup_{U \in \mathcal{V}(x_{-i})} \; \sup_{\Phi_i \in W_U(x)} \; \inf_{z \in Gr(\Phi_i)} u_i(z).$$

Definition 6.13 $G$ is weakly* better-reply secure if whenever $(x, u) \in \Gamma$ and
$$u_i \geq \sup_{d_i \in X_i} \underline{\underline{u}}_i(d_i, x_{-i})$$
for every $i \in N$, then $x$ is a Nash equilibrium.

Let $(x, u) \in \Gamma$ be a strong Reny equilibrium. From Corollary 6.12, we have $u_i \geq \sup_{d_i \in X_i} \underline{\underline{u}}_i(d_i, x_{-i})$ for every $i \in N$. Thus $x$ is a Nash equilibrium if $G$ is compact, quasiconcave and weakly* better-reply secure.

^4 This is not the initial definition of weak better-reply security, but an equivalent one given in Carmona's paper.


6.2 Robust Rationalizability

Basic decision theory postulates that a rational player will never choose a weakly-dominated strategy. This behavior may be rationalized by Selten's (1975) trembling-hand argument, where a player believes that the opponents can make mistakes, or by Harsanyi's (1973) uncertainty argument, according to which a player is never completely informed about the utilities of the other players. Those justifications are not completely satisfactory. The first assumes the probabilities of mistakes to be known, and the second assumes a common prior distribution according to which utilities are drawn (each player then being informed only about his own utility function). It seems more natural to assume those probabilities unknown. If moreover a player has some uncertainty-aversion (Gilboa and Schmeidler, 1989), a strategy is optimal for him only if it is a best-response against all possible small deviations around some mixed profile of the opponents. This naturally leads to the concept of robust-best-responses. These are strategies that are best-responses against an open set of strategy profiles of the opponents.

Luce and Raiffa (1957) were perhaps the first to argue that one might restrict attention to Nash equilibria obtained by a process of repeated elimination of weakly-dominated strategies. A number of suggestions have been made on how to iterate deletions. For example, Gale (1953) suggested eliminating, at every round, all weakly-dominated strategies of all the players. But a refinement concept must satisfy some minimal restrictions, summarized by Kohlberg and Mertens (1986) as follows: a good concept of "strategically stable equilibrium" should satisfy both the backwards induction rationality of the extensive form and the iterated dominance rationality of the normal form, and at the same time be independent of irrelevant details in the description of the game. Backward induction means that at least one proper equilibrium^5 is preserved by the refinement concept.
Independence of irrelevant details means that the solution of an extensive form game depends only on its normal form. Gale (1953) proved that his procedure, applied to the normal form of a generic game with perfect information, leads to the backward-induction strategy profile (and so preserves the unique proper equilibrium of the game). Does this remain true when the game has imperfect information? And if not, does there exist a well-justified procedure that preserves at least one proper equilibrium in every strategic game?

^5 An $\varepsilon$-proper equilibrium of a normal form game (Myerson (1978)) is a completely mixed strategy profile such that whenever some pure strategy $s_i$ of player $i$ is a worse reply than some other pure strategy $t_i$, the weight on $s_i$ is smaller than $\varepsilon$ times the weight on $t_i$. A proper equilibrium of a normal form game is a limit point of $\varepsilon_n$-proper equilibria as $\varepsilon_n \to 0$.

6.2.1 Robust-Best-Responses

An operator of individual rationality $R$ associates to each finite game in strategic form $G = (N, (S_i)_{i\in N}, (u_i)_{i\in N})$ and each player $i \in N$ a non-empty subset $R_{i,G}$ of $S_i$, where $N$ is the set of players, $S_i$ is the set of pure strategies of player $i$, and $u_i : S = S_1 \times \cdots \times S_n \to \mathbb{R}$ is his utility function. $R_{i,G}$ is interpreted as the set of individually rational strategies of player $i$ in $G$. The axioms will imply that $R_{i,G}$ depends only on the utility function of player $i$.

Two related approaches exist in the literature to reduce the set of strategies of a player given his own utility function. The first introduces a dominance relation, and a strategy is considered "irrational" if it is dominated. Three dominance relations have been introduced; only the first two are really popular.

Definition 6.14 (von Neumann and Morgenstern (1944)) A pure strategy $s_i \in S_i$ is strictly payoff-dominated by a mixed strategy $m_i \in M_i$ if it always yields a strictly lower payoff against any strategy profile of the opponents. Formally, $s_i \in S_i$ is strictly payoff-dominated by $m_i$ if $u_i(m_i, s_{-i}) > u_i(s_i, s_{-i})$ for all $s_{-i} \in S_{-i}$, where $S_{-i} = \times_{j \neq i} S_j$ and $M_i = \Delta(S_i)$ is the set of mixed strategies of $i$.

Definition 6.15 (Gale (1953)) A pure strategy $s_i \in S_i$ is weakly payoff-dominated by a mixed strategy $m_i \in M_i$ if it always yields a weakly lower payoff against any strategy profile of the opponents, and sometimes a strictly lower payoff. Formally, $s_i$ is weakly payoff-dominated by $m_i$ if $u_i(m_i, s_{-i}) \geq u_i(s_i, s_{-i})$ for any $s_{-i} \in S_{-i}$, and there is $s'_{-i} \in S_{-i}$ such that $u_i(m_i, s'_{-i}) > u_i(s_i, s'_{-i})$.

Let $BR_{i,G}$ denote the best-response correspondence of player $i$, which associates to each $m_{-i} \in M_{-i} = \times_{j \neq i} M_j$ the set
$$BR_{i,G}(m_{-i}) = \{s_i \in S_i : u_i(s_i, m_{-i}) = \max_{t_i \in S_i} u_i(t_i, m_{-i})\},$$

where $u_i(t_i, m_{-i}) = \sum_{s_{-i} \in S_{-i}} m_{-i}(s_{-i})\, u_i(t_i, s_{-i})$ is the expected payoff of player $i$ when the other players play the mixed strategy profile $m_{-i}$.

The third and very natural dominance relation is:

Definition 6.16 (Harsanyi (1976)) A pure strategy $s_i \in S_i$ is BR-dominated by a pure strategy $t_i \in S_i$ if whenever $s_i$ is a best-response against some mixed strategy profile of the opponents, so is $t_i$, but the converse does not hold. Formally, $s_i$ is BR-dominated by $t_i$ if $BR_{i,G}^{-1}(s_i)$ is strictly included in $BR_{i,G}^{-1}(t_i)$.

Thus, a strictly payoff-dominated strategy is always weakly payoff-dominated, and a weakly payoff-dominated strategy is always BR-dominated. The second approach stems from the hypothesis that a Savage-rational player $i$ chooses a strategy that maximizes his expected payoff against some subjective belief $m_{-i} \in M_{-i}$. Again, three definitions exist, and only the first two are well known and regularly discussed in the literature.
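Definitions 6.14 and 6.15 can be checked by direct enumeration over the opponents' pure profiles. A minimal sketch (the 3-row payoff matrix below is an illustrative assumption of mine, not a game from the text):

```python
# Checking strict and weak payoff-dominance (Definitions 6.14 and 6.15)
# of a pure strategy s_i by a mixed strategy m_i, by enumerating the
# opponents' pure strategy profiles.

def expected(column, m):
    # Expected payoff of the mixture m over the rows, in one column.
    return sum(p * v for p, v in zip(m, column))

def dominance(U, i, m):
    """U[row][col]: the player's payoffs; compare row i with mixture m."""
    diffs = [expected([U[k][j] for k in range(len(U))], m) - U[i][j]
             for j in range(len(U[0]))]
    strict = all(d > 0 for d in diffs)                               # Def 6.14
    weak = all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)  # Def 6.15
    return strict, weak

# Illustrative payoffs of player 1: rows T, M, B against columns L, R.
U = [[3, 0],
     [0, 3],
     [1, 1]]
# B is strictly (hence weakly) dominated by 1/2 T + 1/2 M, worth 3/2 always.
print(dominance(U, 2, [0.5, 0.5, 0.0]))  # (True, True)
```

Note that the dominating strategy must be allowed to be mixed: here no pure strategy dominates B.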


Definition 6.17 (Bernheim (1984), Pearce (1984)) A pure strategy $s_i \in S_i$ of player $i$ is Savage-rational if it is a best-response against some mixed strategy profile of the opponents. Formally, $s_i$ is Savage-rational if $s_i \in BR_{i,G}(m_{-i})$ for some $m_{-i} \in M_{-i}$.

Definition 6.18 (Pearce (1984)) A strategy $s_i \in S_i$ of player $i$ is admissible if it is a best-response against some completely mixed strategy profile of the opponents. Formally, $s_i$ is admissible if $s_i \in BR_{i,G}(m_{-i})$ for some $m_{-i} \in \mathrm{int}\,M_{-i}$, where $\mathrm{int}\,M_{-i}$ is the relative interior of $M_{-i}$.

The two approaches are related:

Proposition 6.19 (Gale and Sherman (1950), Pearce (1984), van Damme (1991)) A strictly payoff-dominated strategy is not Savage-rational, and a weakly payoff-dominated strategy is not admissible. The converses hold in two-player games but not in three-player games.^6

The final concept is:

Definition 6.20 (Balkenborg (1992), Kalai and Samet (1984)) A pure strategy $s_i \in S_i$ is BR-robust if it is a best-response against an open set of mixed strategy profiles of the opponents.

Observe that a BR-robust strategy is always admissible. Our first result relates BR-dominance to BR-robustness, much in the spirit of Proposition 6.19.

Theorem 6.21 A robust-best-response strategy is not BR-dominated. The converse holds in two-player games but not in three-player games.

The second result axiomatizes robust-best-responses using three axioms.

Definition 6.22 Two pure strategies $s_i$ and $t_i$ are payoff-equivalent if $u_i(s_i, s_{-i}) = u_i(t_i, s_{-i})$ for all $s_{-i} \in S_{-i}$.

Definition 6.23 Two pure strategies $s_i$ and $t_i$ are BR-equivalent if $BR_{i,G}^{-1}(s_i) = BR_{i,G}^{-1}(t_i)$.

We would like the operator of individual rationality $R$ to satisfy some desirable properties. The first axiom, invariance, requires that two similar strategies be treated similarly. It implies that the solution depends only on the best-response correspondence. This is important in particular for rationalizability: two BR-equivalent strategies of $i$ may have different consequences for player $j$, and so it is important for $j$ to still think that player $i$ will keep them both. Conversely, many examples of strategic induction will show that two equivalent strategies for $i$ may have different impact on the

^6 Or equivalently, the converses hold in any multi-player game in which a player's assessments allow the opponents to correlate their random moves.


self-enforcing behavior of the game. Thus, without knowing the utilities of the other players, a Savage-rational player must keep all or eliminate all of his equivalent strategies.

Invariance: If $s_i$ and $t_i$ are BR-equivalent, then $s_i \in R_{i,G}$ if and only if $t_i \in R_{i,G}$.

The next axiom, preparation, requires that a player be prepared to best-reply to any strategy profile of the opponents. This is a Nash self-enforcing condition. It has been used by Kalai and Samet (1984) to define persistent-retracts and by Voorneveld (2004) to define preparation-sets.

Preparation: For all $m_{-i} \in M_{-i}$, $R_{i,G} \cap BR_{i,G}(m_{-i}) \neq \emptyset$.

The trivial operator ($R_{i,G} = S_i$) satisfies the two axioms, so some minimality must be required.

Minimality: If a BR-equivalent class of strategies $T_i \subset R_{i,G}$ is eliminated, the preparation of $i$ is affected (meaning that there exists $m_{-i} \in M_{-i}$ such that $(R_{i,G} \setminus T_i) \cap BR_{i,G}(m_{-i}) = \emptyset$).

Definition 6.24 An operator of individual rationality is optimal if it satisfies invariance, preparation and minimality.

Admissibility is not sufficient for optimality. Consider the following two-player game (in which only the payoffs of player 1 are indicated):

      L     R
T     1     0
M    1/2   1/2
B     0     1

Here, M is admissible, is BR-dominated by T, and is BR-dominated by B. In fact, $\frac{1}{2}L + \frac{1}{2}R$ is the unique mixed strategy of player 2 against which M is a best-response: eliminating the admissible strategy M does not affect the preparation of player 1.
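The role of M in the game above (rows T, M, B worth 1, 1/2, 0 against L and 0, 1/2, 1 against R) can be verified numerically. A small sketch; the grid scan over player 2's mixed strategies is my own device, not from the text:

```python
# Player 1's payoffs in the text's example, against player 2's columns L, R.
# Scan player 2's mixtures q*L + (1-q)*R and verify that M is never the
# unique best response, while T or B always best-replies.

U = {'T': (1.0, 0.0), 'M': (0.5, 0.5), 'B': (0.0, 1.0)}

def best_responses(q):
    pay = {s: q * u[0] + (1 - q) * u[1] for s, u in U.items()}
    top = max(pay.values())
    return {s for s, v in pay.items() if abs(v - top) < 1e-12}

m_only = [q / 1000 for q in range(1001) if best_responses(q / 1000) == {'M'}]
print(m_only)                                                        # []
print(all(best_responses(q / 1000) & {'T', 'B'} for q in range(1001)))  # True
```

M is a best response only at $q = 1/2$, where T and B tie with it; so dropping M never hurts player 1's preparation, which is the point of the example.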

Theorem 6.25 An operator of individual rationality $R$ is optimal if and only if $R_{i,G}$ is the set of robust-best-response strategies of $i$ in $G$.

6.2.2 Iterating Eliminations

An iterative procedure of reduction associates to each finite game $G^0 = (N, (S_i^0)_{i\in N}, (u_i)_{i\in N})$ a decreasing sequence of games $G^k = (N, (S_i^k)_{i\in N}, (u_i)_{i\in N})$, meaning that $S_i^k$ is decreasing for each $i \in N$. A strengthening of rationalizability compatible with strategic stability must preserve a proper equilibrium of $G^0$.

Let $R^*$ denote the optimal operator of individual rationality. The image of $G$ by $R^*$ is the finite game $R^*(G) = (N, (R^*_{i,G})_{i\in N}, (u_i)_{i\in N})$. If the game and optimal individual rationality are common knowledge, then one would like to iterate this operator a finite number of times until no reduction is possible. If robust rationality is simply the consequence of the desire of each player to minimize his set of choices as much as he can


and is not the consequence of assuming that the players can make mistakes, or that they have some uncertainty concerning utilities or concerning the rationality of the others, this rule of reduction imposes itself. It is the analog of the one proposed by Gale (1953) with weakly payoff-dominated strategies. This procedure and Gale's procedure do not preserve a proper equilibrium, as shown in the following game $G^0$:

      L      C      R
T    2,2    1,0    0,-1
M    2,2    0,0    1,-1
B    2,2   -1,-1   -1,3

B and C are the unique BR-dominated strategies of $G^0$ (and also the unique weakly payoff-dominated strategies). Eliminating them yields a game $G^1$ where T and R are the unique BR-dominated strategies (and also the unique weakly payoff-dominated strategies). The procedure converges to (M, L), while the unique proper equilibrium of $G^0$ is (T, L).

In fact the example proves more. This game corresponds to an extensive form game where player 2 has the outside option to stop the game (by playing L). The sub-game is solvable by repeated elimination of strictly-dominated strategies and so has a unique equilibrium, (T, C). Consequently, (T, L) is the unique sub-game perfect equilibrium.

Consequently, to be compatible with backward induction in extensive form games with imperfect information, we are somehow forced to follow the interpretation according to which Savage-rational players can make mistakes, have some small uncertainties concerning the utilities of the opponents, or have some uncertainties concerning the rationality of the opponents. Moreover, if one assumes that the players have some uncertainty-aversion and that these facts are common knowledge, we naturally obtain the following iterative procedure.

Definition 6.26 A pure strategy $s_i$ is BR-robust with respect to $A_{-i} \subset M_{-i}$ if there exist $a_{-i} \in A_{-i}$ and a sequence $a^n_{-i}$ in $M_{-i}$ converging to $a_{-i}$ such that the BR-equivalent class of $s_i$ is the unique best-response to $a^n_{-i}$ for all $n$.

Let $M_i^k = \Delta(S_i^k)$ and $M_{-i}^k = \prod_{j\neq i} M_j^k$.
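The elimination dynamics on the game $G^0$ above can be checked mechanically. A minimal sketch (one assumption of mine: I eliminate strategies weakly dominated by pure strategies, which suffices in this game):

```python
# Gale's procedure on the game G^0 of the text: simultaneously eliminate
# all weakly dominated strategies of both players and iterate.
# It converges to (M, L), missing the proper equilibrium (T, L).

u1 = {('T', 'L'): 2, ('T', 'C'): 1, ('T', 'R'): 0,
      ('M', 'L'): 2, ('M', 'C'): 0, ('M', 'R'): 1,
      ('B', 'L'): 2, ('B', 'C'): -1, ('B', 'R'): -1}
u2 = {('T', 'L'): 2, ('T', 'C'): 0, ('T', 'R'): -1,
      ('M', 'L'): 2, ('M', 'C'): 0, ('M', 'R'): -1,
      ('B', 'L'): 2, ('B', 'C'): -1, ('B', 'R'): 3}

def weakly_dominated(own, opp, u, transpose=False):
    # Strategies in `own` weakly dominated by some pure strategy (Def. 6.15).
    pay = (lambda s, t: u[(t, s)]) if transpose else (lambda s, t: u[(s, t)])
    bad = set()
    for s in own:
        for d in own:
            if d != s and all(pay(d, t) >= pay(s, t) for t in opp) \
                      and any(pay(d, t) > pay(s, t) for t in opp):
                bad.add(s)
    return bad

rows, cols = {'T', 'M', 'B'}, {'L', 'C', 'R'}
while True:
    r_bad = weakly_dominated(rows, cols, u1)
    c_bad = weakly_dominated(cols, rows, u2, transpose=True)
    if not r_bad and not c_bad:
        break
    rows, cols = rows - r_bad, cols - c_bad

print(rows, cols)  # {'M'} {'L'}
```

The first round removes B and C, the second T and R, exactly as described in the text; the surviving profile (M, L) is not the proper equilibrium (T, L).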

Definition 6.27 The iterative procedure of reduction is defined, for $k = 0, 1, \ldots$, as follows: $S_i^{k+1} \subset S_i^k$ is the set of robust strategies of player $i$ in $S_i^k$ with respect to $M_{-i}^k$.

In words, the procedure starts by eliminating the non-BR-robust strategies, leading to strategy profiles in $S^1$. Then, if a strategy $t_i \in S_i^1$ is not BR-robust with respect to $M_{-i}^1$, it is eliminated, and so on. Importantly, the neighborhood at each round $k$ is with respect to $M_{-i}^0$.

Definition 6.28 An iterative procedure of reductions satisfies:


• Invariance: if at each round, any two BR-equivalent strategies are treated similarly;

• Preparation: if for each $k$ and each player $i$, $S_i^{k+1}$ contains a best response to any element in the neighborhood of $M_{-i}^k$ (where the neighborhood is with respect to $M_{-i}^0$);

• Minimality: if for each $k$, eliminating a BR-equivalent class from $S_i^{k+1}$ affects the preparation of player $i$ with respect to $M_{-i}^k$.

An iterative procedure is optimal if it satisfies the three axioms. It will be shown that the procedure in Definition 6.27 is the unique optimal rule, meaning that it satisfies invariance, preparation and minimality. In fact, the procedure preserves all persistent-retracts.

Definition 6.29 A retract $R = \times_{i\in N} R_i$ is a convex subset of $M$. It is absorbing if it contains a best-response to each profile in its neighborhood. It is persistent if it is a minimal absorbing retract. A Nash equilibrium is persistent if it belongs to a persistent-retract.

Persistent-retracts are the natural extension of robust-best-responses to the Nash equilibrium problem, when the players minimize their sets of choices but still want to best-reply to each other in a neighborhood of the solution. Kalai and Samet (1984) proved that a persistent-retract exists and that the extreme points of any persistent-retract contain at most one representative from each BR-equivalent class of robust-best-responses.

Definition 6.30 An iterative procedure of reductions is:

• Nash-consistent: if for each $k$, any equilibrium of $G^k$ is an equilibrium of $G^0 = G$;

• Proper-consistent: if there is a proper equilibrium of $G$ that survives the process of eliminations;

• Persistent-consistent: if all persistent-retracts of $G$ are contained in $M^k = \prod_i \Delta(S_i^k)$ for each $k$.

Observe that the above example shows that a proper equilibrium of $G^0$ may fail to be a proper equilibrium of $G^1$. Thus, properness may be asked only with respect to the "whole game" $G^0$.

Theorem 6.31 The iterative procedure of reductions in Definition 6.27 is the unique optimal rule. Moreover, it is Nash-, proper- and persistent-consistent.


6.3 Coalitional Equilibria

Nash equilibrium [28] and strong equilibrium [2] are the main solution concepts in non-cooperative game theory. The first asks for stability of the strategy profile against all unilateral deviations by single players, while the second asks for stability against all unilateral deviations by coalitions. The usual conditions assumed to show existence seem to be of a different kind: for the existence of Nash equilibria, the game is supposed to be quasi-concave, while for the existence of strong equilibria, it is assumed to be balanced. We propose to unify both notions in a single concept, and provide a condition of existence that reduces to quasi-concavity in the case of Nash equilibrium and is related to the standard balancedness condition in the case of strong equilibrium.

The concept of coalitional equilibrium lies between Nash equilibrium and strong equilibrium. An exogenous coalitional structure defines which coalitions are admissible to deviate. A strategy profile is a coalitional equilibrium if it is stable against all unilateral deviations by admissible coalitions. When only single-player coalitions are permitted, it is a Nash equilibrium; when all coalitions are permitted, it is a strong equilibrium. The motivation is straightforward: in many applications (voting, the Council of Europe, or market competition) some coalitions are not natural and so cannot be expected to jointly deviate (extreme-leftists and -rightists, western and eastern countries, or firms in different areas).

6.3.1 A Fixed Point Theorem

$S$ is a compact convex set of a topological vector space (TVS). The interior of $X \subset S$ relative to $S$ is denoted by $\mathrm{int}\,X$, its convex hull by $co\,X$, and its closed convex hull by $\overline{co}\,X$. Let $A$ be a correspondence on $S$ (i.e. from $S$ to $S$), best viewed as a (not necessarily transitive) preference, where $A(x) \subset S$ is interpreted as the set of points in $S$ strictly better than $x$ (that is, $A(x) = \{y \in S : y \succ x\}$, where $\succ$ is a preference relation). The set of maximal elements of $A$ is $E = \{x \in S : A(x) = \emptyset\}$.

A famous implication of Fan's lemma^7 is the following important result, first established in Sonnenschein (1971). For its proof and its numerous applications, see [1] and [9]. In general, the theorem is stated with a Hausdorff assumption; it can be shown that this requirement is not necessary.

Theorem 6.32 (Sonnenschein (1971)) Let $A$ be a correspondence on $S$. If (i) for all $x \in S$, $x \notin co\,A(x)$, and (ii) for any $y \in A^{-1}(x)$ there exists $x' \in S$ (possibly $x' = x$) such that $y \in \mathrm{int}\,A^{-1}(x')$, then the set of maximal elements of $A$ is compact and non-empty.

Assume now that the topological vector space is locally convex. Under this additional assumption, the fixed point theorem that follows reinforces slightly the quasi-concavity

^7 Fan's lemma states that if the correspondence $F$ from $S$ to $S$ has closed values and if, for any finite family $\{x_1, \ldots, x_k\}$, $co\{x_1, \ldots, x_k\} \subset \cup_{i=1,\ldots,k} F(x_i)$, then $\cap_{x \in S} F(x) \neq \emptyset$.


assumption (i) and relaxes sufficiently the continuity assumption (ii). This allows for lower-hemi-continuous correspondences.

Theorem 6.33 Let $A$ be a correspondence on $S$. If there is a convex open neighborhood $W$ of zero such that (i') $x \notin co\,A(x) - W$ for all $x$, and (ii') for any $y \in A^{-1}(x)$ there exists $x' \in S$ (possibly $x' = x$) such that $y \in \mathrm{int}\,A^{-1}(x' + W)$, then the set $E$ of maximal elements of $A$ is compact and non-empty.

Thus, the theorem provides new conditions for a non-transitive preference to have a maximal element. Its proof needs the following lemma.

Lemma 6.34 If there is a convex open neighborhood $W$ of zero such that for each $x \in S$, $x \notin co\,A(x) - W$, then $\cap_{x\in S} F(x) \neq \emptyset$, where $F(x)$ is the closure of the complement of $A^{-1}(x + W)$: $F(x) = \overline{S \setminus A^{-1}(x + W)}$.

Proof. From Fan's lemma, it is sufficient to show that for any finite family $\{x_1, \ldots, x_k\}$ and any $y \in co\{x_1, \ldots, x_k\}$, $y \in \cup_{i=1,\ldots,k} F(x_i)$. Suppose not. Then for every $i$, $y \notin F(x_i)$, implying that $x_i \in A(y) - W$. Since $W$ is convex, we conclude that $y \in co\,A(y) - W$: a contradiction.

We now prove the fixed point theorem.

Proof. By assumption, there is an open convex neighborhood $W$ of zero such that $(x + W) \cap co\,A(x) = \emptyset$ for all $x \in S$. Note that $E = \cap_{x\in S} S \setminus A^{-1}(x + W)$ and, by (ii'), $E = \cap_{x'\in S} S \setminus \mathrm{int}\,A^{-1}(x' + W)$, so that $E$ is compact (as the intersection of a family of compact sets). Since $F(x) = \overline{S \setminus A^{-1}(x + W)} \subset S \setminus \mathrm{int}\,A^{-1}(x + W)$, we get $\cap_{x\in S} F(x) \subset E$. By (i') and the last lemma, $\cap_{x\in S} F(x) \neq \emptyset$, hence $E$ is non-empty.

Let now the topological vector space be locally convex and Hausdorff. Theorem 6.33 allows us to deduce the following result.

Corollary 6.35 If $A$ is continuous and has non-empty values, then there is $x \in S$ such that $x \in \overline{co}\,A(x)$.

Proof. If there were a convex neighborhood $W$ of zero such that $(x + W) \cap \overline{co}\,A(x) = \emptyset$ for all $x \in S$, then, since $A$ is lower-hemi-continuous, Theorem 6.33 would imply that $A$ has a maximal element: a contradiction.
Thus, for each $W$, there exists $x_W \in S$ such that $(x_W + W) \cap \overline{co}\,A(x_W) \neq \emptyset$. Since $\overline{co}\,A$ is upper-hemi-continuous when $A$ is (see [1]), by compactness of $S$ and by the Hausdorff and local convexity assumptions, letting $W$ tend to zero yields some $x$ with $x \in \overline{co}\,A(x)$.

This provides a new short proof of the Tychonoff fixed point theorem [1]. Indeed, if $f$ is continuous from $S$ to $S$ and $A(x) = \{f(x)\}$, then there is $x \in \overline{co}\{f(x)\} = \{f(x)\}$, i.e. $x = f(x)$.

6.3.2 Existence

Let $N$ be a set of players (not necessarily finite). Let $G = (N, \{S_i\}_{i\in N}, \{g_i\}_{i\in N})$ be a strategic game. Assume that for each $i$ in $N$, $S_i$ is a compact subset of a Hausdorff and locally convex TVS, and that the payoff function $g_i$ of player $i$ is continuous. This defines a compact-convex-continuous strategic game.

Let $\mathcal{C} \subset 2^N$ be the set of permissible coalitions. As usual, $S = \prod_{i\in N} S_i$ is the set of strategy profiles. For a coalition of players $C$, let $S_C = \prod_{j\in C} S_j$ and let $N \setminus C$ denote the set of players outside $C$.

Definition 6.36 $s$ is a $\mathcal{C}$-coalitional-equilibrium of $G$ if no permissible coalition in $\mathcal{C}$ has a unilateral deviation that is profitable to all its members; that is, there is no $C$ in $\mathcal{C}$ and no $t_C \in S_C$ such that $g_i(t_C, s_{N\setminus C}) > g_i(s)$ for every $i \in C$.

Definition 6.37 $G$ is $\mathcal{C}$-quasi-concave if for all $s \in S$, $\varepsilon > 0$ and any family of permissible coalitions $(C_k)_{k\in K}$ with corresponding strategies $t_{C_k} \in S_{C_k}$: if $g_i(t_{C_k}, s_{N\setminus C_k}) \geq g_i(s) + \varepsilon$ for all $k$ and $i \in C_k$, then $s \notin \overline{co}\{(t_{C_k}, s_{N\setminus C_k}) : k \in K\}$.

In finite dimensional strategy spaces, Carathéodory's theorem implies that $\overline{co}$ above could be replaced by $co$ and that finitely many deviating coalitions are sufficient. When only single-player coalitions are permissible, the condition reduces to the quasi-concavity assumption in Glicksberg's (1952) theorem. When all coalitions are permissible, the quasi-concavity condition is related to the balancedness condition under which Ichiishi (1982) proved the existence of a strong equilibrium. Thus $\mathcal{C}$-quasi-concavity may be viewed as a mixture of the quasi-concavity and balancedness conditions.

Theorem 6.38 If a compact-convex-continuous strategic game is $\mathcal{C}$-quasi-concave, the set of $\mathcal{C}$-coalitional-equilibria is compact and non-empty.

Proof. Let $A_C(s) = \{(t_C, s_{N\setminus C}) : g_i(t_C, s_{N\setminus C}) > g_i(s) + \varepsilon \text{ for all } i \in C\}$ and let $A = \cup_{C\in\mathcal{C}} A_C$. From the continuity of the game, $A$ is lower-hemi-continuous.
Suppose that for each convex open neighborhood $W$ of zero there is $s_W$ such that $(s_W + W) \cap \overline{co}\,A(s_W) \neq \emptyset$. Then there exist a finite family of permissible coalitions $(C_k^W)_{k\in K}$ and strategies $(t^W_{C_k} \in S_{C_k})_{k\in K}$ such that $(s_W + W) \cap co\{(t^W_{C_k}, s^W_{N\setminus C_k}) : k \in K\} \neq \emptyset$ and $g_i(t^W_{C_k}, s^W_{N\setminus C_k}) > g_i(s_W) + \varepsilon$ for all $k$ and $i \in C_k^W$. Using compactness of the strategy space, the Hausdorff assumption and continuity of the payoff functions, letting $W$ tend to zero one deduces the existence of $s$, of a family of permissible coalitions $(C_k)_{k\in K}$ and of strategies $(t_{C_k} \in S_{C_k})_{k\in K}$ such that $s \in \overline{co}\{(t_{C_k}, s_{N\setminus C_k}) : k \in K\}$ and $g_i(t_{C_k}, s_{N\setminus C_k}) \geq g_i(s) + \varepsilon$ for all $k$ and $i \in C_k$: a contradiction with $\mathcal{C}$-quasi-concavity. Thus (i') is satisfied; since lower hemi-continuity of $A$ implies (ii'), Theorem 6.33 yields a maximal element $s^\varepsilon$ of $A$, whose accumulation points (as $\varepsilon \to 0$) are coalitional-equilibria (thanks to compactness of the strategy spaces and continuity of the payoff functions).
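In a finite game, Definition 6.36 can be tested by plain enumeration over permissible coalitions and their joint deviations. A sketch (the two-player coordination game and its payoffs below are my assumptions, chosen for illustration):

```python
# Checking Definition 6.36 by enumeration in a finite two-player game:
# s is a C-coalitional-equilibrium if no permissible coalition C has a
# joint deviation t_C profitable to ALL of its members.
from itertools import product

S = {1: ('a', 'b'), 2: ('a', 'b')}          # pure strategy sets

def g(i, s):
    # Symmetric payoffs: 1 each at (a, a), 2 each at (b, b), 0 otherwise.
    return {('a', 'a'): 1, ('b', 'b'): 2}.get(s, 0)

def is_coalitional_eq(s, coalitions):
    for C in coalitions:
        members = sorted(C)
        for t in product(*(S[i] for i in members)):
            dev = list(s)
            for i, ti in zip(members, t):
                dev[i - 1] = ti
            dev = tuple(dev)
            if dev != s and all(g(i, dev) > g(i, s) for i in C):
                return False        # C has a profitable joint deviation
    return True

print(is_coalitional_eq(('a', 'a'), [{1}, {2}]))          # True: Nash
print(is_coalitional_eq(('a', 'a'), [{1}, {2}, {1, 2}]))  # False
print(is_coalitional_eq(('b', 'b'), [{1}, {2}, {1, 2}]))  # True: strong
```

Here (a, a) is a Nash equilibrium but not a strong equilibrium, since the grand coalition profits by moving jointly to (b, b); enlarging the set of permissible coalitions refines the equilibrium set exactly as described above.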


Bibliography

[1] Aliprantis C.D. and K. Border (2005). Infinite Dimensional Analysis. Springer, 3rd ed.
[2] Aumann R. (1960). Acceptable Points in Games of Perfect Information. Pacific J. of Math., 10, 381-417.
[3] Balkenborg D. (1992). The Properties of Persistent Retracts and Related Concepts. Ph.D. thesis, Department of Economics, University of Bonn.
[4] Balkenborg D., M. Jansen and D. Vermeulen (2001). Invariance Properties of Persistent Equilibria and Related Solution Concepts. Mathematical Social Sciences, 41, 111-130.
[5] Barelli P. and I. Soza (2009). On the Existence of Nash Equilibria in Discontinuous and Qualitative Games. University of Rochester.
[6] Bernheim B.D. (1984). Rationalizable Strategic Behavior. Econometrica, 52, 1007-1029.
[7] Bich P. and R. Laraki (2011). Relaxed Equilibria in Discontinuous Games. Preprint. Available on demand.
[8] Bich P. (2010). Existence of Pure Nash Equilibria in Discontinuous and Non Quasiconcave Games. International Journal of Game Theory, 38(3), 395-410.
[9] Border K.C. (1999). Fixed Point Theorems with Applications to Economics and Game Theory. Cambridge University Press, reprinted.
[10] Carmona G. (2009). An Existence Result for Discontinuous Games. Journal of Economic Theory, 144(3), 1333-1340.
[11] Carmona G. (2010). Understanding Some Recent Existence Results for Discontinuous Games. Economic Theory, forthcoming.
[12] Fan K. (1969). Extension of Two Fixed Point Theorems of F. E. Browder. Math. Z., 112, 234-240.
[13] Gale D. (1953). A Theory of n-Person Games with Perfect Information. Proc. Nat. Acad. Sci. USA, 39, 496-501.
[14] Gale D. and S. Sherman (1950). Solutions of Finite Two-Person Games. In Contributions to the Theory of Games I, ed. by H. Kuhn and A. Tucker. Princeton University Press.
[15] Gilboa I. and D. Schmeidler (1989). Maxmin Expected Utility with a Non-Unique Prior. Journal of Mathematical Economics, 18, 141-153.


[16] Glicksberg I. (1952). A Further Generalization of the Kakutani Fixed Point Theorem with Applications to Nash Equilibrium Points. Proc. Amer. Math. Soc., 3, 170-174.
[17] Harsanyi J.C. (1976). A Solution Concept for n-Person Noncooperative Games. International Journal of Game Theory, 5, 211-225.
[18] Harsanyi J.C. (1973). Games with Randomly Disturbed Payoffs: A New Rationale for Mixed-Strategy Equilibrium Points. International Journal of Game Theory, 2, 1-23.
[19] Ichiishi T. (1982). Non-Cooperation and Cooperation. In Games, Economic Dynamics and Time Series Analysis, edited by M. Deistler, E. Fürst, and G. Schwödiauer. Physica-Verlag, Vienna.
[20] Kalai E. and D. Samet (1984). Persistent Equilibria in Strategic Games. International Journal of Game Theory, 13, 129-144.
[21] Kohlberg E. and J.-F. Mertens (1986). On the Strategic Stability of Equilibria. Econometrica, 54, 1003-1037.
[22] Laraki R. (2011). Robust Rationalizability, Strategic Induction and Equilibrium Refinement. Preprint. Available on demand.
[23] Laraki R. (2009). Coalitional Equilibria of Strategic Games. Cahier du Laboratoire d'Econométrie de l'Ecole Polytechnique. A new version, in collaboration with P. Bich, is available.
[24] Laraki R., E. Solan and N. Vieille (2005). Continuous-Time Games of Timing. Journal of Economic Theory, 120, 206-238.
[25] Luce R. and H. Raiffa (1957). Games and Decisions. John Wiley and Sons, New York.
[26] McLennan A., P.K. Monteiro and R. Tourky (2009). Games with Discontinuous Payoffs: A Strengthening of Reny's Existence Theorem. Mimeo.
[27] Myerson R.B. (1978). Refinements of the Nash Equilibrium Concept. International Journal of Game Theory, 7, 73-80.
[28] Nash J.F. (1950). Equilibrium Points in N-Person Games. PNAS, 36, 48-49.
[29] Pearce D. (1984). Rationalizable Strategic Behavior and the Problem of Perfection. Econometrica, 52, 1029-1050.
[30] Prokopovych P. (2008). On Equilibrium Existence in Payoff Secure Games. Economic Theory, forthcoming.

BIBLIOGRAPHY

99

[31] Reny P.J. (1999). On the Existence of Pure and Mixed Strategy Nash Equilibria in Discontinuous Games. Econometrica, 67(5), 1029-1056. [32] Reny P.J. (2009). Further results on the existence of Nash equilibria in discontinuous games. mimeo, University of Chicago. [33] Reny P.J. (2010). Strategic approximations of discontinuous games. Economic Theory, online publication. [34] Selten R. (1975). Re-examination of the Perfectness Concept for Equilibrium Points in Extensive Games. International Journal of Game Theory, 4, 25-55. [35] Simon L.K. and Zame W.R. (1990). Discontinuous Games and Endogenous Sharing Rules. Econometrica, 58, 861-872. [36] Sonnenschein H. (1971). Demand Theory without Transitive Preferences, with Applications to the Theory of Competitive Equilibrium, Preferences Utility, and Demand, ed. by J. S. Chipman, L. Hurwicz, M. K. Richter, and H. F. Sonnenschein. NY: Harcourt Brace Jovanovich. [37] van Damme E. (1991). Stability and Perfection of Nash Equilibria. Springer. [38] Vieille N. (2000a). Two-player stochastic games I: a reduction, Israel Journal of Mathematics, 119, 55-91. [39] Vieille N. (2000b). Two-player stochastic games II: the case of recursive games, Israel Journal of Mathematics, 119, 93-126. [40] von Neumann J. and O. Morgenstern (1944). Games and Economic Behavior. Princeton University Press. [41] Voorneveld, M. (2004). Preparation. Games Economics Behavior, 48, 403-414.

Chapter 7. Social Choice: New Model and Method

Throughout the world the choice of one from among a set of candidates is accomplished by elections. Elections are mechanisms for amalgamating the wishes of individuals into a decision of society. Many have been proposed and used. Most rely on the idea that voters compare candidates—one is better than another—and so have lists of “preferences” in their minds. These mechanisms include first-past-the-post, Condorcet’s method, Borda’s method, convolutions of Condorcet’s and/or Borda’s methods, the single transferable vote, and approval voting (according to one of its interpretations).

Electoral mechanisms are also used in a host of other circumstances where winners and orders-of-finish must be determined by a jury of judges: competitions among figure skaters, divers, gymnasts, pianists, and wines. Invariably, as the great mathematician Laplace (1820) was the first to propose two centuries ago, they ask voters (or judges) not to compare but to evaluate the competitors by assigning points from some range, points expressing an absolute measure of the competitors’ merits. Laplace suggested the range [0, R] for some arbitrary positive real number R, whereas practical systems usually fix R at some positive integer. These mechanisms rank the candidates according to the sums or the averages of their points (sometimes after dropping highest and lowest scores). They have been emulated in various schemes proposed for voting, with ranges taken to be the integers in [0, 100], [0, 5], [0, 2], or [0, 1] (the last of which is approval voting, according to its second interpretation).

It is fair to ask whether any one of these mechanisms—based on comparisons or on sums of measures of merit—actually makes the choice that corresponds to the true wishes of society, in theory or in practice. All have their supporters, yet all have serious drawbacks: every one of them fails to meet some important property that a good mechanism should satisfy.
In consequence, the basic challenge remains: to find a mechanism of election, prove that it satisfies the desired properties, and show that it is practical. The existing methods of voting have for the most part been viewed and analyzed in terms of the traditional model of social choice theory: individual voters submit “preference” lists of the candidates, and the decision to be made is to find society’s winning candidate or to find society’s “preference” list from best (implicitly the winner) to worst.


Majority judgment is a new mechanism based on a different model of the problem of voting (inspired by practice in ranking wines, figure skaters, divers, and others). It asks voters to evaluate every candidate in a common language of grades—thus to judge each one on a meaningful scale—rather than to compare them. This scale is absolute in the sense that the merit of any one candidate in a voter’s view—whether the candidate be “excellent,” “good,” or merely “acceptable”—depends only on the candidate (so remains the same when candidates withdraw or enter). Assigning a value or grade permits comparisons of candidates; comparing candidates does not permit evaluations (or any expression of intensity). Given the grades assigned by voters to the candidates, the method determines the final grade of each candidate and orders the candidates according to their final grades. The final grades are not sums or averages. Here it is explained why (1) the traditional model of the theory of social choice cannot lead to acceptable methods of ranking and electing, and (2) a more realistic model leads inevitably to one method of ranking and electing—majority judgment—that best meets the traditional criteria of what constitutes a good method. The chapter is based on the following publications: Balinski and Laraki (2010a,b,c, 2007) and Balinski, Jennings and Laraki (2009).

7.1 Traditional Model

The theory of voting—better known as the theory of social choice—has failed. Despite insightful concepts, fascinating analyses, and surprising theorems, its most famous results are for the most part negative: paradoxes leading to impossibility and incompatibility theorems. The theory has yielded no really decent methods for practical use. Beginning with the first known written traces (1299) of how candidates are to be elected and ranked, voting has been viewed in terms of comparing the relative merits of candidates. Each voter is assumed to rank-order the candidates and the problem is to amalgamate these so-called preferences into the rank-order of society. This view leads to two insurmountable paradoxes that plague theory and practice.

(1) Condorcet’s paradox: In the presence of at least three candidates, A, B, and C, it is entirely possible that in head-to-head encounters A defeats B, B defeats C, and C defeats A, so transitivity fails and a Condorcet-cycle is produced, A ≻S B ≻S C ≻S A, where X ≻S Y means society prefers X to Y.

(2) Arrow’s paradox: In the presence of at least (the same) three candidates, it is entirely possible for A to win, yet, with the same voting opinions, for B to defeat A when C withdraws.

These paradoxes are real. They occur in practice. Condorcet’s paradox is not often seen because voting systems very rarely ask voters to give their rank-orders. It was, however, observed in a Danish election (see Kurrild-Klitgaard 1999). It also occurred in the famous 1976 “Judgment of Paris” where eleven voters—well-known wine experts—evaluated six Cabernet-Sauvignons of California and four of Bordeaux, and the “unthinkable” is supposed to have occurred: in the phrase of Time magazine, “California defeated Gaul.” In fact, by Condorcet’s majority principle, five wines—including three of the four French wines—all preferred to the other five wines by a majority, were in a Condorcet-cycle, A ≈S B ≻S C ≈S D ≻S E ≻S A, where X ≈S Y means society considers X and Y to be tied (see Balinski and Laraki (2010a), section 7.8).

Arrow’s paradox is seen frequently. Had Ralph Nader not been a candidate for the presidency in the 2000 election in Florida, it seems clear that most of his 97,488 votes would have gone to Albert Gore, who had 537 votes fewer than George W. Bush, thus making Gore the winner in Florida and so the national winner with 291 Electoral College votes to Bush’s 246. Had Jean-Pierre Chevènement not been a candidate in the 2002 presidential election in France, most of his votes would have gone to Lionel Jospin, qualifying him for the second round of the election and perhaps making him the winner against Jacques Chirac (according to many polls). According to the rules that were used for years in amalgamating judges’ opinions of figure skating performances—where their inputs were rank-orders of skaters—it often happened that the relative position of two skaters could invert, or “flip-flop,” solely because of another skater’s performance.

Behind these paradoxes lurk a host of impossibilities that plague the traditional model. A brief, informal account is given of the most striking among them. The model is this. Each voter’s input is a rank-order of the candidates. Their collective input is society’s preference-profile Φ. The output, society’s rank-order of the candidates, is determined by a rule of voting F that depends on Φ. It must satisfy certain basic demands.

1. Unlimited domain: Voters may input whatever rank-orders they wish.

2. Unanimous: When every voter inputs the same rank-order then society’s rank-order must be that rank-order.

3. Independence of irrelevant alternatives: Suppose that society’s rank-order over all candidates C is F(Φ_C) and that over a subset of the candidates, C′ ⊂ C, it is F(Φ_C′). Then the rank-order obtained from F(Φ_C) by dropping all candidates not in C′ must be F(Φ_C′).

4. Non-dictatorial: No one voter’s input can always determine society’s rank-order whatever the rank-orders of the others.

Arrow’s Impossibility Theorem. There is no rule of voting that satisfies properties (1) to (4) (when there are at least three candidates).

Arrow’s theorem ignores the possibility that voters have strategies. Under the assumption that their “true” opinions are rank-orders, it does not consider the possibility that their inputs may differ from their true opinions, chosen in order to obtain the outcome they prefer. A rule of voting is strategy-proof when every voter’s best strategy (i.e., a dominant strategy) is his true preference-order; otherwise, the rule is manipulable. Strategy-proof rules are the most desirable, for then the true preferences of the voters are amalgamated into a decision of society rather than some other set of strategically chosen preferences. Regrettably they do not exist.
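Condorcet’s cycle described above is easy to verify mechanically. The following sketch (Python) tallies all pairwise majorities for a hypothetical three-ballot profile; the profile is a standard illustration chosen here for concreteness, not data from the text:

```python
from itertools import combinations

def majority_relation(ballots):
    """Return the set of pairs (X, Y) such that a majority of ballots
    rank X above Y; each ballot lists candidates from most preferred
    to least preferred."""
    candidates = ballots[0]
    beats = set()
    for x, y in combinations(candidates, 2):
        x_over_y = sum(b.index(x) < b.index(y) for b in ballots)
        if x_over_y > len(ballots) / 2:
            beats.add((x, y))
        elif x_over_y < len(ballots) / 2:
            beats.add((y, x))
    return beats

# A three-voter profile producing a Condorcet cycle:
# A beats B (2-1), B beats C (2-1), yet C beats A (2-1).
profile = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(sorted(majority_relation(profile)))
# [('A', 'B'), ('B', 'C'), ('C', 'A')]
```

The majority relation is complete here yet not transitive, which is exactly why no rank-order of society can be read off from it.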


However, the very formulation of the theorem that proves they do not exist underlines a defect in the traditional model. In general, the output of a rule of voting is society’s rank-order. Voters usually prefer one rank-order to another: the rank-order of the candidates is important to a voter, and the rank-order of figure skaters in Olympic competitions is important to the skaters and judges and to the public at large. But voters and judges have no way of expressing their preferences over rank-orders. In the spirit of the traditional approach they should be asked for their rank-orders of the rank-orders (for a more detailed discussion of this point see Balinski and Laraki (2010a)). Be that as it may, when strategic choices are introduced in the context of the traditional approach something must be assumed about the preferences of the voters to be able to analyze their behavior. It is standard to assume that voters only care about who wins, i.e., that voters’ utility functions depend only on who is elected. This is, of course, not true for most voters. Each voter’s input is now a rank-order that is chosen strategically, so it may or may not be her true preference list. A rule of voting is assumed to produce a winner only, and unanimous means that when all the voters place a candidate first on their lists then so does the rule.

Gibbard and Satterthwaite’s Impossibility Theorem. There is no rule of voting that is unanimous, non-dictatorial and strategy-proof for all possible preference-profiles (when there are at least three candidates).

There is still another fundamental difficulty with the traditional model.
Clearly, if a voter has a change of opinion and decides to move some candidate up in her ranking, that candidate should not as a consequence end up lower in the final ranking: that is, the method of voting should be “choice-monotone.” Monotonicity is essential to any practically acceptable method: how can one accept the idea that when a candidate rises in the inputs he falls in the output? But there are various ways of formulating the underlying idea. Another is “rank-monotone”: if one or several voters move the winner up in their inputs, not only should the winner remain the winner but the final ranking among the others should not change either.

Theorem (Balinski, Jennings and Laraki 2009). There is no unanimous, impartial rule of voting that is both choice- and rank-monotone (when there are at least three candidates). (Impartial means that candidates and voters are treated equally.)

Moreover, when some non-winner falls in the inputs of one or more voters no method of the traditional model can guarantee that the winner remains the winner (none is “strongly monotone”). Why all of this happens is simple: moving some candidate up necessarily moves some candidate(s) down, though there may be no change of opinion regarding them. In short, these theorems show that there can be no good method of voting. The traditional paradigm leaves a desperate state of affairs.

What is amazing about the theory of social choice is that the basic model has remained the same over seven centuries. The premises of the model have not been questioned. Comparing candidates has steadfastly remained the paradigm of the traditional model. And yet both common sense and practice show that voters and judges do not formulate their opinions as rank-orders. The goal of this chapter is to give a brief mathematical account of a new paradigm and model for a theory of social choice that comes much closer to capturing the way in which voters naturally express their opinions and that escapes the traditional impossibilities. For a complete account of the theory, a detailed justification of its basic paradigm, and descriptions of its uses to date and of the experiments that have been conducted to test it, see Balinski and Laraki (2010a).

7.2 Practice in Skating

Everything is ranked all of the time: architectural projects, beauty queens, cities, dogs, economists, figure skaters, graduates, hotels, investments, journalists, . . . , not only candidates for offices. How? Invariably by evaluating them in a common language of grades. That it is natural to do so is evident since it is so often done. In most real competitions (other than elections) the order-of-finish of competitors is a function of number-grades attributed by judges. Most often the functions used to amalgamate judges’ grades are their sums, or equivalently, their averages. But this is not always so. The recent changes in the rules used in figure skating offer a particularly interesting case study.

7.2.1 Condorcet’s and Arrow’s Paradoxes

Although there had already been occurrences of Arrow’s paradox in the past, including the 1995 women’s World Championship of figure skating, what happened in the 1997 men’s figure skating European Championships proved to be the last straw. Before A. Vlascenko’s performance, the standings were A. Urmanov first, V. Zagorodniuk second, and P. Candeloro third. Then Vlascenko performed. The final order-of-finish placed him sixth, confirmed Urmanov in first place, but put Candeloro in second place and Zagorodniuk in third. The outcry over this flip-flop was so strident that the President of the International Skating Union (ISU) finally admitted something must be wrong with the rule in use and promised it would be fixed.

Accordingly, the rules were changed: the ISU adopted the OBO (“one-by-one”) system in 1998. It is explained here in terms of a real problem. The Four Continents Figure Skating Championships are annual competitions with skaters from all the continents save Europe (whence the “Four”). In 2001 they were held in Salt Lake City, Utah. The example discussed comes from the men’s “short program.” There were twenty-two competitors and nine judges. The analysis is confined to the six leading finishers. It happens that doing so gives exactly the same order-of-finish among the six as is obtained with all twenty-two competitors (this need not be so!).


Every judge assigns to every competitor two grades, each ranging between 0 and 6, one “presentation mark” and one “technical mark.” Their sums determine each judge’s input. The data concerning the six skaters is given in table 1.

Name         J1    J2     J3    J4     J5    J6     J7    J8    J9    Avg.
T. Eldredge  11.3  11.6   11.3  11.4   11.4  11.7   11.4  11.2  11.5  11.42
C. Li        10.8  11.2+  11.0  10.9   10.6  11.0   10.8  10.9  11.2  10.93
M. Savoie    11.1  10.8+  11.1  10.8+  10.5  10.8   10.6  10.5  11.1  10.81
T. Honda     10.3  11.2   10.9  11.0   10.8  10.9+  10.4  10.3  10.7  10.72
M. Weiss     10.6  11.1   10.6  10.8   10.4  10.9   10.9  10.4  10.9  10.73
Y. Tamura    09.8  10.8   10.1  10.4   11.0  11.6   10.7  10.6  10.8  10.64

Table 1. Scores of competitors given by nine judges (presentation plus technical marks).

Contrary to public belief, the sum or the average of the scores given a skater did not determine a skater’s standing. They were only used as a device to determine each judge’s rank-order of the competitors.

Name         J1  J2  J3  J4  J5  J6  J7  J8  J9
T. Eldredge   1   1   1   1   1   1   1   1   1
C. Li         3   2   3   3   4   3   3   2   2
M. Savoie     2   5   2   4   5   6   5   4   3
T. Honda      5   3   4   2   3   4   6   6   6
M. Weiss      4   4   5   5   6   5   2   5   4
Y. Tamura     6   6   6   6   2   2   4   3   5

Table 2. Judges’ inputs (indicating rank-orders of the six competitors).

When two sums are the same but the presentation mark of one competitor is higher than the other’s, that competitor is taken to lead the other in the judge’s input. This ISU rule breaks all ties in the example; when a tie occurs a “+” is adjoined next to the number (in table 1) that indicates the higher presentation mark, and so the higher position in the ranking. The judges’ rank-orders of the competitors—their inputs to the OBO rule—are given in table 2. Thus, for example, judge J1 ranked Eldredge first, Savoie second, . . . , and Tamura last. To here, the new rule is identical to the old one (for details see Balinski and Laraki (2010a)). The innovation was in how the judges’ inputs are amalgamated into a decision. The OBO system combines two of the oldest and best known voting rules: Llull’s—a generalization of Condorcet’s, known by some as Copeland’s—and Cusanus’s—best known as Borda’s. To use what we will call Llull’s and Borda’s rules, table 3 gives the numbers of judges that prefer one competitor to another for all pairs of competitors. Thus, for example, Savoie is ranked higher than Weiss by six judges, so ranked lower by three.


Condorcet was for declaring one competitor ahead of another if a majority of judges preferred him to the other. But, of course, his paradox may arise, and it does in this example: Honda ≻S Weiss ≻S Tamura ≻S Honda.

          Eldredge  Li  Savoie  Honda  Weiss  Tamura   Wins  Borda score
Eldredge      –      9     9      9      9      9       5       45
Li            0      –     7      7      8      7       4       29
Savoie        0      2     –      5      6      5       3       18
Honda         0      2     4      –      5      4       1       15
Weiss         0      1     3      4      –      6       1       14
Tamura        0      2     4      5      3      –       1       14

Table 3. Judges’ majority votes in all head-to-head comparisons (each entry is the number of judges ranking the row competitor above the column competitor).

A more general rule than Condorcet’s was proposed in 1299 by Ramon Llull (see Hägele and Pukelsheim 2001). Llull’s method: rank the competitors according to their numbers of wins plus ties. It is a more general rule because a Condorcet-winner is necessarily a Llull-winner. Eldredge is the Condorcet- and Llull-winner, and Llull’s rule yields the ranking Eldredge ≻S Li ≻S Savoie ≻S Honda ≈S Weiss ≈S Tamura. The first three places are clear, but there is a tie for the last three places. Eldredge is the Condorcet-winner because he is ranked higher by a majority of judges in all pair-by-pair comparisons. There is no Condorcet-loser because no skater is ranked lower by a majority in all pair-by-pair comparisons.

Cusanus (in 1433; see Hägele and Pukelsheim 2008) and later Borda (in 1770, published in 1784) had an entirely different idea (it is so well known as Borda’s method that we use this designation). A competitor C receives k Borda-points if k competitors are below C in a judge’s rank-order; C’s Borda-score is the sum of his Borda-points over all judges; and the Borda-ranking is determined by the competitors’ Borda-scores. Alternatively, a competitor’s Borda-score is the sum of the votes he receives in all pair-by-pair votes. Thus the Borda-scores in table 3 are simply the sums of the votes in the rows, and the Borda-ranking of the six candidates is Eldredge ≻S Li ≻S Savoie ≻S Honda ≻S Weiss ≈S Tamura. Borda’s method, however, often denies first place to a Condorcet-winner or last place to a Condorcet-loser, and that has caused many to be bewitched, bothered and bewildered (though Borda’s method suffers from much worse defects, as will soon become apparent). There is an essential difference between the two approaches: whereas Llull and Condorcet rely on the candidates’ numbers of wins in all face-to-face confrontations, Cusanus and Borda rely on the candidates’ total numbers of votes in all face-to-face encounters.
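Both tallies can be recomputed directly from the head-to-head counts. A minimal Python sketch (the `votes` matrix transcribes table 3; entry `votes[x][y]` is the number of judges ranking x above y):

```python
# Transcription of Table 3: votes[x][y] = judges ranking x above y.
votes = {
    "Eldredge": {"Li": 9, "Savoie": 9, "Honda": 9, "Weiss": 9, "Tamura": 9},
    "Li":       {"Eldredge": 0, "Savoie": 7, "Honda": 7, "Weiss": 8, "Tamura": 7},
    "Savoie":   {"Eldredge": 0, "Li": 2, "Honda": 5, "Weiss": 6, "Tamura": 5},
    "Honda":    {"Eldredge": 0, "Li": 2, "Savoie": 4, "Weiss": 5, "Tamura": 4},
    "Weiss":    {"Eldredge": 0, "Li": 1, "Savoie": 3, "Honda": 4, "Tamura": 6},
    "Tamura":   {"Eldredge": 0, "Li": 2, "Savoie": 4, "Honda": 5, "Weiss": 3},
}

def wins(x):
    """Llull: head-to-head majorities won (9 judges, so 5 is a majority)."""
    return sum(v >= 5 for v in votes[x].values())

def borda(x):
    """Borda: total votes received over all head-to-head encounters."""
    return sum(votes[x].values())

for name in votes:
    print(f"{name:9} wins={wins(name)}  borda={borda(name)}")
# Eldredge leads both tallies; Honda, Weiss and Tamura tie at one win each,
# a tie that the Borda-scores (15, 14, 14) then partially separate.
```

The two functions make the “wins versus total votes” distinction of the text concrete: Llull’s ranking reads off `wins`, Borda’s reads off `borda`.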


The OBO rule used in skating is this:

• Rank the competitors by their number of wins (thereby giving precedence to the Llull and Condorcet idea);
• break any ties by using Borda’s rule.

In this case Borda’s rule happens to agree with Llull’s, so the OBO rule ranks the six skaters as does Borda, Eldredge ≻S Li ≻S Savoie ≻S Honda ≻S Weiss ≈S Tamura. This was the official order-of-finish.

The OBO rule is also known as Dasgupta-Maskin’s method (2004, 2008). They proposed it with elaborate theoretical arguments, calling it “the fairest vote of all.” In fact it had already been tried, and discarded. The OBO rule produces a linear order, so is not subject to Condorcet’s paradox, but it is (unavoidably) subject to Arrow’s paradox, in this example viciously. For suppose that the order of the performances had been first Honda, then Weiss, Tamura, Savoie, Li and Eldredge, and that after each performance the results are announced. Among the first three the judges’ inputs are

Name    J1  J2  J3  J4  J5  J6  J7  J8  J9
Honda    2   1   1   1   2   2   3   3   3
Weiss    1   2   2   2   3   3   1   2   1
Tamura   3   3   3   3   1   1   2   1   2

This yields the majority votes, numbers of wins and Borda-scores:

         Honda  Weiss  Tamura   Wins  Borda score
Honda      –      5      4       1       9
Weiss      4      –      6       1      10
Tamura     5      3      –       1       8

so the result is Weiss ≻S Honda ≻S Tamura. For the first four skaters the judges’ inputs are

Name       J1  J2  J3  J4  J5  J6  J7  J8  J9
M. Savoie   1   3   1   2   3   4   3   2   1
T. Honda    3   1   2   1   2   2   4   4   4
M. Weiss    2   2   3   3   4   3   1   3   2
Y. Tamura   4   4   4   4   1   1   2   1   3

yielding

         Savoie  Honda  Weiss  Tamura   Wins  Borda score
Savoie      –      5      6      5       3      16
Honda       4      –      5      4       1      13
Weiss       3      4      –      6       1      13
Tamura      4      5      3      –       1      12

so the result is Savoie ≻S Weiss ≈S Honda ≻S Tamura. Before Savoie’s performance Weiss led Honda; after it, they were tied. Compare this with the final standings among all six skaters after the performances of Eldredge and Li (already computed): Eldredge ≻S Li ≻S Savoie ≻S Honda ≻S Weiss ≈S Tamura. The last three did not perform again, and yet Honda—who had once been tied with Weiss and once behind him—is now ahead of him, and Weiss—who had been ahead of Tamura—is now tied with him. This chaotic behavior of repeated flip-flops is completely unacceptable to spectators, to competitors, and of course to common sense. It is inherent to the OBO and Borda methods.
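The flip-flops above can be replayed mechanically. This sketch (Python; the `ranks` matrix transcribes table 2) applies the OBO rule (wins first, Borda-scores to break ties) to each announced subset of skaters:

```python
# Transcription of Table 2: each list gives a skater's rank (1 = best)
# in the inputs of judges J1..J9.
ranks = {
    "Eldredge": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Li":       [3, 2, 3, 3, 4, 3, 3, 2, 2],
    "Savoie":   [2, 5, 2, 4, 5, 6, 5, 4, 3],
    "Honda":    [5, 3, 4, 2, 3, 4, 6, 6, 6],
    "Weiss":    [4, 4, 5, 5, 6, 5, 2, 5, 4],
    "Tamura":   [6, 6, 6, 6, 2, 2, 4, 3, 5],
}

def obo_scores(group):
    """For each member of `group`: (wins, Borda-score), both computed
    with the rank-orders restricted to `group`."""
    scores = {}
    for x in group:
        w = b = 0
        for y in group:
            if x == y:
                continue
            v = sum(rx < ry for rx, ry in zip(ranks[x], ranks[y]))
            b += v          # votes received against y
            w += v >= 5     # majority of the 9 judges
        scores[x] = (w, b)
    return scores

for group in (["Honda", "Weiss", "Tamura"],
              ["Savoie", "Honda", "Weiss", "Tamura"],
              list(ranks)):
    s = obo_scores(group)
    print(sorted(group, key=lambda x: s[x], reverse=True), s)
```

The second run shows Honda and Weiss tied at (1, 13); once all six have performed, Honda is strictly ahead of Weiss (15 against 14) although neither skated again.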

7.2.2 Strategic Manipulation

The OBO rule was abandoned by the ISU following the big scandal of the 2002 Winter Olympics (also held in Salt Lake City). In the pairs figure skating competition the gold medal went to a Russian pair, the silver to a Canadian pair. The vast majority of the public, and many experts as well, were convinced that the gold should have gone to the Canadians, the silver to the Russians. A French judge confessed to having favored the Russian over the Canadian pair, saying she had yielded to pressure from her hierarchy, only to deny it later. That judges manipulate their inputs—reporting grades not in keeping with their professional opinions—is known. A recent statistical analysis concluded: “[Judges] . . . appear to engage in bloc judging or vote trading. A skater whose country is not represented on the judging panel is at a serious disadvantage. The data suggests that countries are divided into two blocs, with the United States, Canada, Germany and Italy on one side and Russia, the Ukraine, France and Poland on the other” (Zitzewitz 2006).

Once again the skating world entered into fierce fights over how to express and how to amalgamate the opinions of judges. Finally—thankfully—it abandoned the idea that judges’ inputs should be rank-orders. In so doing, it joined the growing number of competitions whose rules have judges assign number grades to candidates, the candidates’ average grades determining the orders-of-finish (diving, wine tasting, gymnastics, piano competitions, restaurants, and many others). Such rules or aggregation functions are called point-summing methods by some, range voting by others. The judges’ scores in the 2001 Four Continents Figure Skating Championships provide an immediate example. Judges’ inputs are now the scores themselves. They range from a low of 0 to a high of 12. The candidates’ average scores are given in table 1 and yield an order-of-finish that differs from that of the Borda and OBO rules: Eldredge ≻S Li ≻S Savoie ≻S Weiss ≻S Honda ≻S Tamura.

It is at once evident that judges can easily manipulate the outcome by assigning their grades strategically: every judge can both increase and decrease the final score of every competitor by increasing or decreasing the score given that competitor. In this case it is particularly tempting for judges to assign scores strategically. Suppose they reported the grades they believed were merited. Take, for example, judge J2. She can change her scores (as indicated in table 4, e.g., increasing that of Eldredge from 11.6 to 12.0 so that his average goes from 11.42 to 11.47) so that the final order-of-finish is exactly the one she believes is merited. Moreover, the new scores she gives agree with the order of merit she believes is correct. And judge J2 is not unique in being able to do this: every single judge can alone manipulate to achieve precisely the order-of-finish he prefers by changing his scores, and each can do it while maintaining the order in which he placed the competitors initially (given in table 2). Results are announced following every performance, so judges accumulate information as the competition progresses and may obtain insights as to how best to manipulate.

             Eldredge  Li     Savoie  Honda  Weiss  Tamura
Wished rank    1st     2nd     5th     3rd    4th    6th
Old score      11.6    11.2+   10.8+   11.2   11.1   10.8
New score      12.0    11.9    10.2+   11.8   11.4   10.2
Old average    11.42   10.93   10.81   10.72  10.73  10.64
New average    11.47   11.01   10.74   10.79  10.77  10.58

Table 4. Judge J2’s manipulations that change the order-of-finish to what she wishes (the row labeled “Wished rank”). Note that her new scores define the same order as her old ones.

This analysis shows how extremely sensitive point-summing methods are to strategic manipulation; in fact, they are more open to manipulation than any other method of voting.
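Judge J2’s manipulation is easy to verify from the raw scores. A short Python sketch (the score lists transcribe table 1, and J2’s new marks come from table 4; the “+” tie-break flags are irrelevant here and dropped):

```python
# Table 1 scores (judges J1..J9); J2 is index 1 in each list.
scores = {
    "Eldredge": [11.3, 11.6, 11.3, 11.4, 11.4, 11.7, 11.4, 11.2, 11.5],
    "Li":       [10.8, 11.2, 11.0, 10.9, 10.6, 11.0, 10.8, 10.9, 11.2],
    "Savoie":   [11.1, 10.8, 11.1, 10.8, 10.5, 10.8, 10.6, 10.5, 11.1],
    "Honda":    [10.3, 11.2, 10.9, 11.0, 10.8, 10.9, 10.4, 10.3, 10.7],
    "Weiss":    [10.6, 11.1, 10.6, 10.8, 10.4, 10.9, 10.9, 10.4, 10.9],
    "Tamura":   [9.8, 10.8, 10.1, 10.4, 11.0, 11.6, 10.7, 10.6, 10.8],
}

def finish_order(sc):
    """Point-summing: rank competitors by their average score."""
    return sorted(sc, key=lambda c: sum(sc[c]) / len(sc[c]), reverse=True)

print(finish_order(scores))
# ['Eldredge', 'Li', 'Savoie', 'Weiss', 'Honda', 'Tamura']

# Judge J2 alone rewrites her column (Table 4)...
new_marks = {"Eldredge": 12.0, "Li": 11.9, "Savoie": 10.2,
             "Honda": 11.8, "Weiss": 11.4, "Tamura": 10.2}
manipulated = {c: marks.copy() for c, marks in scores.items()}
for c, m in new_marks.items():
    manipulated[c][1] = m

# ...and obtains exactly the order-of-finish she wishes.
print(finish_order(manipulated))
# ['Eldredge', 'Li', 'Honda', 'Weiss', 'Savoie', 'Tamura']
```

A single judge’s column moves Honda above both Weiss and Savoie in the final order, exactly the ranking J2 wished, while her own marks still rank the skaters in her original order.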

7.2.3 Meaningfulness

Using a point-summing rule raises deep and important questions. Is it at all meaningful to sum or average the scores given a competitor? What scores should be used? If, for example, they are finite in number and go from a low of 0 to a high of 20, should they be evenly spaced or not? Why and under what conditions is it justified to sum them?


How to construct a scale is a science in itself. “When measuring some attribute of a class of objects or events, we associate numbers . . . with the objects in such a way that the properties of the attribute are faithfully represented as numerical properties” (Krantz et al. (1971), p. 1). Given a faithful representation, the type of scale dictates the meaningfulness of the operations by which measurements may be analyzed. Pain, for example, is measured on an eleven-point ordinal scale going from 0 to 10: sums and averages are meaningless. Temperature (Celsius or Fahrenheit) is an interval scale because equal intervals have the same significance: sums and averages are meaningful, but multiplication is not, for there is no absolute 0. Ounces and inches are ratio scales: interval scales where 0 has an absolute sense, so that multiplication is also meaningful. Since a point-summing method sums candidates’ scores, the scores must—to be meaningful—be drawn from an interval scale. Although in many applications, such as figure skating, the numbers of the scale have commonly understood meanings, an increase of one base unit invariably becomes more difficult to obtain the higher the score, implying that the scores do not constitute an interval scale, so that sums and averages are meaningless.

7.3 A More Realistic Model

Postulate a finite number of competitors or candidates C = {A, . . . , I, . . . , Z}; a finite number of judges or voters J = {1, . . . , j, . . . , n}; and a common language of grades Λ = {α, β, γ, . . .}. The grades may be any strictly ordered words, phrases, levels or categories. Any two levels may be compared, α ≠ β implies either α ≺ β or α ≻ β, and transitivity holds, α ≻ β and β ≻ γ imply α ≻ γ. A language may be finite or a subset of points of an interval of the real line. In practice (e.g., piano competitions, figure skating, gymnastics, diving, wine competitions), common languages of grades are invented to suit the purpose, and are carefully defined and explained. Their words are clearly understood, much as the words of an ordinary language, or the measurements of physics. But they almost surely do not constitute interval scales (for a detailed analysis of this point see Balinski and Laraki (2010a), where what it takes for the scale to be interval is explained). The grades or words are “absolute” in the sense that every judge uses them to measure the merit of each competitor independently. They are “common” in the sense that judges assign them with respect to a set of benchmarks that constitute a shared scale of evaluation. By way of contrast, ranking competitors is only relative: it bars any scale of evaluation and ignores any sense of shared benchmarks. A problem is specified by its inputs, a profile Φ = Φ(C, J ): it is an m by n matrix of the grades Φ(I, j) ∈ Λ assigned by each of the n judges j ∈ J to each of the m competitors I ∈ C,

          ⎛  ⋮    ⋮          ⋮     ⋮  ⎞
          ⎜ α1   α2   ⋯   αn−1   αn ⎟
    Φ =   ⎜  ⋮    ⋮          ⋮     ⋮  ⎟
          ⎜ β1   β2   ⋯   βn−1   βn ⎟
          ⎝  ⋮    ⋮          ⋮     ⋮  ⎠

With this formulation of the inputs—the assignment of grades to competitors—voters specify rank-orders determined by the grades (that may be strict if the scale of grades is fine enough), so in this sense the inputs include those of the traditional model. Voters are able to input detailed expressions of their preferences that are at once simple and cognitively natural (as experience has proven). Suppose competitor A is assigned the grades (α1, . . . , αn) and competitor B the grades (β1, . . . , βn). A method of ranking is a binary relation ≿S that compares any two competitors whose grades belong to some profile. By definition, A ≿S B and B ≿S A means A ≈S B; and A ≻S B if A ≿S B and not A ≈S B. So ≿S is a complete binary relation. What properties should any reasonable method of ranking ≿S possess?

(1) Neutrality: A ≿S B for the profile Φ implies A ≿S B for the profile σΦ for any permutation σ of the competitors (or rows). That is, the competitors’ ranks do not depend on where their grades are given in the inputs.

(2) Anonymity: A ≿S B for the profile Φ implies A ≿S B for the profile Φσ for any permutation σ of the voters (or columns). That is, no judge has more weight than another in determining the ranks of competitors. When a rule satisfies these first two properties it is called impartial.

(3) Transitivity: A ≿S B and B ≿S C implies A ≿S C. That is, Condorcet’s paradox cannot occur.

(4) Independence of irrelevant alternatives in ranking (IIAR): When A ≿S B for the profile Φ, A ≿S B for any profile Φ′ obtained by eliminating or adjoining other competitors (or rows). That is, Arrow’s paradox cannot occur.

These four are the rock-bottom necessities. Together they severely restrict the choice of a method of ranking. A method of ranking respects grades if the rank-order between two competitors depends only on their sets of grades; in particular, when two competitors A and B have the same set of grades, they are tied.
With such methods the rank-orders induced by the voters' grades must be forgotten: only the sets of grades count, not which voter assigned which grade. Said differently, if two voters switch the grades they give a competitor, this has no effect on the electorate's ranking of the competitors. The following theorem shows that the new paradigm—voters evaluate competitors—must replace the old paradigm—voters compare competitors.


Theorem 7.1 A method of ranking is impartial, transitive and independent of irrelevant alternatives in ranking if and only if it is transitive and respects grades.

This simple theorem is essential: it says that if Arrow's and Condorcet's paradoxes are to be avoided, then the traditional model and paradigm must be abandoned. Who gave what grade cannot be taken into account—only the sets of grades themselves may be. This suggests that what is needed is a function that transforms the grades given any competitor into a final grade, the order among the final grades determining the order-of-finish of the competitors. The usual practice, as was mentioned, is to use the average grade, though sometimes the top and bottom grades, or the top two and bottom two grades, are omitted. Such rules present an immediate difficulty because two competitors with different sets of grades may have the same average, and so are tied. In any case, such functions should enjoy at least two other properties. First, if the voters all assign the same grade to a competitor, it should be his final grade. Second, in comparing two ordered sets of grades, when each grade in the first set is at least as high as the corresponding grade in the second set, the final grade given the first should be no lower than that given the second; moreover, when each grade in the first set is strictly higher than the corresponding grade in the second set, the final grade given the first should be strictly higher than that given the second. Accordingly, a function f : Λn → Λ that transforms the grades given a competitor into a final grade is an aggregation function if it satisfies three properties:

• Anonymity: f(. . . , α, . . . , β, . . .) = f(. . . , β, . . . , α, . . .);

• Unanimity: f(α, α, . . . , α) = α; and

• Monotonicity: αj ⪰ βj for all j ⇒ f(α1, . . . , αn) ⪰ f(β1, . . . , βn), and αj ≺ βj for all j ⇒ f(α1, . . . , αn) ≺ f(β1, . . . , βn).
An aggregation function serves two separate though related purposes: (1) it assigns a final grade to each competitor, and (2) it determines the order-of-finish of all competitors. It is analyzed in both its uses as, respectively, a social-grading function and a social-ranking function. A language of grades Λ is usually parameterized as a bounded interval of the nonnegative rational or real numbers [0, R]. Obvious examples of aggregation functions are the arithmetic mean or average, other means such as the geometric or harmonic mean, and the kth order function f^k whose value is the kth highest grade (for k = 1, 2, . . . , n). Since


Chapter 7. Social Choice: New Model and Method

small changes in the parametrization or the input grades should naturally imply small changes in the outputs or final grades it is natural to assume that an aggregation function is continuous. This assumption is sometimes necessary to establish the characterizations that follow, but not always. However, the characterizing properties hold for arbitrary finite or infinite common languages of grades. The question that presents itself is: Which aggregation function(s) of the grades of competitors should be used to grade and which to rank?

7.4

Majority Judgment: Description

7.4.1

Small Jury

Suppose there are n judges or voters who assign competitors grades. The kth order function f^k is the social-grading function whose value is the kth highest grade: when the grades r of a competitor are ordered from highest to lowest,

r = (r1 ≥ r2 ≥ · · · ≥ rn)  ⇒  f^k(r) = rk.

A competitor's majority-grade f^maj is his middlemost or median grade when n is odd, his lower-middlemost when n is even:

f^maj = f^((n+1)/2) when n is odd,  and  f^maj = f^((n+2)/2) when n is even.

Interpret the judges' scores as the grades of a finite common language (going from 0 to 12 in tenths). Ordering each competitor's grades from highest to lowest gives Table 5.

              f^1    f^2    f^3    f^4    f^maj  f^6    f^7    f^8    f^9
T. Eldredge   11.7   11.6   11.5   11.4   11.4   11.4   11.3   11.3   11.2
C. Li         11.2   11.2   11.0   11.0   10.9   10.9   10.8   10.8   10.6
M. Savoie     11.1   11.1   11.1   10.8   10.8   10.8   10.6   10.5   10.5
T. Honda      11.2   11.0   10.9   10.9   10.8   10.7   10.4   10.3   10.3
M. Weiss      11.1   10.9   10.9   10.9   10.8   10.6   10.6   10.4   10.4
Y. Tamura     11.6   11.0   10.8   10.8   10.7   10.6   10.4   10.1   09.8

Table 5. Competitors' scores ordered from highest to lowest (identities of judges forgotten). The majority-grade is the f^maj column.

The order-of-finish of the competitors is determined by their majority-grades. In this case there is a three-way tie for third place. So a finer distinction is needed. If two competitors such as Savoie and Honda have the same majority-grade, then the order between them must depend on their sets of grades excluding that one common grade. So it is dropped, and the majority-grades of the remaining eight grades are determined.


In this case Savoie's is 10.8, Honda's is 10.7: Savoie's is higher, so he leads Honda by majority judgment. In general, suppose a competitor's grades are r1 ≥ r2 ≥ · · · ≥ rn. Her majority-value is an ordered sequence of these grades. The first in the sequence is her majority-grade; the second is the majority-grade of her grades when her (first) majority-grade has been dropped (it is her "second majority-grade"); the third is the majority-grade of her grades when her first two majority-grades have been dropped; and so on. Thus, when there is an odd number of voters n = 2t − 1, a competitor's majority-value is the sequence that begins at the middle, rt, and fans out alternately from the center starting from below, as indicated here:

r1 ≥ · · · ≥ rt−2 ≥ rt−1 ≥ rt ≥ rt+1 ≥ rt+2 ≥ · · · ≥ r2t−1
              5th     3rd    1st    2nd     4th

so that it is

r⃗ = (rt, rt+1, rt−1, rt+2, rt−2, . . . , r2t−1, r1).

When there is an even number of voters n = 2t − 2, the majority-value begins at the lower middle and fans out alternately from the center starting from above,

r⃗ = (rt, rt−1, rt+1, rt−2, rt+2, . . . , r2t−2, r1).

If the majority-values of two competitors A and B are respectively r⃗A and r⃗B, the majority-ranking ≻maj is defined by

A ≻maj B  when  r⃗A ≻lexi r⃗B,

where ≻lexi means lexicographically greater, i.e., at the first grade where r⃗A and r⃗B differ, A's is higher. The majority-ranking in the skating competition is thus

Eldredge ≻maj Li ≻maj Savoie ≻maj Honda ≻maj Weiss ≻maj Tamura.

There are no ties. There can be no tie unless two competitors have precisely the same set of grades.
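The fan-out construction amounts to repeatedly removing the lower-middlemost of the remaining grades. A minimal Python sketch (the function name is illustrative, not from the book; data taken from Table 5):

```python
def majority_value(grades):
    """Majority-value: repeatedly drop the (lower-)middlemost grade.

    For a list sorted in non-increasing order, index len(r) // 2 is the
    median when the length is odd and the lower-middlemost when it is
    even, so popping it repeatedly yields the fan-out sequence."""
    r = sorted(grades, reverse=True)
    seq = []
    while r:
        seq.append(r.pop(len(r) // 2))
    return seq

# Savoie and Honda from Table 5: same majority-grade (10.8); the
# second majority-grade breaks the tie (10.8 against 10.7).
savoie = [11.1, 11.1, 11.1, 10.8, 10.8, 10.8, 10.6, 10.5, 10.5]
honda  = [11.2, 11.0, 10.9, 10.9, 10.8, 10.7, 10.4, 10.3, 10.3]
assert majority_value(savoie)[:2] == [10.8, 10.8]
assert majority_value(honda)[:2]  == [10.8, 10.7]
# Python compares lists lexicographically, exactly the majority-ranking:
assert majority_value(savoie) > majority_value(honda)
```

Note that Python's built-in list comparison is precisely ≻lexi, so no extra comparison code is needed.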

7.4.2

Large Electorates

Juries or committees usually have a small number of judges or members: five, nine or perhaps twenty. The method and theory are the same whatever the number of judges or voters. However, the majority judgment for "juries" composed of hundreds to millions of judges—nations electing presidents, cities electing mayors, congressional districts electing representatives, institutions and societies electing officers—has an easier
and more compelling description in that context. It was tested in a field experiment on April 22, 2007, in parallel with the first round of the French presidential election, in Orsay, a town close to Paris. The experiment took place in three of the twelve voting precincts of Orsay, which together had 2,695 registered voters. 2,383 voters cast official ballots (88% of those registered), of which 2,360 were valid. After voting officially, the voters were asked to participate in the experiment using the majority judgment. They had been informed by mail, printed flyers and posters that it would take place. It was conducted in accordance with usual French voting practice: ballots were filled out and inserted into envelopes in voting booths with curtains, then deposited in transparent ballot boxes.

Bulletin de vote : Élection du Président de la République 2007
Pour présider la France, ayant pris tous les éléments en compte,
je juge en conscience que ce candidat serait :

                        Très Bien  Bien  Assez Bien  Passable  Insuffisant  A Rejeter
Olivier Besancenot
Marie-George Buffet
Gérard Schivardi
François Bayrou
José Bové
Dominique Voynet
Philippe de Villiers
Ségolène Royal
Frédéric Nihous
Jean-Marie Le Pen
Arlette Laguiller
Nicolas Sarkozy

Cochez une seule mention dans la ligne de chaque candidat.
Ne pas cocher une mention dans la ligne d'un candidat revient à le Rejeter.

Table 5. Ballot, Orsay experiment, 2007 French presidential election.

The ballot is reproduced in Table 5. Voters were posed a serious and solemn question: To be president of France, after having taken every consideration into account, I judge in conscience that this candidate would be: and were asked to give an answer for every candidate in a common language of grades—absolute evaluations common to all French voters: Très Bien, Bien, Assez Bien, Passable, Insuffisant, or A Rejeter.


The first five designations are known to all those who have been school children in France; the last is clear enough. Reasonable translations are: Excellent, Very Good, Good, Acceptable, Poor or To Reject. The meanings of the grades are, of course, directly related to the question posed. The results are given in Table 6.

                 Excellent  Very Good  Good    Acceptable  Poor    To Reject  No grade
F. Bayrou        13.6%      30.7%      25.1%   14.8%        8.4%    4.5%      2.9%
S. Royal         16.7%      22.7%      19.1%   16.8%       12.2%   10.8%      1.8%
N. Sarkozy       19.1%      19.8%      14.3%   11.5%        7.1%   26.5%      1.7%
D. Voynet         2.9%       9.3%      17.5%   23.7%       26.1%   16.2%      4.3%
O. Besancenot     4.1%       9.9%      16.3%   16.0%       22.6%   27.9%      3.2%
M.-G. Buffet      2.5%       7.6%      12.5%   20.6%       26.4%   26.1%      4.3%
J. Bové           1.5%       6.0%      11.4%   16.0%       25.7%   35.3%      4.2%
A. Laguiller      2.1%       5.3%      10.2%   16.6%       25.9%   34.8%      5.3%
F. Nihous         0.3%       1.8%       5.3%   11.0%       26.7%   47.8%      7.2%
P. de Villiers    2.4%       6.4%       8.7%   11.3%       15.8%   51.2%      4.3%
G. Schivardi      0.5%       1.0%       3.9%    9.5%       24.9%   54.6%      5.8%
J.-M. Le Pen      3.0%       4.6%       6.2%    6.5%        5.4%   71.7%      2.7%

Table 6. Majority judgment results, three precincts of Orsay, April 22, 2007. When there was no grade it was counted as a To Reject, as per the instructions on the ballot (so Bayrou's To Reject was counted as 7.4%, Royal's as 12.6%, . . . , Le Pen's as 74.4%). There were few.

When there is a very large number of voters or judges, as in this case, almost surely the middle-interval—a single grade if the number of voters is odd, two grades if the number of voters is even—will be one and the same grade. Thus it is safe to simply say that a candidate's majority-grade is the median of his or her grades: it is at once the highest grade approved by a majority and the lowest grade approved by a majority. Alternatively, only a minority would be for a higher grade or for a lower grade. For example, the majority-grade of D. Voynet is Acceptable because a majority of 53.4% = 2.9% + 9.3% + 17.5% + 23.7% of the voters judge that she merits at least an Acceptable and a majority of 70.3% = 23.7% + 26.1% + 16.2% + 4.3% of the voters judge that she merits at most an Acceptable. Or, only a minority of 29.8% would be for a higher grade and only a minority of 46.6% for a lower grade. The majority-ranking is calculated more directly in the case of a large number of voters (see Table 7). When the numbers or percentages of grades higher or lower than the candidates' majority-grades are different—which is almost surely true—the majority-ranking is obtained from three pieces of information concerning each candidate:

• p, the number or percentage of the grades better than a candidate's majority-grade;

• α, the candidate's majority-grade; and

• q, the number or percentage of the grades worse than a candidate's majority-grade.


The triple (p, α, q) is the candidate’s majority-gauge. S. Royal’s majority-gauge is (39.4%,Good , 41.5%) since 39.4% = 16.7% + 22.7% of her grades are better than Good and 41.5% = 16.8% + 12.2% + 10.8% + 1.8% are worse than Good. If the number or percentage p of the grades better than a candidate’s majority-grade α is higher than the number or percentage q of those worse than the candidate’s majority-grade, then the majority-grade is completed by a plus (+); otherwise the majority-grade is completed by a minus (−). Thus S. Royal’s majority-grade is Good−. The ± attached to the majority-grade is implied by the majority-gauge, but for added clarity it will most often be included, so that, for example, Royal’s majority-gauge may be written (39.4%,Good −, 41.5%). Naturally a majority-grade+ is ahead of a majority-grade− in the majority-ranking. Of two majority-grade+’s the one having the higher number or percentage of grades better than the majority-grade is ahead of the other; of two majority-grade−’s the one having the higher number or percentage of grades worse than the majority-grade is behind the other. For example: S. Royal and N. Sarkozy both have the majority-grade Good−, Royal has 41.5% worse than Good and Sarkozy 46.9% worse than Good, so Royal finishes ahead of Sarkozy; and, O. Besancenot and M.-G. Buffet both have the majority-grade Poor+, Besancenot has 46.3% better than Poor and Buffet 43.2% better than Poor, so Besancenot finishes ahead of Buffet.
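The computation of a majority-gauge from such a table can be sketched in Python (the function name and the data layout are illustrative, not from the book; a missing grade is folded into To Reject, as the ballot prescribes):

```python
def majority_gauge(pcts):
    """pcts: list of (grade, percent) pairs, best grade first.

    Returns (p, grade with '+' or '-', q): the majority-grade is the first
    grade at which the cumulative percentage reaches a majority; p and q
    are the percentages strictly better and strictly worse than it."""
    cum = 0.0
    for i, (grade, pct) in enumerate(pcts):
        cum += pct
        if cum >= 50.0:  # a majority judges the candidate merits at least this grade
            p = sum(x for _, x in pcts[:i])
            q = sum(x for _, x in pcts[i + 1:])
            return p, grade + ('+' if p > q else '-'), q
    raise ValueError("percentages do not reach a majority")

# D. Voynet, Orsay 2007, with the 4.3% "no grade" added to To Reject:
voynet = [("Excellent", 2.9), ("Very Good", 9.3), ("Good", 17.5),
          ("Acceptable", 23.7), ("Poor", 26.1), ("To Reject", 20.5)]
p, g, q = majority_gauge(voynet)
assert g == "Acceptable-" and abs(q - 46.6) < 1e-9
```

From the rounded table percentages, p comes to 29.7%, a hair under the 29.8% reported in Table 7, which was presumably computed from unrounded counts.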

                        p: better than   α: the            q: worse than   Official vote,  Official
                        majority-grade   majority-grade    majority-grade  3 precincts     national vote
1st    F. Bayrou        44.3%            Good+             30.6%           25.5%           18.6%
2nd    S. Royal         39.4%            Good−             41.5%           29.9%           25.9%
3rd    N. Sarkozy       38.9%            Good−             46.9%           29.0%           31.2%
4th    D. Voynet        29.8%            Acceptable−       46.6%            1.7%            1.6%
5th    O. Besancenot    46.3%            Poor+             31.2%            2.5%            4.1%
6th    M.-G. Buffet     43.2%            Poor+             30.5%            1.4%            1.9%
7th    J. Bové          34.9%            Poor−             39.4%            0.9%            1.3%
8th    A. Laguiller     34.2%            Poor−             40.0%            0.8%            1.3%
9th    F. Nihous        45.0%            To Reject         –                0.3%            1.1%
10th   P. de Villiers   44.5%            To Reject         –                1.9%            2.2%
11th   G. Schivardi     39.7%            To Reject         –                0.2%            0.3%
12th   J.-M. Le Pen     25.7%            To Reject         –                5.9%           10.4%

Table 7. The majority-gauges (p, α, q) and the majority-ranking, three precincts of Orsay, April 22, 2007.

7.5

Majority Judgment: Salient Properties

Here, a selection of properties explains why majority judgment is a good method in theory. The book provides many more arguments showing why it is a good method in both theory and practice.

7.5.1

Eliciting Honesty

The strategy a voter adopts depends on her personal likes and dislikes. Some voters and judges may care most about assigning the grades they believe are truly merited. Some may care most about the final grades assigned each competitor—and are ready to adjust their assignments so as to attain that end. Others may not care at all about the final grades but only about the order-of-finish of the competitors. Still others may think that only the identity of the winner is of importance. Some few may be bought or bribed. Some other few may simply be completely incompetent judges who assign unwarranted grades. The final grade a voter wishes a competitor to be awarded, the final grade he believes the competitor merits, and the grade he gives may all be different. Some juries and electorates almost certainly include judges and voters who honestly wish grades to be assigned according to merit, and in certain cases it is perfectly reasonable to assume that all the players share this intent. Nevertheless, a very complex set of unknown wishes, opinions, expectations and anticipations—the voters' or judges' utility functions—determines the grades they give. One reasonable assumption follows. Suppose that a competitor's final grade is r∗. A social-grading function is strategy-proof-in-grading if, when a voter's input grade is higher than the final grade, r+ > r∗, any change in his input can only lead to a lower final grade; and if, when a voter's input grade is lower than the final grade, r− < r∗, any change in his input can only lead to a higher final grade. Assume that the more a final grade deviates from the grade a voter wishes it to be, the less she likes it ("single-peaked preferences over grades"). Then strategy-proofness-in-grading implies that it is a dominant strategy for her to assign the grade she believes is merited: that is, it is at least as good as any other strategy and strictly better than others in some cases.
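A minimal Python spot-check of this property, contrasting the median (an order function) with the average (the three-member jury and the 0–10 integer scale are illustrative assumptions, not from the book):

```python
from statistics import mean, median

jury = [5, 7, 9]       # the last voter's honest grade (9) exceeds the final grade
final = median(jury)   # the final grade is 7

# Strategy-proof-in-grading: whatever grade the high voter reports instead,
# the median can only stay put or fall -- he cannot pull it up toward 9.
for g in range(0, 11):
    assert median([5, 7, g]) <= final

# The average is not strategy-proof: exaggerating to 10 raises the final grade.
assert mean([5, 7, 10]) > mean(jury)
```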
The use of a strategy-proof-in-grading function permits an “honest-grade-seeking” judge—one whose objective is a final-grade as close as possible to the grade he believes should be assigned—to discard all strategic considerations and to concentrate on the task of deciding what he believes is the true grade; moreover, he has no need to pay attention to his preference between two grades when one is lower than the true grade and the other higher. Theorem 7.2 The unique strategy-proof-in-grading social-grading functions are the order functions (for a finite or an infinite number of grades). A competitor who receives a higher majority-grade than another is naturally ranked higher in the order of the candidates or alternatives than the other: grades imply orders. But when an important component of the voters’ utilities are the orders of finish and not merely the final grades of competitors, their strategic behavior may well alter. Given a profile of grades (rjI ) where rjI ∈ [0, R], let the vector of final grades be r I . Suppose the final grades of some two competitors A, B ∈ C are r A < r B , but that some voter j is of the opposite conviction, rjA > rjB . She would like either to increase A’s final grade, or decrease B’s final grade, or better yet do both.

When the final grade of A is lower than that of B, rA < rB, and any voter j is of the opposite conviction, rjA > rjB, a social-ranking function is strategy-proof-in-ranking if he can neither decrease B's final grade nor increase A's final grade.

Consider a voter j whose utility function uj depends only on the ultimate ranking of the competitors, that is, only on the order of the final grades. Then if the aggregation function is strategy-proof-in-ranking, it is a dominant strategy for voter j to assign grades according to his convictions, since it serves no earthly purpose to do otherwise.

Theorem 7.3 There exists no social-ranking function that is strategy-proof-in-ranking.

It is an immediate consequence of the next theorem. But the impossibility of perfection does not deny a search for the best possible. A social-ranking function is partially strategy-proof-in-ranking when, if rA < rB and any voter j is of the opposite persuasion, rjA > rjB, then if he can decrease B's final grade he cannot increase A's final grade, and if he can increase A's final grade he cannot decrease B's final grade.

Theorem 7.4 The unique social-ranking functions that are partially strategy-proof-in-ranking are the order functions.

In elections with many voters (say in the hundreds and above) the majority-gauges (p, α±, q) of the candidates almost always determine the majority-ranking, since ties among them almost never occur. Observe that the majority-gauge, too, is partially strategy-proof-in-ranking.

7.5.2

Meaningfulness

In the spirit of measurement theory an aggregation function must be "meaningful" in both its uses, as a social-grading and as a social-ranking function: the particular language of grades that is used should make no difference in the ultimate outcomes. By way of analogy, distances in the absolute and in comparisons should not change the ultimate outcomes when the scale is meters rather than yards. A social-grading function f is language-consistent if

f(φ(r1), . . . , φ(rn)) = φ(f(r1, . . . , rn))

for any increasing, continuous transformation φ of the grades of each voter. For example, when a Franco-American jury assigns grades to students, and each member is asked to give a grade in both of the languages, the French and the American grading
systems, language-consistency asks that the aggregate French grades rank the students in the same order as the aggregate American grades. Order functions are clearly language-consistent: the kth highest grade remains the kth highest grade under increasing, continuous transformations. It is well known that the reverse is true as well:

Theorem 7.5 The unique social-grading functions that are language-consistent are the order functions.

To be meaningful as a social-ranking function the analogous property must hold for rankings as well. A social-ranking function ⪰S is order-consistent if the order between any two candidates for some profile Φ implies the same order for any profile Φ′ obtained from Φ by any increasing, continuous transformation φ of the grades of each voter. The order functions are clearly order-consistent. To characterize them requires an additional, eminently acceptable, property; namely, that an increase in a candidate's grade necessarily helps. A social-ranking function ⪰S is choice-monotone if A ⪰S B and a judge increases the grade of A implies A ≻S B. Note in passing that the traditional model's difficulties with monotonicity are completely eliminated. Majority judgment is at once choice-, rank- and strongly-monotone. The reason is simple: a change of heart concerning one candidate is expressed by the grade he is given, but that changes nothing in the inputs concerning the other candidates.

Theorem 7.6 (Hammond 1976, D'Aspremont and Gevers 1977) The unique choice-monotone, order-consistent social-ranking functions are the lexi-order functions.

A lexi-order social-ranking function is given by a permutation σ of the order functions, f^σ = (f^σ(1), . . . , f^σ(n)), that ranks the candidates by

A ≻S B  if  (f^σ(1)(A), . . . , f^σ(n)(A)) ≻lex (f^σ(1)(B), . . . , f^σ(n)(B)).

Here ≻lex means the lexicographic order: at the first term where the corresponding grades differ, A's is higher. There are n! lexi-order social-ranking functions. The idea is simple: some order function decides; if it doesn't because there is a tie, a second order function is invoked; if there is a tie in the second order function, a third is called upon; and so on. The importance of Arrow's impossibility becomes very clear in this context. A social-ranking function is preference-consistent if the order between any two candidates for some profile Φ implies the same order for any profile Φ′ obtained from Φ by increasing,
continuous transformations φj of the grades of each voter j. For voters’ rank-orders to be meaningfully amalgamated there must exist a preference-consistent social-ranking function. But Arrow’s theorem tells us that there exists no monotonic preference-consistent social-ranking function. It says that there is no meaningful way of amalgamating the voters’ inputs when they have no common language. This is the deep enduring significance of Arrow’s theorem (rather than the supposed impossibility of surmounting Arrow’s paradox). But this should not be surprising: how can agreement be found among persons who cannot communicate! Once again, only the order functions will do. But why the majority-grade and why the majority-value?

7.5.3

Resisting Manipulation

To manipulate successfully a voter (or judge) must be able to raise or to lower a candidate's (or competitor's) final grade by changing the grade he assigns. In some situations a voter can only change a final grade by increasing his grade, in others only by decreasing it. Voters who can both lower and raise the final grade have a much greater possibility of manipulating: an outsider seeking to bribe or otherwise influence the outcome would surely wish to deal with such voters. Given a social-grading function f and a profile of a candidate's grades r = (r1, . . . , rn), let μ−(f, r) be the number of voters who can decrease the final grade, μ+(f, r) the number of voters who can increase the final grade, and μ(f, r) = μ−(f, r) + μ+(f, r). Take the measure of manipulability μ of a social-grading function f to be the worst that can happen, μ(f) = max_r μ(f, r) ≤ 2n. It is easily verified that μ(f^k) = n + 1 for any order function f^k. By way of contrast, for f a point-summing method μ(f) = 2n. In fact, the only social-grading functions f for which μ(f) = n + 1 are the order functions. For assume μ(f) ≤ n + 1 and take any r. If more than one voter can both increase and decrease the final grade, then, since all other voters can either increase or decrease the final grade, μ(f) ≥ n + 2, a contradiction. Therefore, at most one voter can both increase and decrease the final grade, implying f must be an order function. Taking λ to be the probability that the briber wishes to increase the grade and 1 − λ that he wishes to decrease it, a social-grading function is sought that minimizes the probability that a voter may be found who can effectively raise or lower the grade in the worst case. The probability of cheating Ch(f) with a social-grading function f is

Ch(f) = max_{r=(r1,...,rn)} max_{0≤λ≤1} [λ μ+(f, r) + (1 − λ) μ−(f, r)] / n.

What social-grading functions minimize the probability of cheating?
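The counts μ(f^k) = n + 1 for order functions and μ(f) = 2n for point-summing can be checked by brute force on a tiny example (the helper name mu and the 0–10 integer scale are illustrative assumptions):

```python
from statistics import mean, median

def mu(f, grades, scale=range(11)):
    """mu(f, r): the number of voters able to raise the final grade plus
    the number able to lower it, found by trying every alternative grade."""
    base = f(grades)
    up = down = 0
    for j, g in enumerate(grades):
        outcomes = {f(grades[:j] + [x] + grades[j + 1:])
                    for x in scale if x != g}
        up += any(o > base for o in outcomes)
        down += any(o < base for o in outcomes)
    return up + down

r = [9, 7, 5]
assert mu(median, r) == len(r) + 1  # an order function: n + 1
assert mu(mean, r) == 2 * len(r)    # point-summing: 2n
```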
A social-grading function is middlemost if it is defined by a middlemost aggregation function f, where for r1 ≥ . . . ≥ rn, f(r1, . . . , rn) = r(n+1)/2 when n is odd, and rn/2 ≥ f(r1, . . . , rn) ≥ r(n+2)/2 when n is even. When n is odd, there is exactly one such function, f^((n+1)/2). When n is even, there are infinitely many; in particular, f^(n/2) is the upper-middlemost and f^((n+2)/2) is the lower-middlemost. An aggregation function f depends only on the middlemost interval if f(r1, . . . , rn) = f(s1, . . . , sn) whenever the middlemost interval of the grades r = (r1, . . . , rn) and that of the grades s = (s1, . . . , sn) is the same.

Theorem 7.7 The unique social-grading functions that minimize the probability of cheating are the middlemost that depend only on the middlemost interval.

When f is the max or the min order function, or the average function, the probability of cheating is maximized: Ch(f) = 1. When f is a middlemost order function, Ch(f) ≈ 1/2. In this sense, the middlemost cut cheating by half. The unique meaningful social-ranking functions are the lexi-order functions, each a sequence of all n order functions that determines the final ranking of the candidates. Which among the n! of them minimize cheating? To determine the ranking between any two candidates, the first order function decides, unless there is a tie; in which case the second order function decides, unless the first two are tied; in which case the third decides, unless the first three are tied; and so on. The need to use each succeeding order function becomes increasingly rare. Accordingly, it is of the first importance to minimize the probability of cheating in the first order function: by Theorem 7.7 this is accomplished by choosing an order function that is in the (first) middlemost interval. It is unique if n is odd and one of two if n is even, namely, f^((n+1)/2) when n is odd and either the upper-middlemost f^(n/2) or the lower-middlemost f^((n+2)/2) when n is even.
Given that choice, there are n − 1 order functions left to choose from, and to minimize the probability of cheating one must once again take a middlemost of those that remain: it is either unique or one of two. Given the first two choices, there are n − 2 to choose from, a middlemost must again be taken, and so on iteratively. To see this more clearly, consider a finite language of number grades going from a high of 10 to a low of 0 and a candidate who receives the seven grades {10, 9, 7, 6, 4, 3, 2}. The first order function of a lexi-order function that minimizes the chance of cheating is the middlemost, in this case with value 6. The second that minimizes the chance of cheating is either the upper- or the lower-middlemost, in this case with value 7 or 4. If it is the upper-middlemost (value 7) the next middlemost is unique (with value 4); if it is the lower-middlemost (value 4) the next middlemost is unique (with value 7).
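Always taking the lower-middlemost at each stage is exactly the majority-value construction of section 7.4; for the seven grades above it yields the sequence 6, 4, 7, 3, 9, 2, 10 (the helper name is illustrative):

```python
def lower_middlemost_sequence(grades):
    # Repeatedly remove the lower-middlemost grade: index len(r) // 2
    # of the list sorted from highest to lowest.
    r = sorted(grades, reverse=True)
    out = []
    while r:
        out.append(r.pop(len(r) // 2))
    return out

assert lower_middlemost_sequence([10, 9, 7, 6, 4, 3, 2]) == [6, 4, 7, 3, 9, 2, 10]
```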


Consider, by way of a practical illustration, how the judges might try to manipulate the outcome to obtain what they believe is a better order-of-finish by falsifying their grades in the skating competition (see tables 1, 2 and 5). Assume the grades they gave are honest, and that their utility functions over the order-of-finish are lexicographic: what matters most to each judge is the winner, next the second-place skater, and so on. The effective possible manipulations of the judges are:

• J1 would like Savoie in 2nd place, Li in 3rd. He gave Savoie (with majority-grade 10.8) an 11.1: raising Savoie's grade accomplishes nothing. He gave Li (with majority-grade 10.9) a 10.8: lowering Li's grade accomplishes nothing. J1 would also like Weiss in 4th place, Honda in 5th. He cannot lower Honda below anyone. He can place Weiss in 4th place by increasing his grade to 10.7; but if he increased it to 10.8, Weiss would leap ahead of Savoie, not at all his intention.

• J2 would like to raise Honda and Weiss above Savoie. He can do nothing to raise either Honda or Weiss; but he can lower Savoie below them by decreasing his grade to 10.6.

• J3 agrees with J1: she would like Savoie in 2nd place, Li in 3rd. Raising Savoie's grade and lowering Li's does not reverse their order. Indeed, even in collusion J1 and J3 could not together invert the order of Li and Savoie.

• J4 would like to push Honda up to 2nd place, Li and Savoie down to 3rd and 4th. Increasing Honda's grade accomplishes nothing; nor does decreasing Li's. By decreasing Savoie's to 10.7 she can place Honda in 3rd and Savoie 4th; but if she decreased it to 10.6, Savoie would vault down to 5th.

• J5 would like Tamura in 2nd place, not 6th, but he cannot raise Tamura above any other skater nor can he lower any other skater below Tamura. The best he can do is to raise Honda to 3rd place by assigning him 10.9.
• J6 faces a situation similar to J5's, though she can lower Honda and Weiss below Tamura, her 2nd-place skater, thus putting Tamura in 4th place. In fact, acting together J5 and J6 can do no more than place Tamura above Honda and Weiss.

• J7 would like to push Weiss up to 2nd place and Savoie down to last place. He can do nothing to change the standings.

• J8 would like to put Tamura in 3rd place ahead of Savoie and invert the positions of Honda and Weiss. He can accomplish the first wish by increasing Tamura's grade to 10.9, but can do nothing about the second.

• J9 would like to put Honda in 6th place. The best she can do is to put him in 5th by decreasing his grade to 10.6.


All judges are content with Eldredge's 1st place. None can change Li's 2nd place; the only effective manipulations concern skaters in 3rd place or below. Two judges can do nothing (J3, J7); one can realize his preferred order-of-finish by moving his candidate for 5th place from 3rd place to 5th place (J2); four can invert the order of two consecutive skaters in the order-of-finish (J1, J4, J5, J9); one can move her candidate for 2nd place from 6th place to 4th place (J6); and one can move his candidate for 3rd place from 6th place to 3rd place (J8). This comparison with point-summing assumes that judges care only about the order-of-finish, which is almost certainly false, for they are likely to give importance to the absolute final scores of the skaters, if not other considerations as well. Proven in theory, and confirmed in practice: majority judgment is much better than point-summing at resisting manipulation, and so also at eliciting honesty.

7.5.4 Majority and Consensus

Thus there are some 2^(n/2) lexi-order functions that minimize the chance of cheating. Which among them should be chosen?

The basic idea—a candidate’s majority-grade—is firmly based on the majority’s will: it is the highest grade α that commands an absolute majority in answer to the question: “Does this candidate merit at least an α?” Moreover, the unique social grading functions that assign a candidate the final grade α whenever a majority of voters assign her α are the middlemost aggregation functions. And when there are many voters and a language of relatively few grades, the two middlemost order functions will (almost always) take a single value, the majority-grade.

Another basic collective decision idea—a kind of “unanimity”—also singles out the majority-grade f^maj among the social grading functions. A social grading function respects consensus when, if all of A’s grades belong to the middlemost interval of B’s grades, A’s final grade is not below B’s final grade. The rationale is evident: when a jury is more united on the grade of one alternative than on that of another, the stronger consensus must be respected by the award of a final grade no lower than the other’s. Recall that the majority-grade f^maj is the lower-middlemost order function.

Theorem 7.8 The majority-grade f^maj is the unique middlemost social grading function that respects consensus.

A similar concept singles out the majority-ranking ≻^maj among the social ranking functions. Consider an ordered set of input grades r_1 ≥ · · · ≥ r_n. The 1st-middlemost interval is the middlemost interval previously defined. The 2nd-middlemost interval is the middlemost interval when the defining grades of the 1st-middlemost interval are ignored. The kth-middlemost interval is the middlemost interval when the defining grades of the previous middlemost intervals are ignored. For example, when the set of grades is {10, 9, 7, 6, 4, 3, 2}, the 1st-middlemost interval is [6, 6], the 2nd is [7, 4], the 3rd is [9, 3], and the 4th is [10, 2].

Suppose the grades of A and B are r^A = (r^A_1, . . . , r^A_n) and r^B = (r^B_1, . . . , r^B_n). A social ranking function (SRF) is middlemost if A ≻_S B depends only on the set of grades that belong to the first of the kth-middlemost intervals where they differ. For example, if A’s grades are those of the example just given and B’s are {10, 10, 7, 6, 4, 3, 1}, then the first interval where they differ is the 3rd: A’s is [9, 3] and B’s is [10, 3]. This is a natural extension of the idea of a middlemost social grading function, which depends only on the middlemost interval.

Suppose the first of the jth-middlemost intervals where A’s and B’s grades differ is the kth. A social ranking function rewards consensus when, if all of A’s grades strictly belong to the kth-middlemost interval of B’s grades, A is ranked above B: A ≻_S B. Thus A is ranked above B in the example just given by an SRF that rewards consensus. This is a natural extension of the idea of respecting consensus for a social grading function.

Theorem 7.9 The majority-ranking ≻^maj is the unique middlemost, choice-monotone social ranking function that rewards consensus.

The choice of the lower-middlemost order function for ranking and electing is thus the consequence of seeking consensus.
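The two characterizations of the majority-grade, and the peeling construction of the kth-middlemost intervals, are both short computations. A sketch in Python, checking that the lower-middlemost order function agrees with the “highest grade commanding an absolute majority” description and reproducing the worked example:

```python
def lower_middlemost(grades):
    # With grades sorted from highest to lowest, position floor(n/2) holds
    # the median for odd n and the lower of the two middle grades for even n.
    s = sorted(grades, reverse=True)
    return s[len(s) // 2]

def majority_grade(grades):
    # Highest grade alpha such that an absolute majority answers "yes" to
    # "does this candidate merit at least an alpha?".
    n = len(grades)
    return max(a for a in grades if sum(g >= a for g in grades) > n / 2)

def middlemost_intervals(grades):
    # 1st, 2nd, ... middlemost intervals as (upper, lower) pairs, obtained
    # by peeling the defining grades away from the middle outward.
    s = sorted(grades)
    i, j = (len(s) - 1) // 2, len(s) // 2
    intervals = []
    if i == j:                                # odd n: the 1st interval
        intervals.append((s[i], s[i]))        # is [median, median]
        i, j = i - 1, j + 1
    while i >= 0:
        intervals.append((s[j], s[i]))
        i, j = i - 1, j + 1
    return intervals

A = [10, 9, 7, 6, 4, 3, 2]
B = [10, 10, 7, 6, 4, 3, 1]
print(lower_middlemost(A), majority_grade(A))  # -> 6 6
print(middlemost_intervals(A))  # -> [(6, 6), (7, 4), (9, 3), (10, 2)]
print(middlemost_intervals(B))  # -> [(6, 6), (7, 4), (10, 3), (10, 1)]
```

The first interval where A and B differ is indeed the 3rd: [9, 3] against [10, 3].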

7.5.5 Equilibria and Condorcet Consistency

Throughout it is assumed that there are n ≥ 3 voters or players. Each voter of the game chooses a strategy t in a set Θ (the same for all players). The strategies depend on the underlying model: a strategy may be a rank-order of the candidates, grades of a common language assigned to the candidates, points assigned to the candidates, or indeed an element of any abstract set. A strategy-profile is (t1, . . . , tn) ∈ Θ^n, where ti is the strategy of player i. A mechanism is a function that associates to each strategy-profile a winning candidate. Each voter or player i has a utility function ui that depends exclusively on the identity of the winner.

A candidate X is a Nash-equilibrium-winner for a given method of election if there exists a strategy-profile for which X is the winner and, for any other candidate Y and any voter i with utility ui(Y) > ui(X), i cannot make Y the winner by unilaterally changing his strategy.

A method of election admits no veto-power (Maskin 1999) if for any candidate X there exists a strategy t ∈ Θ that, when used by n − 1 voters, assures the election of X.

Theorem 7.10 (A folk theorem) For any method that admits no veto-power, every candidate is a Nash-equilibrium-winner.

Proof. Take any candidate X, and suppose all n voters use the strategy t that permits any n − 1 of them to elect X. If any voter deviates, the remaining n − 1 still elect X.

Thus the concept of a Nash-equilibrium-winner—with no further refinement—is, as is well known, of little use. This result has given rise to a large literature that seeks refined concepts of equilibria. For the most part the results are negative: many equilibria remain, and the different concepts make it difficult if not impossible to compare the relative merits of the different results. A focus of recent interest is the question: when is there a “strong-equilibrium-winner” (Aumann 1959), meaning a winner such that no coalition of voters can deviate from their strategies and thereby elect a candidate they all prefer (Sertel and Sanver 2004)? This concept is particularly germane to elections because voters talk among themselves (orally and via the net), belong to political parties (which sometimes give careful instructions for exactly how to vote, as in Australia), and have access to large amounts of information (much of it common), including repeated opinion polls, so sets of voters may often adopt the same strategies. The central fact is that when any reasonable method of election of the traditional or the new model is formulated as a game—with the inputs the strategies and the voters’ utilities devoid of animal spirits—only the Condorcet-winner can be a strong-equilibrium outcome.

A method of election is majoritarian if for any candidate X and any strict majority of voters there exists a common strategy t ∈ Θ for each of them such that, whatever the strategies of the others, X is elected.
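That majority judgment is majoritarian can be seen concretely: a strict majority that assigns X the top grade and every rival the bottom grade pins the majority-grades whatever the minority does. A minimal brute-force check in Python, on an illustrative instance (5 voters, 3 candidates and the coarse grade language {0, 5, 10} are hypothetical choices, not from the text):

```python
from itertools import product

def majority_grade(grades):
    # lower-middlemost grade
    s = sorted(grades, reverse=True)
    return s[len(s) // 2]

def winner(profile):
    # Highest majority-grade wins; ties broken alphabetically
    # (the tie-break never decides anything in the check below).
    return min(profile, key=lambda c: (-majority_grade(profile[c]), c))

GRADES = [0, 5, 10]  # a coarse illustrative grade language

# A strict majority (3 voters of 5) gives X the top grade and Y, Z the
# bottom grade; the 2-voter minority grades all three candidates freely.
always_elects_X = all(
    winner({"X": [10, 10, 10, x1, x2],
            "Y": [0, 0, 0, y1, y2],
            "Z": [0, 0, 0, z1, z2]}) == "X"
    for x1, x2, y1, y2, z1, z2 in product(GRADES, repeat=6))
print(always_elects_X)  # -> True
```

X’s majority-grade is 10 and every rival’s is 0 no matter what the minority does, so X is always elected.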
A candidate X is a strong-equilibrium-winner for a given method of election if there exists a strategy-profile for which X is the winner and, for any other candidate Y and any coalition of voters i ∈ S with utilities satisfying ui(Y) > ui(X), the voters of S cannot make Y the winner by together changing their strategies. A candidate C is a Condorcet-winner if there is no candidate X strictly preferred to C by a majority of the voters (i.e., their utilities satisfy ui(X) > ui(C)).

Theorem 7.11 For any majoritarian method, a candidate is a strong-equilibrium-winner if and only if the candidate is a Condorcet-winner.

A method of election is weakly-majoritarian if for any candidate X and any strict majority of voters K there exists a strategy-profile tK ∈ Θ^K for the players of K such that, whatever the strategies of the others, X is elected.

Theorem 7.12 Only a weakly-majoritarian method can always implement the Condorcet-winner as a strong-equilibrium-winner.

Condorcet’s method, first- and two-past-the-post, approval voting, single transferable vote, majority judgment and point-summing are all majoritarian methods, and so weakly-majoritarian. But Borda’s is not, as the following theorem shows.


Chapter 7. Social Choice: New Model and Method

Theorem 7.13 Borda’s method is not weakly-majoritarian.

Thus Borda’s method cannot always implement a Condorcet-winner as a strong-equilibrium-winner. More is true. A method of election is best-response-majoritarian if for any candidate X and any strategy of a minority, the majority always has a best-response strategy that elects X. Any reasonable method—including Borda’s and sum-scoring methods—is a member of this class!

Theorem 7.14 If a best-response-majoritarian method has a strong-equilibrium-winner, that candidate must be the Condorcet-winner.

Thus any reasonable method can elect only the Condorcet-winner as a strong-equilibrium-winner. But for the Condorcet-winner to be elected, the method must be weakly-majoritarian. This excludes Borda’s method.
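Theorem 7.13 can be checked by brute force on a small instance. The Python sketch below takes 5 voters and 3 candidates, requires a strict Borda win (i.e., ties are resolved against X, an adversarial but legitimate tie-break), and confirms that no strategy-profile of a 3-voter majority guarantees X’s election; the instance is illustrative, not the proof of the theorem:

```python
from itertools import permutations, product

CANDIDATES = "XYZ"
RANKINGS = list(permutations(CANDIDATES))     # the 6 strict rankings

def borda_scores(profile):
    # 2 points for a 1st place, 1 for a 2nd, 0 for a 3rd
    scores = dict.fromkeys(CANDIDATES, 0)
    for ranking in profile:
        for position, candidate in enumerate(ranking):
            scores[candidate] += 2 - position
    return scores

def strictly_wins(profile, x):
    s = borda_scores(profile)
    return all(s[x] > s[c] for c in CANDIDATES if c != x)

def majority_can_force(x="X"):
    # Does some profile of the 3-voter majority elect x strictly,
    # whatever the 2-voter minority does?
    return any(
        all(strictly_wins(t_majority + t_minority, x)
            for t_minority in product(RANKINGS, repeat=2))
        for t_majority in product(RANKINGS, repeat=3))

print(majority_can_force())  # -> False: Borda is not weakly-majoritarian
```

The reason is visible by hand: the majority can give X at most 6 points while leaving at least 2 of its remaining points on some rival, whom the minority can then boost by 4, producing at least a tie.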

7.5.6 Honest Equilibria

Inherent in the last section is that—for most methods—when the utilities of voters depend only on the identity of the winner, there is a huge number of strong-equilibrium strategy-profiles that elect the Condorcet-winner. In practice, the most likely of these equilibria is the one in which the voters express themselves as honestly as possible. Are there equilibria in which all or most of the voters express themselves honestly? This is important because the outcomes of elections should come as close as possible to reflecting the true opinions of the voters.

Theorem 7.15 For any middlemost aggregation mechanism there exists a strong-equilibrium that elects the Condorcet-winner C with his true middlemost grade α when voters have no indifferences. Moreover, every candidate is assigned a majority of honest grades.

Proof. Let α be the true middlemost grade of a Condorcet-winner C. To begin, note that α is necessarily above the minimum grade α_min. For suppose not: then a majority of C’s grades are α_min, and the no-indifference assumption implies that any other candidate B must be preferred by a majority to C, contradicting the fact that C is the Condorcet-winner.

Construct a strategy-profile as follows.

• All voters who assign C at least α give their honest grades; all others give C the grade α.

• Any voter who assigns a grade α or above to another candidate B and who prefers C to B gives B a grade lower than α (barely below will do); otherwise, voters give candidates other than C their honest grades.

With this strategy-profile, C’s middlemost grade is α, the true one. Take B to be any other candidate and let S be the coalition of voters that prefers B to C. S is necessarily a strict minority of voters. The strategy-profile implies that all voters outside S—a strict majority—give B grades below α, so B’s middlemost grade must be below α. Thus C is the winner.

The coalition S—the only voters who wish to make B the winner instead of C—cannot increase B’s middlemost grade, because the majority outside S (which prefers C to B) gives grades below α to B; nor can it decrease C’s middlemost grade, because that same majority gives at least α to C. So C is a strong-equilibrium-winner.

The outcome of this equilibrium enjoys very desirable properties. First, the winner C is elected with his true middlemost grade α. Second, the only voters who cheat by increasing the grades they give C are those who grade him below α—a strict minority. Third, the only voters who cheat by decreasing another candidate B’s grades are those who grade C strictly above α—a strict minority. Thus the majority of the grades given to any candidate are the true grades of the voters.

The strategy-profile in the proof may be justified as follows: first, voters are motivated by the wish to elect a certain candidate; second, they are motivated by the wish to express themselves as honestly as possible, or to give candidates final grades as close as possible to their own assessments. Other refinements of Nash equilibria lead to the same conclusion.
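The construction in the proof is effective and can be checked on a toy instance. The Python sketch below uses a hypothetical 3-voter, 2-candidate jury with a 0-10 grade language, assumes each voter prefers the candidate to whom she gives the higher honest grade (so C is the Condorcet-winner here), builds the profile of the proof, and verifies that the lone B-supporter cannot profit from any deviation:

```python
def majority_grade(grades):
    # lower-middlemost grade
    s = sorted(grades, reverse=True)
    return s[len(s) // 2]

# Hypothetical honest grades: voters 0 and 2 prefer C, voter 1 prefers B.
honest_C = [8, 7, 6]
honest_B = [5, 9, 2]
alpha = majority_grade(honest_C)             # C's true majority-grade: 7

# Profile of the proof: voters grading C below alpha raise it to alpha;
# C's supporters would also lower any grade of B at or above alpha to
# just below alpha (no such grade occurs here); the rest stays honest.
eq_C = [g if g >= alpha else alpha for g in honest_C]    # [8, 7, 7]
eq_B = list(honest_B)                                    # unchanged here

c_wins = majority_grade(eq_C) > majority_grade(eq_B)
print(alpha, c_wins)    # -> 7 True: C is elected with his true grade

# The coalition preferring B is {voter 1}; no deviation of hers helps,
# because the other two voters pin both majority-grades.
deviation_helps = any(
    majority_grade([eq_B[0], b, eq_B[2]]) >=
    majority_grade([eq_C[0], c, eq_C[2]])
    for c in range(11) for b in range(11))
print(deviation_helps)  # -> False: a strong equilibrium
```

Note that a majority of each candidate’s grades are honest, as the theorem asserts: only voter 2’s grade for C is altered.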

Bibliography

[1] Arrow K.J. (1951). Social Choice and Individual Values. New Haven, CT: Yale University Press.
[2] Aumann R.J. (1959). Acceptable Points in General Cooperative n-Person Games. In Contributions to the Theory of Games IV, Annals of Mathematics Studies 40, ed. R.D. Luce and A.W. Tucker, 287-324. Princeton, NJ: Princeton University Press.
[3] Balinski M., A. Jennings and R. Laraki (2009). Monotonic Incompatibility Between Electing and Ranking. Economics Letters, 105, 145-147.
[4] Balinski M. and R. Laraki (2010a). Majority Judgment: Measuring, Ranking, and Electing. Cambridge, MA: MIT Press.
[5] Balinski M. and R. Laraki (2010b). Election by Majority Judgment: Experimental Evidence. In In Situ and Laboratory Experiments on Electoral Law Reform: French Presidential Elections, ed. B. Dolez, B. Grofman and A. Laurent. Berlin: Springer.
[6] Balinski M. and R. Laraki (2010c). Judge: Don’t Vote! Cahier du Laboratoire d’Econométrie de l’Ecole Polytechnique.
[7] Balinski M. and R. Laraki (2007). A Theory of Measuring, Electing and Ranking. Proceedings of the National Academy of Sciences, U.S.A., 104, 8720-8725.
[8] Borda, Jean-Charles le Chevalier de (1784). Mémoire sur les Élections au Scrutin. Histoire de l’Académie royale des sciences, 657-665.
[9] Condorcet, Jean Antoine Caritat le Marquis de (1785). Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: l’Imprimerie royale.
[10] Copeland A.H. (1951). A ‘Reasonable’ Social Welfare Function. Seminar on Mathematics in the Social Sciences, University of Michigan.
[11] Dasgupta P. and E. Maskin (2008). On the Robustness of Majority Rule. Journal of the European Economic Association, 6, 949-973.
[12] Dasgupta P. and E. Maskin (2004). The Fairest Vote of All. Scientific American, March.
[13] D’Aspremont C. and L. Gevers (1977). Equity and the Informational Basis of Collective Choice. Review of Economic Studies, 44, 199-209.
[14] Four Continents Championships 2001: Men - Short Program. Found July 19, 2010 at: http://icecalc.org/events/fc2001/results/SEG089.HTM
[15] Galton F. (1907). One Vote, One Value. Nature, 75, 414.
[16] Gibbard A. (1973). Manipulation of Voting Schemes: A General Result. Econometrica, 41, 587-601.
[17] Hägele G. and F. Pukelsheim (2001). Llull’s Writings on Electoral Systems. Studia Lulliana, 41, 3-38.
[18] Hägele G. and F. Pukelsheim (2008). The Electoral Systems of Nicolas of Cusa in the Catholic Concordance and Beyond. In The Church, the Councils and Reform: Lessons from the Fifteenth Century, ed. G. Christianson, T.M. Izbicki and C.M. Bellitto, 229-249. Washington, D.C.: Catholic University of America Press.
[19] Hammond P. (1976). Equity, Arrow’s Conditions, and Rawls’ Difference Principle. Econometrica, 44, 793-804.
[20] Krantz D.H., R.D. Luce, P. Suppes and A. Tversky (1971). Foundations of Measurement, Vol. I. New York: Academic Press.
[21] Kurrild-Klitgaard P. (1999). An Empirical Example of the Condorcet Paradox of Voting in a Large Electorate. Public Choice, 107, 1231-1244.
[22] Laplace, Pierre-Simon le Marquis de (1820). Théorie Analytique des Probabilités, 3rd edition. Paris: Mme Ve Courcier, Imprimeur-Libraire pour les Mathématiques.

[23] Miller G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review, 63, 81-97.
[24] Satterthwaite M.A. (1975). Strategy-Proofness and Arrow’s Conditions: Existence and Correspondence Theorems for Voting Procedures and Social Welfare Functions. Journal of Economic Theory, 10, 187-217.
[25] Sertel M.R. and M.R. Sanver (2004). Strong Equilibrium Outcomes of Voting Games are the Generalized Condorcet Winners. Social Choice and Welfare, 22, 331-347.
[26] Zitzewitz E. (2006). Nationalism in Winter Sports Judging and its Lessons for Organizational Decision Making. Journal of Economics and Management Strategy, 15, 67-100.
