A Brief Introduction to Large Deviations Theory

Gilles Wainrib
Abstract

In this paper we introduce the main concepts of large deviations theory. We state some of the main theorems and illustrate them with several examples, from Cramér's theorem for sums of independent random variables to the Freidlin-Wentzell theory of random perturbations of dynamical systems.
Gilles Wainrib: 1 CREA (Ecole polytechnique - CNRS), 2 IJM (CNRS - Paris 7 - Paris 6), 3 LPMA (Paris 6 - Paris 7 - CNRS), e-mail: [email protected]

1 Introduction

Large deviations theory is concerned with an asymptotic description of the fluctuations of a system around its most probable behavior. The first example of such a description goes back to Boltzmann’s 1877 calculation [4] for a system of independent particles, which established a fundamental link between the notion of entropy and the asymptotic exponential behavior of multinomial probabilities. The entropy of a system at equilibrium measures the number of microscopic configurations leading to a given macroscopic state, and the state of maximum entropy corresponds to the most probable state. Beyond its central place in thermodynamics and statistical physics, entropy has played a major role in many areas of science. In the life sciences, entropy is an important concept, from evolution theory to protein unfolding, self-assembly and molecular motors, not to mention its links with information theory, which is widely applied in genetics and in neuroscience. Sharing the perspective of [10], large deviations theory may be viewed as a mathematical investigation of the concept of entropy. Describing fluctuations beyond the Central Limit Theorem (CLT), this theory provides exponential estimates for the analysis of rare events, a field of growing interest in many applied sciences.

Let us give an illustration with an elementary example. If one throws n coins, then for large n the proportion of heads will be close to 1/2 with high probability.
This is the law of large numbers. The CLT states that the typical fluctuations, of order $1/\sqrt{n}$ around this value, are asymptotically normally distributed. This result can be used, for instance, to evaluate the probability of obtaining between 480 and 510 heads when n = 1000. However, if one wants to evaluate the probability of obtaining over 700 heads, a probability that is exponentially small in n, then it is necessary to use the information contained in the higher moments of the random variable "coin toss", whereas the CLT only uses the first two moments. Large deviations theory answers this question through an appropriate transform of the moment generating function (exponential moments), which is related to the concept of relative entropy.

Historically, the first mathematical result describing the large fluctuations of a sum of independent random variables around its mean is due to Cramér [6] (section 2). The mathematical counterpart of Boltzmann’s calculation is Sanov’s theorem [17] (section 2) for the empirical measure of independent random variables. The general theoretical framework (section 3) for this type of asymptotic result was developed afterward, in particular by Stroock and Varadhan. A key result in this framework is the Gärtner-Ellis theorem (section 3), which generalizes the results of Sanov and Cramér to the case of dependent random variables. Small random perturbations of dynamical systems (section 4) have been investigated by Freidlin and Wentzell [13] within the framework of large deviations for sample paths, and have many applications, for example to the problem of exit from a domain.

This paper is not intended to be a detailed and precise account of large deviations results, but rather an introductory guide, and we encourage the curious reader to consult the standard mathematical textbooks [8, 7] on this topic.
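As a rough numerical illustration of the coin example above, the following Python sketch compares the exact binomial probability of obtaining more than 700 heads in 1000 fair tosses with the Gaussian estimate suggested by the CLT. The function names are illustrative choices and do not come from the paper; only the standard library is assumed, and all quantities are computed on the logarithmic scale to avoid numerical underflow.

```python
import math

def log_binomial_tail(n, k, p=0.5):
    """Exact ln P[S_n > k] for S_n ~ Binomial(n, p), via a log-sum-exp."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log(1 - p)
        for j in range(k + 1, n + 1)
    ]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def log_gaussian_tail(n, k, p=0.5):
    """ln of the normal (CLT) approximation to P[S_n > k], no continuity correction."""
    z = (k + 1 - n * p) / math.sqrt(n * p * (1 - p))
    return math.log(0.5 * math.erfc(z / math.sqrt(2)))

n, k = 1000, 700
exact = log_binomial_tail(n, k)
gauss = log_gaussian_tail(n, k)
print(f"exact    ln P[S_n > {k}] = {exact:.2f}")
print(f"Gaussian ln P[S_n > {k}] = {gauss:.2f}")
print(f"Gaussian / exact probability ratio = {math.exp(gauss - exact):.1f}")
```

In this regime the Gaussian estimate overstates the tail probability, and the relative error grows with n when the threshold is kept proportional to n. This is the gap that the exponential estimates of large deviations theory are meant to close.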
2 Sum of independent random variables

Consider a sequence of independent and identically distributed (i.i.d.) real random variables $\xi_1, \xi_2, \ldots$. If $m := E(\xi_1) < \infty$, then by the strong Law of Large Numbers (LLN), the empirical average

$$A_n = \frac{1}{n} \sum_{i=1}^{n} \xi_i \tag{1}$$

converges almost surely to $m$ when $n \to \infty$. When $n$ is large but finite, it is of interest to characterize the fluctuations of $A_n$ around $m$. A first answer to this question is given by the Central Limit Theorem (CLT), and concerns typical fluctuations of order $O(1/\sqrt{n})$ around $m$. More precisely, if $\sigma^2 := \mathrm{Var}(\xi_1) < \infty$, then $\sqrt{n}(A_n - m)$ converges in law to a Gaussian random variable $N(0, \sigma^2)$. However, the CLT does not properly describe fluctuations larger than $O(1/\sqrt{n})$. From the LLN, we know that for $a > m$, $p_n(a) := P[A_n > a]$ converges to 0 when $n \to \infty$, and we would like to estimate the speed of this convergence as a function of $a$. The event $\{A_n > a\}$ is often called a rare event, since we will see that $p_n(a)$ becomes exponentially small when $n$ is large. A first historical example comes from a problem related to the insurance industry: if $X_i$ is the claim of policy
holder $i$, what is the probability that the total claim exceeds $na$, with $a > m$? That is, the focus is on the tail of the distribution of the total claim. Such a question is crucial, since the insurance company may not be able to refund policy holders above a critical value $na^*$, and $p_n(a^*)$ is then the probability of ruin. Contrary to the CLT, where only the first two moments of $X_1$ characterize the asymptotic rescaled behavior, describing rare events requires exponential moments in order to capture the tail behavior of the distribution. On the exponential scale, rare events contribute significantly on average.

Theorem 2.1 (Cramér [6]) Assume $E\left[e^{\theta X_1}\right] < \infty$ for all $\theta \in \mathbb{R}$ and define

$$\Lambda(\theta) := \ln E\left[e^{\theta X_1}\right] \quad \text{and} \quad I(x) := \sup_{\theta \in \mathbb{R}} \{\theta x - \Lambda(\theta)\}, \tag{2}$$

where $I$ is the Legendre transform of $\Lambda$. Then, for all $a \in ]m, 1]$ and $a' \in [0, m[$:

$$\lim_{n \to \infty} \frac{1}{n} \ln P[A_n > a] = -\inf_{x > a} I(x) \quad \text{and} \quad \lim_{n \to \infty} \frac{1}{n} \ln P[A_n < a'] = -\inf_{x < a'} I(x). \tag{3}$$
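To make the statement of Cramér's theorem concrete, here is a minimal Python sketch for fair coin tosses, that is, assuming $X_1$ is Bernoulli(1/2) (an illustrative choice, not one fixed by the text above). In that case $\Lambda(\theta) = \ln\frac{1 + e^{\theta}}{2}$, and the Legendre transform has the standard closed form $I(x) = x \ln(2x) + (1 - x)\ln(2(1 - x))$, the relative entropy of Bernoulli($x$) with respect to Bernoulli(1/2). The sketch evaluates the supremum numerically, checks it against the closed form, and compares it with $-\frac{1}{n}\ln P[A_n > a]$ computed exactly for increasing $n$. All function names are hypothetical.

```python
import math

def log_mgf(theta, p=0.5):
    """Lambda(theta) = ln E[exp(theta * X_1)] for X_1 ~ Bernoulli(p)."""
    return math.log(1 - p + p * math.exp(theta))

def rate_numeric(x, p=0.5, lo=-50.0, hi=50.0, iters=200):
    """I(x) = sup_theta {theta * x - Lambda(theta)}, by ternary search.

    The objective is concave in theta, so ternary search on a bounded
    interval is enough for 0 < x < 1 (the maximizer is finite there).
    """
    f = lambda t: t * x - log_mgf(t, p)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2)

def rate_closed_form(x, p=0.5):
    """Relative entropy of Bernoulli(x) w.r.t. Bernoulli(p), for 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def empirical_rate(n, k, p=0.5):
    """-(1/n) ln P[S_n > k] for S_n ~ Binomial(n, p), computed exactly in logs."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log(1 - p)
        for j in range(k + 1, n + 1)
    ]
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms))) / n

a = 0.7
print(f"I({a}) numeric     = {rate_numeric(a):.6f}")
print(f"I({a}) closed form = {rate_closed_form(a):.6f}")
for n in (100, 1000, 10000):
    k = 7 * n // 10  # the event {A_n > 0.7} is {S_n > k}
    print(f"n = {n:5d}: -(1/n) ln P[A_n > {a}] = {empirical_rate(n, k):.6f}")
```

Since $I$ is increasing on $]m, 1]$ in this example, the infimum in (3) over $x > a$ equals $I(a)$, so the printed values should approach $I(0.7)$ from above as $n$ grows. The slow convergence reflects sub-exponential prefactors, which the logarithmic limit in (3) does not capture.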