Occupancy problem with infinitely many boxes
Olivier Durieu

We propose to study a multinomial occupancy scheme in which balls are thrown independently at a fixed infinite sequence of boxes, with probability $p_j$ of hitting the $j$-th box. This model was originally investigated by Bahadur (1960), and the case of regularly varying frequencies $(p_j)$ was investigated by Karlin (1967). This toy model has concrete applications. In biology or ecology, boxes can be replaced by species in a population of plants or animals, and the model is used to estimate the number of species and to obtain extinction rates; see Bunge and Fitzpatrick (1993). It is also used in database optimization, learning methods, etc.

We denote by $X_{n,j}$ the number of balls in the $j$-th box at time $n$, that is, after $n$ balls have been thrown, and we consider the random variables $K_n = \#\{j \mid X_{n,j} > 0\}$, the number of non-empty boxes, and $K_{n,r} = \#\{j \mid X_{n,j} = r\}$, the number of boxes containing exactly $r$ balls. The aim of the internship is to derive asymptotic properties of the model by establishing laws of large numbers and central limit theorems for these quantities. This requires moment estimates, for which the Poissonization and de-Poissonization technique is useful. A good survey on the subject is the paper by Gnedin, Hansen and Pitman (2007).
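To make the quantities above concrete, here is a minimal simulation sketch. It is not part of the proposal. It assumes, purely for illustration, geometric frequencies p_j = (1 - q) q^(j-1), chosen because they can be sampled exactly without truncating the infinite sequence of boxes. All function names are hypothetical. The sketch computes K_n and K_{n,r} for one run and compares K_n with the standard mean formula E[K_n] = sum_j (1 - (1 - p_j)^n) and with its Poissonized counterpart sum_j (1 - exp(-t p_j)). The latter holds because, when the number of balls is Poisson with mean t, the box counts are independent Poisson variables with means t p_j.

```python
import math
import random
from collections import Counter


def box_probability(j, q):
    """Assumed (illustrative) geometric frequency p_j = (1 - q) * q**(j - 1), j >= 1."""
    return (1.0 - q) * q ** (j - 1)


def throw_balls(n, q=0.5, seed=0):
    """Throw n independent balls; return a Counter mapping box j -> number of balls."""
    rng = random.Random(seed)
    counts = Counter()
    log_q = math.log(q)
    for _ in range(n):
        u = 1.0 - rng.random()            # uniform on (0, 1], so log(u) is defined
        j = int(math.log(u) / log_q) + 1  # inverse-transform sample with P(j = k) = p_k
        counts[j] += 1
    return counts


def occupancy(counts):
    """Return (K_n, {r: K_{n,r}}) for the given box counts."""
    k_n = len(counts)                         # number of non-empty boxes
    k_nr = dict(Counter(counts.values()))     # number of boxes holding exactly r balls
    return k_n, k_nr


def _sum_over_boxes(term, q, tol=1e-15):
    """Sum term(p_j) over j = 1, 2, ... until the terms fall below tol."""
    total, j = 0.0, 1
    while True:
        value = term(box_probability(j, q))
        total += value
        if value < tol:
            return total
        j += 1


def expected_k(n, q=0.5):
    """E[K_n] = sum_j (1 - (1 - p_j)**n) for a fixed number n of balls."""
    return _sum_over_boxes(lambda p: -math.expm1(n * math.log1p(-p)), q)


def expected_k_poisson(t, q=0.5):
    """Poissonized mean: sum_j (1 - exp(-t * p_j)) for a Poisson(t) number of balls."""
    return _sum_over_boxes(lambda p: -math.expm1(-t * p), q)


if __name__ == "__main__":
    n = 10_000
    k_n, k_nr = occupancy(throw_balls(n))
    print("K_n (one run):     ", k_n)
    print("K_{n,1} (one run): ", k_nr.get(1, 0))
    print("E[K_n]:            ", expected_k(n))
    print("Poissonized mean:  ", expected_k_poisson(n))
```

With geometric frequencies K_n grows only logarithmically in n. The regularly varying case studied by Karlin would need a different sampler, for example a power law truncated at a large number of boxes, which is a further approximation.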
References
[1] Raghu R. Bahadur, On the number of distinct values in a large sample from an infinite discrete distribution, Proc. Nat. Inst. Sci. India Part A 26 (1960), no. supplement II, 67–75.
[2] J. Bunge and M. Fitzpatrick, Estimating the number of species: a review, J. Am. Stat. Ass. 88 (1993), no. 421, 364–373.
[3] Alexander Gnedin, Ben Hansen, and Jim Pitman, Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws, Probab. Surv. 4 (2007), 146–171.
[4] Samuel Karlin, Central limit theorems for certain infinite urn schemes, J. Math. Mech. 17 (1967), 373–401.