Assignment 1

Can Yang
Department of Mathematics, Hong Kong Baptist University
September 25, 2014

1. In our lecture note, "Fisher proposed the 'maximum likelihood estimator' (MLE). Then he proved that the MLE was always consistent and that (under a few assumptions known as 'regularity conditions') it was the most efficient of all statistics. Furthermore, he proved that, if the MLE is biased, the bias can be calculated and subtracted from the MLE." Give an example in which the MLE is biased but the bias can be corrected.

2. Consider the gamma function

    Γ(x) ≡ ∫_0^∞ u^(x−1) e^(−u) du,    (1)

which appears widely in probability density functions, such as the Beta distribution, the Student t distribution, and the χ² distribution.

(a) Using integration by parts, prove the relation

    Γ(x + 1) = xΓ(x).    (2)

Also show that Γ(1) = 1 and hence that Γ(x + 1) = x! when x is an integer.

(b) Calculate the expectation of the random variable 1/x, where x has a chi-square distribution χ²_ν with ν degrees of freedom. Note that this property is used in the derivation of the James-Stein estimator.

(c) Use R or Matlab to generate N = 100 samples from χ²_ν, i.e., {x_1, …, x_100}, and obtain the mean of the 1/x_i. Repeat this process 1000 times, generate a boxplot of the means, and verify (b).

Hint: The probability density function (pdf) of the χ²_ν distribution is

    (1 / (2^(ν/2) Γ(ν/2))) x^(ν/2−1) e^(−x/2).    (3)

So you can write this expectation as

    ∫_0^∞ (1/x) (1 / (2^(ν/2) Γ(ν/2))) x^(ν/2−1) e^(−x/2) dx = ∫_0^∞ (1 / (2^(ν/2) Γ(ν/2))) x^(ν/2−2) e^(−x/2) dx.    (4)

After that, you can rewrite the integrand so that it is the pdf of a χ²_(ν−2) random variable, which will then integrate to 1. The leftover constant factor will be the expected value. During the rewriting process, you may need (2).
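As a sanity check on Exercise 2(b)–(c), the hint above implies E[1/x] = 1/(ν − 2) for ν > 2. The assignment asks for R or Matlab; the following is a minimal Python sketch of the same Monte Carlo idea (the function name `mean_inverse_chisq` is mine), building chi-square draws as sums of squared standard normals:

```python
# Monte Carlo check of E[1/x] for x ~ chi-square with nu degrees of freedom.
# The hint in Exercise 2 implies E[1/x] = 1/(nu - 2) when nu > 2.
import random

def mean_inverse_chisq(nu, n_samples, rng):
    # A chi-square(nu) draw is the sum of nu squared standard normals.
    total = 0.0
    for _ in range(n_samples):
        x = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        total += 1.0 / x
    return total / n_samples

rng = random.Random(0)
nu = 10
est = mean_inverse_chisq(nu, 50_000, rng)
print(est, 1.0 / (nu - 2))  # the two numbers should be close
```

Repeating the estimate many times (as part (c) asks) and boxplotting the results would show the sampling spread around 1/(ν − 2).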

3. The James-Stein estimator uses

    (N − 2) / Σ_{i=1}^N z_i²

as an estimate of 1/(1 + σ²), as shown in Exercise 2.

Note that σ², an unknown parameter in the prior distribution, is adaptively estimated from the observed data z = {z_1, …, z_N}; this is an example of so-called "Empirical Bayes" (EB), i.e., methods that assign a prior and estimate it adaptively from the observed data. In this exercise, we are going to compare EB with Maximum A Posteriori (MAP) estimation, in which the prior is fixed rather than adaptively estimated from data. Specifically, use R or Matlab to generate µ = {µ_1, …, µ_N}, in which each µ_i is drawn independently from N(0, σ_0²), where σ_0² = 1. After that, generate z_i | µ_i ~ N(µ_i, 1). Following the James-Stein estimator, you can assign the prior µ_i ~ N(0, σ²) and pretend that the true parameter σ_0² is unknown. At each fixed point (σ² = 0.5, 1, 1.5 and 2), try to obtain the MAP estimate µ̂_MAP and compare these four MAP estimates with the James-Stein estimate using

    err = ∥µ̂ − µ∥² / N,    (5)

where µ̂ ∈ R^N is the estimate and the norm ∥·∥ is the standard Euclidean norm, also known as the ℓ₂ norm. Show your results in the following two cases: (a) N = 10 and (b) N = 100.

Hint: To compare the performance of different methods, you should not draw only one random data set but repeat the experiment many times (recall Fisher's view that the observed data come from some underlying distributions). Specifically, you should draw µ^(l) = {µ_1^(l), …, µ_N^(l)} from N(0, 1), generate the corresponding z^(l), apply the MAP estimators and the JSE to z^(l), and then compute their errors using (5). Repeat this process L times, say, L = 1000, and report the final result using boxplots.

4. A lady said that tea tasted different depending upon whether the tea was poured into the milk or whether the milk was poured into the tea. (a) Under the null hypothesis that the tea tasted the same, what is the p-value if she correctly identified whether the tea was poured into the milk or whether the milk was poured into the tea 8 times out of 10 tests? (b) Write R or Matlab code to generate the null distribution and verify your p-value calculation.
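For Exercise 3, the simulation loop can be sketched as follows. This is a Python stand-in for the requested R/Matlab code (the function name `simulate` and the constants L = 200 are mine), and it assumes the standard closed forms: under the prior N(0, σ²) with unit noise variance, the MAP estimate shrinks z by σ²/(σ² + 1), while the James-Stein estimate uses the data-driven factor 1 − (N − 2)/Σ z_i².

```python
# Sketch comparing MAP under fixed priors N(0, sigma2) with the
# James-Stein estimate, scoring each by err = ||mu_hat - mu||^2 / N (eq. 5).
import random

def simulate(N, sigma2_list, L, rng):
    errs = {s: [] for s in sigma2_list}
    errs['JS'] = []
    for _ in range(L):
        mu = [rng.gauss(0.0, 1.0) for _ in range(N)]    # mu_i ~ N(0, sigma0^2 = 1)
        z = [m + rng.gauss(0.0, 1.0) for m in mu]       # z_i | mu_i ~ N(mu_i, 1)
        s2 = sum(v * v for v in z)
        js = [(1.0 - (N - 2) / s2) * v for v in z]      # James-Stein estimate
        errs['JS'].append(sum((a - b) ** 2 for a, b in zip(js, mu)) / N)
        for sigma2 in sigma2_list:
            shrink = sigma2 / (sigma2 + 1.0)            # posterior mean under N(0, sigma2)
            mhat = [shrink * v for v in z]
            errs[sigma2].append(sum((a - b) ** 2 for a, b in zip(mhat, mu)) / N)
    # average error per method (a boxplot of errs[...] is what the exercise asks for)
    return {k: sum(v) / len(v) for k, v in errs.items()}

result = simulate(N=100, sigma2_list=[0.5, 1.0, 1.5, 2.0], L=200, rng=random.Random(1))
print(result)
```

With σ² = 1 (the true σ_0²) the MAP estimate should do best among the fixed priors, and the JSE should come close without knowing σ_0²; rerunning with N = 10 shows the JSE paying a larger price for estimating σ² from few observations.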
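For Exercise 4, one common reading is that each of the 10 tests is an independent 50/50 guess under the null, giving a Binomial(10, 1/2) null distribution; if the design instead fixed five cups of each kind, a hypergeometric null (Fisher's exact test) would be the alternative reading. A Python sketch of the binomial version, with both the exact tail probability and a simulated null distribution:

```python
# Exact and simulated p-value for 8-or-more correct out of 10 guesses,
# assuming each guess is an independent fair coin flip under the null.
import math
import random

n, k = 10, 8
p_exact = sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n
print(p_exact)  # 56/1024 = 0.0546875

# Monte Carlo version of the same null distribution.
rng = random.Random(0)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if sum(rng.random() < 0.5 for _ in range(n)) >= k)
print(hits / trials)  # should be close to p_exact
```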
(For the real story of the "Lady tasting tea": it was a sunny summer afternoon in Cambridge, England, in the late 1920s. Ronald A. Fisher, the founding father of modern statistical theory, did the experiment, and the lady identified every single one of the cups correctly.)

5. Consider six sequenced DNA fragments from three persons:

    ID 1: ATGCCTA
    ID 1: ATGCTTA
    ID 2: AAGCTTA
    ID 2: AAGCTTA
    ID 3: ATGCTTA
    ID 3: ATGCCTA
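The base-pairing rules behind this exercise (A↔T, G↔C) can be sketched programmatically. A minimal Python illustration (the helper `complement` is mine, and it reads the complement in the same orientation for simplicity; a reverse complement would additionally reverse the string):

```python
# Complementary strands via Watson-Crick base pairing, plus a scan for
# positions where the fragments differ (candidate SNP sites).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    return "".join(PAIR[b] for b in seq)

fragments = ["ATGCCTA", "ATGCTTA", "AAGCTTA", "AAGCTTA", "ATGCTTA", "ATGCCTA"]
for f in fragments:
    print(f, complement(f))

# 0-based positions where more than one base occurs across the fragments
snp_positions = [i for i in range(len(fragments[0]))
                 if len({f[i] for f in fragments}) > 1]
print(snp_positions)
```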

(a) Write down the complementary sequences. (b) Find the SNPs and write down their genotypes. (c) Find the major allele for each SNP.

6. (This is an OPTIONAL exercise on deriving the MLE using properties of matrices, which may help you understand why we need matrices.) Given a data set X = {x_1, …, x_N}^T in which the observations {x_n} are drawn independently from a multivariate Gaussian distribution,

    N(x | µ, Σ) = (1 / ((2π)^(D/2) |Σ|^(1/2))) exp{ −(1/2) (x − µ)^T Σ^(−1) (x − µ) }.    (6)

Show that (a) the MLE of the mean is given by

    µ_ML = (1/N) Σ_{n=1}^N x_n,    (7)

and (b) the MLE of the covariance is given by

    Σ_ML = (1/N) Σ_{n=1}^N (x_n − µ_ML)(x_n − µ_ML)^T.    (8)

Hint: You need to first write down the log likelihood function and maximize it w.r.t. µ and Σ. To show (8), you may need some properties of matrices from "Appendix C. Properties of Matrices" of Bishop, C. (2006), "Pattern Recognition and Machine Learning", such as (C.21), (C.26) and (C.28).

This assignment is due on Oct. 8, 2014.
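The closed forms (7) and (8) can be sanity-checked numerically: holding Σ fixed, the log likelihood is a negative quadratic in µ and is maximized at the sample mean. A small Python sketch with D = 2 (the helper names `sigma_ml` and `log_lik` are mine):

```python
# Check that the sample mean (eq. 7) attains a higher Gaussian log
# likelihood than a perturbed mean, with Sigma set to the scatter
# matrix of eq. (8).
import math
import random

rng = random.Random(0)
N = 500
data = [(rng.gauss(1.0, 2.0), rng.gauss(-1.0, 1.0)) for _ in range(N)]

mu_ml = tuple(sum(x[d] for x in data) / N for d in range(2))  # eq. (7)

def sigma_ml(mu):
    # eq. (8): (1/N) sum (x_n - mu)(x_n - mu)^T, as a 2x2 list of lists
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in data:
        d0, d1 = x[0] - mu[0], x[1] - mu[1]
        s[0][0] += d0 * d0; s[0][1] += d0 * d1
        s[1][0] += d1 * d0; s[1][1] += d1 * d1
    return [[v / N for v in row] for row in s]

def log_lik(mu, S):
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]
    ll = 0.0
    for x in data:
        d0, d1 = x[0] - mu[0], x[1] - mu[1]
        quad = (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
                + d1 * (inv[1][0] * d0 + inv[1][1] * d1))
        ll += -0.5 * (quad + math.log(det) + 2 * math.log(2 * math.pi))
    return ll

S = sigma_ml(mu_ml)
ll_at_mle = log_lik(mu_ml, S)
ll_perturbed = log_lik((mu_ml[0] + 0.3, mu_ml[1]), S)
print(ll_at_mle > ll_perturbed)  # True: the MLE has the higher likelihood
```

This is only a numerical illustration; the exercise itself asks for the analytic derivation via the matrix identities in Bishop's Appendix C.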
