Chapter 2

DJM

30 January 2018

What is this chapter about?

Problems with regression, and in particular, linear regression. A quick overview:

1. The truth is almost never linear.
2. Collinearity can cause difficulties for numerics and interpretation.
3. The estimator depends strongly on the marginal distribution of X.
4. Leaving out important variables is bad.
5. Noisy measurements of variables can be bad, but it may not matter.

Asymptotic notation

• The Taylor series expansion of the mean function µ(x) at some point u:

  µ(x) = µ(u) + (x − u)ᵀ ∂µ(x)/∂x |_{x=u} + O(‖x − u‖²)

• The notation f(x) = O(g(x)) means that there exists a constant C such that f(x)/g(x) < C for all x.

• More intuitively, this notation means that the remainder (all the higher-order terms) is about the size of the squared distance between x and u, or smaller.
• So as long as we are looking at points x near u, a linear approximation to µ(x) = E[Y | X = x] is reasonably accurate.

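This can be checked numerically: if the remainder is O(‖x − u‖²), then halving the distance to u should shrink the linear-approximation error by roughly a factor of four. A minimal Python sketch, using a hypothetical smooth mean function µ(x) = sin(x) (any twice-differentiable function would do; it is not from the text):

```python
import numpy as np

# Hypothetical smooth mean function and its derivative (illustrative choice).
mu = np.sin
dmu = np.cos

u = 1.0
for h in [0.4, 0.2, 0.1, 0.05]:
    x = u + h
    linear = mu(u) + (x - u) * dmu(u)   # first-order Taylor approximation at u
    err = abs(mu(x) - linear)
    # err / h^2 stabilizes near |mu''(u)| / 2, confirming the O(h^2) remainder
    print(f"h = {h:5.2f}   error = {err:.2e}   error/h^2 = {err / h**2:.3f}")
```

Each halving of h cuts the error by about four, exactly as the O(‖x − u‖²) term predicts.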
What is bias?

• We need to be more specific about what we mean when we say bias.
• Bias is neither good nor bad in and of itself.
• A very simple example: let Z1, . . . , Zn ∼ N(µ, 1).
• We don't know µ, so we try to use the data (the Zi's) to estimate it.
• I propose 3 estimators:
  1. µ̂1 = 12,
  2. µ̂2 = Z6,
  3. µ̂3 = Z̄.
• The bias (by definition) of my estimator is E[µ̂] − µ.
• Calculate the bias and variance of each estimator.


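A small simulation makes the contrast between the three estimators concrete. This is an illustrative Python sketch; the true mean µ = 5, the sample size, and the number of replications are arbitrary choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, n, reps = 5.0, 20, 100_000

# reps independent samples Z_1, ..., Z_n ~ N(mu, 1)
Z = rng.normal(mu_true, 1, size=(reps, n))

mu1 = np.full(reps, 12.0)   # always guess 12: bias 12 - mu, variance 0
mu2 = Z[:, 5]               # the 6th observation: unbiased, variance 1
mu3 = Z.mean(axis=1)        # sample mean: unbiased, variance 1/n

for name, est in [("mu1", mu1), ("mu2", mu2), ("mu3", mu3)]:
    print(f"{name}: bias = {est.mean() - mu_true:+.3f}, variance = {est.var():.3f}")
```

The simulated values match the closed-form answers: bias 7, 0, 0 and variance 0, 1, and 1/n = 0.05.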
Regression in general

• If I want to predict Y from X, it is almost always the case that

  µ(x) = E[Y | X = x] ≠ xᵀβ

• There are always those errors O(‖x − u‖²), so the bias is not zero.
• We can include as many predictors as we like, but this doesn't change the fact that the world is non-linear.

Covariance between the prediction error and the predictors

• In theory, we have (if we know things about the state of nature)

  β* = argmin_β E[‖Y − Xβ‖²] = Cov[X, X]⁻¹ Cov[X, Y]

• Define v = Cov[X, X].

• Using this optimal value β*, what is Cov[Y − Xβ*, X]?

  Cov[Y − Xβ*, X] = Cov[Y, X] − Cov[Xβ*, X]                 (Cov is linear)
                  = Cov[Y, X] − Cov[X(v⁻¹ Cov[X, Y]), X]    (substitute the def. of β*)
                  = Cov[Y, X] − Cov[X, X] v⁻¹ Cov[X, Y]     (Cov is linear in the first arg)
                  = Cov[Y, X] − Cov[X, Y]
                  = 0.

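The sample analogue of this identity is easy to check: least-squares residuals are exactly orthogonal to the columns of X, even when the true mean function is non-linear. A Python sketch with a deliberately non-linear truth (√x, the same function used in the R example later in the chapter); the sample size and noise level are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(0, 2, n)])   # intercept + one predictor
y = np.sqrt(X[:, 1]) + 0.1 * rng.normal(size=n)           # truth is non-linear

beta, *_ = np.linalg.lstsq(X, y, rcond=None)              # least-squares coefficients
resid = y - X @ beta

# X' resid is zero up to floating-point error, even though the fitted
# line is a biased approximation to the true sqrt curve.
print(X.T @ resid)
```

Zero covariance with the predictors is a property of the least-squares solution, not evidence that the linear model is correct.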
Bias and Collinearity

• Adding or dropping variables may impact the bias of a model.
• Suppose µ(x) = β0 + β1 x1. It is linear. What is our estimator of β0?
• If we instead estimate the model yi = β0, our estimator of β0 will be biased. How biased?
• But now suppose that x1 = 12 always. Then we don't need to include x1 in the model. Why not?
• Form the matrix [1 x1]. Are the columns collinear? What does this actually mean?

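The bias from dropping x1 can be quantified by simulation: fitting the intercept-only model yi = β0 estimates β0 by ȳ, whose expectation is β0 + β1 E[x1]. A Python sketch; the parameter values and the distribution of x1 are arbitrary choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(4)
b0, b1, n, reps = 1.0, 2.0, 100, 20_000

# x1 ~ Uniform(0, 2), so E[x1] = 1 and the omitted-variable bias is b1 * E[x1] = 2.
x1 = rng.uniform(0, 2, size=(reps, n))
y = b0 + b1 * x1 + rng.normal(0, 1, size=(reps, n))

est = y.mean(axis=1)            # intercept-only estimator of beta0 in each replication
print(est.mean() - b0)          # close to b1 * E[x1] = 2
```

The intercept-only fit is biased by exactly β1 E[x1]; it is unbiased only if β1 = 0 or E[x1] = 0.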
When two variables are collinear, a few things happen.

1. We cannot numerically calculate (XᵀX)⁻¹. It is rank deficient.
2. We cannot intellectually separate the contributions of the two variables.
3. We can (and should) drop one of them. This will not change the bias of our estimator, but it will alter our interpretations.
4. Collinearity appears most frequently with many categorical variables.
5. In these cases, software automatically drops one of the levels, resulting in the baseline case being absorbed into the intercept. Alternately, we could drop the intercept!
6. High-dimensional problems (where we have more predictors than observations) also lead to rank deficiencies.
7. There are methods (regularization) which attempt to handle this issue (both the numerics and the interpretability). We may have time to cover them briefly.


White noise

White noise is a stronger assumption than Gaussian. Consider a random vector ε.

1. ε ∼ N(0, Σ).
2. εi ∼ N(0, σ²(xi)).
3. ε ∼ N(0, σ²I).

The third is white noise. The εi are normal, their variance is constant for all i and independent of xi, and they are independent of each other.

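The third case can be simulated directly: draws of ε ∼ N(0, σ²I) have an empirical covariance matrix close to σ²I, with constant variance on the diagonal and nothing off it. A Python sketch (σ = 2 and the dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, p, reps = 2.0, 4, 200_000

# White noise: each row is an independent draw of epsilon ~ N(0, sigma^2 I)
eps = rng.normal(0, sigma, size=(reps, p))
cov = np.cov(eps, rowvar=False)

print(np.round(cov, 2))   # roughly sigma^2 = 4 on the diagonal, 0 elsewhere
```

Replacing the draw with case 1 (a non-diagonal Σ) or case 2 (σ depending on xi) would show up here as off-diagonal entries or an unequal diagonal, respectively.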
Asymptotic efficiency

This and MLE are covered in 420. There are many properties one can ask of estimators θ̂ of parameters θ:

1. Unbiased: E[θ̂] − θ = 0
2. Consistent: θ̂ → θ as n → ∞
3. Efficient: V[θ̂] is the smallest of all unbiased estimators
4. Asymptotically efficient: maybe not efficient for every n, but in the limit, the variance is the smallest of all unbiased estimators
5. Minimax: over all possible estimators in some class, this one has the smallest MSE for the worst problem
6. . . .

Problems with R-squared

  R² = 1 − SSE/SST = 1 − [Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²] / [Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²] = 1 − MSE / ((1/n) Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²)
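Since MSE = SSE/n, dividing both SSE and SST by n leaves R² unchanged; and in simple linear regression with an intercept, R² equals the squared empirical correlation of x and y. A Python sketch verifying both facts on synthetic data (the slope, noise level, and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 1, n)
y = 2 * x + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

sse = np.sum((y - yhat) ** 2)        # sum of squared errors
sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
mse = sse / n

r2_a = 1 - sse / sst                 # first form
r2_b = 1 - mse / (sst / n)           # second form: identical up to rounding
print(r2_a, r2_b)

# With an intercept, R^2 is the squared empirical correlation of x and y:
r = np.corrcoef(x, y)[0, 1]
print(r2_a, r ** 2)
```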
• This gets spit out by software.
• If X and Y are both normal with (empirical) correlation r, then R² = r².
• In this nice case, it measures how tightly grouped the data are about the regression line.
• Data that are tightly grouped about the regression line can be predicted accurately by the regression line.
• Unfortunately, the implication does not go both ways.
• High R² can be achieved in many ways, same with low R².
• You should just ignore it completely (and the adjusted version), and encourage your friends to do the same.

High R-squared with non-linear relationship

library(ggplot2)
genY <- function(X, sig) sqrt(X) + sig*rnorm(length(X))
sig = 0.05; n = 100
X1 = runif(n, 0, 1)
X2 = runif(n, 1, 2)
X3 = runif(n, 10, 11)
df = data.frame(x = c(X1, X2, X3), grp = rep(letters[1:3], each = n))
df$y = genY(df$x, sig)
ggplot(df, aes(x, y, color = grp)) + geom_point() +
  geom_smooth(method = 'lm', fullrange = TRUE, se = FALSE) +
  ylim(0, 4) + stat_function(fun = sqrt, color = 'black')



[Figure: scatterplot of y against x, colored by grp, with per-group least-squares lines and the true function sqrt(x) in black]

library(dplyr)
df %>% group_by(grp) %>% summarise(rsq = summary(lm(y~x))$r.sq)

## # A tibble: 3 x 2
##   grp     rsq
## 1 a     0.924
## 2 b     0.845
## 3 c     0.424


