READING GUIDE - DAY 2
UNIVARIATE DATA SUMMARY
ECO 1005, 2017 FALL
INSTRUCTOR : JUNGMO YOON
HANYANG UNIVERSITY

Read Chapters 2.3, 3.1, and 4.1-4.3. We will have a brief R session.

1. Graphical Techniques

Graphical techniques to describe a dataset: Histogram, Scatter plot, and Box plot.
(i) A histogram visualizes the distribution of a variable.
(ii) A scatter plot visualizes the relationship between two variables.
(iii) A box plot visualizes the distribution of a variable. It is useful for comparing the distributions of multiple variables.

Illustration: Class size and test scores. We will use the California Test Score Data frequently.
(i) The California Standardized Testing and Reporting dataset contains data on test performance, school characteristics, and student demographic backgrounds.
(ii) The data are from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999.
(iii) Test scores are the reading and math scores on the Stanford 9 standardized test administered to 5th grade students.
(iv) The student-teacher ratio is the number of students divided by the number of full-time equivalent teachers in the district.
(v) School characteristics (averaged across the district) include enrollment, number of teachers (measured as full-time equivalents), number of computers per classroom, and expenditures per student.


(vi) We also have demographic variables for the students.

Implementation: First, make a directory ‘stat’. Download ‘caschool.csv’ (from the class website) and save it in the directory.

I. R
In R, change your working directory to ‘stat’. You are now ready. Type
> data <- read.csv("caschool.csv")
The data set is now in memory. Type
> hist(data$str)
> hist(data$read_scr)
> plot(data$str, data$read_scr)
> mean(data$str)
> summary(data$str)

II. Excel
Use Tools → Data Analysis → Histogram. Read page 74 for additional information.

2. Numerical Descriptive Statistics

2.1. Central Tendency. A measure of central location describes the center of the distribution of the data. The most common measure is the mean. The mean of a variable is the long-run average value of the variable over many repeated trials. Other measures of central location include
• Median: The value of the observation that falls in the middle when you sort all the observations in ascending order. It is the value that divides the ordered data into two halves.
• Mode: The value that occurs with the greatest frequency.

Formally, let us denote the observations in a sample of size n by
\[ x_1, x_2, \ldots, x_n \]


where \(x_1\) is the value of the first observation, \(x_2\) is the value of the second observation, and so on. The mean of a sample (or simply the sample mean) \(\bar{x}\) is the arithmetic mean of the n observations,
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
The median (or the mode) does not have a closed-form expression, so we will use an example to illustrate.

Ex 1) Suppose that in your sample, the sample size is n = 5, and \(x_1 = 4\), \(x_2 = 1\), \(x_3 = 2\), \(x_4 = 5\), \(x_5 = 4\). (a) What is the sample mean? (b) The sample median? (c) The sample mode?

2.2. Measure of Variability or Data Dispersion. A measure of dispersion describes the variability or spread of the data. The most common choice is the standard deviation.

A natural measure of spread is the average value of \(x_i - \bar{x}\), namely, \(n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})\). This quantity turns out to be useless (why?). So instead we use \(\sum_{i=1}^{n} (x_i - \bar{x})^2\) as a measure of variability. The sample variance \(s^2\) is the average value of the squared deviations,
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
One problem with the variance is that it is measured in units that are different from the units of the original variable. To go back to the original units, we take the square root, which gives us the standard deviation,
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
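As a rough illustration of the definitions above, the following R sketch computes the three measures of central location and the two measures of spread for a small made-up sample. The vector x and the helper name mode_value are assumptions for illustration only and are not part of the class data. Base R has no built-in function for the statistical mode, so one is assembled here from table().

```r
# Hypothetical sample (not from caschool.csv), chosen so the arithmetic is easy to follow
x <- c(3, 7, 7, 2, 6)
n <- length(x)

# Central tendency
xbar <- sum(x) / n          # sample mean, same as mean(x)
med  <- median(x)           # middle value of the sorted sample
tab  <- table(x)            # frequency of each distinct value
mode_value <- as.numeric(names(tab)[which.max(tab)])  # most frequent value (first one if tied)

# Dispersion
dev      <- x - xbar            # deviations from the sample mean
mean_dev <- sum(dev) / n        # average deviation
s2       <- sum(dev^2) / (n - 1)  # sample variance
s        <- sqrt(s2)            # sample standard deviation

print(c(mean = xbar, median = med, mode = mode_value))
print(c(mean_dev = mean_dev, s2 = s2, s = s))
```

The hand-computed s2 and s should agree with var(x) and sd(x), which also divide by n − 1. Looking at the printed mean_dev is a useful hint for the "(why?)" question in the text.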

How do we use the standard deviation? How do we interpret its value?


The Empirical Rule, which applies to approximately bell-shaped distributions, states the following. Suppose that the central location is zero. Then
• \(\Pr(-s \le x_i \le s) \approx 68\%\),
• \(\Pr(-2s \le x_i \le 2s) \approx 95\%\),
• \(\Pr(-3s \le x_i \le 3s) \approx 99.7\%\).
Suppose that the central location (denoted by \(\mu\)) is not zero. Then
• \(\Pr(-s \le x_i - \mu \le s) \approx 68\%\),
• \(\Pr(-2s \le x_i - \mu \le 2s) \approx 95\%\),
• \(\Pr(-3s \le x_i - \mu \le 3s) \approx 99.7\%\).

Other measures of variability include
• Range: Largest observation − Smallest observation.
• Interquartile Range: \(Q_3 - Q_1\).*
• Average Absolute Deviation: \(n^{-1} \sum_{i=1}^{n} |x_i - \bar{x}|\).
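The Empirical Rule stated above can be checked by simulation. The sketch below draws a large sample from a normal distribution and measures the share of observations within one, two, and three standard deviations of the sample mean. The seed, the sample size, the chosen mean and standard deviation, and the helper name within_k are all arbitrary assumptions for illustration.

```r
set.seed(1)                             # arbitrary seed, only so that the run is reproducible
x <- rnorm(100000, mean = 10, sd = 2)   # hypothetical bell-shaped sample

xbar <- mean(x)   # central location
s    <- sd(x)     # sample standard deviation

# Share of observations with |x_i - xbar| <= k * s
within_k <- function(k) mean(abs(x - xbar) <= k * s)

shares <- sapply(1:3, within_k)
names(shares) <- c("1 sd", "2 sd", "3 sd")
print(round(shares, 3))
```

The three shares should come out close to 68%, 95%, and 99.7%. For data that are strongly skewed or otherwise far from bell-shaped, the shares can differ noticeably from these values.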

In R, type
> var(data$str)
> sd(data$read_scr)
> summary(data$read_scr)
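The other measures of variability listed above can also be computed in a few lines. This is a minimal sketch on a made-up vector x (an assumption, not the class data); the names range_x, iqr_x, and aad_x are likewise only illustrative.

```r
# Hypothetical sample (not from caschool.csv)
x <- c(3, 7, 7, 2, 6, 9, 1)

range_x <- max(x) - min(x)          # range: largest minus smallest observation
iqr_x   <- IQR(x)                   # interquartile range, Q3 - Q1
aad_x   <- mean(abs(x - mean(x)))   # average absolute deviation from the mean

print(c(range = range_x, IQR = iqr_x, AAD = aad_x))
```

Note that R's built-in mad() computes a scaled median absolute deviation, which is a different quantity from the average absolute deviation defined in the text, so aad_x is computed directly from the formula here.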

2.3. Quartiles, Deciles, and Percentiles. Just as the median is the point that divides an ordered sample into two equal halves, other divisions of the sample are possible. The lower quartile is the point where one quarter of the observations lie below and three quarters lie above. The upper quartile is the point where three quarters of the observations lie below and one quarter lie above. You can extend this further.

Percentile. The p-th percentile is the value for which p% of the observations are less than that value. So
• The lower (first) quartile \(Q_1\) is the 25th percentile.
• The middle (second) quartile \(Q_2\) is the median.
• The upper (third) quartile \(Q_3\) is the 75th percentile.

* Footnote on the interquartile range: the definitions of \(Q_1\) and \(Q_3\) are given in Section 2.3.
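To make the link between quartiles and percentiles concrete, here is a small R sketch on a made-up sample. The vector x is an assumption for illustration. R's quantile() interpolates between neighboring observations by default, so in small samples its quartiles may differ slightly from those obtained with a textbook rule.

```r
# Hypothetical sample (not from caschool.csv)
x <- c(1, 2, 3, 6, 7, 7, 9)

# The 25th, 50th, and 75th percentiles are the three quartiles
q <- quantile(x, probs = c(0.25, 0.50, 0.75))
print(q)

q1 <- q[["25%"]]   # lower (first) quartile
q2 <- q[["50%"]]   # middle (second) quartile
q3 <- q[["75%"]]   # upper (third) quartile

print(q2 == median(x))                      # Q2 coincides with the median
print(isTRUE(all.equal(q3 - q1, IQR(x))))   # Q3 - Q1 is the interquartile range
```

Both comparisons should print TRUE, since median() and IQR() are built on the same quantile definition by default.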


Deciles split the sample into tenths. The first decile is the 10th percentile, and the second decile is the 20th percentile.

A Box Plot summarizes the data by displaying five statistics: the minimum, the maximum, and the three quartiles.

In R, type
> quantile(data$read_scr, probs = (1:9)/10)
> boxplot(data$str)

Exercise 1. A sample of 10 adults was asked to report the number of hours they spent on the Internet the previous month. Calculate the mean and median.
0 7 12 5 33 14 8 0 9 22

Exercise 2. The data collector recorded 133 instead of 33 by mistake. Calculate the mean and median of this mis-measured sample.
0 7 12 5 133 14 8 0 9 22

Exercise 3. A sample of six students reported the number of summer jobs they applied for. Find the sample variance.
17 15 23 7 9 13

Exercise 4. Show that the following two expressions are identical (meaning that the two expressions always agree):
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \left( \frac{1}{n-1} \sum_{i=1}^{n} x_i^2 \right) - \frac{n}{n-1} \bar{x}^2 . \]
Hint) Recall from the Day 1 reading guide that
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 . \]
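Before attempting the algebra in Exercise 4, it can help to see the identity hold on numbers. The sketch below is only a numerical check on a made-up sample, not a proof. The vector x and the names s2_definition and s2_shortcut are assumptions for illustration.

```r
# Hypothetical sample (deliberately not the data from Exercises 1-3)
x <- c(2, 4, 4, 5, 10)
n <- length(x)
xbar <- mean(x)

# Left-hand side: average squared deviation with divisor n - 1
s2_definition <- sum((x - xbar)^2) / (n - 1)

# Right-hand side: sum of squares over n - 1, minus n / (n - 1) times the squared mean
s2_shortcut <- sum(x^2) / (n - 1) - n / (n - 1) * xbar^2

print(c(definition = s2_definition, shortcut = s2_shortcut, builtin = var(x)))
```

All three printed values should coincide, up to floating-point rounding.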
