Graphs and Numerical Summaries

1 Data Types
2 Graphs
3 Measures of Center
4 Measures of Spread
5 Five-Number Summaries, Boxplots, z-Scores
6 Misleading Graphs

www.apsu.edu/jonesmatt

1 Data Types

Categorical variables take values from a set of categories.

Quantitative discrete variables take values from a finite or countable set of numbers, like {0, 1, 2, 3, ...}.

Quantitative continuous variables take values from an interval of numbers, like [0, ∞) or [0, 10].

Examples:

Relative Frequencies: Proportions and Percentages

\[ \text{proportion of a category} = \frac{\text{frequency of category}}{\text{total number of observations}} \]

\[ \text{percentage of a category} = \text{proportion} \times 100\% \]

A frequency table lists the possible variable values with the frequency of each value.

Examples:
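
As one possible illustration, the following Python sketch builds a frequency table with proportions and percentages. The eye-color data and the helper name `frequency_table` are made up for this sketch and do not come from the notes.

```python
from collections import Counter

# Hypothetical categorical data (not from the notes): eye colors of ten classmates.
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "brown", "hazel", "blue", "brown"]

def frequency_table(values):
    """Return {category: (frequency, proportion, percentage)}."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (freq, freq / total, 100 * freq / total)
        for category, freq in counts.items()
    }

table = frequency_table(eye_colors)
for category, (freq, prop, pct) in sorted(table.items()):
    print(f"{category:<6} {freq:>3} {prop:>6.2f} {pct:>6.1f}%")
```

The proportions in such a table should add up to 1, and the percentages to 100%, which is a quick way to check the arithmetic.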

2 Graphs

Group qualitative data using pie charts or bar charts. Group quantitative data using histograms, stem and leaf plots, or dot plots.

The distribution of a set of data is a graph, table, or mathematical formula that indicates the different kinds of possible observations and how often they occur.

Distributions of quantitative data have shape, and the shape of a distribution can be determined by looking at histograms, stem and leaf plots, or dot plots.

Example 1 Use graphs to describe some categorical variables for your classmates.

Example 2 Use graphs to describe some quantitative variables for your classmates.
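
To make one of these displays concrete, here is a minimal Python sketch of a stem and leaf plot. The ages are invented for illustration, and the sketch assumes non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

# Hypothetical quantitative data (not from the notes): ages of twelve classmates.
ages = [19, 21, 22, 22, 24, 25, 27, 30, 31, 33, 38, 45]

def stem_and_leaf(values):
    """Map each tens digit (stem) to the sorted ones digits (leaves)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem in range(min(plot), max(plot) + 1):
    # Stems with no data still get a row so the shape is not distorted.
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Turned on its side, the printed plot resembles a histogram, which is why it can be used to judge the shape of a distribution.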

3 Measures of Center

Summation Notation

Given numbers $x_1, x_2, x_3, \ldots, x_n$, we succinctly express their sum $x_1 + x_2 + x_3 + x_4 + \cdots + x_n$ as

\[ \sum_i x_i \]

Examples:
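
For instance, taking the illustrative values $x_1 = 2$, $x_2 = 5$, $x_3 = 7$ (chosen here, not taken from the notes), the notation unpacks as follows:

```latex
\[ \sum_i x_i = x_1 + x_2 + x_3 = 2 + 5 + 7 = 14 \]
\[ \sum_i x_i^2 = 2^2 + 5^2 + 7^2 = 4 + 25 + 49 = 78 \]
```

Note that the sum of the squares (78) is not the square of the sum ($14^2 = 196$).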

Mean, Median, Mode

A measure of center is a one-number description of a distribution or data set, and we focus on three.

Mean or average: the sum of the numbers divided by the number of numbers:

\[ \text{mean} \equiv \frac{\sum_i x_i}{n} \]

Median or 50th percentile: a number that separates the lower 50% and upper 50% of the numbers.

Mode: the number that occurs most frequently in the set. There can be more than one mode.

Which measure of center is best?

Example 3 In the 2002 Winter Olympics, figure skater Michelle Kwan competed in the ladies' singles short program. She received the following scores for technical merit:

5.8  5.7  5.9  5.7  5.5  5.7  5.7  5.7  5.6

Find the mean, median, and mode.

Throw out the score of 5.5 and again find the mean, median, and mode.
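
One way to check hand calculations for Example 3 is a short Python sketch using the standard `statistics` module. The helper name `center` and the rounding to three decimals (to hide floating-point noise) are choices made for this sketch.

```python
import statistics

# Technical-merit scores from Example 3.
scores = [5.8, 5.7, 5.9, 5.7, 5.5, 5.7, 5.7, 5.7, 5.6]

def center(values):
    """Return (mean, median, modes), rounding away floating-point noise."""
    return (
        round(statistics.mean(values), 3),
        round(statistics.median(values), 3),
        statistics.multimode(values),  # a list, since there can be several modes
    )

all_scores = center(scores)

# Drop the single lowest score (5.5) and recompute.
trimmed = sorted(scores)[1:]
without_low = center(trimmed)

print("all scores: ", all_scores)
print("without 5.5:", without_low)
```

Comparing the two printed lines shows which of the three measures moves when the low score is removed.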

Properties

Resistant measures are not sensitive to extreme data. The median is resistant; the mean is not.

Example 4 Compare the mean and median of the salaries

$13,000  $32,000  $45,000

with the mean and median of the salaries

$13,000  $32,000  $250,000
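
A small Python sketch of the comparison in Example 4, using the standard `statistics` module (the variable names are chosen for this sketch):

```python
import statistics

# Salary lists from Example 4.
salaries_a = [13_000, 32_000, 45_000]
salaries_b = [13_000, 32_000, 250_000]  # same list, but with one extreme salary

for salaries in (salaries_a, salaries_b):
    mean = statistics.mean(salaries)
    median = statistics.median(salaries)
    print(f"mean = {mean:,.2f}, median = {median:,}")
```

Only the largest salary differs between the two lists, so any change in a measure of center between the two printed lines is due to that single extreme value.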

Population Mean and Sample Mean

The population mean $\mu$ (pronounced "myoo") of a population of size $N$ is the average of all values $x_1, x_2, \ldots, x_N$ in the population:

\[ \mu = \frac{1}{N} \sum_i x_i \]

The sample mean $\bar{x}$ (say "x-bar") of a sample of size $n$ is the average of all values $x_1, x_2, \ldots, x_n$ in the sample:

\[ \bar{x} = \frac{1}{n} \sum_i x_i \]

4 Measures of Spread

Measures of spread summarize how far data are spread out. We focus on three.

Standard deviation: used when the mean is the measure of center. It is the most important measure of spread (definition to follow).

Range: the largest value minus the smallest value.

Interquartile range: used when the median is the measure of center (definition to follow).

Other Important Sums (leading up to measuring the spread)

Sum of squared distances from 0:

\[ \sum_i x_i^2 \]

Sum of distances from the mean:

\[ \sum_i (x_i - \bar{x}) \]

Sum of squared distances from the mean:

\[ \sum_i (x_i - \bar{x})^2 \]

Example 5 History Exam Scores

$x$     $x^2$     $x - \bar{x}$     $(x - \bar{x})^2$
91
95
92
76
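
The empty columns of the Example 5 table can be filled in by hand or, as a sketch, with a few lines of Python. The variable names and the column layout are choices made here, not part of the notes.

```python
# Exam scores from Example 5.
scores = [91, 95, 92, 76]

mean = sum(scores) / len(scores)

# One row per score: x, x^2, x - mean, (x - mean)^2.
rows = [(x, x**2, x - mean, (x - mean) ** 2) for x in scores]

print(f"{'x':>4} {'x^2':>6} {'x-mean':>8} {'(x-mean)^2':>12}")
for x, x_sq, dev, dev_sq in rows:
    print(f"{x:>4} {x_sq:>6} {dev:>8.2f} {dev_sq:>12.2f}")

# Column totals: sum of squares, sum of deviations, sum of squared deviations.
sum_sq = sum(r[1] for r in rows)
sum_dev = sum(r[2] for r in rows)
sum_dev_sq = sum(r[3] for r in rows)
print(f"{'sum':>4} {sum_sq:>6} {sum_dev:>8.2f} {sum_dev_sq:>12.2f}")
```

The deviations from the mean always add up to zero, which is why the squared deviations, rather than the deviations themselves, are used to measure spread.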

Population and Sample Standard Deviation

Roughly speaking, the standard deviation measures the variation in a population or data set by indicating how far, on average, each number is from the mean.

• Population standard deviation $\sigma$:

\[ \sigma = \sqrt{\frac{\sum_i (x_i - \mu)^2}{N}} \]

• Sample standard deviation $s$:

\[ s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} \]

Example 6 The ages in years of all seven MATH 4270 students are

26  22  24  21  23  32  24

Find the population standard deviation and the range.

Take a simple random sample of size three. Then find the sample standard deviation and range of three students’ ages.
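
A minimal Python sketch of both parts of Example 6 follows. The helper `std_dev` and the random seed are assumptions of this sketch; a different seed (or a sample drawn in class) will generally give a different sample and therefore a different sample standard deviation.

```python
import math
import random

# Ages from Example 6 (the entire population of seven students).
ages = [26, 22, 24, 21, 23, 32, 24]

def std_dev(values, sample=False):
    """Standard deviation: divide by N for a population, n - 1 for a sample."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1 if sample else n))

sigma = std_dev(ages)
age_range = max(ages) - min(ages)
print(f"sigma = {sigma:.3f}, range = {age_range}")

# Simple random sample of size three; the seed is an arbitrary choice
# made here only so that the run is repeatable.
rng = random.Random(0)
sample = rng.sample(ages, 3)
s = std_dev(sample, sample=True)
print(sample, f"s = {s:.3f}", f"range = {max(sample) - min(sample)}")
```

The only difference between the two formulas is the divisor, which the `sample` flag switches between N and n − 1.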

Facts About Standard Deviation

The more variation among data in a sample, the larger the standard deviation.

Like the mean, the standard deviation is not resistant because its value is affected by extreme data points.

Empirical Rule: For bell-shaped distributions,
• about 68.27% of all possible observations lie within one σ of µ.
• about 95.45% of all possible observations lie within two σ of µ.
• about 99.73% of all possible observations lie within three σ of µ.
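
The three percentages in the Empirical Rule can be reproduced numerically if "bell-shaped" is taken to mean exactly normal, which is an assumption of this sketch. For a normal distribution, the probability of falling within k standard deviations of the mean is erf(k/√2), and Python's `math.erf` computes it directly.

```python
import math

def within_k_sigma(k):
    """P(|X - mu| < k * sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * within_k_sigma(k):.2f}%")
```

Real data that are only roughly bell-shaped will match these percentages approximately, not exactly.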

5 Five-Number Summaries, Boxplots, z-Scores

The first quartile Q1 is the 25th percentile, and is the same as the median of the data at or below the median.

The second quartile Q2 is the 50th percentile, and is the same as the median.

The third quartile Q3 is the 75th percentile, and is the same as the median of the data at or above the median.

Example 7 Twenty people reportedly watched the following numbers of hours of TV weekly:

8  22  34  16  13  26  19  23  25  31
34  30  31  20  22  41  32  30  39  29

Find the quartiles.

The interquartile range (IQR) is computed as Q3 − Q1 (this is our third measure of spread). The IQR is not sensitive to extreme values and is therefore a resistant measure of spread. The IQR is used as the measure of spread when the median is used as the measure of center.

Example 8 Compute the range and IQR for the data in Example 7.

The five-number summary of data consists of the min, Q1, Q2, Q3, and max.

Example 9 Write the five-number summary for the following 100 meter race times (in seconds):

10.69  11.11  11.18  12.44  10.76  10.88  10.64
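
The quartile definition used in these notes can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that, for an odd number of values, the median itself belongs to both the lower and the upper half. Software packages use several slightly different quartile conventions, so their results may differ from this one.

```python
import statistics

def quartiles(values):
    """Q1, Q2, Q3 as medians of the data at or below / at or above the median.

    Assumption: for an odd number of values the median itself is counted
    in both halves.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: (n + 1) // 2]
    upper = data[n // 2 :]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max)."""
    q1, q2, q3 = quartiles(values)
    return min(values), q1, q2, q3, max(values)

# Example 7: weekly hours of TV.
tv_hours = [8, 22, 34, 16, 13, 26, 19, 23, 25, 31,
            34, 30, 31, 20, 22, 41, 32, 30, 39, 29]
q1, q2, q3 = quartiles(tv_hours)
print("quartiles:", q1, q2, q3)
print("range:", max(tv_hours) - min(tv_hours), "IQR:", q3 - q1)

# Example 9: 100 meter race times in seconds.
race_times = [10.69, 11.11, 11.18, 12.44, 10.76, 10.88, 10.64]
print("five-number summary:", five_number_summary(race_times))
```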

Outliers

Outlier(s): data value(s) that is (are) far from most of the data.

Lower limit: Q1 − (1.5)(IQR)
Upper limit: Q3 + (1.5)(IQR)

Data greater than the upper limit or less than the lower limit are potential outliers.

Examples: human heights of 8' 11", miles per gallon rates greater than 95, or number of people struck more than five times by lightning.

Boxplots

• Determine the five-number summary.
• Compute the lower and upper limits.
• Mark and label the quartiles with vertical lines and box them in.
• Indicate all potential outliers with ∗ and label them.
• Mark and label the smallest and largest values occurring within the lower and upper limits with vertical lines, and connect the lines to the box (these are called adjacent values).

Example 10 Make a boxplot for the following eye pressures (in mmHg) of fifteen Caucasians and African Americans:

16.2  16.7  15.3  15.9  24.6  18.4  17.2  15.8  16.7  17.8  16.1  14.9  16.6  21.2
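
The numerical ingredients of a boxplot can be sketched in Python as shown here, using the fourteen eye pressures actually listed in Example 10. The function name `boxplot_parts` is hypothetical, and the quartile convention (halves include the median when the count is odd) is an assumption of this sketch. Drawing the plot itself is left out.

```python
import statistics

# The fourteen eye pressures (mmHg) listed in Example 10.
pressures = [16.2, 16.7, 15.3, 15.9, 24.6, 18.4, 17.2,
             15.8, 16.7, 17.8, 16.1, 14.9, 16.6, 21.2]

def boxplot_parts(values):
    """Quartiles, outlier limits, potential outliers, and adjacent values."""
    data = sorted(values)
    n = len(data)
    # Assumption: both halves include the median position when n is odd.
    q1 = statistics.median(data[: (n + 1) // 2])
    q2 = statistics.median(data)
    q3 = statistics.median(data[n // 2 :])
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return {
        "quartiles": (q1, q2, q3),
        "limits": (lower_limit, upper_limit),
        "potential_outliers": [x for x in data if x < lower_limit or x > upper_limit],
        "adjacent_values": (inside[0], inside[-1]),
    }

parts = boxplot_parts(pressures)
for name, value in parts.items():
    print(f"{name}: {value}")
```

The whiskers of the boxplot end at the adjacent values, not at the limits, because the limits are usually not actual data values.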

z-Scores (Standardized Data)

Data can be standardized so that different data sets can be compared, or to compare values within the same data set.

Example 11 The average height of men is 69 inches with a standard deviation of 2.8 inches. The average height of women is 63.6 inches with a standard deviation of 2.5 inches. Michael Jordan is 78 inches tall. Rebecca Lobo is 76 inches tall. Relatively speaking, who is taller?

Jordan's and Lobo's heights should be standardized relative to those of their genders so their heights can be compared.

If x is a variable, then z = (x − µ)/σ is the standardized version or z-score of x.

Calculate the z-scores of Jordan's and Lobo's heights.
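
The calculation in Example 11 can be sketched in Python as follows. The function name `z_score` is chosen for this sketch; the heights, means, and standard deviations are the ones given in the example.

```python
def z_score(x, mu, sigma):
    """Standardize x relative to a mean mu and standard deviation sigma."""
    return (x - mu) / sigma

# Heights in inches, with the means and standard deviations from Example 11.
jordan_z = z_score(78, mu=69, sigma=2.8)
lobo_z = z_score(76, mu=63.6, sigma=2.5)

print(f"Jordan: z = {jordan_z:.2f}")
print(f"Lobo:   z = {lobo_z:.2f}")
```

The larger z-score belongs to the person who is taller relative to his or her own group, even if that person is shorter in inches.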

Facts About z-Scores

The mean of the z-scores of a population is always 0.

The standard deviation of the z-scores of a population is always 1.

Most z-scores will fall between −3 and 3.

z-scores never have units!

Example 12 Body temperatures of healthy human children have mean µ = 98.60°F and standard deviation σ = 0.62°F. Your child has a temperature of 101°F. What should you do?
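
The first two facts can be checked numerically on any population. This Python sketch reuses the seven ages from Example 6 as the population (a choice made here) and then computes the z-score asked about in Example 12. The helper name `standardize` is hypothetical.

```python
import math

def standardize(values):
    """Return the z-scores of a whole population of values."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return [(x - mu) / sigma for x in values]

# Population reused from Example 6 (ages of seven students).
z = standardize([26, 22, 24, 21, 23, 32, 24])
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
print(f"mean of z-scores = {abs(z_mean):.6f}, st. dev. of z-scores = {z_sd:.6f}")

# Example 12: z-score of a 101 degree reading.
child_z = (101 - 98.60) / 0.62
print(f"child's z-score = {child_z:.2f}")
```

Because most z-scores fall between −3 and 3, a z-score well outside that interval marks an unusual observation.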

6 Misleading Graphs

Truncated graphs magnify relative frequency differences between categories.

Example 13 Value of a mutual fund over time.

Improper scaling gives incorrect impressions about the relative differences between categories.

Example 14 Golf Balls.
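
The effect of truncating an axis can be quantified with a small Python sketch. The two values and the baseline of 95 are hypothetical and are not the data behind Examples 13 or 14. The function compares how tall two bars are drawn when the vertical axis starts at a given baseline.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights when the axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)

# Hypothetical values (not from the notes), e.g. two fund values in dollars.
high, low = 105.0, 100.0

honest = apparent_ratio(high, low)                     # axis starts at 0
truncated = apparent_ratio(high, low, baseline=95.0)   # axis starts at 95

print(f"true ratio of values:       {honest:.2f}")
print(f"ratio of bar heights drawn: {truncated:.2f}")
```

With these numbers a 5% difference is drawn as one bar twice the height of the other, which is the kind of magnification the first statement above describes.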
