HW 2: Chapter 1. Data Exploration
STUDENT NAME
Date

PART I: Chapter 1 Problem Sets

PS 1.1: Data Basics

1. OI 1.7: Fisher’s irises: Sir Ronald Aylmer Fisher was an English statistician, evolutionary biologist, and geneticist who worked on a data set that contained sepal length and width, and petal length and width, from three species of iris flowers (setosa, versicolor and virginica). There were 50 flowers from each species in the data set.
   a. How many cases were included in the data?
   b. How many numerical variables are included in the data? Indicate what they are, and whether they are continuous or discrete.
   c. How many categorical variables are included in the data, and what are they? List the corresponding levels (categories).

2. OI 1.8: Smoking habits of UK Residents: A survey was conducted to study the smoking habits of UK residents. The textbook displays a data matrix showing a portion of the data collected in this survey. Note that £ stands for British Pounds Sterling, cig stands for cigarettes, and N/A refers to a missing component of the data.
   a. What does each row of the data matrix represent?
   b. How many participants were included in the survey?
   c. Indicate whether each variable in the study is numerical or categorical. If numerical, identify it as continuous or discrete. If categorical, indicate whether the variable is ordinal.
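For the iris problem, one optional way to check your reasoning is to look at the copy of the iris data that ships with base R as `iris`. The sketch below assumes that this built-in data frame matches the data described in the textbook; it only inspects the structure and does not replace the written answers.

```r
# Assumption: the built-in `iris` data frame corresponds to the
# data set described in OI 1.7.
data(iris)

n_cases <- nrow(iris)                  # one row per flower (case)
var_types <- sapply(iris, class)       # storage type of each variable
species_levels <- levels(iris$Species) # categories of the factor variable

n_cases
var_types
species_levels
```

Note that `class()` reports how R stores a variable, not whether it is continuous or discrete; that judgment still has to come from how the measurement was taken.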

PS 1.3: Mean vs. Median

A small accounting firm pays each of its six clerks $35,000, two junior accountants $70,000 each, and the firm’s owner $420,000. The salary data for the 6 clerks, 2 junior accountants, and the owner looks like:

# assign salary as an object here

1. What is the mean salary paid at this firm?

# use the mean() function here

2. How many of the employees earn less than the mean?

3. What is the median salary?

# use the median() function here

4. Which measure tells you more about the typical amount earned at that firm?
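A minimal sketch of how the placeholders in PS 1.3 might be filled in, assuming the salaries are stored in a plain numeric vector. The object names `salary`, `mean_salary`, `n_below`, and `median_salary` are illustrative choices, not names required by the assignment.

```r
# Build the salary vector from the counts given in the problem:
# six clerks, two junior accountants, one owner.
salary <- c(rep(35000, 6), rep(70000, 2), 420000)

mean_salary <- mean(salary)          # arithmetic mean of all nine salaries
n_below <- sum(salary < mean_salary) # count of employees earning less than the mean
median_salary <- median(salary)      # middle value of the sorted salaries

mean_salary
n_below
median_salary
```

Comparing `mean_salary` with `median_salary` shows how strongly a single large value pulls the mean, which is the point of question 4.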


PS 1.8: Sample correlations

The Organisation for Economic Co-operation and Development collects data on the central government debt for many countries. The data for this problem is contained in the debt data set.

# Use read.delim() to import the data here by using the code found on the datasets page

1. Draw a scatterplot of the 2005 (x) data against the 2006 (y) data.

# create the scatterplot here. You can use plot(), qplot() or ggplot()

2. Describe the direction, strength and form of the relationship in context of the problem.

3. Calculate the correlation r.

# use the cor() function here
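The workflow in PS 1.8 (import, plot, correlate) can be sketched on a toy file. Everything about the file below is an assumption made for illustration: the values are invented and are not the OECD figures, and the real debt file may use different column names. Use the import code from the datasets page for the actual assignment.

```r
# Toy stand-in for the debt data. These values are INVENTED for
# illustration; the column layout is also an assumption.
toy_path <- tempfile(fileext = ".txt")
writeLines(c("Country\t2005\t2006",
             "A\t10.2\t11.0",
             "B\t35.5\t33.9",
             "C\t60.1\t62.4",
             "D\t88.7\t85.3"), toy_path)

# read.delim() defaults to header = TRUE and tab-separated fields.
debt <- read.delim(toy_path)

# Column names that start with a digit are made syntactically valid
# on import, so "2005" becomes "X2005".
names(debt)

# Scatterplot with 2005 on the x-axis and 2006 on the y-axis.
plot(debt$X2005, debt$X2006, xlab = "Debt in 2005", ylab = "Debt in 2006")

# Sample correlation; use = "complete.obs" drops rows with missing values.
r <- cor(debt$X2005, debt$X2006, use = "complete.obs")
r
```

If the real file has year columns, checking `names()` after the import is a quick way to see what to type inside `plot()` and `cor()`.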

PS 1.10: Spurious correlations

Go to the Spurious Correlations website, http://tylervigen.com/discover, and use the drop-down menu to choose two interesting variables whose correlation you want to examine. Include the image in your homework document by replacing the URL in the example below with your URL. Write a sentence or two describing the trends observed in your example. Explain the difference between correlation and causation in the context of your variables.
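One common reason unrelated variables correlate is that both simply trend over time. The sketch below illustrates that idea with two short invented series (they are not taken from the website): the raw values are highly correlated, but the year-to-year changes are not.

```r
# Two INVENTED yearly series that both happen to rise over time.
series_a <- c(2, 3, 5, 6, 8, 9, 11, 12)
series_b <- c(10, 14, 15, 19, 22, 23, 27, 30)

# Correlation of the raw levels: driven mostly by the shared upward trend.
r_levels <- cor(series_a, series_b)

# Correlation of the year-to-year changes: the trend is removed by diff(),
# so this shows whether the series actually move together.
r_changes <- cor(diff(series_a), diff(series_b))

r_levels
r_changes
```

Here `r_levels` is close to 1 while `r_changes` is slightly negative, so the strong correlation in the raw values says nothing about one series causing the other.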

Figure 1: Divorce rate in Alabama vs US whole milk consumption

