MATH 1530: Elements of Statistics
Exam I Review

Exam rules: You may use a calculator, #2 pencils, and one 8.5” x 11” page of notes. No sharing materials of any kind. You may begin as early as 7:30 am and stay as late as 10:00 am, but you should not need that much time. The exam will have about 60 multiple-choice questions.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Counting problems: n!, nCr, nPr. First step: identify which of these you need. Second step: use your calculator to carry out the calculation.

Association: Be able to tell if two categorical variables are associated. Be able to tell if two quantitative variables are associated.

Graphs: dotplots, histograms, stem-and-leaf plots, pie charts, bar charts, boxplots, scatterplots. Be able to make them, and also be able to interpret them: identify outliers, estimate means, ranges, medians, modes, etc.

Be able to distinguish between designed experiments and observational studies.

Be able to identify the following kinds of variables: lurking variables, response (also called dependent) variables, explanatory (also called predictor, or independent) variables, quantitative discrete variables, quantitative continuous variables, categorical variables.

Be able to check for outliers using the 1.5 × IQR rule for boxplots (or just make a boxplot with your calculator and check that way).

Be able to use your calculator to calculate each of the following statistics. Also know their meanings and be able to interpret them: mean, median, mode; range, sample standard deviation; interquartile range (IQR); Q1, Q2, Q3 (these are the quartiles, also called the 25th, 50th, and 75th percentiles); r, r², equation of the regression line.

Distribution shapes: left- or right-skewed, uniform, symmetrical, asymmetrical, bimodal, unimodal, multi-modal, triangle-shaped, mound- or bell-shaped. Be able to describe graphs using these terms.

Know that the regression line minimizes the sum of squared errors. This is why we sometimes call it the least-squares regression line.
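If you want to double-check your calculator work on the counting problems and the 1.5 × IQR rule, the following is a minimal Python sketch. The data set `scores` is made up, and the function names `quartiles` and `iqr_outliers` are hypothetical helpers, not anything from the course. The quartiles here use the median-of-halves convention (the median is left out of both halves when the count is odd); your calculator or textbook may use a slightly different convention, so small differences in Q1 and Q3 are possible.

```python
import math
from statistics import median

# Counting: decide which tool fits the problem, then compute.
print(math.factorial(5))  # 5!  = 120 orderings of 5 distinct items
print(math.perm(5, 2))    # 5P2 = 20 ordered selections of 2 from 5
print(math.comb(5, 2))    # 5C2 = 10 unordered selections of 2 from 5


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # values below the median position
    upper = xs[half + n % 2:]   # values above the median position
    return median(lower), median(xs), median(upper)


def iqr_outliers(data):
    """Return the values lying outside the 1.5 x IQR fences."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


scores = [2, 4, 5, 6, 7, 8, 9, 30]  # made-up data for illustration
print(quartiles(scores))     # (4.5, 6.5, 8.5)
print(iqr_outliers(scores))  # [30]
```

For this made-up data the IQR is 4, so the fences sit at 4.5 - 6 = -1.5 and 8.5 + 6 = 14.5, and only 30 falls outside them.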
Be able to calculate z-scores (that is, standardize data). Once done, be able to interpret them: the z-score of a datum tells how many standard deviations, and in what direction, the datum is from the mean.

Empirical rule: Know the rule, and be able to apply it. Also be able to use it to identify outliers. If I give you a mean and standard deviation, be able to find the intervals that contain the middle 68.27%, 95.45%, and 99.73% of the observations. Also, given the percentages (68.27%, 95.45%, and 99.73%), be able to calculate the marks that are one, two, or three standard deviations below or above average. Also, if I give you the mean and standard deviation, and give you marks, be able to tell how many standard deviations those marks are (and in what direction) from the mean.

Be able to match values of r with scatterplots.

The mean is to the right of the median when the data are right-skewed. The mean is to the left of the median when the data are left-skewed. The mean and median are in the same place when the data are symmetrically distributed.

What effect does increasing a data value have on the mean? What effect does increasing the maximum data value have on the standard deviation and range? If the mean of a data set of size 10 was found to be 100, but then you change the datum of 30 to 80, what is the new mean? Be able to work a problem similar to this.

Within the context of regression, be able to identify outliers, influential observations, and high-leverage points. Also, be able to identify the response (dependent) variable and the explanatory (independent or predictor) variable. Be able to calculate the equation of the regression line (with your calculator), and interpret the meaning of the slope and the meaning of the vertical intercept (e.g., the gas price falls at a rate of $0.05 per day; the line predicts a gas price of $0.89 when the temperature is 0°F). Use your calculator to obtain the values of r (the linear correlation coefficient) and r² (the coefficient of determination, which is the percentage of variation in the response variable that is explained by the model) and interpret them (that is, know what they mean and what they are telling you in the context of the problem). Be able to make a scatterplot of the data, and check for outliers and influential observations by looking at the plot. Be able to use the regression line to make predictions of the response variable based on observations of the explanatory variable. Know when extrapolation is occurring.
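The z-score, empirical-rule, and "change one datum" calculations described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the mean of 100 with standard deviation 15 used for the z-score and interval examples is a made-up setting, not exam data. The last call works the size-10 example from the review: the old total is 10 × 100 = 1000, and swapping 30 for 80 adds 50 to it.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean (sign gives direction)."""
    return (x - mean) / sd


def empirical_intervals(mean, sd):
    """Intervals within 1, 2, and 3 standard deviations of the mean.

    For mound- or bell-shaped data these hold roughly 68.27%, 95.45%,
    and 99.73% of the observations.
    """
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


def mean_after_change(old_mean, n, old_value, new_value):
    """New mean after one datum changes from old_value to new_value."""
    old_total = old_mean * n
    return (old_total - old_value + new_value) / n


# Made-up setting: mean 100, standard deviation 15.
print(z_score(130, 100, 15))          # 2.0, two standard deviations above the mean
print(z_score(77.5, 100, 15))         # -1.5, one and a half below the mean
print(empirical_intervals(100, 15))   # {1: (85, 115), 2: (70, 130), 3: (55, 145)}

# The review's example: n = 10, mean = 100, the datum 30 becomes 80.
print(mean_after_change(100, 10, 30, 80))  # 105.0
```

Notice that raising one value by 50 raises the mean by 50/10 = 5, which is a quick way to check this kind of problem without recomputing the whole total.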
When you remove an outlier or influential observation from the data, be able to compare the new model (regression line) with the old one.
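To see what such a comparison can look like, here is a minimal Python sketch that fits the least-squares line with and without one suspicious point. Everything in it is an assumption made for illustration: the data are made up, `fit_line` and `predict` are hypothetical helper names, and the point (10, 5.0) was chosen only because it sits far from the other x-values and off their trend. On the exam you would get the same quantities from your calculator's regression function.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def predict(slope, intercept, x, xs):
    """Return (predicted y, True if x is outside the observed x-range)."""
    is_extrapolation = x < min(xs) or x > max(xs)
    return intercept + slope * x, is_extrapolation


# Made-up data that follow a nearly straight line.
xs_clean = [1, 2, 3, 4, 5]
ys_clean = [2.1, 3.9, 6.2, 7.8, 10.1]

# The same data plus one made-up point far to the right and off the trend.
xs_all = xs_clean + [10]
ys_all = ys_clean + [5.0]

slope_all, intercept_all, r_all = fit_line(xs_all, ys_all)
slope_clean, intercept_clean, r_clean = fit_line(xs_clean, ys_clean)

print("with the point:   ", slope_all, intercept_all, r_all, r_all ** 2)
print("without the point:", slope_clean, intercept_clean, r_clean, r_clean ** 2)

# Predicting at x = 20 is extrapolation: 20 is outside the observed x-values.
print(predict(slope_clean, intercept_clean, 20, xs_clean))
```

In this made-up example the slope, r, and r² all change sharply once the extra point is removed, which is the signature of an influential observation. A point whose removal barely moves the line is not influential, even if it is an outlier.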