
AP Statistics - Chapter 3 - Notes

Section 3.1: (Hwk: 3.2 - 3.4, 3.22 - 3.26)

Response Variable - a variable that measures the outcome of a study

Explanatory Variable - a variable that attempts to explain observed outcomes

EX: If you think that alcohol causes body temperature to drop, you might do a study
giving certain amounts of alcohol to mice and measuring the temperature drops. In this
case the explanatory variable is the amount of alcohol and the response variable is the
measured temperature drop.

Scatterplot - a type of graph in which the paired values of two quantitative variables are
graphed as points, one point per individual
• look for overall patterns and striking deviations from the overall pattern
• you can describe the overall pattern of a scatterplot by the direction, form and
strength of the relationship demonstrated
• outliers can have a very significant effect on the overall relationship

Positive Association - Two variables are positively associated when above average
values for one variable tend to be paired with above average values of the other variable.

Negative Association - Two variables are negatively associated when above average
values for one variable tend to be paired with below average values of the other variable
and vice versa

Outlier - (already defined) - an individual observation which lies outside the overall pattern
of a graph or distribution, note that an outlier can be an outlier in the x direction, y direction or
both

How to add categorical variables to a scatterplot? Since a scatterplot is inherently a graph
of two quantitative variables, you add categorical variables by having different colors or
symbols for the dots of different categories.

Doing scatterplots on the calculator - need to do a couple of examples of this with
students in order to make them comfortable with the technique

Correlation - a measurement of the strength and direction of the linear relationship
between two quantitative variables - correlation is usually denoted with the variable r

formula: $r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$

Suppose x is height and y is mass. Then x-bar and sx are the mean and standard
deviation of the height measurements and y-bar and sy are the mean and standard
deviation of the mass measurements. xi and yi are the height and mass of the 'ith'
observation, and the two fractions in the formula are the standardized values of those
measurements.
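
The correlation formula can be checked by hand on a small data set. The sketch below is one way to do it in Python, assuming made-up heights and masses (the numbers are for illustration only and are not from the textbook); the function name `correlation` is also just a label chosen here.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = 1/(n - 1) times the sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Made-up illustrative data: heights in cm and masses in kg
heights = [150, 160, 170, 180, 190]
masses = [52, 58, 69, 75, 86]

r = correlation(heights, masses)
print(round(r, 3))  # close to +1: a strong positive linear relationship
```

Because each measurement is standardized before multiplying, r has no units and does not change if heights are recorded in inches instead of centimeters.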

Section 3.2: (Hwk: 3.28 - 3.31, 3.34 - 3.36, 3.44 - 3.47)

Regression Line - a line that describes how a response variable y changes as the value
of an explanatory variable x changes

LSRL - least squares regression line - a mathematical model for a set of data created
by using the least squares method to find the regression line - the least squares method
finds the line which minimizes the sum of the squares of the vertical distances of the data
points from the line

We can find the line using:
$\hat{y} = a + bx$ where the slope $b = r \frac{s_y}{s_x}$ and intercept $a = \bar{y} - b\bar{x}$
this line passes through the point: $(\bar{x}, \bar{y})$
note: the slope formula means a change of one standard deviation in x corresponds to a change of r standard
deviations in the predicted y
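
The slope and intercept formulas can be tried on a small example. This is a minimal sketch, assuming a made-up data set of five points; the function name `lsrl` is a label chosen here, not a calculator or library command.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r*sy/sx and a = y-bar - b*x-bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # correlation r, written as one sum divided by (n - 1)*sx*sy
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = lsrl(xs, ys)
print(round(a, 2), round(b, 2))  # intercept 2.2, slope 0.6

# The line passes through (x-bar, y-bar)
print(abs((a + b * mean(xs)) - mean(ys)) < 1e-9)  # True
```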

extrapolation - the use of a regression line to make predictions outside the range of
values of the explanatory variable x used to obtain the line, predictions outside the range of
the explanatory variable are often not accurate, so you have to be very careful using them

residual - the difference between the observed value of y (the response variable) and the
value predicted by the regression line

residual = observed y - predicted y = $y - \hat{y}$

residual plot - a scatterplot of the residuals against the explanatory variable, if the
model (equation) that you found is a good model for the data there should be no discernible
pattern in the residual plot
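
Residuals are easy to compute once the line is known. The sketch below assumes the same kind of made-up five-point data set and fits the line with the shortcut slope = Sxy / Sxx, which gives the same value as r times sy / sx. The helper name `fit_line` is hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least squares intercept a and slope b (slope = Sxy / Sxx)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# These (x, residual) pairs are the points of the residual plot
for x, res in zip(xs, residuals):
    print(x, round(res, 2))

# For a least squares line the residuals always add up to zero
print(abs(sum(residuals)) < 1e-9)  # True
```

Plotting the printed pairs with x on the horizontal axis gives the residual plot; a curve or fan shape in that plot would suggest the line is not a good model.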

coefficient of determination - the fraction of the variation in the values of y that is
explained by the least-squares regression of y on x, the coefficient of determination is
found by squaring the correlation, in other words coefficient of determination = $r^2$
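
One way to see why r squared is "the fraction of variation explained" is to compute it two ways and compare. This sketch assumes a made-up data set; the names `sst` and `sse` (total variation and leftover variation) are labels chosen here.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

r = s_xy / sqrt(s_xx * s_yy)   # correlation
b = s_xy / s_xx                # LSRL slope
a = y_bar - b * x_bar          # LSRL intercept

sst = s_yy                                                  # total variation in y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # variation left in the residuals
fraction_explained = 1 - sse / sst

print(round(r ** 2, 3), round(fraction_explained, 3))  # the two values agree
```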

influential observation - an observation is influential if removing it would significantly
change the position of the regression line, points which are x direction outliers are often
influential - correlation and least squares regression are strongly influenced by outliers,
knowing when to include and when to exclude an outlier depends on developing a good
intuition for statistics and a good understanding of the problem situation from which the
observations were made
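
The effect of an influential observation can be seen by fitting the line with and without it. The sketch below assumes made-up data: five points with an upward trend plus one invented x direction outlier at (15, 2). The helper name `fit_line` is hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least squares intercept a and slope b (slope = Sxy / Sxx)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Made-up illustrative data with an upward trend
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# The same data plus one invented point far out in the x direction
xs_out = xs + [15]
ys_out = ys + [2]

a1, b1 = fit_line(xs, ys)
a2, b2 = fit_line(xs_out, ys_out)

print("slope without the outlier:", round(b1, 3))
print("slope with the outlier:   ", round(b2, 3))
```

In this made-up case a single point is enough to turn a positive slope into a negative one, which is why x direction outliers deserve a second look.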

Something to watch:
• you have to be careful to make sure which variable you have as the explanatory
variable and which is the response variable, the correlation will be the same no matter which
you put as x and y, but the regression line will be different if you swap the roles of x and y
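
The point about swapping x and y can be checked numerically. This sketch assumes a made-up data set; `b_y_on_x` and `b_x_on_y` are labels chosen here for the two slopes.

```python
from math import sqrt
from statistics import mean

def corr_and_slope(xs, ys):
    """Return (r, b) where b is the LSRL slope for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy), s_xy / s_xx

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy, b_y_on_x = corr_and_slope(xs, ys)   # x explanatory, y response
r_yx, b_x_on_y = corr_and_slope(ys, xs)   # roles swapped

print(abs(r_xy - r_yx) < 1e-9)   # True: the correlation is the same either way
print(round(b_y_on_x, 3), round(b_x_on_y, 3), round(1 / b_y_on_x, 3))
```

If the two regressions described the same line, the swapped slope would equal 1 / b_y_on_x. It does not (unless r is exactly 1 or -1), so the choice of explanatory variable matters for the regression line.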
