EX: If you think that alcohol causes body temperature to decrease, you might do a study
giving certain amounts of alcohol to mice and measuring the temperature drops. In this
case the explanatory variable is the amount of alcohol and the response variable is the
measured temperature drop.
Scatterplot - a type of graph in which the paired values of two variables measured on the
same individuals are plotted as points, one variable on each axis
• look for overall patterns and striking deviations from the overall pattern
• you can describe the overall pattern of a scatterplot by the direction, form and
strength of the relationship demonstrated
• outliers can have a very significant effect on the overall relationship
Positive Association - Two variables are positively associated when above average
values for one variable tend to be paired with above average values of the other variable,
and below average values also tend to occur together.
Negative Association - Two variables are negatively associated when above average
values for one variable tend to be paired with below average values of the other variable
and vice versa
Outlier - (already defined) - an individual observation which lies outside the overall pattern
of a graph or distribution. Note that an outlier can be an outlier in the x direction, the y
direction, or both.
formula: $r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$
Suppose x is height and y is mass. Then x-bar and sx are the mean and standard
deviation of the height measurements and y-bar and sy are the mean and standard
deviation of the mass measurements. xi and yi are the height and mass of the
'ith' observation, and the two terms in parentheses are their standardized values.
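As a minimal sketch of the correlation formula above, the following standardizes each height and mass, multiplies the standardized values pairwise, and divides the sum by n - 1. The function name `correlation` and the data are hypothetical, made up only for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data, made up for illustration: heights in cm, masses in kg.
heights = [150, 160, 170, 180, 190]
masses = [52, 58, 69, 75, 86]

r = correlation(heights, masses)
print(round(r, 4))  # close to 1: a strong positive association
```

Because both variables are standardized first, r has no units and does not change if height is recorded in inches instead of centimeters.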
Section 3.2: (Hwk: 3.28 - 3.31, 3.34 - 3.36, 3.44 - 3.47)
Regression Line - a line that describes how a response variable y changes as the value
of an explanatory variable x changes
LSRL - least squares regression line - a mathematical model for a set of data created
by using the least squares method to find the regression line. The least squares method
finds the line which minimizes the sum of the squares of the vertical distances of the data
points from the line
We can find the line using:
$\hat{y} = a + bx$ where the slope $b = r \frac{s_y}{s_x}$ and intercept $a = \bar{y} - b\bar{x}$
this line passes through the point: $(\bar{x}, \bar{y})$
note: the slope formula means a change of one standard deviation in x corresponds to a change of r standard
deviations in the predicted y
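The slope and intercept formulas above (slope = r times sy over sx, intercept = y-bar minus slope times x-bar) can be sketched in a few lines. The function name `least_squares_line` and the data are hypothetical; this is an illustration under those assumptions, not a prescribed method.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x       # slope
    a = y_bar - b * x_bar   # intercept
    return a, b

# Hypothetical data, made up for illustration: heights in cm, masses in kg.
heights = [150, 160, 170, 180, 190]
masses = [52, 58, 69, 75, 86]

a, b = least_squares_line(heights, masses)
# The line passes through (x-bar, y-bar): predicting at the mean height gives the mean mass.
y_hat_at_mean = a + b * mean(heights)
print(round(a, 2), round(b, 2), round(y_hat_at_mean, 2))
```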
extrapolation - the use of a regression line to make predictions outside the range of
values of the explanatory variable x used to obtain the line. Such predictions are often
not accurate, so you have to be very careful using them
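One way to stay careful about extrapolation is to flag any prediction whose x value falls outside the range of the data used to fit the line. The sketch below assumes a hypothetical line (mass = -76.5 + 0.85 * height) fitted to heights between 150 cm and 190 cm; the helper name `predict` and all numbers are illustrative assumptions.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y-hat = a + b*x and flag whether x lies outside the observed range."""
    y_hat = a + b * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Assumed line and range (hypothetical): mass = -76.5 + 0.85 * height,
# fitted to heights between 150 cm and 190 cm.
a, b = -76.5, 0.85

inside = predict(a, b, 170, 150, 190)   # within the observed heights
outside = predict(a, b, 60, 150, 190)   # far below the observed heights
print(inside, outside)
```

At a height of 60 cm this line predicts a negative mass, which is impossible and shows why predictions far outside the observed range should not be trusted.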
residual - the difference between the observed value of y (the response variable) and the
value predicted by the regression line: residual = observed y - predicted y
residual plot - a scatterplot of the residuals against the explanatory variable x. If the
model (equation) that you found is a good model for the data, there should be no
discernible pattern in the residual plot
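The residuals for a fitted line can be computed directly as observed minus predicted. In this minimal sketch the helper names `fit_line` and `residuals` and the data are hypothetical, and the slope is computed in a form algebraically equal to r times sy over sx. The (x, residual) pairs it prints are the points you would put on a residual plot.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b (algebraically the same as b = r*s_y/s_x)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(xs, ys):
    """Residual for each point: observed y minus the y predicted by the line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data, made up for illustration: heights in cm, masses in kg.
heights = [150, 160, 170, 180, 190]
masses = [52, 58, 69, 75, 86]

res = residuals(heights, masses)
for x, e in zip(heights, res):
    print(x, round(e, 2))  # each (x, residual) pair is one point of the residual plot
```

For a least-squares line the residuals always sum to zero (up to rounding), so what matters in the plot is whether they show a curve or a fan shape, not their average.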
coefficient of determination - the fraction of the variation in the values of y that is
explained by the least-squares regression of y on x. The coefficient of determination is
found by squaring the correlation; in other words, coefficient of determination = $r^2$
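To see that squaring the correlation really gives the fraction of variation explained, the sketch below computes both quantities for a small made-up data set: the square of r, and 1 minus (sum of squared residuals) / (total variation of y about its mean). The function name `r_squared_two_ways` and the data are hypothetical, and r is computed in a form algebraically equal to the standardized-value formula.

```python
from statistics import mean

def r_squared_two_ways(xs, ys):
    """Return (r squared, fraction of variation in y explained by the LSRL)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
    r = sxy / (sxx * syy) ** 0.5              # correlation
    b = sxy / sxx                             # least-squares slope
    a = y_bar - b * x_bar                     # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    return r ** 2, 1 - sse / syy

# Hypothetical data, made up for illustration: heights in cm, masses in kg.
heights = [150, 160, 170, 180, 190]
masses = [52, 58, 69, 75, 86]

squared_r, explained = r_squared_two_ways(heights, masses)
print(round(squared_r, 4), round(explained, 4))  # the two values agree
```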