Multiple regression: overall model
Null: all explanatory variable slopes are = 0. Alternative: at least one slope is different from 0.
Overall model: F-test in the ANOVA table.
Individual variables: same t-value and p-value process as with simple linear regression (behind the scenes the math is more complicated than what we have done).
Confidence interval for a slope: bi +- t*df x SEbi, where t*df is the appropriate t value with df = n - k - 1.

Adjusted R2
Adj R2 = 1 - [SS Error / SS Total x (n - 1) / (n - p - 1)], where p is the number of variables in the model. Remember that Error = Residuals. Higher Adj R2 is better.
Use the Adj R2 approach to improve prediction accuracy. Use the p-value approach to understand which variables are statistically significant predictors of the response, or if there is interest in producing a simpler model at the potential cost of a little prediction accuracy.

Conditions for multiple regression
1. Linear relationships: check each explanatory variable against the outcome.
2. Nearly normal residuals: see a histogram of the residuals.
3. Constant variability: see the residuals vs. fitted values plot.
4. Independent residuals: no trend in the residuals vs. order plot.

Logistic regression
Use when the response variable is categorical with two levels. Logistic regression is a type of generalized linear model, for response variables where regular multiple regression does not work very well.
1. Model the response variable using a probability distribution (binomial or Poisson).
2. Model the parameter of the distribution using a collection of predictors and a special form of multiple regression.
Notation: the outcome variable is Yi (observation i), e.g. Yi = 1 spam, Yi = 0 not spam. The predictor x1,i is the value of variable 1 for observation i. Yi takes the value 1 with probability pi and the value 0 with probability 1 - pi.
logit(pi) = log_e(pi / (1 - pi)) = B0 + B1 x1,i + ... + Bk xk,i
If the logit is, for example, -2.12, then p^i = e^-2.12 / (1 + e^-2.12) = 0.11.
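The Adj R2 formula and the logit-to-probability conversion above can be sketched in a few lines of Python. The sums of squares and sample size below are made-up illustration values; only the -2.12 logit comes from the notes.

```python
import math

# Adjusted R^2 from the formula in the notes:
#   Adj R^2 = 1 - (SS_error / SS_total) * (n - 1) / (n - p - 1)
def adjusted_r2(ss_error, ss_total, n, p):
    return 1 - (ss_error / ss_total) * (n - 1) / (n - p - 1)

# Hypothetical fit: SSE = 120, SST = 400, n = 50 observations, p = 3 predictors.
adj = adjusted_r2(ss_error=120.0, ss_total=400.0, n=50, p=3)

# Logistic regression: turn a logit back into a probability.
#   logit(p) = log(p / (1 - p))  =>  p = e^logit / (1 + e^logit)
def inverse_logit(eta):
    return math.exp(eta) / (1 + math.exp(eta))

p_hat = inverse_logit(-2.12)   # the notes' example logit of -2.12

print(round(adj, 4), round(p_hat, 2))
```

Note that Adj R2 penalizes extra predictors through the (n - 1)/(n - p - 1) factor, which is why it is preferred over plain R2 for comparing models of different sizes.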
BACKWARD elimination: two ways
1. Adj R2 approach: drop the least significant variable, one at a time, and note the Adj R2. Pick the model with the highest increase in Adj R2. Repeat until none of the models yield an increase in Adj R2.
2. P-value approach: drop the variable with the highest p-value, refit, and repeat until all variables left in the model are significant or required.

FORWARD selection: two ways
1. Adj R2 approach: try every explanatory variable with the outcome and pick the model with the highest Adj R2. Add variables until an addition does not yield a higher Adj R2.
2. P-value approach: choose the variable with the lowest p-value. Add variables until the latest p-value is no longer significant.

Chi-square statistic
1. Identify the difference between a point estimate (observed count) and the expected count if the null hypothesis were true.
2. Standardize the difference using the standard error of the point estimate: (point estimate - null value) / SE of the point estimate, where SE = square root of the expected count under the null.
Reason for squaring: squaring makes all differences positive and gives more weight to unusually large deviations.
H0: There is no inconsistency between the observed and the expected counts. The observed counts follow the same distribution as the expected counts.
HA: There is an inconsistency between the observed and the expected counts. The observed counts do not follow the same distribution as the expected counts. (E.g., there is a bias in which side comes up on the roll of a die.)
We need to quantify how different the observed counts are from the expected counts. Large deviations from what would be expected based on sampling variation (chance) alone provide strong evidence for the alternative hypothesis.

Linear regression
3. Linear regression
4. Correlation describes the strength of the linear association between two variables.
5. Goal of regression:
6. To model the response variable = y
7. Using the predictor variable = x
8. To model systematically and efficiently
9. Understand what the linear model explains and what it does not explain.
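The chi-square recipe above (squared observed-minus-expected differences, scaled by the expected counts) can be sketched for the die example the notes mention. The counts below are invented for illustration.

```python
# Chi-square goodness of fit statistic:
#   chi^2 = sum over cells of (observed - expected)^2 / expected
# Hypothetical counts from 80 rolls of a die; under H0 (fair die) each
# face is expected n / 6 times.
observed = [10, 15, 12, 14, 11, 18]
n = sum(observed)
expected = n / 6

chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1   # df = k - 1, with k = 6 cells

# Compare chi_sq against the upper tail of a chi-square distribution
# with df = 5 (via technology or a chi-square probability table).
print(round(chi_sq, 2), df)
```

Each cell contributes at least 5 expected cases here (80 / 6 is about 13.3), so the sample-size condition from the notes is satisfied.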
Chi-square goodness of fit
This is called a goodness of fit test since we're evaluating how well the observed data fit the expected distribution.
Conditions:
Independence: each case that contributes a count to the table must be independent of all the other cases in the table.
Sample size: each particular scenario (i.e. cell) must have at least 5 expected cases.
df > 1: degrees of freedom must be greater than 1.
p-value = tail area under the chi-square distribution (as usual). For this we can use technology, or a chi-square probability table. This table works a lot like the t table, but only provides upper tail values.
Degrees of freedom are calculated as the number of cells (k) minus 1: df = k - 1.
As df increases: 1. the distribution becomes more symmetric, 2. the center moves to the right, 3. the variability inflates.
Example conclusions:
(a) Reject H0; the data provide convincing evidence that the dice are biased.
(b) The p-value is low, so H0 is rejected: the observed counts from the poll do not follow the same distribution as the reported votes.

Linear regression (continued)
10. Residual is the difference between the observed (yi) and predicted (y^i).
11. ei = yi - y^i
12. We want a line that has small residuals.
13. Minimize the sum of squared residuals: least squares.
14. e1^2 + e2^2 + ... + en^2
15. Why least squares?
16. Most commonly used.
17. Easier to compute by hand and using software.
18. In many applications, a residual twice as large as another is usually more than twice as bad.
19. Intercept notation
20. Parameter: B0. Point estimate: b0.
21. Slope notation
22. Parameter: B1. Point estimate: b1.
23. Slope interpretation
24. For each additional % point in HS graduate rate, we would expect the % living in poverty to be lower on average by 0.62% points.
25. Intercept interpretation
26. States with no HS graduates are expected on average to have 64.68% of residents living below the poverty line.
27. R2: strength of the fit of a linear model.
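The residual definition ei = yi - y^i can be sketched with the fitted poverty line from the notes (intercept 64.68, slope -0.62). The state's observed poverty value below is hypothetical.

```python
# Residual e_i = y_i - y_hat_i, using the fitted line from the notes:
#   predicted poverty % = 64.68 - 0.62 * (HS graduate rate)
def predict_poverty(hs_grad_rate):
    return 64.68 - 0.62 * hs_grad_rate

y_hat = predict_poverty(85.0)   # predicted poverty % at an 85% HS grad rate
y_obs = 14.0                    # hypothetical observed poverty % for that state
residual = y_obs - y_hat        # positive: the line under-predicts this state

print(round(y_hat, 2), round(residual, 2))
```

Least squares picks the intercept and slope that minimize the sum of such squared residuals over all states.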
28. R2 = r * r (just the correlation, squared).
29. Describes the percent of variability in the response variable (y) that is explained by the model (x in linear form).
30. The remainder of the variability is explained by variables not included in the model or by inherent randomness in the data.
31. Example:
32. R2 = 56% of the variability in poverty is explained by high school graduation rate.
33. Conditions
34. Linearity
a. The relationship between the explanatory and the response variable should be linear.
b. Check using a scatterplot of the data, or a residuals plot.
35. Nearly normal residuals
a. The residuals should be nearly normal.
b. This condition may not be satisfied when there are unusual observations that don't follow the trend of the rest of the data.
c. Check using a histogram or normal probability plot of residuals.
36. Constant variability
a. The variability of points around the least squares line should be roughly constant.
b. This implies that the variability of residuals around the 0 line should be roughly constant as well.
c. Also called homoscedasticity.
d. Check using a residuals plot.
37. Independent observations: avoid time series data.

Categorical predictors
Reference level: eastern states.
Intercept: the estimated average poverty percentage in eastern states is 11.17%. This is the value we get if we plug in 0 for the explanatory variable.
Slope: the estimated average poverty percentage in western states is 0.38% higher than in eastern states. The estimated average poverty percentage in western states is then 11.17 + 0.38 = 11.55%. This is the value we get if we plug in 1 for the explanatory variable.

Two-way tables
Expected count for row i, column j = (row i total) x (column j total) / table total.
df = (number of rows - 1) x (number of columns - 1).

Correlation and the least squares line
Correlation, R, takes values between -1 and 1 and describes the strength of the linear relationship.
The least squares line satisfies y^ - y_ = b1(x - x_), where x_ and y_ are the means and b1 = R * sy/sx.
R squared is the amount of variation in the response that is explained by the least squares line.
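The slope and intercept formulas above (b1 = R * sy/sx, with the line passing through the point of means) can be sketched from summary statistics. All numbers below are invented for illustration.

```python
# Least squares line from summary statistics, per the notes:
#   b1 = R * (s_y / s_x), and y_hat - y_bar = b1 * (x - x_bar),
# so the intercept is b0 = y_bar - b1 * x_bar.
r = -0.75            # hypothetical correlation between x and y
s_x, s_y = 4.0, 2.0  # hypothetical standard deviations
x_bar, y_bar = 80.0, 12.0  # hypothetical means

b1 = r * s_y / s_x       # slope
b0 = y_bar - b1 * x_bar  # intercept: plug x_bar into the line
r_squared = r ** 2       # share of variation in y explained by the line

print(b1, b0, r_squared)
```

Note that R2 = 0.5625 here mirrors item 28: squaring the correlation gives the fraction of variability explained.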
Leverage: a point that falls away from the center of the data (in the x direction) has high leverage.