Multiple regression: overall model
Null: all explanatory variable slopes are = 0. Alternative: at least one slope is different from 0.
Overall model: F-test in the ANOVA table.
Individual variables: same t-value and p-value process as with simple linear regression (behind the scenes the math is more complicated than what we have done).
Confidence interval for a slope: bi +- t*df x SEbi, where t*df is the appropriate t value with df = n - k - 1.

Adjusted R2
Adj R2 = 1 - [SS Error / SS Total x (n - 1) / (n - p - 1)], where p is the number of variables in the model. Remember that Error = Residuals. Higher Adj R2 is better.
Use the Adj R2 approach to improve prediction accuracy. Use the p-value approach to understand which variables are statistically significant predictors of the response, or if there is interest in producing a simpler model at the potential cost of a little prediction accuracy.

Conditions for multiple regression
1. Linear relationships: check each explanatory variable against the outcome.
2. Nearly normal residuals: see a histogram of the residuals.
3. Constant variability: see the residuals vs. fitted values plot.
4. Independent residuals: no trend in the residuals vs. order plot.

Logistic regression
Use when the response variable is categorical with two levels. Logistic regression is a type of generalized linear model, for response variables where regular multiple regression does not work very well.
1. Model the response variable using a probability distribution (binomial or Poisson).
2. Model the parameter of the distribution using a collection of predictors and a special form of multiple regression.
Notation: the outcome variable is Yi (observation i), e.g. Yi = 1 spam, Yi = 0 not spam. The predictor x1,i is the value of variable 1 for observation i. Yi takes the value 1 with probability pi and the value 0 with probability 1 - pi.
logit(pi) = log_e(pi / (1 - pi)) = B0 + B1 x1,i + ... + Bk xk,i
If the logit is, for example, -2.12, then p^i = e^-2.12 / (1 + e^-2.12) = 0.11.
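The Adj R2 formula and the logit-to-probability conversion above can be sketched in a few lines of Python. The sums of squares and sample size below are made-up illustration values; only the -2.12 logit comes from the notes.

```python
import math

# Adjusted R^2 from the formula in the notes:
#   Adj R^2 = 1 - (SS_error / SS_total) * (n - 1) / (n - p - 1)
def adjusted_r2(ss_error, ss_total, n, p):
    return 1 - (ss_error / ss_total) * (n - 1) / (n - p - 1)

# Hypothetical fit: SSE = 120, SST = 400, n = 50 observations, p = 3 predictors.
adj = adjusted_r2(ss_error=120.0, ss_total=400.0, n=50, p=3)

# Logistic regression: turn a logit back into a probability.
#   logit(p) = log(p / (1 - p))  =>  p = e^logit / (1 + e^logit)
def inverse_logit(eta):
    return math.exp(eta) / (1 + math.exp(eta))

p_hat = inverse_logit(-2.12)   # the notes' example logit of -2.12

print(round(adj, 4), round(p_hat, 2))
```

Note that Adj R2 penalizes extra predictors through the (n - 1)/(n - p - 1) factor, which is why it is preferred over plain R2 for comparing models of different sizes.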
BACKWARD elimination: two ways
1. Adj R2 approach: drop the least significant variable, one at a time, and note the Adj R2. Pick the model with the highest increase in Adj R2. Repeat until none of the models yield an increase in Adj R2.
2. P-value approach: drop the variable with the highest p-value, refit, and repeat until all variables left in the model are significant or required.

FORWARD selection: two ways
1. Adj R2 approach: try every explanatory variable with the outcome and pick the model with the highest Adj R2. Add variables until an addition does not yield a higher Adj R2.
2. P-value approach: choose the variable with the lowest p-value. Add variables until the latest p-value is no longer significant.

Chi-square statistic
1. Identify the difference between a point estimate (observed count) and the expected count if the null hypothesis were true.
2. Standardize the difference using the standard error of the point estimate: (point estimate - null value) / SE of the point estimate, where SE = square root of the expected count under the null.
Reason for squaring: squaring makes all differences positive and gives more weight to unusually large deviations.
H0: There is no inconsistency between the observed and the expected counts. The observed counts follow the same distribution as the expected counts.
HA: There is an inconsistency between the observed and the expected counts. The observed counts do not follow the same distribution as the expected counts. (E.g., there is a bias in which side comes up on the roll of a die.)
We need to quantify how different the observed counts are from the expected counts. Large deviations from what would be expected based on sampling variation (chance) alone provide strong evidence for the alternative hypothesis.

Linear regression
3. Linear regression
4. Correlation describes the strength of the linear association between two variables.
5. Goal of regression:
6. To model the response variable = y
7. Using the predictor variable = x
8. To model systematically and efficiently
9. Understand what the linear model explains and what it does not explain.
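The chi-square recipe above (squared observed-minus-expected differences, scaled by the expected counts) can be sketched for the die example the notes mention. The counts below are invented for illustration.

```python
# Chi-square goodness of fit statistic:
#   chi^2 = sum over cells of (observed - expected)^2 / expected
# Hypothetical counts from 80 rolls of a die; under H0 (fair die) each
# face is expected n / 6 times.
observed = [10, 15, 12, 14, 11, 18]
n = sum(observed)
expected = n / 6

chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1   # df = k - 1, with k = 6 cells

# Compare chi_sq against the upper tail of a chi-square distribution
# with df = 5 (via technology or a chi-square probability table).
print(round(chi_sq, 2), df)
```

Each cell contributes at least 5 expected cases here (80 / 6 is about 13.3), so the sample-size condition from the notes is satisfied.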
Chi-square goodness of fit
This is called a goodness of fit test since we're evaluating how well the observed data fit the expected distribution.
Conditions:
Independence: each case that contributes a count to the table must be independent of all the other cases in the table.
Sample size: each particular scenario (i.e. cell) must have at least 5 expected cases.
df > 1: degrees of freedom must be greater than 1.
p-value = tail area under the chi-square distribution (as usual). For this we can use technology, or a chi-square probability table. This table works a lot like the t table, but only provides upper tail values.
Degrees of freedom are calculated as the number of cells (k) minus 1: df = k - 1.
As df increases: 1. the distribution becomes more symmetric, 2. the center moves to the right, 3. the variability inflates.
Example conclusions:
(a) Reject H0; the data provide convincing evidence that the dice are biased.
(b) The p-value is low, so H0 is rejected: the observed counts from the poll do not follow the same distribution as the reported votes.

Linear regression (continued)
10. Residual is the difference between the observed (yi) and predicted (y^i).
11. ei = yi - y^i
12. We want a line that has small residuals.
13. Minimize the sum of squared residuals: least squares.
14. e1^2 + e2^2 + ... + en^2
15. Why least squares?
16. Most commonly used.
17. Easier to compute by hand and using software.
18. In many applications, a residual twice as large as another is usually more than twice as bad.
19. Intercept notation
20. Parameter: B0. Point estimate: b0.
21. Slope notation
22. Parameter: B1. Point estimate: b1.
23. Slope interpretation
24. For each additional % point in HS graduate rate, we would expect the % living in poverty to be lower on average by 0.62% points.
25. Intercept interpretation
26. States with no HS graduates are expected on average to have 64.68% of residents living below the poverty line.
27. R2: strength of the fit of a linear model.
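The residual definition ei = yi - y^i can be sketched with the fitted poverty line from the notes (intercept 64.68, slope -0.62). The state's observed poverty value below is hypothetical.

```python
# Residual e_i = y_i - y_hat_i, using the fitted line from the notes:
#   predicted poverty % = 64.68 - 0.62 * (HS graduate rate)
def predict_poverty(hs_grad_rate):
    return 64.68 - 0.62 * hs_grad_rate

y_hat = predict_poverty(85.0)   # predicted poverty % at an 85% HS grad rate
y_obs = 14.0                    # hypothetical observed poverty % for that state
residual = y_obs - y_hat        # positive: the line under-predicts this state

print(round(y_hat, 2), round(residual, 2))
```

Least squares picks the intercept and slope that minimize the sum of such squared residuals over all states.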
28. R2 = r * r (just the correlation, squared).
29. Describes the percent of variability in the response variable (y) that is explained by the model (x in linear form).
30. The remainder of the variability is explained by variables not included in the model or by inherent randomness in the data.
31. Example:
32. R2 = 56% of the variability in poverty is explained by high school graduation rate.
33. Conditions
34. Linearity
a. The relationship between the explanatory and the response variable should be linear.
b. Check using a scatterplot of the data, or a residuals plot.
35. Nearly normal residuals
a. The residuals should be nearly normal.
b. This condition may not be satisfied when there are unusual observations that don't follow the trend of the rest of the data.
c. Check using a histogram or normal probability plot of residuals.
36. Constant variability
a. The variability of points around the least squares line should be roughly constant.
b. This implies that the variability of residuals around the 0 line should be roughly constant as well.
c. Also called homoscedasticity.
d. Check using a residuals plot.
37. Independent observations: avoid time series data.

Categorical predictors
Reference level: eastern states.
Intercept: the estimated average poverty percentage in eastern states is 11.17%. This is the value we get if we plug in 0 for the explanatory variable.
Slope: the estimated average poverty percentage in western states is 0.38% higher than in eastern states. The estimated average poverty percentage in western states is then 11.17 + 0.38 = 11.55%. This is the value we get if we plug in 1 for the explanatory variable.

Two-way tables
Expected count for row i, column j = (row i total) x (column j total) / table total.
df = (number of rows - 1) x (number of columns - 1).

Correlation and the least squares line
Correlation, R, takes values between -1 and 1 and describes the strength of the linear relationship.
The least squares line satisfies y^ - y_ = b1(x - x_), where x_ and y_ are the means and b1 = R * sy/sx.
R squared is the amount of variation in the response that is explained by the least squares line.
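The slope and intercept formulas above (b1 = R * sy/sx, with the line passing through the point of means) can be sketched from summary statistics. All numbers below are invented for illustration.

```python
# Least squares line from summary statistics, per the notes:
#   b1 = R * (s_y / s_x), and y_hat - y_bar = b1 * (x - x_bar),
# so the intercept is b0 = y_bar - b1 * x_bar.
r = -0.75            # hypothetical correlation between x and y
s_x, s_y = 4.0, 2.0  # hypothetical standard deviations
x_bar, y_bar = 80.0, 12.0  # hypothetical means

b1 = r * s_y / s_x       # slope
b0 = y_bar - b1 * x_bar  # intercept: plug x_bar into the line
r_squared = r ** 2       # share of variation in y explained by the line

print(b1, b0, r_squared)
```

Note that R2 = 0.5625 here mirrors item 28: squaring the correlation gives the fraction of variability explained.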
Leverage: a point that falls away from the center of the data (in the x direction) has high leverage.