6
g hl i
u
m
a
Tea h McL ala
t
Hea h Miriy
s
l
Aka Petril yanto
t
n
Rya elle Se
h
Mic
u
Ya S
K353 Final Project
Predictive Analytics of US Obesity
Data Overview
Predicting Obesity Rate in the US
through:
Data Overview
Continuous variables:
Population
Per Capita:
Per State:
Food Tax
Prices of Milk over Soda
Continuous Socioeconomic Variables:
Poverty
Metro areas
Capita
Removed Anomalies
Eliminated Outliers
Modeling
Supervised Techniques:
Multiple Linear Regression Analysis
Decision Tree
Unsupervised Techniques:
Agglomerative Cluster Analysis
K-Means Clustering
Supervised Technique
Multiple Regression Analysis
Regression Equation:
y = 98.526 - 0.009(ExpPCRest) - 0.000(MedInc) + 0.129(%Black)
    + 0.255(%18&under) - 1.568(RestPT) - 0.808(%Asian)
    + 0.156(SodaTaxRetail) + 3.435(PriceMilkOverSoda) - 4.834(GymsPT)
    - 0.069(%Hispanic) + 0.270(%Native) - 0.003(ExpPCFastFood)
    + 0.023(%White) + e_i
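To make the regression equation concrete, here is a minimal sketch that evaluates it for one observation. It assumes the term signs are as read here (negative for ExpPCRest, MedInc, RestPT, %Asian, GymsPT, %Hispanic and ExpPCFastFood, positive for the rest). The function name and the input values are hypothetical and not part of the project.

```python
# Intercept and coefficients copied from the regression equation.
# Signs are an assumption (see the note above); MedInc is reported
# as .000 after rounding, so it contributes nothing here.
INTERCEPT = 98.526
COEFFICIENTS = {
    "ExpPCRest": -0.009,
    "MedInc": 0.0,
    "%Black": 0.129,
    "%18&under": 0.255,
    "RestPT": -1.568,
    "%Asian": -0.808,
    "SodaTaxRetail": 0.156,
    "PriceMilkOverSoda": 3.435,
    "GymsPT": -4.834,
    "%Hispanic": -0.069,
    "%Native": 0.270,
    "ExpPCFastFood": -0.003,
    "%White": 0.023,
}


def predict_obesity_rate(features):
    """Return the fitted value (error term omitted) for one observation."""
    missing = set(COEFFICIENTS) - set(features)
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(
        coef * features[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical observation: every predictor at zero except one gym
# per thousand residents.
example = {name: 0.0 for name in COEFFICIENTS}
example["GymsPT"] = 1.0
print(round(predict_obesity_rate(example), 3))
```

Because MedInc's coefficient is shown only to three decimals, a prediction from these rounded values will differ slightly from one produced by the full-precision model.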
Evaluation
1. Binned Scatterplot
2. Residuals
3. Goodness-of-fit
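The residual and goodness-of-fit checks can be sketched as follows. This is an illustration only: it assumes goodness-of-fit is summarized by R-squared, which the slides do not state, and the observed and predicted values are made up.

```python
def residuals(observed, predicted):
    """Observed minus predicted, one value per observation."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum(e ** 2 for e in residuals(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical obesity rates (%), not project data.
observed = [30.0, 32.0, 28.0, 35.0]
predicted = [29.0, 33.0, 28.0, 34.0]

print(residuals(observed, predicted))
print(round(r_squared(observed, predicted), 3))
```

Plotting the residuals against the predicted values, binned or not, is the usual way to spot curvature or uneven spread that a single R-squared value hides.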
Supervised Technique
Decision Tree
Rules:
1. If Adult Diabetes Rate <= 13.85% -> 0
2. If Diabetes Rate > 13.85% AND
Confusion matrix counts
Training: 1,294 / 121 / 48
Testing: 1,398 / 14 / 168 / 62
Training
Accuracy : 91%
Sensitivity : 28%
Specificity : 99%
Error : 10%
Testing
Accuracy : 89%
Sensitivity : 27%
Specificity : 99%
Error : 11%
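The four reported measures all come from the cells of a confusion matrix. The sketch below assumes the testing counts read as 1,398 true negatives, 14 false positives, 168 false negatives and 62 true positives. That ordering is an inference, chosen because it reproduces the testing percentages above. The function name is hypothetical.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, sensitivity, specificity and error from the four cells."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of actual positives caught
        "specificity": tn / (tn + fp),   # share of actual negatives kept
        "error": (fp + fn) / total,
    }


# Assumed cell order for the testing partition (see the note above).
testing = classification_metrics(tn=1398, fp=14, fn=168, tp=62)
for name, value in testing.items():
    print(f"{name}: {value:.0%}")
```

With these counts the output is 89%, 27%, 99% and 11%. Sensitivity is low because the positive class is rare and the tree almost always predicts the majority class, which accuracy alone does not reveal.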
Unsupervised Technique
Agglomerative Cluster Analysis
Different variable combinations
Clusters
Children Poverty Rate, Poverty Rate & Adult Obesity Rate
Gyms Per Thousand & Adult Obesity Rate
Two-Step Clustering
Evaluation
Predictive accuracy
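As an illustration of agglomerative clustering on a pair of variables such as Gyms Per Thousand and Adult Obesity Rate, here is a small pure-Python sketch. The linkage rule (average linkage), the function names and the four data points are assumptions. The slides do not say which linkage or software was used.

```python
from itertools import combinations
from math import dist


def average_linkage(points, cluster_a, cluster_b):
    """Mean Euclidean distance over all cross-cluster point pairs."""
    total = sum(dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points, n_clusters):
    """Start with one cluster per point and merge the two closest
    clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                points, clusters[pair[0]], clusters[pair[1]]
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters


# Hypothetical (gyms per thousand, adult obesity rate %) pairs.
counties = [(0.05, 36.0), (0.06, 35.0), (0.20, 24.0), (0.22, 23.5)]
groups = agglomerate(counties, n_clusters=2)
print(groups)
```

The variables here sit on very different scales, so the obesity rate dominates the distances. Standardizing each variable first is a common remedy.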
Unsupervised Technique
K-Means Clustering
Data
Create 4 clusters
Analyze any region similarities: Northeast, South, Midwest, West
Evaluation
Predictive accuracy
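The four-cluster K-means step can be sketched with Lloyd's algorithm in plain Python. This is a minimal illustration under stated assumptions: the data are synthetic (obesity rate %, gyms per thousand) pairs generated for the example, the function name and seeds are hypothetical, and the project's actual tool and settings are not given in the slides.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct observations.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum(
                    (x - m) ** 2 for x, m in zip(p, centroids[c])
                ),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centroids, labels


# Synthetic data: four loose groups of 25 observations each.
data_rng = random.Random(42)
centers = [(25.0, 0.15), (30.0, 0.10), (35.0, 0.05), (20.0, 0.20)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.005))
    for cx, cy in centers
    for _ in range(25)
]

centroids, labels = kmeans(points, k=4)
for c, centroid in enumerate(centroids):
    print(c, [round(v, 3) for v in centroid], labels.count(c))
```

To look for regional similarities, each observation's cluster label can be cross-tabulated against its region (Northeast, South, Midwest, West). K-means is sensitive to the starting centroids, so running it from several seeds and comparing the results is worthwhile.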
[Figure: Obesity Rates by cluster (Clusters 1-4)]
[Figure: Gyms Per Thousand by cluster (Clusters 1-4)]
Deployment
Business / real world applications
Questions?