Grow Up and Get a Job!

My partner and I chose to study the correlation between the age of Fowler High
School students and the number of jobs they have had in their lifetime. Throughout the
project, we were able to draw conclusions from the data, as well as predict other values.
The two variables that we studied were age and number of jobs. The explanatory
variable is the one used to explain or predict changes in the other variable. In this
situation, that would be age, which is plotted on the x axis. The response variable, or
the result, is the number of jobs held, which is plotted on the y axis.
After making our scatter plot, we can conclude that there is a moderate positive
correlation. In other words, as one variable goes up, so does the other: as age
increases, so does the number of jobs, to some degree. It is not a very strong
correlation, but the two variables are still related. We also added a trend line to the
plot to show the line of best fit. The line is not very steep, and there appear to be
more points above it than below. This is because many of the points below the line are
stacked on top of each other. For example, by looking at the graph, you may assume that
there is only one seventeen year old who had one job. However, there turned out to be
two. The graph may therefore be a bit misleading, since stacked points appear as a
single point.
The correlation coefficient for our data is r = 0.539. This number represents the
strength of the correlation between our two variables, age and the number of jobs
held. The correlation coefficient ranges from -1 to 1, and values between 0 and 1
indicate positive correlation, with values closer to 1 being stronger. Our number falls
about halfway between 0 and 1. From this, we can conclude that there is a moderate
positive correlation.
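
As a rough illustration of how a correlation coefficient is computed, the Python
sketch below applies the standard Pearson formula. The function name pearson_r and
the ages and jobs lists are made up for this example. They are not our survey data, so
the result will not equal 0.539.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ages and job counts (NOT the survey data)
ages = [14, 15, 15, 16, 16, 17, 17, 18]
jobs = [0, 1, 0, 1, 2, 1, 3, 2]

r = pearson_r(ages, jobs)
print(round(r, 3))  # a value between -1 and 1; positive means jobs rise with age
```
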
The x̄ for our data is 15.98. The x̄, or the mean of the x values, is the average
of all of the ages. In other words, the average age of the students in the study is
15.98, so nearly 16 years old. This makes sense because 16 is a common age for both
sophomores and juniors. The ȳ is 1.38. The ȳ is the mean of all of the y values. The
data on the y axis was number of jobs, so the ȳ reveals that the average number of jobs
held was 1.38, or around 1. The point (x̄, ȳ) contains both the mean of the x values and
the mean of the y values. For our data, this point is (15.98, 1.38). The line of best
fit must pass through this point.
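
This property can be checked with a quick calculation. The sketch below plugs the
mean age into the regression equation that our calculators reported for this data
(slope 0.442, intercept -5.685). The variable names are only for illustration.

```python
# Values reported in this project
slope = 0.442
intercept = -5.685
mean_age = 15.98    # mean of the x values
mean_jobs = 1.38    # mean of the y values

# The regression line evaluated at the mean age should give the mean number of jobs
predicted_at_mean = slope * mean_age + intercept
print(round(predicted_at_mean, 2))  # prints 1.38
```

The result matches the mean number of jobs once rounded, as expected.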
The least squares regression line for our data was found by running a linear
regression on our calculators. The equation for our line is y = 0.442x - 5.685. This
is the line of best fit which was added to our graph. By adding the least squares
regression line, we are able to see the overall trend of the data and predict other
values. An important thing to note about our regression line is that it is not very
steep, and it appears to have a lot more dots above it than below. This is misleading,
because a lot of our points below the line are stacked on top of each other. This is
why the line of best fit looks as if it sits too low, with more points above it.
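
We used a calculator for the regression, but the same kind of line can be found by
hand. The sketch below shows the usual least squares formulas in Python. The function
name least_squares_line and the ages and jobs lists are hypothetical and are not our
survey data, so the slope and intercept will differ from 0.442 and -5.685.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # Choosing the intercept this way forces the line through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical ages and job counts (NOT the survey data)
ages = [14, 15, 15, 16, 16, 17, 17, 18]
jobs = [0, 1, 0, 1, 2, 1, 3, 2]

slope, intercept = least_squares_line(ages, jobs)
print(round(slope, 3), round(intercept, 3))
```
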
To find the marginal change, we looked to our equation. The marginal change for
our data is 0.442, which is the slope of our line. This means that for each additional
year of age, the predicted number of jobs goes up by about 0.442. This small value is
why our line isn’t very steep.
In our data set, there were no influential points. Influential points are points that
would drastically change the regression line and the correlation if they were taken
away. This may be because there were not very many opportunities for that to happen.
For example, the ages only ranged from 14 to 18. It was impossible for anyone to have
an extreme value, such as 26, if they are a student at Fowler High School. Also, it was
highly unlikely for a student to have an extreme number of jobs. For example, no
student claimed to have a really high number, such as 14 jobs, that would throw off the
data set. Influential points are significant because they can really affect the
results. For example, the r value could be greatly increased or decreased by just one
influential point.
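
To illustrate how much one point can matter, the Python sketch below computes r for a
small made-up data set with and without a single extreme point. The data and the
extreme point (a 26 year old with 14 jobs) are hypothetical and chosen only to show the
effect. They are not from our survey.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ages and job counts (NOT the survey data)
ages = [14, 15, 15, 16, 16, 17, 17, 18]
jobs = [0, 1, 0, 1, 2, 1, 3, 2]

r_without = pearson_r(ages, jobs)
# Add one made-up extreme point far from the rest of the data
r_with = pearson_r(ages + [26], jobs + [14])

print(round(r_without, 2), round(r_with, 2))  # r changes noticeably
```
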
The coefficient of determination, r², is 0.290. We found this by running a linear
regression on our calculators. r² is a measure of how closely the data fall along
the least squares regression line. It is also equal to the proportion of explained
variation. In other words, 29% of the variation in the number of jobs is explained by
the linear relationship with age. Therefore, the amount of unexplained variation is
71%. These numbers show us that there is far more unexplained variation in our data set
than there is explained.
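
The reported values can be cross-checked, since r² should equal the square of the
correlation coefficient up to rounding. The sketch below uses only the numbers stated
in this report. The variable names are just for illustration.

```python
r = 0.539          # reported correlation coefficient
r_squared = 0.290  # reported coefficient of determination

# Squaring r should agree with r_squared, allowing for calculator rounding
print(round(r ** 2, 4))  # prints 0.2905

explained_pct = r_squared * 100
unexplained_pct = 100 - explained_pct
print(round(explained_pct), round(unexplained_pct))  # prints 29 71
```
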
In terms of lurking variables, we could not think of many major ones. The one that
we did come up with is the possession of a car. For example, a 17 year old may
have a harder time obtaining a job if they do not have a source of transportation. This
may be one of the reasons that some upperclassmen have only had a few jobs, and it may
have impacted our data by decreasing the number of jobs held by upperclassmen.
Interpolation and extrapolation are used to predict values that are not directly
observed. Interpolation predicts within the range of our data, and extrapolation
predicts beyond it. One example of interpolation would be predicting the number of
jobs held by someone who is 16.5 years old. By following the trend line, interpolation
predicts that a person of that age would have had around 1.5 jobs (the equation gives
about 1.6). A person cannot actually hold a fraction of a job, so this is best read as
an average. By following the line outward and past the ages in our survey, we can use
extrapolation to predict that a 20 year old may have had around 3 jobs. However,
extrapolation is not always accurate, because the trend may not continue outside the
ages we surveyed.
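
These predictions can be reproduced from the regression equation. The Python sketch
below uses the slope and intercept reported above and treats ages 14 to 18 as the
surveyed range. The function names predict_jobs and kind_of_prediction are made up for
this example.

```python
SLOPE = 0.442
INTERCEPT = -5.685
MIN_AGE, MAX_AGE = 14, 18  # age range covered by the survey

def predict_jobs(age):
    """Predicted number of jobs from the reported regression line."""
    return SLOPE * age + INTERCEPT

def kind_of_prediction(age):
    """Inside the surveyed ages is interpolation; outside is extrapolation."""
    return "interpolation" if MIN_AGE <= age <= MAX_AGE else "extrapolation"

for age in (16.5, 20):
    print(age, kind_of_prediction(age), round(predict_jobs(age), 1))
# prints:
# 16.5 interpolation 1.6
# 20 extrapolation 3.2
```

The value for age 20 should be treated with caution, since it lies outside the ages
we surveyed.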
