
Katelyn W and Morgan G

Mrs. Lewis

Statistics and Probability

19 March 2019

Junk Mail?

When most FHS students receive an email, what do they do? For our Chapter 7 project,
we decided to look at the number of emails students have in their inboxes. We selected our
sample systematically, surveying every third student on the list of 9th-12th graders. We found
that some keep their inbox neat and up to date, while others let their emails pile up.

Our results proved to be anything but normal. Looking at the histogram, it is quite
obvious that there is no normality to the results: there is no bell-shaped curve. The mean for
our data set was 1753.79. This means that the average number of emails left in a student’s inbox
is approximately 1754. If this were a normal distribution, that value would be found in the
middle of the histogram, but because of several outliers, that is not the case. We were able to
find 𝜇 by running 1-variable statistics on our calculators. The standard deviation for our data
set is 2660.48. This number represents the spread of our data, or the typical distance of a data
value from the mean of 1753.79 emails. The farther the data values lie from the mean, the higher
the standard deviation. We also found 𝜎 by running 1-variable statistics on our calculators.

By running 1-variable statistics, we were also able to compute the five-number summary.
The five-number summary is made up of the minimum, the lower quartile (Q1), the median,
the upper quartile (Q3), and the maximum. The minimum was 0. Q1 was 14. The median was 205.
Q3 was 3678. The maximum was 9675.
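
The five-number summary can also be used to check the extremes for outliers. The sketch below assumes the common 1.5 × IQR rule, which the report does not name, and uses only the summary values given above.

```python
# Five-number summary values reported in the text.
minimum, q1, median, q3, maximum = 0, 14, 205, 3678, 9675

# Assumption: the 1.5 * IQR rule for flagging outliers (not named in the report).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR: {iqr}")
print(f"Fences: {lower_fence} to {upper_fence}")
print(f"Minimum below lower fence: {minimum < lower_fence}")
print(f"Maximum above upper fence: {maximum > upper_fence}")
```

With only the summary values, this can show whether the minimum and maximum are flagged, but not how many outliers the full data set contains.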

The criteria for a normal distribution are: the histogram should be approximately
bell shaped, there should be no more than one outlier, Pearson’s Index, 3(x̅ - median)/s,
must be between -1 and 1, and the normal quantile plot must be approximately a straight line.
We can conclude that our data set is not normal simply by assessing the first two criteria. As
pictured, our histogram does not display a bell-shaped curve. Also, there are several outliers.
While our mean is 1753.79, our data ranges from 0 to 9675, and the largest values lie far above
the mean. We can also find Pearson’s Index by subtracting the median from the mean, multiplying
by 3, and dividing by the standard deviation. Pearson’s Index, 3(1753.79 - 205)/2660.48, gave us
a final answer of 1.746. This does not fall between -1 and 1, so we can conclude that the
distribution is not normal.
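
As a quick check of the Pearson’s Index arithmetic, here is a minimal sketch using the mean, median, and standard deviation reported above. The variable names are ours, chosen for illustration.

```python
mean = 1753.79
median = 205
std_dev = 2660.48

# Pearson's Index: 3 * (mean - median) / standard deviation
pearson_index = 3 * (mean - median) / std_dev

# The normality criterion asks for a value between -1 and 1.
within_range = -1 <= pearson_index <= 1

print(f"Pearson's Index: {pearson_index:.3f}")
print(f"Between -1 and 1: {within_range}")
```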

Because our data is not normal, the empirical rule calculations are pretty interesting. Under
the empirical rule, 68% of the data should fall within one standard deviation of the mean,
between -906.69 and 4414.27 emails. To calculate this, you take the mean, 1753.79, and add the
standard deviation, 2660.48. You then take 1753.79 and subtract 2660.48. To get 95% you use two
standard deviations, and for 99.7% you use three. 95% of the data should fall between -3567.17
and 7074.75 emails, and 99.7% should fall between -6227.65 and 9735.23 emails.
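
The three empirical rule intervals can be reproduced with a short loop. This is a minimal sketch that assumes the rounded mean (1753.79) and standard deviation (2660.48) reported earlier.

```python
mean = 1753.79
std_dev = 2660.48

# Number of standard deviations paired with the share of data the rule predicts.
empirical_rule = {1: "68%", 2: "95%", 3: "99.7%"}

intervals = {}
for k, share in empirical_rule.items():
    low = round(mean - k * std_dev, 2)
    high = round(mean + k * std_dev, 2)
    intervals[share] = (low, high)
    print(f"{share}: {low} to {high} emails")
```

The negative lower bounds are a sign that the normal model does not fit a count that can never go below 0.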

If we were to answer our survey question, our responses would have been 50 and 270. For
the data value of 50, we can compute what percent of people would answer with the same number
of emails or fewer using the z score table. We must first convert the amount to a z score using
the formula z = (x - 𝜇)/𝜎. For our data value, the formula reads z = (50 - 1753.79)/2660.48.
This gives us a z score of -.64. Using this z score, we can compute the chance that a person
will have 50 or fewer emails in their inbox. The z score table gives us a decimal of .2611. In
other words, there is a 26% chance that a person would answer with 50 emails or fewer, based on
our data. To find the probability of a person having more than 50 emails, we can simply subtract
the decimal from the z table from one: 1 - .2611 = .7389. This number tells us that there is
approximately a 74% chance that a person chosen at random would have more than 50 emails in
their inbox.

For our second answer, 270 emails, we are able to use the same process, substituting
270 for the 50 in the previous problem. The equation we used is z = (270 - 1753.79)/2660.48.
This gives us a z score of -.55. Using this value, we can turn to our z score table to estimate
the probability that a person at random would answer 270 emails or fewer. The z score table
gives us a value of .2912. In other words, there is a 29% chance that a person at random will
have 270 emails or fewer in their inbox when surveyed. To find the probability of a person
answering more than 270 emails, we can subtract the decimal from the z table from 1:
1 - .2912 = .7088. This number tells us that there is approximately a 71% chance that a person
at random would have more than 270 emails in their inbox.
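
The z score and table lookup for both answers can be sketched in code. This example assumes Python's `statistics.NormalDist` as a stand-in for the printed z score table; the helper names `z_score` and `prob_at_or_below` are ours. Because the table works with z scores rounded to two decimal places, the computed values can differ slightly from the table lookups in the text.

```python
from statistics import NormalDist

mean = 1753.79
std_dev = 2660.48
standard_normal = NormalDist()  # mean 0, standard deviation 1


def z_score(x):
    """Convert a raw number of emails to a z score."""
    return (x - mean) / std_dev


def prob_at_or_below(x):
    """Left-tail area under the standard normal curve for x."""
    return standard_normal.cdf(z_score(x))


for x in (50, 270):
    p = prob_at_or_below(x)
    print(f"x = {x}: z = {z_score(x):.2f}, "
          f"P(at or below) = {p:.4f}, P(above) = {1 - p:.4f}")
```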

The central limit theorem lets us look at the average number of emails in a group of 30
students and find the probability that the group's average falls below each of our values. To
calculate this, you use z = (x̄ - 𝜇)/(𝜎/√n). So, the first equation would be
(50 - 1753.79)/(2660.48/√30), which equals -3.51. There is about a 0.02% chance of this
happening. For the second equation, it would be (270 - 1753.79)/(2660.48/√30), which equals
-3.05. There is about a 0.11% chance of this happening.
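
Here is a minimal sketch of the central limit theorem calculation, using the standard form z = (x̄ - 𝜇)/(𝜎/√n), in which the standard deviation is divided by √n. The name `sample_mean_z` is ours, and `statistics.NormalDist` is assumed as a stand-in for the z score table.

```python
from math import sqrt
from statistics import NormalDist

mean = 1753.79
std_dev = 2660.48
n = 30

# Standard deviation of the sample mean (the standard error).
standard_error = std_dev / sqrt(n)


def sample_mean_z(x_bar):
    """z score for a sample mean of n students."""
    return (x_bar - mean) / standard_error


for x_bar in (50, 270):
    z = sample_mean_z(x_bar)
    p = NormalDist().cdf(z)
    print(f"x-bar = {x_bar}: z = {z:.2f}, P(at or below) = {p:.4f}")
```

Dividing by √n shrinks the spread, so a group average as low as 50 or 270 is far less likely than a single student reporting that number.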

To find the probability that a person at random would answer between our two answers, we
can subtract the decimal values from the z table. We must take the table value for 50 and
subtract it from the table value for 270. The equation for this is .2912 - .2611, which equals
.0301. In other words, there is a 3% chance that a person at random would answer that they have
between 50 and 270 emails in their inbox.
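
The same subtraction can be sketched with a normal model built from the reported mean and standard deviation. This assumes `statistics.NormalDist` in place of the printed table, so the result is computed from unrounded z scores and may differ slightly from the table-based figure.

```python
from statistics import NormalDist

# Normal model with the report's mean and standard deviation.
# Assumption: the data is treated as normal for this calculation only.
model = NormalDist(mu=1753.79, sigma=2660.48)

# Area between the two answers = left-tail area at 270 minus left-tail area at 50.
p_between = model.cdf(270) - model.cdf(50)

print(f"P(between 50 and 270 emails) = {p_between:.4f}")
```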

When trying to find the raw data value below which 3% of the data should fall, we must use
the formula that works backwards: x = z𝜎 + 𝜇. To find the z value, we turn to our z score
table and find the entry closest to .0300. The closest decimal on the table is .0301, which is
found at the z value of -1.88. We can use this z value to find the raw data value that 3% of the
data lies below. The equation now reads x = -1.88 × 2660.48 + 1753.79, which gives us a final
answer of -3247.91. This means that 3% of the data would fall below -3247.91 emails. Obviously,
this is not a possible answer, because there cannot be a negative number of emails. The
calculation assumes a normal distribution, and since our data is not normal, the result does
not make sense.

To calculate the raw data value above which 3% of our data should fall, we can use the same
process. Because the z score table gives left-tail areas, the value with 3% above it has 97%
below it, and by symmetry its z value is +1.88. The equation reads
x = 1.88 × 2660.48 + 1753.79, which gives us 6755.49 emails. This is the raw data value that 3%
of our data should fall above. This number is also not accurate, because we do not have a normal
distribution whatsoever.
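
The backwards formula x = z𝜎 + 𝜇 can be sketched for both tails. This example uses the table value z = -1.88 and, as an assumption, checks it against `statistics.NormalDist().inv_cdf`; the variable names are ours.

```python
from statistics import NormalDist

mean = 1753.79
std_dev = 2660.48

z_table = -1.88                        # table entry closest to a left-tail area of .0300
z_exact = NormalDist().inv_cdf(0.03)   # computed check on the table value

x_below = z_table * std_dev + mean     # 3% of a normal model falls below this value
x_above = -z_table * std_dev + mean    # by symmetry, 3% falls above this value

print(f"z from table: {z_table}, computed z: {z_exact:.4f}")
print(f"3% below: {x_below:.2f} emails")
print(f"3% above: {x_above:.2f} emails")
```

The lower cutoff comes out negative, which cannot happen for a count of emails, so it illustrates again that the normal model is a poor fit for this data.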
