(Part 6)
The mean number of skittles, the standard deviation, the 95% confidence interval and
all the other statistical values found in a sample of skittles meant nothing to me before this
class. All I wanted was to eat the skittles and enjoy the taste of the rainbow. This skittle project
taught me how to apply statistical principles to the real world.
Recently I was taking a tour of the Coca-Cola plant. As I watched the number of cans and
bottles being sorted, filled, labeled, packaged and shipped off in a short amount of time, the
principles of statistics ran across my mind. First, I thought about how many cans of soda are
produced each day. The tour guide told me the plant shipped 1.8 million cans of soda each day.
As I considered how he knew that number, I immediately thought about the mean. Each day
they likely don't produce exactly 1.8 million cans of soda; rather, 1.8 million is the mean
number of cans over many days. This was a simple way in which I noticed that the statistics I
learned in class applied to the real world.
This term project taught me a lot about my own problem-solving skills. It was a great
way to practice problem solving because it was a real, tangible study that we did as a class. It
wasn't something I just read out of a book, but rather something I played a role in creating.
Also, this project really forced me to internalize the statistics and understand what I was doing
and what the results meant. It wasn't just plugging numbers in and getting numbers out. It was
real data that I had to interpret for myself. This is a skill that I will continue to practice in
my life and I know it is important to master.
Although I was hesitant about this term project at the beginning of the semester, I am
grateful it was assigned and it is something I will remember. Statistics is all around us and can
tell us a lot about the world in which we live.
Data:
Color    Personal Data   Class Data
Red      13              219
Orange   19              205
Yellow   7               220
Green    11              213
Purple   12              213
Total    62              1070
The class data shows us that, although the counts differ, the number of skittles of each
color is relatively consistent. At first glance, we can see the counts are similar, with two being
the same (213) and two others being one apart (219 and 220). The mean count per color is 214
with a standard deviation of 6. All of our data is within 2 standard deviations of the mean, thus
showing that our data has no outliers. My personal data is irregular compared to the class data.
I had almost three times as many orange skittles as yellow. The mean for my personal data was
12.4 with a standard deviation of 4.34. Although all the data was within 2 standard deviations
and has no outliers, 4.34 is a big standard deviation for a small set of data: it is about 35% of
the mean, compared with under 3% for the class data.
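The means and standard deviations quoted above can be checked with a short calculation. This is a minimal sketch using Python's standard `statistics` module. It assumes the sample standard deviation (dividing by n - 1), which is the version that reproduces the reported values of 6 and 4.34. The variable and function names are chosen only for illustration.

```python
from statistics import mean, stdev

# Color counts from the data table (red, orange, yellow, green, purple).
personal = [13, 19, 7, 11, 12]
class_counts = [219, 205, 220, 213, 213]

def summarize(counts):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    return mean(counts), stdev(counts)

def within_two_sd(counts):
    """True if every count lies within 2 standard deviations of the mean."""
    m, s = summarize(counts)
    return all(abs(c - m) <= 2 * s for c in counts)

class_mean, class_sd = summarize(class_counts)
personal_mean, personal_sd = summarize(personal)

print(f"class:    mean={class_mean:.1f}, sd={class_sd:.2f}")
print(f"personal: mean={personal_mean:.1f}, sd={personal_sd:.2f}")
print(within_two_sd(class_counts), within_two_sd(personal))
```

Dividing each standard deviation by its mean (6 / 214 and 4.34 / 12.4) gives a scale-free way to compare the spread of the two data sets.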
The graphs are not what I expected to see. After doing my counts and seeing a big
difference between certain colors, I was expecting to see the same results in the class data.
However, the class data shows that the colors of skittles in a bag should have similar counts.
The difference in my personal data was the result of using a small sample size.
A small data set has a greater chance of producing irregular counts, such as those in my
personal data. I now know that after I eat many bags of skittles, I will have consumed a similar
number of each color!
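One way to see how much a single bag can differ from the pooled class counts is to convert both sets of counts to proportions. The sketch below uses only the counts from the data table. The helper name `proportions` is hypothetical.

```python
colors = ["Red", "Orange", "Yellow", "Green", "Purple"]
personal = [13, 19, 7, 11, 12]
class_counts = [219, 205, 220, 213, 213]

def proportions(counts):
    """Convert raw counts to relative frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Print the share of each color in my bag next to its share in the class data.
for color, p, q in zip(colors, proportions(personal), proportions(class_counts)):
    print(f"{color:<7} personal {p:6.1%}   class {q:6.1%}")
```

Orange makes up about 31% of my bag but only about 19% of the class total, which is the kind of swing a single small sample can produce.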
Our class had a sample size of eighteen bags of 2.17 oz. skittles. The shape of the distribution
was irregular and followed no clear pattern. The maximum number of candies, 62, was also the
most frequent count, occurring in five bags. However, the range of the distribution is relatively
small (six). Both the distribution and the range are what we would expect to see. We could take
another sample of eighteen bags and would likely get another scattered distribution with a small
range. I would expect this to be consistent across different trials.
The mean from the class data was 59.4 and my number of candies was 62. These
numbers are close enough that my data is consistent with the data from the class. The
differences found in different bags of candy could be attributed to broken candies, which we
did not count in our data. I had no broken candies in my bag, which may explain why my count
was higher than the mean.
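The class mean can be recovered from numbers already reported: a class total of 1070 candies across eighteen bags. This small check assumes the class total in the data table covers the same eighteen bags.

```python
class_total = 1070   # total candies in the class data table
bags = 18            # number of bags in the class sample
my_total = 62        # candies in my bag

# Mean candies per bag, and how far my bag sits above that mean.
mean_per_bag = class_total / bags
difference = my_total - mean_per_bag

print(f"mean per bag: {mean_per_bag:.1f}")
print(f"my bag minus the mean: {difference:.1f}")
```

My bag is about 2.6 candies above the mean, which is small next to the reported range of six.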
Categorical data and quantitative data are important to understand if we want to
interpret our results and create understanding from them. Categorical data are things that
cannot be placed in an order, or whose order does not matter. For example, colors and the
numbers on the back of basketball players' jerseys are examples of categorical data. We could
put these things in order; however, the order would not matter. Quantitative data is data that
can be ordered and that represents counts or measurements. Examples of quantitative data are
the number of skittles in a bag or the heights of basketball players. These can be placed in order
and their differences mean something.
The types of graphs most commonly used for categorical data are bar graphs, Pareto
charts, and pie charts. For example, in a bar graph the x-axis represents the category and the
y-axis represents the frequency or relative frequency. We could create a bar graph with the
x-axis showing the colors of skittles and the y-axis the frequency of those colors in a certain bag.
This is an easy graph to read if we want to compare how many of the different colors of skittles
are in each bag. The types of graphs most commonly used for quantitative data are histograms,
stemplots, and scatterplots. These will tell us the counts of certain values, will help us
estimate the middle, and will show us if our data has a certain trend or distribution.
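As a rough illustration of a bar graph and a Pareto chart for categorical data, the sketch below draws text bars for the colors in my bag, using the personal counts from the data table. The function `text_bar_chart` is a hypothetical helper, not part of any library.

```python
colors = ["Red", "Orange", "Yellow", "Green", "Purple"]
personal = [13, 19, 7, 11, 12]

def text_bar_chart(labels, counts, pareto=False):
    """Build one line per category; each '#' stands for one candy."""
    rows = list(zip(labels, counts))
    if pareto:
        # A Pareto chart orders the bars from most to least frequent.
        rows.sort(key=lambda row: row[1], reverse=True)
    return [f"{label:<7}{'#' * count} {count}" for label, count in rows]

print("\n".join(text_bar_chart(colors, personal)))
print()
print("\n".join(text_bar_chart(colors, personal, pareto=True)))
```

The first chart keeps the colors in table order. The second reorders them so the most common color comes first, which is the only difference between the two chart types here.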
The types of calculations that make sense for categorical data are frequency and relative
frequency. Often when sorting through categorical data, we are interested in how many
observations fall in each category. Calculations such as the mean and median could be
performed on numerical categorical data, such as jersey numbers, but the results would not
mean anything. For quantitative data that has order and measure, calculations such as range,
median, mean, mode, and 5-number summaries make sense to calculate. These calculations will
help us understand the distribution of our data and the role each count plays in our sample.
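The sketch below contrasts the two kinds of calculation using the counts from the data table. For the categorical side it finds the most common color in my bag. For the quantitative side it treats the five class color counts as a small data set, as the analysis above does, and computes a 5-number summary. The quartiles rely on the default method of Python's `statistics.quantiles`, which is an assumption: textbooks and calculators use several slightly different quartile conventions.

```python
from statistics import median, quantiles

colors = ["Red", "Orange", "Yellow", "Green", "Purple"]
personal = [13, 19, 7, 11, 12]
class_counts = [219, 205, 220, 213, 213]

# Categorical: the most common color in my bag.
most_common_color = colors[personal.index(max(personal))]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); Q1 and Q3 use Python's default method."""
    q1, _, q3 = quantiles(values, n=4)
    return min(values), q1, median(values), q3, max(values)

# Quantitative: summary and range of the class color counts.
summary = five_number_summary(class_counts)
value_range = summary[4] - summary[0]

print("most common color:", most_common_color)
print("5-number summary:", summary)
print("range:", value_range)
```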