
Using and Handling Data

Data Index

Probability and
Statistics Index

Graphs Index

What is Data?

What is Data?
Discrete and Continuous Data
Advanced: Analog and Digital Data

How to Show Data

Bar Graphs
Pie Charts
Dot Plots
Line Graphs
Scatter (x,y) Plots
Pictographs
Histograms
Frequency Distribution
Stem and Leaf Plots
Cumulative Tables and Graphs
Graph Paper Maker

Surveys

How to Do a Survey
Survey Questions
Showing the Results of a Survey
Accuracy and Precision

Activity: Asking Questions

Activity: Improving Questions

Probability and Statistics

Measures of Central Value


Finding a Central Value
Calculate the Mean Value and The Mean Machine
Find the Median Value
Find the Mode or Modal Value

Activity: Averages Brain-Teaser

Calculate the Mean from a Frequency Table


Advanced: Mean, Median and Mode from Grouped Frequencies
Weighted Mean

Measures of Spread

The Range
Quartiles and the Interquartile Range
Percentiles
Mean Deviation
Standard Deviation
Standard Deviation Calculator
Standard Deviation Formulas
Comparing Data

Univariate and Bivariate Data


Scatter (x,y) Plots
Outliers
Correlation

Probability

Probability
The Probability Line
The Spinner
The Basic Counting Principle
Relative Frequency

Activities:

An Experiment with a Die


An Experiment with Dice
Dropping a Coin onto a Grid
Buffon's Needle
Random Words
Lotteries

Events

Complement
Probability: Types of Events
Independent Events
Dependent Events: Conditional Probability
Tree Diagrams
Mutually Exclusive Events

Combinations and Permutations

Combinations and Permutations


Combinations and Permutations Calculator

Advanced

False Positives and False Negatives


Bayes Theorem
Shared Birthdays
Confidence Intervals
Confidence Interval Calculator
Chi-Square Test
Chi-Square Calculator
Least Squares Regression
Least Squares Calculator

Random Variables
Random Variables
Random Variables - Continuous
Random Variables - Mean, Variance and Standard Deviation

The Binomial Distribution


Quincunx and Quincunx Explained
The Binomial Distribution

The Normal Distribution

Normal Distribution
Standard Normal Distribution Table
Skewed Data

What is Data?
Data is a collection of facts, such as numbers, words, measurements, observations or even just
descriptions of things.

Qualitative vs Quantitative
Data can be qualitative or quantitative.

Qualitative data is descriptive information (it describes something)
Quantitative data is numerical information (numbers)

And Quantitative data can also be Discrete or Continuous:

Discrete data can only take certain values (like whole numbers)
Continuous data can take any value (within a range)

Put simply: Discrete data is counted, Continuous data is measured


Example: What do we know about Arrow the Dog?

Qualitative:

He is brown and black


He has long hair
He has lots of energy

Quantitative:

Discrete:
He has 4 legs
He has 2 brothers
Continuous:
He weighs 25.5 kg
He is 565 mm tall

To help you remember think "Quantitative is about Quantity"

More Examples
Qualitative:

Your friends' favorite holiday destination


The most common given names in your town
How people describe the smell of a new perfume

Quantitative:

Height (Continuous)
Weight (Continuous)
Petals on a flower (Discrete)
Customers in a shop (Discrete)

Collecting
Data can be collected in many ways. The simplest way is direct observation.

Example: you want to find how many cars pass by a certain point on a road in a 10-minute
interval.

So: stand at that point on the road, and count the cars that pass by in that interval.

We collect data by doing a Survey.

Census or Sample
A Census is when we collect data for every member of the group (the whole "population").

A Sample is when we collect data just for selected members of the group.

Example: there are 120 people in your local football club.

You can ask everyone (all 120) what their age is. That is a census.

Or you could just choose the people that are there this afternoon. That is a sample.

A census is accurate, but hard to do. A sample is not as accurate, but may be good enough, and
is a lot easier.

Language

Data or Datum?

The singular form is "datum", so we say "that datum is very high".

"Data" is the plural so we say "the data are available", but it is also a collection of facts, so "the
data is available" is fine too.
Discrete and Continuous Data
Data can be Descriptive (like "high" or "fast") or Numerical (numbers).

And Numerical Data can be Discrete or Continuous:

Discrete data is counted,


Continuous data is measured

Discrete Data
Discrete Data can only take certain values.

Example: the number of students in a class (you can't have half a student).

Example: the total from rolling 2 dice can only have the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12
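
The dice example can be checked by listing every possible roll. This is only a small sketch, and the variable names are chosen here for illustration:

```python
from itertools import product

# Every possible (first die, second die) pair, each face being 1 to 6
rolls = list(product(range(1, 7), repeat=2))

# The distinct totals that two dice can show
possible_totals = sorted({a + b for a, b in rolls})

print(possible_totals)
```

There are 36 possible rolls but only 11 different totals, and nothing in between (such as 2.5) can ever appear. That is what makes the data discrete.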

Continuous Data

Continuous Data can take any value (within a range)

Examples:

A person's height: could be any value (within the range of human heights), not just certain
fixed heights,
Time in a race: you could even measure it to fractions of a second,
A dog's weight,
The length of a leaf,
Lots more!
Analog and Digital

Analog: something physical with continuous change.

Digital: made of numbers.

Arrow Barks!
Let's record him barking:

Arrow's bark is analog. It is actual pressure waves in the air, so it is physical with continuous
change.

Continuous change: changes smoothly ... no sudden breaks.

And the microphone converts that pressure into an electrical signal. It is still analog (the
electricity is physical, and has continuous change).

But when it gets to your computer or phone it gets converted to digits!

Thousands of times a second the analog signal is measured by special electronics ...
and is then saved as numbers.

So the "sound" is now "12, 25, 39, 52, 68, 71, 78, 82, 82, 79, 70, 59, ..." (in fact it would be
in binary, so would be something like "000011000001100100100111...")

It is now digital!

Notice the digital data has sudden jumps up and down ... it does not change continuously.

It is Discrete Data: that means it can only be certain values (such as 1, 2, 3, etc).

Digital data is very easy for computers and phones to use. It can be saved, shared electronically,
sent all over the world quickly and more.
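
Here is a rough sketch of that "measure, then save as numbers" idea. It is not how any particular device works: the 440 Hz sine wave standing in for the bark, the 8000 measurements per second, and the largest stored value of 127 are all assumptions made for this illustration.

```python
import math

def analog_signal(t):
    """A stand-in for the smooth microphone signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

SAMPLE_RATE = 8000   # measurements per second (assumed for this sketch)
LEVELS = 127         # largest whole number we allow ourselves to store

def digitize(signal, sample_rate, count):
    """Measure the signal `count` times and round each reading to an integer."""
    samples = []
    for n in range(count):
        t = n / sample_rate              # the instant of the n-th measurement
        samples.append(round(signal(t) * LEVELS))
    return samples

samples = digitize(analog_signal, SAMPLE_RATE, 12)
print(samples)
```

Each entry in the list is a whole number, so the smooth wave has become a list of discrete values.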

How can we hear Digits?


Easy! The numbers are used to control the size of an electrical signal, which is analog.

Digital becomes Analog

The electricity can be sent to a


speaker ...

... to make sound waves again!

It should sound very much like the original bark (but not perfectly so!)

Digital Pictures
A similar thing happens when you take a picture.

Light (which is analog) gets projected onto a grid of millions of little sensors inside the camera:

The camera measures the light at each point and produces numbers.

The picture is now digital!

So the "picture" is now "A1DDF9, ADE3FF, B5E7FE, AFE4F8, ...", which are hexadecimal color
numbers (they are used internally in binary, so would be something like
"101000011101110111111001...")

Look really closely at a digital picture ... it is made up of millions of little squares called "pixels":

Each "pixel" is made using a hexadecimal color number.
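
To see how a hexadecimal color number relates to binary, here is a small sketch using the first color from the example, A1DDF9. The function names are made up for this illustration, and splitting the number into red, green and blue parts assumes the usual two-hex-digits-per-color layout.

```python
def hex_color_to_binary(hex_color):
    """Write a 6-digit hexadecimal color as a 24-character string of bits."""
    return format(int(hex_color, 16), "024b")

def hex_color_to_rgb(hex_color):
    """Split a 6-digit hexadecimal color into red, green and blue (0 to 255)."""
    value = int(hex_color, 16)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

bits = hex_color_to_binary("A1DDF9")
rgb = hex_color_to_rgb("A1DDF9")
print(bits)
print(rgb)
```

The bits printed for A1DDF9 are the same "101000011101110111111001" quoted above.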

Digital IS Numbers
So digital pictures, music, videos etc are actually stored on your device as numbers.

Numbers rule!

Bar Graphs
A Bar Graph (also called Bar Chart) is a graphical display of data using bars of different heights.

Imagine you just did a survey of your friends to find which kind of movie they liked best:

Table: Favorite Type of Movie

Comedy  Action  Romance  Drama  SciFi
4       5       6        1      4

We can show that on a bar graph like this:


It is a really good way to show relative sizes: we can see which types of movie are most liked,
and which are least liked, at a glance.
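
If no graphing tool is handy, a bar graph can even be sketched in plain text. The snippet below is just one possible way to do it, with a helper name invented for this example:

```python
movie_votes = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}

def text_bar_chart(counts):
    """Return one line per category, with one '#' per vote."""
    width = max(len(name) for name in counts)   # pad names so the bars line up
    return [f"{name:<{width}} {'#' * n} {n}" for name, n in counts.items()]

chart = text_bar_chart(movie_votes)
print("\n".join(chart))

# The longest bar belongs to the most liked type of movie
most_liked = max(movie_votes, key=movie_votes.get)
print(most_liked)
```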

We can use bar graphs to show the relative sizes of many things, such as what type of car
people have, how many customers a shop has on different days and so on.

Example: Nicest Fruit

A survey of 145 people asked them "Which is the nicest fruit?":

Fruit: Apple Orange Banana Kiwifruit Blueberry Grapes

People: 35 30 10 25 40 5

And here is the bar graph:

That group of people think Blueberries are the nicest.

Bar Graphs can also be Horizontal, like this:

Example: Student Grades

In a recent test, this many students got these grades:

Grade: A B C D
Students: 4 12 10 2

And here is the bar graph:

You can create graphs like that using our Data Graphs (Bar, Line and Pie) page.

Histograms vs Bar Graphs


Bar Graphs are good when your data is in categories (such as "Comedy", "Drama", etc).

But when you have continuous data (such as a person's height) then use a Histogram.

It is best to leave gaps between the bars of a Bar Graph, so it doesn't look like a Histogram.

Pie Chart
Pie Chart: a special chart that uses "pie slices" to show relative sizes of data.

Imagine you survey your friends to find the kind of movie they like best:

Table: Favorite Type of Movie


Comedy Action Romance Drama SciFi
4 5 6 1 4

You can show the data by this Pie Chart:

It is a really good way to show relative sizes: it is easy to see which movie types are most liked,
and which are least liked, at a glance.

You can create graphs like that using our Data Graphs (Bar, Line and Pie) page.

Or you can make them yourself ...

How to Make Them Yourself


First, put your data into a table (like above), then add up all the values to get a total:

Table: Favorite Type of Movie


Comedy Action Romance Drama SciFi TOTAL
4 5 6 1 4 20

Next, divide each value by the total and multiply by 100 to get a percent:

          Comedy  Action  Romance  Drama  SciFi  TOTAL
Count     4       5       6        1      4      20
Fraction  4/20    5/20    6/20     1/20   4/20
Percent   20%     25%     30%      5%     20%    100%

Now to figure out how many degrees for each "pie slice" (correctly called a sector).

A Full Circle has 360 degrees, so we do this calculation:

             Comedy      Action      Romance     Drama       SciFi       TOTAL
Count        4           5           6           1           4           20
Percent      20%         25%         30%         5%          20%         100%
Calculation  4/20 × 360  5/20 × 360  6/20 × 360  1/20 × 360  4/20 × 360
Degrees      72          90          108         18          72          360
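
The same percent and degree calculations can be sketched in a few lines of code. The variable names here are chosen only for this example:

```python
movie_votes = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}

total = sum(movie_votes.values())

# Share of the whole, as a percent and as an angle out of 360 degrees
percents = {name: n * 100 / total for name, n in movie_votes.items()}
degrees = {name: n * 360 / total for name, n in movie_votes.items()}

for name in movie_votes:
    print(f"{name}: {movie_votes[name]} ({percents[name]:.0f}%), {degrees[name]:.0f} degrees")
```

A useful check is that the degrees add up to 360, a full circle.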

Now you are ready to start drawing!

Draw a circle.

Then use your protractor to measure the degrees of each sector.


Here I show the first sector ...

Finish up by coloring each sector and giving it a label like "Comedy: 4 (20%)", etc.

(And don't forget a title!)

Another Example
You can use pie charts to show the relative sizes of many things, such as:

what type of car people have,
how many customers a shop has on different days,
how popular different breeds of dogs are, and so on.

Example: Student Grades

Here is how many students got each grade in the recent test:

A B C D
4 12 10 2

And here is the pie chart:

Dot Plots
A Dot Plot is a graphical display of data using dots.

Example: Minutes To Eat Breakfast

A survey of "How long does it take you to eat breakfast?" has these results:

Minutes: 0 1 2 3 4 5 6 7 8 9 10 11 12
People: 6 2 3 5 2 5 0 0 2 3 7 4 1

Which means that 6 people take 0 minutes to eat breakfast (they probably had no breakfast!), 2
people say they only spend 1 minute having breakfast, etc.
And here is the dot plot:

Another version of the dot plot has just one dot for each data point like this:

Example: (continued)

This has the same data as above:

But notice that we need to have lines and numbers on the side so we can see what the dots
mean.

Grouping
Example: Access to Electricity across the World

Some people don't have access to electricity (they live in remote or poorly served areas). A
survey of many countries had these results:

Country       Access to Electricity (% of population)
Algeria       99.4
Angola        37.8
Argentina     97.2
Bahrain       99.4
Bangladesh    59.6
...           ... etc

But hang on! How do we make a dot plot of that? There might be only one "59.6" and one
"37.8", etc. Nearly all values will have just one dot.

The answer is to group the data (put it into "bins").

In this case let's try rounding every value to the nearest 10%:

Country       Access to Electricity (% of population, nearest 10%)
Algeria       100
Angola        40
Argentina     100
Bahrain       100
Bangladesh    60
...           ... etc

Now we count how many of each 10% grouping and these are the results:

Access to Electricity
(% of population, nearest 10%)    Number of Countries
10                                5
20                                6
30                                12
40                                5
50                                4
60                                5
70                                6
80                                10
90                                15
100                               34

So there were 5 countries where only 10% of the people had access to electricity, 6 countries
where 20% of the people had access to electricity, etc

Here is the dot plot:


Percent of Population with Access to Electricity

And that is a good plot: it shows the data nicely.
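
The grouping step (round to the nearest 10%, then count) can be sketched as below. Only the five countries listed in the table are used, so the counts will not match the full survey, and the helper name is made up for this example.

```python
import math
from collections import Counter

# Only the five countries listed in the table; the full survey has many more
access = {"Algeria": 99.4, "Angola": 37.8, "Argentina": 97.2,
          "Bahrain": 99.4, "Bangladesh": 59.6}

def nearest_ten(value):
    """Round to the nearest multiple of 10 (halves round up)."""
    return int(math.floor(value / 10 + 0.5)) * 10

grouped = {country: nearest_ten(pct) for country, pct in access.items()}
counts = Counter(grouped.values())   # how many dots go above each group

for group in sorted(counts):
    print(f"{group:>3} {'.' * counts[group]}")
```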

Line Graphs
Line Graph: a graph that shows information that is connected in some way (such as change
over time)

You are learning facts about dogs, and each day you do a short test to see how good you are.
These are the results:

Table: Facts I got Correct


Day 1 Day 2 Day 3 Day 4
3 4 12 15

And here is the same data as a Line Graph:

You seem to be improving!

Making Line Graphs


You can create graphs like that using the Data Graphs (Bar, Line and Pie) page.
Or you can draw it yourself!

Example: Ice Cream Sales

Table: Ice Cream Sales


Mon Tue Wed Thu Fri Sat Sun
$410 $440 $550 $420 $610 $790 $770

Let's make the vertical scale go from $0 to $800, with tick marks every $200

Draw a vertical scale with tick marks

Label the tick marks, and give the scale a label

Draw a horizontal scale with tick marks and labels


Put a dot for each data value

Connect the dots and give the graph a title

Important! Make sure to have:

A Title
Vertical scale with tick marks and labels
Horizontal scale with tick marks and labels
Data points connected by lines

Scatter Plots

A Scatter (XY) Plot has points that show the relationship between two sets of
data.

In this example, each dot shows one person's weight versus their height.

(The data is plotted on the graph as "Cartesian (x,y) Coordinates")

Example:
The local ice cream shop keeps track of how much ice cream they sell versus the noon
temperature on that day. Here are their figures for the last 12 days:

Ice Cream Sales vs Temperature

Temperature °C    Ice Cream Sales
14.2              $215
16.4              $325
11.9              $185
15.2              $332
18.5              $406
22.1              $522
19.4              $412
25.1              $614
23.4              $544
18.1              $421
22.6              $445
17.2              $408

And here is the same data as a Scatter Plot:

It is now easy to see that warmer weather leads to more sales, but the relationship is not
perfect.
Line of Best Fit
We can also draw a "Line of Best Fit" (also called a "Trend Line") on our scatter plot:

Try to have the line as close as possible to all points, and as many points above the line as
below.
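
Drawing the line by eye is fine for a quick look. A computer usually does it differently: one common technique (not the by-eye method described here) is "least squares", which picks the line with the smallest total of squared vertical distances to the points. Below is a minimal sketch of that technique using the ice cream figures, with a function name chosen for this example:

```python
temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimising squared vertical error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = least_squares_line(temps, sales)
print(f"sales = {slope:.2f} * temperature + ({intercept:.2f})")
```

The slope comes out at roughly $30 of extra sales per degree, which is in the same region as a line drawn by eye.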

Example: Sea Level Rise

A Scatter Plot of Sea Level Rise:

And here I have drawn on a "Line of Best Fit".

Interpolation and Extrapolation


Interpolation is where we find a value inside our set of data points.

Here we use linear interpolation to estimate the sales at 21 °C.

Extrapolation is where we find a value outside our set of data points.

Here we use linear extrapolation to estimate the sales at 29 °C (which is higher than any
value we have).

Careful: Extrapolation can give misleading results because we are in "uncharted territory".

As well as using a graph (like above) we can create a formula to help us.

Example: Straight Line Equation

We can estimate a straight line equation from two points from the graph above

Let's estimate two points on the line near actual values: (12, $180) and (25, $610)

First, find the slope:

slope "m" = (change in y) / (change in x)

= ($610 - $180) / (25 - 12)

= $430 / 13

= 33 (rounded)

Now put the slope and the point (12, $180) into the "point-slope" formula:

y - y1 = m(x - x1)

y - 180 = 33(x - 12)

y = 33(x - 12) + 180

y = 33x - 396 + 180

y = 33x - 216

INTERPOLATING

Now we can use that equation to interpolate a sales value at 21 °C:

y = 33 × 21 - 216 = $477

EXTRAPOLATING

And to extrapolate a sales value at 29 °C:

y = 33 × 29 - 216 = $741

The values are close to what we got on the graph. But that doesn't mean they are more (or less)
accurate. They are all just estimates.

Don't use extrapolation too far! What sales would you expect at 0 °C?

y = 33 × 0 - 216 = -$216

Hmmm... Minus $216? We extrapolated too far!

Note: we used linear (based on a line) interpolation and extrapolation, but there are many
other types, for example we could use polynomials to make curvy lines, etc.
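
The straight-line example can be sketched in code as well. This follows the same steps (slope from two estimated points, rounded to 33, then the point-slope formula); the function names are invented for this illustration.

```python
def line_through(point_a, point_b):
    """Return (slope, intercept), with the slope rounded as in the worked example."""
    (x1, y1), (x2, y2) = point_a, point_b
    slope = round((y2 - y1) / (x2 - x1))   # 430 / 13 is about 33.08, rounded to 33
    intercept = y1 - slope * x1            # rearranged from y - y1 = m(x - x1)
    return slope, intercept

slope, intercept = line_through((12, 180), (25, 610))

def estimate_sales(temperature):
    """Estimated sales in dollars at the given temperature in degrees C."""
    return slope * temperature + intercept

for temperature in (21, 29, 0):
    print(temperature, estimate_sales(temperature))
```

The estimate at 0 degrees is negative, which is the same warning sign as in the example: the formula is being used too far outside the data.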

Correlation
When the two sets of data are strongly linked together we say they have a High Correlation.

The word Correlation is made of Co- (meaning "together"), and Relation

Correlation is Positive when the values increase together, and


Correlation is Negative when one value decreases as the other increases

Like this:
(Learn More About Correlation)
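
Correlation can also be given a number. One widely used measure (an assumption here, since this page does not name a formula) is the Pearson correlation coefficient, which runs from -1 (perfect negative) through 0 (no linear link) to +1 (perfect positive). A minimal sketch using the ice cream figures from the scatter plot example:

```python
import math

temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

def correlation(xs, ys):
    """Pearson correlation coefficient: +1 perfect positive, -1 perfect negative."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(temps, sales)
print(round(r, 2))
```

The result is close to +1, so temperature and sales have a high positive correlation.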
Negative Correlation
Correlations can be negative, which means there is a correlation but one value goes down as the
other value increases.

Example: Birth Rate vs Income

The birth rate tends to be lower in richer countries.

Below is a scatter plot for about 100 different countries.

Country       Yearly Production per Person    Birth Rate
Madagascar    $800                            5.70
India         $3,100                          2.85
Mexico        $9,600                          2.49
Taiwan        $25,300                         1.57
Norway        $40,000                         1.78

It has a negative correlation (the line slopes down)

Note: I tried to fit a straight line to the data, but maybe a curve would work better, what do you
think?

Pictographs
A Pictograph is a way of showing data using images.
Each image stands for a certain number of things.

Example: Apples Sold

Here is a pictograph of how many apples were sold at the local shop over 4 months:

Note that each picture of an apple means 10 apples (and the half-apple picture means 5
apples).

So the pictograph is showing:

In January 10 apples were sold


In February 40 apples were sold
In March 25 apples were sold
In April 20 apples were sold

It is a fun and interesting way to show data.

But it is not very accurate: in the example above we can't show just 1 apple sold, or 2 apples
sold etc.

Pictographs can also be vertical, like this:

Example: Games Played

Four friends play a lot of tennis. Here is how many games they played this year:

Each tennis ball means 20 games played. A tennis ball can be cut to show part of 20.

So the pictograph is showing:

John played 40 games


Sam played 45 games
Mary played 90 games
Alex played 55 games

Can you see that Alex played 55 games?


Why don't you try to make your own pictographs? Here are a few ideas:

How much money you have (week by week)


How much exercise you get (each day)
How many hours you play games every week
How much water you drink
How far your friends travel every day
How many goals your team makes

Have fun making pictures for each!

Histograms
Histogram: a graphical display of data using bars of different heights.

It is similar to a Bar Chart, but a histogram groups numbers into ranges.

And you decide what ranges to use!


Example: Height of Orange Trees

You measure the height of every tree in the orchard in centimeters (cm)

The heights vary from 100 cm to 340 cm

You decide to put the results into groups of 50 cm:

The 100 to just below 150 cm range,


The 150 to just below 200 cm range,
etc...

So a tree that is 260 cm tall is added to the "250-300" range.

And here is the result:

You can see (for example) that there are 30 trees from 150 cm to just below 200 cm tall

The horizontal axis is continuous like a number line:


Example: How much is that puppy growing?

Each month you measure how much weight your pup has gained and get these results:

0.5, 0.5, 0.3, -0.2, 1.6, 0, 0.1, 0.1, 0.6, 0.4

They vary from -0.2 (the pup lost weight that month) to 1.6

Put in order from lowest to highest weight gain:

-0.2, 0, 0.1, 0.1, 0.3, 0.4, 0.5, 0.5, 0.6, 1.6

You decide to put the results into groups of 0.5:

The -0.5 to just below 0 range,
The 0 to just below 0.5 range,
etc...

And here is the result:

(There are no values from 1 to just below 1.5, but we still show the space.)

The range of each bar is also called the Class Interval

In the example above each class interval is 0.5
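
Sorting values into class intervals is easy to sketch in code. The example below uses the puppy data, with the month of weight loss recorded as -0.2, and a class interval of 0.5; the variable names are chosen just for this illustration.

```python
import math
from collections import Counter

gains = [0.5, 0.5, 0.3, -0.2, 1.6, 0, 0.1, 0.1, 0.6, 0.4]
WIDTH = 0.5   # the class interval

# floor(value / WIDTH) numbers the bins: -1 is [-0.5, 0), 0 is [0, 0.5), ...
bin_numbers = Counter(math.floor(g / WIDTH) for g in gains)

# Walk every bin from lowest to highest so that empty ones still appear
histogram = {}
for k in range(min(bin_numbers), max(bin_numbers) + 1):
    histogram[k * WIDTH] = bin_numbers[k]

for lower, count in histogram.items():
    print(f"{lower:>4} to just below {lower + WIDTH:<4} {'#' * count}")
```

The range from 1 to just below 1.5 has a count of zero but is still listed, just as the histogram still shows the space.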

Histograms are a great way to show results of continuous data, such as:

weight
height
how much time
etc.

But when the data is in categories (such as Country or Favorite Movie), we should use
a Bar Chart.

Frequency Histogram
A Frequency Histogram is a special histogram that uses vertical columns to show frequencies
(how many times each score occurs):

Here I have added up how often 1 occurs (2 times), how often 2 occurs (5 times), etc,
and shown them as a histogram.

Frequency Distribution

Frequency
Frequency is how often something occurs.

Example: Sam played football on

Saturday Morning,
Saturday Afternoon
Thursday Afternoon

The frequency was 2 on Saturday, 1 on Thursday and 3 for the whole week.

Frequency Distribution
By counting frequencies we can make a Frequency Distribution table.
Example: Goals

Sam's team has scored the following numbers of goals in recent games:

2, 3, 1, 2, 1, 3, 2, 3, 4, 5, 4, 2, 2, 3

Sam put the numbers in order, then added up:

how often 1 occurs (2 times),
how often 2 occurs (5 times),
etc,

and wrote them down as a Frequency Distribution table.

From the table we can see interesting things such as

getting 2 goals happens most often


only once did they get 5 goals

This is the definition:

Frequency Distribution: values and their frequency (how often each value occurs).

Here is another example:

Example: Newspapers

These are the numbers of newspapers sold at a local shop over the last 10 days:

22, 20, 18, 23, 20, 25, 22, 20, 18, 20

Let us count how many of each number there is:


Papers Sold Frequency
18 2
19 0
20 4
21 0
22 2
23 1
24 0
25 1

It is also possible to group the values. Here they are grouped in 5s:

Papers Sold Frequency


15-19 2
20-24 7
25-29 1

(Learn more about Grouped Frequency Distributions)
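
Counting frequencies is a natural job for a computer. Here is a small sketch that rebuilds both newspaper tables, the plain one and the one grouped in 5s. It is just one way to do it, using Python's standard Counter.

```python
from collections import Counter

papers_sold = [22, 20, 18, 23, 20, 25, 22, 20, 18, 20]

counts = Counter(papers_sold)

# List every value from lowest to highest, including those that never occurred
frequency_table = {v: counts[v] for v in range(min(papers_sold), max(papers_sold) + 1)}

# Group in 5s: 15-19, 20-24, 25-29 (each group is named by its lowest value)
grouped = Counter((v // 5) * 5 for v in papers_sold)

for low in sorted(grouped):
    print(f"{low}-{low + 4}: {grouped[low]}")
```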

Graphs
After creating a Frequency Distribution table you might like to make a Bar Graph or a Pie
Chart using the Data Graphs (Bar, Line and Pie) page.

Stem and Leaf Plots


A Stem and Leaf Plot is a special table where each data value is split into a "stem" (the first
digit or digits) and a "leaf" (usually the last digit). Like in this example:

Example:

"32" is split into "3" (stem) and "2" (leaf).

More Examples:
Stem "1" Leaf "5" means 15
Stem "1" Leaf "6" means 16
Stem "2" Leaf "1" means 21
etc

The "stem" values are listed down, and the "leaf" values go right (or left) from the stem values.

The "stem" is used to group the scores and each "leaf" shows the individual scores within each
group.

Example: Long Jump

Sam got his friends to do a long jump and got these results:

2.3, 2.5, 2.5, 2.7, 2.8, 3.2, 3.6, 3.6, 4.5, 5.0

And here is the stem-and-leaf plot:

Stem  Leaf
2     3 5 5 7 8
3     2 6 6
4     5
5     0

Stem "2" Leaf "3" means 2.3

Note:

Say what the stem and leaf mean (Stem "2" Leaf "3" means 2.3)
In this case each leaf is a decimal
It is OK to repeat a leaf value
5.0 has a leaf of "0"
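
A stem and leaf plot can be built automatically too. The sketch below assumes every value has exactly one decimal place (as in the long jump example), so the stem is the whole-number part and the leaf is the tenths digit. The function name is made up for this example.

```python
from collections import defaultdict

jumps = [2.3, 2.5, 2.5, 2.7, 2.8, 3.2, 3.6, 3.6, 4.5, 5.0]

def stem_and_leaf(values):
    """Split each one-decimal value into a stem (whole part) and leaf (tenths)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(round(value * 10), 10)   # 2.3 -> 23 -> (2, 3)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(jumps)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```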
Cumulative Tables and Graphs

Cumulative
Cumulative means "how much so far".

Think of the word "accumulate" which means to gather together.

To have cumulative totals, just add up the values as you go.

Example: Jamie has earned this much in the last 6 months:

Month Earned
March $120
April $50
May $110
June $100
July $50
August $20

To work out the cumulative totals, just add up as you go.

The first line is easy, the total earned so far is the same as Jamie earned that month:

Month Earned Cumulative


March $120 $120

But for April, the total earned so far is $120 + $50 = $170 :

Month Earned Cumulative


March $120 $120

April $50 $170

And for May we continue to add up: $170 + $110 = $280


Month Earned Cumulative
March $120 $120

April $50 $170

May $110 $280

Do you see how we add the previous month's cumulative total to this month's earnings?

Here is the calculation for the rest:

June is $280 + $100 = $380


July is $380 + $50 = $430
August is $430 + $20 = $450

And this is the result

Month Earned Cumulative


March $120 $120
April $50 $170
May $110 $280
June $100 $380
July $50 $430
August $20 $450

The last cumulative total should match the total of all earnings:

$450 is the last cumulative total ...


... it is also the total of all earnings:

$120+$50+$110+$100+$50+$20 = $450

So we got it right.

So that's how to do it: add up as you go down the list and you will have cumulative totals.

We could also call it a "Running Total"
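
A running total takes only a line or two of code. This sketch uses Jamie's earnings and Python's standard accumulate function:

```python
from itertools import accumulate

months = ["March", "April", "May", "June", "July", "August"]
earned = [120, 50, 110, 100, 50, 20]

# accumulate() yields the running total after each value
cumulative = list(accumulate(earned))

for month, amount, total in zip(months, earned, cumulative):
    print(f"{month:<7} ${amount:<4} ${total}")
```

As in the check above, the last cumulative total equals the sum of all the earnings.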

Graphs
We can make cumulative graphs, too. Just plot each cumulative total:

Cumulative Bar Graph

Cumulative Line Graph

How to Do a Survey

Survey Says ...


Turn on the television, radio or open a newspaper and
you will often see the results from a survey.
Gathering information is an important way to help people make decisions about topics
of interest.

Surveys can help decide what needs changing, where money should be spent, what
products to buy, what problems there might be, or lots of other questions you may
have at any time.

The best part about surveys is that they can be used to answer any question about any
topic.

You can survey people (through questionnaires, opinion polls, etc) or things (like pollution
levels in a river, or traffic flow).

Four Steps
Here are four steps to a successful survey:

Step one: create the questions


Step two: ask the questions
Step three: tally the results
Step four: present the results

Let us look at those steps in more detail ...

Step One: Create the Questions


The first thing to decide is

What questions do you want answered?

Sometimes these may be simple questions like:

"What is your favorite color?"

Other times the questions may be quite complex such as:

"Which roads have the worst traffic conditions?"


Simple Surveys

When doing a simple survey, you can use tally marks to show each person's answer:

Sometimes, it is helpful to be creative in how the people can respond. It makes it more fun for
both you and your respondents (the people answering the question).

Example: What is your favorite color?

Have them write down their favorite color on a piece of paper and drop it in a fish bowl.

Then, put all of the pieces of paper into piles and count them.

To help you make a good Questionnaire read our page Survey Questions.

Step Two: Asking The Questions


Now you have your questions, go out and ask them! But who to ask?

If you survey a small group you can ask everybody (called a Census)

If you want to survey a large group, you may not be able to ask everybody so you should ask a
sample of the population (called a Sample)

When you are Sampling you should be careful who you ask.

To be a good sample, each person should be chosen randomly


If you only ask people who look friendly, you will only know what friendly
people think!
If you go to the swimming pool and ask people "Can you swim?" you will get a
biased answer ... maybe even 100% will say "Yes"

And the surveys where people are asked to ring a number to vote are not
very accurate, because only certain types of people actually ring up!

So be careful not to bias your survey. Try to choose randomly.

Example: You want to know the favorite colors for people at your school, but don't have the time
to ask everyone.

Solution: Choose 50 people at random:

stand at the gate and choose "the next person to arrive" each time
or choose people randomly from a list and then go and find them!
or you could choose every 5th person

Your results will hopefully be nearly as good as if you asked everyone.
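
If you have a list of names, a computer can do the random choosing for you. The sketch below is only an illustration: the school roll of 600 students is made up, and with 600 names a sample of 50 means taking every 12th person rather than every 5th.

```python
import random

# Hypothetical school roll: 600 made-up student IDs
students = [f"student-{i:03d}" for i in range(1, 601)]

# Simple random sample: every student is equally likely to be picked
random_sample = random.sample(students, 50)

# Systematic sample: a random starting point, then every 12th name on the list
step = len(students) // 50
start = random.randrange(step)
systematic_sample = students[start::step]

print(random_sample[:5])
print(systematic_sample[:5])
```

Both methods give 50 different students. The systematic version is easier to do by hand, but it relies on the list not being arranged in some repeating pattern.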

If you choose a person and they do not want to answer, record "no answer" on the survey
form and mention how many people did not answer in your report.

After completing a sampling survey you can use the information to make a prediction as to how
the rest of the population might respond.

And your results are better when you ask more people.

Example: nationwide opinion polls survey up to 2,000 people, and the results are nearly as good
(within about 2%) as asking everyone.

Step Three: Tally the Results


Now you have finished asking questions it is time to tally the results.

By "tally" I mean add up. This usually involves lots of paperwork and computer work
(spreadsheets are useful!)
Example: For "favorite colors of my class" you can simply write tally marks like this (every fifth
mark crosses the previous 4 marks, so you can easily see groups of 5):

Step Four: Presenting the Results


Now you have your results, you will want to show them to other people in the best possible way.

We have written a special page called Showing the Results of a Survey, but here is a quick
summary:

Tables

Sometimes, you can simply report the information in a table.

A table is a very simple way to show others the results. A table should have a title, so those
looking at it understand what results the table shows:

Table: The Favorite Colors of My Class

Yellow Red Blue Green Pink

4 5 6 1 4

Statistics

You can also summarize the results using statistics, such as mean or standard deviation

Example: you have lots of information about how long it takes people to get to school but it may
be simpler just to present a summary such as:

Shortest Journey: 3 minutes


Average Journey: 22 minutes
Longest Journey: 58 minutes

Graphs

But nothing makes a report look better than a nice graph or chart.

Use Data Graphs (Bar, Line and Pie) to make them.


Example Survey Question: What is your favorite color?

Have fun asking questions!!!!!

Survey Questions
How to make a good Questionnaire!

The first question is one you should ask yourself:

"What do I hope to learn from asking the questions?"

This defines your objectives: what you want to achieve by the end of
the survey.

Example: you want to clean up the local river. You feel that with some help and some money you
could make it really beautiful again.

You want to survey your local community to find out:

Are other people also worried about the river?
Are they willing to donate their time or money to help?

Questions
Now you know why you are doing a survey, start writing down the questions you will ask!

Just write down any questions you think may be useful. Don't worry about quality at this stage,
we will improve your list of questions later.

Example: Questions you could ask for the river survey:

Does pollution worry you?


Do you ever go down to the river?
Can you spare some money to help the river?
Have you noticed the pollution in the river?
Are you happy to volunteer for river cleanup?
When would you be available to help?
How should we clean up the river?
etc...

You can also ask the person about themselves (not too personal!), such as age group, male or
female, etc, so that you know the kind of people that you have been surveying.

Your Turn: Go ahead and write down the questions for your own survey!

Types of Questions
A survey question can be:

Open-ended (the person can answer in any way they want), or


Closed-ended (the person chooses from one of several options)

Closed-ended questions are much easier to total up later on, but may stop people giving the
answer they really want.

Example: "What is your favorite color?"

Open-ended: Someone may answer "dark fuchsia", in which case you will need to have a
category "dark fuchsia" in your results.

Closed-ended: With a choice of only 12 colors your work is easier, but they may not be able to
pick their exact favorite color.

Look at each of your questions and decide if they should be open-ended or
closed-ended (take the opportunity to rewrite any questions, too)

Example: "What do you think is the best way to clean up the river?"

Make it Open-ended: the answers won't be easy to put in a table or graph, but you may get
some good ideas, and there may be some good quotes for your report.
Example: "How often do you visit the river?"

Make it Closed-ended with the following options:

Nearly every day


At least 5 times a year
1 to 4 times a year
Almost never

You can present this data in a neat bar graph.

Question Sequence
It is important that the questions don't "lead" people to the answer

Example: people may say "yes" to donating money if you ask the questions this way:

Do you love nature?


Will you donate money to help the river?

But probably will say "no" if you ask the questions this way:

Is lack of money a problem for you?


Will you donate money to help the river?

To avoid this kind of thing, try to have your questions go:

from the least sensitive to the most sensitive


from the more general to the more specific
from questions about facts to questions about opinions

Go through your questions and put them in the best sequence possible

Example: I will ask people how often they visit the river (a fact) before I ask them what they feel
about pollution (an opinion)

I will ask people their general feelings about the environment before I ask them their feelings
about the river.
Neutral Questions
Your questions should also be neutral ... allowing the person to
think their own thoughts about the question.

The question "Do you love nature?" (in the example above) is a bad question as it almost
forces the person to say "Yes, of course."

Try changing the words to be more neutral, for example:

Example: "How important is the natural environment to you?"

Not Important
Some Importance
Very Important

But you can also make statements and see if people agree:

Make sure each question is neutral.

Possible Answers
For each "closed-ended" question try to think:

What are the possible answers to this question?

Make sure you have most of the common answers available.

If you are not sure what people might answer, you could always try a small open-ended survey
(maybe ask your friends or people in the street) to find common answers.

Trick: try to avoid neutral answers (such as "don't care") because people may choose them so
they don't have to think about the answer!

It is also helpful to have an "other" category in case none of the choices is satisfactory for the
person answering the question.

Example: What is your favorite color?

Red, blue, green, yellow, purple, black, brown, orange, other

SCALED ANSWERS

Sometimes you could have a scale on which they can rate their feelings about the question.

Have "opposite" words at either end and a scale in between like this:

Examples:

The river is ...

Polluted :_____:_____:_____:_____:_____: Clean

Cleaning up the river is ...

Easy :_____:_____:_____:_____:_____: Difficult


RATED ITEMS

For this type of answer the person gets to rate or rank each option.

Don't have too many items though, as that makes it too hard to answer.

Example: Please rank the following activities from 1 to 5, putting 1 next to your favorite through
to 5 for your least favorite.

___ Fishing
___ Football
___ Golf
___ Shopping
___ Sleeping

NUMBER ANSWERS

You can also just ask for a number

Example: "How many times did you visit the river during the past year?"

____ times

Look at each "closed-ended" question and choose the best answer options.

How Will I Gather the Answers?


Try to make life easier by thinking how you will gather the
answers before you ask the questions.

It is important to make the process simple, for both yourself and


those responding.
The Questionnaire

You will want a neat form that makes it easy to answer


the questions AND easy to total up the answers later
on.

Type your questions and answer options into a word processor or spreadsheet,
and format it neatly.

Remember to leave plenty of space for open-ended questions.

How Will I Show the Results?


Go over each of the questions and think how you want the answers to go into your report:

in a table,
a bar graph,
a pie chart,
or just explained in words.

Make sure each question is set up so you can present the answers in your
chosen style.

Example: you decide to have six options for "How many times do you visit the river" so the bar
graph looks best.

Test It Out
You should test your questionnaire on a few people.

was each question clear and easy to understand?


were they happy with the options?

It is also a good idea to time how long it takes so you can tell people "this survey only
takes __ minutes" (put in your time). Use the Stopwatch.

Try the questionnaire on some friends.

Take notes of any difficulties your friends have with the questionnaire, and see
what you can do to improve it.

Your Original Objective


Lastly, look back at your original objectives for this survey ...

will the questions really help you find out what you want to know?
are there some questions you can remove? (smaller surveys are easier!)

This is your last chance to make sure your questionnaire is a good one!

You Are Done!


Now you have your questions as perfect as you can get them ...

... go out and ask them!

Showing the Results of a Survey


So you have just Conducted a Survey and want to show
your results in the best possible way?
Here are some suggestions:
Tables

Sometimes, you can simply report the information in a table.

A table is a very simple way to show others the results. A table should have a title, so those
looking at it understand what it shows:

Table: The Favorite Colors of My Class

Yellow Red Blue Green Pink

4 5 6 1 4

Statistics

You can also summarize the results using statistics, such as Mean, Median, Mode, Standard
Deviation and Quartiles

Example: you have lots of information about how long it takes people to get to school but it may
be simpler just to present a summary such as:

Shortest Journey: 3 minutes


Average Journey: 22 minutes
Longest Journey: 58 minutes

Graphs

But nothing makes a report look better than a nice graph or chart

There are many different types of graphs. Three of the most common are:

Line Graph - shows information that is somehow connected (such as change over
time)

Bar Graph - shows relative sizes of different results.

Pie Chart - shows sizes as part of a whole (good for showing percentages).
You can create graphs like those using our Data Graphs (Bar, Line and Pie) page

People's Comments

If people have given their opinions or comments in the survey, you can present the more
interesting ones:

Example: In response to the question "How can we best clean up the river?" we received these
interesting replies:

"The government has a special fund for this"


"The local gardening group has seedlings you could plant"

Report
Put it all together into a report, with a nice introduction, and conclusions at the end, and you are
done!

Accuracy and Precision


They mean slightly different things!

Accuracy
Accuracy is how close a measured value is to the actual (true) value.

Precision
Precision is how close the measured values are to each other.
Examples of Accuracy and Precision:

High Accuracy, Low Precision
Low Accuracy, High Precision
High Accuracy, High Precision

So, if you are playing soccer and you always hit the left goal post instead of scoring, then you
are not accurate, but you are precise!

How to Remember?
aCcurate is Correct (a bullseye).
pRecise is Repeating (hitting the same spot, but maybe not the correct spot)

Bias (don't let precision fool you!)


When we measure something several times and all values are close, they may all be wrong if
there is a "Bias"

Bias is a systematic (built-in) error which makes all measurements wrong by a certain amount.

Examples of Bias

The scales read "1 kg" when there is nothing on them


You always measure your height wearing shoes with thick soles.
A stopwatch that takes half a second to stop when clicked

In each case all measurements are wrong by the same amount. That is bias.
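
Here is a small Python sketch of how a known bias could be removed (Python is our choice for illustration, and the readings and the helper name `correct_for_bias` are made up for this example). It assumes the bias is the same for every measurement, like the scales that read 1 kg when empty.

```python
def correct_for_bias(readings, bias):
    """Subtract a constant, known bias from every reading."""
    return [reading - bias for reading in readings]

# Hypothetical readings (in kg) from scales that show 1 kg when empty
readings = [3.0, 4.5, 6.0]
zero_reading = 1.0  # what the scales show with nothing on them

print(correct_for_bias(readings, zero_reading))
```

This only works when the bias is known and constant. It does nothing for random scatter in the readings.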

Degree of Accuracy
Accuracy depends on the instrument we are measuring with. But as a general rule:

The degree of accuracy is half a unit each side of the unit of measure

Examples:
When an instrument measures in "1"s
any value between 6½ and 7½ is measured as "7"
When an instrument measures in "2"s
any value between 7 and 9 is measured as "8"

(Notice that the arrow points to the same spot, but the measured values are different!
Read more at Errors in Measurement. )
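
The "half a unit each side" rule can be tried out with a short Python sketch. The function name `measure` and the true value 7.3 are assumptions made for illustration: an instrument that measures in "1"s reads 7 for anything from 6½ up to 7½, and one that measures in "2"s reads 8 for anything from 7 up to 9.

```python
import math

def measure(true_value, unit):
    """Return the reading of an instrument that only shows multiples of `unit`."""
    # Round to the nearest multiple of `unit` (exact halves round up)
    return math.floor(true_value / unit + 0.5) * unit

value = 7.3  # hypothetical true value
print(measure(value, 1))  # instrument measuring in 1s
print(measure(value, 2))  # instrument measuring in 2s
```

The same true value gives different readings, just because the instruments have different units of measure.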

Activity: Asking Questions


As you walk, or in the car or at home, look around and ask yourself questions about the world
around you.

Write down 5 of those questions that can be answered using numbers.

Examples:

How many trees in the park?


How long would it take to cut the grass along the street?
How much paint would it take to do the whole house?
Which Ice Cream sells the most?

You can use this form:

Question

1
2
3
4
5

Why Do This Activity?

It will improve your awareness and understanding of the world


It will increase your curiosity and
It will improve your "number-sense"

(Optional) Find or Estimate an Answer


See if you can find an answer to each question.

An estimate is fine (example: cutting grass along a street, estimate the area of grass and use
the internet to find how fast grass is usually cut).

Try To Do This Your Whole Life


It is a good habit to always ask questions about the world.

Activity: Improving Questions


First do the Asking Questions Activity where you are asked to write down 5 real-world questions
that can be answered using numbers.

Now we want to take those questions and make them better.

For each question:

Is it possible to answer?
Can we answer it exactly (or close enough)?
Do we know what each part of it means?
Ask "does that depend on ... "

Example: How many trees in the park?

When you start counting the trees you may find lots of tiny ones ... should they be counted?
Maybe the question could be changed to

How many trees taller than 2 meters are in the park?

Example: How long would it take to cut the grass along the street?

What are you using to cut the grass: A lawn mower? One you sit on?

Maybe the question could be changed to

How long would it take to cut the grass along the street using our lawn mower?

Also: what does "along the street" mean? Just the grass alongside the road? Maybe you need a
map!

Original Question:
1

Improved Question:

Original Question:
2

Improved Question:

Original Question:
3

Improved Question:

Original Question:
4

Improved Question:

Original Question:
5

Improved Question:
Finding a Central Value
When you have two or more numbers it is nice to find a value for the "center".

2 Numbers
With just 2 numbers the answer is easy: go half-way between.

Example: what is the central value for 3 and 7?

Answer: Half-way between, which is 5.

You can calculate it by adding 3 and 7 and then dividing the result by 2:

(3+7) / 2 = 10/2 = 5

3 or More Numbers
You can use the same idea when you have 3 or more numbers:

Example: what is the central value of 3, 7 and 8?

Answer: You calculate it by adding 3, 7 and 8 and then dividing the result by 3 (because there
are 3 numbers):

(3+7+8) / 3 = 18/3 = 6
Notice that we divide by 3 because we have 3 numbers ... very important!

The Mean
So far we have been calculating the Mean (or the Average):

Mean: Add up the numbers and divide by how many numbers.

But sometimes the Mean can let you down:

Example: Birthday Activities

Uncle Bob wants to know the average age at the party, to choose an activity.

There will be 6 kids aged 13, and also 5 babies aged 1.

Add up all the ages, and divide by 11 (because there are 11 numbers):

(13+13+13+13+13+13+1+1+1+1+1) / 11 = 7.5...

The mean age is about 7½, so he gets a Jumping Castle!

The 13 year olds are embarrassed,


and the 1 year olds can't jump!

The Mean was accurate, but in this case it was not useful.

The Median
But you could also use the Median: simply list all numbers in order and choose the middle one:

Example: Birthday Activities (continued)

List the ages in order:

1, 1, 1, 1, 1, 13, 13, 13, 13, 13, 13

Choose the middle number:


1, 1, 1, 1, 1, 13, 13, 13, 13, 13, 13

The Median age is 13 ... so let's have a Disco!

Sometimes there are two middle numbers. Just average them:

Example: What is the Median of 3, 4, 7, 9, 12, 15

There are two numbers in the middle:

3, 4, 7, 9, 12, 15

So we average them:

(7+9) / 2 = 16/2 = 8

The Median is 8

The Mode
The Mode is the value that occurs most often:

Example: Birthday Activities (continued)

Group the numbers so we can count them:

1, 1, 1, 1, 1, 13, 13, 13, 13, 13, 13

"13" occurs 6 times, "1" occurs only 5 times, so the mode is 13.

How to remember? Think "mode is most"

But Mode can be tricky, there can sometimes be more than one Mode.

Example: What is the Mode of 3, 4, 4, 5, 6, 6, 7

Well ... 4 occurs twice but 6 also occurs twice.

So both 4 and 6 are modes.
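
As a rough check, the central values in these examples can be computed with Python's standard `statistics` module (Python is our choice here, not part of the original lesson, and `multimode` needs Python 3.8 or later):

```python
from statistics import mean, median, multimode

ages = [13] * 6 + [1] * 5  # six kids aged 13 and five babies aged 1

print(round(mean(ages), 1))  # the mean, 83 / 11, rounded to 1 decimal
print(median(ages))          # the middle value of the sorted list
print(multimode(ages))       # list of every value tied for "most often"

# A list with two modes
print(multimode([3, 4, 4, 5, 6, 6, 7]))
```

`multimode` returns a list, so it copes with one mode or several.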

When there are two modes it is called "bimodal", when there are three or more modes we call it
"multimodal".
Outliers
Outliers are values that "lie outside" the other values.

They can change the mean a lot, so we can either not use them (and say so) or use the median
or mode instead.

Example: 3, 4, 4, 5 and 104

Mean: Add them up, and divide by 5 (as there are 5 numbers):

(3+4+4+5+104) / 5 = 24

24 does not represent those numbers well at all!

Without the 104 the mean is:

(3+4+4+5) / 4 = 4

But please tell people you are not including the outlier.

Median: They are in order, so just choose the middle number, which is 4:

3, 4, 4, 5, 104

Mode: 4 occurs most often, so the Mode is 4

3, 4, 4, 5, 104
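
The effect of the outlier can be seen in a few lines of Python. This is only a sketch: the outlier is picked by eye (as in the example), not detected automatically.

```python
from statistics import mean, median, mode

values = [3, 4, 4, 5, 104]
outlier = 104  # chosen by eye, as in the example above
trimmed = [v for v in values if v != outlier]

print(mean(values))    # mean with the outlier
print(mean(trimmed))   # mean without the outlier
print(median(values))  # the median barely notices the outlier
print(mode(values))    # the mode is not affected either
```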

Conclusion
There are other ways of measuring central values, but Mean, Median and Mode are the most
common.

Use the one that best suits your data. Or better still, use all three!
How to Find the Mean
The mean is the average of the numbers.

It is easy to calculate: add up all the numbers, then divide by how many numbers there are.

In other words it is the sum divided by the count.

Example 1: What is the Mean of these numbers?

6, 11, 7

Add the numbers: 6 + 11 + 7 = 24


Divide by how many numbers (there are 3 numbers): 24 / 3 = 8

The Mean is 8

Why Does This Work?


It is because 6, 11 and 7 added together is the same as 3 lots of 8:

It is like you are "flattening out" the numbers

Example 2: Look at these numbers:

3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29

The sum of these numbers is 330

There are fifteen numbers.

The mean is equal to 330 / 15 = 22

The mean of the above numbers is 22


Negative Numbers
How do you handle negative numbers? Adding a negative number is the same as subtracting the
number (without the negative). For example 3 + (−2) = 3 − 2 = 1.

Knowing this, let us try an example:

Example 3: Find the mean of these numbers:

3, −7, 5, 13, −2

The sum of these numbers is 3 − 7 + 5 + 13 − 2 = 12


There are 5 numbers.
The mean is equal to 12 / 5 = 2.4

The mean of the above numbers is 2.4

Here is how to do it in one line:

Mean = (3 − 7 + 5 + 13 − 2) / 5 = 12 / 5 = 2.4

Try it yourself!
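
If you have Python handy, "sum divided by count" is one line of code. This is a minimal sketch; the function name `mean` is our own choice.

```python
def mean(numbers):
    """Sum divided by count."""
    if not numbers:
        raise ValueError("need at least one number")
    return sum(numbers) / len(numbers)

print(mean([6, 11, 7]))          # Example 1
print(mean([3, -7, 5, 13, -2]))  # Example 3, with negative numbers
```

Negative numbers need no special treatment: adding them simply lowers the sum.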

Now have a look at The Mean Machine.

Advanced Topic: the mean we have just looked at is also called the "Arithmetic Mean", because
there are other means such as the Geometric Mean.

How to Find the Median Value


It's the middle of a sorted list of numbers.

Median Value
The Median is the "middle" of a sorted list of numbers.

How to Find the Median Value


To find the Median, place the numbers in value order and find the middle.

Example: find the Median of 12, 3 and 5

Put them in order:

3, 5, 12

The middle is 5, so the median is 5.

Example:

3, 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29

When we put those numbers in order we have:

3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56

There are fifteen numbers. Our middle is the eighth number:

3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56

The median value of this set of numbers is 23.

(It doesn't matter that some numbers are the same in the list.)
Two Numbers in the Middle
BUT, with an even amount of numbers things are slightly different.

In that case we find the middle pair of numbers, and then find the value that is half
way between them. This is easily done by adding them together and dividing by two.

Example:

3, 13, 7, 5, 21, 23, 23, 40, 23, 14, 12, 56, 23, 29

When we put those numbers in order we have:

3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56

There are now fourteen numbers and so we don't have just one middle number, we have a pair
of middle numbers:

3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56

In this example the middle numbers are 21 and 23.

To find the value halfway between them, add them together and divide by 2:

21 + 23 = 44
then 44 / 2 = 22

So the Median in this example is 22.

(Note that 22 was not in the list of numbers ... but that is OK because half the numbers in the
list are less, and half the numbers are greater.)

Where is the Middle?


A quick way to find the middle: count how many numbers, add 1 then divide by 2

Example: There are 45 numbers

45 plus 1 is 46, then divide by 2 and we get 23

So the median is the 23rd number in the sorted list.

Example: There are 66 numbers

66 plus 1 is 67, then divide by 2 and we get 33.5

33 and a half? That means that the 33rd and 34th numbers in the sorted list are the two
middle numbers.

So to find the median: add the 33rd and 34th numbers together and divide by 2.
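
The "add 1 then divide by 2" rule can be written as a short Python sketch (the function name `median_by_position` is made up for this illustration). When the position is a whole number there is one middle value, and when it ends in .5 the two neighbours are averaged.

```python
import math

def median_by_position(numbers):
    """Find the median using the (count + 1) / 2 position rule."""
    if not numbers:
        raise ValueError("need at least one number")
    ordered = sorted(numbers)
    position = (len(ordered) + 1) / 2              # 1-based, may end in .5
    lower = ordered[math.floor(position) - 1]      # e.g. the 33rd number
    upper = ordered[math.ceil(position) - 1]       # e.g. the 34th number
    return (lower + upper) / 2                     # same value twice if odd count

fifteen = [3, 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]
fourteen = [3, 13, 7, 5, 21, 23, 23, 40, 23, 14, 12, 56, 23, 29]
print(median_by_position(fifteen))
print(median_by_position(fourteen))
```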

How to Find the Mode or Modal Value


The mode is simply the number which appears most often.

Finding the Mode


To find the mode, or modal value, first put the numbers in order, then count how many of each
number. A number that appears most often is the mode.

Example:

3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
In order these numbers are:

3, 5, 7, 12, 13, 14, 20, 23, 23, 23, 23, 29, 39, 40, 56

This makes it easy to see which numbers appear most often.

In this case the mode is 23.

Another Example: {19, 8, 29, 35, 19, 28, 15}

Arrange them in order: {8, 15, 19, 19, 28, 29, 35}

19 appears twice, all the rest appear only once, so 19 is the mode.

How to remember? Think "mode is most"

More Than One Mode


We can have more than one mode.

Example: {1, 3, 3, 3, 4, 4, 6, 6, 6, 9}

3 appears three times, as does 6.

So there are two modes: at 3 and 6

Having two modes is called "bimodal".

Having more than two modes is called "multimodal".

Grouping
When all values appear the same number of times the idea of a mode is not useful. But we could
group them to see if one group has more than the others.

Example: {4, 7, 11, 16, 20, 22, 25, 26, 33}

Each value occurs once, so let us try to group them.

We can try groups of 10:

0-9: 2 values (4 and 7)


10-19: 2 values (11 and 16)
20-29: 4 values (20, 22, 25 and 26)
30-39: 1 value (33)

In groups of 10, the "20s" appear most often, so we could choose 25 (the middle of the 20s
group) as the mode.

You could use different groupings and get a different answer!
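
Grouping like this can be sketched in Python with `collections.Counter`. The helper name `modal_group` is our own, and it assumes whole-number values and groups that start at multiples of the width.

```python
from collections import Counter

def modal_group(values, width):
    """Return (start, end, count) of the group holding the most values."""
    # Each value goes into the group starting at a multiple of `width`
    counts = Counter((value // width) * width for value in values)
    # If two groups tie, this picks the one that was seen first
    start, count = max(counts.items(), key=lambda item: item[1])
    return start, start + width - 1, count

values = [4, 7, 11, 16, 20, 22, 25, 26, 33]
print(modal_group(values, 10))  # groups of 10
print(modal_group(values, 5))   # groups of 5 give a different picture
```

With groups of 5 two groups tie for the most values, which shows how much the answer depends on the grouping you choose.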

Grouping also helps to find what the typical values are when the real world messes things up!

Example: How long to fill a pallet?

Philip recorded how long it takes to fill a pallet in minutes:

{35, 36, 32, 42, 58, 56, 35, 39, 46, 47, 34, 37}

It takes longer if there is break time or lunch so an average is not very useful.

But grouping by 5s gives:

30-34: 2
35-39: 5
40-44: 1
45-49: 2
50-54: 0
55-59: 2

"35-39" appear most often, so we can say it normally takes about 37 minutes to fill a pallet.

Activity: Averages Brain-Teaser


Here is a little puzzle about averages. Is it right?

Who is Better at Kicking Goals?

At practice last week:

You scored 2 of 10 shots at goal


Sam scored 3 of 10 shots

Sam is better!

This week:

You scored 53 of 100 shots


Sam scored 6 of 10 shots

Sam is still better.

But let's add up the scores for BOTH weeks:

You scored 55 of 110 shots: that is 50%


Sam scored 9 of 20 shots: that is only 45%

Hang on! YOU are better!

Sam was better last week and this week ... but you are better over both weeks?

Please explain.
...

Maybe make a table with all the data and do the calculations yourself

Sam You

Last Week

This Week

Both Weeks

....

... read on after you have thought about it ...

...

It is All True
Because you had SO MANY shots at goal this week, and did well at them, you lifted your two-
week average above Sam's.

At practice last week:

You scored 2 of 10 (20%)


Sam scored 3 of 10 (30%)

This week:

You scored 53 of 100 (53%)

Sam scored 6 of 10 (60%)

For BOTH weeks:

You scored 55 of 110 (50%)


Sam scored 9 of 20 (45%)

To be fair, we should really compare the averages only when you and Sam have made roughly
the same number of attempts at goal.

If Sam had attempted 100 shots this week, he may have scored 60 out of 100, and his two-week
average would have been about 57%, better than you.

So be careful when comparing two sets of data with widely different counts.
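
You can check the puzzle's numbers with a few lines of Python. This is just a sketch, and the names `rate` and `combined` are our own.

```python
def rate(scored, attempts):
    """Percentage of shots that scored."""
    return 100 * scored / attempts

def combined(weeks):
    """Percentage over all weeks: total goals divided by total attempts."""
    total_scored = sum(scored for scored, _ in weeks)
    total_attempts = sum(attempts for _, attempts in weeks)
    return rate(total_scored, total_attempts)

# (goals, attempts) for last week and this week
you = [(2, 10), (53, 100)]
sam = [(3, 10), (6, 10)]

for (y_goals, y_shots), (s_goals, s_shots) in zip(you, sam):
    print(rate(y_goals, y_shots), rate(s_goals, s_shots))  # Sam wins each week
print(combined(you), combined(sam))                        # but you win overall
```

The combined percentage is not the average of the weekly percentages: each week counts in proportion to how many shots were taken.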

The Mean from a Frequency Table


It is easy to calculate the Mean:

Add up all the numbers,


then divide by how many numbers there are.
Example 1: What is the Mean of these numbers?

6, 11, 7

Add the numbers: 6 + 11 + 7 = 24


Divide by how many numbers (there are 3 numbers): 24 / 3 = 8

The Mean is 8

But sometimes we don't have a simple list of numbers, it might be a frequency table like this
(the "frequency" says how often they occur):

Score Frequency

1 2

2 5

3 4

4 2

5 1

(it says that score 1 occurred 2 times, score 2 occurred 5 times, etc)

We could list all the numbers like this:

Mean = (1+1 + 2+2+2+2+2 + 3+3+3+3 + 4+4 + 5) / (how many numbers)

But rather than do lots of adds (like 3+3+3+3) it is easier to use multiplication:

Mean = (2×1 + 5×2 + 4×3 + 2×4 + 1×5) / (how many numbers)

And rather than count how many numbers there are, we can add up the frequencies:

Mean = (2×1 + 5×2 + 4×3 + 2×4 + 1×5) / (2 + 5 + 4 + 2 + 1)

And now we calculate:

Mean = (2 + 10 + 12 + 8 + 5) / 14
= 37 / 14 = 2.64...

And that is how to calculate the mean from a frequency table!


Here is another example:

Example: Parking Spaces per House in Hampton Street

Isabella went up and down the street to find out how many parking spaces each house has. Here
are her results:

Parking Spaces   Frequency

1 15

2 27

3 8

4 5

What is the mean number of Parking Spaces?

Answer:

Mean = (15×1 + 27×2 + 8×3 + 5×4) / (15 + 27 + 8 + 5)

= (15 + 54 + 24 + 20) / 55

= 113 / 55

= 2.05...

The Mean is 2.05 (to 2 decimal places)

(much easier than adding all numbers separately!)
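
The same calculation can be sketched in Python. Here the frequency table is stored as a dictionary from value to frequency, which is our own choice of layout, as is the function name.

```python
def mean_from_frequency_table(table):
    """Mean = sum of (value x frequency) divided by sum of frequencies."""
    total = sum(value * frequency for value, frequency in table.items())
    count = sum(table.values())
    if count == 0:
        raise ValueError("the table has no data")
    return total / count

scores = {1: 2, 2: 5, 3: 4, 4: 2, 5: 1}  # score -> frequency
parking = {1: 15, 2: 27, 3: 8, 4: 5}     # parking spaces -> number of houses

print(round(mean_from_frequency_table(scores), 2))
print(round(mean_from_frequency_table(parking), 2))
```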

Notation
Now you know how to do it, let's do that last example again, but using formulas.
The symbol Σ (called Sigma) means "sum up"
(read more at Sigma Notation)

So we can say "add up all frequencies" this way:

Σf

(where f is frequency)

And we can use it like this:

Σf = 15 + 27 + 8 + 5 = 55

Likewise we can add up "frequency times score" this way:

Σfx

(where f is frequency and x is the matching score)

And the formula for calculating the mean from a frequency table is:

x̄ = Σfx / Σf

The x with the bar on top says "the mean of x"

So now we are ready to do our example above, but with correct notation.

Example: Calculate the Mean of this Frequency Table


x f

1 15

2 27

3 8

4 5

And here it is:

x̄ = Σfx / Σf = (15×1 + 27×2 + 8×3 + 5×4) / (15 + 27 + 8 + 5) = 113 / 55 = 2.05...

There you go! You can use sigma notation.

Calculate in the Table


It is often better to do the calculations in the table.

Example: (continued)

From the previous example, calculate f × x in the right-hand column and then do totals:

x f fx

1 15 15

2 27 54

3 8 24

4 5 20

TOTALS: 55 113

And the Mean is then easy:

Mean = 113 / 55 = 2.05...

Mean, Median and Mode


from Grouped Frequencies

Explained with Three Examples


The Race and the Naughty Puppy
This starts with some raw data (not a grouped frequency yet) ...

Alex timed 21 people in the sprint race, to the nearest second:

59, 65, 61, 62, 53, 55, 60, 70, 64, 56, 58, 58, 62, 62, 68, 65, 56, 59, 68, 61, 67

To find the Mean Alex adds up all the numbers, then divides by how many numbers:

Mean = (59+65+61+62+53+55+60+70+64+56+58+58+62+62+68+65+56+59+68+61+67) / 21

= 61.38095...

To find the Median Alex places the numbers in value order and finds the middle number.

In this case the median is the 11th number:

53, 55, 56, 56, 58, 58, 59, 59, 60, 61, 61, 62, 62, 62, 64, 65, 65, 67, 68, 68, 70

Median = 61

To find the Mode, or modal value, Alex places the numbers in value order then counts how
many of each number. The Mode is the number which appears most often (there can be more
than one mode):

53, 55, 56, 56, 58, 58, 59, 59, 60, 61, 61, 62, 62, 62, 64, 65, 65, 67, 68, 68, 70

62 appears three times, more often than the other values, so Mode = 62

Grouped Frequency Table


Alex then makes a Grouped Frequency Table:

Seconds Frequency

51 - 55 2

56 - 60 7

61 - 65 8

66 - 70 4

So 2 runners took between 51 and 55 seconds, 7 took between 56 and 60 seconds, etc

Oh No!
Suddenly all the original data gets lost (naughty pup!)

Only the Grouped Frequency Table survived ...

... can we help Alex calculate the Mean, Median and Mode from just that table?

The answer is ... no we can't. Not accurately anyway. But, we can make estimates.
Estimating the Mean from Grouped Data
So all we have left is:

Seconds Frequency

51 - 55 2

56 - 60 7

61 - 65 8

66 - 70 4

The groups (51-55, 56-60, etc), also called class intervals, are of width 5
The midpoints are in the middle of each class: 53, 58, 63 and 68

We can estimate the Mean by using the midpoints.

So, how does this work?

Think about the 7 runners in the group 56 - 60: all we know is that they ran somewhere
between 56 and 60 seconds:

Maybe all seven of them did 56 seconds,


Maybe all seven of them did 60 seconds,
But it is more likely that there is a spread of numbers: some at 56, some at 57, etc

So we take an average and assume that all seven of them took 58 seconds.

Let's now make the table using midpoints:

Midpoint Frequency
53 2

58 7

63 8

68 4

Our thinking is: "2 people took 53 sec, 7 people took 58 sec, 8 people took 63 sec and 4 people
took 68 sec". In other words we imagine the data looks like this:

53, 53, 58, 58, 58, 58, 58, 58, 58, 63, 63, 63, 63, 63, 63, 63, 63, 68, 68, 68, 68

Then we add them all up and divide by 21. The quick way to do it is to multiply each midpoint by
each frequency:

Midpoint (x)   Frequency (f)   Midpoint × Frequency (fx)

53 2 106

58 7 406

63 8 504

68 4 272

Totals: 21 1288

And then our estimate of the mean time to complete the race is:

Estimated Mean = 1288 / 21 = 61.333...

Very close to the exact answer we got earlier.
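
The midpoint method can be sketched in Python like this. The layout of `groups` as (lowest value, highest value, frequency) and the function name are assumptions made for this illustration.

```python
def estimated_mean(groups):
    """Estimate the mean of grouped data using the midpoint of each group."""
    total = 0
    count = 0
    for low, high, frequency in groups:
        midpoint = (low + high) / 2      # e.g. 56 - 60 has midpoint 58
        total += midpoint * frequency    # the "fx" column
        count += frequency               # the "f" column
    if count == 0:
        raise ValueError("the table has no data")
    return total / count

race = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
print(round(estimated_mean(race), 3))
```

Remember this is only an estimate: it pretends every runner in a group ran exactly the midpoint time.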

Estimating the Median from Grouped Data


Let's look at our data again:

Seconds Frequency

51 - 55 2

56 - 60 7

61 - 65 8

66 - 70 4

The median is the middle value, which in our case is the 11th one, which is in the 61 - 65 group:

We can say "the median group is 61 - 65"

But if we want an estimated Median value we need to look more closely at the 61 - 65 group.

We call it "61 - 65", but it really includes values from 60.5 up to (but not including) 65.5.

Why? Well, the values are in whole seconds, so a real time of 60.5 is measured as 61. Likewise
65.4 is measured as 65.

At 60.5 we already have 9 runners, and by the next boundary at 65.5 we have 17 runners. By
drawing a straight line in between we can pick out where the median frequency of n/2 runners
is:
And this handy formula does the calculation:

Estimated Median = L + ((n/2) − B) / G × w

where:

L is the lower class boundary of the group containing the median


n is the total number of values
B is the cumulative frequency of the groups before the median group
G is the frequency of the median group
w is the group width

For our example:

L = 60.5
n = 21
B=2+7=9
G=8
w=5

Estimated Median = 60.5 + ((21/2) − 9) / 8 × 5

= 60.5 + 0.9375

= 61.4375
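
Here is the median formula as a Python sketch. To keep it general, each group is given by its class boundaries (lower boundary, upper boundary, frequency) rather than its label, so "61 - 65" becomes 60.5 to 65.5. That layout and the function name are our own assumptions.

```python
def estimated_median(groups):
    """Estimate the median of grouped data: L + ((n/2) - B) / G * w."""
    n = sum(frequency for _, _, frequency in groups)
    if n == 0:
        raise ValueError("the table has no data")
    before = 0  # B: cumulative frequency of the groups before the median group
    for lower, upper, frequency in groups:
        if before + frequency >= n / 2:   # this group contains the median
            width = upper - lower         # w
            return lower + (n / 2 - before) / frequency * width
        before += frequency

race = [(50.5, 55.5, 2), (55.5, 60.5, 7), (60.5, 65.5, 8), (65.5, 70.5, 4)]
print(estimated_median(race))
```

Because the boundaries are passed in directly, the same function also works for data such as ages, where the boundaries are 0, 10, 20 and so on.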
Estimating the Mode from Grouped Data
Again, looking at our data:

Seconds Frequency

51 - 55 2

56 - 60 7

61 - 65 8

66 - 70 4

We can easily find the modal group (the group with the highest frequency), which is 61 - 65

We can say "the modal group is 61 - 65"

But the actual Mode may not even be in that group! Or there may be more than one mode.
Without the raw data we don't really know.

But, we can estimate the Mode using the following formula:

Estimated Mode = L + (fm − fm-1) / ((fm − fm-1) + (fm − fm+1)) × w

where:

L is the lower class boundary of the modal group


fm-1 is the frequency of the group before the modal group
fm is the frequency of the modal group
fm+1 is the frequency of the group after the modal group
w is the group width

In this example:

L = 60.5
fm-1 = 7
fm = 8
fm+1 = 4
w=5

Estimated Mode = 60.5 + (8 − 7) / ((8 − 7) + (8 − 4)) × 5

= 60.5 + (1/5) × 5

= 61.5
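
The mode formula can be sketched in Python in the same way. Groups are again given as (lower boundary, upper boundary, frequency). Two choices here are our own assumptions and not part of the lesson: a missing neighbour group counts as frequency 0, and if the formula would divide by zero the middle of the modal group is returned.

```python
def estimated_mode(groups):
    """Estimate the mode of grouped data from the modal group and its neighbours."""
    if not groups:
        raise ValueError("the table has no data")
    frequencies = [frequency for _, _, frequency in groups]
    m = frequencies.index(max(frequencies))  # first group with the highest frequency
    f_m = frequencies[m]
    f_before = frequencies[m - 1] if m > 0 else 0                    # assumption
    f_after = frequencies[m + 1] if m + 1 < len(frequencies) else 0  # assumption
    lower, upper, _ = groups[m]
    bottom = (f_m - f_before) + (f_m - f_after)
    if bottom == 0:
        return (lower + upper) / 2           # assumption: fall back to the midpoint
    return lower + (f_m - f_before) / bottom * (upper - lower)

race = [(50.5, 55.5, 2), (55.5, 60.5, 7), (60.5, 65.5, 8), (65.5, 70.5, 4)]
print(estimated_mode(race))
```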

Our final result is:

Estimated Mean: 61.333...


Estimated Median: 61.4375
Estimated Mode: 61.5

(Compare that with the true Mean, Median and Mode of 61.38..., 61 and 62 that we got at the
very start.)

And that is how it is done.

Now let us look at two more examples, and get some more practice along the way!

Baby Carrots Example

Example: You grew fifty baby carrots using special soil. You dig them up and measure
their lengths (to the nearest mm) and group the results:

Length (mm) Frequency


150 - 154 5

155 - 159 2

160 - 164 6

165 - 169 8

170 - 174 9

175 - 179 11

180 - 184 6

185 - 189 3

Mean

Length (mm)   Midpoint (x)   Frequency (f)   fx

150 - 154 152 5 760

155 - 159 157 2 314

160 - 164 162 6 972

165 - 169 167 8 1336

170 - 174 172 9 1548

175 - 179 177 11 1947

180 - 184 182 6 1092

185 - 189 187 3 561


Totals: 50 8530

Estimated Mean = 8530 / 50 = 170.6 mm

Median

The Median is the mean of the 25th and the 26th length, so is in the 170 - 174 group:

L = 169.5 (the lower class boundary of the 170 - 174 group)


n = 50
B = 5 + 2 + 6 + 8 = 21
G=9
w=5

Estimated Median = 169.5 + ((50/2) − 21) / 9 × 5

= 169.5 + 2.22...

= 171.7 mm (to 1 decimal)

Mode

The Modal group is the one with the highest frequency, which is 175 - 179:

L = 174.5 (the lower class boundary of the 175 - 179 group)


fm-1 = 9
fm = 11
fm+1 = 6
w=5

Estimated Mode = 174.5 + (11 − 9) / ((11 − 9) + (11 − 6)) × 5

= 174.5 + 1.42...

= 175.9 mm (to 1 decimal)

Age Example
Age is a special case.

When we say "Sarah is 17" she stays "17" up until her eighteenth birthday.
She might be 17 years and 364 days old and still be called "17".

This changes the midpoints and class boundaries.

Example: The ages of the 112 people who live on a tropical island are grouped as
follows:

Age Number

0-9 20

10 - 19 21

20 - 29 23

30 - 39 16

40 - 49 11

50 - 59 10
60 - 69 7

70 - 79 3

80 - 89 1

A child in the first group 0 - 9 could be almost 10 years old. So the midpoint for this group
is 5 not 4.5

The midpoints are 5, 15, 25, 35, 45, 55, 65, 75 and 85

Similarly, in the calculations of Median and Mode, we will use the class boundaries 0, 10, 20 etc

Mean

Age   Midpoint (x)   Number (f)   fx

0-9 5 20 100

10 - 19 15 21 315

20 - 29 25 23 575

30 - 39 35 16 560

40 - 49 45 11 495

50 - 59 55 10 550

60 - 69 65 7 455

70 - 79 75 3 225

80 - 89 85 1 85

Totals: 112 3360

Estimated Mean = 3360 / 112 = 30

Median

The Median is the mean of the ages of the 56th and the 57th people, so is in the 20 - 29 group:

L = 20 (the lower class boundary of the class interval containing the median)
n = 112
B = 20 + 21 = 41
G = 23
w = 10

Estimated Median = 20 + ((112/2) − 41) / 23 × 10

= 20 + 6.52...

= 26.5 (to 1 decimal)

Mode

The Modal group is the one with the highest frequency, which is 20 - 29:

L = 20 (the lower class boundary of the modal class)


fm-1 = 21
fm = 23
fm+1 = 16
w = 10

Estimated Mode = 20 + (23 − 21) / ((23 − 21) + (23 − 16)) × 10

= 20 + 2.22...

= 22.2 (to 1 decimal)


Summary
For grouped data, we cannot find the exact Mean, Median and Mode, we can only
give estimates.

To estimate the Mean use the midpoints of the class intervals.

Estimated Median = L + ((n/2) − B) / G × w

where:

L is the lower class boundary of the group containing the median


n is the total number of data
B is the cumulative frequency of the groups before the median group
G is the frequency of the median group
w is the group width

Estimated Mode = L + (fm − fm-1) / ((fm − fm-1) + (fm − fm+1)) × w

where:

L is the lower class boundary of the modal group


fm-1 is the frequency of the group before the modal group
fm is the frequency of the modal group
fm+1 is the frequency of the group after the modal group
w is the group width

Weighted Mean
Also called Weighted Average

A mean where some values contribute more than others.

Mean
When we do a simple mean (or average), we give equal weight to each number.

Here is the mean of 1, 2, 3 and 4:

Add up the numbers, divide by how many numbers:

Mean = (1+2+3+4) / 4 = 10/4 = 2.5

Weights
We could think that each of those numbers has a "weight" of ¼ (because there are 4 numbers):

Mean = ¼ × 1 + ¼ × 2 + ¼ × 3 + ¼ × 4
= 0.25 + 0.5 + 0.75 + 1 = 2.5

Same answer.

Now let's change the weight of 3 to 0.7 , and the weights of the other numbers to 0.1 so the
total of the weights is still 1:

Mean = 0.1 × 1 + 0.1 × 2 + 0.7 × 3 + 0.1 × 4
= 0.1 + 0.2 + 2.1 + 0.4 = 2.8

This weighted mean is now a little higher ("pulled" there by the weight of 3).

When some values get more weight than others, the central point (the mean) can change.

Decisions
Weighted means can help with decisions where some things are more important than others:

Example: Sam wants to buy a new camera, and decides on the following rating
system:

Image Quality 50%


Battery Life 30%
Zoom Range 20%

The Sany camera gets 8 (out of 10) for Image Quality, 6 for Battery Life and 7 for Zoom Range

The Conan camera gets 9 for Image Quality, 4 for Battery Life and 6 for Zoom Range

Which camera is best?

Sany: 0.5 × 8 + 0.3 × 6 + 0.2 × 7 = 4 + 1.8 + 1.4 = 7.2

Conan: 0.5 × 9 + 0.3 × 4 + 0.2 × 6 = 4.5 + 1.2 + 1.2 = 6.9

Sam decides to buy the Sany.

What if the Weights Don't Add to 1?


When the weights don't add to 1, divide by the sum of weights.

Example: Alex usually works 7 days a week, but sometimes just 1, 2, or 5 days.
Alex worked:

2 weeks: 1 day each week


14 weeks: 2 days each week
8 weeks: 5 days each week
32 weeks: 7 days each week

What is the mean number of days Alex works per week?

Use "Weeks" as the weighting:

Weeks × Days = 2 × 1 + 14 × 2 + 8 × 5 + 32 × 7
= 2 + 28 + 40 + 224 = 294

Also add up the weeks:

Weeks = 2 + 14 + 8 + 32 = 56

Divide:

Mean = 294 / 56 = 5.25

It is often better to use a table to make sure you have all the numbers correct:

Example (continued):

Let's use:

w for the number of weeks (the weight)


x for days (the value we want the mean of)

Multiply w by x, sum up w and sum up wx:

Weight Days
w x wx

2 1 2

14 2 28

8 5 40

32 7 224

Σw = 56    Σwx = 294

Note: Σ (Sigma) means "Sum Up"

Divide Σwx by Σw:

Mean = 294 / 56 = 5.25

And that leads us to our formula:

Weighted Mean = Σwx / Σw

In other words: multiply each weight w by its matching value x, sum that all up, and divide by
the sum of weights.

Summary
Weighted Mean: A mean where some values contribute more than others.

When the weights add to 1: just multiply each weight by the matching value and sum it
all up

Otherwise, multiply each weight w by its matching value x, sum that all up, and divide
by the sum of weights:
Weighted Mean = Σwx / Σw
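
As a rough sketch, the formula translates directly into a few lines of Python. The function name weighted_mean is an illustrative choice, not something from the original page; dividing by the sum of the weights means it works whether or not the weights add to 1.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Alex's working weeks: the weights are numbers of weeks, the values are days worked.
alex = weighted_mean([1, 2, 5, 7], [2, 14, 8, 32])

# Sam's camera ratings: weights 50%, 30%, 20% for image quality, battery life, zoom range.
sany = weighted_mean([8, 6, 7], [0.5, 0.3, 0.2])
conan = weighted_mean([9, 4, 6], [0.5, 0.3, 0.2])

print(alex, round(sany, 2), round(conan, 2))
```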

The Range (Statistics)


The Range is the difference between the lowest and highest values.

Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9.

So the range is 9 − 3 = 6.

It is that simple!

But perhaps too simple ...

The Range Can Be Misleading


The range can sometimes be misleading when there are extremely high or low values.

Example: In {8, 11, 5, 9, 7, 6, 3616}:

the lowest value is 5,


and the highest is 3616,

So the range is 3616-5 = 3611.

The single value of 3616 makes the range large, but most values are around 10.

So we may be better off using Interquartile Range or Standard Deviation .


Range of a Function
Range can also mean all the output values of a function, see Domain, Range and
Codomain .

Quartiles
Quartiles are the values that divide a list of numbers into quarters:

Put the list of numbers in order


Then cut the list into four equal parts
The Quartiles are at the "cuts"

Like this:

Example: 5, 7, 4, 4, 6, 2, 8

Put them in order: 2, 4, 4, 5, 6, 7, 8

Cut the list into quarters:

And the result is:

Quartile 1 (Q1) = 4
Quartile 2 (Q2), which is also the Median, = 5
Quartile 3 (Q3) = 7

Sometimes a "cut" is between two numbers ... the Quartile is the average of the two numbers.

Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8

The numbers are already in order


Cut the list into quarters:

In this case Quartile 2 is half way between 5 and 6:

Q2 = (5+6)/2 = 5.5

And the result is:

Quartile 1 (Q1) = 3
Quartile 2 (Q2) = 5.5
Quartile 3 (Q3) = 7

Interquartile Range
The "Interquartile Range" is from Q1 to Q3:

To calculate it just subtract Quartile 1 from Quartile 3, like this:

Example:

The Interquartile Range is:

Q3 − Q1 = 7 − 4 = 3

Box and Whisker Plot


We can show all the important values in a "Box and Whisker Plot", like this:

A final example covering everything:


Example: Box and Whisker Plot and Interquartile Range for

4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11

Put them in order:

3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18

Cut it into quarters:

3, 4, 4 | 4, 7, 10 | 11, 12, 14 | 16, 17, 18

In this case all the quartiles are between numbers:

Quartile 1 (Q1) = (4+4)/2 = 4


Quartile 2 (Q2) = (10+11)/2 = 10.5
Quartile 3 (Q3) = (14+16)/2 = 15

Also:

The Lowest Value is 3,


The Highest Value is 18

So now we have enough data for the Box and Whisker Plot:

And the Interquartile Range is:

Q3 − Q1 = 15 − 4 = 11
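
The "cut the list into quarters" method can be sketched in Python as below. This is one possible reading of the method (the median of each half, leaving the middle value out when the list has an odd length); other quartile conventions exist and can give slightly different values, so treat the function name and the approach as assumptions.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-each-half method."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # For an odd-length list the middle value is Q2 and belongs to neither half.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)


q1, q2, q3 = quartiles([4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11])
iqr = q3 - q1   # Interquartile Range
print(q1, q2, q3, iqr)
```

The same function gives Q1 = 4, Q2 = 5, Q3 = 7 for the list 5, 7, 4, 4, 6, 2, 8 used in the first example.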

Percentiles
Percentile: the value below which a percentage of data falls.

Example: You are the fourth tallest person in a group of 20

80% of people are shorter than you:


That means you are at the 80th percentile.

If your height is 1.85m then "1.85m" is the 80th percentile height in that group.

In Order
Have the data in order, so you know which values are above and below.

To calculate percentiles of height: have the data in height order (sorted by height).
To calculate percentiles of age: have the data in age order.
And so on.

Grouped Data
When the data is grouped:

Add up all percentages below the score,


plus half the percentage at the score.

Example: You Score a B!

In the test 12% got D, 50% got C, 30% got B and 8% got A

You got a B, so add up

all the 12% that got D,


all the 50% that got C,
half of the 30% that got B,

for a total percentile of 12% + 50% + 15% = 77%

In other words you did "as well or better than 77% of the class"

(Why take half of B? Because you shouldn't imagine you got the "Best B", or the "Worst B", just
an average B.)
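
The grouped-data rule (everything below the score, plus half of the group at the score) can be sketched like this. The function name and the list-of-pairs layout are illustrative assumptions.

```python
def grouped_percentile(shares, score):
    """shares: (score, percent) pairs ordered from the lowest score to the highest."""
    below = 0
    for s, percent in shares:
        if s == score:
            return below + percent / 2   # half of the group at the score
        below += percent                 # all of each group below the score
    raise ValueError("score not found")


test_results = [("D", 12), ("C", 50), ("B", 30), ("A", 8)]
print(grouped_percentile(test_results, "B"))   # 12 + 50 + 15
```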
Deciles
Deciles are similar to Percentiles (sounds like decimal and percentile together), as they split the
data into 10% groups:

The 1st decile is the 10th percentile (the value that divides the data so that 10% is below it)
The 2nd decile is the 20th percentile (the value that divides the data so that 20% is below it)
etc!
Example: (continued)

You are at the 8th decile (the 80th percentile).

Quartiles
Another related idea is Quartiles , which splits the data into quarters:

Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8

The numbers are in order. Cut the list into quarters:

In this case Quartile 2 is half way between 5 and 6:

Q2 = (5+6)/2 = 5.5

And the result is:

Quartile 1 (Q1) = 3
Quartile 2 (Q2) = 5.5
Quartile 3 (Q3) = 7

The Quartiles also divide the data into divisions of 25%, so:

Quartile 1 (Q1) can be called the 25th percentile


Quartile 2 (Q2) can be called the 50th percentile
Quartile 3 (Q3) can be called the 75th percentile
Example: (continued)

For 1, 3, 3, 4, 5, 6, 6, 7, 8, 8:

The 25th percentile = 3


The 50th percentile = 5.5
The 75th percentile = 7

Estimating Percentiles
We can estimate percentiles from a line graph .

Example: Shopping

A total of 10,000 people visited the shopping mall over 12 hours:

Time (hours) People

0 0

2 350

4 1100

6 2400

8 6500

10 8850

12 10,000

a) Estimate the 30th percentile (when 30% of the visitors had arrived).
b) Estimate what percentile of visitors had arrived after 11 hours.

First draw a line graph of the data: plot the points and join them with a smooth curve:

a) The 30th percentile occurs when the visits reach 3,000.

Draw a line horizontally across from 3,000 until you hit the curve, then draw a line vertically
downwards to read off the time on the horizontal axis:

So the 30th percentile occurs after about 6.5 hours.

b) To estimate the percentile of visits after 11 hours: draw a line vertically up from 11 until you
hit the curve, then draw a line horizontally across to read off the number of visits on the vertical
axis:

So the visits at 11 hours were about 9,500, which is the 95th percentile.
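
Without a graph, a similar estimate can be made by joining the points with straight lines (linear interpolation). This is only a sketch under that assumption: straight segments are cruder than the smooth curve used above, so the results differ a little from the readings taken off the graph.

```python
hours = [0, 2, 4, 6, 8, 10, 12]
people = [0, 350, 1100, 2400, 6500, 8850, 10000]


def interpolate(x, xs, ys):
    """Read off y at x along straight lines joining the points (xs must be increasing)."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the data")


# a) Time at which 30% of the 10,000 visitors (3,000 people) had arrived.
time_30th = interpolate(3000, people, hours)      # about 6.29 hours

# b) Percentage of visitors who had arrived after 11 hours.
percent_at_11 = interpolate(11, hours, people) / 10000 * 100

print(round(time_30th, 2), percent_at_11)
```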

Mean Deviation
How far, on average, all values are from the middle.

Calculating It
Find the mean of all values ... use it to work out distances ... then find the mean of those
distances!

In three steps:

1. Find the mean of all values


2. Find the distance of each value from that mean (subtract the mean from each value, ignore
minus signs)
3. Then find the mean of those distances

Like this:

Example: the Mean Deviation of 3, 6, 6, 7, 8, 11, 15, 16

Step 1: Find the mean:

Mean = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16) / 8 = 72 / 8 = 9

Step 2: Find the distance of each value from that mean:

Value Distance from 9

3 6

6 3

6 3

7 2

8 1

11 2

15 6

16 7

Which looks like this:

(No minus signs!)


Step 3. Find the mean of those distances:

Mean Deviation = (6 + 3 + 3 + 2 + 1 + 2 + 6 + 7) / 8 = 30 / 8 = 3.75

So, the mean = 9, and the mean deviation = 3.75

It tells us how far, on average, all values are from the middle.

In that example the values are, on average, 3.75 away from the middle.

For deviation just think distance

Formula
The formula is:

Mean Deviation = Σ|x − μ| / N

Σ is Sigma, which means to sum up
|| (the vertical bars) mean Absolute Value, basically to ignore minus signs
x is each value (such as 3 or 16)
μ is the mean (in our example μ = 9)
N is the number of values (in our example N = 8)

Let's look at those in more detail:

Absolute Deviation
Each distance we calculate is called an Absolute Deviation, because it is the Absolute Value of
the deviation (how far from the mean).

To show "Absolute Value" we put "|" marks either side like this:

|-3| = 3
For any value x:

Absolute Deviation = |x − μ|

From our example, the value 16 has Absolute Deviation = |x − μ| = |16 − 9| = |7| = 7

And now let's add them all up ...

Sigma
The symbol for "Sum Up" is Σ (called Sigma Notation), so we have:

Sum of Absolute Deviations = Σ|x − μ|

Divide by how many values N and we have:

Mean Deviation = Σ|x − μ| / N

Let's do our example again, using the proper symbols:

Example: the Mean Deviation of 3, 6, 6, 7, 8, 11, 15, 16

Step 1: Find the mean:

μ = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16) / 8 = 72 / 8 = 9

Step 2: Find the Absolute Deviations:

x    |x − μ|

3 6

6 3
6 3

7 2

8 1

11 2

15 6

16 7

Σ|x − μ| = 30

Step 3. Find the Mean Deviation:

Mean Deviation = Σ|x − μ| / N = 30 / 8 = 3.75

Note: the mean deviation is sometimes called the Mean Absolute Deviation (MAD) because it is
the mean of the absolute deviations.
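
The three steps can be sketched in Python as follows (the function name is an illustrative choice):

```python
def mean_deviation(values):
    """Mean of the absolute distances from the mean."""
    n = len(values)
    mu = sum(values) / n                              # Step 1: the mean
    distances = [abs(x - mu) for x in values]         # Step 2: absolute deviations
    return sum(distances) / n                         # Step 3: mean of those distances


print(mean_deviation([3, 6, 6, 7, 8, 11, 15, 16]))
```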

What Does It "Mean" ?


Mean Deviation tells us how far, on average, all values are from the middle.

Here is an example (using the same data as on the Standard Deviation page):

Example: You and your friends have just measured the heights of your dogs (in
millimeters):
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.

Step 1: Find the mean:

μ = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

Step 2: Find the Absolute Deviations:

x    |x − μ|

600 206

470 76

170 224

430 36

300 94

Σ|x − μ| = 636

Step 3. Find the Mean Deviation:

Mean Deviation = Σ|x − μ| / N = 636 / 5 = 127.2

So, on average, the dogs' heights are 127.2 mm from the mean.

(Compare that with the Standard Deviation of 147 mm)

A Useful Check
The deviations on one side of the mean should equal the deviations on the other side.

From our first example:

Example: 3, 6, 6, 7, 8, 11, 15, 16

The deviations are:

6+3+3+2+1 = 2+6+7

15 = 15

Likewise:

Example: Dogs

Deviations left of mean: 224 + 94 = 318

Deviations right of mean: 206 + 76 + 36 = 318

If they are not equal ... you may have made a msitake!

Standard Deviation and Variance


Deviation just means how far from the normal

Standard Deviation
The Standard Deviation is a measure of how spread out numbers are.

Its symbol is σ (the Greek letter sigma)

The formula is easy: it is the square root of the Variance. So now you ask, "What is the
Variance?"

Variance
The Variance is defined as:

The average of the squared differences from the Mean.

To calculate the variance follow these steps:

Work out the Mean (the simple average of the numbers)

Then for each number: subtract the Mean and square the result (the squared
difference).

Then work out the average of those squared differences. (Why Square?)

Example
You and your friends have just measured the heights of your dogs (in millimeters):
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.

Find out the Mean, the Variance, and the Standard Deviation.

Your first step is to find the Mean:

Answer:
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

so the mean (average) height is 394 mm. Let's plot this on the chart:

Now we calculate each dog's difference from the Mean:

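To calculate the Variance, take each difference, square it, and then average the result:

Variance = (206² + 76² + (−224)² + 36² + (−94)²) / 5
= (42,436 + 5,776 + 50,176 + 1,296 + 8,836) / 5
= 108,520 / 5
= 21,704

So the Variance is 21,704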

And the Standard Deviation is just the square root of Variance, so:

Standard Deviation

= √21,704

= 147.32...

= 147 (to the nearest mm)

And the good thing about the Standard Deviation is that it is useful. Now we can show which
heights are within one Standard Deviation (147mm) of the Mean:

So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what
is extra large or extra small.

Rottweilers are tall dogs. And Dachshunds are a bit short ... but don't tell them!
Now try the Standard Deviation Calculator .

But ... there is a small change with Sample Data


Our example has been for a Population (the 5 dogs are the only dogs we are interested in).

But if the data is a Sample (a selection taken from a bigger Population), then the calculation
changes!

When you have "N" data values that are:

The Population: divide by N when calculating Variance (like we did)

A Sample: divide by N-1 when calculating Variance

All other calculations stay the same, including how we calculated the mean.

Example: if our 5 dogs are just a sample of a bigger population of dogs, we divide by 4 instead
of 5 like this:

Sample Variance = 108,520 / 4 = 27,130

Sample Standard Deviation = √27,130 = 164.71... = 165 (to the nearest mm)

Think of it as a "correction" when your data is only a sample.
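
For checking work like this, Python's standard statistics module has separate functions for the two cases: pvariance and pstdev divide by N (Population), while variance and stdev divide by N-1 (Sample). A short sketch using the dog heights:

```python
from statistics import mean, pstdev, pvariance, stdev, variance

heights = [600, 470, 170, 430, 300]   # dog heights in mm

print(mean(heights))        # the mean is the same in both cases
print(pvariance(heights))   # Population Variance: divide by N
print(pstdev(heights))      # Population Standard Deviation, about 147.3
print(variance(heights))    # Sample Variance: divide by N-1
print(stdev(heights))       # Sample Standard Deviation, about 164.7
```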

Formulas
Here are the two formulas, explained at Standard Deviation Formulas if you want to know
more:

The "Population Standard Deviation":

σ = √[ (1/N) × Σ (xᵢ − μ)² ]

The "Sample Standard Deviation":

s = √[ (1/(N − 1)) × Σ (xᵢ − x̄)² ]

Looks complicated, but the important change is to
divide by N-1 (instead of N) when calculating a Sample Variance.

*Footnote: Why square the differences?


If we just add up the differences from the mean ... the negatives cancel the positives:

(4 + 4 − 4 − 4) / 4 = 0

So that won't work. How about we use absolute values ?

(|4| + |4| + |−4| + |−4|) / 4 = (4 + 4 + 4 + 4) / 4 = 4

That looks good (and is the Mean Deviation ), but what about this case:

(|7| + |1| + |−6| + |−2|) / 4 = (7 + 1 + 6 + 2) / 4 = 4

Oh No! It also gives a value of 4, even though the differences are more spread out.

So let us try squaring each difference (and taking the square root at the end):

√((4² + 4² + 4² + 4²) / 4) = √(64 / 4) = 4
√((7² + 1² + 6² + 2²) / 4) = √(90 / 4) = 4.74...

That is nice! The Standard Deviation is bigger when the differences are more spread out ... just
what we want.

In fact this method is a similar idea to distance between points , just applied in a different way.

And it is easier to use algebra on squares and square roots than absolute values, which makes
the standard deviation easy to use in other areas of mathematics.
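
The comparison in this footnote can be reproduced with a small sketch. The two lists hold the differences from the mean used above; the helper names are illustrative only.

```python
from math import sqrt

even_differences = [4, 4, -4, -4]
spread_differences = [7, 1, -6, -2]


def mean_of_absolutes(differences):
    """The Mean Deviation idea: ignore the signs, then average."""
    return sum(abs(d) for d in differences) / len(differences)


def root_of_mean_square(differences):
    """The Standard Deviation idea: square, average, then take the square root."""
    return sqrt(sum(d * d for d in differences) / len(differences))


# Absolute values cannot tell the two cases apart ...
print(mean_of_absolutes(even_differences), mean_of_absolutes(spread_differences))
# ... but squaring gives a bigger result for the more spread out differences.
print(root_of_mean_square(even_differences), root_of_mean_square(spread_differences))
```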


Standard Deviation Formulas


Deviation just means how far from the normal

Standard Deviation
The Standard Deviation is a measure of how spread out numbers are.

You might like to read this simpler page on Standard Deviation first.

But here we explain the formulas.

The symbol for Standard Deviation is σ (the Greek letter sigma).

This is the formula for Standard Deviation:

σ = √[ (1/N) × Σ (xᵢ − μ)² ]

Say what? Please explain!

OK. Let us explain it step by step.

Say we have a bunch of numbers like 9, 2, 5, 4, 12, 7, 8, 11.


To calculate the standard deviation of those numbers:

1. Work out the Mean (the simple average of the numbers)

2. Then for each number: subtract the Mean and square the result

3. Then work out the mean of those squared differences.

4. Take the square root of that and we are done!

The formula actually says all of that, and I will show you how.

The Formula Explained


First, let us have some example values to work on:

Example: Sam has 20 Rose Bushes.

The number of flowers on each bush is

9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

Work out the Standard Deviation.

Step 1. Work out the mean

In the formula above μ (the Greek letter "mu") is the mean of all our values ...

Example: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

The mean is:

μ = (9+2+5+4+12+7+8+11+9+3+7+4+12+5+4+10+9+6+9+4) / 20
= 140 / 20 = 7

So:

μ = 7

Step 2. Then for each number: subtract the Mean and square the result

This is the part of the formula that says:

(xᵢ − μ)²

So what is xᵢ? They are the individual x values 9, 2, 5, 4, 12, 7, etc...

In other words x₁ = 9, x₂ = 2, x₃ = 5, etc.

So it says "for each value, subtract the mean and square the result", like this

Example (continued):

(9 − 7)² = (2)² = 4

(2 − 7)² = (−5)² = 25

(5 − 7)² = (−2)² = 4

(4 − 7)² = (−3)² = 9

(12 − 7)² = (5)² = 25

(7 − 7)² = (0)² = 0

(8 − 7)² = (1)² = 1

... etc ...

And we get these results:

4, 25, 4, 9, 25, 0, 1, 16, 4, 16, 0, 9, 25, 4, 9, 9, 4, 1, 4, 9

Step 3. Then work out the mean of those squared differences.


To work out the mean, add up all the values then divide by how many.

First add up all the values from the previous step.

But how do we say "add them all up" in mathematics? We use "Sigma": Σ

The handy Sigma Notation says to sum up as many terms as we want.

We want to add up all the values from 1 to N, where N=20 in our case because there are 20
values:

Example (continued):

Σ (xᵢ − μ)², summed from i = 1 to N

Which means: Sum all values from (x₁ − 7)² to (x_N − 7)²

We already calculated (x₁ − 7)² = 4 etc. in the previous step, so just sum them up:

= 4+25+4+9+25+0+1+16+4+16+0+9+25+4+9+9+4+1+4+9 = 178

But that isn't the mean yet, we need to divide by how many, which is simply done by
multiplying by "1/N":

Example (continued):

Mean of squared differences = (1/20) × 178 = 8.9

(Note: this value is called the "Variance")


Step 4. Take the square root of that:

Example (concluded):

σ = √(8.9) = 2.983...

DONE!
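
The four steps can also be followed line by line in a short Python sketch (the variable names are illustrative):

```python
from math import sqrt

flowers = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]
N = len(flowers)

mu = sum(flowers) / N                              # Step 1: the mean
squared_diffs = [(x - mu) ** 2 for x in flowers]   # Step 2: subtract the mean, square
variance = sum(squared_diffs) / N                  # Step 3: mean of the squared differences
sigma = sqrt(variance)                             # Step 4: square root

print(mu, sum(squared_diffs), variance, round(sigma, 3))
```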

Sample Standard Deviation


But wait, there is more ...

... sometimes our data is only a sample of the whole population.

Example: Sam has 20 rose bushes, but only counted the flowers on 6 of them!

The "population" is all 20 rose bushes,

and the "sample" is the 6 bushes that Sam counted the flowers of.

Let us say Sam's flower counts are:

9, 2, 5, 4, 12, 7

We can still estimate the Standard Deviation.

But when we use the sample as an estimate of the whole population, the Standard Deviation
formula changes to this:

The formula for Sample Standard Deviation:

s = √[ (1/(N − 1)) × Σ (xᵢ − x̄)² ]

The important change is "N-1" instead of "N" (which is called "Bessel's correction").

The symbols also change to reflect that we are working on a sample instead of the whole
population:

The mean is now x̄ (for sample mean) instead of μ (the population mean),
And the answer is s (for Sample Standard Deviation) instead of σ.

But that does not affect the calculations. Only N-1 instead of N changes the calculations.

OK, let us now calculate the Sample Standard Deviation:

Step 1. Work out the mean

Example 2: Using sampled values 9, 2, 5, 4, 12, 7

The mean is (9+2+5+4+12+7) / 6 = 39/6 = 6.5

So:

x̄ = 6.5

Step 2. Then for each number: subtract the Mean and square the result

Example 2 (continued):

(9 − 6.5)² = (2.5)² = 6.25

(2 − 6.5)² = (−4.5)² = 20.25

(5 − 6.5)² = (−1.5)² = 2.25

(4 − 6.5)² = (−2.5)² = 6.25

(12 − 6.5)² = (5.5)² = 30.25

(7 − 6.5)² = (0.5)² = 0.25

Step 3. Then work out the mean of those squared differences.

To work out the mean, add up all the values then divide by how many.

But hang on ... we are calculating the Sample Standard Deviation, so instead of dividing by how
many (N), we will divide by N-1

Example 2 (continued):

Sum = 6.25 + 20.25 + 2.25 + 6.25 + 30.25 + 0.25 = 65.5

Divide by N-1: (1/5) × 65.5 = 13.1

(This value is called the "Sample Variance")

Step 4. Take the square root of that:

Example 2 (concluded):

s = √(13.1) = 3.619...

DONE!

Comparing
When we used the whole population we got: Mean = 7, Standard Deviation = 2.983...

When we used the sample we got: Sample Mean = 6.5, Sample Standard Deviation = 3.619...

Our Sample Mean was wrong by 7%, and our Sample Standard Deviation was wrong by 21%.
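
The sample calculation and the two percentage errors can be checked with a short sketch. The population figures (7 and 2.983) are taken from the earlier example; the variable names are illustrative.

```python
from math import sqrt

sample = [9, 2, 5, 4, 12, 7]
n = len(sample)

x_bar = sum(sample) / n                                      # sample mean
s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))    # divide by N-1, not N

population_mean = 7
population_sd = 2.983

mean_error = abs(x_bar - population_mean) / population_mean * 100
sd_error = abs(s - population_sd) / population_sd * 100

print(x_bar, round(s, 3), round(mean_error), round(sd_error))
```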
Why Take a Sample?
Mostly because it is easier and cheaper.

Imagine you want to know what the whole country thinks ... you can't ask millions of people, so
instead you ask maybe 1,000 people.

There is a nice quote (possibly by Samuel Johnson):

"You don't have to eat the whole ox to know that the meat is tough."

This is the essential idea of sampling. To find out information about the population (such as
mean and standard deviation), we do not need to look at all members of the population; we only
need a sample.

But when we take a sample, we lose some accuracy.

Summary

The Population Standard Deviation:

σ = √[ (1/N) × Σ (xᵢ − μ)² ]

The Sample Standard Deviation:

s = √[ (1/(N − 1)) × Σ (xᵢ − x̄)² ]

Univariate and Bivariate Data


Univariate: one variable,
Bivariate: two variables

Univariate means "one variable" (one type of data)

Example: Travel Time (minutes): 15, 29, 8, 42, 35, 21, 18, 42, 26
The variable is Travel Time

Example: Puppy Weights

You weigh the pups and get these results:

2.5, 3.5, 3.3, 3.1, 2.6, 3.6, 2.4

The variable is Puppy Weight

We can do lots of things with univariate data:

Find a central value using mean, median and mode


Find how spread out it is using range, quartiles and standard deviation
Make plots like Bar Graphs, Pie Charts and Histograms

Bivariate means "two variables", in other words there are two types of data

With bivariate data you have two sets of related data that you want to compare:

Example:

An ice cream shop keeps track of how much ice cream they sell versus the temperature on that
day.

The two variables are Ice Cream Sales and Temperature.

Here are their figures for the last 12 days:

Ice Cream Sales vs Temperature

Temperature °C    Ice Cream Sales


14.2 $215

16.4 $325

11.9 $185

15.2 $332

18.5 $406

22.1 $522

19.4 $412

25.1 $614

23.4 $544

18.1 $421

22.6 $445

17.2 $408

And here is the same data as a Scatter Plot :

Now we can easily see that warmer weather and more ice cream sales are linked, but the
relationship is not perfect.

So with bivariate data we are interested in comparing the two sets of data and finding any
relationships.

Scatter Plots

A Scatter (XY) Plot has points that show the relationship between two sets of
data.

In this example, each dot shows one person's weight versus their height.

(The data is plotted on the graph as " Cartesian (x,y) Coordinates ")

Example:
The local ice cream shop keeps track of how much ice cream they sell versus the noon
temperature on that day. Here are their figures for the last 12 days:

Ice Cream Sales vs Temperature

Temperature °C    Ice Cream Sales

14.2 $215

16.4 $325

11.9 $185

15.2 $332

18.5 $406

22.1 $522

19.4 $412

25.1 $614

23.4 $544

18.1 $421

22.6 $445
17.2 $408

And here is the same data as a Scatter Plot:

It is now easy to see that warmer weather leads to more sales, but the relationship is not
perfect.

Line of Best Fit


We can also draw a "Line of Best Fit" (also called a "Trend Line") on our scatter plot:

Try to have the line as close as possible to all points, and as many points above the line as
below.

Example: Sea Level Rise

A Scatter Plot of Sea Level Rise:


And here I have drawn on a "Line
of Best Fit".

Interpolation and Extrapolation


Interpolation is where we find a value inside our set of data points.

Here we use linear interpolation to estimate the sales at 21 °C.

Extrapolation is where we find a value outside our set of data points.

Here we use linear extrapolation to estimate the sales at 29 °C (which is higher than any
value we have).

Careful: Extrapolation can give misleading results because we are in "uncharted territory".

As well as using a graph (like above) we can create a formula to help us.

Example: Straight Line Equation

We can estimate a straight line equation from two points from the graph above
Let's estimate two points on the line near actual values: (12, $180) and (25, $610)

First, find the slope:

slope "m" = change in y / change in x

= ($610 − $180) / (25 − 12)

= $430 / 13

= 33 (rounded)

Now put the slope and the point (12, $180) into the "point-slope" formula :

y − y1 = m(x − x1)

y − 180 = 33(x − 12)

y = 33(x − 12) + 180

y = 33x − 396 + 180

y = 33x − 216

INTERPOLATING

Now we can use that equation to interpolate a sales value at 21 °C:

y = 33 × 21 − 216 = $477

EXTRAPOLATING

And to extrapolate a sales value at 29 °C:

y = 33 × 29 − 216 = $741

The values are close to what we got on the graph. But that doesn't mean they are more (or less)
accurate. They are all just estimates.
Don't use extrapolation too far! What sales would you expect at 0 °C?

y = 33 × 0 − 216 = −$216

Hmmm... Minus $216? We extrapolated too far!

Note: we used linear (based on a line) interpolation and extrapolation, but there are many
other types, for example we could use polynomials to make curvy lines, etc.
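
The straight-line estimate can be sketched in Python like this. It uses the same two estimated points, (12, $180) and (25, $610), and rounds the slope to 33 as in the working above; the function name is an illustrative choice.

```python
# Two points estimated from the line of best fit: (temperature in °C, sales in $).
x1, y1 = 12, 180
x2, y2 = 25, 610

m = round((y2 - y1) / (x2 - x1))   # 430 / 13 is about 33.08, rounded to 33


def estimate_sales(temperature):
    """Point-slope form: y = m(x - x1) + y1."""
    return m * (temperature - x1) + y1


print(estimate_sales(21))   # interpolation (inside the data)
print(estimate_sales(29))   # extrapolation (outside the data)
print(estimate_sales(0))    # extrapolated too far: a negative sales figure
```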

Correlation
When the two sets of data are strongly linked together we say they have a High Correlation.

The word Correlation is made of Co- (meaning "together"), and Relation

Correlation is Positive when the values increase together, and


Correlation is Negative when one value decreases as the other increases

Like this:
(Learn More About Correlation )
Negative Correlation
Correlations can be negative, which means there is a correlation but one value goes down as the
other value increases.

Example: Birth Rate vs Income

The birth rate tends to be lower in richer countries.

Country       Yearly Production per Person    Birth Rate
Madagascar    $800                            5.70
India         $3,100                          2.85
Mexico        $9,600                          2.49
Taiwan        $25,300                         1.57
Norway        $40,000                         1.78

Below is a scatter plot for about 100 different countries.

It has a negative correlation (the line slopes down)

Note: I tried to fit a straight line to the data, but maybe a curve would work better, what do you
think?

Outliers
"Outliers" are values that "lie outside" the other values.

When we collect data, sometimes there are values that are "far away" from the main group
of data ... what do we do with them?

Example: Long Jump

A new coach has been working with the Long Jump team this month, and the athletes'
performance has changed.

Augustus can now jump 0.15m further, June and Carol can jump 0.06m further.

Here are all the results:

Augustus: +0.15m
Tom: +0.11m
June: +0.06m
Carol: +0.06m
Bob: + 0.12m
Sam: -0.56m

Oh no! Sam got worse.

Here are the results on the number line:

The mean is:

(0.15+0.11+0.06+0.06+0.12-0.56) / 6 = -0.06 / 6 = -0.01m

So, on average the performance went DOWN.

The coach is obviously useless ... right?

Sam's result is an "Outlier" ... what if we remove Sam's result?


Example: Long Jump (continued)

Let us try the results WITHOUT Sam:

Mean = (0.15+0.11+0.06+0.06+0.12)/5 = 0.1 m

Hey, the coach looks much better now!

But is that fair? Can we just get rid of values we don't like?

What To Do?
You need to think "why is that value over there?"

It may be quite normal to have high or low values

People can be short or tall


Some days there is no rain, other days there can be a downpour
Athletes can perform better or worse on different days

Or there may be an unusual reason for extreme data

Example: Long Jump (continued)

We find out that Sam was feeling sick that day. Not the coach's fault at all.

So it is a good idea in this case to remove Sam's result.

When we remove outliers we are changing the data, it is no longer "pure", so we shouldn't just
get rid of the outliers without a good reason!

And when we do get rid of them, we should explain what we are doing and why.

Mean, Median and Mode


We saw how outliers affect the mean , but what about the median or mode ?

Example: Long Jump (continued)

The median ("middle" value):

including Sam is: 0.085


without Sam is: 0.11 (went up a little)

The mode (the most common value):

including Sam is: 0.06


without Sam is: 0.06 (stayed the same)

The mode and median didn't change very much.

They also stayed around where most of the data is.

So it seems that outliers have the biggest effect on the mean, and not so much on the median or
mode.

Hint: calculate the median and mode when you have outliers.
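
The effect of Sam's result on each measure can be checked with Python's standard statistics module. A small sketch, with the long jump changes stored in a dictionary (an illustrative layout):

```python
from statistics import mean, median, mode

changes = {"Augustus": 0.15, "Tom": 0.11, "June": 0.06,
           "Carol": 0.06, "Bob": 0.12, "Sam": -0.56}

with_sam = list(changes.values())
without_sam = [change for name, change in changes.items() if name != "Sam"]

for label, data in (("including Sam", with_sam), ("without Sam", without_sam)):
    # The mean moves a lot; the median and mode barely move.
    print(label, round(mean(data), 3), round(median(data), 3), mode(data))
```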

Correlation
When two sets of data are strongly linked together we say they have a High Correlation.

The word Correlation is made of Co- (meaning "together"), and Relation

Correlation is Positive when the values increase together, and


Correlation is Negative when one value decreases as the other increases

Here we look at linear correlations (correlations that follow a line).


Correlation can have a value:
1 is a perfect positive correlation
0 is no correlation (the values don't seem linked at all)
-1 is a perfect negative correlation

The value shows how good the correlation is (not how steep the line is), and if it is positive or
negative.

Example: Ice Cream Sales


The local ice cream shop keeps track of how much ice cream they sell versus the temperature on
that day, here are their figures for the last 12 days:

Ice Cream Sales vs Temperature

Temperature °C    Ice Cream Sales

14.2 $215

16.4 $325

11.9 $185

15.2 $332

18.5 $406

22.1 $522

19.4 $412

25.1 $614

23.4 $544

18.1 $421

22.6 $445
17.2 $408

And here is the same data as a Scatter Plot :

We can easily see that warmer weather and higher sales go together. The relationship is good
but not perfect.

In fact the correlation is 0.9575 ... see at the end how I calculated it.

Correlation Is Not Good at Curves


The correlation calculation only works well for relationships that follow a straight line.

Our Ice Cream Example: there has been a heat wave!

It gets so hot that people aren't going near the shop, and sales start dropping.

Here is the latest graph:

The correlation value is now 0: "No Correlation" ... !

The calculated correlation value is 0 (I worked it out), which means "no correlation".

But we can see the data does have a correlation: it follows a nice curve that reaches a peak
around 25 °C.

But the linear correlation calculation is not "smart" enough to see this.

Moral of the story: make a Scatter Plot , and look at it!


You may see a correlation that the calculation does not.

Correlation Is Not Causation


"Correlation Is Not Causation" ... which says that a correlation does not mean that one thing
causes the other (there could be other reasons the data has a good correlation).

Example: Sunglasses vs Ice Cream

Our Ice Cream shop finds how many sunglasses were sold by a big store for each day and
compares them to their ice cream sales:

The correlation between Sunglasses and Ice Cream sales is high

Does this mean that sunglasses make people want ice cream?

Example: A Real Case!

A few years ago a survey of employees found a strong positive correlation between "Studying
an external course" and Sick Days.

Does this mean:

Studying makes them sick?


Sick people study a lot?
Or did they lie about being sick to study more?

Without further research we can't be sure why.

How To Calculate
How did I calculate the value 0.9575 at the top?

I used "Pearson's Correlation". There is software that can calculate it, such as the CORREL()
function in Excel or LibreOffice Calc ...
... but here is how to calculate it yourself:

Let us call the two sets of data "x" and "y" (in our case Temperature is x and Ice Cream Sales
is y):

Step 1: Find the mean of x, and the mean of y


Step 2: Subtract the mean of x from every x value (call them "a"), do the same for y (call
them "b")
Step 3: Calculate: a × b, a² and b² for every value
Step 4: Sum up a × b, sum up a² and sum up b²
Step 5: Divide the sum of a × b by the square root of [(sum of a²) × (sum of b²)]

Here is how I calculated the first Ice Cream example (values rounded to 1 or 0 decimal places):

As a formula it is:

r = Σ(x - x̄)(y - ȳ) / √[ Σ(x - x̄)² × Σ(y - ȳ)² ]

Where:

Σ is Sigma, the symbol for "sum up"
(x - x̄) is each x-value minus the mean of x (called "a" above)
(y - ȳ) is each y-value minus the mean of y (called "b" above)

You probably won't have to calculate it like that, but at least you know it is not "magic", but
simply a routine set of calculations.
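The five steps above can be sketched in Python. This is only an illustration of the routine: the function name pearson_r and the two short lists are made up for the example, and they are not the ice cream data used earlier.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation, following Steps 1 to 5 above."""
    n = len(xs)
    mean_x = sum(xs) / n                             # Step 1: the two means
    mean_y = sum(ys) / n
    a = [x - mean_x for x in xs]                     # Step 2: subtract the means
    b = [y - mean_y for y in ys]
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))    # Steps 3 and 4
    sum_a2 = sum(ai * ai for ai in a)
    sum_b2 = sum(bi * bi for bi in b)
    return sum_ab / sqrt(sum_a2 * sum_b2)            # Step 5

# Hypothetical values, invented for this sketch
temps = [14, 16, 18, 20, 22]
sales = [210, 330, 400, 410, 520]
print(round(pearson_r(temps, sales), 4))
```

Data that follows a symmetric curve, such as x = -1, 0, 1 with y = 1, 0, 1, gives a value of 0 from this function, which matches the "not smart enough" point made earlier.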

Note for Programmers


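You can calculate it in one pass through the data. Just sum up x, y, x², y² and xy (no need
for a or b calculations above) then use the formula:

r = (n Σxy - Σx Σy) / √[ (n Σx² - (Σx)²) × (n Σy² - (Σy)²) ]

where n is the number of (x, y) pairs.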
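Here is a minimal sketch of that one-pass idea, assuming the usual running-sums form of Pearson's formula (written out in the comment). The name pearson_one_pass is just a label chosen for this example.

```python
from math import sqrt

def pearson_one_pass(pairs):
    """One loop over (x, y) pairs, keeping only running sums.

    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
    """
    n = sx = sy = sxx = syy = sxy = 0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

print(pearson_one_pass([(1, 2), (2, 4), (3, 6)]))  # 1.0
```

One thing to keep in mind: with very large values the subtractions can lose precision in floating point, so the mean-subtracting method may be the safer choice there.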

Other Methods
There are other ways to calculate a correlation coefficient, such as "Spearman's rank correlation
coefficient", but I prefer using a spreadsheet like above.

Probability
How likely something is to happen.

Many events can't be predicted with total certainty. The best we can say is how likely they are
to happen, using the idea of probability.

Tossing a Coin

When a coin is tossed, there are two possible outcomes:

heads (H) or
tails (T)

We say that the probability of the coin landing H is ½

And the probability of the coin landing T is ½

Throwing Dice

When a single die is thrown, there are six possible outcomes: 1, 2, 3, 4, 5, 6.

The probability of any one of them is 1/6

Probability
In general:

Probability of an event happening = (Number of ways it can happen) / (Total number of outcomes)

Example: the chances of rolling a "4" with a die

Number of ways it can happen: 1 (there is only 1 face with a "4" on it)

Total number of outcomes: 6 (there are 6 faces altogether)

So the probability = 1/6

Example: there are 5 marbles in a bag: 4 are blue, and 1 is red. What is the
probability that a blue marble gets picked?

Number of ways it can happen: 4 (there are 4 blues)

Total number of outcomes: 5 (there are 5 marbles in total)

So the probability = 4/5 = 0.8
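The same "ways divided by total" calculation can be sketched in Python. The helper name probability is made up here, and Fraction is used simply so the answers stay as exact fractions.

```python
from fractions import Fraction

def probability(ways, total_outcomes):
    """Number of ways it can happen divided by total number of outcomes."""
    return Fraction(ways, total_outcomes)

print(probability(1, 6))             # rolling a "4": 1/6
p_blue = probability(4, 5)           # 4 blue marbles out of 5
print(p_blue, float(p_blue))         # 4/5 0.8
```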

Probability Line
We can show probability on a Probability Line :

Probability is always between 0 and 1

Probability is Just a Guide


Probability does not tell us exactly what will happen, it is just a guide

Example: toss a coin 100 times, how many Heads will come up?

Probability says that heads have a ½ chance, so we can expect 50 Heads.


But when we actually try it we might get 48 heads, or 55 heads ... or anything really, but in
most cases it will be a number near 50.
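If you don't have a coin handy, a computer can do the tossing. This is a small sketch using Python's random module; count_heads is a name invented for the example, and your numbers will differ each time you run it.

```python
import random

def count_heads(tosses=100, rng=random):
    """Toss a fair coin `tosses` times and count the Heads."""
    return sum(rng.random() < 0.5 for _ in range(tosses))

# Five separate runs of 100 tosses: expect values near 50, but rarely exactly 50
for run in range(5):
    print(count_heads())
```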

Learn more at Probability Index .

Words
Some words have special meaning in Probability:

Experiment or Trial: an action where the result is uncertain.

Tossing a coin, throwing dice, seeing what pizza people choose are all examples of experiments.

Sample Space: all the possible outcomes of an experiment

Example: choosing a card from a deck

There are 52 cards in a deck (not including Jokers)

So the Sample Space is all 52 possible cards: {Ace of Hearts, 2 of Hearts, etc... }

The Sample Space is made up of Sample Points:

Sample Point: just one of the possible outcomes

Example: Deck of Cards

the 5 of Clubs is a sample point


the King of Hearts is a sample point

"King" is not a sample point. As there are 4 Kings that is 4 different sample points.

Event: a single result of an experiment

Example Events:

Getting a Tail when tossing a coin is an event


Rolling a "5" is an event.

An event can include one or more possible outcomes:

Choosing a "King" from a deck of cards (any of the 4 Kings) is an event


Rolling an "even number" (2, 4 or 6) is also an event

The Sample Space is all possible outcomes.

A Sample Point is just one possible outcome.

And an Event can be one or more of the possible outcomes.

Hey, let's use those words, so you get used to them:

Example: Alex wants to see how many times a "double" comes up when
throwing 2 dice.

Each time Alex throws the 2 dice is an Experiment.

It is an Experiment because the result is uncertain.

The Event Alex is looking for is a "double", where both dice have the same number. It is made
up of these 6 Sample Points:

{1,1} {2,2} {3,3} {4,4} {5,5} and {6,6}

The Sample Space is all possible outcomes (36 Sample Points):

{1,1} {1,2} {1,3} {1,4} ... {6,3} {6,4} {6,5} {6,6}

These are Alex's Results:


Experiment Is it a Double?

{3,4} No

{5,1} No

{2,2} Yes

{6,3} No

... ...

After 100 Experiments, Alex has 19 "double" Events ... is that close to what you would expect?
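To check what we would expect, here is a short sketch that lists the Sample Space and picks out the Event. The variable names are chosen just for this example.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))     # all 36 Sample Points
doubles = [p for p in sample_space if p[0] == p[1]]     # the Event: a "double"

p_double = Fraction(len(doubles), len(sample_space))
print(len(sample_space), len(doubles), p_double)        # 36 6 1/6
print(float(100 * p_double))   # about 16.7 doubles expected in 100 throws
```

So Alex's 19 is a little higher than the expected 16 or 17, which is quite normal for only 100 throws.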

Probability Line
Probability is the chance that something will happen. It can be shown on a line:

The probability of an event occurring is somewhere between impossible and certain.

As well as words, we can use numbers (such as fractions or decimals) to show the probability of
something happening:

Impossible is zero
Certain is one.

Here are some fractions on the probability line:

We can also show the chance that something will happen:

a) The sun will rise tomorrow


b) I will not have to learn mathematics at school
c) If I flip a coin it will land heads up
d) Choosing a red ball from a sack with 1 red ball and 3 green balls

Between 0 and 1
The probability of an event will not be less than 0.
This is because 0 is impossible (sure that something will not happen).
The probability of an event will not be more than 1.
This is because 1 is certain that something will happen.

The Basic Counting Principle


When there are m ways to do one thing,
and n ways to do another,
then there are m×n ways of doing both.

Example: you have 3 shirts and 4 pants.

That means 3×4=12 different outfits.

Example: There are 6 flavors of ice-cream, and 3 different cones.

That means 6×3=18 different single-scoop ice-creams you could order.

It also works when you have more than 2 choices:

Example: You are buying a new car.

There are 2 body styles: sedan or hatchback

There are 5 colors available.

There are 3 models:

GL (standard model)
SS (sports model with bigger engine)
SL (luxury model with leather seats)

How many total choices?

You can see in this "tree" diagram:

You can count the choices, or just do the simple calculation:

Total Choices = 2 × 5 × 3 = 30

Independent or Dependent?
But it only works when all choices are independent of each other.

If one choice affects another choice (i.e. depends on another choice), then a simple
multiplication is not right.

Example: You are buying a new car ... but ...

the salesman says "You can't choose black for the hatchback" ... well then things change!

You now have only 27 choices.

Because your choices are not independent of each other.

But you can still make your life easier with this calculation:

Choices = 5×3 + 4×3 = 15 + 12 = 27

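Both car counts can be checked by listing every combination. In this sketch only "black" comes from the example; the other four color names are placeholders, because the text does not name them.

```python
from itertools import product

bodies = ["sedan", "hatchback"]
colors = ["black", "color2", "color3", "color4", "color5"]  # placeholders except black
models = ["GL", "SS", "SL"]

# Independent choices: every body with every color with every model
all_choices = list(product(bodies, colors, models))

# Dependent choices: the salesman's rule removes black hatchbacks
allowed = [c for c in all_choices if not (c[0] == "hatchback" and c[1] == "black")]

print(len(all_choices), len(allowed))  # 30 27
```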
Relative Frequency
How often something happens divided by all outcomes.

Example: Your team has won 9 games from a total of 12 games played:

the Frequency of winning is 9


the Relative Frequency of winning is 9/12 = 75%

All the Relative Frequencies add up to 1 (except for any rounding error).

Example: Travel Survey

92 people were asked how they got to work:

35 used a car
42 took public transport
8 rode a bicycle
7 walked

The Relative Frequencies (to 2 decimal places) are:

Car: 35/92 = 0.38


Public Transport: 42/92 = 0.46
Bicycle: 8/92 = 0.09
Walking: 7/92 = 0.08

0.38+0.46+0.09+0.08 = 1.01

(It would be exactly 1 if we had used perfect accuracy.)
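The travel survey numbers can be worked out in a few lines. This is just a sketch of the arithmetic above, with variable names made up for the example.

```python
counts = {"Car": 35, "Public Transport": 42, "Bicycle": 8, "Walking": 7}
total = sum(counts.values())                       # 92 people

relative = {k: v / total for k, v in counts.items()}
rounded = {k: round(v, 2) for k, v in relative.items()}

print(rounded)
print(round(sum(rounded.values()), 2))   # 1.01 because of rounding
print(round(sum(relative.values()), 10)) # 1.0 when we do not round first
```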

Activity: An Experiment with a Die


You will need:

A single die

Interesting point
Many people think that one of these cubes is called "a dice". But no!

The plural is dice, but the singular is die. (i.e. 1 die, 2 dice.)

The common die has six faces:

We usually call the faces 1, 2, 3, 4, 5 and 6.

High, Low, and Most Likely


Before we start, let's think about what might happen.

Question: If you roll a die:

1. What is the least possible score?


2. What is the greatest possible score?
3. What do you think is the most likely score?

The first two questions are quite easy to answer:

1. The least possible score must be 1


2. The greatest possible score must be 6
3. The most likely score is ... ???

Are they all just as likely? Or will some happen more often?

Let us see which is most likely ...

The Experiment
Throw a die 60 times,
record the scores in a tally table.

You can record the results in this table using tally marks :

Score Tally Frequency

Total Frequency = 60

OK, Go!

... ...

Finished ...?

Now draw a bar graph to illustrate your results.

You can fill in this one:

Or you can use Data Graphs (Bar, Line and Pie) then print it out.

You may get something like this:

Are the bars all the same height?


If not ... why not?

60 Throws
OK, why did I ask you to make 60 throws? Well, 6 throws is not enough for good results. 600
will give good results but is a lot of work. So 60 seems OK, and is also 10 lots of 6.

So we should expect 10 of each number, like this:

Those are the theoretical values,


as opposed to the experimental ones you got from your experiment!

How do those theoretical results compare with your experimental results?

This graph and your graph should be similar, but they are not likely to be exactly the same, as
your experiment relied on chance, and the number of times you did it was fairly small.

If you did the experiment a very large number of times, you would get results much closer to the
theoretical ones.
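You can see this effect with a simulated die. The sketch below assumes a fair die and uses Python's random module; roll_counts is a name invented for the example. The printed values are the fraction of throws for each face, and they should get closer to 1/6 (about 0.167) as the number of throws grows.

```python
import random
from collections import Counter

def roll_counts(throws, rng=random):
    """Throw a fair die `throws` times; return how often faces 1 to 6 came up."""
    rolls = Counter(rng.randint(1, 6) for _ in range(throws))
    return [rolls[face] for face in range(1, 7)]

for throws in (60, 600, 60000):
    counts = roll_counts(throws)
    print(throws, [round(c / throws, 3) for c in counts])
```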
Questions
Which face came up most often? ____

Which face came up least often? ____

Do you think you would get the same results if you did this again? Yes / No

An experiment gives results.


When done again it may give different results!
So it is important to know when results are good quality, or just random.

Probability
On the page Probability you will find a formula:

Probability of an event happening = (Number of ways it can happen) / (Total number of outcomes)

Example: Probability of a 2

We know there are 6 possible outcomes.

And there is only 1 way to get a 2.

So the probability of getting 2 is:

Probability of a 2 = 1/6

Doing that for each score gets us:

Score Probability

1 1/6

2 1/6

3 1/6

4 1/6

5 1/6

6 1/6
Total = 1

The sum of all the probabilities is 1

For any experiment:

The sum of the probabilities of all possible outcomes is always equal to 1

Activity: An Experiment with Dice


Let's throw two dice and add the scores ...

You will need:

Two dice

Interesting point
Many people think that one of these cubes is called "a dice". But no!

The plural is dice, but the singular is die: i.e. 1 die, 2 dice.

The common die has six faces:

We usually call the faces 1, 2, 3, 4, 5 and 6.

Throwing Two Dice and Adding the Scores ...


Example: when one die shows 2 and the other shows 6 the total score is 2 + 6 = 8

Question: Can you get a total of 8 any other way?


What about 6 + 2 = 8 (the other way around), is that a different way?

Yes! Because the two dice are different.

Example: imagine one die is colored red and the other is colored blue.

There are two possibilities:

So 2 + 6 and 6 + 2 are different.

And you can get 8 with other numbers, such as 3 + 5 = 8 and 4 + 4 = 8

High, Low, and Most Likely


Before we start, let's think about what might happen.

Question: If you throw 2 dice together and add the two scores:

1. What is the least possible total score?


2. What is the greatest possible total score?
3. What do you think is the most likely total score?

The first two questions are quite easy to answer:

1. The least possible total score must be 1 + 1 = 2


2. The greatest possible total score must be 6 + 6 = 12
3. The most likely total score is ... ???

Are they all just as likely? Or will some happen more often?

To help answer the third question let us try an experiment.

The Experiment
Throw two dice together 108 times,
add the scores together each time,
record the scores in a tally table.

Why 108? That seems a strange number to choose. I will explain later.

You can record the results in this table using tally marks :

Added Scores    Tally    Frequency

10

11

12

Total Frequency = 108

OK, Go!

...
...

Finished ...?

Now draw a bar graph to show your results.

Or you can use Data Graphs (Bar, Line and Pie) then print it out.

You may get something like this:

Are the bars all about the same height?


If not ... why not?

So Why Did We Get That Shape?


The explanation is simple:

There is only one way to get a total of 2 (1 + 1),

but there are six ways of getting a total of 7 (1 + 6, 2 + 5, 3 + 4, 4 + 3, 5 + 2 and 6 + 1)

Here is a table of all possible outcomes, and the totals. I have also shown what adds to 7
in bold.

                        Score on One Die
                        1    2    3    4    5    6

Score on the      1     2    3    4    5    6    7
Other Die         2     3    4    5    6    7    8
                  3     4    5    6    7    8    9
                  4     5    6    7    8    9   10
                  5     6    7    8    9   10   11
                  6     7    8    9   10   11   12

You can see there is only 1 way to get 2, there are 2 ways to get 3, and so on.

Let us count the ways of getting each total and put them in a table:

Total Score    Number of Ways to Get Score

2 1

3 2

4 3

5 4

6 5

7 6

8 5

9 4

10 3

11 2

12 1

Total = 36
Can you see the Symmetry in this table?

2 and 12 have the same number of ways = 1 each


3 and 11 have the same number of ways = 2 each
4 and 10 have the same number of ways = 3 each
5 and 9 have the same number of ways = 4 each
6 and 8 have the same number of ways = 5 each

108 Throws
OK, why 108 throws? Well, 36 throws isn't enough for good results, and 360 throws would be
great but takes a long time. So 108 (which is 3 lots of 36) seems just right.

So let's multiply all these numbers by 3 to match our total of 108:

Total Score    Expected Number of Times in 108 Throws

2 3

3 6

4 9

5 12

6 15

7 18

8 15

9 12

10 9

11 6

12 3

Total = 108
Those are the theoretical values, as opposed to the experimental ones you got from your
experiment.

The theoretical values look like this in a bar graph:

How do these theoretical results compare with your experimental results?

This graph and your graph should be quite similar, but they are not likely to be exactly the same,
as your experiment relied on chance, and the number of times you did it was fairly small.

If you did the experiment a very large number of times, you should get results much closer to
the theoretical ones.

And, by the way, we've now answered the question from near the beginning of the experiment:

What is the most likely total score?

7 has the highest bar, so 7 is the most likely total score.

Hey, is that why people talk about Lucky 7 ... ?

Probability
On the page Probability you will find a formula:

Probability of an event happening = (Number of ways it can happen) / (Total number of outcomes)

Example: Probability of a total of 2

We know there are 36 possible outcomes.


And there is only 1 way to get a total score of 2.

So the probability of getting 2 is:

Probability of a 2 = 1/36

Doing that for each score gets us:

Total Score    Probability

2 1/36

3 2/36

4 3/36

5 4/36

6 5/36

7 6/36

8 5/36

9 4/36

10 3/36

11 2/36

12 1/36

Total = 1

(Note: I didn't simplify the fractions)

The sum of all the probabilities is 1

For any experiment:

The sum of the probabilities of all possible outcomes is always equal to 1
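All three tables for two dice (ways, expected count in 108 throws, and probability) can be rebuilt by counting. This is a small sketch with names made up for the example; the fractions are printed over 36 without simplifying, to match the table above.

```python
from collections import Counter
from itertools import product

# Count how many of the 36 outcomes give each total
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

print("Total  Ways  In 108 throws  Probability")
for total in range(2, 13):
    n = ways[total]
    print(f"{total:>5}  {n:>4}  {3 * n:>13}  {n}/36")

print(sum(ways.values()))                          # 36 outcomes altogether
print(max(ways, key=ways.get))                     # 7 is the most likely total
```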

Activity: Dropping a Coin onto a Grid


A few hundred years ago people enjoyed betting on coins tossed on to the floor ... would they
cross a line or not?

A man (Georges-Louis Leclerc, the Count of Buffon, see " Buffon's Needle ") started thinking
about this and worked out how to calculate the probability .

Now it is your turn to have a go!

You will need:

A small round coin,

such as a US penny, a 1c Euro or 5 Rupee.

A sheet of paper with a grid of 30 mm squares.

Steps
Measure the diameter of your coin: ____ mm

a US Penny is 19mm, a 1c Euro is 16.25mm, a Rs 5 is 23mm

Also measure the spacing of your grid (it may not print at exactly 30mm): ____ mm

Put your sheet of paper on a flat surface such as a table top or the floor.

From a height of about 5cm, drop the coin onto the paper and record whether it lands:
A: Completely inside a square (not touching any grid lines)

B: Crosses one or more lines

The exact height from which you drop the coin is not important, but don't drop it so close to the
paper that you are cheating!

If the coin rolls completely off the paper, then do not count that turn.

100 Times
Now we will drop the coin 100 times, but first ...

... what percentage do you think will land A, or B?

Make a guess (estimate) before you begin the experiment:

Your Guess for "A" (%):

Your Guess for "B" (%):

OK let's begin.

Drop the coin 100 times and record A (does not touch a line) or B (touches a line) using Tally
Marks :

Coin lands Tally Frequency Percentage

Totals: 100 100%


Now draw a Bar Graph to illustrate your results. You can create one at Data Graphs (Bar, Line
and Pie) .

Are the bars the same height?


Did you expect them to be?
How does the result compare with your guess?

We Can Calculate What It Should Be ...


Here are some positions for the coin to land so it does not quite touch one of the lines:

Place your coin on your grid (like above), and then put a mark on the paper where the center of
the coin is (just a rough estimate will do).

See how the coin's center is one radius r away from a line.

(Read about a Circle's Radius and Diameter .)

Make lots of "center marks" then draw a box connecting them all like below:

d = Coin's diameter (2 × r)

When a coin's center is within the yellow box it won't touch any line.

The yellow box is smaller than the grid by two radiuses (= one diameter) of the coin.
So what are the areas?

The area of the grid square is 30 × 30 = 900 mm²

The area of the yellow box is (30-d) × (30-d) = (30-d)² mm²

The above calculation was for a 30 mm grid, but we can use S for grid size:

The area of the grid square is S × S = S² mm²

The area of the yellow box is (S-d)² mm²

Example: A 1c Euro (d=16.25 mm) on a 29mm grid (S=29 mm):

Grid Square = 29² = 841 mm²

Yellow Box = (29-16.25)² = 12.75² = 162.6 mm² (to 1 decimal place)

So you should expect the coin to land not crossing a line of the grid approximately:

"A" = 162.6 / 841 = 19.3% of the time

And "B" = 100% - 19.3% = 80.7%
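Here is a sketch of that calculation, plus a simulated version of the experiment. It assumes the coin's center is equally likely to land anywhere in a grid square; the function names p_inside and simulate are invented for the example.

```python
import random

def p_inside(grid, diameter):
    """Theory: chance the coin lands completely inside a square (case A)."""
    if diameter >= grid:
        return 0.0
    return (grid - diameter) ** 2 / grid ** 2

def simulate(grid, diameter, drops=100_000, rng=random):
    """Drop the coin's center at random points of one square and count case A."""
    r = diameter / 2
    inside = 0
    for _ in range(drops):
        cx, cy = rng.uniform(0, grid), rng.uniform(0, grid)
        if r < cx < grid - r and r < cy < grid - r:
            inside += 1
    return inside / drops

p_a = p_inside(29, 16.25)
print(f"A = {p_a:.1%}, B = {1 - p_a:.1%}")   # A = 19.3%, B = 80.7%
print(simulate(29, 16.25))                   # should be near 0.193
```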

Now do the calculations for your own grid size and coin size.

Grid Spacing S (mm):

Diameter of Coin d (mm):

Area of Grid Square = S² (mm²):

Area of Yellow Box = (S-d)² (mm²):

"A" (%):

"B" (%):

How do these theoretical results compare with your experimental results?

It won't be exact (because it is a random thing) but it may be close.

Different Sizes of Coin


Try repeating the experiment using a different sized coin.

First calculate the theoretical value ... how does this affect the values for A and B?
Then do the experiment to see how close it gets.

What You Have Done


You have (hopefully) had fun running an experiment.

You have done some geometry, and had some experience calculating areas and probabilities.

And you have seen the relationship between theory and reality.

Activity: Buffon's Needle


How to estimate Pi by dropping a match.

A few hundred years ago people enjoyed betting on coins tossed on to the floor: would the coin
cross a line or not?

A man (Georges-Louis Leclerc, the Count of Buffon) started thinking about this and worked out
the probability .

It is called "Buffon's Needle" in his honor.

Now it is your turn to have a go!

You will need:

A match, with the head cut off.


It must be less than 50 mm.

(You can use a needle, but be careful!)


A sheet of paper with lines 50 mm apart.

Steps
Measure the spacing of your lines (it may not print at exactly 50mm): ____ mm
Measure the length of your match (must be less than the line spacing): ____ mm
Make sure your sheet of paper is on a flat surface such as a table top or the floor.
From a height of about 5cm, drop the match onto the paper and record whether it lands:

A: Not touching a line

B: Touching or crossing a line

The exact height from which you drop the match is not important, but don't drop it so close to
the paper that you are cheating!

If the match rolls completely off the paper, then do not count that turn.

100 Times
Now we will drop the match 100 times, but first ...

... what percentage do you think will land A, or B?

Make a guess (estimate) before you begin the experiment:

Your Guess for "A" (%):

Your Guess for "B" (%):

OK let's begin.
Drop the match 100 times and record A (does not touch a grid line) or B (touches or crosses a
grid line) using Tally Marks :

Match lands       Tally    Frequency    Percentage

A (no touch)

B (crosses)

Totals:                    100          100%

Now draw a Bar Graph to illustrate your results. You can create one at Data Graphs (Bar, Line
and Pie) .

Are the bars the same height?


Did you expect them to be?
How does the result compare with your guess?

Now Let's Estimate Pi


Buffon used the results from his experiment with a needle to estimate the value of π (Pi). He
worked out this formula:

π ≈ 2L / (x × p)

Where

L is the length of the needle (or match in our case)


x is the line spacing (50 mm for us)
p is the proportion of needles crossing a line (case B)

We can do it too!
Example: Sam had a match of length 31 mm, and a 40 mm line spacing and 49
of 100 drops crossed the line

So Sam had:

L = 31
x = 40
p = 49/100 = 0.49

Substituting these values into the formula, Sam got:

π ≈ (2 × 31) / (40 × 0.49) = 62 / 19.6 ≈ 3.16

Now it's your turn. Fill in the following table using your own results:

Length of match "L" (mm):

Line Spacing "x" (mm):

p (the proportion of needles crossing a line):

And do the calculation:

π ≈ 2L / (x × p) = (2 × _____) / (_____ × _____) ≈ _____

Did you do any better?

It won't be exact (because it is a random thing) but it may be close.
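If you would like the computer to do the arithmetic, here is a small sketch of the estimate, 2L divided by (x times p). The function name estimate_pi is made up for this example; the numbers are Sam's from above.

```python
def estimate_pi(length, spacing, crossings, drops):
    """Buffon's estimate: pi is roughly 2L / (x * p), where p = crossings / drops."""
    p = crossings / drops
    return 2 * length / (spacing * p)

print(round(estimate_pi(31, 40, 49, 100), 2))   # 3.16
```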

Changing The Subject


The next part of this activity is to "change the subject" of the formula to work out the perfect
value of "p" (the proportion of times the match crosses the line):

Start with: π ≈ 2L / (x × p)

multiply both sides by p: π × p ≈ 2L / x

divide both sides by π: p ≈ 2L / (π × x)

And we get:

p ≈ 2L / (π × x)

Example: Alex had a match of length 36 mm, and a 50 mm line spacing.

So Alex had:

L = 36
x = 50

Substituting these values into the formula, Alex got:

p ≈ (2 × 36) / (π × 50) ≈ 0.46

So Alex should expect the match to cross the line (case B) 46 times out of 100
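The expected value of p can also be compared with a simulated experiment. This sketch assumes the usual model for Buffon's Needle: the match's center lands at a random distance from the nearest line, at a random angle. The function names are invented for the example, and note that the simulation itself uses π to pick the angle, so it is a check of the formula rather than a way to discover π.

```python
import random
from math import pi, sin

def expected_crossing(length, spacing):
    """Theory: p is roughly 2L / (pi * x), for a match shorter than the spacing."""
    return 2 * length / (pi * spacing)

def simulate_crossing(length, spacing, drops=100_000, rng=random):
    """Drop a match `drops` times and return the proportion that cross a line."""
    hits = 0
    for _ in range(drops):
        centre = rng.uniform(0, spacing / 2)   # distance from center to nearest line
        angle = rng.uniform(0, pi / 2)         # angle between the match and the lines
        if centre <= (length / 2) * sin(angle):
            hits += 1
    return hits / drops

print(round(expected_crossing(36, 50), 3))   # 0.458
print(simulate_crossing(36, 50))             # should be near 0.458
```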

Fill in the following table using your own results:

Length of match "L" (mm):

Line Spacing "x" (mm):

Estimate for p (≈ 2L / (π × x)):

How close were you?

Different Size of Match


Try repeating the experiment using a different sized match (but not larger than the line spacing!)

Did you get better or worse results?

What You Have Done


You have (hopefully) had fun running an experiment.

You have had some experience with calculations.

And you have seen the relationship between theory and reality.

Random Words
Probability and English ... what a mix!

Random Letters
You would think it was easy to create random words ... just pick letters randomly and put them
together, and voila! a random word.

Well, here are 20 words made that way:

tldkl oewkx dmwol vuptg hvwjk naqid avypr zwtip zgnzs bvdhd
muyfd ighgd xhlng oyecn vjnsl ssjrx gxald tukxj rvfoq yxzxq

It turns out that the words are not only nonsense, but quite hard to pronounce!

(Try saying "tldkl" or "oewkx")

You see, the probability of making a real word this way is very low ... you would have to try
lots of random combinations before getting lucky.

Why? Well, English has around 200,000 words (228,000 in the Oxford English Dictionary
including many words no longer used) ... but how many different words can be made with just 5
letters?

26 × 26 × 26 × 26 × 26 = 11,881,376 possible 5 letter words!

And that is just the 5 letter words ...

Let us guess that there are 40,000 words in English that have 5 letters. So the probability of
making a real word just randomly would be:

40,000 / 11,881,376 = 0.003, or about 0.3% chance


So real words are rare. And we can see that putting random letters together is very unlikely to
produce a real word.
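Here is the same arithmetic as a short sketch, together with a way to make purely random 5 letter words. The 40,000 is the guess made above, and random_word is a name chosen for this example.

```python
import random
import string

possible = 26 ** 5
guess_real_words = 40_000            # the guess from the text, not a measured value

print(possible)                                   # 11881376
print(round(guess_real_words / possible, 4))      # 0.0034

def random_word(length=5, rng=random):
    """Pick each letter with equal chance, like the first list of words."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

print(" ".join(random_word() for _ in range(10)))
```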

Vowels
We can improve our success by insisting that a word have at least one vowel, since nearly every
word in English has one (except fly, by and a few others). Like this:

ectot gjaqv kuifg vzicu zspsu pdidb wqdis uerrs ucgej okimw
fnevz ewxko ljgew aglgo jpfoq dcytu uwkcj dzioy wekdx xuybk

This is a great improvement. More words can be pronounced.

But there are still lots of strange words like "zspsu" and "xuybk"

Letter Frequency
So, our next improvement is to use fewer of the letters like j, x, z and q and more of the
letters like e, t and s.

In fact the frequency of letters in the English Language is well known. Here is how many times
you would expect to see a letter in every 1,000 letters:

a b c d e f g h i j k l m n o p q r s t u v w x y z

82 15 28 42 127 22 20 61 70 2 8 40 24 67 75 19 1 60 63 90 27 10 24 2 20 1

Can you see that "e" is common, but "z" is rare?

"e" is likely to occur 127 times in every 1,000, or as a ratio 127/1000 = .127 (=12.7%)
"z" is likely to occur only 1 time in every 1,000, or as a ratio 1/1000 = .001 (=0.1%)

So, by selecting letters based on that frequency (a bit like rolling a 1,000 sided die that
has 82 faces marked a, 15 marked b ... and only one marked z), we can get output like this:

elnao etgov segty laast aessn siuon oenha eaoas ncoot ctwka
dmswo dpuoh eewis ebdni laarm syucs idvos lhina igahh soyie
Still no real words, but some are close. And most of them can be pronounced. (Great names if
you are writing a science fiction novel!)
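The "1,000 sided die" can be sketched with random.choices, which picks items according to weights. The weights below are the per-1,000 counts from the table above; weighted_word is a name made up for the example.

```python
import random
import string

# Times each letter a..z is expected in every 1,000 letters (from the table above)
freq = [82, 15, 28, 42, 127, 22, 20, 61, 70, 2, 8, 40, 24,
        67, 75, 19, 1, 60, 63, 90, 27, 10, 24, 2, 20, 1]

def weighted_word(length=5, rng=random):
    """Build a word where common letters like "e" turn up more often."""
    return "".join(rng.choices(string.ascii_lowercase, weights=freq, k=length))

print(" ".join(weighted_word() for _ in range(10)))
```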

But we can do better ...

2-Letter Frequencies
We can take the idea of Letter Frequency one step further by asking

"what is the frequency of letters that follow another letter"

For example, if we already have a "t", the next letter is very likely to be an "h" (making "th").

To illustrate this, I built up a Table of Two-Letter Frequencies (from Alice's Adventures in


Wonderland). Here is the line for "t":

Freq a b c d e f g h i j k l m n o p q r s t u v w x y z

t 238 41 727 11 3197 459 275 18 12 990 149 153 333 125 65 54

So, "h" occurred 3197 times after a "t" ("th") ... but "b" never followed a "t"

OK, let us start with a "t", and let us say we choose an "h" to make "th", then next we would use
the "h"-row to choose another letter (maybe an "e" to make "the"), and so on ... well, here is a
sample:

the cur the bund hof arytowno d sheromasees asemedosouro f


soacthake d imon binofowat oaten d heng wa

The results are remarkable ... nonsense, but almost like some strange language.

In fact we are not just making random words now, we are making random sentences!

Higher Letter Frequencies


Why stop there? We can make tables of three letter frequencies or more ...

3 Letter Frequencies
How do 3 Letter Frequencies work?

Well, say I already have two letters (like "ei") ... we then:

look through the sample text for every time "ei" appears,
randomly choose one of those
look for the letter following "ei" (possibly "t").
then add the "t" to make "eit"
and start again using "it" (... always the last two letters)

Here is a sample:

Either great into get very deep welled of it it, and


to wondere started into the book about hear!

Now, that looks good! By sampling from a real source we can get good results.
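The "look for every place the last letters appear, pick one, take the next letter" idea can be sketched like this. It is only an illustration: generate is a made-up name, and the tiny sample sentence is invented for the example (a long real text gives much better results).

```python
import random

def generate(text, order=2, length=80, rng=random):
    """Grow a string where each new letter follows the last `order` letters
    somewhere in the sample text (order=2 gives 3 Letter Frequencies)."""
    out = text[:order]
    for _ in range(length):
        key = out[-order:]
        # The letter after every place the key appears in the sample
        followers = [text[i + order] for i in range(len(text) - order)
                     if text[i:i + order] == key]
        if not followers:        # the key only appears at the very end
            break
        out += rng.choice(followers)
    return out

sample = "the cat sat on the mat and then the rat ate the oat that sat there"
print(generate(sample, order=2))
```

Changing order to 3 or 4 gives the 4 and 5 Letter Frequency versions.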

4 Letter Frequencies

Using the same method I used groups of 3 Letters to decide on the 4th letter and got:

Either the sides or conversations in time to


happen next. First, she look down mind

5 Letter Frequencies

And with 5 Letter frequencies:

There was just in time it all seemed quite natural);


but to take out of time as she had not like to do



Lotteries
A lottery is a type of gambling where people buy tickets, and then win if their numbers get
chosen.
A "lot" is something that happens by chance. You may have heard people say "let us decide by
drawing lots" or "so that is my lot".

Rules
Different Lotteries have different rules.

Here we will use a typical lottery where the player chooses 6 different numbers out of 49.

Example:

You enter the lottery by buying a ticket and selecting your six numbers.

You choose: 1, 2, 12, 14, 20 and 21

On Saturday they draw the lottery, and the winning numbers are:

3, 12, 18, 20, 32 and 43

You matched two of the numbers (12 and 20):

Is that enough to win you anything? No.


Usually you must match at least three numbers to get a small prize.
Matching four numbers gets a bigger prize,
Matching five is even bigger.
But if you match ALL SIX of the numbers you might win millions.

Choosing Numbers

The numbers don't know what they are!


A Lottery is just as likely to come up "1,2,3,4,5,6" as "9,11,16,23,27,36"

Seriously!

Instead of numbers they could be symbols, or colors, the lottery would still work.

So it doesn't matter what numbers you choose, the chances are all the same.

More Likely Numbers?


So you have read that some numbers come up more often than others? Well of course they do,
that is random chance.

The people who run lotteries have strict rules to stop the "rigging" of results. But random chance
can sometimes produce strange results.

For example, using The Spinner I did 1000 spins for 10 numbers and came up with this:

Wow! 7 came up 115 times,


and 8 only 81 times.

Does this mean 7 will now come up more often, or less often? In fact it doesn't mean anything, 7
is just as likely as any number to get chosen.

Try it yourself and see what results you get.

Popular Numbers
But there is a trick! People have favorite numbers, so when popular numbers come up you are
sharing the winnings with lots of people.

Birthdays are popular choices, so people choose 1-12 and 1-31 more often. Also lucky numbers.

So maybe you should choose unpopular numbers so when you DO win you get lots of money.

(This assumes your lottery is one where prizes are shared among winners.)

Regret
Don't choose the same numbers every week. It's a trap! If you forget a week you then
worry that your numbers will come up, and this forces you to buy a ticket every week (even
when you have other more important things to do).

My advice:

Make a list of many unpopular numbers.


Choose randomly from this list every time.

Syndicates
A "Syndicate" is a group of people who all put in a little money so the group can buy lots of
tickets. The chance of winning goes up, but your payout each time is less (because you are
sharing).

Syndicates can be fun because they are sociable ... a way of making and keeping friendships.
Plus some syndicates like to spend small winnings on everyone going out for a meal together.

Another good reason for joining a syndicate is that your chances of winning go up (but what you
win goes down).

Think about it ... winning Ten Million would totally change your life, but winning One Million
would also improve your life a lot. You might prefer ten times the chance of winning the million.

Chance Of Winning the Big Prize


OK. What are the chances of you winning the big prize?
The chances of winning all 6 numbers is 1 in 13,983,816

You can use the Combinations and Permutations Calculator to work it out (use n=49, r=6, 'No'
for Is Order Important? and 'No' for Is Repetition allowed?)

The actual calculation is this:

49C6 = 49! / (43! × 6!) = 13,983,816

So how many times do you need to play to win?

1 Week

Suppose you play every week

The probability you win after 1 week is: 1/13,983,816 = 0.0000000715...

The probability you fail to win after 1 week is: 1 - 1/13,983,816 = 0.9999999285...

50 Years

Let's say you play for 50 years, that's 2,600 weeks.

The probability you fail to win over 2,600 weeks is:

(1 - 1/13,983,816)^2600 = 0.999814...

Which means the probability of winning (after 50 Years) is: 1 - 0.999814... = 0.000186...

Still only about 0.02%
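Here is a sketch of those lottery calculations, which you can reuse with your own lottery's rules. The function name p_win_at_least_once is made up for the example, and it assumes one ticket per week and 52 weeks per year.

```python
from math import comb

def p_win_at_least_once(numbers, picks, weeks):
    """Chance of matching all picks at least once, one ticket each week."""
    p_week = 1 / comb(numbers, picks)
    p_never = (1 - p_week) ** weeks
    return 1 - p_never

print(comb(49, 6))                                   # 13983816
p_50_years = p_win_at_least_once(49, 6, 50 * 52)
print(f"{p_50_years:.6f}")                           # 0.000186
print(f"{p_50_years:.2%}")                           # 0.02%
```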

And you would have spent thousands for that small chance.

You could have bought a new TV, a computer and phone for that money.

BUT it IS fun thinking "I just may win this week!"

Just keep it as a fun thing to do, OK?

Your Turn
Now your turn:

Find out the rules for winning Lotto in your area.


How many numbers do you have to choose and how many numbers do you choose from?
Calculate the probability of winning in any one week.
Calculate the probability of winning if you play every week for 50 years.
How much money could you have saved by not playing? What could you have bought?

Probability: Complement
Complement of an Event: All outcomes that are NOT the event.

When the event is Heads, the complement is Tails

When the event is {Monday, Wednesday} the complement is {Tuesday, Thursday, Friday,
Saturday, Sunday}

When the event is {Hearts} the complement is {Spades, Clubs, Diamonds, Jokers}

So the Complement of an event is all the other outcomes (not the ones we want).

And together the Event and its Complement make all possible outcomes.

Probability
Probability of an event happening = (Number of ways it can happen) / (Total number of outcomes)

Example: the chances of rolling a "4" with a die

Number of ways it can happen: 1 (there is only 1 face with a "4" on it)
Total number of outcomes: 6 (there are 6 faces altogether)

So the probability = 1/6

The probability of an event is shown using "P":

P(A) means "Probability of Event A"

The complement is shown by a little mark after the letter such as A' (or sometimes A^c or Ā):

P(A') means "Probability of the complement of Event A"

The two probabilities always add to 1

P(A) + P(A') = 1

Example: Rolling a "5" or "6"

Event A is {5, 6}

Number of ways it can happen: 2

Total number of outcomes: 6

P(A) = 2/6 = 1/3

The Complement of Event A is {1, 2, 3, 4}

Number of ways it can happen: 4

Total number of outcomes: 6

P(A') = 4/6 = 2/3

Let us add them:

P(A) + P(A') = 1/3 + 2/3 = 3/3 = 1

Yep, that makes 1


It makes sense, right? Event A plus all outcomes that are not Event A make up all possible
outcomes.

Why is the Complement Useful?


It is sometimes easier to work out the complement first.

Example. Throw two dice. What is the probability the two scores
are different?

Different scores are like getting a 2 and 3, or a 6 and 1. It is quite a long list:

A = { (1,2), (1,3), (1,4), (1,5), (1,6),


(2,1), (2,3), (2,4), ... etc ! }

But the complement (which is when the two scores are the same) is only 6 outcomes:

A' = { (1,1), (2,2), (3,3), (4,4), (5,5), (6,6) }

And its probability is:

P(A') = 6/36 = 1/6

Knowing that P(A) and P(A') together make 1, we can calculate:

P(A) = 1 - P(A')
= 1 - 1/6
= 5/6

So in this case (and many others) it's easier to work out P(A') first, then find P(A)
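
The two-dice example can also be checked by listing every outcome. This is only a sketch, assuming two fair six-sided dice; it counts the complement first and then confirms the answer by counting directly:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing two dice
outcomes = list(product(range(1, 7), repeat=2))

# Complement A': both scores are the same
same = [pair for pair in outcomes if pair[0] == pair[1]]
p_same = Fraction(len(same), len(outcomes))

# Event A: the scores are different, found from the complement
p_different = 1 - p_same

# Direct count of the long list, as a check on the complement method
different = [pair for pair in outcomes if pair[0] != pair[1]]

print(p_same, p_different, len(different))
```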

Probability: Types of Events


Life is full of random events!
You need to get a "feel" for them to be a smart and successful person.

The toss of a coin, throw of a die and lottery draws are all examples of random events.

Events
When we say "Event" we mean one (or more) outcomes.

Example Events:

Getting a Tail when tossing a coin is an event


Rolling a "5" is an event.

An event can include several outcomes:

Choosing a "King" from a deck of cards (any of the 4 Kings) is also an event
Rolling an "even number" (2, 4 or 6) is an event

Events can be:

Independent (each event is not affected by other events),


Dependent (also called "Conditional", where an event is affected by other events)
Mutually Exclusive (events can't happen at the same time)

Let's look at each of those types.

Independent Events
Events can be "Independent", meaning each event is not affected by any other events.

This is an important idea! A coin does not "know" that it came up heads before ... each toss of a
coin is a perfect isolated thing.

Example: You toss a coin three times and it comes up "Heads" each time ... what is the chance
that the next toss will also be a "Head"?

The chance is simply 1/2, or 50%, just like ANY OTHER toss of the coin.

What it did in the past will not affect the current toss!
Some people think "it is overdue for a Tail", but really truly the next toss of the coin is totally
independent of any previous tosses.

Saying "a Tail is due", or "just one more go, my luck is due" is called The Gambler's
Fallacy

Learn more at Independent Events .

Dependent Events
But some events can be "dependent" ... which means they can be affected by previous
events.

Example: Drawing 2 Cards from a Deck

After taking one card from the deck there are fewer cards available, so the probabilities change!

Let's look at the chances of getting a King.

For the 1st card the chance of drawing a King is 4 out of 52

But for the 2nd card:

If the 1st card was a King, then the 2nd card is less likely to be a King, as only 3 of the 51
cards left are Kings.
If the 1st card was not a King, then the 2nd card is slightly more likely to be a King, as 4 of
the 51 cards left are Kings.

This is because we are removing cards from the deck.

Replacement: When we put each card back after drawing it the chances don't change, as the
events are independent.

Without Replacement: The chances will change, and the events are dependent.

You can learn more at Dependent Events: Conditional Probability

Tree Diagrams
When we have Dependent Events it helps to make a " Tree Diagram "

Example: Soccer Game

You are off to soccer, and love being the Goalkeeper, but that depends who is the Coach today:

with Coach Sam your probability of being Goalkeeper is 0.5


with Coach Alex your probability of being Goalkeeper is 0.3

Sam is Coach more often ... about 6 of every 10 games (a probability of 0.6).

Let's build the Tree Diagram!

Start with the Coaches. We know 0.6 for Sam, so it must be 0.4 for Alex (the probabilities must
add to 1):

Then fill out the branches for Sam (0.5 Yes and 0.5 No), and then for Alex (0.3 Yes and 0.7 No):

Now it is neatly laid out we can calculate probabilities (read more at Tree Diagrams ).

Mutually Exclusive
Mutually Exclusive means we can't get both events at the same time.

It is either one or the other, but not both

Examples:

Turning left or right are Mutually Exclusive (you can't do both at the same time)
Heads and Tails are Mutually Exclusive
Kings and Aces are Mutually Exclusive

What isn't Mutually Exclusive


Kings and Hearts are not Mutually Exclusive, because we can have a King of Hearts!

Like here:

Aces and Kings are Mutually Exclusive (can't be both)
Hearts and Kings are not Mutually Exclusive (can be both)

Read more at Mutually Exclusive Events

Probability: Independent Events

Life is full of random events!

You need to get a "feel" for them to be a smart and successful person.

The toss of a coin, throwing dice and lottery draws are all examples of random events.

Sometimes an event can affect the next event.

Example: taking colored marbles from a bag: as you take each marble there are fewer marbles left
in the bag, so the probabilities change.

We call those Dependent Events, because what happens depends on what happened before
(learn more about this at Conditional probability ).

But otherwise they are Independent Events ...


Independent Events

Independent Events are not affected by previous events.

This is an important idea!

A coin does not "know" it came up heads before.

And each toss of a coin is a perfect isolated thing.

Example: You toss a coin and it comes up "Heads" three times ... what is the
chance that the next toss will also be a "Head"?

The chance is simply 1/2 (or 0.5) just like ANY toss of the coin.

What it did in the past will not affect the current toss!

Some people think "it is overdue for a Tail", but really truly the next toss of the coin is totally
independent of any previous tosses.

Saying "a Tail is due", or "just one more go, my luck is due" is called The Gambler's
Fallacy

Of course your luck may change, because each toss of the coin has an equal chance.

Probability of Independent Events


"Probability" (or "Chance") is how likely something is to happen.

So how do we calculate probability?

Probability of an event happening = (Number of ways it can happen) / (Total number of outcomes)

Example: what is the probability of getting a "Head" when tossing a coin?


Number of ways it can happen: 1 (Head)

Total number of outcomes: 2 (Head and Tail)

So the probability = 1/2 = 0.5

Example: what is the probability of getting a "4" or "6" when rolling a die?

Number of ways it can happen: 2 ("4" and "6")

Total number of outcomes: 6 ("1", "2", "3", "4", "5" and "6")

So the probability = 2/6 = 1/3 = 0.333...

Ways of Showing Probability


Probability goes from 0 (impossible) to 1 (certain):

It is often shown as a decimal or fraction.

Example: the probability of getting a "Head" when tossing a coin:

As a decimal: 0.5
As a fraction: 1/2
As a percentage: 50%
Or sometimes like this: 1-in-2

Two or More Events


We can calculate the chances of two or more independent events by multiplying the chances.
Example: Probability of 3 Heads in a Row

For each toss of a coin a "Head" has a probability of 0.5:

And so the chance of getting 3 Heads in a row is 0.5 x 0.5 x 0.5 = 0.125

So each toss of a coin has a 1/2 chance of being Heads, but lots of Heads in a row is unlikely.

Example: Why is it unlikely to get, say, 7 heads in a row, when each toss of a
coin has a 1/2 chance of being Heads?

Because we are asking two different questions:

Question 1: What is the probability of 7 heads in a row?

Answer: 1/2 x 1/2 x 1/2 x 1/2 x 1/2 x 1/2 x 1/2 = 0.0078125 (less than 1%).

Question 2: Given that we have just got 6 heads in a row, what is the probability that the
next toss is also a head?

Answer: 1/2, as the previous tosses don't affect the next toss.

You can have a play with the Quincunx to see how lots of independent effects can still have a
pattern.

Notation
We use "P" to mean "Probability Of",

So, for Independent Events:

P(A and B) = P(A) x P(B)


Probability of A and B equals the probability of A times the probability of B

Example: your boss (to be fair) randomly assigns everyone an extra 2 hours
work on weekend evenings between 4 and midnight.

What are the chances you get Saturday between 6 and 8?

Day: there are two days on the weekend, so P(Saturday) = 0.5

Time: you want the 2 hours of 6-to-8, out of the 8 hours of 4-to-midnight:

P(Your Time) = 2/8 = 0.25

And:

P(Saturday and Your Time) = P(Saturday) x P(Your Time)

= 0.5 x 0.25

= 0.125

Or a 12.5% Chance

(Note: we could ALSO have worked out that you wanted 2 hours out of a total possible 16 hours,
which is 2/16 = 0.125. Both methods work here.)

Another Example
Imagine there are two groups:

A member of each group gets randomly chosen for the winners circle,
then one of those gets randomly chosen to get the big money prize:

What is your chance of winning the big prize?

there is a 1/5 chance of going to the winners circle


and a 1/2 chance of winning the big prize

So you have a 1/5 chance followed by a 1/2 chance ... which makes a 1/10 chance overall:

1/5 x 1/2 = 1/(5 x 2) = 1/10

Or we can calculate using decimals (1/5 is 0.2, and 1/2 is 0.5):

0.2 x 0.5 = 0.1

So your chance of winning the big money is 0.1 (which is the same as 1/10).

Coincidence!
Many "Coincidences" are, in fact, likely.

Example: you are in a room with 30 people, and find that Zach and Anna
celebrate their birthday on the same day.

Do you say:

"Wow, how strange !", or


"That seems reasonable, with so many people here"

In fact there is a 70% chance that would happen ... so it is likely.

Why is the chance so high?

Because you are comparing everyone to everyone else (not just one to many).

And with 30 people that is 435 comparisons

(Read Shared Birthdays to find out more.)
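
To see where the 70% and the 435 come from, here is a small Python sketch. It assumes 365 equally likely birthdays and ignores leap years, and the function name is made up for this illustration:

```python
import math

def shared_birthday_probability(people, days=365):
    """Chance that at least two of `people` share a birthday.

    Assumes every day is equally likely and ignores leap years.
    """
    p_all_different = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken
        p_all_different *= (days - i) / days
    # "At least one match" is the complement of "all different"
    return 1 - p_all_different

pairs = math.comb(30, 2)                  # number of pairwise comparisons
p_match = shared_birthday_probability(30)
print(pairs, round(p_match, 3))
```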

Example: Snap!

Did you ever say something at exactly the same time as someone else?
Wow, how amazing!

But you were probably sharing an experience (movie, journey, whatever) and so your thoughts
were similar.

And there are only so many ways of saying something ...

... so it is like the card game "Snap!" ...

... if you speak enough words together, they will eventually match up.

So, maybe not so amazing, just simple chance at work.

Can you think of other cases where a "coincidence" was simply a likely thing?

Conclusion
Probability is: (Number of ways it can happen) / (Total number of outcomes)

Dependent Events (such as removing marbles from a bag) are affected by previous
events

Independent events (such as a coin toss) are not affected by previous events

We can calculate the probability of 2 or more Independent events by multiplying

Not all coincidences are really unlikely (when you think about them).

Conditional Probability
How to handle Dependent Events

Life is full of random events! You need to get a "feel" for them to be a smart and successful
person.

Independent Events
Events can be " Independent ", meaning each event is not affected by any other events.
Example: Tossing a coin.

Each toss of a coin is a perfect isolated thing.

What it did in the past will not affect the current toss.

The chance is simply 1-in-2, or 50%, just like ANY toss of the coin.

So each toss is an Independent Event.

Dependent Events
But events can also be "dependent" ... which means they can be affected by previous
events ...

Example: Marbles in a Bag

2 blue and 3 red marbles are in a bag.

What are the chances of getting a blue marble?

The chance is 2 in 5

But after taking one out the chances change!

So the next time:


if we got a red marble before, then the chance of a blue marble next is 2 in 4
if we got a blue marble before, then the chance of a blue marble next is 1 in 4

See how the chances change each time? Each event depends on what happened in the previous
event, and is called dependent.

That is the kind of thing we look at here.

"Replacement"
Note: if we replace the marbles in the bag each time, then the chances do not change and the
events are independent :

With Replacement: the events are Independent (the chances don't change)
Without Replacement: the events are Dependent (the chances change)

Tree Diagram
A Tree Diagram is a wonderful way to picture what is going on, so let's build one for our
marbles example.

There is a 2/5 chance of pulling out a Blue marble, and a 3/5 chance for Red:

We can go one step further and see what happens when we pick a second marble:
If a blue marble was selected first there is now a 1/4 chance of getting a blue marble and a 3/4
chance of getting a red marble.

If a red marble was selected first there is now a 2/4 chance of getting a blue marble and a 2/4
chance of getting a red marble.

Now we can answer questions like "What are the chances of drawing 2 blue marbles?"

Answer: it is a 2/5 chance followed by a 1/4 chance:

2/5 x 1/4 = 2/20 = 1/10

Did you see how we multiplied the chances? And got 1/10 as a result.

The chances of drawing 2 blue marbles is 1/10

Notation
We love notation in mathematics! It means we can then use the power of algebra to play
around with the ideas. So here is the notation for probability:

P(A) means "Probability Of Event A"

In our marbles example Event A is "get a Blue Marble first" with a probability of 2/5:

P(A) = 2/5
And Event B is "get a Blue Marble second" ... but for that we have 2 choices:

If we got a Blue Marble first the chance is now 1/4


If we got a Red Marble first the chance is now 2/4

So we have to say which one we want, and use the symbol "|" to mean "given":

P(B|A) means "Event B given Event A"

In other words, event A has already happened, now what is the chance of event B?

P(B|A) is also called the "Conditional Probability" of B given A.

And in our case:

P(B|A) = 1/4

So the probability of getting 2 blue marbles is:

2/5 x 1/4 = 1/10

And we write it as:

P(A and B) = P(A) x P(B|A)

"Probability of event A and event B equals
the probability of event A times the probability of event B given event A"

Let's do the next example using only notation:

Example: Drawing 2 Kings from a Deck

Event A is drawing a King first, and Event B is drawing a King second.

For the first card the chance of drawing a King is 4 out of 52 (there are 4 Kings in a deck of 52
cards):
P(A) = 4/52

But after removing a King from the deck the 2nd card drawn is less likely to be
a King (only 3 of the 51 cards left are Kings):

P(B|A) = 3/51

And so:

P(A and B) = P(A) x P(B|A) = (4/52) x (3/51) = 12/2652 = 1/221

So the chance of getting 2 Kings is 1 in 221, or about 0.5%
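
The same calculation can be written as a short Python sketch using exact fractions. The variable names are only illustrative, and it assumes a standard 52-card deck with no Jokers:

```python
from fractions import Fraction

# Event A: the first card is a King (4 Kings in 52 cards)
p_a = Fraction(4, 52)

# Event B given A: the second card is a King (3 Kings left in 51 cards)
p_b_given_a = Fraction(3, 51)

# P(A and B) = P(A) x P(B|A)
p_two_kings = p_a * p_b_given_a

print(p_two_kings, float(p_two_kings))
```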

Finding Hidden Data


Using Algebra we can also "change the subject" of the formula, like this:

Start with: P(A and B) = P(A) x P(B|A)

Swap sides: P(A) x P(B|A) = P(A and B)

Divide by P(A): P(B|A) = P(A and B) / P(A)

And we have another useful formula:

"The probability of event B given event A equals
the probability of event A and event B divided by the probability of event A"

Example: Ice Cream

70% of your friends like Chocolate, and 35% like Chocolate AND like Strawberry.

What percent of those who like Chocolate also like Strawberry?

P(Strawberry|Chocolate) = P(Chocolate and Strawberry) / P(Chocolate)

0.35 / 0.7 = 50%


50% of your friends who like Chocolate also like Strawberry

Big Example: Soccer Game


You are off to soccer, and want to be the Goalkeeper, but that depends who is the Coach today:

with Coach Sam the probability of being Goalkeeper is 0.5


with Coach Alex the probability of being Goalkeeper is 0.3

Sam is Coach more often ... about 6 out of every 10 games (a probability of 0.6).

So, what is the probability you will be a Goalkeeper today?

Let's build a tree diagram . First we show the two possible coaches: Sam or Alex:

The probability of getting Sam is 0.6, so the probability of Alex must be 0.4 (together the
probability is 1)

Now, if you get Sam, there is 0.5 probability of being Goalie (and 0.5 of not being Goalie):

If you get Alex, there is 0.3 probability of being Goalie (and 0.7 not):

The tree diagram is complete, now let's calculate the overall probabilities. Remember that:
P(A and B) = P(A) x P(B|A)

Here is how to do it for the "Sam, Yes" branch:

(When we take the 0.6 chance of Sam being coach times the 0.5 chance that Sam will let you be
Goalkeeper we end up with an 0.3 chance.)

But we are not done yet! We haven't included Alex as Coach:

An 0.4 chance of Alex as Coach, followed by an 0.3 chance gives 0.12

And the two "Yes" branches of the tree together make:

0.3 + 0.12 = 0.42 probability of being a Goalkeeper today

(That is a 42% chance)

Check

One final step: complete the calculations and make sure they add to 1:

0.3 + 0.3 + 0.12 + 0.28 = 1

Yes, they add to 1, so that looks right.
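
The whole tree can be sketched in a few lines of Python. This is one possible layout rather than the only one; the dictionary names are invented here, and the probabilities are the ones given in the example:

```python
# Tree diagram as two levels: coach first, then Goalkeeper yes/no
p_coach = {"Sam": 0.6, "Alex": 0.4}
p_goalie_given_coach = {"Sam": 0.5, "Alex": 0.3}

# Multiply along each branch: P(coach and answer) = P(coach) x P(answer | coach)
branches = {}
for coach, p_c in p_coach.items():
    p_yes = p_goalie_given_coach[coach]
    branches[(coach, "Yes")] = p_c * p_yes
    branches[(coach, "No")] = p_c * (1 - p_yes)

# Add the two "Yes" branches to get the overall chance of being Goalkeeper
p_goalkeeper = sum(p for (coach, answer), p in branches.items() if answer == "Yes")

# Check: all four branches together should make 1
total = sum(branches.values())

print(round(p_goalkeeper, 2), round(total, 2))
```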

Friends and Random Numbers


Here is another quite different example of Conditional Probability.

4 friends (Alex, Blake, Chris and Dusty) each choose a random number between 1 and
5. What is the chance that any of them chose the same number?

Let's add our friends one at a time ...


First, what is the chance that Alex and Blake have the same number?

Blake compares his number to Alex's number. There is a 1 in 5 chance of a match.

As a tree diagram :

Note: "Yes" and "No" together makes 1


(1/5 + 4/5 = 5/5 = 1)

Now, let's include Chris ...

But there are now two cases to consider:

If Alex and Blake did match, then Chris has only one number to compare to.
But if Alex and Blake did not match then Chris has two numbers to compare to.

And we get this:

For the top line (Alex and Blake did match) we already have a match (a chance of 1/5).

But for the "Alex and Blake did not match" there is now a 2/5 chance of Chris matching
(because Chris gets to match his number against both Alex and Blake).

And we can work out the combined chance by multiplying the chances it took to get there:

Following the "No, Yes" path ... there is a 4/5 chance of No, followed by a 2/5 chance of
Yes:

(4/5) x (2/5) = 8/25

Following the "No, No" path ... there is a 4/5 chance of No, followed by a 3/5 chance of
No:

(4/5) x (3/5) = 12/25


Also notice that when we add all chances together we still get 1 (a good check that we haven't
made a mistake):

(5/25) + (8/25) + (12/25) = 25/25 = 1

Now what happens when we include Dusty?

It is the same idea, just more of it:

OK, that is all 4 friends, and the "Yes" chances together make 101/125:

Answer: 101/125

But here is something interesting ... if we follow the "No" path we can skip all the other
calculations and make our life easier:

The chances of not matching are:

(4/5) x (3/5) x (2/5) = 24/125

So the chances of matching are:

1 - (24/125) = 101/125

(And we didn't really need a tree diagram for that!)

And that is a popular trick in probability:

It is often easier to work out the "No" case


(and subtract from 1 for the "Yes" case)

(This idea is shown in more detail at Shared Birthdays .)
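
Both routes to 101/125 can be checked by brute force. The sketch below assumes each friend picks a whole number from 1 to 5 with equal chance; it lists all the possible picks and compares the count with the "No" path shortcut:

```python
from fractions import Fraction
from itertools import product

# Every way 4 friends can each pick a number from 1 to 5 (5^4 = 625 ways)
choices = list(product(range(1, 6), repeat=4))

# A "match" means at least two friends picked the same number
with_match = [c for c in choices if len(set(c)) < 4]
p_match = Fraction(len(with_match), len(choices))

# The shortcut: follow the "No" path and subtract from 1
p_no_match = Fraction(4, 5) * Fraction(3, 5) * Fraction(2, 5)
p_match_shortcut = 1 - p_no_match

print(p_match, p_match_shortcut)
```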


Probability Tree Diagrams
Calculating probabilities can be hard: sometimes we add them, sometimes we multiply them, and
often it is hard to figure out what to do ... tree diagrams to the rescue!

Here is a tree diagram for the toss of a coin:

There are two "branches" (Heads and Tails)

The probability of each branch is written on the branch
The outcome is written at the end of the branch

We can extend the tree diagram to two tosses of a coin:

How do we calculate the overall probabilities?

We multiply probabilities along the branches


We add probabilities down columns

Now we can see such things as:

The probability of "Head, Head" is 0.5 x 0.5 = 0.25


All probabilities add to 1.0 (which is always a good check)
The probability of getting at least one Head from two tosses is 0.25+0.25+0.25 = 0.75
... and more

That was a simple example using independent events (each toss of a coin is independent of the
previous toss), but tree diagrams are really wonderful for figuring out dependent events (where
an event depends on what happens in the previous event) like this example:
Example: Soccer Game
You are off to soccer, and love being the Goalkeeper, but that depends who is the Coach today:

with Coach Sam the probability of being Goalkeeper is 0.5


with Coach Alex the probability of being Goalkeeper is 0.3

Sam is Coach more often ... about 6 out of every 10 games (a probability of 0.6).

So, what is the probability you will be a Goalkeeper today?

Let's build the tree diagram. First we show the two possible coaches: Sam or Alex:

The probability of getting Sam is 0.6, so the probability of Alex must be 0.4 (together the
probability is 1)

Now, if you get Sam, there is 0.5 probability of being Goalie (and 0.5 of not being Goalie):

If you get Alex, there is 0.3 probability of being Goalie (and 0.7 not):

The tree diagram is complete, now let's calculate the overall probabilities. This is done by
multiplying each probability along the "branches" of the tree.

Here is how to do it for the "Sam, Yes" branch:


(When we take the 0.6 chance of Sam being coach and include the 0.5 chance that Sam will let
you be Goalkeeper we end up with an 0.3 chance.)

But we are not done yet! We haven't included Alex as Coach:

An 0.4 chance of Alex as Coach, followed by an 0.3 chance gives 0.12.

Now we add the column:

0.3 + 0.12 = 0.42 probability of being a Goalkeeper today

(That is a 42% chance)

Check

One final step: complete the calculations and make sure they add to 1:

0.3 + 0.3 + 0.12 + 0.28 = 1

Yes, it all adds up.

Conclusion
So there you go, when in doubt draw a tree diagram, multiply along the branches and add the
columns. Make sure all probabilities add to 1 and you are good to go.

Mutually Exclusive Events


Mutually Exclusive: can't happen at the same time.

Examples:

Turning left and turning right are Mutually Exclusive (you can't do both at the same time)
Tossing a coin: Heads and Tails are Mutually Exclusive
Cards: Kings and Aces are Mutually Exclusive

What is not Mutually Exclusive:

Turning left and scratching your head can happen at the same time
Kings and Hearts, because we can have a King of Hearts!

Like here:

Aces and Kings are Mutually Exclusive (can't be both)
Hearts and Kings are not Mutually Exclusive (can be both)

Probability
Let's look at the probabilities of Mutually Exclusive events. But first, a definition:

Probability of an event happening = (Number of ways it can happen) / (Total number of outcomes)

Example: there are 4 Kings in a deck of 52 cards. What is the probability of picking a King?
Number of ways it can happen: 4 (there are 4 Kings)

Total number of outcomes: 52 (there are 52 cards in total)

So the probability = 4/52 = 1/13

Mutually Exclusive
When two events (call them "A" and "B") are Mutually Exclusive it is impossible for them to
happen together:

P(A and B) = 0

"The probability of A and B together equals 0 (impossible)"

Example: King AND Queen

A card cannot be a King AND a Queen at the same time!

The probability of a King and a Queen is 0 (Impossible)

But the probability of A or B is the sum of the individual probabilities:

P(A or B) = P(A) + P(B)

"The probability of A or B equals the probability of A plus the probability of B"

Example: King OR Queen

In a Deck of 52 Cards:

the probability of a King is 1/13, so P(King)=1/13


the probability of a Queen is also 1/13, so P(Queen)=1/13

When we combine those two Events:

The probability of a King or a Queen is (1/13) + (1/13) = 2/13

Which is written like this:


P(King or Queen) = (1/13) + (1/13) = 2/13

So, we have:

P(King and Queen) = 0


P(King or Queen) = (1/13) + (1/13) = 2/13

Special Notation
Instead of "and" you will often see the symbol ∩ (which is the "Intersection" symbol used
in Venn Diagrams)

Instead of "or" you will often see the symbol ∪ (the "Union" symbol)

So we can also write:

P(King ∩ Queen) = 0
P(King ∪ Queen) = (1/13) + (1/13) = 2/13

Example: Scoring Goals

If the probability of:

scoring no goals (Event "A") is 20%


scoring exactly 1 goal (Event "B") is 15%

Then:

The probability of scoring no goals and 1 goal is 0 (Impossible)


The probability of scoring no goals or 1 goal is 20% + 15% = 35%

Which is written:

P(A ∩ B) = 0
P(A ∪ B) = 20% + 15% = 35%

Remembering
To help you remember, think:

"Or has more ... than And"

Also ∪ is like a cup which holds more than ∩

Not Mutually Exclusive


Now let's see what happens when events are not Mutually Exclusive.

Example: Hearts and Kings

Hearts and Kings together is only the King of Hearts:

But Hearts or Kings is:

all the Hearts (13 of them)


all the Kings (4 of them)

But that counts the King of Hearts twice!

So we correct our answer, by subtracting the extra "and" part:

16 Cards = 13 Hearts + 4 Kings - the 1 extra King of Hearts

Count them to make sure this works!


As a formula this is:

P(A or B) = P(A) + P(B) - P(A and B)

"The probability of A or B equals the probability of A plus the probability of B


minus the probability of A and B"

Here is the same formula, but using ∪ and ∩:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
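
"Count them to make sure this works!" can be done with Python sets, where `|` is union ("or") and `&` is intersection ("and"). This is a sketch that assumes a standard 52-card deck with no Jokers; the card labels are just one way to name them:

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = {(rank, suit) for rank in ranks for suit in suits}  # 52 cards

hearts = {card for card in deck if card[1] == "Hearts"}
kings = {card for card in deck if card[0] == "K"}

# Set union is "or", set intersection is "and"
hearts_or_kings = hearts | kings
hearts_and_kings = hearts & kings

# Probability by direct count, and by the formula P(A) + P(B) - P(A and B)
p_or = Fraction(len(hearts_or_kings), len(deck))
p_formula = (Fraction(len(hearts), len(deck))
             + Fraction(len(kings), len(deck))
             - Fraction(len(hearts_and_kings), len(deck)))

print(len(hearts_or_kings), p_or, p_formula)
```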

A Final Example
16 people study French, 21 study Spanish and there are 30 altogether. Work out the
probabilities!

This is definitely a case of not Mutually Exclusive (you can study French AND Spanish).

Let's say b is how many study both languages:

people studying French Only must be 16-b


people studying Spanish Only must be 21-b

And we get:

And we know there are 30 people, so:

(16 - b) + b + (21 - b) = 30

37 - b = 30

b = 7

And we can put in the correct numbers:

So we know all this now:

P(French) = 16/30
P(Spanish) = 21/30
P(French Only) = 9/30
P(Spanish Only) = 14/30
P(French or Spanish) = 30/30 = 1
P(French and Spanish) = 7/30

Lastly, let's check with our formula:

P(A or B) = P(A) + P(B) - P(A and B)

Put the values in:

30/30 = 16/30 + 21/30 - 7/30

Yes, it works!
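
Here is the same working as a small Python sketch. Like the example, it assumes every one of the 30 people studies at least one of the two languages; the variable names are only illustrative:

```python
from fractions import Fraction

french, spanish, total = 16, 21, 30

# Assumption (as in the example): everyone studies at least one language,
# so total = french + spanish - both, which rearranges to:
both = french + spanish - total

french_only = french - both
spanish_only = spanish - both

p_french = Fraction(french, total)
p_spanish = Fraction(spanish, total)
p_both = Fraction(both, total)

# P(A or B) = P(A) + P(B) - P(A and B)
p_either = p_french + p_spanish - p_both

print(both, french_only, spanish_only, p_either)
```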

Summary:

Mutually Exclusive

A and B together is impossible: P(A and B) = 0


A or B is the sum of A and B: P(A or B) = P(A) + P(B)

Not Mutually Exclusive

A or B is the sum of A and B minus A and B: P(A or B) = P(A) + P(B) - P(A and B)

Symbols

And: ∩ (the "Intersection" symbol)

Or: ∪ (the "Union" symbol)

Combinations and Permutations

What's the Difference?


In English we use the word "combination" loosely, without thinking if the order of things is
important. In other words:

"My fruit salad is a combination of apples, grapes and bananas" We don't care
what order the fruits are in, they could also be "bananas, grapes and apples" or "grapes,
apples and bananas", it's the same fruit salad.

"The combination to the safe is 472". Now we do care about the order. "724" won't
work, nor will "247". It has to be exactly 4-7-2.

So, in Mathematics we use more precise language:

When the order doesn't matter, it is a Combination.

When the order does matter it is a Permutation.

So, we should really call this a "Permutation Lock"!

In other words:

A Permutation is an ordered Combination.

To help you to remember, think "Permutation ... Position"

Permutations
There are basically two types of permutation:
Repetition is Allowed: such as the lock above. It could be "333".
No Repetition: for example the first three people in a running race. You can't be
first and second.

1. Permutations with Repetition

These are the easiest to calculate.

When a thing has n different types ... we have n choices each time!

For example: choosing 3 of those things, the permutations are:

n x n x n
(n multiplied 3 times)

More generally: choosing r of something that has n different types, the permutations are:

n x n x ... (r times)

(In other words, there are n possibilities for the first choice, THEN there are n possibilities for the
second choice, and so on, multiplying each time.)

Which is easier to write down using an exponent of r:

n x n x ... (r times) = n^r

Example: in the lock above, there are 10 numbers to choose from (0,1,2,3,4,5,6,7,8,9) and we
choose 3 of them:

10 x 10 x ... (3 times) = 10^3 = 1,000 permutations

So, the formula is simply:

n^r

where n is the number of things to choose from,


and we choose r of them
(Repetition allowed, order matters)
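
For the lock example, a quick Python sketch can list every code and confirm that the count is n^r. It assumes a 3-digit lock using the digits 0 to 9:

```python
from itertools import product

digits = "0123456789"

# Every 3-digit code for the lock: order matters and digits may repeat
codes = ["".join(code) for code in product(digits, repeat=3)]

n, r = len(digits), 3
print(len(codes), n ** r)
```

Codes such as "333" appear in the list, and "472" and "724" are counted as different codes because the order matters.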
2. Permutations without Repetition

In this case, we have to reduce the number of available choices each time.

For example, what order could 16 pool balls be in?

After choosing, say, number "14" we can't choose it again.

So, our first choice has 16 possibilities, and our next choice has 15 possibilities, then 14, 13, etc.
And the total permutations are:

16 x 15 x 14 x 13 x ... = 20,922,789,888,000

But maybe we don't want to choose them all, just 3 of them, so that is only:

16 x 15 x 14 = 3,360

In other words, there are 3,360 different ways that 3 pool balls could be arranged out of 16
balls.

Without repetition our choices get reduced each time.

But how do we write that mathematically? Answer: we use the " factorial function "

The factorial function (symbol: !) just means to multiply a series of descending


natural numbers. Examples:

4! = 4 x 3 x 2 x 1 = 24
7! = 7 x 6 x 5 x 4 x 3 x 2 x 1 = 5,040
1! = 1

Note: it is generally agreed that 0! = 1. It may seem funny that multiplying no numbers together
gets us 1, but it helps simplify a lot of equations.

So, when we want to select all of the billiard balls the permutations are:
16! = 20,922,789,888,000

But when we want to select just 3 we don't want to multiply after 14. How do we do that? There
is a neat trick: we divide by 13!

(16 x 15 x 14 x 13 x 12 x ...) / (13 x 12 x ...) = 16 x 15 x 14 = 3,360

Do you see? 16! / 13! = 16 x 15 x 14

The formula is written:

n! / (n - r)!
where n is the number of things to choose from,
and we choose r of them
(No repetition, order matters)

Example: Our "order of 3 out of 16 pool balls" example is:

16!/(16-3)! = 16!/13! = 20,922,789,888,000 / 6,227,020,800 = 3,360

(which is just the same as: 16 x 15 x 14 = 3,360)

Example: How many ways can first and second place be awarded to 10 people?

10!/(10-2)! = 10!/8! = 3,628,800 / 40,320 = 90

(which is just the same as: 10 x 9 = 90)

Notation

Instead of writing the whole formula, people use different notations such as these:

Example: P(10,2) = 90
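
The formula n!/(n - r)! can be tried out in Python. The helper name `count_permutations` is made up for this sketch; Python 3.8 and later also provide `math.perm`, which gives the same result:

```python
import math
from itertools import permutations

def count_permutations(n, r):
    """n! / (n - r)!: ordered selections of r items from n, no repetition."""
    return math.factorial(n) // math.factorial(n - r)

pool_balls = count_permutations(16, 3)   # order of 3 out of 16 pool balls
podium = count_permutations(10, 2)       # first and second place among 10 people

# Listing the ordered pairs directly gives the same count
listed = len(list(permutations(range(10), 2)))

print(pool_balls, podium, listed, math.perm(16, 3))
```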
Combinations
There are also two types of combinations (remember the order does not matter now):

Repetition is Allowed: such as coins in your pocket (5,5,5,10,10)


No Repetition: such as lottery numbers (2,14,15,27,30,33)

1. Combinations with Repetition

Actually, these are the hardest to explain, so we will come back to this later.

2. Combinations without Repetition

This is how lotteries work. The numbers are drawn one at a time, and if we have the lucky
numbers (no matter what order) we win!

The easiest way to explain it is to:

assume that the order does matter (ie permutations),


then alter it so the order does not matter.

Going back to our pool ball example, let's say we just want to know which 3 pool balls are
chosen, not the order.

We already know that 3 out of 16 gave us 3,360 permutations.

But many of those are the same to us now, because we don't care what order!

For example, let us say balls 1, 2 and 3 are chosen. These are the possibilities:

Order does matter     Order doesn't matter
1 2 3
1 3 2
2 1 3                 1 2 3
2 3 1
3 1 2
3 2 1

So, the permutations will have 6 times as many possibilities.


In fact there is an easy way to work out how many ways "1 2 3" could be placed in order, and
we have already talked about it. The answer is:

3! = 3 × 2 × 1 = 6

(Another example: 4 things can be placed in 4! = 4 × 3 × 2 × 1 = 24 different ways, try it for
yourself!)

So we adjust our permutations formula to reduce it by how many ways the objects could be in
order (because we aren't interested in their order any more):

n! / (n − r)! × 1 / r! = n! / (r!(n − r)!)

That formula is so important it is often just written in big parentheses, with n on top and r
underneath:

(n over r) = n! / (r!(n − r)!)

where n is the number of things to choose from, and we choose r of them
(No repetition, order doesn't matter)

It is often called "n choose r" (such as "16 choose 3")

It is also known as the Binomial Coefficient.

Notation

As well as the "big parentheses", people also use these notations:

Just remember the formula:

n! / (r!(n − r)!)

Example
So, our pool ball example (now without order) is:

16! / (3!(16−3)!) = 16! / (3! × 13!) = 20,922,789,888,000 / (6 × 6,227,020,800) = 560

Or we could do it this way:

(16 × 15 × 14) / (3 × 2 × 1) = 3360 / 6 = 560

It is interesting to also note how this formula is nice and symmetrical:

In other words choosing 3 balls out of 16, or choosing 13 balls out of 16 have the same number
of combinations.

16! / (3!(16−3)!) = 16! / (13!(16−13)!) = 16! / (3! × 13!) = 560
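A short Python sketch of the combinations formula and its symmetry. The helper name `combinations_count` is an assumption for illustration, and `math.perm` needs Python 3.8 or later. The last line simply lists every 3-ball selection and counts them as a brute-force check.

```python
import math
from itertools import combinations

def combinations_count(n: int, r: int) -> int:
    """Unordered selections of r items from n without repetition: n! / (r!(n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(combinations_count(16, 3))     # 560
print(combinations_count(16, 13))    # 560 (symmetry: choosing 13 leaves out 3)

# Permutations divided by the 3! orderings of each selection.
print(math.perm(16, 3) // math.factorial(3))   # 560

# Brute force: list every set of 3 balls out of 16 and count them.
print(sum(1 for _ in combinations(range(1, 17), 3)))   # 560
```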

Pascal's Triangle

We can also use Pascal's Triangle to find the values. Go down to row "n" (the top row is 0), and
then along "r" places and the value there is our answer. Here is an extract showing row 16:

1 14 91 364 ...

1 15 105 455 1365 ...

1 16 120 560 1820 4368 ...


1. Combinations with Repetition

OK, now we can tackle this one ...

Let us say there are five flavors of ice cream: banana, chocolate, lemon, strawberry and
vanilla.

We can have three scoops. How many variations will there be?

Let's use letters for the flavors: {b, c, l, s, v}. Example selections include

{c, c, c} (3 scoops of chocolate)


{b, l, v} (one each of banana, lemon and vanilla)
{b, v, v} (one of banana, two of vanilla)

(And just to be clear: There are n=5 things to choose from, and we choose r=3 of them.
Order does not matter, and we can repeat!)

Now, I can't describe directly to you how to calculate this, but I can show you a special
technique that lets you work it out.

Think about the ice cream being in boxes. We could say "move past the first box, then
take 3 scoops, then move along 3 more boxes to the end" and we will have 3 scoops
of chocolate!

So it is like we are ordering a robot to get our ice cream, but it doesn't change anything, we still
get what we want.

We can write this down as → ○ ○ ○ → → → (arrow means move, circle means scoop).

In fact the three examples above can be written like this:

{c, c, c} (3 scoops of chocolate): → ○ ○ ○ → → →
{b, l, v} (one each of banana, lemon and vanilla): ○ → → ○ → → ○
{b, v, v} (one of banana, two of vanilla): ○ → → → → ○ ○

OK, so instead of worrying about different flavors, we have a simpler question: "how many
different ways can we arrange arrows and circles?"

Notice that there are always 3 circles (3 scoops of ice cream) and 4 arrows (we need to move 4
times to go from the 1st to 5th container).

So (being general here) there are r + (n − 1) positions, and we want to choose r of them to have
circles.

This is like saying "we have r + (n − 1) pool balls and want to choose r of them". In other words
it is now like the pool balls question, but with slightly changed numbers. And we can write it like
this:

(r + n − 1)! / (r!(n − 1)!)

where n is the number of things to choose from, and we choose r of them
(Repetition allowed, order doesn't matter)

Interestingly, we can look at the arrows instead of the circles, and say "we have r + (n − 1)
positions and want to choose (n − 1) of them to have arrows", and the answer is the same:

(r + n − 1)! / ((n − 1)! r!)

So, what about our example, what is the answer?

(3 + 5 − 1)! / (3!(5 − 1)!) = 7! / (3! × 4!) = 5040 / (6 × 24) = 35

There are 35 ways of having 3 scoops from five flavors of ice cream.
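The ice cream count can be confirmed by listing every selection. This is a small Python sketch, with variable names chosen for illustration and `math.comb` assuming Python 3.8 or later.

```python
import math
from itertools import combinations_with_replacement

flavors = ["b", "c", "l", "s", "v"]   # banana, chocolate, lemon, strawberry, vanilla
scoops = 3

# Every unordered selection of 3 scoops, repeats allowed.
selections = list(combinations_with_replacement(flavors, scoops))

# The formula from the text: choose r of the r + (n - 1) positions.
formula = math.comb(scoops + len(flavors) - 1, scoops)

print(len(selections), formula)       # 35 35
print(("c", "c", "c") in selections)  # True
print(("b", "v", "v") in selections)  # True
```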


In Conclusion
Phew, that was a lot to absorb, so maybe you could read it again to be sure!

But knowing how these formulas work is only half the battle. Figuring out how to interpret a real
world situation can be quite hard.

But at least now you know how to calculate all 4 variations of "Order does/does not matter" and
"Repeats are/are not allowed".

False Positives and False Negatives

Test Says "Yes" ... or does it?


When you have a test that can say "Yes" or "No" (such as a medical test), you have to think:

It could be wrong when it says "Yes".


It could be wrong when it says "No".

Wrong?

It is like being told you did something when you didn't!

Or you didn't do it when you really did.

They each have a special name: "False Positive" and "False Negative":

                      They say you did     They say you didn't
You really did        They are right!      "False Negative"
You really didn't     "False Positive"     They are right!

Here are some examples of "false positives" and "false negatives":


Airport Security: a "false positive" is when ordinary items such as keys or coins
get mistaken for weapons (machine goes "beep")

Quality Control: a "false positive" is when a good quality item gets rejected, and a
"false negative" is when a poor quality item gets accepted. (A "positive" result
means there IS a defect.)

Antivirus software: a "false positive" is when a normal file is thought to be a virus

Medical screening: low-cost tests given to a large group can give many false
positives (saying you have a disease when you don't), and then ask you to get more
accurate tests.

But many people don't understand the true numbers behind "Yes" or "No", like in this example:

Example: Allergy or Not?


Hunter says she is itchy. There is a test for Allergy to Cats, but this test is not
always right:

For people that really do have the allergy, the test says "Yes" 80% of the
time
For people that do not have the allergy, the test says "Yes" 10% of the time
("false positive")

Here it is in a table:

                  Test says "Yes"          Test says "No"
Have allergy      80%                      20% "False Negative"
Don't have it     10% "False Positive"     90%

Question: If 1% of the population have the allergy, and Hunter's test says "Yes",
what are the chances that Hunter really has the allergy?

Do you think 75%? Or maybe 50%?


A similar test was given to Doctors and most guessed around 75% ...
... but they were very wrong!

(Source: "Probabilistic reasoning in clinical medicine: Problems and opportunities" by David M.
Eddy 1982, which this example is based on)

There are three good ways to solve this: "Imagine a 1000", "Tree Diagrams" or "Bayes'
Theorem". Use whichever you prefer:

Try Imagining A Thousand People

When trying to understand questions like this, just imagine a large group (say 1000) and play
with the numbers:

Of 1000 people, only 10 really have the allergy (1% of 1000 is 10)

The test is 80% right for people who have the allergy, so it will get 8 of those 10
right.

But 990 do not have the allergy, and the test will say "Yes" to 10% of them,
which is 99 people it says "Yes" to wrongly (false positive)

So out of 1000 people the test says "Yes" to (8+99) = 107 people

As a table:

                  1% have it     Test says "Yes"     Test says "No"
Have allergy      10             8                   2
Don't have it     990            99                  891
Total             1000           107                 893

So 107 people get a "Yes" but only 8 of those really have the allergy:

8 / 107 = about 7%

So, even though Hunter's test said "Yes", it is still only 7% likely that Hunter has a Cat Allergy.

Why so small? Well, the allergy is so rare that those who actually have it are
greatly outnumbered by those with a false positive.
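The "imagine a thousand people" method is easy to turn into code. Here is a minimal Python sketch. The function name `chance_really_positive` and its parameter names are ours, chosen only for illustration.

```python
def chance_really_positive(prevalence, true_positive_rate, false_positive_rate,
                           population=1000):
    """Count the correct and wrong "Yes" results in an imagined population."""
    have = population * prevalence            # people who really have it
    dont = population - have                  # people who do not
    true_yes = have * true_positive_rate      # correct "Yes" results
    false_yes = dont * false_positive_rate    # false positives
    return true_yes, false_yes, true_yes / (true_yes + false_yes)

# Cat allergy example: 1% have it, 80% detected, 10% false positives.
true_yes, false_yes, chance = chance_really_positive(0.01, 0.80, 0.10)
print(round(true_yes), round(false_yes))   # 8 99
print(round(chance * 100, 1))              # 7.5 (percent, so "about 7%")
```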
As A Tree

Drawing a tree diagram can really help:

First of all, let's check that all the percentages add up:

0.8% + 0.2% + 9.9% + 89.1% = 100% (good!)

And the two "Yes" answers add up to 0.8% + 9.9% = 10.7%, but only 0.8% are correct.

0.8 / 10.7 = about 7% (same answer as above)

Bayes' Theorem

And there is a special formula, too!

P(A|B) = P(A)P(B|A) / (P(A)P(B|A) + P(not A)P(B|not A))

It needs some explaining, so learn more about it at Bayes' Theorem.

Lastly, let's look at one more example:

Extreme Example: Computer Virus

A computer virus spreads around the world, with every infected computer reporting to a master computer.
The good guys capture the master computer and find that a million computers have been
infected (but don't know which ones).

Governments decide to take action!

No one can use the internet until their computer passes the "virus-free" test. The test is 99%
accurate (pretty good, right?) But 1% of the time it says you have the virus when you don't (a
"false positive").

Now let's say there are 1000 million internet users.

Of 1 million with the virus 99% of them get correctly banned = about 1 million
But false positives are 999 million x 1% = about 10 million

So a total of 11 million get banned, but only 1 out of those 11 actually have the virus.

So if you get banned there is only a 9% chance you actually have the virus!

Conclusion
When dealing with false positives and false negatives (or other tricky probability questions) we
can use these methods:

Imagine you have 1000 (of whatever),


Make a tree diagram, or
Use Bayes' Theorem

Bayes' Theorem
Bayes can do magic!

Ever wondered how computers learn about people?


Example:

An internet search for "movie automatic shoe laces" brings up "Back to the Future"

Has the search engine watched the movie? No, but it knows from lots of other searches what
people are probably looking for.

And it calculates that probability using Bayes' Theorem.

Bayes' Theorem is a way of finding a probability when we know certain other probabilities.

The formula is:

P(A|B) = P(A) P(B|A) / P(B)

It tells us how often A happens given that B happens, written P(A|B), when we know how often
B happens given that A happens, written P(B|A) , and how likely A and B are on their own.

P(A|B) is "Probability of A given B", the probability of A given that B happens


P(A) is Probability of A
P(B|A) is "Probability of B given A", the probability of B given that A happens
P(B) is Probability of B

When P(Fire) means how often there is fire, and P(Smoke) means how often we see smoke,
then:

P(Fire|Smoke) means how often there is fire when we see smoke.


P(Smoke|Fire) means how often we see smoke when there is fire.

So the formula kind of tells us "forwards" when we know "backwards" (or vice versa)
Example: If dangerous fires are rare (1%) but smoke is fairly common (10%)
due to factories, and 90% of dangerous fires make smoke then:

P(Fire|Smoke) = P(Fire) P(Smoke|Fire) / P(Smoke) = (1% × 90%) / 10% = 9%

In this case we expect smoke to mean a dangerous fire 9% of the time.

Example: Picnic Day

You are planning a picnic today, but the morning is cloudy

Oh no! 50% of all rainy days start off cloudy!


But cloudy mornings are common (about 40% of days start cloudy)
And this is usually a dry month (only 3 of 30 days tend to be rainy, or 10%)

What is the chance of rain during the day?

We will use Rain to mean rain during the day, and Cloud to mean cloudy morning.

The chance of Rain given Cloud is written P(Rain|Cloud)

So let's put that in the formula:

P(Rain|Cloud) = P(Rain) P(Cloud|Rain) / P(Cloud)

P(Rain) is Probability of Rain = 10%


P(Cloud|Rain) is Probability of Cloud, given that Rain happens = 50%
P(Cloud) is Probability of Cloud = 40%

P(Rain|Cloud) = (0.1 × 0.5) / 0.4 = 0.125

Or a 12.5% chance of rain. Not too bad, let's have a picnic!
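The basic formula is a one-liner in code. A minimal Python sketch follows. The function name `bayes` and its argument names are our own labels for P(A), P(B|A) and P(B).

```python
def bayes(p_a: float, p_b_given_a: float, p_b: float) -> float:
    """P(A|B) = P(A) * P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b

# Picnic: P(Rain) = 10%, P(Cloud|Rain) = 50%, P(Cloud) = 40%.
print(round(bayes(0.10, 0.50, 0.40), 3))   # 0.125

# Fire and smoke: P(Fire) = 1%, P(Smoke|Fire) = 90%, P(Smoke) = 10%.
print(round(bayes(0.01, 0.90, 0.10), 3))   # 0.09
```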

Remembering

First think "AB AB AB" then remember to group it like: "AB = A BA / B"

P(A|B) = P(A) P(B|A) / P(B)

"A" With Two Cases


One of the famous uses for Bayes' Theorem is False Positives and False Negatives.

For those we have two possible cases for "A", such as Pass/Fail (or Yes/No etc)

Example: Allergy or Not?

Hunter says she is itchy. There is a test for Allergy to Cats, but this test is not always right:

For people that really do have the allergy, the test says "Yes" 80% of the time
For people that do not have the allergy, the test says "Yes" 10% of the time ("false positive")

If 1% of the population have the allergy, and Hunter's test says "Yes", what are the
chances that Hunter really has the allergy?

We want to know the chance of having the allergy when test says "Yes", written P(Allergy|Yes)

Let's get our formula:


P(Allergy|Yes) = P(Allergy) P(Yes|Allergy) / P(Yes)

P(Allergy) is Probability of Allergy = 1%


P(Yes|Allergy) is Probability of test saying "Yes" for people with allergy = 80%
P(Yes) is Probability of test saying "Yes" (to anyone) = ??%

Oh no! We don't know what the general chance of the test saying "Yes" is ...

... but we can calculate it by adding up those with, and those without the allergy:

1% have the allergy, and the test says "Yes" to 80% of them
99% do not have the allergy and the test says "Yes" to 10% of them

Let's add that up:

P(Yes) = 1% × 80% + 99% × 10% = 10.7%

Which means that about 10.7% of the population will get a "Yes" result.

So now we can complete our formula:

P(Allergy|Yes) = (1% × 80%) / 10.7% = 7.48%

P(Allergy|Yes) = about 7%

This is the same result we got on False Positives and False Negatives.

In fact we can write a special version of the Bayes' formula just for things like this:

P(A|B) = P(A)P(B|A) / (P(A)P(B|A) + P(not A)P(B|not A))

"A" With Three (or more) Cases


We just saw "A" with two cases (A and not A), which we took care of in the bottom line.

When "A" has 3 or more cases we include them all in the bottom line:

P(A1|B) = P(A1)P(B|A1) / (P(A1)P(B|A1) + P(A2)P(B|A2) + P(A3)P(B|A3) + ...etc)

Example: The Art Competition has entries from three painters: Pam, Pia and
Pablo

Pam put in 15 paintings, 4% of her works have won First Prize.


Pia put in 5 paintings, 6% of her works have won First Prize.
Pablo put in 10 paintings, 3% of his works have won First Prize.

What is the chance that Pam will win First Prize?

P(Pam|First) = P(Pam)P(First|Pam) / (P(Pam)P(First|Pam) + P(Pia)P(First|Pia) + P(Pablo)P(First|Pablo))

Put in the values:

P(Pam|First) = ((15/30) × 4%) / ((15/30) × 4% + (5/30) × 6% + (10/30) × 3%)

Multiply all by 30 (makes calculation easier):

P(Pam|First) = (15 × 4%) / (15 × 4% + 5 × 6% + 10 × 3%) = 0.6 / (0.6 + 0.3 + 0.3) = 50%

A good chance!
Pam isn't the most successful artist, but she did put in lots of entries.
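The "three or more cases" version of the formula works the same way for any number of cases. Here is a small Python sketch. The function name `posterior` and the dictionary layout are assumptions made for illustration.

```python
def posterior(priors: dict, likelihoods: dict) -> dict:
    """P(case|B) for every case: P(case) * P(B|case), divided by the sum over all cases."""
    joint = {name: priors[name] * likelihoods[name] for name in priors}
    total = sum(joint.values())   # the "bottom line" of the formula
    return {name: value / total for name, value in joint.items()}

entries = {"Pam": 15, "Pia": 5, "Pablo": 10}
total_entries = sum(entries.values())
priors = {name: count / total_entries for name, count in entries.items()}
win_rate = {"Pam": 0.04, "Pia": 0.06, "Pablo": 0.03}

result = posterior(priors, win_rate)
for name, chance in result.items():
    print(name, round(chance, 2))   # Pam 0.5, Pia 0.25, Pablo 0.25
```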

So now you know how search engines can guess what you want: they simply keep track of what
lots of people type in and what websites they eventually click on.

Then using Bayes they figure which ones are probably the best to show first.

It makes them look like they can read your mind!

Shared Birthdays
This is a great puzzle, and you get to learn a lot about probability along the way ...

There are 30 people in a room ... what is the chance that any two of them celebrate
their birthday on the same day? Assume 365 days in a year.

Some people think "there are 30 people, and 365 days, so 30/365 sounds about right, and
30/365 = 0.08..."

But no!

The probability is much higher. It is actually likely there are people who share a birthday in that
room.

Because you should compare everyone to everyone else.

And with 30 people that is 435 comparisons.

But you also have to be careful not to over-count the chances.

I will show you how to do it ... starting with a smaller example:

Friends and Random Numbers


4 friends (Alex, Billy, Chris and Dusty) each choose a random number between 1 and
5. What is the chance that any of them chose the same number?

We will add our friends one at a time ...


First, what is the chance that Alex and Billy have the same number?

Billy compares his number to Alex's number. There is a 1 in 5 chance of a match.

As a tree diagram :

Note: "Yes" and "No" together make 1


(1/5 + 4/5 = 5/5 = 1)

Now, let's include Chris ...

But there are now two cases to consider (called "Conditional Probability"):

If Alex and Billy did match, then Chris has only one number to compare to.
But if Alex and Billy did not match then Chris has two numbers to compare to.

And we get this:

For the top line (Alex and Billy did match) we already have a match (a chance of 1/5).

But for the "Alex and Billy did not match" there is a 2/5 chance of Chris matching (against both
Alex and Billy).

And we can work out the combined chance by multiplying the chances it took to get there:

Following the "No, Yes" path ... there is a 4/5 chance of No, followed by a 2/5 chance of
Yes:

(4/5) × (2/5) = 8/25

Following the "No, No" path ... there is a 4/5 chance of No, followed by a 3/5 chance of
No:

(4/5) × (3/5) = 12/25


Also notice that adding all chances together is 1 (a good check that we haven't made a mistake):

(5/25) + (8/25) + (12/25) = 25/25 = 1

Now what happens when we include Dusty?

It is the same idea, just more of it:

OK, that is all 4 friends, and the "Yes" chances together make 101/125:

Answer: 101/125

But here is something interesting ... if we follow the "No" path we can skip all the other
calculations and make our life easier:

The chances of not matching are:

(4/5) × (3/5) × (2/5) = 24/125

So the chances of matching are:

1 - (24/125) = 101/125

(And we didn't really need a tree diagram for that!)

And that is a popular trick in probability:

It is often easier to work out the "No" case


(and subtract from 1 for the "Yes" case)

Example: what are the chances that with 6 people any of them celebrate their
Birthday in the same month? (Assume equal months)
The "no match" case for:

2 people is 11/12
3 people is (11/12) × (10/12)
4 people is (11/12) × (10/12) × (9/12)
5 people is (11/12) × (10/12) × (9/12) × (8/12)
6 people is (11/12) × (10/12) × (9/12) × (8/12) × (7/12)

So the chance of not matching is:

(11/12) × (10/12) × (9/12) × (8/12) × (7/12) = 0.22...

Flip that around and we get the chance of matching:

1 - 0.22... = 0.78...

So, there is a 78% chance of any of them celebrating their Birthday in the same month.

And now we can try calculating the "Shared Birthday" question we started with:

There are 30 people in a room ... what is the chance that any two of them celebrate
their birthday on the same day? Assume 365 days in a year.

It is just like the previous example! But bigger and more numbers:

The chance of not matching:

364/365 × 363/365 × 362/365 × ... × 336/365 = 0.294...

(I did that calculation in a spreadsheet, but there are also mathematical shortcuts)

And the probability of matching is 1 - 0.294...:

The probability of sharing a birthday = 1 - 0.294... = 0.706...

Or a 70.6% chance, which is likely!

In fact the probability for 23 people is about 50%.

And for 57 people it is 99% (almost certain!)
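Instead of a spreadsheet, the "work out the No case" trick can be done in a short loop. This is a Python sketch. The function name `chance_of_shared` is ours, and it assumes every day (or number, or month) is equally likely, as the text does.

```python
def chance_of_shared(people: int, choices: int = 365) -> float:
    """Chance that at least two people share a value, via the "no match" path."""
    no_match = 1.0
    for k in range(people):
        # Person k must avoid the k values already taken.
        no_match *= (choices - k) / choices
    return 1 - no_match

print(round(chance_of_shared(4, 5), 3))    # 0.808, which is 101/125
print(round(chance_of_shared(6, 12), 3))   # about 0.777 (same month)
print(round(chance_of_shared(30), 3))      # about 0.706
print(round(chance_of_shared(23), 3))      # about 0.507
print(round(chance_of_shared(57), 3))      # about 0.99
```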


So, next time you are in a room with a group of people why not find out if there are any shared
birthdays?

Footnote: In real life birthdays are not evenly spread out ... more babies are born in Spring. Also
Hospitals prefer to work on weekdays, not weekends, so there are more births early in the week.
And then there are leap years. But you get the idea.

Confidence Intervals

An interval of 4 plus or minus 2

A Confidence Interval is a range of values we are fairly sure our true value lies in.

Example: Average Height

We measure the heights of 40 randomly chosen men, and get a:

mean height of 175cm,


with a standard deviation of 20cm.

The 95% Confidence Interval (we show how to calculate it later) is:

175cm ± 6.2cm
This says the true mean of ALL men (if we could measure their heights) is likely to be between
168.8cm and 181.2cm.

But it might not be!

The "95%" says that 95% of experiments like we just did will include the true mean, but 5%
won't.

So there is a 1-in-20 chance (5%) that our Confidence Interval does NOT include the true mean.

Calculating the Confidence Interval


Step 1: note down the number of samples n, and calculate the mean X and standard
deviation s of those samples:

Number of samples: n = 40
Mean: X = 175
Standard Deviation: s = 20

Step 2: decide what Confidence Interval we want. 90%, 95% and 99% are common choices.
Then find the "Z" value for that Confidence Interval here:

Confidence Interval     Z
80%                     1.282
85%                     1.440
90%                     1.645
95%                     1.960
99%                     2.576
99.5%                   2.807
99.9%                   3.291

For 95% the Z value is 1.960

Step 3: use that Z in this formula for the Confidence Interval

X ± Z × s / √n

Where:

X is the mean
Z is the chosen Z-value from the table above
s is the standard deviation
n is the number of samples

And we have:

175 ± 1.960 × 20 / √40

Which is:

175cm ± 6.20cm

In other words: from 168.8cm to 181.2cm

The value after the ± is called the margin of error

The margin of error in the previous example is 6.20cm
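The three steps can be sketched in Python as below. The function name `confidence_interval` is ours, and the Z value must still be looked up in the table above (1.960 for 95% is used as the default).

```python
import math

def confidence_interval(mean: float, sd: float, n: int, z: float = 1.960):
    """Return (low, high, margin) for mean ± z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin, margin

# Heights example: n = 40, mean 175cm, standard deviation 20cm, 95% confidence.
low, high, margin = confidence_interval(175, 20, 40)
print(round(low, 1), round(high, 1), round(margin, 2))   # 168.8 181.2 6.2
```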


Calculator
We have a Confidence Interval Calculator to make life easier for you.

Another Example

Example: Apple Orchard

Are the apples big enough?

There are hundreds of apples on the trees, so you randomly choose just 30 and get these
results:

Mean: 86
Standard Deviation: 5

Let's calculate:

X ± Z × s / √n

We know:
X is the mean = 86
Z is the Z-value = 1.960 (from the table above for 95%)
s is the standard deviation = 5
n is the number of samples = 30

86 ± 1.960 × 5 / √30 = 86 ± 1.79

So the true mean (of all the hundreds of apples) is likely to be between 84.21 and 87.79

True Mean

Now imagine we get to pick ALL the apples straight away, and get them ALL measured by the
packing machine (this is a luxury not normally found in statistics!)

And the true mean turns out to be 84.9

Let's lay all the apples on the ground from smallest to largest:

Each apple is a green dot,


except our samples which are blue

Our result was not exact ... it is random after all ... but the true mean is inside our confidence
interval of 86 ± 1.79 (in other words 84.21 to 87.79)

The true mean might not be inside the confidence interval, but 95% of the time it will be!

95% of all "95% Confidence Intervals" will include the true mean.

Maybe we had this sample, with a mean of 83.5 and a Standard Deviation of 3.5:
Each apple is a green dot,
our samples are marked purple

That does not include the true mean. Expect that to happen 5% of the time for a 95%
confidence interval.

So how do we know if the sample we took is one of the "lucky" 95% or the unlucky 5%? Unless
we get to measure the whole population like above we simply don't know.

This is the risk in sampling, we might have a bad sample.

Example in Research
Here is a Confidence Interval used in research on extra exercise for older people:

Example: the "Male" line says there were:

1,226 Men (47.6% of all people)
had a "HR" (which means Hazard Reduction*) with a mean of 0.92,
and a 95% Confidence Interval (95% CI) of 0.88 to 0.97 (which is also 0.92 ± 0.05)

In other words the true benefit (for the wider population of men) has a 95% chance of being
between 0.88 and 0.97

* Note for the curious: "HR" is used in research and means "Hazard Ratio" where lower is better,
so an HR of 0.92 means the subjects were better off, and 1.03 means slightly worse off.
Standard Normal Distribution
It is all based on the idea of the Standard Normal Distribution, where the Z value is the "Z-score"

For example the Z for 95% is 1.960, and here we see the range from -1.96 to +1.96 includes
95% of all values:

From -1.96 to +1.96 standard deviations is 95%

Applying that to our sample looks like this:

Also from -1.96 to +1.96 standard deviations, so includes 95%

Conclusion
The Confidence Interval formula is

X ± Z × s / √n

Where:
X is the mean
Z is the Z-value from the table below
s is the standard deviation
n is the number of samples

Confidence Interval     Z
80%                     1.282
85%                     1.440
90%                     1.645
95%                     1.960
99%                     2.576
99.5%                   2.807
99.9%                   3.291

Chi-Square Test
Groups and Numbers

You research two groups and put them in categories single, married or divorced:

The numbers are definitely different, but ...

Is that just random chance?


Or have you found something interesting?

The Chi-Square Test gives a "p" value to help you decide!

Example: "Which holiday do you prefer?"


          Beach     Cruise
Men       209       280
Women     225       248

Does Gender affect Preferred Holiday?


If Gender (Man or Woman) does affect Preferred Holiday we say they are dependent.

By doing some special calculations (explained later), we come up with a "p" value:

p value is 0.132

Now, p < 0.05 is the usual test for dependence. In this case p is greater than 0.05, so we
believe the variables are independent (ie not linked together).

In other words Men and Women probably do not have a different preference for Beach Holidays
or Cruises.

Understanding "p" Value

"p" is the probability of getting results at least this different just by random chance, when
the variables really are independent.

Imagine that the previous example was in fact two random samples of Men each time:

Men: Men:
Beach 209, Cruise 280 Beach 225, Cruise 248

Is it likely you would get such different results surveying Men each time?

Well the "p" value of 0.132 says that it really could happen every so often.

Surveys are random after all. We expect slightly different results each time, right?

So most people want to see a p value less than 0.05 before they are happy to say the results
show the groups have a different response.

Let's see another example:

Example: "Which pet do you prefer?"


          Cat     Dog
Men       207     282
Women     231     242

By doing the calculations (shown later), we come up with:

p value is 0.043

In this case p < 0.05, so this result is thought of as being "significant" meaning we think the
variables are not independent.

In other words, because 0.043 < 0.05 we think that Gender is linked to Pet Preference (Men
and Women have different preferences for Cats and Dogs).

Just out of interest, notice that the numbers in our two examples are similar, but the resulting
p-values are very different: 0.132 and 0.043. This shows how sensitive the test is!

Why p<0.05 ?
It is just a choice! Using p<0.05 is common, but we could have chosen p<0.01 to be even
more sure that the groups behave differently, or any value really.

Calculating P-Value
So how do we calculate this p-value? We use the Chi-Square Test!

Chi-Square Test
Note: Chi sounds like "Hi" but with a K, so say Chi-Square like "Ki square"
And Chi is the Greek letter χ, so we can also write it χ²

Important points before we get started:

This test only works for categorical data (data in categories), such as Gender {Men, Women} or
color {Red, Yellow, Green, Blue} etc, but not numerical data such as height or weight.
The numbers must be large enough. Each entry must be 5 or more. In our example we have
values such as 209, 282, etc, so we are good to go.

Our first step is to state our hypotheses:


Hypothesis: A statement that might be true, which can then be tested.

The two hypotheses are:

Gender and preference for cats or dogs are independent.


Gender and preference for cats or dogs are not independent.

Lay the data out in a table:

          Cat     Dog
Men       207     282
Women     231     242

Add up rows and columns:

          Cat     Dog     Total
Men       207     282     489
Women     231     242     473
Total     438     524     962

Calculate "Expected Value" for each entry:

Multiply each row total by each column total and divide by the overall total:

          Cat             Dog             Total
Men       489×438/962     489×524/962     489
Women     473×438/962     473×524/962     473
Total     438             524             962

Which gives us:

          Cat        Dog        Total
Men       222.64     266.36     489
Women     215.36     257.64     473
Total     438        524        962

Subtract expected from actual, square it, then divide by expected:

          Cat                      Dog                      Total
Men       (207−222.64)²/222.64     (282−266.36)²/266.36     489
Women     (231−215.36)²/215.36     (242−257.64)²/257.64     473
Total     438                      524                      962

Which is:

          Cat       Dog       Total
Men       1.099     0.918     489
Women     1.136     0.949     473
Total     438       524       962

Now add up those values:

1.099 + 0.918 + 1.136 + 0.949 = 4.102

Chi-Square is 4.102

From Chi-Square to p
To get from Chi-Square to p-value is a difficult calculation, so either look it up in a table, or use
the Chi-Square Calculator .
But first you will need a "Degree of Freedom" (DF)

Calculate Degrees of Freedom

Multiply (rows − 1) by (columns − 1)

Example: DF = (2 − 1)(2 − 1) = 1 × 1 = 1

Result

The result is:

p = 0.04283

Done!
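The whole calculation can be sketched in Python without a lookup table. The function names are ours. The p-value shortcut used here, `erfc(sqrt(chi2 / 2))`, is a known identity that holds only when there is 1 degree of freedom (a 2×2 table), so it is not a general replacement for the calculator. Because the code does not round the expected values, its Chi-Square comes out very slightly different from the hand-rounded 4.102.

```python
import math

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def p_value_one_df(chi2: float) -> float:
    """p-value for a Chi-Square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

pets = [[207, 282], [231, 242]]        # rows: Men, Women; columns: Cat, Dog
holidays = [[209, 280], [225, 248]]    # rows: Men, Women; columns: Beach, Cruise

pets_chi2 = chi_square(pets)
print(round(pets_chi2, 2), round(p_value_one_df(pets_chi2), 3))   # roughly 4.10 and 0.043
print(round(p_value_one_df(chi_square(holidays)), 3))             # roughly 0.132
```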

Chi-Square Formula
This is the formula for Chi-Square:

χ² = Σ (O − E)² / E

Where:

O = the Observed (actual) value
E = the Expected value

Least Squares Regression

Line of Best Fit


Imagine you have some points, and want to have a line that best fits them like this:
The black line is the "Line of Best Fit" for the points

You can try to place the line "by eye": aim to have a similar number of points above and below
the line and try to get the distance from each point to the line as small as possible.

But for better accuracy let's see how to calculate the line using Least Squares Regression.

The Line
Our aim is to calculate the values m (slope) and b (y-intercept) in the equation of a line :

y = mx + b
Where:

y = how far up
x = how far along
m = Slope or Gradient (how steep the line is)
b = the Y Intercept (where the line crosses the Y axis)

Steps
To find the line of best fit for a group of (x,y) points:

Step 1: For each (x,y) calculate x² and xy

Step 2: Sum all x, y, x² and xy (gives us Σx, Σy, Σx² and Σxy)

Step 3: Calculate Slope m:

m = (N Σ(xy) − Σx Σy) / (N Σ(x²) − (Σx)²)

(N is the number of points.)

Step 4: Calculate Intercept b:

b = (Σy − m Σx) / N

Step 5: Assemble the equation of a line

y = mx + b

Example
Let's have an example to see how to do it!

Example: Sam found how many hours of sunshine vs how many ice
creams were sold at the shop from Monday to Friday:

"x"                   "y"
Hours of Sunshine     Ice Creams Sold
2                     4
3                     5
5                     7
7                     10
9                     15

Let us find the best m (slope) and b (y-intercept) that suits that data

y = mx + b

Step 1: For each (x,y) calculate x² and xy:

x     y      x²     xy
2     4      4      8
3     5      9      15
5     7      25     35
7     10     49     70
9     15     81     135

Step 2: Sum x, y, x² and xy (gives us Σx, Σy, Σx² and Σxy):

x          y          x²           xy
2          4          4            8
3          5          9            15
5          7          25           35
7          10         49           70
9          15         81           135
Σx: 26     Σy: 41     Σx²: 168     Σxy: 263


Also N (number of data values) = 5

Step 3: Calculate Slope m:

m = (N Σ(xy) − Σx Σy) / (N Σ(x²) − (Σx)²)
  = (5 × 263 − 26 × 41) / (5 × 168 − 26²)
  = (1315 − 1066) / (840 − 676)
  = 249 / 164 = 1.5183...

Step 4: Calculate Intercept b:

b = (Σy − m Σx) / N
  = (41 − 1.5183 × 26) / 5
  = 0.3049...

Step 5: Assemble the equation of a line:

y = mx + b

y = 1.518x + 0.305

Let's see how it works out:

x     y      y = 1.518x + 0.305     error
2     4      3.34                   −0.66
3     5      4.86                   −0.14
5     7      7.89                   0.89
7     10     10.93                  0.93
9     15     13.97                  −1.03

Here are the (x,y) points and the line y = 1.518x + 0.305 on a graph:
Nice fit!

Sam hears the weather forecast which says "we expect 8 hours of sun tomorrow", so he uses the
above equation to estimate that he will sell

y = 1.518 × 8 + 0.305 = 12.45 Ice Creams

Sam makes fresh waffle cone mixture for 14 ice creams just in case. Yum.
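The five steps translate almost line for line into Python. This is a sketch only. The function name `least_squares_line` and the variable names are ours.

```python
def least_squares_line(points):
    """Return slope m and intercept b of the least squares line through (x, y) points."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)     # Step 1 and 2: sum of x squared
    sum_xy = sum(x * y for x, y in points)     # Step 1 and 2: sum of x times y
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # Step 3
    b = (sum_y - m * sum_x) / n                                    # Step 4
    return m, b

# Sam's data: (hours of sunshine, ice creams sold).
sam = [(2, 4), (3, 5), (5, 7), (7, 10), (9, 15)]
m, b = least_squares_line(sam)
print(round(m, 4), round(b, 4))   # 1.5183 0.3049

# Estimate for 8 hours of sunshine.
print(round(m * 8 + b, 2))        # 12.45
```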

How does it work?


It works by making the total of the square of the errors as small as possible (that is why it is
called "least squares"):

The straight line minimizes the sum of squared errors

So, when we square each of those errors and add them all up, the total is as small as possible.

You can imagine each data point connected to a straight bar by springs:

Boing!
Outliers
Be careful! Least squares is sensitive to outliers. A strange value will pull the line towards it.

Use the App


Have a play with the Least Squares Calculator

Not Just For Lines


This idea can be used in many other areas, not just lines.

A "circle of best fit"

But the formulas (and the steps taken) will be very different!

Random Variables
A Random Variable is a set of possible values from a random experiment.

Example: Tossing a coin: we could get Heads or Tails.

Let's give them the values Heads=0 and Tails=1 and we have a Random Variable "X":

In short:

X = {0, 1}
Note: We could choose Heads=100 and Tails=150 or other values if we want! It is our choice.

So:

We have an experiment (such as tossing a coin)


We give values to each event
The set of values is a Random Variable

Not Like an Algebra Variable


In Algebra a variable, like x, is an unknown value:

Example: x + 2 = 6

In this case we can find that x=4

But a Random Variable is different ...

A Random Variable has a whole set of values ...

... and it could take on any of those values, randomly.

Example: X = {0, 1, 2, 3}

X could be 0, 1, 2, or 3 randomly.

And they might each have a different probability.

Capital Letters
We use a capital letter, like X or Y, to avoid confusion with the Algebra type of variable.

Sample Space
A Random Variable's set of values is the Sample Space.
Example: Throw a die once

Random Variable X = "The score shown on the top face".

X could be 1, 2, 3, 4, 5 or 6

So the Sample Space is {1, 2, 3, 4, 5, 6}

Probability
We can show the probability of any one value using this style:

P(X = value) = probability of that value

Example (continued): Throw a die once

X = {1, 2, 3, 4, 5, 6}

In this case they are all equally likely, so the probability of any one is 1/6

P(X = 1) = 1/6
P(X = 2) = 1/6
P(X = 3) = 1/6
P(X = 4) = 1/6
P(X = 5) = 1/6
P(X = 6) = 1/6

Note that the sum of the probabilities = 1, as it should be.

Example: How many heads when we toss 3 coins?

X = "The number of Heads" is the Random Variable.


In this case, there could be 0 Heads (if all the coins land Tails up), 1 Head, 2 Heads or 3 Heads.

So the Sample Space = {0, 1, 2, 3}

But this time the outcomes are NOT all equally likely.

The three coins can land in eight possible ways:

Outcome   X = "number of Heads"
HHH       3
HHT       2
HTH       2
HTT       1
THH       2
THT       1
TTH       1
TTT       0

Looking at the table we see just 1 case of Three Heads, but 3 cases of Two Heads, 3 cases of
One Head, and 1 case of Zero Heads. So:

P(X = 3) = 1/8
P(X = 2) = 3/8
P(X = 1) = 3/8
P(X = 0) = 1/8
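
The same counting can be done by a short Python sketch (variable names are just for illustration): list every way the three coins can land, count the Heads in each, and divide by the total.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2 x 2 x 2 = 8 equally likely ways three coins can land.
outcomes = list(product("HT", repeat=3))

# X = number of Heads in each outcome.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (outcomes with x Heads) / (all outcomes).
distribution = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

for x in sorted(distribution, reverse=True):
    print(f"P(X = {x}) = {distribution[x]}")
```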

Example: Two dice are tossed.

The Random Variable is X = "The sum of the scores on the two dice".

Let's make a table of all possible values:

                 1st Die
            1   2   3   4   5   6
        1   2   3   4   5   6   7
        2   3   4   5   6   7   8
2nd     3   4   5   6   7   8   9
Die     4   5   6   7   8   9  10
        5   6   7   8   9  10  11
        6   7   8   9  10  11  12

There are 6 × 6 = 36 of them, and the Sample Space = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
Let's count how often each value occurs, and work out the probabilities:

2 occurs just once, so P(X = 2) = 1/36


3 occurs twice, so P(X = 3) = 2/36 = 1/18
4 occurs three times, so P(X = 4) = 3/36 = 1/12
5 occurs four times, so P(X = 5) = 4/36 = 1/9
6 occurs five times, so P(X = 6) = 5/36
7 occurs six times, so P(X = 7) = 6/36 = 1/6
8 occurs five times, so P(X = 8) = 5/36
9 occurs four times, so P(X = 9) = 4/36 = 1/9
10 occurs three times, so P(X = 10) = 3/36 = 1/12
11 occurs twice, so P(X = 11) = 2/36 = 1/18
12 occurs just once, so P(X = 12) = 1/36
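
If counting cells by hand feels error-prone, a small Python sketch can do it (the names `counts` and `pmf` are our own choice):

```python
from collections import Counter
from fractions import Fraction

# Add the scores for every (1st die, 2nd die) pair: 36 pairs in all.
counts = Counter(first + second
                 for first in range(1, 7)
                 for second in range(1, 7))
total = sum(counts.values())

# Probability of each sum, as a fraction in lowest terms.
pmf = {s: Fraction(c, total) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    print(f"{s} occurs {counts[s]} time(s), so P(X = {s}) = {p}")
```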

A Range of Values
We could also calculate the probability that a Random Variable takes on a range of values.

Example (continued) What is the probability that the sum of the scores is 5, 6, 7
or 8?

In other words: What is P(5 ≤ X ≤ 8)?

P(5 ≤ X ≤ 8) = P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) = (4+5+6+5)/36 = 20/36 = 5/9

Solving
We can also solve a Random Variable equation.

Example (continued) If P(X = x) = 1/12, what is the value of x?

P(X = 4) = 1/12, and P(X = 10) = 1/12

So there are two solutions: x = 4 or x = 10

Notice the different uses of X and x:

X is the Random Variable "The sum of the scores on the two dice".
x is a value that X can take.
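
Both the range question and the "solving" question can be checked with a short Python sketch. It rebuilds the two-dice probabilities first so that it runs on its own, and the variable names are only illustrative.

```python
from fractions import Fraction

# Probability of each sum of two dice (each of the 36 pairs has probability 1/36).
pmf = {}
for first in range(1, 7):
    for second in range(1, 7):
        pmf[first + second] = pmf.get(first + second, 0) + Fraction(1, 36)

# A range of values: add up the probabilities for 5, 6, 7 and 8.
p_range = sum(p for x, p in pmf.items() if 5 <= x <= 8)

# Solving: which values x have P(X = x) = 1/12 ?
solutions = sorted(x for x, p in pmf.items() if p == Fraction(1, 12))

print(p_range, solutions)
```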

Continuous
Random Variables can be either Discrete or Continuous :

Discrete Data can only take certain values (such as 1,2,3,4,5)


Continuous Data can take any value within a range (such as a person's height)

All our examples have been Discrete.

Learn more at Continuous Random Variables .

Mean, Variance, Standard Deviation


You can also learn how to find the Mean, Variance and Standard Deviation of Random
Variables .

Summary
A Random Variable is a set of possible values from a random experiment.

The set of possible values is called the Sample Space.

A Random Variable is given a capital letter, such as X or Z.

Random Variables can be discrete or continuous.

Random Variables - Continuous


A Random Variable is a set of possible values from a random experiment.

Example: Tossing a coin: we could get Heads or Tails.


Let's give them the values Heads=0 and Tails=1 and we have a Random Variable "X":

In short:

X = {0, 1}

Note: We could choose Heads=100 and Tails=150 or other values if we want! It is our choice.

Continuous
Random Variables can be either Discrete or Continuous :

Discrete Data can only take certain values (such as 1,2,3,4,5)


Continuous Data can take any value within a range (such as a person's height)

In our Introduction to Random Variables (please read that first!) we look at many examples of
Discrete Random Variables.

But here we look at the more advanced topic of Continuous Random Variables.

The Uniform Distribution


(Also called the Rectangular Distribution).

The Uniform Distribution has equal probability for all values of the Random variable
between a and b:
The probability of any value between a and b is p

We also know that p = 1/(b-a), because the total of all probabilities must be 1, so

the area of the rectangle = 1

p × (b-a) = 1

p = 1/(b-a)

We can write:

P(X = x) = 1/(b-a) for a ≤ x ≤ b


P(X = x) = 0 otherwise

Example: Old Faithful erupts every 91 minutes. You arrive there at random and
wait for 20 minutes ... what is the probability you will see it erupt?

This is actually easy to calculate, 20 minutes out of 91 minutes is:

p = 20/91 = 0.22 (to 2 decimals)

But let's use the Uniform Distribution for practice.


To find the probability between a and a+20, find the blue area:

Area = (1/91) x (a+20 - a)


= (1/91) x 20
= 20/91
= 0.22 (to 2 decimals)

So there is a 0.22 probability you will see Old Faithful erupt.

If you waited the full 91 minutes you would be sure (p=1) to have seen it erupt.

But remember this is a random thing! It might erupt the moment you arrive, or any time in the
91 minutes.

Cumulative Uniform Distribution


We can have the Uniform Distribution as a cumulative (adding up as it goes along) distribution:

The probability starts at 0 and builds up to 1


This type of thing is called a "Cumulative distribution function", often shortened to "CDF"

Example (continued):

Let's use the "CDF" of the previous Uniform Distribution to work out the probability:

At a+20 the probability has accumulated to about 0.22
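
As a sketch in Python, here is the Old Faithful example done with a cumulative function. It assumes the time to the next eruption is uniform between 0 and 91 minutes. The function name and the starting point `start` are hypothetical choices (any start that keeps the 20 minute window inside the 91 minutes gives the same answer).

```python
def uniform_cdf(x, a, b):
    """Probability accumulated up to x for a Uniform Distribution on a..b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0, 91          # one full eruption cycle, in minutes
start = 30            # hypothetical start of the 20 minute wait
p = uniform_cdf(start + 20, a, b) - uniform_cdf(start, a, b)

print(round(p, 2))
```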

Other Distributions

Knowing how to use the Uniform Distribution


helps when dealing with more complicated
distributions like this one:

The general name for any of these is probability density function or "pdf"

The Normal Distribution


The most important continuous distribution is the Standard Normal Distribution

It is so important the Random Variable has its own special letter Z.

The graph for Z is a symmetrical bell-shaped curve:


Usually we want to find the probability of Z being between certain values.

Example: P(0 < Z < 0.45)

(What is the probability that Z is between 0 and 0.45)

This is found by using the Standard Normal Distribution Table

Start at the row for 0.4, and read along until 0.45: there is the value
0.1736

P(0 < Z < 0.45) = 0.1736
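
If no table is handy, the same value can be sketched with Python's standard library (`statistics.NormalDist`, available from Python 3.8):

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # the Standard Normal Distribution

# Area under the curve between 0 and 0.45.
p = Z.cdf(0.45) - Z.cdf(0)

print(round(p, 4))
```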

Summary
A Random Variable is a variable whose possible values are numerical outcomes of a random
experiment.

Random Variables can be discrete or continuous.

An important example of a continuous Random Variable is the Standard Normal variable, Z.


Random Variables: Mean, Variance and Standard Deviation
A Random Variable is a set of possible values from a random experiment.

Example: Tossing a coin: we could get Heads or Tails.

Let's give them the values Heads=0 and Tails=1 and we have a Random Variable "X":

So:

We have an experiment (like tossing a coin)


We give values to each event
The set of values is a Random Variable

Learn more at Random Variables .

Mean, Variance and Standard Deviation

Example: Tossing a single unfair die

For fun, imagine a weighted die (cheating!) so we have these probabilities:

1 2 3 4 5 6

0.1 0.1 0.1 0.1 0.1 0.5


Mean or Expected Value:

When we know the probability p of every value x we can calculate the Expected Value (Mean) of
X:

μ = Σxp

Note: Σ is Sigma Notation, and means to sum up.

To calculate the Expected Value:

multiply each value by its probability


sum them up

Example continued:
x     1    2    3    4    5    6
p     0.1  0.1  0.1  0.1  0.1  0.5
xp    0.1  0.2  0.3  0.4  0.5  3

μ = Σxp = 0.1+0.2+0.3+0.4+0.5+3 = 4.5

The expected value is 4.5

Note: this is a weighted mean : values with higher probability have higher contribution to the
mean.

Variance: Var(X)

The Variance is:


Var(X) = Σx²p - μ²

To calculate the Variance:

square each value and multiply by its probability
sum them up and we get Σx²p
then subtract the square of the Expected Value μ²

Example continued:
x     1    2    3    4    5    6
p     0.1  0.1  0.1  0.1  0.1  0.5
x²p   0.1  0.4  0.9  1.6  2.5  18

Σx²p = 0.1+0.4+0.9+1.6+2.5+18 = 23.5

Var(X) = Σx²p - μ² = 23.5 - 4.5² = 3.25

The variance is 3.25

Standard Deviation:

The Standard Deviation is the square root of the Variance:

σ = √Var(X)

Example continued:
x     1    2    3    4    5    6
p     0.1  0.1  0.1  0.1  0.1  0.5
x²p   0.1  0.4  0.9  1.6  2.5  18

σ = √Var(X) = √3.25 = 1.803...

The Standard Deviation is 1.803...
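
The three steps (multiply and sum, subtract the squared mean, take the square root) can be sketched in Python like this. The variable names are just illustrative.

```python
from math import sqrt

values = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]   # the weighted die

# Mean: multiply each value by its probability, then sum.
mean = sum(x * p for x, p in zip(values, probs))

# Variance: sum of (value squared times probability), minus the mean squared.
variance = sum(x * x * p for x, p in zip(values, probs)) - mean ** 2

# Standard Deviation: square root of the Variance.
std_dev = sqrt(variance)

print(round(mean, 3), round(variance, 3), round(std_dev, 3))
```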

Let's have another example!

(Note that we run the table downwards instead of along this time.)

You plan to open a new McDougals Fried Chicken, and found these stats for
similar restaurants:
Percent Year's Earnings

20% $50,000 Loss

30% $0

40% $50,000 Profit

10% $150,000 Profit

Using that as probabilities for your new restaurant's profit, what is the Expected Value and
Standard Deviation?

The Random Variable is X = 'possible profit'.

Sum up xp and x²p:

Probability   Earnings ($'000s)
p             x        xp       x²p
0.2           -50      -10      500
0.3           0        0        0
0.4           50       20       1000
0.1           150      15       2250
Σp = 1                 Σxp = 25   Σx²p = 3750

μ = Σxp = 25

Var(X) = Σx²p - μ² = 3750 - 25² = 3750 - 625 = 3125

σ = √3125 = 56 (to nearest whole number)

But remember these are in thousands of dollars, so:

μ = $25,000
σ = $56,000

So you might expect to make $25,000, but with a very wide deviation possible.

Let's try that again, but with a much higher probability for $50,000:

Example (continued):

Now with different probabilities (the $50,000 value has a high probability of 0.7 now):

Probability   Earnings ($'000s)
p             x        xp       x²p
0.1           -50      -5       250
0.1           0        0        0
0.7           50       35       1750
0.1           150      15       2250
Σp = 1        Sums:    Σxp = 45   Σx²p = 4250

μ = Σxp = 45

Var(X) = Σx²p - μ² = 4250 - 45² = 4250 - 2025 = 2225

σ = √2225 = 47 (to nearest whole number)

In thousands of dollars:

μ = $45,000
σ = $47,000

The mean is now much closer to the most probable value.

And the standard deviation is a little smaller (showing that the values are more central.)
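
To compare the two restaurant scenarios side by side, here is a sketch with one small helper function (`mean_and_sd` is a made-up name) applied to both sets of probabilities. Earnings are in thousands of dollars, as in the tables.

```python
from math import sqrt

def mean_and_sd(dist):
    """dist maps each value x to its probability p."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum(x * x * p for x, p in dist.items()) - mean ** 2
    return mean, sqrt(variance)

first = {-50: 0.2, 0: 0.3, 50: 0.4, 150: 0.1}
second = {-50: 0.1, 0: 0.1, 50: 0.7, 150: 0.1}   # $50,000 is now much more likely

mean_1, sd_1 = mean_and_sd(first)
mean_2, sd_2 = mean_and_sd(second)

print(round(mean_1), round(sd_1))
print(round(mean_2), round(sd_2))
```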

Continuous
Random Variables can be either Discrete or Continuous :

Discrete Data can only take certain values (such as 1,2,3,4,5)


Continuous Data can take any value within a range (such as a person's height)

Here we looked only at discrete data, as finding the Mean, Variance and Standard Deviation of
continuous data needs Integration .

Summary
A Random Variable is a variable whose possible values are numerical outcomes of a random
experiment.

The Mean (Expected Value) is: μ = Σxp

The Variance is: Var(X) = Σx²p - μ²

The Standard Deviation is: σ = √Var(X)

The Binomial Distribution

"Bi" means "two" (like a bicycle has two wheels) ...


... so this is about things with two results.

Tossing a Coin:

Did we get Heads (H) or


Tails (T)

We say the probability of the coin landing H is ½


And the probability of the coin landing T is ½

Throwing a Die:

Did we get a four ... ?


... or not?

We say the probability of a four is 1/6 (one of the six faces is a four).
And the probability of not four is 5/6 (five of the six faces are not a four)

Let's Toss a Coin!


Toss a fair coin three times ... what is the chance of getting two Heads?
Tossing a coin three times (H is for heads, T for Tails) can get any of these 8 outcomes :

HHH

HHT

HTH

HTT

THH

THT

TTH

TTT

Which outcomes do we want?

"Two Heads" could be in any order: "HHT", "THH" and "HTH" all have two Heads (and one Tail).

So 3 of the outcomes produce "Two Heads".


What is the probability of each outcome?

Each outcome is equally likely, and there are 8 of them. So each has a probability of 1/8

So the probability of event "Two Heads" is:

(Number of outcomes we want) × (Probability of each outcome)

3 × 1/8 = 3/8

We used special words:

Outcome: the result of three coin tosses (8 different possibilities)


Event: "Two Heads" out of three coin tosses (3 possibilities)

Let's Calculate Them All:


The calculations are (P means "Probability of"):

P(Three Heads) = P(HHH) = 1/8


P(Two Heads) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
P(One Head) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8
P(Zero Heads) = P(TTT) = 1/8

We can write this in terms of a Random Variable , X, = "The number of Heads from 3 tosses of a
coin":

P(X = 3) = 1/8
P(X = 2) = 3/8
P(X = 1) = 3/8
P(X = 0) = 1/8

And we can also draw a Bar Graph :


It is symmetrical!

Making a Formula
Now ... what are the chances of 5 heads in 9 tosses? Listing all 512 outcomes will take a long
time!

So let's make a formula.

In our previous example, how could we get the values 1, 3, 3 and 1 ?

They are actually in the third row of Pascal's Triangle ... !

Can we make them using a formula?

Sure we can, and here it is:

n! / ( k!(n-k)! )

n = total number
k = number we want

It is often called "n choose k" and you can read more
about it at Combinations and Permutations.

Note: the "!" means "factorial", for example 4! = 1×2×3×4 = 24

Let's use it:

Example: 3 tosses getting 2 Heads

We have n=3 and k=2

n! / ( k!(n-k)! ) = 3! / ( 2!(3-2)! ) = (3×2×1) / (2×1 × 1) = 3

So there are 3 outcomes for "2 Heads"

(We knew that already, but now we have a formula for it.)

Let's use it for a harder question:

Example: what are the chances of 5 heads in 9 tosses?

We have n=9 and k=5

n! / ( k!(n-k)! ) = 9! / ( 5!(9-5)! ) = (9×8×7×6×5×4×3×2×1) / (5×4×3×2×1 × 4×3×2×1) = 126

And for 9 tosses there are 2⁹ = 512 total outcomes, so we get the probability:

(Number of outcomes we want) × (Probability of each outcome)

126 × 1/512 = 126/512

P(X=5) = 126/512 = 63/256 = 0.24609375

About a 25% chance.

(Easier than listing them all.)
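
A quick Python sketch of the same calculation. The helper `n_choose_k` is our own name. Python 3.8 and later also has `math.comb`, which should give the same count.

```python
from math import comb, factorial

def n_choose_k(n, k):
    """Number of ways to pick k things out of n: n! / (k!(n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

ways = n_choose_k(9, 5)        # outcomes with exactly 5 Heads
total = 2 ** 9                 # all outcomes of 9 tosses
probability = ways / total

print(ways, total, probability)
```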

Bias!
So far the chances of success or failure have been equally likely.

But what if the coins are biased (land more on one side than another), or the choices are not 50/50?

Example: You sell sandwiches. 70% of people choose chicken, the rest choose
pork.

What is the probability of selling 2 chicken sandwiches to the next 3 customers?

This is just like the heads and tails example, but with 70/30 instead of 50/50.

Let's draw a tree diagram :


The "Two Chicken" cases are highlighted.

Notice that the probabilities for "two chickens" all work out to be 0.147 , because we are
multiplying two 0.7s and one 0.3 in each case.

Can we get the 0.147 from a formula? What we want is "two 0.7s and one 0.3"

0.7 is the probability of each choice we want, call it p


2 is the number of choices we want, call it k

Probability of "choices we want" (two chickens) is: p^k

And

The probability of the opposite choice is: 1-p
The total number of choices is: n
The number of opposite choices is: n-k

Probability of "opposite choices" (one pork) is: (1-p)^(n-k)

So all choices together is:

p^k (1-p)^(n-k)

Example: (continued)

p = 0.7 (chance of chicken)


n=3
k=2

So we get:

p^k (1-p)^(n-k) = 0.7² × (1-0.7)^(3-2) = 0.7² × 0.3¹ = 0.7 × 0.7 × 0.3 = 0.147

which is the probability of each outcome.

And the total number of those outcomes is:

n! / ( k!(n-k)! ) = 3! / ( 2!(3-2)! ) = (3×2×1) / (2×1 × 1) = 3

And we get:

(Number of outcomes we want) × (Probability of each outcome)

3 × 0.147 = 0.441

So the probability of event "2 people out of 3 choose chicken" = 0.441

OK. That was a lot of work for something we knew already, but now we can answer harder
questions.

Example: You say "70% choose chicken, so 7 of the next 10 customers should
choose chicken" ... what are the chances you are right?

p = 0.7
n = 10
k=7

So we get:
p^k (1-p)^(n-k) = 0.7⁷ × (1-0.7)^(10-7) = 0.7⁷ × 0.3³ = 0.0022235661

That is the probability of each outcome.

And the total number of those outcomes is:

n! / ( k!(n-k)! ) = 10! / ( 7!(10-7)! )

= (10×9×8×7×6×5×4×3×2×1) / (7×6×5×4×3×2×1 × 3×2×1)

= (10×9×8) / (3×2×1) = 120

And we get:

(Number of outcomes we want) × (Probability of each outcome)

120 × 0.0022235661 = 0.266827932

In fact the probability of 7 out of 10 choosing chicken is only about 27%

Moral of the story: even though the long-run average is 70%, don't expect 7 out of the next 10.

Putting it Together
Now we know how to calculate how many:

n! / ( k!(n-k)! )

And the probability of each:

p^k (1-p)^(n-k)

We can multiply them together:

Probability of k out of n ways:

P(k out of n) = n! / ( k!(n-k)! ) × p^k (1-p)^(n-k)

The General Binomial Probability Formula

Important Notes:

The trials are independent,


There are only two possible outcomes at each trial,
The probability of "success" at each trial is constant.
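
Written as a small Python function (a sketch, with a function name of our own choosing), the general formula reproduces both sandwich examples:

```python
from math import comb

def binomial_probability(k, n, p):
    """P(k out of n) = (number of ways) x (probability of each way)."""
    ways = comb(n, k)                     # n! / (k!(n-k)!)
    each = p ** k * (1 - p) ** (n - k)    # p^k (1-p)^(n-k)
    return ways * each

two_of_three = binomial_probability(2, 3, 0.7)
seven_of_ten = binomial_probability(7, 10, 0.7)

print(round(two_of_three, 3), round(seven_of_ten, 4))
```

It only makes sense when the three notes above hold: independent trials, two outcomes, and a constant p.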

Quincunx

Have a play with the Quincunx (then read Quincunx Explained ) to see the Binomial
Distribution in action.
Throw the Die

A fair die is thrown four times. Calculate the probabilities of getting:

0 Twos
1 Two
2 Twos
3 Twos
4 Twos

In this case n=4, p = P(Two) = 1/6

X is the Random Variable "Number of Twos from four throws".

Substitute k = 0 to 4 into the formula:

P(k out of n) = n! / ( k!(n-k)! ) × p^k (1-p)^(n-k)

Like this (to 4 decimal places):

P(X = 0) = (4!/0!4!) × (1/6)⁰(5/6)⁴ = 1 × 1 × (5/6)⁴ = 0.4823
P(X = 1) = (4!/1!3!) × (1/6)¹(5/6)³ = 4 × (1/6) × (5/6)³ = 0.3858
P(X = 2) = (4!/2!2!) × (1/6)²(5/6)² = 6 × (1/6)² × (5/6)² = 0.1157
P(X = 3) = (4!/3!1!) × (1/6)³(5/6)¹ = 4 × (1/6)³ × (5/6) = 0.0154
P(X = 4) = (4!/4!0!) × (1/6)⁴(5/6)⁰ = 1 × (1/6)⁴ × 1 = 0.0008

Summary: "for the 4 throws, there is a 48% chance of no twos, 39% chance of 1 two, 12%
chance of 2 twos, 1.5% chance of 3 twos, and a tiny 0.08% chance of all throws being a two
(but it still could happen!)"
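
As a check, here is a sketch that works out all five probabilities exactly, using fractions, and confirms they add up to 1. The names are illustrative only.

```python
from fractions import Fraction
from math import comb

n = 4
p = Fraction(1, 6)    # P(Two) on a fair die

# Exact probability of k Twos in 4 throws, for k = 0 to 4.
pmf = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

for k, prob in pmf.items():
    print(f"P(X = {k}) = {prob} = {float(prob):.4f}")

print("Total:", sum(pmf.values()))
```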

This time the Bar Graph is not symmetrical:


It is not symmetrical!

It is skewed because p is not 0.5

Sports Bikes
Your company makes sports bikes. 90% pass final inspection (and 10% fail and need to be
fixed).

What is the expected Mean and Variance of the 4 next inspections?

First, let's calculate all probabilities.

n = 4,
p = P(Pass) = 0.9

X is the Random Variable "Number of passes from four inspections".

Substitute k = 0 to 4 into the formula:

P(k out of n) = n! / ( k!(n-k)! ) × p^k (1-p)^(n-k)

Like this:

P(X = 0) = (4!/0!4!) × 0.9⁰ × 0.1⁴ = 1 × 1 × 0.0001 = 0.0001
P(X = 1) = (4!/1!3!) × 0.9¹ × 0.1³ = 4 × 0.9 × 0.001 = 0.0036
P(X = 2) = (4!/2!2!) × 0.9² × 0.1² = 6 × 0.81 × 0.01 = 0.0486
P(X = 3) = (4!/3!1!) × 0.9³ × 0.1¹ = 4 × 0.729 × 0.1 = 0.2916
P(X = 4) = (4!/4!0!) × 0.9⁴ × 0.1⁰ = 1 × 0.6561 × 1 = 0.6561

Summary: "for the 4 next bikes, there is a tiny 0.01% chance of no passes, 0.36% chance of 1
pass, 5% chance of 2 passes, 29% chance of 3 passes, and a whopping 66% chance they all
pass the inspection."

Mean, Variance and Standard Deviation


Let's calculate the Mean , Variance and Standard Deviation for the Sports Bike inspections.

There are (relatively) simple formulas for them. They are a little hard to prove, but they do
work!

The mean, or "expected value", is:

μ = np

For the sports bikes:

μ = 4 × 0.9 = 3.6

So we can expect 3.6 bikes (out of 4) to pass the inspection.


Makes sense really ... 0.9 chance for each bike times 4 bikes equals 3.6

The formula for Variance is:

Variance: σ² = np(1-p)

And Standard Deviation is the square root of variance:

σ = √(np(1-p))

For the sports bikes:

Variance: σ² = 4 × 0.9 × 0.1 = 0.36

Standard Deviation is:

σ = √(0.36) = 0.6

Note: we could also calculate them manually, by making a table like this:

X     P(X)     X × P(X)   X² × P(X)
0     0.0001   0          0
1     0.0036   0.0036     0.0036
2     0.0486   0.0972     0.1944
3     0.2916   0.8748     2.6244
4     0.6561   2.6244     10.4976
SUM:           3.6        13.32

The mean is the Sum of (X × P(X)):

μ = 3.6

The variance is the Sum of (X² × P(X)) minus Mean²:

Variance: σ² = 13.32 - 3.6² = 0.36

Standard Deviation is:

σ = √(0.36) = 0.6

And we got the same results as before (yay!)
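
Here is a sketch that does it both ways for the sports bikes, first with the short formulas and then "manually" from the table of probabilities, and checks that they agree (allowing for tiny floating-point rounding). Variable names are our own.

```python
from math import comb, isclose, sqrt

n, p = 4, 0.9

# The short formulas.
mean_formula = n * p
variance_formula = n * p * (1 - p)

# The long way: build P(X = k) for k = 0..4, then sum k*P and k^2*P.
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
mean_table = sum(k * prob for k, prob in enumerate(pmf))
variance_table = sum(k * k * prob for k, prob in enumerate(pmf)) - mean_table ** 2

print(round(mean_table, 4), round(variance_table, 4), round(sqrt(variance_table), 4))
print(isclose(mean_formula, mean_table), isclose(variance_formula, variance_table))
```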

Summary
The General Binomial Probability Formula
P(k out of n) = n! / ( k!(n-k)! ) × p^k (1-p)^(n-k)

Mean value of X: μ = np

Variance of X: σ² = np(1-p)
Standard Deviation of X: σ = √(np(1-p))

Normal Distribution
Data can be "distributed" (spread out) in different ways.

It can be spread out


more on the left Or more on the right

Or it can be all jumbled up

But there are many cases where the data tends to be around a central value with no bias left or
right, and it gets close to a "Normal Distribution" like this:
A Normal Distribution

The "Bell Curve" is a Normal Distribution.


And the yellow histogram shows some data that
follows it closely, but not perfectly (which is usual).

It is often called a "Bell Curve"


because it looks like a bell.

Many things closely follow a Normal Distribution:

heights of people
size of things produced by machines
errors in measurements
blood pressure
marks on a test

We say the data is "normally distributed":

The Normal Distribution has:

mean = median = mode

symmetry about the center

50% of values less than the mean


and 50% greater than the mean

Quincunx
You can see a normal distribution being created by random chance!

It is called the Quincunx and it is an amazing machine.

Have a play with it!

Standard Deviations
The Standard Deviation is a measure of how spread out numbers are (read that page for details
on how to calculate it).

When we calculate the standard deviation we find that (generally):

68% of values are within


1 standard deviation of the mean

95% of values are within


2 standard deviations of the mean

99.7% of values are within


3 standard deviations of the mean

Example: 95% of students at school are between 1.1m and 1.7m tall.
Assuming this data is normally distributed can you calculate the mean and standard deviation?

The mean is halfway between 1.1m and 1.7m:

Mean = (1.1m + 1.7m) / 2 = 1.4m

95% is 2 standard deviations either side of the mean (a total of 4 standard deviations) so:

1 standard deviation = (1.7m-1.1m) / 4

= 0.6m / 4

= 0.15m

And this is the result:

It is good to know the standard deviation, because we can say that any value is:

likely to be within 1 standard deviation (68 out of 100 should be)


very likely to be within 2 standard deviations (95 out of 100 should be)
almost certainly within 3 standard deviations (997 out of 1000 should be)

Standard Scores
The number of standard deviations from the mean is also called the "Standard Score",
"sigma" or "z-score". Get used to those words!

Example: In that same school one of your friends is 1.85m tall


You can see on the bell curve that 1.85m is 3 standard
deviations from the mean of 1.4, so:

Your friend's height has a "z-score" of 3.0

It is also possible to calculate how many standard deviations 1.85 is from the mean

How far is 1.85 from the mean?

It is 1.85 - 1.4 = 0.45m from the mean

How many standard deviations is that? The standard deviation is 0.15m, so:

0.45m / 0.15m = 3 standard deviations

So to convert a value to a Standard Score ("z-score"):

first subtract the mean,


then divide by the Standard Deviation

And doing that is called "Standardizing":

We can take any Normal Distribution and convert it to The Standard Normal Distribution.

Example: Travel Time

A survey of daily travel time had these results (in minutes):

26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34

The Mean is 38.8 minutes, and the Standard Deviation is 11.4 minutes (you can copy and
paste the values into the Standard Deviation Calculator if you want).

Convert the values to z-scores ("standard scores").


To convert 26:

first subtract the mean: 26 - 38.8 = -12.8,

then divide by the Standard Deviation: -12.8/11.4 = -1.12

So 26 is -1.12 Standard Deviations from the Mean

Here are the first three conversions

Standard Score
Original Value Calculation
(z-score)

26 (26-38.8) / 11.4 = -1.12

33 (33-38.8) / 11.4 = -0.51

65 (65-38.8) / 11.4 = +2.30

... ... ...

And here they are graphically:

You can calculate the rest of the z-scores yourself!
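
Or let Python calculate them, as in this sketch. It assumes the 11.4 quoted above is the population standard deviation (`statistics.pstdev`), which does reproduce that figure for this data.

```python
from statistics import mean, pstdev

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mu = mean(times)        # the Mean
sigma = pstdev(times)   # the (population) Standard Deviation

# Standardize: first subtract the mean, then divide by the Standard Deviation.
z_scores = [(t - mu) / sigma for t in times]

for t, z in zip(times, z_scores):
    print(f"{t}: {z:+.2f}")
```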

Here is the formula for z-score that we have been using:

z = (x - μ) / σ

z is the "z-score" (Standard Score)
x is the value to be standardized
μ is the mean
σ is the standard deviation

Why Standardize ... ?


It can help us make decisions about our data.

Example: Professor Willoughby is marking a test.

Here are the students' results (out of 60 points):

20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17

Most students didn't even get 30 out of 60, and most will fail.

The test must have been really hard, so the Prof decides to Standardize all the scores and only
fail people 1 standard deviation below the mean.

The Mean is 23, and the Standard Deviation is 6.6, and these are the Standard Scores:

-0.45, -1.21 , 0.45, 1.36, -0.76, 0.76, 1.82, -1.36 , 0.45, -0.15, -0.91

Only 2 students will fail (the ones who scored 15 and 14 on the test)
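
The Prof's rule can be sketched in a few lines of Python. As an assumption, the population standard deviation (`pstdev`) is used, which gives about 6.6 for these marks. The list name `failing` is our own.

```python
from statistics import mean, pstdev

scores = [20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17]

mu = mean(scores)
sigma = pstdev(scores)

# Fail only the students more than 1 standard deviation below the mean.
failing = [s for s in scores if (s - mu) / sigma < -1]

print(mu, round(sigma, 1), failing)
```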

It also makes life easier because we only need one table (the Standard Normal Distribution
Table ), rather than doing calculations individually for each value of mean and standard
deviation.

In More Detail
Here is the Standard Normal Distribution with percentages for every half of a standard
deviation, and cumulative percentages:
Example: Your score in a recent test was 0.5 standard deviations above the average, how
many people scored lower than you did?
Between 0 and 0.5 is 19.1%
Less than 0 is 50% (left half of the curve)

So the total less than you is:

50% + 19.1% = 69.1%

In theory 69.1% scored less than you did (but with real data the percentage may be different)

A Practical Example: Your company packages sugar in 1 kg


bags.
When you weigh a sample of bags you get these results:

1007g, 1032g, 1002g, 983g, 1004g, ... (a hundred measurements)


Mean = 1010g
Standard Deviation = 20g

Some values are less than 1000g ... can you fix that?

The normal distribution of your measurements looks like this:


31% of the bags are less than 1000g,
which is cheating the customer!

It is a random thing, so we can't stop bags having less than 1000g, but we can try to reduce
it a lot.

Let's adjust the machine so that 1000g is:

at -3 standard deviations:

From the big bell curve above we see that 0.1% are less. But maybe that is too small.

at -2.5 standard deviations:

Below -3 is 0.1% and between -3 and -2.5 standard deviations is 0.5%, together that is 0.1% +
0.5% = 0.6% (a good choice I think)

So let us adjust the machine to have 1000g at -2.5 standard deviations from the mean.

Now, we can adjust it to:

increase the amount of sugar in each bag (which changes the mean), or
make it more accurate (which reduces the standard deviation)

Let us try both.

ADJUST THE MEAN AMOUNT IN EACH BAG

The standard deviation is 20g, and we need 2.5 of them:


2.5 × 20g = 50g

So the machine should average 1050g, like this:

ADJUST THE ACCURACY OF THE MACHINE

Or we can keep the same mean (of 1010g), but then we need 2.5 standard deviations to be
equal to 10g:

10g / 2.5 = 4g

So the standard deviation should be 4g, like this:

(We hope the machine is that accurate!)

Or perhaps we could have some combination of better accuracy and slightly larger average size,
I will leave that up to you!
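
Both adjustments can be checked with a sketch like this one, which assumes the bag weights really are normally distributed and uses Python's `statistics.NormalDist`. The variable names are hypothetical.

```python
from statistics import NormalDist

limit = 1000           # grams promised on the bag
mean, sd = 1010, 20    # measured from the sample

# Share of bags under 1000g before any adjustment.
under_now = NormalDist(mean, sd).cdf(limit)

z = 2.5   # standard deviations we want between 1000g and the mean

# Option 1: keep the spread, raise the mean.
new_mean = limit + z * sd
# Option 2: keep the mean, make the machine more accurate.
new_sd = (mean - limit) / z

under_option_1 = NormalDist(new_mean, sd).cdf(limit)
under_option_2 = NormalDist(mean, new_sd).cdf(limit)

print(round(under_now, 2), new_mean, new_sd)
print(round(under_option_1, 3), round(under_option_2, 3))
```

Either way, 1000g ends up 2.5 standard deviations below the mean, so the share of underweight bags comes out the same for both options.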

More Accurate Values ...


Use the Standard Normal Distribution Table when you want more accurate values.

Standard Normal Distribution Table


This is the "bell-shaped" curve of the Standard Normal Distribution.


It is a Normal Distribution with mean 0 and standard deviation 1.

It shows you the percent of population:

between 0 and Z (option "0 to Z")


less than Z (option "Up to Z")
greater than Z (option "Z onwards")

It only displays values to 0.01%

The Table
You can also use the table below. The table shows the area from 0 to Z.

Instead of one LONG table, we have put the "0.1"s running down, then the "0.01"s running
along. (Example of how to use is below)

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359

0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753

0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141

0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224

0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549

0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852

0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133

0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545

1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952

2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Example: Percent of Population Between 0 and 0.45

Start at the row for 0.4, and read along until 0.45: there is the value 0.1736

And 0.1736 is 17.36%

So 17.36% of the population are between 0 and 0.45 Standard Deviations from the
Mean.

Because the curve is symmetrical, the same table can be used for values going either direction,
so a negative 0.45 also has an area of 0.1736
Example: Percent of Population Z Between -1 and 2

From -1 to 0 is the same as from 0 to +1:

At the row for 1.0, first column 1.00, there is the value 0.3413

From 0 to +2 is:

At the row for 2.0, first column 2.00, there is the value 0.4772

Add the two to get the total between -1 and 2:

0.3413 + 0.4772 = 0.8185

And 0.8185 is 81.85%

So 81.85% of the population are between -1 and +2 Standard Deviations from the
Mean.
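
For comparison, here is a sketch of a function that behaves like the table (the name `area_0_to_z` is made up), using Python's `statistics.NormalDist` and the same symmetry trick for negative values. Because it adds unrounded areas, the total comes out as 0.8186 rather than the 0.8185 we get from adding two rounded table entries.

```python
from statistics import NormalDist

Z = NormalDist()   # mean 0, standard deviation 1

def area_0_to_z(z):
    """Area between 0 and z; the curve is symmetrical, so the sign of z is ignored."""
    return Z.cdf(abs(z)) - 0.5

left = area_0_to_z(-1)    # from -1 to 0
right = area_0_to_z(2)    # from 0 to +2
total = left + right

print(round(left, 4), round(right, 4), round(total, 4))
```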

Skewed Data
Data can be "skewed", meaning it tends to have a long tail on one side or the other:

Negative Skew No Skew Positive Skew


Negative Skew?
Why is it called negative skew? Because the long "tail" is on the negative side of the
peak.

People sometimes say it is "skewed to the left" (the long tail is on the left hand side)

The mean is also on the left of the peak.

The Normal Distribution has No Skew


A Normal Distribution is not skewed.

It is perfectly symmetrical.

And the Mean is exactly at the peak.

Positive Skew
And positive skew is when the long tail is on the positive side of the peak, and some
people say it is "skewed to the right".

The mean is on the right of the peak value.

Example: Income Distribution


Here is some data extracted from a recent Census.

As you can see it is positively skewed ... in fact the tail continues way past $100,000
Calculating Skewness
"Skewness" (the amount of skew) can be calculated, for example you could use the SKEW()
function in Excel or OpenOffice Calc.
