
Data Analytics - 1

Chapter 1
Data and Statistics


Statistics

Applications in Business and Economics

Data

Data Sources

Descriptive Statistics
Statistical Inference
Computers and Statistical Analysis

Data Mining

Ethical Guidelines for Statistical Practice




Statistics
The term statistics can refer to numerical facts such as
averages, medians, percents, and index numbers that
help us understand a variety of business and economic
situations.
Statistics can also refer to the art and science of
collecting, analyzing, presenting, and interpreting
data.

The goal of statistics is to make us informed users of
numerical information. It helps us make better
decisions.

Applications in
Business and Economics


Accounting
Public accounting firms use statistical sampling
procedures when conducting audits for their clients.
Economics
Economists use statistical information in making
forecasts about the future of the economy or some
aspect of it.

Finance

Financial advisors use price-earnings ratios and


dividend yields to guide their investment advice.

Applications in
Business and Economics
Marketing

Electronic point-of-sale scanners at retail checkout
counters are used to collect data for a variety of
marketing research applications.
Production

A variety of statistical quality control charts are used


to monitor the output of a production process.

Data and Data Sets




Data are the facts and figures collected, analyzed,


and summarized for presentation and interpretation.

All the data collected in a particular study are referred
to as the data set for the study.

Elements, Variables, and Observations


Elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.
The set of measurements obtained for a particular
element is called an observation.
A data set with n elements contains n observations.
The total number of data values in a complete data
set is the number of elements multiplied by the
number of variables.

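Data, Data Sets,
Elements, Variables, and Observations

The companies are the elements (element names); Stock Exchange,
Annual Sales, and Earn/Share are the variables; each row is an
observation; all rows together form the data set.

Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        NQ                73.10               0.86
EnergySouth    N                 74.00               1.67
Keystone       N                365.70               0.86
LandCare       NQ               111.40               0.33
Psychemedics   N                 17.60               0.13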
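
As a rough illustration of the counting rule above (data values = elements x variables), the table can be held in a small Python dictionary. The key names such as annual_sales_musd are hypothetical labels chosen for this sketch, not names from the slides.

```python
# Data set from the table above: one observation (row) per element (company).
# Key names are illustrative labels, not taken from the slides.
data_set = {
    "Dataram":      {"exchange": "NQ", "annual_sales_musd": 73.10,  "earn_per_share": 0.86},
    "EnergySouth":  {"exchange": "N",  "annual_sales_musd": 74.00,  "earn_per_share": 1.67},
    "Keystone":     {"exchange": "N",  "annual_sales_musd": 365.70, "earn_per_share": 0.86},
    "LandCare":     {"exchange": "NQ", "annual_sales_musd": 111.40, "earn_per_share": 0.33},
    "Psychemedics": {"exchange": "N",  "annual_sales_musd": 17.60,  "earn_per_share": 0.13},
}

n_elements = len(data_set)                        # number of companies
n_variables = len(next(iter(data_set.values())))  # variables recorded per company
n_data_values = sum(len(obs) for obs in data_set.values())

# Elements multiplied by variables should equal the count of data values.
print(n_elements, n_variables, n_data_values)
```

With n elements there are also n observations, so len(data_set) serves as both counts here.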

Scales of Measurement
Scales of measurement include:
Nominal
Ordinal
Interval
Ratio

The scale determines the amount of information
contained in the data.
The scale indicates the data summarization and
statistical analyses that are most appropriate.

Scales of Measurement


Nominal
Data are labels or names used to identify an
attribute of the element.

A nonnumeric label or numeric code may be used.

Scales of Measurement


Nominal
Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g., 1 denotes Business,
2 denotes Humanities, 3 denotes Education, and
so on).

Scales of Measurement


Ordinal

The data have the properties of nominal data and
the order or rank of the data is meaningful.

A nonnumeric label or numeric code may be used.

Scales of Measurement


Ordinal
Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Undergraduate, Post Graduate, or Doctoral.
Alternatively, a numeric code could be used for
the class standing variable (e.g., 1 denotes
Undergraduate, 2 denotes Post Graduate, and so on).

Scales of Measurement


Interval
The data have the properties of ordinal data, and
the interval between observations is expressed in
terms of a fixed unit of measure.
Interval data are always numeric.

One unit on this scale is the same size anywhere
along the scale, so values can be treated
mathematically (e.g., averaged), but zero on the scale
does not indicate a total absence of the variable
being measured.

Scales of Measurement


Interval
Example:
Aditi has an SAT score of 1205, while Arun
has an SAT score of 1090. Aditi scored 115
points more than Arun.

Scales of Measurement


Ratio
The data have all the properties of interval data
and the ratio of two values is meaningful.
Variables such as distance, height, weight, and time
use the ratio scale.
This scale must contain a zero value that indicates
that nothing exists for the variable at the zero point.

Scales of Measurement


Ratio
Example:
Ganesh's college record shows 36 credit hours
earned, while Govind's record shows 72 credit
hours earned. Govind has twice as many credit
hours earned as Ganesh.

Categorical and Quantitative Data


Data can be further classified as being categorical
or quantitative.
The statistical analysis that is appropriate depends
on whether the data for the variable are categorical
or quantitative.
In general, there are more alternatives for statistical
analysis when the data are quantitative.

Categorical Data
Labels or names used to identify an attribute of
each element
Often referred to as qualitative data
Use either the nominal or ordinal scale of
measurement
Can be either numeric or nonnumeric
Appropriate statistical analyses are rather limited

Quantitative Data
Quantitative data indicate how many or how much:
discrete, if measuring how many
continuous, if measuring how much
Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful for
quantitative data.

Scales of Measurement

Data
  Categorical
    Numeric: Nominal, Ordinal
    Non-numeric: Nominal, Ordinal
  Quantitative
    Numeric: Interval, Ratio

Cross-Sectional Data
Cross-sectional data are collected at the same or
approximately the same point in time.
Example: data detailing the number of building
permits issued in February 2010 in each of the
wards of Coimbatore

Time Series Data


Time series data are collected over several time
periods.
Example: data detailing the number of building
permits issued in Ettimadai village in each of
the last 36 months


Data Sources


Existing Sources

Internal company records - almost any department
Business database services - India Business Insight Database
Government agencies - Ministry of Labour and Employment
Industry associations - NASSCOM
Independent organizations - Centre for Monitoring Indian Economy (CMIE)
Internet - more and more firms

Data Sources


Data Available From Internal Company Records


Record               Some of the Data Available
Employee records     name, address, social security number
Production records   part number, quantity produced,
                     direct labor cost, material cost
Inventory records    part number, quantity in stock,
                     reorder level, economic order quantity
Sales records        product number, sales volume, sales
                     volume by region
Credit records       customer name, credit limit, accounts
                     receivable balance
Customer profile     age, gender, income, household size

Data Sources


Statistical Studies - Experimental


In experimental studies the variable of interest is
first identified. Then one or more other variables
are identified and controlled so that data can be
obtained about how they influence the variable of
interest.
The largest experimental study ever conducted is
believed to be the 1954 Public Health Service
experiment for the Salk polio vaccine. Nearly two
million U.S. children (grades 1-3) were selected.

Data Sources


Statistical Studies - Observational


In observational (nonexperimental) studies no
attempt is made to control or influence the
variables of interest.
A survey is a good example.
Studies of smokers and nonsmokers are
observational studies because researchers
do not determine or control
who will smoke and who will not smoke.


Data Acquisition Considerations


Time Requirement

Searching for information can be time consuming.


Information may no longer be useful by the time
it is available.

Cost of Acquisition

Organizations often charge for information even


when it is not their primary business activity.

Data Errors

Using any data that happen to be available or were


acquired with little care can lead to misleading
information.

Descriptive Statistics
Most of the statistical information in newspapers,
magazines, company reports, and other publications
consists of data that are summarized and presented
in a form that is easy to understand.

Such summaries of data, which may be tabular,
graphical, or numerical, are referred to as descriptive
statistics.

Example: Hudson Auto Repair


The manager of Hudson Auto would like to have a
better understanding of the cost of parts used in the
engine tune-ups performed in her shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.

Example: Hudson Auto Repair




Sample of Parts Cost ($) for 50 Tune-ups

 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73

Tabular Summary:
Frequency and Percent Frequency

Example: Hudson Auto

Parts Cost ($)   Frequency   Percent Frequency
50-59                2               4
60-69               13              26
70-79               16              32
80-89                7              14
90-99                7              14
100-109              5              10
Total               50             100

For example, the percent frequency of the 50-59 class is (2/50)100 = 4.

Graphical Summary: Histogram


Example: Hudson Auto
[Figure: histogram of tune-up parts cost; horizontal axis: Parts Cost ($) by class; vertical axis: Frequency]

Numerical Descriptive Statistics


The most common numerical descriptive statistic
is the average (or mean).
The average is a measure of the central
tendency, or central location, of the data for a variable.
Hudson's average cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
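
The average can be checked with a few lines of Python. This is only a sketch using the 50 parts costs listed in the Hudson Auto sample; the variable names are illustrative.

```python
# The 50 parts costs ($) from the Hudson Auto sample, read row by row.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Mean = sum of the values divided by the number of values.
mean_cost = sum(parts_cost) / len(parts_cost)
print(f"Average parts cost: ${mean_cost:.2f}")
```

The unrounded mean is slightly below 79; the slide reports it to the nearest dollar.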

Statistical Inference
Population - the set of all elements of interest in a
particular study
Sample - a subset of the population
Statistical inference - the process of using data obtained
from a sample to make estimates
and test hypotheses about the
characteristics of a population
Census - collecting data for the entire population
Sample survey - collecting data for a sample

Process of Statistical Inference


1. Population consists of all tune-ups. Average
cost of parts is unknown.

2. A sample of 50 engine tune-ups is examined.

3. The sample data provide a sample average parts
cost of $79 per tune-up.

4. The sample average is used to estimate the
population average.

Computers and Statistical Analysis


Statisticians often use computer software to perform
the statistical computations required with large
amounts of data.


Data Warehousing
Organizations obtain large amounts of data on a
daily basis by means of magnetic card readers, bar
code scanners, point of sale terminals, and touch
screen monitors.
Wal-Mart captures data on 20-30 million transactions
per day.
Visa processes 6,800 payment transactions per second.
Capturing, storing, and maintaining the data, referred
to as data warehousing, is a significant undertaking.

Data Mining
Analysis of the data in the warehouse might aid in
decisions that will lead to new strategies and higher
profits for the organization.
Using a combination of procedures from statistics,
mathematics, and computer science, analysts mine
the data to convert it into useful information.
The most effective data mining systems use automated
procedures to discover relationships in the data and
predict future outcomes, prompted by only general,
even vague, queries by the user.


Data Mining Applications


The major applications of data mining have been
made by companies with a strong consumer focus
such as retail, financial, and communication firms.
Data mining is used to identify related products that
customers who have already purchased a specific
product are also likely to purchase (and then pop-ups
are used to draw attention to those related products).
As another example, data mining is used to identify
customers who should receive special discount offers
based on their past purchasing volumes.

Data Mining Requirements


Statistical methodology such as multiple regression,
logistic regression, and correlation are heavily used.

Also needed are computer science technologies


involving artificial intelligence and machine learning.

A significant investment in time and money is
required as well.

Data Mining Model Reliability


Finding a statistical model that works well for a
particular sample of data does not necessarily mean
that it can be reliably applied to other data.
With the enormous amount of data available, the
data set can be partitioned into a training set (for
model development) and a test set (for validating
the model).
There is, however, a danger of overfitting the model
to the point that misleading associations and
conclusions appear to exist.
Careful interpretation of results and extensive testing
are important.
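
The partitioning idea can be sketched in plain Python. The function name train_test_split, the 30% test fraction, and the fixed seed are assumptions made for this illustration; the slides do not prescribe a particular split or tool.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training set, test set)."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data set of 100 record identifiers.
records = list(range(100))
train_set, test_set = train_test_split(records)
print(len(train_set), len(test_set))
```

The model would be developed on train_set only and then validated on test_set, which it has never seen. A model that does well on the training set but poorly on the test set is a sign of overfitting.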

Ethical Guidelines for Statistical Practice


In a statistical study, unethical behavior can take a
variety of forms including:
Improper sampling
Inappropriate analysis of the data
Development of misleading graphs
Use of inappropriate summary statistics
Biased interpretation of the statistical results
You should strive to be fair, thorough, objective, and
neutral as you collect, analyze, and present data.
As a consumer of statistics, you should also be aware
of the possibility of unethical behavior by others.


Descriptive Statistics:
Tabular and Graphical Presentations



Summarizing Categorical Data


Summarizing Quantitative Data
Categorical data use labels or names
to identify categories of like items.
Quantitative data are numerical values
that indicate how much or how many.

Summarizing Categorical Data








Frequency Distribution
Relative Frequency Distribution
Percent Frequency Distribution
Bar Chart
Pie Chart


Frequency Distribution
A frequency distribution is a tabular summary of
data showing the frequency (or number) of items
in each of several non-overlapping classes.
The objective is to provide insights about the data
that cannot be quickly obtained by looking only at
the original data.

Frequency Distribution
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the
quality of their accommodations as being excellent,
above average, average, below average, or poor. The
ratings provided by a sample of 20 guests are:

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average

Frequency Distribution


Example: Marada Inn


Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
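
A frequency distribution like the one above can be tallied with collections.Counter from the Python standard library. This is a minimal sketch; the ratings list reproduces the 20 Marada Inn responses, and the variable names are illustrative.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn sample.
ratings = (
    ["Poor"] * 2
    + ["Below Average"] * 3
    + ["Average"] * 5
    + ["Above Average"] * 9
    + ["Excellent"] * 1
)

# Counter tallies how many times each class (rating) occurs.
frequency = Counter(ratings)

# Print the classes in their natural order, worst to best.
order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]
for rating in order:
    print(f"{rating:<15}{frequency[rating]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```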

Relative Frequency Distribution


The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class.
A relative frequency distribution is a tabular
summary of a set of data showing the relative
frequency for each class.


Percent Frequency Distribution


The percent frequency of a class is the relative
frequency multiplied by 100.
A percent frequency distribution is a tabular
summary of a set of data showing the percent
frequency for each class.

Relative Frequency and


Percent Frequency Distributions


Example: Marada Inn


Rating           Relative Frequency   Percent Frequency
Poor                    .10                  10
Below Average           .15                  15
Average                 .25                  25
Above Average           .45                  45
Excellent               .05                   5
Total                  1.00                 100

For example, 1/20 = .05 for Excellent, and .10(100) = 10 for Poor.
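
The two definitions (relative frequency = frequency / total, percent frequency = relative frequency x 100) can be sketched as follows. The frequency counts are those of the Marada Inn example; the variable names are illustrative.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}
total = sum(frequency.values())

# Relative frequency: the fraction of all items that fall in each class.
relative = {rating: n / total for rating, n in frequency.items()}

# Percent frequency: relative frequency times 100
# (written as 100 * n / total to avoid floating-point noise).
percent = {rating: 100 * n / total for rating, n in frequency.items()}

for rating in frequency:
    print(f"{rating:<15}{relative[rating]:>6.2f}{percent[rating]:>6.0f}")
```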


Bar Chart
A bar chart is a graphical device for depicting
qualitative data.
On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).
Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
The bars are separated to emphasize the fact that each
class is a separate category.

Bar Chart
Marada Inn Quality Ratings

[Figure: bar chart of Marada Inn quality ratings; horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency]

Pareto Diagram
In quality control, bar charts are used to identify the
most important causes of problems.
When the bars are arranged in descending order of
height from left to right (with the most frequently
occurring cause appearing first), the bar chart is
called a Pareto diagram.
This diagram is named for its founder, Vilfredo
Pareto, an Italian economist.

Pie Chart
The pie chart is a commonly used graphical device
for presenting relative frequency and percent
frequency distributions for categorical data.

First draw a circle; then use the relative frequencies
to subdivide the circle into sectors that correspond to
the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a
relative frequency of .25 would consume .25(360) = 90
degrees of the circle.
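
The sector sizes follow directly from the rule above (angle = relative frequency x 360). A short sketch using the Marada Inn frequencies; the variable names are illustrative.

```python
# Frequencies from the Marada Inn example.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}
total = sum(frequency.values())

# Each class gets a share of the 360 degrees equal to its relative frequency.
sector_degrees = {rating: 360 * n / total for rating, n in frequency.items()}

for rating, degrees in sector_degrees.items():
    print(f"{rating:<15}{degrees:>6.1f} degrees")
```

The Average class, with relative frequency .25, receives the 90 degrees worked out above, and the sectors together fill the full circle.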


Pie Chart
Marada Inn Quality Ratings
[Figure: pie chart of Marada Inn quality ratings; sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]

Example: Marada Inn




Insights Gained from the Preceding Pie Chart

One-half of the customers surveyed gave Marada
a quality rating of above average or excellent
(looking at the left side of the pie). This might
please the manager.

For each customer who gave an excellent rating,
there were two customers who gave a poor
rating (looking at the top of the pie). This should
displease the manager.

Summarizing Quantitative Data










Frequency Distribution
Relative Frequency and
Percent Frequency Distributions
Dot Plot
Histogram
Cumulative Distributions
Ogive

Frequency Distribution


Example: Hudson Auto Repair


The manager of Hudson Auto would like to gain a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.

Frequency Distribution


Example: Hudson Auto Repair


Sample of Parts Cost ($) for 50 Tune-ups

 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73

Frequency Distribution
The three steps necessary to define the classes for a
frequency distribution with quantitative data are:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.

Frequency Distribution


Guidelines for Determining the Number of Classes


Use between 5 and 20 classes.

Data sets with a larger number of elements


usually require a larger number of classes.

Smaller data sets usually require fewer classes.


The goal is to use enough classes to show the
variation in the data, but not so many classes
that some contain only a few data items.

Frequency Distribution


Guidelines for Determining the Width of Each Class


Use classes of equal width.

Approximate Class Width =
(Largest Data Value - Smallest Data Value) / Number of Classes

Making the classes the same width reduces the
chance of inappropriate interpretations.

Frequency Distribution


Note on Number of Classes and Class Width


In practice, the number of classes and the
appropriate class width are determined by trial
and error.
Once a possible number of classes is chosen, the
appropriate class width is found.

The process can be repeated for a different
number of classes.

Ultimately, the analyst uses judgment to
determine the combination of the number of
classes and class width that provides the best
frequency distribution for summarizing the data.

Frequency Distribution


Guidelines for Determining the Class Limits


Class limits must be chosen so that each data
item belongs to one and only one class.
The lower class limit identifies the smallest
possible data value assigned to the class.

The upper class limit identifies the largest
possible data value assigned to the class.

The appropriate values for the class limits
depend on the level of accuracy of the data.
An open-end class requires only a
lower class limit or an upper class limit.

Frequency Distribution


Example: Hudson Auto Repair


If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10

Parts Cost ($)   Frequency
50-59                2
60-69               13
70-79               16
80-89                7
90-99                7
100-109              5
Total               50
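
The three steps (number of classes, class width, class limits) can be traced in code for the Hudson Auto data. This is a sketch: rounding the width up to 10 and starting the first class at 50 are the choices made on the slide, entered here by hand rather than derived automatically.

```python
# The 50 parts costs ($) from the Hudson Auto sample.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Step 1: choose the number of classes.
num_classes = 6

# Step 2: approximate class width = (largest - smallest) / number of classes.
approx_width = (max(parts_cost) - min(parts_cost)) / num_classes
class_width = 10      # approx_width rounded up to a convenient value
first_lower = 50      # convenient lower limit at or below the smallest value

# Step 3: set the class limits and count the items falling in each class.
frequency = {}
for k in range(num_classes):
    lower = first_lower + k * class_width
    upper = lower + class_width - 1   # data are whole dollars, so limits are integers
    frequency[f"{lower}-{upper}"] = sum(lower <= x <= upper for x in parts_cost)

print(approx_width)
for label, count in frequency.items():
    print(f"{label:<10}{count:>3}")
```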

Relative Frequency and


Percent Frequency Distributions


Example: Hudson Auto Repair


Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                   .04                   4
60-69                   .26                  26
70-79                   .32                  32
80-89                   .14                  14
90-99                   .14                  14
100-109                 .10                  10
Total                  1.00                 100

For example, 2/50 = .04 and .04(100) = 4. Percent frequency is
the relative frequency multiplied by 100.

Relative Frequency and


Percent Frequency Distributions


Example: Hudson Auto Repair


Insights Gained from the % Frequency Distribution:

Only 4% of the parts costs are in the $50-59 class.
30% of the parts costs are under $70.
The greatest percentage (32% or almost one-third)
of the parts costs are in the $70-79 class.
10% of the parts costs are $100 or more.

Dot Plot





One of the simplest graphical summaries of data is a
dot plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed
above the axis.

Dot Plot


Example: Hudson Auto Repair


[Figure: dot plot of tune-up parts cost; horizontal axis: Cost ($), 50 to 110]

Histogram
Another common graphical presentation of
quantitative data is a histogram.
The variable of interest is placed on the horizontal
axis.
A rectangle is drawn above each class interval with
its height corresponding to the interval's frequency,
relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent classes.

Histogram


Example: Hudson Auto Repair


[Figure: histogram of tune-up parts cost; horizontal axis: Parts Cost ($) by class; vertical axis: Frequency]

Histograms Showing Skewness


Symmetric
Left tail is the mirror image of the right tail
Examples: heights and weights of people
[Figure: symmetric histogram; vertical axis: Relative Frequency, 0 to .35]

Histograms Showing Skewness




Moderately Skewed Left


A longer tail to the left
Example: exam scores

[Figure: histogram moderately skewed left; vertical axis: Relative Frequency, 0 to .35]

Histograms Showing Skewness


Moderately Skewed Right
A longer tail to the right
Example: housing prices
[Figure: histogram moderately skewed right; vertical axis: Relative Frequency, 0 to .35]

Histograms Showing Skewness


Highly Skewed Right
A very long tail to the right
Example: executive salaries
[Figure: histogram highly skewed right; vertical axis: Relative Frequency, 0 to .35]

Cumulative Distributions
Cumulative frequency distribution - shows the
number of items with values less than or equal to the
upper limit of each class.
Cumulative relative frequency distribution - shows
the proportion of items with values less than or
equal to the upper limit of each class.
Cumulative percent frequency distribution - shows
the percentage of items with values less than or
equal to the upper limit of each class.

Cumulative Distributions
The last entry in a cumulative frequency distribution
always equals the total number of observations.
The last entry in a cumulative relative frequency
distribution always equals 1.00.
The last entry in a cumulative percent frequency
distribution always equals 100.

Cumulative Distributions


Hudson Auto Repair

             Cumulative   Cumulative Relative   Cumulative Percent
Cost ($)     Frequency    Frequency             Frequency
≤ 59              2            .04                    4
≤ 69             15            .30                   30
≤ 79             31            .62                   62
≤ 89             38            .76                   76
≤ 99             45            .90                   90
≤ 109            50           1.00                  100

For example, 2 + 13 = 15, 15/50 = .30, and .30(100) = 30.
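
The cumulative columns are running totals of the class frequencies, which itertools.accumulate from the standard library computes directly. A minimal sketch using the Hudson Auto class frequencies; the variable names are illustrative.

```python
from itertools import accumulate

# Upper class limits and class frequencies for the Hudson Auto parts costs.
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]

# Running total of the frequencies gives the cumulative frequency.
cum_freq = list(accumulate(frequencies))
total = cum_freq[-1]   # the last entry equals the number of observations

# Dividing by the total gives the cumulative relative and percent frequencies.
cum_rel = [c / total for c in cum_freq]
cum_pct = [100 * c / total for c in cum_freq]

for limit, f, r, p in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit:<5}{f:>4}{r:>7.2f}{p:>6.0f}")
```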


Ogive


An ogive is a graph of a cumulative distribution.

The data values are shown on the horizontal axis.

Shown on the vertical axis are the:


cumulative frequencies, or
cumulative relative frequencies, or
cumulative percent frequencies
The frequency (one of the above) of each class is
plotted as a point.
The plotted points are connected by straight lines.

Ogive


Hudson Auto Repair


Because the class limits for the parts-cost data are
50-59, 60-69, and so on, there appear to be one-unit
gaps from 59 to 60, 69 to 70, and so on.

These gaps are eliminated by plotting points
halfway between the class limits.

Thus, 59.5 is used for the 50-59 class, 69.5 is used
for the 60-69 class, and so on.

Ogive with Cumulative Percent Frequencies


Example: Hudson Auto Repair

[Figure: ogive of tune-up parts cost; horizontal axis: Parts Cost ($), 50 to 110; vertical axis: Cumulative Percent Frequency, 0 to 100; labeled point (89.5, 76)]

Descriptive Statistics:
Tabular and Graphical Presentations


Exploratory Data Analysis: Stem-and-Leaf Display

Crosstabulation and Scatter Diagram

Exploratory Data Analysis


The techniques of exploratory data analysis consist of
simple arithmetic and easy-to-draw pictures that can
be used to summarize data quickly.
One such technique is the stem-and-leaf display.

Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order
and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the
advantage of showing the actual data values.
The first digits of each data item are arranged to the
left of a vertical line.
To the right of the vertical line we record the last
digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.

Example: Hudson Auto Repair


The manager of Hudson Auto would like to gain a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.

Stem-and-Leaf Display
Example: Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups

 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73

Stem-and-Leaf Display
Example: Hudson Auto Repair

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9

Each row is a stem; each digit to the right of the line is a leaf.

Stretched Stem-and-Leaf Display
If we believe the original stem-and-leaf display has
condensed the data too much, we can stretch the
display vertically by using two stems for each
leading digit(s).
Whenever a stem value is stated twice, the first value
corresponds to leaf values of 0-4, and the second
value corresponds to leaf values of 5-9.

Stretched Stem-and-Leaf Display
Example: Hudson Auto Repair

 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9

Stem-and-Leaf Display
Leaf Units
A single digit is used to define each leaf.
In the preceding example, the leaf unit was 1.
Leaf units may be 100, 10, 1, 0.1, and so on.
Where the leaf unit is not shown, it is assumed
to equal 1.
The leaf unit indicates how to multiply the
stem-and-leaf numbers in order to approximate the
original data.

Example: Leaf Unit = 0.1


If we have data with values such as
8.6   11.7   9.4   9.1   10.2   11.0   8.8

a stem-and-leaf display of these data will be
Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7

Example: Leaf Unit = 10


If we have data with values such as
1806   1717   1974   1791   1682   1910   1838

a stem-and-leaf display of these data will be
Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is
represented as an 8.
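
The construction of a stem-and-leaf display, including the leaf unit, can be sketched in Python. The function name stem_and_leaf is hypothetical, and the sketch assumes whole-number data with a whole-number leaf unit (1, 10, 100, ...); it does not handle leaf units such as 0.1.

```python
def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: [leaves in rank order]} for whole-number data.

    Digits below the leaf unit are dropped (rounded down), as in the
    Leaf Unit = 10 example, where 1682 becomes stem 16, leaf 8.
    """
    stems = {}
    for value in sorted(values):
        scaled = value // leaf_unit          # drop digits below the leaf unit
        stem, leaf = divmod(scaled, 10)      # last remaining digit is the leaf
        stems.setdefault(stem, []).append(leaf)
    return stems

# The 50 parts costs ($) from the Hudson Auto sample (leaf unit 1).
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]
hudson_display = stem_and_leaf(parts_cost)

# The Leaf Unit = 10 example.
leaf10_display = stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10)

for stem, leaves in hudson_display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting the values first means each stem's leaves come out already in rank order.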


Crosstabulations and Scatter Diagrams


Thus far we have focused on methods that are used
to summarize the data for one variable at a time.
Often a manager is interested in tabular and
graphical methods that will help understand the
relationship between two variables.
Crosstabulation and a scatter diagram are two
methods for summarizing the data for two variables
simultaneously.

Crosstabulation
A crosstabulation is a tabular summary of data for
two variables.
Crosstabulation can be used when:
one variable is qualitative and the other is
quantitative,
both variables are qualitative, or
both variables are quantitative.
The left and top margin labels define the classes for
the two variables.


Crosstabulation
Example: Finger Lakes Homes
The number of Finger Lakes homes sold for each
style and price for the past two years is shown below.
Price Range is the quantitative variable; Home Style is
the categorical variable.

                        Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
< $200,000        18        6     19       12        55
≥ $200,000        12       14     16        3        45
Total             30       20     35       15       100

Crosstabulation
Example: Finger Lakes Homes
Insights Gained from Preceding Crosstabulation

The greatest number of homes (19) in the sample
are a split-level style and priced at less than
$200,000.

Only three homes in the sample are an A-Frame
style and priced at $200,000 or more.

Crosstabulation
Example: Finger Lakes Homes

                        Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
< $200,000        18        6     19       12        55
≥ $200,000        12       14     16        3        45
Total             30       20     35       15       100

The Total column (55, 45) is the frequency distribution for
the price range variable.
The Total row (30, 20, 35, 15) is the frequency distribution for
the home style variable.

Crosstabulation: Row or Column Percentages


Converting the entries in the table into row
percentages or column percentages can provide
additional insight about the relationship between
the two variables.

Crosstabulation: Row Percentages


Example: Finger Lakes Homes
                        Home Style
Price Range    Colonial    Log     Split   A-Frame   Total
< $200,000      32.73     10.91    34.55    21.82     100
≥ $200,000      26.67     31.11    35.56     6.67     100

Note: row totals are actually 100.01 due to rounding.

(Colonial and ≥ $200K)/(All ≥ $200K) x 100 = (12/45) x 100 = 26.67

Crosstabulation: Column Percentages


Example: Finger Lakes Homes
                        Home Style
Price Range    Colonial    Log     Split   A-Frame
< $200,000      60.00     30.00    54.29    80.00
≥ $200,000      40.00     70.00    45.71    20.00
Total            100       100      100      100

(Colonial and ≥ $200K)/(All Colonial) x 100 = (12/30) x 100 = 40.00
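
Row and column percentages can be derived from the crosstabulation counts with a few lines of Python. This is a sketch; the dictionary layout and the ">= $200,000" label are choices made for the illustration.

```python
# Counts from the Finger Lakes Homes crosstabulation.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $200,000":  [18, 6, 19, 12],
    ">= $200,000": [12, 14, 16, 3],
}

# Row percentages: each cell divided by its row (price range) total.
row_pct = {
    price: [round(100 * n / sum(row), 2) for n in row]
    for price, row in counts.items()
}

# Column percentages: each cell divided by its column (home style) total.
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {
    price: [round(100 * n / t, 2) for n, t in zip(row, col_totals)]
    for price, row in counts.items()
}

for price in counts:
    print(price, row_pct[price], col_pct[price])
```

The Colonial entry for the upper price range comes out as 26.67 in row_pct (12 of 45) and 40.0 in col_pct (12 of 30), matching the two worked formulas.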


Crosstabulation: Simpson's Paradox

Data in two or more crosstabulations are often
aggregated to produce a summary crosstabulation.
We must be careful in drawing conclusions about the
relationship between the two variables in the
aggregated crosstabulation.
In some cases the conclusions based upon an
aggregated crosstabulation can be completely
reversed if we look at the unaggregated data. The
reversal of conclusions based on aggregate and
unaggregated data is called Simpson's paradox.

Illustration of Simpson's Paradox

Example: Analysis of verdicts for two judges in two
different courts.
Judges Ron Luckett and Dennis Kendall presided
over cases in Common Pleas Court and Municipal
Court during the past three years.
Some of the verdicts they rendered were appealed.
In most of these cases the appeals court upheld the
original verdicts, but in some cases those verdicts
were reversed.





For each judge a crosstabulation was developed based
upon two variables: Verdict (upheld or reversed) and
Type of Court (Common Pleas and Municipal).
Suppose that the two crosstabulations were then
combined by aggregating the type of court data.
The resulting aggregated crosstabulation contains two
variables: Verdict (upheld or reversed) and Judge
(Luckett or Kendall).
This crosstabulation shows the number of appeals in
which the verdict was upheld and the number in
which the verdict was reversed for both judges.
The crosstabulation on the next slide shows these
results along with the column percentages in
parentheses next to each value.

Who is doing a better job?

A review of the column percentages shows that 86%
of the verdicts were upheld for Judge Luckett, while
88% of the verdicts were upheld for Judge Kendall.
From this aggregated crosstabulation, we conclude
that Judge Kendall is doing the better job because a
greater percentage of Judge Kendall's verdicts are
being upheld.


The following unaggregated crosstabulations show
the cases tried by Judge Luckett and Judge Kendall in
each court; column percentages are shown in
parentheses next to each value.

Now, who is performing better?

When we unaggregate the data, we see that Judge
Luckett has a better record because a greater
percentage of Judge Luckett's verdicts are being
upheld in both courts.
This result contradicts the conclusion we reached
with the aggregated data crosstabulation that
showed Judge Kendall had the better record.
This reversal of conclusions based on aggregated and
unaggregated data illustrates Simpson's paradox.


The original crosstabulation was obtained by
aggregating the data in the separate crosstabulations
for the two courts.
Note that for both judges the percentage of appeals
that resulted in reversals was much higher in
Municipal Court than in Common Pleas Court.
Because Judge Luckett tried a much higher
percentage of his cases in Municipal Court, the
aggregated data favored Judge Kendall.
When we look at the crosstabulations for the two
courts separately, however, Judge Luckett shows the
better record.
Thus, for the original crosstabulation, we see that the
type of court is a hidden variable that cannot be ignored
when evaluating the records of the two judges.
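The reversal can be reproduced numerically. The sketch below uses illustrative counts that are an assumption made here: the slide tables themselves are not reproduced in this text, so the numbers were chosen only to agree with the 86% and 88% aggregated figures quoted above.

```python
# Illustrative (assumed) counts of (upheld, reversed) verdicts per judge and court.
cases = {
    "Luckett": {"Common Pleas": (29, 3), "Municipal": (100, 18)},
    "Kendall": {"Common Pleas": (90, 10), "Municipal": (20, 5)},
}

def pct_upheld(upheld, reversed_):
    """Percentage of appealed verdicts that were upheld."""
    return 100 * upheld / (upheld + reversed_)

# Aggregated view: pool both courts for each judge.
aggregated = {}
for judge, courts in cases.items():
    upheld = sum(u for u, _ in courts.values())
    reversed_ = sum(r for _, r in courts.values())
    aggregated[judge] = pct_upheld(upheld, reversed_)

# Unaggregated view: one percentage per judge and court.
by_court = {
    judge: {court: pct_upheld(*counts) for court, counts in courts.items()}
    for judge, courts in cases.items()
}

print({j: round(p, 1) for j, p in aggregated.items()})
for judge, courts in by_court.items():
    print(judge, {c: round(p, 1) for c, p in courts.items()})
```

With these counts Kendall leads in the pooled data, yet Luckett leads within each court, because most of Luckett's cases come from the court with the higher reversal rate.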

Scatter Diagram and Trendline


A scatter diagram is a graphical presentation of the
relationship between two quantitative variables.
One variable is shown on the horizontal axis and
the other variable is shown on the vertical axis.
The general pattern of the plotted points suggests
the overall relationship between the variables.
A trendline provides an approximation of the
relationship.


Scatter Diagram
A Positive Relationship

Scatter Diagram
A Negative Relationship


Scatter Diagram
No Apparent Relationship

Scatter Diagram
Example: Panthers Football Team
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.

x = Number of Interceptions    y = Number of Points Scored
            1                              14
            3                              24
            2                              18
            1                              17
            3                              30


Scatter Diagram

[Figure: scatter diagram with Number of Interceptions (x) on the horizontal axis and Number of Points Scored (y) on the vertical axis]

Example: Panthers Football Team

Insights Gained from the Preceding Scatter Diagram

The scatter diagram indicates a positive relationship
between the number of interceptions and the
number of points scored.

Higher points scored are associated with a higher
number of interceptions.

The relationship is not perfect; not all plotted points in
the scatter diagram lie on a straight line.


Scatter Diagram and Trendline

[Figure: scatter diagram for the Panthers with a trendline; Number of Interceptions on the horizontal axis (0 to 3), Number of Points Scored on the vertical axis (0 to 35)]
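One common way to obtain a trendline is an ordinary least-squares fit. The slides do not say which method produced their trendline, so the sketch below is an assumption; it uses the Panthers data, and the variable names are illustrative.

```python
# Panthers data: interceptions (x) and points scored (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for the line y = intercept + slope * x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"points = {intercept:.2f} + {slope:.2f} * interceptions")
```

A positive slope agrees with the positive relationship seen in the scatter diagram.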

Tabular and Graphical Methods

Categorical Data
  Tabular Methods: Frequency Distribution, Rel. Freq. Dist., Percent Freq. Distribution, Crosstabulation
  Graphical Methods: Bar Chart, Pie Chart

Quantitative Data
  Tabular Methods: Frequency Distribution, Rel. Freq. Dist., % Freq. Dist., Cum. Freq. Dist., Cum. Rel. Freq. Distribution, Cum. % Freq. Distribution, Crosstabulation
  Graphical Methods: Dot Plot, Histogram, Ogive, Stem-and-Leaf Display, Scatter Diagram


Descriptive Statistics: Numerical Measures




Measures of Location

Measures of Variability

Measures of Location






Mean
Median
Mode
Percentiles
Quartiles

If the measures are computed
for data from a sample,
they are called sample statistics.
If the measures are computed
for data from a population,
they are called population parameters.

A sample statistic is referred to
as the point estimator of the
corresponding population parameter.


Mean

Perhaps the most important measure of location is
the mean.
The mean provides a measure of central location.
The mean of a data set is the average of all the data
values.
The sample mean $\bar{x}$ is the point estimator of the
population mean $\mu$.

Sample Mean $\bar{x}$

$\bar{x} = \frac{\sum x_i}{n}$

where $\sum x_i$ is the sum of the values of the n observations
and n is the number of observations in the sample.


Population Mean $\mu$

$\mu = \frac{\sum x_i}{N}$

where $\sum x_i$ is the sum of the values of the N observations
and N is the number of observations in the population.

Sample Mean
Example: Apartment Rents
Seventy efficiency apartments were randomly
sampled in a small college town. The monthly rent
prices for these apartments are listed below.

445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440

Sample Mean
Example: Apartment Rents

$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$
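The sample mean for the apartment rents can be checked directly. The sketch below is only an illustration; it lists the 70 rents in ascending order and the variable names are assumptions.

```python
# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)        # number of observations in the sample
total = sum(rents)    # sum of the values of the n observations
x_bar = total / n     # sample mean

print(n, total, round(x_bar, 2))
```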

Median
The median of a data set is the value in the middle
when the data items are arranged in ascending order.
Whenever a data set has extreme values, the median
is the preferred measure of central location.
The median is the measure of location most often
reported for annual income and property value data.
A few extremely large incomes or property values
can inflate the mean.


Median
For an odd number of observations:

7 observations:      26  18  27  12  14  27  19
in ascending order:  12  14  18  19  26  27  27

the median is the middle value.

Median = 19

Median
For an even number of observations:

8 observations:      26  18  27  12  14  27  30  19
in ascending order:  12  14  18  19  26  27  27  30

the median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
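Both cases can be handled by one small function. This is a minimal sketch of the rule described on these slides; the function name is an assumption, and Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: take the middle value.
        return ordered[mid]
    # Even number of observations: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([26, 18, 27, 12, 14, 27, 19]))      # 7 observations
print(median([26, 18, 27, 12, 14, 27, 30, 19]))  # 8 observations
```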


Median
Example: Apartment Rents
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data are in ascending order, reading across each row.

Trimmed Mean
Another measure, sometimes used when extreme
values are present, is the trimmed mean.
It is obtained by deleting a percentage of the
smallest and largest values from a data set and then
computing the mean of the remaining values.
For example, the 5% trimmed mean is obtained by
removing the smallest 5% and the largest 5% of the
data values and then computing the mean of the
remaining values.
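A trimmed mean might be computed as sketched below. The data set is hypothetical, and rounding the number of trimmed values down to a whole number is an assumption made here; the slides do not say how a fractional count should be handled.

```python
def trimmed_mean(values, percent):
    """Mean after dropping the smallest and largest `percent`% of the values.

    Assumption: the number of values dropped from each end is rounded down.
    """
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)   # values to drop from each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("percent is too large for this data set")
    return sum(kept) / len(kept)

# Hypothetical data: 20 values with one extreme value at each end.
data = [10, 40, 41, 42, 43, 44, 45, 46, 47, 48,
        49, 50, 51, 52, 53, 54, 55, 56, 57, 200]

print(sum(data) / len(data))   # ordinary mean, pulled up by the value 200
print(trimmed_mean(data, 5))   # 5% trimmed mean: drops 10 and 200
```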


Mode
The mode of a data set is the value that occurs with
greatest frequency.
The greatest frequency can occur at two or more
different values.
If the data have exactly two modes, the data are
bimodal.
If the data have more than two modes, the data are
multimodal.
Caution: If the data are bimodal or multimodal,
Excel's MODE function will incorrectly identify a
single mode.
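Outside Excel, Python's standard library function `statistics.multimode` returns every value tied for the greatest frequency, which avoids the single-mode problem noted above. The sketch below applies it to the apartment rents and to a small hypothetical bimodal list.

```python
from statistics import multimode

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

rent_modes = multimode(rents)                # one mode for the rents
bimodal_modes = multimode([1, 1, 2, 2, 3])   # hypothetical bimodal data

print(rent_modes, rents.count(450))
print(bimodal_modes)
```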

Mode
Example: Apartment Rents
450 occurred most frequently (7 times).
Mode = 450

Percentiles
A percentile provides information about how the
data are spread over the interval from the smallest
value to the largest value.
Admission test scores for colleges and universities
are frequently reported in terms of percentiles.


The pth percentile of a data set is a value such that at
least p percent of the items take on this value or less
and at least (100 - p) percent of the items take on this
value or more.

Percentiles
Arrange the data in ascending order.
Compute index i, the position of the pth percentile.
i = (p/100)n
If i is not an integer, round up. The pth percentile
is the value in the ith position.
If i is an integer, the pth percentile is the average
of the values in positions i and i + 1.
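The steps above can be sketched as a short function. The function name is an assumption, and the sketch restricts p to whole numbers from 1 to 99 so that the index i can be computed exactly with integer arithmetic. Statistical packages often use other interpolation rules, so their results can differ slightly.

```python
def percentile(values, p):
    """p-th percentile (integer p, 0 < p < 100) using the index rule i = (p/100)n."""
    if not 0 < p < 100:
        raise ValueError("p must be a whole number from 1 to 99")
    ordered = sorted(values)                # step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)   # step 2: i = p*n/100, kept exact
    if remainder:
        # i is not an integer: round up to position whole + 1 (zero-based index whole).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

for p in (25, 50, 75, 80):
    print(p, percentile(rents, p))
```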


80th Percentile
Example: Apartment Rents
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542

80th Percentile
Example: Apartment Rents
At least 80% of the items take on a value of 542 or less:
56/70 = .8 or 80%
At least 20% of the items take on a value of 542 or more:
14/70 = .2 or 20%

Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile

Third Quartile
Example: Apartment Rents
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = 525 (the value in the 53rd position)

Measures of Variability
It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
For example, in choosing supplier A or supplier B we
might consider not only the average delivery time for
each, but also the variability in delivery time for each.

Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation


Range
The range of a data set is the difference between the
largest and smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data
values.

Range
Example: Apartment Rents
Range = largest value - smallest value
Range = 615 - 425 = 190

Interquartile Range
The interquartile range of a data set is the difference
between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.

Interquartile Range
Example: Apartment Rents
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

Variance
The variance is a measure of variability that utilizes
all the data.
It is based on the difference between the value of
each observation ($x_i$) and the mean ($\bar{x}$ for a sample,
$\mu$ for a population).
The variance is useful in comparing the variability
of two or more variables.

Variance
The variance is the average of the squared
differences between each data value and the mean.
The variance is computed as follows:

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ for a sample

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ for a population

Standard Deviation
The standard deviation of a data set is the positive
square root of the variance.
It is measured in the same units as the data, making
it more easily interpreted than the variance.

Standard Deviation
The standard deviation is computed as follows:

$s = \sqrt{s^2}$ for a sample

$\sigma = \sqrt{\sigma^2}$ for a population

Coefficient of Variation
The coefficient of variation indicates how large the
standard deviation is in relation to the mean.
The coefficient of variation is computed as follows:

$\left(\frac{s}{\bar{x}} \times 100\right)\%$ for a sample

$\left(\frac{\sigma}{\mu} \times 100\right)\%$ for a population

Sample Variance, Standard Deviation,
and Coefficient of Variation
Example: Apartment Rents

Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$

Standard Deviation
$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$

Coefficient of Variation
$\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$

The standard deviation is about 11% of the mean.
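These three measures can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) divisor. The sketch below is an illustration only; the variable names are assumptions.

```python
from statistics import mean, stdev, variance

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(rents)
s2 = variance(rents)     # sample variance, divisor n - 1
s = stdev(rents)         # sample standard deviation
cv = s / x_bar * 100     # coefficient of variation, in percent

print(round(s2, 2), round(s, 2), round(cv, 2))
```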


Descriptive Statistics: Numerical Measures




Measures of Distribution Shape, Relative Location,


and Detecting Outliers

Exploratory Data Analysis

Measures of Association Between Two Variables

The Weighted Mean and


Working with Grouped Data

Measures of Distribution Shape,


Relative Location, and Detecting Outliers






Distribution Shape
z-Scores
Chebyshev's Theorem
Empirical Rule
Detecting Outliers


Distribution Shape: Skewness

An important measure of the shape of a distribution
is called skewness.
The formula for the skewness of sample data is

$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$

Skewness can be easily computed using statistical
software.
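As a rough illustration, the sample skewness formula can be coded directly. The function name and the two small data sets below are hypothetical; they are chosen only to show a symmetric case and a right-skewed case.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Sample skewness: n / ((n - 1)(n - 2)) times the sum of cubed z-scores."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are required")
    x_bar = mean(values)
    s = stdev(values)   # sample standard deviation
    cubed = sum(((x - x_bar) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 10]    # hypothetical data with a long right tail

print(sample_skewness(symmetric))
print(sample_skewness(right_skewed))
```

A value near zero indicates symmetry, and a positive value indicates a longer right tail.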

Distribution Shape: Skewness

Symmetric (not skewed)
Skewness is zero.
Mean and median are equal.

[Figure: relative frequency histogram of a symmetric distribution, Skewness = 0]

Distribution Shape: Skewness

Moderately Skewed Left
Skewness is negative.
Mean will usually be less than the median.

[Figure: relative frequency histogram of a moderately left-skewed distribution, Skewness = -.31]

Distribution Shape: Skewness

Moderately Skewed Right
Skewness is positive.
Mean will usually be more than the median.

[Figure: relative frequency histogram of a moderately right-skewed distribution, Skewness = .31]

Distribution Shape: Skewness

Highly Skewed Right
Skewness is positive (often above 1.0).
Mean will usually be more than the median.

[Figure: relative frequency histogram of a highly right-skewed distribution, Skewness = 1.25]

Distribution Shape: Skewness

Example: Apartment Rents
Seventy efficiency apartments were randomly
sampled in a college town. The monthly rent prices
are the same 70 values used in the earlier
Apartment Rents examples.

Distribution Shape: Skewness

Example: Apartment Rents

[Figure: relative frequency histogram of the apartment rents, Skewness = .92]

z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data
value $x_i$ is from the mean.

$z_i = \frac{x_i - \bar{x}}{s}$

Excel's STANDARDIZE function can be used to
compute the z-score.

z-Scores
An observation's z-score is a measure of the relative
location of the observation in a data set.
A data value less than the sample mean will have a
z-score less than zero.
A data value greater than the sample mean will have
a z-score greater than zero.
A data value equal to the sample mean will have a
z-score of zero.
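A z-score is a one-line calculation once the mean and standard deviation are known. The sketch below plugs in the sample mean (490.80) and sample standard deviation (54.74) reported for the apartment rents; the function name is an assumption.

```python
def z_score(x, x_bar, s):
    """Number of standard deviations that x lies from the mean."""
    return (x - x_bar) / s

X_BAR = 490.80   # sample mean of the apartment rents
S = 54.74        # sample standard deviation of the apartment rents

z_smallest = z_score(425, X_BAR, S)   # smallest rent
z_largest = z_score(615, X_BAR, S)    # largest rent
z_at_mean = z_score(X_BAR, X_BAR, S)  # a value equal to the mean

print(round(z_smallest, 2), round(z_largest, 2), z_at_mean)
```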

z-Scores
Example: Apartment Rents
z-Score of Smallest Value (425)

$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27

Chebyshev's Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be
within z standard deviations of the mean, where z is
any value greater than 1.
Chebyshev's theorem requires z > 1, but z need not
be an integer.

Chebyshev's Theorem
At least 75% of the data values must be
within z = 2 standard deviations of the mean.
At least 89% of the data values must be
within z = 3 standard deviations of the mean.
At least 94% of the data values must be
within z = 4 standard deviations of the mean.

Chebyshev's Theorem
Example: Apartment Rents
Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74
At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$ or 56%
of the rent values must be between

$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$

(Actually, 86% of the rent values
are between 409 and 573.)
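The bound and the interval are easy to tabulate. This is a minimal sketch; the function names are assumptions, and the mean and standard deviation are the values reported for the apartment rents.

```python
def chebyshev_bound(z):
    """Minimum proportion of data within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

def chebyshev_interval(x_bar, s, z):
    """Interval from x_bar - z*s to x_bar + z*s."""
    return x_bar - z * s, x_bar + z * s

for z in (1.5, 2, 3, 4):
    print(z, round(chebyshev_bound(z), 2))

low, high = chebyshev_interval(490.80, 54.74, 1.5)
print(round(low), round(high))
```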

Empirical Rule
When the data are believed to approximate a
bell-shaped distribution, the empirical rule can be
used to determine the percentage of data values
that must be within a specified number of standard
deviations of the mean.
The empirical rule is based on the normal
distribution (which will be discussed later).

Empirical Rule
For data having a bell-shaped distribution:
68.26% of the values of a normal random variable
are within +/- 1 standard deviation of its mean.
95.44% of the values of a normal random variable
are within +/- 2 standard deviations of its mean.
99.72% of the values of a normal random variable
are within +/- 3 standard deviations of its mean.

Empirical Rule

[Figure: bell-shaped curve showing 68.26% of values within 1 standard deviation of the mean, 95.44% within 2, and 99.72% within 3]

Detecting Outliers
An outlier is an unusually small or unusually large
value in a data set.
A data value with a z-score less than -3 or greater
than +3 might be considered an outlier.
It might be:
an incorrectly recorded data value
a data value that was incorrectly included in the
data set
a correctly recorded data value that belongs in
the data set

Detecting Outliers
Example: Apartment Rents
The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.

Exploratory Data Analysis

Exploratory data analysis procedures enable us to use
simple arithmetic and easy-to-draw pictures to
summarize data.
We simply sort the data values into ascending order
and identify the five-number summary and then
construct a box plot.

Five-Number Summary
1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value

Five-Number Summary
Example: Apartment Rents
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

Box Plot
A box plot is a graphical summary of data that is
based on a five-number summary.
A key to the development of a box plot is the
computation of the median and the quartiles Q1 and
Q3.
Box plots provide another way to identify outliers.

Box Plot
Example: Apartment Rents
A box is drawn with its ends located at the first and
third quartiles.
A vertical line is drawn in the box at the location of
the median (second quartile).

[Figure: box drawn on an axis from 400 to 625, with Q1 = 445, Q2 = 475, Q3 = 525]

Box Plot
Limits are located (not drawn) using the interquartile
range (IQR).
Data outside these limits are considered outliers.
The location of each outlier is shown with the
symbol *.

Box Plot
Example: Apartment Rents

The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

There are no outliers (values less than 325 or
greater than 645) in the apartment rent data.
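The limit calculation can be sketched as follows. The function names are assumptions; the quartiles and the smallest and largest rents are the values given in the example.

```python
def box_plot_limits(q1, q3, k=1.5):
    """Lower and upper limits located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, lower, upper):
    """Values that fall outside the limits."""
    return [v for v in values if v < lower or v > upper]

lower, upper = box_plot_limits(445, 525)   # Q1 and Q3 for the apartment rents
extremes = [425, 615]                      # smallest and largest rents

print(lower, upper)
print(find_outliers(extremes, lower, upper))
```

Because the smallest and largest rents both lie inside the limits, no value in between can be an outlier either.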

Box Plot
Example: Apartment Rents

Whiskers (dashed lines) are drawn from the ends
of the box to the smallest and largest data values
inside the limits.

[Figure: box plot on an axis from 400 to 625; smallest value inside limits = 425, largest value inside limits = 615]

Box Plot

An excellent graphical technique for making
comparisons among two or more groups.

Measures of Association
Between Two Variables
Thus far we have examined numerical methods used
to summarize the data for one variable at a time.
Often a manager or decision maker is interested in
the relationship between two variables.
Two descriptive measures of the relationship
between two variables are covariance and correlation
coefficient.

Covariance
The covariance is a measure of the linear association
between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.

Covariance
The covariance is computed as follows:

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ for samples

$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$ for populations

Correlation Coefficient
Correlation is a measure of linear association and not
necessarily causation.
Just because two variables are highly correlated, it
does not mean that one variable is the cause of the
other.

Correlation Coefficient
The correlation coefficient is computed as follows:

$r_{xy} = \frac{s_{xy}}{s_x s_y}$ for samples

$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ for populations

Correlation Coefficient
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear
relationship.
Values near +1 indicate a strong positive linear
relationship.
The closer the correlation is to zero, the weaker the
relationship.

Covariance and Correlation Coefficient

Example: Golfing Study
A golfer is interested in investigating the
relationship, if any, between driving distance and
18-hole score.

Average Driving Distance (yds.)    Average 18-Hole Score
           277.6                            69
           259.5                            71
           269.1                            70
           267.0                            70
           255.6                            71
           272.9                            69

Covariance and Correlation Coefficient

Example: Golfing Study

    x         y      x - mean(x)    y - mean(y)    product
  277.6      69         10.65          -1.0         -10.65
  259.5      71         -7.45           1.0          -7.45
  269.1      70          2.15           0             0
  267.0      70          0.05           0             0
  255.6      71        -11.35           1.0         -11.35
  272.9      69          5.95          -1.0          -5.95

Average     267.0     70.0
Std. Dev.   8.2192    .8944
Total of products: -35.40

Covariance and Correlation Coefficient

Example: Golfing Study

Sample Covariance
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$

Sample Correlation Coefficient
$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$
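The golfing study numbers can be reproduced from the raw data. This is a minimal sketch using the sample formulas above; the variable names are assumptions.

```python
from math import sqrt

# Golfing study: average driving distance (yds.) and average 18-hole score.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
score = [69, 71, 70, 70, 71, 69]

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(score) / n

# Sample covariance and sample standard deviations (divisor n - 1).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, score)) / (n - 1)
s_x = sqrt(sum((x - x_bar) ** 2 for x in distance) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in score) / (n - 1))

# Sample correlation coefficient.
r_xy = s_xy / (s_x * s_y)

print(round(s_xy, 2), round(s_x, 4), round(s_y, 4), round(r_xy, 4))
```

The strongly negative coefficient reflects that longer average drives go with lower (better) scores in this sample.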


The Weighted Mean and


Working with Grouped Data

Weighted Mean
Mean for Grouped Data
Variance for Grouped Data
Standard Deviation for Grouped Data

Weighted Mean
When the mean is computed by giving each data
value a weight that reflects its importance, it is
referred to as a weighted mean.
In the computation of a grade point average (GPA),
the weights are the number of credit hours earned for
each grade.
When data values vary in importance, the analyst
must choose the weight that best reflects the
importance of each value.

Weighted Mean

$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$

where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
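As an illustration of the GPA case mentioned earlier, the weighted mean can be sketched as below. The grades and credit hours are hypothetical, and the function name is an assumption.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4, 3, 2]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)
```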

Grouped Data
The weighted mean computation can be used to
obtain approximations of the mean, variance, and
standard deviation for the grouped data.
To compute the weighted mean, we treat the
midpoint of each class as though it were the mean
of all items in the class.
We compute a weighted mean of the class midpoints
using the class frequencies as weights.
Similarly, in computing the variance and standard
deviation, the class frequencies are used as weights.

Mean for Grouped Data

Sample Data
$\bar{x} = \frac{\sum f_i M_i}{n}$

Population Data
$\mu = \frac{\sum f_i M_i}{N}$

where:
$f_i$ = frequency of class i
$M_i$ = midpoint of class i

Sample Mean for Grouped Data

Example: Apartment Rents
The previously presented sample of apartment
rents is shown here as grouped data in the form of
a frequency distribution.

Rent ($)    Frequency
420-439         8
440-459        17
460-479        12
480-499         8
500-519         7
520-539         4
540-559         2
560-579         4
580-599         2
600-619         6

Sample Mean for Grouped Data

Example: Apartment Rents

Rent ($)     f_i     M_i      f_i M_i
420-439       8     429.5     3436.0
440-459      17     449.5     7641.5
460-479      12     469.5     5634.0
480-499       8     489.5     3916.0
500-519       7     509.5     3566.5
520-539       4     529.5     2118.0
540-559       2     549.5     1099.0
560-579       4     569.5     2278.0
580-599       2     589.5     1179.0
600-619       6     609.5     3657.0
Total        70              34525.0

$\bar{x} = \frac{34{,}525}{70} = 493.21$

This approximation differs by $2.41 from
the actual sample mean of $490.80.

Variance for Grouped Data

For sample data
$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$

For population data
$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$

Sample Variance for Grouped Data

Example: Apartment Rents

Rent ($)     f_i     M_i     M_i - x     (M_i - x)^2     f_i (M_i - x)^2
420-439       8     429.5     -63.7        4058.96          32471.71
440-459      17     449.5     -43.7        1910.56          32479.59
460-479      12     469.5     -23.7         562.16           6745.97
480-499       8     489.5      -3.7          13.76            110.11
500-519       7     509.5      16.3         265.36           1857.55
520-539       4     529.5      36.3        1316.96           5267.86
540-559       2     549.5      56.3        3168.56           6337.13
560-579       4     569.5      76.3        5820.16          23280.66
580-599       2     589.5      96.3        9271.76          18543.53
600-619       6     609.5     116.3       13523.36          81140.18
Total        70                                             208234.29

Here x denotes the grouped-data sample mean, 493.21.

Sample Variance for Grouped Data

Example: Apartment Rents

Sample Variance
$s^2 = 208{,}234.29/(70 - 1) = 3{,}017.89$

Sample Standard Deviation
$s = \sqrt{3{,}017.89} = 54.94$

This approximation differs by only $.20
from the actual standard deviation of $54.74.
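The grouped-data approximations can be reproduced from the class midpoints and frequencies alone. This is a minimal sketch; the variable names are assumptions.

```python
from math import sqrt

# Class midpoints and frequencies from the apartment rent frequency distribution.
midpoints = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]
freqs = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]

n = sum(freqs)

# Weighted mean of the midpoints, using class frequencies as weights.
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Sample variance and standard deviation for grouped data (divisor n - 1).
grouped_var = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
grouped_sd = sqrt(grouped_var)

print(n, round(grouped_mean, 2), round(grouped_var, 2), round(grouped_sd, 2))
```

The results are approximations because every rent in a class is treated as if it equaled the class midpoint.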


Chapter 4
Introduction to Probability
Experiments, Counting Rules,
and Assigning Probabilities
Events and Their Probability
Some Basic Relationships
of Probability
Conditional Probability
Bayes' Theorem

Uncertainties
Managers often base their decisions on an analysis
of uncertainties such as the following:
What are the chances that sales will decrease
if we increase prices?
What is the likelihood a new assembly method
will increase productivity?
What are the odds that a new investment will
be profitable?


Probability
Probability is a numerical measure of the likelihood
that an event will occur.
Probability values are always assigned on a scale
from 0 to 1.
A probability near zero indicates an event is quite
unlikely to occur.
A probability near one indicates an event is almost
certain to occur.

Probability as a Numerical Measure
of the Likelihood of Occurrence

[Figure: probability scale from 0 to 1, with likelihood of occurrence increasing from left to right]
Probability 0: The event is very unlikely to occur.
Probability .5: The occurrence of the event is just as likely as it is unlikely.
Probability 1: The event is almost certain to occur.

Statistical Experiments
In statistics, the notion of an experiment differs
somewhat from that of an experiment in the
physical sciences.
In statistical experiments, probability determines
outcomes.
Even though the experiment is repeated in exactly
the same way, an entirely different outcome may
occur.
For this reason, statistical experiments are sometimes
called random experiments.

An Experiment and Its Sample Space

An experiment is any process that generates
well-defined outcomes.
The sample space for an experiment is the set of
all experimental outcomes.
An experimental outcome is also called a sample
point.

An Experiment and Its Sample Space

Experiment               Experiment Outcomes
Toss a coin              Head, tail
Inspect a part           Defective, non-defective
Conduct a sales call     Purchase, no purchase
Roll a die               1, 2, 3, 4, 5, 6
Play a football game     Win, lose, tie

An Experiment and Its Sample Space

Example: Bradley Investments
Bradley has invested in two stocks, Markley Oil
and Collins Mining. Bradley has determined that the
possible outcomes of these investments three months
from now are as follows.

Investment Gain or Loss in 3 Months (in $000)
Markley Oil    Collins Mining
    10               8
     5              -2
     0
   -20

A Counting Rule for
Multiple-Step Experiments
If an experiment consists of a sequence of k steps
in which there are n1 possible results for the first step,
n2 possible results for the second step, and so on,
then the total number of experimental outcomes is
given by (n1)(n2) . . . (nk).
A helpful graphical representation of a multiple-step
experiment is a tree diagram.

A Counting Rule for
Multiple-Step Experiments
Example: Bradley Investments
Bradley Investments can be viewed as a two-step
experiment. It involves two stocks, each with a set of
experimental outcomes.

Markley Oil: n1 = 4
Collins Mining: n2 = 2
Total Number of Experimental Outcomes: n1n2 = (4)(2) = 8

Tree Diagram
Example: Bradley Investments

Markley Oil (Stage 1)    Collins Mining (Stage 2)    Experimental Outcome    Net Result
Gain 10                  Gain 8                      (10, 8)                 Gain $18,000
Gain 10                  Lose 2                      (10, -2)                Gain $8,000
Gain 5                   Gain 8                      (5, 8)                  Gain $13,000
Gain 5                   Lose 2                      (5, -2)                 Gain $3,000
Even                     Gain 8                      (0, 8)                  Gain $8,000
Even                     Lose 2                      (0, -2)                 Lose $2,000
Lose 20                  Gain 8                      (-20, 8)                Lose $12,000
Lose 20                  Lose 2                      (-20, -2)               Lose $22,000
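The eight branches of the tree can be enumerated programmatically. This sketch uses `itertools.product` from the Python standard library; the variable names are assumptions.

```python
from itertools import product

# Possible gains or losses in 3 months, in $000.
markley_oil = [10, 5, 0, -20]   # stage 1: n1 = 4 results
collins_mining = [8, -2]        # stage 2: n2 = 2 results

# Every (stage 1, stage 2) pair is one experimental outcome.
outcomes = list(product(markley_oil, collins_mining))
net_results = [m + c for m, c in outcomes]

for outcome, net in zip(outcomes, net_results):
    label = "Gain" if net >= 0 else "Lose"
    print(outcome, f"{label} ${abs(net) * 1000:,}")
```

The number of outcomes equals (n1)(n2), in line with the counting rule for multiple-step experiments.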

Counting Rule for Combinations

Number of Combinations of N Objects
Taken n at a Time
A second useful counting rule enables us to count
the number of experimental outcomes when n objects
are to be selected from a set of N objects.

$C_n^N = \binom{N}{n} = \frac{N!}{n!(N - n)!}$

where:
N! = N(N - 1)(N - 2) . . . (2)(1)
n! = n(n - 1)(n - 2) . . . (2)(1)
0! = 1

Counting Rule for Permutations

Number of Permutations of N Objects
Taken n at a Time
A third useful counting rule enables us to count
the number of experimental outcomes when n
objects are to be selected from a set of N objects,
where the order of selection is important.

$P_n^N = n!\binom{N}{n} = \frac{N!}{(N - n)!}$

where:
N! = N(N - 1)(N - 2) . . . (2)(1)
n! = n(n - 1)(n - 2) . . . (2)(1)
0! = 1
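Both counting rules can be checked with factorials, and Python 3.8 or later also provides `math.comb` and `math.perm`. The values N = 5 and n = 2 below are hypothetical, chosen only for illustration.

```python
from math import comb, factorial, perm

def combinations_count(N, n):
    """Number of ways to select n objects from N when order does not matter."""
    return factorial(N) // (factorial(n) * factorial(N - n))

def permutations_count(N, n):
    """Number of ways to select n objects from N when order matters."""
    return factorial(N) // factorial(N - n)

N, n = 5, 2   # hypothetical example: choose 2 objects from 5

print(combinations_count(N, n), comb(N, n))
print(permutations_count(N, n), perm(N, n))
```

Each combination of n objects can be ordered in n! ways, which is why the permutation count is n! times the combination count.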

Assigning Probabilities
Basic Requirements for Assigning Probabilities
1. The probability assigned to each experimental
outcome must be between 0 and 1, inclusively.
$0 \le P(E_i) \le 1$ for all i
where:
$E_i$ is the ith experimental outcome
and $P(E_i)$ is its probability

Assigning Probabilities
Basic Requirements for Assigning Probabilities
2. The sum of the probabilities for all experimental
1
outcomes must equal 1.
P(E1) + P(E2) + . . . + P(En) = 1
where:
n is the number of experimental outcomes
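
The two basic requirements can be checked mechanically for any proposed assignment. This is a minimal sketch; the function name, the tolerance, and the sample values are assumptions made for illustration.

```python
import math

def is_valid_assignment(probabilities, tol=1e-9):
    """Check both basic requirements for a list of outcome probabilities."""
    # Requirement 1: every probability lies between 0 and 1, inclusive.
    in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    # Requirement 2: the probabilities sum to 1 (within a small tolerance
    # to allow for floating-point rounding).
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Hypothetical assignments.
print(is_valid_assignment([0.5, 0.3, 0.2]))   # True
print(is_valid_assignment([0.5, 0.6]))        # False: sums to 1.1
print(is_valid_assignment([1.2, -0.2]))       # False: values outside [0, 1]
```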

Assigning Probabilities
Classical Method
Assigning probabilities based on the assumption
of equally likely outcomes
Relative Frequency Method
Assigning probabilities based on experimentation
or historical data
Subjective Method
Assigning probabilities based on judgment


Classical Method
Example: Rolling a Die
If an experiment has n possible outcomes, the
classical method would assign a probability of 1/n
to each outcome.
Experiment: Rolling a die
Sample Space: S = {1, 2, 3, 4, 5, 6}
Probabilities: Each sample point has a
1/6 chance of occurring

Relative Frequency Method


Example: Lucas Tool Rental
Lucas Tool Rental would like to assign probabilities
to the number of car polishers it rents each day.
Office records show the following frequencies of daily
rentals for the last 40 days.
Number of             Number
Polishers Rented      of Days
0                     4
1                     6
2                     18
3                     10
4                     2


Relative Frequency Method


Example: Lucas Tool Rental
Each probability assignment is given by dividing
the frequency (number of days) by the total frequency
(total number of days).
Number of             Number
Polishers Rented      of Days      Probability
0                     4            .10   (= 4/40)
1                     6            .15
2                     18           .45
3                     10           .25
4                     2            .05
Total                 40           1.00
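
The relative frequency calculation for Lucas Tool Rental can be reproduced in a few lines. This is a sketch using the frequencies from the table; the variable names are illustrative.

```python
# Number of days on which each rental count was observed (from the table).
days_by_rentals = {0: 4, 1: 6, 2: 18, 3: 10, 4: 2}

# Total frequency: the number of days in the record.
total_days = sum(days_by_rentals.values())

# Relative frequency = frequency / total frequency.
probabilities = {
    rented: days / total_days for rented, days in days_by_rentals.items()
}

for rented, p in probabilities.items():
    print(rented, f"{p:.2f}")
```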

Subjective Method
When economic conditions and a company's
circumstances change rapidly it might be
inappropriate to assign probabilities based solely on
historical data.
We can use any data available as well as our
experience and intuition, but ultimately a probability
value should express our degree of belief that the
experimental outcome will occur.
The best probability estimates often are obtained by
combining the estimates from the classical or relative
frequency approach with the subjective estimate.


Subjective Method
Example: Bradley Investments
An analyst made the following probability estimates.
Exper. Outcome    Net Gain or Loss    Probability
(10, 8)           $18,000 Gain        .20
(10, -2)          $8,000 Gain         .08
(5, 8)            $13,000 Gain        .16
(5, -2)           $3,000 Gain         .26
(0, 8)            $8,000 Gain         .10
(0, -2)           $2,000 Loss         .12
(-20, 8)          $12,000 Loss        .02
(-20, -2)         $22,000 Loss        .06

Events and Their Probabilities


An event is a collection of sample points.
The probability of any event is equal to the sum of
the probabilities of the sample points in the event.
If we can identify all the sample points of an
experiment and assign a probability to each, we
can compute the probability of an event.


Events and Their Probabilities


Example: Bradley Investments
Event M = Markley Oil Profitable
M = {(10, 8), (10, -2), (5, 8), (5, -2)}
P(M) = P(10, 8) + P(10, -2) + P(5, 8) + P(5, -2)
= .20 + .08 + .16 + .26
= .70

Events and Their Probabilities


Example: Bradley Investments
Event C = Collins Mining Profitable
C = {(10, 8), (5, 8), (0, 8), (-20, 8)}
P(C) = P(10, 8) + P(5, 8) + P(0, 8) + P(-20, 8)
= .20 + .16 + .10 + .02
= .48
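
The rule that an event's probability is the sum of its sample point probabilities can be sketched as follows. The probabilities are the analyst's subjective estimates for Bradley Investments; defining "profitable" as a positive stage value is an assumption that matches the sets M and C listed above.

```python
# Subjective probability estimates for the eight sample points.
prob = {
    (10, 8): 0.20, (10, -2): 0.08, (5, 8): 0.16, (5, -2): 0.26,
    (0, 8): 0.10, (0, -2): 0.12, (-20, 8): 0.02, (-20, -2): 0.06,
}

def event_probability(event):
    """Sum the probabilities of the sample points in the event."""
    return sum(prob[point] for point in event)

# M: Markley Oil profitable (first value positive).
M = {point for point in prob if point[0] > 0}
# C: Collins Mining profitable (second value positive).
C = {point for point in prob if point[1] > 0}

print(round(event_probability(M), 2))   # 0.7
print(round(event_probability(C), 2))   # 0.48
```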


Some Basic Relationships of Probability


There are some basic probability relationships that
can be used to compute the probability of an event
without knowledge of all the sample point probabilities.
Complement of an Event
Union of Two Events
Intersection of Two Events
Mutually Exclusive Events

Complement of an Event
The complement of event A is defined to be the event
consisting of all sample points that are not in A.
The complement of A is denoted by A^c.
[Venn diagram: sample space S containing event A; the region of S outside A is A^c]


Union of Two Events


The union of events A and B is the event containing
all sample points that are in A or B or both.
The union of events A and B is denoted by A ∪ B.
[Venn diagram: events A and B within sample space S; the union is the entire region covered by A and B]

Union of Two Events


Example: Bradley Investments
Event M = Markley Oil Profitable
Event C = Collins Mining Profitable
M ∪ C = Markley Oil Profitable
or Collins Mining Profitable (or both)
M ∪ C = {(10, 8), (10, -2), (5, 8), (5, -2), (0, 8), (-20, 8)}
P(M ∪ C) = P(10, 8) + P(10, -2) + P(5, 8) + P(5, -2)
+ P(0, 8) + P(-20, 8)
= .20 + .08 + .16 + .26 + .10 + .02
= .82


Intersection of Two Events


The intersection of events A and B is the set of all
sample points that are in both A and B.
The intersection of events A and B is denoted by A ∩ B.
[Venn diagram: events A and B within sample space S; the overlapping region is the intersection of A and B]

Intersection of Two Events


Example: Bradley Investments
Event M = Markley Oil Profitable
Event C = Collins Mining Profitable
M ∩ C = Markley Oil Profitable
and Collins Mining Profitable
M ∩ C = {(10, 8), (5, 8)}
P(M ∩ C) = P(10, 8) + P(5, 8)
= .20 + .16
= .36


Addition Law
The addition law provides a way to compute the
probability of event A, or B, or both A and B occurring.
The law is written as:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Addition Law
Example: Bradley Investments
Event M = Markley Oil Profitable
Event C = Collins Mining Profitable
M ∪ C = Markley Oil Profitable
or Collins Mining Profitable
We know: P(M) = .70, P(C) = .48, P(M ∩ C) = .36
Thus: P(M ∪ C) = P(M) + P(C) - P(M ∩ C)
= .70 + .48 - .36
= .82
(This result is the same as that obtained earlier
using the definition of the probability of an event.)


Mutually Exclusive Events


Two events are said to be mutually exclusive if the
events have no sample points in common.
Two events are mutually exclusive if, when one event
occurs, the other cannot occur.
[Venn diagram: events A and B drawn as non-overlapping regions within sample space S]

Mutually Exclusive Events


If events A and B are mutually exclusive, P(A ∩ B) = 0.
The addition law for mutually exclusive events is:
P(A ∪ B) = P(A) + P(B)
There is no need to include "- P(A ∩ B)".


Conditional Probability
The probability of an event given that another event
has occurred is called a conditional probability.
The conditional probability of A given B is denoted
by P(A|B).
A conditional probability is computed as follows:
P(A|B) = P(A ∩ B) / P(B)

Conditional Probability
Example: Bradley Investments
Event M = Markley Oil Profitable
Event C = Collins Mining Profitable
P(C|M) = Collins Mining Profitable
given Markley Oil Profitable
We know: P(M ∩ C) = .36, P(M) = .70
Thus: P(C|M) = P(C ∩ M) / P(M) = .36/.70 = .5143


Multiplication Law
The multiplication law provides a way to compute the
probability of the intersection of two events.
The law is written as:
P(A ∩ B) = P(B)P(A|B)

Multiplication Law
Example: Bradley Investments
Event M = Markley Oil Profitable
Event C = Collins Mining Profitable
M ∩ C = Markley Oil Profitable
and Collins Mining Profitable
We know: P(M) = .70, P(C|M) = .5143
Thus: P(M ∩ C) = P(M)P(C|M)
= (.70)(.5143)
= .36
(This result is the same as that obtained earlier
using the definition of the probability of an event.)
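
Conditional probability and the multiplication law are two arrangements of the same relationship, which the sketch below illustrates with the Bradley Investments figures. The function name and the zero-probability guard are assumptions added for the example.

```python
def conditional_probability(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return p_a_and_b / p_b

p_m = 0.70          # P(M): Markley Oil profitable
p_m_and_c = 0.36    # P(M and C): both profitable

# Conditional probability of C given M.
p_c_given_m = conditional_probability(p_m_and_c, p_m)
print(round(p_c_given_m, 4))         # 0.5143

# Multiplication law recovers the joint probability.
print(round(p_m * p_c_given_m, 2))   # 0.36
```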


Joint Probability Table


                         Collins Mining
Markley Oil              Profitable (C)    Not Profitable (C^c)    Total
Profitable (M)           .36               .34                     .70
Not Profitable (M^c)     .12               .18                     .30
Total                    .48               .52                     1.00
Joint probabilities appear in the body of the table.
Marginal probabilities appear in the margins of the table.
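
The marginal probabilities are the row and column sums of the joint probabilities. A minimal sketch, using the values from the table and illustrative labels ("Mc" and "Cc" stand for the complements):

```python
# Joint probabilities from the body of the table, keyed by (row, column).
joint = {
    ("M", "C"): 0.36, ("M", "Cc"): 0.34,
    ("Mc", "C"): 0.12, ("Mc", "Cc"): 0.18,
}

row_totals = {}   # marginal probabilities for Markley Oil
col_totals = {}   # marginal probabilities for Collins Mining
for (row, col), p in joint.items():
    row_totals[row] = row_totals.get(row, 0.0) + p
    col_totals[col] = col_totals.get(col, 0.0) + p

print({k: round(v, 2) for k, v in row_totals.items()})
print({k: round(v, 2) for k, v in col_totals.items()})
```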

Independent Events
If the probability of event A is not changed by the
existence of event B, we would say that events A
and B are independent.
Two events A and B are independent if:
P(A|B) = P(A)   or   P(B|A) = P(B)


Multiplication Law
for Independent Events
The multiplication law also can be used as a test to see
if two events are independent.
The law is written as:
P(A ∩ B) = P(A)P(B)

Multiplication Law
for Independent Events
Example: Bradley Investments
Event M = Markley Oil Profitable
Event C = Collins Mining Profitable
Are events M and C independent?
Does P(M ∩ C) = P(M)P(C)?
We know: P(M ∩ C) = .36, P(M) = .70, P(C) = .48
But: P(M)P(C) = (.70)(.48) = .34, not .36
Hence: M and C are not independent.
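
The independence test compares the joint probability with the product of the two individual probabilities. This is a sketch only; the function name, the tolerance, and the second (hypothetical) example are assumptions.

```python
import math

def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when P(A and B) equals P(A)P(B) within a tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)

# Bradley Investments: (.70)(.48) = .336, which differs from .36.
print(are_independent(0.70, 0.48, 0.36))   # False

# Hypothetical pair of events that does satisfy the product rule.
print(are_independent(0.5, 0.5, 0.25))     # True
```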


Mutual Exclusiveness and Independence


Do not confuse the notion of mutually exclusive
events with that of independent events.
Two events with nonzero probabilities cannot be
both mutually exclusive and independent.
If one mutually exclusive event is known to occur,
the other cannot occur; thus, the probability of the
other event occurring is reduced to zero (and they
are therefore dependent).
Two events that are not mutually exclusive might
or might not be independent.

Bayes' Theorem
Often we begin probability analysis with initial or
prior probabilities.
Then, from a sample, special report, or a product
test we obtain some additional information.
Given this information, we calculate revised or
posterior probabilities.
Bayes' theorem provides the means for revising the
prior probabilities.
Prior Probabilities -> New Information -> Application of Bayes' Theorem -> Posterior Probabilities

Bayes' Theorem
Example: L. S. Clothiers
A proposed shopping center will provide strong
competition for downtown businesses like L. S.
Clothiers. If the shopping center is built, the owner
of L. S. Clothiers feels it would be best to relocate to
the shopping center.
The shopping center cannot be built unless a
zoning change is approved by the town council.
The planning board must first make a
recommendation, for or against the zoning change,
to the council.

Prior Probabilities
Example: L. S. Clothiers
Let:
A1 = town council approves the zoning change
A2 = town council disapproves the change
Using subjective judgment:
P(A1) = .7, P(A2) = .3


New Information
Example: L. S. Clothiers
The planning board has recommended against
the zoning change. Let B denote the event of a
negative recommendation by the planning board.
Given that B has occurred, should L. S. Clothiers
revise the probabilities that the town council will
approve or disapprove the zoning change?

Conditional Probabilities
Example: L. S. Clothiers
Past history with the planning board and the town
council indicates the following:
P(B|A1) = .2    P(B|A2) = .9
Hence:
P(B^c|A1) = .8    P(B^c|A2) = .1


Tree Diagram
Example: L. S. Clothiers
Town Council    Planning Board      Experimental Outcomes
P(A1) = .7      P(B|A1) = .2        P(A1 ∩ B) = .14
P(A1) = .7      P(B^c|A1) = .8      P(A1 ∩ B^c) = .56
P(A2) = .3      P(B|A2) = .9        P(A2 ∩ B) = .27
P(A2) = .3      P(B^c|A2) = .1      P(A2 ∩ B^c) = .03

Bayes' Theorem
To find the posterior probability that event Ai will
occur given that event B has occurred, we apply
Bayes' theorem.
P(Ai|B) = P(Ai)P(B|Ai) / [P(A1)P(B|A1) + P(A2)P(B|A2) + . . . + P(An)P(B|An)]
Bayes' theorem is applicable when the events for
which we want to compute posterior probabilities
are mutually exclusive and their union is the entire
sample space.


Posterior Probabilities
Example: L. S. Clothiers
Given the planning board's recommendation not
to approve the zoning change, we revise the prior
probabilities as follows:
P(A1|B) = P(A1)P(B|A1) / [P(A1)P(B|A1) + P(A2)P(B|A2)]
= (.7)(.2) / [(.7)(.2) + (.3)(.9)]
= .34

Posterior Probabilities
Example: L. S. Clothiers
The planning board's recommendation is good
news for L. S. Clothiers. The posterior probability of
the town council approving the zoning change is .34
compared to a prior probability of .70.


Bayes' Theorem: Tabular Approach
Example: L. S. Clothiers
Step 1
Prepare the following three columns:
Column 1: The mutually exclusive events for
which posterior probabilities are desired.
Column 2: The prior probabilities for the events.
Column 3: The conditional probabilities of the
new information given each event.

Bayes' Theorem: Tabular Approach
Example: L. S. Clothiers
Step 1
(1)       (2)              (3)              (4)    (5)
          Prior            Conditional
Events    Probabilities    Probabilities
Ai        P(Ai)            P(B|Ai)
A1        .7               .2
A2        .3               .9
          1.0


Bayes' Theorem: Tabular Approach
Example: L. S. Clothiers
Step 2
Prepare the fourth column:
Column 4
Compute the joint probabilities for each event and
the new information B by using the multiplication
law.
Multiply the prior probabilities in column 2 by
the corresponding conditional probabilities in
column 3. That is, P(Ai ∩ B) = P(Ai) P(B|Ai).

Bayes' Theorem: Tabular Approach
Example: L. S. Clothiers
Step 2
(1)       (2)              (3)              (4)              (5)
          Prior            Conditional      Joint
Events    Probabilities    Probabilities    Probabilities
Ai        P(Ai)            P(B|Ai)          P(Ai ∩ B)
A1        .7               .2               .14   (= .7 x .2)
A2        .3               .9               .27
          1.0


Bayes' Theorem: Tabular Approach
Example: L. S. Clothiers
Step 2 (continued)
We see that there is a .14 probability of the town
council approving the zoning change and a
negative recommendation by the planning board.
There is a .27 probability of the town council
disapproving the zoning change and a negative
recommendation by the planning board.

Bayes' Theorem: Tabular Approach
Example: L. S. Clothiers
Step 3
Sum the joint probabilities in Column 4. The
sum is the probability of the new information,
P(B). The sum .14 + .27 shows an overall
probability of .41 of a negative recommendation
by the planning board.


Bayes' Theorem: Tabular Approach
Example: L. S. Clothiers
Step 3
(1)       (2)              (3)              (4)              (5)
          Prior            Conditional      Joint
Events    Probabilities    Probabilities    Probabilities
Ai        P(Ai)            P(B|Ai)          P(Ai ∩ B)
A1        .7               .2               .14
A2        .3               .9               .27
          1.0                               P(B) = .41

Bayes' Theorem: Tabular Approach
Example: L. S. Clothiers
Step 4
Prepare the fifth column:
Column 5
Compute the posterior probabilities using the
basic relationship of conditional probability.
P(Ai|B) = P(Ai ∩ B) / P(B)
The joint probabilities P(Ai ∩ B) are in column 4
and the probability P(B) is the sum of column 4.


Bayes' Theorem: Tabular Approach
Example: L. S. Clothiers
Step 4
(1)       (2)              (3)              (4)              (5)
          Prior            Conditional      Joint            Posterior
Events    Probabilities    Probabilities    Probabilities    Probabilities
Ai        P(Ai)            P(B|Ai)          P(Ai ∩ B)        P(Ai|B)
A1        .7               .2               .14              .3415   (= .14/.41)
A2        .3               .9               .27              .6585
          1.0                               P(B) = .41       1.0000
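
The four steps of the tabular approach map onto a short calculation. The following is a minimal sketch for the L. S. Clothiers example; the function name `bayes_table` and its dictionary-based interface are assumptions made for illustration.

```python
def bayes_table(priors, likelihoods):
    """Return (joint probabilities, P(B), posterior probabilities).

    priors:      P(Ai) for mutually exclusive events covering the sample space
    likelihoods: P(B|Ai) for the same events
    """
    # Step 2 (column 4): joint probabilities P(Ai and B) = P(Ai) P(B|Ai).
    joints = {event: priors[event] * likelihoods[event] for event in priors}
    # Step 3: P(B) is the sum of column 4.
    p_b = sum(joints.values())
    if p_b <= 0:
        raise ValueError("P(B) must be positive to compute posteriors")
    # Step 4 (column 5): posterior probabilities P(Ai|B) = P(Ai and B) / P(B).
    posteriors = {event: joints[event] / p_b for event in priors}
    return joints, p_b, posteriors

# Step 1 (columns 1 to 3): events, priors, and conditional probabilities.
priors = {"A1": 0.7, "A2": 0.3}
likelihoods = {"A1": 0.2, "A2": 0.9}

joints, p_b, posteriors = bayes_table(priors, likelihoods)
print(round(p_b, 2))                                       # 0.41
print({k: round(v, 4) for k, v in posteriors.items()})
```

Because the posteriors are the joint probabilities divided by their own sum, they always add to 1, which serves as a quick check on the table.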
