Quantitative Methods in Management

Day-4
Recap..
• Introduction
• Definition
• Terms and terminologies
• Types of statistics
• Types of data
• Levels of measurements
• Application of statistics in business
• Sources of data
Organizing and visualizing variables
• Tables
• Frequency distribution
• Relative frequency distribution
• Relative percent frequency distribution
• Cumulative frequency distribution
• Univariate
• Bivariate / cross tabulation
• Diagrams
• Bar charts
• Pie charts
• Graphs
• Histogram
• Frequency polygon
• Frequency curve
• Cumulative frequency curve ( Ogive)
• EDA
• Stem and leaf plot
• Scatter diagram
• Dot plots
• Pareto chart
Numerical descriptive
statistics (cont…)
Day 4
Pg. 99-148
• Measures of location
• Measures of dispersion
• Measures of shapes
• kurtosis
RELATIVE LOCATION
Z score
Chebyshev's inequality
Empirical rule
Relative location – Z score
* In addition to measures of location, variability, and shape, we are also interested in the relative location of values within a data set.
* Measures of relative location help us determine how far a particular value is from the mean.
* By using both the mean and standard deviation, we can determine the relative location of any observation.
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value x_i is from the mean:
$$z_i = \frac{x_i - \bar{x}}{s}$$
Excel's STANDARDIZE function can be used to compute the z-score.
Locating Extreme Outliers: Z-Score
DCOVA
• To compute the Z-score of a data value, subtract the mean and divide by the standard deviation.
• The Z-score is the number of standard deviations a data value is from the mean.
• A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0.
• The larger the absolute value of the Z-score, the farther the data value is from the mean.
Z-Score
$$Z = \frac{X - \bar{X}}{S}$$
where X represents the data value,
\bar{X} is the sample mean, and
S is the sample standard deviation.
Z-Score
• Suppose the mean math SAT score is 490, with a standard deviation of 100.
• Compute the Z-score for a test score of 620.
$$Z = \frac{X - \bar{X}}{S} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3$$
A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
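The SAT example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `z_score` and its parameter names are illustrative choices, not part of the slides.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# SAT example from the slide: mean 490, standard deviation 100, score 620
z = z_score(620, mean=490, std_dev=100)

# The slides treat |Z| > 3.0 as the threshold for an extreme outlier
is_extreme_outlier = abs(z) > 3.0

print(f"Z = {z:.1f}, extreme outlier: {is_extreme_outlier}")
```

A negative result would indicate a value below the mean, and zero a value equal to the mean.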
z-Scores
• An observation's z-score is a measure of the relative location of the observation in a data set.
• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.
z-Scores
• Example: Apartment Rents
• z-Score of Smallest Value (425)
$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$
Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
Detecting Outliers
• An outlier is an unusually small or unusually large value in a data set.
• A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
• It might be:
  • an incorrectly recorded data value
  • a data value that was incorrectly included in the data set
  • a correctly recorded data value that belongs in the data set
Detecting Outliers (for practice)
• The most extreme z-scores are -1.20 and 2.27
• Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.
Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
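The standardized values and the outlier check can be recomputed from the raw data. The sketch below assumes the 70 apartment rents are the values listed with the five-number summary later in these slides; the variable names are illustrative.

```python
import statistics

# Apartment rents (70 values), as listed with the five-number summary
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)      # sample mean
std_dev = statistics.stdev(rents)  # sample standard deviation (n - 1 divisor)

# Standardize every observation
z_scores = [(x - mean) / std_dev for x in rents]

# Flag observations with |z| > 3 as potential outliers
outliers = [x for x, z in zip(rents, z_scores) if abs(z) > 3]

print(f"mean = {mean:.2f}, s = {std_dev:.2f}")
print(f"most extreme z-scores: {min(z_scores):.2f} and {max(z_scores):.2f}")
print(f"outliers: {outliers}")
```

A flagged value would still need a judgment call, since it may be a recording error or a legitimate observation.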
Empirical Rule
When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
The empirical rule is based on the normal distribution, which is covered in a later chapter.
Empirical Rule
For data having a bell-shaped distribution:
• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
Empirical Rule
[Figure: bell-shaped curve over x showing 68.26% of values within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]
The Empirical Rule
• The empirical rule approximates the variation of data in a bell-shaped distribution.
• Approximately 68% of the data in a bell-shaped distribution is within 1 standard deviation of the mean, or μ ± 1σ.
The Empirical Rule
• Approximately 95% of the data in a bell-shaped
distribution lies within two standard deviations of
the mean, or µ ± 2σ
• Approximately 99.7% of the data in a bell-shaped
distribution lies within three standard deviations
of the mean, or µ ± 3σ
[Figure: bell-shaped curves shading 95% of the data within μ ± 2σ and 99.7% within μ ± 3σ]
Using the Empirical Rule
• Suppose that the variable Math SAT scores is bell-shaped with a mean of 500 and a standard deviation of 90. Then:
  • Approximately 68% of all test takers scored between 410 and 590 (500 ± 90).
  • Approximately 95% of all test takers scored between 320 and 680 (500 ± 180).
  • Approximately 99.7% of all test takers scored between 230 and 770 (500 ± 270).
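The three intervals follow mechanically from the mean and standard deviation. A minimal sketch, assuming the rounded percentages 68%, 95%, and 99.7% used on this slide; the function name `empirical_rule_intervals` is hypothetical.

```python
def empirical_rule_intervals(mean, std_dev):
    """Return {k: (lower, upper, approximate share)} for k = 1, 2, 3.

    Only meaningful for data that are approximately bell-shaped.
    """
    approximate_share = {1: 0.68, 2: 0.95, 3: 0.997}
    return {
        k: (mean - k * std_dev, mean + k * std_dev, share)
        for k, share in approximate_share.items()
    }


# Math SAT example: mean 500, standard deviation 90
intervals = empirical_rule_intervals(500, 90)
for k, (lower, upper, share) in intervals.items():
    print(f"about {share:.1%} of scores lie between {lower} and {upper}")
```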
Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
Chebyshev's theorem requires z > 1, but z need not be an integer.
• At least 75% of the data values must be within z = 2 standard deviations of the mean.
• At least 89% of the data values must be within z = 3 standard deviations of the mean.
• At least 94% of the data values must be within z = 4 standard deviations of the mean.

Let z = 1.5 with \bar{x} = 490.80 and s = 54.74.
At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$
(Actually, 86% of the rent values are between 409 and 573.)
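The guaranteed minimum from Chebyshev's theorem can be compared with the share actually observed in the apartment rent data. This is a sketch only: the function name `chebyshev_lower_bound` is illustrative, and the mean and standard deviation are taken from the slide rather than recomputed.

```python
def chebyshev_lower_bound(z):
    """Minimum share of any data set within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2


# Apartment rents (70 values), as listed with the five-number summary
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean, std_dev, z = 490.80, 54.74, 1.5  # values stated on the slide

bound = chebyshev_lower_bound(z)
lower = mean - z * std_dev
upper = mean + z * std_dev

# Count the observations that actually fall inside the interval
inside = sum(lower <= x <= upper for x in rents)
actual_share = inside / len(rents)

print(f"guaranteed: at least {bound:.0%} between {lower:.0f} and {upper:.0f}")
print(f"observed:   {actual_share:.0%}")
```

The observed share is well above the bound, which is expected: Chebyshev's theorem makes no assumption about the shape of the distribution, so its guarantee is conservative.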
Chebyshev Rule
• Regardless of how the data are distributed, at least $(1 - 1/k^2) \times 100\%$ of the values will fall within k standard deviations of the mean (for k > 1)
• Examples:
  At least                      Within
  (1 - 1/2²) × 100% = 75%       k = 2 (μ ± 2σ)
  (1 - 1/3²) × 100% = 89%       k = 3 (μ ± 3σ)
EXPLORATORY DATA
ANALYSIS
FIVE NUMBER SUMMARY
BOX PLOT
Exploratory Data Analysis
Exploratory data analysis procedures enable us to use simple arithmetic and easy-to-draw pictures to summarize data.
We simply sort the data values into ascending order, identify the five-number summary, and then construct a box plot.
FIVE NUMBER SUMMARY
1. MINIMUM
2. QUARTILE 1
3. MEDIAN
4. QUARTILE 3
5. MAXIMUM
Computing the five number summary
• 80, 100, 100, 110, 130, 190, 200
• Min = 80, Q1 = 100, Median = 110, Q3 = 190, Max = 200
(For small sample sizes, conflicting results may occur and the shape cannot be clearly determined.)
• The monthly starting salaries for a sample of 12 business school graduates are given below (in ascending order):
3310 3355 3450 3480 3480
3490 3520 3540 3550 3650
3730 3925
The five-number summary is:
Min = 3310
Q1 = 3465
Median = 3505
Q3 = 3600
Max = 3925
• The data show a smallest value of 3310 and a largest value of 3925.
• Approximately one-fourth, or 25%, of the observations are between adjacent numbers in a five-number summary.
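The five-number summary can be computed programmatically. The slides do not state which quartile rule they use, so the sketch below assumes a common textbook rule that is consistent with the results shown: compute i = (p/100)n; if i is not an integer, take the value at the next position up; if it is, average the values at positions i and i + 1. Spreadsheet and library quartile functions often interpolate differently and may return slightly different quartiles. The function names are illustrative.

```python
def percentile(sorted_values, p):
    """p-th percentile (p an integer from 1 to 99) by the index rule above."""
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)  # i = p * n / 100, kept exact
    if remainder == 0:
        # i is an integer: average the values at positions i and i + 1
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # i is not an integer: round up to the next position
    return sorted_values[whole]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    return (
        data[0],
        percentile(data, 25),
        percentile(data, 50),
        percentile(data, 75),
        data[-1],
    )


salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]

print(five_number_summary(salaries))
print(five_number_summary([80, 100, 100, 110, 130, 190, 200]))
```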
Box Plot
A box plot is a graphical summary of data that is based on a five-number summary.
A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
Box plots provide another way to identify outliers.

Box and Whisker Plot
• Five specific values are used:
• Median, Q2
• First quartile, Q1
• Third quartile, Q3
• Minimum value in the data set
• Maximum value in the data set
• Inner Fences
• IQR = Q3 - Q1
• Lower inner fence = Q1 - 1.5 IQR
• Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
• Lower outer fence = Q1 - 3.0 IQR
• Upper outer fence = Q3 + 3.0 IQR
Box and Whisker Plot
[Figure: box-and-whisker plot marking the minimum, Q1, Q2, Q3, and maximum]
Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Box Plot

• A box is drawn with its ends located at the first and
third quartiles.
• A vertical line is drawn in the box at the location of
the median (second quartile).
[Figure: box plot on an axis running from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525]
Box Plot
• Limits are located (not drawn) using the interquartile range (IQR).
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol *.
• LL = Q1 - 1.5(IQR)
• UL = Q3 + 1.5(IQR)
• If x < LL or x > UL, then x is an outlier.

Box Plot

• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
• The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
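The limit calculation and the outlier check can be sketched as follows. The quartiles are taken from the slide (Q1 = 445, Q3 = 525) rather than recomputed, and the function name `box_plot_limits` is an illustrative choice.

```python
def box_plot_limits(q1, q3, multiplier=1.5):
    """Return (lower limit, upper limit) located multiplier * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


# Apartment rents (70 values)
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

lower_limit, upper_limit = box_plot_limits(q1=445, q3=525)

# Any value outside the limits is treated as an outlier
outliers = [x for x in rents if x < lower_limit or x > upper_limit]

print(f"limits: {lower_limit} to {upper_limit}, outliers: {outliers}")
```

Passing `multiplier=3.0` gives the outer fences described on the Box and Whisker Plot slide.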
General Descriptive Stats Using Microsoft Excel Functions
House Prices (cells A2:A6): $2,000,000; $500,000; $300,000; $100,000; $100,000

Descriptive Statistics   Value               Excel formula
Mean                     $600,000            =AVERAGE(A2:A6)
Standard Error           $357,770.88         =D6/SQRT(D14)
Median                   $300,000            =MEDIAN(A2:A6)
Mode                     $100,000            =MODE(A2:A6)
Standard Deviation       $800,000            =STDEV(A2:A6)
Sample Variance          640,000,000,000     =VAR(A2:A6)
Kurtosis                 4.1301              =KURT(A2:A6)
Skewness                 2.0068              =SKEW(A2:A6)
Range                    $1,900,000          =D12 - D11
Minimum                  $100,000            =MIN(A2:A6)
Maximum                  $2,000,000          =MAX(A2:A6)
Sum                      $3,000,000          =SUM(A2:A6)
Count                    5                   =COUNT(A2:A6)
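For readers working outside Excel, the same summary can be approximated with Python's standard library. This is a sketch rather than a replacement for the Excel workflow: the skewness and kurtosis lines use the sample-adjusted formulas, which reproduce the values tabulated for the house price data, and all variable names are illustrative.

```python
import math
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
n = len(prices)

mean = statistics.mean(prices)
median = statistics.median(prices)
mode = statistics.mode(prices)
std_dev = statistics.stdev(prices)      # sample standard deviation (n - 1)
variance = statistics.variance(prices)  # sample variance (n - 1)
std_error = std_dev / math.sqrt(n)      # standard error of the mean
value_range = max(prices) - min(prices)

# Standardized values, used by the shape measures below
z = [(x - mean) / std_dev for x in prices]

# Sample-adjusted skewness
skewness = n / ((n - 1) * (n - 2)) * sum(v**3 for v in z)

# Sample-adjusted excess kurtosis
kurtosis = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v**4 for v in z)
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"s={std_dev:.0f}, variance={variance:.0f}, SE={std_error:.2f}")
print(f"skewness={skewness:.4f}, kurtosis={kurtosis:.4f}, range={value_range}")
```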
Salary data: 3310 3355 3450 3480 3480 3490 3520 3540 3550 3650 3730 3925

Descriptive statistics (salary)
Mean                 3540
Standard Error       47.81989569
Median               3505
Mode                 3480
Standard Deviation   165.6529779
Sample Variance      27440.90909
Kurtosis             1.718883645
Skewness             1.091108688
Range                615
Minimum              3310
Maximum              3925
Sum                  42480
Count                12
Microsoft Excel Data Analysis Tool
1. Select Data.
2. Select Data Analysis.
3. Select Descriptive Statistics and click OK.
4. Enter the cell range.
5. Check the Summary Statistics box.
6. Click OK.
Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000; 500,000; 300,000; 100,000; 100,000

House Prices
Mean                 600000
Standard Error       357770.8764
Median               300000
Mode                 100000
Standard Deviation   800000
Sample Variance      640,000,000,000
Kurtosis             4.1301
Skewness             2.0068
Range                1900000
Minimum              100000
Maximum              2000000
Sum                  3000000
Count                5
Minitab Output
Minitab descriptive statistics output using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Descriptive Statistics: House Price
Variable     Total Count  Mean    SE Mean  StDev   Variance     Sum      Minimum
House Price  5            600000  357771   800000  6.40000E+11  3000000  100000

Variable     Median  Maximum  Range    Mode    N for Mode  Skewness  Kurtosis
House Price  300000  2000000  1900000  100000  2           2.01      4.13
Distribution Shape and the Boxplot
[Figure: three box plots, each marked with Q1, Q2, and Q3, illustrating left-skewed, symmetric, and right-skewed distributions]
Boxplot Example
• Below is a boxplot for the following data:
0 2 2 2 3 3 4 5 5 9 27
[Figure: box plot with Xsmallest = 0, Q1 = 2, Q2 / Median = 3, Q3 = 5, and Xlargest = 27]
• The data are right skewed, as the plot depicts.

Sample statistics versus population parameters
Measure              Population Parameter   Sample Statistic
Mean                 μ                      X̄
Variance             σ²                     S²
Standard Deviation   σ                      S
Measuring two variables
Covariance
Correlation
We Discuss Two Measures Of The Relationship Between Two Numerical Variables
• Scatter plots allow you to visually examine the relationship between two numerical variables. We now discuss two quantitative measures of such relationships:
• The Covariance
• The Coefficient of Correlation
The Covariance
• The covariance measures the strength of the linear relationship between two numerical variables (X & Y)
• The sample covariance:
$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
• Only concerned with the strength of the relationship
• No causal effect is implied
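The sample covariance formula translates directly into code. A minimal sketch: the function name `sample_covariance` is illustrative, and the two small data sets are hypothetical values chosen only to show the sign of the result.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (X_i - X-bar)(Y_i - Y-bar), divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


# Hypothetical illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y_same_direction = [2, 4, 6, 8, 10]
y_opposite_direction = [10, 8, 6, 4, 2]

print(sample_covariance(x, y_same_direction))      # positive
print(sample_covariance(x, y_opposite_direction))  # negative
```

Rescaling either variable changes the size of the covariance, so its magnitude alone cannot be compared across data sets.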
Interpreting Covariance
• Covariance between two variables:
  cov(X,Y) > 0: X and Y tend to move in the same direction
  cov(X,Y) < 0: X and Y tend to move in opposite directions
  cov(X,Y) = 0: X and Y have no linear relationship (this alone does not establish that they are independent)
• The covariance has a major flaw:
  • It is not possible to determine the relative strength of the relationship from the size of the covariance
Coefficient of Correlation
• Measures the relative strength of the linear relationship between two numerical variables
• Sample coefficient of correlation:
$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y}$$
where
$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \qquad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$
Features of the Coefficient of Correlation
• The population coefficient of correlation is referred to as ρ.
• The sample coefficient of correlation is referred to as r.
• Both ρ and r have the following features:
• Unit free
• Range between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
Scatter Plots of Sample Data with Various Coefficients of Correlation
[Figure: five scatter plots of Y against X, illustrating r = -1, r = -0.6, r = +1, r = +0.3, and r = 0]
The Coefficient of Correlation Using Microsoft Excel Function
Test #1 Score   Test #2 Score
78              82
92              88
86              91
83              90
95              92
85              85
91              89
76              81
88              96
79              77
Correlation Coefficient: 0.7332   =CORREL(A2:A11,B2:B11)
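The same coefficient can be computed from the definition r = cov(X, Y) / (S_X S_Y). The sketch below uses only the standard library; the function name `correlation_coefficient` is illustrative.

```python
import math


def correlation_coefficient(xs, ys):
    """Sample coefficient of correlation r = cov(X, Y) / (S_X * S_Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)


test_1 = [78, 92, 86, 83, 95, 85, 91, 76, 88, 79]
test_2 = [82, 88, 91, 90, 92, 85, 89, 81, 96, 77]

r = correlation_coefficient(test_1, test_2)
print(f"r = {r:.4f}")
```

Because the n - 1 divisors cancel between numerator and denominator, r is unit free, unlike the covariance.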
The Coefficient of Correlation Using Microsoft Excel Data Analysis Tool
1. Select Data
2. Choose Data Analysis
3. Choose Correlation & Click OK
4. Input data range and select appropriate options
5. Click OK to get output
Interpreting the Coefficient of Correlation Using Microsoft Excel
• r = .733
• There is a relatively strong positive linear relationship between test score #1 and test score #2.
• Students who scored high on the first test tended to score high on the second test.
[Figure: Scatter Plot of Test Scores, with Test #1 Score on the horizontal axis and Test #2 Score on the vertical axis, both running from 70 to 100]
Pitfalls in Numerical
Descriptive Measures
• Data analysis is objective
• Should report the summary measures that best describe
and communicate the important aspects of the data set
• Data interpretation is subjective

• Should be done in fair, neutral and clear manner
Ethical Considerations
Numerical descriptive measures:
• Should document both good and bad results

• Should be presented in a fair, objective and neutral
manner
• Should not use inappropriate summary measures to
distort facts
Chapter Summary
In this chapter we have discussed:

• Describing the properties of central tendency,
variation, and shape in numerical data
• Constructing and interpreting a boxplot
• Computing descriptive summary measures for a
population
• Calculating the covariance and the coefficient of
correlation
• A home theatre in a box is the easiest and cheapest way to provide surround
sound for a home entertainment centre. A sample of prices is shown here
(Consumer Reports Buying Guide, 2013). The prices are for models with a DVD
player and for models without a DVD player.
Models with DVD Player   Price    Models without DVD Player   Price
Sony HT-1800DP           $450     Pioneer HTP-230             $300
Pioneer HTD-330DV        $300     Sony HT-DDW750              $300
Sony HT-C800DP           $400     Kenwood HTB-306             $360
Panasonic SC-HT900       $500     RCA RT-2600                 $290
Panasonic SC-MTI         $400     Kenwood HTB-206             $300
• Compute the mean price for models with a DVD player and the mean price for
models without a DVD player. What is the additional price paid to have a DVD
player included in a home theatre unit?
• Compute the range, variance, and standard deviation for the two samples. What does
this information tell you about the prices for models with and without a DVD player?
Price with DVD player Price without DVD player
Mean 410 Mean 310
Standard Error 33.1662479 Standard Error 12.64911064
Median 400 Median 300
Mode 400 Mode 300
Standard Deviation 74.16198487 Standard Deviation 28.28427125
Sample Variance 5500 Sample Variance 800
Kurtosis 0.867768595 Kurtosis 4.578125
Skewness -0.551618069 Skewness 2.099223257
Range 200 Range 70
Minimum 300 Minimum 290
Maximum 500 Maximum 360
Sum 2050 Sum 1550
Count 5 Count 5
• The following data were used to construct the histograms of the number of
days required to fill orders for Dawson Supply, Inc., and J.C. Clark
Distributors
Dawson Supply Days for Delivery :11 10 9 10 11 11 10 11 10 10

Clark Distributors Days for Delivery : 8 10 13 7 10 11 10 7 15 12
• Use the range and standard deviation to support that Dawson Supply
provides the more consistent and reliable delivery times.
dawson clark
Mean 10.3 Mean 10.3

Standard Error 0.213437475 Standard Error 0.817176711
Median 10 Median 10
Mode 10 Mode 10
Standard Deviation 0.674948558 Standard Deviation 2.584139659
Sample Variance 0.455555556 Sample Variance 6.677777778
Kurtosis -0.282994816 Kurtosis -0.350865189
Skewness -0.433637384 Skewness 0.359288855
Range 2 Range 8
Minimum 9 Minimum 7
Maximum 11 Maximum 15
Sum 103 Sum 103
Count 10 Count 10
Coefficient of variation (%) 6.552898619 Coefficient of variation (%) 25.08873455
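The coefficient of variation expresses the standard deviation as a percentage of the mean, which makes the two suppliers directly comparable. A minimal sketch using the delivery data from this exercise; the function name `coefficient_of_variation` is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


dawson = [11, 10, 9, 10, 11, 11, 10, 11, 10, 10]
clark = [8, 10, 13, 7, 10, 11, 10, 7, 15, 12]

for name, days in (("Dawson Supply", dawson), ("Clark Distributors", clark)):
    value_range = max(days) - min(days)
    s = statistics.stdev(days)
    cv = coefficient_of_variation(days)
    print(f"{name}: range = {value_range}, s = {s:.2f}, CV = {cv:.2f}%")
```

Both suppliers have the same mean of 10.3 days, so the smaller range, standard deviation, and coefficient of variation all point to Dawson Supply as the more consistent of the two.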
Practice
• The following times were recorded by the quarter-mile and mile runners of a
university track team (times are in minutes).
Quarter-Mile Times: .92 .98 1.04 .90 .99
Mile Times: 4.52 4.35 4.60 4.70 4.50
After viewing this sample of running times, one of the coaches commented
that the quarter milers turned in the more consistent times. Use the standard
deviation and the coefficient of variation to summarize the variability in the
data. Does the use of the coefficient of variation indicate that the coach’s
statement should be qualified?
