
Quantitative Methods in Management
Day 4
Recap
• Introduction
• Definition
• Terms and terminology
• Types of statistics
• Types of data
• Levels of measurement
• Application of statistics in business
• Sources of data
Organizing and visualizing variables
• Tables
• Frequency distribution
• Relative frequency distribution
• Relative percent frequency distribution
• Cumulative frequency distribution
• Univariate
• Bivariate / cross tabulation
• Diagrams
• Bar charts
• Pie charts
• Graphs
• Histogram
• Frequency polygon
• Frequency curve
• Cumulative frequency curve (ogive)
• EDA
• Stem and leaf plot
• Scatter diagram
• Dot plots
• Pareto chart
Numerical Descriptive Statistics (cont.)
Day 4
Pages 99-148
• Measures of location
• Measures of dispersion
• Measures of shape
• Kurtosis
RELATIVE LOCATION
Z score
Chebyshev's inequality
Empirical rule
Relative Location: Z-Score
* In addition to measures of location, variability, and shape, we are also interested in the relative location of values within a data set.
* Measures of relative location help us determine how far a particular value is from the mean.
* By using both the mean and the standard deviation, we can determine the relative location of any observation.
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean:
$$z_i = \frac{x_i - \bar{x}}{s}$$
Excel's STANDARDIZE function can be used to compute the z-score.
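As a small illustration, the z-score formula can be written as a Python function. This is a sketch under our own naming: `standardize` and `is_extreme_outlier` are hypothetical helpers, not Excel or library APIs, although `standardize` takes its arguments in the same order as Excel's STANDARDIZE(x, mean, standard_dev). The |z| > 3 cutoff follows the extreme-outlier rule used in this section, and the usage line applies it to the math SAT figures (mean 490, standard deviation 100) from the example in this section.

```python
def standardize(x: float, mean: float, sd: float) -> float:
    """Return the z-score: how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def is_extreme_outlier(z: float, cutoff: float = 3.0) -> bool:
    """Flag a value whose z-score is below -cutoff or above +cutoff."""
    return abs(z) > cutoff


# Math SAT example: mean 490, standard deviation 100, score 620
z = standardize(620, 490, 100)
print(round(z, 2), is_extreme_outlier(z))  # 1.3 False
```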
Locating Extreme Outliers: Z-Score
DCOVA
• To compute the Z-score of a data value, subtract the mean and divide by the standard deviation.
• The Z-score is the number of standard deviations a data value is from the mean.
• A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0.
• The larger the absolute value of the Z-score, the farther the data value is from the mean.
$$Z = \frac{X - \bar{X}}{S}$$
where X represents the data value, $\bar{X}$ is the sample mean, and S is the sample standard deviation.
• Suppose the mean math SAT score is 490, with a standard deviation of 100.
• Compute the Z-score for a test score of 620.
$$Z = \frac{X - \bar{X}}{S} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3$$
A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
z-Scores
• An observation's z-score is a measure of the relative location of the observation in a data set.
• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.
z-Scores
• Example: Apartment Rents
• z-Score of the smallest value (425):
$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$
Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
Detecting Outliers
• An outlier is an unusually small or unusually large value in a data set.
• A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
• Such a value might be:
  • an incorrectly recorded data value
  • a data value that was incorrectly included in the data set
  • a correctly recorded data value that belongs in the data set
Detecting Outliers (for practice)
• Example: Apartment Rents
• The most extreme z-scores are -1.20 and 2.27
• Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.
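A sketch of the same check on the full apartment-rent data, assuming the 70 rents listed on the five-number-summary slide are the data behind the standardized values above. It uses Python's standard `statistics` module; `stdev` uses the n - 1 divisor, matching the sample standard deviation s = 54.74.

```python
from statistics import mean, stdev

# Apartment rents: the 70 values from the five-number-summary slide
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(rents)  # sample mean
s = stdev(rents)     # sample standard deviation (n - 1 divisor)
z_scores = [(x - x_bar) / s for x in rents]

# |z| > 3 criterion for an outlier
outliers = [x for x, z in zip(rents, z_scores) if abs(z) > 3]

print(f"mean = {x_bar:.2f}, s = {s:.2f}")                         # mean = 490.80, s = 54.74
print(f"z range: {min(z_scores):.2f} to {max(z_scores):.2f}")    # z range: -1.20 to 2.27
print("outliers:", outliers)                                      # outliers: []
```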

Empirical Rule
When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the approximate percentage of data values that lie within a specified number of standard deviations of the mean.
The empirical rule is based on the normal distribution, which is covered in a later chapter.
Empirical Rule
For data having a bell-shaped distribution:
• 68.26% of the values of a normal random variable are within ±1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within ±2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within ±3 standard deviations of its mean.
[Figure: bell-shaped curve centred at μ, with 68.26%, 95.44% and 99.72% of the values between μ ± 1σ, μ ± 2σ and μ ± 3σ respectively]
The Empirical Rule
• The empirical rule approximates the variation of data in a bell-shaped distribution.
• Approximately 68% of the data in a bell-shaped distribution lies within 1 standard deviation of the mean, or μ ± 1σ.
• Approximately 95% of the data in a bell-shaped distribution lies within two standard deviations of the mean, or μ ± 2σ.
• Approximately 99.7% of the data in a bell-shaped distribution lies within three standard deviations of the mean, or μ ± 3σ.
[Figure: bell-shaped curves showing 68% of the data within μ ± 1σ, 95% within μ ± 2σ and 99.7% within μ ± 3σ]
Using the Empirical Rule
• Suppose that the variable Math SAT scores is bell-shaped with a mean of 500 and a standard deviation of 90. Then, approximately:
• 68% of all test takers scored between 410 and 590 (500 ± 90).
• 95% of all test takers scored between 320 and 680 (500 ± 180).
• 99.7% of all test takers scored between 230 and 770 (500 ± 270).
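A minimal sketch of the empirical-rule intervals for the SAT example, using `statistics.NormalDist` (Python 3.8+) to compute the exact share of a normal distribution within k standard deviations of its mean. The helper names are our own. The exact shares (about 68.27%, 95.45% and 99.73%) differ slightly from the 68.26%, 95.44% and 99.72% quoted above, which appear to be doubled four-decimal normal-table areas (for example, 2 × 0.3413 = 0.6826).

```python
from statistics import NormalDist


def empirical_intervals(mu, sigma):
    """Return the interval mu ± k*sigma for k = 1, 2, 3."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}


def normal_coverage(k):
    """Exact share of a normal distribution within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Math SAT example: bell-shaped, mean 500, standard deviation 90
for k, (lo, hi) in empirical_intervals(500, 90).items():
    print(f"k = {k}: {lo} to {hi}, exact normal coverage = {normal_coverage(k):.2%}")
```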
Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
Chebyshev’s theorem requires z > 1, but z need not be an integer.
Chebyshev’s Theorem
At least 75% of the data values must be within z = 2 standard deviations of the mean.
At least 89% of the data values must be within z = 3 standard deviations of the mean.
At least 94% of the data values must be within z = 4 standard deviations of the mean.
Chebyshev’s Theorem
• Example: Apartment Rents
Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.
At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$$
and
$$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$$
(Actually, 86% of the rent values are between 409 and 573.)
Chebyshev Rule
• Regardless of how the data are distributed, at least $(1 - 1/k^2) \times 100\%$ of the values will fall within k standard deviations of the mean (for k > 1).
• Examples: at least
  $(1 - 1/2^2) \times 100\% = 75\%$ of the values lie within k = 2 standard deviations (μ ± 2σ)
  $(1 - 1/3^2) \times 100\% = 89\%$ of the values lie within k = 3 standard deviations (μ ± 3σ)
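The Chebyshev bound and the apartment-rent check can be sketched in Python as follows. `chebyshev_bound` is a hypothetical helper name; the rents are assumed to be the 70 values listed on the five-number-summary slide, and 490.80 and 54.74 are the rounded summary figures from the rent example. The last line compares the guaranteed minimum share with the share actually observed.

```python
# Apartment rents: the 70 values from the five-number-summary slide
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_bound(k):.1%}")

# Rent example with k = 1.5, using the rounded summary figures
x_bar, s, k = 490.80, 54.74, 1.5
low, high = x_bar - k * s, x_bar + k * s
actual = sum(low <= x <= high for x in rents) / len(rents)
print(f"at least {chebyshev_bound(k):.1%} in [{low:.0f}, {high:.0f}]; observed {actual:.1%}")
```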
EXPLORATORY DATA ANALYSIS
FIVE NUMBER SUMMARY
BOX PLOT
Exploratory Data Analysis
Exploratory data analysis procedures enable us to use simple arithmetic and easy-to-draw pictures to summarize data.
We simply sort the data values into ascending order, identify the five-number summary, and then construct a box plot.
FIVE NUMBER SUMMARY
1. MINIMUM
2. QUARTILE 1
3. MEDIAN
4. QUARTILE 3
5. MAXIMUM
Computing the five-number summary
• Data: 80, 100, 100, 110, 130, 190, 200
• Min = 80, Q1 = 100, Median = 110, Q3 = 190, Max = 200
(For a small sample, different methods of computing quartiles may give conflicting results, and the shape of the distribution cannot be clearly determined.)
• The monthly starting salaries for a sample of 12 business school graduates are given below (in ascending order):
3310 3355 3450 3480 3480 3490 3520 3540 3550 3650 3730 3925
The five-number summary is:
Min = 3310
Q1 = 3465
Median = 3505
Q3 = 3600
Max = 3925
• The data show a smallest value of 3310 and a largest value of 3925.
• Approximately one-fourth, or 25%, of the observations lie between adjacent numbers in a five-number summary.
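The quartiles on these slides (for example Q1 = 3465 and Q3 = 3600 for the salaries) are reproduced by the following percentile rule: compute i = (p/100)n; if i is not an integer, round up and take that position; if it is an integer, average positions i and i + 1. Treat this as an assumption about the convention the course uses. Excel's QUARTILE.INC and QUARTILE.EXC, and other packages, use different rules and can give different values for small samples, which is the source of the conflicting results mentioned above. The function names are our own.

```python
import math


def percentile(sorted_data, p):
    """p-th percentile (0 < p < 100) by the i = (p/100) * n rule; data must be sorted."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    i = p / 100 * len(sorted_data)
    if i.is_integer():
        k = int(i)
        # Integer position: average the i-th and (i+1)-th values (1-based positions).
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Otherwise round i up and take that position (1-based).
    return sorted_data[math.ceil(i) - 1]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    if not data:
        raise ValueError("data must not be empty")
    xs = sorted(data)
    return xs[0], percentile(xs, 25), percentile(xs, 50), percentile(xs, 75), xs[-1]


salaries = [3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925]
print(five_number_summary(salaries))
print(five_number_summary([80, 100, 100, 110, 130, 190, 200]))
```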
Box Plot
A box plot is a graphical summary of data that is based on a five-number summary.
A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
Box plots provide another way to identify outliers.
Box and Whisker Plot
• Five specific values are used:
  • Median, Q2
  • First quartile, Q1
  • Third quartile, Q3
  • Minimum value in the data set
  • Maximum value in the data set
• Inner fences
  • IQR = Q3 - Q1
  • Lower inner fence = Q1 - 1.5 IQR
  • Upper inner fence = Q3 + 1.5 IQR
• Outer fences
  • Lower outer fence = Q1 - 3.0 IQR
  • Upper outer fence = Q3 + 3.0 IQR
[Figure: box-and-whisker plot marking the minimum, Q1, Q2, Q3 and maximum]
Five-Number Summary
• Example: Apartment Rents
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Box Plot
• Example: Apartment Rents
• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).
[Figure: box plot of the apartment rents on an axis from 400 to 625, with Q1 = 445, Q2 = 475 and Q3 = 525]
Box Plot
• Limits are located (not drawn) using the interquartile range (IQR).
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol *.
• LL = Q1 - 1.5(IQR)
• UL = Q3 + 1.5(IQR)
• If x < LL or x > UL, then x is an outlier.
Box Plot
• Example: Apartment Rents
• IQR = Q3 - Q1 = 525 - 445 = 80.
• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
• The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
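A short sketch of the fence arithmetic, using Q1 = 445 and Q3 = 525 from the apartment-rent example. Only the smallest (425) and largest (615) rents need checking: if both lie inside the inner fences, every value does. `fences` is a hypothetical helper name; it also returns the outer fences defined on the box-and-whisker slide.

```python
def fences(q1, q3):
    """Inner (1.5 IQR) and outer (3.0 IQR) fences around the box."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }


f = fences(445, 525)  # Q1 and Q3 for the apartment rents
extremes = (425, 615)  # smallest and largest rent
outliers = [x for x in extremes if x < f["lower_inner"] or x > f["upper_inner"]]
print(f)
print("outliers:", outliers)
```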
General Descriptive Stats Using Microsoft Excel Functions

House prices (cells A2:A6): $2,000,000; $500,000; $300,000; $100,000; $100,000
Results are in cells D2:D14, so D6 is the standard deviation, D11 the minimum, D12 the maximum and D14 the count.

Descriptive Statistics   Value               Excel formula
Mean                     $600,000            =AVERAGE(A2:A6)
Standard Error           $357,770.88         =D6/SQRT(D14)
Median                   $300,000            =MEDIAN(A2:A6)
Mode                     $100,000            =MODE(A2:A6)
Standard Deviation       $800,000            =STDEV(A2:A6)
Sample Variance          640,000,000,000     =VAR(A2:A6)
Kurtosis                 4.1301              =KURT(A2:A6)
Skewness                 2.0068              =SKEW(A2:A6)
Range                    $1,900,000          =D12 - D11
Minimum                  $100,000            =MIN(A2:A6)
Maximum                  $2,000,000          =MAX(A2:A6)
Sum                      $3,000,000          =SUM(A2:A6)
Count                    5                   =COUNT(A2:A6)
Salary data and Excel descriptive statistics

Salaries: 3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925

Statistic            Value
Mean                 3540
Standard Error       47.81989569
Median               3505
Mode                 3480
Standard Deviation   165.6529779
Sample Variance      27440.90909
Kurtosis             1.718883645
Skewness             1.091108688
Range                615
Minimum              3310
Maximum              3925
Sum                  42480
Count                12
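For readers working outside Excel, the same summaries can be sketched with Python's `statistics` module. Skewness and kurtosis use the bias-adjusted formulas that Microsoft documents for SKEW and KURT (which need at least 3 and 4 values respectively). `describe` is our own helper, not a library function; its output should agree with both tables above up to rounding.

```python
import math
import statistics as st


def describe(data):
    """Excel-style summary statistics for a sample (needs at least 4 values)."""
    n = len(data)
    if n < 4:
        raise ValueError("the kurtosis formula needs at least 4 values")
    m = st.mean(data)
    s = st.stdev(data)  # sample SD (n - 1 divisor), like Excel STDEV
    z = [(x - m) / s for x in data]
    # Bias-adjusted skewness, as documented for Excel's SKEW
    skew = n / ((n - 1) * (n - 2)) * sum(v**3 for v in z)
    # Bias-adjusted excess kurtosis, as documented for Excel's KURT
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v**4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": m,
        "Standard Error": s / math.sqrt(n),
        "Median": st.median(data),
        "Mode": st.mode(data),
        "Standard Deviation": s,
        "Sample Variance": st.variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
salaries = [3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925]
for label, value in describe(house_prices).items():
    print(f"{label:<20}{value:,.4f}")
```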
General Descriptive Stats Using Microsoft Excel Data Analysis Tool
1. Select the Data tab.
2. Select Data Analysis.
3. Select Descriptive Statistics and click OK.
4. Enter the cell range.
5. Check the Summary Statistics box.
6. Click OK.
Excel Output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000

Mean                  600000
Standard Error        357770.8764
Median                300000
Mode                  100000
Standard Deviation    800000
Sample Variance       640,000,000,000
Kurtosis              4.1301
Skewness              2.0068
Range                 1900000
Minimum               100000
Maximum               2000000
Sum                   3000000
Count                 5
Minitab Output
Minitab descriptive statistics output using the house price data:
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000

Descriptive Statistics: House Price

Variable      Total Count   Mean     SE Mean   StDev    Variance      Sum       Minimum
House Price   5             600000   357771    800000   6.40000E+11   3000000   100000

Variable      Median   Maximum   Range     Mode     N for Mode   Skewness   Kurtosis
House Price   300000   2000000   1900000   100000   2            2.01       4.13
Distribution Shape and the Boxplot
[Figure: boxplots of a left-skewed, a symmetric and a right-skewed distribution, each marking Q1, Q2 and Q3]
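Boxplot Example
• Below is a boxplot for the following data:
0 2 2 2 3 3 4 5 5 9 27
[Figure: boxplot with Xsmallest = 0, Q1 = 2, Q2 (median) = 3, Q3 = 5 and Xlargest = 27]
• The data are right-skewed, as the plot depicts: the upper whisker (from Q3 = 5 to 27) is far longer than the lower whisker (from 0 to Q1 = 2).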

Sample Statistics versus Population Parameters

Measure              Population Parameter   Sample Statistic
Mean                 $\mu$                  $\bar{X}$
Variance             $\sigma^2$             $S^2$
Standard Deviation   $\sigma$               $S$
Measuring Two Variables
• Covariance
• Correlation
We Discuss Two Measures of the Relationship Between Two Numerical Variables
• Scatter plots allow you to visually examine the relationship between two numerical variables. Now we will discuss two quantitative measures of such relationships:
  • The covariance
  • The coefficient of correlation
The Covariance
• The covariance measures the direction of the linear relationship between two numerical variables (X and Y); its size depends on the units in which X and Y are measured.
• The sample covariance:
$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
• It describes only the association between X and Y.
• No causal effect is implied.
Interpreting Covariance
• Covariance between two variables:
  cov(X,Y) > 0: X and Y tend to move in the same direction
  cov(X,Y) < 0: X and Y tend to move in opposite directions
  cov(X,Y) = 0: X and Y have no linear relationship (zero covariance does not by itself imply that X and Y are independent)
• The covariance has a major flaw:
  • It is not possible to determine the relative strength of the relationship from the size of the covariance.
Coefficient of Correlation
• Measures the relative strength of the linear relationship between two numerical variables
• Sample coefficient of correlation:
$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y}$$
where
$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \quad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \quad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$
Features of the Coefficient of Correlation
• The population coefficient of correlation is referred to as ρ.
• The sample coefficient of correlation is referred to as r.
• Both ρ and r have the following features:
  • Unit free
  • Range between -1 and 1
  • The closer to -1, the stronger the negative linear relationship
  • The closer to 1, the stronger the positive linear relationship
  • The closer to 0, the weaker the linear relationship
Scatter Plots of Sample Data with Various Coefficients of Correlation
[Figure: five scatter plots of Y against X, with r = -1, r = -0.6, r = +1, r = +0.3 and r = 0]
The Coefficient of Correlation Using Microsoft Excel Function
Test #1 Score   Test #2 Score
78              82
92              88
86              91
83              90
95              92
85              85
91              89
76              81
88              96
79              77
Correlation Coefficient: 0.7332   =CORREL(A2:A11,B2:B11)
The Coefficient of Correlation Using Microsoft Excel Data Analysis Tool
1. Select the Data tab.
2. Choose Data Analysis.
3. Choose Correlation and click OK.
4. Input the data range and select the appropriate options.
5. Click OK to get the output.
Interpreting the Coefficient of Correlation Using Microsoft Excel
• r = 0.733
• There is a relatively strong positive linear relationship between test score #1 and test score #2.
• Students who scored high on the first test tended to score high on the second test.
[Figure: scatter plot of test scores, Test #1 Score (x-axis, 70 to 100) against Test #2 Score (y-axis, 70 to 100)]
Pitfalls in Numerical Descriptive Measures
• Data analysis is objective
  • Should report the summary measures that best describe and communicate the important aspects of the data set
• Data interpretation is subjective
  • Should be done in a fair, neutral, and clear manner
Ethical Considerations
Numerical descriptive measures:
• Should document both good and bad results
• Should be presented in a fair, objective, and neutral manner
• Should not use inappropriate summary measures to distort facts
Chapter Summary
In this chapter we have discussed:
• Describing the properties of central tendency, variation, and shape in numerical data
• Constructing and interpreting a boxplot
• Computing descriptive summary measures for a population
• Calculating the covariance and the coefficient of correlation
• A home theatre in a box is the easiest and cheapest way to provide surround
sound for a home entertainment centre. A sample of prices is shown here
(Consumer Reports Buying Guide, 2013). The prices are for models with a DVD
player and for models without a DVD player.

Models with DVD Player   Price   Models without DVD Player   Price
Sony HT-1800DP           $450    Pioneer HTP-230             $300
Pioneer HTD-330DV        $300    Sony HT-DDW750              $300
Sony HT-C800DP           $400    Kenwood HTB-306             $360
Panasonic SC-HT900       $500    RCA RT-2600                 $290
Panasonic SC-MTI         $400    Kenwood HTB-206             $300

• Compute the mean price for models with a DVD player and the mean price for
models without a DVD player. What is the additional price paid to have a DVD
player included in a home theatre unit?
• Compute the range, variance, and standard deviation for the two samples. What does
this information tell you about the prices for models with and without a DVD player?
Statistic            Price with DVD player   Price without DVD player
Mean                 410                     310
Standard Error       33.1662479              12.64911064
Median               400                     300
Mode                 400                     300
Standard Deviation   74.16198487             28.28427125
Sample Variance      5500                    800
Kurtosis             0.867768595             4.578125
Skewness             -0.551618069            2.099223257
Range                200                     70
Minimum              300                     290
Maximum              500                     360
Sum                  2050                    1550
Count                5                       5
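The exercise arithmetic can be checked with a short Python sketch. `summary` is a hypothetical helper; `variance` and `stdev` from the `statistics` module use the n - 1 (sample) divisor, as Excel's output does. The difference in means answers the question about the additional price paid for a DVD player.

```python
from statistics import mean, stdev, variance

with_dvd = [450, 300, 400, 500, 400]
without_dvd = [300, 300, 360, 290, 300]


def summary(prices):
    """Mean, range, sample variance and sample standard deviation."""
    return {
        "mean": mean(prices),
        "range": max(prices) - min(prices),
        "variance": variance(prices),
        "std_dev": stdev(prices),
    }


a, b = summary(with_dvd), summary(without_dvd)
print("with DVD:   ", a)
print("without DVD:", b)
print("additional price for a DVD player:", a["mean"] - b["mean"])
```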
• The following data were used to construct the histograms of the number of
days required to fill orders for Dawson Supply, Inc., and J.C. Clark
Distributors

Dawson Supply days for delivery: 11 10 9 10 11 11 10 11 10 10
Clark Distributors days for delivery: 8 10 13 7 10 11 10 7 15 12
• Use the range and standard deviation to support the claim that Dawson Supply provides the more consistent and reliable delivery times.
Statistic                      Dawson         Clark
Mean                           10.3           10.3
Standard Error                 0.213437475    0.817176711
Median                         10             10
Mode                           10             10
Standard Deviation             0.674948558    2.584139659
Sample Variance                0.455555556    6.677777778
Kurtosis                       -0.282994816   -0.350865189
Skewness                       -0.433637384   0.359288855
Range                          2              8
Minimum                        9              7
Maximum                        11             15
Sum                            103            103
Count                          10             10
Coefficient of Variation (%)   6.552898619    25.08873455
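The coefficient of variation, CV = (s / x̄) × 100%, expresses the standard deviation as a percentage of the mean, which makes variability comparable across data sets with different means. A minimal sketch (the helper name is our own) that reproduces the range, standard deviation and CV figures above:

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(data) / mean(data) * 100


dawson = [11, 10, 9, 10, 11, 11, 10, 11, 10, 10]
clark = [8, 10, 13, 7, 10, 11, 10, 7, 15, 12]
for name, days in (("Dawson", dawson), ("Clark", clark)):
    print(f"{name}: range = {max(days) - min(days)}, s = {stdev(days):.4f}, "
          f"CV = {coefficient_of_variation(days):.2f}%")
```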
Practice
• The following times were recorded by the quarter-mile and mile runners of a
university track team (times are in minutes).
Quarter-Mile Times: .92 .98 1.04 .90 .99
Mile Times: 4.52 4.35 4.60 4.70 4.50
After viewing this sample of running times, one of the coaches commented that the quarter-milers turned in the more consistent times. Use the standard deviation and the coefficient of variation to summarize the variability in the data. Does the use of the coefficient of variation indicate that the coach's statement should be qualified?
