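*Weighted average: $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
*Median (ordered observations): $\tilde{x} = x_{((n+1)/2)}$ if $n$ is odd; $\tilde{x} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$ if $n$ is even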
*Weighted Average Example # 1
*Suppose at the end of your first semester, you want to calculate your
GPA. The following table lists the number of credits for each course you were
enrolled in and the grade you obtained:
*What is your GPA?
Course Credits Grade
CHEM 4 B
MATH 4 A
PSYCH 3 A
ENGL 3 C
SPAN 3 B
$GPA = \frac{4(3) + 4(4) + 3(4) + 3(2) + 3(3)}{4 + 4 + 3 + 3 + 3} = \frac{55}{17} = 3.24$ (grade points: A = 4, B = 3, C = 2)
Absolute weights
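A minimal Python sketch of the GPA calculation, with credits used as absolute weights. The grade-point mapping (A = 4, B = 3, C = 2) is read off the example's arithmetic, and `weighted_average` is a hypothetical helper name, not part of any library.

```python
# Grade points assumed from the example's arithmetic: A = 4, B = 3, C = 2.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0}


def weighted_average(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# (course, credits, grade) from the table; credits act as absolute weights.
courses = [("CHEM", 4, "B"), ("MATH", 4, "A"), ("PSYCH", 3, "A"),
           ("ENGL", 3, "C"), ("SPAN", 3, "B")]
credits = [c for _, c, _ in courses]
points = [GRADE_POINTS[g] for _, _, g in courses]
gpa = weighted_average(points, credits)
print(f"GPA = {gpa:.2f}")  # GPA = 3.24
```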
*Weighted Average Example # 2
*Suppose at the end of your first semester, you want to
calculate your attendance grade under the following
conditions:
*Attendance was recorded 28 times.
*You have one excused and five unexcused absences.
*You did not come prepared to class 3 times.
*What is your attendance grade?
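$\frac{1(20) + 0(5) + 0.5(3)}{28} \times 0.05 = 3.84\%$
(20 sessions earn full credit: the 28 recorded sessions minus 5 unexcused absences and 3 unprepared sessions, so the excused absence counts as full credit. Unprepared sessions earn half credit, unexcused absences earn none, and 0.05 is the weight of attendance in the final grade.)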
Absolute weights
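The same attendance arithmetic as a short Python sketch. The per-session credits (1 for full credit, 0.5 for unprepared, 0 for unexcused) and the 5% attendance weight are taken from the example's formula; the dictionary keys are hypothetical labels.

```python
# Sessions per category (28 recorded in total); the excused absence is
# counted together with the full-credit sessions, as in the example.
sessions = {"full_credit": 20, "unprepared": 3, "unexcused": 5}
credit = {"full_credit": 1.0, "unprepared": 0.5, "unexcused": 0.0}
ATTENDANCE_WEIGHT = 0.05  # share of the final grade

recorded = sum(sessions.values())
fraction = sum(credit[k] * n for k, n in sessions.items()) / recorded
contribution = fraction * ATTENDANCE_WEIGHT
print(f"{contribution:.2%}")  # 3.84%
```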
*Weighted Average Example # 3
*Suppose at the end of the semester, you want to calculate your final grade.
The following table lists your score on each item, the points possible, and
the weight of each item in the final grade:
*What is your final grade?
Item               Score   Points possible   Weight
Exam 1               65         100            20%
Exam 2               75         100            20%
Exam 3               89         100            20%
Final Exam           91         100            20%
Quizzes/Projects    135         150            15%
Attendance           30          30             5%
$\frac{65}{100}(0.20) + \frac{75}{100}(0.20) + \frac{89}{100}(0.20) + \frac{91}{100}(0.20) + \frac{135}{150}(0.15) + \frac{30}{30}(0.05) = 0.825$
Relative weights
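A minimal Python sketch of the relative-weight calculation, using the scores and weights from the table (quizzes/projects at 15%). It checks that the relative weights sum to 1 before combining the item percentages.

```python
# (item, score, points possible, weight in the final grade)
items = [
    ("Exam 1", 65, 100, 0.20),
    ("Exam 2", 75, 100, 0.20),
    ("Exam 3", 89, 100, 0.20),
    ("Final Exam", 91, 100, 0.20),
    ("Quizzes/Projects", 135, 150, 0.15),
    ("Attendance", 30, 30, 0.05),
]

total_weight = sum(weight for _, _, _, weight in items)
if abs(total_weight - 1.0) > 1e-9:
    raise ValueError("relative weights should sum to 1 (100%)")

# Each item contributes (score / points possible) * weight.
final_grade = sum(score / possible * weight for _, score, possible, weight in items)
print(f"Final grade = {final_grade:.3f}")  # Final grade = 0.825
```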
*Another Measure of Central Tendency
*Mode
*Most frequent value in a data set or probability distribution
*Example: 4 4 2 5 4 11 12 1 5 6 6 7 8 3 10 9 3
* SORTED: 1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9, 10, 11, 12
* Mode = 4 (it occurs 3 times, more often than any other value)
Value Freq Value Freq
1 1 7 1
2 1 8 1
3 2 9 1
4 3 10 1
5 2 11 1
6 2 12 1
[Figure: bar chart of frequency (0 to 3) against value (0 to 14)]
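A short Python sketch for finding the mode with `collections.Counter`. It returns every value tied for the highest frequency, since a data set can have more than one mode; the function name `modes` is hypothetical (the standard library's `statistics.multimode` behaves similarly).

```python
from collections import Counter


def modes(data):
    """Return all values that occur with the highest frequency, sorted."""
    counts = Counter(data)
    if not counts:
        raise ValueError("mode of empty data")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


data = [4, 4, 2, 5, 4, 11, 12, 1, 5, 6, 6, 7, 8, 3, 10, 9, 3]
print(modes(data))  # [4]
```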
*Example: Central Tendency
STEP 1: Order observations from smallest to largest
*Ordered observations: 27, 75, 76, 78, 80, 83, 84, 86, 87, 110
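*Mean: $\bar{x} = \frac{27 + 75 + \cdots + 110}{10} = 78.6$
*10% Trimmed Mean (drop 10% of the observations, here one, from each end): $\bar{x}_{tr(10)} = \frac{75 + \cdots + 87}{8} = 81.125$
*Median:
* Index: $(n + 1)/2 = 11/2 = 5.5$
* $\tilde{x} = \frac{80 + 83}{2} = 81.5$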
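*Weighted Average (smallest and largest observations get half the weight of the others):
* Weights: $2(w/2) + 8w = 1$, so $w = 1/9$
* $\bar{x}_w = \frac{1}{18}(27 + 110) + \frac{1}{9}(75 + \cdots + 87) = \frac{1435}{18} \approx 79.72$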
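The four measures can be checked with a small Python sketch. `trimmed_mean` and `end_half_weighted_mean` are hypothetical helpers written for this illustration; `mean` and `median` come from the standard `statistics` module. The observations are given unsorted to show that each helper sorts first.

```python
from statistics import mean, median

obs = [83, 27, 80, 84, 75, 76, 110, 78, 86, 87]


def trimmed_mean(data, percent):
    """Drop floor(percent% of n) observations from each end, then average the rest."""
    xs = sorted(data)
    k = len(xs) * percent // 100  # observations removed from EACH end
    if 2 * k >= len(xs):
        raise ValueError("trimming would remove every observation")
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def end_half_weighted_mean(data):
    """Weighted mean where the smallest and largest observations get half weight."""
    xs = sorted(data)
    if len(xs) < 3:
        raise ValueError("need at least three observations")
    weights = [0.5] + [1.0] * (len(xs) - 2) + [0.5]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


print(mean(obs))                               # 78.6
print(trimmed_mean(obs, 10))                   # 81.125
print(median(obs))                             # 81.5
print(round(end_half_weighted_mean(obs), 2))   # 79.72
```

Normalizing by `sum(weights)` is equivalent to solving $2(w/2) + 8w = 1$ for the common weight.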
*Percentiles or quantiles
*Given a sample of observations, the p-quantile is a value for which a specified
fraction p of the data values is less than or equal to it.
* The percentile level, 100p, can be any number between 1 and 100.
*Quartiles
* Q1: p = 0.25
* Q2: p = 0.50
* Q3: p = 0.75
*Quartiles
*Q1 is the median of the first half of the observations
*Q3 is the median of the second half of the observations
*Example:
*Odd number of observations: 1, 3, 4, 5, 7, 8, 10
 Q1 = 3, Q3 = 8 (the median, 5, is excluded from both halves)
*Even number of observations: 1, 3, 4, 5, 7, 8, 10, 12
 Q1 = 3.5, Q3 = 9
HINT: We always need to sort our observations.
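A minimal Python sketch of the "median of each half" rule for quartiles. Consistent with the example values, it leaves the median out of both halves when n is odd. Other software uses other quartile conventions and can return different values; `quartiles` is a hypothetical helper name.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]  # skip the middle value when n is odd
    return median(lower), median(xs), median(upper)


print(quartiles([1, 3, 4, 5, 7, 8, 10]))      # (3, 5, 8)
print(quartiles([1, 3, 4, 5, 7, 8, 10, 12]))  # (3.5, 6.0, 9.0)
```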
*Percentiles
*Different software packages use different formulas.
*One potential formula:
* $i = np + 0.5$
*where:
* i = index of the ordered observation
* First observation: i = 1
* n = total number of observations
* p = fraction
*For the following set of data: 12, 14, 15, 18, 20, 21, 23, 24
*Estimate the 62.5th percentile:
$i = 0.625(8) + 0.5 = 5.5$
$x_{0.625} = \frac{20 + 21}{2} = 20.5$ (halfway between the 5th and 6th ordered observations)
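A sketch of the $i = np + 0.5$ formula in Python, interpolating linearly between neighbouring ordered observations when $i$ is fractional (as in the 20.5 result). Clamping to the smallest or largest observation when $i$ falls outside $[1, n]$ is an assumption of this sketch, not something the formula specifies.

```python
import math


def percentile(data, p):
    """Estimate the p-quantile (0 <= p <= 1) with 1-based index i = n*p + 0.5."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    i = n * p + 0.5
    if i <= 1:          # assumption: clamp below the first observation
        return xs[0]
    if i >= n:          # assumption: clamp above the last observation
        return xs[-1]
    lo = math.floor(i)  # 1-based index of the lower neighbour
    frac = i - lo
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


data = [12, 14, 15, 18, 20, 21, 23, 24]
print(percentile(data, 0.625))  # 20.5
```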
*Measures of Variability
*Variability reduction can be a challenging task.
*We cannot make decisions by focusing only on central tendency.
*Sometimes referred to as measures of spread.
*Variability is always our ENEMY!
* It is impossible to get rid of all variability.
Process 1: 85, 86, 85, 87 (mean 85.75)
Process 2: 95, 85, 80, 84 (mean 86)
*Variability
* Given a sample of n observations
*Sample variance
* Sensitive to outliers
* Most widely used
* Degrees of freedom (df): n - 1
* Lost one df estimating the sample mean
* Because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, only n - 1 deviations from the mean are freely determined.
*Sample standard deviation
*Range / Interquartile Range (IQR)
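* Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$
* Sample standard deviation: $s = \sqrt{s^2}$
* Range: $r = \max(x_i) - \min(x_i)$
* Interquartile range: $IQR = Q_3 - Q_1$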
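A minimal Python sketch of the variability formulas, applied to the two processes above. It implements both the definitional and the computational form of $s^2$ and checks that deviations from the mean sum to zero; the helper names are hypothetical.

```python
import math


def sample_variance(data):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Computational form: (sum(x_i^2) - (sum x_i)^2 / n) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)


def data_range(data):
    """r = max(x_i) - min(x_i)."""
    return max(data) - min(data)


process_1 = [85, 86, 85, 87]
process_2 = [95, 85, 80, 84]

for name, xs in (("Process 1", process_1), ("Process 2", process_2)):
    xbar = sum(xs) / len(xs)
    # Deviations from the mean always sum to zero, hence n - 1 degrees of freedom.
    assert abs(sum(x - xbar for x in xs)) < 1e-9
    s2 = sample_variance(xs)
    print(f"{name}: s^2 = {s2:.3f}, s = {math.sqrt(s2):.3f}, r = {data_range(xs)}")
```

Process 2 shows a much larger spread (range 15 versus 2), even though both processes are centred near 86.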
*Team Exercise # 1
*Given the following set of observations:
*83, 27, 80, 84, 75, 76, 110, 78, 86, 87
1. Calculate all measures of central tendency
*Mean
*10% Trimmed Mean
*Median
*Weighted average - smallest and largest observations get half
the weight of the remaining observations
*Why use graphical summaries?
*Can easily convey information about:
1. Central tendency
2. Spread / Variability
3. Shape
Source: oiip.uprm.edu/
*Simple graphical displays
*Scatter Plot
*Dot Diagram
*Histograms
*Stem and leaf
*Box plots
*Time series
*Shape
*Skewness
* A symmetric distribution can be folded along a vertical axis so that the two sides
coincide.
* Skewness measures the lack of symmetry with respect to a vertical axis.
* Associated with long tails.
* Skewed right (positively skewed): long tail toward the right.
* Skewed left (negatively skewed): long tail toward the left.
*Kurtosis
* Measures the peakedness of a distribution.
* A Normal distribution has a kurtosis of 3.
* More peaked: Kurtosis > 3
* Less peaked: Kurtosis < 3
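One common moment-based definition of skewness ($m_3 / m_2^{3/2}$) and kurtosis ($m_4 / m_2^2$), with $m_k = \frac{1}{n}\sum (x_i - \bar{x})^k$, sketched in Python. Software packages differ (some apply small-sample corrections, some report excess kurtosis, i.e., kurtosis minus 3), so this is an illustration under that assumed definition.

```python
import random


def central_moment(data, k):
    """k-th central moment with divisor n: (1/n) * sum((x - mean)^k)."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** k for x in data) / n


def skewness(data):
    """Moment coefficient of skewness m3 / m2**1.5 (0 for symmetric data)."""
    m2 = central_moment(data, 2)
    if m2 == 0:
        raise ValueError("skewness undefined when all values are equal")
    return central_moment(data, 3) / m2 ** 1.5


def kurtosis(data):
    """Moment coefficient of kurtosis m4 / m2**2 (about 3 for Normal data)."""
    m2 = central_moment(data, 2)
    if m2 == 0:
        raise ValueError("kurtosis undefined when all values are equal")
    return central_moment(data, 4) / m2 ** 2


rng = random.Random(0)
normal_sample = [rng.gauss(0, 1) for _ in range(100_000)]
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9, 15]  # long tail toward the right
print(round(kurtosis(normal_sample), 2))  # close to 3
print(skewness(right_skewed) > 0)         # True
```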
*Shape - 2
A. Unimodal - one mode
B. Bimodal - two modes
C. Multimodal - multiple modes
[Figure: example distributions A (unimodal), B (bimodal), C (multimodal)]
*Shape - Example
*Describe the shape of the following grade distributions
*Scatter Diagram
[Figure: scatter diagram of tensile strength (vertical axis, 0 to 25) against cotton percentage (horizontal axis, 10 to 35)]
*Time Series Plot
*Shows observed values for a given variable along with a time stamp
* Vertical axis: observed values
* Horizontal axis: time
*Conveys information about changes in central tendency and/or
variability over time.
[Figure: two time series plots of observed values against time t (0 to 200)]
*Dot Diagram
*Often used with discrete data.
*Observations: 1, 5, 3, 5, 1, 2, 1, 1, 2, 1, 4, 5, 3, 3, 4, 1, 3, 2, 4, 1, 5, 2,
3, 1, 3, 1, 2, 2, 3, 5, 4, 3, 2, 4, 2, 5, 4, 1, 2, 5, 1, 3, 5, 3, 4, 5, 3, 5, 1, 4
[Figure: dot diagram of the observed values 1 to 5]
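A small Python sketch that draws a text dot diagram, stacking one mark per observation above each distinct value. The function name and the "o" mark are arbitrary choices for illustration.

```python
from collections import Counter

obs = [1, 5, 3, 5, 1, 2, 1, 1, 2, 1, 4, 5, 3, 3, 4, 1, 3, 2, 4, 1, 5, 2,
       3, 1, 3, 1, 2, 2, 3, 5, 4, 3, 2, 4, 2, 5, 4, 1, 2, 5, 1, 3, 5, 3, 4, 5, 3, 5, 1, 4]


def dot_diagram(data, mark="o"):
    """Text dot diagram: one column per distinct value, one mark per observation."""
    counts = Counter(data)
    if not counts:
        raise ValueError("no observations")
    values = sorted(counts)
    width = max(len(str(v)) for v in values)
    rows = []
    for level in range(max(counts.values()), 0, -1):  # stack marks from the top down
        cells = (mark if counts[v] >= level else " " for v in values)
        rows.append(" ".join(c.center(width) for c in cells).rstrip())
    rows.append(" ".join(str(v).center(width) for v in values))  # axis labels
    return "\n".join(rows)


print(dot_diagram(obs))
```

With these 50 observations the column heights are 12, 9, 11, 8, and 10 for the values 1 to 5.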
*Histogram
Steps:
1. Divide the range of the data into intervals (or bins)
* Common choices
* Number of bins equal to √n
* Number of bins between 5 and 20
2. Select the form of the intervals
* Equal width [width = (max - min) / number of bins]
* Equal probability
3. Define the y-axis
* Absolute frequency: counts
* Relative frequency: percentage
* Cumulative frequency: either counts or percentages
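A minimal Python sketch of the steps above: about √n equal-width bins, then absolute, relative, and cumulative frequencies. Left-closed bins (with the maximum placed in the last bin) are an assumed convention here; values lying exactly on a bin edge can be sensitive to floating-point rounding.

```python
import math


def histogram_table(data, k=None):
    """Rows of (lower, upper, count, relative, cumulative count, cumulative relative)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    if k is None:
        k = max(1, round(math.sqrt(n)))  # rule of thumb: about sqrt(n) bins
    lo, hi = xs[0], xs[-1]
    width = (hi - lo) / k  # equal width: (max - min) / number of bins
    counts = [0] * k
    for x in xs:
        # Left-closed bins [a, b); the maximum is placed in the last bin.
        j = int((x - lo) / width) if width > 0 else 0
        counts[min(j, k - 1)] += 1
    rows, cumulative = [], 0
    for j, count in enumerate(counts):
        cumulative += count
        lower = lo + j * width
        rows.append((lower, lower + width, count, count / n, cumulative, cumulative / n))
    return rows


for lower, upper, count, rel, cum, cum_rel in histogram_table(range(1, 17)):
    print(f"{lower:6.2f} to {upper:6.2f}: {count:2d}  {rel:6.1%}  {cum:2d}  {cum_rel:6.1%}")
```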
*Note on Histograms
*Right Closed vs. Left Closed
*Suppose you have data between 0 and 100.
*You intend to use 10 bins in a histogram.
Where does 10 go?
[Figure: frequency histogram of data between 0 and 100 with 10 bins]
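The two conventions can be made concrete with Python's `bisect` module: with left-closed bins [a, b) the value 10 falls in the second bin [10, 20), while with right-closed bins (a, b] it falls in the first bin (0, 10]. Including the outer edges 0 and 100 in the first and last bin is an assumption of this sketch (NumPy's `numpy.histogram`, for instance, uses half-open bins with a closed last bin).

```python
from bisect import bisect_left, bisect_right

edges = [10 * j for j in range(11)]  # 0, 10, ..., 100 -> 10 bins


def bin_left_closed(x, edges):
    """0-based bin index for bins [a, b); the last bin also includes the upper edge."""
    if not edges[0] <= x <= edges[-1]:
        raise ValueError("x outside histogram range")
    return min(bisect_right(edges, x) - 1, len(edges) - 2)


def bin_right_closed(x, edges):
    """0-based bin index for bins (a, b]; the first bin also includes the lower edge."""
    if not edges[0] <= x <= edges[-1]:
        raise ValueError("x outside histogram range")
    return max(bisect_left(edges, x) - 1, 0)


print(bin_left_closed(10, edges), bin_right_closed(10, edges))  # 1 0
```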
*Histograms in Quality Control (QC)
*Pareto analysis is the use of histograms to identify
improvement opportunities.
*In QC, Pareto analysis is used to identify quality costs by
category, by product, or by type of defect or nonconformity.
*80/20 Rule of Thumb:
*Vital few
*Find the few problems that drive the majority of the quality costs
[Figure: Pareto chart of quality costs ($0 to $40,000) by cause: insufficient solder, misaligned components, defective components, missing components, cold solder joints, all other causes]
Monthly quality-costs information for assembly of printed
circuit boards
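A minimal Python sketch of Pareto analysis: sort categories by cost, accumulate their shares, and stop once the cumulative share reaches a threshold (0.8 for the 80/20 rule of thumb). The function names are hypothetical, and no cost figures from the chart are reproduced; the inputs are whatever category-to-cost mapping you supply.

```python
def pareto_table(costs):
    """Rows of (category, cost, share, cumulative share), largest cost first."""
    total = sum(costs.values())
    if total <= 0:
        raise ValueError("total cost must be positive")
    rows, cumulative = [], 0.0
    for category, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += cost
        rows.append((category, cost, cost / total, cumulative / total))
    return rows


def vital_few(costs, threshold=0.8):
    """Smallest set of top categories whose cumulative share reaches the threshold."""
    selected = []
    for category, _, _, cumulative_share in pareto_table(costs):
        selected.append(category)
        if cumulative_share >= threshold:
            break
    return selected
```

Plotting the sorted costs as bars, optionally with the cumulative share as a line, gives the Pareto chart.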
*Histogram: Example 1
[Figure: two histograms of the same observed values (about -10 to 20); left panel: absolute frequency, right panel: relative frequency]
*Histogram: Example 1 (Contd)
[Figure: cumulative frequency histogram of the same observed values (about -10 to 20)]
*Histogram: Example 2
*Central tendency / Dispersion ?
[Figure: two frequency histograms of observed values (about -15 to 20), left and right panels]
Simulated with the same mean but different variance.
Which sample has the largest variability?
A = Left, B = Right, C = Can't tell
*Histogram: Example 3
[Figure: two frequency histograms of observed values (about -15 to 20), left and right panels]
*Central tendency / Dispersion ?
Simulated with the same variance and a different mean.
*Histogram: Example 4
[Figure: two frequency histograms of observed values (about -15 to 20), left and right panels]
*Central tendency / Dispersion ?
Simulated with different mean and different variance.
Which sample has the largest mean? Variability?
A = Left, B = Right, C = Can't tell
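Samples like those in Examples 2 to 4 can be simulated in Python to see how mean and variance show up in summary statistics. The Normal distribution, the sample size, and all means and standard deviations below are hypothetical choices for illustration; the slides do not state the parameters used.

```python
import random
import statistics


def simulate(mean, sd, n=200, seed=None):
    """Draw n Normal(mean, sd) observations with a reproducible generator."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


# Hypothetical (mean, sd) pairs for the left and right samples.
scenarios = {
    "Same mean, different variance": ((5, 2), (5, 5)),
    "Same variance, different mean": ((0, 3), (6, 3)),
    "Different mean and variance": ((0, 2), (6, 5)),
}

for label, ((m1, s1), (m2, s2)) in scenarios.items():
    left, right = simulate(m1, s1, seed=1), simulate(m2, s2, seed=2)
    print(label)
    for side, xs in (("left", left), ("right", right)):
        print(f"  {side}: mean = {statistics.mean(xs):.2f}, s = {statistics.stdev(xs):.2f}")
```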
*Stem-and-Leaf
1. Order data
2. Divide data into two parts
*Stem: one or more of the leading digits
*Leaf: remaining digits
3. List the stem values in a vertical column
*Sometimes break the stem in upper (U) and lower (L) half
4. Record each leaf beside its stem
5. Include units for stems and leaves on display
6. OPTIONAL: Keep a tally of the number of leaves per stem
*Stem-and-Leaf: Nicotine Data
[Figure: stem-and-leaf display of nicotine content in cigarettes]
*Stem-and-Leaf Example
*Sample:
81 63 54 89 34 60 89 19 70 59 40 28 42 59 72 32 31 10 24 17
*Ordered data (n = 20):
*10 17 19 24 28 31 32 34 40 42 54 59 59 60 63 70 72 81 89 89
Stem | Leaf  | Count
1    | 0 7 9 | 3
2    | 4 8   | 2
3    | 1 2 4 | 3
4    | 0 2   | 2
5    | 4 9 9 | 3
6    | 0 3   | 2
7    | 0 2   | 2
8    | 1 9 9 | 3
Stem unit: 10; leaf unit: 1 (e.g., 1 | 0 = 10)
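A minimal Python sketch that builds the same display, assuming non-negative integer data with stem = tens digit and leaf = ones digit (stem unit 10, leaf unit 1). Stems with no leaves between the smallest and largest stem are still listed.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return 'stem | leaves | count' lines (stem = tens, leaf = ones)."""
    if not data:
        raise ValueError("no observations")
    if any(x < 0 or x != int(x) for x in data):
        raise ValueError("this sketch handles non-negative integers only")
    groups = defaultdict(list)
    for x in sorted(data):  # step 1: order the data
        stem, leaf = divmod(int(x), 10)
        groups[stem].append(leaf)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # list every stem
        leaves = groups.get(stem, [])
        lines.append(f"{stem} | {' '.join(map(str, leaves))} | {len(leaves)}")
    return lines


sample = [81, 63, 54, 89, 34, 60, 89, 19, 70, 59, 40, 28, 42, 59, 72, 32, 31, 10, 24, 17]
print("\n".join(stem_and_leaf(sample)))
```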
*Box Plots
*Convey information about central tendency, dispersion, and
shape
*Central tendency: median
*Variability: interquartile range (IQR)
*Additional summary statistics: min, max, *mean*
[Figure: annotated box plot]
*Outliers in Box Plots
*Outliers?
*Any point larger than Q3 + 1.5 IQR?
*Any point smaller than Q1 - 1.5 IQR?
*Extreme outliers?
*Any point larger than Q3 + 3 IQR?
*Any point smaller than Q1 - 3 IQR?
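A minimal Python sketch of the outlier fences. The quartiles use the median-of-halves rule (excluding the median when n is odd), which is an assumption consistent with the box plot example values; `classify_outliers` is a hypothetical helper.

```python
from statistics import median


def quartiles(data):
    """(Q1, Q3) as medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    return median(xs[:half]), median(xs[half + n % 2:])


def classify_outliers(data, k_mild=1.5, k_extreme=3.0):
    """Return (outliers, extreme outliers); extreme ones also appear in the first list."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    outliers = [x for x in data if x < q1 - k_mild * iqr or x > q3 + k_mild * iqr]
    extreme = [x for x in data if x < q1 - k_extreme * iqr or x > q3 + k_extreme * iqr]
    return outliers, extreme


sample_2 = [4, 6, 8, 13, 15, 16, 20, 36]
print(classify_outliers(sample_2))  # ([36], [])
```

For this sample Q1 = 7 and Q3 = 18, so IQR = 11: 36 exceeds the 1.5 IQR fence (34.5) but not the 3 IQR fence (51).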
*Link to traditional tools
Source: Managing, Controlling, and Improving Quality (2011)
*Link to current literature
Source: IEEE/ACS International Conference on Computer Systems and Applications, 2009.
*Team Exercise # 2
1. Calculate the 41st percentile ($x_{0.41}$) for the following observations:
* 19, 10, 15, 14, 21, 12, 13, 7, 13, 11, 9
2. Calculate the min, mean, median, and max for the following observations:
* 19, 10, 15, 14, 21, 27, 20, 25
3. Calculate $s$ and $s^2$ for the following observations: 19, 10, 15, 14, 21, 12, 13
4. Calculate $Q_1$ and $Q_3$ for the following observations: 19, 10, 15, 14, 21, 27, 20, 25
*Box Plot Example
*Sample 1: 4,6,8,13,15,16,20
*Sample Mean: ?
*Median (Q2): ?
*First quartile (Q1): ?
*Third quartile (Q3): ?
*IQR: ?
*Sample 2: 4,6,8,13,15,16,20,36
*Sample Mean: ?
*Median (Q2): ?
*First quartile (Q1): ?
*Third quartile (Q3): ?
*IQR: ?
*Box Plot Example - 2
*Sample 1: 4,6,8,13,15,16,20
*Sample Mean: 11.71
*Median (Q2): 13
*First quartile (Q1): 6
*Third quartile (Q3): 16
*IQR: 10
*Sample 2: 4,6,8,13,15,16,20,36
*Sample Mean: 14.75
*Median (Q2): 14
*First quartile (Q1): 7
*Third quartile (Q3): 18
*IQR: 11
[Figure: box plots of Sample 1 and Sample 2]
*Box Plot Example - 3
[Figure: side-by-side box plots. Sample 1 (S1): whiskers at 4 and 20, Q1 = 6, Q2 = 13, Q3 = 16. Sample 2 (S2): whiskers at 4 and 20, Q1 = 7, Q2 = 14, Q3 = 18, separate point at 36]
Which sample has the largest variability?
Which is symmetric?
A = S1 B = S2 C = None
A = S1, B = S2, C = Can't tell
*
a. Mean
b. Median
c. Mode
d. Standard deviation
e. Skewness
*
a. TRUE
b. FALSE
*
a. TRUE
b. FALSE
*
a. IQR
b. Range
c. Variance
d. Standard deviation
e. Weighted average
*
a. TRUE
b. FALSE
*
a. Standard deviation
b. Variance
c. Range
d. Interquartile range
*
[Figure: box plots of samples S1 and S2]
a. S1
b. S2
c. Both have approximately the same variability.
*
a. Symmetric
b. Skewed right
c. Skewed left
*
a. Symmetric
b. Skewed right
c. Skewed left
*
a. Yes
b. No
*
a. Yes
b. No
*
a. Standard deviation
b. Mean
c. Variance
d. Median
e. 5th percentile
f. Range
*
a. Standard deviation
b. Mean
c. Variance
d. Median
e. 5th percentile
f. Range