Introduction
- Summarize data using the measures of central tendency, such as the mean, median, mode, and midrange.
- Describe data using the measures of variation, such as the range, variance, and standard deviation.
- Identify the position of a data value in a data set, using various measures of position, such as percentiles, deciles, and quartiles.
- Use the techniques of exploratory data analysis, including stem and leaf plots, box plots, and five-number summaries, to discover various aspects of data.
In conducting surveys or research, data can be collected from the whole population in order to obtain accurate measurements. However, in real life the population is sometimes too large. Hence, instead of using the whole population, statisticians often use samples taken from populations.

In a statistical study, the following two (2) terms are often used by statisticians:
1) Parameter: A characteristic or measure obtained using all the data values from a specific population.
2) Statistic: A characteristic or measure obtained using the data values from a sample.
9/16/2015
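In this Data Description, three (3) types of distributions are discussed: raw (ungrouped) data, the ungrouped frequency distribution, and the grouped frequency distribution.

Among the measures of central tendency, four (4) are studied: the mean, median, mode, and midrange.

The mean is the sum of the data values divided by the total number of values.

SAMPLE case: $\bar{X} = \frac{\sum X}{n}$
where:
- $\bar{X}$ : Sample mean (X-bar)
- n : Sample size
- $\sum$ : Summation (capital letter sigma)

POPULATION case: $\mu = \frac{\sum X}{N}$
where:
- $\mu$ : Population mean (mu)
- N : Population size
- $\sum$ : Summation (capital letter sigma)

2.0 4.9 6.5 2.1 5.1 3.2 16.6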
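The mean formula can be sketched in a few lines of Python. The function name `mean` is our own choice for illustration; the same computation gives $\bar{X}$ when the values are a sample and $\mu$ when they are the whole population, only the symbol changes. Here it is applied to the values 2.0, 4.9, 6.5, 2.1, 5.1, 3.2, 16.6.

```python
def mean(values):
    """Sum of the data values divided by the number of values.

    Gives X-bar for a sample (divide by n) or mu for a population (divide by N).
    """
    values = list(values)
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


data = [2.0, 4.9, 6.5, 2.1, 5.1, 3.2, 16.6]
print(round(mean(data), 2))  # 5.77
```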
35 40 30 50 25 45 20
Exercise 2: The number of fish in each of seven ponds was recorded as follows. Find the mean for this population.

Exercise 3: The weights of 10 men were recorded in kg as follows. Find the mean for this population.
SAMPLE case: $\bar{X} = \frac{\sum f X}{n}$
where
- f : Frequency of the corresponding X
- $n = \sum f$ : Total frequencies

POPULATION case: $\mu = \frac{\sum f X}{N}$
where
- f : Frequency of the corresponding X
- $N = \sum f$ : Total frequencies
Score, X Frequency, f
0 2
1 4
2 12
3 4
4 3
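A minimal sketch of the frequency-weighted mean, $\sum fX / \sum f$, applied to the score table above (scores 0 to 4 with frequencies 2, 4, 12, 4, 3). The helper name `mean_from_frequency` is hypothetical; whether the result is called $\bar{X}$ or $\mu$ depends only on whether the table describes a sample or a population.

```python
def mean_from_frequency(values, frequencies):
    """Mean of an ungrouped frequency distribution: sum(f * X) / sum(f)."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total = sum(frequencies)  # n (sample) or N (population)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, frequencies)) / total


scores = [0, 1, 2, 3, 4]
freqs = [2, 4, 12, 4, 3]
print(mean_from_frequency(scores, freqs))  # 2.08
```

Here $\sum fX = 52$ and $\sum f = 25$, so the mean is 52 / 25 = 2.08.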
4, 5, 8, 5, 7, 8, 9, 8, 8, 7
Solution: Solution:
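SAMPLE case: $\bar{X} = \frac{\sum f X_m}{n}$, with $X_m = \frac{UCL + LCL}{2}$
where
- $X_m$ : Class midpoint
- UCL, LCL : Upper and lower class limits
- f : Frequency of the corresponding class
- $n = \sum f$ : Total frequencies

POPULATION case: $\mu = \frac{\sum f X_m}{N}$, with $X_m = \frac{UCL + LCL}{2}$
where
- $X_m$ : Class midpoint
- UCL, LCL : Upper and lower class limits
- f : Frequency of the corresponding class
- $N = \sum f$ : Total frequencies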
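The grouped-data mean replaces each class by its midpoint $X_m = (UCL + LCL)/2$ and then takes the frequency-weighted mean. The sketch below assumes classes are given as (LCL, UCL) pairs; the function names are our own. For illustration it uses the classes 16-20, 21-25, 26-30, 31-35, 36-40 (frequencies 3, 5, 4, 3, 2) that appear later in these notes.

```python
def class_midpoint(lcl, ucl):
    """X_m = (UCL + LCL) / 2."""
    return (ucl + lcl) / 2


def grouped_mean(classes, frequencies):
    """Mean of a grouped frequency distribution: sum(f * X_m) / sum(f)."""
    midpoints = [class_midpoint(lcl, ucl) for lcl, ucl in classes]
    total = sum(frequencies)  # n or N
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * xm for xm, f in zip(midpoints, frequencies)) / total


classes = [(16, 20), (21, 25), (26, 30), (31, 35), (36, 40)]
freqs = [3, 5, 4, 3, 2]
print(round(grouped_mean(classes, freqs), 2))  # 26.82
```

The midpoints are 18, 23, 28, 33, 38, so $\sum f X_m = 456$ and the mean is 456 / 17, about 26.82.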
Mean,
In the previous example, there is an odd number of values in the data set. In this case it is easy to select the middle number in the data array.

When there is an even number of values in the data set, the median is obtained by taking the average of the two middle numbers.

Example 7: Six customers purchased these numbers of magazines: 1, 7, 3, 2, 3, 4. Find the median.
Step 1: Arrange the data in order.
Step 2: Select the middle point.
o Data Array: 1, 2, 3, 3, 4, 7
o The median, $MD = \frac{3 + 3}{2} = 3$
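The two steps above (order the data, then take the middle value or the average of the two middle values) can be sketched as follows. The function name `median` is our own; Python's standard `statistics.median` behaves the same way for this case.

```python
def median(values):
    """Median of raw data: middle value, or average of the two middle values."""
    data = sorted(values)                  # Step 1: arrange the data in order
    n = len(data)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:                         # odd n: a single middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2  # even n: average the two middle values


print(median([1, 7, 3, 2, 3, 4]))  # 3.0
```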
Exercise 7: Determine the median for each of the following data sets.

MEDIAN for Ungrouped Frequency Distribution:
If the ungrouped frequency distribution is written out as a data array, the following is obtained:
2 2 4 4 4 6 8 8 8
The MD = 4 since it is the middle point of the distribution.

Alternatively, one can examine the cumulative frequency of the ungrouped frequency distribution to locate the middle position. Using the same example:
Number of Books Sold | Frequency | Cumulative Frequency
2 | 2 | 2
4 | 3 | 5
6 | 1 | 6
8 | 3 | 9
With n = 9, the median is the 5th value; the cumulative frequency first reaches 5 in the row for 4, so MD = 4.
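The cumulative-frequency method can be sketched in Python: find the middle position(s), then the first row whose cumulative frequency reaches each position. The function name `median_from_frequency` is hypothetical, and the sketch assumes the values are listed in ascending order, as in the books table.

```python
from itertools import accumulate


def median_from_frequency(values, frequencies):
    """Median of an ungrouped frequency distribution via cumulative frequency.

    Assumes values are listed in ascending order.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    cumulative = list(accumulate(frequencies))

    def value_at(position):
        # first row whose cumulative frequency reaches the 1-based position
        for value, cf in zip(values, cumulative):
            if cf >= position:
                return value
        raise ValueError("position out of range")

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


print(median_from_frequency([2, 4, 6, 8], [2, 3, 1, 3]))  # 4
```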
Class, X | Frequency, f
16-20 | 3
21-25 | 5
26-30 | 4
31-35 | 3
36-40 | 2
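After finding the median class, simply apply the formula and calculate the median:

$MD = L_m + \left( \frac{\frac{n}{2} - cf}{f} \right) W$

where
- $L_m$ : Lower class boundary of the median class
- cf : Cumulative frequency of the class before the median class
- f : Frequency of the median class
- W : Class width

For the distribution above (n = 17), the cumulative frequencies are 3, 8, 12, 15, 17, so n/2 = 8.5 is first reached in the class 26-30, which is the median class:

$MD = 25.5 + \left( \frac{\frac{17}{2} - 8}{4} \right)(5) = 26.125$

Exercise 8: Find the median for the following distribution.
Class, X | Frequency, f
5-14 | 5
15-24 | 7
25-34 | 19
35-44 | 17
45-54 | 7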
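A sketch of the grouped median formula, applied to the worked example with classes 16-20 through 36-40 (frequencies 3, 5, 4, 3, 2). Two assumptions are built in and flagged in the code: class limits are integers of equal width, so each lower class boundary is the lower limit minus 0.5 and W = UCL - LCL + 1. The function name is our own.

```python
def grouped_median(classes, frequencies):
    """MD = L_m + ((n/2 - cf) / f) * W for a grouped frequency distribution.

    classes: ascending list of (lower limit, upper limit).
    Assumption: integer limits, so boundaries are limit -/+ 0.5 and
    the class width is upper - lower + 1.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, frequencies):
        if cf + f >= half:           # first class whose cumulative frequency reaches n/2
            lm = lower - 0.5         # lower class boundary of the median class
            width = upper - lower + 1
            return lm + (half - cf) / f * width
        cf += f
    raise ValueError("median class not found")


classes = [(16, 20), (21, 25), (26, 30), (31, 35), (36, 40)]
print(grouped_median(classes, [3, 5, 4, 3, 2]))  # 26.125
```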
The mode is defined to be the value that occurs most often in a data set.
A data set can have more than one mode.
A data set has no mode if all values occur with equal frequency.
If two values in a data set share the highest frequency, the data set is said to be bimodal.
Example 10: Find the mode for the number of children per family for 10 selected families.
Data set: 2, 3, 5, 2, 2, 1, 6, 4, 7, 3.
Ordered set: 1, 2, 2, 2, 3, 3, 4, 5, 6, 7.
Hence, Mode = 2.

Example 11: Six strains of bacteria were tested to see how long they could remain alive outside their normal environment. The time, in minutes, is given below. Find the mode.
Data set: 2, 3, 5, 7, 8, 10
Hence, there is no mode, since each value occurs exactly once.
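The mode rules above (most frequent value, possibly several modes, no mode when every value ties) can be sketched with `collections.Counter`. The function name `modes` is our own; returning an empty list for "no mode" is a design choice made to match the rule in these notes.

```python
from collections import Counter


def modes(values):
    """Return every value with the highest frequency, in ascending order.

    Returns an empty list when all values occur equally often (no mode).
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every value ties: the data set has no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 3, 5, 2, 2, 1, 6, 4, 7, 3]))  # [2]
print(modes([2, 3, 5, 7, 8, 10]))             # []
```

A bimodal data set such as 1, 1, 2, 2, 3 returns both modes, `[1, 2]`.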
Example 13: Find the mode by using the following data.
X | Frequency, f
15 | 3
20 | 5
25 | 8
30 | 3
35 | 2
Hence, Mode = 25, the value with the highest frequency (8).

MODE for Grouped Frequency Distribution:
The mode of a grouped frequency distribution is given by the MODAL CLASS.
The MODAL CLASS is the class with the highest frequency.
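For a frequency table, finding the mode (or the modal class) reduces to picking the row(s) with the highest frequency. A minimal sketch with a hypothetical helper `modal_values`; the rows can be single values or (LCL, UCL) class pairs.

```python
def modal_values(values, frequencies):
    """Value(s) or class(es) whose frequency is the highest in a frequency table."""
    if not frequencies:
        return []
    highest = max(frequencies)
    return [v for v, f in zip(values, frequencies) if f == highest]


print(modal_values([15, 20, 25, 30, 35], [3, 5, 8, 3, 2]))    # [25]
print(modal_values([(1, 10), (11, 20), (21, 30)], [4, 8, 5]))  # [(11, 20)]
```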
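Midrange: $\text{Midrange} = \frac{X_{\text{lowest}} + X_{\text{highest}}}{2}$

Relationship between the mean, median, and mode:
- Symmetric distribution: Mode = Median = Mean
- Positively skewed (right-skewed) distribution: Mode < Median < Mean
- Negatively skewed (left-skewed) distribution: Mean < Median < Mode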
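The sketch below computes all four measures of central tendency for Example 10's data (2, 3, 5, 2, 2, 1, 6, 4, 7, 3) using Python's standard `statistics` module, plus a hypothetical `midrange` helper. The `describe_shape` function applies the mode/median/mean ordering listed above; treat it as a rough rule of thumb that assumes a single mode, not a formal test of skewness.

```python
from statistics import mean, median, multimode


def midrange(values):
    """(lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2


def describe_shape(values):
    """Rough guide from the ordering of mode, median and mean (assumes one mode)."""
    mo = multimode(values)[0]
    md = median(values)
    mn = mean(values)
    if mo < md < mn:
        return "positively skewed"
    if mn < md < mo:
        return "negatively skewed"
    if mo == md == mn:
        return "symmetric"
    return "no clear pattern"


children = [2, 3, 5, 2, 2, 1, 6, 4, 7, 3]
print(mean(children), median(children), multimode(children), midrange(children))
# 3.5 3.0 [2] 4.0
print(describe_shape(children))  # positively skewed
```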
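B. VARIANCE:
Defined to be the average of the squares of the distances of each value from the mean. (For a sample, the sum of squares is divided by n - 1 instead of n.)

C. STANDARD DEVIATION:
Defined to be the square root of the variance.

SAMPLE case: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ and $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
where:
- $\bar{X}$ : Sample mean
- $s^2$ : Sample variance
- s : Sample standard deviation
- n : Sample size

POPULATION case: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ and $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
where:
- $\mu$ : Population mean (mu)
- $\sigma^2$ : Population variance (sigma squared, lowercase sigma)
- $\sigma$ : Population standard deviation
- N : Population size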
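A minimal sketch of both formulas: the only difference between the sample and population versions is the divisor (n - 1 versus N). The function names and the `sample` flag are our own. As an illustration it uses the values 35, 45, 30, 35, 40, 25 that appear in these notes, computed both ways because the notes do not say here whether they form a sample or a population.

```python
import math


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    data = list(values)
    n = len(data)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values")
    m = sum(data) / n
    squares = sum((x - m) ** 2 for x in data)
    return squares / (n - 1 if sample else n)


def std_dev(values, sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


data = [35, 45, 30, 35, 40, 25]
print(variance(data), round(std_dev(data), 3))  # sample: 50.0 7.071
print(round(variance(data, sample=False), 3), round(std_dev(data, sample=False), 3))
# population: 41.667 6.455
```

The mean is 35 and the sum of squared deviations is 250, so $s^2 = 250/5 = 50$ and $\sigma^2 = 250/6$.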
35 45 30 35 40 25
4 5 6 8 4 9
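SAMPLE case: $s^2 = \frac{\sum f (X - \bar{X})^2}{n - 1}$ and $s = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}}$

POPULATION case: $\sigma^2 = \frac{\sum f (X - \mu)^2}{N}$ and $\sigma = \sqrt{\frac{\sum f (X - \mu)^2}{N}}$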
Score, X Frequency, f
0 2
1 4
2 12
3 4
4 3
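SAMPLE case: $s^2 = \frac{\sum f (X_m - \bar{X})^2}{n - 1}$ and $s = \sqrt{\frac{\sum f (X_m - \bar{X})^2}{n - 1}}$

POPULATION case: $\sigma^2 = \frac{\sum f (X_m - \mu)^2}{N}$ and $\sigma = \sqrt{\frac{\sum f (X_m - \mu)^2}{N}}$

Note: the midpoint $X_m$ is used for a grouped frequency distribution.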
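The frequency-distribution formulas can share one sketch: pass the X values for an ungrouped table, or the class midpoints $X_m$ for a grouped one. The function name `frequency_variance` is hypothetical. The example uses the score table (0 to 4 with frequencies 2, 4, 12, 4, 3).

```python
import math


def frequency_variance(values, frequencies, sample=True):
    """Variance of a frequency distribution.

    values: the X's (ungrouped) or the class midpoints X_m (grouped).
    Divides sum(f * (X - mean)^2) by n - 1 for a sample or by N for a population.
    """
    total = sum(frequencies)  # n or N = sum of f
    if total < 2:
        raise ValueError("need a total frequency of at least 2")
    m = sum(f * x for x, f in zip(values, frequencies)) / total
    squares = sum(f * (x - m) ** 2 for x, f in zip(values, frequencies))
    return squares / (total - 1 if sample else total)


scores = [0, 1, 2, 3, 4]
freqs = [2, 4, 12, 4, 3]
s2 = frequency_variance(scores, freqs)
print(round(s2, 4), round(math.sqrt(s2), 4))                      # 1.16 1.077
print(round(frequency_variance(scores, freqs, sample=False), 4))  # 1.1136
```

For a grouped table such as 1-10, 11-20, 21-30 (frequencies 4, 8, 5), the same function would be called with the midpoints 5.5, 15.5, 25.5.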
Class, X | Frequency, f
1-10 | 4
11-20 | 8
21-30 | 5

Score | Frequency
0-19 | 8
20-39 | 13
40-59 | 24
60-79 | 11
80-99 | 4
D. COEFFICIENT OF VARIATION:
If two samples have the same units of
measure, the variance and standard
deviation can be compared directly.
To compare variables measured in different units, statisticians use the coefficient of variation (CV), a statistical measure of how the data points in a data series are spread around the mean. The coefficient of variation is defined as the standard deviation divided by the mean, with the result expressed as a percentage.

Sample: $CVar = \frac{s}{\bar{X}} \times 100\%$
Population: $CVar = \frac{\sigma}{\mu} \times 100\%$

Example: The sample mean of the number of car sales over a 3-month period is 87, and the standard deviation is 5. The sample mean of the commission is RM 5225, with a standard deviation of RM 773. Compare these two variations.
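The car-sales example can be checked with a one-line formula. The function name `coefficient_of_variation` is our own; it simply divides the standard deviation by the mean and multiplies by 100.

```python
def coefficient_of_variation(std, mean):
    """CVar = standard deviation / mean * 100%."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return std / mean * 100


cv_sales = coefficient_of_variation(5, 87)           # number of cars sold
cv_commission = coefficient_of_variation(773, 5225)  # commission in RM
print(f"sales: {cv_sales:.2f}%, commission: {cv_commission:.2f}%")
# sales: 5.75%, commission: 14.79%
```

Since 14.79% is larger than 5.75%, the commissions are relatively more variable than the number of sales, even though the two are in different units.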
solution solution
Test A: $X = 38$, $\bar{X} = 40$, $s = 5$
Test B: $X = 94$, $\bar{X} = 100$, $s = 10$
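These figures look like the inputs to a standard-score comparison, $z = (X - \bar{X}) / s$, which shows how many standard deviations a score lies from its mean. That reading is our assumption, since the surrounding explanation is not part of this text; the function name `z_score` is likewise our own.

```python
def z_score(x, mean, std):
    """Standard score: number of standard deviations x lies from the mean."""
    if std == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / std


z_a = z_score(38, 40, 5)    # Test A
z_b = z_score(94, 100, 10)  # Test B
print(z_a, z_b)  # -0.4 -0.6
```

Under this reading, the Test A score is relatively better, because -0.4 is closer to the mean than -0.6.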
18 12 3 5 15
8 10 2 6 20
C. DECILES:
Divides the data set into 10 equal groups.
D1 D2 D3 D4 D5 D6 D7 D8 D9 D10
Relationships between Percentiles and Deciles:
1) D1 corresponds to P10
2) D3 corresponds to P30
3) D6 corresponds to P60
4) D10 corresponds to P100

D. QUARTILES:
Divides the data set into 4 groups:
Q1 Q2 Q3 Q4
Relationships between Percentiles and Quartiles:
1) Q1 corresponds to P25
2) Q2 corresponds to P50
3) Q3 corresponds to P75
4) Q4 corresponds to P100

The formulas for finding the quartiles:
1st Quartile (Q1):
- Location of 1st Quartile = $\frac{1}{4}(n + 1)$
3rd Quartile (Q3):
- Location of 3rd Quartile = $\frac{3}{4}(n + 1)$
where n = total number of data values.

If the location has decimal places, the digit before the decimal point gives the position of the lower data value, and the decimal part gives the fraction of the way to move toward the next data value:
- e.g. 20 21 34 54 70 89
Location of 1st Quartile = $\frac{1}{4}(n + 1) = \frac{1}{4}(7) = \frac{7}{4} = 1.75$, so Q1 lies 0.75 of the way from the 1st value (20) to the 2nd value (21): Q1 = 20 + 0.75(21 - 20) = 20.75
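A sketch of the quartile location rule, location $= \frac{k}{4}(n + 1)$, with the decimal part used to move between neighbouring values. The function name `quartile` and the clamping at the ends of the data are our own choices. For reference, Python's `statistics.quantiles` with its default `method='exclusive'` uses the same (n + 1) positioning.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using location = k/4 * (n + 1).

    The whole-number part of the location picks the data value; the decimal
    part moves that fraction of the way toward the next value.
    """
    data = sorted(values)
    n = len(data)
    location = k * (n + 1) / 4
    pos = int(location)              # digit(s) before the decimal point, 1-based
    frac = location - pos
    if pos < 1:
        return data[0]
    if pos >= n:
        return data[-1]
    return data[pos - 1] + frac * (data[pos] - data[pos - 1])


sample = [20, 21, 34, 54, 70, 89]
print(quartile(sample, 1), quartile(sample, 3))  # 20.75 74.75
```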
15 13 6 5 12
50 22 18
Solution Solution
Step 4: Compute the Upper Boundary (UB) and Lower Boundary (LB).
$LB = Q_1 - 1.5 \times IQR$
$UB = Q_3 + 1.5 \times IQR$

Example: Given the following data set, determine any outliers.
5 6 12 13 15 18 22 50
Solution Solution
Solution Solution
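The outlier check can be sketched end to end. Steps 1 to 3 are not shown in this text; the sketch assumes they are: sort the data, find Q1 and Q3 with the (n + 1) location rule, and take IQR = Q3 - Q1. Any value below LB or above UB is flagged. Function names are our own.

```python
def quartile(data, k):
    """Quartile by location k/4 * (n + 1) with linear interpolation (data sorted)."""
    location = k * (len(data) + 1) / 4
    pos, frac = int(location), location - int(location)
    if pos < 1:
        return data[0]
    if pos >= len(data):
        return data[-1]
    return data[pos - 1] + frac * (data[pos] - data[pos - 1])


def find_outliers(values):
    """Return (LB, UB, outliers) using the 1.5 x IQR fences."""
    data = sorted(values)
    q1, q3 = quartile(data, 1), quartile(data, 3)
    iqr = q3 - q1                          # interquartile range
    lb, ub = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < lb or x > ub]
    return lb, ub, outliers


print(find_outliers([5, 6, 12, 13, 15, 18, 22, 50]))  # (-12.75, 41.25, [50])
```

Here Q1 = 7.5, Q3 = 21 and IQR = 13.5, so LB = 7.5 - 20.25 = -12.75 and UB = 21 + 20.25 = 41.25; only 50 lies outside the fences.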
Let the first digit be the leading digit (stem) and the second digit of the data be the trailing digit (leaf).
Stem Leaves

Exercise: The IQ scores of 24 students (12 male and 12 female) are shown below. Construct a stem and leaf plot.
Male scores:
106 121 113 128 110 112 106 110 131 103 105 117
Female scores:
121 108 111 103 100 115 118 116 117 125 131 112
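A small stem and leaf sketch, assuming non-negative whole numbers. For two-digit data it matches the rule above (first digit = stem, second digit = leaf); for the three-digit IQ scores we assume the stem is all digits except the last. Function names are our own, and stems with no leaves are simply omitted rather than shown empty.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for x in sorted(values):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)


def print_plot(plot):
    print("Stem | Leaves")
    for stem in sorted(plot):
        print(f"{stem:>4} | {' '.join(str(leaf) for leaf in plot[stem])}")


male = [106, 121, 113, 128, 110, 112, 106, 110, 131, 103, 105, 117]
print_plot(stem_and_leaf(male))
```

For the male scores this prints stems 10 to 13 with leaves 3 5 6 6, 0 0 2 3 7, 1 8, and 1.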
After obtaining the five-number summary, a box plot can be drawn on graph paper.
[Figure: box plot marking min = 5, Q1 = 7.5, Q2 = 14, Q3 = 21, the largest non-outlier value 22, LB = -12.75, UB = 41.25, and the outlier (max) 50]

Exercise 1: 21 girls estimated the length of a line, in mm. The results were as follows. Draw a box plot and identify any outliers. Hence, comment on the shape of the distribution of length.
51 45 31 43 97 16 18
23 34 35 35 85 62 20
22 51 57 49 22 18 27
20 32 36 37 38 39
40 41 42 45 62
Solution:
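[Figure: box plot marking the outlier 20, LB = 27, lowest non-outlier 32, Q1 = 36, Q2 = 39, Q3 = 42, highest non-outlier 45, UB = 51, and the outlier 62]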
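The values needed to draw a box plot (five-number summary, fences, whisker ends, outliers) can be gathered in one sketch. It assumes the (n + 1) quartile location rule and the 1.5 x IQR fences used in these notes, and that whiskers stop at the most extreme non-outlier values; function names are our own. Applied to 20, 32, 36, 37, 38, 39, 40, 41, 42, 45, 62, it reproduces the values marked in the solution plot.

```python
def quartile(data, k):
    """Quartile by location k/4 * (n + 1) with linear interpolation (data sorted)."""
    location = k * (len(data) + 1) / 4
    pos, frac = int(location), location - int(location)
    if pos < 1:
        return data[0]
    if pos >= len(data):
        return data[-1]
    return data[pos - 1] + frac * (data[pos] - data[pos - 1])


def box_plot_values(values):
    """Five-number summary plus the fences and whisker ends used to draw a box plot."""
    data = sorted(values)
    q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
    iqr = q3 - q1
    lb, ub = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lb <= x <= ub]
    return {
        "min": data[0], "Q1": q1, "Q2": q2, "Q3": q3, "max": data[-1],
        "LB": lb, "UB": ub,
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outliers
        "outliers": [x for x in data if x < lb or x > ub],
    }


summary = box_plot_values([20, 32, 36, 37, 38, 39, 40, 41, 42, 45, 62])
print(summary)
```

With n = 11 the quartile locations are 3, 6 and 9, giving Q1 = 36, Q2 = 39, Q3 = 42, IQR = 6, LB = 27 and UB = 51.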