Discussion of Syllabus
Required Text:
Solved problems in Statistics, Part I- P. Ghargbouri, B. Todorow
Exercises in Statistics, Part I- P. Gharghbouri, B. Todorow
Meets:
Mondays: 2:00-5:00pm, KHE221
Wednesdays: 2:00-5:00pm, KHE221
Office Hours: Tuesdays: 5:00-5:45pm, VIC707
Labs:
Course Objectives
Identify and formulate problems where
statistics can have an impact.
See the relevance of statistics. Apply what has
been learned to other engineering courses and
to career practice.
Understand the basics of Statistics and
Probability Theory
Interpret the statistical results and retrieve
necessary information to help decision making
Develop the foundations for other courses.
Evaluation
30% Midterm test (100 minutes), 10:00am,
Saturday, June 14, 2014
60% Final exam (180 minutes), room: TBA
10% Lab quizzes
OUTLINE Lecture 1
Statistics: Descriptive and Inferential Statistics
Populations, Parameters, and Samples,
Statistic, Variable
Data & Types of Data
In today's world,
we are constantly surrounded by statistics and statistical information.
For example:
Political Polls, Customer Surveys
Interest rates, Economic Predictions
Course Marks, Job Market Information
How can we make sense out of all these data?
How can we differentiate valid from flawed claims?
What is Statistics?!
Data
Information
Information: Knowledge
communicated concerning
some particular fact.
Example
A student is somewhat apprehensive about the statistics
course because the student believes the myth that the
course is difficult. The professor provides last term's marks to
the student. What information can the student obtain from this
list?
Statistics
Data
Information
Statistic: samples have statistics.
Parameter: populations have parameters.
MTH410 S14- Lecture 1
Types of Statistics
Descriptive statistics: involves the
arrangement, summary, and presentation of
data, to enable meaningful interpretation, and
to support decision making.
Inferential Statistics: a set of methods used
to draw conclusions about characteristics of a
population based on sample data.
Descriptive Statistics
Descriptive Statistics is a set of methods of organizing,
summarizing, and presenting data in a convenient and
informative way. These methods include:
Graphical Techniques
Numerical Techniques
The actual method used depends on what information we
would like to extract. Are we interested in:
Inferential Statistics
Descriptive statistics describe the data set that's being
analyzed, but don't allow us to draw any conclusions
or make any inferences beyond the data at hand. Hence we need
another branch of statistics: inferential statistics.
Inferential statistics is also a set of methods, but it is used
to draw conclusions or inferences about characteristics of
populations based on data from a sample.
Statistical Inference
Statistical inference is the process of making an estimate,
prediction, or decision about a population based on a sample.
[Figure: Population (Parameter) → Sample (Statistic) → Inference about the population]
What can we infer about a population's parameters
based on a sample's statistics?
Statistical Inference (Cont'd)
We use statistics to make inferences about parameters.
Therefore, we can make an estimate, prediction, or
decision about a population based on sample data.
Then, we can apply what we know about a sample to the
larger population from which the sample was drawn!
What is the purpose or/and which kind of benefits
Confidence level + Significance level = 1
Graphical Descriptive
Techniques
2014/5/8
Agenda
Types of Data and Information
Graphical and Tabular Techniques for Nominal Data
Graphical Techniques for Interval Data
Variables
Definitions
A variable is some characteristic of a population or sample.
Typically denoted with a capital letter: X, Y, Z
E.g. student marks. Not all students achieve the same mark. The
marks vary from student to student, hence the name variable.
Values of a variable are all possible observations of the variable.
E.g. student marks: all integers between 0 and 100.
Data are the observed values of a variable.
E.g. marks of 6 students in an exam: {67, 74, 71, 83, 93, 48}
Interval Data
Real numbers, e.g. weights, prices,
distance, etc.
Also called quantitative or numerical data.
Arithmetic operations can be performed on
interval data, so it's meaningful to talk
about 2*Weight, or Price + $1.5, and so on.
Nominal Data
The values of nominal data are categories.
E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Ordinal Data
Ordinal Data appear to be nominal, but their values have an
order, a ranking to them.
E.g. The most active stocks traded on the NASDAQ in
descending order
MSFT = 1, CSCO = 2, Dell = 3, SunW = 4
Any other numbering system is valid provided the order is
maintained.
E.g. Another coding system as valid as the previous one:
MSFT = 6, CSCO = 11, Dell = 23, SunW = 45
We can make statements such as: the trading volume of
Microsoft > Cisco, or Sun Microsystems < Dell.
It is still not meaningful to do arithmetic operations on this kind of data (e.g.
does 2*MSFT = CSCO?!).
[Figure: flowchart classifying data as interval, ordinal, or nominal by asking "Categorical?" and then "Ranked?"]
Hierarchy of Data
Interval
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.
(Higher-level data may be treated as lower-level data.)
Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.
Data may be treated as nominal but not as interval.
Nominal
Values are the arbitrary numbers that represent categories.
Only calculations based on the frequencies of occurrence are valid.
Data may not be treated as ordinal or interval.
Data → Categorical?
  No → Interval Data, e.g. integers in {0..100}
  Yes → Categorical Data → Ranked?
    Yes → Ordinal Data, e.g. {F, D, C, B, A}
    No (no ranked order to data) → Nominal Data, e.g. {Pass | Fail}
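The flowchart amounts to two yes/no questions asked in order. As a minimal Python sketch (the function name `classify_data` and its boolean parameters are hypothetical, chosen only for illustration), the decision reads:

```python
def classify_data(categorical: bool, ranked: bool = False) -> str:
    """Classify data following the flowchart: Categorical? then Ranked?"""
    if not categorical:
        # Real-valued observations, e.g. marks in {0..100}
        return "interval"
    # Categorical data are ordinal when the categories have a natural ranking
    return "ordinal" if ranked else "nominal"


print(classify_data(categorical=False))              # interval (marks)
print(classify_data(categorical=True, ranked=True))  # ordinal (letter grades)
print(classify_data(categorical=True))               # nominal (Pass | Fail)
```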
Agenda
Types of Data and Information
Graphical and Tabular Techniques for Nominal Data
Relative frequency
It is often preferable to show the relative frequency
(proportion) of observations falling into each class, rather
than the frequency itself:
Relative frequency = Class frequency / Total number of observations
Class width
It is generally best to use equal class widths, but
sometimes unequal class widths are called for.
Unequal class widths are used when the frequencies
associated with some classes are too low. Then,
Example
1. Budweiser Light
2. Busch Light
3. Coors Light
4. Michelob Light
5. Miller Lite
6. Natural Light
7. Other brand
The responses were recorded using the codes. Construct a
frequency and relative frequency distribution for these data
and graphically summarize the data by producing a bar
chart and a pie chart.
Example
1 1 5 1 3 3 3 7 2 6 1 6 3 4 5
2 5 5 2 1 1 5 1 2 3 3 6 1 5 5
4 1 3 1 6 3 1 1 2 1 1 2 1 1 5
1 2 1 3 7 6 3 7 4 4 2 4 3 5 1
1 1 3 1 4 3 6 1 1 1 1 3 5 5 3
7 6 5 1 7 2 5 3 5 7 5 3 5 1 3
3 1 5 3 5 5 3 1 3 3 4 1 5 5 6
3 6 1 3 2 1 3 1 1 6 1 5 1 5 1
3 3 5 3 6 3 6 3 1 1 1 7 1 5 4
6 1 1 5 5 5 3 6 2 4 7 6 1 1 3
5 1 3 3 6 6 1 3 2 3 1 3 3 1 4
3 5 3 7 1 5 1 5 3 5 2 2 7 3 3
3 1 5 6 6 7 6 7 5 1 5 1 1 3 5
3 1 3 1 3 1 1 3 1 5 2 1 7 3 7
5 2 5 1 5 3 5 1 1 5 3 5 5 1 2
1 1 2 2 5 1 4 4 1 5 3 6 6 3 3
7 3 5 4 1 5 6 1 1 5 5 1 5 5 3
1 1 3 6 1 5 1 5 5 1 7 3 1 1 6
5 1 3 3 1 1 1 1 1 7 1 5 1 1 5
Light Beer Brand | Frequency | Relative Frequency (%)
Budweiser Light | 90 | 31.6
Busch Light | 19 | 6.7
Coors Light | 62 | 21.8
Michelob Light | 13 | 4.6
Miller Lite | 59 | 20.7
Natural Light | 25 | 8.8
Other brands | 17 | 6.0
Total | 285 | 100
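As a cross-check of the table, here is a minimal Python sketch that tallies the coded responses with `collections.Counter` and converts the counts to percentages. The `RESPONSES` string simply re-types the 285 codes listed in the example; `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Brand codes used in the survey
BRANDS = {
    1: "Budweiser Light", 2: "Busch Light", 3: "Coors Light",
    4: "Michelob Light", 5: "Miller Lite", 6: "Natural Light",
    7: "Other brands",
}

# The 285 coded responses listed in the example, 15 per row
RESPONSES = """
1 1 5 1 3 3 3 7 2 6 1 6 3 4 5
2 5 5 2 1 1 5 1 2 3 3 6 1 5 5
4 1 3 1 6 3 1 1 2 1 1 2 1 1 5
1 2 1 3 7 6 3 7 4 4 2 4 3 5 1
1 1 3 1 4 3 6 1 1 1 1 3 5 5 3
7 6 5 1 7 2 5 3 5 7 5 3 5 1 3
3 1 5 3 5 5 3 1 3 3 4 1 5 5 6
3 6 1 3 2 1 3 1 1 6 1 5 1 5 1
3 3 5 3 6 3 6 3 1 1 1 7 1 5 4
6 1 1 5 5 5 3 6 2 4 7 6 1 1 3
5 1 3 3 6 6 1 3 2 3 1 3 3 1 4
3 5 3 7 1 5 1 5 3 5 2 2 7 3 3
3 1 5 6 6 7 6 7 5 1 5 1 1 3 5
3 1 3 1 3 1 1 3 1 5 2 1 7 3 7
5 2 5 1 5 3 5 1 1 5 3 5 5 1 2
1 1 2 2 5 1 4 4 1 5 3 6 6 3 3
7 3 5 4 1 5 6 1 1 5 5 1 5 5 3
1 1 3 6 1 5 1 5 5 1 7 3 1 1 6
5 1 3 3 1 1 1 1 1 7 1 5 1 1 5
"""


def frequency_table(codes, categories):
    """Return (category, frequency, relative frequency in %) per category."""
    counts = Counter(codes)
    n = len(codes)
    return [(cat, counts[cat], 100 * counts[cat] / n) for cat in categories]


codes = [int(token) for token in RESPONSES.split()]
table = frequency_table(codes, BRANDS)
for code, freq, rel in table:
    print(f"{BRANDS[code]:<16}{freq:>5}{rel:>7.1f}")
print(f"{'Total':<16}{len(codes):>5}{100:>7.1f}")
```

Running it reproduces the frequency column (90, 19, 62, 13, 59, 25, 17) and the relative frequencies to one decimal place.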
[Figure: bar chart of frequency by brand code 1 to 7 (90, 19, 62, 13, 59, 25, 17).]
[Figure: pie chart of relative frequency by brand code: 1 31%, 2 7%, 3 22%, 4 4%, 5 21%, 6 9%, 7 6%.]
Nominal Data: summary slide repeating the light beer frequency table, bar chart and pie chart.
Agenda
Types of Data and Information
Graphical and Tabular Techniques for Nominal Data
Building a Histogram
Example: The marketing manager of a long-distance telephone
company conducted a survey of 200 new customers wherein the
first month's bills were recorded. What information can be extracted
from those data?
This manager was only able to find that the smallest bill is $0,
the largest bill is $119.63, and most of the bills are less than $100.
However, there is much more information that may be of interest:
Bill distribution,
Are there many small bills and few large bills?
What is the typical bill?
Are the bills somewhat similar or different?
Building a Histogram (Cont'd)
1) Collect the data.
2) Create a frequency distribution for the data:
a) Determine the number of classes to use. [8]
b) Determine how large to make each class. [15]
c) Place the data into each class; each item can belong to only one class.
3) Draw the histogram.
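Steps 2 and 3 can be sketched in Python with 8 classes of width 15 starting at 0, as in this example. The class-boundary convention (each class includes its lower limit and excludes its upper limit) is an assumption, since the slides do not state one, and the bill amounts are made up for illustration; `frequency_distribution` is a hypothetical helper.

```python
def frequency_distribution(data, lower, width, num_classes):
    """Count observations in classes [lower + k*width, lower + (k+1)*width).

    Assumption: a class includes its lower limit and excludes its upper limit.
    """
    counts = [0] * num_classes
    for x in data:
        k = int((x - lower) // width)  # index of the class containing x
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} lies outside the classes")
        counts[k] += 1
    return counts


# Made-up bills for illustration (not the survey's 200 bills)
bills = [0.00, 12.50, 14.99, 15.00, 38.45, 42.19, 45.77, 119.63]
counts = frequency_distribution(bills, lower=0, width=15, num_classes=8)
for k, c in enumerate(counts):
    # A crude text histogram: one '*' per observation
    print(f"{15 * k:>3} to {15 * (k + 1):>3}: {'*' * c}")
```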
Example: Interpret
[Figure: histogram of the telephone bills (Frequency vs. Bills).]
About half of all the bills are small: 71 + 37 = 108.
A relatively large number of bills are large: 18 + 28 + 14 = 60.
Shapes of histograms
There are four typical shape characteristics
Shapes of Histograms
Symmetry
A histogram is said to be symmetric if, when we
draw a vertical line down the center of the histogram,
the two sides are identical in shape and size:
[Figure: three symmetric histograms (Frequency vs. Variable).]
Shapes of Histograms (Cont'd)
Skewness
A skewed histogram is one with a long tail extending to
either the right or the left:
Negatively skewed
Positively skewed
Shapes of Histograms (Cont'd)
Modality
A unimodal histogram is one with a single peak,
while a bimodal histogram is one with two peaks:
[Figure panels: Unimodal; Bimodal (Frequency vs. Variable).]
Modal classes
A modal class is the one with the largest number of
observations.
A unimodal histogram
Modal classes
A bimodal histogram
A modal class
A modal class
[Figure: a bell-shaped histogram (Frequency vs. Variable).]
Stem Leaf
42
19
4
0 | 0000000000111112222223333345555556666666778888999999
1 | 000001111233333334455555667889999
2 | 0000111112344666778999
3 | 001335589
4 | 124445589
5 | 33566
6 | 3458
7 | 022224556789
8 | 334457889999
9 | 00112222233344555999
10 | 001344446699
11 | 124557889
Thus, we still have access to the original data values!
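A stem-and-leaf display can be generated with a few lines of Python. This sketch assumes, as the display above appears to, stems in tens of dollars and leaves equal to the units digit with cents truncated (negative values are not handled); `stem_and_leaf` is a hypothetical helper and the sample bills are made up.

```python
def stem_and_leaf(data):
    """Map each stem (tens digit) to its leaves (units digit), decimals truncated."""
    values = sorted(int(x) for x in data)  # truncate, e.g. 42.19 -> 42
    stems = {}
    for v in values:
        stems.setdefault(v // 10, []).append(v % 10)
    # Include empty stems so that gaps in the data remain visible
    return {s: "".join(str(d) for d in stems.get(s, []))
            for s in range(min(stems), max(stems) + 1)}


# Made-up bills for illustration
display = stem_and_leaf([42.19, 38.45, 45.77, 0.00, 7.50, 119.63])
for stem, leaves in display.items():
    print(f"{stem:>2} | {leaves}")
```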
Ogive
An ogive (pronounced "oh-jive") is a graph of
a cumulative relative frequency distribution.
We create an ogive in three steps.
First, from the frequency distribution created earlier,
calculate the relative frequencies.
Relative Frequencies
For example, we had 71 observations in the first class (telephone
bills from $0.00 to $15.00). Hence, the relative frequency for this
class is 71/200 (the total number of phone bills) = 0.355 (or 35.5%).
Ogive (Cont'd)
An ogive is a graph of a cumulative relative frequency distribution.
We create an ogive in three steps:
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies by adding
the current class relative frequency to the previous
class cumulative relative frequency.
(For the first class, its cumulative relative frequency is just its relative
frequency)
Cumulative Relative
Frequencies
First class: just itself, .355
Next class: .355 + .185 = .540
Always or by chance?
Ogive (Cont'd)
An ogive is a graph of a cumulative relative frequency distribution.
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies.
3) Graph the cumulative relative frequencies.
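Steps 1 and 2 can be sketched in Python. Only the first two class frequencies (71 and 37 out of n = 200) are given in the slides, so the usage example stops there; `cumulative_relative_frequencies` is a hypothetical helper.

```python
from itertools import accumulate


def cumulative_relative_frequencies(freqs, n):
    """Step 1: relative frequencies; step 2: their running totals."""
    relative = [f / n for f in freqs]
    # Dividing running count totals by n gives the same values as adding
    # relative frequencies, without accumulating rounding error.
    cumulative = [c / n for c in accumulate(freqs)]
    return relative, cumulative


rel, cum = cumulative_relative_frequencies([71, 37], n=200)
print(rel)  # [0.355, 0.185]
print(cum)  # [0.355, 0.54]
```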
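Ogive (Cont'd)
The ogive can be used to answer questions like:
What telephone bill value is at the 50th percentile?
About $27: the cumulative relative frequency reaches .50 between $15 (.355) and $30 (.540),
and linear interpolation gives 15 + 15(.50 - .355)/.185 ≈ 26.8.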
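Reading a percentile off the ogive amounts to linear interpolation between plotted points. A minimal Python sketch, assuming the ogive points sit at the upper class limits ($15 and $30 here) and are joined by straight lines; `ogive_value` is a hypothetical name.

```python
from bisect import bisect_left


def ogive_value(boundaries, cum_rel, p):
    """Value at which the ogive reaches cumulative relative frequency p.

    boundaries[i] is the class limit where cum_rel[i] is reached; cum_rel
    must be increasing, with cum_rel[0] = 0 at the first lower limit.
    """
    i = bisect_left(cum_rel, p)  # first point with cum_rel[i] >= p
    if i == 0:
        return boundaries[0]
    if i == len(cum_rel):
        raise ValueError("p exceeds the last cumulative relative frequency")
    x0, x1 = boundaries[i - 1], boundaries[i]
    y0, y1 = cum_rel[i - 1], cum_rel[i]
    return x0 + (p - y0) / (y1 - y0) * (x1 - x0)


# First two classes of the telephone-bill ogive: $0-$15 and $15-$30
median_bill = ogive_value([0, 15, 30], [0.0, 0.355, 0.540], 0.50)
print(round(median_bill, 2))  # 26.76
```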
Agenda
Types of Data and Information
Graphical and Tabular Techniques for Nominal Data
Graphical Techniques for Interval Data
Describing Time-Series Data
Example
We recorded the monthly average retail
price of gasoline since 1978.
Draw a line chart to describe these data
and briefly describe the results.
Example
[Figure: line chart of the monthly average retail price of gasoline (vertical axis 0 to 3.5) against month number (1 to 337).]
Agenda
Types of Data and Information
Graphical and Tabular Techniques for Nominal Data
Graphical Techniques for Interval Data
Describing Time-Series Data
Example
In a major North American city there are four competing
newspapers: the Post, Globe and Mail, Sun, and Star.
To help design advertising campaigns, the advertising
managers of the newspapers need to know which segments of
the newspaper market are reading their papers.
A survey was conducted to analyze the relationship between
newspapers read and occupation.
A sample of newspaper readers was asked to report which
newspaper they read: Globe and Mail (1), Post (2), Star (3),
Sun (4), and to indicate whether they were blue-collar worker
(1), white-collar worker (2), or professional (3).
Example
By counting the number of times each of the 12 combinations occurs,
we produced the following table:
Newspaper \ Occupation | Blue Collar | White Collar | Professional | Total
G&M | 27 | 29 | 33 | 89
Post | 18 | 43 | 51 | 112
Star | 38 | 21 | 22 | 81
Sun | 37 | 15 | 20 | 72
Total | 120 | 108 | 126 | 354
Example
If occupation and newspaper are related, then there will be differences in
the newspapers read among the occupations. An easy way to see this is
to convert the frequencies in each column to relative frequencies in each
column. That is, compute the column totals and divide each frequency by
its column total.
Newspaper \ Occupation | Blue Collar | White Collar | Professional
G&M | 27/120 = .23 | 29/108 = .27 | 33/126 = .26
Post | 18/120 = .15 | 43/108 = .40 | 51/126 = .40
Star | 38/120 = .32 | 21/108 = .19 | 22/126 = .17
Sun | 37/120 = .31 | 15/108 = .14 | 20/126 = .16
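The column-wise conversion can be sketched in Python using the counts from the cross-classification table; `column_relative_frequencies` is a hypothetical helper, and the table is stored as a plain dictionary of rows.

```python
OCCUPATIONS = ("Blue Collar", "White Collar", "Professional")
# Cross-classification counts: newspaper -> counts per occupation
COUNTS = {
    "G&M": (27, 29, 33),
    "Post": (18, 43, 51),
    "Star": (38, 21, 22),
    "Sun": (37, 15, 20),
}


def column_relative_frequencies(table):
    """Divide each cell by its column total."""
    col_totals = [sum(column) for column in zip(*table.values())]
    return col_totals, {
        row: tuple(c / t for c, t in zip(counts, col_totals))
        for row, counts in table.items()
    }


totals, rel = column_relative_frequencies(COUNTS)
print(dict(zip(OCCUPATIONS, totals)))
for paper, props in rel.items():
    print(f"{paper:<5}", "  ".join(f"{p:.2f}" for p in props))
```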
Example
Interpretation: The relative frequencies in columns 2 and 3 are similar,
but there are large differences between columns 1 and 2 and between
columns 1 and 3.
This tells us that blue-collar workers tend to read different newspapers
from both white-collar workers and professionals, and that white-collar workers and
professionals are quite similar in their newspaper choice.
Agenda
Types of Data and Information
Graphical and Tabular Techniques for Nominal Data
Graphical Techniques for Interval Data
Describing Time-Series Data
Describing the Relationship Between Two Variables
Two Nominal Variables
Example
A real estate agent wanted to know to what extent the selling
price of a home is related to its size. To acquire this
information he took a sample of 12 homes that had recently
sold, recording the price in thousands of dollars and the size
in hundreds of square feet. These data are listed in the
accompanying table. Use a graphical technique to describe
the relationship between size and price.
Size (hundreds of sq. ft.): 23, 18, 26, 20, 22, 14, 33, 28, 23, 20, 27, 18
Price ($ thousands): 315, 229, 355, 261, 234, 216, 308, 306, 289, 204, 265, 195
Example
It appears that in fact there is a relationship,
that is, the greater the house size the greater
the selling price.
[Figure panels: Non-Linear Relationship; No Relationship.]
Summary
Data Type | Single Set of Data | Relationship Between Two Variables
Interval Data | Histogram, Ogive | Scatter Diagram
Nominal Data | Frequency and Relative Frequency Tables, Bar and Pie Charts | Cross-classification Table, Bar Charts
Agenda
Introduction
Measures of Central Location
Measures of Variability
Measures of Relative Standing
Measures of Variability
Range, Standard Deviation, Variance,
Coefficient of Variation
Agenda
Introduction
Measures of Central Location
Arithmetic Mean
The arithmetic mean, also called the average or simply
the mean, is the most popular and useful
measure of central location.
It is computed by simply adding up all the
observations and dividing by the total
number of observations:
Mean = Sum of the observations / Number of observations
The arithmetic mean for a sample is denoted with an
x-bar: $\bar{x}$.
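The definition translates directly into code. A minimal Python sketch using the six exam marks from the Definitions slide; it also checks numerically that the deviations from the mean sum to zero, one of the properties of the mean listed in this lecture.

```python
def mean(xs):
    """Arithmetic mean: sum of the observations / number of observations."""
    return sum(xs) / len(xs)


marks = [67, 74, 71, 83, 93, 48]  # the six exam marks from the Definitions slide
x_bar = mean(marks)
deviations = [x - x_bar for x in marks]
print(round(x_bar, 2))              # 72.67
print(abs(sum(deviations)) < 1e-9)  # True: deviations from the mean sum to 0
```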
Notation
When referring to the number of
observations in a population, we use
uppercase letter N
When referring to the number of
observations in a sample, we use lower
case letter n
The arithmetic mean for a population is denoted with the Greek letter mu ($\mu$).
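Statistic | Population | Sample
Size | $N$ | $n$
Mean | $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ | $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$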
Mean (Cont'd)
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Mean (Cont'd)
The mean is appropriate for describing interval data,
e.g. heights of people, marks of student
papers, etc.
The mean is seriously affected by extreme values,
called outliers.
E.g. If Bill Gates moved into any
neighborhood, the average household income
for that neighborhood would increase
dramatically beyond what it was previously!
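For a sample of n = 10 observations:
$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} = \frac{0 + 7 + \cdots + 22}{10} = 11.0$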
Example
Suppose the telephone bills of Example 2.1 represent
the population of measurements. The population mean is
$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \cdots + x_{200}}{200} = \frac{42.19 + 38.45 + \cdots + 45.77}{200} = 43.59$
Properties of Mean
Calculated by using every data point.
Every set of interval data has a unique mean.
The sum of the deviations from the mean is 0.
Affected by extreme (very large or very small)
values.
Not meaningful for nominal or ordinal data.
Useful for comparing two or more data sets.
Median
The median is calculated by placing all the observations in
order; the observation that falls in the middle is the median.
Data: {0, 7, 12, 5, 14, 8, 0, 9, 22}
N=9 (odd)
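A minimal Python sketch of the median rule. For the odd-sized data set above, the sorted values are 0, 0, 5, 7, 8, 9, 12, 14, 22, so the median is the fifth value, 8; the even case (the same data with 33 appended, i.e. the ten values used in the mode example) averages the two middle observations. The standard library's `statistics.median` follows the same rule.

```python
import statistics


def median(xs):
    """Middle observation of the sorted data; mean of the two middle ones if n is even."""
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


odd = [0, 7, 12, 5, 14, 8, 0, 9, 22]  # n = 9
even = odd + [33]                     # n = 10
print(median(odd), median(even))      # 8 8.5
print(statistics.median(odd), statistics.median(even))
```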
Properties of Median
Calculated by using only 1 or at most 2
values.
Every set of interval data has a unique median.
Not affected by extreme values.
Can be calculated for ordinal data as well,
but can't be interpreted as the centre of
location.
Mode
The mode of a set of observations is the value that
occurs most frequently. Sometimes we say
MODE = PEAK of a curve.
A set of data may have one mode (or modal class),
or two modes, or more modes.
Mode can be used for all data types, although mainly
used for nominal data.
For populations and large samples the modal class
is preferable.
Sample and population modes are computed the same way.
Mode (Cont'd)
E.g. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33}
N=10
Which observation appears most often?
The mode for this data set is 0. How about
this as a measure of central location?
In a small sample, it may not be a good measure.
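Python's standard library has `statistics.multimode` (Python 3.8+), which returns every most-frequent value in order of first appearance, so it also exposes bimodal data. When all values are distinct it returns all of them, which corresponds to the "no mode" case.

```python
from statistics import multimode

data = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]
print(multimode(data))             # [0]
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]: bimodal
print(multimode([4, 8, 15]))       # [4, 8, 15]: every value ties, no meaningful mode
```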
Mode (Cont'd)
The mode may not be unique, e.g. there are 2 modes for
bimodal data.
Note: if you are using Excel for your data
analysis and your data are multimodal (i.e.
there is more than one mode), Excel's MODE function reports
only one of them; MODE.MULT returns all of them.
Properties of Mode
Not affected by extreme values.
Multiple modes are possible, hence it is not always a good
measure of central location.
Sometimes no mode exists, e.g. when all the observations
are different.
Can be calculated for nominal data as well,
but can't be interpreted as the centre of
location.
[Figure: relative positions of the mean, median and mode in a symmetric distribution and in skewed distributions.]
Geometric Mean
The geometric mean is used when the variable is a
growth rate or rate of change, such as the value of
an investment over periods of time.
For a series of n periodic rates of return $R_1, R_2, \ldots, R_n$,
the geometric mean rate of return is calculated by:
$R_g = \sqrt[n]{(1+R_1)(1+R_2)\cdots(1+R_n)} - 1$
Finance Example
Suppose a 2-year investment of $1,000 grows by 100% to
$2,000 in the first year, but loses 50% from $2,000 back to
the original $1,000 in the second year. What is the average
return?
Using the arithmetic mean, $\frac{100\% + (-50\%)}{2} = 25\%$, which is
misleading:
this would indicate having more than $1,000 at the end of the second
year, but in fact we only have $1,000.
Solving for the geometric mean, $R_g = \sqrt{(1 + 1)(1 - 0.5)} - 1 = 0$, yields a rate of 0%,
which is more precise.
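The finance example can be checked in a few lines of Python (`math.prod` needs Python 3.8+; `geometric_mean_return` is a hypothetical helper implementing the formula for $R_g$ above). Compounding the arithmetic mean over two years overstates the final value, while the geometric mean reproduces the actual $1,000.

```python
import math


def geometric_mean_return(returns):
    """R_g = ((1 + R_1)(1 + R_2)...(1 + R_n)) ** (1/n) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


returns = [1.00, -0.50]  # +100% in year 1, -50% in year 2
arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)
print(arithmetic, geometric)              # 0.25 0.0
print(1000 * (1 + arithmetic) ** 2)       # 1562.5: overstated final value
print(1000 * (1 + geometric) ** 2)        # 1000.0: the actual final value
```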
Agenda
Introduction
Measures of Central Location
Measures of Variability
Measures of variability
Measures of central location fail to tell
the whole story about the distribution.
A question of interest still remains
unanswered:
How much are the observations spread out
around the mean value?
[Figure: a distribution with small variability compared with one with larger variability.]
Range
The range is the simplest measure of variability, and is
calculated as:
Range = Largest observation - Smallest observation
E.g. Data set: {4, 4, 4, 4, 4, 50}, Range = 46
Data set: {4, 8, 15, 24, 39, 50}, Range = 46
The range is the same in both cases, but the data
sets have very different distributions.
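A short Python sketch makes the weakness concrete: only the maximum and minimum enter the computation, so the two data sets cannot be told apart. The helper is named `data_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
def data_range(xs):
    """Range = largest observation - smallest observation."""
    return max(xs) - min(xs)


a = [4, 4, 4, 4, 4, 50]
b = [4, 8, 15, 24, 39, 50]
print(data_range(a), data_range(b))  # 46 46
```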
Range (Cont'd)
[Figure: the range uses only the smallest and largest observations; the observations in between (? ? ?) play no part.]
Variance
Variance and its related measure,
standard deviation, are arguably the most
important statistics. Used to measure
variability, they also play a vital role in
almost all statistical inference procedures.
Population variance is denoted by $\sigma^2$
(lowercase Greek letter sigma squared).
Sample variance is denoted by $s^2$
(lowercase s squared).
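Statistic | Population | Sample
Size | $N$ | $n$
Mean | $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ | $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Variance | $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ | $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$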
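Variance (Cont'd)
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean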
Variance (Cont'd)
As you can see, you have to calculate the sample mean
in order to calculate the sample variance.
Can the sum of deviations be a good measure of variability?
Population A: {8, 9, 10, 11, 12}
Population B: {4, 7, 10, 13, 16}
The mean of both populations is 10, but the measurements in B are more
dispersed than those in A. Any good measure of variability should agree
with this observation.
A: 8 - 10 = -2, 9 - 10 = -1, 10 - 10 = 0, 11 - 10 = +1, 12 - 10 = +2; Sum = 0
B: 4 - 10 = -6, 7 - 10 = -3, 10 - 10 = 0, 13 - 10 = +3, 16 - 10 = +6; Sum = 0
The sum of the deviations is 0 for both populations, so it cannot measure variability.
$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = \frac{10}{5} = 2$
$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = \frac{90}{5} = 18$
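The two population variances can be verified with a short Python sketch of the $\sigma^2$ formula (dividing by N), cross-checked against the standard library's `statistics.pvariance`; `population_variance` is a hypothetical helper name.

```python
import statistics


def population_variance(xs):
    """sigma^2 = sum of squared deviations from mu, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


A = [8, 9, 10, 11, 12]
B = [4, 7, 10, 13, 16]
print(population_variance(A), population_variance(B))  # 2.0 18.0
print(statistics.pvariance(A), statistics.pvariance(B))
```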
Data set A: {1, 1, 1, 1, 1, 3, 3, 3, 3, 3}
[Figure: dot plots of data sets A and B on a scale from 1 to 3.]
Application
Example
The following sample consists of the
number of jobs six students applied for:
17, 15, 23, 7, 9, 13.
Find its mean and variance.
What are we looking to calculate?
Sample Variance
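For the Application Example (jobs applied for: 17, 15, 23, 7, 9, 13), a minimal Python sketch of the sample formulas: the mean, the sample variance dividing by n - 1, and the standard deviation as its square root. It also prints the standard library's `statistics.variance` and `statistics.stdev` for comparison.

```python
import math
import statistics

jobs = [17, 15, 23, 7, 9, 13]
n = len(jobs)
x_bar = sum(jobs) / n                               # 84 / 6 = 14.0
s2 = sum((x - x_bar) ** 2 for x in jobs) / (n - 1)  # 166 / 5 = 33.2
s = math.sqrt(s2)                                   # about 5.76
print(x_bar, s2, round(s, 2))                       # 14.0 33.2 5.76
print(statistics.variance(jobs), statistics.stdev(jobs))
```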
Standard Deviation
The standard deviation is the square root of
the variance.
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
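Statistic | Population | Sample
Size | $N$ | $n$
Mean | $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ | $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Variance | $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ | $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation | $\sigma = \sqrt{\sigma^2}$ | $s = \sqrt{s^2}$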
Mean absolute deviation (MAD):
$\mathrm{MAD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
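The absolute value bars keep positive and negative deviations from cancelling out, since the raw deviations always sum to 0. A minimal Python sketch, reusing the jobs-applied sample (mean 14) from the Application Example; `mean_absolute_deviation` is a hypothetical helper name.

```python
def mean_absolute_deviation(xs):
    """MAD = sum of |x_i - x_bar| divided by n."""
    x_bar = sum(xs) / len(xs)
    return sum(abs(x - x_bar) for x in xs) / len(xs)


jobs = [17, 15, 23, 7, 9, 13]
print(round(mean_absolute_deviation(jobs), 3))  # 26 / 6 = 4.333
```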
Agenda
Introduction
Measures of Central Location
Measures of Variability
Measures of Relative Standing
[Figure: your score at the 60th percentile, with 40% of the scores above it.]
Note: The 60th percentile doesn't mean you scored 60% on the
exam. It means that 60% of your peers scored lower than you on
the exam.
Quartiles
We have special names for the 25th, 50th, and 75th
percentiles, namely quartiles.
The first or lower quartile is labeled Q1 = 25th percentile.
The second quartile, Q2 = 50th percentile (also the
median).
The third or upper quartile, Q3 = 75th percentile.
We can also convert percentiles into quintiles (fifths) and
deciles (tenths).
[Figure: quartiles Q1, Q2 and Q3 marked on a positively skewed histogram and on a negatively skewed histogram.]
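First (lowest) decile = 10th percentile
Lower quartile, Q1 = 25th percentile
Second quartile, Q2 = median = 50th percentile
Upper quartile, Q3 = 75th percentile
Ninth (highest) decile = 90th percentile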
Location of Percentiles
The following formula allows us to
approximate the location of any percentile:
$L_P = (n + 1)\frac{P}{100}$
where $L_P$ is the location of the Pth percentile.
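Location of Percentiles (Cont'd)
Given the data:
0 0 5 7 8 9 12 14 22 33
Where is the location of the 25th percentile?
$L_{25} = (10 + 1)\frac{25}{100} = 2.75$
It is located three-quarters of the distance between the second
and the third observations, which are 0 and 5, respectively.
Three-quarters of the distance is (.75)(5 - 0) = 3.75, which
means the 25th percentile is at 0 + 3.75 = 3.75.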
Location of Percentiles (Cont'd)
What about the upper quartile?
L75 = (10+1)(75/100) = 8.25
0 0 5 7 8 9 12 14 22 33
It is located one-quarter of the distance between the eighth
and the ninth observations, which are 14 and 22, respectively.
One-quarter of the distance is: (.25)(22 - 14) = 2, which
means the 75th percentile is at: 14 + 2 = 16
Location of Percentiles (Cont'd)
[Figure: on the sorted data 0 0 5 7 8 9 12 14 22 33, position 2.75 gives the 25th percentile (3.75) and position 8.25 gives the 75th percentile (16).]
$L_P$ determines the position in the data set where the percentile value
lies, not the percentile itself.
We have already shown how to find the median, which is the 50th
percentile: $L_{50} = (10 + 1)\frac{50}{100} = 5.5$, so the median is the 5.5th observation.
The 50th percentile is halfway between the fifth and sixth observations
(8 and 9), that is, (8 + 9)/2 = 8.5.
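The location rule plus linear interpolation can be sketched in Python. Clamping locations below 1 or above n to the smallest or largest observation is an assumption, since the slides do not cover those cases; `percentile` is a hypothetical helper name.

```python
def percentile(sorted_data, p):
    """Pth percentile using the location L_P = (n + 1) * P / 100.

    Interpolates linearly between neighbouring observations. Locations
    below 1 or above n are clamped to the smallest / largest observation.
    """
    n = len(sorted_data)
    loc = (n + 1) * p / 100
    if loc <= 1:
        return sorted_data[0]
    if loc >= n:
        return sorted_data[-1]
    k = int(loc)    # observation number just below the location (1-based)
    frac = loc - k  # fraction of the distance to the next observation
    low, high = sorted_data[k - 1], sorted_data[k]
    return low + frac * (high - low)


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
print([percentile(data, p) for p in (25, 50, 75)])  # [3.75, 8.5, 16.0]
```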
Interquartile Range
The quartiles can be used to create another
measure of variability, the interquartile range,
which is defined as follows:
Interquartile range = Q3 - Q1
The interquartile range measures the spread of the
middle 50% of the observations.
Large values of this statistic mean that the 1st and 3rd
quartiles are far apart, indicating a high level of
variability.
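For the ten observations used in the percentile examples, Python's `statistics.quantiles` (Python 3.8+) with `method="exclusive"` (its default) positions quantiles by the same (n + 1) rule as $L_P$, so it reproduces Q1 = 3.75 and Q3 = 16 and gives the interquartile range directly.

```python
import statistics

data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
# method="exclusive" locates the Pth percentile at (n + 1) * P / 100
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 3.75 8.5 16.0 12.25
```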
1. It is a summary.