Political science:
How accurate are the election polls?
Two Broad Categories of Statistics
Descriptive Statistics
Inferential Statistics
Descriptive Statistics
- used to describe a mass of data in a clear, concise, and informative way.
- deals with the methods of organizing, summarizing, and presenting data.
[Diagram: descriptive statistics describes (population, sample results), compares (sample to population, sub-groups, correlations), and summarizes (central tendency, spread)]
Inferential Statistics
- concerned with making generalizations about the characteristics of a
larger set where only a part is examined.
[Diagram: smaller set (n units or observations) → inferences and generalizations → larger set (N units or observations)]
CHAPTER 2
THE POPULATION
A. BASIC CONCEPTS
DATA
- Facts and figures that are collected, presented, and analyzed
- Can be numeric or non-numeric
- Must be contextualized
UNIVERSE
- A collection or set of all individuals or entities whose characteristics are to
be studied.
- Answers the question “Who?”
Types of Universe
1. Finite - when the elements of the universe can be counted for a given time
period.
VARIABLE
- Attribute or characteristic of interest measurable on each and every unit of the universe
- Answers the question “What?”
[Diagram: Variable → Qualitative or Quantitative; Quantitative → Discrete or Continuous]
Types of Variables
1. Qualitative
- assumes values that are not numerical but can be categorized.
- categories may be identified by either non-numerical descriptions or by
numeric codes.
2. Quantitative
- indicates the quantity or amount of a characteristic
- data are always numeric
- can be discrete or continuous
1. Discrete
- variable with a finite or countable number of possible values
2. Continuous
- variable that assumes any value in a given interval
UNIVERSE    POPULATION
U1          Y1
U2          Y2
U3          Y3
...         ...
UN          YN
SAMPLE
UNIVERSE / POPULATION
LEVELS OF MEASUREMENT
Nominal
Ordinal
Interval
Ratio
NOMINAL
Name, gender
ORDINAL
INTERVAL
RATIO
Age (in years), weekly food allowance (in peso), height (in cm)
Statistical analysis depends on the variable’s level of measurement.
Objective Method
Subjective Method
Use of Existing Records
B. METHODS OF DATA COLLECTION
OBJECTIVE METHOD
SUBJECTIVE METHOD
USE OF EXISTING RECORDS
This method uses data that were previously collected by another person or
institution for some other purpose.
TYPES OF DATA
1. Primary
- data which were acquired directly from the source.
2. Secondary
- data which were not acquired directly from the source.
Methods of Data Presentation
1. Textual
2. Tabular
3. Graphical
C. METHODS OF DATA PRESENTATION
1. TEXTUAL
2. TABULAR
✓ Data are organized into classes or categories by rows and/or columns and appropriate
pieces of information are found in the cells of the table.
✓ Relatively more information can be presented and trends can be easily seen.
✓ Some details are lost when data are summarized in tabular form.
PARTS OF A STATISTICAL TABLE
Body
Stubs/Classes
3. GRAPHICAL PRESENTATION
[Graph: Weight (kg), from 0 to 80, plotted against Height (cm), from 155 to 180]
STEM AND LEAF PLOT
✓ Best for a small number of observations with values greater than zero
90 80 75 80 80 80 90
90 100 100 75 70 80 80
STEPS IN CONSTRUCTING
A STEM-AND-LEAF PLOT
1. Arrange data in ascending or descending order.
70 75 75 80 80 80 80
80 80 90 90 90 100 100
2. Split each datum into a leaf value, which is the last digit, and a stem value, which
consists of the remaining digits.
Examples:
Datum  Stem  Leaf
75     7     5
100    10    0
3. List the stems vertically in increasing or decreasing order.
4. Draw a vertical line to the right of the stems.
Stem
 7 |
 8 |
 9 |
10 |
5. For each stem, write its leaves to the right of the vertical line in ascending
order.
 7 | 0 5 5
 8 | 0 0 0 0 0 0
 9 | 0 0 0
10 | 0 0
Figure 1. Distribution of the performance scores received by the vehicles in a car show.
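The construction steps can be sketched in Python. This is a minimal sketch, assuming non-negative whole-number data; the function name `stem_and_leaf` is illustrative, and listing stems that have no leaves is a design choice made here, not something the slides specify.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot for non-negative integers."""
    groups = defaultdict(list)
    for value in sorted(values):          # Step 1: arrange in ascending order
        stem, leaf = divmod(value, 10)    # Step 2: leaf = last digit, stem = the rest
        groups[stem].append(leaf)
    width = len(str(max(groups)))         # pad stems so the vertical line aligns
    rows = []
    # Step 3: list the stems in increasing order (empty stems are kept: an assumption)
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        # Steps 4-5: vertical line, then the leaves in ascending order
        rows.append(f"{stem:>{width}} | {leaves}".rstrip())
    return rows

# Performance scores from the car show example
scores = [90, 80, 75, 80, 80, 80, 90, 90, 100, 100, 75, 70, 80, 80]
plot_rows = stem_and_leaf(scores)
print("\n".join(plot_rows))
```

Because the values are sorted before they are split, the leaves on each row are already in ascending order.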
STEM AND LEAF PLOT
IMPORTANT FEATURES:
36 53 41 52 56
25 25 21 36 42
60 40 50 54 46
43 51 55 53
30 53 54 56
Make a stem-and-leaf plot based on the given values.
D. DESCRIPTIVE MEASURES
➢ Measures of Location
Summarize a data set by giving a “typical value” within the range of the
data values that describes its location relative to the entire data set.
$$ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} $$
Data: 2 -1 0 2
Answer: (2 + (-1) + 0 + 2) / 4 = 0.75
A random sample of ten students is taken from the student body of
a college and their GPAs are recorded as follows:
1.90 3.00 2.53 3.71 2.12 1.76 2.71 1.39 4.00 3.33
Answer: 2.645
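Both answers can be checked with a few lines of Python. This is a minimal sketch; the helper name `mean` is illustrative.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    data = list(values)
    if not data:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / len(data)

small = [2, -1, 0, 2]
gpas = [1.90, 3.00, 2.53, 3.71, 2.12, 1.76, 2.71, 1.39, 4.00, 3.33]

print(mean(small))             # 0.75
print(round(mean(gpas), 3))    # 2.645
```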
➢ Median
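$$ Md = \begin{cases} x_{\frac{N+1}{2}}, & \text{if } N \text{ is odd,} \\ \dfrac{x_{\frac{N}{2}} + x_{\frac{N}{2}+1}}{2}, & \text{if } N \text{ is even.} \end{cases} $$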
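The two cases of the median can be sketched in Python as follows. The function name `median` is illustrative, the data must be sorted first because the subscripts refer to positions in the ordered array, and the three-value data set is a hypothetical example for the odd case.

```python
def median(values):
    """Median of a data set, following the odd and even cases."""
    ordered = sorted(values)            # positions refer to the ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]             # position (N + 1) / 2, counting from 1
    # mean of positions N/2 and N/2 + 1, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

gpas = [1.90, 3.00, 2.53, 3.71, 2.12, 1.76, 2.71, 1.39, 4.00, 3.33]
print(round(median(gpas), 2))   # N = 10 (even): mean of 2.53 and 2.71
print(median([7, 3, 9]))        # N = 3 (odd), hypothetical data: middle value
```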
SOME PROPERTIES OF THE MEDIAN
Example:
Consider the following scores of 10 students in SA #4
10 10 10 8 8 10 5 8 10 10
Mo = 10
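A short Python sketch can confirm the mode. The function name `modes` is illustrative, and returning a list is a design choice made here so that a data set with several equally frequent values reports all of them.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

scores = [10, 10, 10, 8, 8, 10, 5, 8, 10, 10]
print(modes(scores))    # [10]
```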
SOME PROPERTIES OF THE MODE
1. Find the mean and the median for the LDL cholesterol level
in a sample of ten heart patients.
2. Find the mean and the median for the LDL cholesterol level
in a sample of ten heart patients on a special diet.
Where 500* indicates that the fifth mouse survived for at least 500 days
but the survival time (i.e., the exact value of the observation) is unknown.
a. Can you find the sample mean for the data set? If so, find it. If not, why
not?
b. Can you find the sample median for the data set? If so, find it. If not, why
not?
PERCENTILES
$$ L = \frac{j}{100} \times N $$
3.
(a) If L is a whole number, then Pj is the mean of the data
values in position L and position L + 1.
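The position rule can be sketched in Python. Only the whole-number case (a) appears in the text above; the branch for a non-whole L (round up to the next position) is an assumption based on a common textbook convention. The function name `percentile` is illustrative, and j is restricted to whole numbers from 1 to 99 because position L + 1 does not exist when j = 100.

```python
import math

def percentile(values, j):
    """j-th percentile using the position L = (j / 100) * N."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or not 0 < j < 100:
        raise ValueError("need a non-empty data set and a whole number 0 < j < 100")
    if (j * n) % 100 == 0:
        # (a) L is a whole number: mean of the values in positions L and L + 1
        pos = j * n // 100
        return (ordered[pos - 1] + ordered[pos]) / 2
    # Assumption, not stated in the text above: round L up to the next position
    pos = math.ceil(j * n / 100)
    return ordered[pos - 1]

gpas = [1.90, 3.00, 2.53, 3.71, 2.12, 1.76, 2.71, 1.39, 4.00, 3.33]
print(round(percentile(gpas, 50), 2))   # L = 5: mean of the 5th and 6th values
print(percentile(gpas, 25))             # L = 2.5: rounded up to the 3rd value
```

Statistical packages use several different percentile conventions, so results from library functions may differ slightly from this rule.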
DECILES
❖ These values divide an array into ten equal parts, each part
having ten percent of the data values, denoted by Dj.
❖ The 1st decile is the 10th percentile; the 2nd decile is the
20th percentile and so on.
Remember:
In computing the deciles, follow the procedure for
computing the equivalent percentiles.
D1 = P10    D6 = P60
D2 = P20    D7 = P70
D3 = P30    D8 = P80
D4 = P40    D9 = P90
D5 = P50    D10 = P100
QUARTILES
❖ These values divide an array into four equal parts, each part
having 25% of the data values, denoted by Qj.
Remember:
Similarly, in the computation of the quartiles we use the
procedure in the computation of the equivalent percentiles.
Q1 = P25
Q2 = P50 = Median = D5
Q3 = P75
Q4 = P100
ILLUSTRATION
Group A
 6 | 5 6 7 8
 7 | 1 3 4 7 7 7
Mean: 71.5   Median: 72.0   Mode: 77.0

Group B
 4 | 2
 5 | 4 8
 6 | 2 7
 7 | 7 7
 8 | 5
 9 | 3
10 | 0
Mean: 71.5   Median: 72.0   Mode: 77.0
➢ Measures of Dispersion
✓ The higher the value, the greater the variability in the data set.
R = MAX – MIN
IQR = Q3 – Q1
$$ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} $$
where x_i = ith observation in the data set,
μ = mean of the data set,
N = total number of observations in the data set
PROPERTIES OF VARIANCE
✓ Always non-negative
$$ \sigma = \sqrt{\sigma^2} $$
$$ CV = \frac{\sigma}{\mu} \times 100\% $$
PROPERTIES OF COEFFICIENT OF VARIATION
✓ Unitless
✓ The higher the CV, the more variable is the data set
relative to its mean
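The measures of dispersion can be sketched together in Python. This is a minimal sketch, assuming the population formulas (division by N) and the standard definition CV = σ/μ × 100%. The function name `dispersion_summary` is illustrative, and the data are the SA #4 scores from the mode example.

```python
import math

def dispersion_summary(values):
    """Range, population variance, standard deviation, and CV (in percent)."""
    data = list(values)
    n = len(data)
    mu = sum(data) / n
    variance = sum((x - mu) ** 2 for x in data) / n   # divide by N (population form)
    sigma = math.sqrt(variance)
    return {
        "range": max(data) - min(data),               # R = MAX - MIN
        "variance": variance,
        "std_dev": sigma,
        "cv_percent": sigma / mu * 100,               # undefined when the mean is 0
    }

scores = [10, 10, 10, 8, 8, 10, 5, 8, 10, 10]
summary = dispersion_summary(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The CV is only meaningful when the mean is not zero, which is why the sketch does not try to handle that case.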
CHEBYSHEV’S RULE
✓ For any data set with mean (μ) and standard deviation (σ),
the following statements apply:
[Figure: symmetric distributions, where μ = Md = Mo]
MEASURE OF SKEWNESS
SK > 0: There are some extremely high values in the data set.
SK < 0: There are some extremely low values in the data set.
MEASURE OF KURTOSIS
$$ K = \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{N \sigma^4} - 3 $$
K = 0: mesokurtic
K > 0: leptokurtic
K < 0: platykurtic
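The kurtosis measure and its three labels can be sketched in Python. This is a minimal sketch, assuming the population form (division by N). The function names, the tolerance used to treat K as zero, and the five-value data set are illustrative assumptions.

```python
def excess_kurtosis(values):
    """K = [sum of (x - mu)^4 / N] / sigma^4 - 3, using population moments."""
    data = list(values)
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n   # sigma squared
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    m4 = sum((x - mu) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3                     # sigma^4 = (sigma^2)^2

def classify(k, tol=1e-9):
    """Label a kurtosis value; tol is an illustrative tolerance for K = 0."""
    if abs(k) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 0 else "platykurtic"

k = excess_kurtosis([1, 2, 3, 4, 5])    # hypothetical data
print(round(k, 2), classify(k))         # -1.3 platykurtic
```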
BOX AND WHISKERS PLOT
Step 1:
Draw a rectangular box with its left edge at Q1 and its right edge at Q3,
so the box length is the IQR.
[Figure: box from Q1 to Q3 with the median (Md) marked inside and reference points at Q1 - 1.5 IQR and Q3 + 1.5 IQR, shown horizontally and vertically]
STEPS IN CONSTRUCTING A
BOX-AND-WHISKERS PLOT
Notes:
✓ An observation below Q1 - 1.5 IQR or above Q3 + 1.5 IQR is an outlier.
✓ If the largest and smallest data values are outliers,
extend the whiskers to 1.5 IQR from either end of
the box.
[Figure: box-and-whiskers plot, shown horizontally and vertically, with whiskers extending from the box toward Q1 - 1.5 IQR and Q3 + 1.5 IQR]
Step 4: Represent each outlier by a dot. For outliers with the same
value, place the dots one on top of the other.
[Figure: box-and-whiskers plot with outliers shown as dots beyond Q1 - 1.5 IQR and Q3 + 1.5 IQR]
COMPARISON OF DISTRIBUTIONS USING
BOX-AND-WHISKERS PLOTS
Data Set A
Data Set B
Number  Data
1       113
2       116
3       119
4       121
5       124
6       124
7       125
8       126
9       126
10      126
11      127
12      127
13      128
14      129
15      130
16      130
17      131
18      132
19      133
20      136

Minimum = 113
Maximum = 136
1st Quartile = 124
2nd Quartile = 126.5
3rd Quartile = 130
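The quantities needed to draw a box-and-whiskers plot for these 20 values can be sketched in Python. This is a minimal sketch: the quartiles use the percentile position rule L = (j/100) × N, with rounding up for a non-whole L as an assumption, and the whisker ends follow the note that whiskers stop 1.5 IQR from the box when the extreme values are outliers. All function and key names are illustrative.

```python
import math

def percentile(ordered, j):
    """Percentile by position L = (j / 100) * N on already sorted data."""
    n = len(ordered)
    if (j * n) % 100 == 0:
        pos = j * n // 100                       # L is whole: average positions L, L + 1
        return (ordered[pos - 1] + ordered[pos]) / 2
    return ordered[math.ceil(j * n / 100) - 1]   # assumption: round L up

def box_plot_summary(values):
    ordered = sorted(values)
    q1, md, q3 = (percentile(ordered, j) for j in (25, 50, 75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1, "median": md, "q3": q3, "iqr": iqr,
        "fences": (low_fence, high_fence),
        # whiskers stop at the fence when the extreme value lies beyond it
        "whiskers": (max(ordered[0], low_fence), min(ordered[-1], high_fence)),
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

data = [113, 116, 119, 121, 124, 124, 125, 126, 126, 126,
        127, 127, 128, 129, 130, 130, 131, 132, 133, 136]
summary = box_plot_summary(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these data the quartiles match the values listed above, and the minimum (113) falls below Q1 - 1.5 IQR = 115, so it would be drawn as a dot rather than reached by the whisker.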
ILLUSTRATION