Professional Documents
Culture Documents
January 26, 28
Natalia Tchetcherina
Statistics.
Statistics as a subject provides a body of principles and
methodology for designing the process of data collection,
summarizing and interpreting the data, and drawing conclusions or
generalities.
Examples.
Employment. Monthly, as part of the Current Population Survey,
the Bureau of Census collects information about employment
status from a sample of about 65,000 households. Households are
contacted on a rotating basis with three-fourths of the sample
remaining the same for any two consecutive months.
The survey data are analyzed by the Bureau of Labor Statistics,
which reports monthly unemployment rates.
Natalia Tchetcherina
Statistics.
Examples.
Gallup Poll. This, the best known of the national polls, produces
estimates of the percentage of popular vote for each candidate
based on interviews with a minimum of 1500 adults. Beginning
several months before the presidential election, results are regularly
published. These reports help predict winners and track changes in
voter preferences.
Natalia Tchetcherina
Statistics.
Examples.
Making in medical research studies. Heart decease is the most
common cause of death in the industrialized nations. In the US
and Canada nearly 30 % of deaths each year are due to heart
deceases, mainly heart attack. Does regular aspirin intake reduces
deaths from heart attacks? The Harvard Medical School
conducted a landmark study to investigate. The people
participating in the study regularly took either aspirin or placibo (a
tablet with no active ingredient). Of those who took aspirin 0.9%
suffered heart attacks during the study. Of those who took placibo
1.7 % had heart attacks. Could we conclude that its beneficial for
people to take aspirin?
Natalia Tchetcherina
Types of Variables
Natalia Tchetcherina
Branches of statistics.
Natalia Tchetcherina
Categorical data
Numerical data.
Blood type:O, A, B, AB
Natalia Tchetcherina
Categorical data
Numerical data.
Natalia Tchetcherina
Categorical data
Numerical data.
Categorical Data.
Discrete Data.
Continuous data.
Frequency table.
Categorical Data.
Discrete Data.
Continuous data.
Frequency table.
Daily numbers (x) of internet system crashes.
Data: 1,3,1,1,0,1,0,1,1,0,2,2,0,0,0,1,2,1,2,0,0,1,6,4,3,3,1,2,4,0.
Value x
0
1
2
3
4
5
6
Total
Frequency
9
10
5
3
2
0
1
30
Relative Frequency
.300
.333
.167
.100
.067
.000
.033
1.000
Natalia Tchetcherina
Categorical Data.
Discrete Data.
Continuous data.
Natalia Tchetcherina
Categorical Data.
Discrete Data.
Continuous data.
Find the minimum and the maximum values in the data set.
Choose intervals or cells of equal length that cover the range
between the minimum and the maximum without overlapping.
These are called class intervals, and their endpoints class
boundaries.
Count the number of observations in the data that belong to
each class interval. The count in each class is the class
frequency or cell frequency.
Calculate the relative frequency of each class by dividing the
class frequency by the total number of observations in the
data:
Natalia Tchetcherina
STAT400.
Chapter
1. Overview and Descriptive Statistics
Class
frequency
Categorical Data.
Discrete Data.
Continuous data.
Example.
Natalia Tchetcherina
Categorical Data.
Discrete Data.
Continuous data.
Example.
Frequency Distribution for Bookstore Sales Data
(left endpoints included, but right endpoints
excluded)
Class Interval
$ 0125
125250
250375
375500
500625
Total
Frequency
5
8
13
11
3
40
Relative Frequency
5/40 = .125
8/40 = .200
13/40 = .325
11/40 = .275
3/40 = .075
1.000
Natalia Tchetcherina
Categorical Data.
Discrete Data.
Continuous data.
Example.
Natalia Tchetcherina
1 X
0 N 0 1 N1 2 N2
vi =
+
+
= 0.92
500
500
500
500
i=1
Median
The sample median of a set of n measurements x1 , x2 . . . , xn is the
middle value when the measurements are arranged from smallest
to largest. It is denoted as x
How to compute the median
1. Order the data from smallest to largest.
2. When the number of observations n is ODD the median is
middle observation of the ordered sample.
3. When the number of observations n is EVEN, two
observations from the ordered sample fall in the middle, and
the median is their average.
Natalia Tchetcherina
Natalia Tchetcherina
Percentiles
The sample 100 p-th percentile is a value such that after the data
are ordered from smallest to largest, at least 100p% of the
observations are at or below this value and at least 100(1 p)%
are at or above this value.
Calculating the Sample 100p-th Percentile.
1. Order the data from smallest to largest.
2. Determine the product (sample size) (proportion) = np.
3. If np is not an integer, round it up to the next integer and
find the corresponding ordered value.
4. If np is an integer, say k, calculate the average of the kth and
(k + 1)st ordered values.
Natalia Tchetcherina
Sample Quartiles
Natalia Tchetcherina
N
1 X
(vi )2 .
N
i=1
The standard
deviation is the positive square root of the
variance: = 2 .
If the rv X denotes a randomly selected value from the statistical
population, then a synonymous terminology for the population
variance is variance of X , and is denoted by X2 , or Var(X ).
Natalia Tchetcherina
N
1 X 2
vi 2
N
i=1
Natalia Tchetcherina
Boxplot
Natalia Tchetcherina