analysis
A. Dhandapani
Much of the difficulty people face in understanding statistics comes from
misunderstanding the terms it uses. For example, in everyday English,
"significant" means sufficiently great or important to be worthy of note.
Statistical significance, on the other hand, tells us how likely it is that
the results were obtained merely through chance. This hand-out provides
brief explanations of the most commonly used terms in statistics.
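To make the statistical sense of "significant" concrete, the sketch below
computes how likely a result is under chance alone. The coin-tossing
scenario, the numbers (60 heads in 100 tosses) and the function name
chance_probability are illustrative assumptions, not part of this hand-out.

```python
from math import comb


def chance_probability(n_trials: int, n_successes: int, p: float = 0.5) -> float:
    """Probability of at least n_successes in n_trials when only chance is at work.

    This is the upper tail of a binomial distribution with success
    probability p (0.5 models a fair coin).
    """
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_successes, n_trials + 1)
    )


# Hypothetical example: 60 heads observed in 100 tosses of a fair coin.
p_value = chance_probability(100, 60)
print(f"P(at least 60 heads by chance) = {p_value:.4f}")
```

A small probability here means the result would rarely arise through chance
alone, which is what "statistically significant" refers to; it says nothing
about whether the result is important in the everyday sense.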
The origin of the word statistics implies that it is the study of human
populations within a political union (State). However, statistics can be
regarded (i) as the study of populations; (ii) as the study of variation;
and (iii) as the study of methods of reduction of data (Fisher, 1954,
Statistical Methods for Research Workers). When we collect information on
assets owned from 100,000 households in a state, we are not interested in
individual households. We are interested in the assets owned by the
households. So, we consider them as a population of assets rather than a
population of households. Thus, in statistics, we study populations, or
aggregates of individuals, whether living beings or material things. The
term population in statistics is not restricted to living beings or
material things: even measurements that could be repeated indefinitely can
be regarded as a population.
The number of units in the population is called the population size. The
population size may be finite, such as the number of households, or
infinite, such as the number of vehicles arriving at a particular traffic
signal every minute. Even when the population size is finite, it may be
treated as infinite, as with the number of fish in a lake. When the
population is finite, another aspect worth noting is whether the
individuals can be identified.
When there is no variability among the units of a population, there is no
need to study the entire population; one can instead study a single
individual from it. However, variability is everywhere, and we need to
study these variations. A measurement taken on the individuals of a
population is called a variate or a variable. The possible values taken by
a variable can be binary, such as the gender of a child at birth
(Male/Female); discrete, such as the number of children in a household
(0, 1, 2, ...); or continuous, such as the height of the children.
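The distinction between binary, discrete and continuous variables, and the
point about variability, can be sketched in a few lines. The data values and
the helper variable_type are hypothetical, and the classification rule is a
rough heuristic for illustration only, not a formal definition.

```python
from statistics import pvariance


def variable_type(values) -> str:
    """Rough heuristic: classify observed values as binary, discrete or continuous."""
    distinct = set(values)
    if len(distinct) == 2:
        return "binary"
    if all(isinstance(v, int) for v in distinct):
        return "discrete"
    return "continuous"


# Hypothetical measurements on a handful of individuals.
gender = ["Male", "Female", "Female", "Male"]
children = [0, 2, 1, 3, 2]
height_cm = [101.5, 98.2, 110.0, 104.7]

for name, values in [("gender", gender), ("children", children), ("height", height_cm)]:
    print(name, "->", variable_type(values))

# With no variability, one unit describes the whole population.
identical_units = [5, 5, 5, 5]
print("variance without variability:", pvariance(identical_units))
print("variance of children counts:", pvariance(children))
```

A population variance of zero is the "no variability" case in which a single
individual is enough; any positive variance means the units differ and more
than one must be studied.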
While studying a population, we are usually not interested in the
individuals in it, particularly when there is variability and the
population is large or infinite. The interest is in describing the
population units or identifying the relationships between them. The
population is
from these web pages:
http://bama.ua.edu/~jleeper/627/choosestat.html &
http://www.ats.ucla.edu/stat/mult_pkg/whatstat/
Population & No. of       Type of Variables  Analysis                        SAS Procedure   R Function (Package)
variables
------------------------  -----------------  ------------------------------  --------------  ------------------------
One Population,                              Single sample t-test (mean)     TTEST           t.test
one variable                                 Single sample Median test       UNIVARIATE      wilcox.test
                          Nominal            Chi-Square Goodness of fit      FREQ            chisq.test

One Population,                                                              TTEST           t.test
two levels                                   Wilcoxon Mann Whitney test      UNIVARIATE      wilcox.test
(independent)             Nominal            Chi-Square test                 FREQ            chisq.test

One Population,                              Paired t-test                   TTEST           t.test
two groups                                   Wilcoxon signed rank            UNIVARIATE      wilcox.test
(dependent or matched)    Nominal            McNemar test                    FREQ            mcnemar.test

One Population,                              One way ANOVA                   GLM (or ANOVA)  aov
more than 2 groups                           Kruskal Wallis test             NPAR1WAY        kruskal.test
                          Nominal            Chi-Square test                 FREQ            chisq.test

One Population,                              One way repeated                GLM             lm & Anova (requires car
more than 2 groups                           measurements ANOVA                              & foreign packages)
(dependent / matched)                        Friedman Test                   FREQ            friedman.test
                          Nominal            Repeated measures               GENMOD          glmer (requires lme4)
                                             Logistic Regression
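The single sample t-test listed above (TTEST in SAS, t.test in R) rests on
one statistic: the distance between the sample mean and a hypothesised
population mean, measured in standard errors. The sketch below computes
that statistic by hand; the data, the hypothesised mean of 100 and the
function name one_sample_t are assumptions made for illustration. It stops
at the statistic and the degrees of freedom, because converting them to a
p-value needs the t distribution, which the Python standard library does
not provide.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    t_statistic = (mean(sample) - mu0) / standard_error
    return t_statistic, n - 1


# Hypothetical heights (cm) of five children, tested against a mean of 100 cm.
heights = [102.0, 98.0, 105.0, 110.0, 95.0]
t_stat, df = one_sample_t(heights, mu0=100.0)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```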
Population & No. of       Type of Variables       Analysis                    SAS Procedure   R Function (Package)
variables
------------------------  ----------------------  --------------------------  --------------  --------------------
One dependent & two or    Dependent - Ratio or    Factorial ANOVA             GLM             anova
more independent          Interval; independent
variables                 ordinal/nominal

                          Dependent - Ratio or    Analysis of Covariance /    GLM             lm
                          Interval; independent   Multiple Regression         REG             aov
                          ordinal/nominal &
                          ratio/interval

                          Dependent ordinal or    Logistic Regression         LOGISTIC        glm
                          nominal; independent
                          ordinal/nominal/
                          ratio/interval

One dependent and one     Ratio/Interval          Correlation,                CORR            cor
independent variables                             Simple Regression

                          Ordinal/Interval        Nonparametric Correlation   CORR            cor

                          Nominal                 Simple logistic             LOGISTIC        glm

2 or more related         Ratio/Interval          Factor Analysis             FACTOR          fa (requires psych)
variables
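A table like this is, in effect, a lookup from an analysis to the software
routine that performs it. The sketch below encodes a few of its rows as a
dictionary. The dictionary PROCEDURES and the function lookup are
hypothetical names chosen for illustration; the entries are copied from the
tables, using the standard spellings of the SAS procedures and R functions.

```python
# A few rows of the tables: analysis name -> (SAS procedure, R function).
PROCEDURES = {
    "single sample t-test": ("TTEST", "t.test"),
    "paired t-test": ("TTEST", "t.test"),
    "wilcoxon signed rank": ("UNIVARIATE", "wilcox.test"),
    "kruskal wallis test": ("NPAR1WAY", "kruskal.test"),
    "friedman test": ("FREQ", "friedman.test"),
    "logistic regression": ("LOGISTIC", "glm"),
    "factor analysis": ("FACTOR", "fa (psych package)"),
}


def lookup(analysis: str):
    """Return (SAS procedure, R function) for an analysis, or None if not listed."""
    return PROCEDURES.get(analysis.strip().lower())


for name in ["Paired t-test", "Kruskal Wallis test", "Cluster analysis"]:
    print(name, "->", lookup(name))
```

Analyses that are not in the dictionary, such as the last one in the loop,
return None rather than a guess.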