
STA-2023

Statistics for Business and Economics


Textbook: McClave, Benson and Sincich, 12th edition
Vocabulary (Revised: Oct. 2013)

Chapter 1: Statistical Methods


Statistics is a branch of science dealing with methods used
for collecting, organizing, summarizing, analyzing and
interpreting data sets
Descriptive Statistics consists of the procedures used to
organize and summarize data sets, as well as to describe
their major characteristics
Inferential Statistics consists of the procedures used to
make estimations and decisions about a population based
on the information contained in a representative sample
Population is the set of all units (subjects or objects) of
interest in any statistical study
Sample is a subset of units chosen from the defined
population with the purpose of making a statistical
inference about the population
Representative Sample is a sample that reflects the
relevant characteristics of the population. Representative
samples can be obtained by using sampling techniques

Simple Random Sampling is the most basic probability
sampling technique. It involves a single list of all units of
the population which are given an equal chance to be
included in the sample
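
A minimal Python sketch of simple random sampling. The population of 100 unit IDs, the sample size, and the seed are hypothetical choices made only for illustration; the seed just makes the run repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)  # private generator; the seed is optional
    return rng.sample(list(population), n)

# Hypothetical population: a single list of unit IDs 1 through 100
population = range(1, 101)
sample = simple_random_sample(population, 10, seed=2023)
```

Sampling is done without replacement here, so no unit can appear twice in the sample.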
Census is a type of statistical study conducted on the
entire population
Sampling survey is a type of statistical study that involves
sample units and a questionnaire
Units are the individuals (subjects or objects) included in
any statistical study
Variables are characteristics that vary from one unit to
another
Qualitative (Categorical) data are the observations
describing an attribute or categorical characteristic of the
individuals
Quantitative data are the observations/measurements
describing a numerical characteristic of the individuals
Data/Data Set is the set of all observations/measurements
collected for one or more variables on a particular set of
units

Chapter 2: Descriptive Statistics

Frequency table is a way of organizing and summarizing the
information contained in a data set
Elements of a frequency table
Classes
Class limits
Class boundaries
Frequency or count frequency
Relative frequency
Percent frequency
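
As an illustration of these elements, the following Python sketch builds a frequency table for a categorical data set. The blood-type data and the function name are hypothetical; class limits and boundaries do not apply because the classes are categories rather than intervals.

```python
from collections import Counter

def frequency_table(data):
    """Return rows of (class, frequency, relative frequency, percent frequency)."""
    counts = Counter(data)
    n = len(data)
    # most_common() lists the classes in decreasing order of frequency
    return [(cls, f, f / n, 100 * f / n) for cls, f in counts.most_common()]

# Hypothetical categorical data: blood types of 10 donors
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]
table = frequency_table(blood_types)
```

The relative frequencies add up to 1 and the percent frequencies add up to 100.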
Frequency graphs for categorical data
Bar graph: unattached bars on a rectangular coordinate system
Pareto chart: bar graph where the bars are arranged in
decreasing order of frequency from left to right
Pie chart: circle graph divided into sectors
Frequency graphs for quantitative data
Histogram: connected bars on a rectangular coordinate
system
Stem & Leaf plot
Frequency Distribution Curves

Frequency Distribution Curve for a quantitative data set
is a smooth curve that fits the relative frequency
histogram.

Three typical patterns of frequency curves:

Bell curve is symmetric and mound shaped
Skewed to the right has a long right tail
Skewed to the left has a long left tail

Measures of Central Tendency (Center)
Mean is the simple average calculated over all data points
Median is a value located at the middle of the distribution
when the data points are arranged in order
Mode is the most frequent or repeated data point
Modal Class of grouped data is the class (category or interval)
with the highest frequency
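
The three measures of center can be sketched in Python with the standard statistics module. The data set below is hypothetical.

```python
import statistics

# Hypothetical data set: number of items bought by 9 customers
data = [2, 3, 3, 5, 7, 8, 3, 10, 4]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value once the data are sorted
mode = statistics.mode(data)      # most frequent value
```

For these data the mean is 5, the median is 4, and the mode is 3. The mean exceeds the median because the large value 10 pulls it to the right.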
Measures of Variability (Spread)
Range is the distance between the endpoints (highest and
lowest values) of the data set
Standard deviation is a measure of the average distance
of all data values relative to the mean
Outliers are extremely high or low data values disconnected
from the rest of the data set
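Chebyshev and Empirical Rules describe the percent of data
values falling within 1, 2, and 3 standard deviations of the
mean. Chebyshev's Rule gives a minimum percent that holds for
any data set; the Empirical Rule gives approximate percents for
mound-shaped, symmetric data sets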
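
A short Python sketch of the two rules. The function and constant names are hypothetical; the values are the standard ones (Chebyshev: at least 1 - 1/k^2 of the data within k standard deviations; Empirical: about 68%, 95%, and 99.7%).

```python
def chebyshev_min_fraction(k):
    """Chebyshev's Rule: at least 1 - 1/k**2 of any data set lies within
    k standard deviations of the mean (informative only for k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    return 1 - 1 / k**2

# Empirical Rule: approximate fractions for mound-shaped, symmetric data
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

within_2_sd = chebyshev_min_fraction(2)  # at least 75% for any data set
```

Chebyshev's bound is weaker than the Empirical Rule because it makes no assumption about the shape of the distribution.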
Percentiles are measures of relative standing that describe the
percent of data points falling below or at any given data value

Quartiles (Q1, Q2, Q3) are special percentiles that divide the
ordered data set into four parts, each containing about 25% of
the data values
Interquartile Range (IQR) is the distance between the first and
third quartile (Q3 - Q1). It describes the spread of the central
50% of the data set.
Five Number Summary is a way of describing a data set using
five special percentiles (Min, Q1, Q2, Q3, Max)
Box and Whiskers Plot is the graphical representation of the
Five Number Summary
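
A possible Python sketch of the Five Number Summary and the IQR. The data are hypothetical, and the quartile convention is an assumption: textbooks and software differ slightly in how they interpolate quartiles, and this sketch uses the "inclusive" method of statistics.quantiles.

```python
import statistics

def five_number_summary(data):
    """Return (Min, Q1, Q2, Q3, Max) using the 'inclusive' quartile method."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

# Hypothetical data set: delivery times in minutes
times = [22, 15, 40, 12, 28, 17, 31, 20, 25]

summary = five_number_summary(times)
iqr = summary[3] - summary[1]  # Q3 - Q1, the spread of the central 50%
```

A box and whiskers plot would draw the box from Q1 to Q3 with a line at Q2.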

Chapter 3: Probability

Random Experiment is an observable activity whose outcome
cannot be predicted with certainty
Sample Space is the set of all basic outcomes of a random
experiment
Sample points are the elements of the sample space
Event is any subset of basic outcomes of a random experiment
Impossible Event is an event containing no sample points
Certain Event is an event containing all sample points of the
sample space
Tree Diagram is a graphical tool used to determine the sample
space of random experiments
Venn Diagram is a way of graphically portraying the sample
space and various events
Mutually Exclusive Events are events that do not share any
sample point
Compound Events:
Intersection of two events A and B is the compound event
containing the sample points that belong to both A and B
Union of two events A and B is the compound event
containing the sample points that belong to A or B or both

Complement of any event A is the compound event
containing the sample points that belong to S (sample
space) and do not belong to A
Conditional Probability is a probability calculated on a
reduced sample space. This reduced sample space is given by a
pre-established event or condition
Independent Events are events such that the occurrence of one
of them does not affect the probability of the other
Contingency Table is a two-way table containing frequency
data on two categorical variables
Probability Tree is a tree diagram involving probabilities of
given events
Probability Rules:
Addition

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Complement

P(A^c) = 1 - P(A)

Conditional

P(A | B) = P(A ∩ B) / P(B)

Multiplication

P(A ∩ B) = P(A) P(B | A) or
P(A ∩ B) = P(B) P(A | B)
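
The four rules can be checked on a small example. The Python sketch below assumes one roll of a fair die with equally likely sample points; the events A and B and the helper names are hypothetical.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (equally likely sample points)
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # event: the roll is even
B = {4, 5, 6}  # event: the roll is greater than 3

def P(event):
    """Probability as (sample points in the event) / (sample points in S)."""
    return Fraction(len(event & S), len(S))

def P_given(event, condition):
    """Conditional probability on the reduced sample space `condition`."""
    return Fraction(len(event & condition), len(condition))

# In Python, | is set union and & is set intersection
addition_holds = P(A | B) == P(A) + P(B) - P(A & B)
complement_holds = P(S - A) == 1 - P(A)
conditional_holds = P_given(A, B) == P(A & B) / P(B)
multiplication_holds = P(A & B) == P(A) * P_given(B, A) == P(B) * P_given(A, B)
```

Exact fractions are used so that the equalities are not disturbed by rounding.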

Chapter 4, Part I: Discrete Probability Distributions

Random Variable is a numerical variable whose values are
associated with a random experiment and therefore cannot be
predicted with certainty
Types of random variables: Discrete and Continuous.
Discrete random variables are random variables defined
on isolated real numbers. They are typically used for
counting.
Continuous random variables are random variables
defined on a line interval of real numbers. They are
typically used for measuring.
Discrete Probability Distribution is a table, graph or formula
assigning probabilities to each value of a discrete random
variable.
Probability Histogram is a graphical representation of a
discrete probability distribution, associating the heights of bars
with the given probabilities.
Point-line probability graph is a graphical representation of a
discrete probability distribution, associating the heights of
vertical lines with the given probabilities.
Mean of a discrete probability distribution is the expected
value of any given discrete random variable X. The expected
value of X takes into account not only the X-values but also
their associated probabilities.

Standard Deviation of a discrete probability distribution
describes the variability of any discrete random variable X
relative to the mean μ. It takes into account not only the
deviation of X-values relative to the mean but also their
associated probabilities.
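
As a sketch of how the probabilities enter both measures, the Python code below computes the mean as the sum of x times P(x) and the standard deviation as the square root of the sum of (x - mean)^2 times P(x). The distribution (number of heads in two tosses of a fair coin) and the function names are hypothetical.

```python
import math

# Hypothetical distribution: number of heads X in two tosses of a fair coin
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

def expected_value(dist):
    """Mean: each X-value weighted by its probability."""
    return sum(x * p for x, p in dist.items())

def std_dev(dist):
    """Standard deviation: squared deviations from the mean, weighted by probability."""
    mu = expected_value(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return math.sqrt(variance)

mu = expected_value(distribution)
sigma = std_dev(distribution)
```

Here the mean is 1 head and the variance is 0.5, so the standard deviation is the square root of 0.5.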
Binomial experiment is a random experiment involving a
number of identical and independent trials in which there are
only two possible outcomes (success and failure).
Binomial random variable is a discrete random variable
describing the number of successes in a binomial experiment.
Parameters of the binomial probability distribution are the
number of trials n and the rate of success p (probability of
success for each trial).
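
A minimal Python sketch of the binomial probability distribution, using the standard formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). The values n = 10 and p = 0.3 are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success rate p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical parameters: 10 trials with a 30% chance of success on each
n, p = 10, 0.3
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
binomial_mean = n * p  # expected number of successes
```

The list covers every possible value of X from 0 to n, so its entries add up to 1.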
Poisson experiment is a random experiment in which the
number of occurrences of a given event during a specified
period of time is observed. The occurrences of the event are
assumed to be random and independent of one another.
Poisson random variable is a discrete random variable
describing the number of occurrences of a given event during a
specified period of time.
Parameter of the Poisson probability distribution is the
population or historical mean number of occurrences of the
given event during a specified period of time.
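
A minimal Python sketch of the Poisson probability distribution, using the standard formula P(X = x) = e^(-mean) mean^x / x!. The mean of 2 occurrences per period is a hypothetical value.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """Probability of exactly x occurrences when the mean number per period is `mean`."""
    return exp(-mean) * mean**x / factorial(x)

# Hypothetical parameter: on average 2 calls arrive per minute
mean_calls = 2.0
p_no_calls = poisson_pmf(0, mean_calls)
p_at_most_2 = sum(poisson_pmf(x, mean_calls) for x in range(3))
```

Unlike the binomial variable, a Poisson variable has no upper limit, so cumulative probabilities are obtained by summing over the values of interest.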

Chapter 4, Part II: Continuous Probability Distributions


Continuous random variable is a random variable that can
assume any value inside an interval of real numbers.
Density curve is a smooth curve associated with the relative
frequency graph of a continuous random variable
Continuous Probability Distribution is the probability model
for a continuous random variable where probabilities are
calculated as areas under the associated density curve
Normal random variable is a continuous random variable with
a density curve that is smooth, symmetric, and bell-shaped.
Normal or Bell Curve is the density curve for a normal
random variable
Normal probability distribution is the probability model for a
normal random variable.
Parameters of a normal probability distribution are the mean
and standard deviation of the associated normal random
variable.
Normal population is a population where a normal random
variable has been defined.
Standard normal variable is a normal random variable with a
mean of zero and standard deviation of one.
Z-scores are values of the standard normal variable. They
indicate the number of standard deviations that any raw score
(or value of any normal random variable) deviates from the
mean.
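
A short Python sketch of a z-score, z = (x - mean) / standard deviation, and the corresponding area under the standard normal curve. The exam-score population (mean 70, standard deviation 10) is hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that the raw score x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical normal population: exam scores with mean 70 and standard deviation 10
z = z_score(85, 70, 10)
area_below = NormalDist().cdf(z)  # P(Z <= z) for the standard normal variable
```

A score of 85 is 1.5 standard deviations above the mean, and roughly 93% of the area under the curve lies below it.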

Chapter 5: Sampling Distributions

Review of basic concepts from chapter 1

Inferential Statistics
Population
Sample
Representative sample
Simple random sampling

Parameter is a descriptive numerical measure of the
population. Parameters are fixed numbers that are usually
unknown because the associated population is very large
Statistic is a descriptive numerical measure of a sample.
Statistics are used to estimate parameters and vary from
sample to sample.
Sampling Distribution is the probability distribution (model)
associated with any statistic when repeated samples (of the
same size) are drawn from the defined population.
Central Limit Theorem is a statistical property stating that the
sampling distribution of the sample mean is approximately
normal when the sample size is large enough.
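
The idea of a sampling distribution can be illustrated by simulation. The Python sketch below is only an illustration under stated assumptions: the population is a hypothetical right-skewed (exponential) one with mean 1 and standard deviation 1, the sample size is 40, and 2000 repeated samples are drawn with a fixed seed.

```python
import random
import statistics

def sample_means(draw, n, repetitions, seed=None):
    """Draw `repetitions` samples of size n and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean([draw(rng) for _ in range(n)])
        for _ in range(repetitions)
    ]

# Hypothetical skewed population: exponential with mean 1 and standard deviation 1
means = sample_means(lambda rng: rng.expovariate(1.0), n=40, repetitions=2000, seed=1)

center = statistics.fmean(means)  # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to 1 / sqrt(40), about 0.158
```

Even though the population is skewed, a histogram of these sample means would look approximately bell-shaped, which is what the Central Limit Theorem describes.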

Chapter 6: Estimation with Confidence Intervals

Estimation is the process of estimating or predicting the value
of a population parameter using a random sample and an
estimator.
Estimator is a formula or statistic defined on sample data with
the purpose of estimating a parameter.
Estimate is a numerical result obtained by substituting the
sample data into a given estimator.
Types of estimates: point estimate and interval estimate.
Point estimate consists of a single figure used to predict the
value of a population parameter.
Interval estimate consists of a numerical range where the
parameter is expected to fall with certain confidence.
Confidence coefficient is a probability that measures the
reliability of any interval estimate.
Confidence level is the confidence coefficient expressed as a
percentage.
Confidence interval is an interval estimate calculated with a
specified confidence level.
Margin of error is a measure of the error of estimation
involving the given confidence level and sample size.

Precision of any confidence interval is associated with the
margin of error of the estimate. The smaller the margin of
error, the better the precision. Hence, the narrower the
confidence interval, the more precise the interval estimate is.
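
A possible Python sketch of a confidence interval for a population mean. It assumes a large sample, so the interval is the sample mean plus or minus z times s / sqrt(n); the sample of 36 measurements and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample interval: x_bar +/- z * s / sqrt(n). Returns (low, high, margin)."""
    n = len(sample)
    x_bar = mean(sample)                               # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2) # critical value
    margin = z * stdev(sample) / sqrt(n)               # margin of error
    return x_bar - margin, x_bar + margin, margin

# Hypothetical sample of 36 measurements
sample = [48, 52] * 18

low95, high95, margin95 = mean_confidence_interval(sample, 0.95)
low99, high99, margin99 = mean_confidence_interval(sample, 0.99)
```

With the same data, the 99% interval is wider than the 95% interval: a higher confidence level costs precision unless the sample size is increased.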

Chapter 7: Tests of Hypotheses Based on a Single Sample


Research hypothesis is a statement or claim about a
population parameter that can be tested using sample
data.
Statistical hypotheses are the null and alternative
hypotheses.
Alternative hypothesis (Ha) describes the research
hypothesis of the problem.
Null hypothesis (Ho) describes the opposite of the
alternative hypothesis.
Test Statistic is a formula that summarizes the statistical
evidence collected in any test of hypotheses.
Rejection region is the set of values of the test statistic
indicating sufficient/convincing evidence against Ho.
Type I error consists of rejecting the null hypothesis Ho
when Ho is actually true.
Type II error consists of failing to reject the null
hypothesis Ho when Ho is actually false.
Alpha designates the probability of type I error.
Beta designates the probability of type II error.

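p-value is a probability that measures the strength of the
evidence against Ho (that is, in favor of Ha): the smaller the
p-value, the stronger the evidence. It is the observed
significance level of the test, that is, the probability,
calculated assuming Ho is true, of obtaining a test statistic at
least as contradictory to Ho as the one actually observed.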
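
A minimal Python sketch of a p-value calculation. It assumes a one-sample z test for a population mean with known standard deviation; the hypotheses, the sample figures, and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(x_bar, mu0, sigma, n, tail="two"):
    """Return (test statistic z, p-value) for a one-sample z test of the mean."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if tail == "upper":    # Ha: mu > mu0
        p = 1 - cdf(z)
    elif tail == "lower":  # Ha: mu < mu0
        p = cdf(z)
    else:                  # Ha: mu != mu0
        p = 2 * (1 - cdf(abs(z)))
    return z, p

# Hypothetical test: Ho: mu = 100 versus Ha: mu > 100
z, p_value = z_test_p_value(x_bar=103, mu0=100, sigma=12, n=64, tail="upper")

alpha = 0.05                # chosen probability of type I error
reject_h0 = p_value < alpha # reject Ho when the p-value is below alpha
```

In this example z = 2 and the p-value is about 0.023, which is below alpha = 0.05, so Ho would be rejected.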
