
STATISTICS FOR MANAGEMENT

UNIT – I
INTRODUCTION TO STATISTICS AND PROBABILITY

STATISTICS
Plural Meaning : Data or Numerical Statements
Singular Meaning : Science of Statistics or Statistical Methods.

Definition:
“Statistics is the science of collection, organization, presentation, analysis and
interpretation of numerical data”.- Dr.S.P. Gupta.
“Statistics may be defined as the science of collection, presentation, analysis and
interpretation of numerical data” – Croxton & Cowden.
Types of Statistics:
1) Descriptive - It describes the characteristics of data.
2) Inferential/Inductive – It helps in drawing inferences on the characteristics of the
population.
Other Divisions of Statistics:
• Analytical – To study the relationship between variables.
• Applied – Application of statistical methods to real-life problems.
Functions of Statistics:
1. Collection
2. Numerical Presentation
3. Diagrammatic Presentation
4. Condensation
5. Comparison
6. Forecasting
7. Policy Making
8. Effect Measuring
9. Estimation
10. Tests of Significance
Importance of Statistics in Business:
1. Location and Size of operations
2. Demand Assessment
3. Production Planning

4. Quality Control
5. Marketing Decisions
6. Business Expansion Plans
7. Efficient Inventory Management
8. Personnel Administration
9. Accounts and Auditing
10. Operations Research
11. Banking Sector
Limitations of Statistics:
1. Statistics does not deal with qualities.
2. Statistics does not consider a single item.
3. All the values should not be the same.
4. Inductive logic is applied
5. Statistical results are not exact.
6. Statistics is one of the methods of studying a problem
7. Statistics can be misused.
Collection of Data:
Primary Data – First-hand information
Methods:
• Direct personal investigation
• Indirect oral investigation
• Local reports
• Schedules and questionnaires
Secondary Data – Already existing data
Methods:
• Official publications of the Government
• Reports
• Records

Classification of Data:
Classification is the process of arranging data into sequences and groups according to their common characteristics.
Types:
1. Geographical
2. Chronological
3. Quantitative
4. Qualitative
Types of variables:
A variable in statistical methods stands for any measurable quantity which can assume
a range of numerical values within certain limits.
For Example: age, income, height, weight, price are all variables.

1. Discrete Variable:
A discrete variable is characterized by jumps and gaps between one value and
the next. For Example, the number of tables in the Restaurant or number of rooms in
a house cannot assume fractional values.
2. Continuous Variable:
Variables which can take all possible values in a given specified range
are termed continuous variables. For example, the age of students in a school,
heights, weights, etc.
Organizing Data:
FREQUENCY DISTRIBUTION
• Types: Univariate, Bivariate
• Forms: Discrete, Continuous, Cumulative
FREQUENCY DISTRIBUTION:
A frequency distribution is a series in which observations with
similar or closely related values are put into separate groups, each group
being arranged in order of magnitude. It is simply a table in which the data are
grouped into classes and the number of cases which fall in each class is
recorded. It shows the frequency of occurrence of different values of a single
phenomenon.

a) Discrete (or) Ungrouped frequency distribution:


In this form of distribution, each frequency refers to a single discrete value. The
data are presented so that the exact measurements of the units are clearly indicated.

b) Continuous (or) Grouped Frequency distribution:


A continuous (grouped) frequency distribution is an arrangement of the
values that one or more variables take in a sample. Each entry in the table contains the
frequency or count of the occurrences of values within a particular group or interval, and in
this way the table summarizes the distribution of values in the sample.
In a continuous frequency distribution, scores falling within various ranges are tabulated.
No. of classes: $K = 1 + 3.322 \log_{10} N$
where N = total number of observations (values)
Magnitude of class interval: $i = \dfrac{\text{Range}}{K}$
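The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and rounding K to the nearest whole number is an assumption, since the notes do not say how a fractional K should be handled.

```python
import math


def number_of_classes(n_obs: int) -> int:
    """K = 1 + 3.322 * log10(N), rounded to the nearest integer (assumed convention)."""
    return round(1 + 3.322 * math.log10(n_obs))


def class_interval(values, k: int) -> float:
    """Magnitude of class interval i = Range / K."""
    return (max(values) - min(values)) / k


# Hypothetical example: 100 observations ranging from 10 to 90.
k = number_of_classes(100)          # 1 + 3.322 * 2 = 7.644, rounded to 8
width = class_interval([10, 90], k)
print(k, width)
```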
PRESENTATION OF DATA:
• DIAGRAMS
• GRAPHS
TYPES OF DIAGRAMS
• One-dimensional: Simple Bar Diagrams, Multiple Bar Diagrams, Sub-divided Diagrams, Percentage Bar Diagrams
• Two-dimensional: Rectangle, Square, Circle, Pie
• Three-dimensional: Cubes
• Pictograms and Cartograms

TYPES OF GRAPHS
HISTOGRAM
FREQUENCY POLYGON
FREQUENCY CURVE
OGIVES
FREQUENCY LINES
DIAGRAMMATIC AND GRAPHICAL REPRESENTATION:
Diagrams:
A diagram is a visual form for presentation of statistical data, highlighting
their basic facts and relationship.
Significance of Diagrams and Graphs:
Diagrams and graphs are extremely useful because of the following reasons.
1. They are attractive and impressive.
2. They make data simple and intelligible.
3. They make comparison possible
4. They save time and labour.
5. They have universal utility.
6. They give more information.
7. They have a great memorizing effect.
Types of diagrams:
1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three-dimensional diagrams
4. Pictograms and Cartograms
One-dimensional diagrams:
In such diagrams, only one dimension, i.e., height, is used and
the width is not considered. These diagrams are in the form of bar or line charts and can
be classified as
1. Line Diagram
2. Simple Bar Diagram
3. Multiple Bar Diagram
4. Sub-divided Bar Diagram
5. Percentage Bar Diagram
Two-dimensional Diagrams:
In one-dimensional diagrams, only length is taken into account. But in
two-dimensional diagrams the area represents the data, so both the length and the breadth
have to be taken into account. Such diagrams are also called area diagrams or
surface diagrams. The important types of area diagrams are: 1. Rectangles 2. Squares 3.
Pie-diagrams
Three-dimensional diagrams:
Three-dimensional diagrams, also known as volume diagram, consist of cubes,
cylinders, spheres, etc. In such diagrams three things, namely length, width and
height have to be taken into account.
Pictograms and Cartograms:

Pictograms are not abstract presentations such as lines or bars but actually depict the
kind of data we are dealing with. Pictures are attractive and easy to comprehend, and
as such this method is particularly useful in presenting statistics to the layman.
Cartograms or statistical maps are used to give quantitative information on a
geographical basis. They are used to represent spatial distributions. The quantities on
the map can be shown in many ways, such as through shades, colours or dots, or by
placing pictograms in each geographical unit.
Graphs:
A graph is a visual form of presentation of statistical data. A graph is more
attractive than a table of figures. Even a common man can understand the message
of the data from a graph. Comparisons can be made between two or more phenomena
very easily with the help of a graph.
1. Histogram
2. Frequency Polygon
3. Frequency Curve
4. Ogive
5. Lorenz Curve
DESCRIPTIVE MEASURE
A well-planned data classification facilitates easy description of the hidden data
characteristics using a variety of summary measures.
• Measures of Central Tendency
• Dispersion
• Skewness
• Kurtosis
• Moments
MEASURES OF CENTRAL TENDENCY:
• Mathematical averages: A.M, G.M, H.M
• Positional averages: Median, Mode
A measure of central tendency is a single number describing some feature of
a set of data – Wallis and Roberts
“An average stands for the whole group of which it forms a part yet represents the whole.”
Characteristics of a good or an ideal average:
An ideal average should possess the following properties.
1. It should be rigidly defined.
2. It should be easy to understand and compute.
3. It should be based on all items in the data.
4. Its definition shall be in the form of a mathematical formula.
5. It should be capable of further algebraic treatment.
6. It should have sampling stability.
7. It should be capable of being used in further statistical computations or
processing.
Arithmetic mean or mean :
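Arithmetic mean or simply the mean of a variable is defined as the sum of the
observations divided by the number of observations. If the variable x assumes n
values x1, x2, ..., xn then the mean, $\bar{X}$, is given by
FORMULAE:
Individual Case: $\bar{X} = \dfrac{\sum x_i}{n}$
Discrete Case: $\bar{X} = \dfrac{\sum f x}{N}$
Continuous Case: $\bar{X} = A + \dfrac{\sum f d}{N} \times c$, where $d = \dfrac{x - A}{c}$
Here A is the assumed mean, c is the class width, x is the class mid-value and $N = \sum f$.
The Individual Case formula is for ungrouped or raw data.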
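As a rough illustration of the three cases, the sketch below applies the formulas to two data sets taken from the exercises at the end of these notes (persons per house, and marks in classes 20-30 to 70-80). The function names, and the choice of A = 45 as assumed mean, are made up for this example.

```python
def mean_individual(xs):
    """Individual case: sum of observations / number of observations."""
    return sum(xs) / len(xs)


def mean_discrete(xs, fs):
    """Discrete case: sum(f * x) / N, with N = sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_continuous(mids, fs, assumed_mean, width):
    """Continuous case (step deviation): A + (sum(f * d) / N) * c, d = (m - A) / c."""
    n = sum(fs)
    total_fd = sum(f * (m - assumed_mean) / width for m, f in zip(mids, fs))
    return assumed_mean + total_fd / n * width


# Persons per house: x = 2..6 with frequencies 10, 25, 30, 25, 10.
persons_mean = mean_discrete([2, 3, 4, 5, 6], [10, 25, 30, 25, 10])

# Marks 20-30, ..., 70-80 (mid-values 25..75), assumed mean A = 45, class width c = 10.
marks_mean = mean_continuous([25, 35, 45, 55, 65, 75], [5, 8, 12, 15, 6, 4], 45, 10)

print(persons_mean, marks_mean)
```

Choosing a different assumed mean A changes the intermediate deviations but not the final result.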


Weighted Arithmetic Mean :
For calculating the simple mean, we suppose that all the values or sizes of items in the
distribution have equal importance. In practice this may not be so. If
some items are more important than others, a simple average
is not representative of the distribution, and proper weightage has to be given to the
various items. For example, to get an idea of the change in the cost of living of a certain
group of persons, the simple average of the prices of the commodities they consume
will not do, because the commodities are not all equally important: rice,
wheat and pulses are more important than tea, confectionery, etc. The weighted
arithmetic average gives the average value of the series after
giving proper weight to each item.
Weighted Arithmetic Mean: $\bar{X}_w = \dfrac{\sum w x}{\sum w}$
Harmonic Mean (H.M) :
Harmonic mean of a set of observations is defined as the reciprocal of the arithmetic
average of the reciprocals of the given values. If x1, x2, ..., xn are n observations,
FORMULAE:
Individual Case: $H.M = \dfrac{n}{\sum \frac{1}{x}}$
Discrete Case: $H.M = \dfrac{N}{\sum \frac{f}{x}}$
Continuous Case: $H.M = \dfrac{N}{\sum \frac{f}{m}}$
Weighted Mean: $H.M = \dfrac{\sum w}{\sum \frac{w}{x}}$
Geometric Mean :
The geometric mean of a series containing n observations is the nth root of the
product of the values. If x1, x2, ..., xn are observations then
FORMULAE:
Individual Case: $G.M = \text{Antilog}\left(\dfrac{\sum \log x}{n}\right)$
Discrete Case: $G.M = \text{Antilog}\left(\dfrac{\sum f \log x}{N}\right)$
Continuous Case: $G.M = \text{Antilog}\left(\dfrac{\sum f \log m}{N}\right)$
Weighted Mean: $G.M = \text{Antilog}\left(\dfrac{\sum w \log x}{N}\right)$, where $N = \sum w$
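The individual-case formulas for the geometric and harmonic means can be sketched as follows. The function names are illustrative only; the G.M data (3, 6, 24, 48) come from the exercises in these notes, while the H.M data are a made-up example.

```python
import math


def geometric_mean(xs):
    """G.M = antilog( sum(log x) / n ), using base-10 logarithms."""
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """H.M = n / sum(1 / x)."""
    return len(xs) / sum(1 / x for x in xs)


gm = geometric_mean([3, 6, 24, 48])   # 4th root of 3 * 6 * 24 * 48 = 20736
hm = harmonic_mean([2, 4, 4])         # 3 / (1/2 + 1/4 + 1/4)
print(round(gm, 6), hm)
```

Working through logarithms, as the notes do, avoids forming the very large product that the direct nth-root definition would need for long series.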
Combined Mean :
If the arithmetic averages and the number of items in two or more related groups are
known, the combined or the composite mean of the entire group can be obtained by
$$\bar{X}_{12} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
The advantage of the combined arithmetic mean is that we can determine the overall
mean of the combined data without going back to the original data.
Positional Averages:
These averages are based on the position of the given observation in a
series, arranged in an ascending or descending order. The median and mode are called
the positional measures of an average.
Median :
The median is that value of the variate which divides the group into two equal
parts, one part comprising all values greater, and the other, all values less than median.
Ungrouped or Raw data :
Arrange the given values in increasing or decreasing order. If the number of
values is odd, the median is the middle value. If the number of values is even, the median is
the mean of the two middle values.
Grouped Data:
In a grouped distribution, values are associated with frequencies. Grouping can
be in the form of a discrete frequency distribution or a continuous frequency
distribution. Whatever may be the type of distribution , cumulative frequencies have to
be calculated to know the total number of items.

Cumulative frequency: (CF)


Cumulative frequency of each class is the sum of the frequency of the class and
the frequencies of the previous classes, i.e., the frequencies are added successively, so that
the last cumulative frequency gives the total number of items.
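FORMULAE:
Individual Case (n odd): $M = \left(\dfrac{n+1}{2}\right)^{th}$ item
Individual Case (n even): M = mean of the $\left(\dfrac{n}{2}\right)^{th}$ and $\left(\dfrac{n}{2} + 1\right)^{th}$ items
Discrete Case: $M = \left(\dfrac{N+1}{2}\right)^{th}$ item
Continuous Case: $M = L + \left(\dfrac{\frac{N}{2} - C.F}{F}\right) \times C$
where L = lower limit of the median class, C.F = cumulative frequency of the class preceding the
median class, F = frequency of the median class and C = class width.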
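The continuous-case median formula can be traced step by step with a short sketch. It assumes equal class widths and uses the height data (classes 145-150 to 170-175) from the exercises in these notes; the function name is made up for illustration.

```python
def grouped_median(lower_limits, freqs, width):
    """M = L + ((N/2 - C.F) / F) * C for a continuous frequency distribution."""
    n = sum(freqs)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            # This is the median class: the first class whose cumulative frequency reaches N/2.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")


# Heights (cm): 145-150, ..., 170-175 with frequencies 2, 5, 10, 8, 4, 1.
median_height = grouped_median([145, 150, 155, 160, 165, 170], [2, 5, 10, 8, 4, 1], 5)
print(median_height)
```

Here N/2 = 15 falls in the class 155-160 (cumulative frequencies 2, 7, 17, ...), so L = 155, C.F = 7, F = 10 and C = 5.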
Mode:
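Mode is the value which has the greatest frequency density.
FORMULAE:
Individual Case: Z = the most repeated value
Discrete Case: Z = the X value corresponding to the highest frequency
Continuous Case: $Z = L + \left(\dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2}\right) \times C$
where L = lower limit of the modal class, $f_1$ = frequency of the modal class, $f_0$ and $f_2$ =
frequencies of the classes preceding and succeeding it, and C = class width.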
Relationship between Averages:
In a symmetrical distribution Mean, Median and Mode will coincide: Mean = Median = Mode. In
an asymmetrical (skewed) distribution these values will be different. For a moderately skewed
distribution the following empirical relationships hold approximately.
FORMULAE:
Mean – Median = (1/3)(Mean – Mode)
Mode = 3 Median – 2 Mean
Mean – Mode = 3(Mean – Median)
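The grouped-data mode and the empirical relationship between the averages can be sketched as below. This is a minimal illustration with made-up function names; it assumes equal class widths and a single modal class, with f0 and f2 taken as the frequencies of the classes just before and after the modal class (treated as 0 at either end of the table). The data come from the exercises in these notes.

```python
def grouped_mode(lower_limits, freqs, width):
    """Z = L + ((f1 - f0) / (2*f1 - f0 - f2)) * C for the class with the highest frequency."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width


def empirical_mode(mean, median):
    """Mode = 3 Median - 2 Mean (approximate, for moderately skewed data)."""
    return 3 * median - 2 * mean


# Classes 10-15, ..., 45-50 with frequencies 22, 45, 67, 73, 85, 190, 64, 55.
z = grouped_mode([10, 15, 20, 25, 30, 35, 40, 45], [22, 45, 67, 73, 85, 190, 64, 55], 5)

# Mean = 26.8 and Median = 27.9 give the mode by the empirical relationship.
z_empirical = empirical_mode(26.8, 27.9)
print(round(z, 2), round(z_empirical, 1))
```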

MEASURES OF DISPERSION
The measures of dispersion report on how far the values of the distribution are from the center.
The measures of dispersion are:
• Absolute measures: Range, Q.D, M.D, S.D
• Relative measures: Coefficient of Range, Coefficient of Q.D, Coefficient of M.D, Coefficient of Variation
Range:
The range is the difference between the highest and lowest values of a statistical distribution.
Range = L – S where L = largest value; S = smallest value.
Coefficient of Range $= \dfrac{L - S}{L + S}$
Quartile deviation:
Quartile deviation or semi-interquartile range is the dispersion which shows the degree of
spread around the middle of a set of data. The difference between the third and first quartiles is
called the interquartile range; half of the interquartile range is called the semi-interquartile range,
also known as the quartile deviation.
FORMULAE:
Quartile Deviation $= \dfrac{Q_3 - Q_1}{2}$
Coefficient of Quartile Deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
For Individual Case:
$Q_1 = \left(\dfrac{N+1}{4}\right)^{th}$ item
$Q_3 = \left(\dfrac{3(N+1)}{4}\right)^{th}$ item
For Continuous and Discrete Case:
$Q_1 = L + \left(\dfrac{\frac{N+1}{4} - C.F}{F}\right) \times C$
$Q_3 = L + \left(\dfrac{\frac{3(N+1)}{4} - C.F}{F}\right) \times C$
Mean Deviation:
A measure of variability equal to the average of the absolute values of a set of deviations from a
specified value, usually the arithmetic mean.
Let x1, x2, x3, ..., xn occur with frequencies f1, f2, f3, ..., fn respectively and let Σf = N. The
deviations may be taken from either the mean or the median; taking them from the mean $\bar{x}$, the
mean deviation is given by the formulae below (for ungrouped data every frequency is 1).
FORMULAE:
Individual Case: $M.D = \dfrac{\sum |x - \bar{x}|}{n}$
Discrete Case: $M.D = \dfrac{\sum f |x - \bar{x}|}{N}$
Continuous Case: $M.D = \dfrac{\sum f |m - \bar{x}|}{N}$
Coefficient of M.D $= \dfrac{M.D}{\text{Mean}}$
Standard Deviation and Variance:
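The standard deviation is the square root of the variance.
The square of the standard deviation of a frequency distribution is called the variance of the
frequency distribution.
FORMULAE:
Actual Mean Method:
Individual: $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
Discrete: $\sigma = \sqrt{\dfrac{\sum f (x - \bar{x})^2}{N}}$
Continuous: $\sigma = \sqrt{\dfrac{\sum f (m - \bar{x})^2}{N}}$
Direct Method:
Individual: $\sigma = \sqrt{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2}$
Discrete: $\sigma = \sqrt{\dfrac{\sum f x^2}{N} - \left(\dfrac{\sum f x}{N}\right)^2}$
Continuous: $\sigma = \sqrt{\dfrac{\sum f m^2}{N} - \left(\dfrac{\sum f m}{N}\right)^2}$
Short-Cut Method:
Individual: $\sigma = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$, where $d = x - A$
Discrete: $\sigma = \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2}$, where $d = x - A$
Continuous: $\sigma = \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2}$, where $d = m - A$
Coefficient of Variation (CV): $CV = \dfrac{S.D}{\text{Mean}} \times 100$
Variance $= \sigma^2$
Combined Standard Deviation:
$$\sigma_{123} = \sqrt{\dfrac{N_1 \sigma_1^2 + N_2 \sigma_2^2 + N_3 \sigma_3^2 + N_1 d_1^2 + N_2 d_2^2 + N_3 d_3^2}{N_1 + N_2 + N_3}}$$
where $d_1 = \bar{x}_1 - \bar{x}_{123}$, $d_2 = \bar{x}_2 - \bar{x}_{123}$, $d_3 = \bar{x}_3 - \bar{x}_{123}$
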
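The standard deviation, coefficient of variation and combined standard deviation can be sketched as follows. The function names are made up for illustration; the data are the gold prices in two cities and the three-branch salary figures from the exercises in these notes. A lower CV is read as "more stable".

```python
import math


def std_dev(xs):
    """Actual mean method, individual case: sqrt( sum((x - mean)^2) / n )."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


def coeff_variation(xs):
    """CV = S.D / Mean * 100."""
    return std_dev(xs) / (sum(xs) / len(xs)) * 100


def combined_sd(sizes, means, sds):
    """sqrt( sum(N_i * sd_i^2 + N_i * d_i^2) / sum(N_i) ), with d_i = mean_i - combined mean."""
    total = sum(sizes)
    combined_mean = sum(n * m for n, m in zip(sizes, means)) / total
    numerator = sum(n * s ** 2 + n * (m - combined_mean) ** 2
                    for n, m, s in zip(sizes, means, sds))
    return math.sqrt(numerator / total)


city_a = [498, 500, 505, 504, 502, 509]
city_b = [500, 505, 502, 498, 496, 505]
cv_a, cv_b = coeff_variation(city_a), coeff_variation(city_b)

# Branch sizes 20, 30, 50; mean salaries 15, 12, 18; S.D 3, 5, 6 (Rs. thousand).
sd_all = combined_sd([20, 30, 50], [15, 12, 18], [3, 5, 6])
print(round(cv_a, 3), round(cv_b, 3), round(sd_all, 3))
```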

Skewness:
It is the degree of asymmetry or departure from symmetry of a distribution.
Positively skewed: Mode (Z) < Median < Mean
Negatively skewed: Mean < Median < Mode (Z)
FORMULAE:
Karl Pearson's Coefficient of Skewness:
Absolute measure: Mean – Mode
Relative measure: $Sk_p = \dfrac{\text{Mean} - \text{Mode}}{S.D}$ or $Sk_p = \dfrac{3(\text{Mean} - \text{Median})}{S.D}$
Bowley’s Coefficient of Skewness:
Absolute measure: $(Q_3 - M) - (M - Q_1)$
Relative measure: $Sk_B = \dfrac{(Q_3 - M) - (M - Q_1)}{(Q_3 - M) + (M - Q_1)}$


Moments:
Moments are the means of various powers of the deviations of the items. If the deviations
are measured from the A.M, the moments are called Central Moments ($\mu_r$).
If the deviations are measured from a value A other than the A.M, the moments are called Raw
Moments ($\mu'_r$).
FORMULAE:
First moment about the origin (Mean): $\bar{x} = A + \mu'_1$
Second moment about the mean (Variance): $\mu_2 = \mu'_2 - (\mu'_1)^2$
Third moment about the mean (Skewness): $\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3$
Fourth moment about the mean (Kurtosis): $\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4$

Central Moments (r = 1, 2, 3, 4):
Individual: $\mu_r = \dfrac{\sum (x - \bar{x})^r}{n}$
Discrete: $\mu_r = \dfrac{\sum f (x - \bar{x})^r}{N}$
Continuous: $\mu_r = \dfrac{\sum f (m - \bar{x})^r}{N}$
The first moment about the mean is always zero: $\mu_1 = 0$.

Raw Moments (r = 1, 2, 3, 4):
Individual: $\mu'_r = \dfrac{\sum (x - A)^r}{n}$
Discrete: $\mu'_r = \dfrac{\sum f (x - A)^r}{N}$
Continuous: $\mu'_r = \dfrac{\sum f (m - A)^r}{N}$
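One way to check the conversion from raw moments to central moments is to compute both and compare. The sketch below does this for the individual case; the function names and the choice A = 8 are assumptions made for illustration, and the data are the values 8, 10, 9, 12, 4, 8, 2 from the exercises in these notes.

```python
def raw_moments(xs, a):
    """First four raw moments about the arbitrary value A: sum((x - A)^r) / n."""
    n = len(xs)
    return [sum((x - a) ** r for x in xs) / n for r in range(1, 5)]


def central_from_raw(m1, m2, m3, m4):
    """Convert raw moments (about A) into the 2nd, 3rd and 4th central moments."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


def central_moments(xs):
    """2nd, 3rd and 4th central moments computed directly from the mean."""
    mean = sum(xs) / len(xs)
    return tuple(sum((x - mean) ** r for x in xs) / len(xs) for r in (2, 3, 4))


data = [8, 10, 9, 12, 4, 8, 2]
A = 8  # arbitrary origin chosen for this example
m1, m2, m3, m4 = raw_moments(data, A)
converted = central_from_raw(m1, m2, m3, m4)
direct = central_moments(data)
print(A + m1, converted, direct)
```

The first printed value, A plus the first raw moment, is the mean; the two tuples should agree up to floating-point rounding.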

Kurtosis:
It is the degree of peakedness of a distribution, usually taken relative to a normal
distribution.
FORMULAE: Measures of Kurtosis:
(i) $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$
(ii) $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$
(iii) $\gamma_2 = \dfrac{\mu_4}{\mu_2^2} - 3$, i.e., $\gamma_2 = \beta_2 - 3$

Mesokurtic (normal curve): $\beta_2 = 3$ and $\gamma_2 = 0$
Platykurtic (flat-topped curve): $\beta_2 < 3$ and $\gamma_2 < 0$
Leptokurtic (more peaked curve): $\beta_2 > 3$ and $\gamma_2 > 0$

PROBABILITY
Definition:
It is a numerical measure of the likelihood of an occurrence of event. It is a measure of
the degree of uncertainty associated with random numbers.
Random Experiment:
An experiment which can be repeated any number of times under the same conditions,
but does not give unique results.(Trial)
Sample Space:
A set of all possible outcomes of a random experiment is called sample space.
Event/Sample Point:
Each outcome of a random experiment is called Event/Sample point.
Mutually Exclusive Events:
Two or more events are said to be mutually exclusive if the occurrence of one
prevents the occurrence of the others in the same trial.
Independent Events:
Two or more events are said to be independent when the outcome of one neither
affects nor is affected by the others. For example, if a coin is tossed twice, the result of the 2nd throw
would in no way be affected by the result of the 1st throw.
Probability:
Probability of an event is the ratio of the number of favorable outcomes to the total number of outcomes,
i.e., if A is a favorable event and S is the sample space, then the probability of event A is
$$P(A) = \dfrac{n(A)}{n(S)}$$
Note:
1. If A is any event in a random experiment then $0 \le P(A) \le 1$
2. If A1, A2, ..., An are mutually exclusive events, then
$P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + \dots + P(A_n)$
3. If A1, A2, ..., An are independent events, then
$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \times P(A_2) \times \dots \times P(A_n)$

Problems:
1. Find the probability of getting number 5 while throwing a die?
S={1,2,3,4,5,6}
Let A be the event of getting Number 5.
n(A) = 1
n(s) = 6
Then, $P(A) = \dfrac{n(A)}{n(S)} = \dfrac{1}{6}$

2. What is the probability of getting the sum as 5 while throwing a die twice?
S = {(1,1) (1,2) (1,3)…….
(2,1)…..
(3,1)……
(4,1)…..
(5,1)….
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)}
Let A be the event of getting the sum as 5.
A = {(1,4) (2,3) (3,2) (4,1)}
n(A) = 4 and n(S) = 36
Then, $P(A) = \dfrac{n(A)}{n(S)} = \dfrac{4}{36} = \dfrac{1}{9}$

3. From a pack of 52 cards, What is the Probability of getting a king?


Let A be the event of getting a king card.
n(A) =4
n(s) = 52
Then, $P(A) = \dfrac{n(A)}{n(S)} = \dfrac{4}{52} = \dfrac{1}{13}$
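The three problems above can be checked by listing each sample space and counting favorable outcomes. This is a small sketch with made-up names; it assumes equally likely outcomes and a standard 52-card pack of 13 ranks in 4 suits.

```python
from fractions import Fraction
from itertools import product


def probability(is_favorable, sample_space):
    """P(A) = n(A) / n(S) for equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if is_favorable(outcome)]
    return Fraction(len(favorable), len(sample_space))


die = list(range(1, 7))
two_throws = list(product(die, repeat=2))                       # 36 ordered pairs
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
pack = list(product(ranks, suits))                              # 52 cards

p_five = probability(lambda face: face == 5, die)
p_sum_five = probability(lambda pair: sum(pair) == 5, two_throws)
p_king = probability(lambda card: card[0] == "K", pack)
print(p_five, p_sum_five, p_king)
```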

Axioms of Probability:
Let S be the sample space, A an event, and P a real-valued function defined on P(S).
P(A) is called the probability of A if P satisfies the following axioms.
1. For every event A, $0 \le P(A) \le 1$
2. $P(S) = 1$
3. If A and B are mutually exclusive events then $P(A \cup B) = P(A) + P(B)$
Theorem :
1. Probability of an impossible event is zero, i.e., P(∅) = 0, where ∅ is the impossible event.
2. Complementation law:
P(A) = 1 − P(Ac)
3. Additive law:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
4. Moreover, if A and B are mutually exclusive, then P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B)
5. Multiplicative law (Product rule)
P(A ∩ B) = P(A|B)P(B)
P(A ∩ B) = P(B|A)P(A)
6. Moreover, if A and B are independent
P(A ∩ B) = P(A)P(B)
Conditional probability:
The conditional probability of an event B is the probability that the event will occur given
the knowledge that an event A has already occurred. This probability is written P(B|A), notation
for the probability of B given A. In the case where events A and B are independent (where event
A has no effect on the probability of event B), the conditional probability of event B given event
A is simply the probability of event B, that is P(B).
If events A and B are not independent, then the probability of the intersection of A and B (the
probability that both events occur) is defined by
$P(A \text{ and } B) = P(A) \cdot P(B|A)$
From this definition, the conditional probability P(B|A) is easily obtained by dividing by P(A):
$P(B|A) = \dfrac{P(A \text{ and } B)}{P(A)}$, provided P(A) > 0.

Bayes Theorem:
Let B1, B2, ..., Bn be mutually exclusive and exhaustive events and let A be an event related to the Bi;
then the conditional probability of Bi, given that A has already occurred, is
$$P(B_i | A) = \dfrac{P(A | B_i) \, P(B_i)}{\sum_{j=1}^{n} P(A | B_j) \, P(B_j)}$$
Note that the union of all of the Bi (B1, B2, ..., Bn) is the total sample space, so they cover every
possibility.
Example:
There are two bags containing balls of various colors. A bag is selected at random and a ball taken
from it at random. The probability of picking a blue ball out of bag 1 is ½ . The probability of
picking a blue ball from bag 2 is ¼ . If the experiment is carried out and a blue ball is selected,
what is the probability that bag 2 was selected?
Let A2 be the event that bag 2 was selected and let A1 be the event that bag 1 was selected. Let
B be the event that a blue ball is chosen. Then, using Bayes' Theorem:
$$P(A_2 | B) = \dfrac{P(B | A_2) P(A_2)}{P(B | A_1) P(A_1) + P(B | A_2) P(A_2)}$$
Now, P(B|A2) = probability of picking a blue ball given that bag 2 is selected = ¼ from the
question.
Similarly, P(B|A1) = ½.
P(A1) = probability of selecting bag 1 = ½ = P(A2)
Hence $P(A_2 | B) = \dfrac{¼ \times ½}{½ \times ½ + ¼ \times ½} = \dfrac{1}{3}$.
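The two-bag calculation can be reproduced with exact fractions. The helper below is only a sketch with a made-up name; it takes the prior probabilities of the bags and the probability of a blue ball from each bag, and returns the posterior probability for one bag.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods, index):
    """P(A_i | B) = P(B | A_i) P(A_i) / sum_j P(B | A_j) P(A_j)."""
    total = sum(p * l for p, l in zip(priors, likelihoods))  # total probability of B
    return priors[index] * likelihoods[index] / total


priors = [Fraction(1, 2), Fraction(1, 2)]        # P(A1), P(A2): bag chosen at random
likelihoods = [Fraction(1, 2), Fraction(1, 4)]   # P(B | A1), P(B | A2): blue ball from each bag

p_bag2_given_blue = bayes_posterior(priors, likelihoods, 1)
p_bag1_given_blue = bayes_posterior(priors, likelihoods, 0)
print(p_bag2_given_blue, p_bag1_given_blue)
```

The two posteriors add up to 1, as they must, because the bags are mutually exclusive and exhaustive.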
Random Variable:
A random variable, usually written X, is a variable whose possible values are numerical
outcomes of a random phenomenon. There are two types of random variables, discrete and
continuous.
Discrete Random Variable:
A random variable, whose set of possible values is either finite or countably infinite, is called a
discrete random variable. A probability distribution is a table of values showing the probabilities
of various outcomes of an experiment.
For example, if a coin is tossed three times, the number of heads obtained can be 0, 1, 2 or
3. The probabilities of each of these possibilities can be tabulated as shown:

Number of Heads 0 1 2 3
Probability 1/8 3/8 3/8 1/8
A discrete variable is a variable which can only take a countable number of values. In this
example, the number of heads can only take 4 values (0, 1, 2, 3) and so the variable is discrete.
The variable is said to be random if the sum of the probabilities is one.
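The table for three coin tosses can be rebuilt by listing all eight equally likely outcomes and counting heads. This is a minimal sketch assuming a fair coin; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give 0, 1, 2 or 3 heads.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability distribution of X = number of heads.
distribution = {heads: Fraction(count, len(outcomes))
                for heads, count in sorted(head_counts.items())}

for heads, prob in distribution.items():
    print(heads, prob)
print("total:", sum(distribution.values()))
```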
Probability Mass Function:
Let X be a discrete random variable taking the values $x_i$, i = 1, 2, 3, ..., with corresponding
probabilities $p(x_i)$. The $p(x_i)$ are called the P.M.F of X if
(i) $p(x_i) \ge 0$
(ii) $\sum_i p(x_i) = 1$
(iii) $p(x_i) = P(X = x_i) = p_i$


Continuous Random Variable:
A random variable X is said to be a continuous random variable if it can take any value in an interval.
A continuous random variable is a random variable where the data can take infinitely many
values. For example, a random variable measuring the time taken for something to be done is
continuous since there are an infinite number of possible times that can be taken.
For any continuous random variable with probability density function f(x), we have that:
$$\int_{\text{all } x} f(x) \, dx = 1$$

Example:
X is a continuous random variable with probability density function given by f(x) = cx for 0 ≤ x ≤
1, where c is a constant. Find c.
If we integrate f(x) between 0 and 1 we get c/2. Hence c/2 = 1 (from the useful fact above), giving
c = 2.
Probability Density Function (P.D.F):
If X is a continuous random variable with a function f(x) satisfying the conditions:
(i) $f(x) \ge 0$
(ii) $\int_{-\infty}^{\infty} f(x) \, dx = 1$
then f(x) is called the P.D.F of X.

Probability Distribution Function:


If X is a discrete or continuous random variable, the function $F(x) = P(X \le x)$ is called the probability distribution function of X.
Cumulative Distribution Function (c.d.f.)
If X is a continuous random variable with p.d.f. f(x) defined on a ≤ x ≤ b, then the cumulative
distribution function (c.d.f.), written F(t), is given by:
$$F(t) = P(X \le t) = \int_a^t f(x) \, dx$$

PROBABILITY DISTRIBUTION:

Probability Functions:
A probability function is a function which assigns probabilities to the values of a random variable.
• All the probabilities must be between 0 and 1 inclusive.
• The sum of the probabilities of the outcomes must be 1.
If these two conditions aren't met, then the function isn't a probability function. There is no
requirement that the values of the random variable only be between 0 and 1, only that the
probabilities be between 0 and 1.

Probability Distributions:
A listing of all the values the random variable can assume, with their corresponding
probabilities, makes a probability distribution.
A note about random variables. A random variable does not mean that the values can be
anything (a random number). Random variables have a well defined set of outcomes and well
defined probabilities for the occurrence of each outcome. The random refers to the fact that the
outcomes happen by chance -- that is, you don't know which outcome will occur next.
Here's an example probability distribution that results from the rolling of a single fair die.

x 1 2 3 4 5 6 sum
p(x) 1/6 1/6 1/6 1/6 1/6 1/6 6/6=1

Discrete Distribution:
If a random variable is a discrete variable, its probability distribution is called a discrete
probability distribution.
Binomial Distribution:
A binomial experiment is an experiment which satisfies these four conditions:
• A fixed number of trials
• Each trial is independent of the others
• There are only two outcomes
• The probability of each outcome remains constant from trial to trial.
These can be summarized as: an experiment with a fixed number of independent trials, each of
which can only have two possible outcomes. The fact that each trial is independent
actually means that the probabilities remain constant.
Examples of binomial experiments:
• Tossing a coin 20 times to see how many tails occur.
• Asking 200 people if they watch ABC news.
• Rolling a die to see if a 5 appears.
• Asking 500 die-hard Republicans if they would vote for the Democratic candidate. (Just
because something is unlikely doesn't mean that it isn't binomial. The conditions are met:
there's a fixed number [500], the trials are independent [what one person does doesn't
affect the next person], and there are only two outcomes [yes or no].)
The Probability Mass Function of Binomial Distribution:
The probability of getting exactly x successes in n trials, with the probability of success on a
single trial being p, is:
$$P(X = x) = {}^{n}C_x \, p^x \, q^{n-x}, \quad x = 0, 1, 2, \dots, n$$
where n = number of trials
p = probability of success
q = probability of failure, and p + q = 1
Properties of Binomial Distribution:
Mean $\mu = np$
Variance $\sigma^2 = npq$
Standard Deviation $\sigma = \sqrt{npq}$
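The binomial formula and its properties can be checked numerically. The sketch below uses the coin example from the list above (n = 20 tosses, p = 0.5 for a tail, assuming a fair coin); the function name is made up for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 20, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                                              # should be 1
mean = sum(x * pr for x, pr in enumerate(probs))                # should equal n * p
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))  # should equal n * p * q
print(round(total, 6), round(mean, 6), round(variance, 6))
```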

Poisson Distribution:
Named after the French mathematician Siméon Poisson, Poisson probabilities are useful when
there are a large number of independent trials with a small probability of success on a single trial
and the events occur over a period of time. It can also be used when a density of items is
distributed over a given area or volume.
$$P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots$$
where $\lambda = np$

Lambda in the formula is the mean number of occurrences. If you're approximating a binomial
probability using the Poisson, then lambda is the same as mu or n * p.
Properties of Poisson Distribution:
Mean $\mu = np = \lambda$
Variance $\sigma^2 = \lambda$
Standard Deviation $\sigma = \sqrt{\lambda}$

Example:
If there are 500 customers per eight-hour day in a check-out lane, what is the probability that there
will be exactly 3 in line during any five-minute period?
The expected value during any one five minute period would be 500 / 96 = 5.2083333. The 96 is
because there are 96 five-minute periods in eight hours. So, you expect about 5.2 customers in 5
minutes and want to know the probability of getting exactly 3.
p(3;500/96) = e^(-500/96) * (500/96)^3 / 3! = 0.1288 (approx)
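The check-out lane figure can be reproduced directly from the Poisson formula. The function name below is illustrative only.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lambda) * lambda^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 500 / 96            # expected customers per five-minute period
p_three = poisson_pmf(3, lam)
print(round(p_three, 4))
```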
Continuous Distribution:
If a random variable is a continuous variable, its probability distribution is called a
continuous probability distribution.
Uniform/Rectangular Distribution:
Let X be a continuous random variable defined on the interval [a, b]. Then X is said to follow a
uniform distribution with parameters a and b if its P.D.F is defined as
$$f(x) = \dfrac{1}{b - a}, \quad a \le x \le b$$
(and f(x) = 0 if x is not between a and b).
We write X ~ U(a,b)

Remember that the area under the graph of the p.d.f must be equal to 1.
Properties of Uniform Distribution:
If X ~ U(a, b), then:

E(X) = ½ (a + b)

Var(X) = $\dfrac{1}{12}(b - a)^2$
Cumulative Distribution Function:
The cumulative distribution function can be found by integrating the p.d.f between a and t:
$$F(t) = \int_a^t \dfrac{1}{b - a} \, dx = \dfrac{t - a}{b - a}$$


Normal Distribution:

The Normal Distribution is also called the Gaussian distribution. It is defined by two
parameters, the mean ('average', μ) and the standard deviation (σ). It is a theoretical frequency distribution for
a set of variable data, usually represented by a bell-shaped curve symmetrical about the mean.

Let X be a continuous random variable. Then X is said to follow a Normal distribution if its p.d.f is
defined as
$$f(x) = \dfrac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
Properties:
• Normal Distribution is bell-shaped
• Symmetric about the mean
• Mean, median and mode coincide
• Skewness is zero
• Kurtosis is three
• It is a continuous distribution.
• Never touches the x-axis
• Total area under the curve is 1.00
• Approximately 68% lies within 1 standard deviation of the mean, 95% within 2 standard
deviations, and 99.7% within 3 standard deviations of the mean. This is known as the Empirical Rule.
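The 68%, 95% and 99.7% figures can be checked with the error function from Python's standard library, using the standard identity that the area of a normal curve within k standard deviations of the mean equals erf(k / √2). The function name is made up for illustration.

```python
import math


def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 2), "%")
```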
Standard Normal Distribution:
A normal distribution with a mean of zero and a standard deviation of one is called the standard normal
distribution, and its density function is defined as
$$\phi(z) = \dfrac{1}{\sqrt{2\pi}} \, e^{-\frac{z^2}{2}}, \quad \text{where } z = \dfrac{x - \mu}{\sigma} \sim N(0, 1)$$
• Mean is zero
• Variance is one
• Standard Deviation is one
• Data values are represented by z.
• Total area = 1, with 0.5 on each side of zero (from -∞ to 0 and from 0 to ∞).
PROBLEMS

DESCRIPTIVE STATISTICS

MEASURES OF CENTRAL TENDENCY

1. The expenditure of 10 families (in Rs.) is given below:

Family A B C D E F G H I J

Expenditure 30 70 10 75 500 8 42 250 40 36

Calculate the AM by Direct Method and Shortcut Method.


2. Calculate the mean number of persons per house. Given
No.of Persons/House 2 3 4 5 6

No.of Houses 10 25 30 25 10

Calculate the AM by Direct Method.


3. Find the AM using Assumed Mean Method.

Marks 40 50 54 60 68 80

No.of Students 10 18 20 39 15 8

4. Calculate the AM for the following

Marks 20-30 30-40 40-50 50-60 60-70 70-80

No.of Students 5 8 12 15 6 4

5. There are two branches of an establishment employing 100 and 80 persons respectively. If
the AM of the monthly salaries paid by the two branches are Rs.275 and Rs.225
respectively. Find the AM of the salaries of the employees of the establishment as a
whole.
6. The mean of 20 marks is found to be 40, later on it was discovered that a mark 53 was
misread as 83. Find the correct mean.
7. Calculate the weighted Average.

Items 68 85 101 102 108 110 112 113 124 128

Weight 1 45 31 1 11 7 23 17 14 14

8. Find the Median for the following 6,9,21,5,7,-2,0,32,9.


9. Find the Median for the following 57,58,61,42,38,65,72,66.
10. Find the Median for the following

Wage(Rs.) 50 75 100 150 250

No.of Labourers 8 14 10 5 3

11. Find the Median for the following

Height(cms) 145-150 150-155 155-160 160-165 165-170 170-175

No.of Students 2 5 10 8 4 1

12. Find the Mode for the following


i) 320,395,342,344,551,395,425,417,395
ii) 3,6,7,5,8,4,9
iii) 0,2,5,6,9,5,6,14,6,15,5,6,5

13. Determine the Mode.

Size of Dress 32 33 34 35 36 37 38 39 40 41
No.of Sets Produced 7 14 30 28 35 34 16 14 36 16

14. Find the modal size

Size of Shoes 3 4 5 6 7 8 9

No. of pairs sold 10 25 32 38 61 47 34

15. Find the mode

Class 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50

Frequency 22 45 67 73 85 190 64 55

16. Find the Mode.

C.I 0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40

f 9 12 15 16 17 15 10 13

17. Determine the Mode.

No.of Days Absent <3 <6 <9 <12 <15 <18

No.of Students 10 25 38 48 51 52

18. Find the GM of 3,6,24,48

19. Find the GM for the following data

X 10 15 25 40 50

f 4 6 10 7 3

20. Find the GM for the following data

Marks 0-10 10-20 20-30 30-40 40-50 50-60

No.of Students 5 7 15 25 8

21. Find the GM for the following data

Commodity A B C D

Weight 1 6 3 2

Price 5 17 30 42
22. Find the H.M. 6,15,35,40,900,520,300,400,1800,2000
23. Find the Harmonic Mean for the following data.

X 10 12 14 16 18 20

f 5 18 20 10 6 1

24. Determine the H.M

Value 0-10 10-20 20-30 30-40 40-50

Frequency 8 12 20 6 4

25. Find the Harmonic Mean for the following data.

Value 1 2 5 10 20

Weight 2 5 10 5 2

26. Find the Mean, given Mode=64.2 and Median=68.6.

27. Find the Median, given Mode=32.1 and Mean=35.4.

28. Find the Mode, given Mean=26.8 and Median=27.9.
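Problems 26 to 28 can be checked with the empirical relation Mode = 3 Median - 2 Mean, which holds only approximately, for moderately skewed distributions. The sketch below assumes that this relation is the intended method; the function names are illustrative.

```python
# Empirical relation (assumed here): Mode = 3 * Median - 2 * Mean.

def mean_from(mode: float, median: float) -> float:
    return (3 * median - mode) / 2

def median_from(mode: float, mean: float) -> float:
    return (mode + 2 * mean) / 3

def mode_from(mean: float, median: float) -> float:
    return 3 * median - 2 * mean

print(round(mean_from(mode=64.2, median=68.6), 2))    # Problem 26
print(round(median_from(mode=32.1, mean=35.4), 2))    # Problem 27
print(round(mode_from(mean=26.8, median=27.9), 2))    # Problem 28
```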

Measures of Dispersion

1. Find the Range, Quartile Deviation, Mean Deviation and Standard Deviation.
8,10,9,12,4,8,2

2. Find the Range, Quartile Deviation, Mean Deviation and Standard Deviation.

X 10 12 14 16 18 20 22

f 3 5 9 16 8 7 2

3. Find the Range, Quartile Deviation, Mean Deviation and Standard Deviation.

Marks 0-4 4-8 8-12 12-16

No.of Students 4 8 2 1

4. There are 20, 30 and 50 employees in the three branches of a concern. Their mean salaries
are Rs.15, Rs.12 and Rs.18 thousand. The SDs of their salaries are Rs.3, Rs.5 and Rs.6 thousand
respectively. Find the mean salary and the SD of salaries for the employees of the
concern as a whole.
5. From the following prices of gold over a week, find the city in which the price was more stable.

City A 498 500 505 504 502 509

City B 500 505 502 498 496 505

6. Goals scored by two teams A and B in a series of football matches were observed as follows.

No. of Goals Scored in a Match    No. of Matches (Team A)    No. of Matches (Team B)

0 5 4

1 7 5

2 5 5

3 3 4

4 2 3

5 3 3

Which team, A or B, may be considered the more consistent?

7. The marks in Business Maths of two sections of students of a college are given below:

Marks 20-30 30-40 40-50 50-60 60-70

Sec A 5 13 24 5 3

Sec B 7 14 25 12 2

Find in which section, the marks are more consistent.

8. The mean runs of two players A and B are 55 and 65, and the SDs are 4.2 and
7.8 respectively. Who is the more consistent player?
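Consistency questions such as Problem 8 are usually settled with the coefficient of variation, CV = 100 x SD / Mean, where the smaller CV indicates the more consistent series. Problem 4 uses the combined mean and combined SD of several groups. A minimal sketch of both is given below; the function names are hypothetical and the SDs are treated as population SDs.

```python
import math

def coefficient_of_variation(mean: float, sd: float) -> float:
    """CV in percent; a lower value means more consistency."""
    return 100.0 * sd / mean

def combined_mean_sd(sizes, means, sds):
    """Combined mean and SD of several groups given size, mean and SD of each."""
    n_total = sum(sizes)
    mean_all = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Each group contributes its own variance plus its squared distance
    # from the combined mean.
    variance_all = sum(
        n * (s ** 2 + (m - mean_all) ** 2)
        for n, m, s in zip(sizes, means, sds)
    ) / n_total
    return mean_all, math.sqrt(variance_all)

# Problem 8: players A and B
cv_a = coefficient_of_variation(55, 4.2)
cv_b = coefficient_of_variation(65, 7.8)
more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_consistent)

# Problem 4: three branches (figures in Rs. thousand)
mean_all, sd_all = combined_mean_sd([20, 30, 50], [15, 12, 18], [3, 5, 6])
print(round(mean_all, 2), round(sd_all, 2))
```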

Skewness
1. Calculate Karl Pearson’s coefficient of skewness from the data given below:

Value 10 20 30 40 50 60 70

Frequency 1 5 12 22 17 9 4

Ans: Skp = 0.23

2. Assume that a firm has selected a random sample of 100 from its production line and has
obtained the data shown in the table below:

Class Interval 130-134 135-139 140-144 145-149 150-154 155-159 160-164

Frequency 3 12 21 28 19 12 5

Find Pearson’s coefficient of skewness.


Ans: Skp = 0.007

3. Calculate Bowley’s Coefficient of skewness from the data given below:

Wages(Rs.) 30-40 40-50 50-60 60-70 70-80 80-90 90-100

No.of Persons 1 3 11 21 43 32 9

Ans: SKB = -0.035
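As a check on Skewness Problem 1, Karl Pearson's coefficient can be computed as (Mean - Mode) / SD. The sketch below assumes this form of the coefficient and takes the mode of the discrete series as the value with the highest frequency; the result comes out close to the stated answer of 0.23.

```python
import math

values = [10, 20, 30, 40, 50, 60, 70]
freqs = [1, 5, 12, 22, 17, 9, 4]

n = sum(freqs)
mean = sum(f * x for x, f in zip(values, freqs)) / n
variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
sd = math.sqrt(variance)

# Mode of a discrete series: the value with the largest frequency.
mode = values[freqs.index(max(freqs))]

sk_p = (mean - mode) / sd
print(round(mean, 2), mode, round(sd, 2), round(sk_p, 3))
```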

Moments
1. The first four moments of a distribution about the value of 5 of a variable are 2,20,40 and 50. Find
Arithmetic Mean, Variance, Skewness and Kurtosis.
Ans:Mean=7, Variance =16, Skewness =-64 and Kurtosis = 162

2. Find the first, second, third and fourth central moments for the set of numbers 2,4,5,6,8.
Ans:µ1=0, µ2=4, µ3=0, µ4=32.8
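The two Moments problems can be verified with the standard conversion from moments about an arbitrary value to central moments. The sketch below uses hypothetical function names. In the stated answer to Problem 1, "Skewness" and "Kurtosis" correspond to the third and fourth central moments.

```python
def central_from_raw(m1, m2, m3, m4):
    """Central moments mu2, mu3, mu4 from the first four moments about any value."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu2, mu3, mu4

def central_moments(data):
    """First four central moments of a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    return tuple(sum((x - mean) ** r for x in data) / n for r in (1, 2, 3, 4))

# Problem 1: moments about the value 5 are 2, 20, 40 and 50.
a = 5
raw = (2, 20, 40, 50)
mean = a + raw[0]
mu2, mu3, mu4 = central_from_raw(*raw)
print(mean, mu2, mu3, mu4)               # 7 16 -64 162

# Problem 2: central moments of 2, 4, 5, 6, 8.
print(central_moments([2, 4, 5, 6, 8]))  # (0.0, 4.0, 0.0, 32.8)
```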

Kurtosis
1. The First four moments of a distribution about the origin are 1,4,10 and 46 respectively. Obtain
the various characteristics of the distribution on the basis of the information given. Comment
upon the nature of the distribution

Ans: Skewness =0 (Symmetric Distribution), Kurtosis = 3 (Mesokurtic)

Probability

4. Find the probability of getting number 5 while throwing a die?

5. What is the probability of getting the sum as 5 while throwing a die twice?

6. From a pack of 52 cards, What is the Probability of getting a king?

7. One card is drawn from a pack of 52 cards. What is the probability that the card is either
red or a king?

8. A person can hit a target in 4 out of 5 shots and another person can hit the target in 3 out of
4 shots. Find the probability that the target is hit when both try.

9. If A and B are independent events P(A)=0.4, P(B)=0.5. Find P(AUB).

10. If A and B are independent events with P(A)=0.5 and P(B)=0.8, find the probability that
neither event occurs, P(Ac∩Bc).
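Several of the probability problems above can be checked by enumeration or by the rules for independent events: P(A or B) = P(A) + P(B) - P(A)P(B), and P(neither) = (1 - P(A))(1 - P(B)). The following sketch uses exact fractions; the helper names are illustrative, and Problem 8 is treated as two independent shooters, which is an assumption.

```python
from fractions import Fraction
from itertools import product

# Problem 5: sum of 5 in two throws of a die, by listing all 36 outcomes.
rolls = list(product(range(1, 7), repeat=2))
p_sum_5 = Fraction(sum(1 for a, b in rolls if a + b == 5), len(rolls))

def p_union_independent(p_a, p_b):
    """P(A or B) for independent events."""
    return p_a + p_b - p_a * p_b

def p_neither_independent(p_a, p_b):
    """P(neither A nor B) for independent events."""
    return (1 - p_a) * (1 - p_b)

p_hit = p_union_independent(Fraction(4, 5), Fraction(3, 4))        # Problem 8
p_union = p_union_independent(Fraction(2, 5), Fraction(1, 2))      # Problem 9
p_neither = p_neither_independent(Fraction(1, 2), Fraction(4, 5))  # Problem 10

print(p_sum_5, p_hit, p_union, p_neither)   # 1/9 19/20 7/10 1/10
```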
Conditional probability
1. When 2 dice are thrown, let A be the event that the sum of the points on the faces is odd
and B the event that at least one number is 2. Find P(A/B) and P(B/A).

2. A coin is tossed twice. What is the conditional probability that both tosses show heads,
given that the first one shows a head?
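Conditional probability Problem 1 can be verified by listing the 36 equally likely outcomes and applying P(A/B) = P(A and B) / P(B). A short sketch, with an illustrative helper name:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
A = {o for o in outcomes if sum(o) % 2 == 1}   # sum of the faces is odd
B = {o for o in outcomes if 2 in o}            # at least one face shows 2

def conditional(event, given):
    """P(event / given) when all outcomes are equally likely."""
    return Fraction(len(event & given), len(given))

print(conditional(A, B), conditional(B, A))    # 6/11 1/3
```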
Bayes Theorem
1. The contents of 3 boxes are as follows 1W,2B &3R balls, 2W,3B &1R balls and 3W,2B
&3R balls. A box is chosen at random and from it 2 balls are drawn at random of types
one black and one red. What is the probability that balls have been taken from box1,
box2, box3 respectively?

2. A box contains 2000 components of which 15% are defective. A second box contains 5000 of
which 25% are defective. Two other boxes contain 2 components each with 10%
defective components. A box is chosen at random and an item selected from it is found to be
defective. Find the probability that it has come from the second box.

3. In a bolt factory, machines A, B and C manufacture respectively 25%, 35% and 45% of the total.
Of their output, 5%, 4% and 2% are defective bolts. A bolt is drawn at random and is found to
be defective. What is the probability that it was manufactured by machine A, B or C?
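Bayes Theorem Problem 1 (three boxes, one black and one red ball drawn) can be worked as prior x likelihood, divided by the total probability of the evidence. The sketch below assumes each box is equally likely to be chosen and that the two balls are drawn without replacement; the names are illustrative.

```python
from fractions import Fraction
from math import comb

# (white, black, red) balls in each box
boxes = {"box1": (1, 2, 3), "box2": (2, 3, 1), "box3": (3, 2, 3)}
prior = Fraction(1, 3)   # assumption: each box equally likely

def likelihood(white: int, black: int, red: int) -> Fraction:
    """P(one black and one red | box) when 2 balls are drawn together."""
    total = white + black + red
    return Fraction(black * red, comb(total, 2))

joint = {name: prior * likelihood(*counts) for name, counts in boxes.items()}
evidence = sum(joint.values())
posterior = {name: p / evidence for name, p in joint.items()}

for name, p in posterior.items():
    print(name, p)
```

The three posterior probabilities add up to 1, which is a quick check on the arithmetic.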
Random Variable

1. Let X be a discrete random variable with the following probability distribution

X 1 2 3 4 5
P(X) 0.1 0.2 k 2k 0.1
Find (i) the value of k (ii) P(X≤2) (iii) P(1≤X≤4) (iv) P(X>4).

2. A random variable X has the following probability distribution


X 1 2 3 4 5
P(X) k 2k 3k k 5k
Find (i) value of k (ii) P(1<X≤4) (iii)P(X>2) (iv) P(X<4) (v) C.d.f.

3. Let X be a continuous random variable with the p.d.f. f(x) = kx^2, 1<x≤2.
(i) Find k (ii) P(1<X<1.5)

4. The p.d.f. of a continuous random variable is defined as f(x) = Ax, 1<x≤3.
Find the value of A.
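For Random Variable Problems 2 and 3, the constant k follows from the requirement that the total probability is 1. A minimal sketch using exact fractions is shown below; the variable names are hypothetical, and the continuous case uses the antiderivative x^3/3 of x^2 worked out by hand.

```python
from fractions import Fraction

# Problem 2: P(X) = k, 2k, 3k, k, 5k for X = 1..5, so 12k = 1.
xs = [1, 2, 3, 4, 5]
coeffs = [1, 2, 3, 1, 5]
k = Fraction(1, sum(coeffs))
pmf = {x: c * k for x, c in zip(xs, coeffs)}

def prob(condition):
    """Sum of P(X = x) over the values x that satisfy the condition."""
    return sum(p for x, p in pmf.items() if condition(x))

# Cumulative distribution function as running totals.
cdf = {}
running = Fraction(0)
for x in xs:
    running += pmf[x]
    cdf[x] = running

print(k)                              # 1/12
print(prob(lambda x: 1 < x <= 4))     # 1/2
print(prob(lambda x: x > 2))          # 3/4
print(prob(lambda x: x < 4))          # 1/2

# Problem 3: f(x) = k x^2 on (1, 2]; the integral of x^2 is (2^3 - 1^3)/3.
k_cont = Fraction(3, 2 ** 3 - 1 ** 3)
p_1_to_1_5 = k_cont * (Fraction(3, 2) ** 3 - 1) / 3
print(k_cont, p_1_to_1_5)             # 3/7 19/56
```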
Binomial Distribution
1. A coin is tossed 8 times. What is the probability of getting:
i) Exactly 3 heads
ii) At least 2 heads
iii) At most 6 heads
iv) At most 2 heads

2. A die is thrown 10 times. What is the probability of getting
i) Exactly three 4’s
ii) At least two 4’s
iii) At most one 4
iv) At least eight 4’s

3. The mean and variance of a binomial variate are 8 and 6. Find P(X≥2).
4. Comment on the following: X is a binomial variate with mean 6 and variance 10.

5. Seven coins are tossed and the number of heads noted. The experiment is repeated 128
times and the following distribution is obtained. Fit a binomial distribution assuming the
coin is unbiased.
No.of heads 0 1 2 3 4 5 6 7
Frequency 7 6 19 35 30 23 7 1

6. A set of 6 similar coins is tossed 640 times with the following results
No.of heads 0 1 2 3 4 5 6
Frequency 7 64 140 210 132 75 12
Calculate the binomial frequencies.
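Binomial probabilities and fitted (expected) frequencies can be sketched as follows, using P(X = k) = nCk p^k (1 - p)^(n - k) and expected frequency = N x P(X = k). The example covers Problem 1 (8 tosses) and Problem 5 (7 unbiased coins, 128 repetitions); the function name is illustrative.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial variate with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Problem 1: coin tossed 8 times
p_exactly_3 = binomial_pmf(3, 8, 0.5)
p_at_least_2 = 1 - binomial_pmf(0, 8, 0.5) - binomial_pmf(1, 8, 0.5)
print(p_exactly_3, p_at_least_2)

# Problem 5: expected frequencies for 0..7 heads in 128 repetitions
expected = [128 * binomial_pmf(k, 7, 0.5) for k in range(8)]
print(expected)
```

With an unbiased coin the expected frequencies in Problem 5 reduce to the binomial coefficients 7Ck, because N = 128 = 2^7.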
Poisson Distribution
1. If 3% of electric bulbs manufactured by a company are defective find the probability that
in a sample of 100 bulbs, exactly 5 bulbs are defective.

2. A manufacturer of pins knows that 2% of the products are defective. He sells pins in
boxes of 100 and guarantees that not more than 4 pins will be defective. What is the
probability that a box will fail to meet the guaranteed quality?

3. In a Poisson distribution P(X=1)=P(X=2). Find P(X≤1).


4. Fit a Poisson distribution to the following data which gives the no. of gardens in a sample of
seeds:
No.of Gardens 0 1 2 3 4 5 6 7 8
Observed frequency 56 156 132 92 37 22 4 0 1

5. After correcting 50 pages of the proof of a book, the proofreader finds that there are on
average two errors per 5 pages. Using a Poisson distribution, how many pages would one
expect to find with 0, 1, 2, 3 and 4 errors in one thousand pages of the 1st print of this book?
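Poisson probabilities follow P(X = k) = e^(-m) m^k / k!, with m = np when the distribution is used to approximate a binomial. The sketch below works Problem 1 (m = 100 x 0.03 = 3) and Problem 3, where P(X=1) = P(X=2) gives m = 2; the function name is illustrative.

```python
import math

def poisson_pmf(k: int, m: float) -> float:
    """P(X = k) for a Poisson variate with mean m."""
    return math.exp(-m) * m ** k / math.factorial(k)

# Problem 1: 3% defective, sample of 100 bulbs, exactly 5 defective
m = 100 * 0.03
p_five = poisson_pmf(5, m)

# Problem 3: m = m^2 / 2 gives m = 2, then P(X <= 1) = P(0) + P(1)
m_equal = 2.0
p_at_most_one = poisson_pmf(0, m_equal) + poisson_pmf(1, m_equal)

print(round(p_five, 4), round(p_at_most_one, 4))
```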
Uniform/Rectangular Distribution
1. If X is a uniform variate defined in the interval [1,3], find P(1≤X≤2). Find also the mean
and variance of X.

2. X is a uniform variate defined in the interval [1,6]. Find i) P(1≤X<6) (ii) P(X>2)
(iii) P(X≤3). Find mean and variance.

3. Let X be a uniform variate defined in the interval [-1,5].


Find i)P(0≤x<3) ii) P(X≥2) iii) P(X<3) iv) P(-1≤x<3) v) P(0≤X<4)
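For a uniform variate on [a, b], probabilities are ratios of lengths, the mean is (a + b)/2 and the variance is (b - a)^2/12. A small sketch with exact fractions and hypothetical helper names, applied to Problem 1 and to parts of Problem 3:

```python
from fractions import Fraction

def uniform_prob(a: int, b: int, lo: int, hi: int) -> Fraction:
    """P(lo <= X <= hi) for X uniform on [a, b]; the range is clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return Fraction(max(hi - lo, 0), b - a)

def uniform_mean_var(a: int, b: int):
    """Mean and variance of a uniform variate on [a, b]."""
    return Fraction(a + b, 2), Fraction((b - a) ** 2, 12)

# Problem 1: interval [1, 3]
print(uniform_prob(1, 3, 1, 2))     # 1/2
print(uniform_mean_var(1, 3))       # mean 2, variance 1/3

# Problem 3: interval [-1, 5]
print(uniform_prob(-1, 5, 0, 3))    # 1/2
print(uniform_prob(-1, 5, 2, 5))    # 1/2
```

For a continuous variate, strict and non-strict inequalities give the same probability, so P(0≤X<3) and P(0≤X≤3) coincide.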

Normal Distribution

1. Find Probability of i) P(Z>2) ii) P(-2≤Z<3)


iii) P(-3<Z) iv) P(Z<1.56)

2. Find the following


i) Area to the left of 1.35
ii) Area between -1.87 and -2.36
iii) Area to the right of -1.48
iv) Area to the right of 2.06
v) Area to the left of -1.08
vi) Area between 0.78 and 1.78
vii) Area between -1.5 to 1.5

3. Assume that the mean height of soldiers is 68.22 inches with a variance of 10.8 square inches.
How many soldiers in a regiment of a thousand would you expect to be over 6 feet tall?

4. In a test of 2000 electric bulbs it was found that the life of a particular bulb was
normally distributed with an average life of 2040 hours and standard deviation of 60
hours.
Estimate the number of bulbs likely to burn for
a) More than 2150 hours
b) Less than 1950 hours
c) More than 1920 hours but less than 2160 hours.

5. In a ND, 31% of items are under 45 and 8% are over 64. Find Mean and SD.

6. In a normal distribution 30% of the items are under 45 and 8% are over 70. Determine
mean and SD.

7. The wage of workers in a factory follows normal distribution. The wage distribution
is given below
Wage No.of.workers
Below 3500 240
3500-5000 650
5000 & above 320

8. 1000 candidates appeared for an examination; 50 passed in I class, 350 passed in II
class and the rest failed. If the minimum and maximum marks for second class are 50 and
60, assuming a normal distribution, determine the Mean and SD.

9. X~N with mean 60. Given that 14.92 % of the values are greater than 75. Find its SD.

10. A normal variate has SD 8. If 87.7% of the values are less than 75. Find its mean?
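Normal distribution problems of this kind can be checked without tables using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch below works Problem 4 (bulb life) and Problem 9 (finding the SD from a tail percentage); the variable names are illustrative, and the results may differ slightly from table-based answers because of rounding.

```python
from statistics import NormalDist

# Problem 4: life of bulbs ~ N(mean 2040, SD 60), 2000 bulbs tested
life = NormalDist(mu=2040, sigma=60)
n_bulbs = 2000

more_than_2150 = n_bulbs * (1 - life.cdf(2150))
less_than_1950 = n_bulbs * life.cdf(1950)
between = n_bulbs * (life.cdf(2160) - life.cdf(1920))
print(round(more_than_2150), round(less_than_1950), round(between))

# Problem 9: mean 60 and P(X > 75) = 0.1492, so (75 - 60) / SD = z
# where z is the point with area 1 - 0.1492 to its left.
z = NormalDist().inv_cdf(1 - 0.1492)
sd = (75 - 60) / z
print(round(z, 2), round(sd, 2))
```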

********
