
Simulation 1

Output Data Analysis I


Simulation 2
Outline
Motivation and Terminology
Recall: sampling distribution and mean
Confidence intervals for mean and quantiles
Estimators and Correlation
Terminating simulations
Non-terminating simulations: steady-state estimation
Transient Removal
Stopping Rules
Independent Replications, Batch means and Regeneration methods

Simulation 3
Random In Random Out Principle
A simulation model maps the input random variables
into the output process of interest
The outputs have a stochastic character: they are random processes
They may depend on the initial conditions
They are sometimes correlated
How do you estimate the performance of your simulated system?
What confidence do you have in your estimate?
How long should you run your simulation to reach a desired precision?
Simulation 4
Recall: Meaning of Measurements
[Figure: the n observations $D_i$ from one run are averaged into the estimate $\bar{D}(n)$, which serves as an estimator for the true, statistically typical behavior $E(d)$.]
The truly typical (in a statistical sense) behavior
of the model can be considered to be the average
over all possible behaviors the model can exhibit
Weighted by the probabilities of these behaviors


Simulation 5
Terminology (1/2)
The set of all possible observations for an experiment is
the population
The relative frequencies of the data in this population are described by the population distribution (all possible behaviors of the system)
This distribution has parameters like the mean $\mu$ or the standard deviation $\sigma$
Performing an experiment n times results in n numbers $x_1, \dots, x_n$, constituting a sample of size n
Simulation 6
Terminology (2/2)
This set of n numbers $x_1, \dots, x_n$ can also be described using characteristic values like the mean or the variance
Such descriptions of a sample are called sample statistics

Simulation 7
Simulated Experiments Principles
Samples are generated by running or replicating
the experiment.
The population size is infinite (there is no end to
the number of replications that can be made).
The goal of experimentation is to run enough
replications to get a sufficiently accurate estimate of
the true population parameters while keeping
experimentation costs to a minimum.
To produce independent replications in a
simulation the starting seed value must be
different from run to run
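The seeding rule above can be sketched as follows. This is a minimal illustration, not part of the original slides: `run_replication` is a hypothetical stand-in for a full simulation run (it just averages exponential samples), and the seed values are arbitrary. Distinct seeds give different random streams; strictly speaking they do not guarantee statistical independence of the streams, which depends on the generator.

```python
import random


def run_replication(seed, n_samples=100, rate=1.0):
    """Hypothetical stand-in for one simulation run.

    Each replication owns a private generator seeded with its own seed,
    so runs do not share or disturb each other's random stream.
    """
    rng = random.Random(seed)
    return sum(rng.expovariate(rate) for _ in range(n_samples)) / n_samples


# One distinct (arbitrary) seed per replication.
seeds = [101, 202, 303, 404, 505]
results = [run_replication(s) for s in seeds]

# Re-running with the same seed reproduces the same replication exactly.
repeat_of_first = run_replication(seeds[0])
```

Reusing a seed reproduces a run exactly, which is useful for debugging but would silently duplicate a sample if done across replications.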



Simulation 9
Sample versus Population (1/2)
The word "sample" comes from the same root as the word "example"
Similarly, one sample does not prove a theory, but rather is an example
Basically, a definite statement about the characteristics of the population cannot be made from a sample
Instead, make a probabilistic statement about the range of most outcomes of the system
This leads to confidence intervals
Simulation 10
Sample versus Population (2/2)
Simulation 11
Sampling Distribution of the Mean
Let $x_1, \dots, x_n$ be a set of samples, taken from independent, identically distributed random variables with distribution mean $\mu$ and standard deviation $\sigma$
Define the sample mean as:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The sample mean is a RANDOM VARIABLE
Let us consider the distribution of the sample mean
Simulation 12
Sampling Distribution of the Mean
Mean of the sampling distribution of the mean (sample mean):
$$E[\bar{x}] = \mu$$
The sample mean is an unbiased estimator for the mean of the underlying distribution
Variance of the sample mean (its square root is called the standard error):
$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$$
The formula shows that the larger the sample size, the smaller the standard error of the mean.
Simulation 13
Sampling Distribution of the Mean
The distribution variance $\sigma^2$ is usually unknown
The distribution variance $\sigma^2$ can be estimated in turn by the sample variance:
$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
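The three quantities from the last slides (sample mean, sample variance with the n-1 divisor, and standard error of the mean) can be computed as in the sketch below. The function name `sample_statistics` and the example data are illustrative assumptions, not taken from the slides.

```python
import math


def sample_statistics(xs):
    """Return (sample mean, sample variance, standard error of the mean)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    # Unbiased sample variance: divide by n - 1, not n.
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Standard error: estimated standard deviation of the sample mean.
    std_error = math.sqrt(variance / n)
    return mean, variance, std_error


# Illustrative data only.
mean, variance, std_error = sample_statistics(
    [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
)
```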
Simulation 14
Central Limit Theorem
Interpretation:
Each replication provides a sample for a metric which is
a mean of many individual waiting times
Looking at all these means and computing a histogram, it
will form a normal-like bell shape
Parameters can be estimated as described above
Rule of thumb: Normal approximation is usually
sufficiently good if about 30 or more experiments
were done

The mean of a sufficiently large number of
independent random variables has a distribution
that is approximately normal
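A small experiment can make the interpretation above concrete. The sketch below is an assumption-laden illustration: it uses exponential samples with mean 1 in place of real waiting times, and the replication count, sample size and seed are arbitrary. Each replication yields one sample mean; by the theorem these means should scatter around 1 with a standard deviation close to $1/\sqrt{n}$.

```python
import random
import statistics


def replication_means(num_replications, n, seed=12345):
    """Return one sample mean per replication of n Exp(1) observations."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_replications)
    ]


n = 50
means = replication_means(num_replications=2000, n=n)
grand_mean = statistics.fmean(means)   # expected to be close to 1
spread = statistics.stdev(means)       # expected to be close to 1 / sqrt(n)
predicted_spread = 1.0 / n ** 0.5
```

A histogram of `means` would show the bell shape mentioned on the slide, even though the individual exponential samples are strongly skewed.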
Simulation 15
Central Limit Theorem
If the samples $X_1, \dots, X_N$ are independent and from the same population (IID) with population mean $\mu$ and standard deviation $\sigma$, then the sample mean
$$\bar{X} = \frac{1}{N}\sum_{j=1}^{N} X_j$$
is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{N}$

Simulation 17
Interval Estimation : Motivation
Sample mean is an estimator for distribution mean
Can we expect that the distribution mean and the
sample mean will be exactly the same? No!
At best: the sample mean is not far away from the
distribution mean
Yet, as the sample mean is also a random variable
itself, there will in general be a certain probability
that sample mean is arbitrarily far away from the
distribution mean
Can we at least hope to bound the probability that
sample and distribution mean are far apart? Yes!
Interpretation of probability: when repeating the experiment and
independently computing many sample means, only a
few of them will be far away from the distribution mean

Simulation 18
Interval Estimation : Problem Statement
Based on metrics computed from a sample, compute an interval [a,b] that satisfies the following:

$$\Pr(\text{parameter} \in [a,b]) \ge p = 1 - \alpha$$
for a given probability p

Parameters that are estimated: mean, variance, quantiles, etc.

Simulation 19
Interval Estimation : Terminology
[a,b] - confidence interval
p or $1-\alpha$ - confidence coefficient
100p% - confidence level, usually 90%, 95% or 99%
$\alpha$ - significance level

$$\Pr(\text{parameter} \in [a,b]) \ge p = 1 - \alpha$$
for a given probability p
Simulation 20
Interval Estimation: Interpretation
When repeating the experiment and evaluating new sample metrics, the statement that the interval will contain the true (unknown) population mean will be true in at least 100p% of experiments
[Figure: PDF plot]

Simulation 21
Confidence interval for mean with known variance
Suppose $x_1, \dots, x_n$ are samples from a normal distribution with unknown mean $\mu$ but known standard deviation $\sigma$
From this sample, we want to determine an interval such that the distribution mean is contained in this interval with a confidence of, e.g., 95%
Hence, we are looking for values a and b such that
$$P(a \le \mu \le b) = 0.95$$

Simulation 22
Confidence interval for mean with known variance
The sample mean $\bar{X}$ is itself normally distributed with mean $\mu$ and variance $\sigma^2/n$
Hence the following is a standard normally distributed variable (Gaussian with $\mu = 0$ and $\sigma = 1$):
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Simulation 23
Confidence interval for mean with known variance
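[Figure: standard normal distribution; 95% of the probability lies between -1.96 and 1.96, with 2.5% of the probability in each tail.]
Using the 0.975 quantile of the standard normal distribution, 1.96 (a two-sided 95% interval leaves 2.5% in each tail):
$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$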
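For the known-variance case, the interval can be computed directly from the standard normal quantile. The sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library; the function name `normal_ci`, the data and the value of the known standard deviation are illustrative assumptions.

```python
import math
from statistics import NormalDist


def normal_ci(xs, sigma, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    n = len(xs)
    mean = sum(xs) / n
    # Two-sided interval: use the 1 - alpha/2 quantile (0.975 for 95%).
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


# Illustrative data; sigma = 2.0 is assumed known here.
low, high = normal_ci([9.0, 10.0, 11.0, 10.0], sigma=2.0)
z_975 = NormalDist().inv_cdf(0.975)  # about 1.96
```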
Simulation 24
Confidence interval for mean with unknown
variance
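It is very rare for a researcher who wishes to estimate the mean of a population to already know its variance.
Therefore, the construction of a confidence interval almost always involves the estimation of both $\mu$ and $\sigma$.
Whenever the standard deviation is estimated, the t-distribution rather than the standard normal distribution should be used:
$$P\left(\bar{X} - t_{n-1,1-\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,1-\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right) = 1 - \alpha$$
$t_{n-1,1-\alpha/2}$ is the $1-\alpha/2$ quantile of the t-distribution with n-1 degrees of freedom, and $\hat{\sigma}$ is the sample standard deviation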
Simulation 25
T-Distribution
Similar to the standard normal in that it is
unimodal, bell-shaped and symmetric.
The tails of the distribution are thicker than those of the
standard normal
The distribution is indexed by degrees of
freedom (df).
Simulation 26
T-Distribution
The degrees of freedom measure the amount of
information available in the data set that can be used
for estimating the population variance (df=n-1).
Area under the curve still equals 1.
Probabilities for the t-distribution with infinite df
equal those of the standard normal.
Simulation 27
Confidence interval for mean with unknown
variance
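For a 95% interval ($\alpha = 0.05$):
$$P\left(\bar{X} - t_{n-1,1-\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,1-\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right) = 0.95$$
$t_{n-1,1-\alpha/2}$ is a quantile of the t-distribution with n-1 degrees of freedom (use tables or library functions)
Compare: for a 95% interval, the 0.975 quantile of the standard normal distribution is 1.96, while that of the t-distribution with n-1 = 10 degrees of freedom is 2.23
The added uncertainty makes the confidence interval wider
For n > 30 the difference between the standard normal and t-distributions is negligible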
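A sketch of the unknown-variance interval follows. The Python standard library has no t-distribution, so this example assumes a small lookup table of 0.975 quantiles copied from a standard t table (a library such as SciPy could supply other values); the names `T_975` and `t_ci_95` and the data are hypothetical.

```python
import math

# Assumption: 0.975 quantiles of the t-distribution, keyed by degrees of
# freedom, copied from a standard t table (only a few entries shown).
T_975 = {4: 2.776, 9: 2.262, 10: 2.228, 19: 2.093, 29: 2.045}


def t_ci_95(xs):
    """Two-sided 95% confidence interval for the mean, sigma estimated."""
    n = len(xs)
    df = n - 1
    if df not in T_975:
        raise KeyError(f"no tabulated t quantile for {df} degrees of freedom")
    mean = sum(xs) / n
    # Sample standard deviation with the n - 1 divisor.
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    half_width = T_975[df] * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Illustrative data only.
low, high = t_ci_95([1.0, 2.0, 3.0, 4.0, 5.0])
```

With only five observations the multiplier is 2.776 rather than 1.96, which is the widening effect described above.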
Simulation 28
Confidence Interval for Quantile
X is a random variable with an arbitrary distribution. The population quantile of order p is the value $q_p$ such that
$$P(X \le q_p) = F(q_p) = p$$
$X_1, \dots, X_n$ are n samples from X
$X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ is the order statistic, i.e. the set of values $\{X_1, \dots, X_n\}$ sorted in increasing order
Then a confidence interval at level $100\gamma\%$ for $q_p$ is
$[X_{(j)}, X_{(k)}]$
where j and k satisfy $B_{n,p}(k-1) - B_{n,p}(j-1) \ge \gamma$, and $B_{n,p}$ is the CDF of the binomial distribution.
Simulation 29
Confidence Interval for Quantile: Proof
The true (unknown) quantile $q_p$ satisfies $P(X_i < q_p) = p$ (for a continuous distribution).
Let $Z_i = 1$ if $X_i < q_p$, 0 otherwise, and $N = \sum_{i} Z_i$, i.e. N is the number of times that $X_i$ is below $q_p$
We have the event equalities
$\{X_{(j)} < q_p\} = \{N \ge j\}$
$\{X_{(k)} \ge q_p\} = \{N \le k-1\}$
Thus
$$P(X_{(j)} < q_p \le X_{(k)}) = P(j \le N \le k-1) = B_{n,p}(k-1) - B_{n,p}(j-1)$$
Now the $Z_i$ are IID Bernoulli(p) random variables, thus N is Binomial(n, p).
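The order-statistic interval can be sketched in code. The slides do not say how to pick j and k, so the search below is an assumption: it starts near position np and widens the pair by one position on each side until the binomial coverage reaches the requested level. The names `binom_cdf` and `quantile_ci` are hypothetical, and p is assumed to lie strictly between 0 and 1.

```python
import math


def binom_cdf(m, n, p):
    """B_{n,p}(m) = P(N <= m) for N ~ Binomial(n, p)."""
    if m < 0:
        return 0.0
    m = min(m, n)
    return sum(
        math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(m + 1)
    )


def quantile_ci(samples, p, gamma=0.95):
    """Return (X_(j), X_(k), j, k) covering q_p with probability >= gamma."""
    xs = sorted(samples)
    n = len(xs)
    j = max(1, math.floor(n * p))
    k = min(n, j + 1)
    # Coverage of [X_(j), X_(k)] is B(k-1) - B(j-1); widen until it suffices.
    while binom_cdf(k - 1, n, p) - binom_cdf(j - 1, n, p) < gamma:
        if j == 1 and k == n:
            raise ValueError("sample too small for this confidence level")
        j = max(1, j - 1)
        k = min(n, k + 1)
    return xs[j - 1], xs[k - 1], j, k


# Illustrative data: 31 values, interval for the median (p = 0.5).
data = [float(i) for i in range(1, 32)]
low, high, j, k = quantile_ci(data, p=0.5, gamma=0.95)
coverage = binom_cdf(k - 1, len(data), 0.5) - binom_cdf(j - 1, len(data), 0.5)
```

Because the binomial distribution is discrete, the achieved coverage is usually somewhat above the requested level, and very small samples cannot reach it at all.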
Simulation 30
Reducing Confidence Interval Width
The width of confidence interval can be reduced
by
increasing the number of observations n,
decreasing the value of S(n).
The reduction obtained by decreasing S(n) to half
of its value is the same as the one obtained by
producing four times as many observations.
Hence, variance reduction techniques are very
important.
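The factor of four follows from the half-width of the interval. A short derivation, writing $h$ for the half-width (a symbol introduced here for illustration) and treating the t quantile as roughly constant in n:

```latex
h(n) = t_{n-1,\,1-\alpha/2}\,\frac{S(n)}{\sqrt{n}}
\qquad\Longrightarrow\qquad
\frac{S(n)/2}{\sqrt{n}} \;=\; \frac{S(n)}{\sqrt{4n}} \;=\; \frac{1}{2}\,\frac{S(n)}{\sqrt{n}}
```

Halving S(n) and quadrupling n therefore both halve the half-width, but the second option costs four times the simulation effort.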


Simulation 32
Estimators and Autocorrelation
Classic estimators require IID observations
However, many output processes are
autocorrelated and non-stationary
In the presence of correlation, the variance estimator is biased
With positive correlation, the variance is underestimated
E.g. delay times are usually positively correlated (we'll see an example later)
Simulation 33
Waiting time distribution in a queuing system:
Problem
Supermarket check-out queue: Revisit
Simple-minded approach: let $D_i$ indicate the delay in queue of customer i
Let us try to estimate the customer waiting time distribution directly from the $D_i$ of one run
Is this correct? No!

Simulation 34
Waiting time distribution in a queuing system:
Problem
Remember:
The waiting time of customer 1 is always zero
The waiting time of customer 2 depends on its arrival time and on customer 1's departure time
Hence, the $D_i$ are not identically distributed, and they are not independent either!
Using them to compute the distribution is like comparing apples and oranges

Simulation 35
Correlation in M/M/1 system
This table shows the time in system for an M/M/1 system with $\lambda = 0.5$ and $\mu = 1$ (traffic intensity $\rho = 0.5$)
Note that if the system is relatively empty, then both job A and job A+1 are likely to have small times in system
If the system is relatively full, then job A and job A+1 are likely to have large times in system

M/M/1 Example (lambda: 0.5, mu: 1)
Job | Interarrival Time | Arrives At | Service Start | Service Time | Service End | Time in System
1 | 0.71 | 0.71 | 0.71 | 0.64 | 1.35 | 0.64
2 | 0.01 | 0.72 | 1.35 | 0.20 | 1.55 | 0.83
3 | 1.12 | 1.84 | 1.84 | 0.70 | 2.54 | 0.70
4 | 0.71 | 2.55 | 2.55 | 0.11 | 2.66 | 0.11
5 | 2.46 | 5.02 | 5.02 | 0.30 | 5.32 | 0.30
6 | 0.26 | 5.28 | 5.32 | 1.19 | 6.51 | 1.23
7 | 0.59 | 5.87 | 6.51 | 0.63 | 7.14 | 1.27
8 | 1.73 | 7.60 | 7.60 | 0.72 | 8.31 | 0.72
9 | 0.28 | 7.88 | 8.31 | 0.58 | 8.89 | 1.02
10 | 0.31 | 8.19 | 8.89 | 1.59 | 10.48 | 2.30
11 | 0.06 | 8.25 | 10.48 | 0.75 | 11.24 | 2.99
12 | 0.41 | 8.65 | 11.24 | 0.13 | 11.37 | 2.72
13 | 0.15 | 8.81 | 11.37 | 0.07 | 11.45 | 2.64
14 | 0.49 | 9.30 | 11.45 | 1.15 | 12.60 | 3.30
15 | 0.41 | 9.71 | 12.60 | 1.25 | 13.85 | 4.14
Simulation 36
Correlation in M/M/1 system
Measure the correlation between successive observations (i.e. $t_1$ and $t_2$ form a pair of observations, $t_2$ and $t_3$ form another pair, and so on).
Notice how strongly positively correlated the successive observations are. (Remember that a correlation of 1 would be perfect correlation.)
M/M/1 Example: pairs of successive times in system
0.64 0.83
0.83 0.70
0.70 0.11
0.11 0.30
0.30 1.23
1.23 1.27
1.27 0.72
0.72 1.02
1.02 2.30
2.30 2.99
2.99 2.72
2.72 2.64
2.64 3.30
3.30 4.14
Cov: 1.122054
Correl: 0.893807
Simulation 37
Correlation in M/M/1 system
The observations are not IID. As a result, computing the mean and the variance of the time in system with the IID formulas and building a confidence interval for the mean from them is unreliable.
Solution:
Assume that our process is covariance-stationary (i.e. the covariance between samples doesn't change over time)
Observe that if we look not at successive observations, but rather at observations that are separated by some amount (called a lag), we find that the covariance is reduced.
Recall: Output Analysis and Variance Estimation
Simulation output analysis
Point estimator and confidence interval
Variance estimation is needed for the confidence interval
Independent and identically distributed (IID)
Suppose $X_1, \dots, X_m$ are IID
Stochastic Stationary Process
[Figures: a stationary time series with positive autocorrelation; a stationary time series with negative autocorrelation; a nonstationary time series with an upward trend]
The stochastic process X is stationary if, for all $t_1, \dots, t_k, t \in T$,
$$(X_{t_1+t}, \dots, X_{t_k+t}) \overset{d}{=} (X_{t_1}, \dots, X_{t_k})$$
where $\overset{d}{=}$ denotes equality in distribution
Stochastic Stationary Process (2)
The expected value of the variance estimator $S_n^2(X)/n$ depends on the autocorrelation:
If the $X_i$ are independent, then $S_n^2(X)/n$ is an unbiased estimator of $\mathrm{Var}(\bar{X}_n)$
If the autocorrelation is positive, then $S_n^2(X)/n$ is biased low as an estimator of $\mathrm{Var}(\bar{X}_n)$
If the autocorrelation is negative, then $S_n^2(X)/n$ is biased high as an estimator of $\mathrm{Var}(\bar{X}_n)$
$$E\left[\frac{S_n^2(X)}{n}\right] > \mathrm{Var}(\bar{X}_n) \quad \text{when negatively correlated}$$
$$E\left[\frac{S_n^2(X)}{n}\right] < \mathrm{Var}(\bar{X}_n) \quad \text{when positively correlated}$$
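The direction of the bias can be checked empirically. The sketch below is an illustration under stated assumptions rather than anything from the slides: it uses a first-order autoregressive process $X_t = \phi X_{t-1} + \varepsilon_t$ as a convenient source of correlated data (positive $\phi$ gives positive autocorrelation), starts each series at 0, and picks the run length, replication count and seed arbitrarily. It compares the average of the naive estimate $S^2/n$ with the variance of the sample mean observed across many independent runs.

```python
import random
import statistics


def ar1_series(n, phi, rng):
    """Generate n observations of X_t = phi * X_{t-1} + N(0, 1) noise."""
    x = 0.0  # assumption: start at 0, so there is a short initial transient
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def compare_variance_estimates(phi, n=200, replications=400, seed=2024):
    """Return (average naive S^2/n, empirical variance of the sample mean)."""
    rng = random.Random(seed)
    sample_means = []
    naive_estimates = []
    for _ in range(replications):
        xs = ar1_series(n, phi, rng)
        sample_means.append(statistics.fmean(xs))
        naive_estimates.append(statistics.variance(xs) / n)
    return statistics.fmean(naive_estimates), statistics.variance(sample_means)


naive_pos, actual_pos = compare_variance_estimates(phi=0.9)    # positive correlation
naive_ind, actual_ind = compare_variance_estimates(phi=0.0)    # independent data
naive_neg, actual_neg = compare_variance_estimates(phi=-0.5)   # negative correlation
```

With positive correlation the naive estimate comes out far below the observed variance of the mean, so confidence intervals built from it would be much too narrow.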
Simulation 41
Correlation in M/M/1 system :Lags
Lag of 1: pairs $(t_1,t_2), (t_2,t_3), \dots$
0.64 0.83
0.83 0.70
0.70 0.11
0.11 0.30
0.30 1.23
1.23 1.27
1.27 0.72
0.72 1.02
1.02 2.30
2.30 2.99
2.99 2.72
2.72 2.64
2.64 3.30
3.30 4.14
Cov: 1.122054
Correl: 0.893807
Lag of 2: pairs $(t_1,t_3), (t_2,t_4), \dots$
0.64 0.70
0.83 0.11
0.70 0.30
0.11 1.23
0.30 1.27
1.23 0.72
1.27 1.02
0.72 2.30
1.02 2.99
2.30 2.72
2.99 2.64
Cov: 0.424541
Correl: 0.532773
Lag of 3: pairs $(t_1,t_4), (t_2,t_5), \dots$
0.64 0.11
0.83 0.30
0.70 1.23
0.11 1.27
0.30 0.72
1.23 1.02
1.27 2.30
0.72 2.99
1.02 2.72
Cov: 0.114085
Correl: 0.320257
Simulation 42
Correlation and Lag
[Figure: correlation (vertical axis, 0 to 1.2) versus lag (horizontal axis, 0 to 10) for traffic intensities TI = 0.9 and TI = 0.5]
Observations become less correlated as the lag increases.
The busier the system is (i.e. the closer the traffic intensity is to 1), the greater the correlation


Simulation 43
Possible Solution- Multiple runs
Try to obtain estimates for the distributions of each $D_i$
However, one simulation run only gives a single sample for each $D_i$
That is not enough to estimate a distribution
Solution: use multiple runs, made independently of each other
Let $D_{j,i}$ indicate the waiting time of customer i in run j
Each run uses the same initial conditions and parameters, yet different seeds for random number generation

Simulation 44
Independence across runs
For a fixed i, the $D_{j,i}$ are independent and identically distributed random variables
The waiting time samples $D_{j,i}$ from the different runs j can be used to estimate the distribution of $D_i$
This property is called independence across runs
Let us talk about general random variables $Y_i$ and $Y_{j,i}$, respectively
For n observations per run and R runs:
$$\begin{matrix} Y_{1,1} & \dots & Y_{1,i} & \dots & Y_{1,n} \\ Y_{2,1} & \dots & Y_{2,i} & \dots & Y_{2,n} \\ \dots & \dots & \dots & \dots & \dots \\ Y_{R,1} & \dots & \dots & \dots & Y_{R,n} \end{matrix}$$
How to determine n, the number of observations per run?

Simulation 46
Two main types of simulation
Terminating (finite horizon) simulation:
Specific end of the simulation, a terminating event
Non-terminating simulation:
No natural event to specify the length of a run
The simulation could run forever; when to terminate depends on the
accuracy requirements

Simulation 47
Example of Terminating Systems
Terminating time known in advance
Bank one day operation simulation
Opens 8.30 am, closes 4.30 pm
Initial conditions: 0 customers, 8 out of 11 tellers working
Simulation time 480 minutes
Terminating time not known in advance
Communication system with A, B, C, D components
Simulation stops when the system fails {A fails or D fails, or (B and C fail)}
One possible performance measure: the mean time to system failure, $E(T_E)$
[Figure: block diagram of the components A, B, C, D]
Simulation 48
Output Analysis for Terminating Simulations
Simulation runs over a time interval $[0, T_E]$
Output observations: $\{Y_1, Y_2, \dots, Y_n\}$
We want to estimate
$$\theta = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)$$
Method of independent replications
Repeat the simulation R times, with random initial conditions and independent random number streams from run to run
$Y_{ri}$ = the i-th observation within replication r
$Y_{ri}$, $Y_{rj}$ may be correlated, but $Y_{ri}$, $Y_{si}$ are independent for all i and $r \ne s$
The sample mean of replication r can be defined as:
$$\hat{\theta}_r = \frac{1}{n_r}\sum_{i=1}^{n_r} Y_{ri}, \quad r = 1, \dots, R$$
Simulation 49
Terminating simulations: confidence interval
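The variance estimator is calculated as
$$\hat{\sigma}^2(\hat{\theta}) = \frac{1}{(R-1)R}\sum_{r=1}^{R}\left(\hat{\theta}_r - \hat{\theta}\right)^2, \qquad \hat{\theta} = \frac{1}{R}\sum_{r=1}^{R}\hat{\theta}_r$$
where $\hat{\theta}_r$ is the sample mean of replication r
A $100(1-\alpha)\%$ confidence interval is calculated as
$$\hat{\theta} - t_{\alpha/2,R-1}\,\hat{\sigma}(\hat{\theta}) \le \theta \le \hat{\theta} + t_{\alpha/2,R-1}\,\hat{\sigma}(\hat{\theta})$$
where $t_{\alpha/2,R-1}$ is the upper $\alpha/2$ critical value (the $1-\alpha/2$ quantile) of the t-distribution with R-1 degrees of freedom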