
Simulation in Excel:

Tricks, Trials & Trends


May 2012 (working draft)

Dennis Sweitzer, Ph.D.


www.Dennis-Sweitzer.com
Presented to the
Delaware Chapter of the ASA
American College of Radiology
AstraZeneca Biostatistics Department

Abstract
Simulation in Excel: Tricks, Trials & Trends
Excel is a general purpose spreadsheet which is widely used & understood, but rarely used by itself for
simulations. However, the Data Table function in MS Excel can be used to execute substantial
simulations, without requiring cumbersome programming "tricks" or VBA coding. The result is an
arbitrarily large results table in which each row is one iteration of the simulation, and each column is a
random variable generated in the simulation.
A small number of additional probability functions are easily programmed using VBA to make Excel a
general purpose simulation package. Because VBA is interpreted, use of VBA functions can greatly limit
the speed of a simulation. However, for simulations of small size and complexity, the ease and familiarity
of working in Excel outweigh the disadvantage of slower execution. Examples from clinical trials will be used.
Finally, I discuss new methods to move simulations out of the black boxes and into the enterprise, based
on work by Sam Savage. Simulation results (a SIP, or Stochastic Information Packet) from multiple
platforms can be stored as XML strings (using the DIST standard) in a SLURP (Stochastic Library Unit
with Relationships Preserved), and from there used for reports, planning, etc., or incorporated into other
simulations.

Outline
How to do Simulation in Excel
Simulation Sample Spreadsheet
Some Macros and VBA functions
Clinical Trial Examples
Notes on using Inverse Probability Functions
Validation, Verification, Sensitivity
Probability Management in SIPS, SLURPS, & DIST

Background
Occasional need for simulations
Excel is convenient, but
does not explicitly support simulations
Simulation usually requires VBA programming
(so why not use R or SAS instead)

Or commercial add-in programs (e.g., @Risk)


Or some academic add-ins

Does have iterative calculations, Solver


Why not simulation?

Simulate what?
Stochastic Models
Unknown parameters? Guesstimate a distribution
Optimizing choices? Test each with simulations

Sensitivity Analysis
Variations in Inputs
Variations in Outputs
2 parameters: use a table
>2 parameters: simulate & compare variation

Excel: Pros
Common Language / Common Tools
Most people understand Excel
MEGO
Many tools available in Excel
Transparency: Modeling assumptions can be:
Specified -- Graphed -- Debated
What you see is what you get!
More hands on deck, more eyes on the prize:
Statistician
Team Member
Initial Model
Explores & breaks model
Repair & enhance
Repeat until satisfied

Excel Cons
Slower than in SAS, S+, R, etc
Lacks some statistical/probability functions
Latest versions are a little better
Still need to add some VBA code
Known bugs in statistical routines (often fixed)
Tradeoffs:
Quicker modifications
vs slower execution

Simple Solution: Data Tables


Excel Data Tables
Creates a table of values of a function

Each column is a Random Variable

Leftmost column is used as an argument


(unneeded for simulation)

Data Table repeats calculations for each row

Each row is a simulation iteration
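
Outside Excel, the same row-per-iteration layout can be sketched in a few lines of Python. This is only an illustration of the idea, not the Data Table mechanism itself; the names `one_iteration` and `run_table`, the columns `u` and `x`, and the formula for `x` are made up for the example.

```python
import random

def one_iteration(rng):
    """Compute one row: every entry is a random variable of the simulation."""
    u = rng.random()      # plays the role of Excel's RAND()
    x = 10 + 5 * u        # any calculation that depends on u (hypothetical)
    return {"u": u, "x": x}

def run_table(n_rows, seed=0):
    """Repeat the same calculation once per row, as the Data Table does."""
    rng = random.Random(seed)
    return [one_iteration(rng) for _ in range(n_rows)]

table = run_table(100)    # 100 rows = 100 simulation iterations
```

Each dictionary corresponds to one row of the results table, and each key to one column.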

1. Create Simulation

Create Random Variables using Inverse Probability Method:


For a random variable X with distribution function F(x),
$F: \mathbb{R} \to [0,1]$
If $U \sim \text{Uniform}[0,1]$, then
$X = F^{-1}(U)$
(Excel: U = Rand())
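
As a minimal sketch of the inverse probability method, the Python below draws exponential random variables by applying the inverse distribution function to a uniform draw. The exponential distribution and the rate of 2.0 are chosen only for illustration; the function name `inv_exponential` is hypothetical.

```python
import math
import random

def inv_exponential(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x), for u in [0, 1)."""
    return -math.log(1.0 - u) / rate

rng = random.Random(1)
# X = F^-1(U), with U uniform on [0, 1)
draws = [inv_exponential(rng.random(), rate=2.0) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)   # should be close to 1 / rate
```

In a spreadsheet the same step would be a single cell formula applying an inverse function to Rand().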

2. Align Random Variables


Calculations can be
anywhere in
Spreadsheet
Reference the
Variables in a row
It is best to label
variables in the same way

3. Select Data Table


Select table region
1st row is Rand Vars
1st column is not used
(can label iterations)

From toolbar:
Data>Data Table

4. Create Simulation Table


Column input cell = upper left-hand corner of table
Row input cell = ignore
OK populates the table
(may have to manually recalculate)

5. Execute Simulation
Iterative development
Simulation can be changed
Add reporting variables
Recalculate to rerun
(no need to use Data Table
again, unless expanding)
Hint: debug with short table,
expand for final run

The End
(of the key concepts)

Spreadsheet limitations
Only simple data structures are available
Rows & columns, no lists & trees
Discrete event simulations

Complex algorithms: difficult


Eg, While or for loops
Can improvise (cumbersome, slow, buggy)

Speed: slow
Data Storage: what-you-see-is-all-you-get

Tools: Excel Simulation Template


Adds some missing random functions
Adds some set-up macros

Excel template & examples at:


www.Dennis-Sweitzer.com

Macro SimulationSampler
To start a new simulation when you don't
remember the names & parameters of
common random variables used in simulation:
Run the Macro SimulationSampler
Copy, delete, and edit as needed.
Make sure all random values are referenced
in the first row of the data table at the
bottom.

Macro SimulationSampler
Creates a simulation with
each of common
simulation functions

Macro SimulationSampler

Sets up header
row for data
table
Sets up a place
for statistics

Macro Simulate
Highlight the row of random variables
(1st row of simulation table)

Run macro "Simulate"


It will prompt for the number of
simulation iterations.
The default number of iterations is 100
Debug & develop (manually recalculate)
Final run with >1000 iterations
Visual Basic code is computationally intensive,

Macro Simulate

Excel Random Variables

Rand() --Random Uniform [0,1]


NormSInv() -- Inverse Standard Normal Distribution
CritBinom() -- Inverse Binomial Distribution (BINOM.INV() in Excel 2010 and later)
LogInv() -- Inverse Log Normal Distribution (LOGNORM.INV() in Excel 2010 and later)
Caveat: parameters are the mean and SD after the log transformation
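
The caveat matters when the mean and SD are known on the original scale. A minimal sketch of the standard conversion to log-scale parameters follows; the function name `lognormal_log_params` and the values mean = 10, SD = 3 are hypothetical.

```python
import math
import random

def lognormal_log_params(mean, sd):
    """Convert the mean and SD of X into the mean and SD of ln(X)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_log_params(mean=10.0, sd=3.0)

rng = random.Random(2)
draws = [rng.lognormvariate(mu, sigma) for _ in range(50000)]
sample_mean = sum(draws) / len(draws)   # should be close to the original-scale mean
```

Passing 10 and 3 directly as the log-scale parameters would instead describe a very different distribution.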

Erlang Distribution

How long do you wait until you get a


predetermined number of arrivals?
Interarrival times are distributed IID
exponential
Erlang is Gamma with integer parameter

Beta Distribution

Can use as
Distribution of a Binomial probability
Range = [0,1]
Generic bounded hump (vs Normal as generic unbounded hump)
Better behaved than a triangular distribution

Example #2, Problem

Client: Here's our plan.


Simple spreadsheet calculation
Gives only the expected value,
not the variability

Example #2, Simulation


Time to 100th
patient
Patients arrive
IID Exponential
Summary Statistics of Simulated values
(below)
Interpretation: under the assumptions,
90% of simulations required more than 4.4
months
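
A rough Python sketch of this kind of calculation is shown below. The arrival rate of 20 patients per month is an assumption made for illustration (the slide text does not state the rate), and the helper names `time_to_nth_patient` and `percentile` are hypothetical.

```python
import random

def time_to_nth_patient(n, rate_per_month, rng):
    """Sum of n IID exponential interarrival times (an Erlang variable)."""
    return sum(rng.expovariate(rate_per_month) for _ in range(n))

def percentile(values, p):
    """Simple empirical percentile on sorted values, p in [0, 1]."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(p * len(ordered)))
    return ordered[index]

rng = random.Random(3)
# Assumed rate: 20 patients/month (hypothetical, not from the slides)
times = [time_to_nth_patient(100, 20.0, rng) for _ in range(2000)]
mean_time = sum(times) / len(times)
p10 = percentile(times, 0.10)   # 90% of simulated trials take longer than this
```

The 10th percentile is the quantity behind a statement such as "90% of simulations required more than ... months".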

Added VBA Functions


Inverse Functions Needed for Simulation
Poisson, Negative Binomial
Interpolation from Table
Interpolate: 1 or 2 dimensional interpolation
Convenience
Beta with Mean, SD as parameters
Beta with Hi, Low, and Mode used for
parameters
Log Normal with mean, SD as parameters

Missing Statistical Functions


Inverse Distributions
InvPoisson :: Poisson
InvPascal :: Negative Binomial
(how many failures before k successes)
Negative Binomial allows a real-valued (non-integer) number of successes k;
the integer-k version is often called the Pascal distribution
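
For discrete distributions, an inverse function can be built by accumulating probabilities until the cumulative total reaches the uniform draw. The Python below is a sketch of that idea only; it is not the code behind InvPoisson or InvPascal, and the names `inv_poisson` and `inv_pascal` are hypothetical. It assumes moderate parameter values (for very large `lam`, `exp(-lam)` underflows and this approach fails).

```python
import math

def inv_poisson(u, lam):
    """Smallest k with P(Poisson(lam) <= k) >= u."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
        if pmf == 0.0:    # guard against rounding when u is extremely close to 1
            break
    return k

def inv_pascal(u, k, p):
    """Smallest j with P(failures before k-th success <= j) >= u."""
    j = 0
    pmf = p ** k
    cdf = pmf
    while cdf < u:
        pmf *= (j + k) / (j + 1) * (1.0 - p)   # ratio pmf(j+1) / pmf(j)
        j += 1
        cdf += pmf
        if pmf == 0.0:    # same rounding guard as above
            break
    return j
```

Feeding either function a uniform random number gives one simulated count.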

Example #3: Patients to Screen
Expected Enrollment rate
= 75% ± 5%

~ Beta Distribution
# Screen Failures
~ Negative Binomial (Pascal)
Depends on Enrollment
Rate
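
A sketch of this two-stage simulation in Python follows. It reads the 75% and 5% on the slide as the mean and SD of the enrollment rate and uses a target of 100 enrolled patients; both readings are assumptions, and the function names are hypothetical. The Beta parameters come from the usual method-of-moments formulas.

```python
import random

def beta_params_from_mean_sd(mean, sd):
    """Method-of-moments Beta(a, b) parameters for a given mean and SD."""
    common = mean * (1.0 - mean) / sd ** 2 - 1.0
    return mean * common, (1.0 - mean) * common

def screen_failures(n_enrolled, rate, rng):
    """Count failures before n_enrolled successes (a Pascal variable)."""
    failures = 0
    successes = 0
    while successes < n_enrolled:
        if rng.random() < rate:
            successes += 1
        else:
            failures += 1
    return failures

a, b = beta_params_from_mean_sd(0.75, 0.05)   # assumed mean and SD

rng = random.Random(4)
results = []
for _ in range(2000):
    rate = rng.betavariate(a, b)              # uncertain enrollment rate
    results.append(screen_failures(100, rate, rng))   # assumed target: 100 enrolled
mean_failures = sum(results) / len(results)
```

Because the rate is itself random, the spread of screen failures is wider than a fixed 75% rate would give.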

Beta Distribution (2)

For
Convenience
Beta distribution given Mean, SD
Beta distribution given Mean, SD, upper, lower bounds
Beta distribution given Mode, Upper, Lower bounds

Simulation from a Table

Find the value in the 1st vector;


Return interpolated value from 2nd
Simulate arbitrary distribution:
Top Row: values in [0,1]
Bottom Row: Quantiles
Result: interpolated value of U from table
Or a function: y=f(x)
X is found in top row, y is interpolated from bottom row
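
The lookup-and-interpolate step can be sketched as below. This is a generic linear interpolation in Python, not the deck's Interpolate function; the table values are invented for illustration.

```python
import bisect
import random

def interpolate(x, xs, ys):
    """Linear interpolation of y at x; xs must be sorted ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)          # xs[i-1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical table: top row = cumulative probabilities, bottom row = quantiles
probs = [0.0, 0.25, 0.5, 0.75, 1.0]
quantiles = [0.0, 1.0, 2.0, 4.0, 10.0]

rng = random.Random(5)
draws = [interpolate(rng.random(), probs, quantiles) for _ in range(1000)]
```

With probabilities in the top row and quantiles in the bottom row, interpolating at a uniform draw simulates the tabulated distribution.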

Table Simulation Uses


Polygonal distributions (like Triangular)
Survival curve (for time to event)
Est. K-M curve from data, simulate rest of trial

Arbitrary empirical distributions


Distribution from observations
Table of power calculations
eg, assurance calculations:
If # patients is random, so is effective power of the study
If True effect size is random, so is Pr{success}

Simulation from a 2-dimensional table

Here:
Rows are quartiles of a random function
Left column is value of a parameter
A family of distributions which vary with the parameter

Parameter y=75% (can be random)


Generate random numbers from the interpolated distribution.

Example #4: Interim Review


After 2 months, review randomization rates
Continue to Randomize to 100 patients
How long?

Example #4: Interim Review (Simulation)


Y= # Patients at 2 mos
~ Poisson
Time to Randomize
(100-Y) additional pts
~ Erlang (Gamma)
80% CI: (2.5, 3.7) months
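
The structure of this simulation can be sketched in Python as follows. The randomization rate of 20 patients per month is an assumption (the slide text does not give it), so the resulting interval is not expected to reproduce the figures above; the function names are hypothetical.

```python
import random

def patients_in_window(rate, months, rng):
    """Count exponential arrivals in the window; the count is Poisson(rate * months)."""
    count = 0
    t = rng.expovariate(rate)
    while t <= months:
        count += 1
        t += rng.expovariate(rate)
    return count

def remaining_time(target, rate, months, rng):
    """Time after the interim review needed to reach the target."""
    y = patients_in_window(rate, months, rng)
    remaining = target - y
    if remaining <= 0:
        return 0.0
    # Erlang = Gamma with integer shape; scale is 1 / rate
    return rng.gammavariate(remaining, 1.0 / rate)

rng = random.Random(5)
# Assumed rate: 20 patients/month (hypothetical)
sims = sorted(remaining_time(100, 20.0, 2.0, rng) for _ in range(4000))
lo = sims[int(0.10 * len(sims))]
hi = sims[int(0.90 * len(sims))]
mean_remaining = sum(sims) / len(sims)
```

The 10th and 90th percentiles of the simulated times give the 80% interval.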

Clinical Trials Applications


Simulations for planning
Prototyping larger simulation
Checking assumptions/validation

Planning
Expected Trial Performance
Usually not of interest -- already done w/o simulation
But should be
Variability of Trial Performance
Important for Risk Management: What's the earliest,
the latest, the most, the least, etc.
80% CIs
Structural Problems
Interactions of parameters may doom the trial before it
even starts! (eg, mean (max{ X, Y} ) vs max{ mean(X), mean(Y) } )

The Flaw of Averages!
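
A small numerical illustration of mean(max{X, Y}) versus max{mean(X), mean(Y)}, assuming two independent normal durations with mean 10 and SD 2 (values invented for the example):

```python
import random

rng = random.Random(6)
n = 20000
# Two hypothetical task durations, each Normal(mean=10, sd=2)
xs = [rng.gauss(10.0, 2.0) for _ in range(n)]
ys = [rng.gauss(10.0, 2.0) for _ in range(n)]

# The trial finishes when the slower of the two tasks finishes
mean_of_max = sum(max(x, y) for x, y in zip(xs, ys)) / n
max_of_means = max(sum(xs) / n, sum(ys) / n)
```

Here `mean_of_max` comes out noticeably above `max_of_means`, so a plan built from the averages alone understates the expected finish time.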

Prototyping
Prototyping:
Toy simulation with hands-on teamwork
Development model
Get team buy-in on assumptions
Processing speed not important
Rapid modifications are important
Ideal?
Develop a prototype in a 1-hour meeting
Check for errors later
Run large simulations later for precise estimates

Checking planning assumptions


H0 = Simulation assumptions
Observed: a value X
{xi} = corresponding values in simulation
Rank of X in {xi} -> p-value
Stored Values: Use Function Percent Rank
Descriptive Statistics: Use Frequency Count
Use to:
Test assumptions, validate model, +??
If an observed value of X is rare in the simulation,
question assumptions!

Checking Assumptions
Example:
A trial is designed based on a non-trivial simulation.
The model predicts a completion rate of 65%
with 95% C.I.= (55%, 75%)
4 months into the trial, a 50% completion rate is observed.
How significant is this discrepancy?
Resimulate:
{xi} = simulated completion rates (1/iteration)
%tile Rank of observed 50% in simulated {xi} -> p-value
How likely is the observation, under the modeled
assumptions?
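
A sketch of the rank calculation in Python: the simulated completion rates are faked here with a normal distribution (mean 65%, SD about 5.1%, roughly matching the stated interval), purely as a stand-in for real simulation output. The function `percent_rank` is a simple fraction-at-or-below and is not identical to Excel's Percent Rank function.

```python
import random

def percent_rank(observed, simulated):
    """Fraction of simulated values at or below the observed value."""
    return sum(1 for x in simulated if x <= observed) / len(simulated)

rng = random.Random(7)
# Stand-in for simulation output (assumption: approx. Normal(0.65, 0.051))
simulated_rates = [rng.gauss(0.65, 0.051) for _ in range(10000)]

p_low = percent_rank(0.50, simulated_rates)   # one-sided p-value for the observed 50%
```

A very small rank says the observed rate would be rare if the modeled assumptions held.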

Checking Assumptions in Literature


Idea: Are assumptions consistent with
observations from other sources?
Build the simulation to also simulate studies from
literature
[Flowchart]
Common Assumptions & Code
Code & Assumptions for Study -> Compare vs. Internal Assumptions
More Code & Assumptions to match literature -> Compare vs. literature

Sensitivity Analysis
What-ifs
Interactions between parameters
Identify Key Control points!
Vary parameters between simulations
Compare simulation results
Eg, average, worst-case scenarios

Correlations between simulated parameters and outcomes
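
One way to look at such correlations is sketched below in Python. The toy model (duration = 100 / (rate x (1 - dropout))), the parameter ranges, and all names are hypothetical, chosen only to show the mechanics.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(8)
rates = [rng.uniform(15, 25) for _ in range(5000)]        # hypothetical input
dropouts = [rng.uniform(0.1, 0.3) for _ in range(5000)]   # hypothetical input
durations = [100 / (r * (1 - d)) for r, d in zip(rates, dropouts)]

corr_rate = correlation(rates, durations)
corr_dropout = correlation(dropouts, durations)
```

The input whose correlation with the outcome is largest in absolute value is the stronger candidate for a key control point.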

Macro Management
VBA Editor:
Alt-F11 (or find the menu)
Copy Module between sheets
Copy code from .xls sheet &
insert into VBA editor
Open & save as new sheet

Macro Management (newer)


In Visual Basic
From the
Tool Bar
File > Export File
Export VBA code
(module: SweitzerSimulationCoreCode)

File > Import File


Imports VBA code (into a module)

Further resources
Commercial and Free software packages
Provide:
More rigorous algorithms
More functions
Resampling, multivariate, etc

More support

Commercial Add-Ins
@RISK
www.palisade.com
Crystal Ball
www.decisioneering.com

Free Add-Ins
PopTools
(Windows only)
www.cse.csiro.au/poptools
SimTools.xla (Macintosh & Windows)
http://home.uchicago.edu/~rmyerson/addins.htm
Caveat: Licensing
Free for non-commercial (eg, education)
Not clear for other uses
(NB: VBA code from my website is free for all use,
but not as useful)

Semi-Commercial
Low-cost Excel simulation add-in:
RiskSim by Michael Middleton
www.treeplan.com/
Also: Decision Trees, Sensitivity Analysis,
on-line text-book:
http://www.treeplan.com/chapters.htm

Additional Reading
Introduction to Modeling and Generating Probabilistic Input Processes for Simulation
www.informs-sim.org/wsc07papers/008.pdf
Spreadsheet Simulation (Seila, 2006)
www.informs-sim.org/wsc06papers/002.pdf
Work Smarter, Not Harder: Guidelines for
Designing Simulation Experiments
www.informs-sim.org/wsc06papers/005.pdf
Tips for the Successful Practice of Simulation
www.informs-sim.org/wsc06papers/007.pdf

The End
(Actual, not simulated)
