
Lab 2: Using Stata to Do Monte Carlo Experiments

In this lab, we will:
- Learn how to generate uniformly distributed random numbers.
- Learn how to generate some discrete random variables.
- Illustrate the law of large numbers.
- Introduce the Monte Carlo method of studying probability distributions.
- Apply the Monte Carlo method to the binomial probability distribution.

We will also introduce the following Stata programming techniques and skills:
- Functions.
- Conditional expressions.
- The assignment operator, "=".
- Scalar variables.
- Using "do" files.

The techniques we will learn in this lab are very useful for illustrating many concepts of probability and statistics. In addition, we illustrate the basic concept and practice of the so-called Monte Carlo method of analysis or experimentation.

Generating uniformly distributed random numbers.

A continuously distributed random variable that is equally likely to take any value between zero and one has a standard uniform probability distribution. Such a variable can be created in Stata with the uniform() function. Generate 10,000 draws from a standard uniform distribution and inspect the results in the browser.

    set obs 10000
    gen u=uniform()
    browse

The last command can also be executed by clicking on the browser icon.

Stata's uniform random number generator returns a number between zero and one, exclusive of one itself.

Functions.

uniform is the name of a function in Stata. Function names in Stata must always be followed by an open parenthesis with no intervening spaces. Why no intervening spaces? Because otherwise Stata will think the name is the name of a variable, and not a function. The pair of parentheses surrounds the argument or arguments of the function. In this case the uniform function has no arguments, but the parentheses are needed anyway. If a function has more than one argument, then the arguments are separated by commas.

Graph the uniformly distributed random variable.

Graph the random variable using the menu/dialogue window:

    Graphics > Histogram
    Main: Variable: u

or by issuing the following command:

    histogram u

For continuous variables, Stata has some internal rule for deciding how many bars (or bins) to use in constructing the graph. Notice that the density of the random variable is essentially constant throughout its range, which is why the distribution has the name "uniform".

Generating a discrete random variable: Example simulating the rolls of a die.

The uniform random number generator is a building block for creating virtually any random variable. We will illustrate this by using it to simulate rolling a die. The following commands simulate ten rolls of a die. First, however, the number of observations in Stata must be set to ten, and in order to do this the memory must be "cleared".

    clear
    set obs 10
    gen x=int(6*uniform())+1
    browse

Here is what the generate command does. First, uniform() returns a uniformly distributed random number between 0 and 1, not including 1 itself. "6*uniform()" therefore returns a random number between 0 and 6, not including 6 itself. This becomes the argument to the int() function, which truncates the fractional part from the number, returning an integer between 0 and 5 inclusive ("int" is short for integer). For example, int(5.9) becomes 5. Finally, 1 is added to this, giving integers between 1 and 6 inclusive. Since uniform() results in numbers uniformly distributed between zero and one, the six final integers assigned to the variable x also have equal probability.

Graph the uniformly distributed random variable.

Graph the random variable using the menu/dialogue window:

    Graphics > Histogram
    Main: select Discrete data, Fraction; Variable: x
    Y Axis: Reference lines: .166667
    X Axis: Major tick/label properties: Suggest # of ticks: 6
    Title: Title: n=10
    Overall: Name of graph: g1; check: Replace

or by issuing the following command:

    histogram x, discrete fraction yline(.166667) xlabel(1/6) title(n=10) name(g1, replace)

The horizontal grid line is drawn at 1/6th, the probability of any particular side of the die facing up.
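The steps inside the generate command can be made visible by storing each intermediate result in its own variable. The following is a minimal sketch, not part of the original lab: the variable names scaled and truncated and the seed value are illustrative assumptions, and in newer Stata releases the documented name of uniform() is runiform().

```stata
* Sketch: break x = int(6*uniform()) + 1 into its intermediate steps.
* The names scaled and truncated, and the seed, are illustrative assumptions.
clear
* arbitrary seed, used only so that the run is repeatable
set seed 12345
set obs 10
* uniform draw in [0, 1); double keeps the full precision of the draw
gen double u = uniform()
* uniform draw in [0, 6)
gen double scaled = 6*u
* integer 0, 1, ..., 5
gen truncated = int(scaled)
* die face 1, 2, ..., 6
gen x = truncated + 1
list u scaled truncated x
* frequency of each face in the 10 rolls
tabulate x
```

Reading across one row of the listing shows how a single uniform draw is turned into a die face.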

The law of large numbers and the frequentist notion of probability

The limit in the frequentist notion of probability is the law of large numbers: as the sample size (or number of trials) increases towards infinity, the sample proportion favorable to an event approaches ("settles down to") the probability of the event. We will illustrate this by increasing the number of rolls of the die, and noticing that the sample distribution of outcomes settles down to the theoretical discrete uniform distribution of 1/6th probability for each side of the die.

In order to do this, repeat the commands above, each time changing the number of observations to be 50, then 200, then 10,000. Also, change the name for each graph as indicated below. The easiest way to do this is to single-click on each command in the Review window, and then edit it in the Command window as necessary.

    clear
    set obs 50
    gen x=int(6*uniform())+1
    histogram x, discrete fraction yline(.166667) xlabel(1/6) title(n=50) name(g2, replace)

    clear
    set obs 200
    gen x=int(6*uniform())+1
    histogram x, discrete fraction yline(.166667) xlabel(1/6) title(n=200) name(g3, replace)

    clear
    set obs 10000
    gen x=int(6*uniform())+1
    histogram x, discrete fraction yline(.166667) xlabel(1/6) title(n=10000) name(g4, replace)

Finally, view all the graphs together with the following command:

    graph combine g1 g2 g3 g4, title(Law of Large Numbers)
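Another way to watch the sample proportion settle down is to track it roll by roll within a single long run. The sketch below is an addition to the lab, offered under stated assumptions: the variable names n, six, and prop6, the seed, and the choice of the face 6 are all illustrative.

```stata
* Sketch: running proportion of sixes as the number of rolls grows.
* Variable names, seed, and the choice of face 6 are illustrative assumptions.
clear
set seed 12345
set obs 10000
gen x = int(6*uniform()) + 1
* roll counter 1, 2, ..., 10000
gen n = _n
* indicator: 1 if the roll is a six, 0 otherwise
gen six = (x == 6)
* sum() inside generate is a running (cumulative) sum
gen prop6 = sum(six)/n
* log scale on the horizontal axis makes the early wandering visible
line prop6 n, yline(.166667) xscale(log) title(Running proportion of sixes)
```

The line should wander widely over the first few dozen rolls and then hug the reference line at 1/6 as n grows.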

Monte Carlo Estimation of a Probability Distribution

Over the last few decades, the use of the computer to study the probability distribution of a random variable has become commonplace. The technique is powerful when the theory or mathematics of the random process is too difficult to understand or to derive. If one simply knows how the data of a random process are generated (the data generation process, or DGP), then one can use a computer to create a large sample drawn from the unknown distribution. The law of large numbers can then be applied to estimate virtually any aspect of the distribution. We will illustrate this by estimating the binomial probability distribution when $n = 10$ and $\pi = .2$. We know what the actual probability distribution is, including its mathematical representation, but pretend that we know nothing more than the assumptions of how the binomial process generates data: that n independent trials each have a 1/5th probability of success.
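For reference, the mathematical representation that we are pretending not to know is the standard binomial probability function. It is stated here as a reminder, not taken from the lab itself, writing $S$ for the number of successes and $\pi$ for the probability of success in each trial:

```latex
P(S = s) = \binom{n}{s}\,\pi^{s}(1-\pi)^{n-s}, \qquad s = 0, 1, \dots, n,
\qquad\text{so that with } n = 10,\ \pi = .2:\qquad
P(S = 0) = (0.8)^{10} \approx 0.107 .
```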

In order to provide a concrete context for this illustration, let's assume that you want to know the probability distribution of the number of patients "cured" in a drug trial of 10 treated patients, where the probability of any one patient being "cured" by the drug is 20 percent. You might be interested in knowing such things as: How many patients would you expect to be cured in this drug trial? What is the most likely number of patients to be cured? What is the chance that none, one, or any given number of patients in a trial are cured?

Don't get the word "trial" in the phrase "drug trial" mixed up with the word "trial" in the phrase "the number of trials in a binomial experiment". Here each drug trial consists of ten binomial trials.

Each drug trial of 10 patients constitutes a single draw from this binomial distribution. In order to use the law of large numbers we will use 10,000 draws, representing 10,000 independent drug trials each with 10 patients. Begin with the following commands:

    clear
    scalar p=.2
    set obs 10000
    gen x1=uniform()<p
    browse

Scalar variables, conditional expressions, the assignment operator.

The second statement introduces a new type of variable in Stata, called a "scalar" variable. The variables we have been using are really vector variables, because they consist of a whole column of values. For example, x1, which you can see in the browser, is a vector variable. A scalar variable, on the other hand, contains only one single value, in this case, the value .2. Notice that the names of scalar variables do not appear in the Variables window; nor do their values appear in the Browser. To see what scalar variables are in Stata's memory, and what values they hold, use the following command:

    scalar dir

If you know the name of a scalar variable, you can also inspect its contents with the display command:

    display p
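As a small illustration of how a scalar behaves like an ordinary number, consider the sketch below. It is an addition to the lab: the scalar name q and its value .4 are hypothetical, chosen so that the scalar p defined earlier is left untouched.

```stata
* Sketch: a scalar holds one value and can be used wherever a number can.
* The scalar name q and the value .4 are hypothetical, not part of the lab.
scalar q = .4
* show the stored value
display q
* use the scalar inside arithmetic expressions
display 10*q
display sqrt(10*q*(1-q))
* q now appears in the list of scalars alongside p
scalar dir
```

Note that if the dataset contained a variable whose name could be abbreviated to q, Stata would use that variable instead of the scalar, so distinct names are the safer choice.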

You can use a scalar variable anywhere in a generate command that you would normally type a number. For example, the generate command you typed above is equivalent to having typed "gen x1=uniform()<.2". You will see the power in using scalar variables at the end of this lab. Here, of course, p represents $\pi$, the probability that a patient selected at random will be cured.

What is the generate command doing? The part "uniform()<p" is a conditional expression. Conditional expressions evaluate to the number 1 (one) if true and 0 (zero) if false. In this case, for each of 10,000 observations, Stata compares the random draw from the standard uniform distribution to p (the number .2). If the random draw is less than p, the conditional expression evaluates to 1; if the random draw is greater than or equal to p, the conditional expression evaluates to 0. The result, 0 or 1, is then assigned to the variable x1 for that observation.

This assignment is indicated by the assignment operator, "=", in the statement "gen x1=uniform()<p". Note that this "=" symbol has a different meaning in computer programming than in algebra. In algebra, it asserts that both sides of the equation have the same value (both sides are equal). In computer programming, it means to take the value of what is on its right and assign it to what is on its left. For example, in computer programming, the statement "x=x+1" means to increment the variable x by 1, but in algebra it is a false statement.

When x1 takes the value 0, that represents a patient who is not cured; when x1 takes the value 1, that represents a patient who is cured. The interpretation of the variable x1 is that its 10,000 observations represent the outcomes for the first patient in each of the 10,000 drug trials. These, of course, are different people.

Using "do" files in Stata

Using "do" files is sometimes a convenient way to do work in Stata. The following will illustrate a typical use of do files.

(1) In the Review window click on the header _rc. This will separate the commands with errors from the rest of the commands.
(2) Select the four lines (commands) beginning with "clear" and ending with "gen x1=uniform()<p".
(3) Right click on them and choose "Send to Do-file Editor".
(4) This is a simple text editor. You will now create the generate commands for the remaining 9 patients in each trial. Select the generate command and copy it nine times, so there are 10 identical generate commands.
(5) We will call the other patients x2, x3, etc. Edit the generate commands appropriately. Only the number immediately following x in each command has to be changed. You should now have 13 lines (commands) in the "do" file, with the last 10 being the generate commands.
(6) As the last command in the do file, type the following command, which creates a new variable s that is the sum of x1 through x10:

    gen s=x1+x2+x3+x4+x5+x6+x7+x8+x9+x10

(7) Click the "Save" icon on the toolbar of the do-file editor. A "Save File" type window will open. Point it to your flash drive and type the name "monte" in the file name box. Then click the "Save" button.
(8) Click the "Do" icon in the Do-file Editor's tool bar. Stata will attempt to execute each command in the do file, as if you had typed each in the command window.
(9) If there is an error (red type in the Results window), restore the "Stata Do-file Editor" window, fix the command(s) that caused the error, and redo steps (7)-(8).
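Once the steps above are finished, the contents of the do-file should look roughly like the following sketch. It is assembled from the commands given in this lab; the file name monte.do assumes the name chosen in step (7) with Stata's default .do extension, and the comment lines are additions.

```stata
* monte.do: 10,000 simulated drug trials of 10 patients each
clear
* probability that any one patient is cured
scalar p=.2
* one observation (row) per drug trial
set obs 10000
* one 0/1 outcome variable per patient
gen x1=uniform()<p
gen x2=uniform()<p
gen x3=uniform()<p
gen x4=uniform()<p
gen x5=uniform()<p
gen x6=uniform()<p
gen x7=uniform()<p
gen x8=uniform()<p
gen x9=uniform()<p
gen x10=uniform()<p
* number of cured patients in each trial
gen s=x1+x2+x3+x4+x5+x6+x7+x8+x9+x10
```

Each generate command calls uniform() afresh, so the ten patient outcomes within a row are separate random draws.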

Inspect and graph the Monte Carlo estimate of the binomial probability distribution

Let's look at the Monte Carlo estimate of the probability distribution. Go into Stata's browser and look at the first row. This represents the first drug trial of 10 patients. Which patients had successful outcomes? Which did not? What does the variable "s" represent? It represents the number of patients in this trial with successful outcomes, i.e., the number of successes in 10 binomial trials. Verify this interpretation with the next drug trial or two. The variable "s" is therefore the variable of interest. It is a random variable with a binomial distribution, i.e., $S \sim b(n = 10, \pi = .2)$. Let's graph its estimated probability distribution using the menu system:

    Graphics > Histogram
    Main: click the R (Reset) button; select Discrete data, Fraction; Variable: s

or type the command:

    histogram s, discrete fraction

How does the Monte Carlo estimate of the mean and standard deviation compare with the true values? What are the true values? The true mean is $E(S) = n\pi = 10 \times .2 = 2$, and $\mathrm{STD}(S) = \sqrt{n\pi(1-\pi)} = \sqrt{10 \times .2 \times .8} = 1.264911$. Use the summarize command to get the Monte Carlo estimates:

    sum s
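The comparison can also be done entirely inside Stata by computing the true values from the scalar p. The sketch below is an addition to the lab. So that it runs on its own, it rebuilds s with a forvalues loop, which is a compact stand-in (an assumption, not the lab's method) for the ten generate commands in the do-file; the seed is arbitrary.

```stata
* Sketch: true mean and standard deviation next to the Monte Carlo estimates.
* The forvalues loop is a compact stand-in for the lab's ten generate commands.
clear
set seed 12345
scalar p = .2
set obs 10000
gen s = 0
forvalues i = 1/10 {
    * add one patient's 0/1 outcome, using a fresh uniform draw per observation
    quietly replace s = s + (uniform() < p)
}
* true mean n*pi and true standard deviation sqrt(n*pi*(1-pi))
display 10*p
display sqrt(10*p*(1-p))
* Monte Carlo estimates, taken from summarize's stored results
quietly summarize s
display r(mean)
display r(sd)
```

If the do-file from the previous section has already been run, only the display and summarize lines are needed.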

You can get a complete listing of the probability distribution with the tabulate command:

    tab s

Compare this with the actual probabilities given in Wonnacott & Wonnacott, p. 669:

     s    f(s) (%)
     0    10.7
     1    26.8
     2    30.2
     3    20.1
     4     8.8
     5     2.6
     6      .6
     7      .1
     8      .0
     9      .0
    10      .0

The binomialtail() function in Stata (see Help functions) can be used to calculate probabilities in the upper tail of a binomial distribution, and so is like having Table III(c) of the Wonnacott & Wonnacott text, except better. This function takes three arguments: the first is $n$, the number of trials; the second is $s_0$, the lower bound of the upper tail; and the third is $\pi$, the probability of success in each trial. In short, $\Pr(S \ge s_0) = \texttt{binomialtail}(n, s_0, \pi)$. Check the Monte Carlo estimate of the probability of 4 or more patient cures in a drug trial of 10 patients against the actual value of this probability given by:

    display binomialtail(10,4,.2)
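One way to obtain the Monte Carlo estimate for this check is to count the simulated trials with 4 or more cures and divide by the number of trials. The sketch below is an addition to the lab. So that it runs on its own, it rebuilds s with a forvalues loop, a compact stand-in (an assumption, not the lab's method) for the do-file; the seed is arbitrary.

```stata
* Sketch: Monte Carlo estimate of Pr(S >= 4) next to the exact value.
* The forvalues loop is a compact stand-in for the lab's ten generate commands.
clear
set seed 12345
scalar p = .2
set obs 10000
gen s = 0
forvalues i = 1/10 {
    quietly replace s = s + (uniform() < p)
}
* number of simulated trials with 4 or more cures is left in r(N)
quietly count if s >= 4
* Monte Carlo estimate: share of trials with 4 or more cures
display r(N)/_N
* exact upper-tail probability
display binomialtail(10,4,p)
```

With 10,000 trials the two displayed numbers should typically agree to about two decimal places.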

The Power of "do" files and scalar variables

To illustrate the power of using "do" files and scalar variables, let's repeat this analysis for $\pi = .4$ instead. Simply restore the "Stata Do-file Editor" window and edit the scalar command appropriately, then click the "Do" tool again. Recall the appropriate inspection commands from the Review window to view the results.
