
CLASS NOTES on SAMPLING DISTRIBUTION and Central Limit Theorem (CLT)

Why Sample the Population? Why not study the whole population?

• The physical impossibility of checking all items in the population.
• The cost of studying all the items in a population.
• The sample results are usually adequate.
• Contacting the whole population would often be time-consuming.
• The destructive nature of certain tests (e.g., study of light bulb life).

Statisticians advocate Probability Sampling (not judgment sampling)

A probability sample is a sample selected in such a way that each item or person in the population being studied has a known likelihood of being included in the sample.
If we use judgment sampling we will have no idea about the accuracy of our estimates, since we have no idea about the quality of the judgments. Probability sampling enables us to construct probabilistic error bounds (to be studied in a second course in Statistics). The aim of sampling is to get a sample which is representative of the population.

Methods of Probability Sampling

Simple Random Sample (SRS): A sample formulated so that each item or person, and each subset of the same size, in the population has the same chance of being included (e.g., with N items, the probability that any particular one is selected on a single draw is 1/N). A simple way to implement this is to use a lottery or a computer program. For example, we can mark N cards, write the names of the items on these cards, shuffle the cards, and select n cards. This yields a simple random sample of size n.

Systematic Random Sampling (SysRS): The items or individuals of the population are arranged in some order. A random starting point is selected (by lottery) and then every kth member of the population is selected.
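The two selection schemes just described can be sketched in Python. This is only an illustration, not part of the original notes: the function names, the fixed seed, and the population of 1000 numbered stores with n = 100 are assumptions chosen for the example.

```python
import random


def simple_random_sample(items, n, rng):
    """Draw n items without replacement; every subset of size n is equally likely."""
    return rng.sample(items, n)


def systematic_sample(items, n, rng):
    """Pick a random start among the first k items, then take every k-th item."""
    k = len(items) // n          # sampling interval k = N/n
    start = rng.randrange(k)     # 0-based random position within the first k items
    return items[start::k][:n]


rng = random.Random(2013)        # fixed seed, only so the run is repeatable
stores = list(range(1, 1001))    # hypothetical stores numbered 1..1000 (N = 1000)

srs = simple_random_sample(stores, 100, rng)
sys_sample = systematic_sample(stores, 100, rng)

print(sorted(srs)[:6])           # first few stores of the simple random sample
print(sys_sample[:6])            # first few stores of the systematic sample
```

The systematic sample needs only one random draw (the starting point), whereas the simple random sample draws all n items at random.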

If there are N=1000 stores along Fifth Avenue and we want to select n=100 stores for the sample, then k=N/n, or 10. We shuffle only the first k stores and select one, say #4. From then on we select stores systematically by adding k, 2k, 3k, 4k, etc. to 4. So a systematic sample will have stores #4, 14, 24, 34, 44, 54, etc.

Stratified Random Sampling (StrRS): A population is first divided into subgroups, called strata, and a sample is selected from each stratum (e.g., 70% males, 30% females).
If a sample of 10 is selected (n=10), 70% of n = 7, so select 7 males and 3 females. In general, N = population size, N1 = size of stratum 1 (females), N2 = size of stratum 2 (males), n = sample size desired. The sample should have (N1/N)*n from stratum 1, and so on. Thus females in the sample = (N1/N)*n, males in the sample = (N2/N)*n.
Population has 25 students, of whom 15 are white and 10 are black. A stratified sample of size 10 should have how many whites and how many blacks? Answer: Let N = population size = 25, N1 = blacks = 10, N2 = whites = 15, n = sample size = 10. Note that (N1/N)*n = (10/25)*10, or 4 blacks. How many whites are in the sample? (N2/N)*n = (15/25)*10, or 6. Verify that 6 + 4 = 10. We have a representative sample.
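The proportional allocation rule (N_i/N)*n used in the stratified examples can be written as a short Python function. This is a minimal sketch; the function name and the dictionary layout are assumptions made for illustration.

```python
def proportional_allocation(stratum_sizes, n):
    """Return the sample size for each stratum: (N_i / N) * n."""
    N = sum(stratum_sizes.values())
    return {name: n * size / N for name, size in stratum_sizes.items()}


# The class of 25 students from the notes: 10 blacks, 15 whites, sample of 10.
class_alloc = proportional_allocation({"blacks": 10, "whites": 15}, n=10)
print(class_alloc)

# The 70% / 30% example, expressed per 100 people in the population.
gender_alloc = proportional_allocation({"males": 70, "females": 30}, n=10)
print(gender_alloc)
```

When the allocations do not come out as whole numbers they have to be rounded, and the rounded counts should be checked so that they still add up to n.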

Cluster Sampling: A population is first divided into clusters and a sample of the clusters is selected (used in marketing). It works if the clusters are as heterogeneous as the population. For a large country like the US it is convenient to use cluster sampling and choose some geographical locations (e.g., Oshkosh, Wisconsin).

A sampling error is the difference between a sample statistic and its corresponding parameter. We can make probabilistic statements about this sampling error only if we have a probability sample (not a judgment sample).

In general, a sampling distribution can be defined for any sample statistic (mean, median, mode, standard deviation, etc.). It is defined over a sample space consisting of all possible samples of size n from the available population of size N. Let us first study the sampling distribution of the sample mean as an example.

Sampling Distribution of the Sample Mean

The sampling distribution of the sample means is a probability distribution consisting of all possible sample means based on specified sample sizes selected from the population. The sampling distribution yields the probability of occurrence associated with each sample mean over the set of all possible sample mean numbers.

EXAMPLE 1 The law firm of Hoya and Associates has five partners (A, B, C, D, E). At their weekly partners meeting each reported the number of hours they charged clients for their services last week: A 22, B 26, C 30, D 26, E 22 (e.g., Mr. E charged 22 hrs). If n=2, so that two partners are selected randomly, how many different samples are possible? This is the combination of 5 objects taken 2 at a time. That is, 5C2 = 5!/(2!3!) = 10. There are 10 possible samples.

Ten sample means are given below (e.g., if the sample has A and B, the sample mean is 24):

AB 24, AC 26, AD 24, AE 22, BC 28, BD 26, BE 24, CD 28, CE 26, DE 24

Exercise: draw a picture with frequency on the vertical axis for the sampling distribution of means. Note above that the mean of A and C is 26, of B and D is 26, and of C and E is also 26, which means that xbar=26 repeats itself three times (has frequency 3). We find the following list of frequencies:
xbar=22 with freq=1, xbar=24 with freq=4, xbar=26 with freq=3, xbar=28 with freq=2.
This is almost the sampling distribution of means.

Total frequency = 10.


If we divide the individual frequencies by the total frequency we get the "relative frequency" or probability. These probabilities add up to one, so we have a probability distribution. This is a sampling distribution of all possible sample means. Now the random variable is xbar; it is no longer just X.

The above information says, for example, that the probability that the sample mean is 22 is 1 out of 10, or 0.1, and the probability that it is 28 is 2 out of 10, or 0.2. The sampling distribution is simply this probability distribution defined over all possible samples of size n from the population of size N. In real-world problems N will be large (e.g., 200 million US population), n will also be large (e.g., 1000 people surveyed), and the number of possible samples (N C n) will be astronomical. Then the sampling distribution can only be imagined. We have chosen a simple example with N=5, n=2 so that the entire sampling distribution can be explicitly computed and visualized.
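For a population this small the whole sampling distribution can be enumerated by machine. The following Python sketch lists all 5C2 = 10 samples for the five partners and tabulates the frequency and probability of each sample mean; the variable names are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

hours = {"A": 22, "B": 26, "C": 30, "D": 26, "E": 22}

# All 5C2 = 10 samples of size n = 2, each with its sample mean.
sample_means = {
    "".join(pair): Fraction(hours[pair[0]] + hours[pair[1]], 2)
    for pair in combinations(hours, 2)
}

# Frequency of each distinct sample mean, then relative frequency (probability).
freq = Counter(sample_means.values())
total = sum(freq.values())
sampling_dist = {mean: Fraction(count, total) for mean, count in sorted(freq.items())}

for mean, prob in sampling_dist.items():
    print(f"xbar={int(mean)}  freq={freq[mean]}  prob={float(prob)}")
```

Exact fractions are used instead of floating point so that the probabilities add up to exactly one.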

What are the properties of the sampling distribution of sample means xbar? Properties include the mean and variance of xbar.

Compute the mean of the sample means and compare it with the population mean: For our simple example we can explicitly calculate the mean of means, or expected value of means, E(xbar) = μ.
• The mean of the sample means is obtained by weighting each sample mean by its frequency = [(22)(1) + (24)(4) + (26)(3) + (28)(2)]/10 = 25.2 [Read page 214 of your text]
• Since we know the value of every observation in the population in this (impractical) simple example, we have the directly calculated population mean μ = (22+26+30+26+22)/5 = 25.2. Note that in the real world we usually cannot find μ; we can only make inferences about it from the sample mean.
• Observe that the grand mean of all 10 sample means (25.2) is equal to the population mean (25.2). Since E(xbar) = μ, we say that the sample mean xbar is an UNBIASED estimator of the population mean μ. We verified this property above for the simple example of lawyer hours. In general, such verification is difficult and one needs to use advanced theory.
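The unbiasedness check for the lawyer-hours example can be repeated in a few lines of Python. This is a sketch under the assumption that every one of the 10 samples is equally likely, as in simple random sampling; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

hours = [22, 26, 30, 26, 22]                   # partners A..E
mu = Fraction(sum(hours), len(hours))          # population mean

# Sample mean of every possible sample of size 2.
means = [Fraction(a + b, 2) for a, b in combinations(hours, 2)]
mean_of_means = sum(means) / len(means)        # E(xbar), all samples equally likely

# Sampling error of each sample: sample statistic minus the parameter.
errors = [m - mu for m in means]

print(float(mu), float(mean_of_means))         # both are 25.2
print(float(sum(errors)))                      # the errors cancel out exactly
```

Individual samples miss μ by as much as 3.2 hours, but the sampling errors average out to zero over all possible samples, which is what unbiasedness means.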

Now we turn to the variance of xbar. It is possible to verify intuitively that the larger the sample size, the smaller the variance.

For example, if X is height (known to be a Normal random variable), we want to estimate the average height of all Fordham students from a small sample of only 10 students. When we consider all possible samples we cannot rule out a sample of very tall folks (e.g., all 10 from the Fordham basketball team who are, say, 7 ft tall). The average height in that sample, seven feet, is large, and the upper limit of the range of averages will be about seven feet. Similarly, the average for the shortest 10 students will be smaller than five feet (say). Thus the averages based on n=10 will be spread over a wide range, from the smallest to the largest average height. Recall that a wide range means a large variance. By contrast, if we choose n=100, the average height for the tallest 100 will not be seven feet, but smaller. Similarly, the average height of the shortest 100 will be higher than for the shortest 10, and the range for n=100 will not be as large as the range for n=10. Thus the spread of the sampling distribution decreases as n increases. In fact the variance can be proved to be inversely proportional to n, as we see below.

Standard Error (SE) of the Sample Means (Sq. root of sampling variance or standard deviation. It is customary to distinguish between the usual standard deviation (SD) and that of a sampling distribution (SE))

The standard error of the sample means is the standard deviation of the sampling distribution of the sample means. Here n is the size of the sample and σ is the standard deviation of the population (assumed known). It is computed by: σ_xbar = σ/sqrt(n), as a first approximation if N is not known or N is large (almost infinity). σ_xbar is the symbol for the standard error of the sample means. If σ is not known and n ≥ 30, the standard deviation of the sample, denoted by s, is used to approximate the population standard deviation. Then the formula for the standard error becomes: SE(xbar) = s_xbar = s/sqrt(n). Always think of SE(xbar) as the standard deviation of the random variable xbar.

What is the shape of the probability distribution of xbar? The following theorem says that it is approximately Normal when n is large, and hence the theorem enables us to solve all kinds of practical problems.
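A few lines of Python show how the standard error σ/sqrt(n) shrinks as n grows. This is a minimal sketch; the function name and the value σ = 10 are assumptions chosen only for illustration.

```python
import math


def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


sigma = 10.0   # assumed population standard deviation, for illustration only
for n in (10, 25, 100):
    print(n, round(standard_error(sigma, n), 3))
```

Because n enters through its square root, the sample size has to be quadrupled to cut the standard error in half.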
Central Limit Theorem (CLT) See page 391 of Hawkes textbook.
[Central means it is of central importance to Statistics. Limit theorem because it studies the behavior as n becomes large, namely as n tends to infinity, in practice for n ≥ 30.]

This is a powerful result, given its name by the mathematician Polya in 1920, showing that EVEN IF x is NOT NORMAL, if n ≥ 30 the process of averaging is so helpful that it yields approximate normality of the sampling distribution of xbar, with the variance given below. For a population with mean μ and variance σ^2, the sampling distribution of all possible means of all possible samples of size n generated from that population will be approximately normally distributed:

xbar ~ N( μ, (σ^2/n)[(N-n)/(N-1)] )

assuming sufficiently large n (n ≥ 30). If N is large, the finite population correction term [(N-n)/(N-1)] is close to 1 and can be ignored. Then this formula simplifies to:

xbar ~ N( μ, σ^2/n )
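The theorem can be illustrated by simulation. The Python sketch below draws many samples of size n = 30 from an exponential population, which is strongly skewed and clearly not normal, and looks at the resulting sample means. The choice of population, the seed, and the number of repetitions are assumptions made for illustration; they do not come from the notes.

```python
import random
import statistics

rng = random.Random(30)     # fixed seed so the run is repeatable
n = 30                      # sample size
reps = 20000                # number of simulated samples

# Exponential population with mean 1 and standard deviation 1 (not normal).
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = statistics.fmean(sample_means)       # should be close to mu = 1
spread = statistics.pstdev(sample_means)      # should be close to 1/sqrt(30)
within_one_se = sum(abs(m - center) <= spread for m in sample_means) / reps

print(round(center, 3), round(spread, 3), round(within_one_se, 3))
```

If the sample means are approximately normal, roughly 68% of them should fall within one standard error of the center, and the spread should be near σ/sqrt(n) = 1/sqrt(30), or about 0.18.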
On pages 391-392 of your text there are several useful figures. They show that even if we start with a bimodal, exponential decay, or uniform distribution, all of which are decidedly not normal to begin with, the process of averaging gives us a normal distribution for the sample mean, provided the sample size is at least 30. We may know that human intelligence or human height are normally distributed, but we have no reason to think that lawyers' hours are normally distributed. The central limit theorem says that as long as you are averaging over 30 lawyers, normality can be assumed. This is very useful since we do not have to verify the underlying shape of the distribution.

A good practice example, which highlights the difference between the ordinary distribution of X and the sampling distribution of Xbar with separate word problems, follows:

IQ = X ~ N(110, 10^2). Find P(IQ < 80).
Intelligence Quotient (IQ) is normally distributed with mean 110 and standard deviation 10. Find the probability that a randomly chosen person has an IQ less than 80. (Hint: this random variable is for a single person, X.)
Find the probability that a randomly chosen person has an IQ less than 90. (Hint: this random variable is for a single person, X.)

If a sample of 25 students is available, what is the probability that the average IQ exceeds 105? (Hint: this random variable is for an average over 25 persons, or Xbar.)
What is the probability that the average IQ exceeds 115? (Hint: this random variable is for an average over 25 persons, or Xbar.)
Answers are given below.

X = IQ ~ N(110, (10)^2)
mu = μ = 110
standard deviation = sd = σ = 10
4 times sd = 4σ = 40
The plausible range of X has the lower limit μ - 4σ = 110 - 40, or 70; the upper limit is μ + 4σ = 110 + 40 = 150. This corresponds to the plausible range of the standard normal z (-4 to 4).

EXERCISE 1: Given that X = IQ ~ N(110, (10)^2), find the probability that a randomly chosen person has an IQ of 80 or less.

ANSWER 1: This is just a normal distribution word problem. In symbols, we want to find P(x < 80). Recall that a probability is an area under the Normal bell-shaped curve. We want to evaluate the shaded area between -∞ and 80. The mapping of the lower limit -∞ to the z scale is -4 for all practical purposes, so we need not bother with the lower limit of the desired shaded area. We still need to map the upper limit 80 to the z scale by using the z transform z = (x - μ)/σ. For our upper limit x = 80, μ = 110 and σ = 10, so z = (80 - 110)/10 = -3. By symmetry we look up z = 3: the area between 0 and 3 is 0.4987 from Table A of your text. The tail area is 0.5 - 0.4987, hence the answer is 0.0013.

In R software we compute pnorm(-3) to get 0.0013 for the left tail.

EXERCISE 2: X = IQ ~ N(110, (10)^2) is given. Find the probability that a randomly chosen person has an IQ of 90 or less. In symbols, find P(x < 90).

ANSWER 2: For our upper limit x = 90, μ = 110 and σ = 10. Mapping 90 to the z scale gives (90 - 110)/10 = -2. The tail area to the left of z = -2 is 0.5 - 0.4772 = 0.0228. In R software we compute pnorm(-2) to get 0.0228 for the left tail.

EXERCISE 3: X = IQ ~ N(110, (10)^2) is given. Find the probability that the average IQ of 25 students exceeds 105.

ANSWER 3: Since the sample size n = 25 is given, this is not a run-of-the-mill normal distribution word problem. The random variable under consideration here is the average. Hence a sampling distribution is relevant: the variable of interest is not the IQ of an individual student but the average IQ over 25 students.
Standard deviation of the sampling distribution = Standard Error = SE = σ/sqrt(n)
n = 25, sqrt(n) = sqrt(25) = 5
SE = σ/sqrt(n) = 10/5 = 2
4SE = 8
The plausible range is 110 - 8 to 110 + 8, or 102 to 118, for xbar = average IQ.
The area to the right of 105 is to be found, so we must map 105 to the z scale. The mapping now is z = (xbar - μ)/SE = (105 - 110)/2 = -2.5. The area between 0 and 2.5 is 0.4938. The total area 0.5 + 0.4938 = 0.9938 is the probability that the average IQ exceeds 105. In R software we compute pnorm(-2.5, lower.tail=FALSE) to get 0.9938.

Now find the probability that the average IQ exceeds 115. This is the tail area to the right of z = (115 - 110)/2 = 2.5, which is 0.5 - 0.4938 = 0.0062. In R software we compute pnorm(2.5, lower.tail=FALSE) to get 0.0062.

EXAMPLE 4: A library usually has 13% of its books checked out. Find the probability that in a sample of 588 books more than 14% are checked out.

ANSWER: We have percentages here, so it is not the simple normal distribution word problem. It uses the fact that phat ~ N(p, [pq/n]), where q = 1 - p, which says that the sampling distribution of the proportion phat is Normal with mean p and variance p(1-p)/n.
E(phat) = 0.13, n = 588
Var(phat) = σ^2(phat) = (0.13)(1 - 0.13)/n, or 0.00019235

We need the square root of this variance for use in our z transform.
SE(phat) = sqrt(0.00019235) = 0.01386903 = 0.0139 (here we round to 4 places)
The plausible range is 0.13 ± 4*0.0139. Since 4*SE is 0.0556, [0.0744 to 0.1856] is the plausible range.
We are asked for the probability that more than 14% of a sample of 588 books are checked out. Hence the desired point is to the right of the center at 0.13. In symbols, we want to compute P(phat > 0.14). Now let us apply the z transform to both sides of the inequality: P(phat > 0.14) = P(z > (0.14 - 0.13)/SE), so we have to compute P(z > 0.7194) = P(z > 0.72). We must round to 2 places to the right of the decimal since z tables are set up that way. We want the tail area, but we can look up only the area from 0 to 0.72, which is 0.2642. The tail area is 0.5 MINUS 0.2642, or ANS = 0.2358.

Copyright: Hrishikesh D. Vinod. Last updated 11/19/13 17:06 a11/p11
