Unit 8 Estimation

Structure
8.1 Introduction
    Objectives
8.2 Reasons why estimates have to be made
8.3 Making statistical inference
8.4 Types of estimates
8.5 Criteria of a good estimator
    8.5.1 Unbiasedness
    8.5.2 Efficiency
    8.5.3 Consistency
    8.5.4 Sufficiency
8.6 Point estimates
8.7 Interval estimates
8.8 Interval estimates and confidence intervals
    Self Assessment Questions 1
8.9 Determining the sample size in estimation
8.10 Summary
    Terminal Questions
    Answers to SAQs and TQs
8.1 Introduction
Everyone makes estimates. When you are ready to cross a street, you estimate the speed of any car that is approaching, the distance between you and that car, and your own speed. Having made these quick estimates, you decide whether to wait, walk, or run.

Learning Objectives
In this unit students will learn about:
1. 2. 3.
8.2 Reasons why estimates have to be made
All managers must make quick estimates too. The outcome of these estimates can affect their organizations as seriously as the outcome of your decision about whether to cross the street. Credit managers estimate whether a purchaser will eventually pay his bills.
Sikkim Manipal University
121
Statistics For Management
Prospective homebuyers make estimates concerning the behaviour of interest rates in the mortgage market. All these people make estimates without worrying about whether they are scientific, but with the hope that the estimates bear a reasonable resemblance to the outcome. Managers use estimates because, in all but the most trivial decisions, they must make rational decisions without complete information and with a great deal of uncertainty about what the future will bring. As educated citizens and professionals, you will be able to make more useful estimates by applying the techniques described in this and subsequent chapters.
8.3 Making statistical inference
Statistical inference is based on estimation and hypothesis testing. In both estimation and hypothesis testing, we make inferences about characteristics of populations from information contained in samples.

In estimation, we try to estimate with reasonable accuracy the population proportion (the proportion of the population that possesses a given characteristic) and the population mean. Calculating the exact proportion or the exact mean from a sample would be an impossible goal. Even so, we will be able to make an estimate and implement some controls to avoid as much of the error as possible.
8.4 Types of estimates
There are two types of estimates about a population:
1. a point estimate, and
2. an interval estimate.

8.4.1 A point estimate is a single number that is used to estimate an unknown population parameter. A point estimate is often insufficient because it is either right or wrong, and we do not know how wrong it is. A point estimate is therefore much more useful if it is accompanied by an estimate of the error that might be involved.

8.4.2 An interval estimate is a range of values used to estimate a population parameter. It indicates the error in two ways: by the extent of its range and by the probability of the true population parameter lying within that range.
8.5 Criteria of a good estimator
8.5.1 Unbiasedness: This is a desirable property for a good estimator to have. An estimator is unbiased if the mean of its sampling distribution equals the population parameter being estimated. For example, the sample mean is an unbiased estimator of the population mean because the mean of the sampling distribution of sample means taken from the same population is equal to the population mean itself. We can say that a statistic is an unbiased estimator if, on average, it tends to assume values that are above the population parameter being estimated as frequently and to the same extent as it tends to assume values that are below it.

8.5.2 Efficiency: Another desirable property of a good estimator is that it be efficient. Efficiency refers to the size of the standard error of the statistic. If we compare two statistics from samples of the same size and try to decide which one is the more efficient estimator, we would pick the statistic that has the smaller standard error. Suppose we choose a sample of a given size and must decide whether to use the sample mean or the sample median to estimate the population mean. If we calculate the standard error of the sample mean and find it to be 1.05, and then calculate the standard error of the sample median and find it to be 1.6, we would say that the sample mean is a more efficient estimator of the population mean because its standard error is smaller. It makes sense that an estimator with a smaller standard error (that is, with less variation) will have more chance of producing an estimate nearer to the population parameter under consideration.

8.5.3 Consistency: A statistic is a consistent estimator of a population parameter if, as the sample size increases, it becomes almost certain that the value of the statistic comes very close to the value of the population parameter. If an estimator is consistent, it becomes more reliable with large samples.

8.5.4 Sufficiency: An estimator is sufficient if it makes so much use of the information in the sample that no other estimator could extract from the sample additional information about the population parameter being estimated.
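The first three criteria can be checked by simulation. The following is a minimal sketch, not a method taken from this unit: it assumes a hypothetical normal population with mean 100 and standard deviation 15, draws repeated samples, and compares the sample mean with the sample median. The helper name `sampling_distribution`, the seed and the sample sizes are illustrative choices.

```python
import random
import statistics

def sampling_distribution(estimator, n, reps, mu=100.0, sigma=15.0, seed=1):
    """Apply `estimator` to `reps` random samples of size n from a normal population.

    The population (mu = 100, sigma = 15) is a hypothetical choice for illustration.
    """
    rng = random.Random(seed)
    return [estimator([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]

# The same seed gives both estimators the same 2000 samples of size 25.
means = sampling_distribution(statistics.mean, n=25, reps=2000)
medians = sampling_distribution(statistics.median, n=25, reps=2000)

# Unbiasedness: the sample means average out close to mu = 100.
print("average of sample means:", round(statistics.mean(means), 2))

# Efficiency: the spread of each sampling distribution is its standard error;
# for a normal population the mean's standard error is smaller than the median's.
se_mean = statistics.stdev(means)
se_median = statistics.stdev(medians)
print("SE of mean:", round(se_mean, 2), "SE of median:", round(se_median, 2))

# Consistency: the standard error of the mean shrinks as the sample size grows.
spreads = {
    n: statistics.stdev(sampling_distribution(statistics.mean, n=n, reps=500))
    for n in (10, 100, 1000)
}
print({n: round(s, 3) for n, s in spreads.items()})
```

The exact figures depend on the assumed population and the seed; only the orderings (mean close to 100, mean less spread out than median, spread falling with n) are the point.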
8.6 Point estimates
Results of a sample of 35 boxes of bolts (bolts per box)
101 105  97  93 114
103 112 102  98  97
100  97 107  93  94
100 110 106 110 103
 98 106 100 112 105
 97 110 102  98 112
 93  97  99 100  99
Consider the table above: we have taken a sample of 35 boxes of bolts from a manufacturing line and counted the bolts in each box. We can estimate the population mean, i.e. the mean number of bolts per box, by taking the mean of the 35 boxes we have sampled, i.e. by adding all the bolts and dividing by the number of boxes:

$$ \bar{X} = \frac{\sum X}{n} = \frac{3570}{35} = 102 $$
Thus, using the sample mean X̄ as the estimator, we have a point estimate of the population mean: 102 bolts per box.
Similarly, we can use the sample variance s² to estimate the population variance σ², where the sample variance is given by the formula

$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} $$
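Both point estimates for the bolt data can be reproduced in a few lines. This is a minimal sketch using only Python's standard library; the list simply re-enters the 35 counts from the table above, and the variance uses the n − 1 divisor from the formula.

```python
import math
import statistics

# Bolts per box for the 35 sampled boxes (re-entered from the table).
bolts = [
    101, 105, 97, 93, 114,
    103, 112, 102, 98, 97,
    100, 97, 107, 93, 94,
    100, 110, 106, 110, 103,
    98, 106, 100, 112, 105,
    97, 110, 102, 98, 112,
    93, 97, 99, 100, 99,
]

n = len(bolts)
x_bar = sum(bolts) / n                               # point estimate of the population mean
s2 = sum((x - x_bar) ** 2 for x in bolts) / (n - 1)  # sample variance, n - 1 divisor
s = math.sqrt(s2)                                    # sample standard deviation

print(n, sum(bolts), x_bar)       # 35 3570 102.0
print(round(s2, 2), round(s, 2))  # 36.12 6.01

# statistics.variance uses the same n - 1 divisor.
assert math.isclose(s2, statistics.variance(bolts))
```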
8.7 Interval estimates
The purpose of gathering samples is to learn more about a population. We can compute this information from the sample data as either point estimates or interval estimates. An interval estimate describes a range of values within which a population parameter is likely to lie.

Suppose the marketing research director needs an estimate of the average life, in months, of the car batteries his company manufactures. We select a random sample of 200 batteries and find a mean life of 36 months. If we use the sample mean x̄ as the best estimator of the population mean μ, we would report that the mean life of the company's batteries is 36 months.

The director also asks for a statement about the uncertainty likely to accompany this estimate, that is, a statement about the range within which the unknown population mean is likely to lie. To provide such a statement, we need to find the standard error of the mean.

If we select and plot a large number of sample means from a population, the distribution of these means will approximate a normal curve. Furthermore, the mean of the sample means will be the same as the population mean. Our sample size of 200 is large enough that we can apply the central limit theorem. Suppose we have already estimated the standard deviation of the population of batteries and reported that it is 10 months. Using this standard deviation, we can calculate the standard error of the mean:

$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{200}} \approx 0.707 \text{ months} $$

Making the interval estimate: we can tell the director that our estimate of the life of the company's batteries is 36 months, and that the standard error accompanying this estimate is 0.707. In other words, the actual mean life of all the batteries may lie somewhere in the interval estimate of 35.293 to 36.707 months. This is helpful but insufficient information for the director. Next, we need to calculate the chance that the actual life will lie in this interval, or in other intervals of different widths that we might choose, such as ±2σx̄ (2 × 0.707), ±3σx̄ (3 × 0.707), and so on.
The probability is 0.955 that the mean of a sample of size 200 will be within 2 standard errors of the population mean. Stated differently, 95.5 percent of all the sample means are within 2 standard errors of μ, so the population mean will be located within 2 standard errors of the sample mean 95.5 percent of the time.

Hence, from the above example, we can now report to the director that the best estimate of the life of the company's batteries is 36 months, and we are 68.3 percent confident that the life lies in the interval from 35.293 to 36.707 months (36 ± 1σx̄). Similarly, we are 95.5 percent confident that the life falls within the interval of 34.586 to 37.414 months (36 ± 2σx̄), and we are 99.7 percent confident that battery life falls within the interval of 33.879 to 38.121 months (36 ± 3σx̄).
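The battery figures can be reproduced as follows. This sketch assumes, as the text does, that the population standard deviation of 10 months is known; `mean_interval` is a hypothetical helper name.

```python
import math

def mean_interval(x_bar, sigma, n, k):
    """Return x_bar - k*SE and x_bar + k*SE, where SE = sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    return x_bar - k * se, x_bar + k * se

# Battery example: n = 200, sample mean 36 months, sigma = 10 months.
se = 10 / math.sqrt(200)
print(round(se, 3))  # 0.707

for k, confidence in [(1, "68.3%"), (2, "95.5%"), (3, "99.7%")]:
    lo, hi = mean_interval(36, 10, 200, k)
    print(confidence, round(lo, 3), round(hi, 3))
# 68.3% 35.293 36.707
# 95.5% 34.586 37.414
# 99.7% 33.879 38.121
```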
8.8 Interval estimates and confidence intervals
In using interval estimates, we are not confined to 1, 2 and 3 standard errors. For example, ±1.64 standard errors includes about 90 percent of the area under the curve: it includes 0.4495 of the area on either side of the mean in a normal distribution. Similarly, ±2.58 standard errors includes about 99 percent of the area, or 49.51 percent on each side of the mean.

The probability that we associate with an interval estimate is called the confidence level. This probability indicates how confident we are that the interval estimate will include the population parameter. A higher probability means more confidence. In estimation, the most commonly used confidence levels are 90 percent, 95 percent and 99 percent, but we are free to apply any confidence level.

The confidence interval is the range of the estimate we are making. If we report that we are 90 percent confident that the mean of the population of incomes of people in a certain community will lie between Rs. 8,000 and Rs. 24,000, then the range Rs. 8,000 to Rs. 24,000 is our confidence interval. Often, however, we will express the confidence interval in standard errors rather than in numerical values. Thus, we will often express confidence intervals like this: x̄ ± 1.64σx̄, where

x̄ + 1.64σx̄ = upper limit of the confidence interval
x̄ − 1.64σx̄ = lower limit of the confidence interval
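Python's standard library class `statistics.NormalDist` can stand in for the normal table. The sketch below (the helper names are illustrative, not from the text) recovers the one-sided areas quoted for 1.64 and 2.58 standard errors, the z values for the common confidence levels, and the resulting confidence limits.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_one_side(z):
    """Area under the normal curve between the mean and z standard errors on one side."""
    return std_normal.cdf(z) - 0.5

def z_for_confidence(level):
    """z such that x_bar +/- z standard errors encloses `level` of the area."""
    return std_normal.inv_cdf(0.5 + level / 2)

def confidence_limits(x_bar, se, level):
    """(LCL, UCL) = x_bar - z*se, x_bar + z*se for the given confidence level."""
    z = z_for_confidence(level)
    return x_bar - z * se, x_bar + z * se

print(round(area_one_side(1.64), 4), round(area_one_side(2.58), 4))  # 0.4495 0.4951
for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))  # 1.64, 1.96, 2.58

# Example: battery estimate (36 months, SE 0.707) at the 90 percent level.
print(confidence_limits(36, 0.707, 0.90))
```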
Confidence limits are the upper and lower limits of the confidence interval. In this case, x̄ + 1.64σx̄ is called the upper confidence limit (UCL) and x̄ − 1.64σx̄ is the lower confidence limit (LCL).

Calculating Interval Estimates of the Mean from Large Samples
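When the population is finite and the sample is a sizeable fraction of it (n/N greater than 0.05), we use the finite population multiplier, given in the previous unit, to calculate the standard error:

$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} \qquad \text{when } \frac{n}{N} > 0.05 $$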
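A sketch of the finite population multiplier rule, using the figures from Self Assessment Question 4 (N = 540, n = 60, estimated standard deviation 1.368). The function name `standard_error` is hypothetical, and the 0.05 threshold is the rule of thumb stated above.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean, sigma / sqrt(n).

    If a population size N is given and n/N > 0.05, the result is multiplied
    by the finite population multiplier sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Self Assessment Question 4: n/N = 60/540, about 0.11, so the multiplier applies.
print(round(standard_error(1.368, 60, N=540), 3))  # 0.167
print(round(standard_error(1.368, 60), 3))         # 0.177 without the multiplier
```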
Calculating Interval Estimates of the Proportion from Large Samples

Statisticians often use a sample to estimate the proportion of occurrences in a population. For example, the government estimates the unemployment rate, or the proportion of unemployed people in the country's workforce, by a sampling procedure.

We know that for a binomial distribution the mean and the standard deviation are

$$ \text{Mean} = np, \qquad \sigma = \sqrt{npq}, \quad \text{where } q = 1 - p $$

Here n = number of trials, p = probability of success, and q = probability of failure = 1 − p.

Since we use the sample proportion to estimate the population proportion, the mean of the sampling distribution of the proportion equals the population proportion:

$$ \mu_{\bar{p}} = p $$

Similarly, we can modify the formula for the standard deviation of the binomial distribution, √(npq), which measures the standard deviation in the number of successes. To change the number of successes to the proportion of successes, we divide √(npq) by n and get √(pq/n). Therefore the standard error of the proportion is

$$ \sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} $$

Example: In a very large organization, the director wanted to find out what proportion of the employees prefer to provide their own retirement benefits in lieu of a company-sponsored plan. A simple random sample of 75 employees was taken, and 40% (i.e. 0.4) of them were found to be interested in providing their own retirement plans. The management requests that we use this sample to find an interval within which they can be 99 percent confident that the true population proportion lies.

Here n = 75, p = 0.4 and q = 1 − p = 1 − 0.4 = 0.6. Therefore the standard error of the proportion is

$$ \sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.4 \times 0.6}{75}} \approx 0.057 $$

The interval estimate at the 99% level of confidence is therefore 0.4 ± 2.58(0.057), i.e. from 0.253 to 0.547.
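The retirement-plan example can be checked with a short sketch (the helper `proportion_interval` is hypothetical). Note that the text rounds the standard error to 0.057 before multiplying by 2.58, which gives 0.253 and 0.547; carrying the unrounded standard error shifts the third decimal.

```python
import math

def proportion_interval(p_hat, n, z):
    """Return (standard error, lower limit, upper limit) for p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    se = math.sqrt(p_hat * q_hat / n)
    return se, p_hat - z * se, p_hat + z * se

se, lower, upper = proportion_interval(0.4, 75, 2.58)  # 99 percent, z = 2.58
print(round(se, 3))                      # 0.057
print(round(lower, 3), round(upper, 3))  # 0.254 0.546 (unrounded standard error)
```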
Therefore the proportion of the total population of employees who wish to establish their own retirement plans lies between 0.253 and 0.547.

Interval Estimates using Student's t Distribution

So far, the sample sizes we were examining were all larger than 30. This is not always the case. How can we handle estimates where the normal distribution is not the appropriate sampling distribution, that is, when we are estimating the population standard deviation and the sample size is 30 or less? Suppose, for example, that we have data for only 10 weeks. Fortunately, another distribution exists that is appropriate in these cases. It is called the t distribution.

Early theoretical work on t distributions was done by W. S. Gosset in the early 1900s. Gosset was employed by the Guinness Brewery in Dublin, Ireland, which did not permit employees to publish research findings under their own names. So Gosset adopted the pen name "Student" and published under that name. Consequently, the t distribution is commonly called Student's t distribution, or simply Student's distribution.

Conditions for usage

Because it is used when the sample size is 30 or less, statisticians often associate the t distribution with small-sample statistics. This is misleading because the size of the sample is only one of the conditions that lead us to use the t distribution. The second condition is that the population standard deviation must be unknown. Use of the t distribution for estimating is required whenever the sample size is 30 or less and the population standard deviation is not known. Furthermore, in using the t distribution, we assume that the population is normal or approximately normal.

Degrees of freedom

There is a different t distribution for each of the possible degrees of freedom. What are degrees of freedom? We can define them as the number of values we can choose freely. We use degrees of freedom when we select a t distribution to estimate a population mean, and we use n − 1 degrees of freedom, where n is the sample size. For example, if we use a sample of 20 to estimate a population mean, we will use 19 degrees of freedom in order to select the appropriate t distribution.
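For example, if two sample values must have a mean of 18 and the first value is 10, the second value must be 26: only one of the two values can be chosen freely. With two sample values, then, we have one degree of freedom (2 − 1 = 1), and with seven sample values, we have six degrees of freedom (7 − 1 = 6). In each of these two examples, we had n − 1 degrees of freedom, where n is the sample size. Similarly, a sample of 23 would give us 22 degrees of freedom.

Using the t Distribution Table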
Comparison between t and z tables

The table of t distribution values differs in construction from the z table (normal distribution table) used previously. The t table is more compact and shows areas and t values for only a few percentages (10, 5, 2 and 1 percent). Because there is a different t distribution for each number of degrees of freedom, a more complete table would be quite lengthy. Although we can conceive of the need for a more complete table

A second difference in the t table is that it does not focus on the chance that the population parameter being estimated will fall within our confidence interval. Instead, it measures the chance that the population parameter we are estimating will not be within our confidence interval (that is, that it will lie outside it). If we are making an estimate at the 90 percent confidence level, we would look in the t table under the 0.10 column (100 percent − 90 percent = 10 percent). This 0.10 chance of error is symbolized by the Greek letter alpha (α). We would find the appropriate t values for confidence intervals of 95 percent, 98 percent and 99 percent under the columns headed 0.05, 0.02 and 0.01, respectively.

A third difference in using the t table is that we must specify the degrees of freedom with which we are dealing. Suppose we make an estimate at the 90 percent confidence level with a sample size of 14, which is 13 degrees of freedom. Look under the 0.10 column until you encounter the row labelled 13. Like a z value, the t value there of 1.771 shows that if we mark off plus and minus 1.771σ̂x̄ (estimated standard errors of x̄) on either side of the mean, the area under the curve between these two limits will be 90 percent, and the area outside these limits (the chance of error) will be 10 percent.

Remember that in any estimation problem in which the sample size is 30 or less, the standard deviation of the population is unknown, and the underlying population can be assumed to be normal or approximately normal, we use the t distribution.
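Python's standard library has no t distribution, so the sketch below is one self-contained way to reproduce t-table entries, not the method used to build the printed table. It integrates the Student's t density numerically (Simpson's rule) and solves by bisection for the t value that leaves a total area α in the two tails, matching the table's column headings. The function names, the 2000 integration steps and the [0, 50] search bracket are illustrative assumptions; where SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, df)` should give the same value.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """Area between -t and +t, by Simpson's rule on [0, t] doubled (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(alpha, df):
    """t value leaving total area alpha in the two tails.

    The search bracket [0, 50] covers the 0.10, 0.05, 0.02 and 0.01 columns for df >= 2.
    """
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection: central_area increases with t
        mid = (lo + hi) / 2
        if central_area(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 14  # sample size from the example in the text
print(round(t_critical(0.10, n - 1), 3))  # 1.771, the table value quoted above
```

The same call with other arguments, for example `t_critical(0.05, 19)` for a 95 percent interval from a sample of 20, reads off other rows and columns of the t table.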
Self Assessment Questions 1

1. Pizza Hut has developed quite a business in Bangalore by delivering pizza orders promptly. It guarantees that its pizzas will be delivered in 30 minutes or less from the time the order was placed, and if the delivery is late, the pizza is free. The time it takes to deliver each pizza order that is on time is recorded in the Pizza Time Book (PTB), and the delivery time for pizzas that are delivered late is recorded as 30 minutes in the PTB. A sample of 12 random entries from the PTB is listed below:
   15.3 10.8 29.5 12.2 30 14.8 10.1 30 30 22.1 19.6 18.3
   a. Find the mean for the sample.
   b. From what population was this sample drawn?
   c. Can this sample be used to estimate the average time that it takes for Pizza Hut to deliver a pizza? Explain.
2. Madhu, a frugal student, wants to buy a used bike. After randomly selecting 125 want ads, he found the average price of the bike to be Rs. 3250 with a standard deviation of Rs. 615.
   a. Establish an interval estimate for the average price of the bike so that Madhu can be 68.3% certain that the population mean lies in this interval.
   b. Establish an interval estimate for the average price of the bike so that Madhu can be 95.5% certain that the population mean lies in this interval.
3. Given the following confidence levels, express the lower and upper limits of the confidence interval for these levels in terms of x̄ and σx̄. (Use the normal distribution tables.)
   a. 54 percent
   b. 75 percent
   c. 94 percent
   d. 98 percent
4. From a population of 540, a sample of 60 individuals is taken. From this sample, the mean is found to be 6.2 and the standard deviation (SD) to be 1.368.
   a. Find the estimated standard error of the mean.
   b. Construct a 96% confidence interval for the mean.
5. For the following sample sizes and confidence levels, find the approximate t values for constructing confidence intervals (use the t table):
   a) n = 28, 95%
   b) n = 8, 98%
   c) n = 13, 90%
   d) n = 25, 95%
8.9 Determining the sample size in estimation

IIM wants to conduct a survey of the annual earnings of its graduates in international placements. It knows from past experience that the standard deviation of its population of students is $1500. How large a sample should be taken in order to estimate the mean annual earnings of last year's class within $500 at the 95% level of confidence?

In the problem above, a variation of $500 is allowed on either side of the population mean. That means zσx̄ = 500. At the 95% level of confidence, we know from the z table that z = 1.96. Therefore 1.96σx̄ = 500, which means σx̄ = 500/1.96 ≈ 255.

If the standard error of the mean is 255, then σx̄ = σ/√n = 255. Since σ = 1500, we can find n:

$$ \frac{1500}{\sqrt{n}} = 255 \quad\Rightarrow\quad n = \left(\frac{1500}{255}\right)^2 \approx 34.6 $$

Since the sample size must be a whole number, we round up: the institute should survey at least 35 graduates to estimate the mean with the precision it requires.
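The calculation above generalizes to n = (zσ/E)², rounded up, where E is the allowed error on either side of the mean. A minimal sketch, with `sample_size_for_mean` as a hypothetical helper; it takes z from `statistics.NormalDist` (1.95996...) rather than the rounded 1.96, and both give 35 here.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. ceil((z * sigma / margin) ** 2)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size_for_mean(sigma=1500, margin=500, confidence=0.95))  # 35
```

Halving the allowed error roughly quadruples the required sample, since n grows with the square of z·σ/E.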
8.10 Summary
In this unit we have seen point estimates and interval estimates. These are the foundation for inferential statistics in estimation and hypothesis testing, which we will discuss in the next unit. We have also seen the concept of confidence levels and how to make estimates when sample sizes are small and when they are large. We have also worked in reverse, determining the sample size needed to achieve a given level of accuracy in an estimate. Finally, we have seen that if the sample size is 30 or less and the population standard deviation is not known, we use Student's t distribution for estimation.

Terminal Questions
1. ICICI is determining the number of tellers available during the Friday lunch rush hour. The bank has collected data on the number of people who entered the bank during the past 3 months on Fridays from 11 am to 1 pm. Using the data below, find the point estimates of the mean and SD of the population from which the sample was drawn.
   242 294 275 328 289 306 342 385 279 245 269 305
2. From a population known to have an SD of 1.4, a sample of 60 individuals is taken.
3.
σx̄ = σ/√n = 615/√125 = 55.01
a) x̄ ± 1σx̄ = 3250 ± 55.01, i.e. 3194.99 to 3305.01, to be 68.3% certain.
b) 95.5% certain means x̄ ± 2σx̄ = 3250 ± 110.02, giving a range between 3139.98 and 3360.02.
3. a. x̄ ± 0.74σx̄
   b. x̄ ± 1.15σx̄
   c. x̄ ± 1.88σx̄
   d. x̄ ± 2.33σx̄
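4. a. Given σ̂ = 1.368, n = 60 and N = 540. Since n/N = 60/540 ≈ 0.11 > 0.05, we apply the finite population multiplier:

$$ \hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} = \frac{1.368}{\sqrt{60}} \times \sqrt{\frac{540 - 60}{540 - 1}} \approx 0.167 $$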