Part 1
Population = all the items of interest, the full set of research results (size N)
Sample = part of the population (size n)
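Types of Data:
1. Categorical = responses that fall into categories (e.g. eye colour, level of satisfaction ...)
   a) Nominal = categories without a natural order (e.g. yes/no answers, eye colour)
   b) Ordinal = categories with a natural order (e.g. 1 = poor, 2 = average, 3 = good)
2. Numerical
   a) Discrete = counting (whole numbers, e.g. age in completed years)
   b) Continuous = measurement (e.g. weight)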
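Cumulative Frequency Distribution = running total of the frequencies (or percentages) up to a certain class
e.g. (n = 20 observations in total; the first three classes are shown):

Class                  Frequency   Percentage   Cumulative Fr.   Cumulative P.
10 but less than 20    3           15%          3                15%
20 but less than 30    6           30%          9                45%
30 but less than 40    5           25%          14               70%

Cumulative Fr. = adding up the frequency values; Cumulative P. = adding up the percentage values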
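A minimal Python sketch of the running totals in the table, assuming the three classes belong to a sample of n = 20 (inferred from 3 observations = 15%); the names `frequencies` and `n` are illustrative, not from the notes.

```python
from itertools import accumulate

# Class frequencies from the table; total sample size assumed to be n = 20
frequencies = [3, 6, 5]
n = 20

percentages = [100 * f / n for f in frequencies]   # share of each class in percent
cumulative_fr = list(accumulate(frequencies))      # running sum of frequencies
cumulative_p = list(accumulate(percentages))       # running sum of percentages

for f, p, cf, cp in zip(frequencies, percentages, cumulative_fr, cumulative_p):
    print(f"{f:>3} {p:>6.1f}% {cf:>4} {cp:>6.1f}%")
```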
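$x_i$ = i-th value of the variable X

Arithmetic Mean (Population): $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$   ($x_1, x_2, x_3, \dots$ = population values)

Arithmetic Mean (Sample): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$   ($x_1, x_2, x_3, \dots$ = sample values)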
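Median = value which stands in the middle of the ordered data (e.g. 1, 2, 2, 3, 3, 4, 5: median is 3)
Its position in the ordered data can also be calculated by: $\frac{n+1}{2}$
Note: If the number of values is even, the median is the average of the two values in the middle.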
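A small sketch of the mean and median rules, cross-checked against Python's standard `statistics` module; `median_by_position` is a hypothetical helper that applies the (n+1)/2 position rule, and `even_data` is made-up data for the even case.

```python
import statistics

data = [1, 2, 2, 3, 3, 4, 5]   # example data from the notes (odd n)
even_data = [1, 2, 3, 4]       # hypothetical even-sized example

def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) + 1) / 2          # 1-based position
    if pos.is_integer():
        return ordered[int(pos) - 1]
    # even n: average the two values in the middle
    return (ordered[int(pos) - 1] + ordered[int(pos)]) / 2

print(statistics.mean(data))                                       # arithmetic mean
print(median_by_position(data), statistics.median(data))
print(median_by_position(even_data), statistics.median(even_data))
```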
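Variance (Sample): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Variance (Population) = same formula, other symbols ($s^2 \to \sigma^2$, $\bar{x} \to \mu$, $n-1 \to N$):
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Standard Deviation (Sample): $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
Standard Deviation (Population) = same formula, other symbols: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Basically, for both: standard deviation = $\sqrt{\text{variance}}$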
Coefficient of Variation: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Measures relative variation
Expresses the standard deviation as a percentage of the mean
Always in percent
Shows variation relative to the mean
Can be used to compare two or more sets of data measured in different units
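A sketch of the sample vs. population formulas using the standard `statistics` module (`variance`/`stdev` divide by n - 1, `pvariance`/`pstdev` divide by N), applied to the four ages from the Part 3 sampling example; the CV line simply follows the formula above.

```python
import statistics

data = [18, 20, 22, 24]  # ages used in the Part 3 sampling example

mean = statistics.mean(data)
s2 = statistics.variance(data)        # sample variance, divides by n - 1
s = statistics.stdev(data)            # sample standard deviation = sqrt(s2)
sigma2 = statistics.pvariance(data)   # population variance, divides by N
sigma = statistics.pstdev(data)       # population standard deviation
cv = s / mean * 100                   # coefficient of variation in percent

print(mean, s2, round(s, 3), sigma2, round(sigma, 3), round(cv, 2))
```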
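Covariance (Sample): $Cov(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
Covariance (Population) = same formula, other symbols: $\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Measures the direction of the linear relationship between two variables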
Results of Covariance:
$Cov(x, y) > 0$: x and y tend to move in the same direction
$Cov(x, y) = 0$: there is no linear relationship between x and y
$Cov(x, y) < 0$: x and y tend to move in opposite directions
Coefficient of Correlation (Sample): $r = \frac{Cov(x, y)}{s_x s_y}$
Coefficient of Correlation (Population) = same formula, other symbols: $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
Note: $-1 \le r \le 1$!!!!
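A sketch of the sample covariance and correlation formulas on made-up paired data; `sample_cov` is a hypothetical helper that mirrors the n - 1 formula above (Python 3.10+ also ships `statistics.covariance` and `statistics.correlation`).

```python
import statistics

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sample_cov(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

cov = sample_cov(x, y)
r = cov / (statistics.stdev(x) * statistics.stdev(y))   # r = Cov(x, y) / (s_x s_y)
print(cov, round(r, 4))
```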
Part 2
Discrete Random Variable = takes a countable number of values (toss a coin: X = number of heads => X is a discrete random variable)
$P(x) \ge 0$ for every x
$\sum_x P(x) = 1$
Cumulative Probability Function = shows the probability that X is equal to or smaller than $x_0$:
$F(x_0) = P(X \le x_0) = \sum_{x \le x_0} P(x)$
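Expected Value (or Mean) of a discrete random variable:
$\mu = E(X) = \sum_x x P(x)$
Variance:
$\sigma^2 = E\left[(X - \mu)^2\right] = \sum_x (x - \mu)^2 P(x)$
Standard Deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{(x_1 - \mu)^2 P(x_1) + (x_2 - \mu)^2 P(x_2) + \dots}$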
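A sketch of the discrete formulas with exact fractions, assuming a fair six-sided die as the (hypothetical) random variable; `pmf` is an illustrative name for the probability function P(x).

```python
from fractions import Fraction
import math

# Hypothetical example: X = number shown by a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# The two conditions for a probability function
assert all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

mu = sum(x * p for x, p in pmf.items())               # E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
sd = math.sqrt(var)
cdf_3 = sum(p for x, p in pmf.items() if x <= 3)      # F(3) = P(X <= 3)

print(mu, var, round(sd, 4), cdf_3)
```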
Functions of Random Variables
P(x) is the probability function of X; g(X) is a function of X, with expected value $E[g(X)] = \sum_x g(x) P(x)$
If $g(X) = X$ we get the formula for the mean
If $g(X) = (X - \mu)^2$ we get the formula for the variance
Special case: if X always takes the same value a (a constant), then the mean is a and the variance is 0
If there is a constant b in front of X, we just multiply it with the expected value:
$E(bX) = b\mu$
If there is a constant b in front of X, we square it and multiply it with the variance to get the variance of that expression:
$Var(bX) = b^2 \sigma^2$
Example:
Consider $Z = a + bX$, where X has mean $\mu_X$ and variance $\sigma_X^2$
=> $\mu_Z = E(a + bX) = a + b\mu_X$
=> $\sigma_Z^2 = Var(a + bX) = b^2 \sigma_X^2$
SPECIAL CASE!!!! (a bit complicated, but easy step by step)
$Z = \frac{X - \mu_X}{\sigma_X}$
Expected value of Z:
$\mu_Z = E\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{E(X) - \mu_X}{\sigma_X} = \frac{\mu_X - \mu_X}{\sigma_X} = \frac{0}{\sigma_X} = 0$
We are simply using the rules above, which let us pull the constants out of the expected value. Instead of a + bX we have the opposite form (X - a)/b, in which a = $\mu_X$ and b = $\sigma_X$, so the same rules apply.
Variance of Z:
$\sigma_Z^2 = Var\left(\frac{X - \mu_X}{\sigma_X}\right) = Var\left(\frac{1}{\sigma_X} X\right) = \left(\frac{1}{\sigma_X}\right)^2 Var(X) = \frac{\sigma_X^2}{\sigma_X^2} = 1$
Looks worse than it is; again we are just using the rules. First we separate the X value from the a value ($\mu_X$). We can simply drop it, because adding or subtracting a constant does not change the variance. Then we separate X from the b value ($\sigma_X$). To get the variance we square the factor $1/\sigma_X$ and multiply it with the variance of X. Because the variance of X and the squared b value are both $\sigma_X^2$ and we divide one by the other, we get 1.
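A sketch that checks the rules for Z = a + bX and for the standardized Z numerically, assuming a fair die as X and arbitrary constants a = 3, b = -2 (all hypothetical); exact `Fraction` arithmetic is used where possible, floats only once the square root $\sigma_X$ is needed.

```python
import math
from fractions import Fraction

# Hypothetical distribution for X: a fair die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g):
    """E[g(X)] = sum over x of g(x) * P(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mu_x = expect(lambda x: x)
var_x = expect(lambda x: (x - mu_x) ** 2)

# Z = a + bX with arbitrary constants
a, b = 3, -2
mu_z = expect(lambda x: a + b * x)
var_z = expect(lambda x: (a + b * x - mu_z) ** 2)
print(mu_z == a + b * mu_x, var_z == b ** 2 * var_x)

# Standardized Z = (X - mu_X) / sigma_X should have mean 0 and variance 1
sd_x = math.sqrt(var_x)
mean_std = expect(lambda x: (x - mu_x) / sd_x)
var_std = expect(lambda x: ((x - mu_x) / sd_x) ** 2)
print(round(mean_std, 12), round(var_std, 12))
```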
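Bernoulli Distribution:
just two possibilities: success/failure
P = probability of success
1 - P = probability of failure
Random variable X defined as 1 if success and 0 if failure
$P(X = 1) = P$ and $P(X = 0) = 1 - P$
Mean: $\mu = P$
The Variance: $\sigma^2 = P(1 - P)$
(For n independent Bernoulli trials, the number of successes has mean $\mu = nP$ and variance $\sigma^2 = nP(1 - P)$.)
The number of sequences with x successes in n trials:
$C_x^n = \frac{n!}{x!(n-x)!}$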
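Binomial Probability Distribution (n repeated Bernoulli trials):
Has to have a fixed number of trials n
The probabilities of success and failure add up to 1 and don't change during the experiment
The trials are independent of each other
$P(x) = \frac{n!}{x!(n-x)!} P^x (1 - P)^{n-x}$
=> probability of x successes in n trials with probability P of success on each trial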
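A sketch of the binomial probability function using `math.comb` for $n!/(x!(n-x)!)$, with hypothetical values n = 10 and P = 0.3; it also checks numerically that the mean is nP and the variance nP(1 - P).

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x) = n! / (x!(n-x)!) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3   # hypothetical number of trials and success probability
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
print(round(sum(probs), 10), round(mean, 10), round(var, 10))
```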
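Joint Probability Function
probability that X takes the specific value x and, at the same time, Y takes the value y, as a function of x and y:
$P(x, y) = P(X = x \cap Y = y)$
Marginal probabilities are:
$P(x) = \sum_y P(x, y)$
$P(y) = \sum_x P(x, y)$
Conditional Probability Function
Y takes the value y given that the value x is specified for X: $P(y \mid x) = \frac{P(x, y)}{P(x)}$
X takes the value x given that the value y is specified for Y: $P(x \mid y) = \frac{P(x, y)}{P(y)}$
(slightly different)
Independent when $P(x, y) = P(x)P(y)$ for all x and y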
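Covariance: measures the linear relationship
$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_x \sum_y (x - \mu_X)(y - \mu_Y) P(x, y)$
Correlation: $\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
$\rho = 0$: no linear relationship
$\rho > 0$: positive relationship => when X is high, Y tends to be high as well
$\rho < 0$: negative relationship => when X is high, Y tends to be low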
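A sketch of marginal, conditional, covariance and correlation calculations on a small hypothetical joint probability table (X and Y each 0 or 1); the table values are made up for illustration.

```python
import math
from fractions import Fraction as F

# Hypothetical joint probability table P(x, y)
joint = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(1, 2)}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # marginal P(x)
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # marginal P(y)
p_y_given_x1 = {y: joint[(1, y)] / p_x[1] for y in ys}  # conditional P(y | x = 1)

mu_x = sum(x * p for x, p in p_x.items())
mu_y = sum(y * p for y, p in p_y.items())
var_x = sum((x - mu_x) ** 2 * p for x, p in p_x.items())
var_y = sum((y - mu_y) ** 2 * p for y, p in p_y.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
rho = float(cov) / math.sqrt(var_x * var_y)

# Independence requires P(x, y) = P(x) P(y) for every cell
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x in xs for y in ys)
print(p_x, p_y, p_y_given_x1, cov, round(rho, 4), independent)
```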
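Cumulative Distribution Function (continuous random variable)
expresses the probability that X does not exceed the value x: $F(x) = P(X \le x)$
example: a and b are two values of X with a < b => $P(a < X < b) = F(b) - F(a)$
Mean: $\mu_X = E(X) = \int_{x_{min}}^{x_{max}} x f(x)\,dx$
Variance: $\sigma_X^2 = Var(X) = \int_{x_{min}}^{x_{max}} (x - E(X))^2 f(x)\,dx$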
Normal Distribution Function
looks like a bell
symmetric
Mean, median and mode are equal
Location is determined by the mean $\mu$ => changing it shifts the distribution to the left or right
Spread is determined by the standard deviation $\sigma$ => changing it spreads the curve out or makes it narrower
The random variable has an infinite range
Any normal distribution can be turned into the standardized normal distribution (Z): $Z = \frac{X - \mu}{\sigma}$ => the standardized normal distribution always has a mean of 0 and a variance of 1
Use Table 1 in the book to get the F(Z) value from a Z value
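Instead of looking up Table 1, F(Z) can be computed from the error function; a sketch, with `std_normal_cdf` and `normal_cdf` as hypothetical helper names and N(100, 10²) as a made-up example distribution.

```python
import math

def std_normal_cdf(z: float) -> float:
    """F(z) = P(Z <= z) for the standard normal distribution, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Standardize first, Z = (X - mu) / sigma, then take F(Z)."""
    return std_normal_cdf((x - mu) / sigma)

print(round(std_normal_cdf(1.96), 4))       # close to the table value 0.975
print(round(normal_cdf(110, 100, 10), 4))   # hypothetical X ~ N(100, 10^2): F(1)
```

Python 3.8+ also offers `statistics.NormalDist(mu, sigma).cdf(x)` for the same calculation.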
Joint Cumulative Distribution Function
Suppose X and Y are continuous random variables
The function is written as F(x, y)
It gives the probability that X is less than x and simultaneously Y is less than y:
$F(x, y) = P(X < x \cap Y < y)$
The random variables are independent if:
$P(X \le x, Y \le y) = P(X \le x) P(Y \le y)$, i.e. $F(x, y) = F_X(x) F_Y(y)$
Covariance: $Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Note: If X and Y are independent, their covariance is 0
Correlation: $Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
Rules for Random Variables
1. The mean of their difference is the difference of their means:
   $E(X - Y) = \mu_X - \mu_Y$
2. If the covariance between X and Y is 0, then the variance of their difference is:
   $Var(X - Y) = \sigma_X^2 + \sigma_Y^2$
3. If the covariance between X and Y is not 0, then the variance of their difference is:
   $Var(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2Cov(X, Y)$
Linear combination of X and Y (where a and b are constants):
$W = aX + bY$
The mean of W is:
$\mu_W = E(W) = E(aX + bY) = a\mu_X + b\mu_Y$
The variance of W is:
$\sigma_W^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,Corr(X, Y)\,\sigma_X\sigma_Y$
Note: if X and Y are (jointly) normally distributed, W is normally distributed as well
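A sketch that checks the rules exactly on a small hypothetical joint distribution, using arbitrary constants a = 2, b = -3; since $Corr(X, Y)\,\sigma_X\sigma_Y = Cov(X, Y)$, the variance rule is checked with the covariance directly to stay in exact `Fraction` arithmetic.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of (X, Y)
joint = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(1, 2)}

def E(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))

a, b = 2, -3   # arbitrary constants for W = aX + bY
mu_w = E(lambda x, y: a * x + b * y)
var_w = E(lambda x, y: (a * x + b * y - mu_w) ** 2)

rule_mean = mu_w == a * mu_x + b * mu_y
rule_var = var_w == a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov
rule_diff = E(lambda x, y: (x - y - (mu_x - mu_y)) ** 2) == var_x + var_y - 2 * cov
print(rule_mean, rule_var, rule_diff)
```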
Part 3
Descriptive Statistics: collecting, presenting and describing data
Inferential Statistics: drawing conclusions or making decisions about a population based on sample data
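Sampling Distributions
distribution of all possible values of a statistic (e.g. the sample mean) for samples of a given size drawn from a population
The steps to develop a sampling distribution:
1. List the given values (example: N = 4, X = age of the 4 individuals, values of X = 18, 20, 22, 24)
2. Calculate the mean and standard deviation (population)
   a. $\mu = \frac{18 + 20 + 22 + 24}{4} = 21$
   b. $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$
3. All possible samples of size n = 2 (drawn with replacement) in a table:

         18       20       22       24
   18    18,18    18,20    18,22    18,24
   20    20,18    20,20    20,22    20,24
   22    22,18    22,20    22,22    22,24
   24    24,18    24,20    24,22    24,24

4. Then draw a table of the sample means:

         18   20   22   24
   18    18   19   20   21
   20    19   20   21   22
   22    20   21   22   23
   24    21   22   23   24

5. => 16 sample means
6. Summary of the sampling distribution (here N = 16 sample means):
   a. $\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + 20 + 20 + 20 + 21 + 21 + 21 + 21 + 22 + 22 + 22 + 23 + 23 + 24}{16} = 21$
   b. $\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + (19 - 21)^2 + \dots + (24 - 21)^2}{16}} = 1.58$
7. Comparing the population and the sampling distribution:
   a. Population:
      i. N = 4
      ii. $\mu = 21$
      iii. $\sigma = 2.236$
   b. Sampling distribution of the mean:
      i. n = 2
      ii. $\mu_{\bar{X}} = 21$
      iii. $\sigma_{\bar{X}} = 1.58$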
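The seven steps can be reproduced by enumerating every sample; a sketch assuming, as in the tables, that the samples of size 2 are drawn with replacement (16 ordered pairs).

```python
from itertools import product
import statistics

population = [18, 20, 22, 24]   # ages from the example, N = 4
n = 2                           # sample size; samples drawn with replacement

samples = list(product(population, repeat=n))   # all 16 ordered samples
means = [statistics.mean(s) for s in samples]   # the 16 sample means

mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population sd (divide by N)
mu_xbar = statistics.mean(means)        # mean of the sampling distribution
sigma_xbar = statistics.pstdev(means)   # sd of the 16 means (divide by 16)

print(len(means), mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 2))
print(round(sigma / n ** 0.5, 2))       # standard error sigma / sqrt(n)
```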
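Expected Value of the Sample Mean Distribution
Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$, with expected value $E(\bar{X}) = \mu$
Standard Error of the Mean
Describes the variability in the mean:
decreases when the sample size increases
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
(example above: $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = \frac{2.236}{\sqrt{2}} = 1.58$)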
If the population is normal:
the sampling distribution of $\bar{X}$ is also normally distributed, with
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
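Z Value for Sample Mean Distributions
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
$\bar{X}$ = sample mean
$\mu$ = population mean
$\sigma$ = population standard deviation
n = sample size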
If the population is not normal:
the sampling distribution of $\bar{X}$ is approximately normal if n > 25 (central limit theorem)
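Example:
$\sigma = 3$
$\mu = 8$
$n = 36$
Probability that $\bar{X}$ lies between 7.8 and 8.2 = ?
n > 25 => approximately normal => $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
$\sigma_{\bar{X}} = \frac{3}{\sqrt{36}} = 0.5$
$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{0.5} < \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{8.2 - 8}{0.5}\right) = P(-0.4 < Z < 0.4)$
$= F(0.4) - (1 - F(0.4)) = 0.3108$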
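A sketch of the same calculation, with F(Z) computed from the error function instead of the table (the helper `std_normal_cdf` is a hypothetical name).

```python
import math

def std_normal_cdf(z: float) -> float:
    """F(z) for the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 8, 3, 36            # values from the example
se = sigma / math.sqrt(n)          # standard error of the mean = 0.5
z_low = (7.8 - mu) / se            # -0.4
z_high = (8.2 - mu) / se           # 0.4
prob = std_normal_cdf(z_high) - std_normal_cdf(z_low)
print(se, round(z_low, 2), round(z_high, 2), round(prob, 4))
```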
Sample Variance
Let $x_1, x_2, x_3, \dots, x_n$ be a random sample from a population:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
The square root s is called the sample standard deviation
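Chi-Square Distribution
depends on the degrees of freedom: d.f. = n - 1
Table 7
For a random sample from a normal population: $\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$
Example to find a probability:
A freezer has to hold its temperature with little variation
standard deviation of no more than 4 => $\sigma = 4$, $\sigma^2 = 16$
A sample of 14 freezers is tested => n = 14 => d.f. = 13
What is the probability that the sample variance exceeds 27.52? => $P(s^2 > 27.52) = ?$
$P(s^2 > 27.52) = P\left(\frac{(n-1)s^2}{\sigma^2} > \frac{(14-1)(27.52)}{16}\right) = P(\chi^2_{13} > 22.36) = 0.05$
Table 7: for d.f. = 13, the value 22.36 is listed under 0.05 => P = 0.05
Finding the Chi Value
n - 1 = 14 - 1 = 13 = d.f.
$\alpha = 0.05$
$\chi^2_{13, 0.05} = 22.36$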
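A table-free cross-check of the freezer example, as a sketch: the upper-tail chi-square probability for whole-number d.f. is computed with the closed-form series from Abramowitz & Stegun (26.4.4 for odd, 26.4.5 for even d.f.). This technique is swapped in for illustration and is not the book's method; `chi2_sf` is a hypothetical helper name.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x) for a whole-number df (A&S 26.4.4 / 26.4.5)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # odd df: 2Q(chi) + 2 Z(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)   # standard normal density at chi
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * total

sigma2, n, s2 = 16, 14, 27.52          # freezer example
stat = (n - 1) * s2 / sigma2           # = 22.36
print(round(stat, 2), round(chi2_sf(stat, n - 1), 4))
```

If SciPy is available, `scipy.stats.chi2.sf(stat, n - 1)` gives the same probability.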
Point and Interval Estimates
A point estimate is a single number
An interval estimate gives a range from a lower limit to an upper limit that is believed to contain the unknown parameter with a stated level of confidence; it is also known as a
confidence interval
If $P(a < \theta < b) = 1 - \alpha$, then the interval from a to b is called a $100(1 - \alpha)\%$ confidence interval of $\theta$
The quantity $(1 - \alpha)$ is called the confidence level of the interval ($\alpha$ between 0 and 1), written as $a < \theta < b$ with $100(1 - \alpha)\%$ confidence
Suppose the confidence level = 95%
also written as $(1 - \alpha) = 0.95$ => in repeated samples, 95% of all the intervals constructed this way will contain the unknown parameter
General formula: point estimate ± (reliability factor)(standard error)
Note: the reliability factor depends on the desired level of confidence
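t Distribution
Consider a sample of n observations
with mean $\bar{x}$ and standard deviation s
from a normally distributed population with mean $\mu$
Then the variable $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ follows a t distribution with n - 1 degrees of freedom
We use the t distribution when the population standard deviation $\sigma$ is unknown and use s (= sample standard deviation) instead
=> less accurate because we only use a sample, so the t distribution has heavier tails than the normal and gives wider intervals
Assumptions:
Population standard deviation is unknown
Population is normally distributed
If the population is not normal, use a bigger sample
Use the t Distribution
$(1 - \alpha)$ confidence interval estimate:
$\bar{x} - t_{n-1, \alpha/2} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1, \alpha/2} \frac{s}{\sqrt{n}}$
t depends on the degrees of freedom
use Table 8 for solving
Example
Sample n = 25, $\bar{x} = 50$, s = 8; form a 95% confidence interval for $\mu$
d.f. = 25 - 1 = 24, $(1 - \alpha) = 0.95$ => $\alpha = 0.05$, $\alpha/2 = 0.025$
=> $t_{24, 0.025} = 2.064$ => $50 \pm 2.064 \cdot \frac{8}{\sqrt{25}} = 50 \pm 3.302$ => $46.698 < \mu < 53.302$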
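A sketch of the t interval for the example; the critical value $t_{24, 0.025} = 2.064$ is taken from a t table and passed in rather than computed, and `t_confidence_interval` is a hypothetical helper name.

```python
import math

def t_confidence_interval(xbar, s, n, t_crit):
    """(1 - alpha) CI for mu: xbar ± t_{n-1, alpha/2} * s / sqrt(n).

    t_crit must be looked up (e.g. in Table 8) for n - 1 d.f. and alpha / 2.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example from the notes: n = 25, xbar = 50, s = 8, 95% confidence
low, high = t_confidence_interval(50, 8, 25, t_crit=2.064)   # t_{24, 0.025} from the table
print(round(low, 3), round(high, 3))
```

With SciPy installed, `scipy.stats.t.ppf(0.975, 24)` returns the critical value instead of the table lookup.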
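Confidence Interval for the Population Proportion
Standard deviation of the sample proportion: $\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}}$, estimated by $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Confidence interval:
$\hat{p} - Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < P < \hat{p} + Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
To explain it I use the following example:
Random sample of 100 people, 25 are left-handed; find the 95% confidence interval for the true proportion of left-handers
$\hat{p} = \frac{25}{100} = 0.25$, n = 100
$\alpha/2 = 0.025$ => 0.95 + 0.025 = 0.975 => Z table: look in the F(Z) column for 0.975 => $Z_{\alpha/2} = 1.96$
$0.25 - 1.96\sqrt{\frac{0.25(1-0.25)}{100}} < P < 0.25 + 1.96\sqrt{\frac{0.25(1-0.25)}{100}}$
$0.1651 < P < 0.3349$
We can interpret that solution as follows:
We are 95% confident that the true percentage of left-handers in the population lies between 16.51% and 33.49%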
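A sketch of the proportion interval for the left-hander example; `proportion_ci` is a hypothetical helper, with z = 1.96 as the default for 95% confidence.

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96):
    """CI for a population proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(25, 100)   # left-handers example, 95% confidence
print(round(low, 4), round(high, 4))
```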
Part 4
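Differences between Two Means
Goal:
form a confidence interval for the difference between two population means, $\mu_x - \mu_y$
The samples need to be unrelated and independent (one sample doesn't affect the other)
The point estimate is the difference between the sample means, $\bar{x} - \bar{y}$
If $\sigma_x^2$ and $\sigma_y^2$ are known, use $Z_{\alpha/2}$
If $\sigma_x^2$ and $\sigma_y^2$ are unknown, use the t distribution
$\sigma_x^2$ and $\sigma_y^2$ are known
Assumptions:
samples are random and independent
population distributions are normally distributed
population variances are known
$Var(\bar{X} - \bar{Y}) = \sigma^2_{\bar{X} - \bar{Y}} = \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}$
and $Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$ => standard normal distribution
The confidence interval is written as follows:
$(\bar{x} - \bar{y}) - Z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + Z_{\alpha/2} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$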
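$\sigma_x^2$ and $\sigma_y^2$ are unknown
Assumptions:
Samples are independent and random
Populations are normally distributed
Population variances are unknown and assumed unequal
Use a t distribution with v degrees of freedom:
$v = \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\frac{(s_x^2/n_x)^2}{n_x - 1} + \frac{(s_y^2/n_y)^2}{n_y - 1}}$
The confidence interval is described as follows:
$(\bar{x} - \bar{y}) - t_{v, \alpha/2} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{v, \alpha/2} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$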
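A sketch of the degrees-of-freedom formula v and the resulting interval; the sample summaries and the critical value are hypothetical inputs (in practice $t_{v, \alpha/2}$ comes from the t table), and `welch_df` / `diff_means_t_ci` are illustrative names.

```python
import math

def welch_df(sx2, nx, sy2, ny):
    """Degrees of freedom v for two samples with unknown, unequal variances."""
    a, b = sx2 / nx, sy2 / ny
    return (a + b) ** 2 / (a ** 2 / (nx - 1) + b ** 2 / (ny - 1))

def diff_means_t_ci(xbar, ybar, sx2, nx, sy2, ny, t_crit):
    """CI for mu_x - mu_y; t_crit = t_{v, alpha/2} taken from a t table."""
    margin = t_crit * math.sqrt(sx2 / nx + sy2 / ny)
    d = xbar - ybar
    return d - margin, d + margin

# Hypothetical sample summaries
v = welch_df(sx2=4.0, nx=10, sy2=9.0, ny=15)
print(round(v, 2))
```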
Confidence Interval for the Population Variance
Goal: form a confidence interval for the population variance $\sigma^2$
based on the sample variance $s^2$
The population is normally distributed
$\frac{(n-1)s^2}{\chi^2_{n-1, \alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1, 1-\alpha/2}}$
For explanation, here is an example:
Suppose:
Sample size: 17
Sample mean: 3004
Sample standard deviation: 74
Population is normal
Determine the 95% confidence interval for $\sigma^2$
First determine everything:
n - 1 = 17 - 1 = 16, $\frac{\alpha}{2} = \frac{1 - 0.95}{2} = 0.025$, $1 - \frac{\alpha}{2} = 0.975$, then find the chi-square values:
$\chi^2_{n-1, \alpha/2} = \chi^2_{16, 0.025} = 28.85$
$\chi^2_{n-1, 1-\alpha/2} = \chi^2_{16, 0.975} = 6.91$
Then find $s^2 = 74^2 = 5476$
Now fill it into the formula:
$\frac{(17-1) \cdot 74^2}{28.85} < \sigma^2 < \frac{(17-1) \cdot 74^2}{6.91}$ => $3037 < \sigma^2 < 12680$
=> Now we have the confidence interval for the population variance. To get the one for the standard deviation, we just take the square root and consider only the positive values as solutions => 55.1 and 112.6 are our limits, so we can formulate: We are 95% confident that the population standard deviation lies between 55.1 and 112.6
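A sketch of the variance interval for the example, with the two chi-square table values passed in as arguments (they are not computed here); `variance_ci` is a hypothetical helper name.

```python
import math

def variance_ci(n, s, chi2_upper, chi2_lower):
    """CI for sigma^2: (n-1)s^2 / chi2_{n-1, alpha/2} < sigma^2 < (n-1)s^2 / chi2_{n-1, 1-alpha/2}.

    chi2_upper = chi2_{n-1, alpha/2} (the larger table value),
    chi2_lower = chi2_{n-1, 1-alpha/2} (the smaller table value).
    """
    ss = (n - 1) * s ** 2
    return ss / chi2_upper, ss / chi2_lower

# Example from the notes: n = 17, s = 74, table values for 16 d.f. at 95% confidence
var_low, var_high = variance_ci(17, 74, chi2_upper=28.85, chi2_lower=6.91)
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)   # interval for sigma
print(round(var_low), round(var_high), round(sd_low, 1), round(sd_high, 1))
```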
Hypothesis Tests
Null Hypothesis
is always about a population parameter
The null hypothesis is the assumption we start from and treat as correct until the sample gives evidence against it (example: the mean number of TVs in an American household is three => $H_0: \mu = 3$)
refers to the status quo (like "not guilty")
always contains =, ≤ or ≥
may or may not be rejected
Alternative Hypothesis
assumes the opposite of $H_0$ (in our example: $H_1: \mu \ne 3$)
never contains =, ≤ or ≥; it always contains ≠, < or >
may or may not be supported
Example: the population mean age is 50 => $H_0: \mu = 50$
Now we select a sample and calculate the mean. Let's suppose it was $\bar{X} = 20$ => it is unlikely that the null hypothesis is true
Level of Significance
defines the rejection region of the sampling distribution
written as $\alpha$; typical values are 0.01, 0.05, 0.1
is selected by the researcher
provides the critical values
Types of tests (3 is an example for any number):
Two-tail test:
$H_0: \mu = 3$
$H_1: \mu \ne 3$
Upper-tail test:
$H_0: \mu \le 3$
$H_1: \mu > 3$
Lower-tail test:
$H_0: \mu \ge 3$
$H_1: \mu < 3$
Errors in Making Decisions
Type 1 Error = reject a true null hypothesis
the probability is $\alpha$, also called the level of significance
Type 2 Error = fail to reject a false null hypothesis
the probability is $\beta$
The actual situation is shown below:

Decision              H0 true                      H0 false
Do not reject H0      No error ($1 - \alpha$)      Type 2 Error ($\beta$)
Reject H0             Type 1 Error ($\alpha$)      No error ($1 - \beta$)
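Test of Hypothesis for the Mean ($\sigma$ Known)
Convert the sample result ($\bar{x}$) to a z value
Consider the test:
$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$
The decision rule is:
Reject $H_0$ if $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} > z_\alpha$
Alternate rule:
Reject $H_0$ if $\bar{x} > \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$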
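P-Value
Probability of obtaining a test statistic at least as extreme as the observed sample value, given that $H_0$ is true
also called the observed level of significance
shows the smallest value of $\alpha$ for which $H_0$ can be rejected
Convert the sample result (e.g. $\bar{x}$) to a test statistic (e.g. z statistic)
Example of an upper-tail test:
obtain the p-value
p-value $= P\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \mid H_0 \text{ is true}\right) = P\left(Z > \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \mid \mu = \mu_0\right)$
Decision rule: compare the p-value to $\alpha$
If p-value $< \alpha$, reject $H_0$
If p-value $\ge \alpha$, don't reject $H_0$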
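A sketch of the upper-tail z test with the p-value decision rule; all numbers (mu0 = 50, sigma = 10, n = 25, observed mean 53) are hypothetical, and the helper names are illustrative.

```python
import math

def std_normal_cdf(z: float) -> float:
    """F(z) for the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def upper_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Upper-tail test of H0: mu = mu0 vs H1: mu > mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 1 - std_normal_cdf(z)       # P(Z > z | H0 true)
    return z, p_value, p_value < alpha     # reject H0 if p-value < alpha

# Hypothetical numbers: mu0 = 50, sigma = 10, n = 25, observed xbar = 53
z, p, reject = upper_tail_z_test(53, 50, 10, 25)
print(z, round(p, 4), reject)
```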
One-Tail Test
the alternative hypothesis focuses on one direction
if $H_1$ says "greater than" ($\mu > \mu_0$), it is an upper-tail test
if $H_1$ says "less than" ($\mu < \mu_0$), it is a lower-tail test
lower- and upper-tail tests have just one critical value, since the rejection area is in only one tail
Two-Tail Test
two critical values defining the two areas of rejection
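t Test of Hypothesis for the Mean ($\sigma$ Unknown)
convert the sample result ($\bar{x}$) to a t test statistic
Consider the test:
$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$
The decision rule is:
Reject $H_0$ if $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} > t_{n-1, \alpha}$
For a two-tailed test:
$H_0: \mu = \mu_0$
$H_1: \mu \ne \mu_0$
The decision rule is:
Reject $H_0$ if $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} < -t_{n-1, \alpha/2}$ or if $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} > t_{n-1, \alpha/2}$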
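A sketch of the two-tailed t test decision rule; the test (H0: mu = 46 with n = 25, mean 50, s = 8) is hypothetical, and the critical value $t_{24, 0.025} = 2.064$ is passed in from the t table rather than computed.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))

def two_tailed_reject(t, t_crit):
    """Reject H0: mu = mu0 if t < -t_crit or t > t_crit (t_crit = t_{n-1, alpha/2})."""
    return t < -t_crit or t > t_crit

# Hypothetical test: n = 25, xbar = 50, s = 8, H0: mu = 46, alpha = 0.05
t = t_statistic(50, 46, 8, 25)
print(t, two_tailed_reject(t, t_crit=2.064))   # t_{24, 0.025} from the t table
```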
Test of the Population Proportion
involves categorical values
two outcomes:
success (a certain characteristic is present)
failure (a certain characteristic is not present)
The proportion of the population is written as P
The sample size is large
The sample proportion of successes is written as $\hat{p}$:
$\hat{p} = \frac{x}{n} = \frac{\text{number of successes in the sample}}{\text{sample size}}$
If $nP(1-P) > 9$, $\hat{p}$ can be seen as approximately normally distributed
Therefore mean $\mu_{\hat{p}} = P$
and standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}}$
Hypothesis Test for a Proportion ($nP(1-P) > 9$)
Z value, because $\hat{p}$ is approximately normally distributed:
$Z = \frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}}$
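A sketch of the proportion z statistic that also enforces the $nP(1-P) > 9$ condition from the notes; the data (25 successes in 100 trials, $H_0: P = 0.2$) and the helper name `proportion_z_test` are hypothetical.

```python
import math

def proportion_z_test(successes, n, p0):
    """Z = (p_hat - P0) / sqrt(P0 (1 - P0) / n); requires n P0 (1 - P0) > 9."""
    if n * p0 * (1 - p0) <= 9:
        raise ValueError("normal approximation not justified: n*P0*(1-P0) <= 9")
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Hypothetical: 25 successes in 100 trials, H0: P = 0.2
z = proportion_z_test(25, 100, 0.2)
print(round(z, 4))
```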