
Summary

Part 1

Population = all items or individuals of interest (size N)
Sample = part of the population (size n)

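Types of Data:
1. Categorical = responses that fall into categories (e.g. eye colour, level of satisfaction ...)
a) Nominal = categories without an order (e.g. yes/no answers)
b) Ordinal = ordered categories (e.g. 1. poor, 2. average, 3. good)

2. Numerical
a) Discrete = counting (whole numbers, e.g. age in full years)
b) Continuous = measurement (e.g. weight)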

Cumulative Frequency Distribution = sum of the frequencies (or percentages) up to a certain point
e.g.

Class                  Frequency   Percentage   Cumulative Fr.   Cumulative P.
10 but less than 20    3           15%          3                15%
20 but less than 30    6           30%          9                45%
30 but less than 40    5           25%          14               70%

Cumulative Fr. = adding the frequency values; Cumulative P. = adding the percentage values
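The running totals in the table can be reproduced with a short Python sketch. It assumes a total of 20 observations (implied by 3 observations = 15%); the notes do not show the remaining classes, so only the three listed ones are used.

```python
from itertools import accumulate

# Frequencies of the three classes listed in the table
frequencies = [3, 6, 5]
total = 20  # assumption: total count implied by 3 observations = 15%

# Percentage of all observations that fall in each class
percentages = [100 * f / total for f in frequencies]

# Cumulative values are running sums ("adding fr. values" / "adding p. values")
cumulative_freq = list(accumulate(frequencies))
cumulative_pct = list(accumulate(percentages))

print(cumulative_freq)  # [3, 9, 14]
print(cumulative_pct)   # [15.0, 45.0, 70.0]
```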

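Arithmetic Mean (Population) = $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$
$x_i$ = i-th value of the variable X; $x_1, x_2, x_3, \dots$ = population values

Arithmetic Mean (Sample) = $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
$x_1, x_2, x_3, \dots$ = sample values

Median = value which stands in the middle of the ordered data (e.g. 1, 2, 2, 3, 3, 4, 5: median is 3)
Position also calculated by: $\frac{n+1}{2}$
Note: If there is an even amount of numbers, the average of the two in the middle is the median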
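A minimal Python sketch of the mean and median rules, reusing the example values from the median note; the helper name `median` is just an illustration.

```python
# Example values from the median note above
data = [1, 2, 2, 3, 3, 4, 5]

# Arithmetic mean: sum of all values divided by how many there are
mean = sum(data) / len(data)

def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the value at position (n + 1) / 2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(data))          # 3
print(median([1, 2, 3, 4]))  # 2.5
```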

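Variance (Sample) = $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Variance (Population) = same formula, other symbols ($s^2 \to \sigma^2$, $\bar{x} \to \mu$, $n-1 \to N$):
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Standard Deviation (Sample) = $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
Standard Deviation (Population) = same formula, other symbols: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Basically, for both: standard deviation = $\sqrt{\text{variance}}$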
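The only difference between the sample and the population formulas is the divisor ($n-1$ versus $N$). A small Python sketch, reusing the ages from Part 3 as data, computes both by hand and cross-checks them against the standard library's `statistics.variance` (divides by $n-1$) and `statistics.pvariance` (divides by $N$).

```python
import math
import statistics

values = [18, 20, 22, 24]  # the ages used in Part 3

mean = sum(values) / len(values)
squared_dev = sum((x - mean) ** 2 for x in values)

sample_var = squared_dev / (len(values) - 1)  # divide by n - 1
population_var = squared_dev / len(values)    # divide by N

sample_sd = math.sqrt(sample_var)
population_sd = math.sqrt(population_var)

# The standard library follows the same two conventions
assert math.isclose(sample_var, statistics.variance(values))
assert math.isclose(population_var, statistics.pvariance(values))

print(round(population_sd, 3))  # 2.236
```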

Coefficient of Variation = $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$

Measures relative variation
Expresses the standard deviation as a percentage of the mean
Always in percent
Shows variation relative to the mean
Can be used to compare two or more sets of data measured in different units
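To illustrate the comparison across units, here is a Python sketch with hypothetical weight and height data that have the same standard deviation but different means; the function name `coefficient_of_variation` is illustrative.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

# Hypothetical data measured in two different units
weights_kg = [60.0, 65.0, 70.0, 75.0]
heights_cm = [160.0, 165.0, 170.0, 175.0]

print(round(coefficient_of_variation(weights_kg), 2))
print(round(coefficient_of_variation(heights_cm), 2))
```

Both samples have the same absolute spread, but the CV shows that the weights vary more relative to their mean (about 9.56% versus 3.85%).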
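Covariance (Sample) = $\mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
Covariance (Population) = same formula, other symbols: $\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Measures the direction of the linear relationship between two variables (its size depends on the units of x and y)

Results of Covariance: Cov(x,y) > 0 = x and y tend to move in the same direction
Cov(x,y) = 0 = there is no linear relationship between x and y
Cov(x,y) < 0 = x and y tend to move in the opposite direction

Coefficient of Correlation (Sample): $r = \frac{\mathrm{Cov}(x, y)}{s_x s_y}$
Coefficient of Correlation (Population): same formula, other symbols: $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
Note: $-1 \le r \le 1$ !!!!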
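A minimal Python sketch of the sample covariance and correlation formulas; the paired data are hypothetical. It uses the fact that $\mathrm{Cov}(x, x)$ is the sample variance $s_x^2$.

```python
import math

def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)"""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """r = Cov(x, y) / (s_x * s_y); always between -1 and 1."""
    s_x = math.sqrt(sample_covariance(x, x))  # Cov(x, x) is the sample variance of x
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical paired data: hours studied and exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

print(sample_covariance(hours, score))   # positive: same direction
print(sample_correlation(hours, score))  # close to +1: strong positive relationship
```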

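Part 2

Discrete Random Variable = takes a countable number of values (toss a coin: X = number of heads => X is a discrete random variable)
$P(x) \ge 0$ for any value x
$\sum_x P(x) = 1$

Cumulative Probability Function = shows the probability that X is equal to or smaller than $x_0$:
$F(x_0) = P(X \le x_0)$

Expected Value (or Mean) of a discrete random variable:
$\mu = E(X) = \sum_x x P(x)$ => example: $E(X) = x_1 P(x_1) + x_2 P(x_2) + \dots$

Variance:
$\sigma^2 = E[(X - \mu)^2] = \sum_x (x - \mu)^2 P(x)$

Standard Deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_x (x - \mu)^2 P(x)}$ => $\sigma = \sqrt{(x_1 - \mu)^2 P(x_1) + (x_2 - \mu)^2 P(x_2) + \dots}$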
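A Python sketch of these definitions for a hypothetical example (a fair coin tossed twice, X = number of heads), storing the probability function as a dictionary from values to probabilities.

```python
import math

# Hypothetical example: toss a fair coin twice, X = number of heads
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

assert all(p >= 0 for p in pmf.values())     # P(x) >= 0
assert math.isclose(sum(pmf.values()), 1.0)  # sum of P(x) = 1

def cdf(x0):
    """F(x0) = P(X <= x0)"""
    return sum(p for x, p in pmf.items() if x <= x0)

mean = sum(x * p for x, p in pmf.items())                    # E(X) = sum x P(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # sum (x - mu)^2 P(x)
std_dev = math.sqrt(variance)

print(cdf(1), mean, variance)  # 0.75 1.0 0.5
```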

Functions of Random Variables
P(x) is the probability function of X; g(X) is a function of X

Expected Value: $E[g(X)] = \sum_x g(x) P(x)$

If $g(X) = X$, we get the normal expected value (the mean) $E(X)$
If $g(X) = (X - \mu)^2$, we get the formula for the variance
Special case: if X always takes the same value a (a constant), then the mean is a and the variance is 0: $E(a) = a$, $\mathrm{Var}(a) = 0$
If there is a constant b in front of X, we just multiply the mean by it to get the expected value:
$E(bX) = b\mu_X$

If there is a constant b in front of X, we square it and multiply it with the variance of X to get the variance:
$\mathrm{Var}(bX) = b^2 \sigma_X^2$

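Example:
Consider $Z = a + bX$, where X has a mean of $\mu_X$ and a variance of $\sigma_X^2$
=>
$\mu_Z = E(a + bX) = a + b\mu_X$
$\sigma_Z^2 = \mathrm{Var}(a + bX) = b^2 \sigma_X^2$ => standard deviation of Z: $\sigma_Z = |b| \sigma_X$

SPECIAL CASE!!!! (a bit complicated, but easy step by step)
$Z = \frac{X - \mu_X}{\sigma_X}$

Expected Value of Z:
$\mu_Z = E\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{E(X) - \mu_X}{\sigma_X} = \frac{\mu_X - \mu_X}{\sigma_X} = \frac{0}{\sigma_X} = 0$
We are simply using the rule that we can separate the variable X from the other constants. Instead of a + bX we have the form (X - a)/b, in which $a = \mu_X$ and $b = \sigma_X$ => we can use the rule

Variance of Z:
$\sigma_Z^2 = \mathrm{Var}\left(\frac{X - \mu_X}{\sigma_X}\right) = \mathrm{Var}\left(\frac{X}{\sigma_X} - \frac{\mu_X}{\sigma_X}\right) = \mathrm{Var}\left(\frac{1}{\sigma_X} X\right) = \left(\frac{1}{\sigma_X}\right)^2 \mathrm{Var}(X) = \frac{\sigma_X^2}{\sigma_X^2} = 1$
Looks worse than it is; again we are using the rules. First we separate the X value from the constant $\mu_X/\sigma_X$. Then we can just let the constant fall away, because adding or subtracting a constant does not change the variance. Then we take X separately from the b value ($1/\sigma_X$). To get the variance, we square the b value and multiply it with the variance of X. Because the squared b value is $1/\sigma_X^2$ and the variance of X is $\sigma_X^2$, they cancel and we get the value 1.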
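These rules can be checked numerically with $E[g(X)] = \sum_x g(x)P(x)$. The Python sketch below uses a hypothetical discrete distribution and hypothetical constants a and b; it confirms $E(a+bX) = a + b\mu_X$, $\mathrm{Var}(a+bX) = b^2\sigma_X^2$, and that the standardized Z has mean 0 and variance 1.

```python
import math

# Hypothetical discrete distribution for X (value -> probability)
pmf_x = {2: 0.2, 4: 0.5, 9: 0.3}

def expected(pmf, g=lambda x: x):
    """E(g(X)) = sum of g(x) P(x)"""
    return sum(g(x) * p for x, p in pmf.items())

mu_x = expected(pmf_x)
sigma_x = math.sqrt(expected(pmf_x, lambda x: (x - mu_x) ** 2))

# Linear transformation W = a + bX with hypothetical constants
a, b = 3.0, -2.0
mu_w = expected(pmf_x, lambda x: a + b * x)
var_w = expected(pmf_x, lambda x: (a + b * x - mu_w) ** 2)

# Standardization Z = (X - mu_X) / sigma_X, evaluated with the same probabilities
mu_z = expected(pmf_x, lambda x: (x - mu_x) / sigma_x)
var_z = expected(pmf_x, lambda x: ((x - mu_x) / sigma_x - mu_z) ** 2)
```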

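Bernoulli Distribution:
just two possibilities: success/failure
P = probability of success
1 - P = probability of failure
Random variable X is defined as 1 if success and 0 if failure
P(X = 1) = P and P(X = 0) = 1 - P

Mean: $\mu = P$
The Variance: $\sigma^2 = P(1 - P)$
For n independent trials (binomial distribution, below): $\mu = nP$ and $\sigma^2 = nP(1 - P)$

The number of sequences with x successes in n trials:
$C_x^n = \frac{n!}{x!(n - x)!}$

Binomial Probability Distribution (n repeated Bernoulli trials):
Has to have a fixed number of trials n
P of success and failure add up to 1 and do not change during the experiment
The trials are independent of each other
$P(x) = \frac{n!}{x!(n - x)!} P^x (1 - P)^{n - x}$
=> probability of x successes in n trials, with a probability of success P on each trial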
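A short Python sketch of the binomial probability function using `math.comb` (Python 3.8+) for the number of sequences; n = 10 and P = 0.3 are hypothetical. Summing over all x confirms that the probabilities add up to 1 and that the mean and variance equal nP and nP(1 - P).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = n! / (x!(n - x)!) * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # hypothetical: 10 trials, probability of success 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * q for x, q in enumerate(probs))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(probs))

print(comb(5, 2))  # 10 sequences with 2 successes in 5 trials
```

With n = 1 the function reduces to the Bernoulli case: `binomial_pmf(1, 1, p)` is just p.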

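Joint Probability Function
X takes the specific value x and (at the same time) Y takes the value y, as a function of x and y:
$P(x, y) = P(X = x \cap Y = y)$

Marginal Probabilities are:
$P(x) = \sum_y P(x, y)$
$P(y) = \sum_x P(x, y)$

Conditional Probability Function
Y takes the value y, given that the value x is specified for X: $P(y \mid x) = \frac{P(x, y)}{P(x)}$
X takes the value x, given that the value y is specified for Y: $P(x \mid y) = \frac{P(x, y)}{P(y)}$ (slightly different)
Independent when $P(x, y) = P(x)P(y)$ for all x and y

Covariance: the direction of the linear relationship
$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_x \sum_y (x - \mu_X)(y - \mu_Y) P(x, y)$
Correlation: $\rho = \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$

ρ = 0: no linear relationship
ρ > 0: positive relationship => when X is high, Y tends to be high as well
ρ < 0: negative relationship => when X is high, Y tends to be low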
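All of these quantities can be read off a joint probability table. The Python sketch below uses a hypothetical 2 x 2 table stored as a dictionary keyed by (x, y); it computes the marginals, a conditional probability, the covariance, the correlation, and checks the independence condition.

```python
import math

# Hypothetical joint probability table P(x, y)
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.10, (1, 1): 0.40,
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginal probabilities: sum the joint probabilities over the other variable
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

def p_y_given_x(y, x):
    """Conditional probability P(y | x) = P(x, y) / P(x)"""
    return joint[(x, y)] / p_x[x]

mu_x = sum(x * p for x, p in p_x.items())
mu_y = sum(y * p for y, p in p_y.items())
sigma_x = math.sqrt(sum((x - mu_x) ** 2 * p for x, p in p_x.items()))
sigma_y = math.sqrt(sum((y - mu_y) ** 2 * p for y, p in p_y.items()))

cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
rho = cov / (sigma_x * sigma_y)

# Independent only if P(x, y) = P(x) P(y) for every pair
independent = all(math.isclose(joint[(x, y)], p_x[x] * p_y[y]) for x in xs for y in ys)
```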

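Cumulative Distribution Function (continuous random variable with density f(x))
expresses the probability that X does not exceed the value x: $F(x) = P(X \le x)$
example: for two values a and b of X with a < b => $P(a < X < b) = F(b) - F(a)$
Mean: $\mu_X = E(X) = \int_{x_{min}}^{x_{max}} x f(x)\,dx$
Variance: $\sigma_X^2 = \mathrm{Var}(X) = \int_{x_{min}}^{x_{max}} (x - E(X))^2 f(x)\,dx$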
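The integrals can be approximated numerically. This Python sketch assumes a hypothetical density $f(x) = 2x$ on [0, 1] (exact values: mean 2/3, variance 1/18, $F(x) = x^2$) and uses a simple midpoint rule, which is an approximation rather than exact integration.

```python
def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func from a to b."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width

def density(x):
    return 2 * x  # hypothetical density f(x) = 2x on [0, 1]

x_min, x_max = 0.0, 1.0
mean = integrate(lambda x: x * density(x), x_min, x_max)
variance = integrate(lambda x: (x - mean) ** 2 * density(x), x_min, x_max)

def cdf(x):
    """F(x) = P(X <= x): the integral of the density from x_min to x"""
    return integrate(density, x_min, x)

prob_between = cdf(0.75) - cdf(0.25)  # P(0.25 < X < 0.75) = F(b) - F(a)
```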

Normal Distribution Function
looks like a bell
symmetric
Mean, median and mode are equal
Location is determined by μ => changing it shifts the distribution to the left or right
Spread is determined by σ => changing it spreads or narrows the curve
The random variable has an infinite range
Any normal distribution can be turned into a standardized normal distribution (Z): $Z = \frac{X - \mu}{\sigma}$ => the standardized normal distribution always has a mean of 0 and a variance of 1
Use Table 1 in the book to get the F(Z) value from a Z value
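Python's standard library (3.8+) provides `statistics.NormalDist`, whose `cdf` method can stand in for Table 1. The sketch uses a hypothetical N(100, 15) distribution and shows that standardizing first gives the same probability as evaluating the original distribution directly.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical normal distribution
x = 130

z = (x - mu) / sigma                     # standardize: Z = (X - mu) / sigma
p_from_z = NormalDist().cdf(z)           # F(Z) from the standard normal (like Table 1)
p_direct = NormalDist(mu, sigma).cdf(x)  # same probability without standardizing

print(z, round(p_from_z, 4))  # 2.0 0.9772
```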

Joint Cumulative Distribution Function
Suppose X and Y are continuous random variables
The function is written: F(x, y)
It gives the probability that X is less than x and simultaneously Y is less than y:
$F(x, y) = P(X < x \cap Y < y)$
The random variables are independent if:
$P(X \le x, Y \le y) = P(X \le x)P(Y \le y)$, i.e. $F(x, y) = F(x)F(y)$
Covariance = $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Note: If X and Y are independent, their covariance will be 0
Correlation = $\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
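Rules for Random Variables
1. The mean of their difference is the difference of their means:
$E(X - Y) = \mu_X - \mu_Y$
2. If the covariance between X and Y is 0, then the variance of their difference is:
$\mathrm{Var}(X - Y) = \sigma_X^2 + \sigma_Y^2$
3. If the covariance between X and Y is not 0, then the variance of their difference is:
$\mathrm{Var}(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\mathrm{Cov}(X, Y)$
Linear combination of X and Y (where a and b are constants):
W = aX + bY
the mean of W is:
$\mu_W = E(W) = E(aX + bY) = a\mu_X + b\mu_Y$
the variance of W is: $\sigma_W^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\mathrm{Corr}(X, Y)\sigma_X\sigma_Y$
Note: if X and Y are jointly normally distributed, W is normally distributed as well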
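The difference rules are the special case a = 1, b = -1 of the linear combination. A Python sketch with hypothetical means, standard deviations and correlation, using $\mathrm{Cov}(X, Y) = \mathrm{Corr}(X, Y)\sigma_X\sigma_Y$:

```python
import math

def mean_of_combination(a, b, mu_x, mu_y):
    """mu_W = a * mu_X + b * mu_Y"""
    return a * mu_x + b * mu_y

def variance_of_combination(a, b, sigma_x, sigma_y, corr):
    """sigma_W^2 = a^2 sigma_X^2 + b^2 sigma_Y^2 + 2ab Corr(X, Y) sigma_X sigma_Y"""
    return a**2 * sigma_x**2 + b**2 * sigma_y**2 + 2 * a * b * corr * sigma_x * sigma_y

# Hypothetical values
mu_x, mu_y = 10.0, 4.0
sigma_x, sigma_y = 3.0, 2.0
corr = 0.5
cov = corr * sigma_x * sigma_y  # Cov(X, Y) = Corr(X, Y) sigma_X sigma_Y

# Difference X - Y: a = 1, b = -1
diff_mean = mean_of_combination(1, -1, mu_x, mu_y)
diff_var = variance_of_combination(1, -1, sigma_x, sigma_y, corr)
```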

Part 3
Descriptive Statistics: collecting, presenting and describing data
Inferential Statistics: drawing conclusions or making decisions concerning a population based on sample data

Sampling Distributions
distribution of all possible values of a statistic (e.g. the sample mean) for a given sample size taken from a population
The steps to develop a sampling distribution:
1. List the given values (example: N = 4, X = age of the 4 individuals, values of X = 18, 20, 22, 24)
2. Calculate the mean and standard deviation (population)
a. $\mu = \frac{18 + 20 + 22 + 24}{4} = 21$
b. $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$
3. All possible samples of size n = 2 (drawn with replacement) in a table (row = 1st value, column = 2nd value):

        18       20       22       24
18      18,18    18,20    18,22    18,24
20      20,18    20,20    20,22    20,24
22      22,18    22,20    22,22    22,24
24      24,18    24,20    24,22    24,24

4. Then draw a table of the sample means:

        18    20    22    24
18      18    19    20    21
20      19    20    21    22
22      20    21    22    23
24      21    22    23    24
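5. => 16 sample means
6. Summary of the sampling distribution (here N = 16 sample means):
a. $\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + 20 + 20 + 20 + 21 + 21 + 21 + 21 + 22 + 22 + 22 + 23 + 23 + 24}{16} = 21$
b. $\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + (19 - 21)^2 + \dots + (24 - 21)^2}{16}} = 1.58$
7. Comparing the population and the sampling distribution:
a. Population:
i. N = 4
ii. μ = 21
iii. σ = 2.236
b. Sampling distribution of the mean:
i. n = 2
ii. $\mu_{\bar{X}} = 21$
iii. $\sigma_{\bar{X}} = 1.58$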
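Steps 3 to 7 can be reproduced by brute force. This Python sketch enumerates all ordered samples of size 2 with `itertools.product` (sampling with replacement, as in the table), computes every sample mean, and compares the population with the sampling distribution; `pop_sd` is an illustrative helper that divides by the number of values.

```python
import math
from itertools import product

population = [18, 20, 22, 24]  # ages from step 1
n = 2

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    """Population standard deviation: divide by the number of values."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Step 3: all ordered samples of size 2, drawn with replacement (4 * 4 = 16)
samples = list(product(population, repeat=n))
# Step 4: the mean of each sample
sample_means = [mean(s) for s in samples]

print(len(sample_means))                                  # 16
print(mean(population), round(pop_sd(population), 3))     # 21.0 2.236
print(mean(sample_means), round(pop_sd(sample_means), 2)) # 21.0 1.58
```

The last line also matches the standard error formula: 2.236 / √2 ≈ 1.58.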

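Expected Value of the Sample Mean Distribution
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $E(\bar{X}) = \mu$

Standard Error of the Mean
Describes the variability in the mean:
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Decreases when the sample size increases
(In the example above: $\mu_{\bar{X}} = 21$ and $\sigma_{\bar{X}} = 2.236/\sqrt{2} = 1.58$)

If the Population is Normal
the sampling distribution is also normally distributed with
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$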

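Z Value for Sample Mean Distributions
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

$\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size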

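If the Population is not Normal
the sampling distribution is approximately normal if n > 25 (central limit theorem)
Example:
σ = 3
μ = 8
n = 36
Probability that $\bar{X}$ is between 7.8 and 8.2 = ?

n > 25 => approximately normal => $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
$\sigma_{\bar{X}} = \frac{3}{\sqrt{36}} = 0.5$

$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{0.5} < \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} < \frac{8.2 - 8}{0.5}\right) = P(-0.4 < Z < 0.4)$

$= F(0.4) - (1 - F(0.4)) = 0.3108$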
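The same probability computed in Python, with `statistics.NormalDist()` (mean 0, standard deviation 1) replacing the table lookup; the numbers are those of the example.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8, 3, 36      # values from the example
se = sigma / math.sqrt(n)    # standard error of the mean = 0.5

z_low = (7.8 - mu) / se      # -0.4
z_high = (8.2 - mu) / se     # 0.4

std_normal = NormalDist()    # standard normal distribution
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(prob, 4))  # 0.3108
```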

Sample Variance
$x_1, x_2, x_3, \dots, x_n$ are a random sample of the population
$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
the square root is called the sample standard deviation

Chi-Square Distribution
depends on the degrees of freedom = n - 1 = d.f.
Table 7
$\chi^2_{n-1} = \frac{(n - 1)s^2}{\sigma^2}$
Example to find a probability:
A freezer has to hold its temperature with little variation
standard deviation of no more than 4 => σ = 4
A sample of 14 freezers is tested => n = 14 => d.f. = 13
What is the probability that the sample variance exceeds 27.52? => $P(s^2 > 27.52) = ?$
$P(s^2 > 27.52) = P\left(\frac{(n - 1)s^2}{\sigma^2} > \frac{(14 - 1) \cdot 27.52}{16}\right) = P(\chi^2_{13} > 22.36) = 0.05$

Table 7: d.f. = 13 and 22.36 as the $\chi^2$ value => P = 0.05
Finding the Chi-Square Value
n - 1 = 14 - 1 = 13 = d.f.
α = 0.05
$\chi^2_{13, 0.05} = 22.36$
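A small Python sketch of the standardization step in the freezer example. The standard library has no chi-square distribution, so the critical value is the Table 7 value quoted in the notes; `CHI_SQ_13_005` is a hypothetical constant name.

```python
n = 14
sigma = 4.0
s_squared_cutoff = 27.52

df = n - 1
chi_sq_stat = df * s_squared_cutoff / sigma**2  # (n - 1) s^2 / sigma^2

# Critical value for d.f. = 13 and alpha = 0.05, as read from Table 7 in the notes
CHI_SQ_13_005 = 22.36

print(df, round(chi_sq_stat, 2))  # 13 22.36
```

Because the computed statistic equals the tabled critical value, $P(s^2 > 27.52) = 0.05$. If SciPy happens to be available, `scipy.stats.chi2.sf(22.36, 13)` should give approximately the same tail probability.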

Point and Interval Estimates
A point estimate is a single number
An interval estimate gives a range from a lower value to an upper value that are both still plausible for the unknown parameter, also known as a confidence interval
If $P(a < \theta < b) = 1 - \alpha$, then the interval from a to b is called a $100(1 - \alpha)\%$ confidence interval of θ
The quantity $(1 - \alpha)$ is called the confidence level of the interval (α between 0 and 1), written as $a < \theta < b$ with $100(1 - \alpha)\%$ confidence
Suppose the confidence level = 95%,
also written as $(1 - \alpha) = 0.95$ => from repeated samples, 95% of all the intervals will contain the unknown parameter
General Formula: Point estimate ± (Reliability Factor)(Standard Error)
Note: the reliability factor depends on the desired level of confidence
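
t Distribution
Consider a sample of n observations
with mean $\bar{x}$ and standard deviation s
from a normally distributed population with mean μ
Then the variable $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the t distribution with n - 1 degrees of freedom

We use the t distribution when the population standard deviation σ is unknown and use s instead (s = sample standard deviation)
=> less accurate, because s is only estimated from a sample (the intervals become wider)
Assumptions:
Population standard deviation is unknown
Population is normally distributed
If the population is not normal, use a bigger sample
Use the t distribution:
$(1 - \alpha)$ Confidence Interval Estimate:
$\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$
t depends on the degrees of freedom
use Table 8 for solving
Example
Sample n = 25, $\bar{x} = 50$, s = 8; form a 95% confidence interval for μ
d.f. = 25 - 1 = 24, $(1 - \alpha) = 0.95$ => α = 0.05 => α/2 = 0.025
$t_{n-1,\alpha/2} = t_{24,0.025} = 2.0639$
$50 - (2.0639)\frac{8}{\sqrt{25}} < \mu < 50 + (2.0639)\frac{8}{\sqrt{25}}$
$46.69776 < \mu < 53.30224$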
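The interval arithmetic as a Python sketch, following the general formula point estimate ± (reliability factor)(standard error). The t value is the Table 8 value from the example, since the standard library has no t distribution.

```python
import math

n, x_bar, s = 25, 50.0, 8.0
t_crit = 2.0639  # t_{24, 0.025} from Table 8

margin = t_crit * s / math.sqrt(n)  # reliability factor * standard error
lower, upper = x_bar - margin, x_bar + margin

print(round(lower, 5), round(upper, 5))  # 46.69776 53.30224
```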

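Confidence Interval for the Population Proportion
$\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}}$ => estimated with the sample proportion: $\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

To explain, I use the following example:
Random sample of 100 people, 25 are left-handed; find the 95% confidence interval for the true proportion of left-handers
$\hat{p} = \frac{25}{100} = 0.25$, n = 100
$z_{\alpha/2}$ => 1 - 0.95 = 0.05 => 0.05/2 = 0.025 => 0.95 + 0.025 = 0.975 => Z table: look in F(Z) for 0.975 => 1.96
$0.25 - 1.96\sqrt{\frac{0.25(1 - 0.25)}{100}} < P < 0.25 + 1.96\sqrt{\frac{0.25(1 - 0.25)}{100}}$
$0.1651 < P < 0.3349$
We can interpret the solution as follows:
We are 95% confident that the true percentage of left-handers in the population lies between 16.51% and 33.49%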
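The left-hander example in Python; `NormalDist().inv_cdf(0.975)` performs the reverse table lookup (F(Z) = 0.975 gives about 1.96).

```python
import math
from statistics import NormalDist

successes, n = 25, 100
confidence = 0.95

p_hat = successes / n
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # F(Z) = 0.975 -> about 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - z * se, p_hat + z * se
print(round(lower, 4), round(upper, 4))  # 0.1651 0.3349
```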

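Part 4
Differences between two Means
Goal:
form a confidence interval for the difference between two population means, $\mu_x - \mu_y$
The samples need to be unrelated and independent (one sample doesn't affect the other)
The point estimate is the difference between the sample means, $\bar{x} - \bar{y}$
If $\sigma_x^2$ and $\sigma_y^2$ are known, use $z_{\alpha/2}$
If $\sigma_x^2$ and $\sigma_y^2$ are unknown, use the t distribution

$\sigma_x^2$ and $\sigma_y^2$ are known

Assumptions
samples are random and independent
populations have to be normally distributed
population variances are known
$\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma^2_{\bar{X} - \bar{Y}} = \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}$
and $Z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$ => standard normal distribution

The confidence interval is written as follows:
$(\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$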
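A Python sketch of this interval; all sample values and variances below are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical samples with known population variances
x_bar, y_bar = 52.0, 48.0
var_x, var_y = 16.0, 25.0  # sigma_x^2 and sigma_y^2 (known)
n_x, n_y = 40, 50
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
se = math.sqrt(var_x / n_x + var_y / n_y)           # standard deviation of X_bar - Y_bar
point = x_bar - y_bar                               # point estimate

lower, upper = point - z * se, point + z * se
```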

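$\sigma_x^2$ and $\sigma_y^2$ are unknown

Assumptions:
Samples are independent and random
Populations are normally distributed
Population variances are unknown and assumed unequal
Use a t distribution with v degrees of freedom:
$v = \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\frac{(s_x^2/n_x)^2}{n_x - 1} + \frac{(s_y^2/n_y)^2}{n_y - 1}}$

The confidence interval is described as follows:
$(\bar{x} - \bar{y}) - t_{v,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{v,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$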
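A Python sketch of the degrees-of-freedom formula and the interval. The function names are illustrative, and `t_crit` has to be looked up in Table 8 for the computed v (v is usually not a whole number; rounding it down before using the table is a common convention).

```python
import math

def welch_df(s2_x, n_x, s2_y, n_y):
    """Degrees of freedom v for two means with unknown, unequal variances."""
    a = s2_x / n_x
    b = s2_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))

def welch_interval(x_bar, y_bar, s2_x, n_x, s2_y, n_y, t_crit):
    """(x_bar - y_bar) +/- t_{v, alpha/2} * sqrt(s_x^2 / n_x + s_y^2 / n_y)"""
    margin = t_crit * math.sqrt(s2_x / n_x + s2_y / n_y)
    diff = x_bar - y_bar
    return diff - margin, diff + margin

# Hypothetical sample variances and sizes
v = welch_df(9.0, 12, 25.0, 15)
```

For these values v is about 23.4, which lies between the smaller n - 1 (11) and $n_x + n_y - 2$ (25).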

Confidence Interval for the Population Variance
Goal: form a confidence interval for the population variance, σ²
based on the sample variance s²
Population is normally distributed
Random variable: $\chi^2_{n-1} = \frac{(n - 1)s^2}{\sigma^2}$
$\chi^2_{n-1,\alpha}$ denotes the number for which $P(\chi^2_{n-1} > \chi^2_{n-1,\alpha}) = \alpha$
$\frac{(n - 1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha/2}}$

For explanation, here is an example:
Suppose:
Sample size: 17
Sample mean: 3004
Sample standard deviation: 74
Population is normal
Determine the 95% confidence interval for σ²

First determine everything:
n - 1 = 17 - 1 = 16, α/2 = (1 - 0.95)/2 = 0.025, 1 - α/2 = 0.975, then find the chi-square values:
$\chi^2_{n-1,\alpha/2} = \chi^2_{16,0.025} = 28.85$
$\chi^2_{n-1,1-\alpha/2} = \chi^2_{16,0.975} = 6.91$
Then find $s^2 = 74^2$
Now fill it in the formula:
$\frac{(17 - 1) \cdot 74^2}{28.85} < \sigma^2 < \frac{(17 - 1) \cdot 74^2}{6.91}$ => $3037 < \sigma^2 < 12680$
=> Now we have the confidence interval for the population variance. From that we just have to take the square root and only consider the positive values as solutions => 55.1 and 112.6 are our limits, so we can formulate: We are 95% confident that the population standard deviation lies between 55.1 and 112.6
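The example as a Python sketch, with the two chi-square values taken from the table as quoted above. Note that the larger chi-square value produces the lower limit.

```python
import math

n, s = 17, 74.0
chi_upper = 28.85  # chi^2_{16, 0.025} from the table
chi_lower = 6.91   # chi^2_{16, 0.975} from the table

numerator = (n - 1) * s ** 2   # (n - 1) s^2 = 16 * 5476
var_low = numerator / chi_upper
var_high = numerator / chi_lower

# Confidence interval for the population standard deviation
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(round(var_low), round(var_high))      # 3037 12680
print(round(sd_low, 1), round(sd_high, 1))  # 55.1 112.6
```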

Hypothesis Tests

Null Hypothesis
always about a population parameter
The null hypothesis states the assumption that is being tested (example: the mean number of TVs in an American household is three => $H_0: \mu = 3$)
Refers to the status quo (not guilty)
always contains =, ≤ or ≥
may or may not be rejected
Alternative Hypothesis
assumes the opposite of $H_0$ (in our example: $H_1: \mu \ne 3$)
never contains =, ≤ or ≥; it contains ≠, < or >
May or may not be supported
Example: the population mean age is 50 => $H_0: \mu = 50$
now we select a sample and calculate the mean. Let's suppose it was $\bar{X} = 20$ => it is unlikely that the null hypothesis is true

Level of significance
Defines the rejection region of the sampling distribution
written as α; typical values are 0.01, 0.05, 0.1
is selected by the researcher
provides the critical values
Types of tests (3 is an example for any number)
Two Tail test:
$H_0: \mu = 3$
$H_1: \mu \ne 3$
Upper Tail test:
$H_0: \mu \le 3$
$H_1: \mu > 3$
Lower Tail test:
$H_0: \mu \ge 3$
$H_1: \mu < 3$

Errors in making Decisions
Type I Error = reject a true null hypothesis
the probability is α, also called the level of significance
Type II Error = fail to reject a false null hypothesis
the probability is β

Decision versus the actual situation:

Decision               H0 true                H0 false
Do not reject H0       No error (1 - α)       Type II Error (β)
Reject H0              Type I Error (α)       No error (1 - β)

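Test of Hypothesis for the Mean (σ Known)
Convert the sample result ($\bar{x}$) to a z value

Consider the test:
$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$
The decision rule is:
Reject $H_0$ if $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} > z_\alpha$

Alternate rule:
Reject $H_0$ if $\bar{x} > \mu_0 + z_\alpha\frac{\sigma}{\sqrt{n}}$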

P-Value
Probability of obtaining a test statistic at least as extreme as the observed sample value, given that $H_0$ is true
also called the observed level of significance
shows the smallest value of α for which $H_0$ can be rejected
Convert the sample result (e.g. $\bar{x}$) to a test statistic (e.g. z statistic)
Example of an upper tail test:
obtain the p-value:
p-value $= P\left(Z > \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$, given that $\mu = \mu_0$ ($H_0$ is true)

Decision rule: compare the p-value to α
If p-value < α, reject $H_0$
If p-value ≥ α, don't reject $H_0$
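A Python sketch of the upper tail z test that applies both the critical-value rule and the p-value rule; the sample values are hypothetical and the function name is illustrative.

```python
import math
from statistics import NormalDist

def z_test_upper(x_bar, mu_0, sigma, n, alpha):
    """Upper tail z test of H0: mu = mu_0 against H1: mu > mu_0 (sigma known)."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value z_alpha
    p_value = 1 - NormalDist().cdf(z)          # P(Z > z) given H0 is true
    reject = p_value < alpha                   # same decision as z > z_alpha
    return z, z_alpha, p_value, reject

# Hypothetical sample: mu_0 = 50, sigma = 10, n = 64, x_bar = 53
z, z_alpha, p_value, reject = z_test_upper(53, 50, 10, 64, alpha=0.05)
```

Here z = 2.4 exceeds $z_{0.05} \approx 1.645$ and the p-value (about 0.0082) is below 0.05, so both rules reject $H_0$.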

One Tail Test
the alternative hypothesis focuses on one direction
if $H_1$ is "μ > something", it is an upper tail test
if $H_1$ is "μ < something", it is a lower tail test
Lower and upper tail tests have just one critical value, since the rejection area is in only one tail
Two Tail Test
two critical values defining the two areas of rejection

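t Test of Hypothesis for the Mean (σ Unknown)
convert the sample result ($\bar{x}$) to a t test statistic
Consider the test:
$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$
The decision rule is:
Reject $H_0$ if $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} > t_{n-1,\alpha}$

For a two-tailed test:
$H_0: \mu = \mu_0$
$H_1: \mu \ne \mu_0$
The decision rule is:
Reject $H_0$ if $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} < -t_{n-1,\alpha/2}$ or if $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} > t_{n-1,\alpha/2}$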
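A Python sketch of the two-tailed decision rule. The sample values are hypothetical, and the critical value is passed in because it comes from Table 8 (2.0639 is the $t_{24,0.025}$ value used in the confidence interval example).

```python
import math

def t_statistic(x_bar, mu_0, s, n):
    """t = (x_bar - mu_0) / (s / sqrt(n)), with n - 1 degrees of freedom"""
    return (x_bar - mu_0) / (s / math.sqrt(n))

def two_tailed_reject(t, t_crit):
    """Reject H0 if t < -t_{n-1, alpha/2} or t > t_{n-1, alpha/2}."""
    return t < -t_crit or t > t_crit

# Hypothetical: n = 25, x_bar = 47, s = 8, H0: mu = 50, alpha = 0.05
t = t_statistic(47, 50, 8, 25)
T_24_0025 = 2.0639  # t_{24, 0.025} from Table 8
decision = two_tailed_reject(t, T_24_0025)
```

Here t = -1.875 lies between -2.0639 and 2.0639, so $H_0$ is not rejected.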

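Test of the Population Proportion
involves categorical values
two outcomes:
success (a certain characteristic is present)
failure (a certain characteristic is not present)
The proportion of the population is written as P
The sample size is large
The sample proportion of successes is written $\hat{p}$:
$\hat{p} = \frac{x}{n} = \frac{\text{number of successes in sample}}{\text{sample size}}$

if $nP(1 - P) > 9$, $\hat{p}$ can be seen as approximately normally distributed
Therefore mean $= \mu_{\hat{p}} = P$
and standard deviation $= \sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}}$

Hypothesis Test for a Proportion ($nP(1 - P) > 9$)
Z value, because $\hat{p}$ is approximately normally distributed:
$Z = \frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}}$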
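A Python sketch of the proportion test that refuses to apply the normal approximation when $nP_0(1 - P_0) \le 9$. The data, $P_0$, the two-tailed alternative and α = 0.05 are all hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p_0):
    """Z = (p_hat - P0) / sqrt(P0 (1 - P0) / n); only valid when n P0 (1 - P0) > 9."""
    if n * p_0 * (1 - p_0) <= 9:
        raise ValueError("normal approximation not appropriate: n*P0*(1-P0) <= 9")
    p_hat = successes / n
    return (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# Hypothetical: 64 successes in 200 trials, H0: P = 0.25, two-tailed, alpha = 0.05
z = proportion_z_test(64, 200, 0.25)
z_crit = NormalDist().inv_cdf(0.975)
reject = abs(z) > z_crit
```

With $\hat{p} = 0.32$ the statistic is about 2.29, which exceeds 1.96, so $H_0$ would be rejected at the 5% level.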
