ML: Linear Regression with One Variable
From Coursera
Contents
1 Model Representation
2 The Hypothesis Function
2.1 Example
3 Cost Function
4 Frequently Asked Questions
Model Representation
Recall that in regression problems, we are taking input variables and trying to fit the output onto a continuous expected result function.
Linear regression with one variable is also known as "univariate linear regression."
Univariate linear regression is used when you want to predict a single output value $y$ from a single input value $x$. We're doing supervised learning here, so that means we already have an idea about what the input/output cause and effect should be.
The Hypothesis Function
Our hypothesis function has the general form:

$$\hat{y} = h_\theta(x) = \theta_0 + \theta_1 x$$
Example:
Suppose we have the following set of training data:

input x    output y
   0          4
   1          7
   2          7
   3          8
Now we can make a random guess about our $h_\theta$ function: $\theta_0 = 2$ and $\theta_1 = 2$. The hypothesis function becomes $h_\theta(x) = 2 + 2x$.
So for input of 1 to our hypothesis, $y$ will be 4. This is off by 3. Note that we will be trying out various values of $\theta_0$ and $\theta_1$ to try to find values which provide the best possible "fit" or the most representative "straight line" through the data points mapped on the x-y plane.
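As a quick illustration (a hypothetical Python sketch, not part of the original notes; the variable names are my own), we can evaluate this guess against the training data above:

```python
# Hypothetical sketch: evaluate the guessed hypothesis h(x) = 2 + 2x
# against the training data from the table above.
xs = [0, 1, 2, 3]  # input values x
ys = [4, 7, 7, 8]  # actual outputs y

theta0, theta1 = 2, 2  # our random guess

def h(x):
    """Hypothesis function h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

for x, y in zip(xs, ys):
    print(f"x={x}: predicted {h(x)}, actual {y}, error {h(x) - y}")
```

For the input 1 this prints a prediction of 4 against an actual value of 7, the "off by 3" case described above.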
Cost Function
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average (actually a fancier version of an average) of all the results of the hypothesis with inputs from x's compared to the actual output y's.
$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2 = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)^2$$

To break it apart, it is $\frac{1}{2}\bar{x}$ where $\bar{x}$ is the mean of the squares of $h_\theta(x_i) - y_i$, or the difference between the predicted value and the actual value.
This function is otherwise called the "Squared error function", or "Mean squared error". The mean is halved $\left(\frac{1}{2m}\right)$ as a convenience for the computation of the gradient descent, as the derivative term of the square function will cancel out the $\frac{1}{2}$ term.
Now we are able to concretely measure the accuracy of our predictor function against the correct results we have so that we can predict new results we don't have.
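The cost function above translates directly into a few lines of Python (a minimal sketch of my own, not code from the original notes):

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# Training data from the example above.
xs = [0, 1, 2, 3]
ys = [4, 7, 7, 8]

# Cost of the guess theta0 = 2, theta1 = 2:
# errors are -2, -3, -1, 0, so J = (4 + 9 + 1 + 0) / (2 * 4) = 1.75
print(cost(2, 2, xs, ys))  # → 1.75
```

A better-fitting pair of parameters would give a smaller value of J; a perfect fit (the line passing through every point) would give J = 0.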
If we try to think of it in visual terms, our training data set is scattered on the x-y plane. We are trying to make a straight line (defined by $h_\theta(x)$) which passes through this scattered set of data. Our objective is to get the best possible line. The best possible line will be such that the average squared vertical distances of the scattered points from the line will be the least. In the best case, the line should pass through all the points of our training data set. In such a case the value of $J(\theta_0, \theta_1)$ will be 0.
Frequently Asked Questions
Q: Why is the cost function about the sum of squares, rather than the sum of cubes (or just $(h(x) - y)$ or $\mathrm{abs}(h(x) - y)$)?
A: It might be easier to think of this as measuring the distance of two points. In this case, we are measuring the distance of two multidimensional values (i.e. the observed output value $y_i$ and the estimated output value $\hat{y}_i$).
We all know how to measure the distance of two points $(x_1, y_1)$ and $(x_2, y_2)$, which is $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. If we have $n$ dimensions then we want the positive square root of $\sum_{i=1}^{n}(x_i - y_i)^2$. That's where the sum of squares comes from. (See also Euclidean distance (https://en.wikipedia.org/wiki/Euclidean_distance).)
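As a concrete instance of that formula (a small Python sketch of my own, not from the original page):

```python
import math

def euclidean(p, q):
    """Distance between two points of equal dimension n:
    the positive square root of sum_i (p_i - q_i)^2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Classic 3-4-5 right triangle: distance from (0, 0) to (3, 4).
print(euclidean((0, 0), (3, 4)))  # → 5.0
```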
The sum of squares isn't the only possible cost function, but it has many nice properties. Squaring the error means that an overestimate is "punished" just the same as an underestimate: an error of −1 is treated just like +1, and two equal but opposite errors can't cancel each other. If we cube the error (or just use the difference), we lose this property. Also in the case of cubing, big errors are punished more than small ones, so an error of 2 becomes 8.
The squaring function is smooth (can be differentiated) and yields linear forms after differentiation, which is nice for optimization. It also has the property of being convex. A convex cost function guarantees there will be a global minimum, so our algorithms will converge.
If you throw in absolute value, then you get a non-differentiable function. If you try to take the derivative of abs(x) and set it equal to zero to find the minimum, you won't get any answers since the derivative is undefined at 0.
Q: Why can't I use 4th powers in the cost function? Don't they have the nice properties of squares?
A: Imagine that you are throwing darts at a dartboard, or firing arrows at a target. If you use the sum of squares as the error (where the center of the bullseye is the origin of the coordinate system), the error is the distance from the center. Now rotate the coordinates by 30 degrees, or 45 degrees, or anything. The distance, and hence the error, remains unchanged. 4th powers lack this property, which is known as rotational invariance.
Q: Why does 1/(2*m) make the math easier?
A: When we differentiate the cost to calculate the gradient, we get a factor of 2 in the numerator, due to the exponent inside the sum. This '2' in the numerator cancels out with the '2' in the denominator, saving us one math operation in the formula.
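Spelled out, the cancellation is the standard power-rule step (shown here for $\theta_1$; the $\theta_0$ case is the same without the trailing $x_i$):

```latex
\frac{\partial J}{\partial \theta_1}
= \frac{\partial}{\partial \theta_1} \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x_i - y_i\right)^2
= \frac{2}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x_i - y_i\right) x_i
= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right) x_i
```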
Next: Gradient Descent | Back to Index: Main
Retrieved from "https://share.coursera.org/wiki/index.php?title=ML:Linear_Regression_with_One_Variable&oldid=34035"
Category: ML: Lecture Notes
This page was last modified on 5 August 2016, at 10:54.
This page has been accessed 119,565 times.