ML: Linear Regression with One Variable
From Coursera

Contents
1 Model Representation
2 The Hypothesis Function
2.1 Example
3 Cost Function
4 Frequently Asked Questions

Model Representation
Recall that in regression problems, we are taking input variables and trying to fit the output onto a continuous expected result function.
Linear regression with one variable is also known as "univariate linear regression."
Univariate linear regression is used when you want to predict a single output value y from a single input value x. We're doing supervised learning here, so that means we already have an idea about what the input/output cause and effect should be.

The Hypothesis Function
Our hypothesis function has the general form:

$$\hat{y} = h_\theta(x) = \theta_0 + \theta_1 x$$

Note that this is like the equation of a straight line. We give to $h_\theta(x)$ values for $\theta_0$ and $\theta_1$ to get our estimated output $\hat{y}$. In other words, we are trying to create a function called $h_\theta$ that is trying to map our input data (the x's) to our output data (the y's).

Example:
Suppose we have the following set of training data:

input (x)    output (y)
0            4
1            7
2            7
3            8

Now we can make a random guess about our $h$ function: $\theta_0 = 2$ and $\theta_1 = 2$. The hypothesis function becomes $h_\theta(x) = 2 + 2x$.
So for input of 1 to our hypothesis, $\hat{y}$ will be 4. This is off by 3. Note that we will be trying out various values of $\theta_0$ and $\theta_1$ to try to find values which provide the best possible "fit" or the most representative "straight line" through the data points mapped on the x-y plane.

Cost Function
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average (actually a fancier version of an average) of all the results of the hypothesis with inputs from x's compared to the actual output y's.

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x_i) - y_i\right)^2$$

To break it apart, it is $\frac{1}{2} \bar{x}$ where $\bar{x}$ is the mean of the squares of $h_\theta(x_i) - y_i$, or the difference between the predicted value and the actual value.
This function is otherwise called the "Squared error function", or "Mean squared error". The mean is halved $\left(\frac{1}{2m}\right)$ as a convenience for the computation of the gradient descent, as the derivative term of the square function will cancel out the $\frac{1}{2}$ term.

Now we are able to concretely measure the accuracy of our predictor function against the correct results we have, so that we can predict new results we don't have.
If we try to think of it in visual terms, our training data set is scattered on the x-y plane. We are trying to make a straight line (defined by $h_\theta(x)$) which passes through this scattered set of data. Our objective is to get the best possible line. The best possible line will be such that the average squared vertical distances of the scattered points from the line will be the least. In the best case, the line should pass through all the points of our training data set. In such a case the value of $J(\theta_0, \theta_1)$ will be 0.
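As an illustration, a minimal sketch of this cost function in Python (the name compute_cost is our own, not from the course):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """J(theta_0, theta_1) = 1/(2m) * sum over i of (h_theta(x_i) - y_i)^2."""
    m = len(y)
    predictions = theta0 + theta1 * x           # h_theta(x_i) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Cost of the guess theta_0 = 2, theta_1 = 2 on the example training data
x = np.array([0, 1, 2, 3])
y = np.array([4, 7, 7, 8])
print(compute_cost(2.0, 2.0, x, y))   # (4 + 9 + 1 + 0) / (2*4) = 1.75
```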

Frequently Asked Questions
Q: Why is the cost function about the sum of squares, rather than the sum of cubes (or just the $(h(x) - y)$ or $\mathrm{abs}(h(x) - y)$)?
A: It might be easier to think of this as measuring the distance of two points. In this case, we are measuring the distance of two multidimensional values (i.e. the observed output value $y_i$ and the estimated output value $\hat{y}_i$).
We all know how to measure the distance of two points $(x_1, y_1)$ and $(x_2, y_2)$, which is $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. If we have $n$ dimensions then we want the positive square root of $\sum_{i=1}^{n} (x_i - y_i)^2$. That's where the sum of squares comes from. (see also Euclidean distance (https://en.wikipedia.org/wiki/Euclidean_distance))


The sum of squares isn't the only possible cost function, but it has many nice properties. Squaring the error means that an overestimate is "punished" just the same as an underestimate: an error of −1 is treated just like +1, and the two equal but opposite errors can't cancel each other. If we cube the error (or just use the difference), we lose this property. Also in the case of cubing, big errors are punished more than small ones, so an error of 2 becomes 8.
The squaring function is smooth (can be differentiated) and yields linear forms after differentiation, which is nice for optimization. It also has the property of being convex. A convex cost function guarantees there will be a global minimum, so our algorithms will converge.
If you throw in absolute value, then you get a non-differentiable function. If you try to take the derivative of abs(x) and set it equal to zero to find the minimum, you won't get any answers since it's undefined at 0.
Q: Why can't I use 4th powers in the cost function? Don't they have the nice properties of squares?
A: Imagine that you are throwing darts at a dartboard, or firing arrows at a target. If you use the sum of squares as the error (where the center of the bullseye is the origin of the coordinate system), the error is the distance from the center. Now rotate the coordinates by 30 degrees, or 45 degrees, or anything. The distance, and hence the error, remains unchanged. 4th powers lack this property, which is known as rotational invariance.
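This is easy to check numerically; here is a small sketch (our own illustration, not from the notes):

```python
import numpy as np

# Five random 2-D "dart positions" measured from the bullseye (the origin)
rng = np.random.default_rng(0)
errors = rng.normal(size=(5, 2))

# Rotate the coordinate system by 30 degrees
angle = np.deg2rad(30)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
rotated = errors @ R.T

print(np.sum(errors**2), np.sum(rotated**2))   # equal: sum of squares is rotation-invariant
print(np.sum(errors**4), np.sum(rotated**4))   # differ: 4th powers are not
```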
Q: Why does 1/(2*m) make the math easier?
A: When we differentiate the cost to calculate the gradient, we get a factor of 2 in the numerator, due to the exponent inside the sum. This '2' in the numerator cancels out with the '2' in the denominator, saving us one math operation in the formula.
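For concreteness, the cancellation written out for $\theta_1$ (a step the notes leave implicit):

$$\frac{\partial J}{\partial \theta_1} = \frac{1}{2m} \sum_{i=1}^{m} 2\,(h_\theta(x_i) - y_i)\, x_i = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)\, x_i$$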
Next: Gradient Descent. Back to Index: Main.