版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報或認(rèn)領(lǐng)
文檔簡介
.機器學(xué)習(xí)題庫一、極大似然1、ML estimationofe*ponentialmodel(10)AGaussiandistributionisoftenusedtomodeldataontherealline,butissometimesinappropriatewhenthedataareoftenclosetozerobutconstrainedtobenonnegative.Insuchcasesonecanfitane*ponentialdistribution,whoseprobabilitydensityfunctionisgivenbyGivenNobservations*idrawnfromsuchadistribution:Writedownthelikelihoodasafunctionofthescaleparameterb.Writedownthederivativeoftheloglikelihood.Giveasimplee*pressionfortheMLestimateforb.2、換成Poisson分布:px|xe,y0,1,2,...x!3、二、貝葉斯假設(shè)在考試的多項選擇中,考生知道正確答案的概率為p,猜想答案的概率為1-p,并且假設(shè)考生知道正確答案答對題的概率為1,猜中正確答案的概率為1m,其中m為多項選擇項的數(shù)目。則考生答對題目,求他知道正確答案的概率。1、pknown,correctpConjugatepriorspknown|correct1pknownp1pmThereadingsforthisweekincludediscussionofconjugatepriors.Givenalikelihoodpx|foraclassmodelswithparametersθ,aconjugatepriorisadistributionp|withhyperparametersγ,suchthattheposteriordistribution與先驗的分布族一樣Supposethatthelikelihoodisgivenbythee*ponentialdistributionwithrateparameterλ:ShowthatthegammadistributionGamma|,e_1isaconjugatepriorforthee*ponential.Derivetheparameterupdategivenobservationsx,,xandthepredictiondistributionpx|x,,x.1NN11NShowthatthebetadistributionisaconjugatepriorforthegeometricdistributionwhichdescribesthenumberoftimeacoinistosseduntilthefirstheadsappears,whentheprobabilityofheadsoneachtossisθ.Derivetheparameterupdateruleandpredictiondistribution.(c)Supposepisaconjugatepriorforthelikelihoodpx|;showthatthemi*ture|priorisalsoconjugateforthesamelikelihood,assumingthemi*tureweightswmsumto1.(d)Repeatpart(c)forthecasewherethepriorisasingledistributionandthe. >.likelihoodisami*ture,andthepriorisconjugateforeachmi*tureponentofthelikelihood.somepriorscanbeconjugateforseveraldifferentlikelihoods;fore*ample,thebetaisconjugatefortheBernoulliandthegeometricdistributionsandthegammaisconjugateforthee*ponentialandforthegammawithfi*edα(E*tracredit,20)E*plorethecasewherethelikelihoodisami*turewithfi*edponentsandunknownweights;i.e.,theweightsaretheparameterstobelearned.三、判斷題〔1〕給定n個數(shù)據(jù)點,如果其中一半用于訓(xùn)練,另一半用于測試,則訓(xùn)練誤差和測試誤差之間的差異會隨著n的增加而減小?!?〕極大似然估計是無偏估計且在所有的無偏估計中方差最小,所以極大似然估計的風(fēng)險最小?!玻场郴貧w函數(shù)A和B,如果A比B更簡單,則A幾乎一定會比B在測試集上表現(xiàn)更好?!玻础橙志€性回歸需要利用全部樣本點來預(yù)測新輸入的對應(yīng)輸出值,而局部線性回歸只需利用查詢點附近的樣本來預(yù)測輸出值。所以全局線性回歸比局部線性回歸計算代價更高?!玻怠矪oosting和Bagging都是組合多個分類器投票的方法,二者都是根據(jù)單個分類器的正確率決定其權(quán)重。Intheboostingiterations,thetrainingerrorofeachnewdecisionstumpandthetrainingerrorofthebinedclassifiervaryroughlyinconcert〔F〕Whilethetrainingerrorofthebinedclassifiertypicallydecreasesasafunctionofboostingiterations,theerroroftheindividualdecisionstumpstypicallyincreasessincethee*ampleweightsbeeconcentratedatthemostdifficulte*amples.(7)OneadvantageofBoostingisthatitdoesnotoverfit.〔F〕(8)Supportvectormachinesareresistanttooutliers,i.e.,verynoisye*amplesdrawnfromadifferentdistribution.〔F〕〔9〕在回歸分析中,最正確子集選擇可以做特征選擇,當(dāng)特征數(shù)目較多時計算量大;嶺回歸和Lasso模型計算量小,且Lasso也可以實現(xiàn)特征選擇?!?0〕當(dāng)訓(xùn)練數(shù)據(jù)較少時更容易發(fā)生過擬合?!?1〕梯度下降有時會陷于局部極小值,但EM算法不會?!?2〕在核回歸中,最影響回歸的過擬合性和欠擬合之間平衡的參數(shù)為核函數(shù)的寬度。IntheAdaBoostalgorithm,theweightsonallthemisclassifiedpointswillgoupbythesamemultiplicativefactor.〔T〕True/False:Inaleast-squareslinearregressionproblem,addinganL2. 
II. Bayes

1. In a multiple-choice exam, a student knows the correct answer with probability p and guesses with probability 1 - p. Assume that a student who knows the answer answers correctly with probability 1, and that a student who guesses answers correctly with probability 1/m, where m is the number of choices. Given that the student answered the question correctly, what is the probability that he knew the answer?
Answer: p(known | correct) = p / (p + (1 - p)/m) = pm / (pm + 1 - p).

2. Conjugate priors
The readings for this week included discussion of conjugate priors. Given a likelihood p(x | θ) for a class of models with parameters θ, a conjugate prior is a distribution p(θ | γ) with hyperparameters γ such that the posterior distribution belongs to the same family as the prior.
(a) Suppose the likelihood is the exponential distribution with rate parameter λ,
p(x | λ) = λ e^(-λx),  x >= 0.
Show that the gamma distribution
Gamma(λ | α, β) = β^α λ^(α-1) e^(-βλ) / Γ(α)
is a conjugate prior for the exponential. Derive the parameter update given observations x1, ..., xN and the prediction distribution p(x_{N+1} | x1, ..., xN).
(b) Show that the beta distribution is a conjugate prior for the geometric distribution, which describes the number of times a coin is tossed until the first heads appears when the probability of heads on each toss is θ. Derive the parameter update rule and the prediction distribution.
(c) Suppose p(θ | γ) is a conjugate prior for the likelihood p(x | θ). Show that the mixture prior
p(θ) = Σ_{m=1..M} wm · p(θ | γm)
is also conjugate for the same likelihood, assuming the mixture weights wm sum to 1.
(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood. Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and for the geometric distribution, and the gamma is conjugate for the exponential and for the gamma with fixed α.
(Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights, i.e., the weights are the parameters to be learned.

III. True/False

(1) Given n data points, if half are used for training and the other half for testing, the difference between the training error and the test error decreases as n increases.
(2) The maximum likelihood estimate is unbiased and has the smallest variance among all unbiased estimators, so the maximum likelihood estimate has the smallest risk.
(3) Given two regression functions A and B, if A is simpler than B, then A will almost certainly perform better than B on the test set.
(4) Global linear regression uses all of the training points to predict the output for a new input, while local linear regression uses only the points near the query point, so global linear regression is computationally more expensive than local linear regression.
(5) Boosting and Bagging both combine multiple classifiers by voting, and both set the weight of each individual classifier according to its accuracy.
(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F) While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases, since the example weights become concentrated at the most difficult examples.
(7) One advantage of Boosting is that it does not overfit. (F)
(8) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)
(9) In regression analysis, best-subset selection can perform feature selection, but its computational cost is high when the number of features is large; ridge regression and the Lasso are computationally cheaper, and the Lasso can also perform feature selection.
(10) Overfitting is more likely when the amount of training data is small.
(11) Gradient descent can get stuck at local minima, but the EM algorithm cannot.
(12) In kernel regression, the parameter that most controls the balance between overfitting and underfitting is the width of the kernel.
(13) In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)
(14) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution w on the training data. (F)
(15) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution w on unseen test data. (F)
(16) Besides the EM algorithm, gradient descent can also be used to estimate the parameters of a Gaussian mixture model. (T)
(17) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel. True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a kernel of degree less than or equal to two.
(18) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined. False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost can't achieve zero training error.
(19) The L2 penalty in ridge regression is equivalent to a Laplace prior on the weights. (F)
(20) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)
(21) In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)

IV. Regression

1. Consider a regularized regression problem. The figure below shows the log-likelihood (mean log-probability) on the training set and on the test set for different values of the regularization parameter C, using a quadratic regularization penalty. (10 points)
(1) Is the claim "as C increases, the training-set log-likelihood in Figure 2 never increases" correct? Explain why or why not.
(2) Explain why the test-set log-likelihood in Figure 2 decreases for large values of C.

2. Consider the linear regression model y ~ N(w0 + w1·x, σ²); the training data are shown in the figure below. (10 points)
(1) Estimate the parameters by maximum likelihood and sketch the resulting model in Figure (a). (3 points)
(2) Estimate the parameters by regularized maximum likelihood, i.e., subtract the regularization penalty (C/2)·w1² from the log-likelihood objective, and sketch in Figure (b) the model obtained when C is very large. (3 points)
(3) After regularization, does the estimated Gaussian variance σ² become larger, smaller, or stay the same? (4 points)
[Figure (a)]
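For Regression question 2, here is a minimal numpy sketch of the two estimates, assuming the standard reading of the penalty as a quadratic term on the slope only; the synthetic points below are an arbitrary stand-in for the training data shown in the missing figure.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 5.0, size=30)                   # placeholder inputs
y = 1.0 + 0.8 * x + rng.normal(0.0, 0.5, size=30)    # placeholder targets

X = np.column_stack([np.ones_like(x), x])            # design matrix [1, x]

# (1) Maximum likelihood for (w0, w1) = ordinary least squares.
w_ml = np.linalg.solve(X.T @ X, X.T @ y)

# (2) Penalized ML: minimize ||y - X w||^2 + lam * w1^2, i.e. only the slope
# is penalized, mirroring the (C/2)*w1^2 term of the question up to scaling.
lam = 1e6                                            # plays the role of a very large C
D = np.diag([0.0, 1.0])                              # no penalty on the intercept w0
w_reg = np.linalg.solve(X.T @ X + lam * D, X.T @ y)

# (3) Once w1 is shrunk toward zero, the residual variance estimate grows.
for w in (w_ml, w_reg):
    resid = y - X @ w
    print(np.round(w, 3), round(float(resid.var()), 3))
```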
圖(b)3.考慮二維輸入空間點xx,x
T
上的回歸問題,其中
x
1,1,j
1,2
在單位正方形。1
2
j訓(xùn)練樣本和測試樣本在單位正方形中均勻分布,輸出模型為y~Nx3x510xx7x25x3,1,我們用1-10階多項式特征,采用線性回歸模型來121212學(xué)習(xí)*與y之間的關(guān)系〔高階特征模型包含所有低階特征〕,損失函數(shù)取平方誤差損失?,F(xiàn)在n20個樣本上,訓(xùn)練1階、2階、8階和10階特征的模型,然后在一個大規(guī)模的獨立的測試集上測試,則在下3列中選擇適宜的模型〔可能有多個選項〕,并解釋第3列中你選擇的模型為什么測試誤差小?!?0分〕訓(xùn)練誤差最小 訓(xùn)練誤差最大 測試誤差最小1階特征的線性模型 *2階特征的線性模型 *8階特征的線性模型 *10階特征的線性模型 *(2)現(xiàn)在n106個樣本上,訓(xùn)練1階、2階、8階和10階特征的模型,然后在一個大規(guī)模的獨立的測試集上測試,則在下3列中選擇適宜的模型〔可能有多個選項〕,并解釋第3列中你選擇的模型為什么測試誤差小?!?0分〕訓(xùn)練誤差最小 訓(xùn)練誤差最大 測試誤差最小1階特征的線性模型 *2階特征的線性模型8階特征的線性模型 * *. >.10階特征的線性模型 *Theappro*imationerrorofapolynomialregressionmodeldependsonthenumberoftrainingpoints.(T)Thestructuralerrorofapolynomialregressionmodeldependsonthenumberoftrainingpoints.(F)4、Wearetryingtolearnregressionparametersforadatasetwhichweknowwasgeneratedfromapolynomialofacertaindegree,butwedonotknowwhatthisdegreeis.Assumethedatawasactuallygeneratedfromapolynomialofdegree5withsomeaddedGaussiannoise(thatisyw wxwx2wx3wx4wx5,~0,1.0 1 2 3 4 5Fortrainingwehave100{*,y}pairsandfortestingweareusinganadditionalsetof100{*,y}pairs.Sincewedonotknowthedegreeofthepolynomialwelearntwomodelsfromthedata.ModelAlearnsparametersforapolynomialofdegree4andmodelBlearnsparametersforapolynomialofdegree6.Whichofthesetwomodelsislikelytofitthetestdatabetter"Answer:Degree6polynomial.Sincethemodelisadegree5polynomialandwehaveenoughtrainingdata,themodelwelearnforasi*degreepolynomialwilllikelyfitaverysmallcoefficientfor*6.Thus,eventhoughitisasi*degreepolynomialitwillactuallybehaveinaverysimilarwaytoafifthdegreepolynomialwhichisthecorrectmodelleadingtobetterfittothedata.5、Input-dependentnoiseinregressionOrdinaryleast-squaresregressionisequivalenttoassumingthateachdatapointisgeneratedaccordingtoalinearfunctionoftheinputpluszero-mean,constant-varianceGaussiannoise.Inmanysystems,however,thenoisevarianceisitselfapositivelinearfunctionoftheinput(whichisassumedtobenon-negative,i.e.,*>=0).Whichofthefollowingfamiliesofprobabilitymodelscorrectlydescribesthissituationintheunivariatecase"(Hint:onlyoneofthemdoes.)(iii)iscorrect.InaGaussiandistributionovery,thevarianceisdeterminedbythecoefficientofy;sobyreplacing2byx2,wegetavariancethatincreaseslinearly2with*.(Notealsothechangetothenormalization"constant.〞)(i)hasquadraticdependenceon*;(ii)doesnotchangethevarianceatall,itjustrenamesw1.CircletheplotsinFigure1thatcouldplausiblyhavebeengeneratedbysomeinstanceofthemodelfamily(ies)youchose.(ii)and(iii).(Notethat(iii)worksfor 20.)(i)e*hibitsalargevarianceat*=0,andthevarianceappearsindependentof*.True/False:Regressionwithinput-dependentnoisegivesthesamesolutionasordinaryregressionforaninfinitedatasetgeneratedaccordingtothecorrespondingmodel.True.Inbothcasesthealgorithmwillrecoverthetrueunderlyingmodel.. 
d) For the model you chose in part (a), write down the derivative of the negative log-likelihood with respect to w1.

V. Classification

1. Generative vs. discriminative models
[points] Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications, using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or a generative classifier? Why?
[final_sol_s07] A generative model, because we need to estimate the density p(x | y).
[points] Your billionaire friend also wants to classify software applications to detect bug-prone applications using features of the source code. This pilot project only has a few applications to be used as training data, though. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
A discriminative model. When the number of training examples is small, a discriminative model that classifies directly usually works better.
[points] Finally, your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
A generative model. With a large number of training examples, the correct generative model can be learned.

2. Logistic regression
Figure 2: Log-probability of labels as a function of the regularization parameter C.
Here we use a logistic regression model to solve a classification problem. In Figure 2, we have plotted the mean log-probability of labels in the training and test sets after having trained the classifier with a quadratic regularization penalty and different values of the regularization parameter C.
(1) In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)
Answer: The log-probability of labels given examples implied by the logistic regression model is a concave (convex down) function with respect to the weights. The (only) locally optimal solution is also globally optimal.
(2) A stochastic gradient algorithm for training logistic regression models with a fixed learning rate will find the optimal setting of the weights exactly. (F)
Answer: A fixed learning rate means that we are always taking a finite step towards improving the log-probability of any single training example in the update equation. Unless the examples are somehow "aligned", we will continue jumping from side to side of the optimal solution, and will not be able to get arbitrarily close to it. The learning rate has to approach zero in the course of the updates for the weights to converge.
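A minimal sketch of the point made in answer (2): stochastic gradient ascent on the logistic log-likelihood with a fixed step size keeps hopping around the optimum, while a decaying step size settles down. The dataset, step sizes and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # bias feature + one input
w_true = np.array([-0.5, 2.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def sgd(fixed_rate):
    w = np.zeros(2)
    for t in range(1, 50_001):
        i = rng.integers(n)                              # pick one training example
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        grad = (y[i] - p) * X[i]                         # gradient of its log-probability
        eta = 0.5 if fixed_rate else 0.5 / np.sqrt(t)
        w = w + eta * grad
    return w

print("fixed rate:   ", np.round(sgd(True), 2))          # still oscillating around the optimum
print("decaying rate:", np.round(sgd(False), 2))         # settles much closer to it
```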
(3) The average log-probability of training labels as in Figure 2 can never increase as we increase C. (T)
Stronger regularization means more constraints on the solution, and thus the (average) log-probability of the training examples can only get worse.
(4) Explain why in Figure 2 the test log-probability of labels decreases for large values of C.
As C increases, we give more weight to constraining the predictor, and thus give less flexibility to fitting the training set. The increased regularization guarantees that the test performance gets closer to the training performance, but as we over-constrain our allowed predictors, we are not able to fit the training set at all, and although the test performance is now very close to the training performance, both are low.
(5) The log-probability of labels in the test set would decrease for large values of C even if we had a large number of training examples. (T)
The above argument still holds, but the value of C for which we will observe such a decrease will scale up with the number of examples.
(6) Adding a quadratic regularization penalty for the parameters when estimating a logistic regression model ensures that some of the parameters (weights associated with the components of the input vectors) vanish.
A regularization penalty for feature selection must have a non-zero derivative at zero. Otherwise, the regularization has no effect at zero, and the weights will tend to be slightly non-zero, even when this does not improve the log-probabilities by much.
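To illustrate the contrast drawn in (6) between quadratic and absolute-value penalties, here is a short scikit-learn sketch on a synthetic problem: with the L1 penalty some coefficients come out exactly zero, while with the L2 penalty they merely shrink. The dataset and the value C = 0.1 are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic problem: 20 features, only 5 of them informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

for penalty in ("l2", "l1"):
    clf = LogisticRegression(penalty=penalty, C=0.1, solver="liblinear")
    clf.fit(X, y)
    n_zero = int(np.sum(clf.coef_ == 0))
    print(penalty, "coefficients that are exactly zero:", n_zero)
```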
3. Regularized logistic regression
This problem refers to the binary classification task depicted in Figure 1(a), which we attempt to solve with the simple linear logistic regression model
p(y = 1 | x, w1, w2) = g(w1·x1 + w2·x2) = 1 / (1 + exp(-w1·x1 - w2·x2))
(for simplicity we do not use the bias parameter w0). The training data can be separated with zero training error; see line L1 in Figure 1(b) for instance.
Figure 1: (a) the two-dimensional dataset used in this problem; (b) the points can be separated by L1 (solid line); possible other decision boundaries are shown by L2, L3, L4.
(1) Consider a regularization approach where we try to maximize
Σi log p(yi | xi, w1, w2) - (C/2)·w2²
for large C. Note that only w2 is penalized. We would like to know which of the four lines in Figure 1(b) could arise as a result of such regularization. For each potential line L2, L3 or L4, determine whether it can result from regularizing w2. If not, explain very briefly why not.
L2: No. When we regularize w2, the resulting boundary can rely less on the value of x2 and therefore becomes more vertical. L2 here seems to be more horizontal than the unregularized solution, so it cannot come as a result of penalizing w2.
L3: Yes. Here w2² is small relative to w1² (as evidenced by the high slope), and even though it would assign a rather low log-probability to the observed labels, it could be forced by a large regularization parameter C.
L4: No. For very large C, we get a boundary that is entirely vertical (the line x1 = 0, i.e., the x2 axis). L4 here is reflected across the x2 axis and represents a poorer solution than its counterpart on the other side. For moderate regularization we have to get the best solution that we can construct while keeping w2 small. L4 is not the best and thus cannot come as a result of regularizing w2.
(2) If we change the form of regularization to one-norm (absolute value) and also regularize w1, we get the following penalized log-likelihood:
Σi log p(yi | xi, w1, w2) - C·(|w1| + |w2|).
Consider again the problem in Figure 1(a) and the same linear logistic regression model. As we increase the regularization parameter C, which of the following scenarios do you expect to observe? (Choose only one.)
(x) First w1 will become 0, then w2.
( ) w1 and w2 will become zero simultaneously.
( ) First w2 will become 0, then w1.
( ) None of the weights will become exactly zero, only smaller as C increases.
The data can be classified with zero training error, and therefore also with high log-probability, by looking at the value of x2 alone, i.e., by making w1 = 0. Initially we might prefer to have a non-zero value for w1, but it will go to zero rather quickly as we increase regularization. Note that we pay a regularization penalty for a non-zero value of w1, and if it does not help classification, why would we pay the penalty? The absolute-value regularization ensures that w1 will indeed go to exactly zero. As C increases further, even w2 will eventually become zero: we pay a higher and higher cost for setting w2 to a non-zero value, and eventually this cost overwhelms the gain in the log-probability of labels that we can achieve with a non-zero w2. Note that when w1 = w2 = 0, the log-probability of labels is the finite value n·log(0.5).

SVM

1. SVM
Figure 4: Training set, maximum margin linear separator, and the support vectors (in bold).
(1) What is the leave-one-out cross-validation error estimate for maximum margin separation in Figure 4? (We are asking for a number.) (0)
Based on the figure we can see that removing any single point would not change the resulting maximum margin separator. Since all the points are initially classified correctly, the leave-one-out error is zero.
(2) We would expect the support vectors to remain the same in general as we move from a linear kernel to higher-order polynomial kernels. (F)
There are no guarantees that the support vectors remain the same. The feature vectors corresponding to polynomial kernels are non-linear functions of the original input vectors, and thus the support points for maximum margin separation in the feature space can be quite different.
(3) Structural risk minimization is guaranteed to find the model (among those considered) with the lowest expected loss. (F)
We are guaranteed to find only the model with the lowest upper bound on the expected loss.
(4) What is the VC-dimension of a mixture of two Gaussians model in the plane with equal covariance matrices? Why?
A mixture of two Gaussians with equal covariance matrices has a linear decision boundary. Linear separators in the plane have VC-dimension exactly 3.

4. SVM
Classify the following data points:
[Table of six labeled training points]
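For SVM question (1), the sketch below shows how a leave-one-out error estimate is computed in practice; since the training set of Figure 4 is not reproduced here, the separable toy points are a made-up stand-in.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Made-up separable stand-in for the training set of Figure 4.
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
              [3.0, 3.0], [3.5, 2.5], [2.5, 3.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)                     # large C approximates a hard margin
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("leave-one-out error:", 1.0 - scores.mean())    # 0.0 for this toy set
```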
a) Plot these six training points. Are the classes {+, -} linearly separable?
Yes.
b) Construct the weight vector of the maximum margin hyperplane by inspection and identify the support vectors.
The maximum margin hyperplane should have a slope of -1 and should pass through (x1, x2) = (3/2, 0). Therefore its equation is x1 + x2 = 3/2, and the weight vector is (1, 1)ᵀ.
c) If you remove one of the support vectors, does the size of the optimal margin decrease, stay the same, or increase?
In this specific dataset the optimal margin increases when we remove the support vector (1, 0) or (1, 1), and stays the same when we remove either of the other two.
d) (Extra credit) Is your answer to (c) also true for any dataset? Provide a counterexample or give a short proof.
When we drop some constraints in a constrained maximization problem, we get an optimal value that is at least as good as the previous one, because the set of candidates satisfying the original (larger, stronger) set of constraints is a subset of the candidates satisfying the new (smaller, weaker) set of constraints. So, for the weaker constraints, the old optimal solution is still available and there may be additional solutions that are even better. In mathematical form: if A ⊆ B, then the maximum of the objective over A is at most the maximum over B. Finally, note that in SVM problems we are maximizing the margin subject to the constraints given by the training points. When we drop any of the constraints, the margin can increase or stay the same depending on the dataset. In general problems with realistic datasets it is expected that the margin increases when we drop support vectors. The data in this problem is constructed to demonstrate that, when removing some constraints, the margin can stay the same or increase depending on the geometry.

2. SVM
Classify the following set of three data points:
[Table of three labeled one-dimensional training points]
(a) Are the classes {+, -} linearly separable?
No.
(b) Consider mapping each point to 3-D using the new feature vectors φ(x) = (1, √2·x, x²). Are the classes now linearly separable? If so, find a separating hyperplane.
The points are mapped to (1, 0, 0), (1, √2, 1) and (1, -√2, 1) respectively. The points are now separable in the 3-dimensional feature space; a separating hyperplane is given by the weight vector (0, 0, 1) in the new space, as seen in the figure.
(c) Define a class variable yi ∈ {-1, +1} which denotes the class of xi, and let w = (w1, w2, w3)ᵀ. The max-margin SVM classifier solves the following problem:
minimize (1/2)·‖w‖²  subject to  yi·(wᵀφ(xi) + b) >= 1,  i = 1, 2, 3.
Using the method of Lagrange multipliers, show that the solution is w = (0, 0, 2)ᵀ, b = -1, and that the margin is 1/‖w‖ = 1/2.
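As a numerical check of part (c), the sketch below fits a (nearly) hard-margin linear SVM on the mapped 3-D points; the labels used here (x = 0 negative, x = ±1 positive) are my reading of the reconstructed data, so treat this as a sanity check rather than the official solution.

```python
import numpy as np
from sklearn.svm import SVC

# Feature map phi(x) = (1, sqrt(2)*x, x^2) applied to x = 0, 1, -1.
phi = np.array([[1.0, 0.0, 0.0],
                [1.0, np.sqrt(2.0), 1.0],
                [1.0, -np.sqrt(2.0), 1.0]])
y = np.array([-1, 1, 1])                 # assumed labels: x = 0 vs. x = +/-1

clf = SVC(kernel="linear", C=1e8)        # large C approximates the hard margin
clf.fit(phi, y)
print("w      =", clf.coef_[0])          # expected close to (0, 0, 2)
print("b      =", clf.intercept_[0])     # expected close to -1
print("margin =", 1.0 / np.linalg.norm(clf.coef_[0]))   # expected 0.5
```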
beledtrainingset,where"+〞correspondstoclassy=1.Wewillfirstestimateami*tureofGaussiansmodel,oneGaussianperclass,withtheconstraintthatthecovariancematricesareidentitymatrices.Themi*ingproportions(classfrequencies)andthemeansofthetwoGaussiansarefreeparameters.Plotthema*imumlikelihoodestimatesofthemeansofthetwoclassconditionalGaussiansinFigure4.1.Markthemeansaspoints"*〞andlabelthem"0〞and"1〞accordingtotheclass.Themeansshouldbeclosetothecenterofmassofthepoints.b)Drawthedecisionboundaryinthesamefigure.Sincethetwoclasseshavethesamenumberofpointsandthesamecovariancematrices,thedecisionboundaryisalineand,moreover,shouldbedrawnastheorthogonalbisector. >.ofthelinesegmentconnectingtheclassmeans.(2)Wehavealsotrainedregularizedlinearlogisticregressionmodelsforthesamedata.Theregularizationpenalties,usedinpenalizedconditionalloglikelihoodestimation, were Cw2, wherei=0,1,2.Inotherwords,onlyoneofitheparameterswereregularizedineachcase.BasedonthedatainFigure4.1,wegeneratedthreeplots,oneforeachregularizedparameter,ofthenumberofmisclassifiedtrainingpointsasafunctionofC(Figure4.2).Thethreeplotsarenotidentifiedwiththecorrespondingparameters,however.Pleaseassignthe"top〞,"middle〞,and"bottom〞plotstothecorrectparameter,w0,w1,orw2,theparameterthatwasregularizedintheplot.Provideabriefjustificationforeachassignment.?"top〞=(w1)Bystronglyregularizingw1weforcetheboundarytobehorizontalinthefigure.Thelogisticregressionmodeltriestoma*imizethelog-probabilityofclassifyingthedatacorrectly.Thehighestpenaltyesfromthemisclassifiedpointsandthustheboundarywilltendtobalancethe(worst)errors.Inthefigure,thisisroughlyspeaking*2=1line,resultingin4errors."middle〞=(w0)Ifweregularizew0,thentheboundarywilleventuallygothroughtheorigin(biastermsettozero).Basedonthefigurewecanfindagoodlinearboundarythroughtheoriginwithonlyoneerror."bottom〞=(w2)Thetrainingerrorisunaffectedifweregularizew2(constraintheboundarytobevertical);thevalueofw2wouldbesmallalreadywithoutregularization.4、midterm2009problem46、Considertwoclassifiers:1)anSVMwithaquadratic(secondorderpolynomial)kernelfunctionand2)anunconstrainedmi*tureoftwoGaussiansmodel,oneGaussianperclasslabel.Theseclassifierstrytomape*amplesinR2tobinarylabels.Weassumethattheproblemisseparable,noslackpenaltiesareaddedtotheSVMclassifier,andthatwehavesufficientlymanytraininge*amplestoestimatethecovariancematricesofthetwoGaussianponents.ThetwoclassifiershavethesameVC-dimension.(T)Supposeweevaluatedthestructuralriskminimizationscoreforthetwoclassifiers.Thescoreistheboundonthee*pectedlossoftheclassifier,whentheclassifierisestimatedonthebasisofntraininge*amples.Whichofthetwoclassifiersmightyieldthebetter(lower)score"Provideabriefjustification.TheSVMwouldprobablygetabetterscore.Bothclassifiershavethesameple*itypenaltybutSVMwouldbetteroptimizethetrainingerrorresultinginalower(orequal)overallscore.[final2004]2,Weestimatedami*tureoftwoGaussiansmodelbasedontwodimensionaldatashowninfigure3.1below.Themi*turewasinitializedrandomlyintwodifferentwaysandrunforthreeiterationsbasedoneachinitialization.However,thefiguresgotmi*ed. >.up(yes,again!).Please