![PKU summer course "Regression Analysis" (Linear-Regression-Analysis) lecture notes PKU5, page 1](http://file4.renrendoc.com/view14/M06/2D/1B/wKhkGWcigoaAMbq7AAFgWhcOWo4738.jpg)
Class 5: ANOVA (Analysis of Variance) and F-tests
I. What is ANOVA
What is ANOVA? ANOVA is the short name for the Analysis of Variance. The essence of ANOVA is to decompose the total variance of the dependent variable into two additive components of a regression, one for the structural part and the other for the stochastic part. Today we are going to examine the easiest case.
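II. ANOVA: An Introduction
Let the model be
$$y = X\beta + \varepsilon.$$
Assuming $x_i$ is a column vector (of length $p$) of independent variable values for the $i$th observation,
$$y_i = x_i'\beta + \varepsilon_i.$$
Then $x_i'b$ is the predicted value, and $e_i = y_i - x_i'b$ is the residual.
Sum of squares total:
$$\begin{aligned}
SST &= \sum (y_i - \bar{y})^2 \\
    &= \sum (y_i - x_i'b + x_i'b - \bar{y})^2 \\
    &= \sum (y_i - x_i'b)^2 + 2\sum (y_i - x_i'b)(x_i'b - \bar{y}) + \sum (x_i'b - \bar{y})^2 \\
    &= \sum e_i^2 + \sum (x_i'b - \bar{y})^2
\end{aligned}$$
because $\sum (y_i - x_i'b)(x_i'b - \bar{y}) = \sum e_i (x_i'b - \bar{y}) = 0$.
This is always true by OLS (for a model that includes an intercept).
$$SST = SSE + SSR$$
Important: the total variance of the dependent variable is decomposed into two additive parts: SSE, which is due to errors, and SSR, which is due to regression.
Geometric interpretation: [blackboard]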
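The decomposition can be checked numerically. The sketch below is only an illustration: the data are hypothetical (not from the lecture), and it fits a simple regression with an intercept in plain Python, then compares SST with SSE + SSR and prints the cross term that OLS forces to zero.

```python
# Hypothetical data, chosen only for illustration (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for y = a + b*x.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]                 # predicted values
resid = [yi - fi for yi, fi in zip(y, y_hat)]    # residuals e_i

sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum(e ** 2 for e in resid)
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
cross = sum(e * (fi - y_bar) for e, fi in zip(resid, y_hat))

print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}, cross term = {cross:.2e}")
```

Up to floating-point error, the cross term vanishes and the two sums of squares add up to SST.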
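Decomposition of Variance
If we treat X as a random variable, we can decompose the total variance into the between-group portion and the within-group portion in any population:
$$V(y_i) = V(x_i'\beta) + V(\varepsilon_i) \qquad (1)$$
Proof:
$$\begin{aligned}
V(y_i) &= V(x_i'\beta + \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i) + 2\,\mathrm{Cov}(x_i'\beta, \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i)
\end{aligned}$$
(by the assumption that $\mathrm{Cov}(x_k, \varepsilon) = 0$ for all possible $k$.)
The ANOVA table estimates the three quantities of equation (1) from the sample. As the sample size gets larger and larger, the ANOVA table approaches the equation more and more closely.
In a sample, the decomposition of estimated variance is not strictly true. We thus need to decompose sums of squares and degrees of freedom separately. Is ANOVA a misnomer?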
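III. ANOVA in Matrix
I will try to give a simplified representation of ANOVA as follows:
$$\begin{aligned}
SST &= \sum (y_i - \bar{y})^2 = \sum y_i^2 - 2\bar{y}\sum y_i + n\bar{y}^2 \\
    &= \sum y_i^2 - n\bar{y}^2 \quad (\text{because } \textstyle\sum y_i = n\bar{y}) \\
    &= y'y - n\bar{y}^2 \\
    &= y'y - \tfrac{1}{n}\, y'Jy \quad (\text{in your textbook, monster look})
\end{aligned}$$
where $J$ is the $n \times n$ matrix of ones.
$$SSE = e'e$$
$$\begin{aligned}
SSR &= \sum (x_i'b - \bar{y})^2 = \sum (x_i'b)^2 - 2\bar{y}\sum x_i'b + n\bar{y}^2 \\
    &= \sum (x_i'b)^2 - n\bar{y}^2 \quad (\text{because } \textstyle\sum x_i'b = n\bar{y}, \text{ as always}) \\
    &= b'X'Xb - n\bar{y}^2 \\
    &= b'X'y - \tfrac{1}{n}\, y'Jy \quad (\text{in your textbook, monster look})
\end{aligned}$$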
IV. ANOVA Table

| SOURCE | SS | DF | MS | F | with DF |
|---|---|---|---|---|---|
| Regression | SSR | DF(R) | MSR | MSR/MSE | DF(R), DF(E) |
| Error | SSE | DF(E) | MSE | | |
| Total | SST | DF(T) | | | |

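Let us use a real example. Assume that we have a regression estimated to be
y = -1.70 + 0.840x

ANOVA Table

| SOURCE | SS | DF | MS | F | with DF |
|---|---|---|---|---|---|
| Regression | 6.44 | 1 | 6.44 | 6.44/0.19 = 33.89 | 1, 18 |
| Error | 3.40 | 18 | 0.19 | | |
| Total | 9.84 | 19 | | | |
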
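We know the sample sums needed to fill in the table; the calculation below uses $\sum x_i = 100$, $\sum x_i^2 = 509.12$, and $n\bar{y}^2 = 125.0$. If we know that DF for SST = 19, what is n?
n = 20
With intercept $a = -1.70$ and slope $b = 0.840$, $SSR = \sum (a + b x_i)^2 - n\bar{y}^2 = n a^2 + b^2 \sum x_i^2 + 2ab \sum x_i - n\bar{y}^2$:
$$SSR = 20 \times 1.7 \times 1.7 + 0.84 \times 0.84 \times 509.12 - 2 \times 1.7 \times 0.84 \times 100 - 125.0 = 6.44$$
SSE = SST - SSR = 9.84 - 6.44 = 3.40
DF (degrees of freedom): demonstration. Note: the intercept is discounted when calculating SST, so DF(T) = n - 1.
MS = SS/DF
p = 0.000 [ask students]. What does the p-value say?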
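The arithmetic behind the example table can be reproduced in a few lines. This is a minimal sketch that uses only quantities appearing in the example (n, the fitted coefficients, the sums in the SSR calculation, and SST); the variable names are my own.

```python
# Quantities taken from the lecture example.
n = 20
a, b = -1.70, 0.840            # estimated intercept and slope
sum_x, sum_x2 = 100.0, 509.12  # sum of x and sum of x squared
n_ybar_sq = 125.0              # n * ybar^2
sst = 9.84

# SSR = sum((a + b*x_i)^2) - n*ybar^2, expanded in terms of the sums.
ssr = n * a**2 + b**2 * sum_x2 + 2 * a * b * sum_x - n_ybar_sq
sse = sst - ssr

df_r, df_e = 1, n - 2          # one slope; n minus two estimated parameters
msr, mse = ssr / df_r, sse / df_e
f_stat = msr / mse

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F({df_r}, {df_e}) = {f_stat:.2f}")
```

The F value computed from unrounded mean squares is about 34; the table's 33.89 comes from dividing the rounded values 6.44 and 0.19.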
V. F-Tests
F-tests are more general than t-tests; t-tests can be seen as a special case of F-tests. If you have difficulty with F-tests, please ask your GSIs to review F-tests in the lab. An F-test takes the form of a ratio of two MS's.
An F statistic has two degrees of freedom associated with it: the degrees of freedom in the numerator and the degrees of freedom in the denominator.
An F statistic is usually larger than 1. An F statistic tests whether the variance explained under the alternative hypothesis is due to chance. In other words, the null hypothesis is that the explained variance is due to chance, or that all the coefficients (other than the intercept) are zero.
The larger an F statistic, the stronger the evidence against the null hypothesis. There is a table in the back of your book from which you can find exact probability values.
In our example, the F is 34, which is highly significant.
VI. R²
R² = SSR/SST
The proportion of variance explained by the model.
In our example,
R-sq = 65.4%
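VII. What happens if we add more independent variables?
1. SST stays the same.
2. SSR always increases (more precisely, it never decreases).
3. SSE always decreases (more precisely, it never increases).
4. R² always increases (more precisely, it never decreases).
5. MSR usually increases.
6. MSE usually decreases.
7. The F statistic usually increases.
Exceptions to 5 and 7: irrelevant variables may not explain the variance but still take up degrees of freedom. We really need to look at the results.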
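To see these effects on actual numbers, one can fit the same outcome with and without an extra regressor. The sketch below is illustrative only: the data are hypothetical, `z` is an arbitrary series not used to construct `y`, and `ols_sse` is a small helper of my own that solves the normal equations by Gauss-Jordan elimination.

```python
def ols_sse(columns, y):
    """Return SSE from regressing y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [arc - factor * akc for arc, akc in zip(A[r], A[k])]
    b = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(bj * xij for bj, xij in zip(b, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data: y depends on x; z is an arbitrary extra variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
z = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8, 7.1, 7.9]

n = len(y)
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)

results = {}
for label, cols in [("x only", [x]), ("x and z", [x, z])]:
    sse = ols_sse(cols, y)
    ssr = sst - sse
    df_r, df_e = len(cols), n - len(cols) - 1
    results[label] = {"SSR": ssr, "SSE": sse, "R2": ssr / sst,
                      "MSR": ssr / df_r, "MSE": sse / df_e,
                      "F": (ssr / df_r) / (sse / df_e)}
    print(label, {k: round(v, 4) for k, v in results[label].items()})
```

Comparing the two printed rows shows SSE and R² moving in the guaranteed directions, while MSR falls here because SSR barely changes but its degrees of freedom double.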
VIII. Important: General Ways of Hypothesis Testing with F-Statistics
All tests in linear regression can be performed with F-test statistics. The trick is to run "nested models."
Two models are nested if the independent variables in one model are a subset, or linear combinations of a subset, of the independent variables in the other model.
That is to say, if model A has independent variables $(1, x_1, x_2)$, and model B has independent variables $(1, x_1, x_2, x_3)$, A and B are nested. A is called the restricted model; B is called the less restricted or unrestricted model. We call A restricted because A implies that $\beta_3 = 0$. This is a restriction.
Another example: C has independent variables $(1, x_1, x_2 + x_3)$, D has $(1, x_2 + x_3)$.
C and A are not nested.
C and B are nested. One restriction in C: $\beta_2 = \beta_3$.
C and D are nested. One restriction in D: $\beta_1 = 0$.
D and A are not nested.
D and B are nested. Two restrictions in D: $\beta_2 = \beta_3$; $\beta_1 = 0$.
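We can always test hypotheses implied in the restricted models. Steps: run two regressions for each hypothesis, one for the restricted model and one for the unrestricted model. The SST should be the same across the two models. What is different is SSE and SSR. That is, what is different is R². Let the subscript $r$ denote quantities from the restricted model ($SSE_r$, $SSR_r$, $R_r^2$); let the subscript $u$ denote quantities from the unrestricted model ($SSE_u$, $SSR_u$, $R_u^2$).
Use the following formulas:
$$F = \frac{(SSE_r - SSE_u)\,/\,(df(SSE_r) - df(SSE_u))}{SSE_u\,/\,df(SSE_u)}$$
or
$$F = \frac{(SSR_u - SSR_r)\,/\,(df(SSR_u) - df(SSR_r))}{SSE_u\,/\,df(SSE_u)}$$
(proof: use SST = SSE + SSR)
Note, $df(SSE_r) - df(SSE_u) = df(SSR_u) - df(SSR_r) = \Delta df$,
which is the number of constraints (not the number of parameters) implied by the restricted model,
or
$$F = \frac{(R_u^2 - R_r^2)\,/\,\Delta df}{(1 - R_u^2)\,/\,df(SSE_u)}$$
Note that
$$F_{1,\,df} = t_{df}^2$$
That is, for 1 df tests, you can do either an F-test or a t-test. They yield the same result. Another way to look at it is that the t-test is a special case of the F-test, with the numerator DF being 1.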
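The nested-model F statistic is easy to compute once the two regressions have been run. The sketch below is a minimal illustration with function names of my own; it applies both the SSE form and the R² form to the earlier class example (SST = 9.84, SSE = 3.40, error DF = 18), taking the constant-only model as the restricted model, so that SSE equals SST and R² is zero under the restriction.

```python
def nested_f(sse_r, sse_u, n_constraints, df_u):
    """F statistic from the SSEs of the restricted and unrestricted models."""
    return ((sse_r - sse_u) / n_constraints) / (sse_u / df_u)


def nested_f_from_r2(r2_r, r2_u, n_constraints, df_u):
    """The same F statistic written in terms of the two R-squared values."""
    return ((r2_u - r2_r) / n_constraints) / ((1.0 - r2_u) / df_u)


# Class example: unrestricted model y on (1, x); restricted model y on (1).
sst, sse_u, df_u = 9.84, 3.40, 18
sse_r = sst                  # the constant-only model explains nothing
r2_u = 1.0 - sse_u / sst
r2_r = 0.0

f_sse = nested_f(sse_r, sse_u, n_constraints=1, df_u=df_u)
f_r2 = nested_f_from_r2(r2_r, r2_u, n_constraints=1, df_u=df_u)

print(f"F from SSE: {f_sse:.2f}")
print(f"F from R^2: {f_r2:.2f}")
```

Both forms give the same value, about 34, which matches the F in the ANOVA table up to rounding of the mean squares.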
IX. Assumptions of F-tests
What assumptions do we need to make an ANOVA table work?
Not much of an assumption. All we need is the assumption that $(X'X)$ is not singular, so that the least squares estimate $b$ exists.
The assumption of $\mathrm{Cov}(X, \varepsilon) = 0$ is needed if you want the ANOVA table to be an unbiased estimate of the true ANOVA (equation 1) in the population. Reason: we want $b$ to be an unbiased estimator of $\beta$, and the covariance between $b$ and $\varepsilon$ to disappear.
For reasons I discussed earlier, the assumptions of homoscedasticity and no serial correlation are necessary for the estimation of $V(\varepsilon_i)$.
The normality assumption, that $\varepsilon_i$ follows a normal distribution, is needed for small samples.
X. The Concept of Increment
Every time you put one more independent variable into your model, you get an increase in $R^2$. We sometimes call the increase "incremental $R^2$." What it means is that more variance is explained: SSR is increased and SSE is reduced. What you should understand is that the incremental $R^2$ attributed to a variable is typically smaller than the $R^2$ obtained when the other variables are absent.
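XI. Consequences of Omitting Relevant Independent Variables
Say the true model is the following:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i.$$
But for some reason we only collect or consider data on $y$, $x_1$, and $x_2$. Therefore, we omit $x_3$ in the regression. That is, we omit $\beta_3 x_{3i}$ in our model. We briefly discussed this problem before. The short story is that we are likely to have a bias due to the omission of a relevant variable in the model. This is so even though our primary interest is to estimate the effect of $x_1$ or $x_2$ on $y$.
Why? We will have a formal presentation of this problem.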
XII. Measures of Goodness-of-Fit
There are different ways to assess the goodness-of-fit of a model.
A. R²
R² is a heuristic measure of the overall goodness-of-fit. It does not have an associated test statistic.
R² measures the proportion of the variance in the dependent variable that is "explained" by the model:
R² = SSR/SST = 1 - SSE/SST
B. Model F-test
The model F-test tests the joint hypothesis that all the model coefficients except for the constant term are zero.
Degrees of freedom associated with the model F-test:
Numerator: p - 1
Denominator: n - p.
C. t-tests for individual parameters
A t-test for an individual parameter tests the hypothesis that a particular coefficient is equal to a particular number (commonly zero).
$t_k = (b_k - \beta_{k0})/SE_k$, where $SE_k$ is the square root of the $(k, k)$ element of $MSE\,(X'X)^{-1}$, with degrees of freedom = n - p.
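D. Incremental R²
Relative to a restricted model, the gain in R² for the unrestricted model:
$$\Delta R^2 = R_u^2 - R_r^2$$
E. F-tests for Nested Models
This is the most general form of F-tests and t-tests.
$$F = \frac{(SSE_r - SSE_u)\,/\,(df(SSE_r) - df(SSE_u))}{SSE_u\,/\,df(SSE_u)}$$
where the subscripts $r$ and $u$ denote the restricted and unrestricted models.
It is equivalent to a t-test if the unrestricted and restricted models differ by only one parameter.
It is equal to the model F-test if we set the restricted model to the constant-only model.
[Ask students] What are SST, SSE, and SSR, and their associated degrees of freedom, for the constant-only model?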
Numerical Example
A sociological study is interested in understanding the social determinants of mathematical achievement among high school students. You are now asked to answer a series of questions. The data are real but have been tailored for educational purposes. The total number of observations is 400. The variables are defined as:
y: math score
x1: father's education
x2: mother's education
x3: family's socioeconomic status
x4: number of siblings
x5: class rank
x6: parents' total education (note: x6 = x1 + x2)
For the following regression models, we know:

Table 1

| Model | SST | SSR | SSE | DF | R² |
|---|---|---|---|---|---|
| (1) y on (1 x1 x2 x3 x4) | 34863 | 4201 | | | |
| (2) y on (1 x6 x3 x4) | 34863 | | | 396 | .1065 |
| (3) y on (1 x6 x3 x4 x5) | 34863 | 10426 | 24437 | 395 | .2991 |
| (4) x5 on (1 x6 x3 x4) | | | 269753 | 396 | .0210 |

1. Please fill in the missing cells in Table 1.
2. Test the hypothesis that the effects of father's education (x1) and mother's education (x2) on math score are the same after controlling for x3 and x4.
3. Test the hypothesis that x6, x3 and x4 in Model (2) all have a zero effect on y.
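One way to check your own answers is to script the identities used throughout these notes: SST = SSR + SSE, R² = SSR/SST, error DF = n minus the number of estimated parameters, and the nested-model F formula. The sketch below is a hedged illustration, not an official solution. It assumes that the lone SSR-column entry for Model (1) is 4201 and that the DF column is the error DF; variable names are my own.

```python
n = 400
sst_y = 34863.0

# Model (1): y on (1, x1, x2, x3, x4); SSR is given.
ssr1 = 4201.0
sse1 = sst_y - ssr1
df1 = n - 5                      # five estimated parameters
r2_1 = ssr1 / sst_y

# Model (2): y on (1, x6, x3, x4); R^2 and DF are given.
r2_2, df2 = 0.1065, 396
ssr2 = r2_2 * sst_y
sse2 = sst_y - ssr2

# Model (4): x5 on (1, x6, x3, x4); SSE and R^2 are given.
sse4, r2_4 = 269753.0, 0.0210
sst4 = sse4 / (1.0 - r2_4)       # because SSE/SST = 1 - R^2
ssr4 = sst4 - sse4

# Question 2: H0: beta1 = beta2. Model (2) is the restricted version of
# Model (1), with one constraint.
f_q2 = ((sse2 - sse1) / (df2 - df1)) / (sse1 / df1)

# Question 3: H0: all three slopes in Model (2) are zero (model F-test).
f_q3 = (ssr2 / 3) / (sse2 / df2)

print(f"Model (1): SSE = {sse1:.0f}, DF = {df1}, R2 = {r2_1:.4f}")
print(f"Model (2): SSR = {ssr2:.0f}, SSE = {sse2:.0f}")
print(f"Model (4): SST = {sst4:.0f}, SSR = {ssr4:.0f}")
print(f"Q2: F(1, {df1}) = {f_q2:.2f};  Q3: F(3, {df2}) = {f_q3:.2f}")
```

Each F value would then be compared against the critical value of the F distribution with the stated degrees of freedom.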