Chapter Seventeen

Correlation and Regression

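Learning Objectives

Discuss the concepts of product moment correlation, partial correlation, and part correlation, and show how they provide a foundation for regression analysis.
Explain the nature and methods of bivariate regression analysis and describe the general model, estimation of parameters, standardized regression coefficient, significance testing, prediction accuracy, residual analysis, and model cross-validation.
Explain the nature and methods of multiple regression analysis.
Describe specialized techniques used in multiple regression analysis, particularly stepwise regression, regression with dummy variables, and analysis of variance and covariance with regression.
Discuss nonmetric correlation and its measures.

Chapter Outline

Product moment correlation
Partial correlation
Nonmetric correlation
Regression analysis
Bivariate regression
Statistics associated with bivariate regression analysis
Conducting bivariate regression analysis
Multiple regression
Statistics associated with multiple regression
Conducting multiple regression analysis
Stepwise regression
Multicollinearity
Relative importance of predictors
Cross-validation
Regression with dummy variables
Analysis of variance and regression
Summary

Product Moment Correlation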

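The product moment correlation, r, is the most widely used statistic for summarizing the strength of association between two metric (interval or ratio scaled) variables, X and Y. It is an index used to determine whether a linear relationship exists between X and Y. Because it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient.

It is also referred to as the simple correlation, the bivariate correlation, or merely the correlation coefficient.

For a sample of n observations on X and Y, the product moment correlation, r, is calculated as

$$ r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} $$

Division of the numerator and denominator by (n - 1) gives

$$ r = \frac{\sum_{i=1}^{n}\dfrac{(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}}{\sqrt{\sum_{i=1}^{n}\dfrac{(X_i-\bar{X})^2}{n-1}\,\sum_{i=1}^{n}\dfrac{(Y_i-\bar{Y})^2}{n-1}}} = \frac{COV_{xy}}{S_x S_y} $$

r varies between -1.0 and +1.0. The correlation coefficient between two variables is the same regardless of their underlying units of measurement.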

Explaining Attitude Toward the City of Residence

Table 17.1

Respondent No.   Attitude Toward the City   Duration of Residence   Importance Attached to Weather
 1                6                         10                       3
 2                9                         12                      11
 3                8                         12                       4
 4                3                          4                       1
 5               10                         12                      11
 6                4                          6                       1
 7                5                          8                       7
 8                2                          2                       4
 9               11                         18                       8
10                9                          9                      10
11               10                         17                       8
12                2                          2                       5

The correlation coefficient may be calculated as follows:

$$ \bar{X} = (10+12+12+4+12+6+8+2+18+9+17+2)/12 = 9.333 $$

$$ \bar{Y} = (6+9+8+3+10+4+5+2+11+9+10+2)/12 = 6.583 $$

$$
\begin{aligned}
\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}) ={}& (10-9.33)(6-6.58)+(12-9.33)(9-6.58) \\
&+(12-9.33)(8-6.58)+(4-9.33)(3-6.58) \\
&+(12-9.33)(10-6.58)+(6-9.33)(4-6.58) \\
&+(8-9.33)(5-6.58)+(2-9.33)(2-6.58) \\
&+(18-9.33)(11-6.58)+(9-9.33)(9-6.58) \\
&+(17-9.33)(10-6.58)+(2-9.33)(2-6.58) \\
={}& -0.3886+6.4614+3.7914+19.0814 \\
&+9.1314+8.5914+2.1014+33.5714 \\
&+38.3214-0.7986+26.2314+33.5714 \\
={}& 179.6668
\end{aligned}
$$
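The hand calculation of r can be checked with a short script. This is only a sketch in plain Python: the helper name `pearson_r` is made up for illustration, and the two lists are the duration and attitude values used in the sums for Table 17.1.

```python
import math

# Duration of residence (X) and attitude toward the city (Y), Table 17.1
duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]


def pearson_r(x, y):
    """Product moment correlation from sums of deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


r = pearson_r(duration, attitude)
print(round(r, 4))  # 0.9361
```

The result agrees with the Multiple R of 0.93608 reported for the bivariate regression in Table 17.2.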

Decomposition of the Total Variation

$$ r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SS_x}{SS_y} = \frac{\text{Total variation} - \text{Error variation}}{\text{Total variation}} = \frac{SS_y - SS_{error}}{SS_y} $$

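Figure: A nonlinear relationship for which r = 0.

Partial Correlation

A partial correlation coefficient measures the association between two variables after controlling for, or adjusting for, the effects of one or more additional variables.

Partial correlations have an order associated with them. The order indicates how many variables are being adjusted or controlled.

The simple correlation coefficient, r, has a zero order, as it does not control for any additional variables while measuring the association between two variables.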
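A first-order partial correlation can be sketched as follows. The formula used here, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), is the standard textbook expression and is an assumption in the sense that it does not survive in the text above. The helper names are hypothetical, and the three lists are the columns of Table 17.1. As a cross-check, the same value is obtained by correlating the residuals of X and Y after each has been regressed on Z.

```python
import math

attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
importance = [3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5]


def pearson_r(x, y):
    """Zero-order (simple) correlation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def residuals(v, z):
    """Residuals of v after a least-squares regression of v on z."""
    n = len(v)
    v_bar, z_bar = sum(v) / n, sum(z) / n
    slope = (sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, v))
             / sum((zi - z_bar) ** 2 for zi in z))
    return [vi - (v_bar + slope * (zi - z_bar)) for vi, zi in zip(v, z)]


# Attitude vs. duration, controlling for importance attached to weather
r_partial = partial_r(duration, attitude, importance)
r_check = pearson_r(residuals(duration, importance),
                    residuals(attitude, importance))
print(round(r_partial, 4), round(r_check, 4))
```

The two printed numbers should coincide, which illustrates what "controlling for" a variable means in practice.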

Part Correlation Coefficient

The part correlation coefficient represents the correlation between Y and X when the linear effects of the other independent variables have been removed from X. The part correlation coefficient, $r_{y(x.z)}$, is calculated as follows:

$$ r_{y(x.z)} = \frac{r_{xy} - r_{yz}\,r_{xz}}{\sqrt{1 - r_{xz}^2}} $$

The partial correlation coefficient is generally viewed as more important than the part correlation coefficient.

Nonmetric Correlation

Two measures of nonmetric correlation are Spearman's rho, $\rho_s$, and Kendall's tau, $\tau$.

Regression Analysis

Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:

Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists.
Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.
Determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables.
Predict the values of the dependent variable.
Control for other independent variables when evaluating the contributions of a specific variable or set of variables.

Regression analysis is concerned with the nature and degree of association between variables and does not imply or assume any causality.

Statistics Associated with Bivariate Regression Analysis

Bivariate regression model. The basic regression equation is $Y_i = \beta_0 + \beta_1 X_i + e_i$, where Y = dependent or criterion variable, X = independent or predictor variable, $\beta_0$ = intercept of the line, $\beta_1$ = slope of the line, and $e_i$ is the error term associated with the i-th observation.

Coefficient of determination. The strength of association is measured by the coefficient of determination, $r^2$. It varies between 0 and 1 and signifies the proportion of the total variation in Y that is accounted for by the variation in X.

Estimated or predicted value. The estimated or predicted value of $Y_i$ is $\hat{Y}_i = a + bx$, where $\hat{Y}_i$ is the predicted value of $Y_i$, and a and b are estimators of $\beta_0$ and $\beta_1$, respectively.

Regression coefficient. The estimated parameter b is usually referred to as the non-standardized regression coefficient.

Scattergram. A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations.

Standard error of estimate. This statistic, SEE, is the standard deviation of the actual Y values from the predicted $\hat{Y}$ values.

Standard error. The standard deviation of b, $SE_b$, is called the standard error.

Standardized regression coefficient. Also termed the beta coefficient or beta weight, this is the slope obtained by the regression of Y on X when the data are standardized.

Sum of squared errors. The distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, which is a measure of total error, $\sum e_j^2$.

t statistic. A t statistic with n - 2 degrees of freedom can be used to test the null hypothesis that no linear relationship exists between X and Y, or $H_0: \beta_1 = 0$, where $t = b / SE_b$.

Conducting Bivariate Regression Analysis: Plot the Scatter Diagram

A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations.

The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure. In fitting the line, the least-squares procedure minimizes the sum of squared errors, $\sum e_j^2$.

Fig. 17.2 Conducting Bivariate Regression Analysis

Plot the Scatter Diagram
Formulate the General Model
Estimate the Parameters
Estimate Standardized Regression Coefficients
Test for Significance
Determine the Strength and Significance of Association
Check Prediction Accuracy
Examine the Residuals
Cross-Validate the Model

Conducting Bivariate Regression Analysis: Formulate the Bivariate Regression Model

In the bivariate regression model, the general form of a straight line is:

$$ Y = \beta_0 + \beta_1 X $$

where
Y = dependent or criterion variable
X = independent or predictor variable
$\beta_0$ = intercept of the line
$\beta_1$ = slope of the line

The regression procedure adds an error term to account for the probabilistic or stochastic nature of the relationship:

$$ Y_i = \beta_0 + \beta_1 X_i + e_i $$

where $e_i$ is the error term associated with the i-th observation.

Fig. 17.3 Plot of Attitude with Duration (horizontal axis: Duration of Residence; vertical axis: Attitude)

Fig. 17.4 Which Straight Line Is Best? (Line 1, Line 2, Line 3, Line 4)

Fig. 17.5 Bivariate Regression (the line $\beta_0 + \beta_1 X$, with the error $e_J$ between the observed $Y_J$ and the line)

Conducting Bivariate Regression Analysis: Estimate the Parameters

In most cases, $\beta_0$ and $\beta_1$ are unknown and are estimated from the sample observations using the equation

$$ \hat{Y}_i = a + b x_i $$

where $\hat{Y}_i$ is the estimated or predicted value of $Y_i$, and a and b are estimators of $\beta_0$ and $\beta_1$, respectively.

$$ b = \frac{COV_{xy}}{S_x^2} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \frac{\sum_{i=1}^{n}X_iY_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n}X_i^2 - n\bar{X}^2} $$

The intercept, a, may then be calculated using:

$$ a = \bar{Y} - b\bar{X} $$

For the data in Table 17.1, the estimation of parameters may be illustrated as follows:

$$
\begin{aligned}
\sum_{i=1}^{12} X_iY_i ={}& (10)(6)+(12)(9)+(12)(8)+(4)(3)+(12)(10)+(6)(4) \\
&+(8)(5)+(2)(2)+(18)(11)+(9)(9)+(17)(10)+(2)(2) \\
={}& 917
\end{aligned}
$$

$$
\begin{aligned}
\sum_{i=1}^{12} X_i^2 ={}& 10^2+12^2+12^2+4^2+12^2+6^2 \\
&+8^2+2^2+18^2+9^2+17^2+2^2 \\
={}& 1350
\end{aligned}
$$

It may be recalled from earlier calculations of the simple correlation that:

$$ \bar{X} = 9.333 \qquad \bar{Y} = 6.583 $$

Given n = 12, b can be calculated as:

$$ b = \frac{917 - (12)(9.333)(6.583)}{1350 - (12)(9.333)^2} = 0.5897 $$

$$ a = \bar{Y} - b\bar{X} = 6.583 - (0.5897)(9.333) = 1.0793 $$
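The computational formulas for b and a translate directly into code. The sketch below is plain Python with a hypothetical helper named `fit_line`; it uses the duration and attitude columns of Table 17.1.

```python
# Duration of residence (X) and attitude toward the city (Y), Table 17.1
duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]


def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 917 for Table 17.1
    sum_xx = sum(xi * xi for xi in x)               # 1350 for Table 17.1
    b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line(duration, attitude)
print(f"a = {a:.4f}, b = {b:.4f}")  # a = 1.0793, b = 0.5897
```

Working with unrounded means avoids the small drift that comes from plugging 9.333 and 6.583 into the formula by hand.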

Conducting Bivariate Regression Analysis: Estimate the Standardized Regression Coefficient

Standardization is the process by which the raw data are transformed into new variables that have a mean of 0 and a variance of 1 (Chapter 14).

When the data are standardized, the intercept assumes a value of 0.

The term beta coefficient or beta weight is used to denote the standardized regression coefficient.

$$ B_{yx} = B_{xy} = r_{xy} $$

There is a simple relationship between the standardized and non-standardized regression coefficients:

$$ B_{yx} = b_{yx}\,(S_x / S_y) $$
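The relationship between the two kinds of coefficient can be illustrated numerically. In this sketch (helper names are hypothetical; data are the duration and attitude columns of Table 17.1) the beta weight is obtained twice: once by rescaling b with the sample standard deviations, and once by regressing standardized Y on standardized X.

```python
import statistics as st

duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]


def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = st.mean(x), st.mean(y)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


def standardize(v):
    """Transform v to mean 0 and (sample) variance 1."""
    m, s = st.mean(v), st.stdev(v)
    return [(vi - m) / s for vi in v]


b = slope(duration, attitude)

# Route 1: rescale the non-standardized coefficient, B = b * (Sx / Sy)
beta_rescaled = b * st.stdev(duration) / st.stdev(attitude)

# Route 2: fit the line on standardized data
beta_standardized = slope(standardize(duration), standardize(attitude))

print(round(beta_rescaled, 5), round(beta_standardized, 5))
```

Both routes should give the same number, and in the bivariate case that number is the simple correlation r.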

Conducting Bivariate Regression Analysis: Test for Significance

The statistical significance of the linear relationship between X and Y may be tested by examining the hypotheses:

$$ H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0 $$

A t statistic with n - 2 degrees of freedom can be used, where

$$ t = \frac{b}{SE_b} $$

$SE_b$ denotes the standard deviation of b and is called the standard error.

Using a computer program, the regression of attitude on duration of residence, using the data shown in Table 17.1, yielded the results shown in Table 17.2. The intercept, a, equals 1.0793, and the slope, b, equals 0.5897. Therefore, the estimated equation is:

Attitude ($\hat{Y}$) = 1.0793 + 0.5897 (Duration of residence)

The standard error, or standard deviation of b, is estimated as 0.07008, and the value of the t statistic is t = 0.5897/0.07008 = 8.414, with n - 2 = 10 degrees of freedom.

From Table 4 in the Statistical Appendix, we see that the critical value of t with 10 degrees of freedom and $\alpha$ = 0.05 is 2.228 for a two-tailed test. Since the calculated value of t is larger than the critical value, the null hypothesis is rejected.
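The quantities in this test can be reproduced from the raw data. The sketch below assumes the usual bivariate expression for the standard error of the slope, SE_b = SEE / sqrt(sum of (X_i - X-bar)^2), which is not written out in the text above. The critical value 2.228 is simply copied from the text rather than computed.

```python
import math

duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]

n = len(duration)
x_bar = sum(duration) / n
y_bar = sum(attitude) / n

# Least-squares estimates
ss_x = sum((x - x_bar) ** 2 for x in duration)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(duration, attitude)) / ss_x
a = y_bar - b * x_bar

# Residual sum of squares and standard error of estimate (n - 2 df)
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(duration, attitude))
see = math.sqrt(ss_res / (n - 2))

# Standard error of the slope (assumed textbook formula) and t statistic
se_b = see / math.sqrt(ss_x)
t = b / se_b

T_CRITICAL = 2.228  # two-tailed, 10 df, alpha = 0.05, as quoted in the text
reject_h0 = abs(t) > T_CRITICAL

print(f"SEE = {see:.5f}, SEb = {se_b:.5f}, t = {t:.3f}, reject H0: {reject_h0}")
```

The printed values should agree with the Standard Error, SEb, and T entries of Table 17.2 up to rounding.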

Conducting Bivariate Regression Analysis: Determine the Strength and Significance of Association

The total variation, $SS_y$, may be decomposed into the variation accounted for by the regression line, $SS_{reg}$, and the error or residual variation, $SS_{error}$ or $SS_{res}$, as follows:

$$ SS_y = SS_{reg} + SS_{res} $$

where

$$ SS_y = \sum_{i=1}^{n}(Y_i-\bar{Y})^2 $$

$$ SS_{reg} = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 $$

$$ SS_{res} = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 $$

Fig. 17.6 Decomposition of the Total Variation in Bivariate Regression (total variation $SS_y$, explained variation $SS_{reg}$, residual variation $SS_{res}$)

To illustrate the calculations of $r^2$, let us consider again the effect of duration of residence on attitude toward the city. It may be recalled from earlier calculations of the simple correlation coefficient that:

$$ SS_y = \sum_{i=1}^{n}(Y_i-\bar{Y})^2 = 120.9168 $$

The strength of association may then be calculated as follows:

$$ r^2 = \frac{SS_{reg}}{SS_y} = \frac{SS_y - SS_{res}}{SS_y} $$

The predicted values ($\hat{Y}$) can be calculated using the regression equation:

Attitude ($\hat{Y}$) = 1.0793 + 0.5897 (Duration of residence)

For the first observation in Table 17.1, this value is $\hat{Y}$ = 1.0793 + 0.5897 x 10 = 6.9763. For each successive observation, the predicted values are, in order, 8.1557, 8.1557, 3.4381, 8.1557, 4.6175, 5.7969, 2.2587, 11.6939, 6.3866, 11.1042, and 2.2587.

Therefore,

$$
\begin{aligned}
SS_{reg} = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 ={}& (6.9763-6.5833)^2+(8.1557-6.5833)^2 \\
&+(8.1557-6.5833)^2+(3.4381-6.5833)^2 \\
&+(8.1557-6.5833)^2+(4.6175-6.5833)^2 \\
&+(5.7969-6.5833)^2+(2.2587-6.5833)^2 \\
&+(11.6939-6.5833)^2+(6.3866-6.5833)^2 \\
&+(11.1042-6.5833)^2+(2.2587-6.5833)^2 \\
={}& 0.1544+2.4724+2.4724+9.8922+2.4724 \\
&+3.8643+0.6184+18.7021+26.1182 \\
&+0.0387+20.4385+18.7021 \\
={}& 105.9524
\end{aligned}
$$

$$
\begin{aligned}
SS_{res} = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 ={}& (6-6.9763)^2+(9-8.1557)^2+(8-8.1557)^2 \\
&+(3-3.4381)^2+(10-8.1557)^2+(4-4.6175)^2 \\
&+(5-5.7969)^2+(2-2.2587)^2+(11-11.6939)^2 \\
&+(9-6.3866)^2+(10-11.1042)^2+(2-2.2587)^2 \\
={}& 14.9644
\end{aligned}
$$

It can be seen that $SS_y = SS_{reg} + SS_{res}$. Furthermore,

$$ r^2 = \frac{SS_{reg}}{SS_y} = \frac{105.9524}{120.9168} = 0.8762 $$
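The decomposition can be verified numerically. The following sketch, in plain Python with illustrative variable names, fits the line to the duration and attitude columns of Table 17.1, forms the three sums of squares, and computes r squared.

```python
duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]

n = len(duration)
x_bar = sum(duration) / n
y_bar = sum(attitude) / n

# Least-squares line
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(duration, attitude))
     / sum((x - x_bar) ** 2 for x in duration))
a = y_bar - b * x_bar
predicted = [a + b * x for x in duration]

# Total, explained (regression) and residual variation
ss_y = sum((y - y_bar) ** 2 for y in attitude)
ss_reg = sum((p - y_bar) ** 2 for p in predicted)
ss_res = sum((y - p) ** 2 for y, p in zip(attitude, predicted))

r_squared = ss_reg / ss_y

print(f"SSy = {ss_y:.4f}, SSreg = {ss_reg:.4f}, SSres = {ss_res:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

Because the predicted values are not rounded to four decimals here, the sums of squares may differ from the hand calculation in the last digit or two, while the identity SSy = SSreg + SSres holds to floating-point precision.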

Another, equivalent test for examining the significance of the linear relationship between X and Y (significance of b) is the test for the significance of the coefficient of determination. The hypotheses in this case are:

$$ H_0: R^2_{pop} = 0 \qquad H_1: R^2_{pop} > 0 $$

The appropriate test statistic is the F statistic:

$$ F = \frac{SS_{reg}}{SS_{res}/(n-2)} $$

which has an F distribution with 1 and n - 2 degrees of freedom. The F test is a generalized form of the t test (see Chapter 15). If a random variable is t distributed with n degrees of freedom, then $t^2$ is F distributed with 1 and n degrees of freedom. Hence, the F test for testing the significance of the coefficient of determination is equivalent to testing the following hypotheses:

$$ H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0 $$

or

$$ H_0: \rho = 0 \qquad H_1: \rho \neq 0 $$

From Table 17.2, it can be seen that:

$$ r^2 = \frac{105.9522}{105.9522 + 14.9644} = 0.8762 $$

which is the same as the value calculated earlier. The value of the F statistic is:

$$ F = \frac{105.9522}{14.9644/10} = 70.8027 $$

with 1 and 10 degrees of freedom. The calculated F statistic exceeds the critical value of 4.96 determined from Table 5 in the Statistical Appendix. Therefore, the relationship is significant at $\alpha$ = 0.05, corroborating the results of the t test.
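The equivalence of the F test and the t test in the bivariate case can be seen with a few lines of arithmetic. This sketch uses only numbers quoted in the text (the sums of squares, the t value of 8.414, and the critical value 4.96); the function name `f_statistic` and its `k` parameter are hypothetical.

```python
def f_statistic(ss_reg, ss_res, n, k=1):
    """F = (SSreg / k) / (SSres / (n - k - 1)); k = 1 for bivariate regression."""
    return (ss_reg / k) / (ss_res / (n - k - 1))


# Sums of squares for the regression of attitude on duration, n = 12
F = f_statistic(105.9522, 14.9644, n=12)

t = 8.414          # t statistic for the slope, as reported in the text
F_CRITICAL = 4.96  # 1 and 10 df, alpha = 0.05, as quoted in the text

print(f"F = {F:.4f}, t^2 = {t ** 2:.4f}, significant: {F > F_CRITICAL}")
```

F and t squared agree up to the rounding of t to three decimals, which is why the two tests always lead to the same conclusion when there is a single predictor.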

Table 17.2 Bivariate Regression

Multiple R        0.93608
R2                0.87624
Adjusted R2       0.86387
Standard Error    1.22329

ANALYSIS OF VARIANCE
              df    Sum of Squares    Mean Square
Regression     1    105.95222         105.95222
Residual      10     14.96444           1.49644
F = 70.80266    Significance of F = 0.0000

VARIABLES IN THE EQUATION
Variable      b          SEb        Beta (β)    T        Significance of T
Duration      0.58972    0.07008    0.93608     8.414    0.0000
(Constant)    1.07932    0.74335                1.452    0.1772

Conducting Bivariate Regression Analysis: Check Prediction Accuracy

To estimate the accuracy of predicted values, $\hat{Y}$, it is useful to calculate the standard error of estimate, SEE.

$$ SEE = \sqrt{\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{n-2}} $$

or

$$ SEE = \sqrt{\frac{SS_{res}}{n-2}} $$

or more generally, if there are k independent variables,

$$ SEE = \sqrt{\frac{SS_{res}}{n-k-1}} $$

For the data given in Table 17.2, the SEE is estimated as follows:

$$ SEE = \sqrt{14.9644/(12-2)} = 1.22329 $$

Assumptions

The error term is normally distributed. For each fixed value of X, the distribution of Y is normal.
The means of all these normal distributions of Y, given X, lie on a straight line with slope b.
The mean of the error term is 0.
The variance of the error term is constant. This variance does not depend on the values assumed by X.
The error terms are uncorrelated. In other words, the observations have been drawn independently.

Multiple Regression

The general form of the multiple regression model is as follows:

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + e $$

which is estimated by the following equation:

$$ \hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k $$

As before, the coefficient a represents the intercept, but the b's are now the partial regression coefficients.

Statistics Associated with Multiple Regression

Adjusted $R^2$. $R^2$, the coefficient of multiple determination, is adjusted for the number of independent variables and the sample size to account for diminishing returns. After the first few variables, the additional independent variables do not make much contribution.

Coefficient of multiple determination. The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, $R^2$, which is also called the coefficient of multiple determination.

F test. The F test is used to test the null hypothesis that the coefficient of multiple determination in the population, $R^2_{pop}$, is zero. This is equivalent to testing the null hypothesis $H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$. The test statistic has an F distribution with k and (n - k - 1) degrees of freedom.

Partial F test. The significance of a partial regression coefficient, $\beta_i$, of $X_i$ may be tested using an incremental F statistic. The incremental F statistic is based on the increment in the explained sum of squares resulting from the addition of the independent variable $X_i$ to the regression equation after all the other independent variables have been included.

Partial regression coefficient. The partial regression coefficient, $b_1$, denotes the change in the predicted value, $\hat{Y}$, per unit change in $X_1$ when the other independent variables, $X_2$ to $X_k$, are held constant.

Conducting Multiple Regression Analysis: Partial Regression Coefficients

To understand the meaning of a partial regression coefficient, let us consider a case in which there are two independent variables, so that:

$$ \hat{Y} = a + b_1 X_1 + b_2 X_2 $$

First, note that the relative magnitude of the partial regression coefficient of an independent variable is, in general, different from that of its bivariate regression coefficient.

The interpretation of the partial regression coefficient, $b_1$, is that it represents the expected change in Y when $X_1$ is changed by one unit but $X_2$ is held constant or otherwise controlled. Likewise, $b_2$ represents the expected change in Y for a unit change in $X_2$, when $X_1$ is held constant. Thus, calling $b_1$ and $b_2$ partial regression coefficients is appropriate.

It can also be seen that the combined effects of $X_1$ and $X_2$ on Y are additive. In other words, if $X_1$ and $X_2$ are each changed by one unit, the expected change in Y would be ($b_1 + b_2$).

Suppose one was to remove the effect of $X_2$ from $X_1$. This could be done by running a regression of $X_1$ on $X_2$. In other words, one would estimate the equation $\hat{X}_1 = a + bX_2$ and calculate the residual $X_r = (X_1 - \hat{X}_1)$. The partial regression coefficient, $b_1$, is then equal to the bivariate regression coefficient, $b_r$, obtained from the equation $\hat{Y} = a + b_r X_r$.
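The residual interpretation of a partial regression coefficient can be checked directly. In this sketch, X1 is duration, X2 is importance attached to weather, and Y is attitude (the columns of Table 17.1). The helper names are hypothetical, and the two-predictor fit uses the closed-form solution of the normal equations, which is an implementation choice rather than anything prescribed by the text.

```python
attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
importance = [3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5]


def fit_line(x, y):
    """Bivariate least squares: returns (intercept, slope) of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


def fit_two_predictors(x1, x2, y):
    """Least squares for y = a + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Step 1: remove the effect of X2 from X1 and keep the residual Xr
a12, b12 = fit_line(importance, duration)
x_r = [x1 - (a12 + b12 * x2) for x1, x2 in zip(duration, importance)]

# Step 2: regress Y on the residual Xr
_, b_r = fit_line(x_r, attitude)

# Compare with the partial regression coefficient from the full model
_, b1, _ = fit_two_predictors(duration, importance, attitude)
print(round(b_r, 5), round(b1, 5))
```

The two printed coefficients should be identical, which is the claim made in the paragraph above.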

Extension to the case of k variables is straightforward. The partial regression coefficient, $b_1$, represents the expected change in Y when $X_1$ is changed by one unit and $X_2$ through $X_k$ are held constant. It can also be interpreted as the bivariate regression coefficient, b, for the regression of Y on the residuals of $X_1$, when the effect of $X_2$ through $X_k$ has been removed from $X_1$.

The relationship of the standardized to the non-standardized coefficients remains the same as before:

$$ B_1 = b_1\,(S_{x_1}/S_y) \qquad \dots \qquad B_k = b_k\,(S_{x_k}/S_y) $$

The estimated regression equation is:

$$ \hat{Y} = 0.33732 + 0.48108\,X_1 + 0.28865\,X_2 $$

or

Attitude = 0.33732 + 0.48108 (Duration) + 0.28865 (Importance)

Table 17.3 Multiple Regression

Multiple R        0.97210
R2                0.94498
Adjusted R2       0.93276
Standard Error    0.85974

ANALYSIS OF VARIANCE
              df    Sum of Squares    Mean Square
Regression     2    114.26425         57.13213
Residual       9      6.65241          0.73916
F = 77.29364    Significance of F = 0.0000

VARIABLES IN THE EQUATION
Variable      b          SEb        Beta (β)    T        Significance of T
IMPORTANCE    0.28865    0.08608    0.31382     3.353    0.0085
DURATION      0.48108    0.05895    0.76363     8.160    0.0000
(Constant)    0.33732    0.56736                0.595    0.5668
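The coefficients in Table 17.3 can be reproduced from the raw data in Table 17.1. The sketch below solves the two-predictor normal equations in closed form (an implementation choice made to keep the example dependency-free; a statistics package would normally be used) and then converts each b to its beta weight with B_k = b_k (S_xk / S_y). All function and variable names are illustrative.

```python
import statistics as st

attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
importance = [3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5]


def fit_two_predictors(x1, x2, y):
    """Least squares for y = a + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


a, b_duration, b_importance = fit_two_predictors(duration, importance, attitude)

# Standardized coefficients (beta weights)
s_y = st.stdev(attitude)
beta_duration = b_duration * st.stdev(duration) / s_y
beta_importance = b_importance * st.stdev(importance) / s_y

print(f"a = {a:.5f}, b1 = {b_duration:.5f}, b2 = {b_importance:.5f}")
print(f"Beta1 = {beta_duration:.5f}, Beta2 = {beta_importance:.5f}")
```

The beta weights put the two predictors on a common scale, which is why duration (about 0.76) appears more influential than importance (about 0.31) even though the raw coefficients are closer together.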

Conducting Multiple Regression Analysis: Strength of Association

$$ SS_y = SS_{reg} + SS_{res} $$

where

$$ SS_y = \sum_{i=1}^{n}(Y_i-\bar{Y})^2 $$

$$ SS_{reg} = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 $$

$$ SS_{res} = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 $$

The strength of association is measured by the square of the multiple correlation coefficient, $R^2$, which is also called the coefficient of multiple determination.

$$ R^2 = \frac{SS_{reg}}{SS_y} $$

$R^2$ is adjusted for the number of independent variables and the sample size by using the following formula:

$$ \text{Adjusted } R^2 = R^2 - \frac{k(1-R^2)}{n-k-1} $$
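The adjustment can be illustrated with the sums of squares reported in Tables 17.2 and 17.3. The sketch assumes the standard denominator n - k - 1 in the adjustment term; the function names are hypothetical.

```python
def r_squared(ss_reg, ss_res):
    """Coefficient of (multiple) determination from the sums of squares."""
    return ss_reg / (ss_reg + ss_res)


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = R^2 - k(1 - R^2) / (n - k - 1)."""
    return r2 - k * (1 - r2) / (n - k - 1)


n = 12  # respondents in Table 17.1

# Bivariate model (Table 17.2): attitude on duration, k = 1
r2_bivariate = r_squared(105.95222, 14.96444)
adj_bivariate = adjusted_r_squared(r2_bivariate, n, k=1)

# Multiple model (Table 17.3): attitude on duration and importance, k = 2
r2_multiple = r_squared(114.26425, 6.65241)
adj_multiple = adjusted_r_squared(r2_multiple, n, k=2)

print(f"bivariate: R2 = {r2_bivariate:.5f}, adjusted = {adj_bivariate:.5f}")
print(f"multiple:  R2 = {r2_multiple:.5f}, adjusted = {adj_multiple:.5f}")
```

Both adjusted values should match the Adjusted R2 rows of the two tables up to rounding, and the penalty is larger for the model with more predictors relative to the sample size.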
