




Chapter 7. Multiple Regression Analysis: Dealing with Heteroskedasticity

Contents
- What is heteroskedasticity?
- Why worry about heteroskedasticity?
- How to test for heteroskedasticity?
- Corrections for heteroskedasticity

What Is Heteroskedasticity?
- Recall that the assumption of homoskedasticity implies that, conditional on the explanatory variables, the variance of the unobserved error $u$ is constant: $\mathrm{Var}(u \mid x) = \sigma^2$ (homoskedasticity).
- If this is not true, that is, if the variance of $u$ differs across values of the x's, then the errors are heteroskedastic: $\mathrm{Var}(u_i \mid x_i) = \sigma_i^2$ (heteroskedasticity).
- Example: if we examine a cross section of firms in one industry, error terms associated with very large firms might have larger variances than those associated with smaller firms; sales of larger firms might be more volatile than sales of smaller firms.
- Example: consider a cross-section study of family income and expenditures. It seems plausible to expect that low-income families would spend at a rather steady rate, while the spending patterns of high-income families would be relatively volatile.

Example of Heteroskedasticity
[Figure: conditional densities $f(y \mid x)$ at $x_1$, $x_2$, $x_3$ along the regression line $E(y \mid x) = \beta_0 + \beta_1 x$]

Patterns of Heteroskedasticity
[Figure: patterns of heteroskedasticity]

Why Worry About Heteroskedasticity?
- OLS is still unbiased and consistent even if we do not assume homoskedasticity: $E(\hat\beta_j) = \beta_j$ and $\operatorname{plim} \hat\beta_j = \beta_j$. The $R^2$ and adjusted $R^2$ are unaffected by heteroskedasticity.
- The standard errors of the estimates are biased if we have heteroskedasticity. The OLS estimates are not efficient; that is, their variances are not the smallest attainable.
- If the standard errors are biased, we cannot use the usual t, F, or LM statistics for drawing inferences.

Testing for Heteroskedasticity: The Goldfeld-Quandt Test
- Essentially we want to test $H_0: \mathrm{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$, which is equivalent to $H_0: E(u^2 \mid x_1, x_2, \dots, x_k) = E(u^2) = \sigma^2$.
- The alternative is $H_1: \sigma_i^2 = c\,x_i^2$.
- Goldfeld-Quandt test procedure:
  1. Order the data by the magnitude of the independent variable $x$ that is thought to be related to the error variance.
  2. Omit the middle $d$ observations. $d$ might be chosen, for example, to be approximately 1/5 of the total sample size.
  3. Fit two separate regressions, the first on the observations with low values of $x$ and the second on those with high values of $x$. Each regression involves $(n-d)/2$ observations and $(n-d)/2 - k - 1$ degrees of freedom.
  4. Calculate the residual sum of squares for each regression: $SSR_1$ for the low x's and $SSR_2$ for the high x's.
  5. Under $H_0$, the statistic $SSR_2 / SSR_1$ is distributed as an F statistic with $[n - d - 2(k+1)]/2$ degrees of freedom in both the numerator and the denominator.

Example: Goldfeld-Quandt Test (HR: Ex6.2, 154)
    insheet using pathex61.txt
    sort inc
    reg hexp inc if inc ...      /* low-income subsample, gives SSR1 */
    reg hexp inc if inc>=15      /* high-income subsample, gives SSR2 */
- We get $SSR_2 = 2.024$, with $n_1 = n_2 = 10$ and $k + 1 = 2$.
- Form the statistic $F = SSR_2 / SSR_1 = 6.7467$.
- The 5% critical value is $F_{8,8} = 3.438$.
- So we reject the null hypothesis and conclude that the errors are heteroskedastic.

Testing for Heteroskedasticity
- Essentially we want to test $H_0: \mathrm{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$, which is equivalent to $H_0: E(u^2 \mid x_1, x_2, \dots, x_k) = E(u^2) = \sigma^2$.
- If we assume the relationship between $u^2$ and the $x_j$ is linear, we can test this as a linear restriction.
- So, for $u^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + v$, this means testing $H_0: \delta_1 = \delta_2 = \dots = \delta_k = 0$.

The Breusch-Pagan Test
- We do not observe the error, but we can estimate it with the residuals from the OLS regression: regress $y$ on $x_1, x_2, \dots, x_k$ and obtain the residuals $\hat u_i$.
- After regressing the squared residuals on all of the x's, we can use the $R^2$ to form an F or LM test: regress $\hat u^2$ on $x_1, x_2, \dots, x_k$ and test the joint hypothesis that all slope coefficients are zero.
- The F statistic is just the reported F statistic for overall significance of that regression, $F = \dfrac{R^2 / k}{(1 - R^2)/(n - k - 1)}$, which is distributed $F_{k,\,n-k-1}$.
- The LM statistic is $LM = nR^2$, which is distributed $\chi^2_k$.

Example: Breusch-Pagan Test (HR book, Ex6.2)
    reg hexp inc          /* use all observations */
    predict res, r        /* get the residuals */
    gen ressq=res^2       /* square of res */
    reg ressq inc
- The F value is 10.13 and the p-value is 0.52%.
- So we reject the null hypothesis of homoskedasticity at the 1% significance level.
- Using the LM test, $nR^2 = 20 \times 0.36 = 7.2$.
- The 5% critical value is $\chi^2(1) = 3.84$ and the p-value is 0.73%, so we get the same result.

Example: Housing Price Equation (Wooldridge, p267)
- Estimated model: price = -21770.31 + 2.068 lotsize + 122.778 sqrft + 13852.52 bdrms
    predict res, r        /* residuals of the equation above */
    gen ressq=res^2
- Regress ressq on lotsize, sqrft, bdrms:
  ressq = -5.52e9 + 201520.9 lotsize + 1691037 sqrft + 1.04e9 bdrms
- F = 5.34, p-value = 0.20%
- $nR^2 = 88 \times 0.1601 \approx 14.09$; the 5% critical value is $\chi^2(3) = 7.8147$; p-value = 0.28%
- So we have strong evidence against the null hypothesis of homoskedasticity.

Example: Housing Price Equation (Wooldridge, p267), cont.
- We check whether there is heteroskedasticity in the log form.
- Estimated model: log(price) = 5.611 + 0.168 log(lotsize) + 0.700 log(sqrft) + 0.037 bdrms
    predict resid, r
    gen residsq=resid^2
- Regress residsq on log(lotsize), log(sqrft), bdrms:
  residsq = 0.510 - 0.007 log(lotsize) - 0.063 log(sqrft) + 0.017 bdrms
- F = 1.41, p-value = 24.51%
- $nR^2 = 88 \times 0.048 = 4.224$, p-value = 23.83%
- So we cannot reject the null hypothesis: there is no evidence of heteroskedasticity in the log form.

The White Test
- The Breusch-Pagan test will detect any linear form of heteroskedasticity.
- The White test allows for nonlinearities by using squares and cross products of all the x's. For example, with $k = 3$:
  $\hat u^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + v$
- We still just use an F or LM statistic to test whether all the $x_j$, $x_j^2$, and $x_j x_h$ are jointly significant.
- This can get unwieldy pretty quickly.

Alternate Form of the White Test
- The fitted values from OLS, $\hat y$, are a function of all the x's.
- Thus $\hat y^2$ is a function of the squares and cross products, and $\hat y$ and $\hat y^2$ can proxy for all of the $x_j$, $x_j^2$, and $x_j x_h$.
- So regress the squared residuals on $\hat y$ and $\hat y^2$ and use the $R^2$ to form an F or LM statistic.
- Note that we are now testing only 2 restrictions.
- Procedure for this special case of the White test:
  1. Regress $y$ on $x_1, x_2, \dots, x_k$ and obtain the residuals $\hat u_i$.
  2. Calculate $\hat y$ and $\hat y^2$ (predict ybar, xb; gen ybarsq=ybar^2).
  3. Regress $\hat u^2$ on $\hat y$ and $\hat y^2$ and test the joint hypothesis that both coefficients are zero.
  4. Use the F statistic or the LM statistic to test the null hypothesis of homoskedasticity.

Example: White Test in the Log Housing Price Equation
- log(price) = 5.611 + 0.168 log(lotsize) + 0.700 log(sqrft) + 0.037 bdrms
    predict resid, r
    predict lpbar
    gen residsq=resid^2
    gen lpbarsq=lpbar^2
    regress residsq lpbar lpbarsq
- residsq = 23.778 - 3.714 lpbar + 0.145 lpbarsq
- F = 1.73, p-value = 18.30%
- $nR^2 = 88 \times 0.0392 = 3.4496$, p-value = 17.82%
- We reach the same conclusion as with the BP test: there is no evidence of heteroskedasticity.

Corrections for Heteroskedasticity: Known Variances
- Suppose $\mathrm{Var}(u_i \mid x) = \sigma_i^2$ is known.
- The original model is $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$. Divide both sides by $\sigma_i$.
- The new disturbance is $u_i^* = u_i / \sigma_i$, so $\mathrm{Var}(u_i^*) = \mathrm{Var}(u_i / \sigma_i) = \mathrm{Var}(u_i)/\sigma_i^2 = 1$.
- So the new model is $y/\sigma_i = \beta_0/\sigma_i + \beta_1 x_1/\sigma_i + \dots + \beta_k x_k/\sigma_i + u/\sigma_i$, that is, $y^* = \beta_0^* + \beta_1 x_1^* + \dots + \beta_k x_k^* + u^*$, where $\beta_0^* = \beta_0/\sigma_i$.
- We can estimate the new model with OLS; this is called weighted least squares (WLS).
- Usually, however, we do not know the variances.

Case of the Form Being Known up to a Multiplicative Constant
- Suppose the heteroskedasticity can be modeled as $\mathrm{Var}(u \mid x) = \sigma^2 h(x)$, where the trick is to figure out what $h(x) \equiv h_i$ looks like.
- $E(u_i/\sqrt{h_i} \mid x) = 0$, because $h_i$ is only a function of $x$, and $\mathrm{Var}(u_i/\sqrt{h_i} \mid x) = \sigma^2$, because $\mathrm{Var}(u \mid x) = \sigma^2 h_i$.
- So, if we divide the whole equation by $\sqrt{h_i}$, we have a model in which the error is homoskedastic.

Example: Simple Savings Function
- Consider the simple savings function $sav_i = \beta_0 + \beta_1 inc_i + u_i$ with $\mathrm{Var}(u_i \mid inc_i) = \sigma^2 inc_i$.
- Then $\mathrm{Var}(u_i/\sqrt{inc_i}) = \mathrm{Var}(u_i)/inc_i = \sigma^2$.
- So we divide the original equation by $\sqrt{inc_i}$ to get
  $sav_i/\sqrt{inc_i} = \beta_0 \,(1/\sqrt{inc_i}) + \beta_1 \sqrt{inc_i} + u_i/\sqrt{inc_i}$
- Using the data in saving.raw, the OLS regression is sav_i = -124.84 + 0.147 inc_i.
- The WLS regression is
  sav*_i = -124.95 wb + 0.172 inc*_i
           (480.86)     (0.057)        n = 100, R2 = 0.2259
  where wb = 1/sqrt(inc_i). It can be written as sav_i = -124.95 + 0.172 inc_i.

Generalized Least Squares
- Estimating the transformed equation by OLS is an example of generalized least squares (GLS).
- GLS is BLUE in this case, because the transformed equation satisfies the Gauss-Markov assumptions.
- GLS is a weighted least squares (WLS) procedure in which each squared residual is weighted by the inverse of $\mathrm{Var}(u_i \mid x_i)$.
- The sum of squared residuals in the transformed variables is
  $\sum_{i=1}^{n} \left( y_i^* - \beta_0 x_{i0}^* - \beta_1 x_{i1}^* - \dots - \beta_k x_{ik}^* \right)^2 = \sum_{i=1}^{n} \left( \frac{y_i}{\sqrt{h_i}} - \frac{\beta_0}{\sqrt{h_i}} - \frac{\beta_1 x_{i1}}{\sqrt{h_i}} - \dots - \frac{\beta_k x_{ik}}{\sqrt{h_i}} \right)^2 = \sum_{i=1}^{n} \frac{1}{h_i} \left( y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_k x_{ik} \right)^2$

More on WLS
- Consider the wage determination equation
  $wage_{i,e} = \beta_0 + \beta_1 educ_{i,e} + \beta_2 age_{i,e} + \beta_3 tenure_{i,e} + u_{i,e}$,
  where $i$ denotes a particular firm and $e$ denotes an employee within the firm.
- Assume the equation satisfies the Gauss-Markov assumptions. Then we could estimate it given a sample of individuals across various firms.
- But suppose we only have the average values of wages, education, age, and tenure by firm; individual-level data are not available.
- Let $\overline{wage}_i$, $\overline{educ}_i$, $\overline{age}_i$, $\overline{tenure}_i$ denote the average wage, education, age, and tenure for the people at firm $i$. Then the original equation can be transformed to
  $\overline{wage}_i = \beta_0 + \beta_1 \overline{educ}_i + \beta_2 \overline{age}_i + \beta_3 \overline{tenure}_i + \bar u_i$

More on WLS, cont.
- If the original equation at the individual level satisfies the homoskedasticity assumption, then the firm-level (transformed) equation must be heteroskedastic.
- If $\mathrm{Var}(u_{i,e}) = \sigma^2$ for all $i$ and $e$, then $\mathrm{Var}(\bar u_i) = \sigma^2 / m_i$, where $m_i$ is the number of employees in firm $i$.
- In this case $h_i = 1/m_i$, and the most efficient procedure is WLS with weights equal to the number of employees at the firm ($1/h_i = m_i$). This ensures that larger firms receive more weight.
- This gives us an efficient way of estimating the parameters of the individual-level model when we only have averages at the firm level.
- A similar weighting arises when we are using per capita data at the city, county, state, or country level. If the individual-level equation satisfies the Gauss-Markov assumptions, then the error in the per capita equation has a variance proportional to one over the size of the population. Therefore, weighted least squares with weights equal to the population is appropriate.

Summary of WLS
- WLS is great if we know what $\mathrm{Var}(u_i \mid x_i)$ looks like.
- In most cases, we will not know the form of the heteroskedasticity.
- One case where we do know it is when the data are aggregated but the model is at the individual level.
- There the error variance of each aggregate observation is proportional to the inverse of the number of individuals, so each aggregate observation is weighted by the number of individuals.

Feasible GLS
- More typical is the case where we do not know the form of the heteroskedasticity.
- In this case, we need to estimate $h(x_i)$.
- Typically, we start by assuming a fairly flexible model, such as $\mathrm{Var}(u \mid x) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)$.
- Since we do not know the $\delta_j$, we must estimate them.

Feasible GLS (continued)
- Our assumption implies that $u^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)\,v$, where $E(v \mid x) = 1$.
- If we further assume that $v$ is independent of $x$, then $\ln(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e$, where $E(e) = 0$ and $e$ is independent of $x$.
- We know that $\hat u$ is an estimate of $u$, so we can estimate this equation by OLS.

Feasible GLS (continued)
- An estimate of $h$ is obtained as $\hat h = \exp(\hat g)$, and the inverse of this is our weight.
- So, what did we do?
  1. Run the original OLS model, save the residuals $\hat u$, square them, and take the log.
  2. Regress $\ln(\hat u^2)$ on all of the independent variables and get the fitted values $\hat g$.
  3. Do WLS using $1/\exp(\hat g)$ as the weight.

Example of FGLS: Demand for Cigarettes (smoke.raw)
- What determines people's demand for cigarettes?
- Model: cigs = -3.64 + 0.88 log(income) - 0.75 log(cigpric) - 0.50 educ + 0.77 age - 0.009 age^2 - 2.83 restaurn
- Use the Breusch-Pagan test for heteroskedasticity: get $\hat u^2$ and regress $\hat u^2$ on all independent variables. This gives F = 5.55 with p-value = 0, or $LM = 807 \times 0.04 = 32.28$ with p-value = 0.000014.
- Regress $\ln(\hat u^2)$ on all the independent variables and get the fitted values $\hat g$.
- Transform all the data by multiplying by $1/\sqrt{\exp(\hat g)}$, and estimate the transformed equation without a constant:
  cigs = 5.63 + 1.295 log(income) - 2.94 log(cigpric) - 0.463 educ + 0.482 age - 0.0056 age^2 - 3.461 restaurn
- The income effect is now statistically significant and larger in magnitude. The estimates changed somewhat, but the basic story is still the same: cigarette smoking is negatively related to schooling, has a quadratic relationship with age, and is negatively affected by restaurant smoking restrictions.

Variance with Heteroskedasticity
- For the simple case, $\hat\beta_1 = \beta_1 + \dfrac{\sum_i (x_i - \bar x)\,u_i}{\sum_i (x_i - \bar x)^2}$, so
  $\mathrm{Var}(\hat\beta_1) = \dfrac{\sum_i (x_i - \bar x)^2 \sigma_i^2}{SST_x^2}$, where $SST_x = \sum_i (x_i - \bar x)^2$.
- A valid estimator for this when $\sigma_i^2 \neq \sigma^2$ is
  $\dfrac{\sum_i (x_i - \bar x)^2 \hat u_i^2}{SST_x^2}$, where the $\hat u_i$ are the OLS residuals.

Variance with Heteroskedasticity (continued)
- For the general multiple regression model, a valid estimator of $\mathrm{Var}(\hat\beta_j)$ with heteroskedasticity is
  $\widehat{\mathrm{Var}}(\hat\beta_j) = \dfrac{\sum_i \hat r_{ij}^2 \hat u_i^2}{SSR_j^2}$,
  where $\hat r_{ij}$ is the $i$th residual from regressing $x_j$ on all other independent variables, and $SSR_j$ is the sum of squared residuals from this regression.
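The heteroskedasticity-robust variance for the simple regression case can be sketched in a few lines of code. The block below is only an illustration under stated assumptions: it uses plain Python rather than the Stata used in the slides, the function names `ols_simple` and `slope_standard_errors` are hypothetical, and the data are made up for demonstration (they are not the saving.raw or smoke.raw data). It computes the OLS slope, the usual standard error $\sqrt{\hat\sigma^2 / SST_x}$, and the robust standard error $\sqrt{\sum_i (x_i - \bar x)^2 \hat u_i^2 / SST_x^2}$.

```python
import math


def ols_simple(x, y):
    """Fit y = b0 + b1*x by OLS; return (b0, b1, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid


def slope_standard_errors(x, y):
    """Return (usual_se, robust_se) for the OLS slope estimate."""
    n = len(x)
    x_bar = sum(x) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    _, _, resid = ols_simple(x, y)

    # Usual formula: valid only under homoskedasticity.
    sigma2_hat = sum(u ** 2 for u in resid) / (n - 2)
    usual_var = sigma2_hat / sst_x

    # Robust formula: sum of (x_i - x_bar)^2 * u_hat_i^2, divided by SST_x^2.
    robust_var = sum(
        (xi - x_bar) ** 2 * u ** 2 for xi, u in zip(x, resid)
    ) / sst_x ** 2

    return math.sqrt(usual_var), math.sqrt(robust_var)


# Made-up data (an assumption for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.7, 5.6, 5.1, 8.3, 6.4]

b0, b1, _ = ols_simple(x, y)
usual_se, robust_se = slope_standard_errors(x, y)
print(f"slope = {b1:.4f}, usual se = {usual_se:.4f}, robust se = {robust_se:.4f}")
```

The point estimates are the same in both cases; only the standard error, and therefore the t statistic, changes. In Stata the same kind of robust standard error is typically requested with the `robust` option of `regress`.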