
[STATA] Normality of Residual

<Step 1> Regression

. reg price headroom gear_ratio foreign mpg


<Step 2> Find Residual

e = Y – Yhat

. predict e, resid
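Outside Stata, the same quantity is easy to compute by hand; a minimal Python sketch with made-up numbers (illustrative only, not the auto dataset):

```python
# Residuals are observed minus fitted values: e = Y - Yhat.
def residuals(y, yhat):
    return [yi - fi for yi, fi in zip(y, yhat)]

# Toy example with made-up numbers:
e = residuals([10.0, 12.0, 9.0], [9.5, 12.5, 9.0])  # [0.5, -0.5, 0.0]
```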


<Step 3>

<1. Graphical Ways>

. kdensity e, normal

. histogram e, kdensity normal

Standardized Normal Probability Plot

Checks for non-normality in the middle range of residuals

. pnorm e


Quantile-Normal Plots

Checks for non-normality in the extremes of the data (tails). It plots quantiles of the residuals against quantiles of a normal distribution.

. qnorm e


<2. Shapiro-Wilk Test for Normality>

H0: the distribution of the residuals is normal

. swilk e


p-value < 0.01  –> H0 is rejected at the 1% significance level  –> the residuals are not normal
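The decision rule used here (and in the later tests) is the standard one; a trivial sketch, with the significance level as an assumed parameter:

```python
def reject_h0(p_value, alpha=0.01):
    """Reject the null hypothesis when the p-value falls below alpha."""
    return p_value < alpha

# Shapiro-Wilk case above: p < 0.01 -> reject normality at the 1% level.
```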


[STATA] Influence Indicators

<1> Dfbeta

Measures the influence of each observation on the coefficient of a particular independent variable (for example, x1), expressed in standard-error units.

An observation is influential if it has a significant effect on the coefficient.


A case is an influential outlier

if

|DfBeta| > 2 / SQRT(N)

Where N is the sample size

Note: STATA estimates standardized DfBetas.
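The cutoff rule can be sketched outside Stata as follows (Python, illustrative only; the DFBETA values are made up, and N = 74 is assumed to match the auto-dataset regression above):

```python
import math

def dfbeta_flags(dfbetas, n):
    """Flag observations whose standardized DFBETA exceeds 2/sqrt(N)."""
    cutoff = 2 / math.sqrt(n)
    return [abs(d) > cutoff for d in dfbetas]

# With N = 74 the cutoff is about 0.2325 (values are made up):
flags = dfbeta_flags([0.05, -0.40, 0.10], n=74)  # [False, True, False]
```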


Command

. reg price headroom gear_ratio foreign mpg

. predict DF_mpg, dfbeta(mpg)

To flag the cutoff

. gen cutoffdfbeta = abs(DF_mpg) > 2/sqrt(e(N)) & e(sample)


<2> DfFit

Indicator of leverage and high residuals.

Measures how much an observation influences the regression model as a whole.

How much the predicted values change as a result of including or excluding a particular observation.


High influence

if

|DfFIT| > 2 * SQRT(k/N)

where k is the number of parameters (including the intercept)
and N is the sample size


Command

. reg price headroom gear_ratio foreign mpg

. predict DFits if e(sample), dfits

To flag the cutoff

. gen cutoffdfit = abs(DFits) > 2*sqrt((e(df_m)+1)/e(N)) & e(sample)


<3> Covariance Ratio

Measures the impact of an observation on the Standard Errors


High influence

if

|COVRATIO – 1| >= 3 * k / N

where k is the number of parameters (including the intercept)
and N is the sample size
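The rule is a comparison of each COVRATIO's distance from 1 against 3k/N; a Python sketch (illustrative only; the COVRATIO values are made up, k = 5 and N = 74 are assumed from the regression above):

```python
def covratio_flags(covratios, k, n):
    """Flag observations with |COVRATIO - 1| >= 3*k/N."""
    cutoff = 3 * k / n
    return [abs(c - 1) >= cutoff for c in covratios]

# k = 5, N = 74 -> cutoff about 0.203 (values are made up):
flags = covratio_flags([1.05, 0.70, 1.30], k=5, n=74)  # [False, True, True]
```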


Command

. reg price headroom gear_ratio foreign mpg

. predict covratio if e(sample), covratio

To flag the cutoff

. gen cutoffcov = abs(covratio-1) >= 3*(e(df_m)+1)/e(N) & e(sample)


<4> Cook’s Distance

Measures how much an observation influences the overall model or predicted values.

It is a summary measure of leverage and high residuals


High influence

if

D > 4 / N

where N is the sample size.

D > 1 indicates a serious outlier problem
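Both thresholds can be applied at once; a Python sketch (illustrative only; the D values are made up, N = 74 is assumed from the regression above):

```python
def cooksd_flags(d_values, n):
    """Apply both rules: D > 4/N (influential) and D > 1 (severe)."""
    return [(d > 4 / n, d > 1) for d in d_values]

# N = 74 -> 4/N is about 0.054 (D values are made up):
flags = cooksd_flags([0.01, 0.10, 1.50], n=74)
# [(False, False), (True, False), (True, True)]
```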


Command

. reg price headroom gear_ratio foreign mpg

. predict D, cooksd

To flag the cutoff (>1)

. gen cutoffD1 = D > 1 & e(sample)


To flag the cutoff (>4/N)

. gen cutoffD2 = D > 4/e(N) & e(sample)


<5> Leverage

Measures how far an observation's values on the independent variables lie from those of the other observations, and hence its potential influence on the regression coefficients.


High influence

if

leverage h > 2 * k / N

where k is the number of parameters (including the intercept)
and N is the sample size

A rule of thumb: leverage ranges from 0 to 1.
A value over 0.5, or close to 1, may indicate problems.
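Both leverage rules can be applied together; a Python sketch (illustrative only; the leverage values are made up, k = 5 and N = 74 are assumed from the regression above):

```python
def leverage_flags(h_values, k, n):
    """Apply both rules: h > 2*k/N and the absolute h > 0.5 warning."""
    cutoff = 2 * k / n
    return [(h > cutoff, h > 0.5) for h in h_values]

# k = 5, N = 74 -> 2k/N is about 0.135 (h values are made up):
flags = leverage_flags([0.05, 0.20, 0.60], k=5, n=74)
# [(False, False), (True, False), (True, True)]
```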


Command

. reg price headroom gear_ratio foreign mpg

. predict lev, leverage

To flag the cutoff (>0.5)

. gen cutofflev = lev > 0.5 & e(sample)


To flag the cutoff (>2 * k/N)

. gen cutofflev2 = lev > 2*(e(df_m)+1)/e(N) & e(sample)


<6> Mahalanobis Distance

It is a rescaled measure of leverage.

M = leverage * (N-1)

where N is sample size


Higher levels indicate higher distance from average values.

The M-distance is compared against the critical value of a Chi-square distribution with k – 1 df at alpha = 0.001 (where k is the number of independent variables).

Any value over this Chi-square value may indicate problems.
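The rescaling given above is a one-line computation; a Python sketch (illustrative only; the leverage value is made up, N = 74 is assumed from the regression above):

```python
def mahalanobis_from_leverage(h_values, n):
    """Rescale leverage to Mahalanobis distance: M = h * (N - 1)."""
    return [h * (n - 1) for h in h_values]

# N = 74: an observation with leverage 0.20 has M = 0.20 * 73 = 14.6.
m = mahalanobis_from_leverage([0.20], n=74)
```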

 

[STATA] Specification Error

Why?

Specification error occurs when an Independent Variable is correlated with the error term.

Causes of Specification Error

  • An incorrect functional form could be employed
  • Omitted Variable Bias (A variable omitted from the model may have a relationship with both the dependent variable and one or more of the independent variables)
  • An irrelevant variable may be included in the model
  • Simultaneity Bias (the dependent variable may be part of a system of simultaneous equations)
  • Measurement error may affect the independent variables

Test <linktest>

The link test is closely related in spirit to the Ramsey RESET (Regression Equation Specification Error Test).

It checks whether we need more terms in our model by running a new regression of the observed Y on Yhat and Yhat-squared as independent variables.
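The construction of the auxiliary regression can be sketched as follows (Python, illustrative only; it builds the design columns, not the full regression):

```python
def linktest_design(yhat):
    """Build the linktest regressors: an intercept, Yhat, and Yhat squared.
    The observed Y is then regressed on these columns; a significant
    squared term is evidence of misspecification."""
    return [(1.0, f, f * f) for f in yhat]

rows = linktest_design([2.0, 3.0])  # [(1.0, 2.0, 4.0), (1.0, 3.0, 9.0)]
```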

<Step 1>

. reg price headroom gear_ratio foreign mpg, robust


<Step 2>

. linktest


H0: no specification error

p-value = 0.001  –> we reject H0 at the 0.1% significance level.

Our model is incorrectly specified

[STATA] Homoskedasticity Test

One of the important OLS assumptions is homoskedasticity of the residuals.

<Step 1>

*MUST NOT use robust option

. reg price headroom gear_ratio foreign mpg


<1> Graphical Way

. rvfplot, yline(0)


<2> Breusch-Pagan Test

H0: residuals are homoskedastic

. estat hettest

p-value is 0.0002, and hence we reject H0 at the 0.1% significance level: the residuals are heteroskedastic.

[STATA] Omitted Variable Bias Test

Why?

When a variable omitted from our model is correlated with an included regressor, and the omitted variable is a determinant of the dependent variable

–> our OLS estimators are inconsistent

(Stock & Watson, 2003, p144)

Test <ovtest>

. reg price headroom gear_ratio foreign mpg

. ovtest

Ramsey RESET test using powers of the fitted values of price
Ho: model has no omitted variables
F(3, 67) = 3.31
Prob > F = 0.0252

 

H0: the model has no omitted variables.

p-value < 0.05  –> We reject H0 at the 5% significance level  –> the model likely omits relevant variables.