Wednesday, February 4, 2015

Four Different Types of Regression Residuals

When we estimate a regression model, the differences between the actual and "predicted" values for the dependent variable (over the sample) are termed the "residuals". Specifically, if the model is of the form:

                     y = Xβ + ε ,                                                         (1)

and the OLS estimator of β is b, then the vector of residuals is

                    e = y - Xb .                                                           (2)

Any econometrics student will be totally familiar with this.

The elements of e (the n residuals) are extremely important statistics. We use them, of course, to construct other statistics - e.g., test statistics for testing the validity of the underlying assumptions associated with our regression model. For instance, we want to check whether the errors (the elements of the ε vector) are serially independent; whether they're homoskedastic; whether they're normally distributed; etc.

What a lot of students don't learn is that these residuals - let's call them "Ordinary Residuals" - are just one of several types of residuals that are used when analysing the regression model. Let's take a look at this.

I'm not referring to the fact that these residuals are measured in the vertical (y) direction, as opposed to some other direction. That point was discussed in another post. Here, we'll be concerned with something rather different.

It's easy to show that

                     e = My = Mε,                                                    (3)

where M = [I - X(X'X)⁻¹X'] = [I - H] is a symmetric, idempotent matrix.

We sometimes refer to the (idempotent) matrix, H = X(X'X)⁻¹X', as the "hat" matrix. The reason for this is that we often use the symbol y, with a circumflex (^, or "hat") over it, to represent the predicted or fitted values of y after using b to estimate β.

That is,
                      ŷ = Xb = X(X'X)⁻¹X'y = Hy.

You could say, "H puts the hat on y".

As an aside, statisticians usually call H the "influence matrix". Its diagonal elements (hii) are termed "leverages", because they measure the influence, or leverage, that each observed value of the dependent variable has on the corresponding fitted value. The idea of leverage is key in the analysis of outliers in regression.
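To make this concrete, here's a minimal Python/NumPy sketch that forms H, the fitted values, and the leverages. The data, sample size, and seed are all made up, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(123)

# Made-up data: n = 10 observations, an intercept and two regressors
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

# OLS coefficients, b = (X'X)^(-1) X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# The "hat" matrix, H = X (X'X)^(-1) X', and the fitted values, yhat = Hy
H = X @ np.linalg.solve(X.T @ X, X.T)
yhat = H @ y                    # identical to X @ b

# Leverages: the diagonal elements of H
h = np.diag(H)
print(np.round(h, 3))           # they differ across observations
print(round(h.sum(), 3))        # trace(H) = k, the number of regressors (here, 3)
```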

One thing to notice, from (3), is that even if the error vector, ε, in (1) has a scalar covariance matrix, the covariance matrix of the residuals is non-scalar. That is, if

                      E(ε) = 0, and  V(ε) = σ²Iₙ,

reflecting errors that are both homoskedastic and serially uncorrelated, we have:

                     E(e) = 0, and V(e) = σ²M.                          (4)

(Here, I'm assuming non-random regressors.)

The matrix, M, is going to have diagonal elements that differ from one another in value; and it will have off-diagonal elements that are non-zero. The residuals are heteroskedastic and serially correlated, even when the errors are not.
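If you'd like to see (4) at work, the following illustrative simulation (again, purely artificial data and an arbitrary seed) draws homoskedastic, uncorrelated errors repeatedly and compares the sample covariance matrix of the resulting OLS residuals with σ²M:

```python
import numpy as np

rng = np.random.default_rng(42)

n, sigma, reps = 8, 2.0, 50_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed, made-up regressors

H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H                                            # M = I - H

# Each row of eps is one draw of homoskedastic, uncorrelated errors;
# the corresponding OLS residual vector is e = M eps.
eps = rng.normal(scale=sigma, size=(reps, n))
E = eps @ M.T                                                # reps x n matrix of residual vectors

emp_cov = np.cov(E, rowvar=False)           # empirical covariance matrix of the residuals
print(np.round(np.diag(emp_cov), 2))        # unequal variances ...
print(np.round(sigma**2 * np.diag(M), 2))   # ... close to sigma^2 * (1 - hii)
print(round(emp_cov[0, 1], 2), round(sigma**2 * M[0, 1], 2))   # non-zero off-diagonal terms
```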

This might get you thinking along the following lines. If the observed residuals are heteroskedastic and serially correlated, how can we use them to test if the unobserved errors are homoskedastic and serially uncorrelated? That's a good question, and one that I'll come back to in a later post.

Now, let's consider some different types of residuals, as promised. I'll continue to assume that these are associated with OLS estimation of our model in (1). However, the distinctions and concepts that will be introduced can also be applied to any other residuals - e.g., residuals based on Instrumental Variables estimation of the model.

A word of warning - there are some differences in terminology out there. For example, what I call standardized residuals below are sometimes called "internally Studentized residuals"; and what I call Studentized residuals are termed "externally Studentized residuals", or "Studentized deleted residuals", by some authors.

Standardized Residuals

As you'll probably guess, "Standardized Residuals" are obtained by transforming each of the residuals as follows:

                        zi = (ei - e*) / s.e.(ei)  ;   i = 1, 2, 3, ....., n  

where s.e.(ei) is the standard error of ei; and e* is the arithmetic mean of the ei's. The latter will be zero if the model, (1), includes an intercept as a regressor.

From (4),

                       var.(ei) = σ²(1 - hii),

where hii is the ith diagonal element of the "hat" matrix H. So,

                      s.e.(ei) = s[1 - hii]½   ,

where s² is the usual unbiased estimator of σ². That is,

                      s² = (e'e) / (n - k) .                                       (5)

One drawback of the standardized residuals is that the numerator and denominator in the formula for zi are not independent. So, the zi's don't follow a Student-t distribution - which you may have thought they would. However, this issue can be dealt with by considering a slightly more subtle standardization of the ei's.
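In code, and assuming the model includes an intercept (so that e* = 0), a minimal sketch of the standardized residuals looks like this, with fabricated data once again:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: intercept plus two regressors
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)
k = X.shape[1]

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                                   # ordinary residuals
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))  # leverages, hii

s2 = (e @ e) / (n - k)                          # the usual unbiased estimator of sigma^2, eq. (5)
z = e / np.sqrt(s2 * (1.0 - h))                 # standardized residuals (e* = 0 with an intercept)
print(np.round(z, 3))
```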

Studentized Residuals

These residuals are obtained by going one step further than standardizing the ei's. Rather than using a single s², as defined in (5), we use a different variance estimator for each residual - one that is independent of (ei - e*).

Specifically, model (1) is re-estimated, n times. Each time, we omit one of the observations in the sample for the y and X data. This will yield n different estimators of β, namely b(i); i = 1, 2, ...., n. (If this sounds like the Jackknife, you're right!)

Let e(i) denote the residual vector for the ith such regression - that is, e(i) = y(i) - X(i)b(i), where y(i) and X(i) are the y vector with the ith element omitted, and the X matrix with the ith row omitted, respectively. Then, in each case, a separate estimator of σ2 is obtained by using

                     s(i)² = e(i)'e(i) / (n - k - 1)    ;     i = 1, 2, ..., n.

The "Studentized Residuals" are constructed as:

                    ui = (ei - e*) / [s(i)(1 - hii)½]    ;     i  = 1, 2, 3, ....., n.

Again, e* will be zero if the model includes an intercept.
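Here's the corresponding brute-force sketch for the Studentized residuals, re-estimating the model n times as described above, and again assuming an intercept so that e* = 0. (There's a well-known algebraic shortcut that avoids the loop, but the loop keeps the logic transparent.) The data are fabricated, as before:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same sort of made-up data as before
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)
k = X.shape[1]

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                                    # ordinary residuals from the full sample
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))   # leverages, hii

u = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i                     # drop the ith observation
    Xi, yi = X[keep], y[keep]
    bi = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)   # b(i)
    ei = yi - Xi @ bi                            # e(i)
    s2_i = (ei @ ei) / (n - k - 1)               # s(i)^2
    u[i] = e[i] / np.sqrt(s2_i * (1.0 - h[i]))   # Studentized residual (e* = 0 with an intercept)
print(np.round(u, 3))
```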

It's important to keep in mind that neither the standardized, nor the Studentized, residuals are pair-wise independent.

The "regular", standardized, and Studentized residuals are all used in various ways to assist in checking the specification of the associated regression model, and the robustness of the results.

Partial Residuals

Finally, there's a form of residuals that doesn't seem to get much attention in econometrics. Larsen and McCleary (1972) suggested that, in addition to using standardized or Studentized residuals for diagnostic checking, we might also consider what they termed "partial residuals".

These residuals come into play when we have a multiple regression model. To construct them, we estimate the full regression model, (1), and get the OLS estimator, b, for the full coefficient vector. The ith element of the partial residual vector associated with the pth regressor is then defined as:

              v(p)i = yi - Σ(bjXij)       ;  i = 1, 2,..., n   ;          (6)

where bj is the jth element of b; Xij is the ith observation on the jth regressor; and the range of summation in (6) is for j = 1 to k, and j ≠ p. A different partial residuals series will be obtained for each regressor in the model.
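Because of (2), the definition in (6) is just the ordinary residual with the pth regressor's estimated contribution added back: v(p)i = ei + bp Xip. A minimal sketch, again with made-up data and an arbitrary choice of p:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: intercept plus two regressors
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                      # ordinary residuals

p = 2                              # an arbitrary choice: the third regressor (index 2)
v_p = e + b[p] * X[:, p]           # partial residuals for regressor p, as in eq. (6)
print(np.round(v_p, 3))

# Plotting v_p against X[:, p] gives the usual "partial residual plot" for that regressor.
```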

Larsen and McCleary argue that the partial residuals are useful in isolating the individual roles of the separate regressors when it comes to detecting both the direction and extent of any non-linearities, etc. They can also serve as a useful supplement to the regular, standardized, or Studentized residuals when checking for possible forms of heteroskedasticity in the model's errors.

Keep in mind that many of the tests (e.g., White's test) that we use to check for homoskedasticity are "non-constructive". A rejection of the null hypothesis doesn't tell us anything about the possible form of heteroskedasticity. So, residuals plots can be very helpful guides, perhaps pointing us to potentially useful data transformations; or to the formulation of an appropriate error variance structure, and hence likelihood function.

There are other types of residuals that have received attention in the econometrics literature over the years. However, these will be the topic of some follow-up posts.


References

Larsen, W. A. and S. J. McCleary, 1972. The use of partial residual plots in regression analysis. Technometrics, 14, 781-790.


© 2015, David E. Giles

2 comments:

  1. Great post, as always! Are partial residuals related to DFBETAs?

    I look forward to your post about using the residuals to help understand the errors; this is something I struggle with quite a bit. I'm sure this question will reflect my ignorance, but how can we "test" for anything unobservable? For example (while not required for the Gauss-Markov theorem to hold), I've seen people plotting residuals to check if the assumption of normal errors holds. Why is this reasonable, because we assume a representative random sample? Thanks!

    Replies
    1. Thanks - not quite the same. The DFBETAS are looking at the influence of each sample observation, rather than each regressor. More coming on diagnostics.

