Thursday, July 4, 2013

Allocation Models

An "allocation model" is a special type of multi-equation model that has some interesting properties. This type of model arises quite frequently in applied econometrics, and it's worth knowing about it. In this post I'll explain what an allocation model is, and explore some of the estimation results that arise.

Let's suppose that there are n observations on the variables y1, y2 and x; and we have the following pair of regression equations:

            y1 = α1 + β1x + ε1

            y2 = α2 + β2x + ε2

           E[ε1] = E[ε2] = 0

           var.(ε1i) = σ1^2     ;    var.(ε2i) = σ2^2    ;    cov. (ε1i , ε2i) = σ12    ;    for all i.

          cov.(εpi , εqj) = 0    ;    for p, q = 1, 2; and for all i ≠ j .

In what follows, it doesn't matter if x is random or non-random, and it doesn't matter what distribution the errors follow. In fact, even though I've assumed that each error term is individually homoskedastic and serially independent, and that there is no serial correlation across the error terms, this can all be relaxed without much effort.

Suppose that the two dependent variables satisfy the constraint, (y1 + y2) = 1. This would arise, for instance, if we were modelling market shares for the only two manufacturers of some good. This constraint would also have to be satisfied if we were looking at the shares of a country's exports to its only two trading partners.

Then, what we have is an "allocation model". The sales, exports, or whatever, are being completely "allocated" across manufacturers, trading partners, etc. Of course, more generally there would be (say) m shares and hence m equations, and the constraint would then be (y1 + y2 + .... + ym) = 1.

Now, let's think about the implications of the "adding up" constraint. First, notice that the intercept in the first equation is really (α1 × 1). The "1" is the value of the intercept variable, and its coefficient is α1. The same thing applies in the second equation. Second, notice that to ensure that (y1 + y2) = 1, for any possible value that x may take, we must have (α1 + α2) = 1; (β1 + β2) = 0; and (ε1 + ε2) = 0.

The third of these restrictions is especially interesting, as it applies to the random error terms, not to the fixed parameters.

The covariance matrix for the two errors is assumed to be the same at every sample point. We say that we have a fixed "contemporaneous" covariance matrix. In our case here, it's a (2 x 2) matrix, with σ1^2 and σ2^2 as the diagonal elements, and σ12 as each off-diagonal element. Now, what happens if we have to have (ε1 + ε2) = 0? This means that (ε1i + ε2i) = 0, for all i. In this case, ε2i = -ε1i, and the covariance matrix for the errors has σ1^2 as both of the diagonal elements, and both of the off-diagonal elements are -σ1^2. Consequently, the covariance matrix for the errors has a zero determinant! The matrix is singular - its rank is 1, rather than 2.
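Here's a minimal numerical sketch of that singularity, written in Python with NumPy rather than EViews. The variance value and the simulated errors are purely hypothetical; the only thing taken from the argument above is that ε2i = -ε1i.

```python
import numpy as np

sigma1_sq = 2.5  # hypothetical value for var(eps1)

# With eps2 = -eps1: var(eps2) = sigma1^2 and cov(eps1, eps2) = -sigma1^2.
omega = np.array([[sigma1_sq, -sigma1_sq],
                  [-sigma1_sq, sigma1_sq]])

det_omega = np.linalg.det(omega)
rank_omega = np.linalg.matrix_rank(omega, tol=1e-10)

# The same thing shows up in the sample covariance of simulated errors.
rng = np.random.default_rng(0)
eps1 = rng.normal(scale=np.sqrt(sigma1_sq), size=500)
eps2 = -eps1
sample_cov = np.cov(np.vstack([eps1, eps2]))
sample_rank = np.linalg.matrix_rank(sample_cov, tol=1e-10)

print("det(omega)  =", det_omega)
print("rank(omega) =", rank_omega)
print("rank(sample covariance) =", sample_rank)
```

The explicit tolerance passed to matrix_rank is just a guard against floating-point noise in the simulated case.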

Alright, let's now think about whether this has any implications for the estimation of the two equations, and for the results that we'll obtain.

Given that we have a pair of equations, and the errors of these equations have a non-zero covariance,  σ12 , presumably we'd think of using the Seemingly Unrelated Regression (SURE) estimator. This isn't possible, because we can't invert the error covariance matrix. However, this doesn't actually matter. The reason is that because exactly the same regressors appear in both equations, we can apply OLS estimation and get the same results that we'd have obtained by using SURE (if the covariance matrix were non-singular).

So we can just use OLS to estimate each equation separately. Great, but what about the restrictions that the parameters have to satisfy? Usually, to ensure that these two restrictions hold, we'd estimate the model as a SURE system and impose the cross-equation restrictions. We can't do this, because of the singular covariance matrix!

Fortunately, there's enough "structure" on the problem to ensure that we get the desired results when we estimate the equations separately using OLS!

Let a1, b1, a2, and b2 be the OLS estimators of α1, β1, α2, and β2. Also, let x*, y1*, and y2* be the sample averages of the xi, y1i, and y2i values. We can write:

                 b1 = [Σ(xi - x*)y1i] / [Σ(xi - x*)^2] .

Then, substituting (1 - y2i) for y1i, we get:

                 b1 = [Σ(xi - x*)] / [Σ(xi - x*)^2] - [Σ(xi - x*)y2i] / [Σ(xi - x*)^2] .

But,  [Σ(xi - x*)] = 0, and so:

                 b1 = - [Σ(xi - x*)y2i] / [Σ(xi - x*)^2] = -b2.

The OLS estimators of the slope parameters satisfy the desired restriction, (b1 + b2) = 0, automatically!

Similarly, a1 = (y1* - b1x*) = (1 - y2*) - (-b2x*) = 1 - (y2* -b2x*) = 1 - a2 .

The OLS estimators of the intercept parameters satisfy the desired restriction, (a1 + a2) = 1, automatically!

It also follows immediately that the fitted (predicted) values of y1 and y2 satisfy the restriction of summing to one in value. Isn't that nice?
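If you'd like to check these results numerically, here's a small sketch in Python with NumPy. The data are simulated and the "true" parameter values are hypothetical (nothing here comes from a real data-set); the helper ols_simple is just the textbook formula used above, not a library routine.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
x = rng.uniform(0.0, 10.0, size=n)

# Hypothetical parameters for equation 1; equation 2 is implied by y1 + y2 = 1.
alpha1, beta1 = 0.3, 0.02
eps1 = rng.normal(scale=0.05, size=n)
y1 = alpha1 + beta1 * x + eps1
y2 = 1.0 - y1  # the adding-up constraint holds by construction


def ols_simple(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    x_dev = x - x.mean()
    slope = np.sum(x_dev * y) / np.sum(x_dev ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


a1, b1 = ols_simple(x, y1)
a2, b2 = ols_simple(x, y2)

# Fitted values from the two separately estimated equations.
fitted_sum = (a1 + b1 * x) + (a2 + b2 * x)

print("a1 + a2 =", a1 + a2)
print("b1 + b2 =", b1 + b2)
print("max |fitted sum - 1| =", np.max(np.abs(fitted_sum - 1.0)))
```

Up to floating-point rounding, the intercept estimates sum to one, the slope estimates sum to zero, and the fitted values sum to one at every observation.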

We can extend this to the more general case where we have, say, m allocation equations, and a more general form of allocation. Specifically, the m dependent variables might sum to some linear combination of the regressors, at each sample point. (In the case considered above, the linear combination put a weight of 1 on the intercept variable, and a weight of zero on the x variable.)

Let's write our m-equation system as:

           Y = Xβ + U ,

where Y and U are both (n x m), X is (n x k), and β is (k x m). A column of this matrix equation gives us all of the observations for a single equation. A row of this matrix equation gives us one observation on all m equations.

Let's write the allocation constraint as Y1 = Xθ , where 1 is a column vector of m "ones", and θ is a (k x 1) vector of known values. Typically, these values will be zero, one, or negative one, but they don't have to be. For instance, if we had a set of m Engel curves, each explaining the expenditure on one good, and with "total expenditure" as a regressor, then theta would have zero elements everywhere, except in the position corresponding to total expenditure. In that position there'd be a "one", so that total expenditure equalled the sum of the individual expenditures.

So, applying the allocation constraint to the m-equation model, we have:

          Y1 = Xβ1 + U1 = Xθ .

For this to hold for any values the data may take, we require that β1 = θ; and U1 = 0. The second of these restrictions again implies that the (m x m) contemporaneous covariance matrix for the errors is singular, as this matrix is Ω = E[U'U], and Ω1 = E[U'U1] = E[0] = 0. The rank of Ω is therefore at most (m - 1), rather than m; it is exactly (m - 1) when the adding-up constraint is the only linear dependency among the errors.

Applying OLS to each equation of the model, we have

            B = (X'X)^-1 X'Y ,

and so:

            B1 = (X'X)^-1 X'Y1 = (X'X)^-1 X'Xθ = θ .

The OLS estimates again automatically satisfy the allocation constraints on the parameters.

In addition, the matrix of the predicted values from all of the equations is XB, and we have:

            XB1 = Xθ = Y1 .

The predictions from the m equations add up to the same value that the observed y data add up to!
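The general result is easy to verify with simulated data. The sketch below (Python with NumPy) uses an Engel-curve style constraint, where the m dependent variables sum to the last regressor. The dimensions, regressors and coefficients are all hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 40, 3, 4

# X: intercept, one ordinary regressor, and a "total expenditure" style regressor.
X = np.column_stack([
    np.ones(n),
    rng.normal(size=n),
    rng.uniform(50.0, 100.0, size=n),
])
theta = np.array([0.0, 0.0, 1.0])  # the y's sum to the third column of X

# Generate m-1 columns of Y freely, then the last column as the remainder,
# so that Y @ ones = X @ theta at every observation.
Y_free = X @ rng.normal(size=(k, m - 1)) + rng.normal(size=(n, m - 1))
Y_last = X @ theta - Y_free.sum(axis=1)
Y = np.column_stack([Y_free, Y_last])

# Equation-by-equation OLS: B = (X'X)^-1 X'Y
B = np.linalg.solve(X.T @ X, X.T @ Y)
ones = np.ones(m)

coef_sums = B @ ones               # should equal theta
fitted_sums = (X @ B) @ ones       # should equal Y @ ones
resid_sums = (Y - X @ B) @ ones    # should be zero at every observation

print("B 1      =", coef_sums)
print("max |XB1 - Y1| =", np.max(np.abs(fitted_sums - Y @ ones)))
print("max |residual row sum| =", np.max(np.abs(resid_sums)))
```

Note that the residuals also sum to zero across equations at every observation, which is the sample counterpart of U1 = 0.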

Let's look at a simple empirical application to illustrate all of this. The EViews workfile that I used is on this blog's code page, and the data are also in a text file on the data page.

The allocation model we'll estimate explains the market shares of 6 web browsers (including "other") over the period October 2004 to February 2007. These shares add up to one. Here's a plot of the market shares over the sample period:

The explanatory variables are themselves market shares - of various computer operating systems. As an intercept is included in each equation, the "Other_OS" share variable is omitted from each equation to avoid perfect multicollinearity. Each equation has an intercept and five OS variables as regressors. A typical equation is specified like this:

          ie=c(1)+c(2)*winxp+c(3)*win2000+c(4)*win98+c(5)*macos+c(6)*macintel

Here are the results of estimating the six-equation system by OLS:


Look at the value of the determinant of the residual covariance matrix! OLS estimation worked, of course, but SURE estimation would not be feasible for the reasons outlined above.

The rest of the results are:

The intercept coefficients are C(1), C(7), C(13), C(19), C(25), and C(31). The sum of their estimates is 1.0 (to six decimal places), as it should be! The coefficients for the WINXP regressor are C(2), C(8), C(14), C(20), C(26), and C(32). The sum of their estimates is 9.26*10^-14. I call that zero! (As it should be.) The same thing applies to the coefficients of the other regressors.

In EViews, you can't forecast directly from a system, but you can construct the matrix of the residuals for each equation, and use the result that "fitted" = "actuals" - "residuals":

PROC  ; MAKE RESIDUALS

genr    ief = ie - resid_ols01              etc.

genr   sum_predict = ief + firefoxf + netscapef + operaf + safarif + other_bf

The value of the series SUM_PREDICT is exactly 1.0 at every point in the sample - the same as the sum of the observed market shares for the web browsers.

There's another result that I haven't mentioned so far. If we take our system of m equations, with its singular error covariance matrix, and drop any one of the equations, the reduced model will have a non-singular error covariance matrix. In addition, if we then apply the SURE estimator to this system of (m - 1) equations, the parameter estimates will be identical to the OLS estimates. We can "recover" the estimates of the parameters for the "dropped" equation by using the allocation constraints.
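Here's a rough sketch of this "drop one equation" result in Python with NumPy. The data are simulated, the coefficient values are hypothetical, and the SURE step is hand-coded as feasible GLS on the stacked system (using the OLS residual covariance matrix), rather than taken from any packaged routine.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, m = 60, 2, 3

X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta = np.array([1.0, 0.0])  # the "shares" sum to one

# Simulated shares (hypothetical; not forced to lie between 0 and 1).
Y_free = X @ np.array([[0.3, 0.5], [0.1, -0.2]]) + 0.05 * rng.normal(size=(n, m - 1))
Y = np.column_stack([Y_free, X @ theta - Y_free.sum(axis=1)])

# Full-system OLS, and the singular residual covariance matrix.
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ B_ols
S_full = resid.T @ resid / n
rank_full = np.linalg.matrix_rank(S_full, tol=1e-10)

# Drop the last equation: the reduced covariance matrix is non-singular.
Y_red = Y[:, : m - 1]
S_red = S_full[: m - 1, : m - 1]
rank_red = np.linalg.matrix_rank(S_red, tol=1e-10)

# Feasible GLS on the stacked reduced system y = (I kron X) b + u,
# where cov(u) = S_red kron I_n.
Z = np.kron(np.eye(m - 1), X)
y_stacked = Y_red.T.reshape(-1)               # equation 1 first, then equation 2
W = np.kron(np.linalg.inv(S_red), np.eye(n))  # inverse of (S_red kron I_n)
b_sure = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y_stacked)
B_sure = b_sure.reshape(m - 1, k).T           # back to a (k x (m-1)) matrix

# Recover the dropped equation from the allocation constraint B 1 = theta.
b_dropped = theta - B_sure.sum(axis=1)

print("rank of full / reduced covariance:", rank_full, rank_red)
print("max |SURE - OLS| =", np.max(np.abs(B_sure - B_ols[:, : m - 1])))
print("max |recovered - OLS| =", np.max(np.abs(b_dropped - B_ols[:, m - 1])))
```

The last equation is the one dropped here, but the choice is arbitrary; changing which column is removed should not alter the conclusions.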

I'm preparing a separate post on this, and some extensions.

Homework: You might want to persuade yourself that we can replace OLS estimation with IV estimation everywhere in this post, and all of the results will still hold.


References

Bewley, R., 1986. Allocation Models: Specification, Estimation and Applications. Ballinger, Cambridge, MA.

Powell, A. A., 1969. Aitken estimators as a tool in allocating predetermined aggregates. Journal of the American Statistical Association, 64, 913-922.


© 2013, David E. Giles

9 comments:

  1. keep it up Dave.

    It helps all of us who have tried to keep up with this since Uni

  2. Unfortunately, in addition to the sum to one constraint, most allocation problems also require 0<y(i)<1.

    Replies
    1. First, lots of allocation models DON'T require this. See my Engel curve example in the post.

      Second, if you want to take account of this constraint, you can estimate the equations by MLE, assuming beta-distributed errors, and the estimated coefficients will automatically satisfy the "adding up" restrictions. Perhaps you weren't aware of this?

      I'll post a piece on this tomorrow.

  3. Thanks for the post, Professor.
    I don't know, but to me, in your two-equation example, it is easier to just say that if you have y1=a1+xb1+e1, then algebraically, 1-y1 = (1-a1) - xb1 - e1. Therefore, a2 = (1-a1), b2 = - b1, and e2 = -e1. These trivially imply the three conditions: a1+a2 = 1; b1+b2=0; e1+e2=0.

    Replies
    1. That's true here, with the same x in each equation. However, a similar set of results applies when there are different x variables in each equation. In that case your approach won't work, but mine does.

  4. Hi Dave,

    I enjoyed reading this post! The problems faced in this situation remind me of regressing individual stock log-returns on the market log-return in order to estimate beta coefficients in the CAPM. Blindly doing this by OLS ignores the fact that the market return is actually a linear combination of the individual returns. Perhaps some of the insights from your post are relevant to this problem too; I'll think a bit about it.

    In the example you present, do you find that the residuals satisfy your assumptions of no autocorrelation? I noticed the market share series are quite persistent (possibly even close to integrated of second order), and therefore likely to violate your model assumptions (unless you have some kind of cointegration).

    Best,
    Andreas

    Replies
    1. Andreas - thanks for your comments. The shares can't be persistent in the usual sense as they're bounded between zero and 1. There is very little evidence of autocorrelation (by the portmanteau test). If autocorrelation were present, and we wanted to model it, this would have to be done in a way that satisfies the "adding-up" restrictions. I'll post on this at some stage.

  5. I had no idea the estimated coefficients will automatically satisfy the "adding up" restrictions in this case. Thanks for clearing it up.
