Tuesday, May 21, 2013

Variance Estimators That Minimize MSE

In this post I'm going to look at alternative estimators for the variance of a population. The following discussion builds on a recent post, and once again it's really directed at students. Well, for the most part.

Actually, some of the results relating to populations that are non-Normal probably won't be familiar to a lot of readers. In fact, I can't think of a reference for where these results have been assembled in this way previously. So, I think there's some novelty here. But we'll get to that in due course.

I can just imagine you smacking your lips in anticipation!
To get things started, let's suppose that we're using simple random sampling to get our n data-points, and that this sample is being drawn from a population that's Normal, with a (finite) mean of μ and a (finite) variance of σ2. Let x* denote the sample average: x* = (1 / n)Σxi, where the range of summation here (and everywhere below) is from 1 to n.

Let's consider the family of estimators of σ2:

                       sk2 = (1 / k)Σ[(xi - x*)2],

where "k" is a positive number. Now, to be very clear, I'm not suggesting that we should necessarily restrict our attention to estimators that happen to be in this family - especially when we move away from the case where the population is Normal. I'll come back to this point towards the end of this post.

Clearly, if k = (n - 1), we just have the usual unbiased estimator for σ2, which for simplicity we'll call s2. If k = n, we have the mean squared deviation of the sample,  sn2 , which is a downward-biased estimator of σ2.
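In case it helps to see this family in action, here's a minimal Python sketch (the function name and the simulated data are just my own illustrative choices):

    import numpy as np

    def s_k_squared(x, k):
        """Divide the centred sum of squares by an arbitrary positive k."""
        x = np.asarray(x, dtype=float)
        return np.sum((x - x.mean()) ** 2) / k

    rng = np.random.default_rng(42)
    x = rng.normal(loc=0.0, scale=1.0, size=20)   # a sample of n = 20 from N(0, 1)
    n = len(x)

    s2  = s_k_squared(x, n - 1)   # k = n - 1: the usual unbiased estimator
    sn2 = s_k_squared(x, n)       # k = n: the (downward-biased) mean squared deviation

    # numpy's own variance function reproduces both special cases
    print(s2, np.var(x, ddof=1))
    print(sn2, np.var(x, ddof=0))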

However, we all know that unbiasedness isn't everything!

Often, we look at our potential estimators and evaluate them in the context of some sort of loss function. If this loss function is quadratic, then the expected loss (or "risk") of an estimator is its Mean Squared Error (MSE). You'll recall that the MSE of an estimator is just the sum of its variance and the square of its bias.

Let's compare the unbiased estimator, s2, and the biased estimator, sn2, in terms of MSE.

Because we're using simple random sampling from a Normal population, we know that the statistic (n - 1)s2 / σ2 follows a Chi-square distribution with (n - 1) degrees of freedom. So, s2 is distributed as [σ2 / (n - 1)]χ2(n - 1). We also know that the mean of a Chi-square random variable equals its degrees of freedom, and its variance is twice those degrees of freedom. So, E[s2] = σ2, and Var.(s2) = 2σ4 / (n - 1).
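If you'd like to convince yourself of those two moments numerically, a quick Monte Carlo check in Python will do it (the sample size, value of σ2, and number of replications below are arbitrary choices of mine):

    import numpy as np

    rng = np.random.default_rng(123)
    n, sigma2, reps = 10, 4.0, 200_000

    # `reps` independent samples of size n from a Normal population with variance sigma2
    samples = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=(reps, n))
    s2 = samples.var(axis=1, ddof=1)            # the unbiased estimator, sample by sample

    print("E[s^2]  :", s2.mean(), "  (theory:", sigma2, ")")
    print("Var[s^2]:", s2.var(),  "  (theory:", 2 * sigma2**2 / (n - 1), ")")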

The first of these two results also holds if the population is non-Normal, but the second result doesn't hold, as I discussed in this earlier post.

Next, noting that sn2 = (n - 1)s2 / n, it follows that:

      E[sn2] = [(n - 1) / n]σ2 ;
      Bias[sn2] = E[sn2] - σ2 = - (σ2 / n) ;

and
     Var.[sn2] = 2σ4(n - 1) / n2.

So, the MSE of sn2 is given by the expression,

     MSE(sn2) = Var.[sn2] + (Bias[sn2])2 = σ4 (2n - 1) / n2.

Because s2 is unbiased, its MSE is just its variance, so MSE(s2) = 2σ4 / (n - 1). Noting that MSE(sn2) = [(n - 1) / n]MSE(s2) - (σ4 / n2), we see immediately that MSE(sn2) < MSE(s2), for any finite sample size, n. This can be seen in the following chart, drawn for σ = 1. (Of course, the two estimators, and their MSEs, coincide when the sample size is infinitely large.)
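If you'd like to generate such a chart for yourself, a short Python sketch along the following lines (using numpy and matplotlib; the grid of sample sizes is an arbitrary choice) will do the job:

    import numpy as np
    import matplotlib.pyplot as plt

    sigma2 = 1.0
    n = np.arange(2, 31)

    mse_s2  = 2 * sigma2**2 / (n - 1)           # unbiased estimator, divisor (n - 1)
    mse_sn2 = sigma2**2 * (2 * n - 1) / n**2    # divisor-n estimator

    plt.plot(n, mse_s2, marker="o", label="MSE(s^2)")
    plt.plot(n, mse_sn2, marker="o", label="MSE(s_n^2)")
    plt.xlabel("sample size, n")
    plt.ylabel("MSE")
    plt.legend()
    plt.show()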
Although sn2 dominates s2, in terms of having smaller MSE for every value of σ2 and every (finite) sample size, it's still not the best we can do within the family of estimators we're considering! Let's go back to this class of estimators and ask, "what value of k will lead to the estimator with the smallest possible MSE among all members of this class?"

We can answer this question by using a bit of simple calculus. We'll write out the expression for the MSE of sk2, and it will be some function of "k". Then we'll differentiate this function with respect to "k", set the derivative to zero, and then solve for the value of k (say k*). We should then check the sign of the second derivative to make sure that k* actually minimizes the MSE, rather than maximizes it!

So, here goes ........

Notice that we can write a typical member of our family of estimators as

      sk2 = (1 / k)Σ[(xi - x*)2] = [(n - 1) / k]s2 .

So, using the results that E[s2] = σ2, and Var.(s2) = 2σ4 / (n - 1), we get:

    E[sk2] = [(n - 1) / k]σ2 ;
    Bias[sk2] = E[sk2] - σ2 = [(n - 1 - k) / k]σ2;
and
    Var.[sk2] = 2σ4(n - 1) / k2.

The MSE of sk2 is given by the expression,

     M = MSE(sk2) = Var.[sk2] + (Bias[sk2])2 = (σ4 /k2)[2(n - 1) + (n - 1 - k)2].

Differentiating M with respect to "k", and setting this derivative to zero, yields the solution, k* = (n + 1). You can easily check that k* minimizes (not maximizes) M.
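If you'd rather let the computer do the calculus, here's a small sympy check (entirely optional, and not needed for anything that follows) which reproduces k* = (n + 1) and confirms the second-order condition:

    import sympy as sp

    k, n, sigma2 = sp.symbols('k n sigma2', positive=True)

    # MSE of s_k^2 under Normality, written as a function of k
    M = (sigma2**2 / k**2) * (2 * (n - 1) + (n - 1 - k)**2)

    print(sp.solve(sp.diff(M, k), k))                     # [n + 1]
    print(sp.simplify(sp.diff(M, k, 2).subs(k, n + 1)))   # positive, so k* = n + 1 is a minimum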

So, within this family that we've been considering, the minimum MSE (MMSE) estimator of σ2 is the estimator,

      sn+12 = (1 / (n + 1))Σ[(xi - x*)2] .

This is certainly a well-known result. It also extends naturally to the situation where we're estimating the variance of the (Normally distributed) error term in a linear regression model. In that case the MMSE estimator of this variance is (1 / (n - p + 2))Σei2, where ei is the ith OLS residual, and p is the number of coefficients in the model.
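Here's a small numpy sketch of that regression case (the design matrix, coefficient values, and error variance below are purely illustrative choices of mine):

    import numpy as np

    rng = np.random.default_rng(7)
    n, p = 50, 3                                   # n observations, p coefficients (incl. intercept)

    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    beta = np.array([1.0, 2.0, -0.5])
    y = X @ beta + rng.normal(scale=1.5, size=n)   # true error variance = 2.25

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat                           # OLS residuals
    rss = e @ e

    print("divisor (n - p)    :", rss / (n - p))        # the usual unbiased estimator
    print("divisor (n - p + 2):", rss / (n - p + 2))    # the MMSE divisor under Normal errors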

So far, so good!

Now, let's connect with the earlier post that I mentioned above, and see how all of this works out if we have a population that's non-Normal. We'll retain the simple random sampling, though. As was discussed in that post, in general the variance of s2 is given by:

         Var.[s2] = (1 / n)[μ4 - (n - 3)μ22 / (n - 1)]  = V, say.

Here, μ2 and μ4 are the second and fourth central moments of the population distribution. Recall that μ2 is the population variance, and for the result immediately above to hold the first four moments of the distribution must exist.
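A quick simulation check of this variance formula may be reassuring. The sketch below uses an Exponential population (my choice; its central moments also appear later in this post):

    import numpy as np

    rng = np.random.default_rng(2013)
    n, theta, reps = 8, 2.0, 300_000

    # Exponential with mean theta: mu2 = theta^2 and mu4 = 9 * theta^4
    mu2, mu4 = theta**2, 9 * theta**4
    V_theory = (1 / n) * (mu4 - (n - 3) * mu2**2 / (n - 1))

    samples = rng.exponential(scale=theta, size=(reps, n))
    s2 = samples.var(axis=1, ddof=1)

    print("simulated Var[s^2]:", s2.var(), "   theory:", V_theory)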

Let's extend this variance expression to members of the family, sk2. Then we'll work out the expression for the MSE of such estimators for a non-normal population. Finally, this will allow us to derive the MMSE estimator in this family for any population distribution - not just for the Normal population that we dealt with earlier in this post.

Once again, we'll begin by using the fact that we can write:

      sk2 = (1 / k)Σ[(xi - x*)2] = [(n - 1) / k]s2 .

The estimator, s2, is still unbiased for σ2 even in the non-Normal case, so we still have the results:

      E[sk2] = [(n - 1) / k]σ2 ; and  Bias[sk2] = [(n - 1 - k) / k]σ2 =  [(n - 1 - k) / k]μ2

as before.

Also,

       Var.[sk2] = [(n - 1) / k]2 Var.[s2] = [(n - 1) / k]2(1 / n)[μ4 - (n - 3)μ22 / (n - 1)] ,

and so the MSE of sk2 is given by:

         MSE(sk2) =  [(n - 1 - k) / k]2μ22 + V[(n - 1) / k]2,

where V = Var.[s2] is defined above.

Going through the calculus once again, it's easy to show (I used to hate that statement in textbooks) that the value of "k" for which the MSE is minimized is:

        k** = (n - 1)(V + μ22) / μ22,

where V = (1 / n)[μ4 - (n - 3)μ22 / (n - 1)].

Check: For the Normal distribution, μ4 = 3μ22, and so k** = (n + 1) = k*, as before.
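Putting the general result into a small Python function makes that check (and the examples that follow) easy to reproduce; the function name is just my own label:

    def k_star_star(n, mu2, mu4):
        """The minimum-MSE divisor, given the population's second and fourth central moments."""
        V = (1 / n) * (mu4 - (n - 3) * mu2**2 / (n - 1))
        return (n - 1) * (V + mu2**2) / mu2**2

    # Normal check: mu4 = 3 * mu2^2 should give k** = n + 1, whatever the value of mu2
    n, mu2 = 10, 2.5
    print(k_star_star(n, mu2, 3 * mu2**2))   # 11.0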

Having gone to all of this effort, let's finish up by illustrating the optimal k** values for a small selection of other population distributions:

Uniform, continuous on [a , b]

μ2 = (b - a)2 / 12  ;  μ4 = (b - a)4 / 80
k** = (n - 2) + (3 / n) + 9(n - 1) / (5n)

Standard Student's-t, with v degrees of freedom

μ2 = v / (v - 2)  ;  μ4 = 3v2 / [(v - 2)(v - 4)]
k** = (n - 2) + (3 / n) + 3(n - 1)(v - 2) / [n(v - 4)]  ;  for v > 4

χ2, with v degrees of freedom

μ2 = 2v  ;  μ4 = 12v(v + 4)
k** = (n - 2) + (3 / n) + 3(n - 1)(v + 4) / (nv)

Exponential, with mean θ

μ2 = θ2  ;  μ4 = 9θ4  
k** = (n2 + 7n - 6) / n

Poisson, with parameter λ

μ2 = λ  ;  μ4 = λ(3λ + 1)
k** = (n - 2) + (3 / n) + (n - 1)(3λ + 1) / (nλ)
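If you'd like to verify these entries numerically, the following sketch plugs each distribution's μ2 and μ4 into the general formula for k** and compares the result with the closed-form expressions above (the parameter values are arbitrary choices of mine):

    def k_star_star(n, mu2, mu4):
        V = (1 / n) * (mu4 - (n - 3) * mu2**2 / (n - 1))
        return (n - 1) * (V + mu2**2) / mu2**2

    n = 10

    # Uniform on [a, b]: note that k** is free of a and b
    a, b = 0.0, 4.0
    mu2, mu4 = (b - a)**2 / 12, (b - a)**4 / 80
    print(k_star_star(n, mu2, mu4), (n - 2) + 3/n + 9*(n - 1) / (5*n))

    # Student's t with v degrees of freedom (v > 4)
    v = 6.0
    mu2, mu4 = v / (v - 2), 3*v**2 / ((v - 2)*(v - 4))
    print(k_star_star(n, mu2, mu4), (n - 2) + 3/n + 3*(n - 1)*(v - 2) / (n*(v - 4)))

    # Chi-square with v degrees of freedom
    v = 5.0
    mu2, mu4 = 2*v, 12*v*(v + 4)
    print(k_star_star(n, mu2, mu4), (n - 2) + 3/n + 3*(n - 1)*(v + 4) / (n*v))

    # Exponential with mean theta
    theta = 2.0
    mu2, mu4 = theta**2, 9*theta**4
    print(k_star_star(n, mu2, mu4), (n**2 + 7*n - 6) / n)

    # Poisson with parameter lam
    lam = 3.0
    mu2, mu4 = lam, lam*(3*lam + 1)
    print(k_star_star(n, mu2, mu4), (n - 2) + 3/n + (n - 1)*(3*lam + 1) / (n*lam))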

Now, a final word of caution.

Yes, setting k = k** in the case of each of these non-Normal populations, and then estimating the variance by using the statistic sk2 = (1 / k)Σ[(xi - x*)2], will ensure that your "estimator" is MMSE within this particular family of estimators. However, this doesn't mean that it's the "best", or even a feasible, estimator to use.

For instance, consider the last example where the population is Poisson. In that case, the population mean and variance are both  λ. The MLE for  λ is the sample average, x*. Therefore, x* is also the MLE for the population variance. Actually, x* is the "minimum variance unbiased" (MVUE) estimator of λ. The statistic s2 is also an unbiased estimator of λ, but it is inefficient relative to x*.  So x* dominates s2 in terms of MSE. See here for a nice discussion.
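A small simulation makes the point. The sketch below (with arbitrary choices of n and λ) compares the MSEs of x*, s2, and the "MMSE" member of our family; notice that computing k** here requires the true λ:

    import numpy as np

    rng = np.random.default_rng(99)
    n, lam, reps = 15, 3.0, 200_000

    samples = rng.poisson(lam, size=(reps, n))
    xbar = samples.mean(axis=1)                 # the MLE / MVUE of lambda
    s2 = samples.var(axis=1, ddof=1)            # unbiased, but inefficient, for lambda

    # the MMSE divisor needs the true lambda -- exactly the thing we're trying to estimate
    k2 = (n - 2) + 3/n + (n - 1)*(3*lam + 1) / (n*lam)
    sk2 = samples.var(axis=1, ddof=0) * n / k2  # sum of squared deviations, divided by k**

    for name, est in [("x*", xbar), ("s^2", s2), ("s_k**^2 (infeasible)", sk2)]:
        print(name, "MSE:", np.mean((est - lam)**2))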

If we were to try to implement our MMSE estimator of the variance in this case, we'd be trying to estimate λ. However, k** is a function of λ. We'd need to know the population variance in order to obtain the MMSE estimator of that parameter! You can see that the same issue applies to the Student's-t and χ2 examples given above, but it's not an issue with the other two examples.

Sometimes, MMSE estimators simply aren't "feasible". They're functions of the unknown parameters we're trying to estimate. They're not really estimators at all!



© 2013, David E. Giles
