Monday, December 29, 2014

Multivariate Medians

I'll bet that in the very first "descriptive statistics" course you ever took, you learned about measures of "central tendency" for samples or populations, and these measures included the median. You no doubt learned that one useful feature of the median is that, unlike the (arithmetic, geometric, harmonic) mean, it is relatively "robust" to outliers in the data.
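As a quick illustration of that robustness, here is a minimal Python sketch using a made-up five-point sample (the numbers are purely hypothetical). Replacing a single observation with a gross outlier drags the mean a long way, but leaves the median where it was.

```python
import statistics

# Hypothetical sample, and the same sample with its largest value replaced by a gross outlier.
clean = [2.0, 3.0, 4.0, 5.0, 6.0]
dirty = [2.0, 3.0, 4.0, 5.0, 600.0]

print(statistics.mean(clean), statistics.median(clean))  # 4.0 4.0
print(statistics.mean(dirty), statistics.median(dirty))  # 122.8 4.0
```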

(You probably weren't told that J. M. Keynes provided the first modern treatment of the relationship between the median and the minimization of the sum of absolute deviations. See Keynes (1911) - this paper was based on his thesis work of 1907 and 1908. See this earlier post for more details.)
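The link between the median and absolute deviations is easy to check numerically. The sketch below is only an illustration: it uses a hypothetical sample and a brute-force grid search (chosen for transparency, not efficiency) to find the value m that minimizes the sum of absolute deviations, and that value coincides with the sample median. The helper name `sum_abs_dev` is made up for the example.

```python
import statistics

def sum_abs_dev(sample, m):
    """Sum of absolute deviations of the sample from a candidate centre m."""
    return sum(abs(x - m) for x in sample)

# Hypothetical data with an odd number of observations, so the minimiser is unique.
# (With an even number, every point between the two middle values minimises the sum.)
sample = [1.0, 2.0, 7.0, 8.0, 30.0]

# Brute-force search over candidate centres from 1.00 to 30.00 in steps of 0.01.
grid = [i / 100 for i in range(100, 3001)]
best = min(grid, key=lambda m: sum_abs_dev(sample, m))

print(best, statistics.median(sample))  # 7.0 7.0
```

By contrast, minimizing the sum of *squared* deviations for the same sample would give the arithmetic mean, 9.6, which is pulled towards the large observation.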

At some later stage you would have encountered the arithmetic mean again, in the context of multivariate data. Think of the mean vector, for instance, whose elements are simply the arithmetic means of the individual variables.

However, unless you took a stats. course in Multivariate Analysis, you probably didn't get to meet the median in a multivariate setting. Did you ever wonder why not?

One reason may have been that while the concept of the mean generalizes very simply from the scalar case to the multivariate case, the same is not true for the humble median. Indeed, there isn't even a single, universally accepted definition of the median for a set of multivariate data!
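To make this concrete, here is a minimal Python sketch (an illustration added here, with a hypothetical bivariate sample and made-up function names). It compares the mean vector with two of the several candidate definitions of a multivariate median: the componentwise (marginal) median, and the spatial (or "geometric") median, which minimizes the sum of Euclidean distances to the data points. The spatial median is computed here with Weiszfeld's fixed-point iteration, one standard method, with an arbitrary tolerance and a crude guard against division by zero. The three answers all differ.

```python
import math
import statistics

# Hypothetical bivariate sample: the corners of a unit square plus one gross outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (50.0, 50.0)]

def mean_vector(pts):
    """Arithmetic mean taken coordinate by coordinate: the simple generalization."""
    n = len(pts)
    return tuple(sum(p[k] for p in pts) / n for k in range(2))

def componentwise_median(pts):
    """Candidate 1: the median of each coordinate, taken separately."""
    return tuple(statistics.median(p[k] for p in pts) for k in range(2))

def spatial_median(pts, tol=1e-12, max_iter=10_000):
    """Candidate 2: the point minimising the sum of Euclidean distances
    to the sample, found with Weiszfeld's fixed-point iteration."""
    y = mean_vector(pts)  # start the iteration at the mean vector
    for _ in range(max_iter):
        # Inverse-distance weights; the floor is a crude guard in case y lands on a data point.
        w = [1.0 / max(math.dist(p, y), 1e-15) for p in pts]
        s = sum(w)
        y_new = tuple(sum(wi * p[k] for wi, p in zip(w, pts)) / s for k in range(2))
        if math.dist(y_new, y) < tol:
            return y_new
        y = y_new
    return y

def total_distance(pts, c):
    """Sum of Euclidean distances from the sample points to a centre c."""
    return sum(math.dist(p, c) for p in pts)

mv = mean_vector(points)
cm = componentwise_median(points)
sm = spatial_median(points)

print(mv)  # (10.4, 10.4)
print(cm)  # (1.0, 1.0)
print(sm)  # approximately (0.789, 0.789)
```

Both "medians" resist the outlier far better than the mean vector does, yet they disagree with each other. The componentwise median is trivial to compute but does not minimize the total Euclidean distance; the spatial median does, at the cost of needing an iterative algorithm.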

Let's take a closer look at this.