Friday, September 22, 2017

Misclassification in Binary Choice Models

Several years ago I wrote a number of posts about Logit and Probit models, and the Linear Probability Model (LPM). One of those posts (also, see here) dealt with the problems that arise if you misclassify the dependent variable in such models. That is, in the binary case, some of your "zeroes" should be "ones", and/or vice versa.

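In a conventional linear regression model, measurement errors in the dependent variable are not a big deal. Provided they are uncorrelated with the regressors, they simply become part of the error term: OLS remains unbiased and consistent, just less precise. However, the situation is quite different with Logit, Probit, and the LPM.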
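To see why, here is a short derivation under one simplifying assumption of ours: that the misclassification probabilities do not depend on the regressors. Write α0 for the probability that a true "zero" is recorded as a "one", and α1 for the probability that a true "one" is recorded as a "zero".

```latex
% Continuous y with additive measurement error u, where E[u | x] = 0:
\tilde{y}_i = y_i + u_i
  \quad\Rightarrow\quad
  E[\tilde{y}_i \mid x_i] = x_i'\beta .

% Binary y, with \alpha_0 = \Pr(\tilde{y}_i = 1 \mid y_i = 0)
% and \alpha_1 = \Pr(\tilde{y}_i = 0 \mid y_i = 1), both independent of x_i:
\Pr(\tilde{y}_i = 1 \mid x_i)
  = \alpha_0\,[1 - P(x_i)] + (1 - \alpha_1)\,P(x_i)
  = \alpha_0 + (1 - \alpha_0 - \alpha_1)\,P(x_i),
  \qquad P(x_i) = \Pr(y_i = 1 \mid x_i).

% LPM: P(x_i) = x_i'\beta, so every slope coefficient is scaled by (1 - \alpha_0 - \alpha_1).
% Logit/Probit: P(x_i) = F(x_i'\beta), and \alpha_0 + (1 - \alpha_0 - \alpha_1) F(x_i'\beta)
% is no longer of the form F(x_i'b), so the usual MLE is inconsistent.
```

The key difference is that the binary error, ỹ − y, can only equal +1 when y = 0 and −1 when y = 1. It is therefore necessarily correlated with the true outcome, so it can never behave like the "classical" measurement error in the continuous case.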

This issue is taken up in detail in an excellent recent paper by Meyer and Mittag (2017), and I commend it to you.

To give you an indication of what those authors have to say, this is from their Introduction:
".....the literature has established that misclassification is pervasive and affects estimates, but not how it affects them or what can still be done with contaminated data. This paper characterizes the consequences of misclassification of the dependent variable in binary choice models and assesses whether substantive conclusions can still be drawn from the observed data and if so, which methods to do so work well. We first present a closed form solution for the bias in the linear probability model that allows for simple corrections. For non-linear binary choice models such as the Probit model, we decompose the asymptotic bias into four components. We derive closed form expressions for three bias components and an equation that determines the fourth component. The formulas imply that if misclassification is conditionally random, only the probabilities of misclassification are required to obtain the exact bias in the linear probability model and an approximation in the Probit model. If misclassification is related to the covariates, additional information on this relation is required to assess the (asymptotic) bias, but the results still imply a tendency for the bias to be in the opposite direction of the sign of the coefficient."
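For the LPM, the simplest case mentioned in that quotation (misclassification unrelated to the covariates) is easy to check by simulation. The sketch below is our own illustration, not Meyer and Mittag's code. It assumes α0 = Pr(observed 1 | true 0) and α1 = Pr(observed 0 | true 1) are constants, in which case E[ỹ | x] = α0 + (1 − α0 − α1)(β0 + β1x). OLS on the misclassified outcome then estimates slopes shrunk by the factor (1 − α0 − α1), and dividing by that factor undoes the bias. The function name and parameter values are hypothetical.

```python
import numpy as np


def simulate_lpm_misclassification(n=200_000, beta0=0.3, beta1=0.4,
                                   alpha0=0.05, alpha1=0.15, seed=1):
    """Fit an LPM by OLS to true and misclassified binary outcomes.

    alpha0 = Pr(observed 1 | true 0), alpha1 = Pr(observed 0 | true 1).
    Both are assumed constant, i.e. unrelated to x (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)            # keeps beta0 + beta1*x inside [0, 1]
    p = beta0 + beta1 * x                         # true LPM probabilities
    y = (rng.uniform(size=n) < p).astype(float)   # true binary outcome

    # Flip true ones to zero with prob. alpha1, true zeros to one with prob. alpha0.
    u = rng.uniform(size=n)
    y_obs = np.where(y == 1.0, (u >= alpha1).astype(float),
                     (u < alpha0).astype(float))

    X = np.column_stack([np.ones(n), x])
    b_true, *_ = np.linalg.lstsq(X, y, rcond=None)
    b_obs, *_ = np.linalg.lstsq(X, y_obs, rcond=None)

    # Invert E[y_obs | x] = alpha0 + scale * (beta0 + beta1 * x).
    scale = 1.0 - alpha0 - alpha1
    b_corr = np.array([(b_obs[0] - alpha0) / scale, b_obs[1] / scale])
    return b_true, b_obs, b_corr


b_true, b_obs, b_corr = simulate_lpm_misclassification()
print("OLS on true y:         ", b_true)
print("OLS on misclassified y:", b_obs)
print("corrected:             ", b_corr)
```

With these settings the slope from the misclassified data should be close to (1 − 0.05 − 0.15) × 0.4 = 0.32 rather than 0.4. The correction only works because α0 and α1 are treated as known. If misclassification depends on the covariates, the quotation notes that more information is required.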
This paper includes a wealth of information, including some practical guidelines for practitioners.
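The non-linear case can be illustrated in the same way. The following is a minimal sketch under our own assumptions. We use a Logit rather than a Probit only so that the maximum-likelihood fit (Newton-Raphson) needs nothing beyond NumPy. This swap is ours, not Meyer and Mittag's setup. Misclassification is again taken to be random, with constant α0 = Pr(observed 1 | true 0) and α1 = Pr(observed 0 | true 1). All function names and parameter values are hypothetical.

```python
import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood Logit coefficients via Newton-Raphson."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))           # fitted probabilities
        grad = X.T @ (y - p)                          # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix X'WX
        step = np.linalg.solve(info, grad)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b


def logit_misclassification_demo(n=100_000, beta=(-0.5, 1.0),
                                  alpha0=0.1, alpha1=0.1, seed=7):
    """Compare Logit fits on true and randomly misclassified outcomes."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x])
    p = 1.0 / (1.0 + np.exp(-(X @ np.asarray(beta))))
    y = (rng.uniform(size=n) < p).astype(float)

    # alpha0 = Pr(observed 1 | true 0), alpha1 = Pr(observed 0 | true 1),
    # both independent of x (an assumption of this sketch).
    u = rng.uniform(size=n)
    y_obs = np.where(y == 1.0, (u >= alpha1).astype(float),
                     (u < alpha0).astype(float))
    return fit_logit(X, y), fit_logit(X, y_obs)


b_clean, b_noisy = logit_misclassification_demo()
print("true data:         ", b_clean)
print("misclassified data:", b_noisy)
```

In this design the slope estimated from the misclassified outcome is pulled towards zero, in the opposite direction to the sign of the true coefficient. This is consistent with the tendency described in the quotation. Unlike the LPM, however, there is no simple rescaling that recovers β here.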


Meyer, B. D. and N. Mittag, 2017. Misclassification in binary choice models. Journal of Econometrics, 200, 295-311.

© 2017, David E. Giles

Wednesday, September 20, 2017

Monte Carlo Simulations & the "SimDesign" Package in R

Several past posts on this blog have dealt with Monte Carlo simulation (e.g., see here, here, and here).

Recently I came across a great article by Matthew Sigal and Philip Chalmers in the Journal of Statistics Education. It's titled, "Play it Again: Teaching Statistics With Monte Carlo Simulation", and the full reference appears below.

The authors provide a really nice introduction to basic Monte Carlo simulation, using R. In particular, they contrast a "for loop" approach with using the "SimDesign" R package (Chalmers, 2017).

Here's the abstract of their paper:
"Monte Carlo simulations (MCSs) provide important information about statistical phenomena that would be impossible to assess otherwise. This article introduces MCS methods and their applications to research and statistical pedagogy using a novel software package for the R Project for Statistical Computing constructed to lessen the often steep learning curve when organizing simulation code. A primary goal of this article is to demonstrate how well-suited MCS designs are to classroom demonstrations, and how they provide a hands-on method for students to become acquainted with complex statistical concepts. In this article, essential programming aspects for writing MCS code in R are overviewed, multiple applied examples with relevant code are provided, and the benefits of using a generate–analyze–summarize coding structure over the typical “for-loop” strategy are discussed."
The SimDesign package provides an efficient and safe template for setting up pretty much any Monte Carlo experiment that you're likely to want to conduct. It's really impressive, and I'm looking forward to experimenting with it.
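To show what the generate–analyse–summarise structure in the abstract amounts to, here is a minimal, language-agnostic sketch written in Python. It is not SimDesign's API, which is in R and differs in its details. The names generate, analyse, summarise and run_simulation are our hypothetical stand-ins for the three stages and the driver. The toy experiment (bias and RMSE of the sample mean versus the sample median for standard normal data) is also ours, not one of the paper's examples.

```python
import numpy as np


# Hypothetical stand-ins for the three user-written stages (names are ours).

def generate(condition, rng):
    """Draw one sample for a given design condition."""
    return rng.standard_normal(condition["n"])


def analyse(condition, dat):
    """Compute the statistics of interest on one simulated sample."""
    return {"mean": dat.mean(), "median": np.median(dat)}


def summarise(condition, results):
    """Aggregate replications: bias and RMSE of each estimator (true value 0)."""
    out = {}
    for name in results[0]:
        est = np.array([r[name] for r in results])
        out[name + "_bias"] = est.mean()
        out[name + "_rmse"] = np.sqrt(np.mean(est ** 2))
    return out


def run_simulation(design, replications, seed=123):
    """Generic driver: replicate generate/analyse for each design row, then summarise."""
    rng = np.random.default_rng(seed)
    table = []
    for condition in design:
        results = [analyse(condition, generate(condition, rng))
                   for _ in range(replications)]
        table.append({**condition, **summarise(condition, results)})
    return table


design = [{"n": n} for n in (10, 50, 250)]
for row in run_simulation(design, replications=2000):
    print(row)
```

The point of the structure is that the looping and bookkeeping live in one generic driver. Adding a sample size or an estimator means editing the design list or analyse, rather than rewriting nested for loops.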

The Sigal-Chalmers paper includes helpful examples, with the associated R code and output, so it would be superfluous for me to reproduce them here.

Needless to say, the SimDesign package is just as useful for simulations in econometrics as it is for those dealing with straight statistics problems. Try it out for yourself!


Chalmers, R. P., 2017. SimDesign: Structure for Organizing Monte Carlo Simulation Designs, R package version 1.7.

Sigal, M. J. and R. P. Chalmers, 2016. Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24, 136-156.

Sunday, September 10, 2017

Econometrics Reading List for September

A little belatedly, here is my September reading list:
  • Benjamin, D. J. et al., 2017. Redefine statistical significance. Pre-print.
  • Jiang, B., G. Athanasopoulos, R. J. Hyndman, A. Panagiotelis, and F. Vahid, 2017. Macroeconomic forecasting for Australia using a large number of predictors. Working Paper 2/17, Department of Econometrics and Business Statistics, Monash University.
  • Knaeble, D. and S. Dutter, 2017. Reversals of least-square estimates and model-invariant estimations for directions of unique effects. The American Statistician, 71, 97-105.
  • Moiseev, N. A., 2017. Forecasting time series of economic processes by model averaging across data frames of various lengths. Journal of Statistical Computation and Simulation, 87, 3111-3131.
  • Stewart, K. G., 2017. Normalized CES supply systems: Replication of Klump, McAdam and Willman (2007). Journal of Applied Econometrics, in press.
  • Tsai, A. C., M. Liou, M. Simak, and P. E. Cheng, 2017. On hyperbolic transformations to normality. Computational Statistics and Data Analysis, 115, 250-266.