The Limits of Statistics on Gender & Pay
Today is apparently “Equal Pay Day”. The idea is that women earn 77% as much as men, so if a woman and a man both started work on January 1st of last year, she would have to work until today to match what he earned last year. So the headline disparity in pay is quite glaring, but a lot of people don’t like that idea. If indeed women are paid less than men, that suggests that discrimination can persist even in the face of market pressures. It might even call for government action – quelle horreur! The most common argument in the face of this is that men and women select into different career tracks – Mark Perry and Andrew Biggs have a well-sourced piece arguing this in today’s Wall Street Journal.
Unfortunately for Drs. Perry and Biggs, this is a great example of a classic social science mistake known as “controlling for post-treatment covariates”. When you are performing a basic analysis, you are often studying the effect of a single variable, the “treatment”, and can “control” for other variables by including them in your regression. In this case, the “treatment” is gender* and careers, promotions, etcetera are all covariates. The problem here is that including those covariates in the regression only helps if they are not themselves affected by the treatment. See, if your gender affects whether you become a Wall Street financier or an elementary school teacher, a simple regression where you include an “occupation” variable doesn’t necessarily help you. In fact, there’s good reason to think it might make the estimate worse.
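To make the post-treatment problem concrete, here is a toy simulation. It is only a sketch: every number in it (the pay levels, the probabilities of landing in a high-paying occupation, the size of the direct penalty) is invented for illustration and is not an estimate of anything real. In this made-up world gender affects pay in two ways, directly and by shifting people between occupations, so the total gap is about -14 by construction (-5 direct, plus 30 × (0.3 - 0.6) through occupation). Comparing men and women within the same occupation, which is what "controlling" for occupation amounts to here, should recover only the direct part, roughly -5.

```python
import random
from statistics import mean

def simulate(n=100_000, seed=0):
    """Generate (woman, high_paying, pay) rows from a made-up world."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        woman = rng.random() < 0.5
        # Assumption: gender shifts the odds of a high-paying occupation.
        p_high = 0.3 if woman else 0.6
        high_paying = rng.random() < p_high
        # Assumption: occupation adds 30, being a woman subtracts 5 directly.
        pay = 50 + 30 * high_paying - 5 * woman + rng.gauss(0, 10)
        rows.append((woman, high_paying, pay))
    return rows

def raw_gap(rows):
    """Average pay of women minus average pay of men, no controls."""
    women = [pay for w, _, pay in rows if w]
    men = [pay for w, _, pay in rows if not w]
    return mean(women) - mean(men)

def gap_within_occupation(rows):
    """Gap computed inside each occupation, then averaged by stratum size."""
    total = 0.0
    for occ in (False, True):
        stratum = [r for r in rows if r[1] == occ]
        women = [pay for w, _, pay in stratum if w]
        men = [pay for w, _, pay in stratum if not w]
        total += (mean(women) - mean(men)) * len(stratum) / len(rows)
    return total

rows = simulate()
print("raw gap:", round(raw_gap(rows), 1))
print("gap within occupation:", round(gap_within_occupation(rows), 1))
```

The within-occupation number is smaller in magnitude than the raw one because the part of the gap that runs through occupation has been "controlled" away. A real analysis has more ways to go wrong than this, for instance when unobserved factors drive both occupation and pay, so treat this as an illustration of the mechanism and nothing more.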
Unfortunately, it’s often really hard to tell which way the estimates here might be biased. If gender is substantially driving career choice, then my guess is that the estimates Drs. Perry and Biggs give are biased downwards, which is to say they understate the gender pay gap. It’s really hard to get at the true answer to this question without a different approach: one that exploits a natural experiment, or a real experiment, to impose a condition where gender really is the only difference between two groups of people. We should be pretty dubious of “controlling” for things that are plausibly affected by our independent variable of interest (gender) and that in turn affect our dependent variable of interest (pay). So in conclusion, hooray math!
*: Yes that sounds super-weird, just bear with me here.