Tag Archive | Social Science

Significant Sound and Fury, Signifying Nothing

Andrew Gelman seizes on what bothers me about so much current political science, particularly studies that focus on attitudes.

This becomes particularly clear when we look at work along these lines in political science. If, for example, subliminal smiley faces have big effects on political attitudes, then this should cause us to think twice about how seriously to take such attitudes, no? Or if men’s views on economic redistribution are in large part determined by physical strength, or if women’s vote preferences are in large part determined by what time of the month it is, or if both sexes’ choice to associate with co-partisans is in large part determined by how they smell, then this calls into question a traditional civics-class view of the will of the people.

Luckily (or, perhaps, depending on your view, unluckily), the evidence for the empirical claims in the above paragraphs ranges from weak to nonexistent.

But my point is that there is a wave of research, coming from different directions, but all basically saying that our political attitudes are shallow and easily manipulated and thus, implicitly, not to be trusted. I don’t find this evidence convincing and, beyond this, I’m troubled by the eagerness some people seem to show to grab on to such claims, with their ultimately anti-democratic implications.

There’s this idea – a powerful one – that people have attitudes, but they also have “non-attitudes”.  Surveys often force people to answer questions about which they have little or no opinion.  For example, I simply don’t have an opinion on the prospects of the Boston Red Sox next year.  But people don’t like saying “no opinion”, and so will often just answer something.  These are the opinions that are most changeable, because they are less “opinions” than randomly produced responses.  And so when surveys show strong effects from treatments like smiley faces, most of the movement is coming out of those most weakly held opinions.  Almost definitionally, the stronger the treatment effect these studies show, the less important the thing they are measuring.
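To make that mechanism concrete, here’s a toy simulation (all numbers invented, standing in for nothing in particular): only the respondents with non-attitudes can be moved by a treatment, so the measured effect grows with the share of non-attitudes in the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def measured_effect(share_nonattitude, nudge=0.3):
    """Toy model: stable opinions ignore the treatment; 'non-attitude'
    respondents answer nearly at random and are the only ones the
    treatment can move."""
    nonatt = rng.random(n) < share_nonattitude
    # Baseline answers: coin flips for non-attitudes, 70% "yes" for stable opinions.
    baseline = np.where(nonatt, rng.random(n) < 0.5, rng.random(n) < 0.7)
    treated = baseline.copy()
    # The smiley-face-style treatment nudges only the non-attitude respondents.
    treated[nonatt] = rng.random(nonatt.sum()) < 0.5 + nudge
    return treated.mean() - baseline.mean()

for share in (0.1, 0.3, 0.6):
    print(f"share of non-attitudes {share:.0%} -> measured effect ≈ {measured_effect(share):+.3f}")
```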

The proliferation of these types of studies produces an awful lot of papers with a well-designed experiment, a counterintuitive result, and the nagging question, “So…did we actually learn anything important?”

The Political Grift Economy

Great piece from Politico this morning about the rise of “Scam PACs”.  Scam PACs are just that: a method of using PACs to scam donors.  Externally, scam PACs look just like other PACs – they are nonprofits that raise money for political causes from donors, generally relying heavily on direct mail and email fundraising appeals.  The difference between regular PACs and scam PACs comes after the money is raised, though.  Most PACs take that money and distribute it to candidates sympathetic to whatever cause the PAC is meant to represent.  Scam PACs, on the other hand, take that money and spend it on “operating expenses” – generally justified as more fundraising – usually through vendors/”consultants” that just so happen to be controlled by the PAC’s officers.  They’re a fascinating world unto themselves, but there’s one particularly interesting oddity about them: they are almost exclusively a right-wing phenomenon.

In general, bad actors treat the right-wing grassroots as a piggy bank in a way that doesn’t exist on the left.  Scam PACs are just the tip of the iceberg; there are whole businesses built around exploiting donors in order to enrich pseudo-political profiteers.  Shady survivalist businesses, gold marketing, insurance scams, multilevel marketing, etcetera.  There’s a great (if somewhat polemical) article on this from Rick Perlstein, which chronicles just how far back it goes.  Basically, the political-grift business seems to have originated with, and co-evolved alongside, the conservative movement.

As an academic question, the political parasite economy seems worth studying.  There are so many interesting questions!  Is it a specific product of campaign finance regimes?  Is it a result of the United States’ rather loosely integrated party system, which relies a lot on semi-affiliated third-party organizations?  For that matter, do things like this even exist in other countries?  I certainly have no idea.  And most interestingly, why do they only exist on the right side of the spectrum?  Perlstein’s answer is that the conservative movement encourages a certain desperation that makes people more willing to open their wallets.  Perhaps – I think it might be more related to the fact that conservative voters/donors are much older, and the elderly are the top targets for fraud.  Regardless, there’s some interesting work to be done here and the fact that America’s political financing is actually quite well-documented suggests that there might be data sufficient to address it.

Bowling Together

I have finally, after meaning to for years, begun reading Robert Putnam’s Bowling Alone.  It chronicles the decline of social capital in America, represented by our weakening ties to each other and our communities.  It also attempts to examine the causes of that decline, albeit much less effectively.  Social capital was – when this book came out nearly fifteen years ago – quite the trendy topic in political science.  So the book is dated, both in academic terms and in more concrete ones.  Putnam dwells a great deal on TV, and I suspect that a contemporary study of the same topic would have to engage much more seriously with the Internet.

However, to my surprise the book is no less relevant.  Few of the trends Putnam highlights – decreasing social/community engagement, activism, and general social capital – have turned around.  On the bright side, political participation has rebounded – it reached a low point in 2000, right before he published the book.  Speculation warning: I suspect this has more to do with increasingly effective political mobilization engines than a resurgence in social capital.  Americans today are no less isolated than when Putnam published this, and the decline in American social capital remains concerning.

The real question, for me, is what could be done to reverse this.  In the age of “disruption”, we are used to thinking of technological fixes for problems.  Indeed, as a political scientist who follows technology, I am very tired of hearing about apps that aim to increase political involvement and engagement.  The mechanisms that build social capital are technologies of a sort – the clubs and organizations that bring us into closer contact with each other.  But they are not technologies that a team or entrepreneur can easily emulate, because they are ineffective without the norm of participation that encourages others to join.  Rebuilding social capital is primarily a project of instilling participatory norms.  “Social entrepreneurship” is all the rage these days, and I think the biggest social entrepreneurship project out there is designing organizations – even just social clubs – that effectively pull people in and keep them involved.

Read the Study First, For the Nth Time

There’s a story getting a ton of play on Facebook about how one-third of men say they’re willing to force a woman to have sex against her will.  It’s horrible, it’s attention-grabbing, and it plays into people’s worst fears about men (and people in general).  And unlike some of the “studies” that get press coverage, it’s at least a paper in a peer-reviewed journal with an acceptable sample size (86).  It has the usual problem of convenience sampling (e.g., all white college males who received college credit for answering).  But the issue goes deeper than that: this is a horribly flawed paper with a research design that verges on deceptive.

The study is set up to maximize the number of people who say they would commit rape.  The authors use questions from a few different scales to assess whether the respondents are inclined to commit sexual violence or violence in general.  After spending the survey making respondents answer questions about violence and sexual violence, they then ask about their willingness to engage in sexual violence.  The problem is that asking all of these questions about violence before the key question is likely to activate violent tendencies in the respondents.  This is called “priming”, and it can be a real problem.  Survey responses are volatile, and even small primes can affect them.  Conscientious researchers should have put the key question first – or, even better, used treatment and control groups, where the control group isn’t asked the preceding questions about sexual violence.
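Here’s a minimal sketch of what that between-subjects design would look like, with made-up response rates (the 10% and 30% figures are purely illustrative, not numbers from the paper): randomize respondents into a control arm that gets the key question cold and a primed arm that gets it after the violence battery, and the gap between the arms estimates how much question ordering alone inflates the headline number.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_arm = 200

# Hypothetical true rates, invented purely for illustration:
# the key question asked cold vs. asked after a long battery of violence items.
p_control, p_primed = 0.10, 0.30

control = rng.random(n_per_arm) < p_control   # arm 1: key question asked first
primed = rng.random(n_per_arm) < p_primed     # arm 2: key question after the battery

diff = primed.mean() - control.mean()
se = np.sqrt(primed.mean() * (1 - primed.mean()) / n_per_arm
             + control.mean() * (1 - control.mean()) / n_per_arm)

print(f"control arm: {control.mean():.1%}   primed arm: {primed.mean():.1%}")
print(f"estimated priming effect: {diff:.1%} (95% CI ± {1.96 * se:.1%})")
```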

A headline-grabbing result with a super-questionable research technique usually means a bunk result.  It’s hard to blame journalists and activists for grabbing onto it, but the researchers and journal editors should have known better.

On Good Bad Research

There’s a viral study going around – you might have run into it on Facebook – on porn consumption and marriage.  It went viral because duh, and its conclusion is that porn is destroying marriage.  Unfortunately, it’s bunk.  Jordan Weissman does an admirable job debunking the study; long story short, the authors use a terrible instrumental variable.  It’s a nice little look at the use and abuse of social science tactics, and how fancy math can be used to confidently frame complete bull.

This is bad research, and there’s a big market for bad research.  People love to be able to point to “studies” supporting their point of view, whether it’s on economic or cultural issues.  Or the real beating blackened heart of terrible science, diets.  The academy has many problems, but something like this would never make it past peer review because the design is obviously bunk to someone who understands the instrumental variable technique.  On the other hand, it can be released as a “working paper” and run around the world twice before the truth can put its proverbial boots on.

Lying with math is a hard problem to deal with.  The tools of social science can be used and abused, and often even a reasonable and well-intentioned researcher can create work that is willfully misinterpreted and abused by bad actors.  I wish I had some good answers as to what can be done.  Actual working academics have professional ethical responsibilities, and peer review to keep them honest.  Unfortunately, the people who are most likely to abuse the public trust are the ones sitting outside that system.  And misinformation is hard to correct once people have internalized it.

Lying with Data: Hate Crimes Edition

My interest was piqued when I saw a much-tweeted article about the “most-prejudiced places in America”.  Maybe not the most attention-grabbing headline on its own, but it sure was when accompanied by this map:

Notice anything strange?

Something jumped out at me right away – the Deep South is a bastion of tolerance!  Until you look a bit closer – because the map doesn’t actually measure tolerance, it measures hate crimes.  Is it possible – just maybe – that in more prejudiced places, hate crimes don’t get prosecuted as such? For a quick thought experiment, place yourself at the scene of a Mississippi cross-burning in 1962.  The chance of that hate crime being reported, much less prosecuted, is somewhere roughly between zero and “not in a million years”.  As the disclaimer at the bottom of the story clarifies, the data for this map comes from voluntary submissions by local law enforcement – I think that this might be a wee bit biased, especially considering that the 9% of agencies that don’t report almost certainly have unusually high numbers. So we have two problems with using this data to make claims about American prejudice – first, that there’s some very nonrandom missing data, and second that the data is probably coded in a biased way.

If you see a map that seems to obviously conflict with well-known facts, you should look more closely. Yet there’s just something about pretty maps that makes people lose their ability to think critically.  I really need to start working those into my research.

Facebook, OKCupid, and Applied Social Science

The chief data scientist at OKCupid, Christian Rudder, has published a response of sorts to recent news about Facebook experimenting on its users.  The response is, basically, that all web sites experiment on their users.  As a question of fact, this is obviously correct – the modern art of website management and digital media in general is best understood as a practical application of modern social science.  As a question of norms, it doesn’t seem particularly troubling either.  While the Facebook experiment was in an ethical grey area, the experiments Rudder outlines would easily pass muster with an IRB.*  OKCupid didn’t get informed consent, but also posed no potential for physical or emotional harm to human subjects.

One thing the intra-academic Facebook kerfuffle has overlooked: nowadays, more and more social science is taking place outside the university.  Places like OKCupid and Facebook are accumulating some of the most useful and illuminating data on human behavior, and it’s all locked up in proprietary databases.  They have adopted social science methods, but their insights aren’t getting out to benefit society as a whole.  None of it is peer-reviewed or even shared among the firms themselves – I have to wonder how many tech companies have internally validated “insights” that contradict what’s known by others.

As time goes on, a smaller and smaller share of social science will take place in the academy.  If you’re a psychologist or behavioralist who really wants to dive into the mysteries of human motivation and interaction in 2014, would you rather have a job at a university or at OKCupid?

 

*: Institutional Review Board, a university body that approves experiments on human beings.

The Worthlessness of Psychological Testing

So, the Myers-Briggs psychological profile test is worthless.  You’ve probably encountered it before: you answer a bunch of profiling questions and it spits out whether you’re Introverted/Extroverted, Intuitive/Sensing, Thinking/Feeling, and Perceiving/Judging (16 total profiles).  It’s plagued by measurement issues, including the fact that the constituent categories are based on nothing at all and the fact that measurement error is remarkably high – as many as 50% of people get different results when they take the test multiple times.  Furthermore, and this is the real kicker, it has zero power to predict people’s happiness, situational comfort, job success, or any other tangible outcome.

So measuring latent variables is hard, and it’s actually particularly difficult in psychology.  A latent variable is one that can’t be observed, but can be inferred from other things.  A simple example is generosity – you can’t exactly measure how charitable a person or society is in their heart of hearts, but you can measure how much they volunteer or donate to charity and use that to make inferences about their level of generosity.   Measuring these latent variables is a key social science problem, and one on which people spend a great deal of time. 

There are two reasons why psych latent-variable measurement is particularly tricky – we don’t really know what the variables mean, and we don’t really know what the key variables are.  The first is simply that it’s difficult to cleanly define introversion/extroversion in a way that doesn’t rely heavily on pre-existing notions – which emerge from…where?  Probably from folk intuition, which is indeed where Jung derived his categories.  This is troublesome, because it means we’re to some degree testing for things defined however we want, which introduces a degree of circular logic.  The second concern is more diffuse – how do we know that introversion/extroversion is a key component of personality?  How, specifically, do we know that it’s more important than, say, general degree of anxiety or like/dislike of peanut butter?  It may seem more important, but…um…why?  Even if Jung’s four axes were scientifically derived and correctly measured, there’s no obvious reason to believe these four axes are the central components of personality.

The problems of deriving measurements for psychology suggest that it might be a better fit for different techniques.  Psych testing is a classic example of “supervised learning” – we define outcomes, see how people match up to them, and use that information to derive predictions about how new people will match up; that in turn drives the test.  But the problems with that approach are large, as I detailed above – unsupervised learning might be a better fit.  This would include techniques such as clustering, wherein you give people a bunch of questions and use an algorithm to see whether there are natural lines of division in the data, rather than specifying beforehand which questions are important.  That in turn would let you infer what the crucial components of personality actually are – it wouldn’t necessarily help with defining what exactly you’re measuring, but it would be a clear step forward.
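As a rough sketch of what that looks like in practice – assuming scikit-learn and entirely made-up Likert-style responses, not real personality data – you hand the raw item responses to a clustering algorithm and let it propose the divisions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Fake Likert-style responses (1-5) from 300 respondents to 8 personality items.
# In a real study these would be actual questionnaire answers.
responses = rng.integers(1, 6, size=(300, 8)).astype(float)

# Standardize so no single item dominates the distance metric.
X = StandardScaler().fit_transform(responses)

# Ask the algorithm where the natural divisions are, rather than
# imposing introvert/extrovert-style categories up front.
for k in (2, 3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}: inertia={km.inertia_:.1f}, cluster sizes={np.bincount(km.labels_)}")
```

A real analysis would compare cluster counts more carefully and might prefer factor analysis or mixture models, but the basic move – letting the structure come from the data – is the same.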

A lot of social science problems are not obviously well-suited for unsupervised learning, but this one seems to be.

Social Science: The Roman Frontier

There’s a big focus in academic humanities these days on ‘digital humanities’.  This can include a lot of vaguely silly stuff, like doing word counts in great works of literature in an attempt to make literary analysis more sophisticated.  However, there’s also much more interesting work going on, particularly in economic history.  A traditional problem with applying social science methods to historical questions is the scarcity of data – hard numbers in an easy-to-digest format are pretty rare.  However, determined researchers can apply some new tools and some hard thinking in order to find out quite a lot.

ORBIS is one of the coolest projects I’ve seen in the “digital humanities” world.  It’s a reconstruction of the travel network across the Roman world.  The researchers, Walter Scheidel and Elijah Meeks, have gone to great lengths to reconstruct the methods of travel across the empire.  The topographic map is just the starting point – they’ve included information on highways, travel modes, and even weather and wind conditions to capture seasonal changes in travel times.  They’ve even incorporated historical records of Roman-era prices so that you can see the inflation-adjusted cost of various travel options, selecting for the fastest or the cheapest trip.

This could be a great resource for social scientists or for historians looking to apply social science methods to historical problems.  If you’re interested in systematically studying the effect of, say, Roman administrative quality on local economic outcomes, this data set is invaluable.  It would allow the use of what’s called an “instrumental variable” study, a method for estimating effects when the treatment can’t be easily untangled from the outcome.  Local administrative quality and economic productivity are a good example; neither is really exogenous to the other.  However, you can get around this by using a third variable, an “instrument”, that affects the treatment but influences the outcome only through the treatment.  Travel time from Rome is a plausible candidate – it plainly has an effect on local administrative quality, but no obvious direct impact on local productivity.  This lets you back out the effect of administrative quality on local productivity, a question that’s otherwise very difficult to answer.
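To make the mechanics concrete, here’s a toy two-stage least squares sketch on simulated data – the variable names and coefficients are invented, and the point is only to show how an instrument lets you recover the treatment effect despite the confounding:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

# Simulated data (all coefficients invented for illustration).
travel_time = rng.uniform(1, 60, n)                 # instrument: days of travel from Rome
confounder = rng.normal(size=n)                     # unobserved local conditions
admin_quality = 5 - 0.05 * travel_time + confounder + rng.normal(size=n)
productivity = 2 + 0.8 * admin_quality + 2 * confounder + rng.normal(size=n)

def ols(y, X):
    X = np.column_stack([np.ones(len(y)), X])       # add intercept
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS is biased upward because the confounder drives both variables.
print("naive OLS estimate:", ols(productivity, admin_quality)[1].round(2))

# Stage 1: predict the treatment from the instrument alone.
stage1 = ols(admin_quality, travel_time)
admin_hat = stage1[0] + stage1[1] * travel_time

# Stage 2: regress the outcome on the predicted treatment.
print("2SLS estimate:     ", ols(productivity, admin_hat)[1].round(2))  # should land near the true 0.8
```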

Of course, that relies on similarly high-quality data on all three variables of interest.  That could be difficult to come by, which just goes to show the difficulty of doing empirical social science on historical topics.  However, ORBIS is an incredible step in the right direction.  It’s very cool what these researchers have done, and I hope to see more work like this in the future!

The Problem of Data-Driven HR

A big thing nowadays is the proposition that “Big Data” will transform human resources, or HR – who to hire, who to fire, who to promote, etcetera.  It’s an interesting and apparently seductive idea: that by capturing more and more data, employers will be able to tell everything about their employees.  I was discussing this last night and have been trying to think it through with my “social scientist” cap on.  I think that if analytics does come to drive a lot of HR decisions, it will do so much more slowly than we expect, for a few reasons:

  • Garbage in, garbage out. Employee ratings systems are notoriously terrible, and are incredibly biased estimators of employee ability, or value added, or what have you.  Employee quality is a notoriously slippery latent variable, and hard to even coherently define, much less measure.  If we can’t trust the end measurement, we can’t trust anything we find about its determinants.
  • The complexity of the problem invites spurious correlations. There is an absolute ton of data available about people, ranging from the simple – demographics, college GPA – to the complex, like social media activity and text-processing of writing samples.  This is great, but it means that as long as there are a sufficient number of employees, you can always find some relationship that is “statistically significant” by hunting through enough variables.  Doing this virtually guarantees you will be directing HR decisions based on spurious relationships.  For example, if you take 100 variables known to have no relationship to the outcome and a large enough sample, you will almost certainly find about 5 that test as statistically significant predictors at the 95% confidence level.  That’s what the significance level means – you get a false positive about 5% of the time (see the simulation sketch after this list).
  • Non-comparability of employees. This goes back to the first point – for some businesses, there are armies of employees doing similar tasks.  For most businesses there aren’t, and it’s difficult to reduce performance to the same axis.  A software developer and a salesperson simply can’t be measured on the same scale; it’s not meaningful.  It’s often even difficult to compare two salespeople, especially if individual sales are large things like software packages or consulting projects – the more skilled employees tend to take on the more challenging assignments.  For companies made up largely of white-collar workers doing distinct tasks, the whole mission may be hopeless.
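Here’s a quick simulation of that 100-irrelevant-variables scenario (synthetic data, no connection to any real HR system), showing how pure noise reliably clears the p < 0.05 bar:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_employees, n_variables = 2_000, 100

outcome = rng.normal(size=n_employees)                     # e.g. an employee rating
noise_vars = rng.normal(size=(n_employees, n_variables))   # 100 variables with NO true relationship

# Test each variable separately against the outcome.
pvals = np.array([stats.pearsonr(noise_vars[:, j], outcome)[1]
                  for j in range(n_variables)])

print(f"'significant' predictors at p < 0.05: {(pvals < 0.05).sum()} of {n_variables}")
# Expect roughly 5 false positives, exactly as the 5% error rate promises.
```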

On the upside, some of these actually seem like problems that can be fixed.  Probably the starting point is to examine what actually drives employee ratings – I have a feeling that the findings might be unsettling.  At most companies, I would expect to see a major impact of things like race and gender, and a fairly loose relationship between ratings and actual performance as measured by whatever objective measures are at hand – e.g., client happiness, productivity, project cost/time overruns.  This seems like one case where you’d actually want to have an outside consultant, because your own company’s managers might not be the most objective.  The point in measuring the bias isn’t to chastise everyone – it’s so that HR can normalize ratings behind the scenes and hope to actually improve measurements of employee quality without having to trust individuals’ judgments as much.
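One way that first step could look, as a purely synthetic sketch – the “objective” measure and the single demographic stand-in below are placeholders, not a claim about what a real rating model should include – is to regress ratings on objective performance plus group membership and then strip out the estimated group effect:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1_000

# Synthetic example: ratings reflect objective performance plus a group bias
# term (all numbers invented, for illustration only).
df = pd.DataFrame({
    "objective": rng.normal(size=n),            # e.g. client-satisfaction score
    "group": rng.choice(["A", "B"], size=n),    # stand-in demographic attribute
})
bias = np.where(df["group"] == "A", 0.4, 0.0)   # group A gets an unearned boost
df["rating"] = 0.6 * df["objective"] + bias + rng.normal(scale=0.5, size=n)

# Step 1: estimate how much of the rating is performance vs. group membership.
model = smf.ols("rating ~ objective + C(group)", data=df).fit()
print(model.params)

# Step 2: normalize ratings by removing the estimated group gap,
# so the adjusted ratings are comparable across groups.
df["adjusted_rating"] = df["rating"] - model.params["C(group)[T.B]"] * (df["group"] == "B")
```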

It’s not a coincidence that most of these “Big Data HR” stories are coming out of firms, like call centers, that can objectively measure performance.  The garbage-in-garbage-out problem is extremely difficult, and most of the firms currently using HR analytics without clear objective measures are probably putting out garbage.  That needs to be fixed before anyone can hope to use serious analytics in the HR space, and it’s an incredibly difficult problem that likely has no obvious best solution.  Only after the social science foundations are in place do firms have to worry about the Polanyian problem.