In my time outside of my research job this summer, I have been continuing to work with Internet data and learning how to assemble datasets from these sources. I’ve mentioned the Sunlight Foundation before, and am continuing to work with their data. They’re doing yeoman’s work in making the FEC’s data more publicly accessible, and if you care about politics it’s a very interesting resource. The main challenge is not just parsing the data out of JSON, but figuring out how to gather it in the first place. If you’re interested in a complex question, this might mean combining several different API calls. Today I built a little demonstration of this, mashing up two calls: their search API, which finds entities by name, and their donations API, which summarizes which party an organization is donating to.
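A rough sketch of the mash-up logic, written in Python for illustration rather than the R used for the actual tool. Everything API-shaped here is an assumption: the two callables stand in for the search and donations calls, the field names ("id", "name", "type") and the party-to-[count, amount] layout are guesses at the response shape, and the stub data is made up so the example runs offline.

```python
from typing import Callable, Dict, List

# Assumed shapes, not the documented API: search returns a list of entity
# dicts, and the breakdown maps party -> [donation count, dollar amount].
SearchFn = Callable[[str], List[dict]]
BreakdownFn = Callable[[str], Dict[str, list]]


def partisan_summary(term: str, search: SearchFn, breakdown: BreakdownFn) -> List[dict]:
    """Chain the two calls: search by name, then fetch each org's party split."""
    rows = []
    for entity in search(term):
        # Keep organizations only; a name search can also match people.
        if entity.get("type") != "organization":
            continue
        for party, (count, amount) in breakdown(entity["id"]).items():
            rows.append({
                "organization": entity["name"],
                "party": party,
                "donations": int(count),
                "amount": float(amount),
            })
    return rows


def totals_by_party(rows: List[dict]) -> Dict[str, float]:
    """Sum the dollar amounts across all matched organizations, per party."""
    totals: Dict[str, float] = {}
    for row in rows:
        totals[row["party"]] = totals.get(row["party"], 0.0) + row["amount"]
    return totals


# Made-up stand-ins for the two HTTP calls (placeholder names and numbers).
_FAKE_ENTITIES = [
    {"id": "a1", "name": "Example Freedom PAC", "type": "organization"},
    {"id": "b2", "name": "Freedom Sample Fund", "type": "organization"},
    {"id": "c3", "name": "Jane Freedom", "type": "individual"},
]
_FAKE_BREAKDOWNS = {
    "a1": {"Republicans": [2, 200.0], "Democrats": [1, 50.0]},
    "b2": {"Republicans": [1, 100.0], "Democrats": [0, 0.0]},
}


def fake_search(term: str) -> List[dict]:
    return [e for e in _FAKE_ENTITIES if term.lower() in e["name"].lower()]


def fake_breakdown(entity_id: str) -> Dict[str, list]:
    return _FAKE_BREAKDOWNS[entity_id]


if __name__ == "__main__":
    rows = partisan_summary("freedom", fake_search, fake_breakdown)
    print(totals_by_party(rows))
```

Passing the two calls in as functions keeps the joining logic separate from the HTTP and JSON details, so the stubs can be swapped for real requests without touching the rest.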

The end result? You can pop in a word or sequence of words, and it returns a list of all organizations whose name contains that word, along with a summary of how many donations they make, how much money they are donating, and where the money is going. It’s a neat little tool that lets you examine the partisan valence of different words. So, for example, I suspected that a PAC with “freedom” in its name probably leans right, but I was shocked to see just how lopsided the distribution was:


Because freedom isn’t free.

On the other hand, there had to be Democratic equivalents – so I tried “Education”.  Much to my surprise, not only are the amounts much greater, but by far the greatest share of money flows not to Democrats but to nonpartisan groups.  This deserves more examination, and perhaps suggests foundations and nonprofits giving to other nonprofits.


There’s a lot more that can be done with this, and in fact the data can be massively enriched by bringing some other tools to bear. But for now, it’s a neat little tool that lets you empirically examine the partisan affiliation of words. Note: all data is from the Sunlight Foundation’s Influence Explorer API; data collection and graphics are done in R using RCurl and RJSON.

