Fun With Data: Sunlight Foundation

So, as sort of a summer side project, I’m working on some data applications.  A good first step is learning how to extract data directly from web applications, so I’ve been getting more experience using APIs.  For non-technical folks, an API is like a language that web applications use to talk to each other, so you can write some code on your desktop and query the website directly.  It’s incredibly useful, because if you want to get data from, say, Twitter, you can pull it in bulk in a few seconds rather than having to call Twitter up, ask for data, and be told to just use their API.  So I’ve been starting off with the Sunlight Foundation’s API, which is a powerful tool that lets you pull all sorts of information on politicians, PACs, organizations, and so on; it returns its results in the relatively simple JSON format.  When combined with the statistical programming language R, the amount of data at your fingertips is amazing.

So in my free time over the past couple of days, I learned how to query the API using a webscraping library called RCurl and parse the data from JSON, which looks like this:

"[{\"count\": \"422631\", \"name\": \"Mitt Romney (R)\", \"state\": \"\", \"seat\": \"federal:president\", \"amount\": \"356453980.00\", \"party\": \"R\", \"id\": \"d2d1b6f4ea644455b8845be3b4c8116c\"}

into a matrix, which is a data structure that can be analyzed much like a spreadsheet.  Just as a quick proof of concept, I queried the top corporate and nonprofit donors so far in the 2014 campaign to see the distribution of donors.
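For readers who want to see the JSON-to-table step spelled out, here is a minimal sketch. It is written in Python rather than the R and RCurl setup described in this post, so it is not the code I actually ran. It reuses the single sample record above, with a closing bracket added so the text parses as a complete JSON array; the helper names `to_table` and `numeric_column` are made up for illustration.

```python
import json

# The sample record from the post, with a closing "]" added (an assumption)
# so that it forms a complete JSON array.
raw = (
    '[{"count": "422631", "name": "Mitt Romney (R)", "state": "", '
    '"seat": "federal:president", "amount": "356453980.00", "party": "R", '
    '"id": "d2d1b6f4ea644455b8845be3b4c8116c"}]'
)


def to_table(text):
    """Parse a JSON array of records into (column names, list of rows)."""
    records = json.loads(text)
    columns = []
    for record in records:
        for key in record:
            if key not in columns:  # keep first-seen column order
                columns.append(key)
    rows = [[record.get(col, "") for col in columns] for record in records]
    return columns, rows


def numeric_column(columns, rows, name):
    """Pull one column out of the table and convert it from text to numbers."""
    idx = columns.index(name)
    return [float(row[idx]) for row in rows]


columns, rows = to_table(raw)
amounts = numeric_column(columns, rows, "amount")
print(columns)
print(amounts)
```

Note that every value arrives as a string, including the dollar amounts, so the numeric columns have to be converted before any analysis.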


That’s not particularly helpful, is it?  As it turns out, the distribution of donations is HIGHLY unequal: a few organizations have given more than a million dollars, mostly unions plus the Perry Homes Corporation, the top donor.  It’s not immediately clear what is going on there.  But the vast majority of donor organizations give very little money; of the 10,000 records queried, only the top 800 or so have given more than $100,000.  This runs rather contrary to the image of corporations and unions flooding Washington with money, since only a few throw money around recklessly.  In fact, the only way you can actually see the distribution is by taking the logs of the donations, which compresses the long tail:


It seems that the size of political donations follows some sort of exponential distribution, rather than a normal one.  This means a distribution with a long tail and a few highly influential outliers, rather than the bell curve that people are mostly familiar with.  It’s not entirely clear what this means, but it’s worth taking note of: it suggests that donations probably don’t follow any sort of linear rule, and that we should look for underlying causes of political donations that are themselves exponentially distributed.  Further posts will look more deeply into what can be done with this Sunlight Foundation data.
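To make the log idea concrete, here is a small Python sketch (again, not the R code behind this post). The donation amounts are invented purely for illustration and are not Sunlight Foundation data, and `log10_bins` is a hypothetical helper name. It groups amounts by order of magnitude, which is roughly what a histogram of logged donations does, and compares the mean with the median to show how a few large donors drag the average up.

```python
import math
from collections import Counter
from statistics import mean, median

# Hypothetical donation totals in dollars, invented for illustration only.
donations = [500, 1200, 2500, 4000, 7500, 15000, 40000, 120000, 950000, 3200000]


def log10_bins(amounts):
    """Count how many amounts fall in each order of magnitude."""
    return Counter(math.floor(math.log10(a)) for a in amounts if a > 0)


bins = log10_bins(donations)
share_over_100k = sum(a > 100_000 for a in donations) / len(donations)

# One row per order of magnitude, e.g. 10^3 covers $1,000 to $9,999.
for exponent in sorted(bins):
    print(f"10^{exponent}: {'#' * bins[exponent]}")

print("mean:  ", mean(donations))
print("median:", median(donations))
print("share over $100,000:", share_over_100k)
```

On the raw scale the largest values would flatten everything else against the axis; counted by order of magnitude, the small donors become visible again. A mean far above the median is another quick sign of a long right tail.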
