For Digital Analytics, MDEs > Power

One of the most important concepts in designing and running experiments is statistical power – defined as the probability that a test will reject the null hypothesis at a chosen significance level when a real effect of a given size exists.  In more colloquial language, statistical power is “how likely am I to get a statistically significant finding for an effect, provided that the effect is real?”  That colloquial language is not particularly colloquial, which is a shame because power is an incredibly useful tool for an experimenter to wield.  It lets you figure out ahead of time which tests are worth doing and which are not, and it lets you calculate the minimum sample size a test requires.  The problem is that it’s incredibly difficult to explain the concept to people without a solid statistics background, and a good analytics practitioner is an interpreter and a salesperson as much as an analyst.

I’ve realized that a much better way to frame the importance of sample size is the minimum detectable effect, or MDE.  The minimum detectable effect is, colloquially, “given a sample of size n, how large would an effect have to be in order to detect it reliably?”  There are a few virtues that make this particularly useful in the context of a Web business:

  • It doesn’t rely on assumptions about effect size: Statistical power takes effect size as an input, but the problem here is that we don’t know the effect of an experiment until we try.  Instead of working from a guess, we work from what we already have – the natural rate of variation in what we’re measuring and the sample size. Which leads to another strength:
  • Sample size is an input, so you can work with what you have: In many business contexts, the sample is fixed rather than something we can influence – for example, the number of people visiting our homepage is out of our control.  We can’t just increase the number of people on our email list, either. The MDE lets you take the sample as it is, and ask “Given our existing constraints, how large would an effect have to be?”, which gets to the biggest strength:
  • It is expressed in terms a practitioner can understand.  We can take whatever sample we have, plug in the variance, and get out a result that’s comprehensible.  “Well, we’ll probably only see an effect if the new creative bumps open rates by 20%” is a sentence that a marketer or product manager can understand, and in turn decide on their own whether that’s plausible.
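
To make this concrete, here is a minimal sketch of an MDE calculation for a two-arm test on a rate such as email opens.  It assumes a two-sided test, equal-sized arms, a normal approximation, and the conventional (but arbitrary) targets of 5% significance and 80% power.  The function name and the example numbers are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def minimum_detectable_effect(baseline_rate, n_per_arm, alpha=0.05, power=0.80):
    """Approximate absolute MDE for a two-arm test on a proportion.

    Assumptions (not from the original post): two-sided test, equal arm
    sizes, normal approximation, and the baseline rate's variance used
    for both arms.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the power target
    # Standard error of the difference between the two arms' rates
    std_error = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return (z_alpha + z_power) * std_error


# Hypothetical example: 5,000 recipients per arm, 20% baseline open rate
baseline = 0.20
mde = minimum_detectable_effect(baseline, n_per_arm=5000)
print(f"Absolute MDE: {mde:.4f}")
print(f"Relative MDE: {mde / baseline:.1%}")
```

With these made-up inputs the answer comes out to a bit over two percentage points, or roughly an 11% relative lift, which is the kind of sentence a marketer can weigh for plausibility.  Because the MDE shrinks with the square root of the sample size, quadrupling the list only halves it.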

Analytics is ultimately a service for the rest of the organization, and designing good experiments is one of the most important things we can do.  Making that message heard is a crucial part of the task, and I think MDEs should occupy a bigger part of the communication toolbox.


The unsexy threat to government data

I don’t think it’s quite appreciated how much economic and civic life in the United States is underpinned by government data – from the Census to the Weather Service to the Bureau of Labor Statistics, the government produces a ton of data that is used on an everyday basis.  Businesses use ACS data to locate stores and markets, banks rely on the unemployment rate to make forecasts, courts rely on Census data to look for disenfranchisement, and so on.  This is an area that scares me.

The President is relatively indifferent to the needs of wonks and bureaucrats, and as a result much of this data will be under threat.  It is highly unlikely, given this indifference, that he will order meddling to juice the unemployment rate – but political appointees will be under a lot of pressure to produce good results, and this means giving him good news.  A lot of this data can be degraded through spontaneous efforts in the middle management layer of the government without a real “plan” from the top.

Researchers should keep an eye on official government data products – particularly looking out for markers of fraud such as violations of Benford’s Law or data that looks suspiciously normal.  It would be fairly easy to develop a battery of tests that will monitor official data outputs to see whether there is evidence of manipulation – this could be a good project for enterprising nerds out there.
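
As a starting point for such a battery, here is a minimal sketch of a first-digit Benford check using a chi-square goodness-of-fit statistic.  The function names are hypothetical, and the approach assumes the series spans several orders of magnitude; Benford’s Law is not expected to hold for bounded or narrowly ranged figures, so a high statistic is a prompt for a closer look rather than proof of manipulation.

```python
from collections import Counter
from math import log10

# Benford's Law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9
BENFORD = {d: log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom at the 5% level
CHI2_CRITICAL_DF8 = 15.507


def leading_digit(x):
    """Return the first significant digit of a nonzero number."""
    # Scientific notation puts the first significant digit first
    return int(f"{abs(x):.15e}"[0])


def benford_chi_square(values):
    """Chi-square statistic comparing leading digits to Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items()
    )


# Illustration only: powers of 2 track Benford closely, while the integers
# 100..999 have uniform leading digits and do not.
benford_like = [2.0 ** k for k in range(1, 1001)]
uniform_like = list(range(100, 1000))
for name, series in [("powers of 2", benford_like), ("100..999", uniform_like)]:
    stat = benford_chi_square(series)
    verdict = "flag for review" if stat > CHI2_CRITICAL_DF8 else "consistent"
    print(f"{name}: chi-square = {stat:.2f} ({verdict})")
```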

The sad fact is that if the government is caught meddling in the data, most likely a lot of this data is ruined for use forever.  Institutional trust is easy to destroy, but difficult to rebuild, and the question of data integrity is one that will be warred over by rival groups of nerds with statistical instruments and arguments impossible for everyday citizens to adjudicate.  Once the government’s data integrity is called into question, the mere fact of the back-and-forth dispute, even if eventually resolved with a full solution, will serve as an argument in itself.  We’ll likely have fallen into a low-trust equilibrium, which is more stable than a high-trust one.

The knock-on effects aren’t good.

The Terror of Strategic Incoherence

At the hearings for Rex Tillerson and James Mattis last week, the incoming Secretaries of State and Defense had some things to say.  Most interestingly, both endorsed a hard line on Russian adventurism diametrically opposite from President Trump’s vocal enthusiasm for Russia and Putin.  Does this mean that they will be edged aside in favor of Trump’s proposed US-Russia alignment, or conversely that they will be the “adults in the room” setting real policies?  I certainly don’t know, but the disconnect is frightening in itself.

The late Thomas Schelling conceived of war as a process of diplomatic bargaining.  By committing troops and suffering to a conflict, nations can assess by trial just how committed their adversaries are to their stated aims.  If country A pushes country B for a concession B does not want, A can escalate from polite negotiation all the way up to all-out war as a way of communicating to B just how important the concession is.  A would only escalate up to the point where it thinks B will back down.

Conflict happens and escalates when countries misjudge each other.  If an aggressor badly underestimates another’s commitment, that can lead to terrible conflicts.  A posture of strategic ambiguity – e.g., the Trump Administration’s mix of pro- and anti-Putin views – exacerbates this concern.  Foreign countries can choose to see whatever they want in the noise machine emanating from the White House.  Some adversaries might wrongly believe they can push further than they really can – and since no one knows what our real red lines are, it’s easy to imagine conflicts blowing out of control.

Why your city is bankrupt

Would you believe a story about municipal finance can be deeply disturbing? This one is – it’s the story of why Lafayette, Louisiana has no money and yours doesn’t either.  In short: the built infrastructure is so extensive that maintenance and upkeep have outstripped the tax base of the city.  The key passage (emphasis added):

All of the programs and incentives put in place by the federal and state governments to induce higher levels of growth by building more infrastructure has made the city of Lafayette functionally insolvent. Lafayette has collectively made more promises than it can keep and it’s not even close. If they operated on accrual accounting — where you account for your long term liabilities — instead of a cash basis — where you don’t — they would have been bankrupt decades ago. This is a pattern we see in every city we’ve examined. It is a byproduct of the American pattern of development we adopted everywhere after World War II.

As the authors point out, this is a merely human weakness due to temporal discounting – people are bad at accounting for the present value of future cash flows, whether the flows are income or expenses.  The story also does a great job of illustrating a key principle of institutional design – rules should be designed to beat human failings.
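
The gap between the cash view and the accrual view can be sketched with a simple present-value calculation.  Every number below is hypothetical and chosen only for illustration (none comes from the Lafayette study): a street that brings in steady tax revenue each year but needs a full rebuild at the end of its life.

```python
def present_value(cash_flows, discount_rate):
    """Discount a list of end-of-year cash flows (year 1, 2, ...) to today."""
    return sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )


# Hypothetical inputs, for illustration only
YEARS = 25                    # assumed service life of the street
ANNUAL_TAX_REVENUE = 30_000   # assumed yearly revenue from adjoining properties
REPLACEMENT_COST = 1_500_000  # assumed rebuild cost, due in the final year
DISCOUNT_RATE = 0.03          # assumed discount rate

revenue = [ANNUAL_TAX_REVENUE] * YEARS
costs = [0] * (YEARS - 1) + [REPLACEMENT_COST]

# Cash basis: in any year before the rebuild, the street looks like pure surplus
cash_view_year_1 = revenue[0] - costs[0]

# Accrual-style view: count the future rebuild against the future revenue today
accrual_view = present_value(revenue, DISCOUNT_RATE) - present_value(costs, DISCOUNT_RATE)

print(f"Cash view, year 1: {cash_view_year_1:,.0f}")
print(f"Net present value over the street's life: {accrual_view:,.0f}")
```

Under these assumptions the street shows a surplus every year on a cash basis and a loss over its lifetime once the rebuild is counted, which is the pattern the passage describes.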

The idea that principals (e.g. local governments) are poorly incentivized to set their own decision rules is one of the better arguments against American-style federalism.  Governance arguments about federalism are rare, but much more convincing than traditionalist or consequentialist arguments.

Theories of politics are never good enough

I’ve recently been reading Restless Empire, concerning China’s foreign relations since 1750.  I’m not done, but it’s excellent.  It’s also a type of history I have generally not been exposed to in social science graduate school, in which historical case studies are generally fairly reduced-form and are aimed more narrowly at proving or disproving arguments about the nature of politics.  One of the things about reading richly textured history is that it can make arguments like, say, realism vs. idealism in geopolitics seem not just wrong, but beside the point.

The book does not easily lend itself to a school of thought on whether geopolitics are primarily driven by realism, idealism, or domestic politics.  I just finished the section on the 19th century, and one thing that is striking is the range of motivations for the major actors.  In starting the First Opium War, Britain was motivated by classic realist considerations in expanding their control of China.  China’s militant opposition to opium, however, which provided the pretext for the British response, was primarily driven by idealistic concerns about the corrosive impact of opium on the populace.

Later on, during the Chinese Civil War, China’s foreign relations were truly inseparable from domestic politics.  The main foreign patron of the Republic of China’s government in the 1920s and 1930s was the Soviet Union, and this relationship warmed and cooled based on the machinations for control of China between the Guomindang, the Communists, and the Soviets.  Eventually the Guomindang sought new foreign patrons in the United States – not due to changes in global geopolitics, but because the Soviets had taken the side of the Communists in the Chinese Civil War fomented by the GMD/Communist struggle for domestic control.

I’m grossly oversimplifying the history here, but that’s sort of my point.  Classifying the complex and ambiguous events of Chinese foreign policy as supporting any sort of Grand Theory of Geopolitics is obviously a silly exercise.  A given theory may explain the behavior of some or many actors but certainly not all actors at all times.  I think reading more history, particularly covering an extended period of time, is helpful for making social scientists better consider the scoping conditions over which social science theories are valid.

The Great Risk Shift and the Great Backlash

Individuals interact much more directly with “the market” than they did a few decades ago.  Individuals now assume more responsibility for their own retirement security.  Mobility has gone up, and Western workers compete in a much wider labor market – thanks to trade and immigration, a market that feels global in scope.  Healthcare is increasingly in individuals’ hands – even the new “social welfare” program of the ACA is simply a framework for dumping individuals into a market.  The political scientist Jacob Hacker has termed this the “Great Risk Shift” – risk once borne by state and business being shifted to individuals.

There is, incidentally, evidence that individuals are poorly suited to handling this risk.  The creators of the 401(k) recently expressed regret – people turned out to be worse retirement planners and portfolio managers than the professionals who once ran their pension plans.  The ACA perennially suffers from its complex nature – one of the main reasons for ACA premium hikes is that most consumers aren’t savvy enough to know their rates will go up much less if they get a new plan each year.  People are bad at making these decisions – and just as importantly, many reject the very idea.

Karl Polanyi’s The Great Transformation analyses the first great marketization of human life and the reaction it bred.  Markets were more efficient for producing goods, but alien to traditional human relationships, and the insecurity and chaos of the market scared people who sought security.  Waves of marketization (e.g., enclosure in England) were matched by vicious retaliation by the citizenry – peasant riots in early modern England, violent labor activism in turn-of-the-century America, etc.  Pushing citizens into the market system led citizens to fight back to protect their social system.

A Polanyian perspective suggests the modern era’s increasing marketization of human needs would breed an anti-market backlash. The backlash might not fit neatly into the market revolution’s paradigms, but could plausibly take the form of rabid opposition to free trade and immigration to blunt the power of the global labor market paired with support of guaranteed retirement security programs like Medicare and Social Security.  Polanyi, writing in 1944, would have foreseen a socialist reaction.  But the counter-revolution might not wave a red flag – it might come wearing a red hat.

A Polanyian approach has no trouble explaining the wave of right-wing populism sweeping the West all at once, whereas quantitative political science has struggled to find the common cause.  I’d go so far as to suggest this might be the key text to making sense of the new era in politics.

Lessons Learned in 2016

2016 has been a beast of a year. It’s been often-humbling and I got a lot wrong.  I also learned a lot of new things and it’s always useful to hold yourself to account. So…what have we learned about the world?

  • Forecasting is overrated: Forecasting – in elections and in other fields – is accorded far too much status and attention for such a low-value-added activity.
  • Sound assumptions beat methodological sophistication: See above.
  • Expertise is overrated: See above.
  • History isn’t over: The great consensus on the shape and role of the Western state that reigned since the mid-1980s is dead.
  • Muddling through is underrated: The 2016 election showed that many seeming political crises can be resolved by simply ignoring them and moving on.  We will see if this is as true internationally as it was domestically, but we should view predictions of disaster very skeptically.
  • Social trust matters: A breakdown of trust in governing and mediating institutions (e.g., the media, political parties) was a necessary precondition to this election.  It is difficult to see how to break out of a low-trust equilibrium, and the social and political consequences are sobering.

What beliefs of mine turned out to be wrong this year?

  • Clinton would win the general: Obviously. I never believed this until Trump took off in the primary, and never thought it was a lock.  I still thought it was ~80% likely, and was surprised at the result.
  • Stories about Clinton’s email server would pass, and wouldn’t do permanent damage: I still don’t understand why this was the most-covered discrete political story of the year.  This is a good reminder of how little I – or the “experts” – really do understand about politics.
  • The internet is marginal to political behavior and outcomes: 2016 should disabuse us of this notion.
  • Probabilistic thinking is easy: Even people with statistical training easily fall into the traps of false certainty and artificially limited outcome spaces.

What I’ve learned professionally from my time on the campaign:

  • Social trust matters: A high-trust social environment is just as key to an organization as to society writ large, if not more so.
  • Group norms and rituals are underrated: See above.
  • People crave certainty: See “forecasting is overrated”.  False certainty is dangerous, but feels just as good as real certainty.
  • Replicable code matters: It’s worth investing time in good processes and especially in automation. They make you faster and less error-prone.
  • There are two big differences between a good analyst and a great analyst: empathy and chunking.  Empathy is underrated in the analyst toolbox, but is needed both to translate someone’s request into what they really need and to communicate your findings most effectively.  “Chunking” is breaking down a big thing into a set of smaller things – effective chunking allows an analyst to tackle new projects, figure out a timeline and list of tasks, and ask for help or delegate in discrete chunks.  Good chunking also allows for more building and use of replicable code.
  • The biggest difference between good and bad leaders is caring. There are a lot of other skills that separate “good” from “great”. However, over and over again the difference I observed between effective leaders and ineffective ones was simply a serious desire to engage with the management aspects of the job – developing subordinates, delegating, and managing schedules/timelines/expectations.  As they say, 80% of success is showing up.

My resolutions for the next year:

  • Pay more attention: A senior leader told me simply that “most people don’t pay much attention most of the time.”  This is a powerful insight.
  • Get involved locally: City council meetings, town halls, and so on.  The only way to combat low social trust is community involvement.
  • Less social media: Social media has a corrosive effect on social solidarity and clear thinking.
  • Read less economics and political science, more psychology, sociology and history: More varied mental frameworks for understanding individual and political behavior are helpful.  No discipline is a source of truth, but each has useful insights.
  • Write more: Nothing like writing down your thoughts to clarify and organize them.

Happy New Year to all – I hope you take the time to reflect on what you’ve learned and how you’ll approach the next year.