In the world of data science, this "staggered rollout" is a goldmine for research, but it’s also a trap. If we use old-school shortcuts to measure success, we often end up with the wrong answer.
When a new policy like the Community Eligibility Program (CEP) rolls out, it doesn’t happen everywhere at once. Some schools sign up immediately, others wait a few years, and some may never join.
In the world of data science, this “staggered rollout” is a goldmine for research, but it’s also a trap. If we use old-school shortcuts to measure success, we often end up with the wrong answer. This post explains why the “classic” way of measuring impact breaks down and how a newer method, the Sun–Abraham correction, gives us the truth.
The most common way to measure a policy’s impact is called Difference-in-Differences (DiD).Imagine two groups: a “Treated” group (schools that get CEP) and a “Control” group (schools that don’t). We look at the gap between them before the program and the gap after. If the gap grows, we attribute that change to the program.The standard regression version is:
\[Y_{it} = \alpha + \beta \cdot \text{Treated}_i + \gamma \cdot \text{Post}_t + \delta \cdot (\text{Treated}_i \times \text{Post}_t) + \varepsilon_{it}\]Classic DiD Implementation: 2 Groups and 2 Periods
Why this isn’t enough for CEP:
Classic DiD assumes everyone is treated at the exact same time. But CEP is staggered. If you try to force a staggered rollout into this simple “before vs. after” box, you end up mixing schools that have had the program for five years with those that just started yesterday.
To handle different start dates, researchers often use a model called Two-Way Fixed Effects (TWFE). It’s designed to “control” for things that make schools different and for things that change over time (like a statewide policy change).
The standard TWFE DiD regression is:
\[Y_{it} = \alpha_i + \lambda_t + \delta D_{it} + \varepsilon_{it}\]Classic TWFE Implementation: Multiple cohorts staggered together
This time we are looking at the effect before and after treatment (0) for all groups at the same time.
”The “Forbidden Comparison” Problem: Here is where it gets messy. In a staggered rollout, the computer needs a “control group” for every school. Sometimes, it runs out of schools that haven’t been treated yet. When that happens, the model starts using early adopters (who already have the program) as the control group for late adopters.
The Problem: A school already receiving CEP isn’t a “clean” control. If the program’s impact grows over time—say, attendance improves more in year three than in year one—using an early adopter as a baseline will actually “contaminate” the results, making the program look less effective (or even harmful) than it actually is.
Economists Liyang Sun and Sarah Abraham proposed a clever fix: don’t mix the cohorts.Instead of throwing every school into one big pot, the Sun–Abraham method follows a “bottom-up” approach. It estimates effects separately by cohort and event time:
\[Y_{it} = \alpha_i + \lambda_t + \sum_{g}\sum_{k \neq -1} \beta_{gk} \mathbf{1}[G_i=g] \cdot \mathbf{1}[t-g=k] + \varepsilon_{it}\]Sun-Abraham Implementation: Estimations by cohorts:
Early adopters are not consider for later adopters. They correspond to different cohorts.
Then, to report one overall event-study curve, Sun–Abraham aggregates these estimates:
\[\beta_k = \sum_g w_{gk}\beta_{gk} \quad \text{where} \quad \sum_g w_{gk}=1\]This keeps comparisons clean by ensuring treated units aren’t serving as contaminated controls for later-treated units. Making it ideal to analyze policy program that follows a staggered rollout behavior.
Now that we have the right tools, it’s time to look at the data. In my next post, I will share the actual causal results of CEP adoption—stay tuned to see how free school meals impacted student outcomes.