Seminar 6

Author

Sebastian Koehler

Published

February 28, 2026

Materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).

Seminar Objectives

Estimating causal effects with observational data
Evaluating social scientific studies

Getting Started

Download the “leaders.csv” dataset and save it in your POL272 data folder.
Create an R script to keep track of your code. In RStudio, you can open a new script by clicking File > New File > R Script.
Save your script by clicking File > Save As and saving it in your POL272 folder with the name seminar6.R.
Clear your environment so you do not accidentally use objects from previous work. You can do this by clicking on the broom icon in the Environment tab.
Set the working directory by clicking on Session > Set Working Directory > Choose Directory. Navigate to your POL272 folder and click Open.

The `leaders` dataset

Source:
- Benjamin F. Jones and Benjamin A. Olken (2009), Hit or Miss? The Effect of Assassinations on Institutions and War, American Economic Journal: Macroeconomics, 1(2): 55–87.
Dataset Overview:
- Includes data on assassinations and assassination attempts against political leaders from 1875 to 2004.

Table 1 shows the names and descriptions of the variables in this dataset. The unit of observation is the assassination attempt: each row represents one attempt.

Table 1: Variables in leaders

Variable	Description
year	year of the assassination attempt
country	country name
leadername	name of the leader
died	whether leader died: 1=yes, 0=no
politybefore	polity scores before the assassination attempt (-10 to +10)
polityafter	polity scores after the assassination attempt (-10 to +10)

Note: The Polity Score (politybefore and polityafter) captures the level of democracy on a 21-point scale ranging from -10 (hereditary monarchy) to +10 (consolidated democracy).

Load the dataset:


leaders <- read.csv("data/leaders.csv")

View the first few rows:


head(leaders)

  year     country       leadername died politybefore polityafter
1 1929 Afghanistan Habibullah Ghazi    0           -6   -6.000000
2 1933 Afghanistan       Nadir Shah    1           -6   -7.333333
3 1934 Afghanistan      Hashim Khan    0           -6   -8.000000
4 1924     Albania             Zogu    0            0   -9.000000
5 1931     Albania             Zogu    0           -9   -9.000000
6 1968     Algeria      Boumedienne    0           -9   -9.000000

Research Question

We are interested in estimating the average causal effect of a leader’s death on a country’s level of democracy.

The Challenge: Ideally, we would conduct a randomised experiment by assigning some countries to have their leaders die and others to have them survive. This is obviously impossible (and illegal) in reality.
Using Observational Data: Instead of running experiments, researchers often rely on observational data, that is, data on real-world events that have already happened. The key is to find situations in which whether a unit received the treatment was determined by something close to chance.
Why Assassination Attempts?
- The decision to attempt an assassination is not random: it depends on political, social, and strategic factors.
- However, whether an attempt succeeds or fails can hinge on small, unpredictable factors, such as the exact timing of a gunshot or the movement of the target.
Key Assumption:
- Because of these chance factors, whether a leader dies or survives an assassination attempt is approximately random.
- If this holds, then:
  - Attempts in which the leader was killed (treatment group) should be comparable to attempts in which the leader survived (control group).
  - We can therefore estimate the average causal effect of a leader’s death using the difference-in-means estimator.
Limitations: This approach is weaker than a true randomised experiment, because we cannot rule out that the success of an attempt depends on factors that also affect democracy. Still, it provides a way to make causal claims using real-world data.
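The logic of the key assumption can be illustrated with a small simulation. The base R sketch below uses entirely made-up data: the number of attempts, the distribution of baseline polity scores, the 20% success rate, and the true effect of 1 point are all assumptions chosen for illustration, not values from Jones and Olken (2009). It compares the difference-in-means estimate when success is as-if random with the estimate when success is more likely in less democratic countries.

```r
# Simulation sketch with made-up data (NOT the leaders dataset).
set.seed(2026)              # make the random draws reproducible
n <- 10000                  # hypothetical number of assassination attempts
true_effect <- 1            # assumed true effect of a leader's death, in polity points

# Polity score each country would have after the attempt if the leader survived
# (a normal distribution is an assumption; real scores are bounded at -10 and +10)
baseline <- rnorm(n, mean = -2, sd = 3)

# Case 1: success is as-if random. Each attempt kills the leader with
# probability 0.2, independently of the country's baseline level of democracy
died_random <- rbinom(n, size = 1, prob = 0.2)
polity_random <- baseline + true_effect * died_random
est_random <- mean(polity_random[died_random == 1]) -
  mean(polity_random[died_random == 0])

# Case 2: confounding. Attempts are more likely to succeed in less
# democratic countries (baseline below -2), so the groups are not comparable
died_conf <- rbinom(n, size = 1, prob = ifelse(baseline < -2, 0.35, 0.05))
polity_conf <- baseline + true_effect * died_conf
est_conf <- mean(polity_conf[died_conf == 1]) -
  mean(polity_conf[died_conf == 0])

# Compare the two difference-in-means estimates
c(as_if_random = est_random, confounded = est_conf)
```

With as-if random success, the estimate should land close to the assumed effect of 1 point. Under confounding, the treatment group is much less democratic to begin with, so the difference-in-means is pulled far below the true effect (for these made-up settings it even turns negative).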

📌 Questions

Given that we are interested in estimating the average causal effect of the death of a leader on the level of democracy of a country:

What is our outcome variable (Y)? Is this variable binary or non-binary?

Answer

The outcome variable is polityafter, which is a non-binary variable since it can take more than two values.

What is our treatment variable (X)? Is this variable binary or non-binary?

Answer

The treatment variable is died, which is a binary variable since it can only take two values: 1 and 0.

All treatment variables we will consider in this class are binary. They equal 1 when the observation was treated, and 0 when the observation was not treated. In this case, the treatment is the death of the leader and, thus, the treatment variable is died, which equals 1 when the leader died and 0 when the leader did not die.

Computing the difference-in-means with `summarise()`

The difference-in-means estimator allows us to estimate the average causal effect of a leader’s death by comparing the average polity score after the attempt (polityafter) in the treatment group (attempts where the leader died) with the average in the control group (attempts where the leader survived).
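One common way to write the estimator as a formula, with each bar denoting a sample mean within a group, is:

\[ \widehat{\text{ATE}} = \overline{Y}_{\text{treatment}} - \overline{Y}_{\text{control}} = \overline{\text{polityafter}}_{\text{died}=1} - \overline{\text{polityafter}}_{\text{died}=0} \]

where \(\widehat{\text{ATE}}\) stands for the estimated average treatment effect.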

We can compute this using summarise():


# Load necessary package
library(tidyverse)


# Compute the mean of polityafter for each group
leaders %>%
  group_by(died) %>%
  summarise(mean_polity = mean(polityafter))

# A tibble: 2 × 2
   died mean_polity
  <int>       <dbl>
1     0      -1.89 
2     1      -0.762

The output shows the average polity score after the assassination attempt for both groups (died = 1 and died = 0).

The difference between these two means (the treatment-group mean minus the control-group mean) gives us an estimate of the causal effect of a leader’s death on democracy:


# Calculate the difference in means
diff_in_means <- -0.762 - (-1.89)

# Print the result
diff_in_means

[1] 1.128

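Because we typed in the rounded means shown in the output, this value is a close approximation: computed from the unrounded means, the difference is 1.132, the same value the linear model in the next section returns.

To interpret this result, we need to consider: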

The assumptions we must make to claim causality
The direction of the effect (positive/negative)
The size of the effect
The unit of observation

Interpretation: assuming that whether a leader survives an assassination attempt is approximately random, we estimate that the death of a leader through assassination increases the country’s polity score after the attempt by around 1.1 points, on average.
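The difference can also be computed directly from the data rather than typed in by hand. Here is a base R sketch of that idea (an alternative to the tidyverse approach used in this seminar). The data frame `toy` and its values are hypothetical, made up for illustration; it only borrows the column names died and polityafter.

```r
# Hypothetical toy data (NOT the real leaders dataset), using the same column names
toy <- data.frame(
  died        = c(0, 0, 0, 1, 1),
  polityafter = c(-6, -2, 1, -4, 3)
)

# Mean outcome in each group, analogous to group_by(died) + summarise()
group_means <- tapply(toy$polityafter, toy$died, mean)

# Treatment-group mean (died = 1) minus control-group mean (died = 0)
diff_in_means <- as.numeric(group_means["1"] - group_means["0"])
diff_in_means
# [1] 1.833333
```

Replacing `toy` with `leaders` applies the same steps to the seminar data without rounding the group means first.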

Computing the difference-in-means with `lm()`

We can use a linear model to estimate the causal effect of a leader’s death (died) on a country’s level of democracy after the assassination attempt (polityafter). We can do this using the lm() function as follows:


lm(polityafter ~ died, data = leaders)


Call:
lm(formula = polityafter ~ died, data = leaders)

Coefficients:
(Intercept)         died  
     -1.895        1.132

Recall that the formula of the fitted line is:

\[ \widehat{\text{Y}} = \widehat{\alpha} + \widehat{\beta}\text{X} \]

We can write the specific fitted line for our regression model by replacing each term with our model values: replace \(Y\) with the name of the outcome variable (keeping the hat, since it is a predicted value), \(\widehat{\alpha}\) with the estimated intercept coefficient, \(\widehat{\beta}\) with the estimated slope coefficient, and \(X\) with the name of the treatment variable.

This gives the following fitted line:

\[ \widehat{\text{polityafter}} = -1.895 + 1.132\,\text{died} \]

Note: The Y variable is polityafter, \(\widehat{\alpha} = -1.895\) and \(\widehat{\beta} = 1.132\) (about -1.9 and 1.13 when rounded), and the X variable is died.
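As a check on the fitted line, plugging the two possible values of died into it (using the coefficients as printed by lm()) recovers the two group means from the summarise() output, up to rounding:

\[ \widehat{\text{polityafter}}\,\big|_{\text{died}=0} = -1.895 + 1.132 \times 0 = -1.895 \]
\[ \widehat{\text{polityafter}}\,\big|_{\text{died}=1} = -1.895 + 1.132 \times 1 = -0.763 \]

These match the control-group mean (-1.89) and the treatment-group mean (-0.762), which is why, with a binary X, the intercept can be read as the average outcome in the control group.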

Interpreting the estimated slope coefficient

Mathematical Interpretation

The mathematical definition of \(\widehat{\beta}\) is the change in the predicted outcome (\(\Delta\widehat{Y}\)) associated with a one-unit increase in the predictor (\(\Delta X = 1\)). In this case, \(\widehat{\beta}\) is the \(\Delta\widehat{\text{polityafter}}\) associated with \(\Delta\text{died} = 1\).
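The definition follows from comparing the fitted line at two values of X that are one unit apart; the intercept cancels and only the slope remains:

\[ \Delta\widehat{Y} = \big(\widehat{\alpha} + \widehat{\beta}(X+1)\big) - \big(\widehat{\alpha} + \widehat{\beta}X\big) = \widehat{\beta} \]

For our model, moving died from 0 to 1 gives \(\Delta\widehat{\text{polityafter}} = 1.132 \times 1 = 1.132\) points.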

Interpretation: The death of a leader (that is, when died increases by one unit, from 0 to 1) is associated with an increase of 1.13 points in the polity score after the assassination attempt, on average.

Note: Since \(Y\) (polity score) is measured in points, the change in \(Y\) (\(\Delta Y\)) should also be expressed in points. Thus, \(\widehat{\beta}\) is measured in polity score points.

Causal Interpretation

In this context, \(X\) (died) is a binary treatment variable. When the only predictor in a linear model is binary, \(\widehat{\beta}\) is equivalent to the difference-in-means estimator: the average outcome in the treatment group minus the average outcome in the control group.
Therefore, we should use causal language in our interpretation. Instead of saying the leader’s death is “associated with” an increase, we should say it “causes” an increase of 1.13 points.

Interpretation: We estimate that the death of a leader causes an increase of 1.13 points in the country’s polity score after the assassination attempt, on average. This estimate is valid if assassination attempts where the leader died are comparable to those where the leader survived, meaning there are no confounding variables, that is, no factors that affect both whether the leader dies and the country’s subsequent level of democracy.
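The equivalence between \(\widehat{\beta}\) and the difference-in-means estimator can be checked numerically. The base R sketch below fits lm() to a small hypothetical data frame (the values are made up for illustration and are not from the leaders dataset) and compares the coefficients with group means computed directly.

```r
# Hypothetical toy data (NOT the real leaders dataset), using the same column names
toy <- data.frame(
  died        = c(0, 0, 0, 1, 1),
  polityafter = c(-6, -2, 1, -4, 3)
)

# Linear model with a single binary predictor
fit <- lm(polityafter ~ died, data = toy)

# Group means computed directly from the data
control_mean   <- mean(toy$polityafter[toy$died == 0])
treatment_mean <- mean(toy$polityafter[toy$died == 1])

# Intercept should equal the control-group mean; slope should equal the difference-in-means
intercept_matches <- isTRUE(all.equal(unname(coef(fit)["(Intercept)"]), control_mean))
slope_matches     <- isTRUE(all.equal(unname(coef(fit)["died"]), treatment_mean - control_mean))
c(intercept_matches = intercept_matches, slope_matches = slope_matches)
```

The match is exact (up to floating-point precision) whenever the only predictor is a 0/1 variable, which is why the slope of 1.132 in the leaders model can be read as a difference-in-means.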

Exercises

The exercises are based on Stephen Ansolabehere, Shanto Iyengar, Adam Simon, and Nicholas Valentino (1994), Does Attack Advertising Demobilize the Electorate?, The American Political Science Review, 88(4): 829–838.

Click here to access the article and read the highlighted sections.

Please answer the following questions based on the highlighted sections of the text.

1. Is this a causal study? In other words, is the aim of the study to estimate the causal effect of a treatment on an outcome? Yes or no?
2. Is this a randomised experiment or an observational study? Explain your reasoning.
3. What is the treatment the study is interested in estimating the effects of? (Technically, the study is interested in the effect of two different treatments, but in this seminar we just focus on one of them.)
4. What is the outcome variable?
5. What was the unit of observation? Or, in other words, what does each observation represent?
6. How many people participated in this study? Hint: you may need to look into the table containing the results of the analysis.
7. What was the estimated average causal effect of the treatment on the outcome? In other words, what were the findings of the study? Make sure to include the assumption, why the assumption is reasonable, the treatment, the outcome, as well as the direction, size, and unit of measurement of the average treatment effect.

Answers

1. Yes, it is a causal study because it aims to estimate causal effects.
2. It is a randomised experiment because the treatment was assigned at random.
3. Exposure to a negative political TV advertisement (rather than a non-political one) is the treatment.
4. Intention to vote.
5. Each observation represents a participant or person.
6. 1,655 people.
7. Let’s start by figuring out each key element separately.

What’s the assumption? We assume that the participants who were exposed to a negative political TV advertisement (the treatment group) were comparable to the participants who were exposed to a non-political TV advertisement (the control group). If this assumption were not true, the difference-in-means estimator would NOT produce a valid estimate of the average treatment effect.

Why is the assumption reasonable? Because exposure to negative political TV advertisements was assigned at random (equivalently, because the data come from a randomised experiment). Remember that random assignment of the treatment makes the treatment and control groups identical to each other, on average, in all observed and unobserved pre-treatment characteristics.

What’s the treatment? Being exposed to a negative political TV advertisement.

What’s the outcome? Intention to vote.

What’s the direction, size, and unit of measurement of the average causal effect? A decrease of 2.5 percentage points, on average. It is a decrease because the difference-in-means estimator, which measures the change in the outcome variable caused by the treatment, is negative. The unit is percentage points because the outcome is binary (a participant either intends to vote or does not), so each group’s average is a proportion.

The difference-in-means estimator = (proportion of participants who intend to vote among those exposed to negative political TV advertisements) - (proportion of participants who intend to vote among those exposed to non-political TV advertisements) ≈ 58% - 61% ≈ -2.5 percentage points.

Assuming that the participants who were exposed to a negative political TV advertisement were comparable to the participants who were exposed to a non-political TV advertisement (a reasonable assumption since negative political TV advertisements were assigned at random), we estimate that being exposed to a negative political TV advertisement decreases intention to vote by about 2.5 percentage points, on average.
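With a binary outcome, the calculation can be sketched in base R. The responses below are made up for illustration and are not the study’s data: 1 means the participant intends to vote and 0 means they do not.

```r
# Hypothetical responses (NOT the study's data): 1 = intends to vote, 0 = does not
negative_ad      <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0)  # treatment group
non_political_ad <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1)  # control group

# The mean of a 0/1 variable is the proportion of 1s
prop_treatment <- mean(negative_ad)       # 6 of 10 intend to vote
prop_control   <- mean(non_political_ad)  # 7 of 10 intend to vote

# Difference-in-means, converted from a difference in proportions to percentage points
diff_pp <- (prop_treatment - prop_control) * 100
diff_pp
```

For these made-up responses, the estimate would be a decrease of 10 percentage points.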
