Seminar - Week 08

Author

Sebastian Koehler

Published

March 12, 2026

Materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).

Seminar Objectives

  • Controlling for confounders using a multiple linear regression model

Getting Started

  1. Create an R script to keep track of your code. In RStudio, you can open a new script by clicking File > New File > R Script.

  2. Save your script by clicking File > Save As and saving it in your POL272 folder with the name seminar7.R.

  3. Clear your environment so that you do not accidentally use objects from previous work. You can do this by clicking on the broom icon in the Environment tab.

  4. Set the working directory by clicking on Session > Set Working Directory > Choose Directory. Navigate to your POL272 folder and click Open.

  5. Load the leaders.csv dataset:

Code
# Load the data (path relative to the POL272 working directory set in step 4)
leaders <- read.csv("data/leaders.csv")

In this analysis, we explore the causal effect of the death of a leader on the level of democracy in a country. Using data from leaders.csv, we will:

  1. Estimate the effect of leader death (died) on democracy scores after an assassination attempt (polityafter) using simple linear regression.
  2. Control for potential confounding variables, specifically prior democracy scores (politybefore), using multiple linear regression.
  3. Interpret the differences between these models and discuss implications for causal inference.

To present and interpret our regression models more effectively, we will be using the screenreg() function from the texreg package. This function allows us to neatly display multiple regression models side by side, making it easier to compare coefficients.

Install and load the texreg package by running:

Code
# Install texreg (you only need to do this once per computer)
install.packages("texreg")
Code
# Load texreg 
library(texreg)
Version:  1.39.5
Date:     2025-12-21
Author:   Philip Leifeld (University of Manchester)

Consider submitting praise using the praise or praise_interactive functions.
Please cite the JSS article in your publications -- see citation("texreg").

Simple Linear Regression

First, let’s estimate the effect of died on polityafter without any control variables, as we did last week.

Code
# Fit a simple linear regression model
model_simple <- lm(polityafter ~ died, data = leaders)

# Display results
screenreg(model_simple)

=======================
             Model 1   
-----------------------
(Intercept)   -1.89 ***
              (0.47)   
died           1.13    
              (1.00)   
-----------------------
R^2            0.01    
Adj. R^2       0.00    
Num. obs.    250       
=======================
*** p < 0.001; ** p < 0.01; * p < 0.05

Interpretation:

  • The coefficient for died is the estimated difference in average democracy scores after the assassination attempt (polityafter) between countries where the leader died and countries where the leader survived.
  • If whether the leader died were determined at random, this coefficient would be an unbiased estimate of the causal effect of leader death.
  • However, other factors (e.g., a country’s prior democracy level) might influence both the likelihood of an assassination attempt succeeding and democracy scores afterward.
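Because died is binary, the fitted simple regression line recovers the two group averages directly. Plugging the rounded Model 1 estimates into the fitted line:

$$
\widehat{\text{polityafter}} = -1.89 + 1.13 \times \text{died}
$$

$$
\text{died} = 0:\quad -1.89 + 1.13 \times 0 = -1.89 \qquad\qquad \text{died} = 1:\quad -1.89 + 1.13 \times 1 = -0.76
$$

The intercept (-1.89) is the average polityafter among countries where the leader survived, and the slope (1.13) is the gap between the two group averages: -0.76 - (-1.89) = 1.13.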

Were countries with successful assassination attempts more or less democratic before the attempt compared to countries where the leader survived?

We can check this by comparing the mean of politybefore between the two groups (died = 0 and died = 1):

Code
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::extract() masks texreg::extract()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
leaders %>% 
  group_by(died) %>% 
  summarise(mean_politybefore = mean(politybefore))
# A tibble: 2 × 2
   died mean_politybefore
  <int>             <dbl>
1     0            -1.74 
2     1            -0.704

Countries where the assassination attempt succeeded were, on average, somewhat more democratic before the attempt (mean politybefore of -0.70) than countries where the leader survived (mean of -1.74), a gap of about 1.04 points.

Since politybefore (a country's democracy score before the assassination attempt) is likely to affect both the likelihood that the attempt succeeds (died) and the level of democracy after the attempt (polityafter), it is a potential confounder in our analysis.

Multiple Linear Regression

To adjust for potential confounding, we add politybefore to the model.
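Written out in the usual textbook notation (with $\epsilon_i$ as the error term for country $i$), the multiple regression model fitted next is:

$$
\text{polityafter}_i = \alpha + \beta_1 \, \text{died}_i + \beta_2 \, \text{politybefore}_i + \epsilon_i
$$

Here $\beta_1$ is the expected difference in polityafter between countries where the leader died and countries where the leader survived, holding politybefore constant.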

Code
# Fit a multiple linear regression model
model_controlled <- lm(polityafter ~ died + politybefore, data = leaders)

# Display results for both models
screenreg(list(model_simple, model_controlled))

====================================
              Model 1     Model 2   
------------------------------------
(Intercept)    -1.89 ***   -0.43    
               (0.47)      (0.27)   
died            1.13        0.26    
               (1.00)      (0.57)   
politybefore                0.84 ***
                           (0.04)   
------------------------------------
R^2             0.01        0.69    
Adj. R^2        0.00        0.68    
Num. obs.     250         250       
====================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Comparing the Two Models:

  • Without controls: the estimated effect of died is 1.13, suggesting that the death of the leader increases the country's polity score after the assassination attempt by 1.13 points, on average.

  • With controls: once we hold politybefore constant, the estimate drops to 0.26, suggesting that much of the previous estimate was driven by pre-existing differences in democracy levels rather than by the death of the leader.
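One way to see where the drop comes from is the omitted-variable identity for least squares: when both models are fit to the same observations, the simple-regression slope equals the controlled slope plus the politybefore coefficient times the difference in mean politybefore between the two groups. Using the rounded Model 2 estimates and the group means computed earlier (and assuming those means come from the same 250 observations used in the models), the numbers line up:

$$
\underbrace{1.13}_{\text{died, Model 1}} \approx \underbrace{0.26}_{\text{died, Model 2}} + \underbrace{0.84}_{\text{politybefore, Model 2}} \times \underbrace{\big(-0.704 - (-1.74)\big)}_{=\,1.036} = 0.26 + 0.87 = 1.13
$$

In other words, roughly 0.87 of the 1.13 points in Model 1 reflects the fact that successful attempts happened in countries that were already more democratic.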

Interpreting the coefficient for politybefore

For every one-unit increase in a country's prior democracy score (politybefore), the democracy score after the assassination attempt (polityafter) is predicted to increase by 0.84 points, on average, holding died constant.
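As a worked illustration, the fitted Model 2 line (rounded estimates) is shown below, evaluated for two hypothetical countries where the leader survived (died = 0), one with politybefore = 0 and one with politybefore = 1:

$$
\widehat{\text{polityafter}} = -0.43 + 0.26 \times \text{died} + 0.84 \times \text{politybefore}
$$

$$
-0.43 + 0.84 \times 0 = -0.43 \qquad\qquad -0.43 + 0.84 \times 1 = 0.41 \qquad\qquad 0.41 - (-0.43) = 0.84
$$

Because died is the same for both countries, the entire predicted difference comes from the politybefore coefficient.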

Democracy scores before and after the assassination attempt are strongly and positively related: adding politybefore raises the R^2 from 0.01 to 0.69. Democracy levels are therefore highly persistent, and once we account for this persistence, a successful assassination appears to make only a small difference (0.26 points) to the average level of democracy in the countries considered here.

Conclusion

This analysis highlights the importance of controlling for confounders:

  • In simple regression with a binary treatment variable such as died, the slope coefficient is equivalent to the difference-in-means estimator.
  • In multiple regression, we adjust for observed confounders, obtaining a more credible causal estimate (although confounders we do not measure can still bias it).
  • The drop in the coefficient for died from 1.13 to 0.26 suggests that failing to control for confounders can bias estimates of causal effects; here, the simple regression overestimated the effect.
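These points can be checked on simulated data. The sketch below is a hypothetical example: the variable names y, d and x, the sample size, and all coefficients are made up for illustration and are not taken from leaders.csv. It uses only base R, so it runs without tidyverse or texreg.

```r
# Hypothetical simulated data (NOT the leaders dataset), used only to
# illustrate the conclusions above with base R.
set.seed(272)
n <- 250

# x: a pre-treatment score that affects both treatment and outcome (a confounder)
x <- rnorm(n, mean = -1, sd = 7)

# d: binary treatment, made more likely when x is higher
d <- rbinom(n, size = 1, prob = plogis(-1.5 + 0.1 * x))

# y: outcome; the true effect of d is set to 0.3 by construction
y <- -0.5 + 0.3 * d + 0.85 * x + rnorm(n, sd = 3)

sim <- data.frame(y, d, x)

# 1. Simple regression slope vs. difference in means
fit_simple <- lm(y ~ d, data = sim)
b_simple   <- unname(coef(fit_simple)["d"])
diff_means <- mean(sim$y[sim$d == 1]) - mean(sim$y[sim$d == 0])

# 2. Multiple regression that adjusts for the confounder x
fit_controlled <- lm(y ~ d + x, data = sim)
b_controlled   <- unname(coef(fit_controlled)["d"])
b_x            <- unname(coef(fit_controlled)["x"])

# 3. Omitted-variable identity:
#    simple slope = controlled slope + b_x * (gap in mean x between groups)
gap_x     <- mean(sim$x[sim$d == 1]) - mean(sim$x[sim$d == 0])
ovb_check <- b_controlled + b_x * gap_x

print(round(c(simple = b_simple, diff_means = diff_means,
              controlled = b_controlled, ovb_check = ovb_check), 3))
```

The first two printed values match exactly, as do the first and last. Because treated units have higher x by design, the simple estimate will typically sit well above the true value of 0.3, while the controlled estimate lands much closer to it; the exact numbers depend on the random seed.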

Exercise

In this exercise, you will work in groups to design a research study that explores the causal relationship between two variables. This will help you think critically about research design, measurement, and potential confounders.

  1. Choose a Research Question

Each group should select one of the following research questions to focus on:

  • Does social media use increase political polarization?
  • Does exposure to misinformation impact attitudes toward democracy?
  • Does watching debates between candidates affect vote choice?
  • Does police brutality decrease trust in state institutions?
  • Does the presence of female politicians increase trust in government?

  2. Identify Your Variables

  • What is the independent variable (X)? This is the variable that you believe influences the outcome. Clearly define it and describe how you might measure it.

  • What is the dependent variable (Y)? This is the outcome that your study aims to explain. Clearly define it and describe how you might measure it.

  3. Identify Potential Confounders

List at least three confounding variables: variables that are related to both the independent and the dependent variable and could distort the estimated causal effect.

Justify why these are confounders. Explain how each could influence both the treatment and the outcome.

💡 Example: If studying social media use and polarization, education could be a confounder, since higher education might influence both social media habits and levels of polarization.

  4. Study Design – Estimating the Effect

Now, decide how you would design a study to estimate the effect of X on Y while accounting for confounders (nothing too detailed, just a rough idea):

  • Would you use an experiment or an observational study?
  • If an experiment, how would you assign treatment?
  • If an observational study, how would you control for confounders?