Introduction:
Many studies in social science that aim to estimate the effect of an intervention suffer from selection bias, where the units that receive the treatment may have different characteristics from those in the control condition. These pre-existing differences between the groups must be controlled for to obtain approximately unbiased estimates of the effects of interest. For example, in a study estimating the effect of bullying on high school graduation, students who were bullied are likely to be very different from students who were not bullied on a wide range of characteristics, such as socioeconomic status and academic performance, even before the bullying began. It is crucial to try to separate the causal effect of the bullying from the effect of these pre-existing differences between the “treated” and “control” groups. Matching methods provide a way to attempt to do so.
Random assignment of units to receive (or not receive) the treatment of interest ensures that there are no systematic differences between the treatment and control groups before treatment assignment. However, random assignment is often infeasible in social science research, due to either ethical or practical concerns. Matching methods constitute a growing collection of techniques that attempt to replicate, as closely as possible, the ideal of randomized experiments when using observational data.
There are two key ways in which the matching methods we discuss replicate a randomized experiment. First, matching aims to select subsamples of the treated and control groups that are only randomly different from one another on all observed covariates. In other words, matching seeks to identify subsamples of treated and control units that are “balanced” with respect to observed covariates: the observed covariate distributions are the same in the treatment and control groups. The methods described in this chapter examine how best to choose subsamples from the original treated and control groups such that the distributions of covariates in the matched groups are substantially more similar than in the original groups, when this is possible. A second crucial similarity is that the study has two clear stages. The first stage is design, in which the units to be compared are selected without use of the values of the outcome variables. As in the design of a randomized experiment, the matches are chosen without access to any of the outcome data, thereby preventing intentional or unintentional bias in selecting a particular matched sample to achieve a desired result. Only after the design is set does the second stage begin, which involves the analysis of the outcome, estimating treatment effects using the matched sample. We discuss only propensity score methods that are applicable at the design stage, in the sense that they do not involve any outcome data. Some methods that use propensity scores, including some weighting techniques, can involve outcome data, and such methods are not discussed here.
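The design stage described above can be illustrated with a minimal sketch. It is not the chapter's procedure or the algorithm of any particular package. It assumes a single hypothetical covariate, greedy 1:1 nearest-neighbor matching without replacement, and a standardized mean difference scaled by the treated-group standard deviation (one common convention among several). All function names and data values are invented for illustration, and no outcome data appear anywhere in the code.

```python
from statistics import mean, stdev


def standardized_mean_difference(treated, control):
    """Difference in group means, scaled by the treated-group standard deviation.

    Assumption: scaling by the treated-group SD is one common convention;
    other denominators (for example a pooled SD) are also used.
    """
    return (mean(treated) - mean(control)) / stdev(treated)


def greedy_nearest_neighbor_match(treated, control):
    """Greedy 1:1 matching without replacement on a single covariate.

    Each treated value, taken in the order given, is paired with the closest
    control value that has not been used yet. Requires at least as many
    controls as treated units.
    """
    available = list(control)
    matched = []
    for t in treated:
        best = min(available, key=lambda c: abs(c - t))
        available.remove(best)
        matched.append(best)
    return matched


# Hypothetical pre-treatment covariate values (say, a prior test score).
# Only covariates are used at the design stage; outcomes are never consulted.
treated_x = [52.0, 55.0, 58.0, 61.0, 64.0]
control_x = [30.0, 35.0, 40.0, 45.0, 50.0, 53.0, 56.0, 59.0, 62.0, 66.0]

matched_control_x = greedy_nearest_neighbor_match(treated_x, control_x)

smd_before = standardized_mean_difference(treated_x, control_x)
smd_after = standardized_mean_difference(treated_x, matched_control_x)

print("matched controls:", matched_control_x)
print("SMD before matching:", round(smd_before, 3))
print("SMD after matching:", round(smd_after, 3))
```

With these made-up values, the absolute standardized mean difference is smaller in the matched sample than in the full sample, which is the sense of "balance" used above. In practice matching is usually done on many covariates at once, often through a propensity score, and balance is checked on each covariate before any outcome analysis begins.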
This chapter reviews the diverse literature on matching methods, with particular attention paid to providing practical guidance based on applied and simulation results that indicate the potential of matching methods for bias reduction in observational studies. We first provide an introduction to the goal of matching and a very brief history of these methods (Section 1). Section 2 presents the theory and motivation behind propensity scores, discussing how they are a crucial component of the use of matching methods. We then discuss other methods of controlling for covariates in observational studies, such as regression analysis, and why matching methods (particularly when combined with regression) are more effective (Section 3). Sections 5 through 7 discuss the implementation of matching methods, including challenges and evaluations of their performance, concluding with recommendations for researchers and a discussion of software available (Section 8). Throughout the chapter we motivate the methods using data from the National Supported Work Demonstration (LaLonde, 1986; Dehejia and Wahba, 1999).
1. The LaLonde/Dehejia and Wahba dataset used to illustrate methods: http://www.nber.org/~rdehejia/nswdata.html
2. The “MatchIt” R library used to do the matching in the paper: http://gking.harvard.edu/matchit/. Other software packages are referenced at the end of the chapter.
Links provided by the authors:
Matching software for R (http://www.r-project.org):
MatchIt: http://gking.harvard.edu/matchit
Matching: http://sekhon.berkeley.edu/matching
twang: http://cran.r-project.org/doc/packages/twang.pdf
Matching software for Stata:
psmatch2: http://www1.fee.uva.nl/scholar/mdw/leuven/stata, http://econpapers.repec.org/software/bocbocode/S432001.html
pscore: http://www.lrz-muenchen.de/~sobecker/pscore.html
match: http://emlab.berkeley.edu/users/imbens/statamatching.pdf
Matching software for SAS:
1:1 propensity score matching: http://www2.sas.com/proceedings/sugi26/p214-226.pdf
1:1 Mahalanobis matching within propensity score calipers: http://www.lexjansen.com/pharmasug/2006/publichealthresearch/pr05.pdf
Variable ratio matching: http://mayoresearch.mayo.edu/mayo/research/biostat/upload/vmatch.sas