The presence of outliers can lead to inflated error rates and substantial distortions of parameter and statistic estimates when using either parametric or nonparametric tests (e.g., Zimmerman, 1994, 1995, 1998). Casual observation of the literature suggests that researchers rarely report checking for outliers of any sort. This inference is supported empirically by Osborne, Christiansen, and Gunter (2001), who found that authors of articles in top-tier Educational Psychology journals reported testing assumptions of the statistical procedure(s) used in their studies--including checking for the presence of outliers--only 8% of the time. Given what we know of the importance of assumptions to accuracy of estimates and error rates, this in itself is troubling. There is no reason to believe that the situation is different in other social science disciplines.
Although definitions vary, an outlier is generally considered to be a data point that is far outside the norm for a variable or population (e.g., Jarrell, 1994; Rasmussen, 1988; Stevens, 1984). Hawkins described an outlier as an observation that “deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism” (Hawkins, 1980, p. 1). Outliers have also been defined as values that are “dubious in the eyes of the researcher” (Dixon, 1950, p. 488) and as contaminants (Wainer, 1976).
Wainer (1976) also introduced the concept of the “fringelier,” referring to “unusual events which occur more often than seldom” (p. 286). These points lie near three standard deviations from the mean and hence may have a disproportionately strong influence on parameter estimates, yet are not as obvious or easily identified as ordinary outliers due to their relative proximity to the distribution center. As fringeliers are a special case of outlier, for much of the rest of the paper we will use the generic term “outlier” to refer to any single data point of dubious origin or disproportionate influence.
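To give a rough sense of how often points near three standard deviations might arise, the sketch below computes the two-tailed probability of falling beyond 3 SD under the assumption of a perfectly normal distribution (an assumption made only for illustration, not a claim from Wainer). The function name is hypothetical, and the sample sizes of 52 and 416 are borrowed from the data files described at the end of this section.

```python
from math import erfc, sqrt

def two_tailed_normal_prob(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return erfc(z / sqrt(2))

# Assumption: the variable is exactly normally distributed.
p = two_tailed_normal_prob(3.0)
print(f"P(|z| > 3) = {p:.4f}")

# Expected number of such points in samples of two sizes.
for n in (52, 416):
    print(f"N={n}: expected points beyond 3 SD = {n * p:.2f}")
```

Under normality the probability is about 0.27%, so a sample of 416 would be expected to contain roughly one such point by chance alone, which is consistent with the idea that these values are unusual but not vanishingly rare.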
Outliers can have deleterious effects on statistical analyses. First, they generally serve to increase error variance and reduce the power of statistical tests. Second, if non-randomly distributed they can decrease normality (and in multivariate analyses, violate assumptions of sphericity and multivariate normality), altering the odds of making both Type I and Type II errors. Third, they can seriously bias or influence estimates that may be of substantive interest (for more information on these points, see Rasmussen, 1988; Schwager & Margolin, 1982; Zimmerman, 1994).
Screening data for univariate, bivariate, and multivariate outliers is simple in these days of high-powered personal computing. The consequences of not doing so can be substantial.
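As one illustration of how little effort univariate screening can take, the following minimal sketch flags values whose z scores exceed a cutoff. The cutoff of 3 standard deviations, the function name, and the example data are all assumptions chosen for illustration; they are not a procedure prescribed by the sources cited above, and other screening rules may suit a given data set better.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return (index, value, z) for points whose |z| exceeds threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return []  # no variability, so nothing can be flagged
    flagged = []
    for i, x in enumerate(values):
        z = (x - m) / s
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged

# Hypothetical scores: twenty values near 50 plus one extreme value.
scores = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 48, 52, 50, 49, 51, 50, 47, 53, 50, 150]
for index, value, z in flag_outliers(scores):
    print(f"index {index}: value {value}, z = {z:.2f}")
```

Note that the mean and standard deviation used here are themselves computed with the extreme value included, so in very small samples a single outlier can inflate the standard deviation enough to mask itself.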
Below are several data files in SPSS format:
ANOVA
Sample size=52
Sample size=416
You can use these files to replicate some of the analyses presented in the chapter. Each data file contains two cells, with a mean and standard deviation for each (mean1, SD1, mean2, SD2) and a t value for the difference between the two. Means 1 and 2 and SDs 1 and 2 are the cell means and standard deviations before outliers were removed. Means 3 and 4, SDs 3 and 4, and t 2 are the means, standard deviations, and t value after outliers were removed from the data. We provide data for small samples (N=52) and large samples (N=416) that contain outliers in the same proportion. Compare the results before and after outlier removal, and compare what happens when outliers are present in one cell versus both cells; you should see that the estimates become more accurate once outliers are removed.
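For readers who want to check a t value from the summary statistics outside SPSS, the sketch below computes an independent-samples t from two means, two standard deviations, and two cell sizes. It assumes a pooled-variance (equal variances) t test, which may or may not be the exact test used to produce the t values in the files. The function name, the numbers, and the equal split of N=52 into two cells of 26 are hypothetical and are not taken from the data files.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error

# Hypothetical values: the same mean difference with inflated SDs
# (as outliers might produce) and with smaller SDs (after removal).
t_with_outliers = pooled_t(10.0, 4.0, 26, 12.0, 4.0, 26)
t_after_removal = pooled_t(10.0, 2.0, 26, 12.0, 2.0, 26)
print(f"t with inflated SDs: {t_with_outliers:.2f}")
print(f"t with smaller SDs:  {t_after_removal:.2f}")
```

In this made-up case the mean difference is held constant and only the standard deviations change, so halving the standard deviations doubles the magnitude of t. In real data, removing outliers will usually shift the means as well, so the before and after comparison can move in either direction.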