Chapter 18: Robust Methods for Detecting Associations

Two of the best-known methods for detecting and describing an association between two or more variables are Pearson's correlation and least squares regression. As is well known, there are conditions under which these methods provide a satisfactory summary of the data, and under which the inferential techniques typically used with them give an adequate indication of whether the variables are associated. However, these methods can be highly unsatisfactory in several fundamental ways. Briefly, non-normality, heteroscedasticity and outliers can mask true associations, can wreak havoc on Type I error rates and confidence intervals, and can give a distorted sense of how the bulk of the points are related. One immediate goal is to provide a more detailed sense of when and why these practical problems occur.
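
To make the masking problem concrete, here is a minimal sketch in Python (it is not one of the R functions discussed below). It assumes hypothetical data: 20 points lying exactly on the line y = x, plus a single outlier at (100, -100). Pearson's correlation falls from 1 to a strongly negative value, while Spearman's rho, used here only as one familiar rank-based alternative, stays clearly positive.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: 20 points lying exactly on the line y = x ...
x_clean = list(range(1, 21))
y_clean = list(range(1, 21))
# ... plus one hypothetical outlier far from the bulk of the points.
x_out = x_clean + [100]
y_out = y_clean + [-100]

print(round(pearson(x_clean, y_clean), 3))  # 1.0
print(round(pearson(x_out, y_out), 3))      # roughly -0.87
print(round(spearman(x_out, y_out), 3))     # roughly 0.73
```

Rank-based measures limit the influence of values that are extreme on one variable, but they do not protect against every configuration of outliers.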

More modern methods, aimed at correcting these known problems, are then described, and practical reasons for using them are illustrated.
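
As one example of a regression estimator that resists outliers, the sketch below computes the Theil-Sen slope (the median of the slopes between all pairs of points) and compares it with least squares. This is a minimal Python illustration on hypothetical data, not a claim about which estimator the chapter recommends; the intercept rule used here (the median of y - slope * x) is one common choice among several.

```python
from itertools import combinations
from statistics import median


def least_squares(x, y):
    """Ordinary least squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen(x, y):
    """Theil-Sen slope (median of pairwise slopes) and a median-based intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # pairs with equal x values have an undefined slope
    ]
    slope = median(slopes)
    # One common intercept choice: the median of the residuals y - slope * x.
    intercept = median([b - slope * a for a, b in zip(x, y)])
    return slope, intercept


# Hypothetical data: y = x for x = 1..20, plus one outlier at (100, -100).
x = list(range(1, 21)) + [100]
y = list(range(1, 21)) + [-100]

ls_slope, ls_intercept = least_squares(x, y)
ts_slope, ts_intercept = theil_sen(x, y)
print(round(ls_slope, 2))      # roughly -1.06: the single outlier dominates
print(ts_slope, ts_intercept)  # 1.0 0.0: the fit follows the bulk of the points
```

Because 190 of the 210 pairwise slopes equal 1, the median ignores the 20 slopes pulled negative by the outlier, whereas least squares lets that one point reverse the sign of the fitted slope.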

Software

A practical issue is being able to apply the modern methods summarized here with easy-to-use software. All of the methods described here are readily applied using functions written in R or S-PLUS. These two packages are nearly identical and provide a powerful, flexible approach to data analysis that generally makes modern methods readily accessible. R is free and can be downloaded from cran.r-project.org. The R functions for applying the methods covered here are available in two files: Rallfunv1.v4 and Rallfunv2.v4.

With version 2.2.0 of R, store these files in C:\Program Files\R\R-2.2.0\. (As new versions come out, replace 2.2.0 in this path with the new version number.) Once R is started, the command

source("Rallfunv1.v4")

will incorporate all of the functions in the file Rallfunv1.v4 into your R session; use the same command to load Rallfunv2.v4. (The packages listed on the R web site are another excellent source of functions for applying recently developed methods.) When using S-PLUS, download the files allfunv1.v4 and allfunv2.v4 instead and again use the source command. Some of the functions relevant to the methods covered here are described in Wilcox (2003, 2005). However, some of the newer functions are not covered in any book, so for convenience, their names are given here.

Exercises and other ways to investigate the issues presented in this chapter (from Dr. Wilcox's website):

There are four parts, covering basics about modern methods, methods for comparing groups, methods for studying associations, and descriptions and illustrations of my R and S+ functions. Some of the data sets are listed here. The first two, depression1 and depression2, deal with measures of depression among Palestinian youths who have or have not had a family member killed or wounded by an Israeli. These two data sets provide a particularly interesting example of how the shift function helps in understanding effect size. (Try the R function sband.) The next data set, costa, deals with Olympic athletes who compete in sprints. The goal is to understand the association between two variables, one of which has to do with the force generated as the runner leaves the blocks. Many methods find no association, but certain methods suggest that one exists. The file schiz contains measures of skin resistance from four groups of individuals, with the grouping related to schizophrenia. The file read contains measures related to predicting reading ability in children. The files pygc and pyge deal with what is called Pygmalion in the classroom and provide an interesting ANCOVA example. The data are described in the ANCOVA sections of my 2003 and 2005 books.
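
The shift function compares two groups at every quantile rather than only at the center: Δ(q) is the q-th quantile of the second group minus the q-th quantile of the first. Below is a minimal Python sketch, assuming hypothetical data (not the depression data) and decile estimates from Python's statistics.quantiles. It computes point estimates only, with no confidence band, and the function name shift_function is illustrative.

```python
from statistics import quantiles


def shift_function(x, y, n=10):
    """Point estimates of the shift function Delta(q) = Q_y(q) - Q_x(q).

    Quantiles are estimated at q = 1/n, 2/n, ..., (n-1)/n (deciles by default)
    with statistics.quantiles; no confidence band is computed.
    """
    qx = quantiles(x, n=n)
    qy = quantiles(y, n=n)
    return [(k / n, b - a) for k, (a, b) in enumerate(zip(qx, qy), start=1)]


# Hypothetical scores for two groups.
group1 = [3, 5, 6, 7, 8, 9, 10, 11, 12, 14]
group2 = [2 * v for v in group1]  # every value doubled: spread changes, not just location

for q, delta in shift_function(group1, group2):
    print(f"q = {q:.1f}: difference = {delta:.2f}")
```

A roughly constant Δ(q) suggests the groups differ mainly in location. A Δ(q) that grows with q, as in this doubled example, shows that a single difference between means or medians would hide the fact that the groups differ much more in the upper part of the distribution than in the lower part.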

The first part of the workshop, work1, discusses basic principles regarding why standard methods fail and how modern methods attempt to address known problems. Some introductory remarks about software are provided. It also includes a discussion of detecting multivariate outliers and some of the issues that arise when outliers are discarded. The portion of the workshop that focuses on comparing groups is stored in work2. The file work3 contains information and illustrations regarding regression.
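
For a single variable, one widely used outlier rule is the MAD-median rule: flag X when |X - M| / MADN > 2.24, where M is the median and MADN = MAD/0.6745. (The cutoff 2.24 is roughly the square root of the 0.975 quantile of a chi-squared distribution with one degree of freedom.) The minimal Python sketch below applies it to hypothetical data, and the function name is illustrative. It is a univariate rule only: checking each variable separately can miss a point that is unremarkable on every coordinate but unusual in combination, which is why multivariate outlier detection methods are needed.

```python
from statistics import median


def mad_median_outliers(values, cutoff=2.24):
    """Return the values with |X - M| / MADN > cutoff, where M is the median.

    MADN = MAD / 0.6745 rescales the median absolute deviation so that it
    estimates the standard deviation when the data are normal.
    """
    m = median(values)
    mad = median([abs(v - m) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the rule is undefined for these data")
    madn = mad / 0.6745
    return [v for v in values if abs(v - m) / madn > cutoff]


scores = [2, 3, 3, 4, 4, 5, 5, 6, 50]  # hypothetical data with one gross error
print(mad_median_outliers(scores))  # [50]
```

Unlike a rule based on the mean and standard deviation, both the median and the MAD are barely affected by the value 50 itself, so the outlier cannot mask its own detection.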

R and S+: An introduction to R and S-PLUS, and some illustrations of how to use my functions beyond what is in my books, are covered in work4.