Chapter 21. Best Practices in Analyzing Count Data: Poisson Regression

By E. Michael Nussbaum, Sherif Elsadat, & Ahmed H. Khago, University of Nevada, Las Vegas

In the social sciences, researchers often use variables that take the form of counts, such as the number of self-explanations generated in a think-aloud protocol, the number of teachers leaving the profession in a given year, or the number of student absences per semester. Counts are typically treated as interval/ratio variables, because the distances between points on these variables are constant and they have a true zero point. These variables are then usually analyzed using traditional methods such as t-tests, ANOVA, or regression by ordinary least squares (OLS).

Unfortunately, count data are often highly skewed, which makes Poisson methods more appropriate for analyzing these variables, specifically measuring degree of fit to a Poisson distribution and using Poisson regression. The Poisson is a skewed, nonnegative distribution that is especially suited to low-frequency count variables, where many of the counts are zero and/or the distribution is positively skewed. With count variables, Poisson methods are often more statistically powerful than traditional methods when the population distribution is skewed and approximates the Poisson distribution. Use of Poisson methods in social science research has nevertheless been relatively rare to date, primarily because of a lack of understanding of the underlying theory. This chapter therefore provides an introductory overview of Poisson regression.
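As a rough illustration of why low-frequency counts are skewed, the following Python sketch computes a few properties of the Poisson distribution directly from its probability function, P(Y = k) = exp(-lambda) * lambda^k / k!. The function names and the two example means (0.5 and 20) are assumptions chosen for illustration; they do not come from the chapter.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_summary(lam: float, k_max: int = 100) -> dict:
    """Numerically summarize a Poisson distribution, truncated at k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    skew = sum((k - mean) ** 3 * p for k, p in enumerate(probs)) / var**1.5
    return {"p_zero": probs[0], "mean": mean, "variance": var, "skewness": skew}

# Hypothetical means: a rare behavior (0.5 per unit) and a common one (20 per unit).
low = poisson_summary(0.5)
high = poisson_summary(20.0)

for label, s in (("mean 0.5", low), ("mean 20", high)):
    print(label, {name: round(value, 4) for name, value in s.items()})
```

Two features of the output are worth noting. The mean and variance are equal, which is the defining property of the Poisson. The skewness equals 1/sqrt(lambda), so the distribution is strongly skewed with many zeros when the mean is small and becomes closer to symmetric as the mean grows.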

Example #1: Voter turnout data (from the chapter).

Example #2: Nonmelanoma skin cancer (from Dr. Nussbaum's web site)

Data on a rare event with a low probability of occurrence. Here the Poisson distribution is used to approximate the binomial. Both nominal (city) and interval/ratio (age) variables are used as predictors. An offset is also used.
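The two ideas in this description, the Poisson approximation to the binomial and the offset, can be sketched in Python as follows. Every number here is made up for illustration: the population size, the event probability, and the coefficients `b0`, `b_age`, and `b_city` are hypothetical and are not estimates from the skin cancer data.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of k events among n independent people, each with risk p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical rare event: 10,000 people at risk, probability 0.0003 each.
n, p = 10_000, 0.0003
lam = n * p  # expected number of cases

# Largest pointwise difference between the two distributions for k = 0..20.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print("largest difference in probability:", max_gap)

def expected_count(population: float, age: float, city: int) -> float:
    """Expected cases under a log-linear model with log(population) as the offset.

    log(mu) = log(population) + b0 + b_age * age + b_city * city
    The offset enters with its coefficient fixed at 1, so the model describes
    a rate (cases per person) rather than a raw count.
    """
    b0, b_age, b_city = -8.0, 0.05, 0.3  # hypothetical coefficients
    return math.exp(math.log(population) + b0 + b_age * age + b_city * city)

# Doubling the population at risk doubles the expected count; the rate is unchanged.
print(expected_count(1_000, age=50, city=1), expected_count(2_000, age=50, city=1))
```

When n is large and p is small, the binomial and Poisson probabilities nearly coincide, which is why the Poisson can stand in for the binomial with rare events. The offset accounts for groups of different sizes: without it, a city and age group with more people would appear to have a higher risk simply because it has more people in whom a case can occur.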