Chapter 33: Introduction to Bayesian Modeling for Social Sciences

In the context of statistical problems, the frequentist (or empirical) interpretation of probability has historically played a predominant role. In this approach, probability is defined as the limiting frequency of occurrence in an infinitely repeated experiment. The underlying assumption is that of a “fixed” concept of probability, which is unknown but can be theoretically disclosed by means of repeated trials, under the same experimental conditions.

However, although the frequentist approach still plays the role of the standard in various applied areas, there are many other possible conceptualizations of probability that characterize different philosophies behind the problem of statistical inference. Among these, an increasingly popular one is the Bayesian (also referred to as subjectivist) approach, which originated in the posthumous work of Reverend Thomas Bayes (1763); see Senn (2003) or Fienberg (2006) for a historical account of Bayesian theory.
The main feature of this approach is that probability is interpreted as a subjective degree of belief in the occurrence of an event, representing the individual level of uncertainty in its actual realization (cf. de Finetti 1974, probably the most comprehensive account of subjective probability). One of the main implications of subjectivism is that there is no requirement that one should be able to specify, or even conceive of, some relevant sequence of repetitions of the event in question, as happens in the frequentist framework. The advantage is that “one-off” events can be assessed consistently.


In the Bayesian philosophy, the probability assigned to any event depends also on the individual whose uncertainty is being taken into account and on the state of background information underlying this assessment. Varying any of these factors might change the probability. Consequently, under the subjectivist view, there is no assumption of a unique, correct (or “true”) value for the probability of any uncertain event (Dawid 2005). Rather, each individual is entitled to their own subjective probability, and they update their belief as evidence becomes sequentially available.


Bayesian methods are not new to the social sciences (from Phillips 1973 to Iversen 1984, Gill 2002, Berger 2005, Efron 2005 and Raftery 2005), but they are not yet systematically integrated into most social science research. This may be due to the common perception among practitioners that Bayesian methods are “more complex”.


In fact, in our opinion the apparent higher degree of complexity is more than compensated for by at least the following two consequences. First, Bayesian methods make it possible to take into account, through a formal model, all the available information, such as the results of previous studies. Second, the inferential process is straightforward, as it is possible to make probabilistic statements directly on the quantities of interest (i.e. some unobservable feature of the process under study, typically represented by a set of parameters).


Despite their subjectivist nature, in our opinion Bayesian methods allow the practitioner to make the most of the evidence. In precisely the situation of “repeated trials”, after observing the outcomes (successes and failures) of many past trials, and assuming no other source of information, different individuals will be drawn to an assessment of the probability of success on the next trial that is extremely close to the observed proportion of successes so far. However, if past data are not sufficiently extensive, it may reasonably be argued that there should indeed be scope for interpersonal disagreement as to the implications of the evidence. In this sense, the Bayesian approach provides a more general framework for the problem of statistical inference.
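
The convergence argument above can be illustrated with a small numerical sketch. It is not taken from the chapter's worked examples: it assumes, purely for illustration, that each individual expresses their prior belief about the probability of success as a Beta(a, b) distribution and updates it with the observed successes and failures. The two individuals, their prior parameters and the trial counts are hypothetical.

```python
def predictive_success(a, b, successes, failures):
    """Probability that the next trial is a success, given a Beta(a, b)
    prior (an illustrative assumption) updated with the observed counts.
    This is the mean of the Beta(a + successes, b + failures) posterior."""
    return (a + successes) / (a + b + successes + failures)


# Two hypothetical individuals with very different prior beliefs.
priors = {
    "sceptic": (1.0, 9.0),   # prior mean 0.1
    "optimist": (9.0, 1.0),  # prior mean 0.9
}


def compare(successes, failures):
    """Predictive probability of success for each individual."""
    return {
        name: predictive_success(a, b, successes, failures)
        for name, (a, b) in priors.items()
    }


# Same observed proportion of successes (0.3), different amounts of data.
few = compare(successes=3, failures=7)         # 10 trials
many = compare(successes=3000, failures=7000)  # 10,000 trials

for label, result in (("10 trials", few), ("10,000 trials", many)):
    for name, p in result.items():
        print(f"{label:>13} | {name:8s} | P(next success) = {p:.4f}")
```

With only 10 trials the two assessments remain far apart, which reflects the scope for interpersonal disagreement; with 10,000 trials both are very close to the observed proportion of 0.3, whatever the starting belief.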


In order to facilitate comprehension, we shall present two worked examples and switch between theory and practice in every section. In the first part of the chapter, we consider data about non-attendance at school for a set of Australian children, with additional information about their race (Aboriginal, white) and age band also included. We use this data set to present some of the main features of Bayesian reasoning and to follow the development of the simplest form of models (conjugate analysis). In the last sections of the chapter, we describe a more realistic model for the analysis of SAT score data. The main objective of this analysis is to develop a more complex model combining information for a number of related variables, using the simulation techniques of Markov Chain Monte Carlo.