Choice of Unit of Analysis (David A. Kenny)

David A. Kenny
April 18, 2024

Unit of Analysis

If your interest is in the analysis of dyadic data, see Dyadic Data Analysis by Kenny, Kashy, and Cook (2006), a book that discusses that topic.

This page provides the practicing researcher with guidance concerning the choice of the unit in the statistical analysis. I thank Charles Judd for helping me with many of the ideas on this page.

Outline
    Statement of the Problem
    Independence of Units
    Unit of Generalization
    Unit of Measurement
    Unit of Assignment or Sampling
    How Do I Conduct the Analysis?
    References

Statement of the Problem
Typically, the data structure is a matrix in which the rows are called units, usually persons, and the columns are called variables. For such a data structure, if a statistical analysis were undertaken, the row, usually the person, would be the unit of analysis.

Very often it is ambiguous what the unit of analysis should be. Many research studies have units that are grouped into clusters:

Persons in groups.
Children in classrooms.
Workers in organizations.
Persons in neighborhoods.

In each case the unit of analysis might be persons or it might be clusters. To make the cluster the unit of analysis, the researcher would compute the mean of the persons' scores within each cluster and use those means in the analysis.
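As a minimal sketch of that aggregation step, the function below collapses person-level scores into one mean per cluster, so each cluster becomes a single row in a cluster-level analysis. The function name `cluster_means`, the `(cluster_id, score)` record layout, and the data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import fmean

def cluster_means(records):
    """Average person-level scores within each cluster.

    records: iterable of (cluster_id, score) pairs, one per person.
    Returns a dict mapping cluster_id -> mean score, so that each
    cluster contributes a single row to a cluster-level analysis.
    """
    scores_by_cluster = defaultdict(list)
    for cluster_id, score in records:
        scores_by_cluster[cluster_id].append(score)
    return {c: fmean(s) for c, s in scores_by_cluster.items()}

# Hypothetical data: children (persons) nested in classrooms (clusters).
data = [("room_a", 4.0), ("room_a", 6.0),
        ("room_b", 3.0), ("room_b", 5.0), ("room_b", 10.0)]
print(cluster_means(data))  # {'room_a': 5.0, 'room_b': 6.0}
```

Five persons become two cluster-level observations, which is exactly why this choice of unit affects the degrees of freedom of any later test.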

With clustering, persons are said to be nested within clusters. Alternatively, the two different units might be crossed; that is, all combinations of the two units might be created:

Judges evaluate the same set of faces.
Persons rate the same situations.
Persons rate their happiness on different days.

In this case, there are three different possibilities for the unit of analysis. Consider for instance judges rating faces. The three possible units are judge, face, and judge by face. In the remainder of this page, only the nesting of units is discussed, and the presumption is that persons are nested in groups.
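To make the three crossed units concrete, here is a small sketch under the assumption that the ratings are stored as a judge-by-face table (`ratings[j][f]` is judge j's rating of face f). Averaging across faces gives one score per judge, averaging across judges gives one score per face, and leaving the table unaggregated treats each judge-by-face rating as the unit. The variable names and numbers are hypothetical.

```python
from statistics import fmean

# Hypothetical ratings: ratings[j][f] is judge j's rating of face f.
# Every judge rates every face, so judges and faces are crossed.
ratings = [
    [5, 3, 4],  # judge 0
    [4, 2, 3],  # judge 1
]

# Unit = judge: average each judge's row over the faces.
judge_means = [fmean(row) for row in ratings]

# Unit = face: average each face's column over the judges.
face_means = [fmean(col) for col in zip(*ratings)]

# Unit = judge by face: every single rating is its own observation.
judge_by_face = [(j, f, r) for j, row in enumerate(ratings)
                 for f, r in enumerate(row)]

print(judge_means)         # [4.0, 3.0]
print(face_means)          # [4.5, 2.5, 3.5]
print(len(judge_by_face))  # 6
```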

Independence of Units
Essential to a statistical analysis is the idea of replication, or the repeated observation of a phenomenon. For a replication to be a true replication, the observations must be independent. Standard measures of the variability of observations presume independence. Independence means that no two observations are more likely to be similar (or different) than any other two observations. Several factors can make units nonindependent (Kenny & Judd, 1986). Observations can be nonindependent because of compositional effects, common fate, and social interaction:

Compositional effects refer to the fact that sets of observations are already similar before the study even begins.
Common fate refers to the fact that sets of observations may have common causes.
Social interaction refers to direct and indirect influence between pairs of observations.

Using path analysis notation, a compositional effect is a curved line between a pair of observations, common fate is spuriousness (both observations are caused by a common variable), and social interaction is a direct effect. The nonindependence is positive if the nonindependent observations are more similar than independent observations; it is negative if the nonindependent observations are more different than independent observations. The degree of nonindependence can be viewed as a correlation coefficient, though it is not usually measured by an ordinary Pearson product-moment correlation. Alternatively, nonindependence can be estimated as a variance; for example, in the nested design the variance due to groups could be assessed. However, using a variance to assess nonindependence is problematic because a variance cannot be negative, whereas a correlation can be.
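The contrast between the two views can be written out. The notation here is an assumption for illustration: i indexes groups, j indexes the k members of a group, the variance of Y_ij is sigma squared, the group variance is sigma squared sub G, and the within-group variance is sigma squared sub W. Defined as a ratio of variances, nonindependence is bounded below by zero; defined as the correlation between two members of the same group, it can be negative, down to a bound that follows from the variance of the group total never being negative (for dyads, k = 2, that bound is -1).

```latex
\rho_{\text{var}} = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_W},
\qquad \sigma^2_G \ge 0,\ \sigma^2_W > 0
\;\Longrightarrow\; 0 \le \rho_{\text{var}} < 1

\rho_{\text{corr}} = \frac{\operatorname{Cov}(Y_{ij}, Y_{ij'})}{\sigma^2}\quad (j \ne j'),
\qquad
\operatorname{Var}\Big(\sum_{j=1}^{k} Y_{ij}\Big) = k\,\sigma^2\,\big[1 + (k-1)\,\rho_{\text{corr}}\big] \ge 0
\;\Longrightarrow\; -\frac{1}{k-1} \le \rho_{\text{corr}} \le 1
```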

To determine the unit of analysis, an assessment of whether observations are independent is often helpful. That is, observations that are thought to be nonindependent may in fact be independent. Measuring nonindependence can be complicated, but in many cases an intraclass correlation (ICC) can be used to measure its degree. For the nested design, the ICC can be estimated from a one-way analysis of variance in which the independent variable is group. Kenny and Judd (1996) discuss a wide variety of measures of nonindependence.
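A minimal sketch of that one-way ANOVA estimate follows. It assumes every group has the same size k and uses the common estimator ICC = (MS_between - MS_within) / (MS_between + (k - 1) MS_within); unequal group sizes need an adjusted k, which this sketch does not handle. The function name and the example dyads are hypothetical. Note that, unlike a variance component, this estimate can be negative, as the second example shows.

```python
from statistics import fmean

def icc_oneway(groups):
    """Intraclass correlation from a one-way ANOVA with group as the factor.

    groups: list of lists of scores, one inner list per group. This sketch
    assumes every group has the same size k (k >= 2).
    Returns (icc, f_ratio, df_between, df_within).
    """
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("this sketch needs equal group sizes of at least 2")
    n_groups = len(groups)
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Sums of squares between and within groups.
    ss_between = k * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = n_groups - 1, n_groups * (k - 1)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return icc, ms_between / ms_within, df_between, df_within

# Hypothetical dyads whose members resemble each other: positive ICC.
print(round(icc_oneway([[1, 2], [5, 6], [9, 10]])[0], 3))  # 0.969
# Hypothetical dyads whose members differ more than chance: negative ICC.
print(icc_oneway([[1, 9], [2, 8], [3, 7]])[0])  # -1.0
```

The returned F ratio is the usual test of whether the ICC is zero, referred to an F distribution with (df_between, df_within) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_ratio, df_between, df_within)` gives its p value.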

If it can be shown that units are independent by determining that the ICC is essentially zero, then person and not group can be the unit of analysis. If, however, the ICC is nonzero, then person should not be the unit of analysis.

Unit of Generalization
Another factor in deciding the unit of analysis is the level of generalization that the researcher seeks to make. Consider a researcher who measures 10 children in each of 10 classrooms in each of 10 different schools, or 1000 children in all. There are three possible levels of generalization: the child, the classroom, and the school. One simple rule is to conduct the analysis at the level at which one wants to make generalizations. So if one wants to draw conclusions about persons, person should be the unit of analysis. However, as will be seen, this simple rule cannot always be followed.

The researcher should be aware of the ecological fallacy (Robinson, 1950). The conclusions drawn from an analysis conducted at the group level may not apply at the individual level. Conversely, the conclusions from an analysis at the individual level may not apply at the group level. In principle, the analysis should be conducted at the level at which generalizations are to be made. However, there are exceptions to this rule.
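A small hypothetical data set shows how far apart the levels can be. In the sketch below (the data and the helper `pearson` are illustrative only), the correlation between group means is +1, the correlation of individuals' deviations from their group means is -1, and the correlation computed on individuals while ignoring groups is 0.8, because it mixes the between-group and within-group sources of covariation. A conclusion drawn from the group means would be exactly backwards for individuals within a group.

```python
from math import sqrt
from statistics import fmean

def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: three groups of three persons, scores on x and y.
groups = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Group level: correlate the group means of x and y.
r_between = pearson([fmean(x) for x, _ in groups.values()],
                    [fmean(y) for _, y in groups.values()])

# Individual level within groups: correlate deviations from group means.
r_within = pearson([a - fmean(x) for x, _ in groups.values() for a in x],
                   [b - fmean(y) for _, y in groups.values() for b in y])

# Individual level ignoring groups: mixes both sources of covariation.
r_total = pearson([a for x, _ in groups.values() for a in x],
                  [b for _, y in groups.values() for b in y])

print(round(r_between, 3), round(r_within, 3), round(r_total, 3))  # 1.0 -1.0 0.8
```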

Unit of Measurement
Another consideration is the unit of measurement. Again returning to the example of children, classrooms, and schools, some variables may be measured on children (e.g., achievement), some on the classroom (e.g., teacher's gender), and some on the school (e.g., school size). Just because one measures a variable at a certain level does not imply that the variable operates at that level. Consider the variable group size. Presumably this variable operates at the group level. Even if a researcher changed the unit of measurement of the variable and asked persons how big the group was, the variable would still operate at the group level, not at the individual level.

A related issue is that a researcher sometimes aggregates across units (i.e., averages) and so changes the unit of measurement. For example, to measure organizational climate, the mean of individual measures might be used. Just because the mean is at the level of the organization does not mean that it, in fact, operates at that level.

Unit of Assignment or Sampling
A final consideration in the decision about the unit of analysis is design factors. It is necessary to consider the unit by which observations are selected to enter the study or are assigned to levels of the independent variable. A good idea is to perform the statistical analysis at the level of the selection or assignment. So, for instance, if floors in a dormitory are assigned to experimental conditions, dormitory floor, not person, should be the unit of analysis. This is not a "hard-and-fast rule," just a helpful guideline. For instance, individuals may be the unit of assignment, but if individuals interact with one another, then it may not be possible to use individual as the unit of analysis.

How Do I Conduct the Analysis?
As discussed in Kenny (1995), there are three major approaches to the unit of analysis question when persons are nested within groups (or observations are nested within persons). The first two are:

Aggregation: Determine the lowest level at which observations are independent and then average the scores of both the causal and outcome measures at that level. For instance, suppose children are nested in classrooms, which are nested in schools; child is the lowest level and school is the highest level. If there are no school or classroom effects, then child would be the unit of analysis. If there are no school effects but there are classroom effects, then classroom would be the unit of analysis. If there are school effects, then school would be the unit of analysis. This strategy is advisable when the causal variable is measured at the level of aggregation or when most of the variation in the causal variable is at that level. In the former case, all scores on the causal variable within a unit have the same value before they are aggregated.
Within analysis: Determine the lowest level at which observations are not independent and conduct the analysis separately within each of these units (e.g., within each classroom or within each school). Save the estimates from these separate analyses and then test whether the mean of the estimates differs from zero. This strategy is advisable if the causal variable varies considerably within the nonindependent units. For instance, if classrooms were not independent and gender of student was an independent variable, then one would compute the mean difference between boys and girls for each classroom. One complication is that some of these estimates are often more precise than others (e.g., because they are based on more children), so the analysis should give more weight to the more precise estimates.
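The two strategies above can be sketched in a few lines each. This is an illustration under stated assumptions, not the author's procedure: `aggregate_and_regress` averages a causal variable x and outcome y within each unit and fits an ordinary least-squares line with the unit as the observation; `within_analysis` computes the boy-girl mean difference in each classroom and tests the mean difference against zero with a one-sample t test. The weights in the weighted mean are an assumed choice: if error variance is equal in every classroom, a difference's sampling variance is proportional to 1/n_boys + 1/n_girls, so each estimate is weighted by the inverse of that quantity. All names and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean, stdev

def aggregate_and_regress(rows):
    """Aggregation: average x and y within each unit, then regress mean y
    on mean x by ordinary least squares with the unit as the observation.

    rows: iterable of (unit_id, x, y), one per person.
    Returns (intercept, slope, n_units).
    """
    xs, ys = defaultdict(list), defaultdict(list)
    for unit, x, y in rows:
        xs[unit].append(x)
        ys[unit].append(y)
    mx = [fmean(xs[u]) for u in xs]
    my = [fmean(ys[u]) for u in xs]
    gx, gy = fmean(mx), fmean(my)
    sxx = sum((a - gx) ** 2 for a in mx)
    sxy = sum((a - gx) * (b - gy) for a, b in zip(mx, my))
    slope = sxy / sxx
    return gy - slope * gx, slope, len(mx)

def within_analysis(classrooms):
    """Within analysis: boy-girl mean difference in each classroom, then a
    one-sample t test of whether the mean difference is zero.

    classrooms: dict mapping classroom_id -> (boys_scores, girls_scores);
    every classroom needs at least one boy and one girl.
    Returns (mean_diff, t, df, weighted_mean_diff).
    """
    diffs, weights = [], []
    for boys, girls in classrooms.values():
        diffs.append(fmean(boys) - fmean(girls))
        # Assumed weighting: inverse of (1/n_boys + 1/n_girls).
        weights.append(1 / (1 / len(boys) + 1 / len(girls)))
    n = len(diffs)
    mean_diff = fmean(diffs)
    t = mean_diff / (stdev(diffs) / sqrt(n))
    weighted = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    return mean_diff, t, n - 1, weighted

# Hypothetical data: treatment x is constant within each classroom.
rows = [("c1", 0, 3), ("c1", 0, 5), ("c2", 0, 4), ("c2", 0, 6),
        ("c3", 1, 6), ("c3", 1, 8), ("c4", 1, 7), ("c4", 1, 9)]
print(aggregate_and_regress(rows))  # (4.5, 3.0, 4)

# Hypothetical data: gender varies within each classroom.
rooms = {"r1": ([5, 7], [4, 4]), "r2": ([6, 8, 7], [4]), "r3": ([3, 5], [2, 4])}
mean_diff, t, df, weighted = within_analysis(rooms)
print(mean_diff, round(t, 3), df, round(weighted, 3))  # 2.0 3.464 2 1.909
```

In the aggregated fit, inference would rest on n_units - 2 degrees of freedom, not on the number of children; in the within analysis, the t test has one degree of freedom fewer than the number of classrooms.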

There are two key questions in determining the unit of analysis. First, a determination must be made about the lowest level of units that are independent. Often statistical analysis is necessary to determine the extent to which units are independent (though this can be tricky: see Kenny, Kashy, and Bolger's (1998) concept of "consequential nonindependence"). Second, a determination must be made about the degree of variation in the causal variable. If most of its variation is between the nonindependent units, then aggregation or averaging should be used. If not, then the within analysis should be used.
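The second question can be made concrete by computing how much of the causal variable's total sum of squares lies between the nonindependent units. In this hypothetical sketch, the 0.5 cutoff is only one reading of "most of its variation" and is not a published rule; `between_share` and `suggest_strategy` are illustrative names. A treatment assigned by classroom puts all of its variation between classrooms, while student gender here varies mostly within classrooms.

```python
from statistics import fmean

def between_share(groups):
    """Proportion of the total sum of squares of x lying between groups.

    groups: list of lists of x scores, one inner list per unit.
    x must not be constant, or the total sum of squares is zero.
    """
    all_x = [x for g in groups for x in g]
    grand = fmean(all_x)
    ss_total = sum((x - grand) ** 2 for x in all_x)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

def suggest_strategy(groups, cutoff=0.5):
    # cutoff is a hypothetical threshold standing in for "most" variation.
    return "aggregate" if between_share(groups) > cutoff else "within"

treatment = [[1, 1, 1], [0, 0, 0], [1, 1, 1]]  # assigned by classroom
gender = [[0, 1, 1], [1, 0, 0], [0, 1, 0]]     # varies within classrooms
print(suggest_strategy(treatment), suggest_strategy(gender))  # aggregate within
```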

Sometimes, rules about the unit of assignment and the unit of generalization will be violated. For instance, classrooms may be the unit of assignment, but if there is no evidence of nonindependence due to classroom, person can be the unit of analysis. Alternatively, if there is evidence that classrooms are nonindependent, then person should not be the unit of analysis, even if person is the unit of generalization. Because all of the variation in treatment is between classrooms (recall that classroom is the unit of assignment), the treatment's effect will be seen in the between-classroom variation, not in the within-classroom variation.

The third strategy discussed in Kenny (1999) is the combined or pooled analysis: multilevel modeling essentially combines the two strategies above. In essence, it solves the unit of analysis question by making it a pseudo-question: all the observations are analyzed, and the degree of nonindependence is estimated empirically. Judd and Kenny (2024) discuss how mixed models can be used in this combined analysis. Currently, this is the strategy preferred by most analysts, and researchers should learn this approach.

References

Judd, C. M., & Kenny, D. A. (2024). Random factors and research generalization. In H. T. Reis, C. M. Judd, & T. V. West (Eds.), Handbook of research methods in social and personality psychology (3rd ed.). New York: Cambridge University Press.

Kenny, D. A., & Judd, C. M. (1986). Consequences of violating the independence assumption in analysis of variance. Psychological Bulletin, 99, 422-431.

Kenny, D. A., & Judd, C. M. (1996). A general procedure for the estimation of interdependence. Psychological Bulletin, 119, 138-148.

Kenny, D. A., Kashy, D. A., & Bolger, N. (1998). Data analysis in social psychology. In D. Gilbert, S. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., Vol. 1, pp. 233-265). Boston, MA: McGraw-Hill.

Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351-357.
