David A. Kenny
November 24, 2015

The topics on this page are much more extensively covered in the book Dyadic Data Analysis, written by David A. Kenny, Deborah A. Kashy, and William Cook. To find out more about this book click here.

What this tutorial does not cover:

• measures that are relational indices (multiple measures that are combined to form an index such as similarity or accuracy)
• categorical or dichotomous outcomes: e.g., breakup or divorce
• non-standard designs (see below for what the standard design is)
• groups as opposed to dyads

Related topics are covered in the Unit of Analysis page.  As is always the case, any suggestions for changes would be appreciated.

The work done on this page has been done in collaboration with many people. Particularly important are Deborah Kashy and William Cook (with whom I wrote the book Dyadic Data Analysis, published by Guilford Press), and many others have been helpful in developing my thinking. I especially want to acknowledge the late Larry Kurdek (who provided specific feedback about the page), Larry La Voie, Eliot Smith, Mike Berbaum, Harry Reis, Dale Griffin, Rich Gonzalez, and Charles Judd. There are almost certainly others whom I forgot.

List of Topics

Topic 1.  What is a standard dyadic design?

Topic 2.  What is the level of measurement of the outcome measure?

Topic 3.  Are the dyad members distinguishable or not?

Topic 4.  Determine the types of variables in the analysis.

Topic 5.  The assessment of nonindependence

Topic 6. Consequences of nonindependence on significance testing

Topic 7.  How are the effects of a between-dyads predictor variable estimated?

Topic 8.  How are the effects measured when the predictor variable is within dyads?

Topic 9.  How is nonindependence controlled if there are several predictor variables, some of which are between and others of which are within dyads?

Topic 10.  How can effects be estimated if the predictor variable is a mixed variable?

Topic 1.   What is a standard dyadic design?

Are the data as below, where "<~>" means a tie or link between two persons (first subscript is person and the second dyad)?

X11 <~> X21

X12 <~> X22

X13 <~> X23

X14 <~> X24

X15 <~> X25

That is, each person is linked to one and only one other person in the sample and both persons are measured on the same variables. I will also sometimes denote the two persons' scores as X and X' or Y and Y' and not use subscripts.

Examples of Standard Dyad Designs:

25 pairs of roommates
44 lesbian couples
38 supervisor-supervisee pairs
116 father-daughter pairs
54 pairs of twins

An Example of a Data Set that Could Become Standard

44 dating couples and 22 persons whose partner is not measured (the data from the 22 "singles" would have to be set aside and the remaining 44 would form a standard design)

Examples of Designs that Are Not Standard

a) therapy groups in which persons rate each other (persons are linked to every one in the group; see the Social Relations Model)
b) classroom survey of dating habits (people are linked to people who are likely not in the survey)
c) cancer patients rate how much their spouse helps them (the spouse would need to rate the help of the patient)
d) people who have more than one partner (the one-with-many design): egocentric networks, persons rated by multiple informants, people who see different doctors in a clinic, people asked to recall how jealous they were in their last three relationships

Topic 2.  What is the level of measurement of the outcome measure?

The methods discussed on this page presume that the outcomes are measured at the interval level of measurement. That is, there must be a numeric score for each person. However, the predictor variables need not be at the interval level of measurement. For instance, gender may be a predictor variable. If the outcome is categorical (e.g., together versus separated), methods derived for sociometric analysis (e.g., data in which persons state whether they like members in a group) are likely more appropriate. Consult the book by Wasserman and Faust (Social Network Analysis: Methods and Applications) on this topic. Also relevant is Loeys, T., Cook, W., De Smet, O., Wietzker, A., & Buysse, A. (2014). The actor-partner interdependence model for categorical dyadic data: A user-friendly guide to GEE. Personal Relationships, 21, 225-241.

Topic 3.  Are the dyad members distinguishable or not?

Analysis procedures depend on whether the members of the dyad are indistinguishable (sometimes called exchangeable) or not. Dyad members can be distinguished if there is a variable that allows the researcher to differentiate members. So, for instance, members of heterosexual couples can be distinguished by their gender, whereas members of gay and lesbian couples cannot. As another example, close friends usually cannot be distinguished, whereas boss and employee can be. Very often in dyadic analysis, researchers distinguish dyad members in an arbitrary fashion. For instance, they call the first person whose data are entered "person one" and the second "person two." Because such a designation is arbitrary, the results obtained from this analysis would vary if the data were ordered differently. It is inadvisable to pretend that the members of the dyad are distinguishable when, in fact, they are not. It is possible to test empirically whether dyad members are distinguishable. See the paper by Gonzalez and Griffin (1999).

Just because members can be distinguished does not mean that such a distinction necessarily should be made. So, if members can be distinguished by their gender, but gender does not affect the responses, it would be better not to make such a distinction and treat dyad members as if they were indistinguishable. Sometimes dyad members can be distinguished in more than one way. So, for instance, heterosexual couples can also be distinguished by who is older. Generally, it is advisable to choose the distinction that is more meaningful for the current research and variables under study.  The second distinguishing variable can be handled in the analysis, as will be explained in Topic 9.

Topic 4.  Determine the types of variables in the analysis.

In dyadic analysis, there are three major types of variables:

Between-dyads variable:  All the variation in the variable is between dyads.  So both members of the dyad have the same score on the variable.

Within-dyads variable:  All the variation in the variable is within dyads.  So the sum of the two persons' scores is the same for every dyad.

Mixed variable:  There is both variation between and within dyads.  Consider the variable of gender. If the study consisted of same-gender roommates, gender would be a between-dyads variable.  If the study consisted of opposite-gender or platonic friendships, then gender would be a within-dyads variable.  Finally, if some dyads were same and others were opposite gender, then gender would be a mixed variable.

More often than not, categorical variables are between or within, whereas continuous variables are very often mixed.  There are exceptions.  For example, number of years married is continuous but still a between-dyads variable.

With these distinctions, the meaning of "distinguishable" can be made clearer. If dyad members are said to be distinguishable, then there is a within-dyads variable that is dichotomous. The second major distinction concerns the ordering in the analysis. Some variables are usually considered predictors and others are outcomes. Almost always outcome variables are mixed variables, and so between- and within-dyads variables are almost always predictor variables. It is possible to have a multi-equation model such that a variable that is an outcome in one equation is a predictor variable in another. Consider data on the following three variables:

Variable:      1           2           3
Member:      1   2       1   2       1   2
Dyad 1       1   1       2   3       1   7
Dyad 2       3   3       4   1       3   3
Dyad 3       5   5       5   0       2   2
Dyad 4       3   3       3   2       5   2
Variable 1 is between-dyads (the scores of the two members are the same), variable 2 is within (both scores sum to 5), and variable 3 is mixed.
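These definitions can be checked mechanically. Below is a minimal sketch (plain Python; the function name is mine, not from any package) that classifies a variable from its list of dyad score pairs, using the three variables from the table above:

```python
def classify(pairs):
    """Return 'between', 'within', or 'mixed' for a list of (member1, member2) pairs."""
    if all(x1 == x2 for x1, x2 in pairs):
        return "between"          # both members always have the same score
    sums = [x1 + x2 for x1, x2 in pairs]
    if all(s == sums[0] for s in sums):
        return "within"           # the dyad sum is the same for every dyad
    return "mixed"                # variation both between and within dyads

# The three variables from the table above:
var1 = [(1, 1), (3, 3), (5, 5), (3, 3)]
var2 = [(2, 3), (4, 1), (5, 0), (3, 2)]
var3 = [(1, 7), (3, 3), (2, 2), (5, 2)]

print(classify(var1))  # between
print(classify(var2))  # within
print(classify(var3))  # mixed
```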

Topic 5.  The Assessment of Nonindependence

The nonindependence in a variable refers to the degree of similarity between the two members of the dyad on that variable. The degree of nonindependence in outcome variables should ordinarily be assessed as this information affects the choice of statistical analysis. If the scores on the outcome variables are independent, then person can be the unit of analysis. If there is nonindependence, then dyad needs to be explicitly considered in the analysis. The degree of correlation can be positive (the two members are similar to one another) or negative (the two members are different from each other). Some methods for handling nonindependence treat it as a variance and not a correlation. These methods presume then that the nonindependence must be positive because variances cannot be negative. The tests described are preliminary tests because the key question is whether there is nonindependence after the effects of the predictor variables are removed.

The measure of nonindependence for distinguishable dyads is the Pearson product-moment correlation (the ordinary correlation coefficient). For indistinguishable dyads, the less familiar intraclass correlation is computed. See Chapter 2 of Kenny, Kashy, and Cook (2006) for much more detail.

Computation of the Intraclass Correlation

ANOVA Formula

MSB - MSW
---------
MSB + MSW
where MSB is the variance of the dyad means times two, and MSW is the sum of the squared difference scores divided by two times the number of dyads. The two MS terms can be viewed as mean squares from an analysis of variance with dyad as the independent variable. Fisher invented analysis of variance as a generalization of the intraclass correlation.
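The ANOVA computation can be sketched in a few lines of Python (the function name is mine). Run on the ten-dyad illustration used later on this page, it reproduces the .592 value:

```python
from statistics import variance

def intraclass_anova(pairs):
    """ANOVA-style intraclass correlation for a list of (x1, x2) dyad pairs."""
    n = len(pairs)
    msb = 2 * variance([(x1 + x2) / 2 for x1, x2 in pairs])  # dyad-mean variance times two
    msw = sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * n)  # squared differences over 2n
    return (msb - msw) / (msb + msw)

# The ten-dyad illustration used later on this page:
pairs = [(5, 7), (4, 4), (3, 2), (8, 7), (2, 4), (8, 6), (5, 7), (3, 4), (4, 4), (9, 5)]
print(round(intraclass_anova(pairs), 3))  # 0.592
```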

Double Entry Method

To compute the correlation, each dyad's pair of scores is entered twice, once in each order, and the correlation is computed across the resulting two columns. This can be easily seen by illustration. Imagine the simple data set of

Dyad    Person 1    Person 2
----------------------------
1          5           7
2          8           4
3          5           6

The double-entered data, with the two columns denoted as X and Y, would become

Dyad    X    Y
--------------
1       5    7
1       7    5
2       8    4
2       4    8
3       5    6
3       6    5

Notice that each dyad is entered twice, hence the name double entry. There is a slight negative bias in the estimate of the correlation using this method, which is even older than the Fisher intraclass correlation. It was revived by Dale Griffin and Richard Gonzalez (see Griffin & Gonzalez, 1995).

Significance Test of the Intraclass Correlation

ANOVA Method

The test of the intraclass is an F test of MSB/MSW if MSB is larger, or MSW/MSB if MSW is larger. The degrees of freedom for MSB are the number of dyads less one, and for MSW they are the number of dyads. As discussed below, consideration needs to be given to using a value of alpha greater than the conventional .05 value.

Double Entry Method

The test is simple: multiply the correlation by the square root of the number of dyads and treat the result as a standard normal or Z test. This test is somewhat conservative, especially if the correlation is large.
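Both the double-entry correlation and its Z test can be sketched together (plain Python, function name mine). Because the double-entered X and Y columns contain the same set of scores, they share a mean and a variance, which simplifies the correlation formula:

```python
from math import sqrt
from statistics import NormalDist

def double_entry_test(pairs):
    """Double-entry intraclass r, its Z, and the two-tailed p for (x1, x2) pairs."""
    x = [a for pair in pairs for a in pair]        # each dyad entered twice,
    y = [b for x1, x2 in pairs for b in (x2, x1)]  # once in each order
    m = sum(x) / len(x)                            # x and y share the same mean
    r = sum((a - m) * (b - m) for a, b in zip(x, y)) / sum((a - m) ** 2 for a in x)
    z = r * sqrt(len(pairs))                       # Z test: r times sqrt(n dyads)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, z, p

# The ten-dyad illustration used later on this page:
pairs = [(5, 7), (4, 4), (3, 2), (8, 7), (2, 4), (8, 6), (5, 7), (3, 4), (4, 4), (9, 5)]
r, z, p = double_entry_test(pairs)   # r is about .557, z about 1.76, p about .078
```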

Illustration of the Computations

Consider the following data set:

Dyad    Person 1    Person 2
----------------------------
 1         5           7
 2         4           4
 3         3           2
 4         8           7
 5         2           4
 6         8           6
 7         5           7
 8         3           4
 9         4           4
10         9           5

ANOVA Intraclass Correlation

The MSB equals 6.828 and the MSW equals 1.750 making the intraclass correlation equal to

6.828 - 1.750
------------- = .592
6.828 + 1.750

The F test equals F(9,10) = 6.828/1.750 = 3.902, p = .046 (two-tailed). Thus, the F test is statistically significant and it is concluded that scores are not independent.

Double Entry Intraclass Correlation

The double entry value equals .557 and its Z is 1.76 with a p of .078. Both the r and its Z are smaller than the ANOVA values.

Computation of Partial Correlations

Generally, it is advisable to test for nonindependence controlling for the predictor variables. This is something that is often not done, but should be. The effects of the predictor variables may create a pseudo nonindependence. If the predictor variable is between dyads, its effects produce positive nonindependence. Alternatively, if the predictor variable is within dyads, its effects produce a negative correlation. If the ordinary Pearson product-moment correlation is used to measure the nonindependence, then partial correlations are computed. If intraclass correlations are used, then the variance associated with these effects is partialled out of both mean squares before the intraclass is computed. If the double-entry method is used, standard partialling methods can be used. Normally, the computation of these partial correlations occurs within the estimation of a model (see Topic 10).

Topic 6.  Consequences of Nonindependence on Significance Testing

Considered here is the bias when testing the effect of a predictor variable on an outcome whose scores may be nonindependent.  Assume that the predictor variable is either within or between dyads.

The Effect on the Significance Test When Person
Is the Unit of Analysis Given Nonindependence

Design                   r Positive                r Negative
Between-dyads Design     too many Type I errors    too few Type I errors
Within-dyads Design      too few Type I errors     too many Type I errors

Definitions of Terms in the Table

• r -- the correlation between the two dyad members' scores on the outcome measure controlling for the predictor variable
• design -- whether the predictor variable being tested is within- or between-dyads
• Type I error -- rejecting the null hypothesis that the predictor variable has no effect when it is true

As an example, gender would be between dyads if same-gender roommates were studied (some dyads are two males and others are two females).  Given that the predictor variable is between dyads and a positive correlation in the outcome, the use of person as the unit of analysis leads to too many statistically significant results.  The consequence of mixed predictor variables on p values is intermediate. If the intraclass correlation of the mixed variable is positive, its effects are like a between-dyads predictor variable and if negative like a within-dyads predictor variable.

Power of the Test of Nonindependence

The analysis of dyadic data often hinges on whether there is nonindependence. Thus, the power of this test is critical.  Consider the test of the Pearson correlation, assuming that dyad members are distinguishable.  (The power of the intraclass correlation is essentially the same.)

Number of Dyads Needed to Have 80% Power in Testing the Correlation between Dyad Members
(Alpha of .05 and .01)

 r      Alpha = .05    Alpha = .01
.1          782            450
.2          193            112
.3           84             49
.4           46             27
.5           28             17
.6           19             12
.7           13              8

Quite clearly, when the intraclass is small, the power of its test is very low even when alpha is set very high.  I return to this issue in the section on effects of nonindependence on significance testing.
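The tabled sample sizes can be approximated with the standard Fisher z formula for the power of a correlation test. This sketch (function name mine) gives values close to, though not always identical to, the tabled ones, which may have been computed by a different method:

```python
from math import atanh, ceil
from statistics import NormalDist

def n_dyads_for_power(r, alpha=0.05, power=0.80):
    """Approximate dyads needed for a two-tailed test of correlation r,
    via the Fisher z transformation."""
    nd = NormalDist()
    za, zb = nd.inv_cdf(1 - alpha / 2), nd.inv_cdf(power)
    return ceil(((za + zb) / atanh(r)) ** 2 + 3)

print(n_dyads_for_power(0.3))   # close to the tabled 84
print(n_dyads_for_power(0.45))  # in the neighborhood of the ~35 dyads discussed below
```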

Recommended Strategy

Kashy, Bolger, and I, in the Handbook of Social Psychology chapter, have defined the concept of consequential nonindependence. It is the level at which nonindependence results in a p value of .10 when it is presumed to be .05. The level of consequential nonindependence is about .45. We argue that there should be enough power, at least 80%, to test for consequential nonindependence. If there is, then the effective alpha is .06, which is not very troublesome. For dyads, to have sufficient power to test for consequential nonindependence, there must be at least 35 dyads. If there are fewer, then the power of the test of nonindependence may be too low. Thus, in studies with fewer than 35 dyads, a reasonable course of action is to presume that scores are nonindependent because there is not sufficient power to test for nonindependence.

Topic 7.  How are the effects of a between-dyads predictor variable estimated?

First, given that there are at least about 35 dyads, test whether there is nonindependence. (If there are fewer than 35 dyads, one should treat the data as if they were nonindependent.) Ideally the test for nonindependence should control for the effects of any predictor variables. If the test indicates that there is independence, then person can be used as the unit of analysis. If the test indicates that there is nonindependence, then dyad should be used as the unit of analysis. The outcome measure is either the sum or the average of the two members' scores, whichever is more interpretable.

Topic 8. How are the effects measured when the predictor variable is within dyads?

It is assumed that preliminary tests have shown that there is nonindependence in the outcome. First, assume that the predictor variable is dichotomous, e.g., boss versus employee. The paired t-test can be used, with the two levels of the within-dyads variable as the two groups. This test in essence tests whether the mean of the difference scores equals zero, where the difference is between the scores of the two dyad members.

So if we wanted to test whether husbands are more or less satisfied than wives, we would compute a difference score, say wife score minus husband score, and test whether the mean of the differences is significantly different from zero.

Second, assume that the predictor variable is interval, e.g., percent of housework done. Compute difference scores on both X and Y. Regress the differenced Y on the differenced X without any intercept. The effect of differenced X on differenced Y in this equation measures the effect of X on Y. The intercept is not fitted because the direction of differencing is arbitrary; by not including an intercept, the solution will be the same regardless of how the differencing was done. (Try it out if you do not believe me!)

It is instructive to reproduce the paired t test results by fitting a regression equation with no intercept. First, compute a paired t test and note the t and its p value. Then compute difference scores and create a predictor "variable" all of whose scores equal two. Run the regression equation, not including an intercept, and note that you get the same t and p value. Now reverse the sign of the first dyad on both the difference score and the predictor variable. Note again that you get the same coefficient and the same t and p value. You would not if the intercept were included.
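The equivalence just described can be verified in a few lines of Python. This sketch reuses the ten pairs of scores from the illustration earlier on the page, arbitrarily labeled wife and husband:

```python
from math import sqrt
from statistics import mean, stdev

# Ten dyads' scores, labeled wife and husband for illustration.
wife    = [5, 4, 3, 8, 2, 8, 5, 3, 4, 9]
husband = [7, 4, 2, 7, 4, 6, 7, 4, 4, 5]
d = [w - h for w, h in zip(wife, husband)]   # difference scores
n = len(d)

# Paired t test: the mean difference over its standard error.
t_paired = mean(d) / (stdev(d) / sqrt(n))

# No-intercept regression of d on a constant predictor whose scores all equal 2.
c = [2] * n
b = sum(ci * di for ci, di in zip(c, d)) / sum(ci ** 2 for ci in c)
sse = sum((di - b * ci) ** 2 for ci, di in zip(c, d))
se_b = sqrt(sse / (n - 1)) / sqrt(sum(ci ** 2 for ci in c))
t_reg = b / se_b  # identical to t_paired, and unchanged if any one dyad's
                  # difference and predictor scores are both sign-reversed
```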

Topic 9.  How is nonindependence controlled if there are several predictor variables, some of which are between and others of which are within dyads?

One needs to perform two analyses. In the first, the sum or average of the two dyad members' scores is the outcome, and the predictors are the set of between-dyads variables. The second analysis is of the difference scores; it includes the difference scores of the within-dyads variables as predictors and no intercept. Sometimes this second analysis can be accomplished by a repeated measures analysis of variance. There must be one within-dyads variable that is dichotomous, e.g., gender. It would be treated as the "repeated measure," and dyad, not person, would be the unit of analysis. If there are two dichotomous within-dyads factors, then the following must be done. One of the within-dyads variables becomes the repeated measure. The other is captured by creating a between-dyads factor that codes for the level of the second within-dyads factor at the first level of the repeated measure. The interaction of this factor with the repeated measure captures the second within-dyads factor. Interactions of between and within factors are captured by including the between factors in the difference score regressions. Much more detail on this topic is contained in Chapter 3 of Kenny, Kashy, and Cook (2006).

Topic 10. How can effects be estimated if the predictor variable is a mixed variable?

Model

The model that contains predictor variables that are between-dyads, within-dyads, and mixed is called the Actor-Partner Interdependence Model or APIM. You can download a bibliography of APIM papers if you click here. Assume that there is a mixed predictor variable, denoted as X, and an outcome denoted as Y. Denote Xi and Xi' as the two scores on the predictor variable for dyad i, and Yi and Yi' as the two scores on the outcome for that same dyad. The actor effect is defined as the effect of Xi on Yi and of Xi' on Yi', and the partner effect as the effect of Xi on Yi' and of Xi' on Yi. Basically, the model is that X and X' cause Y, where the effect of X on Y is called an actor effect and the effect of X' on Y is called a partner effect. So if a researcher studied friendships, both same and opposite gender, and examined the effects of gender on intimacy, there are at least two effects of gender on intimacy: females could be more intimate than males, an actor effect, or interactions with females could be more intimate, a partner effect. For more detail on the analysis of the APIM, consult Chapter 7 of Kenny, Kashy, and Cook (2006).
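Before any APIM estimation, the data are typically restructured so that each person contributes one row carrying the outcome, their own predictor score (actor), and their partner's (partner). A sketch, with hypothetical data and illustrative variable names:

```python
# Hypothetical raw data: one record per dyad, with both members' X and Y scores.
dyads = [
    {"dyad": 1, "x1": 5, "x2": 7, "y1": 4, "y2": 6},
    {"dyad": 2, "x1": 4, "x2": 4, "y1": 5, "y2": 5},
    {"dyad": 3, "x1": 3, "x2": 2, "y1": 2, "y2": 4},
]

# Pairwise structure: one row per person, with own X (actor) and partner's X.
rows = []
for rec in dyads:
    rows.append({"dyad": rec["dyad"], "member": 1, "y": rec["y1"],
                 "x_actor": rec["x1"], "x_partner": rec["x2"]})
    rows.append({"dyad": rec["dyad"], "member": 2, "y": rec["y2"],
                 "x_actor": rec["x2"], "x_partner": rec["x1"]})
```

This is the person-as-row layout assumed by the SAS and SPSS multilevel analyses described under Estimation below.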

Actor-Partner Interaction

Many dyadic processes can be viewed as actor-partner interactions. For instance, the effect of similarity between dyad members can be viewed as an actor-partner interaction. The usual way to measure the actor-partner interaction is to multiply the actor and partner scores, X times X'. However, if the interaction is supposed to represent similarity, the absolute difference between X and X' would be used instead. Note that the actor-partner interaction is always a between-dyads variable. For any actor-partner interaction that is estimated, one should control for the main effects of actor and partner; thus, if similarity is to be tested, the main effects of actor and partner should be controlled.

Estimation

Practically, there are three ways to estimate this model when there is nonindependence: pooled regressions, structural equation modeling, and multilevel modeling. The pooled regression method has been essentially replaced by the other two methods, but it is still useful to describe.

POOLED REGRESSIONS
This is the old-fashioned way to do the analysis, but it is discussed as it might be helpful to think about. Two regression equations are estimated and their results are pooled. In the first, the criterion is the average of the two Y scores, and the predictors are the averages of all mixed variables plus all the between-dyads variables; the coefficient for X in this between-dyads regression is denoted bB. In the second, difference scores are computed for the Xs and the Ys, differencing in the same direction for all variables; the predictors are the differenced X and all differenced within-dyads variables, and no intercept is estimated. The coefficient for X in this within-dyads regression is denoted bW. The standard errors must be pooled and specialized degrees of freedom estimated. The two coefficients bB and bW are then used to determine estimates of the actor and partner effects.
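The logic of combining the two coefficients can be sketched as follows. Under a model in which Y is caused by the actor's X and the partner's X, bB estimates the sum of the actor and partner effects and bW their difference, so actor = (bB + bW)/2 and partner = (bB - bW)/2. The sketch below uses made-up, noise-free data with a known actor effect (0.7) and partner effect (0.3) so the recovery is exact; with real data, the pooled standard errors and specialized degrees of freedom mentioned above are also needed:

```python
a_true, p_true = 0.7, 0.3
xs = [(1, 4), (2, 7), (3, 3), (5, 2), (6, 9), (8, 1)]   # (X, X') per dyad
ys = [(a_true * x + p_true * xp, a_true * xp + p_true * x) for x, xp in xs]

# Between-dyads regression: Y means on X means.
mx = [(x + xp) / 2 for x, xp in xs]
my = [(y + yp) / 2 for y, yp in ys]
mbar, ybar = sum(mx) / len(mx), sum(my) / len(my)
bB = sum((m - mbar) * (y - ybar) for m, y in zip(mx, my)) / \
     sum((m - mbar) ** 2 for m in mx)

# Within-dyads regression: Y differences on X differences, no intercept.
dx = [x - xp for x, xp in xs]
dy = [y - yp for y, yp in ys]
bW = sum(d * e for d, e in zip(dx, dy)) / sum(d ** 2 for d in dx)

actor, partner = (bB + bW) / 2, (bB - bW) / 2   # recovers 0.7 and 0.3
```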

STRUCTURAL EQUATION MODELING
Two equations are estimated in which Y and Y' are the outcomes. In each equation, X and X' are predictors. To test specialized predictions (e.g., pooling coefficients and setting coefficients equal), a structural equation modeling program is needed. Dyad is the unit of analysis. The method is most useful when dyad members are distinguishable. One advantage of this approach is that the entire model is estimated, including the correlation of the Xs and the residual correlation of the Ys.

MULTILEVEL MODELING
Multilevel modeling can be used to estimate the APIM. Any multilevel program can be used. In this analysis, each observation is a different person. However, there must be a variable that identifies each dyad and for some computer programs, the data should be sorted by this variable. First described here is PROC MIXED within SAS. Later we describe SPSS.   The code is:

PROC MIXED;
MODEL OUTCOME = X XPAR / S DDFM=SATTERTH;
REPEATED / TYPE = CS SUBJECT = DYAD;

Person is the unit of analysis and for each person there is a code for DYAD (a unique score for each dyad).  The variable XPAR is the score of the partner on X or what has been designated here as X'.  The REPEATED statement allows for the possibility of a negative correlation between dyad members.  Do not be surprised if the degrees of freedom are non-integer values.  For some computers and data sets, adding "DDFM=SATTERTH" significantly increases the computational time. See Campbell and Kashy (2002) for more details on the analysis using SAS and HLM5.

To use SPSS (12.0 and higher), it is advisable that the data be sorted by dyad. We also need a variable that we will call MEMBER. For example, one person is 1 on MEMBER and the other is 2. (If there is a distinguishing variable in the data set, it can be used instead of MEMBER.)

Upper case terms refer to SPSS commands.

Step 0: Preparation

File: individual as unit.
Create the necessary variables; put the partner's predictor score on each person's record.
Center predictors if necessary.
Make sure a dyad id (DYAD) and a person number (MEMBER) are present.

Step 1: Start

ANALYSIS
MIXED MODELS
LINEAR
Type in dyad id in SUBJECTS
Type in the MEMBER variable name in REPEATED MEASURES if members are indistinguishable or the distinguishing identifier (e.g., GENDER) if members are distinguishable.
Pick COMPOUND SYMMETRY if members are indistinguishable and COMPOUND SYMMETRY: HETEROGENEOUS if members are distinguishable on the repeated measures variable
CONTINUE

Step 2: Click LINEAR MIXED MODELS

Type in the name of the DEPENDENT VARIABLE
Type categorical variables in FACTOR(S)
Type continuous variables in COVARIATE(S); include the actor's (own) X and the partner's X as predictors.

- The remaining steps go from left to right on the bottom of the screen. -

Step 3: Click FIXED

Add in relevant terms. Include relevant actor and partner effects.
Pay close attention to the term in the box in the middle.
Ordinarily make sure "INCLUDE INTERCEPT" box is checked.
CONTINUE

Step 4: Click STATISTICS

Click PARAMETER ESTIMATES
Click TESTS FOR COVARIANCE PARAMETERS
Can ask for DESCRIPTIVE STATISTICS and CASE PROCESSING SUMMARY
CONTINUE

Step 5: Run the job

Click OK

If you save syntax, you can delete the following statements as they use defaults:

/CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0,ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/METHOD = REML

Syntax might look like the following for indistinguishable members:

MIXED
dv WITH actorx partx
/FIXED = actorx partx actorx*partx
/PRINT = SOLUTION TESTCOV
/REPEATED = member | SUBJECT(dyadid) COVTYPE(CSR) .

Note that if members were distinguishable, that variable would be added and CSR would be changed to CSH.

References

Campbell, L. J., & Kashy, D. A. (2002). Estimating actor, partner, and interaction effects for dyadic data using PROC MIXED and HLM5: A brief guided tour. Personal Relationships, 9, 327-342.

Griffin, D., & Gonzalez, R.  (1995).  Correlational analysis of dyad-level data in the exchangeable case.  Psychological Bulletin, 118, 430-439.

Gonzalez, R., & Griffin, D. (1999).  The correlational analysis of dyad-level data in the distinguishable case.  Personal Relationships, 6, 449-469.

Kenny, D. A.  (1995).  The effect of nonindependence on significance testing in dyadic research. Personal Relationships, 2, 67-75.

Kenny, D. A.  (1996).  Models of nonindependence in dyadic research.  Journal of Social and Personal Relationships, 13, 279-294.

Kenny, D. A., & Cook, W. (1999). Partner effects in relationship research:  Conceptual issues, analytic difficulties, and illustrations.  Personal Relationships, 6, 433-448.

Kenny, D. A., & Judd, C. M. (1986). Consequences of violating the independence assumption in analysis of variance. Psychological Bulletin, 99, 422-431.

Kenny, D. A., Kashy, D. A., & Bolger, N.  (1998).  Data analysis in social psychology. In D. Gilbert, S. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., Vol. 1, pp. 233-265).  Boston, MA: McGraw-Hill.

Kenny, D. A., Kashy, D. A., & Cook, W.  (2006).  Dyadic data analysis. New York: Guilford.