David A. Kenny
August 15, 2011
Path Analysis
This page discusses how to use multiple regression to estimate the parameters of a structural model.
Key Assumption
For each endogenous variable, its disturbance must be uncorrelated with all of the variables specified as its causes. So, for a given model, consider each endogenous variable in turn and verify that its disturbance is uncorrelated with each of its causes.
Violation of the Assumptions to Use Multiple Regression to Estimate Structural Coefficients
There are three conditions under which the disturbance is correlated with a causal variable (thus ruling out multiple regression as an appropriate tool for estimating the causal paths):
a) Spuriousness: A variable that is not included in the model causes both the endogenous variable and one of its causal variables.
b) Reverse Causation: The endogenous variable causes, either directly or indirectly, one of its causes.
c) Measurement Error: There is measurement error in a causal variable.
Estimation Using Multiple Regression
Unstandardized variables
Paths: b weights from the regression equation
Disturbance variance: the variance of the endogenous variable times one minus the squared multiple correlation
Curved lines between exogenous variables: covariances
Curved lines between disturbances: partial covariances, with the causal variables of both endogenous variables partialled out
Standardized variables
Paths: beta weights from the regression equation
Disturbance path: the square root of one minus the squared multiple correlation
Curved lines between exogenous variables: correlations
Curved lines between disturbances: partial correlations, with the common causal variables of both endogenous variables partialled out
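For an equation with two causes, the estimation rules above can be applied directly to summary statistics. This is a minimal sketch under stated assumptions: the helper name two_cause_paths is hypothetical, the correlations and standard deviations in the usage line are made up, and it uses the closed-form beta weights for two predictors rather than a general regression routine.

```python
import math

def two_cause_paths(r1y, r2y, r12, sd1=1.0, sd2=1.0, sdy=1.0):
    """Hypothetical helper: estimate Y <- X1, X2 (plus disturbance) from summary stats.

    r1y, r2y: correlations of each cause with Y; r12: correlation between causes;
    sd1, sd2, sdy: standard deviations of X1, X2, and Y.
    """
    denom = 1 - r12 ** 2
    beta1 = (r1y - r12 * r2y) / denom        # standardized path from X1
    beta2 = (r2y - r12 * r1y) / denom        # standardized path from X2
    r_squared = beta1 * r1y + beta2 * r2y    # squared multiple correlation
    return {
        # Standardized solution
        "beta": (beta1, beta2),
        "disturbance_path": math.sqrt(1 - r_squared),
        "exogenous_correlation": r12,
        # Unstandardized solution: b = beta * SD(Y) / SD(X)
        "b": (beta1 * sdy / sd1, beta2 * sdy / sd2),
        "disturbance_variance": sdy ** 2 * (1 - r_squared),
        "exogenous_covariance": r12 * sd1 * sd2,
    }

est = two_cause_paths(0.62, 0.50, 0.40, sd1=2.0, sd2=1.5, sdy=3.0)
print(est["beta"], est["disturbance_path"])
```

With these made-up values the beta weights are .50 and .30, the squared multiple correlation is .46, and the disturbance path is the square root of .54 (about .73).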
Steps in Testing
STEP ONE: TEST OF DELETED PATHS
Respecify the model to make it just-identified, or saturated: add to the model the paths that it specifies to be zero. Test these deleted paths, making sure that the specified paths are included in the equation but are not themselves tested. These tests may use a reduced alpha (e.g., .01).
STEP TWO: TEST OF SPECIFIED PATHS
Retaining the significant paths from the previous step, test the paths that were specified to be present in the model. Sometimes these tests are done hierarchically.
STEP THREE: TRIMMED MODEL
Re-estimate the model, (a) adding the paths that were specified to be zero but were significant in step one and (b) dropping the paths that were specified but were not significant in step two. The model should not be trimmed if the goal is to compare the paths from one model to another or to determine the absolute size of the paths.
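The three steps amount to a simple bookkeeping rule for deciding which paths end up in the trimmed model. The sketch below is hypothetical: the function name, path labels, and p values are invented, the .01 alpha for deleted paths follows the suggestion in step one, and the .05 alpha for specified paths is an assumption.

```python
def trimmed_model(specified, deleted_p, specified_p,
                  alpha_deleted=0.01, alpha_specified=0.05):
    """Hypothetical helper: return the set of paths kept in the trimmed model.

    specified   -- paths the model specifies as present
    deleted_p   -- {path: p value} for deleted paths from step one
    specified_p -- {path: p value} for specified paths from step two
    """
    # Step one: deleted paths that turned out significant are added to the model.
    added = {path for path, p in deleted_p.items() if p < alpha_deleted}
    # Step two: specified paths that were not significant are dropped.
    kept = {path for path in specified if specified_p[path] < alpha_specified}
    # Step three: the trimmed model is the union of the two.
    return kept | added

specified = {"a", "b", "c"}
deleted_p = {"Attitude->Behavior": 0.20, "SocialNorms->Behavior": 0.03}
specified_p = {"a": 0.001, "b": 0.20, "c": 0.0001}
print(sorted(trimmed_model(specified, deleted_p, specified_p)))  # ['a', 'c']
```

Here the hypothetical Social Norms to Behavior path (p = .03) is not added back because it does not pass the reduced .01 alpha, and path b is dropped because it is not significant in step two.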
Types of Tests
a) Test of an individual path: the standard t or F test of the coefficient
b) Test of all of the deleted paths in a given equation: an F test with k and N - p - 1 degrees of freedom,
$$F = \frac{(N - p - 1)(R_2^2 - R_1^2)}{k(1 - R_2^2)}$$
where N is the overall sample size, p is the number of deleted plus specified paths in the equation, k is the number of deleted paths, $R_1^2$ is the squared multiple correlation (not adjusted) from the equation with only the specified paths, and $R_2^2$ is the squared multiple correlation (not adjusted) from the equation with both the specified and the deleted paths.
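The F formula translates directly into code. This is a minimal sketch with a hypothetical function name and made-up inputs; it returns the F statistic and its degrees of freedom but not a p value, which requires the F distribution (for example, scipy.stats.f.sf(F, k, N - p - 1) if SciPy is available).

```python
def deleted_paths_f(n, p, k, r2_reduced, r2_full):
    """Hypothetical helper: F test that the k deleted paths in one equation are all zero.

    n          -- overall sample size
    p          -- number of deleted plus specified paths in the equation
    k          -- number of deleted paths
    r2_reduced -- unadjusted R squared with only the specified paths
    r2_full    -- unadjusted R squared with specified and deleted paths
    Returns (F, numerator df, denominator df).
    """
    f = (n - p - 1) * (r2_full - r2_reduced) / (k * (1 - r2_full))
    return f, k, n - p - 1

f, df1, df2 = deleted_paths_f(n=100, p=3, k=2, r2_reduced=0.30, r2_full=0.34)
print(f"F({df1}, {df2}) = {f:.2f}")  # F(2, 96) = 2.91
```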
c) Test of all of the paths in the model: usually the chi-square goodness-of-fit test from an SEM program such as AMOS, EQS, MPLUS, or LISREL.
Determination of Deleted Paths
If all of the paths in the model can be estimated by multiple regression, the number of deleted paths equals the number of knowns minus the number of unknowns, that is, the degrees of freedom of the specified model. To make the model just-identified, examine each pair of variables and identify the pairs that are not linked by a path or a correlation (including a correlation between disturbances). Add a path, or a correlation between disturbances, for each of these unlinked pairs. The direction of each added path is determined by theory and by the requirement that no feedback be introduced.
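Finding the unlinked pairs is a matter of listing every pair of variables and removing those already joined by a path or a correlation. The sketch below is hypothetical (the function name is invented) and uses the variables from the Theory of Planned Behavior example on this page. It also checks the count against knowns minus unknowns, assuming a standardized model in which the knowns are the correlations among the variables and the unknowns are the paths and correlations.

```python
from itertools import combinations

def unlinked_pairs(variables, links):
    """Hypothetical helper: pairs joined by neither a path nor a correlation.

    links -- iterable of 2-tuples; direction is ignored because a causal path
    and a correlation (including one between disturbances) both link a pair.
    """
    linked = {frozenset(link) for link in links}
    return [pair for pair in combinations(variables, 2)
            if frozenset(pair) not in linked]

variables = ["Attitude", "SocialNorms", "Intention", "Behavior"]
links = [("Attitude", "SocialNorms"),   # correlation between exogenous variables
         ("Attitude", "Intention"),
         ("SocialNorms", "Intention"),
         ("Intention", "Behavior")]
missing = unlinked_pairs(variables, links)

# Standardized model: knowns are the v(v - 1)/2 correlations; unknowns are the
# paths plus correlations (disturbance paths are fixed by standardization).
knowns = len(variables) * (len(variables) - 1) // 2
unknowns = len(links)
print(missing)             # [('Attitude', 'Behavior'), ('SocialNorms', 'Behavior')]
print(knowns - unknowns)   # 2
```

Each unlinked pair contributes one known correlation but no unknown, so the number of unlinked pairs equals the degrees of freedom whenever every link is one unknown.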
Types of Links Between Two Variables
Direct effect: Either X causes Y, Y causes X, or both.
Indirect effect: The relationship between X and Y is said to be indirect if X causes Z, which in turn causes Y.
Spuriousness: The relationship between X and Y is said to be spurious if Z causes both X and Y. Other names for spuriousness are omitted variables, confounding, and third-variable causation.
Unexplained covariation: Both X and Y are exogenous, and so the covariation between them is not explained by the model.
Decomposition of a Correlation
Correlation between two endogenous variables:
Correlation = Direct Effect + Indirect Effects + Spuriousness
Correlation between an endogenous variable and an exogenous variable:
Correlation = Direct Effect + Indirect Effects + Unspecified Covariance
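The decomposition can be traced by hand in the Theory of Planned Behavior model used in the example on this page. This sketch is hypothetical: the helper name and the values a = .5, b = .3, c = .6, and a correlation of .4 between Attitude and Social Norms are invented, and it assumes, as the labels suggest, that a is the path from Attitude to Intention and c the path from Intention to Behavior. The results are the model-implied correlations.

```python
def decompose_attitude_correlations(a, b, c, r_as):
    """Hypothetical helper: tracing-rule decomposition of Attitude's correlations.

    Model: Attitude -> Intention (a), SocialNorms -> Intention (b),
    Intention -> Behavior (c), and an unexplained correlation r_as between
    the exogenous variables Attitude and SocialNorms.
    """
    with_intention = {
        "direct": a,                  # Attitude -> Intention
        "indirect": 0.0,              # no mediator between them
        "unexplained": r_as * b,      # Attitude <-> SocialNorms -> Intention
    }
    with_behavior = {
        "direct": 0.0,                # Attitude -> Behavior is a deleted path
        "indirect": a * c,            # Attitude -> Intention -> Behavior
        "unexplained": r_as * b * c,  # Attitude <-> SocialNorms -> Intention -> Behavior
    }
    return with_intention, with_behavior

wi, wb = decompose_attitude_correlations(a=0.5, b=0.3, c=0.6, r_as=0.4)
print(round(sum(wi.values()), 3), round(sum(wb.values()), 3))  # 0.62 0.372
```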
Example
Below is the Theory of Planned Behavior of Fishbein and Ajzen, in which Attitude and Social Norms cause Intention, and Intention causes Behavior.
The coefficients in this model can be estimated by multiple regression because, for the Intention equation, its disturbance U1 is uncorrelated with both Attitude and Social Norms, and, for the Behavior equation, its disturbance U2 is uncorrelated with Intention.
There are two deleted paths in the model: one from Attitude to Behavior and the other from Social Norms to Behavior. If these two paths were added, the model would be saturated. The following three steps are suggested:
STEP ONE: TEST OF DELETED PATHS
We add the paths from Attitude and Social Norms to Behavior and test them, with the specified path from Intention to Behavior included in the equation.
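Step one for the Behavior equation can be sketched from a correlation matrix. Everything here is hypothetical: the helper names are invented, N = 200 is made up, and the correlations were constructed to fit the specified model exactly (a = .5, b = .3, c = .6, and .4 between Attitude and Social Norms), so the deleted paths should come out as zero. The sketch uses a small Gaussian elimination routine in place of a statistics package.

```python
def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def standardized_regression(r_xx, r_xy):
    """Beta weights and unadjusted R squared from correlations."""
    betas = solve(r_xx, r_xy)
    r_squared = sum(b * r for b, r in zip(betas, r_xy))
    return betas, r_squared

# Hypothetical correlations; predictor order: Intention, Attitude, SocialNorms.
r_xx = [[1.00, 0.62, 0.50],
        [0.62, 1.00, 0.40],
        [0.50, 0.40, 1.00]]
r_xy = [0.600, 0.372, 0.300]   # correlation of each predictor with Behavior

# Saturated equation: specified path (Intention) plus the two deleted paths.
betas, r2_full = standardized_regression(r_xx, r_xy)
r2_reduced = r_xy[0] ** 2      # specified equation: Intention is the only cause
n, p, k = 200, 3, 2            # hypothetical N; 1 specified + 2 deleted paths
f = (n - p - 1) * (r2_full - r2_reduced) / (k * (1 - r2_full))
print([round(b, 3) for b in betas], round(f, 3))
```

With real data the two deleted-path betas would be tested individually, and the F statistic tests them jointly with k and N - p - 1 degrees of freedom.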
STEP TWO: TEST OF SPECIFIED PATHS
Retaining any of the significant paths from the previous step, we test paths a, b, and c.
STEP THREE: TRIMMED MODEL
Because this is a fairly standard model, we probably would not trim it. If we did, and found, say, that path b (the path from Social Norms to Intention) was not needed, we would re-estimate the model with that path dropped and then report the resulting coefficients.