David A. Kenny
October 25, 2005
Path Analysis
This page discusses how to use multiple regression to estimate the
parameters of a structural model.

Key Assumption
For an endogenous variable, its disturbance must be uncorrelated with all of the 
specified causal variables.  So for a model, consider each endogenous variable and 
determine that its disturbance is uncorrelated with each of its causes.
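
As a rough illustration of this assumption, the Python sketch below simulates one causal variable X and one endogenous variable Y under two conditions: a disturbance that is independent of X, and a disturbance that shares an omitted cause Z with X. The variable names, the path value of 0.5, the sample size, and the seed are all made up for the illustration and do not come from this page.

```python
import random

def slope(x, y):
    """Simple-regression slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(0)          # arbitrary seed, only for reproducibility
n = 20000               # hypothetical sample size
true_path = 0.5         # hypothetical causal path from X to Y

z = [random.gauss(0, 1) for _ in range(n)]   # omitted common cause
x = [zi + random.gauss(0, 1) for zi in z]    # X is partly caused by Z
e = [random.gauss(0, 1) for _ in range(n)]   # pure noise

# Case 1: the disturbance of Y is just e, which is independent of X.
y_ok = [true_path * xi + ei for xi, ei in zip(x, e)]

# Case 2: Z also causes Y but is left out of the equation, so the
# disturbance (0.5 * Z + e) is correlated with X.
y_bad = [true_path * xi + 0.5 * zi + ei for xi, zi, ei in zip(x, z, e)]

slope_ok = slope(x, y_ok)
slope_bad = slope(x, y_bad)
print("assumption holds:   ", round(slope_ok, 3))
print("assumption violated:", round(slope_bad, 3))
```

In large samples the first slope should sit near the true path of 0.5, while the second should drift toward 0.5 + 0.5 × cov(Z, X) / var(X) = 0.75, because the regression credits X with part of the effect of the omitted Z.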

Violation of the Assumptions to Use Multiple Regression to Estimate Structural Coefficients
There are three conditions under which the disturbance is correlated with a
causal variable (thus eliminating multiple regression as an appropriate
tool to estimate causal paths):
     a) Spuriousness:  A variable causes both the endogenous variable and
           one of its causal variables, and that variable is not included in
           the model.
     b) Reverse Causation:  The endogenous variable causes,
           either directly or indirectly, one of its causes.
     c) Measurement Error:  There is measurement error in a causal variable.

Estimation Using Multiple Regression
     Standardized variables
          Paths: beta weights from the regression equation
          Disturbance path: square root of one minus the
             multiple correlation squared
          Curved lines between exogenous variables: correlations
          Curved lines between disturbances: partial
             correlations, with the causal variables of both
             endogenous variables partialled
     Unstandardized variables
          Paths: b weights from the regression equation
          Disturbance variance: the variance of the endogenous
             variable times one minus the multiple correlation
             squared
          Curved lines between exogenous variables: covariances
          Curved lines between disturbances: partial covariances,
             with the causal variables of both endogenous variables
             partialled

Steps in Testing
     STEP ONE: TEST OF DELETED PATHS
          Respecify the model to make it just-identified. That
          is, add the paths that the model specifies to be zero
          and include them in the model.  Test the paths
          specified to be zero, making sure that the specified
          paths are included in the equation but not tested.
          These tests may be done with a reduced alpha (e.g., .01).
     STEP TWO: TEST OF SPECIFIED PATHS
          Retaining the significant paths from the previous step,
          test the paths that were specified to be present in the
          model.  Sometimes these tests are done hierarchically.
     STEP THREE: TRIMMED MODEL
          Re-estimate the model, (a) including the paths that were
          specified to be zero but were significant in step one
          and (b) dropping the paths that were specified but were not
          significant in step two.

Types of Tests
     Test of the individual paths
          standard t or F test of the coefficient
     F test of all of the paths in a given equation

          \[ F = \frac{(N - p - 1)(R_2^2 - R_1^2)}{k\,(1 - R_2^2)} \]

          with k and N - p - 1 degrees of freedom, where
               N        overall sample size
               p        the number of deleted plus specified paths
               k        the number of deleted paths
               R_1^2    multiple correlation squared (not adjusted)
                        from the equation with only the specified
                        paths
               R_2^2    multiple correlation squared (not adjusted)
                        from the equation with the specified and
                        deleted paths

The combined test of all of the paths in the model is usually a chi square
goodness of fit test from a SEM program such as AMOS, EQS, or LISREL.

Determination of Deleted Paths
If all of the paths in the model can be estimated by multiple
regression, the number of deleted paths equals the number of knowns minus
the number of unknowns, that is, the degrees of freedom of the specified model.
To make the model just-identified, examine each pair of variables and
identify the pairs that are not linked by a path or a
correlation (including a correlation between disturbances).  Add a path or
a correlation between disturbances for each of these unlinked pairs.
The direction of the path is given by theory and the requirement that
feedback not be introduced.

Types of Links Between Two Variables
     Direct effect:  Either X causes Y, Y causes X, or both.
     Indirect effect:  The relationship between X and Y is said to be
     indirect if X causes Z, which in turn causes Y.
     Spuriousness:  The relationship between X and Y is said to be spurious
     if Z causes both X and Y.
     Unexplained covariation:  Both X and Y are exogenous, and so the
     covariation between them is not explained by the model.

Decomposition of a Correlation
     Correlation between two endogenous variables:
          Correlation = Direct Effect + Indirect Effects + Spuriousness
     Correlation between an endogenous variable and an exogenous variable:
          Correlation = Direct Effect + Indirect Effects + Unspecified Covariance
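
The estimation rules for standardized variables, the F test of deleted paths, and the decomposition of a correlation can be sketched together for a small model. The Python example below assumes a hypothetical three-variable model in which X1 is exogenous, X1 causes X2 (path a), and X3 is caused by X2 (path b) and X1 (path c). The correlations, the sample size, and all function and variable names are invented for the illustration; the beta weights use the standard closed-form solution for a regression with two standardized predictors.

```python
import math

def two_predictor_betas(r_y1, r_y2, r_12):
    """Beta weights for y regressed on standardized predictors 1 and 2."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_12 * r_y2) / denom
    beta2 = (r_y2 - r_12 * r_y1) / denom
    return beta1, beta2

def f_deleted_paths(n, p, k, r2_specified, r2_full):
    """F = (N - p - 1)(R2^2 - R1^2) / (k (1 - R2^2))."""
    return (n - p - 1) * (r2_full - r2_specified) / (k * (1.0 - r2_full))

# Hypothetical correlations and sample size (not from the page).
r12, r13, r23 = 0.50, 0.40, 0.60
n = 200

# Paths: beta weights from each regression equation.
a = r12                                     # X2 on X1 (one predictor)
c, b = two_predictor_betas(r13, r23, r12)   # X3 on X1 (c) and X2 (b)

# Disturbance paths: square root of one minus the multiple correlation squared.
r2_x2 = a ** 2
r2_x3 = c * r13 + b * r23
disturbance_x2 = math.sqrt(1.0 - r2_x2)
disturbance_x3 = math.sqrt(1.0 - r2_x3)

# Decomposition of correlations.
# X3 with exogenous X1: direct effect (c) + indirect effect via X2 (a * b).
implied_r13 = c + a * b
# X3 with endogenous X2: direct effect (b) + spuriousness via X1 (a * c).
implied_r23 = b + a * c

# F test of one deleted path: suppose the theory specifies c = 0, so the
# specified equation for X3 contains only X2 (R1^2 = r23^2).
f_c = f_deleted_paths(n, p=2, k=1, r2_specified=r23 ** 2, r2_full=r2_x3)

print("paths a, b, c:", round(a, 3), round(b, 3), round(c, 3))
print("disturbance paths:", round(disturbance_x2, 3), round(disturbance_x3, 3))
print("implied r13, r23:", round(implied_r13, 3), round(implied_r23, 3))
print("F for deleted path c:", round(f_c, 3))
```

Because this model is just-identified, the implied correlations reproduce the input correlations exactly. In practice the beta weights and multiple correlations would come from a regression program run on raw data rather than from these closed-form expressions.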