David A. Kenny
October 18, 2009


Review of Multiple Regression

The example equation:

          Y = a + bX + cZ + e

     Y   criterion variable
     X, Z   predictor variables
     a   intercept: the predicted value of Y when all the predictors are zero
     b, c   regression coefficients: how much of a difference in Y results from a one-unit difference in X (for b) or in Z (for c), with the other predictor held constant
     e   residual
     Y'   predicted Y given X and Z, or equivalently a + bX + cZ (often called "Y hat")
     R   multiple correlation: the correlation between Y and Y'

The coefficients (a, b, and c) are chosen so that the sum of squared errors is minimized. The estimation technique is then called least squares or ordinary least squares (OLS). Given the criterion of least squares, the mean of the errors is zero and the errors are uncorrelated with each predictor.
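These least-squares properties can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the simulated data, the seed, and the "true" coefficient values are all made up for illustration and are not from this page.

```python
import numpy as np

# Hypothetical simulated data; the seed and true coefficients are arbitrary.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
Z = 0.5 * X + rng.normal(size=n)
Y = 1.0 + 2.0 * X - 1.5 * Z + rng.normal(size=n)

# Design matrix: a column of ones (for the intercept a), then X and Z.
D = np.column_stack([np.ones(n), X, Z])

# Least squares chooses a, b, c to minimize the sum of squared errors.
coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
a, b, c = coef

Y_hat = D @ coef   # Y', the predicted Y
e = Y - Y_hat      # the errors (residuals)

mean_e = e.mean()                    # zero up to rounding error
r_eX = np.corrcoef(e, X)[0, 1]       # zero up to rounding error
r_eZ = np.corrcoef(e, Z)[0, 1]       # zero up to rounding error
R = np.corrcoef(Y, Y_hat)[0, 1]      # multiple correlation

print(a, b, c)
print(mean_e, r_eX, r_eZ, R)
```

The zero mean of the errors depends on the intercept being in the equation, which is why the column of ones is included.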

If the predictor and criterion variables are all standardized, the regression coefficients are called beta weights. A beta weight equals the correlation when there is a single predictor. If there are two or more predictors, a beta weight can be larger than +1 or smaller than -1.
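As an illustration of both statements, beta weights can be computed directly from correlations by solving the standardized normal equations. This is a sketch assuming NumPy; the helper name beta_weights and the correlation values (0.6, 0.1, and 0.8) are hypothetical, chosen only so that the weights fall outside the range of -1 to +1.

```python
import numpy as np

def beta_weights(R_xx, r_xy):
    """Solve R_xx @ beta = r_xy, where R_xx holds the correlations among
    the predictors and r_xy the correlation of each predictor with Y."""
    return np.linalg.solve(np.asarray(R_xx, dtype=float),
                           np.asarray(r_xy, dtype=float))

# Single predictor: the beta weight is just the correlation with Y.
beta_single = beta_weights([[1.0]], [0.6])

# Two predictors (hypothetical values): X and Z correlate .8,
# X correlates .6 with Y, and Z correlates .1 with Y.
R_xx = [[1.0, 0.8],
        [0.8, 1.0]]
r_xy = [0.6, 0.1]
beta_two = beta_weights(R_xx, r_xy)

# Squared multiple correlation: sum of each beta times its correlation with Y.
R_squared = float(beta_two @ np.asarray(r_xy))

print(beta_single, beta_two, R_squared)
```

With these made-up correlations the weight for X is above +1 and the weight for Z is below -1, yet the squared multiple correlation stays below one.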

The predictors in a regression equation have no order and one cannot be said to enter before the other.

Generally in interpreting a regression equation, it makes no scientific sense to speak of the variance due to a given predictor. Measures of variance depend on the order of entry in step-wise regression and on the correlation between the predictors. Also the semi-partial correlation or unique variance has little interpretative utility.
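The dependence on order of entry can be seen in a small simulation. This is a sketch assuming NumPy; the data, the seed, and the helper r_squared are invented for illustration.

```python
import numpy as np

def r_squared(predictors, y):
    """Squared multiple correlation from an OLS fit with an intercept."""
    D = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Hypothetical data in which X and Z are positively correlated.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=n)
Z = 0.7 * X + 0.7 * rng.normal(size=n)
Y = X + Z + rng.normal(size=n)

r2_x = r_squared([X], Y)
r2_z = r_squared([Z], Y)
r2_xz = r_squared([X, Z], Y)

# "Variance due to X" under the two possible orders of entry.
x_entered_first = r2_x             # X alone
x_entered_second = r2_xz - r2_z    # increment after Z is already in

print(x_entered_first, x_entered_second)
```

The two figures for X differ considerably even though the fitted equation with both predictors is the same in either case.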

The standard test of a specified regression coefficient is to determine if the multiple correlation significantly declines when the predictor variable is removed from the equation and the other predictor variables remain. In most computer programs this test is given by the t or F next to the coefficient.
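The equivalence between the decline in the multiple correlation and the t next to the coefficient can be sketched as follows, assuming NumPy. The simulated data and the helper fit are hypothetical; the F is the usual one for the change in R squared when a single predictor is removed, and the t is the coefficient divided by its standard error.

```python
import numpy as np

def fit(D, y):
    """Return the OLS coefficients and the sum of squared errors."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    return coef, float(resid @ resid)

# Hypothetical simulated data.
rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=n)
Z = 0.5 * X + rng.normal(size=n)
Y = 1.0 + 0.4 * X + 0.3 * Z + rng.normal(size=n)

ones = np.ones(n)
D_full = np.column_stack([ones, X, Z])
D_reduced = np.column_stack([ones, Z])   # X removed, Z remains

coef_full, sse_full = fit(D_full, Y)
_, sse_reduced = fit(D_reduced, Y)

sst = float(np.sum((Y - Y.mean()) ** 2))
r2_full = 1.0 - sse_full / sst
r2_reduced = 1.0 - sse_reduced / sst
df_resid = n - D_full.shape[1]           # n minus the number of coefficients

# F test of the decline in R squared when X is removed (1 numerator df).
F = (r2_full - r2_reduced) / ((1.0 - r2_full) / df_resid)

# t test of the coefficient for X: estimate divided by its standard error.
mse = sse_full / df_resid
cov_coef = mse * np.linalg.inv(D_full.T @ D_full)
t = coef_full[1] / np.sqrt(cov_coef[1, 1])

print(F, t ** 2)   # the two agree up to rounding error
```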

Multicollinearity

If two predictors are highly correlated or if one predictor has a large multiple correlation with the other predictors, there is said to be multicollinearity. With perfect multicollinearity (correlations of plus or minus one), estimation of regression coefficients is impossible. Multicollinearity results in large standard errors for coefficients, and so a statistically significant regression coefficient is difficult to obtain (power is low).
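The effect on standard errors can be illustrated with a sketch that assumes NumPy. The data are simulated, and the helper se_of_b and the chosen correlations (0, .9, and .99) are hypothetical. Z is built from X and an unrelated variable W so that the correlation between X and Z is approximately r.

```python
import numpy as np

# Hypothetical simulated ingredients; the seed is arbitrary.
rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=n)
W = rng.normal(size=n)       # used only to construct Z
noise = rng.normal(size=n)

def se_of_b(r):
    """Standard error of the coefficient for X when X and Z correlate about r."""
    Z = r * X + np.sqrt(1.0 - r ** 2) * W
    Y = X + Z + noise
    D = np.column_stack([np.ones(n), X, Z])
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    resid = Y - D @ coef
    mse = (resid @ resid) / (n - D.shape[1])
    cov_coef = mse * np.linalg.inv(D.T @ D)
    return float(np.sqrt(cov_coef[1, 1]))

standard_errors = {r: se_of_b(r) for r in (0.0, 0.9, 0.99)}
print(standard_errors)   # the standard error grows as r approaches 1

# Perfect multicollinearity: Z = -X (correlation of minus one).
D_perfect = np.column_stack([np.ones(n), X, -X])
rank = np.linalg.matrix_rank(D_perfect)
print(rank)   # 2, not 3: the three coefficients cannot be separately estimated
```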

Example

Consider the hypothetical regression equation in which Age and Gender (1 = Male and -1 = Female) predict weight:

Weight = 50 + 25(Gender) + 3(Age) + Error

We interpret the coefficients as follows:
      intercept: the predicted weight for people who are zero years of age and half way between male and female
      gender: given the coding, men and women differ by 2 units on Gender, and so there is a 50 pound difference between the two groups
      age: a difference of one year in age results in a difference of 3 pounds

The multiple correlation would represent the correlation between Weight and Predicted Weight; the multiple correlation squared would represent the proportion of variance in Weight explained by Age and Gender.  Note that for a male aged 50 years, the predicted weight would be 50 + 25 + 150 = 225.
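The arithmetic of this example can be written as a short function. This is a sketch only; the name predicted_weight is made up here, and the coefficients are those of the hypothetical equation above.

```python
def predicted_weight(gender, age):
    """Predicted weight in pounds from the hypothetical equation.

    gender: 1 for male, -1 for female
    age: age in years
    """
    return 50 + 25 * gender + 3 * age

male_50 = predicted_weight(1, 50)      # 50 + 25 + 150 = 225
female_50 = predicted_weight(-1, 50)   # 50 - 25 + 150 = 175

gender_gap = male_50 - female_50       # 2 units of Gender times 25 = 50
age_step = predicted_weight(1, 51) - predicted_weight(1, 50)   # 3

print(male_50, female_50, gender_gap, age_step)
```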

Another site that more extensively describes multiple regression:
Statsoft
