David A. Kenny
October 18, 2009
Review of Multiple Regression
The example equation:
Y = a + bX + cZ + e
Y criterion variable
X, Z predictor variables
a intercept: the predicted value of Y when all the predictors are zero
b, c regression coefficients: b is how much of a difference in Y results from a one-unit difference in X, holding Z constant (c is defined the same way for Z)
e residual
Y' predicted Y given X and Z, or equivalently a + bX + cZ (often called "Y hat")
R multiple correlation: the correlation between Y and Y'
The coefficients (a, b, and c) are chosen so that the sum of squared errors
is minimized. The estimation technique is then called least squares or
ordinary least squares (OLS). Given the criterion of least squares, the
mean of the errors is zero and the errors correlate zero with each predictor.
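The least-squares properties above can be checked numerically. The sketch below is only an illustration: the simulated data, sample size, and true coefficient values are assumptions, not from the original. It fits Y = a + bX + cZ + e with NumPy's least-squares solver and then confirms that the residuals have mean zero and correlate zero with each predictor (up to floating-point rounding). It also computes R as the correlation between Y and Y'.

```python
import numpy as np

# Simulated data (an assumption for illustration; not from the original page)
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
Z = 0.5 * X + rng.normal(size=n)
Y = 2.0 + 1.5 * X - 0.8 * Z + rng.normal(size=n)

# Design matrix: a column of ones for the intercept a, then X and Z
D = np.column_stack([np.ones(n), X, Z])
# lstsq minimizes the sum of squared errors, i.e. ordinary least squares
(a, b, c), *_ = np.linalg.lstsq(D, Y, rcond=None)

Y_hat = a + b * X + c * Z  # Y', the predicted Y
e = Y - Y_hat              # residuals

R = np.corrcoef(Y, Y_hat)[0, 1]  # multiple correlation
print("a, b, c:", a, b, c)
print("mean of residuals:", e.mean())       # zero up to rounding
print("r(e, X):", np.corrcoef(e, X)[0, 1])  # zero up to rounding
print("r(e, Z):", np.corrcoef(e, Z)[0, 1])  # zero up to rounding
print("R:", R)
```

The zero mean of the residuals depends on the model including an intercept. That is why the design matrix carries a column of ones.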
If the predictor and criterion variables are all standardized, the
regression coefficients are called beta weights. A beta weight equals the correlation when there is a single predictor. If there are two or more predictors, beta weights can be larger than +1 or smaller than -1.
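Beta weights can be computed directly from correlations. They solve R_xx · beta = r_xy, where R_xx is the correlation matrix of the predictors and r_xy holds each predictor's correlation with Y. The sketch below uses hypothetical correlations chosen for illustration: r_XZ = .8, r_YX = .5, and r_YZ = 0. The helper name `beta_weights` is ours and is not from any library. With one predictor the beta weight reproduces r. With the two correlated predictors, one beta weight exceeds +1 and the other falls below -1.

```python
import numpy as np

def beta_weights(R_xx, r_xy):
    """Standardized coefficients: solve R_xx @ beta = r_xy."""
    return np.linalg.solve(np.asarray(R_xx, float), np.asarray(r_xy, float))

# One predictor: the beta weight equals the correlation r = .5
single = beta_weights([[1.0]], [0.5])

# Two predictors correlated .8; Y correlates .5 with X and 0 with Z
# (hypothetical values chosen only to illustrate)
R_xx = [[1.0, 0.8],
        [0.8, 1.0]]
r_xy = [0.5, 0.0]
two = beta_weights(R_xx, r_xy)

print(single)  # the single beta weight
print(two)     # roughly 1.39 for X and -1.11 for Z
```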
The predictors in a regression equation have no order and one cannot be
said to enter before the other.
Generally in interpreting a regression equation, it makes no scientific
sense to speak of the variance due to a given predictor. Measures of
variance depend on the order of entry in step-wise regression and on the
correlation between the predictors. Also the semi-partial correlation or
unique variance has little interpretative utility.
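The dependence on order of entry can be shown with a small calculation. The sketch assumes hypothetical correlations r_YX = .5, r_YZ = .4, and r_XZ = .6. The helper `r_squared` is illustrative, not a library function. It computes the full-model R² from correlations. It then computes how much variance X "gets" when it enters first versus when it enters after Z. The second quantity is X's squared semi-partial correlation.

```python
import numpy as np

def r_squared(R_xx, r_xy):
    """Squared multiple correlation from predictor correlations R_xx and
    predictor-criterion correlations r_xy: R^2 = beta . r_xy."""
    r_xy = np.asarray(r_xy, float)
    beta = np.linalg.solve(np.asarray(R_xx, float), r_xy)
    return float(beta @ r_xy)

# Hypothetical correlations, chosen only for illustration
r_yx, r_yz, r_xz = 0.5, 0.4, 0.6
full = r_squared([[1.0, r_xz], [r_xz, 1.0]], [r_yx, r_yz])

x_first = r_yx ** 2         # variance credited to X if X enters first
x_second = full - r_yz ** 2  # variance credited to X if X enters after Z
z_first = r_yz ** 2
z_second = full - r_yx ** 2

print("R^2 for X and Z together:", full)  # about .266
print("X entered first:", x_first, " X entered second:", x_second)
print("Z entered first:", z_first, " Z entered second:", z_second)
```

In this example, X is credited with .25 or about .106 of the variance depending only on when it enters, while the fitted equation itself is the same either way.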
The standard test of a specified regression coefficient is to determine
if the multiple correlation significantly declines when the predictor
variable is removed from the equation and the other predictor variables
remain. In most computer programs this test is given by the t or
F statistic reported next to the coefficient.
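The test described above can be reproduced by fitting the equation with and without the predictor. The sketch below is a NumPy illustration on simulated data, and the data are an assumption. It computes the F for the decline in R² when X is dropped while Z stays in. It also computes the t for b as the coefficient divided by its standard error. For a single coefficient, t² equals that F, so the two statistics give the same test.

```python
import numpy as np

# Simulated data (an assumption for illustration only)
rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=n)
Z = 0.4 * X + rng.normal(size=n)
Y = 1.0 + 0.5 * X + 0.3 * Z + rng.normal(size=n)

def fit(D, y):
    """OLS fit; returns coefficients and the sum of squared errors."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    return coef, float(resid @ resid)

ones = np.ones(n)
full_D = np.column_stack([ones, X, Z])  # a, b, c
reduced_D = np.column_stack([ones, Z])  # X removed, Z remains

coef, sse_full = fit(full_D, Y)
_, sse_reduced = fit(reduced_D, Y)

sst = float(((Y - Y.mean()) ** 2).sum())
R2_full = 1 - sse_full / sst
R2_reduced = 1 - sse_reduced / sst
df_resid = n - full_D.shape[1]

# F (1 numerator df) for the decline in R^2 when X is dropped
F = (R2_full - R2_reduced) / ((1 - R2_full) / df_resid)

# t for b: coefficient divided by its standard error
mse = sse_full / df_resid
cov = mse * np.linalg.inv(full_D.T @ full_D)
t_b = coef[1] / np.sqrt(cov[1, 1])

print("F:", F, " t^2:", t_b ** 2)  # equal up to rounding
```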
Multicollinearity
If two predictors are highly correlated or if one predictor has a large
multiple correlation with the other predictors, there is said to be
multicollinearity. With perfect multicollinearity (correlations of plus
or minus one), estimation of regression coefficients is impossible.
Multicollinearity results in large standard errors for coefficients, so
obtaining a statistically significant regression coefficient is difficult
(power is low).
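One common way to quantify this is the variance inflation factor (VIF). This is a standard diagnostic, although the original page does not name it. The sampling variance of a predictor's coefficient is multiplied by 1 / (1 - R_j²), where R_j² is the squared multiple correlation of that predictor with the other predictors. The sketch below uses a helper name, `vif`, that is ours. It prints how much the standard error grows, relative to uncorrelated predictors, as the correlation between two predictors rises.

```python
import math

def vif(r2_with_others):
    """Variance inflation factor: 1 / (1 - R_j^2), where R_j^2 is the squared
    multiple correlation of predictor j with the other predictors."""
    return 1.0 / (1.0 - r2_with_others)

# With only two predictors, R_j^2 is simply the squared correlation r_XZ^2
for r_xz in (0.0, 0.5, 0.9, 0.99):
    v = vif(r_xz ** 2)
    # Standard errors scale with the square root of the VIF
    print(f"r_XZ = {r_xz:4.2f}  VIF = {v:7.2f}  SE multiplier = {math.sqrt(v):5.2f}")
```

At r_XZ = 1, `vif` divides by zero. This mirrors the point that coefficients cannot be estimated under perfect multicollinearity.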
Example
Consider the hypothetical regression equation in which Age and Gender (1 = Male and -1 = Female) predict weight:
Weight = 50 + 25(Gender) + 3(Age) + Error
We interpret the coefficients as follows:
intercept: the predicted weight for people who are zero years of age and halfway between male and female (Gender = 0)
gender: given the coding, the difference between the codes for men and women is 2, so there is a 50-pound difference (2 × 25) between the two groups
age: a difference of one year in age results in a difference of 3 pounds
The multiple correlation would represent the correlation between Weight and Predicted Weight; the squared multiple correlation would represent the proportion of variance in Weight explained by Age and Gender. Note that for a male aged 50 years, the predicted weight would be 50 + 25 + 150 = 225.
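The arithmetic of the hypothetical weight equation can be written as a one-line function. The function name `predicted_weight` is ours, for illustration. It reproduces the 225-pound prediction for a 50-year-old male and the 50-pound gap between men and women of the same age.

```python
def predicted_weight(age, gender):
    """Hypothetical equation from the example: Weight = 50 + 25(Gender) + 3(Age),
    with gender coded 1 = male and -1 = female."""
    return 50 + 25 * gender + 3 * age

male_50 = predicted_weight(50, 1)     # 50 + 25 + 150
female_50 = predicted_weight(50, -1)  # 50 - 25 + 150
print(male_50, female_50, male_50 - female_50)  # 225 175 50
```

Calling `predicted_weight(0, 0)` returns the intercept, 50. That is the prediction for age zero, halfway between the two gender codes.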
Another site that more extensively describes multiple regression: Statsoft