A parametric statistical technique for identifying the relationship between a dependent variable and one or more independent variables. The data employed in regression analyses should be at the interval or ratio level of measurement (though nominally measured independent variables, known as dummy variables, may be included).
The technique fits a plane to the trend in a scatter of points in n dimensions (where n is the total number of variables being investigated). It is most easily visualized in the two-dimensional case (i.e. with one independent variable only), in which the plane reduces to a straight line (see figure) whose parameters are given by the equation:
Y = a + bX + e
where X is the independent (or causal) variable, Y is the dependent (or effect) variable, b is the slope of the line (often termed the regression coefficient), a is its intercept (the point where the regression line crosses the vertical axis, i.e. the value of Y when X = 0.0; it is also termed the constant) and e is the error term, representing the residuals (cf. significance test).
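As an illustration only (the data and the use of NumPy are assumptions, not part of the source), the following sketch fits the simple equation above by ordinary least squares, with X, Y, a, b and e as defined above:

```python
# Minimal sketch of fitting Y = a + bX + e by ordinary least squares.
# The data values are hypothetical, chosen only to illustrate the technique.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent (causal) variable
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # dependent (effect) variable

b, a = np.polyfit(X, Y, deg=1)            # slope (regression coefficient) and intercept
residuals = Y - (a + b * X)               # the error term e for each observation

print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```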
A multiple regression equation contains more than one independent variable and has the general form:
Y = a + b1X1 + b2X2 + … + bkXk + e (for k independent variables)
In this equation each value of b (termed a partial regression coefficient) indicates the change in the value of Y with a unit change in the value of the relevant X variable, assuming no change in the values of the other X variables (i.e. they are 'held constant', in the technical jargon). The intercept coefficient (a) indicates the estimated value of Y when all of the X variables are set to 0.0.
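Again purely as an illustration (hypothetical data and an assumed NumPy least-squares routine, not the source's own example), a multiple regression with two independent variables can be estimated in the same way; b1 and b2 below are the partial regression coefficients described above:

```python
# Minimal sketch of a multiple regression Y = a + b1*X1 + b2*X2 + e,
# estimated by least squares on illustrative data.
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.2, 3.9, 7.1, 7.8, 11.2, 11.0])

# Design matrix: a column of ones for the intercept a, then the X variables.
design = np.column_stack([np.ones_like(X1), X1, X2])
coeffs, *_ = np.linalg.lstsq(design, Y, rcond=None)
a, b1, b2 = coeffs

# b1 estimates the change in Y for a unit change in X1 with X2 held constant,
# and likewise for b2; a estimates Y when both X variables are 0.0.
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```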
[Figure: regression — a straight regression line fitted to a two-dimensional scatter of points]
The goodness-of-fit of a regression line (i.e. its closeness to all of the points in the scatter) is measured by a correlation coefficient. The goodness-of-fit for each separate variable in a multiple regression is termed the partial correlation coefficient.
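As a brief sketch (reusing the hypothetical data from the first example above, not data from the source), the correlation coefficient measuring goodness-of-fit in the two-variable case can be computed directly:

```python
# Correlation coefficient r between X and Y; its square (r^2) is the
# proportion of variance in Y accounted for by the fitted regression line.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(X, Y)[0, 1]   # Pearson correlation coefficient
print(f"r = {r:.3f}, r squared = {r**2:.3f}")
```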
Regression analysis, like other techniques within the general linear model, makes a variety of assumptions about the data used. If these are not met in the data set being analysed, the estimated coefficients are likely to be inefficient, biased, or both. (RJJ)
Suggested Reading
Barnes, T.J. 1998: A history of regression. Environment and Planning A 30: 203-23.
Johnston, R.J. 1978: Multivariate statistical analysis in geography: a primer on the general linear model. London and New York: Longman.
O'Brien, L. 1992: Introducing quantitative geography: measurement, methods and generalised linear models. London and New York: Routledge.