Received 10 February 2018 Revised Accepted

Simple regression (regression analysis with a single explanatory variable), and multiple regression (regression models with multiple explanatory variables), typically correspond to very different biological questions. The former use regression lines to describe univariate associations. The latter describe the partial, or direct, effects of multiple variables, conditioned on one another. We suspect that the superficial similarity of simple and multiple regression leads to confusion in their interpretation. A clear understanding of these methods is essential, as they underlie a large range of procedures in common use in biology. Beyond simple and multiple regression in their most basic forms, understanding the key principles of these procedures is critical to understanding, and properly applying, many methods, such as mixed models, generalised models, and causal inference using graphs (including path analysis and its extensions). A simple, but careful, look at the distinction between these two analyses is valuable in its own right, and can also be used to clarify widely-held misconceptions about collinearity (correlations among explanatory variables). There is no general sense in which collinearity is a problem. We suspect that the perception of collinearity as a hindrance to analysis stems from misconceptions about interpretation of multiple regression models, and so we pursue discussions about these misconceptions in this light. In particular, collinearity causes multiple regression coefficients to be less precisely estimated than corresponding simple regression coefficients. This should not be interpreted as a problem, as it is perfectly natural that direct effects should be harder to characterise than univariate associations. Purported solutions to the perceived problems of collinearity are detrimental to most biological analyses.

Simple and multiple regression are two of the most-used statistical procedures in biology. Statistical results from both procedures are commonly interpreted as metrics of the degree of relationship between (sometimes multiple) explanatory and response variables. This rough interpretation may generally be satisfactory for simple regressions, i.e., models involving only one explanatory variable. However, this interpretation can lead to confusion for multiple regression, where the coefficients of a multiple regression model measure something subtly but crucially different. In this paper we first discuss what multiple regression is, what coefficients arising from multiple regression analysis are, and how the purpose of multiple regression is different from that of simple regression. We then build on this to clear up some misleading advice that we consider prevalent on the consequences of (multi)collinearity, i.e., of correlations among explanatory variables. Our primary goal is to aid empiricists in understanding what aspects of the literature on collinearity should be considered when conducting analyses of data sets wherein correlations occur among predictor variables. We argue that the biostatistical literature contains a great deal of advice that could be leading empiricists to believe that they are seriously violating assumptions of their models, when in fact they are not.
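The distinction between univariate associations and direct effects, and the loss of precision that collinearity brings, can be illustrated with a small simulation. The following is a minimal sketch and not taken from the paper: the function names `simulate` and `ols`, the true coefficients (1.0 and 0.5), the correlation values, and the sample size are all assumptions chosen for illustration, and only NumPy is used.

```python
import numpy as np


def simulate(rho, n=5000, b1=1.0, b2=0.5, seed=0):
    """Draw two unit-variance predictors with correlation rho and a response
    that depends directly on both (hypothetical coefficients b1 and b2)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = b1 * X[:, 0] + b2 * X[:, 1] + rng.normal(size=n)
    return X, y


def ols(X, y):
    """Least-squares fit with an intercept.

    X must be a 2-D array (n rows, one column per predictor).
    Returns (coefficients, standard errors), intercept first.
    """
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])  # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return beta, se


for rho in (0.0, 0.9):
    X, y = simulate(rho)
    simple_beta, simple_se = ols(X[:, [0]], y)  # y on x1 only
    multi_beta, multi_se = ols(X, y)            # y on x1 and x2 jointly
    print(f"rho={rho}: simple slope for x1 = {simple_beta[1]:.3f} "
          f"(SE {simple_se[1]:.3f}); multiple regression coefficient for x1 = "
          f"{multi_beta[1]:.3f} (SE {multi_se[1]:.3f})")
```

Under these assumptions, the simple regression slope for `x1` approaches `b1 + rho * b2`, the total univariate association, while the multiple regression coefficient stays near `b1`, the direct effect. The standard error of the multiple regression coefficient grows roughly in proportion to `1 / sqrt(1 - rho**2)` as the predictors become more correlated, yet the estimate itself remains centred on the true direct effect, which is the sense in which reduced precision is a natural cost rather than a violated assumption.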