# Introduction to Linear Regression Model Building

A major role of epidemiologic research is to identify and quantify associations between explanatory (independent) and outcome (dependent) variables. For example, you may be interested in determining the effect of:

• management practices (explanatory) on disease status (outcome);
• milk yield (explanatory) on mastitis incidence (outcome);
• retention of placenta (explanatory) on reproductive performance (outcome).

Regression analysis is the standard technique for evaluating these associations. Linear regression is used when the outcome variable is quantitative (numerical) and the association between the outcome and the explanatory variables is approximately linear.

## Linear regression model

A linear regression model can be represented as:

y = α + β₁x₁ + β₂x₂ + β₃x₃ + ε

where,

• y is the quantitative outcome variable, and x₁, x₂ and x₃ are explanatory variables, each of which can be either quantitative or categorical.
• β₁ is the change in the expected value of y for a one-unit increase in x₁, after adjusting for all other variables in the model; β₂ is the corresponding change for a one-unit increase in x₂, and so on.
• A positive β indicates that the expected value of y increases as the respective x increases, a negative β indicates that it decreases, and a β of zero indicates no linear association between y and that x.
• α is the expected value of y when all x's are equal to zero; it is usually not biologically meaningful.
• ε is the random error, which is assumed to be normally distributed with mean zero and variance σ².
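
As a rough illustration of the model described above, the following sketch simulates data from a three-variable linear model and recovers the coefficients by ordinary least squares using NumPy. Everything specific in it is an assumption made for illustration and does not come from the text: the variable meanings, the sample size, the "true" values of α, β and σ, and the choice of a 0/1 coding for the categorical variable.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500  # hypothetical sample size

# Hypothetical explanatory variables (names and distributions are illustrative)
x1 = rng.normal(30.0, 5.0, n)   # quantitative, e.g. a milk-yield-like measure
x2 = rng.integers(0, 2, n)      # binary categorical variable coded 0/1
x3 = rng.normal(4.0, 1.5, n)    # quantitative, with no true effect on y

# Assumed "true" parameters used only to generate the data
alpha = 10.0
beta = np.array([0.5, -2.0, 0.0])
sigma = 1.0

# Random error: normal with mean zero and variance sigma**2
eps = rng.normal(0.0, sigma, n)
y = alpha + beta[0] * x1 + beta[1] * x2 + beta[2] * x3 + eps

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2, x3])

# Ordinary least squares estimates of (alpha, beta1, beta2, beta3)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

# Estimate of the error variance from the residuals
residuals = y - X @ coef
sigma2_hat = residuals @ residuals / (n - X.shape[1])

print("intercept estimate:", alpha_hat)
print("slope estimates:", beta_hat)
print("error variance estimate:", sigma2_hat)
```

With these settings, the estimate for β₁ should be positive, the estimate for β₂ negative, and the estimate for β₃ close to zero, matching the interpretation of the sign of β given above. In practice a dedicated statistics package would typically be used instead, since it also reports standard errors and confidence intervals.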