## Including or Excluding Independent Variables

### Independent variables: To Be (In), or Not To Be (In), that is the question

### Do I have to be in the model?

How do we decide whether to include an independent variable in our model,
and how do we decide whether to exclude one?

If we have developed, or hypothesised, a model and we find that each of
the factors is statistically significant, then we will keep them in our
model; there is no problem there!

Consider an experiment where we think that two factors, X_{1} and X_{2},
are likely to be important and to have a real effect on our dependent
variable (Y). We have quite a large sample size. Suppose we obtain:

b_{1} = 0.5 ± 0.05 => t_{calc} = 0.5/0.05 = 10

and b_{2} = 0.4 ± 0.02 => t_{calc} = 0.4/0.02 = 20

Both are statistically significant, so we retain them in the model;
there is no problem.
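As a minimal sketch of the arithmetic above, each t ratio is simply the estimate divided by its standard error. The function name `t_ratio` and the critical value of about 2 (the usual large-sample, two-sided 5% rule of thumb) are assumptions made for illustration; they are not stated in the example itself.

```python
def t_ratio(estimate, std_error):
    """t statistic for testing H0: coefficient = 0."""
    return estimate / std_error

# Assumed rule-of-thumb critical value for a large sample (two-sided, 5%).
T_CRIT = 2.0

results = {}
for name, b, se in [("b1", 0.5, 0.05), ("b2", 0.4, 0.02)]:
    t = t_ratio(b, se)
    results[name] = (t, abs(t) > T_CRIT)
    print(f"{name}: t = {t:.1f}, significant = {abs(t) > T_CRIT}")
```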

However, if a smaller sample size had been used, the standard errors we
obtained would likely have been larger. If the sample were smaller by a
factor of n, the sampling variance would be n times larger, and so the
standard errors would be larger by a factor of the square root of n. So if
we had a sample size only 1/4 as large, then the sampling variance would be
4 times as large and the standard errors would be twice as large. **N.B.**
This comes from basic, introductory statistics; sampling variances are
inversely proportional to the sample size.
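The scaling rule can be checked empirically. The sketch below is not a regression; it uses the simpler case of a sample mean, drawing repeated samples of size 100 and of size 25 from a standard normal distribution and comparing the spread of the resulting means. The sample sizes, number of repetitions, seeds, and the function name `sd_of_sample_means` are all arbitrary choices made for illustration.

```python
import random
import statistics

def sd_of_sample_means(n, reps=2000, seed=0):
    """Empirical standard error: SD of the means of `reps` samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

se_large = sd_of_sample_means(100, seed=0)  # the "large" sample
se_small = sd_of_sample_means(25, seed=1)   # a sample 1/4 as large

# With 1/4 of the data, the standard error should be roughly twice as large.
ratio = se_small / se_large
print(f"SE(n=100) = {se_large:.3f}, SE(n=25) = {se_small:.3f}, ratio = {ratio:.2f}")
```

Because this is a simulation, the ratio will be close to 2 rather than exactly 2.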

Thus suppose that we had obtained:

b_{1} = 0.5 ± 0.1 => t_{calc} = 0.5/0.1 = 5

and b_{2} = 0.4 ± 0.04 => t_{calc} = 0.4/0.04 = 10
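These figures follow directly from the square-root rule. As a rough sketch, assuming the estimates themselves stay the same and only the standard errors change, the helper below (the name `rescale_se` is hypothetical) rescales a standard error for a sample that is a given fraction of the original size and recomputes the t ratio.

```python
import math

def rescale_se(std_error, sample_fraction):
    """Standard error expected if the sample were `sample_fraction` times as large.

    Sampling variance is proportional to 1/n, so the standard error
    is proportional to 1/sqrt(n).
    """
    return std_error / math.sqrt(sample_fraction)

rescaled = {}
for name, b, se in [("b1", 0.5, 0.05), ("b2", 0.4, 0.02)]:
    new_se = rescale_se(se, 0.25)  # sample only 1/4 as large
    rescaled[name] = (new_se, b / new_se)
    print(f"{name}: SE = {new_se:.2f}, t = {b / new_se:.1f}")
```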

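In this example both coefficients would usually still be judged
significant, but shrink the sample further and b_{1} could easily fail to
reach significance. Now, if b_{1} is not statistically significant
(because of the smaller sample size) and we accept H_{o} (that there is no
effect of factor X_{1}) and drop X_{1} from the model, we may **seriously
bias the estimates for the other factors!**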

Even if we cannot reject H_{o}, that does not prove that there is no
relation. So if we have good reason to believe that X_{1}
(b_{1}) has an effect, then we should be reluctant to eliminate it.
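The bias from dropping a real factor can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis from the example: it assumes X_{1} and X_{2} are correlated (correlation about 0.6, an arbitrary choice), that the true coefficients are 0.5 and 0.4, and that the error is standard normal. All function names are hypothetical, and the fitting uses the textbook covariance formulas for one and two predictors.

```python
import random

def simulate(n=5000, rho=0.6, b1=0.5, b2=0.4, seed=1):
    """Generate correlated X1, X2 and Y = b1*X1 + b2*X2 + noise (assumed model)."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = z1
        b = rho * z1 + (1 - rho ** 2) ** 0.5 * z2  # corr(a, b) is about rho
        x1.append(a)
        x2.append(b)
        y.append(b1 * a + b2 * b + rng.gauss(0, 1))
    return x1, x2, y

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def slopes_two_predictors(x1, x2, y):
    """Least-squares slopes for Y on X1 and X2 (intercept included implicitly)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def slope_one_predictor(x, y):
    """Least-squares slope for Y on a single predictor."""
    return cov(x, y) / cov(x, x)

x1, x2, y = simulate()
full_b1, full_b2 = slopes_two_predictors(x1, x2, y)
reduced_b2 = slope_one_predictor(x2, y)  # X1 wrongly dropped from the model

print(f"full model:    b1 = {full_b1:.2f}, b2 = {full_b2:.2f}")
print(f"reduced model: b2 = {reduced_b2:.2f}")
```

In this set-up the reduced model's b_{2} absorbs part of the effect of X_{1}: it is expected to sit near 0.4 + 0.5 × 0.6 = 0.7 rather than 0.4. If X_{1} and X_{2} were uncorrelated, dropping X_{1} would not bias b_{2} in this way.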