When we are planning an experiment we are often faced with deciding upon the number of sub-samples per experimental unit and the number of experimental units required. There is little point in carrying out an experiment if there is only a small chance of being able to detect any differences even if they are real and do exist. We might as well save ourselves the time and effort and instead go and relax on the beach; at least we will get a nice tan!

The two factors, number of sub-samples and sample size, are usually not independent of one another. Most textbooks (and Steel, Torrie and Dickey is no exception) deal only with the simple question of sample size when there are no sub-samples. This is just one particular case, and it often causes confusion when we attempt to plan experiments involving sub-samples (nested designs). However, with a few basic rules and principles it is relatively easy; it requires us to determine the variance of our measurements. We will look at some formulae and specific examples.

Cochran (xx) (STD ...) provides a formula for deciding upon the optimum number of sub-samples per experimental unit; it depends upon the relative variabilities of the experimental units and sub-samples, and their relative costs.

n = \sqrt{(c_{1} \times s^{2}_{e}) / (c_{2} \times s^{2}_{exp})}

where

c_{1} = the cost per experimental unit

c_{2} = the cost per sampling unit (sub-sample)

s^{2}_{e} = the variance amongst sampling units within an experimental unit

s^{2}_{exp} = the variance component amongst experimental units
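A minimal Python sketch of Cochran's formula above. The function name `optimum_subsamples` and its argument names are hypothetical, chosen only for illustration; the function evaluates the square root and rejects non-positive costs or variances, for which the formula is undefined.

```python
import math


def optimum_subsamples(c1: float, c2: float, var_e: float, var_exp: float) -> float:
    """Cochran's optimum number of sub-samples per experimental unit.

    c1: cost per experimental unit
    c2: cost per sampling unit (sub-sample)
    var_e: variance amongst sampling units within an experimental unit
    var_exp: variance component amongst experimental units
    """
    if min(c1, c2, var_e, var_exp) <= 0:
        raise ValueError("costs and variances must all be positive")
    return math.sqrt((c1 * var_e) / (c2 * var_exp))


# Apple-tree example: $20 per tree, $0.12 per apple, s^2_e = 12, s^2_tree = 32.
print(round(optimum_subsamples(20, 0.12, 12, 32), 2))
```

The result is a continuous value; in practice it is rounded to a whole number of sub-samples.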

For example, using the data from section (Random effect) on apple trees, we found that the variance between trees ( s^{2}_{tree} ) was 32 and that the variance amongst apples within trees ( s^{2}_{e} ) was 12. Suppose that we are going to carry out an experiment and that we calculate that it will cost $20 per tree and $0.12 per apple measured. Then the optimum number of apples per tree (sub-samples per experimental unit; n.b. the tree is the experimental unit, so s^{2}_{tree} plays the role of s^{2}_{exp}) is

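n = \sqrt{(20 \times 12) / (0.12 \times 32)} = \sqrt{62.5} \approx 7.91

Thus the optimum number of sub-samples per experimental unit (apples per tree) is about 8. The total cost of achieving a given precision changes very little close to the optimum, so 8 or 9 apples per tree are both reasonable choices; let us decide to measure 9 apples per tree.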
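Why is rounding to a nearby whole number so cheap? A sketch, assuming the total cost is simply (number of trees) x (c_{1} + c_{2} x apples per tree) with no fixed overheads: with r trees of n apples each, the variance of a treatment mean is (s^{2}_{tree} + s^{2}_{e}/n)/r. Holding that variance fixed makes r proportional to (s^{2}_{tree} + s^{2}_{e}/n), so the total cost is proportional to (s^{2}_{tree} + s^{2}_{e}/n)(c_{1} + c_{2} n). Setting the derivative with respect to n to zero gives exactly Cochran's n. The hypothetical helper `relative_cost` below evaluates this product for several n, relative to its minimum.

```python
import math

# Variance components and costs from the apple-tree example.
VAR_TREE = 32.0  # s^2_tree: variance component amongst trees
VAR_E = 12.0     # s^2_e: variance amongst apples within trees
C1 = 20.0        # cost per tree ($)
C2 = 0.12        # cost per apple measured ($)


def relative_cost(n: float) -> float:
    """Total cost for a fixed variance of a treatment mean, up to a constant.

    Assumes cost = r * (C1 + C2 * n) for r trees with n apples each, and
    Var(treatment mean) = (VAR_TREE + VAR_E / n) / r.
    """
    return (VAR_TREE + VAR_E / n) * (C1 + C2 * n)


n_opt = math.sqrt(C1 * VAR_E / (C2 * VAR_TREE))  # Cochran's optimum
best = relative_cost(n_opt)                      # minimum of the cost curve
for n in (1, 4, 7, 8, 9, 10, 15):
    extra = 100 * (relative_cost(n) / best - 1)
    print(f"n = {n:2d}: {extra:5.2f}% above the minimum cost")
```

With these numbers, 8 and 9 apples per tree both come within 0.1% of the minimum cost, whereas a single apple per tree costs roughly 26% more for the same precision.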

How do we calculate sample size? Most statistics textbooks give the classical formula for determining the necessary sample size:

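n \geq (Z_{\alpha/2} + Z_{\beta})^{2} \times (s / d)^{2}

where Z_{\alpha/2} and Z_{\beta} are standard normal deviates for a two-sided significance level \alpha and power 1 - \beta, d is the smallest difference we wish to detect, and s is the standard deviation of the measurement of an experimental unit. As written, this gives the number of experimental units needed to test a single mean (or a mean of paired differences); to compare two independent treatment means, n per treatment is twice this.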
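A minimal sketch of the classical formula using the standard normal quantiles from Python's `statistics.NormalDist`. The function name and its defaults (α = 0.05, power = 0.80) are illustrative assumptions; the optional `two_sample` flag applies the doubling used when comparing two independent treatment means. The result is rounded up, since n must be at least the right-hand side.

```python
import math
from statistics import NormalDist


def classical_sample_size(s: float, d: float, alpha: float = 0.05,
                          power: float = 0.80, two_sample: bool = False) -> int:
    """Smallest whole n with n >= (Z_{alpha/2} + Z_beta)^2 * (s/d)^2.

    s: standard deviation of the measurement of one experimental unit
    d: smallest difference worth detecting (same units as s)
    alpha: two-sided significance level
    power: 1 - beta
    two_sample: if True, n per treatment for two independent means
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha2 = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 0.84 for power = 0.80
    n = (z_alpha2 + z_beta) ** 2 * (s / d) ** 2
    if two_sample:
        n *= 2
    return math.ceil(n)


print(classical_sample_size(s=1.0, d=1.0))
print(classical_sample_size(s=1.0, d=1.0, two_sample=True))
```

Here s = d gives 8 experimental units for a single mean and 16 per treatment for two independent means. The whole difficulty discussed below lies in supplying the right s.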

This formula is correct, but it is only appropriate as it stands when we use the correct s. The variance s^{2} is often assumed to be simply the residual variance s^{2}_{e}, which is only true if we are dealing with a design where there is NO sub-sampling, i.e. a 'fixed effects' only model! What if we have a design/plan where we cannot measure the experimental unit directly, but instead take measurements on [sub-]samples? The above example with apple trees is a case in point: the experimental unit to which we apply a treatment is the tree, but we measure the weight of the apples on the tree. We concluded above that 9 apples per tree would be sensible.
What we have to do is determine s^{2}, the variance of our measurement of the experimental unit. Recall that the measurements are made on the apples, so effectively the measure for the tree is the average of the 9 apples. What is the variance of the mean of these 9 apples? As you will undoubtedly remember from Statistical Methods I (or its equivalent), the variance of a mean is the variance of the individual measurements divided by the number of measurements. What is the variance (Mean Square) amongst trees? The expected Mean Square for trees, for an experiment with 9 apples per tree, would be:

MS_{trees} = s^{2}_{e} + 9 \times s^{2}_{tree}

Each tree mean is based on 9 apples, so dividing this Mean Square by 9 gives the variance (our s^{2}) of a tree mean:

s^{2} = s^{2}_{e}/9 + s^{2}_{tree}
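The two routes to s^{2} (the expected Mean Square divided by 9, or s^{2}_{e}/9 + s^{2}_{tree} directly) can be checked numerically with the apple-tree variance components. The helper names below are hypothetical, used only for this illustration.

```python
def expected_ms_trees(var_e: float, var_tree: float, n: int) -> float:
    """Expected Mean Square amongst trees with n apples per tree."""
    return var_e + n * var_tree


def var_tree_mean(var_e: float, var_tree: float, n: int) -> float:
    """Variance of a tree mean of n apples: s^2_e / n + s^2_tree."""
    return var_e / n + var_tree


ms = expected_ms_trees(12.0, 32.0, 9)  # 12 + 9 * 32 = 300
s2 = var_tree_mean(12.0, 32.0, 9)      # 12 / 9 + 32, about 33.33
assert abs(s2 - ms / 9) < 1e-12        # both routes give the same s^2
print(ms, round(s2, 2))
```

Note that s^{2} (about 33.33) is dominated by s^{2}_{tree} = 32; the apple-to-apple variation contributes only 12/9.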

Note that as the number of sub-samples (apples) n increases, s^{2}_{e}/n will decrease, but even with a very large number of apples per tree we will still have the component of the variance due to trees, s^{2}_{tree}, which will not decline! If we compute s^{2} in this way we will arrive at the correct size for our experiment. Note also that this is entirely consistent with the classical formula: in the classical case, where there is no sub-sampling, the number of measurements contributing to each experimental unit is 1, and the variance of a mean of 1 number is simply the variance, which will be the residual variance.
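A sketch combining the two ideas: for several numbers of apples per tree, compute s^{2} = s^{2}_{e}/n + s^{2}_{tree} and feed it into the classical formula. The difference to detect (D = 3), α = 0.05 and power 0.80 are HYPOTHETICAL values chosen only for illustration, and the single-mean form of the formula is used as written above.

```python
import math
from statistics import NormalDist

VAR_TREE, VAR_E = 32.0, 12.0  # variance components from the apple-tree example
D = 3.0                       # HYPOTHETICAL smallest difference to detect
ALPHA, POWER = 0.05, 0.80     # assumed test settings

z = NormalDist()
k = (z.inv_cdf(1 - ALPHA / 2) + z.inv_cdf(POWER)) ** 2  # (Z_{a/2} + Z_b)^2

for n in (1, 3, 9, 30, 1000):
    s2 = VAR_E / n + VAR_TREE           # variance of a tree mean of n apples
    trees = math.ceil(k * s2 / D ** 2)  # classical formula with this s^2
    print(f"{n:5d} apples/tree: s^2 = {s2:6.2f}, trees needed = {trees}")

# Wrongly using only the residual variance s^2_e ignores the tree component.
print("using s^2_e alone:", math.ceil(k * VAR_E / D ** 2), "trees")
```

Under these assumed settings, 9 apples per tree calls for 30 trees, and even 1000 apples per tree still needs 28, because s^{2}_{tree} does not shrink; using s^{2}_{e} alone would suggest only 11 trees, a badly underpowered experiment.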

Stroup, a practicing statistician at the University of Nebraska, Lincoln, has written a very nice article about calculating the number of experimental units one needs. The web link to the article is given below:

http://statistics.unl.edu/faculty/steve/802/2001/power_sas.pdf

Another paper, published in the Journal of Dairy Science in 2006, is "Estimating statistical power of mixed models used in dairy nutrition experiments" by Kononoff, R. J., and Hanford, K. J. (J. Dairy Sci. 89:3968-3971). This paper is available electronically through the McGill Library system. Although the title refers to mixed models and dairy nutrition, it is quite general and applicable to any field and/or class of model (fixed, random or mixed) and to any subject: dairy cattle, humans, soils, etc. Recommended.

Steel, Torrie and Dickey, Chapter 14.6, and Chapter 11

R.I. Cue ©

Department of Animal Science, McGill University

last updated : 2010 November 24