This is a first attempt at a presentation of the use of the glht function of the multcomp package to demonstrate how to construct and use a General Linear Hypothesis Test (glht).
We can look at the parameter estimates for regression coefficients, and their standard errors to estimate their significance, using a simple t-test. We can use the esticon function (from the doBy package) to test some linear function of fixed effects estimates. However, if we want to, or need to, test a hypothesis which involves a simultaneous test of several terms (ie where the numerator degrees of freedom are more than 1) then we need the glht function from the multcomp package.
Such tests we often see discussed in the context of 'CONTRASTS', indeed in SAS the CONTRAST statement is a very useful and powerful facility. However, it only allows us to test the hypothesis K'b=Zero, and not the general linear hypothesis K'b=m, where m is non-zero (see S.R. Searle, Linear models, for more details). NOTE, in many of the (excellent) SAS manuals these hypotheses are presented as Lb=0. IFF we have a pure regression problem then PROC REG can provide us with this sort of K'b=m test, using the TEST optional satement, but this is restricted to regression models only.
Probably 95% of the time these (tests against Zero) are what we want, but not always; we may have some good theories and reasons for testing against something other than Zero (i.e. m). Such a scenario (K'b=m) is not possible in SAS using the CONTRAST sttatement, however, the glht function in the multcomp package of R can provide this general linear hypothesis test.
Consider the multiple regression example we looked at; leaf burn of tobacco plants .
Suppose that we have reasons to believe that the intercept in our regression model should be 2, and that the regression coefficient of the regression of leaf burn on Nitrogen % should be -0.5. How do we test if our results are consistent with this hypothesis, or not. NOTE we have only one hypothesis here, that jointly, the intercept is equal to 2, and the regression coeffcient on Nitrogen is equal to -0.5.
# set a string as the path/filename, so we can easily move the # file from one directory to another # this will typically be (for a Windows system, something # like "c:/Users/UserName/wherever_your_R_data_are/" # for a Linux system it will probably be "/home/UserName/your_R_files/' FilePath <- "c:/Users/Roger Cue/R-stuff/regress_R/" FileName <- "regress1.txt" filen <- paste(FilePath,FileName,sep="") filen ds<-read.table(filen,header=TRUE) summary(ds) # NB anova gives Type 1 Sums of Squares, in order, not Marginal # so we need to fit the model thrice, with the reversed order, to get # the marginal effect of X1, X2 and X3 # model 1, all fixed effects lm1 <-lm(y~x1+x2+x3,data=ds) lm1 print(lm1,digits=7) summary(lm1) anova(lm1) # try a glht, of b0 = 2 and b1 = -0.5, WORKS (good) library(multcomp) # create k' row for intercept k1 <- c(1, 0, 0, 0) # create k' row for b1 (X1 = Nitrogen) k2 <- c(0, 1, 0, 0) # cobine (row bind, together the two rows) K <- rbind(k1,k2) # create our m vector, what glht calls the right hand side # form our expression Kb = m m<- c(2, -0.5) kb <- glht(lm1,linfct=K,rhs=m) summary(kb,test=Ftest())
The results from the last part of this analysis, for our hypothesis, are:
General Linear Hypotheses Linear Hypotheses: Estimate k1 == 2 1.8110 k2 == -0.5 -0.5315 Global Test: F DF1 DF2 Pr(>F) 1 1.395 2 26 0.2658 >
This we see that our estimate of the intercept is 1.811, and our estimate of the regression coefficient of the regression of leaf burn on Nitrogen % is -0.5315.
Our global F-test produces a calculated F value of 1.395, for 2 numerator degrees of freedom (logical because we have, jointly, the intercept and the regression coefficient, and 26 d.f. for the denominator (in this case the residual). The associated probability is 0.2658; we can conclude that our estimates are not inconsistent with the hypothesis.