R topics

A well-regarded (by me) Graphical User Interface (GUI) for R is RStudio. RStudio is available for Windows, Linux and Mac computers, as is R itself (Open Source). I recommend using R from the RStudio GUI; indeed, all the examples here were run under RStudio.

Using R. Here are my recommendations; they may not be perfect, but they seem to have worked for me.

1. Download and install (as Administrator/root super-user) R from CRAN. For Windows there is an installer program. For Linux there are pre-compiled versions for some platforms, or your Linux distribution may provide a base R. I do not have a Mac and hence cannot comment upon its install; however, I imagine that it is similar to Windows. If your OS is 64-bit Windows then the installer will offer to install both 32-bit and 64-bit R versions. I let it install both, but I almost never use the 32-bit version on a 64-bit Windows. For current Macs (which all have a 64-bit OS) there is only a 64-bit version of R. If you are using Linux then most of the packages are distributed as source code, not pre-compiled binaries. This means that you REALLY need to have the GCC C, C++ and Fortran compiler development tools installed on your system. If your Linux system was set up as a workstation with software development tools installed then you may well have all the compiler/build tools required (I did). If not, then use your Linux distribution tools to install the GCC C, C++ and Fortran [gfortran] compilers (and any additional subsidiary packages that they may need). I did this on my Linux systems; it was not painful! The details/specifics of software installs vary from system to system; tools include yum, aptitude, apt-get and others. Alternatively, if you manage your Linux system from a GUI then there is probably an icon for 'Software' and 'Software Update'.
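One way to check from within R whether the compiler tools are visible is sketched below. The four tool names are an assumption (the usual GCC command names); your distribution may use different ones.

```r
# Look up the usual GCC build tools on the search path.
# Sys.which() returns an empty string for any tool it cannot find.
build_tools <- Sys.which(c("gcc", "g++", "gfortran", "make"))
print(build_tools)

missing_tools <- names(build_tools)[build_tools == ""]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All four build tools were found.")
}
```

An empty entry only means that the tool is not on the search path of the R session; it may still be installed elsewhere.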

2. As a regular (i.e. non-Administrator/root) user, use the Rgui (Windows), or R.app (Mac), or R (Linux) and install various additional packages into your own personal library. I install car, doBy, emmeans (supersedes lsmeans), nlme, lme4, lmerTest, ggplot2, haven, survival, multcomp, pbkrtest, readxl and xtable. Installing them into your own library means that you know what is installed, what version it is, and that you can upgrade to a newer version as/when one becomes available, all without interfering with any other user of the same computer. This may seem less important if you are using R on your own personal laptop, but if you are sharing a computer then it is a good idea to keep each person's tools and R packages separate. If you are using R on a Linux system then the install process will in fact download the source code of each package and then run the build and install tools (i.e. compile the source code files, link them together and install them in the appropriate place [your own personal library]). Again this is relatively painless, and even quite impressive and neat to watch as the build steps happen: compiling each source file using the appropriate C, C++ or Fortran compiler, linking the object files, installing to a library and then cleaning up after itself!
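A minimal sketch of step 2, assuming that the personal library is the one named by the R_LIBS_USER environment variable (the usual default, but your setup may differ). The install itself is guarded so that it only runs in an interactive session.

```r
# Packages named in the text (each listed once).
pkgs <- c("car", "doBy", "emmeans", "nlme", "lme4", "lmerTest", "ggplot2",
          "haven", "survival", "multcomp", "pbkrtest", "readxl", "xtable")

# Personal library location for the current user (assumed default).
user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))
print(user_lib)
print(.libPaths())   # libraries currently searched, in order

# Which of the packages are not installed yet?
to_install <- setdiff(pkgs, rownames(installed.packages()))
print(to_install)

# Install only when run interactively, so that sourcing this file in a
# batch job does not start a download.
if (interactive() && length(to_install) > 0) {
  dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)
  install.packages(to_install, lib = user_lib)
}
```

Naming the library explicitly with lib = makes it obvious where the packages end up, which is the point of keeping a personal library.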

3. Download and install the GUI front-end RStudio [desktop version] (available from www.rstudio.com).

4. From time to time (e.g. once a week) I use the Rgui (i.e. not RStudio) to check for updates to the R packages. I do not use the package update facility in RStudio as I have found that this sometimes results in conflicts in the versions of packages. I download R only from CRAN mirrors in Canada, and likewise update/install packages only from these self-same mirror sites. I am not sure if the conflicts/problems I have had with RStudio package updates are a language/version issue or what. The way I do R package updates seems to work for me and avoids problems. You may wish to install a version of R configured for your own native language, in which case you may also wish to download package updates from the same site, in the same language; YMMV. From time to time, when using RStudio, I use its "check for updates" facility to check for updates to the RStudio program itself, and if there is one I download and install the latest version of RStudio.
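The update check in step 4 can be sketched from the R console as follows. The helper name check_updates is hypothetical, and taking the first Canadian mirror in the table is only an illustration.

```r
# List CRAN mirrors from the copy of the mirror table shipped with R
# (local.only = TRUE avoids a network request).
mirrors <- getCRANmirrors(local.only = TRUE)
canadian <- mirrors[mirrors$Country == "Canada", c("Name", "URL")]
print(canadian)

# The network steps are wrapped in a function so that nothing is
# downloaded until it is called explicitly.
check_updates <- function(repo_url) {
  outdated <- old.packages(repos = repo_url)   # NULL when all is current
  if (is.null(outdated)) {
    message("All packages are up to date.")
  } else {
    print(outdated[, c("Installed", "ReposVer"), drop = FALSE])
    update.packages(repos = repo_url, ask = FALSE)
  }
  invisible(outdated)
}

# Example call (uncomment to run):
# check_updates(canadian$URL[1])
```

With ask = FALSE every outdated package is updated without prompting; drop that argument to confirm each one.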

Packages I like (and why):

car - allows one to produce Type III Sums of Squares and F-tests
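A small sketch of a Type III analysis, using the warpbreaks dataset that ships with R (the choice of dataset and model is an assumption, for illustration only). Sum-to-zero contrasts are set in the model call because Type III tests of main effects are usually interpreted under that coding.

```r
# Two-way model with interaction; contrasts set locally, not globally.
fit <- lm(breaks ~ wool * tension, data = warpbreaks,
          contrasts = list(wool = contr.sum, tension = contr.sum))

if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit, type = 3))   # Type III sums of squares and F-tests
} else {
  message("Package 'car' is not installed; showing sequential (Type I) tests.")
  print(anova(fit))
}
```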

doBy - provides facilities for various linear contrasts and estimates

haven - allows one to import SAS datasets (i.e. .sas7bdat files)
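Importing a SAS dataset might look like the sketch below. The file name mydata.sas7bdat is hypothetical; replace it with the path to a real file.

```r
# Hypothetical file name; replace with the path to a real SAS dataset.
sas_file <- "mydata.sas7bdat"

if (requireNamespace("haven", quietly = TRUE) && file.exists(sas_file)) {
  dat <- haven::read_sas(sas_file)   # returns a tibble (a kind of data frame)
  str(dat)                           # check variable names and types
} else {
  message("Install 'haven' and point sas_file at an existing file to run this.")
}
```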

lme4 - provides linear and generalized linear mixed models (lmer, glmer). Fixed-effects linear models (lm, glm), comparable to SAS proc glm and genmod, are part of base R.

lmerTest - extends lmer (from lme4) mixed models, BUT with denominator degrees of freedom as per SAS proc mixed; particularly useful for nested models and random regression models. N.B. only normal distributions. Note, lmerTest makes use of the package pbkrtest, so you should always ensure that the version of pbkrtest is up to date with lmerTest. This can be checked with packageVersion("lmerTest") and packageVersion("pbkrtest").
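The version check, followed by a small random regression model, can be sketched as below. The helper report_version is hypothetical, and the sleepstudy data (shipped with lme4) is used only as an illustration.

```r
# Report the installed version of a package, or say that it is missing.
report_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    paste(pkg, as.character(packageVersion(pkg)))
  } else {
    paste(pkg, "is not installed")
  }
}
versions <- vapply(c("lme4", "lmerTest", "pbkrtest"), report_version,
                   character(1))
print(versions)

if (!any(grepl("not installed", versions))) {
  # Random intercept and slope for each subject (random regression).
  fit <- lmerTest::lmer(Reaction ~ Days + (Days | Subject),
                        data = lme4::sleepstudy)
  # Denominator df via Kenward-Roger (this step uses pbkrtest);
  # ddf = "Satterthwaite" is the default.
  print(anova(fit, ddf = "Kenward-Roger"))
}
```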

nlme - non-linear models (fixed and random), similar to SAS proc nlin and nlmixed.
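A sketch of a non-linear mixed model, following the Loblolly pine example from the nlme help page (the dataset and starting values come from that example, not from this course).

```r
fit_nl <- NULL
if (requireNamespace("nlme", quietly = TRUE)) {
  library(nlme)   # a recommended package, normally shipped with R

  # Asymptotic growth curve with a random asymptote for each seed source.
  fit_nl <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
                 data = Loblolly,
                 fixed = Asym + R0 + lrc ~ 1,
                 random = Asym ~ 1,
                 start = c(Asym = 103, R0 = -8.5, lrc = -3.3))
  print(summary(fit_nl))
}
```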

emmeans - generates, from an lm object or lmer object (results from lm or lmerTest respectively), estimated marginal means (emmeans, also called predicted marginal means). This supersedes least squares means (commonly called lsmeans by SAS et al.).
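A minimal sketch of estimated marginal means from an lm object, again assuming the built-in warpbreaks data for illustration.

```r
fit <- lm(breaks ~ wool + tension, data = warpbreaks)

if (requireNamespace("emmeans", quietly = TRUE)) {
  emm <- emmeans::emmeans(fit, ~ tension)  # marginal mean per tension level
  print(emm)
  print(pairs(emm))                        # pairwise differences between levels
} else {
  message("Package 'emmeans' is not installed.")
}
```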

survival - survival analyses, allows censoring of observations. Comparable to SAS proc lifetest and phreg.

multcomp (glht) - general linear hypothesis tests. The glht function in this package allows us to generate multiple-line F-tests for our own hypotheses; useful.
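A short sketch of the two kinds of analysis, using the lung dataset that comes with the survival package (an assumption made for illustration).

```r
km <- NULL
cox <- NULL
if (requireNamespace("survival", quietly = TRUE)) {
  library(survival)   # a recommended package, normally shipped with R

  # In lung, status is coded 1 = censored, 2 = dead.
  km <- survfit(Surv(time, status) ~ sex, data = lung)   # Kaplan-Meier curves
  print(km)

  cox <- coxph(Surv(time, status) ~ sex + age, data = lung)  # Cox regression
  print(summary(cox))
}
```

The Kaplan-Meier fit is the rough counterpart of proc lifetest, and the Cox model that of proc phreg.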
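A sketch of a multi-row hypothesis with glht, assuming the built-in warpbreaks data; the Tukey contrast set is just one convenient choice of hypothesis matrix.

```r
fit <- lm(breaks ~ wool + tension, data = warpbreaks)

if (requireNamespace("multcomp", quietly = TRUE)) {
  # All pairwise contrasts among the three tension levels.
  tests <- multcomp::glht(fit, linfct = multcomp::mcp(tension = "Tukey"))
  print(summary(tests))                            # one test per contrast row
  print(summary(tests, test = multcomp::Ftest()))  # joint F-test over all rows
} else {
  message("Package 'multcomp' is not installed.")
}
```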

readxl - read Excel files. Does not use Java, and does not need any Windows components (hence useful for Linux compatibility). However, one should always examine the imported data very carefully to verify that what we have is indeed what we think we are importing!
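Reading a workbook and then inspecting the result might look like this sketch. It uses the example workbook that ships with readxl; substitute your own file path.

```r
have_readxl <- requireNamespace("readxl", quietly = TRUE)

if (have_readxl) {
  # Example workbook shipped with readxl; replace with your own file.
  xlsx_file <- readxl::readxl_example("datasets.xlsx")
  print(readxl::excel_sheets(xlsx_file))   # which sheets are present?
  dat <- readxl::read_excel(xlsx_file, sheet = 1)

  # Inspect the import before analysing it.
  str(dat)                      # column names and types
  print(head(dat))              # first few rows
  print(colSums(is.na(dat)))    # missing values per column
} else {
  message("Package 'readxl' is not installed.")
}
```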

xtable - produce LaTeX code from various objects (e.g. lm, lsmeans, Anova, etc); useful for producing high quality output.
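A minimal sketch of turning an lm object into LaTeX, assuming the built-in warpbreaks data; the caption text is made up for the example.

```r
fit <- lm(breaks ~ wool + tension, data = warpbreaks)

if (requireNamespace("xtable", quietly = TRUE)) {
  tab <- xtable::xtable(fit, caption = "Coefficients for the warpbreaks model")
  print(tab)   # writes a LaTeX table environment to the console
} else {
  message("Package 'xtable' is not installed.")
}
```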

Documentation and references/links

There is a wealth of information and documentation (tutorials, examples, references etc.) available. There are also MOOCs (e.g. Coursera) on R. This list is not meant to be exhaustive; it is simply some of the material that I have found useful for learning how to do things in R, particularly for somebody coming with a knowledge of SAS and/or Fortran (no, I am not just betraying my age, but also the fact that in the high performance computing world of my research, Fortran [and its speed] is still king).

The R Book (Crawley, publisher Wiley) - in my opinion THE basic reference.

Bob Muenchen (University of Tennessee) has an excellent document entitled "R for SAS and SPSS Users" and a published book of the same name. As the title suggests, it is primarily geared to showing people with a knowledge of either SAS or SPSS how to accomplish the same things using R. I have found it excellent for learning and seeing how to manipulate data, e.g. subsetting, sorting and merging of datasets.

Statistical models and analyses

Many of these topics are conversions of the material examined and analysed using SAS in the course Statistical Methods II (AEMA 610). The objective here is to show how the same analyses can be carried out using R, as a gentle introduction to R for statistical analyses.

  • Multiple regression
  • Completely Randomized Design, CRD
  • Classification or Regression model?
  • Correlations
  • Two-way fixed effects model
  • Randomized Complete Block Design, RCBD
  • General Linear Hypotheses Tests (glht)
  • Nested (hierarchical, subsampling) models
  • Factorial, fixed effects models

  • R.I. Cue ©
    Department of Animal Science, McGill University
    last updated : 2018 March 29th