
11.2 Getting started with R

The R language and environment for statistical computing and graphics is an open source software project that runs on all major operating systems. It is available as a standard Windows self-installer, as a universal binary disk image for Mac OS X, and as an rpm or similar for the major Linux distributions. Because it is open source, it is also possible to build R from the source code on any platform.

There are a number of graphical user interfaces for R, including the basic Windows default, the more sophisticated Mac OS X GUI, and several independently developed GUIs, but the canonical interface is an interactive command line.

In keeping with the theme of this book, we will only work with R by writing code, rather than interacting with R via menus and dialogs.

11.2.1 The command line

The R command line interface consists of a prompt, usually the > character. We type commands or expressions; R echoes what we type and, at the end of each expression, prints out a result. A very simple interaction with R looks like this:

> "hello R"

[1] "hello R"

We have typed a simple piece of text and the value of this sort of simple expression is just the text itself.

11.2.2 Managing R Code

One way to write R code is simply to enter it interactively at the command line as shown above. This is a very good thing to do when we want to experiment with a new function or expression, or if we want to explore a data set very casually. For example, if we want to know what R will do when we divide a number by 0, we can quickly find out by trying it.

> 1/0

[1] Inf

However, interactively typing code at the R command line is a very bad thing to do if we ever expect to remember or repeat what we are doing. Most of the time, we will write R code in a file and get R to read the code from the file. This can be done in an ad hoc way by simply cutting and pasting code from a text editor into R. Alternatively, some editors can be associated with an R session and allow submission of code chunks via a single keystroke (the Windows GUI provides a script editor with this facility). Another option is to read an entire file of R code into R using the source() function (see Section 11.6).
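As a minimal sketch of the source() approach, the following code writes two lines of R code to a file and then runs the file. The temporary file and the names x and xMean are assumptions made purely for illustration; in practice the file would be created in a text editor and stored with the project.

```r
# Write a tiny script to a temporary file (a stand-in for a file
# that we would normally create in a text editor).
scriptFile <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10",
             "xMean <- mean(x)"),
           scriptFile)

# Read and run every expression in the file.
source(scriptFile)

# The objects created by the script are now available in the session.
xMean
```

Because the code lives in a file, the same calculation can be repeated at any time by calling source() again.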

It is vital that we retain a record of all manipulations that we perform on a data set. This is important from a professional and ethical standpoint as insurance that we fully declare any modifications of the data and to allow other scientists to fully replicate our actions. It is essential as a form of documentation of what we have done, and it is smart because making corrections to the procedure or repeating the procedure with another data set becomes straightforward and fast.

In the ideal situation, we need only store the original data set, files containing code to transform the data set, and files containing code to analyse the data set. We can then reproduce any stage of previous work simply by rerunning the appropriate code.

The organisation of code into separate files can be non-trivial in a large project, but it is important to store only one copy of code that performs a particular operation on data (the DRY principle yet again). One approach is to store code that is used in several places in one file and have other files use the source() function to load the common code.
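The following sketch shows one way that this arrangement might look. The directory, the file names (common.R, analysisA.R, analysisB.R), and the function toCelsius() are all hypothetical, and the files are written from R only so that the example is self-contained. The chdir argument to source() temporarily makes the directory that contains each script the working directory, so that the relative reference to common.R is found.

```r
# A hypothetical project directory (temporary, for illustration only).
projectDir <- file.path(tempdir(), "project")
dir.create(projectDir, showWarnings = FALSE)

# common.R holds the single copy of the shared code.
writeLines("toCelsius <- function(f) (f - 32) * 5 / 9",
           file.path(projectDir, "common.R"))

# Two separate scripts both load the shared code with source().
writeLines(c('source("common.R")',
             'boiling <- toCelsius(212)'),
           file.path(projectDir, "analysisA.R"))
writeLines(c('source("common.R")',
             'freezing <- toCelsius(32)'),
           file.path(projectDir, "analysisB.R"))

# Run each script; chdir = TRUE evaluates it from its own directory.
source(file.path(projectDir, "analysisA.R"), chdir = TRUE)
source(file.path(projectDir, "analysisB.R"), chdir = TRUE)
```

If the conversion ever needs to be corrected, only common.R has to change, and both analyses pick up the correction the next time they are run.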

As usual, files containing code should be named with care. The name that we use for a file is a form of documentation and a good name is essential for being able to find code and understand the organisation of files within a directory.

11.2.3 The working directory

Any files created during an R session are created in the current working directory of the session, unless an explicit path to another directory is specified. Similarly, when files are read into an R session they are read from the current working directory.

On Linux, the working directory is the directory that the R session was started in. This means that the standard way to work on Linux is to create a directory for a particular project, put any relevant files in that directory, change into that directory, then start an R session.

On Windows, it is typical to start R by double-clicking a shortcut or by selecting from the list of programs in the “Start” menu. This approach will, by default, set the working directory to one of the directories where R was installed. This is a bad place to work, so it is a good idea to set the “Start in” field on the properties dialog of the shortcut or menu item. It may also be necessary to use the setwd() function or the “Change dir” option on the “File” menu to explicitly change the working directory to something appropriate.
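On any platform, the working directory can be inspected and changed from the command line. The sketch below uses getwd(), which reports the current working directory, together with setwd(). The use of a temporary directory as the project directory and the file name example.txt are assumptions for illustration only.

```r
# Report the current working directory.
getwd()

# Change to a project directory (a temporary directory stands in
# for a real project directory here).  setwd() invisibly returns
# the previous working directory, so we can change back later.
projectDir <- tempdir()
oldDir <- setwd(projectDir)

# A file created with a relative name lands in the working directory.
writeLines("some text", "example.txt")
file.exists(file.path(projectDir, "example.txt"))

# Return to the original working directory.
setwd(oldDir)
```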

11.2.4 Finding the exit

One of the most important things to learn when immersing oneself in a new software environment is how to get out. In R, the expression q() quits the session. When we do this, R will ask whether we want to save the “workspace image”, which means the results of any code that we have run during the session. As mentioned already, it is better to keep a record of the R code that we used in a session, rather than a copy of the results of the R code, so it is safe to say “no” to this question. Section 11.3.4 will look at how to answer this question in more detail.

Paul Murrell

This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.