11.4 Control flow

Computer code in a general-purpose language consists of several expressions that are run in the order that they appear.

For example, in the following code, the first expression determines which line in an HTML file contains the special text id="worldnumber". The second expression takes that line and removes unwanted text from the line to end up with a string that contains the current population of the world.

popLine <- grep('id="worldnumber"', clockHTML)
popString <- gsub('^.+id="worldnumber">', "",
                  gsub("</div>.*", "",
                       clockHTML[popLine]))

The second expression can make use of the result of the first expression because that result has been assigned to a symbol. The second expression uses the symbol popLine to access the result of the first expression.
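To try the two expressions without downloading a web page, the following is a minimal sketch that assumes a small made-up character vector in place of the real clockHTML. The HTML lines and the population figure are invented for illustration only.

```r
# Hypothetical stand-in for the lines of the HTML file (invented content)
clockHTML <- c('<html>',
               '<div id="worldnumber">6,700,000,000</div>',
               '</html>')

# First expression: which line contains the special text?
popLine <- grep('id="worldnumber"', clockHTML)

# Second expression: remove the text before and after the number,
# using the symbol popLine to pick out the right line
popString <- gsub('^.+id="worldnumber">', "",
                  gsub("</div>.*", "",
                       clockHTML[popLine]))
```

Because the first result is stored in popLine, the second expression does not need to repeat the search; it just refers to the symbol.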

11.4.1 Loops

General-purpose languages include features that allow some exceptions from the usual rule that expressions are run one at a time, from first to last. One such exception is the loop. This allows a collection of expressions to be run repeatedly.

In the example in Section 11.1, we performed the same task, calculating the rate of population growth, 10 times. This was achieved, not by typing out the relevant code 10 times, but by using a loop.

for (i in 1:10) {
    pop1 <- checkTheClock()
    Sys.sleep(600)
    pop2 <- checkTheClock()
    rateEstimates[i] <- (pop2 - pop1) / 10
}

The line for (i in 1:10) { specifies that this loop will run 10 times; the expression 1:10 is a shorthand way of expressing the integer values 1 to 10.

The expressions within the braces are run each time through the loop. In this case, almost exactly the same actions are taken each time the loop is run; the one thing that does change is that the symbol i is assigned a different value. The first time the loop runs, i is assigned the value 1. The second time through the loop, i has the value 2, and so on. In this example, the changing value of i is just used to make sure that each estimate of the population growth rate is stored in a different place (rather than each new estimate overwriting the previous one).
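The role of i can be seen in a smaller loop that runs immediately, without any waiting. This is only a sketch; the symbol squares is a made-up name used for illustration and is not part of the population clock example.

```r
# Reserve space for three results
squares <- numeric(3)

for (i in 1:3) {
    # i is 1, then 2, then 3, so each result is stored
    # in a different element of squares
    squares[i] <- i^2
}
```

If the assignment inside the loop were written as squares <- i^2 instead, each pass would replace the whole of squares and only the last result would survive.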

In R, there is less need for loops than in other scripting languages, because R naturally deals with entire vectors or matrices of data at once (see Section 11.5).
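As a rough illustration of this point, the sketch below calculates a set of growth rates twice: once with a loop and once with a single expression that works on the whole vector. The population counts are invented, and we assume that they were recorded 10 minutes apart.

```r
# Made-up population counts, assumed to be 10 minutes apart
pops <- c(1000, 1025, 1045, 1070)

# Loop version: one difference at a time
rateLoop <- numeric(length(pops) - 1)
for (i in 1:(length(pops) - 1)) {
    rateLoop[i] <- (pops[i + 1] - pops[i]) / 10
}

# Vectorised version: diff() subtracts neighbouring values
# for the entire vector at once
rateVector <- diff(pops) / 10
```

Both versions produce the same three rates, but the second needs no loop and no symbol i.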

R also provides a while loop, which can be used when it is not known how many times the code will repeat (e.g., an iterative optimisation algorithm). See Section 12.2.1 for more information.
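A minimal sketch of a while loop is shown below. It is a toy example rather than a real optimisation algorithm: a value is halved repeatedly until it drops below 1, and the loop itself works out how many repetitions that takes. The symbols x and steps are made up for illustration.

```r
x <- 100
steps <- 0

# Keep going for as long as the condition in parentheses is true
while (x >= 1) {
    x <- x / 2
    steps <- steps + 1
}
```

Unlike the for loop, nothing here states the number of repetitions in advance; the loop stops as soon as the condition x >= 1 is no longer true.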

11.4.2 Flashback: Layout of R code

Chapter 2 introduced general principles for writing computer code. In this section, we will look at some specific issues related to writing scripts in R.

The same principles of commenting code and laying out code so that it is easy for a human audience to consume still apply. In R, a comment is anything on a line after the special hash character, #. For example, the comment in the following line of code is useful as a reminder of why the number 600 has been chosen.

Sys.sleep(600) # Wait 10 minutes

Indenting is again very important. We need to consider indenting when an expression is too long and has been broken across several lines of code. The example below shows a standard approach that ensures that arguments to a function call are left-aligned.

popString <- gsub('^.+id="worldnumber">', "",
                  gsub("</div>.*", "",
                       clockHTML[popLine]))

When using loops, the expressions within the body of the loop should be indented, as below.

for (i in 1:10) {
    pop1 <- checkTheClock()
    Sys.sleep(600)
    pop2 <- checkTheClock()
    rateEstimates[i] <- (pop2 - pop1) / 10
}

It is also important to make use of whitespace. Examples in the code above include the use of spaces around the assignment operator (<-), around arithmetic operators, and between arguments (after the comma) in function calls.
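For comparison, here is the same calculation written without and with whitespace. R treats the two expressions identically; only the human reader is affected. The values of pop1 and pop2, and the symbols rateCramped and rateSpaced, are made up so that the code can be run on its own.

```r
# Made-up values so that the expressions can be run
pop1 <- 1000
pop2 <- 1025

# Harder to read: no whitespace
rateCramped<-(pop2-pop1)/10

# Easier to read: spaces around <- and around arithmetic operators
rateSpaced <- (pop2 - pop1) / 10
```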

Paul Murrell

Creative Commons License
This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.