11.3 Basic Expressions

The simplest sort of expression that we can enter is just a constant value--a piece of text (a string) or a number. For example, if we need to specify the name of a file that we want to read data from, we specify the name as a string.

> "http://www.census.gov/ipc/www/popclockworld.html"

[1] "http://www.census.gov/ipc/www/popclockworld.html"

If we need to specify a number of seconds corresponding to 10 minutes, we specify a number.

> 600

[1] 600

11.3.1 Arithmetic

Any general-purpose language has facilities for standard arithmetic. For example, the following code shows the arithmetic calculation that was performed in Section 11.1 to obtain the rate of growth of the world's population--the change in population divided by the elapsed time (note the use of the forward-slash character, /, to represent division).

> (6617747987 - 6617746521) / 10

[1] 146.6
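As a small illustration of how the standard operators combine, the following sketch repeats the calculation above and adds a few variations. Only the first expression comes from the text; the others are illustrative additions. The parentheses matter because R performs division before subtraction.

```r
# The calculation from the text: change in population over 10 minutes.
(6617747987 - 6617746521) / 10

# Without parentheses, the division is performed first, so this
# computes 6617747987 - 661774652.1, which is not the growth rate.
6617747987 - 6617746521 / 10

# Other standard operators: multiplication (*) and exponentiation (^).
10 * 60      # the number of seconds in 10 minutes
2^10         # 2 raised to the power 10
```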


11.3.2 Function calls

Most of the useful features of R are available via functions. In order to use a function, or make a function call, it is necessary to know the function name and what arguments the function takes. Arguments are the information we give to the function. The information that the function gives us back is called the function's return value.

For example, Sys.sleep() is a function that simply waits for a number of seconds. This function has one argument, called time, which is used to supply the number of seconds to wait. When calling an R function, an argument may be specified by name, or just by its position within the list of arguments. For example, the following two expressions are equivalent:

Sys.sleep(600)

Sys.sleep(time=600)
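Because Sys.sleep() produces no visible result, it may help to see the same idea with a function that returns a value. The sketch below uses substr(), chosen here only for illustration; its arguments are called x, start, and stop.

```r
# substr() extracts part of a string: x is the string, start and stop
# are the positions of the first and last characters to keep.
substr("population", 1, 3)                      # matched by position
substr(x = "population", start = 1, stop = 3)   # matched by name
substr(stop = 3, x = "population", start = 1)   # names allow any order

# All three calls return the same value.
identical(substr("population", 1, 3),
          substr(stop = 3, x = "population", start = 1))
```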

The readLines() function, which is used to read information from text files into R, has several arguments. The first argument is called con and this is used to specify the name of the text file. The second argument is called n and this specifies how many lines of text to read from the file. This argument has a default value of -1, which means that the entire file is read. Because the argument has a default value, we do not have to specify a value for this argument when we call readLines(). For example, the following two calls are equivalent:

readLines("http://www.census.gov/ipc/www/popclockworld.html")

readLines("http://www.census.gov/ipc/www/popclockworld.html",
          n=-1)
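The effect of the default value can be checked without relying on a web page. The following sketch writes three lines to a temporary file and reads them back; the file and its contents are invented for this example, and the <- operator used to remember the file name is explained in Section 11.3.3.

```r
# Create a small temporary text file (contents invented for this example).
exampleFile <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), exampleFile)

# With the default n = -1, every line is read.
readLines(exampleFile)
readLines(exampleFile, n = -1)

# Overriding the default reads only the requested number of lines.
readLines(exampleFile, n = 2)

# Remove the temporary file.
unlink(exampleFile)
```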

The remaining arguments to readLines() control how the function reacts to unexpected features in the text file and we will not worry any more about them here. What is worth mentioning is the fact that readLines() is a function with a return value. The result of a call to the readLines() function is a string for each line in the text file, as shown below (the output has been truncated):

> readLines("http://www.census.gov/ipc/www/popclockworld.html")

[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
[2] "    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">" 
[3] "<html xmlns=\"http://www.w3.org/1999/xhtml\" "                    
[4] "      xml:lang=\"en\" lang=\"en\">"                               
[5] "<head>"                                                           
[6] "    <title>World POPClock Projection</title>"

Another function that returns a value is the paste() function. This function is used to combine strings together and its return value is a string. This function also has an optional argument called sep, which is used to supply another string that gets placed between the strings that are being combined. For example, the following code combines a name with a date, using the sep argument to make sure that there is nothing in between the name and the date in the result:

> paste("popGrowthEstimates", "2007-09-12",
         sep="")

[1] "popGrowthEstimates2007-09-12"
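To see why the sep argument was needed above, the following sketch shows what happens when it is left out or given a different value. The underscore separator is just an illustrative choice.

```r
# By default, sep is a single space.
paste("popGrowthEstimates", "2007-09-12")
# [1] "popGrowthEstimates 2007-09-12"

# Any string can be used as the separator.
paste("popGrowthEstimates", "2007-09-12", sep = "_")
# [1] "popGrowthEstimates_2007-09-12"

# paste0() is a shorthand for paste(..., sep = "").
paste0("popGrowthEstimates", "2007-09-12")
# [1] "popGrowthEstimates2007-09-12"
```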

11.3.3 Symbols and assignment

Code written in a general-purpose language can be compared to a cooking recipe; there is a series of steps involved and the results of the initial steps are used later on. For example, eggs and water may be mixed together in one bowl and flour and salt mixed in another bowl, then the two sets of ingredients may be combined.

This idea of storing intermediate results is an important feature of general-purpose languages. In R, we speak of assigning values to symbols.

Anything that we type that starts with a letter, which is not one of the special R keywords, is a symbol. A symbol represents a container where a value can be stored. When R encounters a symbol it returns the value that has been stored. For example, there is a predefined symbol called pi; the value stored in pi is the mathematical constant $\pi$.

> pi

[1] 3.141593

The result of any expression can be assigned to a symbol, which means that the result is stored and remembered for use later on.

For example, when we read the contents of a text file into R using the readLines() function, we usually want to store the contents so that we can work with the information later on. This is accomplished by assigning the result of the function to a symbol, as in the following code.

> clockHTML <- 
    readLines("http://www.census.gov/ipc/www/popclockworld.html")

We say that clockHTML is assigned the value returned by the readLines() function.

Whenever we use the symbol clockHTML, R retrieves the value that we assigned to it, namely the strings containing the contents of the text file.

> clockHTML

[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
[2] "    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">" 
[3] "<html xmlns=\"http://www.w3.org/1999/xhtml\" "                    
[4] "      xml:lang=\"en\" lang=\"en\">"                               
[5] "<head>"                                                           
[6] "    <title>World POPClock Projection</title>"
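The recipe idea of using earlier results in later steps can be sketched with the population calculation from Section 11.3.1. The symbol names below are hypothetical, chosen only for this illustration.

```r
# Store two intermediate results in symbols ...
elapsedMinutes <- 10
populationChange <- 6617747987 - 6617746521

# ... then use the stored values in a later step.
growthPerMinute <- populationChange / elapsedMinutes

# Typing the symbol name retrieves the stored value.
growthPerMinute
# [1] 146.6
```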

The ls() function displays the names of all symbols that have been created in the current session.

> ls()

 [1] "cathead"           "clockHTML"        
 [3] "histpop"           "histpopulation"   
 [5] "missingupper"      "pop"              
 [7] "popCols"           "popEltPattern"    
 [9] "popLine"           "popRange"         
[11] "popString"         "subset"           
[13] "worldpopEstimates" "worldpopHTML"     
[15] "worldpopMillions"  "worldpopStrings"  
[17] "yearRange"
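As a minimal sketch of how the listing changes during a session, the following code creates two symbols and then removes one of them with rm(). The symbol names are hypothetical, and the output of ls() is not shown because it depends on what else has been created in the session.

```r
# Create two symbols, then list the symbols in the current session.
popStart <- 6617746521
popEnd <- 6617747987
ls()

# rm() removes a symbol, so it no longer appears in the listing.
rm(popStart)
ls()
```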


11.3.4 Persistent storage

The concept of storing values for later use can be extended to the situation where we want to remember results not just within the same session, but for use in another session on another day. This topic is more generally discussed in Section 11.6; this section just mentions the particular option of saving the R workspace.

Every time we exit an R session, we are offered the option of saving the current workspace. The workspace consists of the values of all R symbols that we have created during the session.

The workspace is saved as a file called .RData. When R starts up, it checks for such a file in the current working directory and loads it automatically.

Saving the R workspace is not the recommended approach. We have already discussed why it is better to save the original data set and R code, rather than saving intermediate calculations. In addition, the workspace, like any object stored using save(), is written as a binary file, with all of the associated disadvantages (see Section 7.7). In particular, if a workspace is corrupted for some reason, it may be impossible to recover the lost information.
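The difference between binary and plain-text storage can be sketched as follows. The symbol growthPerMinute and the temporary file names are assumptions made for this example; writing the value with writeLines() is just one simple plain-text option, not a recommendation from the text.

```r
# A hypothetical intermediate result.
growthPerMinute <- 146.6

# Binary storage: save() writes an R-specific binary file ...
binaryFile <- tempfile(fileext = ".RData")
save(growthPerMinute, file = binaryFile)
rm(growthPerMinute)
load(binaryFile)     # ... and load() restores the symbol by name.
growthPerMinute

# Plain-text storage: the file can be inspected in any text editor.
textFile <- tempfile(fileext = ".txt")
writeLines(as.character(growthPerMinute), textFile)
as.numeric(readLines(textFile))
```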


11.3.5 Naming variables

When writing scripts, because we are constantly assigning intermediate values to symbols, we are forced to come up with lots of different symbol names. It is important that we choose sensible symbol names for several reasons:

  1. Good symbol names are a form of documentation in themselves. A name like dateOfBirth tells the reader a lot more about what value has been assigned to the symbol than a name like d, or dob, or date.
  2. Short or convenient symbol names, such as x, or xx, or xxx should be avoided because it is too easy to create conflict by reusing them in our own code or by having other code reuse them.
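Both points can be illustrated with a short sketch. The values and the name numberOfSiblings are invented for this example.

```r
# A short, generic name is easy to overwrite by accident.
x <- as.Date("1970-01-01")   # intended to hold a date of birth
x <- 3                       # later code reuses x for something else
class(x)                     # "numeric"; the date has been lost

# Descriptive names document their contents and are unlikely to collide.
dateOfBirth <- as.Date("1970-01-01")
numberOfSiblings <- 3
class(dateOfBirth)           # "Date"
```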

Anyone with children will know how difficult it can be to come up with even one good name, let alone a constant supply, but fortunately there are several good guidelines for producing sensible variable names:

Paul Murrell

Creative Commons License
This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.