There are two major disadvantages to working with data using R: R is an interpreted language (as opposed to compiled languages such as C), which means it can be relatively slow; and R holds all data in memory, so it cannot perform tasks on very large data sets.
The system() function can be used to run other programs from R.
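As a minimal sketch of how system() behaves, the following runs a trivial external command. It assumes a Unix-like system on which the echo program is available; the command itself is only an illustration and is not part of the case study.

```r
# Run an external command and capture its output as a character vector.
# Assumes a Unix-like system where the "echo" program is available.
output <- system("echo hello", intern=TRUE)
output

# Without intern=TRUE, system() returns the exit status of the command
# (0 conventionally means success).
status <- system("echo hello")
status
```

Capturing the output with intern=TRUE is useful when the external program prints a result that R needs to process further; the exit status is useful for checking that the program ran successfully.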
The data for the 2006 JSM Data Expo (Section 7.5.6) were obtained from NASA's Live Access Server (see Section 1.1).
There were 505 files to download so, rather than use the web interface, the data were downloaded using a command-line interface to the Live Access Server. An example of a command used to download a file is shown below and the resulting file is shown in Figure 11.16.
lasget.pl -x -115:-55 -y -22:37 -t 1995-Jan-16 \
          -o surftemp.txt -f txt \
          http://mynasadata.larc.nasa.gov/las-bin/LASserver.pl \
          ISCCPMonthly_avg_nc ts
             VARIABLE : Mean TS from clear sky composite (kelvin)
             FILENAME : ISCCPMonthly_avg.nc
             FILEPATH : /usr/local/fer_dsets/data/
             SUBSET   : 24 by 24 points (LONGITUDE-LATITUDE)
             TIME     : 16-JAN-1995 00:00
              113.8W 111.2W 108.8W 106.2W 103.8W 101.2W  98.8W ...
                27     28     29     30     31     32     33   ...
 36.2N / 51:  272.7  270.9  270.9  269.7  273.2  275.6  277.3  ...
 33.8N / 50:  279.5  279.5  275.0  275.6  277.3  279.5  281.6  ...
 31.2N / 49:  284.7  284.7  281.6  281.6  280.5  282.2  284.7  ...
 28.8N / 48:  289.3  286.8  286.8  283.7  284.2  286.8  287.8  ...
 26.2N / 47:  292.2  293.2  287.8  287.8  285.8  288.8  291.7  ...
 23.8N / 46:  294.1  295.0  296.5  286.8  286.8  285.2  289.8  ...
 ...
The data were downloaded with one file per variable per month of observations, which made for 504 files in total (7 variables by 72 months), so it was most efficient to write a script to perform the downloads within two loops. The basic algorithm is this:

    for each variable
        for each month
            download a file
The actual download can be performed from within R using the system() function. For example, the one-off download shown above (to produce the file shown in Figure 7.6) can be performed from R with the following code.
> system(paste("lasget.pl -x -115:-55 -y -22:37 -t 1995-Jan-16",
               "-o surftemp.txt -f txt",
               "http://mynasadata.larc.nasa.gov/las-bin/LASserver.pl",
               "ISCCPMonthly_avg_nc ts"))

More generally, we could write a function to perform the download for a given variable and date and store the output in a file called filename.
> lasget <- function(variable, date, filename) {
      command <- paste(
          "lasget.pl -x -115:-55 -y -22:37 -t ", date,
          " -o ", filename, " -f txt ",
          "http://mynasadata.larc.nasa.gov/las-bin/LASserver.pl ",
          "ISCCPMonthly_avg_nc ", variable,
          sep="")
      system(command)
  }

Now it is a simple matter to add a loop over the variables we want to download and a loop over the months that we want to download.
> variables <- list(c("ts", "surftemp"),
                    c("tsa_tovs", "temperature"),
                    c("ps_tovs", "pressure"),
                    c("o3_tovs", "ozone"),
                    c("ca_low", "cloudlow"),
                    c("ca_mid", "cloudmid"),
                    c("ca_high", "cloudhigh"))
> dates <- seq(as.Date("1995/1/16"), by="month", length.out=72)
> for (variable in variables) {
      for (date in as.character(dates)) {
          # Include the date in the file name so that each month
          # gets its own file rather than overwriting the last one.
          lasget(variable[1], date,
                 file.path("lasfiles",
                           paste(variable[2], "-", date, ".txt", sep="")))
      }
  }

I have chosen to enter each variable and its file name together as one element of a list because this makes a strong connection between the two and makes both easier to maintain accurately. For example, it is very unlikely that I could accidentally associate the wrong file name with a variable, or remove one of the variables without also removing the corresponding file name.
It is also worth mentioning that the download is creating files in a separate directory, rather than cluttering up the current directory. This keeps things orderly and makes it easy to clean up if things go haywire. The final file name is generated using file.path() to make sure that the code will run on any operating system.
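Before launching several hundred downloads, it can help to do a dry run that only builds the output file paths. The following is a sketch of that idea rather than part of the original script: the file naming scheme (name, date, and a .txt suffix) and the use of a temporary directory in place of lasfiles are assumptions made for illustration.

```r
# Dry run: build the output paths for every variable and month without
# downloading anything. The naming scheme (name-date.txt) and the use of
# a temporary directory are assumptions made for this illustration.
variables <- list(c("ts", "surftemp"),
                  c("tsa_tovs", "temperature"),
                  c("ps_tovs", "pressure"),
                  c("o3_tovs", "ozone"),
                  c("ca_low", "cloudlow"),
                  c("ca_mid", "cloudmid"),
                  c("ca_high", "cloudhigh"))
dates <- seq(as.Date("1995/1/16"), by="month", length.out=72)

# Create the output directory if it does not already exist.
outdir <- file.path(tempdir(), "lasfiles")
dir.create(outdir, showWarnings=FALSE)

paths <- character(0)
for (variable in variables) {
    for (date in as.character(dates)) {
        filename <- paste(variable[2], "-", date, ".txt", sep="")
        paths <- c(paths, file.path(outdir, filename))
    }
}

length(paths)            # one path per variable per month
anyDuplicated(paths)     # 0 means that no file would be overwritten
basename(head(paths, n=3))
```

Checking for duplicated paths is a cheap way to confirm that no download would silently replace an earlier one.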
The curious reader may be wondering about the double for loop in the above code. Like all of the other examples, we can do this task without loops, although we have to rearrange the data a little in order to do so.
First of all, we need to convert the variables list into a matrix. This will allow us to address the information by column.
> variableMatrix <- matrix(unlist(variables), byrow=TRUE, ncol=2)
> variableMatrix

     [,1]       [,2]
[1,] "ts"       "surftemp"
[2,] "tsa_tovs" "temperature"
[3,] "ps_tovs"  "pressure"
[4,] "o3_tovs"  "ozone"
[5,] "ca_low"   "cloudlow"
[6,] "ca_mid"   "cloudmid"
[7,] "ca_high"  "cloudhigh"
Next, we need to produce all possible combinations of variables and dates.
> datesAndVariables <- expand.grid(variable=variableMatrix[, 1],
                                   month=dates)
> head(datesAndVariables, n=10)

   variable      month
1        ts 1995-01-16
2  tsa_tovs 1995-01-16
3   ps_tovs 1995-01-16
4   o3_tovs 1995-01-16
5    ca_low 1995-01-16
6    ca_mid 1995-01-16
7   ca_high 1995-01-16
8        ts 1995-02-16
9  tsa_tovs 1995-02-16
10  ps_tovs 1995-02-16
The full variable information needs to be merged back together.
> allCombinations <- merge(datesAndVariables, variableMatrix,
                           by.x="variable", by.y=1)
> head(allCombinations[order(allCombinations$month), ], n=10)

    variable      month          V2
59   ca_high 1995-01-16   cloudhigh
74    ca_low 1995-01-16    cloudlow
153   ca_mid 1995-01-16    cloudmid
246  o3_tovs 1995-01-16       ozone
293  ps_tovs 1995-01-16    pressure
361       ts 1995-01-16    surftemp
483 tsa_tovs 1995-01-16 temperature
66   ca_high 1995-02-16   cloudhigh
73    ca_low 1995-02-16    cloudlow
160   ca_mid 1995-02-16    cloudmid
Now we can use the mapply() function to call our lasget() function on each of these combinations:
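> # Include the month in each file name so that every download
> # is stored in its own file.
> mapply(lasget,
         allCombinations[, 1],
         allCombinations[, 2],
         file.path("lasfiles",
                   paste(allCombinations[, 3], "-",
                         allCombinations[, 2], ".txt", sep="")))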
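To see how mapply() steps through its arguments, here is a small sketch that replaces the download with a stand-in function. The function fakeDownload() is hypothetical and exists only so that the example runs without network access.

```r
# Hypothetical stand-in for lasget(): describe the download rather than
# performing it, so that the example runs without network access.
fakeDownload <- function(variable, date, filename) {
    paste(variable, date, filename, sep=" | ")
}

# mapply() walks along the three vectors in parallel, calling the
# function once with the first elements, once with the second, and so on.
result <- mapply(fakeDownload,
                 c("ts", "ps_tovs"),
                 c("1995-01-16", "1995-02-16"),
                 file.path("lasfiles", c("surftemp", "pressure")),
                 USE.NAMES=FALSE)
result
```

Each element of the result corresponds to one row of arguments, which is why the combinations have to be laid out row by row before calling mapply().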
Another way to solve the problem makes use of the outer() function. To do this, we need to write a function that takes an integer, representing the index of the variable that we want to download, and a date.
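> lasgeti <- function(i, date, variables) {
      # Include the date in the file name so that each month
      # gets its own file.
      lasget(variables[[i]][1], date,
             file.path("lasfiles",
                       paste(variables[[i]][2], "-", date, ".txt",
                             sep="")))
  }

Now we can call this function for all combinations of i and dates in a call to outer().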
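> # outer() calls its function once with whole vectors of values,
> # so lasgeti() is wrapped with Vectorize() to handle them.
> outer(1:7, dates, Vectorize(lasgeti, c("i", "date")), variables)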
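The outer() function does not call its function separately for each combination; it expands its first two arguments to cover all combinations and calls the function once with those long vectors. The sketch below illustrates this with a hypothetical stand-in function, describe(), which is written for single values and wrapped with Vectorize() so that outer() can use it.

```r
# outer() expands its first two arguments to cover every combination and
# then calls the function ONCE with those long vectors, so the function
# must accept vectors. Vectorize() wraps a function written for single
# values so that it meets this requirement.
describe <- function(i, date, varNames) {
    # Hypothetical stand-in for a download: works on one i and one date.
    paste(varNames[[i]], date)
}

combos <- outer(1:2, c("1995-01-16", "1995-02-16"),
                Vectorize(describe, c("i", "date")),
                list("ts", "ps_tovs"))
combos
```

The result is a matrix with one row per index and one column per date, so each cell corresponds to one combination.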
Paul Murrell
This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.