This exercise lets you try a couple of data manipulation functions with R.
x<-15
y<- c(1,2,3,10,100)
z<-x*y
z
## [1] 15 30 45 150 1500
sum(z)
## [1] 1740
x<-seq(0,1,length.out=5)
x
## [1] 0.00 0.25 0.50 0.75 1.00
k<-x*y
k
## [1] 0.0 0.5 1.5 7.5 100.0
all.equal(length(k),length(z))
## [1] TRUE
X<-cbind(z,k)
X
## z k
## [1,] 15 0.0
## [2,] 30 0.5
## [3,] 45 1.5
## [4,] 150 7.5
## [5,] 1500 100.0
Store the first column of X as x1 and set its third element to 10. Using which(), check which elements in x1 are larger than 24. What do these numbers refer to: values or indices? Extract the values by subsetting with the which() function and store them in a new variable a.
x1<-X[,1] #first column of X
x1[3]<-10
which(x1>24)
## [1] 2 4 5
a<-x1[which(x1>24)]
a
## [1] 30 150 1500
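As a side note, the same values can usually be extracted with a logical index alone. The following is a minimal sketch that rebuilds x1 from the values shown above; the names idx, by_which, by_logical and x2 are illustrative and not part of the exercise. The two approaches should only differ when missing values are present.

```r
# x1 as constructed above: first column of X with the third element set to 10
x1 <- c(15, 30, 10, 150, 1500)

idx <- which(x1 > 24)        # integer positions of the TRUE entries
by_which <- x1[idx]          # subset by position
by_logical <- x1[x1 > 24]    # subset by logical vector
identical(by_which, by_logical)

# With a missing value the two approaches differ:
x2 <- c(15, NA, 150)
x2[which(x2 > 24)]           # which() skips the NA comparison
x2[x2 > 24]                  # the logical index keeps an NA entry
```

So which() returns indices, and subsetting with those indices returns the values.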
Append "a" to z, replace its fourth element with "b", and combine z and k into Ymat. What happens in Ymat[,2]? Hint: Look at the last row. Remove the second element of z and store the result as z1, then set up Ymat1 as you did Ymat (i.e., combine z1 and k as columns 1 and 2 of Ymat1). How did Ymat1 change compared to Ymat? Now set up a list Yl with the first element being z and the second element being k. Explain why this works differently than for Ymat. Now set up a data frame Ydf with columns z and k. Why does this not work? Now set up Ydf with columns z1 and k. How is Ydf different from Ymat?
z<-c(z,"a") #vector coerced to character
z
## [1] "15" "30" "45" "150" "1500" "a"
z[4]<-"b"
Ymat<-cbind(z,k) #z has 6 elements, k has 5
## Warning in cbind(z, k): number of rows of result is not a multiple of
## vector length (arg 2)
Ymat #k gets recycled (recycling rule) so Ymat[6,2]=k[1]
## z k
## [1,] "15" "0"
## [2,] "30" "0.5"
## [3,] "45" "1.5"
## [4,] "b" "7.5"
## [5,] "1500" "100"
## [6,] "a" "0"
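The two mechanisms at work here, coercion and recycling, can be seen in isolation in the small sketch below. The vectors v, long and short are made up for illustration and are not part of the exercise data.

```r
# Mixing numbers and strings in one vector coerces everything to character
v <- c(1, 2.5, "a")
class(v)

# cbind() recycles the shorter vector; there is no warning when the
# longer length is an exact multiple of the shorter one
long <- 1:4
short <- c(10, 20)
m <- cbind(long, short)
m[, "short"]
```

In the exercise the lengths are 6 and 5, so recycling still happens but R warns that the longer length is not a multiple of the shorter one.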
z1<-z[-2]
Ymat1<-cbind(z1,k) #now z1 has 5 elements, k has 5
Ymat1
## z1 k
## [1,] "15" "0"
## [2,] "45" "0.5"
## [3,] "b" "1.5"
## [4,] "1500" "7.5"
## [5,] "a" "100"
Yl<-list(z,k) #Yl is a list; its elements can have different lengths
Ydf<-data.frame(z,k) #z has 6 elements, k has 5; error: data.frame() only recycles when the lengths are exact multiples
Ydf<-data.frame(z1,k)
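To see the difference between list() and data.frame() without stopping a script, the error can be captured. This is a minimal sketch under the assumption that z and k hold the values shown above; the name msg is illustrative.

```r
z <- c("15", "30", "45", "b", "1500", "a")   # 6 elements
k <- c(0, 0.5, 1.5, 7.5, 100)                # 5 elements

Yl <- list(z = z, k = k)
lengths(Yl)                                  # element lengths may differ

# data.frame() signals an error here; tryCatch() captures its message
msg <- tryCatch(data.frame(z, k), error = function(e) conditionMessage(e))
msg
```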
Ymat
## z k
## [1,] "15" "0"
## [2,] "30" "0.5"
## [3,] "45" "1.5"
## [4,] "b" "7.5"
## [5,] "1500" "100"
## [6,] "a" "0"
class(Ymat[,1]) #Ymat[,1] is character
## [1] "character"
class(Ymat[,2]) #Ymat[,2] is also character
## [1] "character"
Ydf
## z1 k
## 1 15 0.0
## 2 45 0.5
## 3 b 1.5
## 4 1500 7.5
## 5 a 100.0
class(Ydf[,1]) #Ydf[,1] is factor here: before R 4.0.0, data.frame() converted characters to factors by default (stringsAsFactors=TRUE); since R 4.0.0 the result is "character"
## [1] "factor"
class(Ydf[,2]) #Ydf[,2] is numeric like k
## [1] "numeric"
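Whether the character column becomes a factor depends on the stringsAsFactors argument of data.frame(), whose default changed in R 4.0.0. The sketch below sets it explicitly so the result does not depend on the R version; df_factor and df_char are illustrative names.

```r
z1 <- c("15", "45", "b", "1500", "a")
k  <- c(0, 0.5, 1.5, 7.5, 100)

df_factor <- data.frame(z1, k, stringsAsFactors = TRUE)
df_char   <- data.frame(z1, k, stringsAsFactors = FALSE)

class(df_factor$z1)   # factor
class(df_char$z1)     # character
class(df_char$k)      # numeric in both cases
```

Either way, each column of a data frame keeps its own type, whereas a matrix forces all entries into a single type.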
Use dim() (or a similar function) to find out the number of columns and the number of rows in Ymat. Extract from Ymat the values in the first column, second row and fourth row, second column. Now extract the submatrix from the second row, first column to the next to last row, second column. Do this also for Ydf.
dim(Ymat)[2] #columns
## [1] 2
dim(Ydf)[1] #rows
## [1] 5
Ymat[c(2,4),c(1,2)]
## z k
## [1,] "30" "0.5"
## [2,] "b" "7.5"
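The call above returns the whole 2x2 block spanned by rows 2 and 4 and columns 1 and 2. If only the two individual cells are wanted, a two-column index matrix is one option. This is a sketch, with Ymat rebuilt from the output shown earlier and idx and single_cells as illustrative names.

```r
Ymat <- cbind(z = c("15", "30", "45", "b", "1500", "a"),
              k = c("0", "0.5", "1.5", "7.5", "100", "0"))

# Each row of the index matrix is one (row, column) pair
idx <- cbind(c(2, 4), c(1, 2))
single_cells <- Ymat[idx]        # elements [2,1] and [4,2]
single_cells
```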
Ymat[2:(nrow(Ymat)-1),c(1,2)]
## z k
## [1,] "30" "0.5"
## [2,] "45" "1.5"
## [3,] "b" "7.5"
## [4,] "1500" "100"
Ydf[c(2,4),c(1,2)]
## z1 k
## 2 45 0.5
## 4 1500 7.5
Ydf[2:(nrow(Ydf)-1),c(1,2)] #use nrow(Ydf): Ydf has 5 rows, Ymat has 6
## z1 k
## 2 45 0.5
## 3 b 1.5
## 4 1500 7.5
Open bspdat.csv in an editor. What is the structure of the data (look particularly at the separator and decimal characters)? Read the dataset into R using the read.csv() or read.table() functions and save it as an object called bspdat. Investigate the structure of the object with head() and str().
setwd("pathTo") #placeholder: setwd() expects the directory that contains bspdat.csv, not the file itself
bspdat<-read.table("bspdat.csv",header=TRUE,sep=";",dec=",")
head(bspdat)
## age size weight female
## 1 29 173.8 73 0
## 2 21 179.7 69 0
## 3 20 162.5 60 1
## 4 29 174.6 69 1
## 5 25 163.3 55 1
## 6 30 181.4 72 0
str(bspdat)
## 'data.frame': 100 obs. of 4 variables:
## $ age : int 29 21 20 29 25 30 27 29 20 25 ...
## $ size : num 174 180 162 175 163 ...
## $ weight: int 73 69 60 69 55 72 67 73 69 59 ...
## $ female: int 0 0 1 1 1 0 1 0 1 0 ...
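The effect of the sep and dec arguments can be tried without the file by passing text directly. The sketch below assumes the file layout implied by the call above (semicolon separator, decimal comma) and uses the first two rows printed by head(bspdat); csv_text, d1 and d2 are illustrative names.

```r
# Two rows in the assumed layout of bspdat.csv
csv_text <- "age;size;weight;female\n29;173,8;73;0\n21;179,7;69;0"

d1 <- read.table(text = csv_text, header = TRUE, sep = ";", dec = ",")
d2 <- read.csv2(text = csv_text)   # read.csv2() defaults to sep=";" and dec=","

str(d1)
str(d2)
```

With the wrong dec setting, size would be read as text rather than as a number.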
Save the data frame as bspdat.rda with save(). Add a new variable to the data frame, which should be called gender. It should be a factor with levels “male” for 0 and “female” for 1. Load bspdat.rda again with load(). Does the object still contain the factor gender?
save(bspdat,file="bspdat.rda")
gender<-factor(bspdat$female,labels=c("male","female"))
bspdat$gender<-gender
load(file="bspdat.rda")
bspdat$gender #load() overwrote bspdat with the saved copy, which was written before gender was added
## NULL
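If the modified object should survive, one option is to load the saved copy into a separate environment. The sketch below uses a small stand-in data frame dat and a temporary file instead of bspdat and bspdat.rda; all names in it are illustrative.

```r
dat <- data.frame(female = c(0, 0, 1))      # hypothetical stand-in for bspdat
rda_file <- tempfile(fileext = ".rda")      # temporary file instead of bspdat.rda
save(dat, file = rda_file)

# Add the factor after saving, as in the exercise
dat$gender <- factor(dat$female, levels = c(0, 1), labels = c("male", "female"))

# Load the saved copy into its own environment so the workspace object survives
saved <- new.env()
load(rda_file, envir = saved)

names(dat)         # the modified object still has gender
names(saved$dat)   # the saved copy does not
```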
Load the Exercise_01.Rdata file. The object named datex1 represents an unscored multiple choice test. Inspect the file with str(), head() and summary(). Do you find missing values? If so, replace them with 0. Is there something odd with the column names? Name the last column “gender”.
load('Exercise_01.Rdata')
head(datex1)
## Item_1 Item_2 Item_3 Item_4 Item_5 Item_6 Item_7 Item_8 Item_9
## [1,] 4 3 1 3 3 2 5 3 2
## [2,] 0 3 0 3 0 0 0 0 4
## [3,] 0 0 2 0 0 2 2 0 2
## [4,] 4 3 4 3 3 4 5 5 2
## [5,] 0 3 0 0 0 2 3 2 2
## [6,] 3 0 2 0 0 5 0 4 3
## Item_10 Item_11 Item_12 Item_13 Item_14 Item_15 Item_16 Item_17
## [1,] 3 2 3 2 0 3 0 0
## [2,] 4 0 1 4 0 0 4 1
## [3,] 0 0 4 0 0 4 2 0
## [4,] 1 2 0 3 3 4 4 3
## [5,] 2 0 0 0 1 0 0 2
## [6,] 0 0 0 0 0 0 0 2
## Item_18 Item_19 Item_20 Item_21 Item_22 Item_23 Item_24 Item_25
## [1,] 3 4 2 0 1 1 3 0
## [2,] 0 0 0 2 0 2 3 4
## [3,] 0 0 0 1 1 0 0 0
## [4,] 1 4 2 4 1 4 3 4
## [5,] 0 2 4 3 0 0 1 0
## [6,] 0 0 4 3 0 3 2 2
## Item_26 Item_27 Item_28 Item_29 Item_30 Item_31 Item_32 Item_33
## [1,] 1 4 3 3 1 2 4 4
## [2,] 3 0 2 3 1 4 3 0
## [3,] 0 0 0 0 0 0 2 0
## [4,] 2 3 3 2 2 1 3 1
## [5,] 0 3 0 4 0 0 2 4
## [6,] 3 1 0 0 0 0 0 0
## Item_34 Item_35 Item_36 Item_37 Item_38 Item_39 Item_40 Item_41
## [1,] 4 2 0 4 1 3 4 2
## [2,] 0 0 0 4 4 4 2 0
## [3,] 0 0 0 0 0 0 0 0
## [4,] 1 2 3 4 0 1 4 5
## [5,] 0 0 0 0 0 4 4 0
## [6,] 0 0 5 1 0 0 3 0
## Item_42 Item_43 Item_44 Item_45 Item_46 Item_47 Item_48 Item_49
## [1,] 3 3 2 2 4 5 2 0
## [2,] 1 0 3 0 3 0 2 4
## [3,] 0 0 0 2 1 4 0 0
## [4,] 2 3 2 5 4 2 0 1
## [5,] 1 0 0 0 1 2 0 0
## [6,] 3 0 0 0 0 3 4 0
## Item_50
## [1,] 0 0
## [2,] 0 0
## [3,] 0 0
## [4,] 4 1
## [5,] 0 0
## [6,] 0 0
str(datex1)
## num [1:2500, 1:51] 4 0 0 4 0 3 4 4 1 2 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:51] "Item_1" "Item_2" "Item_3" "Item_4" ...
summary(datex1)
## Item_1 Item_2 Item_3 Item_4
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.00
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.00
## Median :1.000 Median :3.000 Median :0.0000 Median :3.00
## Mean :1.576 Mean :1.681 Mean :0.9152 Mean :1.58
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:2.0000 3rd Qu.:3.00
## Max. :4.000 Max. :5.000 Max. :4.0000 Max. :5.00
## NA's :1
## Item_5 Item_6 Item_7 Item_8
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :2.000 Median :0.000 Median :3.000
## Mean :1.547 Mean :1.822 Mean :1.533 Mean :2.365
## 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :1
## Item_9 Item_10 Item_11 Item_12
## Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.00 Median :0.000 Median :1.000
## Mean :1.274 Mean :1.29 Mean :1.606 Mean :1.253
## 3rd Qu.:2.000 3rd Qu.:3.00 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :4.000 Max. :4.00 Max. :5.000 Max. :4.000
## NA's :1 NA's :1
## Item_13 Item_14 Item_15 Item_16
## Min. :0.000 Min. :0.0000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.0000 Median :0.000 Median :2.000
## Mean :1.327 Mean :0.9856 Mean :1.542 Mean :1.873
## 3rd Qu.:3.000 3rd Qu.:2.0000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.0000 Max. :5.000 Max. :5.000
## NA's :2
## Item_17 Item_18 Item_19 Item_20
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.000 Median :1.000 Median :0.000
## Mean :1.188 Mean :1.118 Mean :1.621 Mean :1.033
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## NA's :1
## Item_21 Item_22 Item_23 Item_24
## Min. :0.0000 Min. :0.0000 Min. :0.000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.0000 Median :1.0000 Median :1.000 Median :1.000
## Mean :0.8604 Mean :0.5752 Mean :1.363 Mean :1.636
## 3rd Qu.:2.0000 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.0000 Max. :5.0000 Max. :4.000 Max. :4.000
## NA's :2
## Item_25 Item_26 Item_27 Item_28
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.000 Median :1.000 Median :1.000
## Mean :1.288 Mean :1.114 Mean :1.502 Mean :1.282
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
##
## Item_29 Item_30 Item_31 Item_32
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :1.000 Median :1.000 Median :0.000
## Mean :1.378 Mean :1.226 Mean :1.259 Mean :1.172
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
##
## Item_33 Item_34 Item_35 Item_36
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.000 Median :0.0000 Median :0.0000 Median :2.000
## Mean :1.135 Mean :0.9664 Mean :0.9804 Mean :1.918
## 3rd Qu.:2.000 3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.:4.000
## Max. :4.000 Max. :4.0000 Max. :4.0000 Max. :5.000
##
## Item_37 Item_38 Item_39 Item_40
## Min. :0.000 Min. :0.0000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.0000 Median :1.000 Median :2.000
## Mean :1.402 Mean :0.7655 Mean :1.493 Mean :1.821
## 3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :4.000 Max. :4.0000 Max. :4.000 Max. :5.000
## NA's :1
## Item_41 Item_42 Item_43 Item_44
## Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.00 Median :1.000 Median :1.000 Median :0.000
## Mean :1.13 Mean :1.247 Mean :1.481 Mean :1.209
## 3rd Qu.:3.00 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :5.00 Max. :4.000 Max. :4.000 Max. :4.000
## NA's :1
## Item_45 Item_46 Item_47 Item_48
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :2.000 Median :1.000 Median :2.000 Median :1.000
## Mean :2.177 Mean :1.564 Mean :1.726 Mean :1.472
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :5.000 Max. :4.000 Max. :5.000 Max. :4.000
## NA's :1
## Item_49 Item_50 V51
## Min. :0.000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000
## Median :1.000 Median :0.0000 Median :1.000
## Mean :1.394 Mean :0.9548 Mean :0.526
## 3rd Qu.:3.000 3rd Qu.:2.0000 3rd Qu.:1.000
## Max. :4.000 Max. :4.0000 Max. :1.000
## NA's :1
#Missing values in columns 3,6,11,12,16,20,23,38,41,46,50
datex1[which(is.na(datex1),arr.ind = TRUE)]<-0
#or with a loop
#for(i in 1:dim(datex1)[2])
#{
# datex1[which(is.na(datex1[,i])),i]<-0
#}
colnames(datex1)[51]<-"gender"
summary(datex1)
## Item_1 Item_2 Item_3 Item_4
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.00
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.00
## Median :1.000 Median :3.000 Median :0.0000 Median :3.00
## Mean :1.576 Mean :1.681 Mean :0.9148 Mean :1.58
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:2.0000 3rd Qu.:3.00
## Max. :4.000 Max. :5.000 Max. :4.0000 Max. :5.00
## Item_5 Item_6 Item_7 Item_8
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :2.000 Median :0.000 Median :3.000
## Mean :1.547 Mean :1.822 Mean :1.533 Mean :2.365
## 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## Item_9 Item_10 Item_11 Item_12
## Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.00 Median :0.000 Median :1.000
## Mean :1.274 Mean :1.29 Mean :1.605 Mean :1.252
## 3rd Qu.:2.000 3rd Qu.:3.00 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :4.000 Max. :4.00 Max. :5.000 Max. :4.000
## Item_13 Item_14 Item_15 Item_16
## Min. :0.000 Min. :0.0000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.0000 Median :0.000 Median :2.000
## Mean :1.327 Mean :0.9856 Mean :1.542 Mean :1.871
## 3rd Qu.:3.000 3rd Qu.:2.0000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.0000 Max. :5.000 Max. :5.000
## Item_17 Item_18 Item_19 Item_20
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.000 Median :1.000 Median :0.000
## Mean :1.188 Mean :1.118 Mean :1.621 Mean :1.033
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## Item_21 Item_22 Item_23 Item_24
## Min. :0.0000 Min. :0.0000 Min. :0.000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.0000 Median :1.0000 Median :1.000 Median :1.000
## Mean :0.8604 Mean :0.5752 Mean :1.362 Mean :1.636
## 3rd Qu.:2.0000 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.0000 Max. :5.0000 Max. :4.000 Max. :4.000
## Item_25 Item_26 Item_27 Item_28
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.000 Median :1.000 Median :1.000
## Mean :1.288 Mean :1.114 Mean :1.502 Mean :1.282
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## Item_29 Item_30 Item_31 Item_32
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :1.000 Median :1.000 Median :0.000
## Mean :1.378 Mean :1.226 Mean :1.259 Mean :1.172
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## Item_33 Item_34 Item_35 Item_36
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.000 Median :0.0000 Median :0.0000 Median :2.000
## Mean :1.135 Mean :0.9664 Mean :0.9804 Mean :1.918
## 3rd Qu.:2.000 3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.:4.000
## Max. :4.000 Max. :4.0000 Max. :4.0000 Max. :5.000
## Item_37 Item_38 Item_39 Item_40
## Min. :0.000 Min. :0.0000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.0000 Median :1.000 Median :2.000
## Mean :1.402 Mean :0.7652 Mean :1.493 Mean :1.821
## 3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :4.000 Max. :4.0000 Max. :4.000 Max. :5.000
## Item_41 Item_42 Item_43 Item_44
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :1.000 Median :1.000 Median :0.000
## Mean :1.129 Mean :1.247 Mean :1.481 Mean :1.209
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :5.000 Max. :4.000 Max. :4.000 Max. :4.000
## Item_45 Item_46 Item_47 Item_48
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :2.000 Median :1.000 Median :2.000 Median :1.000
## Mean :2.177 Mean :1.563 Mean :1.726 Mean :1.472
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :5.000 Max. :4.000 Max. :5.000 Max. :4.000
## Item_49 Item_50 gender
## Min. :0.000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000
## Median :1.000 Median :0.0000 Median :1.000
## Mean :1.394 Mean :0.9544 Mean :0.526
## 3rd Qu.:3.000 3rd Qu.:2.0000 3rd Qu.:1.000
## Max. :4.000 Max. :4.0000 Max. :1.000
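The missing-value replacement used above can also be written with a logical index, and the affected columns can be counted rather than read off the summary. The sketch below uses a made-up 3x3 matrix m instead of datex1; m and na_per_col are illustrative names.

```r
# Hypothetical 3x3 matrix with two missing values and one empty column name
m <- matrix(c(4, NA, 0, 3, 3, NA, 1, 0, 1), nrow = 3,
            dimnames = list(NULL, c("Item_1", "Item_2", "")))

na_per_col <- colSums(is.na(m))   # number of missing values per column
m[is.na(m)] <- 0                  # logical-matrix indexing replaces all NAs at once
colnames(m)[ncol(m)] <- "gender"  # name the last column by position
m
```

Using ncol(m) rather than a fixed column number keeps the renaming correct if the number of items changes.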
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.