Exercise 5: Nonparametric IRT in R

This exercise requires KernSmoothIRT and mokken and mirt for estimating nonparametric IRT models. We will use the Workplace 1990 Workpladce Industrial Relation Survey. The data are available as data(WIRS,package="ltm").

Read the help on the data file help(WIRS). Is there something that makes you believe that it could be difficult to model?

library(mirt)
library(KernSmoothIRT)
library(mokken)
data(WIRS,package="ltm")
head(WIRS)

##   Item 1 Item 2 Item 3 Item 4 Item 5 Item 6
## 1      0      0      0      0      0      0
## 2      0      0      0      0      0      0
## 3      0      0      0      0      0      0
## 4      0      0      0      0      0      0
## 5      0      0      0      0      0      0
## 6      0      0      0      0      0      0

summary(WIRS)

##      Item 1          Item 2          Item 3          Item 4    
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.00  
##  1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.00  
##  Median :0.000   Median :1.000   Median :0.000   Median :0.00  
##  Mean   :0.373   Mean   :0.583   Mean   :0.283   Mean   :0.24  
##  3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:0.00  
##  Max.   :1.000   Max.   :1.000   Max.   :1.000   Max.   :1.00  
##      Item 5          Item 6     
##  Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.000   1st Qu.:0.000  
##  Median :0.000   Median :0.000  
##  Mean   :0.357   Mean   :0.147  
##  3rd Qu.:1.000   3rd Qu.:0.000  
##  Max.   :1.000   Max.   :1.000

Fit a 4PL with mirt() to the data, inspect the results with plot (type="trace", type="info" or type="infoSE" and type="infotrace") and summary() and assess the fit. Look at a histogram (hist()) of the latent trait values for the persons. What do you find?

#with mirt
wirs.4pl <- mirt(WIRS,1, itemtype="4PL", verbose=FALSE)

## EM cycles terminated after 500 iterations.

summary(wirs.4pl)

##           F1    h2
## Item 1 0.993 0.985
## Item 2 0.998 0.995
## Item 3 0.756 0.571
## Item 4 0.997 0.993
## Item 5 0.842 0.709
## Item 6 0.794 0.630
## 
## SS loadings:  4.883 
## Proportion Var:  0.814 
## 
## Factor correlations: 
## 
##    F1
## F1  1

plot(wirs.4pl,type="trace")  #bad items 1, 2, 4

plot(wirs.4pl,type="infoSE") #Info looks odd (two peaks)

plot(wirs.4pl,type="infotrace") #Item 3,5,6 have nearly no info; 1,2,3 have spikes

plot(wirs.4pl) #Total Score Info is also odd

ifit4pl <- itemfit(wirs.4pl)
ifit4pl

##     item   S_X2 df.S_X2 p.S_X2
## 1 Item 1 39.414       1  0.000
## 2 Item 2 18.776       1  0.000
## 3 Item 3  8.187       1  0.004
## 4 Item 4 17.628       1  0.000
## 5 Item 5 11.370       1  0.001
## 6 Item 6 21.350       1  0.000

p.adjust(ifit4pl$p.S_X2, 'fdr')  #Fit is super bad for all items

## [1] 2.057e-09 2.940e-05 4.220e-03 4.029e-05 8.955e-04 1.148e-05

hist(fscores(wirs.4pl))

plot(density(fscores(wirs.4pl))) # odd distribution as well

As we can see there is something seriously wrong (and even worse than in the last exercise). From the IRF we can suspect that it might have to do with the assumption of the models. Now use mokken and start investigating. Assess whether the assumption of non-intersecting IRF holds with check.restscore() or check.pmatrix(). Can we rule out that a unidimensional Double Monotonicity Model holds for this scale?

rs<-check.restscore(WIRS)
summary(rs) #5 significant violations for item 1, 2 for 5 and 1 each for 2, 4, 6

##        ItemH #ac #vi #vi/#ac maxvi  sum sum/#ac zmax #zsig crit
## Item 1 -0.06  15   5    0.33  0.59 1.64  0.1091 8.61     5  332
## Item 2  0.02  13   1    0.08  0.06 0.06  0.0043 1.44     0   42
## Item 3  0.25  13   2    0.15  0.44 0.49  0.0376 7.47     1  163
## Item 4  0.19  15   2    0.13  0.34 0.39  0.0257 5.93     1  132
## Item 5  0.26  13   3    0.23  0.59 0.77  0.0591 8.61     2  225
## Item 6  0.28  15   1    0.07  0.14 0.14  0.0092 2.99     1   67

pm<-check.pmatrix(WIRS)
summary(pm) #6 significant violations for item 1, 3 for 5, 2 for Item 3, 1 for 4.

##        ItemH #ac #vi #vi/#ac maxvi  sum sum/#ac zmax #zsig crit
## Item 1 -0.06  20   6    0.30  0.09 0.33  0.0165 5.69     6  164
## Item 2  0.02  20   0    0.00  0.00 0.00  0.0000 0.00     0    0
## Item 3  0.25  20   2    0.10  0.04 0.08  0.0041 3.93     2   69
## Item 4  0.19  20   1    0.05  0.04 0.04  0.0019 2.23     1   45
## Item 5  0.26  20   3    0.15  0.09 0.21  0.0105 5.69     3  103
## Item 6  0.28  20   0    0.00  0.00 0.00  0.0000 0.00     0    0

Assess whether the assumption of monotonicity for the items holds (check.monotonicity()) and whether a Stochastic Ordering of the Latent Trait (Monontone Homogenity Model) holds (with the scalability coefficients coefH()).

mon<-check.monotonicity(WIRS)
summary(mon) #5 significant violations for item 1.

##        ItemH #ac #vi #vi/#ac maxvi  sum sum/#ac zmax #zsig crit
## Item 1 -0.06  10   7     0.7  0.29 1.04  0.1041 5.06     5  331
## Item 2  0.02   6   0     0.0  0.00 0.00  0.0000 0.00     0    0
## Item 3  0.25   6   0     0.0  0.00 0.00  0.0000 0.00     0    0
## Item 4  0.19   6   0     0.0  0.00 0.00  0.0000 0.00     0    0
## Item 5  0.26   6   0     0.0  0.00 0.00  0.0000 0.00     0    0
## Item 6  0.28  10   0     0.0  0.00 0.00  0.0000 0.00     0    0

plot(mon,1) #No

sol<-coefH(WIRS)
sol #negative H_ij, negative H_i and H is very low. We can't rank order persons here. Culprit is item 1.

## $Hij
##        Item 1   se      Item 2   se      Item 3  se      Item 4   se     
## Item 1                   -0.490  (0.052)  0.090  (0.039)  -0.139  (0.042)
## Item 2  -0.490  (0.052)                   0.122  (0.059)   0.393  (0.061)
## Item 3   0.090  (0.039)   0.122  (0.059)                   0.260  (0.039)
## Item 4  -0.139  (0.042)   0.393  (0.061)  0.260  (0.039)                 
## Item 5   0.089  (0.033)   0.178  (0.049)  0.414  (0.039)   0.290  (0.044)
## Item 6   0.159  (0.060)   0.157  (0.087)  0.360  (0.053)   0.218  (0.049)
##        Item 5  se      Item 6  se     
## Item 1  0.089  (0.033)  0.159  (0.060)
## Item 2  0.178  (0.049)  0.157  (0.087)
## Item 3  0.414  (0.039)  0.360  (0.053)
## Item 4  0.290  (0.044)  0.218  (0.049)
## Item 5                  0.474  (0.057)
## Item 6  0.474  (0.057)                
## 
## $Hi
##        Item H  se     
## Item 1  -0.058 (0.023)
## Item 2   0.023 (0.033)
## Item 3   0.249 (0.020)
## Item 4   0.192 (0.023)
## Item 5   0.263 (0.018)
## Item 6   0.283 (0.028)
## 
## $H
##  Scale H se     
##    0.154 (0.015)

In 4) we may have found a reason for the scaling problems. Investigate this further with smoothing models. First, fit a smoothing spline model with mirt(). Investigate the model with the IRF plot. What do you find out about Item 1? Also check fit, total score plot and distribution of the latent trait. Is this satisfactory? What do you say about the IRF of Items 2, 4, 5?

splinem<-mirt(WIRS,1,itemtype="spline", verbose=FALSE)

## EM cycles terminated after 500 iterations.

plot(splinem,type="trace")  #

itemplot(splinem,1) #Ideal Point IRF

plot(splinem) #Total Score Info is still odd

ifitspline <- itemfit(splinem)
ifitspline  #fit is better but not good

##     item   S_X2 df.S_X2 p.S_X2
## 1 Item 1  3.963       1  0.047
## 2 Item 2 30.471       1  0.000
## 3 Item 3  2.475       1  0.116
## 4 Item 4  6.049       1  0.014
## 5 Item 5  5.507       1  0.019
## 6 Item 6  4.254       1  0.039

p.adjust(ifitspline$p.S_X2, 'fdr')  #Fit is super bad for all items

## [1] 5.580e-02 2.033e-07 1.157e-01 3.788e-02 3.788e-02 5.580e-02

hist(fscores(splinem))

plot(density(fscores(splinem)))

itemplot(splinem,2) #odd, that looks like a spline gone wrong

itemplot(splinem,4) #odd, that looks like a spline gone wrong

itemplot(splinem,5) #odd, that looks like a spline gone wrong

The spline model gave us some indication of what happens but the fit is not yet satisfactory. There also seems to be something going on with the spline model for Items 2, 4, 5. It looks like the spline function is off (overfitting perhaps). We now use Kernel smoothing, which might bring the IRF out better. We fit a Kernel smoothing model with ksIRT(). Investigate the IRF with the plot method (plottype=c("OCC","EIS")). Check the latent score distribution with plottype="density". Check the expected vs. the true observed score (plottype="expected"). Check the estimation standard error (plottype="sd"). What do you find with respect to Items 1 and 2, 4, 5?

ksm <- ksIRT(responses=WIRS, key=rep(1,ncol(WIRS)),format=2)
ksm

##   Item Correlation
## 1    1      0.2851
## 2    2      0.3829
## 3    3      0.6386
## 4    4      0.5419
## 5    5      0.6769
## 6    6      0.5340

plot(ksm,plottype="OCC")

plot(ksm,plottype="EIS")

plot(ksm,plottype="EIS",items=1) #non monotonic, ideal point style in the middle but not clear; item 1 is super odd

plot(ksm,plottype="EIS",items=c(2,4,5)) #the IRF is less extreme than the spline, for 4 and 5 it seems ok. For item 2 there may be one indication for issues in the lower score range which the spline clearly overfits; also what mokken says

plot(ksm,plottype="density")

plot(ksm,plottype="expected")

plot(ksm,plottype="sd")

BONUS: Investigate the relative credibility for the first person from each raw score group (plottype="RCC"). This is essentially giving the posterior distribution of the scores a subject should have had (in black) while in red is the real score. How good is the credibility for scores 3, 4, 5? (Hint: The people are ordered based on raw score).

p0<- which(rowSums(WIRS)==0)[1] 
p1<- which(rowSums(WIRS)==1)[1]
p2<- which(rowSums(WIRS)==2)[1]
p3<- which(rowSums(WIRS)==3)[1]
p4<- which(rowSums(WIRS)==4)[1]
p5<- which(rowSums(WIRS)==5)[1]
p6<- which(rowSums(WIRS)==6)[1]

plot(ksm,plottype="RCC",subjects=c(p0,p1,p2,p3,p4,p5,p6))

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Exercise 5: Nonparametric IRT in R

Thomas Rusch

July 17, 2017