Abstract
This study compared two procedures for vertical scaling in the Item Response Theory (IRT) context: fixed estimation and simultaneous estimation. The results favored the simultaneous estimation procedure over the fixed estimation procedure, especially when there were few anchor items. However, the results also revealed that using expected a posteriori (EAP) estimates of ability in the 3-parameter IRT model may degrade the vertically scaled results obtained through the simultaneous estimation procedure. Overall, the results of this empirical study showed that, in large-scale tests that aim to monitor development across grade levels, the simultaneous estimation procedure with the 2-parameter or the 3-parameter IRT model would be a reasonable choice.
Keywords
Vertical scaling, item response theory, Bilog-MG, simultaneous estimation