One of the primary goals of international large-scale assessments (ILSAs) in education is the comparison of country means in student achievement. The present article introduces a framework for discussing differential item functioning (DIF) in country comparisons in ILSAs. Three linking methods are compared: concurrent calibration based on full invariance, concurrent calibration based on partial invariance using the MD or RMSD statistics, and separate calibration with subsequent nonrobust and robust linking approaches. Furthermore, we analytically derive the bias in country means under the different linking methods in the presence of DIF. In a simulation study, we show that the partial invariance and robust linking approaches provide less biased country mean estimates than the full invariance approach when some items are biased. Guidelines are derived for selecting cutoff values for the MD and RMSD statistics in the partial invariance approach.
Original language: English
Journal: Psychological Test and Assessment Modeling
Volume: 62
Issue number: 2
Pages (from-to): 233-279
Number of pages: 47
ISSN: 1614-9947
Publication status: Published - 06.2020

    Research areas

  • international large-scale assessments, linking, differential item functioning, multiple groups, RMSD statistic

ID: 1400685