Standard

Comparing the score interpretation across modes in PISA: An investigation of how item facets affect difficulty. / Harrison, Scott; Kroehne, Ulf; Goldhammer, Frank et al.

In: Large-scale Assessments in Education, Vol. 11, 8, 11.03.2023.

Research output: Contribution to journal, Journal article, Research, Peer-reviewed

Vancouver

Harrison S, Kroehne U, Goldhammer F, Lüdtke O, Robitzsch A. Comparing the score interpretation across modes in PISA: An investigation of how item facets affect difficulty. Large-scale Assessments in Education. 2023 Mar 11;11:8. doi: 10.1186/s40536-023-00157-9

Author

Harrison, Scott; Kroehne, Ulf; Goldhammer, Frank et al. / Comparing the score interpretation across modes in PISA: An investigation of how item facets affect difficulty. In: Large-scale Assessments in Education. 2023; Vol. 11.

BibTeX

@article{5148aae117d4445c99498497553c759e,
title = "Comparing the score interpretation across modes in PISA: An investigation of how item facets affect difficulty",
abstract = "Mode effects, the variations in item and scale properties attributed to the mode of test administration (paper vs. computer), have stimulated research around test equivalence and trend estimation in PISA. The PISA assessment framework provides the backbone to the interpretation of the results of the PISA test scores. However, an identified gap in the current literature is whether mode effects have affected test score interpretation as defined by the assessment framework, and whether the interpretations of the PBA and CBA test scores are comparable. This study uses the 2015 PISA field trial data from thirteen countries to compare test modes through a construct representation approach. It is investigated whether item facets defined by the assessment framework (e.g., different cognitive demands) affect item difficulty comparably across modes using a unidimensional two-group generalized partial credit model (GPCM). Linking the assessment framework to item difficulty using linear regression showed that for both maths and science domains, item categorisation relates to item difficulty, however for the reading domain no such conclusion was possible. In comparing PBA to CBA in representations across the three domains, maths had one facet with a significant difference in representation, reading had all three facets significantly different, and for science, four out of six facets had significant differences. Modelling items labelled “mode invariant” in PISA 2015, the results indicated that in every domain, two facets showed significant differences between the test modes. The graphical inspection of difficulty patterns confirmed that reading shows stronger differences while the patterns of the other domains were quite consistent between modes. The present study shows that the mode effects on difficulty vary within the task facets proposed by the PISA assessment framework, in particular for reading. 
These findings shed light on whether the comparability of score interpretation between modes is compromised. Given the limitations of the link between the reading domain and item difficulty, any conclusions in this domain are limited. Importantly, the present study adds a new approach and empirical findings to the investigation of the cross-mode equivalence in PISA domains.",
keywords = "Methodological research and machine learning, PISA, Assessment framework, Mode effects, Item facets, Construct representation, Generalized partial credit model",
author = "Scott Harrison and Ulf Kroehne and Frank Goldhammer and Oliver L{\"u}dtke and Alexander Robitzsch",
year = "2023",
month = mar,
day = "11",
doi = "10.1186/s40536-023-00157-9",
language = "English",
volume = "11",
number = "8",
journal = "Large-scale Assessments in Education",
issn = "2196-0739",
publisher = "SpringerOpen",
}

RIS

TY - JOUR

T1 - Comparing the score interpretation across modes in PISA

T2 - An investigation of how item facets affect difficulty

AU - Harrison, Scott

AU - Kroehne, Ulf

AU - Goldhammer, Frank

AU - Lüdtke, Oliver

AU - Robitzsch, Alexander

PY - 2023/3/11

Y1 - 2023/3/11

AB - Mode effects, the variations in item and scale properties attributed to the mode of test administration (paper vs. computer), have stimulated research around test equivalence and trend estimation in PISA. The PISA assessment framework provides the backbone to the interpretation of the results of the PISA test scores. However, an identified gap in the current literature is whether mode effects have affected test score interpretation as defined by the assessment framework, and whether the interpretations of the PBA and CBA test scores are comparable. This study uses the 2015 PISA field trial data from thirteen countries to compare test modes through a construct representation approach. It is investigated whether item facets defined by the assessment framework (e.g., different cognitive demands) affect item difficulty comparably across modes using a unidimensional two-group generalized partial credit model (GPCM). Linking the assessment framework to item difficulty using linear regression showed that for both maths and science domains, item categorisation relates to item difficulty, however for the reading domain no such conclusion was possible. In comparing PBA to CBA in representations across the three domains, maths had one facet with a significant difference in representation, reading had all three facets significantly different, and for science, four out of six facets had significant differences. Modelling items labelled “mode invariant” in PISA 2015, the results indicated that in every domain, two facets showed significant differences between the test modes. The graphical inspection of difficulty patterns confirmed that reading shows stronger differences while the patterns of the other domains were quite consistent between modes. The present study shows that the mode effects on difficulty vary within the task facets proposed by the PISA assessment framework, in particular for reading. 
These findings shed light on whether the comparability of score interpretation between modes is compromised. Given the limitations of the link between the reading domain and item difficulty, any conclusions in this domain are limited. Importantly, the present study adds a new approach and empirical findings to the investigation of the cross-mode equivalence in PISA domains.

KW - Methodological research and machine learning

KW - PISA

KW - Assessment framework

KW - Mode effects

KW - Item facets

KW - Construct representation

KW - Generalized partial credit model

U2 - 10.1186/s40536-023-00157-9

DO - 10.1186/s40536-023-00157-9

M3 - Journal article

VL - 11

JO - Large-scale Assessments in Education

JF - Large-scale Assessments in Education

SN - 2196-0739

M1 - 8

ER -

ID: 9007779