Designing assessments in classroom contexts, or having them generated automatically, requires, among other things, knowledge about the difficulty of what is assessed. Estimates of difficulty can be derived empirically, usually by piloting items, or theoretically from models. Empirical results, in turn, can inform theory and refine models. In this article, we compare four methods of estimating item difficulty for a typical topic of introductory programming courses: control flow. For a given set of items that had been tested empirically, we also collected expert ratings and additionally applied measures of code complexity from both software engineering and computer science education research. The results show some overlap between empirical results and theoretical predictions. However, for the simple item format we have been using, all of the models fall short of offering enough explanatory power regarding the observed variance in difficulty. Empirical difficulty, in turn, can serve as the basis for rules to be used for item generation in the future.
Original language: English
Title of host publication: Koli Calling '22: Proceedings of the 22nd Koli Calling International Conference on Computing Education Research
Editors: Ilkka Jormanainen, Andrew Petersen
Number of pages: 12
Place of publication: New York, NY, USA
Publication date: 17.11.2022
Article number: 3
ISBN (Print): 9781450396165
Publication status: Published - 17.11.2022

Research areas: assessment, program tracing, item difficulty, item generation
