When estimating multiple regression models with incomplete predictor variables, it is necessary to specify a joint distribution for the predictor variables. A convenient assumption is that this distribution is a multivariate normal distribution, which is also the default in many statistical software packages. This distribution will in general be misspecified if predictors with missing data have nonlinear effects (e.g., x2) or are included in interaction terms (e.g., x·z). In the present article, we introduce a factored regression modeling approach for estimating regression models with missing data that is based on maximum likelihood estimation. In this approach, the model likelihood is factorized into a part that is due to the model of interest and a part that is due to the model for the incomplete predictors. In three simulation studies, we showed that the factored regression modeling approach produced valid estimates of interaction and nonlinear effects in regression models with missing values on categorical or continuous predictor variables under a broad range of conditions. We developed the R package mdmb, which facilitates a user-friendly application of the factored regression modeling approach, and present a real-data example that illustrates the flexibility of the software.
Original languageEnglish
JournalMultivariate Behavioral Research
Volume55
Issue number3
Pages (from-to)361-381
Number of pages21
ISSN0027-3171
DOIs
Publication statusPublished - 06.2020

    Research areas

  • Multiple regression, interaction effects, maximum likelihood estimation, missing data

ID: 998392