Background: The conventional method for modeling five-level EuroQol five-dimensional questionnaire (EQ-5D-5L) health state values in national valuation studies is an additive 20-parameter main-effects regression model. Statistical models with many parameters are at increased risk of overfitting, that is, of fitting to noise and measurement error rather than to the underlying relationship.
Objectives: To compare the 20-parameter main-effects model to simplified, nonlinear, multiplicative regression models in terms of how accurately they predict mean values of out-of-sample health states.
Methods: We used data from the Spanish, Singaporean, and Chinese EQ-5D-5L valuation studies. Four models were compared: an 8-parameter model with a single parameter per dimension, multiplied by cross-dimensional parameters for levels 2, 3, and 4; 9- and 11-parameter extensions that handle differences in the wording of level 5; and the “standard” additive 20-parameter model. Fixed- and random-intercept variants of all models were tested using two cross-validation methods: leave-one-out at the level of individual valued health states, and leave-one-out at the level of the health state blocks used in EQ-5D-5L valuation studies. Mean absolute error, Lin concordance correlation coefficient, and Pearson correlation coefficient between observed health state means and out-of-sample predictions were compared.
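As an illustration of the evaluation described above, the following is a minimal sketch of the three agreement measures and of leave-one-out splitting at the health-state level. It is not the authors' code: the function names and the representation of health states as strings such as "21111" are assumptions made for this example, and model fitting is left out entirely.

```python
from math import sqrt
from statistics import fmean


def _moments(observed, predicted):
    """Means, population variances, and covariance of two equal-length sequences."""
    n = len(observed)
    mx, my = fmean(observed), fmean(predicted)
    var_x = sum((a - mx) ** 2 for a in observed) / n
    var_y = sum((b - my) ** 2 for b in predicted) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(observed, predicted)) / n
    return mx, my, var_x, var_y, cov


def mean_absolute_error(observed, predicted):
    """Average absolute gap between observed means and predictions."""
    return fmean(abs(a - b) for a, b in zip(observed, predicted))


def pearson_r(observed, predicted):
    """Pearson correlation: linear association, insensitive to shift and scale."""
    _, _, var_x, var_y, cov = _moments(observed, predicted)
    return cov / sqrt(var_x * var_y)


def lin_ccc(observed, predicted):
    """Lin concordance correlation: also penalizes shift and scale differences."""
    mx, my, var_x, var_y, cov = _moments(observed, predicted)
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def leave_one_state_out(state_ids):
    """Yield (held_out_state, train_indices, test_indices) for each distinct state.

    `state_ids` holds one health-state label per observation (hypothetical
    encoding, e.g. "21111"); every observation of the held-out state goes
    to the test fold.
    """
    for state in sorted(set(state_ids)):
        test = [i for i, s in enumerate(state_ids) if s == state]
        train = [i for i, s in enumerate(state_ids) if s != state]
        yield state, train, test
```

Leave-one-out at the block level would follow the same pattern with block labels in place of state labels. Reporting the Lin coefficient alongside the Pearson coefficient is useful because predictions that are perfectly correlated with the observed means but systematically offset still score 1 on the latter.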
Results: Predictive accuracy was generally best using random intercepts. The 8-, 9-, and 11-parameter models outperformed the 20-parameter model in predicting the mean values of out-of-sample health states.
Conclusions: Simplified nonlinear regression models look promising and should be investigated further using other EQ-5D-5L data sets. To reduce the risk of overfitting, cross-validation is recommended to inform model selection in future EQ-5D valuation studies.