Handling co-dependence issues in resampling-based variable selection procedures: a simulation study

De Bin, Riccardo; Sauerbrei, Willi

Journal article; Accepted version; Peer reviewed

Year

2018

Original version

Journal of Statistical Computation and Simulation. 2018, 88 (1), 28-55, DOI: http://dx.doi.org/10.1080/00949655.2017.1378654

Abstract

If a number of candidate variables are available, variable selection is a key task that aims to identify those candidates which influence the outcome of interest. Methods such as backward elimination, forward selection, etc. are often implemented, despite their drawbacks. One of these drawbacks is the instability of their results with respect to small perturbations in the data. To handle this issue, resampling-based procedures have been introduced: using a resampling technique, e.g. the bootstrap, these procedures generate several pseudo-samples that are used to compute the inclusion frequency of each variable, i.e. the proportion of pseudo-samples in which the variable is selected. Based on the inclusion frequencies, it is possible to discriminate between relevant and irrelevant variables. These procedures may fail when variables are correlated. To deal with this issue, two procedures based on 2×2 tables of inclusion frequencies have been developed in the literature. In this paper we analyse the behaviour of these two procedures and the role of their tuning parameters in an extensive simulation study.
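
The notions of inclusion frequency and of a 2×2 table of joint inclusion can be illustrated with a minimal sketch. It is not the implementation of either of the two procedures studied in the paper, which the abstract does not specify. Everything in it is an assumption made for illustration: the simulated data, the simple t-statistic backward elimination used as the selection method, and the hypothetical names backward_elimination, inclusion_matrix and two_by_two.

```python
import numpy as np


def backward_elimination(X, y, t_crit=1.96):
    """Illustrative selector: drop the weakest variable until all |t| >= t_crit."""
    n, p = X.shape
    active = list(range(p))
    while active:
        Z = np.column_stack([np.ones(n), X[:, active]])  # intercept + active columns
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sigma2 = resid @ resid / (n - Z.shape[1])
        cov = sigma2 * np.linalg.inv(Z.T @ Z)
        t = np.abs(beta[1:] / np.sqrt(np.diag(cov)[1:]))  # skip the intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_crit:
            break
        active.pop(worst)
    selected = np.zeros(p, dtype=bool)
    selected[np.array(active, dtype=int)] = True
    return selected


def inclusion_matrix(X, y, n_boot=200, seed=0):
    """Row b records which variables were selected in bootstrap pseudo-sample b."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    out = np.zeros((n_boot, X.shape[1]), dtype=bool)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        out[b] = backward_elimination(X[idx], y[idx])
    return out


def two_by_two(incl, j, k):
    """Cross-tabulate inclusion of variables j and k over the pseudo-samples."""
    a, b = incl[:, j], incl[:, k]
    return np.array([[np.sum(a & b), np.sum(a & ~b)],
                     [np.sum(~a & b), np.sum(~a & ~b)]])


# Assumed toy data: x0 drives y, x1 is strongly correlated with x0, x2 is noise.
rng = np.random.default_rng(1)
n = 100
x0 = rng.normal(size=n)
x1 = x0 + 0.3 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = x0 + rng.normal(size=n)

incl = inclusion_matrix(X, y)
freq = incl.mean(axis=0)          # inclusion frequency of each variable
table = two_by_two(incl, 0, 1)    # joint inclusion of the correlated pair

print("inclusion frequencies:", freq)
print("2x2 table for x0 and x1 (rows: x0 in/out, columns: x1 in/out):")
print(table)
```

In a table of this kind, large off-diagonal counts indicate that the two variables tend to replace each other across pseudo-samples, which is one way correlated variables can end up with misleadingly low individual inclusion frequencies.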

This item's license is: Attribution-NonCommercial-NoDerivs 3.0 Unported