dc.date.accessioned: 2017-12-12T16:26:48Z
dc.date.available: 2019-07-20T22:46:18Z
dc.date.created: 2017-08-03T13:45:48Z
dc.date.issued: 2017
dc.identifier.citation: De Bin, Riccardo; Boulesteix, Anne-Laure; Sauerbrei, Willi. Detection of influential points as a byproduct of resampling-based variable selection procedures. Computational Statistics & Data Analysis. 2017, 116, 19-31
dc.identifier.uri: http://hdl.handle.net/10852/59343
dc.description.abstract: Influential points can cause severe problems when deriving a multivariable regression model. A novel approach to check for such points is proposed, based on the variable inclusion matrix, a simple way to summarize results from resampling-based variable selection procedures. The variable inclusion matrix reports whether a variable (column) is included in a regression model fitted on a pseudo-sample (row) generated from the original data (e.g., bootstrap sample or subsample). It is used to study the variable selection stability, to derive weights for model averaged predictors and in other investigations. Concentrating on variable selection, it also allows understanding whether the presence of a specific observation has an influence on the selection of a variable. From the variable inclusion matrix, indeed, the inclusion frequency (I-frequency) of each variable can be computed only in the pseudo-samples (i.e., rows) which contain the specific observation. When the procedure is repeated for each observation, it is possible to check for influential points through the distribution of the I-frequencies, visualized in a boxplot, or through a Grubbs’ test. Outlying values in the former case and significant results in the latter point to observations having an influence on the selection of a specific variable and therefore on the finally selected model. This novel approach is illustrated in two real data examples. [en_US]
dc.language: EN
dc.publisher: Elsevier
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Unported
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/3.0/
dc.title: Detection of influential points as a byproduct of resampling-based variable selection procedures [en_US]
dc.type: Journal article [en_US]
dc.creator.author: De Bin, Riccardo
dc.creator.author: Boulesteix, Anne-Laure
dc.creator.author: Sauerbrei, Willi
cristin.unitcode: 185,15,13,25
cristin.unitname: Statistikk og biostatistikk
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1
dc.identifier.cristin: 1484013
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Computational Statistics & Data Analysis&rft.volume=116&rft.spage=19&rft.date=2017
dc.identifier.jtitle: Computational Statistics & Data Analysis
dc.identifier.volume: 116
dc.identifier.startpage: 19
dc.identifier.endpage: 31
dc.identifier.doi: http://dx.doi.org/10.1016/j.csda.2017.07.001
dc.identifier.urn: URN:NBN:no-62018
dc.type.document: Tidsskriftartikkel [en_US]
dc.type.peerreviewed: Peer reviewed
dc.source.issn: 0167-9473
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/59343/2/CSDA-D-16-01272R2.pdf
dc.type.version: AcceptedVersion
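
The abstract describes computing, for each observation, the inclusion frequency of a variable using only the pseudo-samples that contain that observation, and then looking for outlying frequencies. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the separate `membership` matrix (which observation is in which pseudo-sample), and the toy matrices are all assumptions made for illustration; no data from the paper are used. Only the Grubbs statistic is computed here, without the critical value a full Grubbs' test would compare it against.

```python
from statistics import mean, stdev


def conditional_i_frequencies(inclusion, membership, variable):
    """Per-observation inclusion frequency of `variable`.

    inclusion[b][j]  is 1 if variable j was selected in pseudo-sample b.
    membership[b][i] is 1 if observation i is part of pseudo-sample b.
    Entry i of the result is the share of pseudo-samples containing
    observation i in which `variable` was selected (None if observation i
    never appears in any pseudo-sample).
    """
    n_samples = len(membership)
    n_obs = len(membership[0])
    freqs = []
    for i in range(n_obs):
        rows = [b for b in range(n_samples) if membership[b][i]]
        if not rows:
            freqs.append(None)
            continue
        freqs.append(sum(inclusion[b][variable] for b in rows) / len(rows))
    return freqs


def grubbs_statistic(values):
    """Return (G, index), where G = max |x - mean| / sample sd.

    Returns (0.0, None) when all values are equal, since no value can
    then be outlying.
    """
    m, s = mean(values), stdev(values)
    if s == 0:
        return 0.0, None
    idx = max(range(len(values)), key=lambda k: abs(values[k] - m))
    return abs(values[idx] - m) / s, idx


# Hypothetical toy setting: 4 observations, all 6 subsamples of size 2,
# and 2 candidate variables. Variable 0 is selected exactly in those
# subsamples that contain observation 3; variable 1 is always selected.
membership = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
inclusion = [
    [0, 1],
    [0, 1],
    [1, 1],
    [0, 1],
    [1, 1],
    [1, 1],
]

freqs = conditional_i_frequencies(inclusion, membership, variable=0)
g, suspect = grubbs_statistic([f for f in freqs if f is not None])
print("I-frequencies of variable 0 by observation:", freqs)
print("Grubbs statistic:", g, "most extreme observation:", suspect)
```

In this constructed example observation 3 has an I-frequency of 1.0 for variable 0 while the other observations have 1/3, so it is the one flagged as most extreme. With real data the inclusion matrix would come from rerunning a variable selection procedure on each bootstrap sample or subsample.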


This item's license is: Attribution-NonCommercial-NoDerivs 3.0 Unported