dc.date.accessioned: 2013-03-12T08:22:43Z
dc.date.available: 2013-03-12T08:22:43Z
dc.date.issued: 2009 [en_US]
dc.date.submitted: 2009-06-14 [en_US]
dc.identifier.citation: Bergersen, Linn Cecilie. Data Integration in Penalized Regression Models. Master's thesis, University of Oslo, 2009 [en_US]
dc.identifier.uri: http://hdl.handle.net/10852/10831
dc.description.abstract: New challenges within the statistical sciences have arisen with the explosive growth of information. Classical methods are not designed for these kinds of problems and may be impossible to use or may not behave as expected. In regression analysis, having a very large number of explanatory variables p when the sample size n is small violates the assumptions of the usual regression model, where p≤n. Many novel and effective strategies have been established to circumvent this problem, and shrinkage methods are one commonly used approach to regression with p>n, or even p>>n. As the volume of existing data expands, interest in data integration has also grown. Methods combining information from different data sources could be of great relevance and importance in different scientific fields. One area where high-dimensional data frequently occur is biology and medicine. Large high-dimensional data sets with thousands of covariates are a result of the great advances and new methods in biotechnology, which make it possible to conduct high-throughput experiments measuring gene expression and other biological features of interest. The underlying aim in analyzing these data is to search for novel biomarkers that can be used to predict the outcome of a disease for future patients. Incorporating more than one type of such biological high-dimensional data in a single model may therefore be appropriate. By effectively taking advantage of known underlying biological processes, the idea of using more of the information available is just as beneficial from a biological point of view as from a statistical perspective. The aim of this thesis is to propose a model for data integration of high-dimensional data in a regression setting where p>n. The suggested method is a shrinkage method with L1-penalties of the lasso type.
By introducing penalty terms which can be uniquely defined for each covariate, the model may apply different amounts of shrinkage to the regression coefficients based on external information from additional data sources. The model is presented in a biological context and applied to a high-dimensional data set, the Radium Hospital Cervix Cancer Cohort Data. The data set includes survival data and both gene expression measurements and aCGH data for patients diagnosed with cervical cancer at the Norwegian Radium Hospital in the period 2001-2004. The intent is to identify genes which are important for survival and to study the possibility of predicting the outcome for future patients. [eng]
dc.language.iso: eng [en_US]
dc.subject: lasso; microarray data; survival analysis [en_US]
dc.title: Data Integration in Penalized Regression Models : with application to genomics [en_US]
dc.type: Master thesis [en_US]
dc.date.updated: 2009-10-13 [en_US]
dc.creator.author: Bergersen, Linn Cecilie [en_US]
dc.subject.nsi: VDP::412 [en_US]
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft.au=Bergersen, Linn Cecilie&rft.title=Data Integration in Penalized Regression Models&rft.inst=University of Oslo&rft.date=2009&rft.degree=Masteroppgave [en_US]
dc.identifier.urn: URN:NBN:no-23215 [en_US]
dc.type.document: Master's thesis [en_US]
dc.identifier.duo: 92790 [en_US]
dc.contributor.supervisor: Ingrid K. Glad and Heidi Lyng [en_US]
dc.identifier.bibsys: 092705758 [en_US]
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/10831/1/Masteroppgave.pdf