Parameter estimation is an important area of research, and many estimators have been proposed over the years. One of the most popular is the maximum likelihood estimator, and in this thesis we discuss a penalized version of it. The penalized maximum likelihood estimator involves one or more tuning parameters that dictate the degree of regularization, and we discuss how these tuning parameters should be chosen or estimated from data.
The main part of the thesis concerns the estimation of regression coefficients in the multiple linear regression setting with normal error terms. In chapter 2 we briefly describe the ordinary least squares estimator and its properties, and we introduce the concept of variable selection.
In chapter 3 we work with ridge regression. This is a penalized version of the least squares estimator that gives biased parameter estimates with smaller variance than the least squares estimates. There always exists a value of the tuning parameter for which the mean squared error is smaller than that of the least squares estimate; the problem is to find such a value. We discuss three possible ways of estimating the tuning parameter: cross validation, the method of moments and the maximum marginal likelihood estimator. The last two methods are based on the fact that the ridge estimate is also the mode (and mean) of the posterior distribution of the coefficients if we use a normal prior with mean 0 and a variance that depends on the tuning parameter. Following the idea of empirical Bayes, we estimate the parameter of the prior from data, using both the method of (marginal) moments and the maximum (marginal) likelihood approach. We derive limiting results for the estimated tuning parameter in the three cases and provide a numerical simulation study.
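The correspondence between the ridge estimate and the posterior mode can be checked numerically. The following is a minimal sketch, not the implementation used in the thesis: it assumes the ridge criterion ||y - Xb||^2 + lam * ||b||^2, a known error variance sigma2, and a prior b ~ N(0, tau2 * I), in which case the two estimates coincide when lam = sigma2 / tau2. The function names, the simulated data and all parameter values are illustrative assumptions.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Minimizer of ||y - X b||^2 + lam * ||b||^2 (lam = 0 gives least squares)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def posterior_mean(X, y, sigma2, tau2):
    """Posterior mean (and mode) of b when y | b ~ N(X b, sigma2 I), b ~ N(0, tau2 I)."""
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / tau2
    return np.linalg.solve(precision, X.T @ y / sigma2)

# Simulated data; every value here is an assumption made for illustration.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
sigma2, tau2 = 1.0, 4.0
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

lam = sigma2 / tau2                      # tuning parameter implied by the prior
b_ols = ridge_estimate(X, y, 0.0)        # ordinary least squares
b_ridge = ridge_estimate(X, y, lam)      # ridge estimate
b_post = posterior_mean(X, y, sigma2, tau2)

print("ridge equals posterior mean:", np.allclose(b_ridge, b_post))
print("norm of OLS estimate:  ", np.linalg.norm(b_ols))
print("norm of ridge estimate:", np.linalg.norm(b_ridge))
```

Under these assumptions the ridge estimate has a smaller Euclidean norm than the least squares estimate for any lam > 0, which is the shrinkage described above. Whether it also has a smaller mean squared error depends on the chosen value of lam.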
In chapter 4 we introduce what we call the zero-mixed ridge. This is based on the Bayesian interpretation of the ridge estimate, using a two-parameter prior distribution with a point mass at 0. As opposed to ridge regression, which only shrinks the estimates towards zero, this estimator can set coefficients exactly to zero and thus also performs variable selection. We describe the connection to AIC and BIC for ordinary least squares and discuss the same three estimators as in the ridge chapter for estimating the two tuning parameters.
In chapter 5 we discuss the lasso. This is also a penalized version of least squares, but with a different penalty. The lasso has only one tuning parameter, but performs both shrinkage and variable selection. Again we consider the same three methods for estimating the tuning parameter, and we discuss the similarities and differences between the lasso and the zero-mixed ridge.
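The difference between the two penalties is easiest to see in the special case of an orthonormal design, X^T X = I. The sketch below assumes that case, with the lasso criterion written as (1/2)||y - Xb||^2 + lam * ||b||_1 and the ridge criterion as (1/2)||y - Xb||^2 + (lam/2) * ||b||^2. The lasso solution is then the soft-thresholded least squares estimate z = X^T y, while the ridge solution is z / (1 + lam). The values of z and lam are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso solution under an orthonormal design: shrink by lam, truncate at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical least squares coefficients z = X^T y under an orthonormal design.
z = np.array([3.0, -1.5, 0.4, -0.2])
lam = 0.5

lasso = soft_threshold(z, lam)   # small coefficients are set exactly to zero
ridge = z / (1.0 + lam)          # every coefficient is scaled, none becomes zero

print("lasso:", lasso)
print("ridge:", ridge)
```

In this example the two coefficients with absolute value below lam are set to zero by the lasso, whereas the ridge estimate keeps all four coefficients nonzero.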
In chapter 6 we discuss the estimation of covariance matrices and their inverses using penalized maximum likelihood estimation. This chapter is mainly a summary of articles read early in the process of finding a topic for my thesis, and no results are presented here.