Abstract
Background: Markov chain Monte Carlo (MCMC) methods are rarely used for deep learning because of the computationally heavy Metropolis-Hastings (MH) test, even though MCMC methods provide richer predictive information than variational inference. Various approximations of the MH test have yielded algorithms that are tractable for deep learning but remain inefficient.

Objective: This exploratory thesis examines what makes different approximate MCMC methods efficient.

Results: This work introduces a method called stochastic target Metropolis-Hastings (STMH). Because it uses gradients to approximate the target in the MH algorithm, rather than approximating the MH test, STMH is a faster method. Compared to stochastic gradient descent (SGD), STMH is not overconfident, is more robust against overfitting, and retains goodness-of-fit over time. Lastly, STMH is both more stable and more versatile than SGD.

Discussion: The algorithm shows promising results for more efficient Bayesian deep learning and opens new opportunities for applying MCMC methods to complex problems. Because the method exploits theoretical properties of optimal efficiency for the MH algorithm, it may be more efficient than the Stochastic Gradient Langevin Dynamics method, which omits the MH test entirely.

Conclusion: Initial studies of STMH show promising results. Further development of STMH should be considered for future work.
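The abstract contrasts approximating the MH test with approximating the target inside it. As background, the following is a minimal sketch of one standard Metropolis-Hastings step with a symmetric Gaussian proposal, not the STMH algorithm itself; the `log_target` callable marks the place where STMH would substitute a cheaper gradient-based approximation of the target (details are in the thesis body, not this sketch):

```python
import numpy as np

def mh_step(x, log_target, proposal_std, rng):
    """One Metropolis-Hastings step with a symmetric Gaussian proposal.

    `log_target` evaluates the log of the (unnormalized) target density.
    In STMH this exact evaluation would be replaced by a gradient-based
    approximation of the target (hypothetical here; see the thesis body).
    """
    x_prop = x + proposal_std * rng.normal(size=np.shape(x))
    # Symmetric proposal, so the MH acceptance ratio reduces to a
    # target-density ratio; this test is the computationally heavy part
    # when the target involves a full pass over a large dataset.
    log_alpha = log_target(x_prop) - log_target(x)
    if np.log(rng.uniform()) < log_alpha:
        return x_prop, True
    return x, False

# Usage: sample from a standard normal target.
rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * np.sum(x**2)
x, samples = np.zeros(1), []
for _ in range(5000):
    x, _ = mh_step(x, log_target, proposal_std=1.0, rng=rng)
    samples.append(x[0])
```

For a deep learning posterior, each `log_target` call would require evaluating the likelihood over the whole training set, which is what makes the exact test impractical at scale.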