Making sense of the human genome using machine learning

Haaland, Fredrik

Master's thesis

Open

haaland.pdf (8.022Mb)

Year

2013

Abstract

Machine learning enables a computer to learn a relationship between two presumably related types of information. The learned relationship can then be used to predict missing information of one type from the other. Over recent decades, it has become cheaper to collect biological information, which has resulted in increasingly large amounts of data.

Biological information such as DNA is currently analyzed by a variety of tools. Although machine learning has already been used in various projects, a flexible tool for analyzing generic biological challenges has not yet been made.

The challenges of representing biological data in a generic way that permits machine learning are discussed here. A flexible machine learning application is presented for working on currently available DNA data. It also targets biological challenges in an abstract manner, so that it may be useful for both current and future challenges.

The application has been implemented in The Genomic HyperBrowser and is publicly available. A use case inspired by a biological challenge demonstrates how the application is used. A machine-learned model is analyzed and used for making predictions. The results are discussed, and further steps to improve the model are proposed.

The application offers a new way for researchers to investigate and analyze biological data using machine learning.