Abstract
Machine Learning (ML) models have proven to perform well on a broad range of prediction tasks. However, ML models require discriminating data from which to learn patterns and make predictions. In this master thesis, we implement an ML model for ecotoxicological effect prediction. To connect the various chemicals and provide the model with a data landscape from which it can discover patterns, we utilize a Knowledge Graph (KG). However, KGs are extensive and inevitably contain non-discriminating and noisy data. Thus, we aim to improve performance by exploring methods for filtering and prioritizing the triples in the KG, and we have created several algorithms for this purpose. Based on the assumption that discriminating triples connect to either toxic or non-toxic chemicals, the algorithms crawl the graph from each chemical and score the visited triples based on that chemical's toxicity. Because the graph is connected (i.e., there is a path between every pair of vertices), an unbounded crawl from any chemical would eventually reach every triple; we therefore demonstrate various ways to crawl and score the triples in the graph. The results show that our approaches outperform the baseline method of using the entire KG with all triples treated equally. Further, we discuss the possibility of these approaches providing explainability for the predictions.
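The crawl-and-score idea can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes a breadth-first crawl that treats edges as undirected, a hypothetical hop limit `max_depth` that bounds the crawl in a connected graph, and a hypothetical score |2p - 1|, where p is the fraction of crawling chemicals that are toxic. A triple reached only by toxic (or only by non-toxic) chemicals scores 1, and a triple reached equally by both classes scores 0. The function name `score_triples` and the toy entities are illustrative only.

```python
from collections import defaultdict, deque


def score_triples(triples, labels, max_depth=2):
    """Hypothetical sketch: score triples by how consistently they are
    reached from toxic or from non-toxic chemicals.

    triples:   list of (subject, predicate, object) tuples.
    labels:    dict mapping a chemical entity to 1 (toxic) or 0 (non-toxic).
    max_depth: assumed hop limit; without it, a crawl in a connected
               graph would reach every triple.
    """
    # Undirected adjacency: entity -> list of (neighbour, triple).
    adjacency = defaultdict(list)
    for triple in triples:
        s, _, o = triple
        adjacency[s].append((o, triple))
        adjacency[o].append((s, triple))

    toxic_hits = defaultdict(int)  # toxic chemicals that reached each triple
    total_hits = defaultdict(int)  # all chemicals that reached each triple

    for chemical, label in labels.items():
        # Breadth-first crawl from this chemical; each triple counted once.
        seen_entities = {chemical}
        seen_triples = set()
        queue = deque([(chemical, 0)])
        while queue:
            entity, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the hop limit
            for neighbour, triple in adjacency[entity]:
                if triple not in seen_triples:
                    seen_triples.add(triple)
                    total_hits[triple] += 1
                    toxic_hits[triple] += label
                if neighbour not in seen_entities:
                    seen_entities.add(neighbour)
                    queue.append((neighbour, depth + 1))

    # Assumed score in [0, 1]: |2 * toxic_fraction - 1|.
    return {t: abs(2 * toxic_hits[t] / total_hits[t] - 1) for t in total_hits}


if __name__ == "__main__":
    triples = [
        ("chemA", "hasGroup", "g1"),
        ("chemB", "hasGroup", "g1"),
        ("chemB", "hasGroup", "g2"),
        ("chemC", "hasGroup", "g2"),
    ]
    labels = {"chemA": 1, "chemB": 1, "chemC": 0}
    for triple, score in score_triples(triples, labels, max_depth=2).items():
        print(triple, score)
```

In this toy graph, the triples through `g1` are reached only by the two toxic chemicals and score 1.0, while the triples through `g2` are reached by one toxic and one non-toxic chemical and score 0.0, so a filter keeping high-scoring triples would retain the `g1` part of the graph.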