Abstract
Post-translational modifications (PTMs) are chemical modifications to proteins that regulate many protein functions. Recently, PTMs found on human leukocyte antigen (HLA)-bound epitopes specific to cancer cells have become a potential target for new immunotherapies. However, detecting PTMs experimentally is both costly and time-consuming, creating a need for novel prediction methods. To this end, this project evaluated how well current state-of-the-art machine learning predictors identify phosphorylated HLA class I epitopes. Additionally, this project set out to develop a unique epitope-specific phosphorylation predictor. To achieve this, a comprehensive independent test set of 811 experimentally validated phosphorylated HLA class I epitopes was collated from the literature. A novel epitope-specific machine learning framework was developed based on three different approaches: gradient-boosted decision trees, support vector machines and artificial neural networks. The framework was designed in two variants: one trained on peptides extracted from validated phosphorylated proteins, and another trained on phosphorylated peptides predicted to bind to HLA class I. Testing showed that the former approach was superior to the latter, and the resulting epitope-specific predictor (AUC 0.883) outperformed the existing phosphorylation prediction tools (AUC 0.801). Applying the framework to disease-related epitopes indicated a higher rate of predicted phosphorylation in mutated tumor epitopes (neoepitopes) than in other pathogenic epitopes. In future studies, the novel epitope-specific framework described here may be used to discover potential tumor-specific phosphorylation sites on HLA class I epitopes.
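
The AUC figures quoted above summarize how well a predictor ranks phosphorylated epitopes above non-phosphorylated ones. As a minimal illustration of that metric only, and not the evaluation code used in this project, the sketch below computes AUC from its rank-based definition. The function name `roc_auc`, the labels and both score lists are hypothetical toy values chosen for the example.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = phosphorylated epitope, 0 = not phosphorylated.
labels = [1, 1, 1, 0, 0, 0]
# Made-up scores from two imaginary predictors for the same six peptides.
epitope_specific_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
generic_scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.6]

auc_specific = roc_auc(labels, epitope_specific_scores)
auc_generic = roc_auc(labels, generic_scores)
print(f"epitope-specific AUC: {auc_specific:.3f}")
print(f"generic AUC:          {auc_generic:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a comparison such as 0.883 against 0.801 is read on that scale. The pairwise loop is quadratic in the number of peptides; it is written for clarity and a sort-based version would be preferable for large test sets.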