Abstract
This Master's thesis presents an in-depth study of the application of unsupervised machine learning methods, specifically clustering algorithms, to adaptive immune receptor (AIR) analysis. The complexity of the human adaptive immune system, underscored by its trillions of unique receptor sequences, has necessitated the development of robust analytical tools such as the immuneML platform. Despite its extensive capabilities, however, immuneML lacked functionality for cluster analysis, a valuable tool for discerning patterns in large, intricate datasets. The primary objective of this research was to address this limitation by integrating clustering into immuneML, with the aim of enhancing both the accuracy and the comprehensibility of AIR data analysis. The thesis explores various encoding methods, such as k-mer, Word2Vec, and TCRdist, and their impact on clustering outcomes. It also describes the implementation of a clustering pipeline within immuneML that supports multiple clustering algorithms and adheres to software best practices and coding standards. The results offer insights into the performance of different clustering algorithms and encoding methods on AIR data. Despite several limitations, the integration of cluster analysis into immuneML has the potential to advance our understanding of the adaptive immune system and to contribute to the development of effective treatments for various diseases. The research provides a foundation for future work and may promote the discovery of biomarkers and improve the accuracy of immune response predictions.