Multiple samples from cancer patients make it possible to analyze tumor heterogeneity and to establish the genetic relationship between samples. In addition, samples taken at different time points during the course of a cancer can provide useful insights into its progression and evolution.
The objective of this work was to implement optimized mutation calling strategies and subclone analysis based on high-throughput sequencing data, in order to measure the genetic relationship between multiple samples from a single cancer patient.
Methods and implementation
A set of custom-made Python scripts was developed to facilitate complementary use of different software tools and to apply cross-sample strategies that improve sensitivity and specificity. Results from the different applications are integrated to perform subclone analysis.
The sDiscovery toolbox is a set of utilities designed to improve somatic mutation calling and to analyze similarity between samples using variant allele frequencies, somatic mutations, tumor fraction and copy number variations.
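To illustrate what a mutation-based similarity measure between samples can look like, the following minimal sketch computes a pairwise Jaccard index over the somatic mutations detected in each sample. It is not taken from the sDiscovery toolbox: the function names, the `min_vaf` threshold, the sample names and the mutation identifiers are all hypothetical, and the Jaccard index is used here only as one plausible choice of metric.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two mutation sets: shared mutations / all mutations."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_similarity(samples, min_vaf=0.05):
    """Return {(sample1, sample2): Jaccard index} for every pair of samples.

    `samples` maps a sample name to {mutation id: variant allele frequency}.
    Mutations below `min_vaf` (a hypothetical noise threshold) are ignored.
    """
    result = {}
    for s1, s2 in combinations(sorted(samples), 2):
        m1 = {m for m, vaf in samples[s1].items() if vaf >= min_vaf}
        m2 = {m for m, vaf in samples[s2].items() if vaf >= min_vaf}
        result[(s1, s2)] = jaccard(m1, m2)
    return result


# Hypothetical toy data, not results from the thesis.
samples = {
    "tumor_A": {"chr1:1000:A>T": 0.40, "chr2:500:G>C": 0.25, "chr3:42:C>T": 0.10},
    "tumor_B": {"chr1:1000:A>T": 0.35, "chr2:500:G>C": 0.30},
    "tumor_C": {"chr1:1000:A>T": 0.45, "chr7:77:T>G": 0.20},
}

similarity = pairwise_similarity(samples)
for pair, score in similarity.items():
    print(pair, round(score, 3))
```

In this toy example, tumor_A and tumor_B share two of three distinct mutations and therefore score highest, which is the kind of signal that could suggest a closer genetic relationship between two samples. A measure that also weights variant allele frequency, tumor fraction or copy number would need additional inputs beyond this sketch.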
This thesis demonstrates that strategic choices exploiting the benefits of multiple samples reduced some of the technical and biological noise in the data and resulted in a more accurate dataset. The methods for measuring similarity between samples provide valuable input for future research into prostate cancer heterogeneity, where the strategy used here will be applied to multiple patients, each with a multi-sample dataset.