Somatic mutations reveal complex metastatic seeding from multifocal primary prostate cancer

Abstract Primary prostate cancer shows a striking intraorgan molecular heterogeneity, with multiple spatially separated malignant foci in the majority of patients. Metastatic prostate cancer, however, typically reveals more homogenous molecular profiles, suggesting a monoclonal origin of the metastatic lesions. Longitudinal mutational spectra, comparing multiple primary lesions with metastases from the same patients remain poorly defined. We have here analyzed somatic mutations in multisampled, spatio‐temporal biobanked lesions (38 samples from primary foci and 1 sample from each of 8 metastases from seven prostate cancer patients) applying a custom‐designed panel targeting 68 prostate cancer relevant genes. The metastatic samples were taken at time of primary surgery and up to 7 years later, and sampling included circulating tumor DNA in plasma or solid metastatic tissue samples. A total of 282 somatic mutations were detected, with a range of 0 to 25 mutations per sample. Although seven samples had solely private mutations, the remaining 39 samples had both private and shared mutations. Seventy‐four percent of mutations in metastases were not found in any primary samples, and vice versa, 96% of mutations in primary cancers were not found in any metastatic samples. However, for three patients, shared mutations were found suggesting the focus of origin, including mutations in AKT1, FOXA1, HOXB13, RB1 and TP53. In conclusion, the spatio‐temporal heterogeneous nature of multifocal disease is emphasized in our study, and underlines the importance of testing a recent sample in genomics‐based precision medicine for metastatic prostate cancer.


Design of a prostate cancer-focused targeted sequencing panel
In total, the design consists of 2470 probes covering 243,706 base pairs, targeting 68 genes. The background information includes gene sets from significantly mutated genes in publications on both primary, and some including metastatic, prostate cancer. All these publications included analysis of at least 50 primary cancers, all generated using genome or exome-scale DNA sequencing [1][2][3][4][5][6][7][8] . In addition, a list of significantly mutated genes from an in-house analysis of multifocal primary prostate cancer was included 9 . Identities of the included genes, lengths of their exonic sequences, and which selection criteria they meet are listed in Supplementary Table S4. In short, twenty-six of the included genes were listed as significantly mutated in three or more of the chosen publications; ten were listed as significantly mutated in only two of the publications, but had a frequency of more than one mutation per million basepairs (Mbp); sixteen were listed as significantly mutated in only one of the chosen publications, but with a mutation frequency above three per Mbp; nine are found to be frequently mutated in the germline of hereditary prostate cancer [10][11][12] ; and eight are DNA repair genes [13][14][15][16] which were added to detect mutations in this particularly clinically relevant system. Probes covering nine relevant genes were included to detect common DNA copy number aberrations commonly amplified or deleted in prostate cancer 9 . As the fusion gene TMPRSS2-ERG is present in around 50 % of all prostate cancers and commonly caused by a deletion of a 2.8 Mb region on chromosome 21 17,18 , probes were designed to enable detection of this deletion/fusion. To detect DNA copy number changes in the enhancer region of AR; a recurrent event in metastatic prostate cancer, four probes were designed to capture this particular region 19 . To enable detection of more wide-spread DNA copy number changes, one probe was designed to target sequences in close proximity to each chromosome arm. An overview of all included regions can be found in Supplementary Table S4.

Ultra-low pass whole-genome sequencing of DNA for determination of tumor fraction of cfDNA
Raw sequencing reads were aligned to the hg19 reference genome using BWA (version 0.7.17).
Alignment maps were sorted and converted to binary alignment files using Samtools.
The readCounter application from the Titan toolbox was applied to generate wig files based on the alignment bam files. From the wig files, the ichorCNA software was applied to determine the fraction of ctDNA in cfDNA 20 (DNA copy number plots and estimated tumor fractions can be found in Figure   1, Supplementary Figure S2 and Supplementary Table S1). Blood samples from patients with both available cfDNA and tissue samples from multiple foci were included in further analysis. The number of uniquely mapped reads for each sample was calculated using the command: where the hg19 bed file was created from the same hg19 genome reference file as was used for raw read alignment.
Additional metrics presented in Suppl. Table S1 were calculated using the CollectWgsMetrics from Picard (version 2.6.0).
The ichorCNA software was used to determine the fraction of ctDNA in cfDNA 20 (DNA copy number plots and estimated tumor fractions can be found in Figure Table S1). Blood samples from patients with both available cfDNA and tissue samples from multiple foci were included in further analysis.

Targeted sequencing of DNA and mutation calling
After sequencing, fastq files were created from raw basecalls using tools from the Picard (version 2.19.0) toolbox. Barcodes were identified with the ExtractIlluminaBarcodes tool, and unaligned bam files with unique molecular indexing (UMI) information were created using the IlluminaBasecallsToSam algorithm. Fastq files were obtained by applying the SamToFastq tool, and these raw reads were aligned to the hg38 genome reference using BWA (version 0.7.17). Sam files were coordinate-sorted and subsequently converted to binary format by the SortSam tool. Unaligned bam files with UMI information and aligned bam files without UMI information were merged using the MergeBAMAlignment tool. All bam files were subject to GATK (version 4.1.2.0) preprocessing.
Variant calling after sequence alignment and preprocessing, as described in the previous section, was performed using both the MuTect (SNVs), Strelka (indels), and MuTect2 (SNVs + indels) algorithms. Indels needed to be nominated by both indel callers to be included. Variants covered by one or more alternative reads in the normal sample and less than two alternative reads in the tumor were rejected. In addition, a cut-off of at least 1 % variant allele frequency (VAF) in the tumor sample was used. All mutations needed to be scored as 'PASS' in at least one sample to be included. Candidate variants in repetitive and non-complex genomic regions were discarded after visual inspection.

Patient identity matching of included DNA samples
To verify matching patient identities between all primary and metastatic samples, we ran the

Supplementary tables
The following supplementary tables are available as separate files.