Autoimmune risk variants in ERAP2 are associated with gene-expression levels in thymus

Genetic polymorphisms in the endoplasmic reticulum aminopeptidase (ERAP)1 and ERAP2 genes have been associated with several autoimmune diseases (AIDs) at a genome-wide significance level. In this study, we performed a cis expression quantitative trait locus (eQTL) screen to investigate whether seven fine-mapped AID single-nucleotide polymorphisms (SNPs) in the ERAP-region influence the gene-expression levels of ERAP1 and ERAP2 in thymus. After quality control, we identified six significant eQTLs. We further assessed the peak eQTL signals, and both genes showed highly significant and independent thymic eQTL signals (P=2.16 × 10−15 and P=8.22 × 10−23, respectively). Interestingly, the peak eQTL signal overlapped with the AID risk loci in ERAP2 (r2>0.94), but were distinct in ERAP1 (r2<0.4). Finally, among the SNPs showing the most significant eQTL associations with ERAP2 (P<3.4 × 10−20), six were located within transcription factor motifs in an enhancer region in thymus. Our study therefore reveals the fine-mapped AID risk variants that act as eQTLs with ERAP2 in thymus, and highlights the potential causal regulatory variants.


INTRODUCTION
The endoplasmic reticulum aminopeptidases (ERAP)1 and ERAP2 have a critical role in the production of final peptides that are presented by human leucocyte antigen class I molecules to the CD8 + T cells. It has become clear, through genome-wide association studies, that the ERAP1 and ERAP2 genes are implicated in several autoimmune diseases (AIDs). [1][2][3][4][5][6][7] Several of the AID risk variants in ERAP1 and ERAP2 (rs30187, rs27524, rs27434, rs1363907, rs2549794) have been reported as expression quantitative trait loci (eQTL) in multiple tissues. [8][9][10] However, ERAP1 and ERAP2 eQTLs have not yet been explored in thymus. The thymus has an essential role in the establishment of immune tolerance, as this is the organ responsible for releasing self-tolerant T-lymphocytes out in the periphery. Failure in the immune self-tolerance system can lead to the liberation of autoreactive T cells and subsequent development of AID.
AID risk variants identified through genome-wide association studies have subsequently been fine-mapped in several Immunochip (Ichip) association studies. The Ichip is a SNP genotyping array with dense marker coverage across 186 genetic regions, harboring risk variants for 11 well-defined autoimmune and inflammatory diseases. 11 These include inflammatory bowel disease, 4 multiple sclerosis, 12 psoriasis, 13 rheumatoid arthritis, 14 ankylosing spondylitis, 15 juvenile idiopathic arthritis, 16 autoimmune thyroid disease 17 and type 1 diabetes. 18 Therefore, in this study, we have performed an eQTL screen with fine-mapped AID SNPs in the ERAP1 and ERAP2 region to investigate if they influence gene-expression levels in thymus.

eQTL analysis
Gene-expression levels were measured in whole thymic tissue samples from 42 young individuals. In order to investigate whether AID-associated SNPs in ERAP1 and ERAP2 influence gene expression in thymic tissue, we performed a cis eQTL screen by correlating genotype data of seven fine-mapped AID-associated SNPs (Supplementary Table S1) with expression levels from all probes within a window of ± 1 Mb (Supplementary Table S2). We detected eight statistically significant SNP-probe pairs, but after quality control of the eQTL probes (eProbes; see the 'Materials and methods' section), six significant SNP-probe pairs in the ERAP1 and ERAP2 region were evident ( Table 1). The six eQTLs involved two eProbes, ILMN_1752145 and ILMN_1743145 binding to ERAP1 and ERAP2, respectively.
Next, we aimed to localize the SNP with the peak eQTL signal within the ERAP1 and ERAP2 gene regions. Therefore, the expression levels of ILMN_1752145 and ILMN_1743145 were tested against all Ichip SNPs densely covering the ERAPregion (Supplementary Table S3 and Figure 1). Novel expressionaltering SNPs (eSNPs) emerged as the most statistically significant thymic eQTLs in ERAP1 and ERAP2 (rs7063, P = 2.16 × 10 − 15 and rs27302, P = 8.22 × 10 − 23 ). We noticed that a large number of SNPs associated with the expression of ERAP2 were situated in the neighboring gene, LNPEP. We therefore tested the same SNPs against two probes in our gene-expression data set binding to LNPEP (ILMN_1798433 and ILMN_1814737), but none of the SNPs showed significant eQTL associations (Supplementary Figure S1, data not shown).
Linkage disequilibrium and conditional analysis To evaluate the correlation of the six AID eSNPs and the most statistically significant thymic eSNPs, we examined their linkage disequilibrium (LD) pattern ( Figure 2). In ERAP1, the thymic eSNP rs7063 was in low to moderate LD with the AID SNPs; rs27432 (r 2 = 0.22) and rs10045403 (r 2 = 0.37). Because of the limited correlation between these SNPs, we questioned whether the two AID SNPs, rs27432 and rs10045403, could be independently associated with the ERAP1 expression. Hence, we performed a logistic regression analysis conditioned on the thymic eSNP rs7063 (Supplementary Table S4). However, neither rs27432 nor rs10045403 obtained significant eQTL P-values. This indicates that  the thymic eQTL signals involving the AID risk variants rs27432 and rs10045403 are likely secondary owing to their LD with rs7063, and that there is no clear link between the thymic eSNP and the AID risk variants. In contrast, the thymic peak eSNP in ERAP2, rs27302, was in strong LD with the AID SNPs rs2910686 (r 2 = 0.94), rs1363907 (r 2 = 0.94), rs27290 (r 2 = 0.94) and rs27293 (r 2 = 0.95). These findings suggest that the thymic eQTL in ERAP2 overlap with the AID risk loci.

Haplotype analysis in ERAP2
Because the correlation between the AIDs SNPs and the most statistically significant eSNP in ERAP2 was strong, we explored the relationship between the SNP alleles by haplotype analysis. In the thymic tissue data set, we observed one haplotype which carried all the risk alleles (Supplementary Table S5). This haplotype carried the G-allele at the thymic eSNP rs27302, and homozygosity for this allele was associated with the highest expression levels of ERAP2 (Supplementary Figure S2). Finally, we examined these ERAP2 SNPs using the CEU genotype data set from 1000 Genomes phase 3, and confirmed the strong LD and that the risk alleles were carried on the same haplotype (Supplementary Figure S3).

Functional annotation analysis in ERAP2
Furthermore, we wanted to evaluate the putative functional roles of the ERAP2 SNPs in thymus. In order to do this, we examined the overlap of epigenomic data from adult and fetal thymus in the Roadmap Epigenome Project with the strongest eSNP rs27302, the AID SNPs and all proxy SNPs (N = 56) in strong LD (r 2 40.9) with rs27302 showing the strongest association with the gene expression of ERAP2 (P o3.4 × 10 − 20 ). Neither the most significant eSNP rs27302, nor any of the of the AID SNPs, were positioned within open chromatin in thymus. However, six proxy SNPs (rs2617447, rs2617435, rs1046395, rs2762, rs1423566 and rs2910788) within the same LD block (r 2 = 1) were situated within an open chromatin region and had epigenetic histone marks suggestive of promoter (H3K4m3) and active enhancer (H3K27ac) function (chr5: 96 268 000-96 274 000) (Supplementary Figure S4A and B). We further explored these potential regulatory SNPs in the HaploReg V4 database to see whether they interfered with transcription factor (TF) motifs ( Table 2). The six SNPs were altering the binding sites of several TFs, among others Ik-2, p300 and AIRE, which are TFs that have been implicated in T-cell development [19][20][21] and thymic epithelial cells. 22 These results suggest that the variation in gene expression observed in ERAP2 might be caused by altered binding of regulatory factors in thymus.
Comparison of thymic eQTLs in other tissues and cells To investigate whether our thymic eQTLs also act as eQTLs in other tissues, we searched several publically available eQTL databases (Supplementary Table S6) using the most significant thymic eSNPs in ERAP1 and ERAP2 and the six putative functional eSNPs. We found that all eSNPs (including rs7063, rs27302, rs2617447, rs2617435, rs1046395, rs2762, rs1423566 and rs2910788) displayed eQTL signals for the same gene in the same allelic direction in other tissues (Supplementary Table S7). Hence, these eQTLs are not thymus-specific. In lymphoblastoid cell lines, a variant previously suggested to lead to variation in ERAP2 expression is the SNP rs2248374. 23 We therefore questioned whether rs2248374 was among the eSNPs showing the most significant association with ERAP2 in our data set. However, we detected rs2248374 in another LD block ( Figure 1) with a less significant eQTL P-value (2.74 × 10 − 9 ). Moderate-to-strong LD was observed between rs2248374 and rs27302 (r 2 40.66 in our genotype data set and r 2 40.82 in the data set from 1000 Genomes phase 3). When conditioning on the eSNP rs27302, rs2248374 obtained a non-significant P-value (P = 0.22), and when conditioning on rs2248374, the most significant thymic eSNP rs27302 remained statistically significant (P = 1.76 × 10 − 4 ).

DISCUSSION
In this study, we detected association between six AID risk SNPs and expression levels of ERAP1 and ERAP2 in human thymic tissue. Owing to our limited sample size, we did not have power to genetically fine map the position of the eSNP causing the eQTL signal. However, we could decisively address the correlation between the eQTL peak signals and the AID risk SNPs. In ERAP1, the most significant eSNP was in low LD with the AID risk variants, and the eQTL signals from the AID SNPs were markedly inferior. Hence, the thymic eQTL involving ERAP1 is distinct from the ERAP1 associations observed in several AIDs. These findings show the importance of thoroughly mapping eQTL signals in tissues.
In ERAP2, however, the eSNP rs27302 was strongly correlating with the AID risk variants (r 2 40.94). Interestingly, we noticed that rs27302 obtained a P-value superior to all the other highly significant eSNPs (P o 3.45 × 10 − 20 ) in the same LD block. However, this difference was owing to only one individual in our sample. These results therefore suggest that all the eSNPs in this LD block (involving the AID SNPs) are equally likely to be causal eSNPs. Among the eSNPs showing the most significant eQTL associations with ERAP2 (P o 3.4 × 10 − 20 ), six were located within an open chromatin region in thymus showing a promoter and enhancer profile of histone marks (H3K4me3 and H3K27ac). This region is located between ERAP2 and LNPEP. As enhancer activity is independent of orientation relative to its target gene, 24 it is conceivable that this region controls the expression of ERAP2. Interestingly, the eSNPs were not significantly associated with LNPEP expression. Furthermore, the six SNPs were located in TFbinding sites, pointing toward an influence on TF binding. One of these SNPs, rs2762, has previously been associated with the expression profile of ERAP2 in lymphoblastoid cell lines. 25 Another SNP, rs2248374, has also been suggested to influence the gene expression of ERAP2. 23 However, in our study, LD and conditional eQTL analysis suggest that this SNP is not the main contributor to the eQTL signal. This is in line with other eQTL studies in lymphoblastoid cell lines. 25,26 However, further functional followup studies are needed to examine these regulatory candidates.
Finally, we discovered that the haplotype associated with the highest gene expression of ERAP2 also comprised all the AID risk alleles. This further raises the question how a greater ERAP2 expression can be associated with AID risk. Interestingly, Andrés et al. 23 demonstrated that individuals with higher expression of the ERAP2 protein also had a higher level of MHC (major histocompatibility complex)-class I molecules on the cell surface of primary B lymphocytes. As suggested by the authors, the presence of more MHC-class I molecules on the cell surface can both be beneficial, in case of tumor or virus antigen presentation, but also detrimental, in the case of autoimmunity. Others have also supported the hypothesis that increased expression of MHC-class I molecules may be relevant to the pathogenesis of AIDs. 27 Our findings suggest that the haplotype associated with the highest expression of ERAP2 in thymus might be involved in etiological pathways for ankylosing spondylitis, psoriasis, inflammatory bowel disease and juvenile idiopathic arthritis.
To conclude, this study has revealed AID SNPs that show strong eQTL associations with ERAP2 in thymus and highlights potential causal regulatory variants.

Sample material, SNP genotyping and gene-expression analysis
Thymic tissue samples used in this study was collected from 42 Norwegian children undergoing heart surgery. All children were o13 years (26 samples from children o1 year old). The gender distribution was 22 girls and 20 boys. This project was approved by the regional ethical committee and written informed consent was given by all parents. All tissue samples were made anonymous. SNP genotyping and gene-expression measurement of the 42 thymic samples was performed by using the Illumina Immunochip and the Illumina HumanWG-6 v3 microarray (Illumina, San Diego, CA, USA) and have been described elsewhere. 28 The microarray data have been published in the NCBI Gene Expression Omnibus (http:// www.ncbi.nlm.nih.gov/geo/) under the accession number GSE77052.

The AID eQTL screen
The eQTL analysis was performed by combining the Ichip genotype data and the Illumina HumanWG-6 v3 expression data. Linear regression analysis was applied and P-values were obtained by Wald's test implemented in PLINK. 29 In the eQTL screen, the seven AID SNPs were tested against all surrounding probes within a window of ± 1 Mb (that is, a total window size of 2 Mb). Because 490% of cis regulatory SNPs are positioned within a window of ± 100 kb of the transcription start site, [30][31][32] the a priori likelihood of a SNP being an eSNP is therefore significantly higher within a narrow window of 200 kb compared with a broad 2 Mb window. Therefore, we applied two different thresholds in this study when adjusting for multiple testing using conservative Bonferroni's correction at the 0.05 level. For all the eQTLs where the eSNP and the eProbe were located o100 kb from each other, we corrected for the respective number of tests (n = 46, which corresponds to a P-value threshold at 1.1 × 10 − 3 ). For eProbes situated 4100 kb away from the eSNP, a stricter significance threshold was applied (n = 169, P-value threshold = 2.9 × 10 − 4 ).

Quality control of eQTLs
The eProbe (ILMN_1752145, ILMN_1743145 and ILMN_2336220) sequences were aligned against the University of California Santa Cruz (UCSC) Genome Browser human reference sequences (GRCh37/Hg19). The ILMN_2336220 and ILMN_1743145 probes were uniquely binding to exon 15 in ERAP1 and exon 17 in ERAP2, respectively. The 50 bp long ILMN_1752145 probe was binding to 24 bp of exon 21 in CAST, but completely to exon 20 in of ERAP1. Next, we examined if the sequence of the eProbes were overlapping with any common SNPs (minor allele frequency41%). One SNP (rs27044) was found within the eProbe sequence of ILMN_2336220. rs27044 was in strong LD with the AID eSNPs rs30187 and rs27432 (r 2 = 0.78 and r 2 = 1 in the 1000 Genomes Phase 3 CEU data set, respectively). To assess whether these eQTLs were true, we measured the ERAP1 gene expression with the Taqman Gene Expression assay Hs_00429970 (Thermo Fisher Scientific), which has been described elsewhere. 28 These Taqman gene-expression data were tested against rs30187 and rs27432. The associations were found to be non-significant for both rs30187 (P40.24) and rs27432 (P40.11). The two eQTLs (rs30187 -ILMN_2336220 and rs27432 -ILMN_2336220) were therefore removed. Overall, after quality control, six SNP-probe pairs in ERAP1 and ERAP2 remained.

The peak eQTL analysis
The peak eQTL analysis was performed by including genotype data from additional SNPs available from the Illumina Ichip genotyping. We selected all available SNPs (GSS492% and HWE P-value40.05) densely covering the genes CAST, ERAP1, ERAP2, LNPEP and parts of LIX1 (chr5: 96 000 000-96 500 000 in hg18/chr5: 95 974 244-96 474 244 in hg19) and tested the genotype data against the gene-expression data from ILMN_1752145 and ILMN_1743145. To investigate whether the eSNPs showing strong eQTL associations with ERAP2 could influence expression levels of LNPEP, we also performed an eQTL analysis against two probes (ILMN_1798433 and ILMN_1814737) binding to LNPEP in our gene-expression data set. Conservative Bonferroni's correction at the 0.05 level was used when adjusting for multiple testing.
LD, haplotype and conditional analysis LD was calculated in Haploview (https://www.broadinstitute.org/ haploview/haploview) 33 using genotypes from the 42 thymic tissue samples. Because the sample size in this study is limited, we validated the LD values by investigating the correlation between the same SNPs using genotype data from the 1000 Genomes phase 3 CEU data set. The logistic regression analysis was performed using UNPHASED (v 3.1.7). (https://sites.google.com/site/fdudbridge/software/unphased-3-1) 34 Roadmap epigenomics project and HaploReg v4 analysis Functional annotations for the SNPs located within the open chromatin region in ERAP2-LNPEPz according to functional thymic data from Roadmap Epigenome project (http://www.roadmapepigenomics.org/) were analyzed with HaploReg v4 (http://archive.broadinstitute.org/mam mals/haploreg/haploreg.php). 35 Comparison of thymic eQTLs in other tissues and cell-types The seven databases summarized in Supplementary Table S7 were used to search for the strongest eSNPs in ERAP1 (rs7063) and ERAP2 (rs27302), and the six putative functional eSNPs (rs2617447, rs2617435, rs1046395, rs2762, rs1423566 and rs2910788) in eQTL data sets from other tissues.