Huang et al 2017 Nature
Overview
Inflammatory bowel disease (IBD)
- Ulcerative colitis (UC): colon inflammation & ulcers
- Crohn’s disease (CD): colon + ileum inflammation
Contributions of this study
The pace of identifying associated loci outstrips that of defining specific molecular mechanisms and extracting biological insight from associations
- 200 IBD loci identified, few resolved to functional variants
- This study: 94 loci evaluated, 45 variants resolved
- Approach: genetic «fine-mapping» + detailed annotations
- Dense genotyping array of Europeans
- 3 fine-mapping methods
- Enrichment in regulatory regions
- 3 eQTL colocalization methods
Fine-mapping: methods and results
- Model selection with BIC
- Bayesian single effect model & conditional regression
- Bayesian LASSO
Method 1
Step 1: model selection of independent signals
- BIC-based selection
- Penalized likelihood across traits and variants
- Greedy search algorithm (steepest descent)
- A «swap-out repositioning» adjustment
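The greedy BIC search above can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes a single quantitative trait and ordinary least squares (the slides describe a penalized likelihood across traits), and it omits the swap-out repositioning step. The function name `greedy_bic_select` and its parameters are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def greedy_bic_select(genotypes, phenotype, max_signals=5):
    """Forward selection: at each step add the variant that lowers BIC most.

    genotypes: dict mapping variant name -> list of dosages (hypothetical input format)
    phenotype: list of trait values (quantitative trait is an assumption)
    """
    n = len(phenotype)
    resid = center(phenotype)
    # Candidate columns, kept orthogonal to every variant already selected.
    cols = {v: center(g) for v, g in genotypes.items()}
    rss = max(dot(resid, resid), 1e-12)
    k = 1  # intercept only
    best_bic = n * math.log(rss / n) + k * math.log(n)
    selected = []
    while len(selected) < max_signals:
        candidates = []
        for v, x in cols.items():
            xx = dot(x, x)
            if xx < 1e-8:  # variant fully explained by selected ones
                continue
            # Reduction in residual sum of squares if v is added.
            candidates.append((dot(resid, x) ** 2 / xx, v))
        if not candidates:
            break
        gain, v = max(candidates)
        new_rss = max(rss - gain, 1e-12)
        new_bic = n * math.log(new_rss / n) + (k + 1) * math.log(n)
        if new_bic >= best_bic:  # no variant improves BIC: stop
            break
        x = cols.pop(v)
        xx = dot(x, x)
        b = dot(resid, x) / xx
        resid = [r - b * xi for r, xi in zip(resid, x)]
        # Orthogonalise the remaining candidates against the new variant.
        cols = {
            u: [ci - (dot(c, x) / xx) * xi for ci, xi in zip(c, x)]
            for u, c in cols.items()
        }
        rss, k, best_bic = new_rss, k + 1, new_bic
        selected.append(v)
    return selected
```

The BIC penalty of $\log n$ per added variant is what stops the search; a variant in tight LD with one already selected contributes almost nothing after orthogonalisation and is not picked again.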
Method 1
Step 2: construction of credible sets (CS)
- 95% CS that contains exactly one signal
- Conditional CS in the presence of multiple signals
- Potentially multiple CS generated per region
Not accounting for uncertainty of alternative signals
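A 95% CS for one signal is usually built by ranking variants by posterior probability and adding them until the cumulative probability reaches the coverage level. A minimal sketch, assuming per-variant posteriors for a single signal are already available (the function name and input format are hypothetical):

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of top-ranked variants whose posterior mass reaches `coverage`.

    posteriors: dict mapping variant name -> posterior probability of being
    the causal variant for this signal (renormalised here in case they do
    not sum exactly to 1).
    """
    total = sum(posteriors.values())
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total
        if cumulative >= coverage:
            break
    return chosen
```

With multiple signals in a region, the same routine would be applied to posteriors computed conditional on the other signals, giving one CS per signal.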
Method 2 (Based on Jostins & McVean 2016)
Step 1: multivariate regression with correlated normal prior
- Assuming all loci are shared between traits
- Model and inference: a special case of eQTLBMA
- No model averaging
- BF and CS assuming single effect
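To illustrate how Bayes factors turn into posteriors under a single-effect assumption, here is a simplified univariate stand-in. It uses Wakefield's approximate Bayes factor from per-variant effect estimates, not the multivariate correlated-prior model described above; the prior standard deviation of 0.2 and all names are assumptions made for the example.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor (Wakefield) in favour of association.

    beta, se: effect estimate and its standard error for one variant.
    prior_sd: standard deviation of the normal prior on the effect (assumed).
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta / se
    shrink = w / (v + w)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink


def single_effect_posteriors(stats, prior_sd=0.2):
    """Posterior that each variant is the one causal variant in the region.

    stats: dict mapping variant name -> (beta, se). Assumes exactly one causal
    variant and equal prior probability for every variant.
    """
    logs = {name: log_abf(b, s, prior_sd) for name, (b, s) in stats.items()}
    top = max(logs.values())  # subtract the max for numerical stability
    weights = {name: math.exp(l - top) for name, l in logs.items()}
    total = sum(weights.values())
    return {name: wgt / total for name, wgt in weights.items()}
```

The resulting posteriors can be ranked and accumulated to a 95% CS in the usual way.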
Method 2 (Based on conditional regression)
Step 2: mapping secondary signals
- Conditional regression followed by Step 1
- Stopping criteria: number of signals by method 1
Method 3 (Fang & Georges 2016 bioRxiv)
Step 1: univariate, multiple regression model
- Multiple causal variants jointly using a LASSO prior
- Sample from posterior rather than computing it
- Sample $y$ from liability threshold model
- Sample configuration restricted to clusters (MH)
- Sample parameters and hyperparameters (Gibbs)
- Get CS from posterior samples for each cluster
SNP clusters are created beforehand via hierarchical clustering on LD. A correlation threshold has to be specified to define clusters.
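A minimal sketch of threshold-based LD clustering. The linkage rule is not stated in the slides, so single linkage is assumed here: cutting a single-linkage tree at a correlation threshold is equivalent to joining any two SNPs whose $|r|$ meets the threshold. The function names and the 0.8 default are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two genotype vectors (0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ca = [x - ma for x in a]
    cb = [x - mb for x in b]
    saa = sum(x * x for x in ca)
    sbb = sum(x * x for x in cb)
    if saa == 0 or sbb == 0:
        return 0.0
    return sum(x * y for x, y in zip(ca, cb)) / math.sqrt(saa * sbb)


def ld_clusters(genotypes, r_threshold=0.8):
    """Group SNPs into clusters connected by |r| >= r_threshold (single linkage)."""
    names = list(genotypes)
    parent = {v: v for v in names}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in combinations(names, 2):
        if abs(pearson(genotypes[a], genotypes[b])) >= r_threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for v in names:
        clusters.setdefault(find(v), []).append(v)
    return sorted(sorted(c) for c in clusters.values())
```

A lower threshold yields fewer, larger clusters, which makes the later sampling of configurations coarser.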
Method 3
Step 2: cross trait CS construction
- Simply merge overlapping clusters
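Merging overlapping clusters across traits can be sketched as repeated set union. This is an illustration only; the function name and the representation of a cluster as a set of variant names are assumptions.

```python
def merge_overlapping(clusters):
    """Merge clusters that share at least one variant, transitively.

    clusters: iterable of collections of variant names, e.g. the per-trait
    clusters for CD and UC pooled into one list.
    """
    merged = []
    for cluster in clusters:
        current = set(cluster)
        untouched = []
        for existing in merged:
            if existing & current:
                current |= existing  # absorb every cluster it overlaps
            else:
                untouched.append(existing)
        untouched.append(current)
        merged = untouched
    return merged
```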
Consolidating 3 methods
Fine-mapping summary
- 97 loci analyzed, 94 have «consensus» CS
- Size of CS: from 1 (fine) to >400 (not so fine)
- 68/94 regions have single variant
- 18 single causal variants with >95% certainty
- 13 are novel
- 27 single causal variants with >50% certainty
- Most associated with both UC and CD
- Number of candidate genes reduced from 669 to 233
Fine-mapping summary
Proportion of variance explained (PVE)
- CD: 25% PVE by fine-mapping out of 28% by all loci
- UC: 17% out of 22%
Importance of secondary and tertiary associations
Enrichment and colocalization: methods and results
Protein coding variants: an «assurance»
- 18-fold enrichment in non-synonymous
- Only one variant with >50% certainty is synonymous
- All previously reported IBD risk coding variants are included with >95% certainty
Enrichment in TF families
- CD signals overlap more with immune cell peaks
- UC signals overlap more with gut peaks
Co-localization method 1: naive overlap
Background overlap rate: $$\rho_i = 1 - (1 - \frac{1}{N_i})^{C_i}$$
where $N_i$ is the number of variants in region $i$, $C_i$ of which are in the IBD CS. The expected background overlap is $\sum_i \rho_i$.
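The background rate is straightforward to evaluate per region and sum. A small sketch with hypothetical function names and made-up region sizes:

```python
def background_overlap(n_variants, n_in_cs):
    """rho_i = 1 - (1 - 1/N_i)^C_i for one region."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_in_cs


def expected_overlap(regions):
    """Sum of rho_i over regions; `regions` is a list of (N_i, C_i) pairs."""
    return sum(background_overlap(n, c) for n, c in regions)
```

For example, a region with $N_i = 10$ and $C_i = 2$ gives $\rho_i = 1 - 0.9^2 = 0.19$, and one with $N_i = 100$ and $C_i = 1$ gives $\rho_i = 0.01$, for an expected background overlap of 0.20 across the two.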
Result of method 1
- No significant overlap between top PBMC eQTLs and the identified CS
  - in either of the 2 studies investigated
- Overlap observed for eQTLs in CD4+ cells and ileum cells
Co-localization method 2: conditional p-value
Nica et al 2010 PLoS Genet
$$\delta = -\log(P_{eQTL}) - [-\log(P_{eQTL|GWAS})]$$
- Empirical p-value for $\delta$: compare against the $\delta$ obtained by conditioning on other variants in the region
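A sketch of the $\delta$ statistic and an empirical p-value. Two assumptions are made here that the slides do not settle: the logarithm is taken base 10, and the null distribution is a list of $\delta$ values computed the same way for other variants in the region. Function names are hypothetical.

```python
import math


def delta(p_eqtl, p_eqtl_given_gwas):
    """Drop in eQTL significance after conditioning on the GWAS variant.

    Base-10 logarithm is an assumption; the formula only says `log`.
    """
    return -math.log10(p_eqtl) - (-math.log10(p_eqtl_given_gwas))


def empirical_p(observed, null_deltas):
    """Share of null deltas at least as large as the observed one.

    The +1 in numerator and denominator keeps the estimate away from zero.
    """
    hits = sum(1 for d in null_deltas if d >= observed)
    return (hits + 1) / (len(null_deltas) + 1)
```

A large $\delta$ means the eQTL signal largely disappears once the GWAS variant is accounted for, which is the pattern expected when both signals share a causal variant.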
Result of Method 2
Co-localization method 3: Model comparisons
Uses summary statistics and Bayesian «configuration» models
Result of Method 3
- Additionally found moderate enrichment in colon
- p-value 0.04 compared to 0.216 via method 2
Final remarks
Conclusions
- Significant enrichment in coding region and strong immune signature
- Moderate, cell/tissue specific eQTL colocalization
- 21 non-coding variants in IBD CS not interpreted
Discussions
- Direct genotyping of (common) causal variant is essential
- Improve fine-mapping resolution in other ethnicities
- but not enough data to pursue
Criticisms??