Fine-mapping inflammatory bowel disease loci to single-variant resolution

· by Gao Wang ·

Huang et al 2017 Nature


Inflammatory bowel disease (IBD)

Figure: Digestive system

  • Ulcerative colitis (UC): colon inflammation & ulcers
  • Crohn’s disease (CD): colon + ileum inflammation

Contributions of this study

The pace of identifying associated loci outstrips that of defining specific molecular mechanisms and extracting biological insight from associations

  • 200 IBD loci identified, few resolved to functional variants
  • This study: 94 loci evaluated, 45 variants resolved
  • Approach: genetic «fine-mapping» + detailed annotations
    • Dense genotyping array of Europeans
    • 3 fine-mapping methods
    • Enrichment in regulatory regions
    • 3 eQTL colocalization methods


Nature volume 547, pages 173–178 (13 July 2017)

Fine-mapping: methods and results

Three fine-mapping methods

  1. Model selection with BIC
  2. Bayesian single effect model & conditional regression
  3. Bayesian LASSO

Method 1

Step 1: model selection of independent signals

  • BIC-based selection
  • Penalized likelihood across traits and variants
  • Greedy search algorithm (steepest descent)
  • A «swap-out repositioning» adjustment
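The greedy BIC search can be illustrated with a much simplified sketch. It assumes a single quantitative trait and ordinary least squares, whereas the study works with penalized likelihood across traits, and it leaves out the swap-out repositioning adjustment. The function names (`greedy_bic`, `project_out`) are hypothetical and used only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def project_out(v, x):
    """Residual of v after regressing it on x (both already centered)."""
    coef = dot(v, x) / dot(x, x)
    return [vi - coef * xi for vi, xi in zip(v, x)]

def greedy_bic(genotypes, y):
    """Forward selection: add the variant that lowers BIC most, stop when none does.

    Simplifying assumption: Gaussian linear model, one trait.
    Returns the indices of the selected variants in order of entry.
    """
    n = len(y)
    resid = center(y)  # phenotype residual given the current model
    cols = {j: center(g) for j, g in enumerate(genotypes)}
    selected = []
    best_bic = n * math.log(max(dot(resid, resid), 1e-12) / n)  # intercept only
    while cols:
        # Drop in residual sum of squares if variant j were added next.
        gains = {j: dot(c, resid) ** 2 / dot(c, c)
                 for j, c in cols.items() if dot(c, c) > 1e-12}
        if not gains:
            break
        j = max(gains, key=gains.get)
        rss = max(dot(resid, resid) - gains[j], 1e-12)
        bic = n * math.log(rss / n) + (len(selected) + 1) * math.log(n)
        if bic >= best_bic:  # no candidate improves BIC
            break
        x = cols.pop(j)
        resid = project_out(resid, x)
        # Remove the chosen variant's contribution from the remaining candidates.
        cols = {k: project_out(c, x) for k, c in cols.items()}
        selected.append(j)
        best_bic = bic
    return selected
```

Each accepted variant is regressed out of both the phenotype and the remaining candidates, so the next step scores variants conditional on everything already in the model.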

Method 1

Step 2: construction of credible sets (CS)

  • 95% CS that contains exactly one signal
  • Conditional CS in the presence of multiple signals
  • Potentially multiple CS generated per region

Limitation: does not account for uncertainty in the alternative signals
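A 95% CS for one signal is usually built by ranking variants on their posterior probability and accumulating until the target coverage is reached. The sketch below assumes the per-variant posterior probabilities for a single signal are already available; the variant labels and the function name `credible_set` are hypothetical.

```python
def credible_set(posterior, coverage=0.95):
    """Smallest set of variants whose normalized posterior mass reaches `coverage`.

    `posterior` maps variant id -> posterior probability (or any
    non-negative weight) for ONE signal.
    """
    total = sum(posterior.values())
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    cs, cumulative = [], 0.0
    for variant, prob in ranked:
        cs.append(variant)
        cumulative += prob / total
        if cumulative >= coverage:
            break
    return cs

# Hypothetical posterior for one signal in one region.
example = {"v1": 0.90, "v2": 0.06, "v3": 0.03, "v4": 0.01}
print(credible_set(example))
```

With several signals in a region, the same routine would be applied to each signal's conditional posterior, which yields one CS per signal.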

Method 2 (Based on Jostins & McVean 2016)

Step 1: multivariate regression with correlated normal prior

  • Assuming all loci are shared between traits
  • Model and inference: a special case of eQTLBMA
    • No model averaging
  • BF and CS assuming single effect
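Under the single-effect assumption, exactly one variant in the region is causal, so with a uniform prior each variant's posterior probability is its Bayes factor divided by the sum of all Bayes factors. The sketch below swaps in Wakefield's univariate approximate Bayes factor rather than the multivariate correlated normal prior used by the method itself. The prior standard deviation of 0.2 and the function names are assumptions.

```python
import math

def log_abf(beta, se, prior_sd=0.2):
    """Wakefield's approximate log Bayes factor (alternative vs null).

    beta, se: marginal effect estimate and its standard error.
    prior_sd: assumed prior standard deviation of the true effect.
    """
    v, w = se ** 2, prior_sd ** 2
    z2 = (beta / se) ** 2
    return 0.5 * math.log(v / (v + w)) + 0.5 * z2 * w / (v + w)

def single_effect_posterior(log_bfs):
    """Posterior that each variant is THE causal one (uniform prior)."""
    top = max(log_bfs)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical summary statistics for three variants.
betas, ses = [0.05, 0.30, 0.10], [0.05, 0.05, 0.05]
post = single_effect_posterior([log_abf(b, s) for b, s in zip(betas, ses)])
```

Ranking `post` and accumulating to 95% then gives the CS for that signal.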

Method 2 (Based on conditional regression)

Step 2: mapping secondary signals

  • Conditional regression followed by Step 1
  • Stopping criterion: the number of signals found by method 1

Method 3 (Fang & Georges 2016 biorxiv)

Step 1: univariate, multiple regression model

  • Multiple causal variants jointly using a LASSO prior
  • Sample from posterior rather than computing it
    • Sample $y$ from liability threshold model
    • Sample configuration restricted to clusters (MH)
    • Sample parameters and hyperparameters (Gibbs)
  • Get CS from posterior samples for each cluster

SNP clusters are created beforehand via hierarchical clustering on LD. A correlation threshold has to be specified to define clusters.
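As an illustration of LD-based clustering, the sketch below assumes single-linkage clustering cut at a correlation threshold, which is equivalent to taking connected components of the graph that links any two SNPs with |r| at or above the threshold. The linkage rule and the threshold of 0.8 are assumptions, since the notes do not state which were used.

```python
def ld_clusters(r, threshold=0.8):
    """Group SNP indices into clusters from a pairwise LD correlation matrix.

    Assumption: single linkage, so two SNPs share a cluster when a chain of
    pairs with |r| >= threshold connects them.
    """
    n = len(r)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(r[i][j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical LD matrix for four SNPs forming two tight pairs.
ld = [[1.0, 0.9, 0.1, 0.0],
      [0.9, 1.0, 0.2, 0.1],
      [0.1, 0.2, 1.0, 0.85],
      [0.0, 0.1, 0.85, 1.0]]
print(ld_clusters(ld))
```

A lower threshold gives fewer, larger clusters, so the choice directly affects how finely the CS can be resolved.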

Method 3

Step 2: cross trait CS construction

  • Simply merge overlapping clusters
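Merging overlapping clusters across traits amounts to taking the union of any sets that share at least one variant. A minimal sketch, with hypothetical variant labels and a hypothetical function name:

```python
def merge_overlapping(clusters):
    """Union every group of clusters that share at least one variant."""
    merged = []
    for current in map(set, clusters):
        # Absorb every already-merged set that overlaps the current one.
        for other in [m for m in merged if m & current]:
            current |= other
            merged.remove(other)
        merged.append(current)
    return merged

# Hypothetical per-trait clusters for one region.
cd_clusters = [{"a", "b"}, {"d"}]
uc_clusters = [{"b", "c"}, {"e"}]
print(merge_overlapping(cd_clusters + uc_clusters))
```

Because the merged sets stay disjoint at every step, a new cluster that bridges two earlier ones pulls both into a single set.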

Consolidating 3 methods

Fine-mapping procedure and output using the SMAD3 region as an example

Fine-mapping summary

  1. 97 loci analyzed, 94 have «consensus» CS
  2. Size of CS: from 1 (fine) to >400 (not so fine)
    • 68/94 regions have single variant
  3. 18 single causal variants with >95% certainty
    • 13 are novel
  4. 27 single causal variants with >50% certainty
  5. Most associated with both UC and CD
  6. Number of candidate genes reduced from 669 to 233

Fine-mapping summary

Summary of fine-mapped associations

Proportion of variance explained (PVE)

  • CD: 25% PVE by fine-mapping out of 28% by all loci
  • UC: 17% out of 22%

Importance of secondary and tertiary associations

Relative variance explained

Enrichment and colocalization: methods and results

Protein coding variants: an «assurance»

  • 18-fold enrichment in non-synonymous
  • Only one variant with >50% certainty is synonymous
  • All previously reported IBD risk coding variants are included with >95% certainty

Enrichment in TF families

 Functional annotation of causal variants

  • CD signals overlap more with immune cell peaks
  • UC signals overlap more with gut peaks

Co-localization method 1: naive overlap

Background overlap rate: $$\rho_i = 1 - (1 - \frac{1}{N_i})^{C_i}$$

where $N_i$ is the number of variants in region $i$, $C_i$ of which are in the IBD CS. The expected background overlap is $\sum_i \rho_i$.
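The background rate is the chance that a single top eQTL variant, placed uniformly at random among the $N_i$ variants of region $i$, is hit by at least one of $C_i$ independent draws. A small sketch of the arithmetic, with hypothetical region sizes:

```python
def background_overlap(n_variants, n_in_cs):
    """rho_i = 1 - (1 - 1/N_i)^C_i for one region."""
    return 1 - (1 - 1 / n_variants) ** n_in_cs

def expected_overlap(regions):
    """Sum of rho_i over regions given as (N_i, C_i) pairs."""
    return sum(background_overlap(n, c) for n, c in regions)

# Hypothetical regions: (variants in region, variants in the CS).
regions = [(100, 1), (10, 2), (500, 40)]
print(expected_overlap(regions))
```

The observed number of regions with an overlap can then be compared against this expected count.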

Result of method 1

  • No significant overlap between CS and top PBMC eQTLs
    • in either of the 2 studies investigated
  • Overlap of eQTLs for CD4+ cells and ileum cells observed

Co-localization method 2: conditional p-value

Nica et al 2010 PLoS Genet

$$\delta = -\log(P_{eQTL}) - [-\log(P_{eQTL|GWAS})]$$

  • Empirical p-value for $\delta$: compare it with the $\delta$ obtained when conditioning on each of the other variants in the region instead
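The statistic $\delta$ measures how much of the eQTL signal disappears once the GWAS variant is conditioned on. A sketch of the computation follows, assuming base-10 logarithms and an empirical p-value defined as the share of other variants whose $\delta$ is at least as large (with the usual add-one correction). Both choices are assumptions, and the p-values are hypothetical.

```python
import math

def delta(p_eqtl, p_eqtl_conditional):
    """Drop in -log10 p-value of the eQTL after conditioning on a variant."""
    return -math.log10(p_eqtl) - (-math.log10(p_eqtl_conditional))

def empirical_p(delta_gwas, delta_others):
    """Share of other variants that remove at least as much eQTL signal."""
    hits = sum(d >= delta_gwas for d in delta_others)
    return (hits + 1) / (len(delta_others) + 1)

# Hypothetical values: eQTL p-value before and after conditioning.
d_gwas = delta(1e-10, 1e-2)
d_null = [delta(1e-10, p) for p in (1e-9, 1e-8, 1e-1, 1e-9)]
print(d_gwas, empirical_p(d_gwas, d_null))
```

A small empirical p-value says that few other variants in the region explain away the eQTL as well as the GWAS variant does.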

Result of Method 2

 Number of credible sets that colocalize eQTLs

Co-localization method 3: Model comparisons

Uses summary statistics and Bayesian «configuration» models

Giambartolomei et al 2013 PLoS Genet
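The configuration models compare five hypotheses for a region: no association (H0), association with the GWAS trait only (H1), with expression only (H2), with both through distinct variants (H3), or with both through one shared variant (H4). The sketch below combines per-variant Bayes factors into posterior probabilities for these hypotheses, assuming at most one causal variant per trait. The prior values and the function name are assumptions.

```python
def coloc_posteriors(bf_gwas, bf_eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probability of H0..H4 from per-variant Bayes factors.

    bf_gwas, bf_eqtl: Bayes factors for the same variants, in the same order.
    p1, p2, p12: assumed prior probabilities that a variant is causal for
    the GWAS trait only, expression only, or both.
    """
    s1, s2 = sum(bf_gwas), sum(bf_eqtl)
    s12 = sum(a * b for a, b in zip(bf_gwas, bf_eqtl))
    weights = {
        "H0": 1.0,
        "H1": p1 * s1,
        "H2": p2 * s2,
        "H3": p1 * p2 * (s1 * s2 - s12),  # two different causal variants
        "H4": p12 * s12,                  # one shared causal variant
    }
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical Bayes factors for three variants.
shared = coloc_posteriors([1, 1e6, 1], [1, 1e5, 1])
distinct = coloc_posteriors([1e6, 1, 1], [1, 1, 1e5])
```

In the first call the same variant dominates both traits, so H4 takes most of the posterior mass; in the second the peaks fall on different variants and H3 is favoured.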

Result of Method 3

  • Additionally found moderate enrichment in colon
    • p-value 0.04 compared to 0.216 via method 2

Final remarks


  • Significant enrichment in coding region and strong immune signature
  • Moderate, cell/tissue specific eQTL colocalization
  • 21 non-coding variants in IBD CS not interpreted


  • Direct genotyping of (common) causal variant is essential
  • Improve fine-mapping resolution in other ethnicities
    • but not enough data to pursue