Tutorial for enrichment analysis

In this tutorial, we perform enrichment analysis (using TORUS) using GWAS summary statistics and functional annotations.

Load the packages.

library(mapgen)

Please see the data preparation tutorial about preparing GWAS summary statistics. The result from the enrichment analysis will be used in the fine-mapping tutorial.

Enrichment analysis

We perform enrichment analysis using the software TORUS.

TORUS needs a SNP annotation file, which contains SNP-level genomic annotations. The annotation file uses a header to specify the number and the nature (categorical or continuous) of the anntations. For example,

SNP   binding_d
chr1.51479  0
chr1.52058  2
chr1.52238  1

The first column with the header “SNP” represents the SNP name. The following columns represent specific annotations. For categorical/discrete annotations, the header should alwasy have a suffix "_d“; whereas for continuous annotations, the header should ends with”_c".

For binary annotations, we could use the function prepare_torus_input_files() to create a SNP annotation file and a z-score file as TORUS inputs. The function takes the processed GWAS summary statistics obtained from the data preparation tutorial, and binary annotation files. The annotation files should be in .bed format, and contain the columns “chr”, “start”, “end”. Chromosomes should be just the numbers (no “chr”).

annotation_bed_files <- list.files(path = system.file('extdata', 'test_bed_dir', package='mapgen'),
                                   pattern = '*.bed', full.names = TRUE)

# annotation_bed_files: Path to the annotation (.bed) files.
gwas.sumstats <- readRDS(system.file('extdata', 'test.processed.sumstats.rds', package='mapgen'))
torus.files <- prepare_torus_input_files(gwas.sumstats, annotation_bed_files, torus_input_dir = './torus_input')
torus_annot_file <- torus.files$torus_annot_file
torus_zscore_file <- torus.files$torus_zscore_file

If you use continuous annotations or annotations with multiple categories, you would need to create your own annotation files.

Now that the appropriate files have been generated, let’s perform the enrichment analysis using TORUS.

run_torus() with option = "est-prior" returns a list with: enrichment estimates (log odds ratio) and 95% confidence intervals for each annotation category, as well as SNP-level priors (which could be used as priors in fine-mapping).

torus.res <- run_torus(torus_annot_file, 
                       torus_zscore_file,
                       option = 'est-prior',
                       torus_path = 'torus') # set the path to 'torus' executable.
torus.enrich <- torus.res$enrich
torus.prior <- torus.res$snp_prior

If you only want to estimate enrichment but do not need SNP-level priors, you can set option = "est" to save time computing the priors.

torus.enrich <- run_torus(torus_annot_file, 
                          torus_zscore_file,
                          option = 'est',
                          torus_path = 'torus')$enrich

TORUS also gives us the uncertainty of whether each locus contains a causal variant or not. You can set option = "fdr" to get the FDR associated with each locus.

torus.fdr <- run_torus(torus_annot_file, 
                       torus_zscore_file,
                       option = 'fdr',
                       torus_path = 'torus')$fdr

Kaixuan Luo, Alan Selewa

Enrichment analysis