Installation

devtools::install_github('xinhe-lab/mirage')

Load the package

Preparing input data

Summary statistics for each rare variant.

MIRAGE requires summary statistics as input, including the ID for each rare variant. For each variant, it needs the allele counts in both cases and controls, the gene to which it belongs, and the annotation group index it is associated with. The annotation groups should be distinct categories, such as loss-of-function (LoF) variants and missense variants predicted to have deleterious effects.

We include an example input for MIRAGE in our package, which users can reference to format their own data accordingly.

data(mirage_toy)
head(mirage_toy)
#                                 ID   Gene No.case No.contr group.index
# 15129    6:161006077-161006077_C_T    LPA     298      272           1
# 169        1:11205024-11205024_C_T   MTOR       1        0           2
# 184        1:11272449-11272449_G_A   MTOR       0        1           2
# 194   1:11290989-11290994_CTGACT_C   MTOR       1        0           2
# 239        1:11594465-11594465_C_T PTCHD2       0        1           2
# 356     1:19441304-19441307_CCTT_C   UBR4       2        0           2

Each row in the input represents a variant with its summary statistics:

  • ID: variant ID
  • No.case: Number of times the variant appears in cases.
  • No.contr: Number of times the variant appears in controls.
  • group.index: Indicates the variant group it belongs to.
  • Gene: Specifies the associated gene, if any.

Please note that, the analysis is limited to rare variants. So the MAF (minor allele frequency) of all variants used in the analysis should be less than 0.01

In the example below, since the sample size is (4315 + 4315, case + control), users can filter the variants using

mirage_toy[mirage_toy$No.case/(4315 + 4315) < 0.01|mirage_toy$No.contr/(4315 + 4315) < 0.01,]

Variant groups are user-defined, usually based on annotations. For example, a group “rare LoF” could be defined for loss-of-function variants (including stop loss, stop gain, frameshift indels, and splice site substitutions) with a minor allele frequency of less than 1%.

Additional arguments

MIRAGE requires the following additional arguments:

  • n1: Sample size in cases (required).
  • n2: Sample size in controls (required).
  • gamma: A list of category-specific hyper prior shape parameters in the Beta distribution for effect size, or a numeric value if all categories share the same effect size. Default value is 3.
  • sigma: A list of category-specific hyper prior scale parameters in the Beta distribution for effect size, or a numeric value if all categories share the same effect size. Default value is 2.
  • eta.init: Initial value for the prior on the proportion of risk variants in a variant set. Default value is 0.1.
  • delta.init: Initial value for the prior on the proportion of risk genes. Must be a positive number between 0 and 1. Default value is 0.1.
  • estimate.delta: Specifies whether delta should be estimated (TRUE) or fixed at delta.init (FALSE). Default is TRUE.
  • estimate.eta Specifies whether eta should be estimated (TRUE) or fixed at fixed.eta. Default is TRUE.
  • fixed.eta Must be provided when estimate.eta is false and BF per gene will be reported. Its dimension must match the number of variant groups.
  • max.iter: Maximum number of iterations, enforcing the EM algorithm to stop. Default is 10,000.
  • tol: Threshold of parameter estimate difference to determine the convergence of the EM algorithm. Default value is 10510^{-5}.
  • verbose: When TRUE, enables detailed output during computation. Default is TRUE.

Only n1 and n2 are required to run MIRAGE; other parameters have default values.

Running MIRAGE on toy data

res=mirage(mirage_toy,n1=4315,n2=4315)
# Computing gene level LRT statistics and p-values ...
# Computing LRT statistics and p-values by categories ...

The output of MIRAGE is a list containing the following values:

  • BF.PP.gene: Bayes factor and posterior probability of genes.
head(res$BF.PP.gene)
#     Gene        BF post.prob
# 1    LPA 0.9164196 0.9190893
# 2   MTOR 1.0405812 0.9280489
# 3 PTCHD2 0.8327124 0.9116743
# 4   UBR4 1.2899155 0.9411381
# 5  CSMD2 0.8010560 0.9085032
# 6   NCDN 1.0006285 0.9253904
  • BF.all: A list of Bayes factors for all variants within a gene.
head(res$BF.all[1],n = 5) # variant information in the first gene 
# [[1]]
#                                ID Gene No.case No.contr category
# 15129   6:161006077-161006077_C_T  LPA     298      272        1
# 15078 6:160966536-160966536_A_ACC  LPA       2        2        2
# 15079 6:160966537-160966539_TTG_T  LPA       2        2        2
# 15082  6:160968821-160968822_AT_A  LPA       1        0        2
# 15102   6:160969584-160969584_G_C  LPA       1        4        2
# 15107  6:160977184-160977185_AC_A  LPA       0        1        2
# 15114   6:160978560-160978560_G_A  LPA       1        0        2
# 15116   6:160998167-160998167_C_T  LPA       1        5        2
# 15142   6:161007480-161007480_C_T  LPA       1        0        2
# 15153   6:161010638-161010638_C_T  LPA       1        0        2
# 15163   6:161012014-161012014_G_T  LPA       0        1        2
# 15171   6:161015102-161015102_C_A  LPA       1        0        2
# 15177  6:161016494-161016494_C_CA  LPA       1        0        2
# 15180   6:161020531-161020531_C_T  LPA       0        1        2
# 15210   6:161071369-161071369_C_T  LPA       1        0        2
# 15061   6:160952816-160952816_T_C  LPA     118      103        3
# 15068   6:160961137-160961137_T_C  LPA     257      248        3
# 15073   6:160963771-160963771_C_A  LPA      50       44        4
# 15074   6:160963774-160963774_C_G  LPA      21       32        4
# 15075   6:160963774-160963774_C_T  LPA       0        2        4
# 15081   6:160966559-160966559_G_A  LPA      17       28        4
# 15135   6:161006105-161006105_C_T  LPA      88      100        4
# 15148   6:161007619-161007619_G_T  LPA      26       20        4
# 15195   6:161022107-161022107_C_T  LPA      79       90        4
#       category_factor       var.BF original.index
# 15129               1 1.634504e-07              1
# 15078               2 6.059431e-01              2
# 15079               2 6.059431e-01              2
# 15082               2 1.452084e+00              2
# 15102               2 1.896752e-01              2
# 15107               2 5.479161e-01              2
# 15114               2 1.452084e+00              2
# 15116               2 1.405388e-01              2
# 15142               2 1.452084e+00              2
# 15153               2 1.452084e+00              2
# 15163               2 5.479161e-01              2
# 15171               2 1.452084e+00              2
# 15177               2 1.452084e+00              2
# 15180               2 5.479161e-01              2
# 15210               2 1.452084e+00              2
# 15061               3 7.596685e-02              3
# 15068               3 3.931896e-08              3
# 15073               4 9.300675e-02              4
# 15074               4 5.040937e-02              4
# 15075               4 3.291740e-01              4
# 15081               4 5.484098e-02              4
# 15135               4 2.698210e-02              4
# 15148               4 2.857868e-01              4
# 15195               4 2.795875e-02              4
  • delta.est: Estimate for the proportion of risk genes and p value.
res$delta.est
#   delta.est delta.pvalue
# 1  0.925347    0.9347734
  • eta.est: Estimate for the proportion of risk variants in each variant group and p values.
res$eta.est
#         eta.est eta.pvalue
# 1  0.000000e+00  1.0000000
# 2  1.030961e-01  0.4206792
# 3 4.940656e-324  1.0000000
# 4 1.976263e-323  1.0000000
# 5  7.871498e-02  0.2782136

Calculating Bayes factor for each variant group within a gene

Users may use the script below to compute Bayes factor for each variant group within a gene. We here show an example for computing for the first gene in our example data “LPA”.

########## after run MIRAGE,  given a gene, calculate the BF of every variant group.
gene_name <- "LPA"

# Number of variant groups for this gene
num_category <- nlevels(factor(mirage_toy$group.index[mirage_toy$Gene == gene_name]))

# Extract variant-level results for this gene
variant_full_infor <- res$BF.all[[which(res$BF.PP.gene$Gene == gene_name)]]

# Compute group-level BF
variant_group_BF <- sapply(seq_len(num_category), function(j) {
  idx <- which(variant_full_infor$original.index == j)
  if (length(idx) == 0) {
    return(NA_real_)
  } else {
    return(prod(1 - res$eta.est$eta.est[j] + res$eta.est$eta.est[j] * variant_full_infor$var.BF[idx]))
  }
})

variant_group_BF
# [1] 1.0000000 0.9164266 1.0000000 1.0000000

Calculating Bayes factor with fixed eta

res_fixed_eta=mirage(mirage_toy,n1=4315,n2=4315, estimate.eta = F, fixed.eta = c(0.60, 0.31, 0.16, 0.4, 0.2))

The output of MIRAGE is a list containing the following values:

  • BF.gene: a matrix of gene with BF’s
head(res_fixed_eta$BF.gene)
#     Gene          BF
# 1    LPA 0.009253237
# 2   MTOR 0.984853873
# 3 PTCHD2 0.585846104
# 4   UBR4 1.335284217
# 5  CSMD2 0.077841990
# 6   NCDN 1.023760609
  • BF.all: A list of Bayes factors for all variants within a gene.
head(res_fixed_eta$BF.all[1], n=5) # variant information in the first gene 
# [[1]]
#                                ID Gene No.case No.contr category
# 15129   6:161006077-161006077_C_T  LPA     298      272        1
# 15078 6:160966536-160966536_A_ACC  LPA       2        2        2
# 15079 6:160966537-160966539_TTG_T  LPA       2        2        2
# 15082  6:160968821-160968822_AT_A  LPA       1        0        2
# 15102   6:160969584-160969584_G_C  LPA       1        4        2
# 15107  6:160977184-160977185_AC_A  LPA       0        1        2
# 15114   6:160978560-160978560_G_A  LPA       1        0        2
# 15116   6:160998167-160998167_C_T  LPA       1        5        2
# 15142   6:161007480-161007480_C_T  LPA       1        0        2
# 15153   6:161010638-161010638_C_T  LPA       1        0        2
# 15163   6:161012014-161012014_G_T  LPA       0        1        2
# 15171   6:161015102-161015102_C_A  LPA       1        0        2
# 15177  6:161016494-161016494_C_CA  LPA       1        0        2
# 15180   6:161020531-161020531_C_T  LPA       0        1        2
# 15210   6:161071369-161071369_C_T  LPA       1        0        2
# 15061   6:160952816-160952816_T_C  LPA     118      103        3
# 15068   6:160961137-160961137_T_C  LPA     257      248        3
# 15073   6:160963771-160963771_C_A  LPA      50       44        4
# 15074   6:160963774-160963774_C_G  LPA      21       32        4
# 15075   6:160963774-160963774_C_T  LPA       0        2        4
# 15081   6:160966559-160966559_G_A  LPA      17       28        4
# 15135   6:161006105-161006105_C_T  LPA      88      100        4
# 15148   6:161007619-161007619_G_T  LPA      26       20        4
# 15195   6:161022107-161022107_C_T  LPA      79       90        4
#       category_factor       var.BF original.index
# 15129               1 1.634504e-07              1
# 15078               2 6.059431e-01              2
# 15079               2 6.059431e-01              2
# 15082               2 1.452084e+00              2
# 15102               2 1.896752e-01              2
# 15107               2 5.479161e-01              2
# 15114               2 1.452084e+00              2
# 15116               2 1.405388e-01              2
# 15142               2 1.452084e+00              2
# 15153               2 1.452084e+00              2
# 15163               2 5.479161e-01              2
# 15171               2 1.452084e+00              2
# 15177               2 1.452084e+00              2
# 15180               2 5.479161e-01              2
# 15210               2 1.452084e+00              2
# 15061               3 7.596685e-02              3
# 15068               3 3.931896e-08              3
# 15073               4 9.300675e-02              4
# 15074               4 5.040937e-02              4
# 15075               4 3.291740e-01              4
# 15081               4 5.484098e-02              4
# 15135               4 2.698210e-02              4
# 15148               4 2.857868e-01              4
# 15195               4 2.795875e-02              4

Session information

Here are some details about the computing environment, including the versions of R, and the R packages, used to generate these results.

sessionInfo()
# R version 4.3.2 (2023-10-31 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 11 x64 (build 26100)
# 
# Matrix products: default
# 
# 
# locale:
# [1] LC_COLLATE=English_United States.utf8 
# [2] LC_CTYPE=English_United States.utf8   
# [3] LC_MONETARY=English_United States.utf8
# [4] LC_NUMERIC=C                          
# [5] LC_TIME=English_United States.utf8    
# 
# time zone: America/Chicago
# tzcode source: internal
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] mirage_0.1.0.0
# 
# loaded via a namespace (and not attached):
#  [1] vctrs_0.6.5       crayon_1.5.3      cli_3.6.2         knitr_1.49       
#  [5] rlang_1.1.2       xfun_0.50.6       textshaping_1.0.0 jsonlite_1.8.9   
#  [9] prettyunits_1.2.0 htmltools_0.5.8.1 ragg_1.3.3        sass_0.4.9       
# [13] hms_1.1.3         rmarkdown_2.29    evaluate_1.0.3    jquerylib_0.1.4  
# [17] fastmap_1.2.0     progress_1.2.3    yaml_2.3.8        lifecycle_1.0.4  
# [21] compiler_4.3.2    codetools_0.2-19  fs_1.6.5          pkgconfig_2.0.3  
# [25] htmlwidgets_1.6.4 rstudioapi_0.17.1 systemfonts_1.2.1 digest_0.6.33    
# [29] R6_2.6.1          bslib_0.9.0       tools_4.3.2       pkgdown_2.1.1    
# [33] cachem_1.1.0      desc_1.4.3