Installation

devtools::install_github('xinhe-lab/mirage')

Load the package

Preparing input data

Summary statistics for each rare variant.

MIRAGE requires summary statistics as input, including the ID for each rare variant. For each variant, it needs the allele counts in both cases and controls, the gene to which it belongs, and the annotation group index it is associated with. The annotation groups should be distinct categories, such as loss-of-function (LoF) variants and missense variants predicted to have deleterious effects.

We include an example input for MIRAGE in our package, which users can reference to format their own data accordingly.

data(mirage_toy)
head(mirage_toy)
#                                 ID   Gene No.case No.contr group.index
# 15129    6:161006077-161006077_C_T    LPA     298      272           1
# 169        1:11205024-11205024_C_T   MTOR       1        0           2
# 184        1:11272449-11272449_G_A   MTOR       0        1           2
# 194   1:11290989-11290994_CTGACT_C   MTOR       1        0           2
# 239        1:11594465-11594465_C_T PTCHD2       0        1           2
# 356     1:19441304-19441307_CCTT_C   UBR4       2        0           2

Each row in the input represents a variant with its summary statistics:

  • No.case: Number of times the variant appears in cases.
  • No.contr: Number of times the variant appears in controls.
  • group.index: Indicates the variant group it belongs to.
  • Gene: Specifies the associated gene, if any.

Please note that, the analysis is limited to rare variants. So the MAF of all variants used in the analysis should be less than 0.01

In the example below, since the sample size is (4315 + 4315, case + control), users can filter the variants using

mirage_toy[mirage_toy$No.case/(4315 + 4315) < 0.01|mirage_toy$No.contr/(4315 + 4315) < 0.01,]

Variant groups are user-defined, usually based on annotations. For example, a group “rare LoF” could be defined for loss-of-function variants (including stop loss, stop gain, frameshift indels, and splice site substitutions) with a minor allele frequency of less than 1%.

Additional arguments

MIRAGE requires the following additional arguments:

  • n1: Sample size in cases (required).
  • n2: Sample size in controls (required).
  • gamma: A list of category-specific hyper prior shape parameters in the Beta distribution for effect size, or a numeric value if all categories share the same effect size. Default value is 3.
  • sigma: A list of category-specific hyper prior scale parameters in the Beta distribution for effect size, or a numeric value if all categories share the same effect size. Default value is 2.
  • eta.init: Initial value for the prior on the proportion of risk variants in a variant set. Default value is 0.1.
  • delta.init: Initial value for the prior on the proportion of risk genes. Must be a positive number between 0 and 1. Default value is 0.1.
  • estimate.delta: Specifies whether delta should be estimated (TRUE) or fixed at delta.init (FALSE). Default is TRUE.
  • max.iter: Maximum number of iterations, enforcing the EM algorithm to stop. Default is 10,000.
  • tol: Threshold of parameter estimate difference to determine the convergence of the EM algorithm. Default value is 1e-05.
  • verbose: When TRUE, enables detailed output during computation. Default is TRUE.

Only n1 and n2 are required to run MIRAGE; other parameters have default values.

Running MIRAGE on toy data

res=mirage(mirage_toy,n1=4315,n2=4315)
# Computing gene level LRT statistics and p-values ...
# Computing LRT statistics and p-values by categories ...

The output of MIRAGE is a list containing the following values:

  • BF.PP.gene: Bayes factor and posterior probability of genes.
head(res$BF.PP.gene)
#     Gene        BF post.prob
# 1    LPA 0.9164196 0.9190893
# 2   MTOR 1.0405812 0.9280489
# 3 PTCHD2 0.8327124 0.9116743
# 4   UBR4 1.2899155 0.9411381
# 5  CSMD2 0.8010560 0.9085032
# 6   NCDN 1.0006285 0.9253904
  • BF.all: A list of Bayes factors for all variants within a gene.
head(res$BF.all[1],n = 5) # variant information in the first gene 
# [[1]]
#                                ID Gene No.case No.contr category
# 15129   6:161006077-161006077_C_T  LPA     298      272        1
# 15078 6:160966536-160966536_A_ACC  LPA       2        2        2
# 15079 6:160966537-160966539_TTG_T  LPA       2        2        2
# 15082  6:160968821-160968822_AT_A  LPA       1        0        2
# 15102   6:160969584-160969584_G_C  LPA       1        4        2
# 15107  6:160977184-160977185_AC_A  LPA       0        1        2
# 15114   6:160978560-160978560_G_A  LPA       1        0        2
# 15116   6:160998167-160998167_C_T  LPA       1        5        2
# 15142   6:161007480-161007480_C_T  LPA       1        0        2
# 15153   6:161010638-161010638_C_T  LPA       1        0        2
# 15163   6:161012014-161012014_G_T  LPA       0        1        2
# 15171   6:161015102-161015102_C_A  LPA       1        0        2
# 15177  6:161016494-161016494_C_CA  LPA       1        0        2
# 15180   6:161020531-161020531_C_T  LPA       0        1        2
# 15210   6:161071369-161071369_C_T  LPA       1        0        2
# 15061   6:160952816-160952816_T_C  LPA     118      103        3
# 15068   6:160961137-160961137_T_C  LPA     257      248        3
# 15073   6:160963771-160963771_C_A  LPA      50       44        4
# 15074   6:160963774-160963774_C_G  LPA      21       32        4
# 15075   6:160963774-160963774_C_T  LPA       0        2        4
# 15081   6:160966559-160966559_G_A  LPA      17       28        4
# 15135   6:161006105-161006105_C_T  LPA      88      100        4
# 15148   6:161007619-161007619_G_T  LPA      26       20        4
# 15195   6:161022107-161022107_C_T  LPA      79       90        4
#       category_factor       var.BF original.index
# 15129               1 1.634504e-07              1
# 15078               2 6.059431e-01              2
# 15079               2 6.059431e-01              2
# 15082               2 1.452084e+00              2
# 15102               2 1.896752e-01              2
# 15107               2 5.479161e-01              2
# 15114               2 1.452084e+00              2
# 15116               2 1.405388e-01              2
# 15142               2 1.452084e+00              2
# 15153               2 1.452084e+00              2
# 15163               2 5.479161e-01              2
# 15171               2 1.452084e+00              2
# 15177               2 1.452084e+00              2
# 15180               2 5.479161e-01              2
# 15210               2 1.452084e+00              2
# 15061               3 7.596685e-02              3
# 15068               3 3.931896e-08              3
# 15073               4 9.300675e-02              4
# 15074               4 5.040937e-02              4
# 15075               4 3.291740e-01              4
# 15081               4 5.484098e-02              4
# 15135               4 2.698210e-02              4
# 15148               4 2.857868e-01              4
# 15195               4 2.795875e-02              4
  • delta.est: Estimate for the proportion of risk genes.
res$delta.est
# [1] 0.925347
  • delta.pvalue: Significance test for delta = 0.
res$delta.pvalue
# [1] 0.9347734
  • eta.est: Estimate for the proportion of risk variants in a variant group.
res$eta.est
#             1             2             3             4             5 
#  0.000000e+00  1.030961e-01 4.940656e-324 1.976263e-323  7.871498e-02
  • eta.pvalue: Significance test for eta = 0.
res$eta.pvalue
# [1] 1.0000000 0.4206792 1.0000000 1.0000000 0.2782136

Calculating Bayes factor for each variant group within a gene

Users may use the script below to compute Bayes factor for each variant group within a gene. We here show an example for computing for the first gene in our example data “LPA”.

########## after run MIRAGE,  given a gene, calculate the BF of every variant group.  

num_category=length(levels(as.factor(mirage_toy[mirage_toy$Gene =="LPA",]$group.index))) # number of variant groups 
variant_full_infor=res$BF.all[[1]]  # extract results for first gene 
variant_group_BF=numeric() # store variant group BF for a given gene 
for (j in 1:num_category)
{
  if (sum(variant_full_infor$original.index==j)==0) # no variants in this group
    variant_group_BF[j]=NA
  
  if (sum(variant_full_infor$original.index==j)>0) # variants exist in this group
    variant_group_BF[j]=prod(1-res$eta.est[j]+res$eta.est[j]*
    variant_full_infor[which(variant_full_infor$original.index==j),]$var.BF) 
  # product of BF of variants within that variant group 
   
}
variant_group_BF
# [1] 1.0000000 0.9164266 1.0000000 1.0000000

Session information

Here are some details about the computing environment, including the versions of R, and the R packages, used to generate these results.

sessionInfo()
# R version 4.2.0 (2022-04-22)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: CentOS Linux 7 (Core)
# 
# Matrix products: default
# BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
# 
# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] mirage_0.1.0.0
# 
# loaded via a namespace (and not attached):
#  [1] knitr_1.39        magrittr_2.0.3    hms_1.1.1         progress_1.2.2   
#  [5] R6_2.5.1          ragg_1.2.2        rlang_1.1.2       fastmap_1.1.0    
#  [9] stringr_1.5.1     tools_4.2.0       xfun_0.41         cli_3.6.1        
# [13] jquerylib_0.1.4   ellipsis_0.3.2    htmltools_0.5.2   systemfonts_1.0.4
# [17] yaml_2.3.5        digest_0.6.29     lifecycle_1.0.4   crayon_1.5.1     
# [21] pkgdown_2.0.7     textshaping_0.3.6 purrr_1.0.2       sass_0.4.1       
# [25] vctrs_0.6.5       fs_1.5.2          memoise_2.0.1     glue_1.6.2       
# [29] cachem_1.0.6      evaluate_0.15     rmarkdown_2.25    stringi_1.7.6    
# [33] compiler_4.2.0    bslib_0.3.1       prettyunits_1.1.1 desc_1.4.3       
# [37] jsonlite_1.8.0    pkgconfig_2.0.3