vignettes/mwe.Rmd
mwe.Rmd
MIRAGE requires summary statistics as input, including the ID for each rare variant. For each variant, it needs the allele counts in both cases and controls, the gene to which it belongs, and the annotation group index it is associated with. The annotation groups should be distinct categories, such as loss-of-function (LoF) variants and missense variants predicted to have deleterious effects.
We include an example input for MIRAGE in our package, which users can reference to format their own data accordingly.
data(mirage_toy)
head(mirage_toy)
# ID Gene No.case No.contr group.index
# 15129 6:161006077-161006077_C_T LPA 298 272 1
# 169 1:11205024-11205024_C_T MTOR 1 0 2
# 184 1:11272449-11272449_G_A MTOR 0 1 2
# 194 1:11290989-11290994_CTGACT_C MTOR 1 0 2
# 239 1:11594465-11594465_C_T PTCHD2 0 1 2
# 356 1:19441304-19441307_CCTT_C UBR4 2 0 2
Each row in the input represents a variant with its summary statistics:
No.case
: Number of times the variant appears in
cases.No.contr
: Number of times the variant appears in
controls.group.index
: Indicates the variant group it belongs
to.Gene
: Specifies the associated gene, if any.Please note that, the analysis is limited to rare variants. So the MAF of all variants used in the analysis should be less than 0.01
In the example below, since the sample size is (4315 + 4315, case + control), users can filter the variants using
mirage_toy[mirage_toy$No.case/(4315 + 4315) < 0.01|mirage_toy$No.contr/(4315 + 4315) < 0.01,]
Variant groups are user-defined, usually based on annotations. For example, a group “rare LoF” could be defined for loss-of-function variants (including stop loss, stop gain, frameshift indels, and splice site substitutions) with a minor allele frequency of less than 1%.
MIRAGE requires the following additional arguments:
n1
: Sample size in cases (required).n2
: Sample size in controls (required).gamma
: A list of category-specific hyper prior shape
parameters in the Beta distribution for effect size, or a numeric value
if all categories share the same effect size. Default value is 3.sigma
: A list of category-specific hyper prior scale
parameters in the Beta distribution for effect size, or a numeric value
if all categories share the same effect size. Default value is 2.eta.init
: Initial value for the prior on the proportion
of risk variants in a variant set. Default value is 0.1.delta.init
: Initial value for the prior on the
proportion of risk genes. Must be a positive number between 0 and 1.
Default value is 0.1.estimate.delta
: Specifies whether delta
should be estimated (TRUE
) or fixed at
delta.init
(FALSE
). Default is
TRUE
.max.iter
: Maximum number of iterations, enforcing the
EM algorithm to stop. Default is 10,000.tol
: Threshold of parameter estimate difference to
determine the convergence of the EM algorithm. Default value is
1e-05.verbose
: When TRUE
, enables detailed
output during computation. Default is TRUE
.Only n1
and n2
are required to run MIRAGE;
other parameters have default values.
res=mirage(mirage_toy,n1=4315,n2=4315)
# Computing gene level LRT statistics and p-values ...
# Computing LRT statistics and p-values by categories ...
The output of MIRAGE is a list containing the following values:
BF.PP.gene
: Bayes factor and posterior probability of
genes.
head(res$BF.PP.gene)
# Gene BF post.prob
# 1 LPA 0.9164196 0.9190893
# 2 MTOR 1.0405812 0.9280489
# 3 PTCHD2 0.8327124 0.9116743
# 4 UBR4 1.2899155 0.9411381
# 5 CSMD2 0.8010560 0.9085032
# 6 NCDN 1.0006285 0.9253904
BF.all
: A list of Bayes factors for all variants within
a gene.
head(res$BF.all[1],n = 5) # variant information in the first gene
# [[1]]
# ID Gene No.case No.contr category
# 15129 6:161006077-161006077_C_T LPA 298 272 1
# 15078 6:160966536-160966536_A_ACC LPA 2 2 2
# 15079 6:160966537-160966539_TTG_T LPA 2 2 2
# 15082 6:160968821-160968822_AT_A LPA 1 0 2
# 15102 6:160969584-160969584_G_C LPA 1 4 2
# 15107 6:160977184-160977185_AC_A LPA 0 1 2
# 15114 6:160978560-160978560_G_A LPA 1 0 2
# 15116 6:160998167-160998167_C_T LPA 1 5 2
# 15142 6:161007480-161007480_C_T LPA 1 0 2
# 15153 6:161010638-161010638_C_T LPA 1 0 2
# 15163 6:161012014-161012014_G_T LPA 0 1 2
# 15171 6:161015102-161015102_C_A LPA 1 0 2
# 15177 6:161016494-161016494_C_CA LPA 1 0 2
# 15180 6:161020531-161020531_C_T LPA 0 1 2
# 15210 6:161071369-161071369_C_T LPA 1 0 2
# 15061 6:160952816-160952816_T_C LPA 118 103 3
# 15068 6:160961137-160961137_T_C LPA 257 248 3
# 15073 6:160963771-160963771_C_A LPA 50 44 4
# 15074 6:160963774-160963774_C_G LPA 21 32 4
# 15075 6:160963774-160963774_C_T LPA 0 2 4
# 15081 6:160966559-160966559_G_A LPA 17 28 4
# 15135 6:161006105-161006105_C_T LPA 88 100 4
# 15148 6:161007619-161007619_G_T LPA 26 20 4
# 15195 6:161022107-161022107_C_T LPA 79 90 4
# category_factor var.BF original.index
# 15129 1 1.634504e-07 1
# 15078 2 6.059431e-01 2
# 15079 2 6.059431e-01 2
# 15082 2 1.452084e+00 2
# 15102 2 1.896752e-01 2
# 15107 2 5.479161e-01 2
# 15114 2 1.452084e+00 2
# 15116 2 1.405388e-01 2
# 15142 2 1.452084e+00 2
# 15153 2 1.452084e+00 2
# 15163 2 5.479161e-01 2
# 15171 2 1.452084e+00 2
# 15177 2 1.452084e+00 2
# 15180 2 5.479161e-01 2
# 15210 2 1.452084e+00 2
# 15061 3 7.596685e-02 3
# 15068 3 3.931896e-08 3
# 15073 4 9.300675e-02 4
# 15074 4 5.040937e-02 4
# 15075 4 3.291740e-01 4
# 15081 4 5.484098e-02 4
# 15135 4 2.698210e-02 4
# 15148 4 2.857868e-01 4
# 15195 4 2.795875e-02 4
delta.est
: Estimate for the proportion of risk
genes.
res$delta.est
# [1] 0.925347
delta.pvalue
: Significance test for delta = 0.
res$delta.pvalue
# [1] 0.9347734
eta.est
: Estimate for the proportion of risk variants
in a variant group.
res$eta.est
# 1 2 3 4 5
# 0.000000e+00 1.030961e-01 4.940656e-324 1.976263e-323 7.871498e-02
eta.pvalue
: Significance test for eta = 0.
res$eta.pvalue
# [1] 1.0000000 0.4206792 1.0000000 1.0000000 0.2782136
Users may use the script below to compute Bayes factor for each variant group within a gene. We here show an example for computing for the first gene in our example data “LPA”.
########## after run MIRAGE, given a gene, calculate the BF of every variant group.
num_category=length(levels(as.factor(mirage_toy[mirage_toy$Gene =="LPA",]$group.index))) # number of variant groups
variant_full_infor=res$BF.all[[1]] # extract results for first gene
variant_group_BF=numeric() # store variant group BF for a given gene
for (j in 1:num_category)
{
if (sum(variant_full_infor$original.index==j)==0) # no variants in this group
variant_group_BF[j]=NA
if (sum(variant_full_infor$original.index==j)>0) # variants exist in this group
variant_group_BF[j]=prod(1-res$eta.est[j]+res$eta.est[j]*
variant_full_infor[which(variant_full_infor$original.index==j),]$var.BF)
# product of BF of variants within that variant group
}
variant_group_BF
# [1] 1.0000000 0.9164266 1.0000000 1.0000000
Here are some details about the computing environment, including the versions of R, and the R packages, used to generate these results.
sessionInfo()
# R version 4.2.0 (2022-04-22)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: CentOS Linux 7 (Core)
#
# Matrix products: default
# BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
# [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
# [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
# [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] mirage_0.1.0.0
#
# loaded via a namespace (and not attached):
# [1] knitr_1.39 magrittr_2.0.3 hms_1.1.1 progress_1.2.2
# [5] R6_2.5.1 ragg_1.2.2 rlang_1.1.2 fastmap_1.1.0
# [9] stringr_1.5.1 tools_4.2.0 xfun_0.41 cli_3.6.1
# [13] jquerylib_0.1.4 ellipsis_0.3.2 htmltools_0.5.2 systemfonts_1.0.4
# [17] yaml_2.3.5 digest_0.6.29 lifecycle_1.0.4 crayon_1.5.1
# [21] pkgdown_2.0.7 textshaping_0.3.6 purrr_1.0.2 sass_0.4.1
# [25] vctrs_0.6.5 fs_1.5.2 memoise_2.0.1 glue_1.6.2
# [29] cachem_1.0.6 evaluate_0.15 rmarkdown_2.25 stringi_1.7.6
# [33] compiler_4.2.0 bslib_0.3.1 prettyunits_1.1.1 desc_1.4.3
# [37] jsonlite_1.8.0 pkgconfig_2.0.3