vignettes/mwe.Rmd
mwe.Rmd
MIRAGE requires summary statistics as input, including the ID for each rare variant. For each variant, it needs the allele counts in both cases and controls, the gene to which it belongs, and the annotation group index it is associated with. The annotation groups should be distinct categories, such as loss-of-function (LoF) variants and missense variants predicted to have deleterious effects.
We include an example input for MIRAGE in our package, which users can reference to format their own data accordingly.
data(mirage_toy)
head(mirage_toy)
# ID Gene No.case No.contr group.index
# 15129 6:161006077-161006077_C_T LPA 298 272 1
# 169 1:11205024-11205024_C_T MTOR 1 0 2
# 184 1:11272449-11272449_G_A MTOR 0 1 2
# 194 1:11290989-11290994_CTGACT_C MTOR 1 0 2
# 239 1:11594465-11594465_C_T PTCHD2 0 1 2
# 356 1:19441304-19441307_CCTT_C UBR4 2 0 2
Each row in the input represents a variant with its summary statistics:
ID
: variant IDNo.case
: Number of times the variant appears in
cases.No.contr
: Number of times the variant appears in
controls.group.index
: Indicates the variant group it belongs
to.Gene
: Specifies the associated gene, if any.Please note that, the analysis is limited to rare variants. So the MAF (minor allele frequency) of all variants used in the analysis should be less than 0.01
In the example below, since the sample size is (4315 + 4315, case + control), users can filter the variants using
mirage_toy[mirage_toy$No.case/(4315 + 4315) < 0.01|mirage_toy$No.contr/(4315 + 4315) < 0.01,]
Variant groups are user-defined, usually based on annotations. For example, a group “rare LoF” could be defined for loss-of-function variants (including stop loss, stop gain, frameshift indels, and splice site substitutions) with a minor allele frequency of less than 1%.
MIRAGE requires the following additional arguments:
n1
: Sample size in cases (required).n2
: Sample size in controls (required).gamma
: A list of category-specific hyper prior shape
parameters in the Beta distribution for effect size, or a numeric value
if all categories share the same effect size. Default value is 3.sigma
: A list of category-specific hyper prior scale
parameters in the Beta distribution for effect size, or a numeric value
if all categories share the same effect size. Default value is 2.eta.init
: Initial value for the prior on the proportion
of risk variants in a variant set. Default value is 0.1.delta.init
: Initial value for the prior on the
proportion of risk genes. Must be a positive number between 0 and 1.
Default value is 0.1.estimate.delta
: Specifies whether delta
should be estimated (TRUE
) or fixed at
delta.init
(FALSE
). Default is
TRUE
.estimate.eta
Specifies whether eta
should
be estimated (TRUE
) or fixed at fixed.eta
.
Default is TRUE
.fixed.eta
Must be provided when
estimate.eta
is false and BF per gene will be reported.
Its dimension must match the number of variant
groups.max.iter
: Maximum number of iterations, enforcing the
EM algorithm to stop. Default is 10,000.tol
: Threshold of parameter estimate difference to
determine the convergence of the EM algorithm. Default value is
.verbose
: When TRUE
, enables detailed
output during computation. Default is TRUE
.Only n1
and n2
are required to run MIRAGE;
other parameters have default values.
res=mirage(mirage_toy,n1=4315,n2=4315)
# Computing gene level LRT statistics and p-values ...
# Computing LRT statistics and p-values by categories ...
The output of MIRAGE is a list containing the following values:
BF.PP.gene
: Bayes factor and posterior probability of
genes.
head(res$BF.PP.gene)
# Gene BF post.prob
# 1 LPA 0.9164196 0.9190893
# 2 MTOR 1.0405812 0.9280489
# 3 PTCHD2 0.8327124 0.9116743
# 4 UBR4 1.2899155 0.9411381
# 5 CSMD2 0.8010560 0.9085032
# 6 NCDN 1.0006285 0.9253904
BF.all
: A list of Bayes factors for all variants within
a gene.
head(res$BF.all[1],n = 5) # variant information in the first gene
# [[1]]
# ID Gene No.case No.contr category
# 15129 6:161006077-161006077_C_T LPA 298 272 1
# 15078 6:160966536-160966536_A_ACC LPA 2 2 2
# 15079 6:160966537-160966539_TTG_T LPA 2 2 2
# 15082 6:160968821-160968822_AT_A LPA 1 0 2
# 15102 6:160969584-160969584_G_C LPA 1 4 2
# 15107 6:160977184-160977185_AC_A LPA 0 1 2
# 15114 6:160978560-160978560_G_A LPA 1 0 2
# 15116 6:160998167-160998167_C_T LPA 1 5 2
# 15142 6:161007480-161007480_C_T LPA 1 0 2
# 15153 6:161010638-161010638_C_T LPA 1 0 2
# 15163 6:161012014-161012014_G_T LPA 0 1 2
# 15171 6:161015102-161015102_C_A LPA 1 0 2
# 15177 6:161016494-161016494_C_CA LPA 1 0 2
# 15180 6:161020531-161020531_C_T LPA 0 1 2
# 15210 6:161071369-161071369_C_T LPA 1 0 2
# 15061 6:160952816-160952816_T_C LPA 118 103 3
# 15068 6:160961137-160961137_T_C LPA 257 248 3
# 15073 6:160963771-160963771_C_A LPA 50 44 4
# 15074 6:160963774-160963774_C_G LPA 21 32 4
# 15075 6:160963774-160963774_C_T LPA 0 2 4
# 15081 6:160966559-160966559_G_A LPA 17 28 4
# 15135 6:161006105-161006105_C_T LPA 88 100 4
# 15148 6:161007619-161007619_G_T LPA 26 20 4
# 15195 6:161022107-161022107_C_T LPA 79 90 4
# category_factor var.BF original.index
# 15129 1 1.634504e-07 1
# 15078 2 6.059431e-01 2
# 15079 2 6.059431e-01 2
# 15082 2 1.452084e+00 2
# 15102 2 1.896752e-01 2
# 15107 2 5.479161e-01 2
# 15114 2 1.452084e+00 2
# 15116 2 1.405388e-01 2
# 15142 2 1.452084e+00 2
# 15153 2 1.452084e+00 2
# 15163 2 5.479161e-01 2
# 15171 2 1.452084e+00 2
# 15177 2 1.452084e+00 2
# 15180 2 5.479161e-01 2
# 15210 2 1.452084e+00 2
# 15061 3 7.596685e-02 3
# 15068 3 3.931896e-08 3
# 15073 4 9.300675e-02 4
# 15074 4 5.040937e-02 4
# 15075 4 3.291740e-01 4
# 15081 4 5.484098e-02 4
# 15135 4 2.698210e-02 4
# 15148 4 2.857868e-01 4
# 15195 4 2.795875e-02 4
delta.est
: Estimate for the proportion of risk genes
and p value.
res$delta.est
# delta.est delta.pvalue
# 1 0.925347 0.9347734
eta.est
: Estimate for the proportion of risk variants
in each variant group and p values.
res$eta.est
# eta.est eta.pvalue
# 1 0.000000e+00 1.0000000
# 2 1.030961e-01 0.4206792
# 3 4.940656e-324 1.0000000
# 4 1.976263e-323 1.0000000
# 5 7.871498e-02 0.2782136
Users may use the script below to compute Bayes factor for each variant group within a gene. We here show an example for computing for the first gene in our example data “LPA”.
########## after run MIRAGE, given a gene, calculate the BF of every variant group.
gene_name <- "LPA"
# Number of variant groups for this gene
num_category <- nlevels(factor(mirage_toy$group.index[mirage_toy$Gene == gene_name]))
# Extract variant-level results for this gene
variant_full_infor <- res$BF.all[[which(res$BF.PP.gene$Gene == gene_name)]]
# Compute group-level BF
variant_group_BF <- sapply(seq_len(num_category), function(j) {
idx <- which(variant_full_infor$original.index == j)
if (length(idx) == 0) {
return(NA_real_)
} else {
return(prod(1 - res$eta.est$eta.est[j] + res$eta.est$eta.est[j] * variant_full_infor$var.BF[idx]))
}
})
variant_group_BF
# [1] 1.0000000 0.9164266 1.0000000 1.0000000
res_fixed_eta=mirage(mirage_toy,n1=4315,n2=4315, estimate.eta = F, fixed.eta = c(0.60, 0.31, 0.16, 0.4, 0.2))
The output of MIRAGE is a list containing the following values:
BF.gene
: a matrix of gene with BF’s
head(res_fixed_eta$BF.gene)
# Gene BF
# 1 LPA 0.009253237
# 2 MTOR 0.984853873
# 3 PTCHD2 0.585846104
# 4 UBR4 1.335284217
# 5 CSMD2 0.077841990
# 6 NCDN 1.023760609
BF.all
: A list of Bayes factors for all variants within
a gene.
head(res_fixed_eta$BF.all[1], n=5) # variant information in the first gene
# [[1]]
# ID Gene No.case No.contr category
# 15129 6:161006077-161006077_C_T LPA 298 272 1
# 15078 6:160966536-160966536_A_ACC LPA 2 2 2
# 15079 6:160966537-160966539_TTG_T LPA 2 2 2
# 15082 6:160968821-160968822_AT_A LPA 1 0 2
# 15102 6:160969584-160969584_G_C LPA 1 4 2
# 15107 6:160977184-160977185_AC_A LPA 0 1 2
# 15114 6:160978560-160978560_G_A LPA 1 0 2
# 15116 6:160998167-160998167_C_T LPA 1 5 2
# 15142 6:161007480-161007480_C_T LPA 1 0 2
# 15153 6:161010638-161010638_C_T LPA 1 0 2
# 15163 6:161012014-161012014_G_T LPA 0 1 2
# 15171 6:161015102-161015102_C_A LPA 1 0 2
# 15177 6:161016494-161016494_C_CA LPA 1 0 2
# 15180 6:161020531-161020531_C_T LPA 0 1 2
# 15210 6:161071369-161071369_C_T LPA 1 0 2
# 15061 6:160952816-160952816_T_C LPA 118 103 3
# 15068 6:160961137-160961137_T_C LPA 257 248 3
# 15073 6:160963771-160963771_C_A LPA 50 44 4
# 15074 6:160963774-160963774_C_G LPA 21 32 4
# 15075 6:160963774-160963774_C_T LPA 0 2 4
# 15081 6:160966559-160966559_G_A LPA 17 28 4
# 15135 6:161006105-161006105_C_T LPA 88 100 4
# 15148 6:161007619-161007619_G_T LPA 26 20 4
# 15195 6:161022107-161022107_C_T LPA 79 90 4
# category_factor var.BF original.index
# 15129 1 1.634504e-07 1
# 15078 2 6.059431e-01 2
# 15079 2 6.059431e-01 2
# 15082 2 1.452084e+00 2
# 15102 2 1.896752e-01 2
# 15107 2 5.479161e-01 2
# 15114 2 1.452084e+00 2
# 15116 2 1.405388e-01 2
# 15142 2 1.452084e+00 2
# 15153 2 1.452084e+00 2
# 15163 2 5.479161e-01 2
# 15171 2 1.452084e+00 2
# 15177 2 1.452084e+00 2
# 15180 2 5.479161e-01 2
# 15210 2 1.452084e+00 2
# 15061 3 7.596685e-02 3
# 15068 3 3.931896e-08 3
# 15073 4 9.300675e-02 4
# 15074 4 5.040937e-02 4
# 15075 4 3.291740e-01 4
# 15081 4 5.484098e-02 4
# 15135 4 2.698210e-02 4
# 15148 4 2.857868e-01 4
# 15195 4 2.795875e-02 4
Here are some details about the computing environment, including the versions of R, and the R packages, used to generate these results.
sessionInfo()
# R version 4.3.2 (2023-10-31 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 11 x64 (build 26100)
#
# Matrix products: default
#
#
# locale:
# [1] LC_COLLATE=English_United States.utf8
# [2] LC_CTYPE=English_United States.utf8
# [3] LC_MONETARY=English_United States.utf8
# [4] LC_NUMERIC=C
# [5] LC_TIME=English_United States.utf8
#
# time zone: America/Chicago
# tzcode source: internal
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] mirage_0.1.0.0
#
# loaded via a namespace (and not attached):
# [1] vctrs_0.6.5 crayon_1.5.3 cli_3.6.2 knitr_1.49
# [5] rlang_1.1.2 xfun_0.50.6 textshaping_1.0.0 jsonlite_1.8.9
# [9] prettyunits_1.2.0 htmltools_0.5.8.1 ragg_1.3.3 sass_0.4.9
# [13] hms_1.1.3 rmarkdown_2.29 evaluate_1.0.3 jquerylib_0.1.4
# [17] fastmap_1.2.0 progress_1.2.3 yaml_2.3.8 lifecycle_1.0.4
# [21] compiler_4.3.2 codetools_0.2-19 fs_1.6.5 pkgconfig_2.0.3
# [25] htmlwidgets_1.6.4 rstudioapi_0.17.1 systemfonts_1.2.1 digest_0.6.33
# [29] R6_2.6.1 bslib_0.9.0 tools_4.3.2 pkgdown_2.1.1
# [33] cachem_1.1.0 desc_1.4.3