MIRAGE can analyze both variant sets and gene sets, although its name emphasizes gene sets. A mixture model is used to capture the uncertainty about whether a variant or a gene confers risk, and the risk probability depends on the annotations. We start with the simpler case, the variant set.

Variant set (VS) analysis mirage_vs

Suppose there are $K$ variant groups, and consider a single variant group. Variant $j$ in the group is modeled as a mixture of risk variant and non-risk variant, with a Bernoulli-distributed indicator:

$$P(Z_j=1)=\eta$$

$Z_j$ is a binary indicator: $Z_j=1$ if variant $j$ is a risk variant and $Z_j=0$ if it is not. All variants within the same group are assumed to be homogeneous, sharing similar effect sizes, and $\eta$ is the proportion of risk variants in the group. The posterior probability (PP) of being a risk variant is

$$P(Z_j=1 \mid X_j, T_j)=\frac{P(Z_j=1, X_j, T_j)}{P(X_j, T_j)}=\frac{\eta BF_j}{\eta BF_j+1-\eta}$$

where $BF_j=\frac{P(X_j, T_j \mid Z_j=1)}{P(X_j, T_j \mid Z_j=0)}$ is the Bayes factor of variant $j$, and $X_j$ and $T_j$ are the rare allele counts in cases and in cases and controls combined, respectively.
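As a rough illustration of the posterior formula above, the following Python sketch computes the PP of a single variant from $\eta$ and $BF_j$. It is not the implementation used by `mirage_vs`. The function name `posterior_risk_probability` and the example values are hypothetical, and the Bayes factor is assumed to have been computed elsewhere.

```python
def posterior_risk_probability(eta: float, bayes_factor: float) -> float:
    """Return eta * BF / (eta * BF + 1 - eta) for one variant.

    Hypothetical helper for illustration only; the Bayes factor is
    assumed to be supplied by the caller.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if bayes_factor <= 0.0:
        raise ValueError("bayes_factor must be positive")
    numerator = eta * bayes_factor
    return numerator / (numerator + 1.0 - eta)


# Made-up inputs: 10% of variants in the group are risk variants
# and the data favor the risk model 9 to 1.
example_pp = posterior_risk_probability(eta=0.1, bayes_factor=9.0)
print(round(example_pp, 3))
```

With these made-up inputs the prior odds of 1:9 and the Bayes factor of 9 cancel, so the posterior probability is 0.5. A Bayes factor of 1 leaves the posterior equal to the prior $\eta$.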

Gene set analysis mirage

In a gene set, every gene is modeled as a mixture of risk gene and non-risk gene as

$$P(U_i=1)=\delta$$

Gene $i$ is a risk gene when $U_i=1$ and a non-risk gene when $U_i=0$; $\delta$ is the proportion of risk genes in the gene set. If gene $i$ is a risk gene and its variant $(i,j)$ belongs to variant group $k$, then

$$P(Z_{ij}=1)=\eta_k$$

$\eta_k$ is the proportion of risk variants in variant group $k$, whose variants may come from multiple different genes. The posterior probability (PP) of gene $i$ being a risk gene is

$$PP_i=\frac{\delta B_i}{\delta B_i+1-\delta}$$

Here $B_i$ is the Bayes factor of gene $i$. More details can be found in the reference.
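The gene-level calculation can be sketched in Python as follows. The text above does not give the formula for $B_i$, so the sketch makes two explicit assumptions: variants are conditionally independent given $U_i$, and a non-risk gene carries no risk variants. Under these assumptions $B_i=\prod_j\left(\eta_k BF_{ij}+1-\eta_k\right)$, where $k$ is the group of variant $(i,j)$. The reference should be consulted for the definition MIRAGE actually uses. The function names and input values are hypothetical.

```python
import math
from typing import Sequence


def gene_bayes_factor(variant_bfs: Sequence[float],
                      variant_etas: Sequence[float]) -> float:
    """Combine per-variant Bayes factors into a gene-level Bayes factor.

    ASSUMPTION (not stated in the text above): variants are independent
    given U_i, and all variants of a non-risk gene are non-risk, so each
    variant contributes the factor eta_k * BF_ij + (1 - eta_k).
    """
    if len(variant_bfs) != len(variant_etas):
        raise ValueError("need one eta per variant")
    return math.prod(eta * bf + 1.0 - eta
                     for bf, eta in zip(variant_bfs, variant_etas))


def gene_posterior_probability(delta: float, gene_bf: float) -> float:
    """PP_i = delta * B_i / (delta * B_i + 1 - delta)."""
    numerator = delta * gene_bf
    return numerator / (numerator + 1.0 - delta)


# Made-up gene with two variants from two different variant groups.
b_i = gene_bayes_factor(variant_bfs=[11.0, 1.0], variant_etas=[0.1, 0.5])
pp_i = gene_posterior_probability(delta=0.5, gene_bf=b_i)
print(round(b_i, 3), round(pp_i, 3))
```

In this example the first variant contributes a factor of 2 and the second, whose Bayes factor is 1, contributes a factor of 1. So $B_i=2$, and with $\delta=0.5$ the posterior probability is $2/3$.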

References

A Bayesian method for rare variant analysis using functional annotations and its application to Autism: https://www.biorxiv.org/content/10.1101/828061v1