Sum of Single Effects (SuSiE) Regression using summary statistics

susie_rss performs sum of single-effect linear regression with z scores; all posterior calculations are for z-scores. This function fits the regression model \(z = \sum_l R*b_l + e\), where e is \(N(0,R)\) and \(\sum_l b_l\) is a p-vector of effects to be estimated. The required summary data are the p by p correlation matrix, R, and the p-vector z.

susie_rss(
  z,
  R,
  maf = NULL,
  maf_thresh = 0,
  z_ld_weight = 0,
  L = 10,
  prior_variance = 50,
  residual_variance = NULL,
  prior_weights = NULL,
  null_weight = NULL,
  estimate_residual_variance = FALSE,
  estimate_prior_variance = TRUE,
  estimate_prior_method = c("optim", "EM", "simple"),
  check_null_threshold = 0,
  prior_tol = 1e-09,
  max_iter = 100,
  s_init = NULL,
  intercept_value = 0,
  coverage = 0.95,
  min_abs_corr = 0.5,
  tol = 0.001,
  verbose = FALSE,
  track_fit = FALSE,
  check_R = FALSE,
  r_tol = 1e-08,
  refine = FALSE
)

Arguments

z: A p-vector of z scores.
R: A p by p symmetric, positive semidefinite correlation matrix.
maf: Minor allele frequency; to be used along with maf_thresh to filter input summary statistics.
maf_thresh: Variants having a minor allele frequency smaller than this threshold are not used.
z_ld_weight: This feature is not recommended. The weights assigned to the z scores in the LD matrix. If z_ld_weight > 0, the LD matrix used in the model is cov2cor((1-w)*R + w*tcrossprod(z)), where w = z_ld_weight.
L: Number of components (nonzero coefficients) in the susie regression model. If L is larger than the number of covariates, p, L is set to p.
prior_variance: The prior variance. It is either a scalar or a vector of length L.
residual_variance: Variance of the residual. If it is not specified, we set it to 1.
prior_weights: A vector of length p, in which each entry gives the prior probability that SNP j has non-zero effect.
null_weight: Prior probability of no effect (a number between 0 and 1, and cannot be exactly 1).
estimate_residual_variance: The residual variance is fixed to the value supplied by residual_variance. We don't estimate residual variance in susie_rss.
estimate_prior_variance: If estimate_prior_variance = TRUE, the prior variance is estimated (this is a separate parameter for each of the L effects). If provided, prior_variance is then used as an initial value for the optimization. When estimate_prior_variance = FALSE, the prior variance for each of the L effects is determined by the value supplied to prior_variance.
estimate_prior_method: The method used for estimating prior variance. When estimate_prior_method = "simple" is used, the likelihood at the specified prior variance is compared to the likelihood at a variance of zero, and the setting with the larger likelihood is retained.
check_null_threshold: When the prior variance is estimated, compare the estimate with the null, and set the prior variance to zero unless the log-likelihood using the estimate is larger by this threshold amount. For example, if you set check_null_threshold = 0.1, this will "nudge" the estimate towards zero when the difference in log-likelihoods is small. A note of caution that setting this to a value greater than zero may lead the IBSS fitting procedure to occasionally decrease the ELBO.
prior_tol: When the prior variance is estimated, compare the estimated value to prior_tol at the end of the computation, and exclude a single effect from PIP computation if the estimated prior variance is smaller than this tolerance value.
max_iter: Maximum number of IBSS iterations to perform.
s_init: A previous susie fit with which to initialize.
intercept_value: The intercept. (The intercept cannot be estimated from centered summary data.) This setting will be used by coef to assign an intercept value, mainly for consistency with susie. Set to NULL if you want coef not to include an intercept term (and so only a p-vector is returned).
coverage: A number between 0 and 1 specifying the “coverage” of the estimated confidence sets.
min_abs_corr: Minimum absolute correlation allowed in a credible set. The default, 0.5, corresponds to a squared correlation of 0.25, which is a commonly used threshold for genotype data in genetic studies.
tol: A small, non-negative number specifying the convergence tolerance for the IBSS fitting procedure. The fitting procedure will halt when the difference in the variational lower bound, or “ELBO” (the objective function to be maximized), is less than tol.
verbose: If verbose = TRUE, the algorithm's progress, and a summary of the optimization settings, are printed to the console.
track_fit: If track_fit = TRUE, trace is also returned containing detailed information about the estimates at each iteration of the IBSS fitting procedure.
check_R: If check_R = TRUE, check that R is positive semidefinite.
r_tol: Tolerance level for eigenvalue check of positive semidefinite matrix of R.
refine: If refine = TRUE, we use a procedure to help SuSiE get out of local optimum.

Value

A "susie" object with some or all of the following elements:

alpha: An L by p matrix of posterior inclusion probabilites.
mu: An L by p matrix of posterior means, conditional on inclusion.
mu2: An L by p matrix of posterior second moments, conditional on inclusion.
lbf: log-Bayes Factor for each single effect.
lbf_variable: log-Bayes Factor for each variable and single effect.
intercept: Fixed Intercept.
sigma2: Fixed Residual variance.
V: Prior variance of the non-zero elements of b, equal to scaled_prior_variance * var(Y).
elbo: The value of the variational lower bound, or “ELBO” (objective function to be maximized), achieved at each iteration of the IBSS fitting procedure.
fitted: Vector of length n containing the fitted values of the outcome.
sets: Credible sets estimated from model fit; see susie_get_cs for details.
pip: A vector of length p giving the (marginal) posterior inclusion probabilities for all p covariates.
niter: Number of IBSS iterations that were performed.
converged: TRUE or FALSE indicating whether the IBSS converged to a solution within the chosen tolerance level.
Rr: An p-vector of t(X) times fitted values, X %*% colSums(alpha*mu).

Examples

set.seed(1)
n = 1000
p = 1000
beta = rep(0,p)
beta[1:4] = 1
X = matrix(rnorm(n*p),nrow = n,ncol = p)
X = scale(X,center = TRUE,scale = TRUE)
y = drop(X %*% beta + rnorm(n))

input_ss <- compute_ss(X,y,standardize = TRUE)
#> Error in compute_ss(X, y, standardize = TRUE): could not find function "compute_ss"
ss   <- univariate_regression(X,y)
#> Error in univariate_regression(X, y): could not find function "univariate_regression"
R    <- with(input_ss,cov2cor(XtX))
#> Error in with(input_ss, cov2cor(XtX)): object 'input_ss' not found
zhat <- with(ss,betahat/sebetahat)
#> Error in with(ss, betahat/sebetahat): object 'ss' not found
res  <- susie_rss(zhat,R,L = 10)
#> Error in susie_rss(zhat, R, L = 10): could not find function "susie_rss"