cTWAS analysis using summary statistics
ctwas_sumstats(
z_snp,
weights,
region_info,
LD_map,
snp_map,
z_gene = NULL,
thin = 0.1,
niter_prefit = 3,
niter = 30,
L = 5,
init_group_prior = NULL,
init_group_prior_var = NULL,
group_prior_var_structure = c("shared_type", "shared_context", "shared_nonSNP",
"shared_all", "independent"),
filter_L = TRUE,
filter_nonSNP_PIP = FALSE,
min_nonSNP_PIP = 0.5,
min_p_single_effect = 0.8,
maxSNP = Inf,
min_var = 2,
min_gene = 1,
min_group_size = 100,
use_null_weight = TRUE,
coverage = 0.95,
min_abs_corr = 0.1,
LD_format = c("rds", "rdata", "mtx", "csv", "txt", "custom"),
LD_loader_fun = NULL,
snpinfo_loader_fun = NULL,
force_compute_cor = FALSE,
save_cor = FALSE,
cor_dir = NULL,
outputdir = NULL,
outname = "ctwas",
ncore = 1,
ncore_LD = max(ncore - 1, 1),
seed = 99,
logfile = NULL,
verbose = FALSE,
...
)
A data frame with four columns: "id", "A1", "A2", "z", giving the z-scores for SNPs. "A1" is the effect allele; "A2" is the other allele.
a list of pre-processed prediction weights.
a data frame of region definitions.
a data frame with filenames of LD matrices and SNP information for the regions.
a list of data frames with SNP-to-region map for the reference.
A data frame with columns: "id", "z", giving the z-scores for genes.
The proportion of SNPs to be used for estimating parameters and screening regions.
the number of iterations of the E-M algorithm to perform during the initial parameter estimation step.
the number of iterations of the E-M algorithm to perform during the complete parameter estimation step.
the number of effects for susie during the fine mapping steps.
a vector of initial values of prior inclusion probabilities for SNPs and genes.
a vector of initial values of prior variances for SNPs and gene effects.
a string indicating the structure to put on the prior variance parameters. "shared_type" allows all groups in one molecular QTL type to share the same variance parameter. "shared_context" allows all groups in one context (tissue, cell type, condition) to share the same variance parameter. "shared_nonSNP" allows all non-SNP groups to share the same variance parameter. "shared_all" allows all groups to share the same variance parameter. "independent" allows all groups to have their own separate variance parameters.
If TRUE, keep only regions with L > 0 when screening regions.
If TRUE, keep only regions with total non-SNP PIP >= min_nonSNP_PIP when screening regions.
Regions with non-SNP PIP >= min_nonSNP_PIP will be selected for fine-mapping using all SNPs.
Regions with probability greater than min_p_single_effect of having 1 or fewer effects will be used for parameter estimation.
Inf or integer. Maximum number of SNPs in a region. Default is Inf, no limit. This can be useful if there are many SNPs in a region and you don't have enough memory to run the program.
minimum number of variables (SNPs and genes) in a region when estimating parameters and screening regions.
minimum number of genes in a region when estimating parameters and screening regions.
Minimum number of genes in a group. Groups with fewer than min_group_size genes will be removed from the analysis.
If TRUE, allow for a probability of no effect in susie.
A number between 0 and 1 specifying the "coverage" of the estimated credible sets.
Minimum absolute correlation allowed in a credible set.
file format of the LD matrices. If "custom", use a user-defined LD_loader_fun() function to load the LD matrix.
a user-defined function to load the LD matrix when LD_format = "custom".
a user-defined function to load SNP information files, for use when the SNP information files are not in the standard cTWAS reference format.
If TRUE, force computing correlation (R) matrices.
If TRUE, save correlation (R) matrices to cor_dir.
The directory to store correlation (R) matrices.
The directory to store output. If specified, save outputs to the directory.
The output name.
The number of cores used to parallelize computing over regions.
The number of cores used to parallelize computing correlation matrices during the region-screening and fine-mapping steps that use LD.
seed for random sampling when thinning the SNPs in region data.
The log filename. If NULL, print log info on screen.
If TRUE, print detailed messages.
Additional arguments passed to susie_rss.
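The input formats described above can be illustrated with a minimal sketch. It builds a z_snp data frame with the four documented columns from hypothetical GWAS effect sizes and standard errors, and defines a hypothetical custom LD loader of the kind that could be supplied as LD_loader_fun when LD_format = "custom". The input column names (rsid, effect_allele, other_allele, beta, se), the SNP values, the loader name load_ld_tsv, and the assumption that a loader takes a single file path and returns a numeric matrix are all illustrative and are not taken from the cTWAS documentation.

```r
# Hypothetical GWAS summary statistics; the ids and values are made up.
gwas <- data.frame(
  rsid          = c("rs1", "rs2", "rs3"),
  effect_allele = c("A", "C", "G"),
  other_allele  = c("G", "T", "A"),
  beta          = c(0.12, -0.05, 0.30),
  se            = c(0.04, 0.05, 0.10),
  stringsAsFactors = FALSE
)

# z_snp with the documented columns: "id", "A1" (effect allele),
# "A2" (other allele), and "z" (here computed as beta / se).
z_snp <- data.frame(
  id = gwas$rsid,
  A1 = gwas$effect_allele,
  A2 = gwas$other_allele,
  z  = gwas$beta / gwas$se,
  stringsAsFactors = FALSE
)

# Hypothetical custom LD loader. ASSUMPTION: the loader receives one file
# path and returns a numeric matrix; check the package documentation for
# the interface that ctwas_sumstats() actually expects.
load_ld_tsv <- function(file) {
  ld <- as.matrix(read.table(file, header = FALSE, sep = "\t"))
  dimnames(ld) <- NULL          # drop the default V1, V2, ... column names
  storage.mode(ld) <- "numeric" # make sure the matrix is numeric
  ld
}
```

Under these assumptions, the loader would be passed as LD_loader_fun = load_ld_tsv together with LD_format = "custom".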
a list, including z_gene, estimated parameters, region_data, cross-boundary genes, screening region results, and fine-mapping results.
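A call using the arguments documented above might be wrapped as in the following sketch. The wrapper name run_ctwas_sketch is hypothetical; every argument passed to ctwas_sumstats() is taken from the usage shown above, and the values are the documented defaults apart from the explicit choice of "shared_type". The inputs weights, region_info, LD_map, and snp_map are assumed to have been prepared beforehand and are not constructed here.

```r
# Hypothetical wrapper around ctwas_sumstats(); only the wrapper is new.
run_ctwas_sketch <- function(z_snp, weights, region_info, LD_map, snp_map,
                             outputdir = NULL) {
  # Check the documented z_snp columns before starting the analysis.
  stopifnot(all(c("id", "A1", "A2", "z") %in% colnames(z_snp)))

  if (!requireNamespace("ctwas", quietly = TRUE)) {
    stop("The 'ctwas' package is required for this sketch.")
  }

  ctwas::ctwas_sumstats(
    z_snp       = z_snp,
    weights     = weights,
    region_info = region_info,
    LD_map      = LD_map,
    snp_map     = snp_map,
    thin        = 0.1,                            # documented default
    group_prior_var_structure = "shared_type",    # one variance per QTL type
    ncore       = 1,
    outputdir   = outputdir,                      # NULL: do not save outputs
    outname     = "ctwas"
  )
}
```

The wrapper returns whatever ctwas_sumstats() returns, that is, the list described above.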