Causal inference for TWAS
ctwas(
pgenfs,
exprfs,
Y,
ld_regions = c("EUR", "ASN", "AFR"),
ld_regions_version = c("b37", "b38"),
ld_regions_custom = NULL,
thin = 1,
prob_single = 0.8,
max_snp_region = Inf,
rerun_gene_PIP = 0.8,
niter1 = 3,
niter2 = 30,
L = 5,
group_prior = NULL,
group_prior_var = NULL,
estimate_group_prior = T,
estimate_group_prior_var = T,
use_null_weight = T,
coverage = 0.95,
standardize = T,
ncore = 1,
outputdir = getwd(),
outname = NULL,
logfile = NULL
)
A character vector of .pgen or .bed files, one file per chromosome, in the order 1 to 22. The length of this vector therefore needs to be 22. If .pgen files are given, the matching .pvar and .psam files are assumed to be present in the same directory. If .bed files are given, the matching .bim and .fam files are assumed to be present in the same directory.
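One way to assemble the 22 genotype paths is sketched below. The directory name and the `chr<N>.pgen` naming pattern are hypothetical; substitute whatever layout your data uses.

```r
# Hypothetical directory and file-name pattern; adjust to your own data.
geno_dir <- "genotypes"
pgenfs <- file.path(geno_dir, paste0("chr", 1:22, ".pgen"))

# One file per chromosome, ordered 1 to 22.
stopifnot(length(pgenfs) == 22)

# Companion files that are expected next to each .pgen file.
pvarfs <- sub("\\.pgen$", ".pvar", pgenfs)
psamfs <- sub("\\.pgen$", ".psam", pgenfs)

# On real data, confirm that every file exists before calling ctwas():
# stopifnot(file.exists(c(pgenfs, pvarfs, psamfs)))
```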
A character vector of `.expr` or `.expr.gz` files, one file per chromosome, in the order 1 to 22. The length of this vector therefore needs to be 22. A `.expr.gz` file is a gzip-compressed `.expr` file. A `.expr` file holds a matrix of imputed expression values, with one row per sample and one column per gene. Its sample order is the same as in the files provided by `pgenfs`. The corresponding `.exprvar` files are assumed to be present in the same directory. `.exprvar` files are tab-delimited text files with the following columns:
chromosome number, numeric
gene boundary position, the smaller value
gene boundary position, the larger value
gene id
The rows of each `.exprvar` file should be in the same order as the columns of the corresponding `.expr` file.
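The relationship between a `.expr` matrix and its `.exprvar` annotation can be checked with a few lines of base R. The sketch below uses toy data; the column names (`chrom`, `start`, `end`, `id`) and the presence of a header line are assumptions made for illustration, not necessarily what ctwas expects.

```r
# Toy imputed-expression matrix: 3 samples (rows) by 2 genes (columns).
expr <- matrix(c(0.1, -0.3, 0.5, 1.2, 0.0, -0.7), nrow = 3,
               dimnames = list(NULL, c("geneA", "geneB")))

# Matching gene annotation, one row per column of `expr`.
# Column names are placeholders chosen for this example.
exprvar <- data.frame(chrom = c(1, 1),
                      start = c(1000, 5000),
                      end   = c(2000, 9000),
                      id    = c("geneA", "geneB"),
                      stringsAsFactors = FALSE)

# Write the annotation as tab-delimited text and read it back.
exprvar_file <- tempfile(fileext = ".exprvar")
write.table(exprvar, exprvar_file, sep = "\t", quote = FALSE, row.names = FALSE)
exprvar_in <- read.delim(exprvar_file, stringsAsFactors = FALSE)

# Annotation rows must follow the column order of the expression matrix,
# and the smaller boundary position must not exceed the larger one.
stopifnot(identical(exprvar_in$id, colnames(expr)),
          all(exprvar_in$start <= exprvar_in$end))
```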
A vector of length n containing the phenotype, in the same sample order as the files provided by `pgenfs` (as defined in the .psam or .fam files).
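Because the phenotype must follow the genotype sample order, it usually needs to be reordered first. A minimal sketch with toy data follows; the phenotype table and its column names are hypothetical.

```r
# Toy .fam-style sample table giving the sample order of the genotype files.
# Real .fam files have six columns; only the sample ID matters here.
fam <- data.frame(FID = c("f1", "f2", "f3"), IID = c("s1", "s2", "s3"),
                  stringsAsFactors = FALSE)

# Hypothetical phenotype table, stored in a different order.
pheno <- data.frame(IID = c("s3", "s1", "s2"), value = c(2.5, 0.7, 1.1),
                    stringsAsFactors = FALSE)

# Reorder the phenotype so that it follows the genotype sample order.
idx <- match(fam$IID, pheno$IID)
stopifnot(!anyNA(idx))  # every genotyped sample needs a phenotype
Y <- pheno$value[idx]
```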
A string representing the population to use for defining LD regions. These LD regions were previously defined by ldetect. The user can also provide custom LD regions matching the genotype data; see `ld_regions_custom`.
A string representing the genome reference build ("b37", "b38") to use for defining LD regions. See `ld_regions`.
A BED-format file defining LD regions. The default is NULL; when specified, `ld_regions` and `ld_regions_version` will be ignored.
The proportion of SNPs to be used for the parameter estimation and initial fine mapping steps. Smaller values of `thin` reduce runtime at the expense of accuracy. The fine mapping step is rerun using full SNPs for regions with strong gene signals; see `rerun_gene_PIP`.
Blocks with probability greater than `prob_single` of having 1 or fewer effects will be used for parameter estimation.
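To make the "1 or fewer effects" criterion concrete, the sketch below computes such a probability under the assumption that each variable in a block is included independently with its own probability. This independence model, the helper name `prob_at_most_one`, and the toy probabilities are illustrative assumptions; ctwas may compute the quantity differently.

```r
# Probability that a block contains 0 or 1 effects, assuming each variable j
# is included independently with probability p[j] (illustrative assumption).
prob_at_most_one <- function(p) {
  p_none <- prod(1 - p)
  p_one <- sum(vapply(seq_along(p),
                      function(j) p[j] * prod(1 - p[-j]),
                      numeric(1)))
  p_none + p_one
}

prob_single <- 0.8

# Made-up inclusion probabilities for two blocks.
block_probs <- list(blockA = c(0.01, 0.02, 0.05),
                    blockB = c(0.5, 0.5, 0.3))

# Keep the blocks whose probability of at most one effect exceeds the threshold.
p_le1 <- vapply(block_probs, prob_at_most_one, numeric(1))
blocks_for_estimation <- names(p_le1)[p_le1 > prob_single]
```

In this toy case only `blockA` passes the threshold, since `blockB` has a substantial chance of carrying two or more effects.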
Inf or integer. Maximum number of SNPs in a region. Default is Inf, no limit. This can be useful if there are many SNPs in a region and you don't have enough memory to run the program. This applies to the last rerun step (using full SNPs and rerun susie for regions with strong gene signals) only.
If `thin` < 1, blocks whose maximum gene PIP is greater than `rerun_gene_PIP` will be rerun using full SNPs. If `rerun_gene_PIP` is 0, then all blocks will be rerun with full SNPs.
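The rerun rule can be written out as a small selection function. This is a sketch of the rule as described here, not the package's internal code; the helper name `select_rerun_blocks` and the PIP values are made up.

```r
# Hypothetical helper: pick the blocks to rerun with the full SNP set.
select_rerun_blocks <- function(max_gene_pip, thin, rerun_gene_PIP) {
  if (thin >= 1) return(character(0))                   # no thinning, no rerun
  if (rerun_gene_PIP == 0) return(names(max_gene_pip))  # rerun every block
  names(max_gene_pip)[max_gene_pip > rerun_gene_PIP]
}

# Made-up maximum gene PIP per block.
max_gene_pip <- c(block1 = 0.95, block2 = 0.40, block3 = 0.81)

rerun_default <- select_rerun_blocks(max_gene_pip, thin = 0.1, rerun_gene_PIP = 0.8)
rerun_all     <- select_rerun_blocks(max_gene_pip, thin = 0.1, rerun_gene_PIP = 0)
rerun_none    <- select_rerun_blocks(max_gene_pip, thin = 1,   rerun_gene_PIP = 0.8)
```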
the number of iterations of the E-M algorithm to perform during the initial parameter estimation step
the number of iterations of the E-M algorithm to perform during the complete parameter estimation step
the number of effects for susie during the fine mapping steps
a vector of two prior inclusion probabilities for SNPs and genes. This is ignored if `estimate_group_prior = T`
a vector of two prior variances for SNPs and gene effects. This is ignored if `estimate_group_prior_var = T`
TRUE/FALSE. If TRUE, the prior inclusion probabilities for SNPs and genes are estimated using the data. If FALSE, `group_prior` must be specified
TRUE/FALSE. If TRUE, the prior variances for SNPs and genes are estimated using the data. If FALSE, `group_prior_var` must be specified
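The dependencies among the four prior-related arguments can be checked before starting a long run. The helper below is hypothetical and only mirrors the rules stated here. The numeric values are placeholders, and the order of the two elements in each vector should be confirmed against the package itself.

```r
# Hypothetical pre-flight check: a fixed prior must be supplied whenever the
# corresponding estimate_* flag is switched off.
check_prior_args <- function(estimate_group_prior = TRUE, group_prior = NULL,
                             estimate_group_prior_var = TRUE,
                             group_prior_var = NULL) {
  if (!estimate_group_prior) {
    stopifnot(!is.null(group_prior), length(group_prior) == 2,
              all(group_prior >= 0 & group_prior <= 1))
  }
  if (!estimate_group_prior_var) {
    stopifnot(!is.null(group_prior_var), length(group_prior_var) == 2,
              all(group_prior_var > 0))
  }
  invisible(TRUE)
}

# Defaults: both priors are estimated, so nothing needs to be supplied.
check_prior_args()

# Fixed priors: both vectors must be given (placeholder values).
check_prior_args(estimate_group_prior = FALSE, group_prior = c(1e-4, 0.01),
                 estimate_group_prior_var = FALSE, group_prior_var = c(10, 20))
```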
TRUE/FALSE. If TRUE, allow for a probability of no effect in susie
A number between 0 and 1 specifying the “coverage” of the estimated confidence sets
TRUE/FALSE. If TRUE, all variables are standardized to unit variance
The number of cores used to parallelize susie over regions
a string, the directory to store output
a string, the output name
the log file, if NULL will print log info on screen
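Putting the arguments together, a call might be assembled as in the sketch below. All paths, naming patterns, and the simulated phenotype are placeholders. The call itself is left commented out because it needs the ctwas package and real input files, and it assumes the function is exported as `ctwas::ctwas`.

```r
# Placeholder inputs; directory names and file patterns are hypothetical.
pgenfs <- file.path("genotypes", paste0("chr", 1:22, ".pgen"))
exprfs <- file.path("expression", paste0("chr", 1:22, ".expr.gz"))
Y <- rnorm(100)  # stand-in phenotype; use the real one, in sample order

# Arguments not listed here keep the defaults shown in the usage above.
ctwas_args <- list(
  pgenfs = pgenfs,
  exprfs = exprfs,
  Y = Y,
  ld_regions = "EUR",
  ld_regions_version = "b37",
  thin = 0.1,            # thin SNPs for parameter estimation
  rerun_gene_PIP = 0.8,  # rerun strong-signal blocks with full SNPs
  ncore = 1,
  outputdir = tempdir(),
  outname = "example"
)

# With the ctwas package installed and real files in place:
# do.call(ctwas::ctwas, ctwas_args)
```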