R/ctwas_preprocess_weights.R
preprocess_weights.Rd
Preprocess PredictDB/FUSION weights and harmonize with LD reference
preprocess_weights(
weight_file,
region_info,
gwas_snp_ids,
snp_map,
LD_map = NULL,
weight_format = c("PredictDB", "FUSION"),
drop_strand_ambig = TRUE,
filter_protein_coding_genes = TRUE,
scale_predictdb_weights = TRUE,
load_predictdb_LD = TRUE,
fusion_method = c("lasso", "enet", "top1", "blup", "bslmm", "best.cv"),
fusion_genome_version = "b38",
fusion_top_n_snps,
LD_format = c("rds", "rdata", "csv", "txt", "custom"),
LD_loader_fun = NULL,
ncore = 1,
logfile = NULL
)
filename of the '.db' file for PredictDB weights; or the directory containing '.wgt.RDat' files for FUSION weights.
a data frame of region definitions.
a vector of SNP IDs in GWAS summary statistics (z_snp$id).
a list of data frames with SNP-to-region map for the reference.
a data frame with filenames of LD matrices and SNP information for all regions. Required when load_predictdb_LD = FALSE.
a string specifying the format of the weight file: "PredictDB" or "FUSION".
If TRUE, remove strand-ambiguous variants (A/T, G/C).
If TRUE, keep protein coding genes only. This option is only for PredictDB weights.
If TRUE, scale PredictDB weights by the variance. This is because PredictDB weights assume that variant genotypes are not standardized, but our implementation assumes standardized variant genotypes. This option is only for PredictDB weights.
If TRUE, load pre-computed LD among weight SNPs. This option is only for PredictDB weights.
a string specifying which method to use from the FUSION models. The "best.cv" option uses the best model under cross-validation (smallest p-value).
a string specifying the genome version of the FUSION models.
a number, specifying the top n weight SNPs included in FUSION models. By default, use all weight SNPs.
file format for LD matrix. If "custom", use a user-defined LD_loader_fun() function to load LD matrix.
a user-defined function to load LD matrix when LD_format = "custom".
The number of cores used to parallelize computation.
the log file. If NULL, log information is printed to the screen.
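The reasoning behind scale_predictdb_weights can be illustrated with a small sketch. It assumes that converting a weight from the unstandardized to the standardized genotype scale means multiplying it by the genotype standard deviation (the square root of the variance); the genotypes, weights, and variable names below are made up for illustration and are not taken from the package.

```r
# Minimal sketch with simulated data (all names and values are hypothetical).
set.seed(1)
n <- 200
X <- cbind(snp1 = rbinom(n, 2, 0.3),   # unstandardized genotypes (0/1/2)
           snp2 = rbinom(n, 2, 0.1))
w_raw <- c(0.5, -1.2)                  # weights on the unstandardized scale

sd_x  <- apply(X, 2, sd)               # per-SNP genotype standard deviation
w_std <- w_raw * sd_x                  # assumed conversion to the standardized scale
X_std <- scale(X)                      # center and scale each SNP

pred_raw <- drop(X %*% w_raw)          # prediction from raw genotypes
pred_std <- drop(X_std %*% w_std)      # prediction from standardized genotypes

# The two predictions differ only by a constant offset (the weighted mean genotype).
offset <- pred_raw - pred_std
```

Because the offset is the same for every individual, the two sets of weights describe the same predicted expression up to centering.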
a list of processed weights
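For LD_format = "custom", a loader along the following lines could be supplied as LD_loader_fun. This is only a sketch: it assumes the loader is called with a single file path and is expected to return a numeric matrix, and the function name load_ld_tsv and the tab-delimited layout are hypothetical rather than part of the package.

```r
# Hypothetical loader: assumes one file path in, one numeric matrix out.
load_ld_tsv <- function(file) {
  R <- as.matrix(read.table(file, header = FALSE, sep = "\t"))
  dimnames(R) <- NULL              # drop the V1, V2, ... column names
  storage.mode(R) <- "numeric"     # make sure values are stored as doubles
  R
}

# Round-trip check with a made-up 2 x 2 LD matrix written to a temporary file.
R_demo <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)
tmp <- tempfile(fileext = ".tsv")
write.table(R_demo, tmp, sep = "\t", row.names = FALSE, col.names = FALSE)
R_loaded <- load_ld_tsv(tmp)
unlink(tmp)
```

Under these assumptions, such a function would be passed as LD_loader_fun = load_ld_tsv together with LD_format = "custom".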