Preprocess PredictDB/FUSION weights and harmonize with LD reference

preprocess_weights(
  weight_file,
  region_info,
  gwas_snp_ids,
  snp_map,
  LD_map = NULL,
  weight_format = c("PredictDB", "FUSION"),
  drop_strand_ambig = TRUE,
  filter_protein_coding_genes = TRUE,
  scale_predictdb_weights = TRUE,
  load_predictdb_LD = TRUE,
  fusion_method = c("lasso", "enet", "top1", "blup", "bslmm", "best.cv"),
  fusion_genome_version = "b38",
  fusion_top_n_snps,
  LD_format = c("rds", "rdata", "csv", "txt", "custom"),
  LD_loader_fun = NULL,
  ncore = 1,
  logfile = NULL
)

Arguments

weight_file

path to the '.db' file for PredictDB weights, or the directory containing the '.wgt.RDat' files for FUSION weights.

region_info

a data frame of region definitions.

gwas_snp_ids

a vector of SNP IDs in the GWAS summary statistics (z_snp$id).

snp_map

a list of data frames mapping reference SNPs to regions.
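
Harmonizing weights with the LD reference usually means keeping only weight SNPs that appear in both the GWAS and the reference, and flipping the sign of a weight when its effect allele is the reference's other allele. The sketch below illustrates that idea under assumed column names (id, ref, alt for the reference; rsid, eff_allele, ref_allele, weight for the weights). The helper harmonize_weights_sketch() is hypothetical and is not the package's internal code.

```r
# Hypothetical helper: align weight alleles to a reference SNP table.
# Assumed columns: weights$rsid, weights$eff_allele, weights$ref_allele,
# weights$weight; ref_snps$id, ref_snps$ref, ref_snps$alt.
harmonize_weights_sketch <- function(weights, ref_snps, gwas_snp_ids) {
  # Keep only SNPs present in both the GWAS and the LD reference
  keep <- weights$rsid %in% intersect(gwas_snp_ids, ref_snps$id)
  weights <- weights[keep, , drop = FALSE]
  idx <- match(weights$rsid, ref_snps$id)
  same <- weights$eff_allele == ref_snps$alt[idx] &
    weights$ref_allele == ref_snps$ref[idx]
  swapped <- weights$eff_allele == ref_snps$ref[idx] &
    weights$ref_allele == ref_snps$alt[idx]
  # Alleles swapped relative to the reference: flip the weight's sign
  weights$weight[swapped] <- -weights$weight[swapped]
  # Relabel swapped alleles to the reference orientation
  weights$eff_allele[swapped] <- ref_snps$alt[idx][swapped]
  weights$ref_allele[swapped] <- ref_snps$ref[idx][swapped]
  # Drop SNPs whose alleles match the reference in neither orientation
  weights[same | swapped, , drop = FALSE]
}

ref_snps <- data.frame(id = c("rs1", "rs2", "rs3"),
                       ref = c("A", "C", "G"), alt = c("G", "T", "A"),
                       stringsAsFactors = FALSE)
weights <- data.frame(rsid = c("rs1", "rs2", "rs3", "rs4"),
                      eff_allele = c("G", "C", "C", "A"),
                      ref_allele = c("A", "T", "T", "G"),
                      weight = c(0.5, 0.2, 0.1, 0.3),
                      stringsAsFactors = FALSE)
res <- harmonize_weights_sketch(weights, ref_snps,
                                gwas_snp_ids = c("rs1", "rs2", "rs3", "rs4"))
```

Here rs1 already matches the reference, rs2 is flipped, rs3 has alleles that match in neither orientation, and rs4 is absent from the reference, so only rs1 and rs2 remain.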

LD_map

a data frame with filenames of LD matrices and SNP information for all regions. Required when load_predictdb_LD = FALSE.

weight_format

a string specifying the format of the weight files: "PredictDB" or "FUSION".

drop_strand_ambig

If TRUE, remove strand-ambiguous variants (A/T, G/C).
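
A variant is strand-ambiguous when its two alleles are complementary (A/T or G/C), so the allele labels alone cannot tell whether two datasets report it on the same strand. A minimal check, written as a hypothetical helper is_strand_ambiguous():

```r
# Hypothetical helper: TRUE for A/T, T/A, G/C and C/G allele pairs
is_strand_ambiguous <- function(a1, a2) {
  a1 <- toupper(a1)
  a2 <- toupper(a2)
  (a1 == "A" & a2 == "T") | (a1 == "T" & a2 == "A") |
    (a1 == "G" & a2 == "C") | (a1 == "C" & a2 == "G")
}

ambig <- is_strand_ambiguous(c("A", "G", "C", "t"), c("T", "A", "G", "a"))
```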

filter_protein_coding_genes

If TRUE, keep protein-coding genes only. This option is only for PredictDB weights.
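
Filtering to protein-coding genes amounts to keeping weights whose gene is annotated as protein coding. The sketch below uses small made-up tables; the gene annotation table and its gene_type column stand in for the gene-level metadata of a PredictDB database and are assumptions of this example.

```r
# Hypothetical gene annotation table; the gene_type column is assumed
extra <- data.frame(
  gene      = c("ENSG01", "ENSG02", "ENSG03"),
  gene_type = c("protein_coding", "lincRNA", "protein_coding"),
  stringsAsFactors = FALSE
)
# Hypothetical per-SNP weights (one row per gene/SNP pair)
weights <- data.frame(
  gene   = c("ENSG01", "ENSG01", "ENSG02", "ENSG03"),
  rsid   = c("rs1", "rs2", "rs3", "rs4"),
  weight = c(0.1, -0.2, 0.3, 0.05),
  stringsAsFactors = FALSE
)
# Keep weights only for genes annotated as protein coding
pc_genes <- extra$gene[extra$gene_type == "protein_coding"]
weights_pc <- weights[weights$gene %in% pc_genes, , drop = FALSE]
```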

scale_predictdb_weights

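If TRUE, rescale PredictDB weights to the standardized-genotype scale by multiplying each weight by the standard deviation of its variant's genotype (the square root of the variance). This is needed because PredictDB weights assume that variant genotypes are not standardized, whereas our implementation assumes standardized variant genotypes. This option is only for PredictDB weights.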
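
Why the standard deviation: if predicted expression is sum_j w_j * g_j on raw dosages g_j, writing g_j = mu_j + s_j * z_j (z_j standardized, s_j the genotype standard deviation) gives sum_j (w_j * s_j) * z_j plus a constant. The weight on the standardized scale is therefore w_j * s_j. The sketch below checks this numerically on simulated genotypes; the allele frequencies and weights are made up, and this is not the package's internal code.

```r
set.seed(1)
n <- 500
p <- c(0.1, 0.3, 0.5)                        # made-up allele frequencies
G <- sapply(p, function(f) rbinom(n, 2, f))  # n x 3 matrix of raw dosages
w_raw <- c(0.4, -0.2, 0.1)                   # weights on raw dosages

s <- apply(G, 2, sd)       # genotype standard deviation of each SNP
Z <- scale(G)              # standardized genotypes (centered, unit SD)
w_std <- w_raw * s         # weights on the standardized scale

# The two predictions agree once the constant (mean) term is removed
pred_raw <- drop(G %*% w_raw)
pred_std <- drop(Z %*% w_std)
```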

load_predictdb_LD

If TRUE, load pre-computed LD among weight SNPs. This option is only for PredictDB weights.

fusion_method

a string specifying which FUSION model to use. The "best.cv" option uses the model with the best cross-validation performance (smallest p-value).
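
A sketch of "best.cv" selection, assuming the layout of a FUSION '.wgt.RDat' file: a cv.performance matrix with rows "rsq" and "pval" and one column per model, and a wgt.matrix with one row per SNP and one column per model. The numbers below are made up.

```r
# Made-up cross-validation results: rows "rsq" and "pval", one column per model
cv.performance <- matrix(
  c(0.05, 1e-3,
    0.08, 1e-5,
    0.02, 1e-2),
  nrow = 2,
  dimnames = list(c("rsq", "pval"), c("top1", "lasso", "enet"))
)
# Made-up weights: one row per SNP, one column per model
wgt.matrix <- matrix(
  c(0.3, 0, 0.1,
    0.2, -0.1, 0.05,
    0, 0.4, 0.2),
  nrow = 3,
  dimnames = list(c("rs1", "rs2", "rs3"), c("top1", "lasso", "enet"))
)
# "best.cv": choose the model with the smallest cross-validation p-value
best_model <- colnames(cv.performance)[which.min(cv.performance["pval", ])]
w <- wgt.matrix[, best_model]
```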

fusion_genome_version

a string specifying the genome build of the FUSION models, e.g. "b38".

fusion_top_n_snps

a number specifying how many top weight SNPs to keep from each FUSION model. By default (when not specified), all weight SNPs are used.
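
One way to keep the "top n" weight SNPs is to rank them by absolute weight. The ranking criterion and the helper keep_top_n_snps() are assumptions of this sketch, not documented behavior.

```r
# Hypothetical helper: keep the top_n SNPs with the largest |weight|;
# with top_n = NULL (or top_n >= number of SNPs), keep all SNPs
keep_top_n_snps <- function(w, top_n = NULL) {
  if (is.null(top_n) || top_n >= length(w)) return(w)
  w[order(abs(w), decreasing = TRUE)[seq_len(top_n)]]
}

w <- c(rs1 = 0.02, rs2 = -0.50, rs3 = 0.10, rs4 = 0.30)
top2 <- keep_top_n_snps(w, 2)
```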

LD_format

file format of the LD matrices. If "custom", the user-defined function LD_loader_fun() is used to load each LD matrix.

LD_loader_fun

a user-defined function to load an LD matrix when LD_format = "custom".
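
A sketch of a custom loader. It assumes the loader receives a single file path and returns a square numeric LD matrix, and that the files are whitespace-delimited text without a header; both are assumptions, so check them against your LD files and the package's expectations.

```r
# Hypothetical LD loader: read a whitespace-delimited text file with no header
# and return it as a plain numeric matrix
my_LD_loader <- function(LD_file) {
  ld_mat <- as.matrix(read.table(LD_file, header = FALSE))
  dimnames(ld_mat) <- NULL
  ld_mat
}

# Usage: write a small 2 x 2 LD matrix to a temporary file and load it back
tmp <- tempfile(fileext = ".txt")
write.table(matrix(c(1, 0.3, 0.3, 1), 2), tmp,
            row.names = FALSE, col.names = FALSE)
ld_mat <- my_LD_loader(tmp)
```

Such a function would then be supplied as LD_format = "custom", LD_loader_fun = my_LD_loader.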

ncore

the number of cores used for parallel computation.

logfile

the log file. If NULL, log messages are printed to the screen.

Value

a list of processed weights.