Harmonize GWAS summary statistics and LD reference

preharmonize_z_ld(
  z_snp,
  ld_R_dir,
  outputdir = getwd(),
  outname = NULL,
  logfile = NULL,
  harmonize_z = TRUE,
  strand_ambig_action_z = c("drop", "none", "recover"),
  drop_multiallelic = TRUE
)

Arguments

z_snp

A data frame with four columns, "id", "A1", "A2", and "z", giving the z scores for SNPs. "A1" is the effect allele and "A2" is the other allele. If `harmonize_z = FALSE`, "A1" and "A2" are not required.
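A minimal sketch of a valid `z_snp` input, assuming a plain `data.frame`. The helper `check_z_snp()` is hypothetical (it is not part of the package). It only checks for the columns listed above and relaxes the allele requirement when `harmonize_z = FALSE`. The IDs and z scores are made up.

```r
# Toy z_snp table; the IDs and z scores are made up for illustration.
z_snp <- data.frame(
  id = c("rs1", "rs2", "rs3"),
  A1 = c("A", "C", "G"),  # effect allele
  A2 = c("G", "T", "C"),  # other allele
  z  = c(2.1, -0.4, 3.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): check that the required
# columns are present; A1/A2 are only required when harmonize_z = TRUE.
check_z_snp <- function(z_snp, harmonize_z = TRUE) {
  required <- if (harmonize_z) c("id", "A1", "A2", "z") else c("id", "z")
  missing_cols <- setdiff(required, colnames(z_snp))
  if (length(missing_cols) > 0) {
    stop("z_snp is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_z_snp(z_snp)                                       # passes
check_z_snp(z_snp[, c("id", "z")], harmonize_z = FALSE)  # passes without alleles
```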

outputdir

A string giving the directory where output is stored. Defaults to the current working directory.

outname

A string giving the name used for the output.

logfile

Path to the log file. If NULL, log messages are printed to the screen.

harmonize_z

TRUE/FALSE. If TRUE, the alleles of the GWAS z scores are harmonized with the LD reference.
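A rough sketch of what allele harmonization against the LD reference can look like. It is not the package's implementation. It assumes the .Rvar "alt" allele plays the role of the effect allele, so a z score whose A1/A2 are swapped relative to alt/ref has its sign flipped. It also assumes that variants whose alleles match neither orientation, even after complementing both alleles, are dropped. `align_z_to_ref()` and `complement()` are hypothetical names.

```r
# Hypothetical sketch. Assumes "alt" in the .Rvar file is the effect allele.
complement <- function(x) chartr("ACGT", "TGCA", toupper(x))

align_z_to_ref <- function(z_snp, ld_var) {
  m <- merge(z_snp, ld_var[, c("id", "alt", "ref")], by = "id")
  same    <- m$A1 == m$alt & m$A2 == m$ref
  swapped <- m$A1 == m$ref & m$A2 == m$alt
  # Matches that only appear after complementing both alleles (strand flip)
  same_cs    <- complement(m$A1) == m$alt & complement(m$A2) == m$ref
  swapped_cs <- complement(m$A1) == m$ref & complement(m$A2) == m$alt
  flip <- swapped | (!same & swapped_cs)
  keep <- same | swapped | same_cs | swapped_cs  # everything else is an allele mismatch
  m$z[flip] <- -m$z[flip]
  m$A1 <- m$alt
  m$A2 <- m$ref
  m[keep, c("id", "A1", "A2", "z")]
}

z_snp <- data.frame(id = c("rs1", "rs2", "rs3"), A1 = c("A", "T", "C"),
                    A2 = c("G", "C", "T"), z = c(2.1, -0.4, 1.5),
                    stringsAsFactors = FALSE)
ld_var <- data.frame(chrom = 1, id = c("rs1", "rs2", "rs3"), pos = c(100, 200, 300),
                     alt = c("A", "C", "A"), ref = c("G", "T", "G"),
                     stringsAsFactors = FALSE)
out <- align_z_to_ref(z_snp, ld_var)
out$z  # 2.1 0.4 -1.5
```

Here rs1 already matches, rs2 is swapped, and rs3 matches only after a strand flip plus a swap, so the last two z scores change sign.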

strand_ambig_action_z

The action to take to harmonize strand-ambiguous variants (A/T, G/C) between the z scores and the LD reference. "drop" removes ambiguous variants from the z scores. "none" treats ambiguous variants as unambiguous: their z scores are flipped to match the LD reference and no further action is taken. "recover" imputes the sign of each ambiguous z score from the unambiguous z scores and the LD reference, and flips the z score when the imputed sign disagrees with the observed sign. This option is computationally intensive.
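A sketch of the "drop" and "recover" ideas, not the package's code. For "recover" it assumes an ImpG-style conditional mean, z_amb ≈ R[amb, unamb] %*% solve(R[unamb, unamb] + lambda * I, z_unamb). The ridge term `lambda` is an added assumption for numerical stability. All function names are hypothetical.

```r
# True for A/T and G/C allele pairs (case-insensitive)
is_strand_ambiguous <- function(a1, a2) {
  paste0(toupper(a1), toupper(a2)) %in% c("AT", "TA", "CG", "GC")
}

# "drop": remove strand-ambiguous variants from z_snp
drop_ambiguous <- function(z_snp) {
  z_snp[!is_strand_ambiguous(z_snp$A1, z_snp$A2), , drop = FALSE]
}

# "recover" (sketch): impute ambiguous z scores from unambiguous ones and flip
# the observed z score when the signs disagree. z and ambiguous must follow the
# row/column order of the LD matrix R.
recover_ambiguous_sign <- function(z, R, ambiguous, lambda = 0.1) {
  u <- which(!ambiguous)
  a <- which(ambiguous)
  if (length(a) == 0 || length(u) == 0) return(z)
  R_uu <- R[u, u, drop = FALSE] + diag(lambda, length(u))
  z_imp <- R[a, u, drop = FALSE] %*% solve(R_uu, z[u])
  mismatch <- sign(z_imp[, 1]) != sign(z[a])
  z[a[mismatch]] <- -z[a[mismatch]]
  z
}

R <- matrix(c(1, 0.9, 0.9, 1), nrow = 2)
recover_ambiguous_sign(z = c(3, -2.5), R = R, ambiguous = c(FALSE, TRUE))
```

In the usage line the imputed z score for the second (ambiguous) variant is positive, so its observed value of -2.5 is flipped. The `solve()` over all unambiguous variants in a block is the step that makes this approach costly for large LD blocks.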

drop_multiallelic

TRUE/FALSE. If TRUE, multiallelic variants are dropped from the summary statistics.
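A minimal sketch of dropping multiallelic variants. It assumes a variant counts as multiallelic when its ID appears in more than one row of `z_snp`. The package may define this differently, for example by chromosome and position. `drop_multiallelic_ids()` is a hypothetical name.

```r
# Hypothetical sketch: an ID seen in more than one row is treated as multiallelic
drop_multiallelic_ids <- function(z_snp) {
  dup_ids <- unique(z_snp$id[duplicated(z_snp$id)])
  z_snp[!(z_snp$id %in% dup_ids), , drop = FALSE]
}

z_snp <- data.frame(
  id = c("rs1", "rs2", "rs2", "rs3"),
  A1 = c("A", "A", "A", "C"),
  A2 = c("G", "C", "T", "T"),
  z  = c(1.2, 0.5, -0.8, 2.0),
  stringsAsFactors = FALSE
)
drop_multiallelic_ids(z_snp)$id  # "rs1" "rs3"
```

Every copy of a duplicated ID is removed, not just the later ones. With several alternative alleles there is no single orientation to harmonize against the LD reference.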

ld_R_dir

A string giving the path to a directory that contains all LD matrix files and variant information. The directory should contain .RDS files, each holding the LD correlation matrix for one region/block. For each .RDS file, a file with the same base name but ending in .Rvar must be present in the same folder. The .Rvar file has 5 required columns: "chrom", "id", "pos", "alt", and "ref". If using PredictDB format weights and `scale_by_ld_variance = TRUE`, a 6th column, "variance", giving the variance of each SNP, is also required. The order of rows in the .Rvar file must match the order of rows in the .RDS file.
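A sketch of one way to write and read an LD block in the layout described for `ld_R_dir`. The helper names, the base name "block1", and the tab-delimited .Rvar format are all assumptions for illustration. The reader checks the required columns and that the .Rvar row count matches the LD matrix.

```r
# Hypothetical helpers; file names and the tab-delimited .Rvar layout are assumptions.
write_ld_block <- function(R, var_info, dir, basename) {
  stopifnot(nrow(R) == ncol(R), nrow(R) == nrow(var_info))
  saveRDS(R, file.path(dir, paste0(basename, ".RDS")))
  write.table(var_info, file.path(dir, paste0(basename, ".Rvar")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

read_ld_block <- function(rds_file) {
  R <- readRDS(rds_file)
  rvar_file <- sub("\\.RDS$", ".Rvar", rds_file)
  # Read everything as character so alleles such as "T" are not parsed as logical
  var_info <- read.table(rvar_file, header = TRUE, sep = "\t",
                         colClasses = "character")
  required <- c("chrom", "id", "pos", "alt", "ref")
  missing_cols <- setdiff(required, colnames(var_info))
  if (length(missing_cols) > 0) {
    stop(".Rvar file is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  var_info$pos <- as.integer(var_info$pos)
  if ("variance" %in% colnames(var_info)) {
    var_info$variance <- as.numeric(var_info$variance)
  }
  if (nrow(var_info) != nrow(R)) stop("row count of .Rvar does not match the LD matrix")
  list(R = R, var_info = var_info)
}

R <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)
var_info <- data.frame(chrom = 1, id = c("rs1", "rs2"), pos = c(100L, 200L),
                       alt = c("T", "T"), ref = c("C", "G"),
                       stringsAsFactors = FALSE)
write_ld_block(R, var_info, tempdir(), "block1")
blk <- read_ld_block(file.path(tempdir(), "block1.RDS"))
```

Reading every column as character first avoids R's type conversion turning an all-"T" allele column into `TRUE`.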