R/ctwas_harmonize_data.R
preharmonize_z_ld.Rd
Harmonize GWAS summary statistics and LD reference
A data frame with four columns: "id", "A1", "A2", "z", giving the z scores for SNPs. "A1" is the effect allele. "A2" is the other allele. If `harmonize = FALSE`, "A1" and "A2" are not required.
a string, the directory to store output
a string, the output name
the log file. If NULL, log information is printed to the screen
TRUE/FALSE. If TRUE, alleles in the GWAS z scores are harmonized to the LD reference
the action to take to harmonize strand-ambiguous variants (A/T, G/C) between the z scores and the LD reference. "drop" removes the ambiguous variants from the z scores. "none" treats the variants as unambiguous, flipping the z scores to match the LD reference and then taking no additional action. "recover" imputes the sign of each ambiguous z score using the unambiguous z scores and the LD reference, and flips the z score if the imputed sign and the observed sign disagree. The "recover" option is computationally intensive.
TRUE/FALSE. If TRUE, multiallelic variants will be dropped from the summary statistics
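The harmonization options above can be illustrated with a minimal sketch in base R. It is not the package's implementation: the toy data, the helper `is_ambiguous`, and the column names `swapped` and `z_harmonized` are hypothetical, and the sketch assumes that "A1" is compared against "alt" and "A2" against "ref". It only covers detecting strand-ambiguous variants, flipping the sign of z when alleles are swapped, and the "drop" action; strand flips of unambiguous variants and the "recover" action are not shown.

```r
# Toy z scores; ids and values are made up for illustration.
z_snp <- data.frame(
  id = c("rs1", "rs2", "rs3", "rs4"),
  A1 = c("A", "G", "A", "C"),
  A2 = c("G", "A", "T", "G"),
  z  = c(1.5, -2.0, 0.7, 3.1),
  stringsAsFactors = FALSE
)

# Toy LD reference variant info, using the "alt"/"ref" column names of .Rvar files.
ld_ref <- data.frame(
  id  = c("rs1", "rs2", "rs3", "rs4"),
  alt = c("A", "A", "A", "C"),
  ref = c("G", "G", "T", "G"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a variant is strand ambiguous when its two alleles
# are complementary (A/T or G/C).
is_ambiguous <- function(a1, a2) {
  paste0(a1, a2) %in% c("AT", "TA", "GC", "CG")
}

# merge() sorts the result by the key column "id".
m <- merge(z_snp, ld_ref, by = "id")
m$ambiguous <- is_ambiguous(m$A1, m$A2)

# Assumption: A1 corresponds to "alt". If the alleles are swapped relative
# to the reference, flip the sign of the z score.
m$swapped <- m$A1 == m$ref & m$A2 == m$alt
m$z_harmonized <- ifelse(m$swapped, -m$z, m$z)

# The "drop" action: remove strand-ambiguous variants from the z scores.
z_drop <- m[!m$ambiguous, c("id", "z_harmonized")]
```

In this toy example rs2 has its alleles swapped, so its z score changes sign, while rs3 (A/T) and rs4 (C/G) are strand ambiguous and are removed by the "drop" step.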
a string, pointing to a directory containing all LD matrix files and variant information. Expects .RDS files which contain LD correlation matrices for a region/block.
For each .RDS file, a file with the same base name but ending in .Rvar needs to be present in the same folder. The .Rvar file has 5 required columns: "chrom", "id", "pos", "alt", "ref".
If using PredictDB format weights and scale_by_ld_variance=T, a 6th column is also required: "variance", which is the variance of each SNP.
The order of rows needs to match the order of rows in the .RDS file.
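As a rough illustration of the expected LD directory layout, the sketch below writes one toy region to a temporary folder. The base name `region1`, the SNP ids, and all numeric values are made up, and writing the .Rvar file as tab-delimited text with a header is an assumption, not something stated above.

```r
# Temporary directory standing in for the LD reference folder.
ld_dir <- file.path(tempdir(), "ld_example")
dir.create(ld_dir, showWarnings = FALSE)

# Toy 3-SNP LD correlation matrix for a single region.
R <- matrix(c(1.0, 0.2, 0.1,
              0.2, 1.0, 0.3,
              0.1, 0.3, 1.0), nrow = 3, byrow = TRUE)

# Variant information: 5 required columns plus the optional "variance" column.
# Row i describes row/column i of R.
var_info <- data.frame(
  chrom    = c(1L, 1L, 1L),
  id       = c("rs1", "rs2", "rs3"),
  pos      = c(1000L, 2000L, 3000L),
  alt      = c("A", "G", "T"),
  ref      = c("G", "A", "C"),
  variance = c(0.42, 0.18, 0.32),
  stringsAsFactors = FALSE
)

# Hypothetical base name shared by the .RDS and .Rvar files.
base <- file.path(ld_dir, "region1")
saveRDS(R, paste0(base, ".RDS"))
write.table(var_info, paste0(base, ".Rvar"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Read both files back and check that the rows line up.
R_back <- readRDS(paste0(base, ".RDS"))
v_back <- read.table(paste0(base, ".Rvar"), header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE)
stopifnot(nrow(v_back) == nrow(R_back))
```

A check like the final `stopifnot` is a cheap way to catch a mismatch between the variant table and the matrix before running harmonization.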