Assembles data for all the regions

assemble_region_data(
  region_info,
  z_snp,
  z_gene,
  weights,
  snp_map,
  thin = 0.1,
  maxSNP = Inf,
  trim_by = c("random", "z"),
  adjust_boundary_genes = TRUE,
  ncore = 1,
  seed = 99,
  logfile = NULL
)

Arguments

region_info

A data frame of region definitions.

z_snp

A data frame with columns: "id", "z", giving the z-scores for SNPs.

z_gene

A data frame with columns: "id", "z", giving the z-scores for genes.
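
As a minimal sketch of the shape expected for `z_snp` and `z_gene`, the following builds two toy data frames with the documented "id" and "z" columns. The identifiers, values, and the `check_z` helper are made up for illustration and are not part of the package. Real inputs may carry additional columns that this page does not list.

```r
# Toy z-score tables with the two documented columns, "id" and "z".
# All identifiers and values below are hypothetical.
z_snp <- data.frame(
  id = c("rs0001", "rs0002", "rs0003"),
  z  = c(1.2, -0.4, 3.1),
  stringsAsFactors = FALSE
)

z_gene <- data.frame(
  id = c("geneA", "geneB"),
  z  = c(2.5, -1.8),
  stringsAsFactors = FALSE
)

# Hypothetical sanity check: required columns are present and z is numeric.
check_z <- function(z_df) {
  all(c("id", "z") %in% colnames(z_df)) && is.numeric(z_df$z)
}

check_z(z_snp)
check_z(z_gene)
```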

weights

A list of preprocessed weights.

snp_map

A list of data frames with the SNP-to-region map for the reference.

thin

The proportion of SNPs used for the parameter estimation and initial region screening steps. Smaller values of thin reduce runtime at the expense of accuracy. The fine-mapping step is rerun with the full set of SNPs for regions with strong gene signals.
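
To make the effect of `thin` concrete, here is a rough sketch of random thinning, assuming it amounts to keeping a random fraction of SNP IDs. The `thin_snps` helper is hypothetical and written only for this example. How the package itself selects SNPs is not described on this page.

```r
# Illustrative sketch: keep a random fraction `thin` of the SNP IDs.
# `thin_snps` is a hypothetical helper, not a function from the package.
thin_snps <- function(snp_ids, thin = 0.1, seed = 99) {
  stopifnot(thin > 0, thin <= 1)
  set.seed(seed)                                   # reproducible sampling
  n_keep <- max(1, round(length(snp_ids) * thin))  # keep at least one SNP
  keep <- sort(sample.int(length(snp_ids), n_keep))
  snp_ids[keep]                                    # original order preserved
}

snp_ids <- paste0("rs", 1:1000)   # made-up SNP identifiers
kept <- thin_snps(snp_ids, thin = 0.1)
length(kept)
```

With the same seed the helper returns the same subset on every call, which is the usual reason for exposing a `seed` argument alongside random subsampling.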

maxSNP

Inf or an integer. The maximum number of SNPs in a region. The default is Inf (no limit). Setting a limit can be useful when a region contains many SNPs and there is not enough memory to run the program. This applies only to the final rerun step, in which susie is rerun with the full set of SNPs for regions with strong gene signals.

trim_by

How to remove SNPs when the total number of SNPs in a region exceeds the limit. Options: "random", or "z" (trim SNPs with lower |z|). See parameter `maxSNP` for more information.
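
The two trimming options can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `trim_snps` and the toy data are hypothetical, and the sketch assumes "z" keeps the SNPs with the largest |z| while "random" keeps a random subset of size `maxSNP`.

```r
# Illustrative sketch of capping a region at maxSNP SNPs.
# `trim_snps` is a hypothetical helper written for this example.
trim_snps <- function(z_df, maxSNP = Inf, trim_by = c("random", "z"), seed = 99) {
  trim_by <- match.arg(trim_by)            # first option, "random", is the default
  if (nrow(z_df) <= maxSNP) return(z_df)   # under the limit: nothing to trim
  if (trim_by == "random") {
    set.seed(seed)
    keep <- sort(sample.int(nrow(z_df), maxSNP))
  } else {
    # Keep the maxSNP SNPs with the largest |z|, preserving the original order.
    keep <- sort(order(abs(z_df$z), decreasing = TRUE)[seq_len(maxSNP)])
  }
  z_df[keep, , drop = FALSE]
}

# Made-up z-scores for six SNPs in one region.
z_region <- data.frame(
  id = paste0("rs", 1:6),
  z  = c(0.3, -4.2, 1.1, 2.7, -0.2, 5.0),
  stringsAsFactors = FALSE
)

trimmed_z      <- trim_snps(z_region, maxSNP = 3, trim_by = "z")
trimmed_random <- trim_snps(z_region, maxSNP = 3, trim_by = "random")
trimmed_z$id   # "rs2" "rs4" "rs6"
```

Trimming by |z| keeps the strongest marginal signals but biases the retained set toward associated SNPs, whereas random trimming keeps a representative subset.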

adjust_boundary_genes

If TRUE, identify cross-boundary genes and adjust region_data accordingly.

ncore

The number of cores used to parallelize susie over regions.

seed

The seed for random sampling.

logfile

The log filename. If NULL, print log info on screen.

Value

A list that includes region_data and the cross-boundary genes.