Assembles data for all the regions

assemble_region_data(
  region_info,
  z_snp,
  z_gene,
  weights,
  snp_map,
  thin = 0.1,
  maxSNP = Inf,
  trim_by = c("random", "z"),
  adjust_boundary_genes = TRUE,
  ncore = 1,
  seed = 99,
  logfile = NULL
)

Arguments

region_info: A data frame of region definitions.
z_snp: A data frame with columns "id" and "z", giving the z-scores for SNPs.
z_gene: A data frame with columns "id" and "z", giving the z-scores for genes.
weights: A list of preprocessed weights.
snp_map: A list of data frames giving the SNP-to-region map for the reference.
thin: The proportion of SNPs used in the parameter estimation and initial region screening steps. Smaller values of thin reduce runtime at the expense of accuracy. For regions with strong gene signals, the fine-mapping step is rerun using all SNPs.
maxSNP: Inf or integer. Maximum number of SNPs allowed in a region. Default is Inf (no limit). Setting a limit can help when a region contains many SNPs and there is not enough memory to run the program. The limit applies only to the final rerun step, in which susie is rerun using all SNPs for regions with strong gene signals.
trim_by: How to remove SNPs when the number of SNPs in a region exceeds maxSNP. Options: "random" (remove SNPs at random) or "z" (remove the SNPs with the lowest |z|). See parameter `maxSNP` for more information.
adjust_boundary_genes: If TRUE, identify cross-boundary genes and adjust region_data accordingly.
ncore: The number of cores used to parallelize susie over regions.
seed: Seed for random sampling.
logfile: The log filename. If NULL, print log info on screen.
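The interplay of thin, maxSNP, trim_by and seed can be illustrated with a small sketch. The helpers thin_snps() and trim_snps() below are hypothetical and written only for this illustration; they are not ctwas functions, and the package's internal logic may differ. The sketch assumes a single region's SNP z-scores are held in a data frame with columns "id" and "z", matching the z_snp format.

```r
# Hypothetical helpers (NOT part of ctwas) illustrating thin / maxSNP / trim_by / seed.

# Keep roughly a proportion `thin` of SNP ids, chosen at random,
# preserving their original order. Note: set.seed() changes the global RNG state.
thin_snps <- function(snp_ids, thin = 0.1, seed = 99) {
  stopifnot(thin > 0, thin <= 1)
  set.seed(seed)
  n_keep <- max(1L, round(length(snp_ids) * thin))
  snp_ids[sort(sample.int(length(snp_ids), n_keep))]
}

# Trim one region's SNP table (columns "id", "z") to at most maxSNP rows.
trim_snps <- function(z_region, maxSNP = Inf, trim_by = c("random", "z"), seed = 99) {
  trim_by <- match.arg(trim_by)
  n <- nrow(z_region)
  if (is.infinite(maxSNP) || n <= maxSNP) return(z_region)  # nothing to trim
  if (trim_by == "random") {
    set.seed(seed)
    keep <- sort(sample.int(n, maxSNP))
  } else {
    # keep the maxSNP SNPs with the largest |z|, i.e. drop those with the lowest |z|
    keep <- sort(order(abs(z_region$z), decreasing = TRUE)[seq_len(maxSNP)])
  }
  z_region[keep, , drop = FALSE]
}

# Example with made-up z-scores
z_region <- data.frame(id = paste0("rs", 1:6),
                       z = c(0.5, -3.2, 1.1, 4.0, -0.2, 2.5),
                       stringsAsFactors = FALSE)
trim_snps(z_region, maxSNP = 3, trim_by = "z")$id  # "rs2" "rs4" "rs6"
```

Trimming by "z" keeps the SNPs most likely to carry signal, while "random" avoids biasing the retained set toward strong associations; fixing seed makes either random choice reproducible.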

Value

A list containing region_data and the cross-boundary genes.
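One plausible way to think about cross-boundary genes is as genes whose weight SNPs map to more than one region. The function find_cross_boundary_genes() below is a hypothetical sketch of that idea, not the ctwas implementation; it assumes each gene's weight SNPs are given as a named list of SNP ids and that the SNP-to-region map is a single data frame with columns "id" and "region_id" (the snp_map argument is actually a list of data frames).

```r
# Hypothetical sketch (NOT part of ctwas): flag genes whose weight SNPs
# fall in more than one region.
find_cross_boundary_genes <- function(gene_snps, snp_region) {
  # gene_snps: named list, gene id -> character vector of SNP ids in its weights
  # snp_region: data frame with columns "id" (SNP id) and "region_id"
  region_of <- setNames(snp_region$region_id, snp_region$id)
  n_regions <- vapply(gene_snps, function(snps) {
    # SNPs absent from the map give NA and are ignored
    length(unique(na.omit(region_of[snps])))
  }, integer(1))
  names(n_regions)[n_regions > 1]
}

# Example with made-up ids
snp_region <- data.frame(id = c("rs1", "rs2", "rs3", "rs4"),
                         region_id = c("chr1_A", "chr1_A", "chr1_B", "chr1_B"),
                         stringsAsFactors = FALSE)
gene_snps <- list(geneA = c("rs1", "rs2"), geneB = c("rs2", "rs3"), geneC = "rs4")
find_cross_boundary_genes(gene_snps, snp_region)  # "geneB"
```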