For each region, get the indices of the SNPs and genes located within that region. An index is the position (column number) of the SNP or gene in the .pgen or .expr file.

index_regions(
  regionfile,
  exprvarfs,
  pvarfs = NULL,
  ld_Rfs = NULL,
  select = NULL,
  thin = 1,
  maxSNP = Inf,
  minvar = 1,
  merge = T,
  outname = NULL,
  outputdir = getwd()
)

Arguments

regionfile

Regions file with three columns: chr, start, end. The regions should be non-overlapping and define LD blocks. Chromosomes X, Y, etc. are currently not supported.
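The following is a minimal sketch of what a valid regions table could look like and how one might check the non-overlap requirement before calling `index_regions`. The helper `regions_non_overlapping` is hypothetical and not part of the package, and the half-open interval convention is an assumption; the documentation does not state whether `end` is inclusive.

```r
# Hypothetical regions table with the three documented columns.
regions <- data.frame(
  chr   = c(1, 1, 2),
  start = c(1, 1000001, 1),
  end   = c(1000000, 2000000, 1500000)
)

# Return TRUE if no two regions on the same chromosome overlap.
# Assumption: a region may begin exactly where the previous one ends.
regions_non_overlapping <- function(regions) {
  for (chrom in unique(regions$chr)) {
    r <- regions[regions$chr == chrom, ]
    r <- r[order(r$start), ]
    n <- nrow(r)
    # Compare each start with the end of the preceding region.
    if (n > 1 && any(r$start[-1] < r$end[-n])) return(FALSE)
  }
  TRUE
}

ok <- regions_non_overlapping(regions)
```

Running a check like this first can catch a malformed regions file before any genotype data is read.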

select

Default is NULL, in which case all variants are selected. Otherwise, either a vector of variant IDs, or a data frame with columns id and z (id is the gene or SNP ID, z is the z-score). The z-scores are used to remove SNPs if the total number of SNPs exceeds the limit. See parameter `maxSNP` for more information.

thin

A scalar in (0,1]. The proportion of SNPs kept after downsampling. Applied only to the SNPs that remain after variant selection.
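As an illustration of what downsampling to a proportion means, here is a short sketch in base R. The helper `thin_snps` is hypothetical, and the rounding rule (`ceiling`) and uniform random sampling are assumptions; the package may implement thinning differently.

```r
# Hypothetical helper: keep a random proportion `thin` of SNP indices.
thin_snps <- function(snp_idx, thin = 1) {
  stopifnot(thin > 0, thin <= 1)
  n_keep <- ceiling(length(snp_idx) * thin)
  # sample.int() avoids the sample() pitfall when snp_idx has length 1.
  sort(snp_idx[sample.int(length(snp_idx), n_keep)])
}

set.seed(1)
kept <- thin_snps(1:1000, thin = 0.25)
```

With `thin = 1` every SNP is kept, which matches the documented default.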

maxSNP

Default is Inf, meaning no limit on the number of SNPs in a region. Otherwise, an integer giving the maximum number of SNPs allowed in a region. This parameter is useful when a region contains many SNPs and there is not enough memory to run the program. If z-scores are given in the parameter `select`, i.e. a data frame with columns id and z is provided, SNPs are ranked by |z| from high to low and only the top `maxSNP` SNPs are kept. If only variant IDs are provided, `maxSNP` SNPs are chosen at random.
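The two selection rules described for `maxSNP` can be sketched as follows. The helper `cap_snps` and its arguments are hypothetical names used only for illustration; this is not the package's internal code.

```r
# Hypothetical helper illustrating the two documented behaviours.
# `ids`: character vector of SNP IDs in one region.
# `z`:   optional numeric vector of z-scores, in the same order as `ids`.
cap_snps <- function(ids, maxSNP = Inf, z = NULL) {
  if (length(ids) <= maxSNP) return(ids)  # under the limit: keep all
  if (!is.null(z)) {
    # z-scores available: keep the maxSNP SNPs with the largest |z|.
    keep <- order(abs(z), decreasing = TRUE)[seq_len(maxSNP)]
  } else {
    # IDs only: choose maxSNP SNPs at random.
    keep <- sample.int(length(ids), maxSNP)
  }
  ids[sort(keep)]  # preserve the original ordering
}

ids <- c("rs1", "rs2", "rs3", "rs4", "rs5")
z   <- c(0.5, -3.2, 1.1, 2.7, -0.1)
top <- cap_snps(ids, maxSNP = 2, z = z)
```

Here `top` contains the two SNPs with the largest absolute z-scores. Without `z`, the result depends on the random seed.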

minvar

Minimum number of variants in a region.

merge

TRUE/FALSE. If TRUE, merge regions when a gene spans a region boundary (i.e. belongs to multiple regions).
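To make the merging idea concrete, the sketch below combines all regions on a single chromosome that one gene overlaps. The helper `merge_spanned` is hypothetical, and representing a gene by a single start/end interval is an assumption; the package may determine a gene's extent differently.

```r
# Hypothetical helper: merge the regions that a single gene overlaps.
# `regions`: data frame with columns start and end (one chromosome).
# `gene_start`, `gene_end`: coordinates of the gene.
merge_spanned <- function(regions, gene_start, gene_end) {
  hit <- which(regions$start <= gene_end & regions$end >= gene_start)
  if (length(hit) < 2) return(regions)  # gene lies within one region
  merged <- data.frame(start = min(regions$start[hit]),
                       end   = max(regions$end[hit]))
  out <- rbind(regions[-hit, , drop = FALSE], merged)
  out <- out[order(out$start), ]
  rownames(out) <- NULL
  out
}

regions <- data.frame(start = c(1, 101, 201), end = c(100, 200, 300))
merged  <- merge_spanned(regions, gene_start = 90, gene_end = 120)
```

In this example the gene crosses the boundary between the first two regions, so they are combined into one region and the third is left unchanged.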

outname

a string, the output name

outputdir

a string, the directory to store output

Value

A list with one item per pvarf/exprvarf. Each item is itself a list with one entry per region.