Generate a binary perturbation matrix and a continuous gene expression matrix in a bottom-up fashion according to a hierarchical factor model with normal noise terms.

Usage

normal_data_sim(
  N = 400,
  P = 600,
  beta_true,
  K = ncol(beta_true),
  M = nrow(beta_true),
  pi_true = rep(0.1, K),
  sigma_w2_true = rep(0.5, K),
  psi_true = 1,
  G_prob = 0.2,
  offset = FALSE
)

Arguments

N

Number of samples to simulate

P

Number of genes to simulate

beta_true

An \(M\) by \(K\) numeric matrix that stores the true effect sizes of the perturbation-factor associations; when offset = TRUE, \(M+1\) rows should be provided instead.

K

Number of factors to simulate

M

Number of perturbations to simulate

pi_true

The true density (proportion of nonzero gene loadings) of each factor

G_prob

The Bernoulli probability used to generate the binary perturbation matrix G; it determines the frequency of each perturbation in the sample population

offset

Default is FALSE. If TRUE, beta_true should have \(M+1\) rows, with the last row storing the intercept values \(\beta_0\)
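
As a minimal sketch of the shape expected when offset = TRUE, the snippet below appends an intercept row to an effect-size matrix using base R only. The names beta_effects, beta_0 and beta_with_offset, and the intercept value 0.5, are hypothetical and chosen purely for illustration.

```r
# Effect sizes for M = 2 perturbations and K = 5 factors (illustrative values).
beta_effects <- rbind(c(1, 0, 0, 0, 0),
                      c(0, 0.8, 0, 0, 0))

# Hypothetical intercepts, one per factor; 0.5 is an arbitrary choice.
beta_0 <- rep(0.5, ncol(beta_effects))

# Intercepts go in the last row, giving an (M + 1) by K matrix.
beta_with_offset <- rbind(beta_effects, beta_0)
dim(beta_with_offset)
```

Because M defaults to nrow(beta_true) in the usage signature, passing a matrix with the extra intercept row may also require setting M explicitly to the number of perturbations. That is an inference from the signature rather than documented behavior, so it is worth checking against the function itself.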

Value

A list object with the following elements:

Y

a sample by gene matrix with continuous gene expression values;

G

a binary sample by perturbation matrix;

Z

a sample by factor matrix;

F

a binary gene by factor matrix that indicates whether a gene has non-zero loading in the factor;

U

a gene by factor matrix with normal effect sizes; F*U (element-wise multiplication) gives the loading matrix W.
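
To make the relationship between the returned elements concrete, here is a rough base R sketch of a bottom-up simulation of this kind. It is not the function's actual implementation: the unit-variance noise added to Z, the reading of sigma_w2 as the variance of the entries of U, and the reading of psi as the residual variance of Y are all assumptions, and every variable name is illustrative.

```r
set.seed(1)
N <- 50; P <- 80                           # small sizes for a quick run
beta <- rbind(c(1, 0, 0), c(0, 0.8, 0))    # M x K effect sizes
M <- nrow(beta); K <- ncol(beta)
pi_k <- rep(0.1, K); sigma_w2 <- rep(0.5, K); psi <- 1; G_prob <- 0.2

# G: binary sample-by-perturbation matrix with Bernoulli(G_prob) entries.
G <- matrix(rbinom(N * M, size = 1, prob = G_prob), nrow = N, ncol = M)

# Z: sample-by-factor matrix; the unit-variance normal noise is an assumption.
Z <- G %*% beta + matrix(rnorm(N * K), nrow = N, ncol = K)

# F: binary gene-by-factor indicators; column k is nonzero with probability pi_k[k].
F_mat <- sapply(pi_k, function(p) rbinom(P, size = 1, prob = p))

# U: gene-by-factor normal effect sizes; column k has variance sigma_w2[k] (assumed).
U <- sapply(sigma_w2, function(v) rnorm(P, mean = 0, sd = sqrt(v)))

# W: loading matrix as the element-wise product of F and U.
W <- F_mat * U

# Y: sample-by-gene expression; psi is treated as the residual variance (assumed).
Y <- Z %*% t(W) + matrix(rnorm(N * P, sd = sqrt(psi)), nrow = N, ncol = P)

sim <- list(Y = Y, G = G, Z = Z, F = F_mat, U = U)
sapply(sim, dim)                           # rows and columns of each element
```

The list mirrors the element names documented above, so the dimensions printed at the end can be compared against the descriptions of Y, G, Z, F and U.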

Examples

set.seed(12345)
beta_true <- rbind(c(1, 0, 0, 0, 0), c(0, 0.8, 0, 0, 0))
sim_data <- normal_data_sim(N = 4000, P = 6000, beta_true = beta_true)