VEGA

class vega.VEGA(adata, gmt_paths=None, add_nodes=1, min_genes=0, max_genes=5000, positive_decoder=True, encode_covariates=False, regularizer='mask', reg_kwargs=None, **kwargs)[source]

Constructor for class VEGA (VAE Enhanced by Gene Annotations).

Parameters
  • adata (AnnData) – scanpy single-cell object. Please run setup_anndata() before passing to VEGA.

  • gmt_paths (Union[str, list, None]) – one or more paths to .gmt files for GMVs initialization.

  • add_nodes (int) – additional fully-connected nodes in the mask.

  • min_genes (int) – minimum gene size for GMVs.

  • max_genes (int) – maximum gene size for GMVs.

  • positive_decoder (bool) – whether to constrain the decoder to positive weights.

  • encode_covariates (bool) – whether to encode covariates along with gene expression.

  • regularizer (str) – which regularization strategy to use (l1, gelnet, mask). Default: mask.

  • reg_kwargs (Optional[dict]) – parameters for regularizer.

  • **kwargs – additional model hyperparameters:

    use_cuda – run on CPU (False) or CUDA (True).

    beta – weight of the KL-divergence term.

    dropout – dropout rate in the model.

    z_dropout – dropout rate for the latent space (to limit correlation between GMVs).
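
A minimal construction sketch. The dataset path, the .gmt file name, and the exact location of setup_anndata() inside the package are assumptions; adjust them to your installation and version.

    import scanpy as sc
    import vega

    # Load and preprocess a single-cell dataset (hypothetical file path).
    adata = sc.read_h5ad("my_dataset.h5ad")
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # Register the AnnData object with VEGA before building the model.
    # The location of setup_anndata may differ across versions.
    vega.utils.setup_anndata(adata)

    # Build the model from a gene-set .gmt file; each gene set becomes a GMV.
    model = vega.VEGA(
        adata,
        gmt_paths="reactome.gmt",   # hypothetical annotation file
        add_nodes=1,
        positive_decoder=True,
        regularizer="mask",
        beta=5e-5,        # KL weight passed through **kwargs (illustrative value)
        dropout=0.1,      # illustrative value
        z_dropout=0.3,    # illustrative value
        use_cuda=False,
    )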

save(path, save_adata=False, save_history=False, overwrite=False, save_regularizer_kwargs=True)[source]

Save model parameters to the given directory. Saving the AnnData object and the training history is optional.

Parameters
  • path (str) – path to save directory

  • save_adata (bool) – whether to save the AnnData object in the save directory

  • save_history (bool) – whether to save the training history in the save directory

  • overwrite (bool) – whether to overwrite an existing save directory

  • save_regularizer_kwargs (bool) – whether to save regularizer hyperparameters (lambda, penalty matrix…) in the save directory
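
A short usage sketch (the target directory is hypothetical):

    # Persist model weights together with the AnnData object and training history.
    model.save(
        "./vega_model/",
        save_adata=True,
        save_history=True,
        overwrite=True,
        save_regularizer_kwargs=True,
    )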

classmethod load(path, adata=None, device=device(type='cpu'), reg_kwargs=None)[source]

Load a model from a directory. If adata is None, attempts to reload the AnnData object from the save directory.

Parameters
  • path (str) – path to save directory

  • adata (Optional[AnnData]) – scanpy single-cell object

  • device (torch.device) – device on which to load the model (CPU or CUDA)

  • reg_kwargs (Optional[dict]) – parameters for the regularizer
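
Reloading on CPU, letting the model pick up the AnnData object saved alongside the weights (directory path hypothetical):

    import torch
    import vega

    model = vega.VEGA.load("./vega_model/", adata=None, device=torch.device("cpu"))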

encode(X, batch_index, cat_covs=None)[source]

Encode data into the latent space (inference step).

Parameters
  • X – input data

  • batch_index – batch information for samples

  • cat_covs – categorical covariates

Returns

  • z – data in latent space

  • mu – mean of variational posterior

  • logvar – log-variance of variational posterior
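
A low-level sketch, assuming X and batch_index are torch tensors laid out as during training; most workflows use to_latent() instead.

    import numpy as np
    import torch

    # Dense expression matrix for a small batch of cells (assumed layout).
    X_np = adata.X[:128]
    X_np = X_np.toarray() if hasattr(X_np, "toarray") else X_np
    X = torch.as_tensor(np.asarray(X_np), dtype=torch.float32)

    # All cells assumed to come from batch 0 here (illustrative).
    batch_index = torch.zeros((X.shape[0], 1))

    z, mu, logvar = model.encode(X, batch_index)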

decode(z, batch_index, cat_covs=None)[source]

Decode data from latent space.

Parameters
  • z – data embedded in latent space

  • batch_index – batch information for samples

  • cat_covs – categorical covariates.

Returns

decoded data

Return type

X_rec
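
Continuing the encode sketch above, decoding maps the latent sample back to gene-expression space:

    # Reconstructed expression values for the same cells.
    X_rec = model.decode(z, batch_index)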

sample_latent(mu, logvar)[source]

Sample the latent space using the reparametrization trick: convert the log-variance to a standard deviation, draw eps from a standard normal, and compute z = mu + std * eps.

Parameters
  • mu – mean of variational posterior

  • logvar – log-variance of variational posterior

Returns

sampled latent space

Return type

eps
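
The reparametrization trick described above corresponds to z = mu + std * eps with eps drawn from N(0, I); a sketch of the equivalent computation:

    import torch

    def reparametrize(mu, logvar):
        # Convert log-variance to standard deviation, then shift and
        # scale a standard-normal sample: z = mu + std * eps.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    z = model.sample_latent(mu, logvar)  # the model's own draw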

to_latent(adata=None, indices=None, return_mean=False)[source]

Project data into the latent space. Inspired by scVI.

Parameters
  • adata (Optional[AnnData]) – scanpy single-cell dataset

  • indices (Optional[list]) – indices of the subset of cells to be encoded

  • return_mean (bool) – whether to use the mean of the multivariate gaussian or samples
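
Typical usage, storing GMV activities in the AnnData object for downstream scanpy analysis. The .obsm key is a user choice, and the returned array may need conversion to numpy depending on the version:

    import numpy as np
    import scanpy as sc

    # Project all cells into GMV space using the posterior means.
    latent = model.to_latent(adata, return_mean=True)
    adata.obsm["X_vega"] = np.asarray(latent)

    # Cluster / visualize on the interpretable latent representation.
    sc.pp.neighbors(adata, use_rep="X_vega")
    sc.tl.umap(adata)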

generative(adata=None, indices=None, use_mean=True)[source]

Generate new samples from input data (encode-decode).

Parameters
  • adata (Optional[AnnData]) – scanpy single-cell dataset

  • indices (Optional[list]) – indices of the subset of cells to be encoded

  • use_mean (bool) – whether to use the mean of the multivariate gaussian or samples
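
A short sketch reconstructing a subset of cells through the encode-decode path (indices are illustrative):

    # Reconstruct expression profiles for the first 100 cells using posterior means.
    reconstructed = model.generative(adata, indices=list(range(100)), use_mean=True)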

differential_activity(groupby, adata=None, group1=None, group2=None, mode='change', delta=2.0, fdr_target=0.05, **kwargs)[source]

Bayesian differential activity procedures for GMVs. Similar to scVI [Lopez2018] Bayesian DGE but for latent variables. Differential results are saved in the adata object and returned as a DataFrame.

Parameters
  • groupby (str) – field of the anndata object used to group cells (e.g. “cell type”)

  • adata (Optional[AnnData]) – scanpy single-cell object. If None, use Anndata attribute of VEGA.

  • group1 (Union[str, list, None]) – reference group(s).

  • group2 (Union[str, list, None]) – outgroup(s).

  • mode (str) – differential activity mode. If “vanilla”, uses [Lopez2018], if “change” uses [Boyeau2019].

  • delta (float) – differential activity threshold for “change” mode.

  • fdr_target (float) – target FDR used to decide which GMVs are called differentially active.

  • **kwargs – optional arguments of the bayesian_differential method.

Returns

Differential activity results

Return type

DataFrame
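
A usage sketch, assuming adata.obs has a 'cell_type' column containing the labels 'B cells' and 'T cells' (all three names hypothetical):

    # Compare GMV activity between two cell types; results are returned as a
    # DataFrame and also stored in the AnnData object.
    da_results = model.differential_activity(
        groupby="cell_type",
        group1="B cells",
        group2="T cells",
        mode="change",
        delta=2.0,
        fdr_target=0.05,
    )
    print(da_results.head())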

bayesian_differential(adata, cell_idx1, cell_idx2, n_samples=5000, use_permutations=True, n_permutations=5000, mode='change', delta=2.0, alpha=0.66, random_seed=False)[source]

Run Bayesian differential activity testing in the latent space. Returns Bayes factors for all GMVs.

Parameters
  • adata (AnnData) – anndata single-cell object.

  • cell_idx1 (list) – indices of group 1.

  • cell_idx2 (list) – indices of group 2.

  • n_samples (int) – number of samples to draw from the latent space.

  • use_permutations (bool) – whether to use permutations when computing the double integral.

  • n_permutations (int) – number of permutations for MC integral.

  • mode (str) – differential activity test strategy. “vanilla” corresponds to [Lopez2018], “change” to [Boyeau2019].

  • delta (float) – for mode “change”, the differential threshold to be used.

  • random_seed (bool) – seed for reproducibility.

Returns

dictionary with results (Bayes Factor, Mean Absolute Difference)

Return type

res
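
differential_activity() dispatches to this method; a direct call looks like the sketch below, with group indices derived from a hypothetical adata.obs column:

    import numpy as np

    idx1 = np.where(adata.obs["cell_type"] == "B cells")[0].tolist()
    idx2 = np.where(adata.obs["cell_type"] == "T cells")[0].tolist()

    # Returns a dict with, e.g., Bayes factors and mean absolute differences.
    res = model.bayesian_differential(
        adata,
        cell_idx1=idx1,
        cell_idx2=idx2,
        n_samples=5000,
        mode="change",
        delta=2.0,
    )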

forward(tensors)[source]

Forward pass through full network.

Parameters

tensors – input data

Returns

dictionary of output tensors

Return type

out_tensors

vae_loss(model_input, model_output)[source]

Custom loss for the beta-VAE.

Parameters
  • model_input – dict with input values

  • model_output – dict with output values

Returns

Return type

loss value for current batch

train_vega(learning_rate=0.0001, n_epochs=500, train_size=1.0, batch_size=128, shuffle=True, use_gpu=False, **kwargs)[source]

Main method to train VEGA.

Parameters
  • learning_rate (float) – learning rate

  • n_epochs (int) – number of epochs to train model

  • train_size (float) – a number between 0 and 1 indicating the proportion of data used for training; the test size is set to 1 - train_size

  • batch_size (int) – number of samples per batch

  • shuffle (bool) – whether to shuffle samples or not

  • use_gpu (bool) – whether to use GPU

  • **kwargs – other keyword arguments of the _train_model() method, like the early stopping patience
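
A typical training call with a 90/10 train/test split; any early-stopping options are forwarded through **kwargs and their exact names depend on the _train_model() implementation:

    model.train_vega(
        learning_rate=1e-4,
        n_epochs=500,
        train_size=0.9,
        batch_size=128,
        shuffle=True,
        use_gpu=False,
    )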