VEGA

class vega.VEGA(adata, gmt_paths=None, add_nodes=1, min_genes=0, max_genes=5000, positive_decoder=True, encode_covariates=False, regularizer='mask', reg_kwargs=None, **kwargs)[source]

Constructor for class VEGA (VAE Enhanced by Gene Annotations).

Parameters
  • adata (AnnData) – scanpy single-cell object. Please run setup_anndata() before passing to VEGA.

  • gmt_paths (Union[str, list, None]) – one or more paths to .gmt files for GMVs initialization.

  • add_nodes (int) – additional fully-connected nodes in the mask.

  • min_genes (int) – minimum gene size for GMVs.

  • max_genes (int) – maximum gene size for GMVs.

  • positive_decoder (bool) – whether to constrain the decoder to positive weights.

  • encode_covariates (bool) – whether to encode covariates along with gene expression.

  • regularizer (str) – which regularization strategy to use (l1, gelnet, mask). Default: mask.

  • reg_kwargs (Optional[dict]) – parameters for regularizer.

  • **kwargs – additional model hyperparameters:

    use_cuda – run on CPU (False) or CUDA (True).

    beta – weight of the KL-divergence term.

    dropout – dropout rate in the model.

    z_dropout – dropout rate for the latent space (to limit correlation between GMVs).
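
A minimal construction sketch. The dataset path, the .gmt file name, and the exact location of setup_anndata() inside the package are assumptions; adjust them to your installation and version.

    import scanpy as sc
    import vega

    # Load and preprocess a single-cell dataset (hypothetical file path).
    adata = sc.read_h5ad("my_dataset.h5ad")
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # Register the AnnData object with VEGA before building the model.
    # The location of setup_anndata may differ across versions.
    vega.utils.setup_anndata(adata)

    # Build the model from a gene-set .gmt file; each gene set becomes a GMV.
    model = vega.VEGA(
        adata,
        gmt_paths="reactome.gmt",   # hypothetical annotation file
        add_nodes=1,
        positive_decoder=True,
        regularizer="mask",
        beta=5e-5,        # KL weight passed through **kwargs (illustrative value)
        dropout=0.1,      # illustrative value
        z_dropout=0.3,    # illustrative value
        use_cuda=False,
    )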

save(path, save_adata=False, save_history=False, overwrite=False, save_regularizer_kwargs=True)[source]

Save model parameters to the given directory. Saving the AnnData object and the training history is optional.

Parameters
  • path (str) – path to save directory

  • save_adata (bool) – whether to save the AnnData object in the save directory

  • save_history (bool) – whether to save the training history in the save directory

  • overwrite (bool) – whether to overwrite an existing save directory

  • save_regularizer_kwargs (bool) – whether to save regularizer hyperparameters (lambda, penalty matrix…) in the save directory
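
A short usage sketch (the target directory is hypothetical):

    # Persist model weights together with the AnnData object and training history.
    model.save(
        "./vega_model/",
        save_adata=True,
        save_history=True,
        overwrite=True,
        save_regularizer_kwargs=True,
    )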

classmethod load(path, adata=None, device=device(type='cpu'), reg_kwargs=None)[source]

Load a model from a directory. If adata is None, attempts to reload the AnnData object from the save directory.

Parameters
  • path (str) – path to save directory

  • adata (Optional[AnnData]) – scanpy single-cell object

  • device (torch.device) – device on which to load the model (CPU or CUDA)

  • reg_kwargs (Optional[dict]) – parameters for the regularizer
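
Reloading on CPU, letting the model pick up the AnnData object saved alongside the weights (directory path hypothetical):

    import torch
    import vega

    model = vega.VEGA.load("./vega_model/", adata=None, device=torch.device("cpu"))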

encode(X, batch_index, cat_covs=None)[source]

Encode data into the latent space (inference step).

Parameters
  • X – input data

  • batch_index – batch information for samples

  • cat_covs – categorical covariates

Returns

  • z – data in latent space

  • mu – mean of variational posterior

  • logvar – log-variance of variational posterior
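
A low-level sketch, assuming X and batch_index are torch tensors laid out as during training; most workflows use to_latent() instead.

    import numpy as np
    import torch

    # Dense expression matrix for a small batch of cells (assumed layout).
    X_np = adata.X[:128]
    X_np = X_np.toarray() if hasattr(X_np, "toarray") else X_np
    X = torch.as_tensor(np.asarray(X_np), dtype=torch.float32)

    # All cells assumed to come from batch 0 here (illustrative).
    batch_index = torch.zeros((X.shape[0], 1))

    z, mu, logvar = model.encode(X, batch_index)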

decode(z, batch_index, cat_covs=None)[source]

Decode data from latent space.

Parameters
  • z – data embedded in latent space

  • batch_index – batch information for samples

  • cat_covs – categorical covariates.

Returns

decoded data

Return type

X_rec
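
Continuing the encode sketch above, decoding maps the latent sample back to gene-expression space:

    # Reconstructed expression values for the same cells.
    X_rec = model.decode(z, batch_index)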

sample_latent(mu, logvar)[source]

Sample the latent space using the reparametrization trick: convert the log-variance to a standard deviation, draw eps from a standard normal, and compute z = mu + std * eps.

Parameters
  • mu – mean of variational posterior

  • logvar – log-variance of variational posterior

Returns

sampled latent space

Return type

eps
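
The reparametrization trick described above corresponds to z = mu + std * eps with eps drawn from N(0, I); a sketch of the equivalent computation:

    import torch

    def reparametrize(mu, logvar):
        # Convert log-variance to standard deviation, then shift and
        # scale a standard-normal sample: z = mu + std * eps.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    z = model.sample_latent(mu, logvar)  # the model's own draw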

to_latent(adata=None, indices=None, return_mean=False)[source]

Project data into the latent space. Inspired by scVI.

Parameters
  • adata (Optional[AnnData]) – scanpy single-cell dataset

  • indices (Optional[list]) – indices of the subset of cells to be encoded

  • return_mean (bool) – whether to use the mean of the multivariate gaussian or samples
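
Typical usage, storing GMV activities in the AnnData object for downstream scanpy analysis. The .obsm key is a user choice, and the returned array may need conversion to numpy depending on the version:

    import numpy as np
    import scanpy as sc

    # Project all cells into GMV space using the posterior means.
    latent = model.to_latent(adata, return_mean=True)
    adata.obsm["X_vega"] = np.asarray(latent)

    # Cluster / visualize on the interpretable latent representation.
    sc.pp.neighbors(adata, use_rep="X_vega")
    sc.tl.umap(adata)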

generative(adata=None, indices=None, use_mean=True)[source]

Generate new samples from input data (encode-decode).

Parameters
  • adata (Optional[AnnData]) – scanpy single-cell dataset

  • indices (Optional[list]) – indices of the subset of cells to be encoded

  • use_mean (bool) – whether to use the mean of the multivariate gaussian or samples
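
A short sketch reconstructing a subset of cells through the encode-decode path (indices are illustrative):

    # Reconstruct expression profiles for the first 100 cells using posterior means.
    reconstructed = model.generative(adata, indices=list(range(100)), use_mean=True)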

differential_activity(groupby, adata=None, group1=None, group2=None, mode='change', delta=2.0, fdr_target=0.05, **kwargs)[source]

Bayesian differential activity procedures for GMVs. Similar to scVI [Lopez2018] Bayesian DGE but for latent variables. Differential results are saved in the adata object and returned as a DataFrame.

Parameters
  • groupby (str) – field of the anndata object used to group cells (e.g. “cell type”)

  • adata (Optional[AnnData]) – scanpy single-cell object. If None, use Anndata attribute of VEGA.

  • group1 (Union[str, list, None]) – reference group(s).

  • group2 (Union[str, list, None]) – outgroup(s).

  • mode (str) – differential activity mode. If “vanilla”, uses [Lopez2018], if “change” uses [Boyeau2019].

  • delta (float) – differential activity threshold for “change” mode.

  • fdr_target (float) – target FDR used to decide which GMVs are called differentially active.

  • **kwargs – optional arguments of the bayesian_differential method.

Returns

Differential activity results

Return type

DataFrame
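
A usage sketch, assuming adata.obs has a 'cell_type' column containing the labels 'B cells' and 'T cells' (all three names hypothetical):

    # Compare GMV activity between two cell types; results are returned as a
    # DataFrame and also stored in the AnnData object.
    da_results = model.differential_activity(
        groupby="cell_type",
        group1="B cells",
        group2="T cells",
        mode="change",
        delta=2.0,
        fdr_target=0.05,
    )
    print(da_results.head())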

bayesian_differential(adata, cell_idx1, cell_idx2, n_samples=5000, use_permutations=True, n_permutations=5000, mode='change', delta=2.0, alpha=0.66, random_seed=False)[source]

Run Bayesian differential activity testing in the latent space. Returns Bayes factors for all GMVs.

Parameters
  • adata (AnnData) – anndata single-cell object.

  • cell_idx1 (list) – indices of group 1.

  • cell_idx2 (list) – indices of group 2.

  • n_samples (int) – number of samples to draw from the latent space.

  • use_permutations (bool) – whether to use permutations when computing the double integral.

  • n_permutations (int) – number of permutations for MC integral.

  • mode (str) – differential activity test strategy. “vanilla” corresponds to [Lopez2018], “change” to [Boyeau2019].

  • delta (float) – for mode “change”, the differential threshold to be used.

  • random_seed (bool) – seed for reproducibility.

Returns

dictionary with results (Bayes Factor, Mean Absolute Difference)

Return type

res
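
differential_activity() dispatches to this method; a direct call looks like the sketch below, with group indices derived from a hypothetical adata.obs column:

    import numpy as np

    idx1 = np.where(adata.obs["cell_type"] == "B cells")[0].tolist()
    idx2 = np.where(adata.obs["cell_type"] == "T cells")[0].tolist()

    # Returns a dict with, e.g., Bayes factors and mean absolute differences.
    res = model.bayesian_differential(
        adata,
        cell_idx1=idx1,
        cell_idx2=idx2,
        n_samples=5000,
        mode="change",
        delta=2.0,
    )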

forward(tensors)[source]

Forward pass through full network.

Parameters

tensors – input data

Returns

dictionary of output tensors

Return type

out_tensors

vae_loss(model_input, model_output)[source]

Custom loss for the beta-VAE.

Parameters
  • model_input – dict with input values

  • model_output – dict with output values

Returns

Return type

loss value for current batch

train_vega(learning_rate=0.0001, n_epochs=500, train_size=1.0, batch_size=128, shuffle=True, use_gpu=False, **kwargs)[source]

Main method to train VEGA.

Parameters
  • learning_rate (float) – learning rate

  • n_epochs (int) – number of epochs to train model

  • train_size (float) – a number between 0 and 1 indicating the proportion of data used for training; the test size is set to 1 - train_size

  • batch_size (int) – number of samples per batch

  • shuffle (bool) – whether to shuffle samples or not

  • use_gpu (bool) – whether to use GPU

  • **kwargs – other keyword arguments of the _train_model() method, like the early stopping patience
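
A typical training call with a 90/10 train/test split; any early-stopping options are forwarded through **kwargs and their exact names depend on the _train_model() implementation:

    model.train_vega(
        learning_rate=1e-4,
        n_epochs=500,
        train_size=0.9,
        batch_size=128,
        shuffle=True,
        use_gpu=False,
    )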