VEGA
- class vega.VEGA(adata, gmt_paths=None, add_nodes=1, min_genes=0, max_genes=5000, positive_decoder=True, encode_covariates=False, regularizer='mask', reg_kwargs=None, **kwargs)
Constructor for class VEGA (VAE Enhanced by Gene Annotations).
- Parameters
adata (AnnData) – scanpy single-cell object. Run setup_anndata() before passing it to VEGA.
gmt_paths (Union[str, list, None]) – one or more paths to .gmt files used to initialize the GMVs.
add_nodes (int) – number of additional fully connected nodes in the mask.
min_genes (int) – minimum gene set size for GMVs.
max_genes (int) – maximum gene set size for GMVs.
positive_decoder (bool) – whether to constrain the decoder to positive weights.
encode_covariates (bool) – whether to encode covariates along with gene expression.
regularizer (str) – which regularization strategy to use (l1, gelnet, mask). Default: mask.
reg_kwargs (Optional[dict]) – parameters for the regularizer.
**kwargs – additional keyword arguments:
- use_cuda
whether to use the CPU (False) or CUDA (True).
- beta
weight of the KL-divergence term in the loss.
- dropout
dropout rate in the model.
- z_dropout
dropout rate for the latent space (for correlation).
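As an illustration of how gmt_paths, add_nodes, min_genes and max_genes could interact, the sketch below builds a binary gene-by-module mask from GMT-formatted lines (name, description, then gene symbols, tab-separated). The helpers parse_gmt and build_mask and the UNANNOTATED_k column names are hypothetical and written for this example; the library's own mask construction may differ, in particular in how gene set size is counted.

```python
from typing import Dict, List, Tuple


def parse_gmt(lines: List[str]) -> Dict[str, List[str]]:
    """Parse GMT lines: name <tab> description <tab> gene1 <tab> gene2 ..."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        gene_sets[fields[0]] = fields[2:]
    return gene_sets


def build_mask(
    genes: List[str],
    gene_sets: Dict[str, List[str]],
    add_nodes: int = 1,
    min_genes: int = 0,
    max_genes: int = 5000,
) -> Tuple[List[List[int]], List[str]]:
    """Return (mask, names); mask[i][j] == 1 if gene i belongs to module j."""
    names, columns = [], []
    for name, members in gene_sets.items():
        member_set = set(members)
        column = [1 if g in member_set else 0 for g in genes]
        # Assumption: set size is counted over genes present in the dataset.
        if min_genes <= sum(column) <= max_genes:
            names.append(name)
            columns.append(column)
    # Additional nodes are connected to every gene (fully connected).
    for k in range(add_nodes):
        names.append(f"UNANNOTATED_{k}")  # hypothetical naming
        columns.append([1] * len(genes))
    # Transpose the per-module columns into one row per gene.
    mask = [[col[i] for col in columns] for i in range(len(genes))]
    return mask, names


gmt_lines = [
    "PATHWAY_A\tna\tG1\tG2\tG3",
    "PATHWAY_B\tna\tG3\tG4",
    "PATHWAY_C\tna\tG9",
]
genes = ["G1", "G2", "G3", "G4", "G5"]
mask, names = build_mask(genes, parse_gmt(gmt_lines), add_nodes=1, min_genes=2)
```

In this toy input, PATHWAY_C has no gene in the dataset and is dropped by min_genes=2, while the extra node is connected to every gene, including G5, which no pathway annotates.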
- save(path, save_adata=False, save_history=False, overwrite=False, save_regularizer_kwargs=True)
Save model parameters to input directory. Saving the AnnData object and training history is optional.
- Parameters
path (str) – path to save directory.
save_adata (bool) – whether to save the AnnData object in the save directory.
save_history (bool) – whether to save the training history in the save directory.
save_regularizer_kwargs (bool) – whether to save regularizer hyperparameters (lambda, penalty matrix…) in the save directory.
- classmethod load(path, adata=None, device=device(type='cpu'), reg_kwargs=None)
Load model from directory. If adata=None, try to reload the AnnData object from the save directory.
- Parameters
path (str) – path to save directory.
adata (Optional[AnnData]) – scanpy single-cell object.
device (device) – CPU or CUDA.
- encode(X, batch_index, cat_covs=None)
Encode data in latent space (Inference step).
- Parameters
X – input data
batch_index – batch information for samples
cat_covs – categorical covariates
- Returns
z – data in latent space
mu – mean of variational posterior
logvar – log-variance of variational posterior
- decode(z, batch_index, cat_covs=None)
Decode data from latent space.
- Parameters
z – data embedded in latent space
batch_index – batch information for samples
cat_covs – categorical covariates
- Returns
X_rec – decoded data
- sample_latent(mu, logvar)
Sample the latent space with the reparametrization trick. First convert the log-variance to a standard deviation, then draw noise from a standard normal N(0, 1) and compute z.
- Parameters
mu – mean of variational posterior
logvar – log-variance of variational posterior
- Returns
eps – sampled latent space
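The reparametrization step described above can be sketched without any deep-learning framework. This is a minimal stdlib illustration, not the library's tensor implementation; the optional rng argument is an assumption added here so the example is reproducible.

```python
import math
import random


def sample_latent(mu, logvar, rng=random):
    """Draw z = mu + std * eps with eps ~ N(0, 1), element-wise."""
    z = []
    for m, lv in zip(mu, logvar):
        std = math.exp(0.5 * lv)   # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)  # standard normal noise
        z.append(m + std * eps)
    return z


mu = [0.0, 1.0, -2.0]
logvar = [0.0, -1.0, -60.0]  # the last dimension has negligible variance
z = sample_latent(mu, logvar, random.Random(0))
```

Because the randomness is isolated in eps, z stays a deterministic function of mu and logvar, which is what lets gradients flow through the sampling step during training.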
- to_latent(adata=None, indices=None, return_mean=False)
Project data into latent space. Inspired by scVI.
- Parameters
adata (Optional[AnnData]) – scanpy single-cell dataset.
indices (Optional[list]) – indices of the subset of cells to be encoded.
return_mean (bool) – whether to use the mean of the multivariate Gaussian or samples.
- generative(adata=None, indices=None, use_mean=True)
Generate new samples from input data (encode-decode).
- Parameters
adata (Optional[AnnData]) – scanpy single-cell dataset.
indices (Optional[list]) – indices of the subset of cells to be encoded.
use_mean (bool) – whether to use the mean of the multivariate Gaussian or samples.
- differential_activity(groupby, adata=None, group1=None, group2=None, mode='change', delta=2.0, fdr_target=0.05, **kwargs)
Bayesian differential activity procedures for GMVs. Similar to scVI [Lopez2018] Bayesian DGE but for latent variables. Differential results are saved in the adata object and returned as a DataFrame.
- Parameters
groupby (str) – AnnData field used to group cells (e.g. “cell type”).
adata (Optional[AnnData]) – scanpy single-cell object. If None, use the AnnData attribute of VEGA.
group1 (Union[str, list, None]) – reference group(s).
group2 (Union[str, list, None]) – outgroup(s).
mode (str) – differential activity mode. If “vanilla”, uses [Lopez2018]; if “change”, uses [Boyeau2019].
delta (float) – differential activity threshold for “change” mode.
fdr_target (float) – target FDR for calling differential activity.
**kwargs – optional arguments of the bayesian_differential method.
- Returns
Differential activity results, as a DataFrame.
- bayesian_differential(adata, cell_idx1, cell_idx2, n_samples=5000, use_permutations=True, n_permutations=5000, mode='change', delta=2.0, alpha=0.66, random_seed=False)
Run Bayesian differential expression in latent space. Returns the Bayes factor of all factors.
- Parameters
adata (AnnData) – AnnData single-cell object.
cell_idx1 (list) – indices of group 1.
cell_idx2 (list) – indices of group 2.
n_samples (int) – number of samples to draw from the latent space.
use_permutations (bool) – whether to use permutations when computing the double integral.
n_permutations (int) – number of permutations for the MC integral.
mode (str) – differential activity test strategy. “vanilla” corresponds to [Lopez2018], “change” to [Boyeau2019].
delta (float) – for mode “change”, the differential threshold to be used.
random_seed (bool) – seed for reproducibility.
- Returns
res – dictionary with results (Bayes Factor, Mean Absolute Difference)
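To make the two modes concrete, here is a rough sketch of a log Bayes factor for a single latent variable, following the general recipe of [Lopez2018] (“vanilla”) and [Boyeau2019] (“change”). The function bayes_factor, its positional pairing of samples, and the eps stabilizer are assumptions for illustration; the library's own sampling and permutation scheme is not reproduced here.

```python
import math
import random


def bayes_factor(z1, z2, mode="change", delta=2.0, eps=1e-8):
    """Log Bayes factor from paired posterior samples of one latent variable."""
    if mode == "change":
        # Fraction of pairs whose absolute difference exceeds delta.
        hits = sum(abs(a - b) > delta for a, b in zip(z1, z2))
    elif mode == "vanilla":
        # Fraction of pairs where group 1 is larger than group 2.
        hits = sum(a > b for a, b in zip(z1, z2))
    else:
        raise ValueError(f"unknown mode: {mode}")
    p = hits / len(z1)
    # eps keeps the logarithms finite when p is exactly 0 or 1.
    return math.log(p + eps) - math.log(1.0 - p + eps)


rng = random.Random(0)
n = 2000
shifted = [rng.gauss(5.0, 1.0) for _ in range(n)]    # group with higher activity
baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]   # reference group
baseline2 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # same distribution as the reference

bf_diff = bayes_factor(shifted, baseline, mode="change", delta=2.0)
bf_same = bayes_factor(baseline2, baseline, mode="change", delta=2.0)
```

A positive value favours a difference between the groups and a negative value favours no difference larger than delta. In "change" mode the sign of the difference is ignored, so the direction has to be read from another statistic such as the mean difference.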
- forward(tensors)
Forward pass through full network.
- Parameters
tensors – input data
- Returns
out_tensors – dictionary of output tensors
- vae_loss(model_input, model_output)
Custom loss for beta-VAE.
- Parameters
model_input – dict with input values
model_output – dict with output values
- Returns
loss value for the current batch
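For orientation, a beta-VAE objective combines a reconstruction term with a KL-divergence term weighted by beta. The sketch below computes it for one sample in plain Python. The squared-error reconstruction term and the function names are assumptions made for this example and may not match the loss the library actually uses.

```python
import math


def kl_divergence(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


def beta_vae_loss(x, x_rec, mu, logvar, beta=1.0):
    """Reconstruction error plus beta-weighted KL term for a single sample."""
    # Assumption: squared error as the reconstruction term.
    rec = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return rec + beta * kl_divergence(mu, logvar)


x = [1.0, 2.0]
x_rec = [1.0, 2.5]
loss_prior = beta_vae_loss(x, x_rec, mu=[0.0, 0.0], logvar=[0.0, 0.0])
loss_shifted = beta_vae_loss(x, x_rec, mu=[1.0, 0.0], logvar=[0.0, 0.0], beta=2.0)
```

When the posterior equals the standard normal prior, the KL term vanishes and only the reconstruction error remains. Raising beta increases the penalty for moving the posterior away from the prior.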
- train_vega(learning_rate=0.0001, n_epochs=500, train_size=1.0, batch_size=128, shuffle=True, use_gpu=False, **kwargs)
Main method to train VEGA.
- Parameters
learning_rate (float) – learning rate.
n_epochs (int) – number of epochs to train the model.
train_size (float) – a number between 0 and 1 giving the proportion of data used for training. The test size is set to 1 - train_size.
batch_size (int) – number of samples per batch.
shuffle (bool) – whether to shuffle samples.
use_gpu (bool) – whether to use the GPU.
**kwargs – other keyword arguments of the _train_model() method, such as the early stopping patience.
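The effect of train_size, batch_size and shuffle on the data seen per epoch can be pictured with the sketch below. The helpers split_indices and iter_batches and the seed argument are hypothetical; this is one plausible reading of the parameters, not the library's data loading code.

```python
import random


def split_indices(n_cells, train_size=1.0, shuffle=True, seed=0):
    """Split cell indices into train and test parts of relative size train_size."""
    idx = list(range(n_cells))
    if shuffle:
        random.Random(seed).shuffle(idx)  # assumption: shuffle before splitting
    n_train = round(train_size * n_cells)
    return idx[:n_train], idx[n_train:]


def iter_batches(indices, batch_size=128):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


train_idx, test_idx = split_indices(1000, train_size=0.8)
batches = list(iter_batches(train_idx, batch_size=128))
```

With 1000 cells and train_size=0.8, 800 cells go to training and 200 to testing. At batch_size=128 that gives six full batches and a final batch of 32 cells.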