If one runs the Jacobian calculation with sub-sampling enabled, e.g. dyn.vf.jacobian(adata, regulators=top_pca_genes, effectors=top_pca_genes, sampling="velocity", sample_ncells=2000), subsequent perturbation calls throw an IndexError: index 2000 is out of bounds for axis 2 with size 2000 (full traceback below).
Upon analysis, this appears to be due to the following code in the dyn.pd.perturbation function, which assumes that adata.n_obs equals the size of the third (cell) dimension of the Jacobian tensor Js.
Note these are lines 282 - 291 of dynamo/prediction/perturbation.py.
if pertubation_method.lower() in ["j_delta_x", "j_x_prime", "j_jv"]:
    if pertubation_method.lower() == "j_delta_x":
        delta_X = X_perturb_pca - X_pca
    elif pertubation_method.lower() == "j_x_prime":
        delta_X = X_perturb_pca
    elif pertubation_method.lower() == "j_jv":
        tmp = X_perturb_pca - X_pca
        delta_X = np.zeros_like(X_pca)
        for i in np.arange(adata.n_obs):
            delta_X[i, :] = Js[:, :, i].dot(tmp[i] * J_jv_delta_t)

    # The next line throws the error: `i` can be far greater than the cell
    # dimension of Js when adata contains more cells than were used to
    # calculate the Jacobian.
    for i in np.arange(adata.n_obs):
        delta_Y[i, :] = Js[:, :, i].dot(delta_X[i] * delta_t)
If one attempts to fix this by only calculating delta_Y for the indices that are present in Js, one ends up with a very sparse perturbation prediction: since delta_Y is not computed for the non-sampled cells, perturbation vectors exist for only a handful of cells across the dataset, rendering streamline plots of the predicted perturbation nearly useless (a toy sketch of this is shown below).
I am afraid this is where my programming acumen reaches its limit. I am unsure how to fix this bug.
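To make the shape mismatch concrete, here is a self-contained toy sketch (not dynamo code; all array names, shapes, and the random sampling are made up for illustration) of why the partial fix described above leaves most rows of delta_Y empty:
import numpy as np

# Toy setup: 5,000 cells in adata, but the Jacobian was only computed for
# 2,000 sampled cells, so Js has one slice per *sampled* cell.
n_obs, n_pcs, n_sampled = 5000, 30, 2000
rng = np.random.default_rng(0)

X_pca = rng.normal(size=(n_obs, n_pcs))                         # PCA embedding of all cells
Js = rng.normal(size=(n_pcs, n_pcs, n_sampled))                 # Jacobian slices, sampled cells only
delta_X = rng.normal(size=(n_obs, n_pcs))                       # perturbation in PCA space
delta_t = 1.0
sampled_idx = rng.choice(n_obs, size=n_sampled, replace=False)  # which cells were sampled

# The current loop in perturbation() iterates over all n_obs cells and fails
# once i >= n_sampled. The "partial fix" only fills rows for the sampled cells:
delta_Y = np.zeros_like(delta_X)
for k, i in enumerate(sampled_idx):
    delta_Y[i, :] = Js[:, :, k] @ (delta_X[i] * delta_t)

# Every non-sampled row of delta_Y stays zero, which is why the resulting
# perturbation field (and the streamline plot) looks so sparse.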
Code used to Reproduce
import dynamo as dyn
# Assuming adata is already loaded and the requisite prior steps (e.g. pre-processing, dynamics calculation, vector field calculation, etc.) have been completed
dyn.pp.top_pca_genes(adata, n_top_genes=1999)
top_pca_genes = adata.var.index[adata.var.top_pca_genes]
dyn.vf.jacobian(adata, regulators=top_pca_genes, effectors=top_pca_genes, cores = 28, sampling="velocity", sample_ncells=2000)
dyn.pd.perturbation(adata, genes = "IRX3", expression = [100], emb_basis="umap")
Traceback
IndexError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 dyn.pd.perturbation(adata, genes = "IRX3", expression = [100], emb_basis="umap")
      2 dyn.pl.streamline_plot(adata, color="CellClass", basis="umap_perturbation", show_legend=True, theme = "blue", enforce = True)
      3 dyn.pd.rank_perturbation_genes(adata, pkey = "j_delta_x_perturbation", output_values = True)

File ~/miniforge3/envs/dynamo_humanbrain/lib/python3.9/site-packages/dynamo/prediction/perturbation.py:290, in perturbation(adata, genes, expression, perturb_mode, cells, zero_perturb_genes_vel, pca_key, PCs_key, pca_mean_key, basis, emb_basis, jac_key, X_pca, delta_Y, projection_method, pertubation_method, J_jv_delta_t, delta_t, add_delta_Y_key, add_transition_key, add_velocity_key, add_embedding_key)
    287 delta_X[i, :] = Js[:, :, i].dot(tmp[i] * J_jv_delta_t)
    289 for i in np.arange(adata.n_obs):
--> 290 delta_Y[i, :] = Js[:, :, i].dot(delta_X[i] * delta_t)
    292 if add_delta_Y_key is None:
    293     add_delta_Y_key = pertubation_method + "_perturbation"

IndexError: index 2000 is out of bounds for axis 2 with size 2000
Expected behavior
Can dynamo calculate the perturbation using only the cells for which a Jacobian was calculated and then impute the resultant vectors to the nearest neighbors?
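Something along these lines is what I have in mind, purely as a rough sketch (the k-nearest-neighbour search in PCA space and the simple averaging are my own assumptions, not an existing dynamo API; it continues the toy variables from the sketch above):
from sklearn.neighbors import NearestNeighbors

def impute_to_all_cells(X_pca, sampled_idx, delta_Y_sampled, k=10):
    # Propagate perturbation vectors from the sampled cells to every cell by
    # averaging the vectors of its k nearest sampled neighbours in PCA space.
    nn = NearestNeighbors(n_neighbors=k).fit(X_pca[sampled_idx])
    _, neighbor_pos = nn.kneighbors(X_pca)  # positions into sampled_idx, shape (n_obs, k)
    return delta_Y_sampled[neighbor_pos].mean(axis=1)

# Usage with the toy arrays above:
delta_Y_full = impute_to_all_cells(X_pca, sampled_idx, delta_Y[sampled_idx], k=10)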
Session info:
-----
anndata 0.10.8
dynamo 1.3.2
gseapy 1.1.3
matplotlib 3.8.0
networkx 3.1
numpy 1.23.5
pandas 2.1.1
scanpy 1.10.3
seaborn 0.13.2
session_info 1.0.0
-----
PIL 10.0.1
asttokens NA
attr 23.1.0
attrs 23.1.0
backcall 0.2.0
beta_ufunc NA
binom_ufunc NA
biothings_client 0.3.1
bottleneck 1.3.5
brotli 1.0.9
cairo 1.23.0
cattr NA
cattrs NA
certifi 2024.08.30
cffi 1.16.0
charset_normalizer 2.0.4
colorcet 3.0.1
comm 0.2.1
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.7
decorator 5.1.1
exceptiongroup 1.0.4
executing 0.8.3
h5py 3.9.0
hypergeom_ufunc NA
idna 3.4
igraph 0.11.5
importlib_metadata NA
importlib_resources NA
invgauss_ufunc NA
ipykernel 6.25.0
ipywidgets 8.1.2
itsdangerous 2.2.0
jedi 0.18.1
joblib 1.2.0
kiwisolver 1.4.4
legacy_api_wrap NA
leidenalg 0.10.2
llvmlite 0.42.0
louvain 0.8.0
matplotlib_inline 0.1.6
mpl_toolkits NA
mygene 3.2.2
natsort 7.1.1
nbinom_ufunc NA
ncf_ufunc NA
nct_ufunc NA
ncx2_ufunc NA
numba 0.59.1
numdifftools 0.9.41
numexpr 2.8.7
packaging 23.1
parso 0.8.3
patsy 0.5.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 2.5.2
prompt_toolkit 3.0.36
psutil 5.9.0
ptyprocess 0.7.0
pure_eval 0.2.2
pycparser 2.21
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.15.1
pynndescent 0.5.10
pyparsing 3.0.9
pytz 2023.3.post1
requests 2.31.0
requests_cache 1.2.0
scipy 1.10.1
setuptools 68.0.0
six 1.16.0
skewnorm_ufunc NA
sklearn 1.3.0
socks 1.7.1
stack_data 0.2.0
statsmodels 0.14.0
texttable 1.6.4
threadpoolctl 2.2.0
tornado 6.3.3
tqdm 4.65.0
traitlets 5.7.1
typing_extensions NA
ujson 5.10.0
umap 0.5.4
url_normalize 1.4.3
urllib3 1.26.18
vscode NA
wcwidth 0.2.5
yaml 6.0.1
zipp NA
zmq 25.1.0
zoneinfo NA
-----
IPython 8.15.0
jupyter_client 8.6.0
jupyter_core 5.5.0
-----
Python 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
Linux-6.8.0-45-generic-x86_64-with-glibc2.39
-----
Session information updated at 2024-10-03 08:05
Thanks for raising this issue. Very interesting question. I'm curious about your use case for using a sampled Jacobian in perturbation analyses rather than calculating the Jacobian from the entire dataset.
The first part of your proposed feature, an option to calculate a "localized" perturbed response with the sub-sampled Jacobian (or any given subset of cells), is relatively straightforward to implement. However, the approach for imputing these localized results to the entire dataset requires further discussion. Would you mind sharing more details or relevant literature on your use case/approach?
I down-sample the Jacobian calculation because my current compute environment has only 128 GB of RAM. Attempting to calculate the Jacobian on more than ~5,000 cells usually crashes the environment due to memory overflow.
Thanks for letting me know. In that case, the challenge will be how to impute the results across the entire dataset. Let me take a closer look at this issue, and I'll get back to you later. In the meantime, feel free to share any additional thoughts you might have.