Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perturbation will not run when only using n_cells for Jacobian calculation #705

Open
styounes opened this issue Oct 3, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@styounes
Copy link

styounes commented Oct 3, 2024

If one runs the Jacobian calculation with sub-sampling enabled such as follows: dyn.vf.jacobian(adata, regulators=top_pca_genes, effectors=top_pca_genes, sampling="velocity", sample_ncells=2000), subsequent perturbation calls will throw a KeyError:

{
	"name": "IndexError",
	"message": "index 2000 is out of bounds for axis 2 with size 2000",
	"stack": "---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 dyn.pd.perturbation(adata, genes = \"IRX3\", expression = [100], emb_basis=\"umap\")
      2 dyn.pl.streamline_plot(adata, color=\"CellClass\", basis=\"umap_perturbation\", show_legend=True, theme = \"blue\", enforce = True)
      3 dyn.pd.rank_perturbation_genes(adata, pkey = \"j_delta_x_perturbation\", output_values = True)

File ~/miniforge3/envs/dynamo_humanbrain/lib/python3.9/site-packages/dynamo/prediction/perturbation.py:290, in perturbation(adata, genes, expression, perturb_mode, cells, zero_perturb_genes_vel, pca_key, PCs_key, pca_mean_key, basis, emb_basis, jac_key, X_pca, delta_Y, projection_method, pertubation_method, J_jv_delta_t, delta_t, add_delta_Y_key, add_transition_key, add_velocity_key, add_embedding_key)
    287                 delta_X[i, :] = Js[:, :, i].dot(tmp[i] * J_jv_delta_t)
    289         for i in np.arange(adata.n_obs):
--> 290             delta_Y[i, :] = Js[:, :, i].dot(delta_X[i] * delta_t)
    292 if add_delta_Y_key is None:
    293     add_delta_Y_key = pertubation_method + \"_perturbation\"

IndexError: index 2000 is out of bounds for axis 2 with size 2000"
}

Upon analysis, this appears to be due to this code in the dyn.pd.perturbation function which assumes that the dimensions of adata.n_obs and the third dimension of the Jacobian (Js) are equal:

Note these are lines 282 - 291 in dyn.pd.perturbation.

if pertubation_method.lower() in ["j_delta_x", "j_x_prime", "j_jv"]:
    if pertubation_method.lower() == "j_delta_x":
        delta_X = X_perturb_pca - X_pca
    elif pertubation_method.lower() == "j_x_prime":
        delta_X = X_perturb_pca
    elif pertubation_method.lower() == "j_jv":
        tmp = X_perturb_pca - X_pca
        delta_X = np.zeros_like(X_pca)
        for i in np.arange(adata.n_obs):
            delta_X[i, :] = Js[:, :, i].dot(tmp[i] * J_jv_delta_t)

    for i in np.arange(adata.n_obs):
        delta_Y[i, :] = Js[:, :, i].dot(delta_X[i] * delta_t) #This is the line that throws the error, since `i` may be far greater than than the cells dimension of Js, if there are more cells in adata than were used to calculate Jacobian.

If one attempts to fix this by only calculating delta_Y for the indices that are in Js, one ends up with a very sparse perturbation prediction; that is to say, since not all cells have delta_Y computed, the perturbation vectors are only present in a handful of cells across the dataset, rendering streamline plots almost useless about predicted

I am afraid this is where my programming acumen reaches its limit. I am unsure how to fix this bug.

Code used to Reproduce

import dynamo as dyn

#Assuming one already has the adata loaded in and requisite prior steps (e.g. pre-processing, dynamics calculation, vector field calculation, etc. have been completed

dyn.pp.top_pca_genes(adata, n_top_genes=1999)
top_pca_genes = adata.var.index[adata.var.top_pca_genes]
dyn.vf.jacobian(adata, regulators=top_pca_genes, effectors=top_pca_genes, cores = 28, sampling="velocity", sample_ncells=2000)
dyn.pd.perturbation(adata, genes = "IRX3", expression = [100], emb_basis="umap")

Traceback

{
	"name": "IndexError",
	"message": "index 2000 is out of bounds for axis 2 with size 2000",
	"stack": "---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 dyn.pd.perturbation(adata, genes = \"IRX3\", expression = [100], emb_basis=\"umap\")
      2 dyn.pl.streamline_plot(adata, color=\"CellClass\", basis=\"umap_perturbation\", show_legend=True, theme = \"blue\", enforce = True)
      3 dyn.pd.rank_perturbation_genes(adata, pkey = \"j_delta_x_perturbation\", output_values = True)

File ~/miniforge3/envs/dynamo_humanbrain/lib/python3.9/site-packages/dynamo/prediction/perturbation.py:290, in perturbation(adata, genes, expression, perturb_mode, cells, zero_perturb_genes_vel, pca_key, PCs_key, pca_mean_key, basis, emb_basis, jac_key, X_pca, delta_Y, projection_method, pertubation_method, J_jv_delta_t, delta_t, add_delta_Y_key, add_transition_key, add_velocity_key, add_embedding_key)
    287                 delta_X[i, :] = Js[:, :, i].dot(tmp[i] * J_jv_delta_t)
    289         for i in np.arange(adata.n_obs):
--> 290             delta_Y[i, :] = Js[:, :, i].dot(delta_X[i] * delta_t)
    292 if add_delta_Y_key is None:
    293     add_delta_Y_key = pertubation_method + \"_perturbation\"

IndexError: index 2000 is out of bounds for axis 2 with size 2000"
}

Expected behavior
Can dynamo calculate the perturbation using only the cells for which a Jacobian was calculated and then impute the resultant vectors to the nearest neighbors?

Session info:

-----
anndata             0.10.8
dynamo              1.3.2
gseapy              1.1.3
matplotlib          3.8.0
networkx            3.1
numpy               1.23.5
pandas              2.1.1
scanpy              1.10.3
seaborn             0.13.2
session_info        1.0.0
-----
PIL                 10.0.1
asttokens           NA
attr                23.1.0
attrs               23.1.0
backcall            0.2.0
beta_ufunc          NA
binom_ufunc         NA
biothings_client    0.3.1
bottleneck          1.3.5
brotli              1.0.9
cairo               1.23.0
cattr               NA
cattrs              NA
certifi             2024.08.30
cffi                1.16.0
charset_normalizer  2.0.4
colorcet            3.0.1
comm                0.2.1
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.7
decorator           5.1.1
exceptiongroup      1.0.4
executing           0.8.3
h5py                3.9.0
hypergeom_ufunc     NA
idna                3.4
igraph              0.11.5
importlib_metadata  NA
importlib_resources NA
invgauss_ufunc      NA
ipykernel           6.25.0
ipywidgets          8.1.2
itsdangerous        2.2.0
jedi                0.18.1
joblib              1.2.0
kiwisolver          1.4.4
legacy_api_wrap     NA
leidenalg           0.10.2
llvmlite            0.42.0
louvain             0.8.0
matplotlib_inline   0.1.6
mpl_toolkits        NA
mygene              3.2.2
natsort             7.1.1
nbinom_ufunc        NA
ncf_ufunc           NA
nct_ufunc           NA
ncx2_ufunc          NA
numba               0.59.1
numdifftools        0.9.41
numexpr             2.8.7
packaging           23.1
parso               0.8.3
patsy               0.5.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        2.5.2
prompt_toolkit      3.0.36
psutil              5.9.0
ptyprocess          0.7.0
pure_eval           0.2.2
pycparser           2.21
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.15.1
pynndescent         0.5.10
pyparsing           3.0.9
pytz                2023.3.post1
requests            2.31.0
requests_cache      1.2.0
scipy               1.10.1
setuptools          68.0.0
six                 1.16.0
skewnorm_ufunc      NA
sklearn             1.3.0
socks               1.7.1
stack_data          0.2.0
statsmodels         0.14.0
texttable           1.6.4
threadpoolctl       2.2.0
tornado             6.3.3
tqdm                4.65.0
traitlets           5.7.1
typing_extensions   NA
ujson               5.10.0
umap                0.5.4
url_normalize       1.4.3
urllib3             1.26.18
vscode              NA
wcwidth             0.2.5
yaml                6.0.1
zipp                NA
zmq                 25.1.0
zoneinfo            NA
-----
IPython             8.15.0
jupyter_client      8.6.0
jupyter_core        5.5.0
-----
Python 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
Linux-6.8.0-45-generic-x86_64-with-glibc2.39
-----
Session information updated at 2024-10-03 08:05
@styounes styounes added the bug Something isn't working label Oct 3, 2024
@Sichao25
Copy link
Collaborator

Thanks for raising this issue. Very interesting question. I'm curious about your use case for using a sampled Jacobian in perturbation analyses rather than calculating the Jacobian from the entire dataset.

The first part of your proposed feature, an option to calculate a "localized" perturbed response with the sub-sampling Jacobian (or any given subsets), is relatively straightforward to implement. However, the approach for imputing these localized results to the entire dataset requires further discussion. Would you mind sharing more details on or relevant literature on your use case/approach?

@styounes
Copy link
Author

I down-sample the Jacobian calculation because my current compute environment has only 128 GB of RAM. Attempting to calculate the Jacobian on more than ~5,000 cells usually crashes the environment due to memory overflow.

@Sichao25
Copy link
Collaborator

Thanks for letting me know. In that case, the challenge will be how to impute the results across the entire dataset. Let me take a closer look at this issue, and I'll get back to you later. In the meantime, feel free to share any additional thoughts you might have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants