using spatial distances masks some escape sites #161

Open
Bernadetadad opened this issue Mar 23, 2023 · 7 comments

@Bernadetadad
Collaborator

I ran the COV-3600 antibody with the latest polyclonal version with and without spatial_distances, and the results are very different: adding the spatial distance parameter significantly changes key escape sites seen before (such as sites 420 and 486). It is not clear to me why that is the case. Attached is a zipped HTML comparing escape with and without spatial distances. The COV-3600 barcode runs are in the BA.2 repo.

One thing I noticed is that the spatial distances table does not include distances between or within all chains. E.g., in the attached image I read in chains A, B and C, but the spatial distances table only includes information for distances between sites 371-420 for chains A and C. This is the case regardless of which .pdb I use. Is it supposed to work like this, given that the distances within and between chains should be different?

image

COV3600.html.zip

@jbloom
Member

jbloom commented Mar 23, 2023

@Bernadetadad, just looking at this qualitatively, I don't think it is a bug per se, but perhaps just a limitation of spatial regularization.

First off, the chain thing is not a bug. If you look at the inter_residue_distances function you see it is returning the closest pair of sites across all chains, which are usually the ones in the same monomer. But the Polyclonal fitting does not know about chains, so the chain information isn't used. The intuition here is just that things that are close together should be in the same epitope, and they can be close together either by being in the same monomer or adjacent monomers. Usually it's the same monomer but could be adjacent one for monomer-bridging antibodies. In any case, the fitting uses the closest pair.
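To make the "closest pair across all chains" idea concrete, here is a minimal sketch. It is not the actual inter_residue_distances code: the function name `closest_site_distances`, the one-coordinate-per-residue simplification, and the coordinates are all assumptions made up for illustration.

```python
import itertools
import math


def closest_site_distances(coords):
    """Return {(site1, site2): smallest distance over every chain combination}.

    `coords` maps (chain, site) -> (x, y, z). Using one point per residue is
    a simplification assumed for this sketch.
    """
    closest = {}
    for (key1, xyz1), (key2, xyz2) in itertools.product(coords.items(), repeat=2):
        site1, site2 = key1[1], key2[1]
        if site1 >= site2:
            continue  # keep each unordered site pair once, skip same-site pairs
        d = math.dist(xyz1, xyz2)
        pair = (site1, site2)
        if pair not in closest or d < closest[pair]:
            closest[pair] = d
    return closest


# Made-up coordinates: site 371 on chain A sits closest to site 420 on chain C.
coords = {
    ("A", 371): (0.0, 0.0, 0.0),
    ("A", 420): (30.0, 0.0, 0.0),
    ("C", 371): (50.0, 0.0, 0.0),
    ("C", 420): (10.0, 0.0, 0.0),
}
distances = closest_site_distances(coords)
```

In this toy case the table ends up with a single row for the 371/420 pair, taken from whichever chain combination is closest, which is why the chain labels in the table do not cover every combination.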

The regularization operates on the mean escape at a site, not the total. So if you click on the mean plots, you see that for the non-regularized fits the mean escape is a lot higher at 420 and 487 than at 371, because at 371 only a single mutation escapes. Whether the normalization should actually be on the total is a somewhat subjective question.
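A small sketch of the mean-versus-total distinction, using made-up per-mutation escape values (the helper `site_summaries` is hypothetical and not part of polyclonal):

```python
def site_summaries(mut_escape):
    """Map each site to the mean and total of its per-mutation escape values."""
    return {
        site: {"mean": sum(values) / len(values), "total": sum(values)}
        for site, values in mut_escape.items()
    }


# Made-up values: at 371 a single mutation escapes strongly, at 420 every
# measured mutation escapes moderately.
toy_escape = {
    371: [4.0, 0.0, 0.0, 0.0],
    420: [3.0, 3.0, 3.0, 3.0],
}
summaries = site_summaries(toy_escape)
```

With these numbers site 371 has the single largest mutation effect but a site mean three times smaller than site 420, so a statistic based on the site mean treats 371 as a weak site.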

But I think the main issue here is that site 371 is not spatially proximal to sites 420 and 487 in the RBD if you look at the structure. This is because the 371 mutation probably affects up-down conformation of the RBD and is mostly acting that way. So the spatial regularization argues against them being in the same epitope as it doesn't know about things like RBD up-down.

It may be that the solution is to just drop the regularization weights. Right now reg_spatial2_weight is set to 0.001 by default (see here). You could either set it all the way to zero, or just decrease it modestly, like by another order of magnitude, and see if that helps?
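As a rough illustration of what dropping the weight by orders of magnitude does, here is a toy quadratic distance penalty. The functional form, the name `toy_spatial_penalty`, and all numbers are assumptions for this sketch; it is not the loss term that polyclonal actually implements.

```python
import itertools


def toy_spatial_penalty(weight, site_escape, distances):
    """Toy penalty: weight * sum over site pairs of distance**2 * escape_i * escape_j."""
    total = 0.0
    for s1, s2 in itertools.combinations(sorted(site_escape), 2):
        total += distances[(s1, s2)] ** 2 * site_escape[s1] * site_escape[s2]
    return weight * total


# Made-up site-level escape values and distances (in angstroms).
site_escape = {371: 1.0, 420: 2.0, 487: 2.0}
distances = {(371, 420): 30.0, (371, 487): 30.0, (420, 487): 10.0}

# Scan from the 0.001 default down by orders of magnitude, ending at zero.
weights = [1e-3, 1e-4, 1e-5, 1e-6, 0.0]
penalties = {w: toy_spatial_penalty(w, site_escape, distances) for w in weights}
```

In this toy form the penalty scales linearly with the weight and is dominated by the distant pairs, so each tenfold drop relaxes the pressure against putting far-apart sites in the same epitope, and a weight of zero removes it entirely.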

If decreasing it helps, maybe see if you think that also helps for other fitting. If so, let me know and we could potentially change the default.

Anyway, can you report back on this issue what you find?

@Bernadetadad
Collaborator Author

Setting reg_spatial2_weight=0 works like spatial_distances = None, which makes sense.
I need to set reg_spatial2_weight at least 100x lower to see similar escape at site 487 as with spatial_distances = None, and at least 1000x lower to see escape at site 420 (I'm not sure the weight is doing anything at such a low value).

@jbloom
Member

jbloom commented Mar 23, 2023

OK, maybe try it on sera etc. and see whether you think it is better to set the default weight to zero or just make it smaller, and whether this should be the overall polyclonal default or just something we tune.

@Bernadetadad
Collaborator Author

Just a follow-up: including spatial regularization for antibodies with the pipeline default value (reg_spatial2_weight: 0.001) significantly improves the correlation in escape values between biological replicates (that makes some sense, I think). But in several antibodies I've now seen that it leads to loss of what should be strong escape sites. So for mAbs I'm now setting reg_spatial2_weight: 0.000001, which retains all escape sites observed without spatial regularization and gives a small increase in correlation between biological replicates relative to reg_spatial2_weight: 0.
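For anyone repeating this comparison, the two checks (replicate correlation and retention of strong escape sites) can be sketched roughly as below. The helpers `pearson` and `lost_sites`, the threshold, and the values are all hypothetical placeholders, not results from these fits.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of escape values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lost_sites(unregularized, regularized, threshold):
    """Sites at or above `threshold` without regularization but below it with it."""
    return sorted(
        site
        for site, escape in unregularized.items()
        if escape >= threshold and regularized.get(site, 0.0) < threshold
    )


# Placeholder site-level escape values, for illustration only.
without_reg = {371: 0.1, 420: 2.0, 487: 1.5}
with_reg = {371: 0.1, 420: 0.2, 487: 1.4}
lost = lost_sites(without_reg, with_reg, threshold=1.0)
```

Running `pearson` on the site escape values of two replicates at each weight, alongside `lost_sites` against the unregularized fit, gives one way to see the trade-off described above.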

@jbloom
Member

jbloom commented Mar 28, 2023

I will update the defaults on this once we also decide about the antibody count defaults.

@jbloom
Member

jbloom commented Mar 28, 2023

@fwelsh, do you have a thought on good spatial regularization defaults for your data?

@fwelsh
Collaborator

fwelsh commented Mar 28, 2023

@jbloom I don't use spatial regularization if I'm only fitting one epitope. For multiple epitopes, I could get reasonable deconvolution if I set reg_spatial2_weight to around 0.001 or 0.01, which is quite high.

I don't really understand the logic behind using spatial regularization for single-epitope models. Penalizing the model for trying to put distant sites in the same epitope makes sense. But when we're just fitting one epitope, it seems like this would add unreasonable constraints and artificially skew the data towards a more targeted immune profile. I could just be misunderstanding the role of spatial regularization here, though, so let me know what you think!
