
Bayesian GPLVM using a specific latent input prior #1100

Open
Soham6298 opened this issue Oct 21, 2024 · 4 comments
Labels: need more info (If an issue or PR needs further information by the issuer)
@Soham6298
I have a question regarding implementing a specific setup using the Bayesian GPLVM framework.

Setup:

I want to estimate latent X such that

Y = f(X) + error

where Y is N×D (multi-output) and X is latent. I have a prior on X such that X* ~ N(X, s). Assuming s = 0.1, I would like to use the GPLVM framework to recover the posterior latent X. Since the setup is part of a simulation study, I have the true X available to compute the RMSE of the recovery.

To that end, I am using:

import GPy
import numpy as np
import scipy as scp
import scipy.stats  # so that scp.stats resolves

def gpyfit(output, input_prior):
    Q = 1  # latent input dimensionality
    m_gplvm = GPy.models.bayesian_gplvm_minibatch.BayesianGPLVMMiniBatch(output, Q, num_inducing=12, kernel=GPy.kern.RBF(Q))
    # Prior on the latent inputs: X* ~ N(input_prior, 0.1)
    m_gplvm.X.set_prior = np.random.normal(input_prior, 0.1)
    # Hyperparameters drawn from half-normal distributions
    m_gplvm.kern.lengthscale = scp.stats.halfnorm.rvs()
    m_gplvm.kern.variance = scp.stats.halfnorm.rvs()
    m_gplvm.likelihood.variance = scp.stats.halfnorm.rvs()
    m_gplvm.optimize(messages=1, max_iters=5e4)
    return m_gplvm

As is apparent, I am setting custom priors for the covariance-function hyperparameters as well as the error variance.

With this setup, the RMSE is absurdly high, which makes me think I am making a mistake somewhere. It would be helpful to know whether someone has already tried a similar problem scenario, or whether I am making an obvious mistake.
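One detail I'm unsure about is whether `set_prior` should be called rather than assigned to. A minimal Python illustration of the difference (no GPy needed; `Param` here is just a stand-in, not the real GPy class):

```python
class Param:
    """Stand-in for a parameter object with a set_prior method."""
    def set_prior(self, prior):
        self.prior = prior

p = Param()
p.set_prior = "N(X, 0.1)"   # assignment merely shadows the method...
print(hasattr(p, "prior"))  # ...so no prior is actually stored: prints False

q = Param()
q.set_prior("N(X, 0.1)")    # calling the method does store the prior
print(q.prior)              # prints N(X, 0.1)
```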

Thanks!

@MartinBubel MartinBubel self-assigned this Oct 24, 2024
@MartinBubel (Contributor)

Hi @Soham6298,
I will take a look at this.
When testing with a simple example, I did not see large losses. Could you share a bit more of your code, ideally in a way that lets me work on data similar to yours so the results are comparable?

@MartinBubel MartinBubel added the need more info If an issue or PR needs further information by the issuer label Oct 24, 2024
@Soham6298 (Author)

Hello @MartinBubel ,

Thanks a lot for looking into this.

Since I am running extensive simulation studies as part of a larger model comparison, I have set up a small notebook that tests GPy on 50 simulated datasets. The datasets were generated from an exact squared-exponential GP, with the true parameters (lengthscale, marginal variance and error variance) sampled from the same distributions as the priors in the GPy model.

I compute the posterior RMSE between the latent inputs recovered by the GPy model and the true X from my simulated data. I also compute a naive RMSE, i.e. the RMSE of the prior latent inputs against the ground truth. I get the following result:

Naive RMSE: 0.1349896448175324
Posterior RMSE: 1.0165256882560296
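One caveat I have considered (an assumption on my part, not verified for this model): GPLVM latents are typically only identified up to sign and scale when Q = 1, so a raw RMSE against the true X can be inflated even when the latent structure is recovered well. A least-squares alignment before computing RMSE would look like this, with a toy sign-flipped example rather than my actual results:

```python
import numpy as np

def aligned_rmse(x_true, x_post):
    """RMSE after fitting x_true ~ a * x_post + b (absorbs sign/scale flips)."""
    A = np.column_stack([x_post, np.ones_like(x_post)])
    coef, *_ = np.linalg.lstsq(A, x_true, rcond=None)
    return np.sqrt(np.mean((x_true - A @ coef) ** 2))

# Toy check: a sign-flipped, rescaled recovery has a large raw RMSE
# but near-zero RMSE after alignment.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
x_hat = -2.0 * x + 0.5
raw = np.sqrt(np.mean((x - x_hat) ** 2))
print(raw, aligned_rmse(x, x_hat))
```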

I am attaching the data and my notebook so that you can run it for yourself.
GPyTest.tar.gz

@MartinBubel (Contributor)

Hi @Soham6298

thanks for uploading an example!
I have looked at it, but I'm afraid I need some more time before I can give a proper answer. Sorry! I hope this is not urgent.

Best, Martin

@Soham6298 (Author)

Hi @MartinBubel

Of course! Let me know if you need additional input from my side. Just to mention: I also ran a Pyro GPLVM on the same datasets, and the Pyro model's RMSE likewise exceeds the naive RMSE shown in the example above.

Best,
Soham
