Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordered simplex #38

Open
spinkney opened this issue Jul 21, 2022 · 7 comments
Open

Ordered simplex #38

spinkney opened this issue Jul 21, 2022 · 7 comments

Comments

@spinkney
Copy link
Collaborator

See https://discourse.mc-stan.org/t/ordered-simplex-constraint-transform/24102/17?u=spinkney

I'll add the Stan code for this. Martin helped a lot and this paper made it clear for me https://arxiv.org/pdf/0708.0176.pdf

@bob-carpenter
Copy link
Collaborator

Thanks. It's really remarkable how the ordering constraint works to give you order statistics. I still don't quite have my head around why that's true, but I've verified empirically. That is, we get the same distribution on alpha[1] from either:

ordered[N] alpha;
...
alpha ~ foo(theta);

and

vector[N] alpha_unordered;
...
alpha_unordered ~ foo(theta);
alpha = sort_asc(alpha_unordered);

Is the application to order statistics?

We're moving the language toward allowing composition of constraints. So rather than an ordered simplex, I think we can just compose the ordered transform with the simplex transform. I really want to get this into Stan to deal with the case of non-centered parameterizations of lognormal distributions.

@spinkney
Copy link
Collaborator Author

Is the application to order statistics?
One application is adding an additional identification constraint on mixtures or latent factor models.

The order statistic stuff is really cool. It's useful to model information given in high/low or quartiles. Lots of data is being aggregated for use in my line of work due to privacy laws. Personally, I'm in favor and it is an interesting modeling problem.

@spinkney
Copy link
Collaborator Author

spinkney commented Jul 25, 2022

I don't see how we can combine the ordered and simplex transforms to get this transform and for it to be uniform because the constraints are coupled together. In that, the ordering impacts the simplex at each iteration in the loop below. Whenever I attempted to compose these two transforms together via one after another the ordered simplex transform doesn't give out a "uniform" ordered simplex (approximately uniform distance between each value). One of the end points is typically much too large.

I wonder how tfp manages to compose these two transforms together and if it's also uniform on the ordered simplex?

data {
 int<lower=0> N;
 vector<lower=0>[N] alpha;
}
transformed data {
  int Nm1 = N - 1;
}
parameters {
 vector[Nm1] y;
 
}
transformed parameters {
 simplex[N] x ;
 real log_abs_det = 0;

 {
   real remaining = 1; // Remaining amount to be distributed
   real lb = 0; // The minimum for the next element
   real ub = 1;
 
    for(i in 1:Nm1) {
      int N_prime = Nm1 + 2 - i; // Number of remaining elements
      // First constrain to [0; 1 / N_prime]
      ub = inv(N_prime);
      real xcons = ub * inv_logit(y[i]);
      log_abs_det += log(ub) + log_inv_logit(y[i]) + log1m_inv_logit(y[i]);

      // Add the lowest element log density
      log_abs_det += log(N_prime - 1) +  log(N_prime) + (N_prime - 2) * log1m(N_prime * xcons);
      
      x[i] = lb + remaining * xcons;
      lb = x[i];
      // We added  remaining * x_cons to each of the N_prime elements yet to be processed
      remaining -= remaining * xcons * N_prime; 
    }
    x[N] = lb + remaining;
 }

}
model {
 target += log_abs_det;
// target += target_density_lp(x, alpha);
}

@bob-carpenter
Copy link
Collaborator

bob-carpenter commented Jul 25, 2022

Whenever I attempted to compose these two transforms together via one after another the ordered simplex transform doesn't give out a "uniform" ordered simplex

Ah, that makes sense. We assume uniform on the unconstrained scale leads to uniform simplexes, but nothing says that uniform on the ordered scale leads to uniform ordered simplexes.

Also, it's not clear where we want a vector of 0 values to initialize (mean initialization value for Stan). If I put all 0s through the ordered transform, I get $\begin{bmatrix} 0 & 1 & \cdots & N-1 \end{bmatrix}$ in $N$ dimensions. That then leads to probabilities proportional to $\begin{bmatrix} \exp(0) & \exp(1) & \cdots & \exp(N - 1) \end{bmatrix}$

@spinkney
Copy link
Collaborator Author

spinkney commented Jul 26, 2022

Also, it's not clear where we want a vector of 0 values to initialize (mean initialization value for Stan). If I put all 0s through the ordered transform, I get
[01⋯N−1] in N dimensions. That then leads to probabilities proportional to

Something neat about this is that the average value within $[0, N']$ for each $N' \in (1, \ldots, N)$ is $1/N'^2$. We can see that by noting that the cdf of the minimum is

$$ F(x) = \begin{cases} 1 - (1 - Nx)^{N-1} & 0 \le x \le 1/N \\ 1 & 1/N \le x \le 1 \end{cases} $$

The ccdf is just 1 minus this and we can use that to calculate the mean value.

$$ \begin{aligned} E(x) &= \int_{0}^{1/N} (1 - Nx)^{N-1} dx \\ &= -\frac{1}{N} \int_{1}^{0} u^{N-1} du \\ &= -\frac{u^N}{N^2} \bigg\vert_1^0 \\ &= \frac{1}{N^2} \end{aligned} $$

where I've substituted $u = 1 - Nx$ and $du = -N dx$

I can calculate an offset at each iteration s.t. the offset is equal to this average value. Is that what you mean?

data {
 int<lower=0> N;
 vector<lower=0>[N] alpha;
}
transformed data {
  real half_logN = 0.5 * log(N);
  int Nm1 = N - 1;
}
parameters {
 vector[Nm1] y;
 
}
transformed parameters {
 simplex[N] x ;
 real log_abs_det = 0;

 {
   real remaining = 1; // Remaining amount to be distributed
   real lb = 0; // The minimum for the next element
   real ub = 1;
 
    for(i in 1:Nm1) {
      int N_prime = Nm1 + 2 - i; // Number of remaining elements
      real off_set = logit(1/N_prime^2);
      //First constrain to [0; 1 / N_prime]
      ub = inv(N_prime);
      real xcons = ub * inv_logit(off_set + y[i]);
      log_abs_det += log(ub) + log_inv_logit(off_set + y[i]) + log1m_inv_logit(off_set + y[i]);

      // Add the lowest element log density
      log_abs_det += log(N_prime - 1) +  log(N_prime) + (N_prime - 2) * log1m(N_prime * xcons);
      
      x[i] = lb + remaining * xcons;
      lb = x[i];
      //We added  remaining * x_cons to each of the N_prime elements yet to be processed
      remaining -= remaining * xcons * N_prime; 
    }
    x[N] = lb + remaining;
 }

}
model {
 target += log_abs_det;
// target += target_density_lp(x, alpha);
}

@spinkney
Copy link
Collaborator Author

spinkney commented Jul 27, 2022

Another way to achieve the same thing is the "augmented approach". The proof is at https://mathoverflow.net/a/312594.

data {
 int<lower=0> N;
 // vector<lower=0>[N] alpha;
}
transformed data {
  vector[N] lsv = 1 ./ reverse(linspaced_vector(N, 1, N));
}
parameters {
 vector<lower=0>[N] a;
}
transformed parameters {
 vector[N] x;
 real a_sum = sum(a);

 for (i in 1:N) {
   x[N - i + 1] =  dot_product(a[1:N - i + 1], lsv[1:N - i + 1]);
 }
 x /= a_sum;
}
model {
 target += -a_sum; // a ~ exponential(1)
}
generated quantities {
  real x_sum = sum(x);
}

@spinkney
Copy link
Collaborator Author

spinkney commented Aug 30, 2022

My notes:
The spacings of a uniform ordered simplex follow

$$ s = \begin{bmatrix} \frac{1}{N(N)}, \frac{1}{N(N - 1)}, \cdots, \frac{1}{N} \end{bmatrix} $$

the expected values of the uniform simplex are then

$$ \operatorname{E}(x) = \begin{bmatrix} s_1, \sum\limits_{i = 1}^2 s_i, \cdots, \sum\limits_{i = 1}^N s_i \end{bmatrix} $$

Let's see if it works. In R

spacings <- function(N) {
  nums <- 0:(N - 1)
  s <- 1/(N * (N - nums))
  data.frame(vals = cumsum(s), spacings = s) 
 }
spacings(5)
> spacings(5)
       vals   spacings
1 0.0400000 0.04000000
2 0.0900000 0.05000000
3 0.1566667 0.06666667
4 0.2566667 0.10000000
5 0.4566667 0.20000000

In Stan

# A tibble: 5 × 10
  variable   mean median     sd    mad      q5   q95  rhat ess_bulk ess_tail
  <chr>     <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 x[1]     0.0399 0.0315 0.0327 0.0306 0.00248 0.106  1.00   45093.   28310.
2 x[2]     0.0900 0.0867 0.0454 0.0502 0.0217  0.170  1.00   45450.   32464.
3 x[3]     0.157  0.159  0.0550 0.0589 0.0640  0.245  1.00   42800.   31539.
4 x[4]     0.256  0.257  0.0670 0.0647 0.143   0.367  1.00   40390.   32386.
5 x[5]     0.457  0.437  0.118  0.115  0.298   0.685  1.00   43653.   32776.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants