How "trajectory divergence term" is calculated in compute_cost function in algorithm_traj_opt.py #98

wangsd01 · 2017-11-20T23:58:32Z

Could you point me to a reference for this part of the code? Thank you!

def compute_costs(self, m, eta, augment=True):
    """ Compute cost estimates used in the LQR backward pass. """
    traj_info, traj_distr = self.cur[m].traj_info, self.cur[m].traj_distr
    if not augment:  # Whether to augment cost with term to penalize KL
        return traj_info.Cm, traj_info.cv

    multiplier = self._hyperparams['max_ent_traj']
    fCm, fcv = traj_info.Cm / (eta + multiplier), traj_info.cv / (eta + multiplier)
    K, ipc, k = traj_distr.K, traj_distr.inv_pol_covar, traj_distr.k

    # Add in the trajectory divergence term.
    for t in range(self.T - 1, -1, -1):
        fCm[t, :, :] += eta / (eta + multiplier) * np.vstack([
            np.hstack([
                K[t, :, :].T.dot(ipc[t, :, :]).dot(K[t, :, :]),
                -K[t, :, :].T.dot(ipc[t, :, :])
            ]),
            np.hstack([
                -ipc[t, :, :].dot(K[t, :, :]), ipc[t, :, :]
            ])
        ])
        fcv[t, :] += eta / (eta + multiplier) * np.hstack([
            K[t, :, :].T.dot(ipc[t, :, :]).dot(k[t, :]),
            -ipc[t, :, :].dot(k[t, :])
        ])

    return fCm, fcv

wangsd01 · 2017-11-29T19:38:11Z

This part adds the divergence between the predicted trajectory and the sampled trajectory as an additional cost,
i.e. (Kx + k - u).T * inverse_policy_variance_matrix * (Kx + k - u),
where u is the action sampled from data
and Kx + k is the action predicted by the global policy network.
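
To see how that quadratic form maps onto the blocks in the quoted loop, here is a minimal numerical sketch. It assumes the cost is read as 0.5 * z.T.dot(Cm).dot(z) + z.T.dot(cv) with z = [x; u], which is my assumption about the convention and is not confirmed in this thread. The dimensions and random values are hypothetical stand-ins for one time step of traj_distr.K, traj_distr.k and traj_distr.inv_pol_covar, and the eta / (eta + multiplier) scale is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
dX, dU = 3, 2  # hypothetical state and action dimensions

K = rng.standard_normal((dU, dX))    # stand-in for traj_distr.K[t]
k = rng.standard_normal(dU)          # stand-in for traj_distr.k[t]
A = rng.standard_normal((dU, dU))
ipc = A.dot(A.T) + dU * np.eye(dU)   # symmetric positive definite stand-in for inv_pol_covar[t]

# Same blocks as in the quoted loop, without the eta / (eta + multiplier) factor.
M = np.vstack([
    np.hstack([K.T.dot(ipc).dot(K), -K.T.dot(ipc)]),
    np.hstack([-ipc.dot(K), ipc]),
])
v = np.hstack([K.T.dot(ipc).dot(k), -ipc.dot(k)])

x = rng.standard_normal(dX)
u = rng.standard_normal(dU)
z = np.hstack([x, u])

# Direct evaluation of 0.5 * (Kx + k - u).T * ipc * (Kx + k - u).
r = K.dot(x) + k - u
direct = 0.5 * r.dot(ipc).dot(r)

# Expansion in z: quadratic term, linear term, and a constant that does not depend on z.
expanded = 0.5 * z.dot(M).dot(z) + z.dot(v) + 0.5 * k.dot(ipc).dot(k)

identity_holds = np.allclose(direct, expanded)
print(identity_holds)
```

The constant 0.5 * k.T.dot(ipc).dot(k) does not depend on x or u, so dropping it would not change the minimizer, which may be why it has no counterpart in the quoted code.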

WilsonWangTHU · 2018-08-14T01:05:40Z

@wangsd01 Hi, I am also looking at these lines. Have you solved the problem?

I am not 100% sure what's happening, but one thing that looks especially suspicious to me is that the derivative with respect to u is Cov^{-1}.dot(k_old).

Looking at the forward pass in the repo, it uses u = Kx + k rather than u = K(x - x_old) + k + u_old.
So if we actually take the derivative of the KL penalty wrt u, I would expect something like
Cov^{-1}.dot(u_new - u_old) = Cov^{-1}.dot(K_new.dot(x) - K_old.dot(x) + k_new - k_old) != Cov^{-1}.dot(k_old).

Not sure if I missed anything. Be great if you could help :( @cbfinn
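
One possible way to check this numerically is sketched below. It assumes, without confirmation from the repo authors, that fCm and fcv are the coefficients of a quadratic 0.5 * z.T.dot(Cm).dot(z) + z.T.dot(cv) in z = [x; u] expanded about z = 0. Under that assumption the -ipc.dot(k) entry would only be the u-gradient at the origin, and the state- and action-dependent parts would come from the Cm blocks. All dimensions and values here are hypothetical, and K_old, k_old stand in for one time step of traj_distr.K and traj_distr.k.

```python
import numpy as np

rng = np.random.default_rng(1)
dX, dU = 3, 2  # hypothetical state and action dimensions

K_old = rng.standard_normal((dU, dX))  # stand-in for traj_distr.K[t]
k_old = rng.standard_normal(dU)        # stand-in for traj_distr.k[t]
A = rng.standard_normal((dU, dU))
ipc = A.dot(A.T) + dU * np.eye(dU)     # symmetric positive definite stand-in for inv_pol_covar[t]

# Divergence blocks as in compute_costs, without the eta / (eta + multiplier) factor.
M = np.vstack([
    np.hstack([K_old.T.dot(ipc).dot(K_old), -K_old.T.dot(ipc)]),
    np.hstack([-ipc.dot(K_old), ipc]),
])
v = np.hstack([K_old.T.dot(ipc).dot(k_old), -ipc.dot(k_old)])

x = rng.standard_normal(dX)
u = rng.standard_normal(dU)
z = np.hstack([x, u])

# Gradient of 0.5 * z.T M z + z.T v with respect to z (M is symmetric), then its u-block.
grad_u = (M.dot(z) + v)[dX:]

# The same gradient written directly: Cov^{-1} (u - K_old x - k_old).
expected = ipc.dot(u - K_old.dot(x) - k_old)
full_gradient_matches = np.allclose(grad_u, expected)

# At z = 0 only the linear term survives, giving -Cov^{-1} k_old.
origin_matches = np.allclose(v[dX:], -ipc.dot(k_old))

print(full_gradient_matches, origin_matches)
```

If that reading of the convention is right, the full u-gradient is Cov^{-1}.dot(u - K_old.dot(x) - k_old), which agrees with the Cov^{-1}.dot(u_new - u_old) form above once u_new is substituted for u.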
