Congrats on the great work! I'm looking at the video demos on the project page [https://crossformer-model.github.io/] and have some questions about the locomotion results.
Could you explain the setting shown on the right half of the Go1 locomotion video? It appears to be in a simulator; is it a video of the expert policy in sim?
The paper says, "We collected the Go1 data by rolling out an expert policy trained with RL in simulation." Does this mean you train a policy in sim, deploy it on the real robot, and then collect data in the real world?
I'm also curious about the expert's behavior in the real world. Can you provide some representative videos and information on how the expert policy behaves on the real robot?
Thanks!
The simulated quadruped videos on the website show Crossformer rollouts in sim. That model was trained on data collected by rolling out an RL-trained expert quadruped policy in simulation. Later, we tried the same process in the real world: we used an expert policy trained with RL on the real robot to collect real data, then re-trained Crossformer on that real quadruped data. The real quadruped videos on the website show rollouts of this re-trained Crossformer model.
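For concreteness, the collection step amounts to rolling out a frozen expert and logging observation-action pairs that Crossformer is later trained on. Below is a minimal sketch assuming a gym-style environment interface; every name in it (`env`, `expert`, their methods) is a hypothetical placeholder, not the actual Crossformer or APRL API.

```python
# Sketch of the data-collection loop described above: roll out a frozen,
# RL-trained expert and record (observation, action) pairs. The env/expert
# interfaces here are hypothetical gym-style placeholders.

def collect_expert_rollouts(env, expert, num_episodes=100, max_steps=1000):
    """Roll out the expert policy and return a list of trajectories."""
    dataset = []
    for _ in range(num_episodes):
        obs = env.reset()
        trajectory = []
        for _ in range(max_steps):
            action = expert.predict(obs)  # expert is frozen; no learning here
            next_obs, reward, done, info = env.step(action)
            trajectory.append({"observation": obs, "action": action})
            obs = next_obs
            if done:
                break
        dataset.append(trajectory)
    return dataset
```

The real-robot version of the pipeline is the same loop, just with the physical Go1 in place of the simulator.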
As you can see from the real quadruped videos, we did not learn the best walking gait in the real world (the robot doesn't make good use of its front legs, for example). This is likely because the expert policy we used was trained a while ago and on a different robot, so the data we collected by rolling it out was not optimal due to distribution shift. The expert data actually looked quite similar to the rollout videos we show on the website. The expert policy was trained using APRL (https://github.com/realquantumcookie/APRL), so I'd check out that repo if you're curious about it. In principle, simply re-training the expert policy and collecting better data should give us a better walking gait.