
Question on the locomotion results #1

Closed

Haichao-Zhang opened this issue Sep 12, 2024 · 2 comments

Comments

@Haichao-Zhang

Haichao-Zhang commented Sep 12, 2024

Congrats on the great work!

I'm looking at the video demos on the project page [https://crossformer-model.github.io/] and have some questions about the locomotion results.

  1. In the video, could you explain the setting shown in the right half of the Go1 locomotion video? It appears to be in a simulator; is it a video of the expert policy in sim?

  2. The paper says, "We collected the Go1 data by rolling out an expert policy trained with RL in simulation". Does this mean you trained a policy in sim, deployed it in the real world, and then collected data there?
    I'm also curious about the expert's behavior in the real world. Can you provide some representative videos and information on how the expert policy behaves in real?

Thanks!

@HomerW
Contributor

HomerW commented Sep 23, 2024

Hi, sorry for the late reply!

The simulated quadruped videos on the website show CrossFormer rollouts in sim. That model was trained on data collected by rolling out an RL-trained expert quadruped policy in sim. Later, we repeated the same process in the real world: we used an expert policy trained with RL in the real world to collect real data, then re-trained CrossFormer on the real quadruped data. The real quadruped videos on the website show rollouts of this CrossFormer model.
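
In case it helps to make that pipeline concrete, here's a minimal sketch of the collection step (the environment and policy interfaces are hypothetical placeholders, not the actual CrossFormer or APRL code):

```python
# Sketch of the data-collection loop: roll out a trained expert policy
# and log (observation, action) pairs for later CrossFormer training.
# `env` and `expert` are hypothetical stand-ins for the real interfaces.
def collect_rollouts(env, expert, num_episodes=100, max_steps=1000):
    dataset = []
    for _ in range(num_episodes):
        obs = env.reset()
        episode = []
        for _ in range(max_steps):
            action = expert.predict(obs)  # expert acts, no exploration
            next_obs, reward, done, info = env.step(action)
            episode.append({"obs": obs, "action": action})
            obs = next_obs
            if done:
                break
        dataset.append(episode)
    return dataset
```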

As you can see from the real quadruped videos, we did not learn the best walking gait in the real world (the robot doesn't make good use of its front legs, etc.). This is likely because the expert policy we used was trained a while ago and on a different robot, so the data we got from rolling it out was suboptimal due to distribution shift. The expert data actually looked quite similar to the rollout videos we show on the website. The expert policy was trained using APRL (https://github.com/realquantumcookie/APRL), so I'd check out that repo if you're curious about it. In principle, simply re-training the expert policy and collecting better data should give us a better walking gait.
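
For what it's worth, one quick (purely illustrative, not from our code) way to gauge the kind of observation-level distribution shift I'm describing is to compare per-dimension statistics between the expert's training observations and the rollout observations:

```python
import numpy as np

def observation_shift(train_obs: np.ndarray, rollout_obs: np.ndarray) -> np.ndarray:
    """Standardized per-dimension mean difference between two observation sets.

    Both arrays are (num_samples, obs_dim). Larger values flag dimensions
    where rollout-time observations drift from what the expert saw in training.
    """
    mu_t = train_obs.mean(axis=0)
    sigma_t = train_obs.std(axis=0) + 1e-8  # avoid division by zero
    mu_r = rollout_obs.mean(axis=0)
    return np.abs(mu_r - mu_t) / sigma_t
```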

Hope this helps!

@Haichao-Zhang
Author

@HomerW, thanks a lot for the detailed explanations!
