HIL-SERL in LeRobot
On porting HIL-SERL to LeRobot. This page outlines the minimal list of components and tasks that should be implemented in the LeRobot codebase. The official reference implementation is available in JAX here.
We will coordinate on Discord in #port-hil-serl.
We will update this page with the IDs of the owners of each component. We encourage several people to team up on each component.
You don’t need to write extensive code on your own to make a valuable contribution. Any input on a sub-component, however small, is appreciated. Feel free to add extra components to the list if needed; this is only a guide, and we welcome more ideas.
Note: In parallel, we are refactoring the codebase, so you don't need to refactor anything yourself. Do not hesitate to copy files and code elements to arrive at a first working version as quickly as possible.
RLPD (Reinforcement Learning with Prior Data)
Goal: Develop the base RL algorithm for HIL-SERL. RLPD is an off-policy RL algorithm that leverages offline data.
Tasks:
Implement the RLPD policy in lerobot/lerobot/common/policies/hilserl.
Adapt the training script in lerobot/scripts/train.py to handle the offline and online data buffers and the dataloader (see the sketch below).
Useful links:
TD-MPC implementation in LeRobot: lerobot/common/policies/tdmpc/.
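To make the buffer handling concrete, here is a minimal sketch of RLPD's symmetric sampling, where every training batch is drawn half from the offline (prior) data and half from the online replay buffer. The `.sample(n)` interface returning dicts of tensors is a hypothetical placeholder, not an existing LeRobot API.

```python
import torch

def sample_rlpd_batch(offline_buffer, online_buffer, batch_size=256):
    """Symmetric sampling as in RLPD: half of every batch comes from the offline
    (prior) data, half from the online replay buffer.
    Both buffers are assumed to expose a hypothetical .sample(n) method that
    returns a dict of tensors with matching keys."""
    half = batch_size // 2
    offline_batch = offline_buffer.sample(half)
    online_batch = online_buffer.sample(batch_size - half)
    # Concatenate the two halves key by key into a single training batch.
    return {k: torch.cat([offline_batch[k], online_batch[k]], dim=0) for k in offline_batch}
```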
Human Interventions
Goal: Develop the mechanism to add human interventions during online training. HIL-SERL uses a 3D SpaceMouse to control the robot's end-effector. We can use the leader arm to do that instead.
Tasks:
Define the logic and functions to stop the policy and take over in the record function, possibly interfaced with keyboard keys that stop the policy and give the user a few seconds to get ready to take over (see the sketch at the end of this section).
Define the necessary functions for the leader arm to follow the position of the follower arm, so that it starts from the same position at the moment of intervention.
Define the logic to differentiate the data collected from human interventions from the offline and online data, e.g., by adding an extra column to the HF dataset when adding new episodes.
Define the sampling logic proposed in HIL-SERL for each category of data, e.g., set the sampler weights to 1 for the offline data, 1 for the online data and 2 for the human interventions (a minimal sketch follows).
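A minimal sketch of that sampling logic with torch's WeightedRandomSampler, assuming the extra dataset column mentioned above is available as a per-sample list of source labels (the label strings and weights are placeholders):

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Hypothetical labels stored in the extra dataset column, one per transition.
SOURCE_WEIGHTS = {"offline": 1.0, "online": 1.0, "intervention": 2.0}

def make_sampler(data_sources):
    """data_sources: list of strings, one per sample, e.g. ["offline", "online", ...]."""
    weights = torch.tensor([SOURCE_WEIGHTS[s] for s in data_sources], dtype=torch.double)
    # Sampling with replacement makes intervention frames roughly twice as likely
    # to appear in a batch as offline or online ones.
    return WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)

# Usage: loader = DataLoader(dataset, batch_size=256, sampler=make_sampler(data_sources))
```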
Useful links:
Alex's TD-MPC real fork: check out some of the scripts he made for real-world training.
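For the take-over logic mentioned in the first task, here is a rough sketch of a keyboard-driven intervention flag. It assumes the pynput package; the record-loop snippet and the leader_arm_action helper are hypothetical and only illustrate where the flag would be checked.

```python
from pynput import keyboard

class InterventionFlag:
    """Toggled from the keyboard while the policy is running (sketch only, not LeRobot API)."""

    def __init__(self):
        self.active = False
        self._listener = keyboard.Listener(on_press=self._on_press)
        self._listener.start()

    def _on_press(self, key):
        # Space toggles between policy control and human take-over.
        if key == keyboard.Key.space:
            self.active = not self.active
            print("Human intervention requested" if self.active else "Back to policy control")

# Hypothetical use inside the record loop:
# flag = InterventionFlag()
# for step in range(max_steps):
#     action = leader_arm_action() if flag.active else policy.select_action(observation)
#     ...
```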
Reward Classifier
Goal: Build a reward classifier that returns a sparse reward indicating whether the frame is in a terminal success state or not.
Tasks:
Define the logic to label frames of the collected trajectories with SUCCESS or FAILURE. Ideally, if the demonstration is successful, we can label the last few timesteps as SUCCESS, and vice versa.
Define a reward classifier class that learns to categorize the observations with rewards {-1, 0, 1}. Zero can be used for frames in the middle of an episode, before reaching a terminal state, or we can do it in a binary fashion (see the sketch at the end of this section).
Integrate the reward classifier either in lerobot/scripts/eval.py or in the RLPD code to query the reward every time a new frame is added to the online dataset.
Useful links:
HIL-SERL paper appendix B.
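Here is a minimal sketch of the binary variant of such a classifier in PyTorch. The architecture, names and threshold are placeholders; a pretrained vision backbone could replace the small CNN used here.

```python
import torch
import torch.nn as nn

class RewardClassifier(nn.Module):
    """Binary success classifier over image frames (sketch). Train it with
    nn.BCEWithLogitsLoss on the SUCCESS/FAILURE labels from the first task."""

    def __init__(self, in_channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 1)  # logit of P(success | frame)

    def forward(self, frames):
        # frames: (N, C, H, W) float tensor.
        return self.head(self.encoder(frames)).squeeze(-1)

    @torch.no_grad()
    def reward(self, frames, threshold=0.5):
        # Sparse reward: 1 if the frame is classified as a terminal success state, else 0.
        return (torch.sigmoid(self.forward(frames)) > threshold).float()
```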
Other Implementations
Several implementation details proposed in HIL-SERL are key for it to work efficiently or to improve the overall performance. Here are a few that can be added to LeRobot.
Pre-process images: They use image cropping to focus on areas of interest. Images are resized in the paper to 128x128. (See the PR to be merged that adds a resize function, Allow arbitrary camera frame dimensions #459, and create a new PR that adds a cropping function; a torchvision sketch follows after this list.)
Augment proprioception with velocity and force feedback: Augment the observation space with joint velocities/torques in ManipulatorRobot; we need to make sure the name linked to the address is the same for Feetech and Dynamixel motors. For Feetech, the velocity is Present_Speed and the torque is Present_Current.
Penalize gripper actions: Add a penalty on the gripper actions during grasping tasks to avoid unnecessary use of the gripper.
Add simulation support: To simplify experimentation, we can try to run HIL-SERL in simulation. The main simulation environment for now is gym_lowcostrobot. All these components can be tested in sim. Further tasks that resemble those in the paper can be added as well.
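For the image pre-processing item above, a minimal torchvision sketch; the crop box values are placeholders and depend on the camera setup and task.

```python
from torchvision.transforms import functional as TF

def preprocess_frame(frame, crop_box=(0, 0, 480, 480), size=(128, 128)):
    """frame: (C, H, W) image tensor. Crop to the region of interest, then resize to 128x128.
    crop_box is (top, left, height, width) and is task/camera specific (placeholder values)."""
    top, left, height, width = crop_box
    frame = TF.crop(frame, top, left, height, width)
    return TF.resize(frame, list(size), antialias=True)
```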
Note: The paper uses end-effector control and velocity control for dynamic tasks, but our first implementation won't include them.
We can use the pushT environment to test if RLPD is working properly. This will also allow us to compare to our baseline RL algorithm TD-MPC.
PushT has two modes of observations, an image state and a privileged vector state with 'keypoints'. Training with the keypoints state is an easier task that can be useful to quickly validate that your implementation is working. Training with the image state is our end-goal.
You can try training PushT with our TD-MPC to get a better idea. Here are the relevant config files and training commands. Make sure that you enable wandb and set it up properly on your system so that you can monitor training and observe the eval runs.
Thanks for initiating this! I would actually recommend using Cartesian space control whenever you can do that, as in our experience it simplifies a lot of stuff in the learning process.
But I guess many people following this PR are also interested in using RL for low-cost robots which don't have built-in EE control, so I am also curious how that works in practice.