Source
@Article{Jin_2024_RobotGPT,
  author  = {Jin, Yixiang and Li, Dingzhe and A, Yong and Shi, Jun and Hao, Peng and Sun, Fuchun and Zhang, Jianwei and Fang, Bin},
  journal = {IEEE Robotics and Automation Letters},
  title   = {{RobotGPT}: Robot Manipulation Learning From {ChatGPT}},
  year    = {2024},
  volume  = {9},
  number  = {3},
  pages   = {2543-2550},
  doi     = {10.1109/LRA.2024.3357432}
}
| (Samsung Research, China) | IEEE | arXiv |
TL;DR
ChatGPT generates demonstrations; a DRL agent (SDQfD) learns from them.
Flash Reading
- Abstract: ChatGPT alone cannot guarantee the stability and safety of robot manipulation. Setting the temperature to 0.0 improves stability and safety but reduces diversity. The proposed framework combines an effective prompt design with a robust learning model, and a metric for measuring task difficulty is proposed.
- Introduction: Proposes a prompt structure with a self-correction module, and deploys an agent that learns manipulation strategies from ChatGPT.
- Related Work: This work enhances the stability of LLMs in robot control. There are some benchmarks for robot learning, such as RLBench [1] and BulletArm [3].
- Methodology: ChatGPT generates demonstrations to train the robot. It is used in two ways: code generation (task + examples) and error correction (runtime errors and task failures).
  - Challenges of prompting LLMs for robotics [2]: (a) requiring a complete and accurate problem description, (b) describing APIs in natural language, (c) biasing the answer structure.
  - Five-part prompting method: background description, object info, environment info, task info, and examples. Background info and RobotAPI are set as system messages in the ChatGPT API (see the sketch after this list).
  - For complex tasks, ChatGPT can produce minor errors; an interactive rectification method is proposed to correct them.
  - The GPT-generated demonstrations are collected in the simulation environment. For robot learning, the state space is composed of a top-down height map, an in-hand image, and the gripper state (0/1); the action space includes the robot skill (PICK & PLACE) and the target pose; the reward function is sparse. The learning algorithm is SDQfD [4] (loss sketch below).
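A minimal sketch of how the five-part prompt might be assembled, assuming the current openai-python client: per the paper, the background description and RobotAPI go in as system messages, and temperature 0.0 trades diversity for stability (see Abstract). The strings, helper name, and model choice below are illustrative, not the paper's verbatim prompt.

```python
from openai import OpenAI

client = OpenAI()

# Illustrative stand-ins for the paper's actual prompt text.
BACKGROUND = ("A robot arm with a two-fingered gripper. There is a plane at "
              "z = 0 with several building blocks on it.")
ROBOT_API = ("You may only use two functions: envs.pickObject(object_name); "
             "envs.placeObjectOn(object_name, relative_position=[dx, dy])")

def build_messages(object_info, environment_info, task_info, examples):
    # Background and RobotAPI are system messages; the remaining three
    # parts of the five-part prompt form the user message.
    user_prompt = "\n".join([
        f"Object info: {object_info}",
        f"Environment info: {environment_info}",
        f"Task info: {task_info}",
        f"Examples: {examples}",
    ])
    return [
        {"role": "system", "content": BACKGROUND},
        {"role": "system", "content": ROBOT_API},
        {"role": "user", "content": user_prompt},
    ]

response = client.chat.completions.create(
    model="gpt-4",    # illustrative; the paper evaluates ChatGPT
    temperature=0.0,  # stability over diversity, as noted in the Abstract
    messages=build_messages(
        "CUBE_0 and CUBE_1, each size [0.03 0.03 0.03]",
        "plane at height z = 0",
        "Stack the two cubes together.",
        "(few-shot examples omitted)",
    ),
)
generated_code = response.choices[0].message.content
```

On the learning side, SDQfD [4] builds on DQfD's large-margin loss; my reading is that the strict variant penalizes every action that violates the margin against the demonstrated action, not only the argmax. A hedged PyTorch sketch of that loss term (batch layout, margin value, and the mean reduction are assumptions):

```python
import torch

def strict_margin_loss(q_values, expert_action, margin=0.1):
    """DQfD-style large-margin loss, sketching SDQfD's strict variant [4].

    q_values:      (B, num_actions) predicted Q-values
    expert_action: (B,) index of the demonstrated action
    Penalizes every non-expert action whose Q-value comes within
    `margin` of (or exceeds) the expert action's Q-value.
    """
    q_expert = q_values.gather(1, expert_action.unsqueeze(1))  # (B, 1)
    l = torch.full_like(q_values, margin)                      # margin per action
    l.scatter_(1, expert_action.unsqueeze(1), 0.0)             # zero at expert action
    violations = (q_values + l - q_expert).clamp(min=0.0)      # per-action hinge
    return violations.mean()
```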
- Experiments: Focus on sim-to-real transfer and superiority over non-LLM methods. The task-difficulty metric covers three aspects: the number of objects, the number of object categories, and the number of task steps (toy scoring sketch below).
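The notes do not reproduce the paper's exact formula; as a placeholder, a weighted combination of the three aspects might look like the toy function below (the weighted-sum form and weights are hypothetical, not the paper's metric):

```python
def task_difficulty(num_objects, num_categories, num_steps,
                    w_obj=1.0, w_cat=1.0, w_step=1.0):
    """Toy difficulty score over the paper's three aspects:
    object count, object-category count, and task-step count."""
    return w_obj * num_objects + w_cat * num_categories + w_step * num_steps

# e.g. the two-cube stacking demo: 2 objects, 1 category, 2 steps
print(task_difficulty(2, 1, 2))
```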
References
- [1] RLBench: The Robot Learning Benchmark & Learning Environment, RA-L 2020. Online.
- [2] ChatGPT for Robotics: Design Principles and Model Abilities, IEEE Access 2023. Online.
- [3] BulletArm: An Open-Source Robotic Manipulation Benchmark and Learning Framework, ISRR 2022. Online.
- [4] Policy Learning in SE(3) Action Spaces, CoRL 2020. Online.
Extensions
Demo:
- Background description: A robot arm with a two-fingered gripper. There is a plane at height (z axis) 0, and several building blocks with shape (x, y, z) on it. Output executable Python code that is as concise as possible.
- Object info: 1. object_name: CUBE_0, shape: CUBE, size: [0.03 0.03 0.03]; 2. object_name: CUBE_1, shape: CUBE, size: [0.03 0.03 0.03].
- RobotAPI info: Given a Panda robot arm with a two-fingered gripper, there are only two functions you can use: 1. envs.pickObject(object_name); 2. envs.placeObjectOn(object_name, relative_position=[dx, dy])
- Task info: Stack the two cubes together.
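A plausible completion for this prompt (my guess, not the paper's recorded output) needs only the two allowed RobotAPI calls:

```python
# Hypothetical ChatGPT output for the stacking task: pick CUBE_0 and
# place it centered on top of CUBE_1.
envs.pickObject("CUBE_0")
envs.placeObjectOn("CUBE_1", relative_position=[0, 0])
```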
[Prompt] -> [DecisionBot] -> [Code] -> [CodeErrorCheck] -> [ResultErrorCheck (EvaluationBot & Human)] -> [CorrectorBot] or [DRL & Robot]
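A sketch of the correction loop this diagram describes; the four callables and `max_rounds` are hypothetical stand-ins for the DecisionBot, the code error check, the result error check (EvaluationBot & human), and the CorrectorBot, supplied by the caller:

```python
def generate_demo(prompt, decision_bot, run_in_simulation,
                  evaluate_result, corrector_bot, max_rounds=3):
    """Correction loop: generate code, gate it through the two error
    checks, and let the corrector revise on failure."""
    code = decision_bot(prompt)                  # [Prompt] -> [DecisionBot] -> [Code]
    for _ in range(max_rounds):
        ok, error = run_in_simulation(code)      # [CodeErrorCheck]
        if ok:
            ok, error = evaluate_result(code)    # [ResultErrorCheck]
        if ok:
            return code                          # demonstration goes on to [DRL & Robot]
        code = corrector_bot(code, error)        # [CorrectorBot] revises and retries
    raise RuntimeError("No valid demonstration after max_rounds corrections")
```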