Source
@inproceedings{liu_2023_libero,
author = {Liu, Bo and Zhu, Yifeng and Gao, Chongkai and Feng, Yihao and Liu, Qiang and Zhu, Yuke and Stone, Peter},
title = {{LIBERO}: Benchmarking Knowledge Transfer for Lifelong Robot Learning},
booktitle = {Neural Information Processing Systems (NeurIPS)},
year = {2023},
}
(The University of Texas at Austin) | arXiv
TL;DR
…

Flash Reading
- Abstract: Lifelong learning in decision-making (LLDM) involves the transfer of both declarative (concept) and procedural (action) knowledge. LIBERO is a benchmark for lifelong learning in robot manipulation. It focuses on five research topics: knowledge transfer efficiency, policy architecture, lifelong learning algorithms, robustness to task ordering, and the effect of model pretraining. It includes a procedural generation pipeline for creating new tasks, and four task suites with 130 tasks in total are designed for benchmarking. It is found that sequential finetuning outperforms existing lifelong learning methods in forward transfer, no single visual encoder architecture excels across all types of knowledge transfer, and naive supervised pretraining can hinder performance in LLDM.
- Introduction: Lifelong learning amortizes the learning process over the agent’s lifespan, with the goal of acquiring new skills quickly by leveraging prior knowledge (forward transfer) and reinforcing previously learned skills (backward transfer). Existing benchmarks focus on declarative knowledge transfer, while this work targets a mix of declarative and procedural knowledge transfer in robot manipulation tasks. LIBERO enables continual learning over an expanding set of diverse tasks that share concepts and actions, generated through a procedural generation pipeline. It is found that (1) Policy architecture is crucial: vision transformers work well on tasks with rich visual information, while CNNs work well on tasks requiring more procedural knowledge; (2) Lifelong learning algorithms are effective at preventing forgetting but do not improve forward transfer compared to sequential finetuning; (3) Pretrained language embeddings of semantically rich task descriptions do not improve performance over simple task IDs; (4) Basic supervised pretraining on large-scale datasets can hinder lifelong learning performance.
- Background: A robot learning problem can be formulated as an MDP $(\mathcal{S}, \mathcal{A}, \mathcal{T}, H, R)$ with the objective of maximizing the expected return $\max_\pi \mathbb{E}\left[\sum_{t=1}^{H} R(s_t)\right]$. In lifelong learning, a sequence of tasks $\{\mathcal{T}^1, \mathcal{T}^2, \dots, \mathcal{T}^K\}$ is learned sequentially with a single policy $\pi$ (see the formalization sketch after this list). Each task is defined by its initial state distribution and goal predicate, and all tasks are assumed to share the same state and action spaces. Since learning from sparse rewards alone is challenging, a small demonstration dataset is provided for each task, in which sensory inputs are recorded along with the actions taken. The state at each step is an aggregation of past observations.
- Research topics: (1) Transfer of different types of knowledge: four task suites are designed, three of which target spatial knowledge, object concepts, and task goals in a disentangled manner, while one targets mixed knowledge transfer. (2) Neural architecture design. (3) Lifelong learning algorithms. (4) Robustness to task ordering. (5) Effect of model pretraining.
- LIBERO: A new task is generated in three steps: (1) extract behavioral templates from language annotations and sample new tasks, described in natural language, from the templates; (2) specify an initial object distribution given the task description; (3) specify the task goal as a propositional formula that aligns with the language instruction. This pipeline is built on top of Robosuite [1]. The four task suites are LIBERO-SPATIAL (10 tasks), LIBERO-OBJECT (10 tasks), LIBERO-GOAL (10 tasks), and LIBERO-100 (100 tasks). Three lifelong learning algorithms are adapted [2]: Experience Replay (ER, memory-based), Elastic Weight Consolidation (EWC, regularization-based), and PackNet (dynamic-architecture-based); additionally, sequential finetuning (SEQL) and multitask learning (MTL) serve as lower and upper performance bounds (a minimal EWC sketch follows this list). Three vision-language policy architectures are implemented: ResNet-RNN, ResNet-T, and ViT-T.
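To make the background formulation concrete: after the $k$-th task is learned, the single policy is evaluated on all tasks seen so far. A minimal formalization of that objective, assuming equal weighting of tasks (the weighting convention is an assumption of this sketch, not a formula quoted from the paper), is

$$
\max_\pi \; \frac{1}{k} \sum_{p=1}^{k} \mathbb{E}\left[ \sum_{t=1}^{H} R^p(s_t) \right],
$$

where $R^p$ and the initial state distribution are those of task $\mathcal{T}^p$. In practice the expectation is not optimized via rewards but approximated by behavior cloning on each task's demonstration dataset.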
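To illustrate how a regularization-based method such as EWC plugs into the sequential finetuning loop above, here is a minimal PyTorch-style sketch. It is a generic illustration under the behavior-cloning setup from the Background bullet, not the LIBERO reference implementation; the `policy` interface, `task_datasets`, and all hyperparameters are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def ewc_penalty(policy, anchor_params, fisher):
    """Quadratic penalty pulling parameters toward their values after the previous
    task, weighted by a diagonal Fisher-information estimate from that task."""
    loss = 0.0
    for name, p in policy.named_parameters():
        loss = loss + (fisher[name] * (p - anchor_params[name]) ** 2).sum()
    return loss

def estimate_fisher(policy, loader):
    """Diagonal Fisher approximation: average squared gradients of the BC loss."""
    fisher = {n: torch.zeros_like(p) for n, p in policy.named_parameters()}
    for obs, action in loader:
        policy.zero_grad()
        F.mse_loss(policy(obs), action).backward()
        for n, p in policy.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2 / len(loader)
    return fisher

def lifelong_train(policy, task_datasets, epochs=10, lam=1.0, lr=1e-4):
    """Sequentially finetune one policy over a task stream with behavior cloning;
    lam=0 recovers plain sequential finetuning (SEQL)."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    anchor_params, fisher = None, None
    for loader in task_datasets:                      # tasks arrive one at a time
        for _ in range(epochs):
            for obs, action in loader:                # behavior cloning on demonstrations
                bc_loss = F.mse_loss(policy(obs), action)
                reg = ewc_penalty(policy, anchor_params, fisher) if fisher is not None else 0.0
                opt.zero_grad()
                (bc_loss + lam * reg).backward()
                opt.step()
        # snapshot parameters and Fisher information before the next task arrives
        anchor_params = {n: p.detach().clone() for n, p in policy.named_parameters()}
        fisher = estimate_fisher(policy, loader)
    return policy
```

Memory-based ER would instead mix replayed (observation, action) pairs from earlier tasks into each batch, and PackNet would prune and freeze a subset of weights per task; the outer loop structure stays the same.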