Temporal Difference Learning for Model Predictive Control

Source

@inproceedings{Hansen_2022_tdmpc,
    title = {Temporal Difference Learning for Model Predictive Control},
    author = {Hansen, Nicklas and Wang, Xiaolong and Su, Hao},
    year = 2022,
    booktitle = {International Conference on Machine Learning (ICML)},
    publisher = {PMLR}
}

(UC San Diego) | arXiv

TL;DR

Concept

Flash Reading

References

Extension

MPPI (Model Predictive Path Integral control) is an MPC algorithm that iteratively updates the parameters of a family of sampling distributions using an importance-weighted average of the top-k sampled trajectories, ranked by estimated cost.
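
Concretely, writing C_i for the estimated cost of the i-th sampled input sequence U_i and λ for a temperature hyperparameter, the standard MPPI-style update of the nominal sequence μ can be sketched as follows (TD-MPC's exact weighting, which uses estimated returns, differs slightly):

w_i = \frac{\exp(-C_i / \lambda)}{\sum_{j \in \text{top-}k} \exp(-C_j / \lambda)}, \qquad \mu \leftarrow \sum_{i \in \text{top-}k} w_i \, U_i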

The core idea is to simulate thousands of trajectory rollouts with the model, each driven by a randomly perturbed input sequence: candidates are generated by adding random noise to a nominal sequence. The next nominal input sequence is the weighted average of the top-k candidates, with weights derived from trajectory costs so that lower-cost trajectories contribute more. The resulting solver is stochastic and gradient-free.
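
Below is a minimal NumPy sketch of such a planner under these assumptions; mppi, dyn, and cst are hypothetical names, the point-mass task is purely illustrative, and real implementations (TD-MPC included) batch the rollouts and score trajectories with a learned value estimate rather than raw model cost.

import numpy as np

def mppi(dynamics, cost, x0, u_nom, n_samples=512, top_k=64,
         noise_std=0.5, temperature=1.0, n_iters=6, rng=None):
    # Gradient-free MPPI-style planner: sample perturbed input sequences,
    # roll each one out with the model, and refit the nominal sequence to
    # an importance-weighted average of the top-k lowest-cost samples.
    rng = np.random.default_rng() if rng is None else rng
    horizon, act_dim = u_nom.shape
    for _ in range(n_iters):
        # Candidate input sequences = nominal sequence + Gaussian noise.
        eps = rng.normal(0.0, noise_std, size=(n_samples, horizon, act_dim))
        candidates = u_nom[None] + eps
        # Roll out every candidate with the model and accumulate its cost.
        costs = np.empty(n_samples)
        for i in range(n_samples):
            x, c = x0.copy(), 0.0
            for t in range(horizon):
                c += cost(x, candidates[i, t])
                x = dynamics(x, candidates[i, t])
            costs[i] = c
        # Keep the k lowest-cost trajectories.
        elite = np.argsort(costs)[:top_k]
        # Exponential weights: lower cost -> larger weight.
        w = np.exp(-(costs[elite] - costs[elite].min()) / temperature)
        w /= w.sum()
        # The importance-weighted average becomes the new nominal sequence.
        u_nom = np.einsum("i,itk->tk", w, candidates[elite])
    return u_nom

# Toy usage: steer a 1-D point mass (state = [position, velocity]) to the origin.
def dyn(x, u, dt=0.1):
    return np.array([x[0] + dt * x[1], x[1] + dt * u[0]])

def cst(x, u):
    return x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u[0] ** 2

plan = mppi(dyn, cst, x0=np.array([1.0, 0.0]), u_nom=np.zeros((20, 1)))
print("first planned action:", plan[0])

In MPC fashion, only the first action of the returned plan is executed, and the optimization is rerun from the next state with the remaining sequence as the warm start.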