Generalizing from References

Abstract

Learning agile humanoid behaviors from human motion offers a powerful route to natural, coordinated control, but existing approaches face a persistent trade-off: reference-tracking policies are often brittle outside the demonstration dataset, while purely task-driven reinforcement learning can achieve adaptability at the cost of motion quality. We introduce a multi-task RL training paradigm that bridges this gap by treating reference motion as a prior for behavioral shaping rather than a deployment-time constraint. A goal-conditioned policy is trained jointly on two tasks that share the same observation and action spaces but differ in their initialization schemes, command spaces, and reward structures: a reference-guided imitation task in which reference trajectories define dense imitation rewards but are not provided as policy inputs, and a goal-conditioned generalization task in which goals are sampled independently of any reference and rewards reflect only task success. By co-optimizing these objectives within a shared observation space, the policy acquires structured, human-like motor skills from dense reference supervision while learning to adapt these skills to novel goals and initial conditions. This is achieved without adversarial objectives, explicit trajectory tracking, phase variables, or reference-dependent inference. We evaluate the method on a challenging box-based parkour playground that demands diverse athletic behaviors such as jumping and climbing, and show that the learned controller transfers beyond the reference distribution while preserving motion naturalness. Finally, we demonstrate that the learned skills can be composed in long-horizon scenarios using a simple state-machine-based evaluation protocol, highlighting their robustness and ability to generalize across diverse task conditions.
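To make the two-task setup concrete, here is a minimal Python sketch of how episodes might be sampled, observed, and rewarded. Everything in it is a hypothetical illustration, not the paper's implementation: the names `Episode`, `sample_episode`, `observe`, and `reward`, the task-mixing ratio `p_imitation`, the flat state vector, the choice of a reference's last frame as the imitation-task goal, the Gaussian imitation kernel with width `sigma`, and the binary success reward with tolerance `tol`. What it does mirror from the text is the structure: both tasks produce observations of identical shape (state plus goal, with no reference frames and no phase variable), and only the imitation task reads the reference, solely to compute its dense reward.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Sequence

Vector = List[float]


@dataclass
class Episode:
    task: str                    # "imitation" or "generalization"
    init_state: Vector           # hypothetical flat state used at reset
    goal: Vector                 # command given to the policy in BOTH tasks
    reference: List[Vector] = field(default_factory=list)  # reward-only; never observed


def sample_episode(references: Sequence[List[Vector]], p_imitation: float = 0.5, rng=random) -> Episode:
    """Sample one episode; p_imitation is a hypothetical task-mixing ratio."""
    if rng.random() < p_imitation:
        ref = rng.choice(list(references))
        # Imitation task (assumed scheme): reset on the first reference frame
        # and use the last frame as the goal command.
        return Episode("imitation", list(ref[0]), list(ref[-1]), ref)
    # Generalization task: initial state and goal drawn independently of any reference.
    dim = len(references[0][0])
    init = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    goal = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return Episode("generalization", init, goal)


def observe(ep: Episode, state: Vector) -> Vector:
    # Same layout for both tasks: state + goal. No reference frames, no phase variable.
    return list(state) + list(ep.goal)


def reward(ep: Episode, state: Vector, t: int, sigma: float = 0.5, tol: float = 0.1) -> float:
    if ep.task == "imitation":
        # Dense imitation reward: Gaussian kernel on distance to reference frame t (assumed form).
        frame = ep.reference[min(t, len(ep.reference) - 1)]
        err = sum((s - r) ** 2 for s, r in zip(state, frame))
        return math.exp(-err / sigma ** 2)
    # Task-success reward only: 1 when within tol of the goal, else 0 (assumed form).
    dist = math.sqrt(sum((s - g) ** 2 for s, g in zip(state, ep.goal)))
    return 1.0 if dist < tol else 0.0
```

In a training loop, each environment would call `sample_episode` at reset, feed `observe(...)` to the single shared policy, and score transitions with `reward(...)`. Because the policy never receives `ep.reference`, it could in principle be deployed with goals alone.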

Long-Horizon Skill Composition

A rule-based composer sequences learned skills into long-horizon parkour rollouts.
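A rule-based composer of this kind can be pictured as a small finite-state machine. The sketch below is a hypothetical illustration, not the authors' protocol: it assumes a course described only by successive box-top heights, skill labels named after the Climb, Jump, and Climb Down clips on this page (`climb`, `jump`, `climb_down`), and an assumed threshold `tol` for deciding whether the next box is higher, lower, or level. The composer holds the active skill until it reports success, then selects the next one from the height difference.

```python
from typing import List, Optional


class SkillComposer:
    """Hypothetical rule-based state machine over a course of box-top heights (metres)."""

    def __init__(self, course: List[float], start_height: float = 0.0, tol: float = 0.1):
        self.course = list(course)  # height of each successive box (hypothetical representation)
        self.height = start_height  # height the robot currently stands on
        self.index = 0              # index of the box currently targeted
        self.tol = tol              # assumed threshold separating "level" from "higher/lower"
        self.state = self._select()

    def _select(self) -> str:
        # Rule: choose a skill from the height difference to the next box.
        if self.index >= len(self.course):
            return "done"
        dh = self.course[self.index] - self.height
        if dh > self.tol:
            return "climb"
        if dh < -self.tol:
            return "climb_down"
        return "jump"

    def goal(self) -> Optional[float]:
        # Command that would be passed to the goal-conditioned policy for the active skill.
        return None if self.state == "done" else self.course[self.index]

    def step(self, skill_succeeded: bool) -> str:
        # Hold the active skill until it reports success, then transition.
        if self.state != "done" and skill_succeeded:
            self.height = self.course[self.index]
            self.index += 1
            self.state = self._select()
        return self.state
```

For example, `SkillComposer([0.5, 0.5, 0.0])` starts in `climb`, moves to `jump` once the first box is reached, then to `climb_down`, and finally to `done`. Keeping the sequencing rules outside the policy means the same trained controller is reused unchanged; only the goal it receives changes.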

Generalization Results

The goal-conditioned policy adapts reference-shaped skills to novel goals and initial conditions.

Climb

Jump

Climb Down

MuJoCo Results

These rollouts show sim-to-sim transfer of composed skills in a different physics engine.

Scenario 1

Scenario 2

Method Extension

Using the same framework, a single policy can learn multiple skills with perceptual input.

BibTeX

@article{wang2026generalizing,
  title={Generalizing from References using a Multi-Task Reference and Goal-Driven RL Framework},
  author={Wang, Jiashun and Mungai, M Eva and Li, He and Sleiman, Jean Pierre and Hodgins, Jessica and Farshidian, Farbod},
  journal={arXiv preprint arXiv:2602.20375},
  year={2026}
}