Yuan Meng1, Xiangtong Yao1, Kejia Chen1, Yansong Wu1, Liding Zhang1, Achim Lilienthal1, Zhenshan Bing2, Alois Knoll1
1The School of Computation, Information and Technology, Technical University of Munich, Germany
2State Key Laboratory for Novel Software Technology, Nanjing University, China
The official implementation of HELIOS (Hierarchical Encoding of Long-horizon Inference with Off-policy Bayesian Non-parametric Skills Prior), a reinforcement learning framework for robotic long-horizon manipulation.
- Python 3.7+
- MuJoCo 2.0 (for RL experiments)
- Ubuntu 20.04 LTS or 22.04 LTS
# Setup the environment
conda create -n helios python=3.8
conda activate helios
# Install dependencies and packages
cd helios
pip3 install -r requirements.txt
pip3 install -e .
# Experiments folder stores trained models
# Data folder stores external data libraries
mkdir ./experiments
mkdir ./data
export EXP_DIR=./experiments
export DATA_DIR=./data
Follow the D4RL Fork link and install the fork according to its instructions. The fork adds the key 'completed_tasks' to the Kitchen environment, which the RL phase needs in order to run correctly.
WandB is used for logging the training process. Before running any of the commands below, create an account and then change the WandB entity and project name at the top of train.py and rl/train.py to match your account.
To train a DPM-based generalized skill prior model, run:
python3 helios/train.py --path=helios/configs/skill_prior_learning/kitchen/helios_h_cl --val_data_size=160 --gpu=0
After the DPM-based skill prior model is trained, train the HELIOS agent on the Franka Kitchen long-horizon tasks by running the commands below (set --gpu to the index of an available GPU device):
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=0 --prefix=helios_kitchen_seed0 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=1 --prefix=helios_kitchen_seed1 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=2 --prefix=helios_kitchen_seed2 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=3 --prefix=helios_kitchen_seed3 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=4 --prefix=helios_kitchen_seed4 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=5 --prefix=helios_kitchen_seed5 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=6 --prefix=helios_kitchen_seed6 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=7 --prefix=helios_kitchen_seed7 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=8 --prefix=helios_kitchen_seed8 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/helios_cl --seed=9 --prefix=helios_kitchen_seed9 --gpu=0
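The ten runs above differ only in the seed and the matching prefix, so they can also be generated in a loop. The following is a minimal sketch of such a launcher; `build_command` and `launch_all` are hypothetical helper names that are not part of the repository, and the flags are copied from the commands above.

```python
import subprocess


def build_command(seed, config="helios/configs/hrl/kitchen/helios_cl",
                  prefix="helios_kitchen", gpu=0):
    """Return the argument list for one RL training run (flags as in the commands above)."""
    return [
        "python3", "helios/rl/train.py",
        f"--path={config}",
        f"--seed={seed}",
        f"--prefix={prefix}_seed{seed}",
        f"--gpu={gpu}",
    ]


def launch_all(seeds, dry_run=True, **kwargs):
    """Run the seeds one after another; with dry_run=True only print the commands."""
    for seed in seeds:
        cmd = build_command(seed, **kwargs)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run


if __name__ == "__main__":
    # Prints the ten commands; pass dry_run=False to actually start training.
    launch_all(range(10), dry_run=True)
```

The same helper should cover the other experiments by passing a different `config` and `prefix`, for example `config="helios/configs/rl/kitchen/SAC"` with `prefix="SAC_kitchen"`.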
- Run Vanilla SAC:
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=0 --prefix=SAC_kitchen_seed0 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=1 --prefix=SAC_kitchen_seed1 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=2 --prefix=SAC_kitchen_seed2 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=3 --prefix=SAC_kitchen_seed3 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=4 --prefix=SAC_kitchen_seed4 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=5 --prefix=SAC_kitchen_seed5 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=6 --prefix=SAC_kitchen_seed6 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=7 --prefix=SAC_kitchen_seed7 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=8 --prefix=SAC_kitchen_seed8 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/SAC --seed=9 --prefix=SAC_kitchen_seed9 --gpu=0
- Run SPIRL (closed-loop):
# for Skill Prior learning (single Gaussian)
python3 helios/train.py --path=helios/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160 --gpu=0
# for downstream hierarchical RL policy
# Note: change the load path of pretrained prior before you run training.
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=0 --prefix=SPIRL_kitchen_seed0 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=1 --prefix=SPIRL_kitchen_seed1 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=2 --prefix=SPIRL_kitchen_seed2 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=3 --prefix=SPIRL_kitchen_seed3 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=4 --prefix=SPIRL_kitchen_seed4 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=5 --prefix=SPIRL_kitchen_seed5 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=6 --prefix=SPIRL_kitchen_seed6 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=7 --prefix=SPIRL_kitchen_seed7 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=8 --prefix=SPIRL_kitchen_seed8 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/spirl_cl --seed=9 --prefix=SPIRL_kitchen_seed9 --gpu=0
- Train Single-step action prior:
python3 helios/train.py --path=helios/configs/skill_prior_learning/kitchen/flat --val_data_size=160 --gpu=0
- Run SAC w/ single-step action prior:
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=0 --prefix=flatPrior_kitchen_seed0 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=1 --prefix=flatPrior_kitchen_seed1 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=2 --prefix=flatPrior_kitchen_seed2 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=3 --prefix=flatPrior_kitchen_seed3 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=4 --prefix=flatPrior_kitchen_seed4 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=5 --prefix=flatPrior_kitchen_seed5 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=6 --prefix=flatPrior_kitchen_seed6 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=7 --prefix=flatPrior_kitchen_seed7 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=8 --prefix=flatPrior_kitchen_seed8 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=9 --prefix=flatPrior_kitchen_seed9 --gpu=0
- Run BC + finetune:
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=0 --prefix=bcFinetune_kitchen_seed0 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=1 --prefix=bcFinetune_kitchen_seed1 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=2 --prefix=bcFinetune_kitchen_seed2 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=3 --prefix=bcFinetune_kitchen_seed3 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=4 --prefix=bcFinetune_kitchen_seed4 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=5 --prefix=bcFinetune_kitchen_seed5 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=6 --prefix=bcFinetune_kitchen_seed6 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=7 --prefix=bcFinetune_kitchen_seed7 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=8 --prefix=bcFinetune_kitchen_seed8 --gpu=0
python3 helios/rl/train.py --path=helios/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=9 --prefix=bcFinetune_kitchen_seed9 --gpu=0
- Run Skill Space Policy w/o prior:
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=0 --prefix=SSP_noPrior_kitchen_seed0 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=1 --prefix=SSP_noPrior_kitchen_seed1 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=2 --prefix=SSP_noPrior_kitchen_seed2 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=3 --prefix=SSP_noPrior_kitchen_seed3 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=4 --prefix=SSP_noPrior_kitchen_seed4 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=5 --prefix=SSP_noPrior_kitchen_seed5 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=6 --prefix=SSP_noPrior_kitchen_seed6 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=7 --prefix=SSP_noPrior_kitchen_seed7 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=8 --prefix=SSP_noPrior_kitchen_seed8 --gpu=0
python3 helios/rl/train.py --path=helios/configs/hrl/kitchen/no_prior/ --seed=9 --prefix=SSP_noPrior_kitchen_seed9 --gpu=0
All commands above can also be run on maze / block stacking by replacing kitchen with the respective environment in the paths (after downloading the datasets).
The default hyperparameters are defined in the respective model files, e.g. in SkillPriorMdl for the HELIOS model. SPiRL_DPMM_Mdl defines the DPM-related hyperparameters. Modifications to these parameters can be defined through the experiment config files (passed to the respective command via the --path argument). For an example, see kitchen/hierarchical/conf.py.
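The override mechanism can be pictured as a set of defaults that an experiment config selectively replaces. The sketch below only illustrates that pattern: `DEFAULT_HPARAMS`, `apply_overrides` and the parameter names are invented for the example and do not reflect the actual contents of the model files or of conf.py.

```python
# Hypothetical defaults, in the spirit of what a model file might define.
DEFAULT_HPARAMS = {
    "latent_dim": 10,
    "skill_length": 10,
    "kl_weight": 1.0,
}


def apply_overrides(defaults, overrides):
    """Return a new dict with overrides applied; reject keys the model does not define."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    merged = dict(defaults)   # copy so the defaults stay untouched
    merged.update(overrides)
    return merged


# Hypothetical experiment config: only the values that differ from the defaults.
experiment_overrides = {"kl_weight": 5e-4}
hparams = apply_overrides(DEFAULT_HPARAMS, experiment_overrides)
```

Rejecting unknown keys is a design choice of this sketch; it catches misspelled parameter names early instead of silently ignoring them.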
All code that is dataset-specific should be placed in a corresponding subfolder in helios/data.
To add a data loader for a new dataset, subclass the Dataset classes from data_loader.py and override the __getitem__ function so that it loads a single data sample. The output dict should include the following keys:
dict({
    'states': (time, state_dim),                   # state sequence (for state-based prior inputs)
    'actions': (time, action_dim),                 # action sequence (as skill input for training prior model)
    'images': (time, channels, width, height),     # image sequence (for image-based prior inputs)
})
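As a rough illustration of this contract, the sketch below returns one fixed-length window of a trajectory per sample. It is a stand-in that does not import the repository: `ToySequenceDataset` and its constructor arguments are hypothetical, the data is generated in memory rather than read from disk, and a real loader would subclass one of the Dataset classes from data_loader.py. The 'images' key is left out because the example is state-based.

```python
import random


class ToySequenceDataset:
    """Stand-in dataset; in the repository this would subclass a Dataset class."""

    def __init__(self, n_trajectories=4, traj_len=50, state_dim=3,
                 action_dim=2, subseq_len=10, seed=0):
        self.subseq_len = subseq_len
        self.rng = random.Random(seed)
        # Hypothetical in-memory data; a real loader would read trajectories from files.
        self.trajectories = [
            {
                "states": [[self.rng.random() for _ in range(state_dim)]
                           for _ in range(traj_len)],
                "actions": [[self.rng.random() for _ in range(action_dim)]
                            for _ in range(traj_len)],
            }
            for _ in range(n_trajectories)
        ]

    def __len__(self):
        return len(self.trajectories)

    def __getitem__(self, index):
        """Load a single sample: a random fixed-length window of one trajectory."""
        traj = self.trajectories[index]
        start = self.rng.randint(0, len(traj["states"]) - self.subseq_len)
        end = start + self.subseq_len
        return {
            "states": traj["states"][start:end],    # (time, state_dim)
            "actions": traj["actions"][start:end],  # (time, action_dim)
        }


dataset = ToySequenceDataset()
sample = dataset[0]
```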
All datasets used with the codebase so far have been based on HDF5 files. The GlobalSplitDataset provides functionality to read all HDF5 files in a directory and split them into train/val/test based on percentages. The VideoDataset class provides many functionalities for manipulating sequences, like randomly cropping subsequences, padding, etc.
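A percentage-based split of a directory listing can be sketched as follows. This is not the GlobalSplitDataset code; `split_by_percentage` and the file names are hypothetical and only show one simple way to derive train/val/test subsets from fractions.

```python
def split_by_percentage(filenames, train=0.9, val=0.05, test=0.05):
    """Split a list of file names into train/val/test parts by fraction."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    files = sorted(filenames)  # fixed order so the split is reproducible
    n_train = int(round(train * len(files)))
    n_val = int(round(val * len(files)))
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],  # whatever remains
    }


# Hypothetical file names standing in for the HDF5 files of a dataset directory.
files = [f"rollout_{i:03d}.h5" for i in range(20)]
splits = split_by_percentage(files, train=0.8, val=0.1, test=0.1)
```

Sorting before splitting keeps every file in the same subset across runs, which matters when validation results are compared between experiments.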
To add a new RL environment, define a new environment class in helios/rl/envs that inherits from the environment interface in helios/rl/components/environment.py.
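The sketch below shows what such a class could look like for a toy task. It does not import the repository: `BaseEnvironmentStandIn` is a hypothetical placeholder for the real interface, and the gym-style `reset()` / `step(action)` signatures returning `(obs, reward, done, info)` are an assumption that should be checked against helios/rl/components/environment.py.

```python
class BaseEnvironmentStandIn:
    """Hypothetical placeholder for the repository's environment interface."""

    def reset(self):
        raise NotImplementedError

    def step(self, action):
        raise NotImplementedError


class ReachTargetEnv(BaseEnvironmentStandIn):
    """Toy 1-D task: move a point from 0 to a target position."""

    def __init__(self, target=1.0, max_steps=20, tolerance=0.05):
        self.target = target
        self.max_steps = max_steps
        self.tolerance = tolerance
        self.position = 0.0
        self.steps = 0

    def reset(self):
        self.position = 0.0
        self.steps = 0
        return [self.position]

    def step(self, action):
        # Clip the action so a single step moves the point by at most 0.25.
        self.position += max(-0.25, min(0.25, float(action)))
        self.steps += 1
        reached = abs(self.position - self.target) <= self.tolerance
        reward = 1.0 if reached else 0.0          # sparse reward on success
        done = reached or self.steps >= self.max_steps
        return [self.position], reward, done, {}


env = ReachTargetEnv()
obs = env.reset()
done, n_steps, reward = False, 0, 0.0
while not done:
    obs, reward, done, info = env.step(0.25)  # always move toward the target
    n_steps += 1
```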
Start by defining a model class in the helios/models directory that inherits from the BaseModel or SkillPriorMdl class. The new model needs to define the architecture in the constructor (e.g. by overriding the build_network() function), implement the forward pass and loss functions, and, if desired, add model-specific logging functionality. For an example, see helios/models/skill_prior_mdl.py.
Note that most basic architecture components (MLPs, CNNs, LSTMs, GRUs, Flow models, etc.) are defined in helios/modules and can be conveniently reused for easy architecture definitions. Below are some links to the most important classes.
| Component | File | Description |
|---|---|---|
| MLP | Predictor | Basic N-layer fully-connected network. Defines number of inputs, outputs, layers and hidden units. |
| CNN-Encoder | ConvEncoder | Convolutional encoder, number of layers determined by input dimensionality (resolution halved per layer). Number of channels doubles per layer. Returns encoded vector + skip activations. |
| CNN-Decoder | ConvDecoder | Mirrors architecture of conv. encoder. Can take skip connections as input, also versions that copy pixels etc. |
| Processing-LSTM | BaseProcessingLSTM | Basic N-layer LSTM for processing an input sequence. Produces one output per timestep, number of layers / hidden size configurable. |
| Processing-GRU | BaseProcessingGRU | Basic N-layer GRU for processing an input sequence. Produces one output per timestep, number of layers / hidden size configurable. |
| Prediction-LSTM | RecurrentPredictor | Same as processing LSTM, but for autoregressive prediction. |
| Mixture-Density Network | MDN | MLP that outputs GMM distribution. |
| Normalizing Flow Model | NormalizingFlowModel | Implements normalizing flow model that stacks multiple flow blocks. Implementation for RealNVP block provided. |
| HELIOS Skill Prior | SPiRL_DPMM_Mdl | Model for DPM-based skill prior learning. |
| DPM Learning | ModelTrainer | Train loop modified for DPM fitting. |
| DPM-based Loss functions | DPMM_KLDivLoss | Modified loss functions, involving a weighted sum of KL divergences. |
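To make the "weighted sum of KL divergences" in the last table row concrete, the sketch below computes the closed-form KL divergence between diagonal Gaussians and combines it over several mixture components. It is a generic illustration under the assumption of diagonal Gaussian components, not the code of DPMM_KLDivLoss; the function names, the example components and the weights are made up.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (math.log(sp / sq)
                  + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
                  - 0.5)
    return total


def weighted_kl(mu_q, sigma_q, components, weights):
    """Weighted sum of KL(q || component_k) over the mixture components."""
    return sum(
        w * kl_diag_gaussians(mu_q, sigma_q, mu_k, sigma_k)
        for w, (mu_k, sigma_k) in zip(weights, components)
    )


# Hypothetical posterior q and two mixture components given as (mean, std).
q_mu, q_sigma = [0.0, 0.0], [1.0, 1.0]
components = [([0.0, 0.0], [1.0, 1.0]), ([2.0, 0.0], [1.0, 1.0])]
weights = [0.7, 0.3]  # e.g. component assignment probabilities
loss = weighted_kl(q_mu, q_sigma, components, weights)
```

In this example the first component equals q and contributes zero, while the second has a KL divergence of 2.0, so the weighted sum is 0.3 * 2.0 = 0.6.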
The core RL algorithms are implemented within the Agent class. To add a new algorithm, create a new file in helios/rl/agents and subclass BaseAgent. In particular, any required networks (actor, critic, etc.) need to be constructed and the update(...) function needs to be overridden. For an example, see the SAC implementation in SACAgent.
The main SPIRL skill prior regularized RL algorithm is implemented in ActionPriorSACAgent.
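The subclass-and-override pattern for agents can be sketched with a toy example. Nothing here is taken from the repository: `BaseAgentStandIn` is a hypothetical placeholder for BaseAgent, the `act` method and the batch format are assumptions, and the update rule is a simple running value estimate rather than SAC.

```python
class BaseAgentStandIn:
    """Hypothetical placeholder for BaseAgent; the real class lives in the repository."""

    def act(self, obs):
        raise NotImplementedError

    def update(self, experience_batch):
        raise NotImplementedError


class GreedyBanditAgent(BaseAgentStandIn):
    """Toy agent that keeps one value estimate per discrete action."""

    def __init__(self, n_actions, lr=0.5):
        self.values = [0.0] * n_actions
        self.lr = lr

    def act(self, obs=None):
        # Pick the action with the highest current value estimate.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, experience_batch):
        """Move each action's estimate toward the observed reward; return log info."""
        for action, reward in experience_batch:
            self.values[action] += self.lr * (reward - self.values[action])
        return {"max_value": max(self.values)}


agent = GreedyBanditAgent(n_actions=3)
info = agent.update([(0, 0.0), (2, 1.0), (2, 1.0)])  # (action, reward) pairs
best_action = agent.act()
```

Returning a dictionary from `update` mirrors the common practice of handing training statistics back to the caller for logging; whether BaseAgent expects this is an assumption.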
helios
|- components # reusable infrastructure for model training
| |- base_model.py # basic model class that all models inherit from
| |- checkpointer.py # handles storing + loading of model checkpoints
| |- data_loader.py # basic dataset classes, new datasets need to inherit from here
| |- evaluator.py # defines basic evaluation routines, eg top-of-N evaluation, + eval logging
| |- logger.py # implements core logging functionality using tensorboardX
| |- params.py # definition of command line params for model training
| |- trainer_base.py # basic training utils used in main trainer file
|
|- configs # all experiment configs should be placed here
| |- data_collect # configs for data collection runs
| |- default_data_configs # defines one default data config per dataset, e.g. state/action dim etc
| |- hrl # configs for hierarchical downstream RL
| |- rl # configs for non-hierarchical downstream RL
| |- skill_prior_learning # configs for skill embedding and prior training (both hierarchical and flat)
|
|- data # any dataset-specific code (like data generation scripts, custom loaders etc)
|- models # holds all model classes that implement forward, loss, visualization
|- modules # reusable architecture components (like MLPs, CNNs, LSTMs, Flows etc)
|- rl # all code related to RL
| |- agents # implements core algorithms in agent classes, like SAC etc
| |- components # reusable infrastructure for RL experiments
| | |- agent.py # basic agent and hierarchical agent classes - do not implement any specific RL algo
| | |- critic.py # basic critic implementations (eg MLP-based critic)
| | |- environment.py # defines environment interface, basic gym env
| | |- normalization.py # observation normalization classes, only optional
| | |- params.py # definition of command line params for RL training
| | |- policy.py # basic policy interface definition
| | |- replay_buffer.py # simple numpy-array replay buffer, uniform sampling and versions
| | |- sampler.py # rollout sampler for collecting experience, for flat and hierarchical agents
| |- envs # all custom RL environments should be defined here
| |- policies # policy implementations go here, MLP-policy and RandomAction are implemented
| |- utils # utilities for RL code like MPI, WandB related code
| |- train.py # main RL training script, builds all components + runs training
|
|- utils # general utilities, pytorch / visualization utilities etc
|- train.py # main model training script, builds all components + runs training loop and logging
The implementation of this work inherits from three repositories: