Jiawei Ren* Mingyuan Zhang* Cunjun Yu* Xiao Ma Liang Pan Ziwei Liu†
S-Lab, Nanyang Technological University
National University of Singapore
Dyson Robot Learning Lab
*equal contribution
†corresponding author
Dyson Robot Learning Lab
*equal contribution
†corresponding author
NeurIPS 2023
conda create -n insactor python==3.9 -y
conda activate insactor
# diffmimic
python -m pip install --upgrade "jax[cuda]==0.4.2" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
# diffplanner
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y
conda install -c bottler nvidiacub -y
conda install -c fvcore -c iopath -c conda-forge fvcore iopath -y
conda install pytorch3d -c pytorch3d -y
python -m pip install -r requirements.txt
export PYTHONPATH=$PYTHONPATH:$(pwd)download models
mkdir pretrained_models && cd pretrained_models
gdown --fuzzy https://drive.google.com/file/d/1qdglrZJa5ago2nkSWc5Nu52kvLaS_dTg/view?usp=drive_link # Human-ML skills 0.001
gdown --fuzzy https://drive.google.com/file/d/1Wi33YZWR8K6IuWZdcXdV2mqcivcyObIU/view?usp=drive_link # Human-ML planner
cd ..streamlit run tools/demo.py --server.port 8501The WebUI should be available at localhost:8501.
#prepare data
mkdir data/datasets
gdown https://drive.google.com/file/d/1knGb1Kt9VUu377vcXIDfQN_qcAYSYPez/view?usp=drive_link --fuzzy && unzip human_pml3d.zip && rm human_pml3d.zip && mv human_pml3d data/datasets/human_pml3d
gdown https://drive.google.com/file/d/1_YQwV5kqZgSHhOJ_V9MwDjKO5yYfsdDP/view?usp=drive_link --fuzzy && unzip kit_pml.zip && rm kit_pml.zip && mv kit_pml data/datasets/kit_pml
# prepare contrasitive models
mkdir data/evaluators
gdown https://drive.google.com/file/d/1aoiq702L6fy4yCsX_ub1MxD_mvj-VmrQ/view?usp=drive_link --fuzzy && mv humanml.pth data/evaluators
gdown https://drive.google.com/file/d/1fyGAi2NHHvUNgDN0YgB1ND_7buVel_SA/view?usp=drive_link --fuzzy && mv kit.pth data/evaluators
# KIT
export CONTROLLER_PARAM_PATH="pretrained_models/skill_kit_0.01.pkl"
python -u tools/test.py configs/planner/kit.py --work-dir=work_dirs/eval --physmode=normal pretrained_models/planner_kit.pth
# Humanml
export CONTROLLER_PARAM_PATH="pretrained_models/skill_human_0.01.pkl"
python -u tools/test.py configs/planner/human.py --work-dir=work_dirs/eval --physmode=normal pretrained_models/planner_humanml.pth
# Humanml (Perturb)
export CONTROLLER_PARAM_PATH="pretrained_models/skill_human_0.01.pkl"
python -u tools/test.py configs/planner/human.py --work-dir=work_dirs/eval --physmode=normal --perturb true pretrained_models/planner_humanml.pth
# Humanml (Waypoint)
export CONTROLLER_PARAM_PATH="pretrained_models/skill_human_0.01.pkl"
python -u tools/test_waypoint.py configs/planner/human.py --work-dir=work_dirs/eval --physmode=normal pretrained_models/planner_humanml.pthDifference from the paper results:
- We improved the low-level trakcer training by adding a gradient truncation.
- We fixed the contrastive model for humanml-3d.
More models
cd pretrained_models
gdown --fuzzy https://drive.google.com/file/d/10gFEWUdZtMIA-6yhd_gZXbYm9snMZ63m/view?usp=drive_link # KIT-ML skills 0.001
gdown --fuzzy https://drive.google.com/file/d/1oyT5DE5ItZb1KNSV0lW85cLk9w4alGPV/view?usp=drive_link # Human-ML skills 0.0
gdown --fuzzy https://drive.google.com/file/d/1fvW3RtFU8sOGj2rLZTT7gGSOzLCvNkUW/view?usp=drive_link # KIT-ML skills 0.0
gdown --fuzzy https://drive.google.com/file/d/17ut3gymJpDrPt4nsIhPcug0A0HevqGkE/view?usp=drive_link # Human-ML skills 0.01
gdown --fuzzy https://drive.google.com/file/d/1Ijosu_4W2eIg72kK2IuGaR0wbEhxw6CU/view?usp=drive_link # KIT-ML skills 0.01
gdown --fuzzy https://drive.google.com/file/d/10WrKJ4v1u6DwCYb8KT9s2aP3n2BBn9ST/view?usp=drive_link # KIT-ML planner
cd ..The evaluation script will save diffusion plans at planned_traj.npy and simulated motion in simulated_traj.npy.
streamlit run visualize.pyThen input the trajectory file path (e.g., planned_traj.npy), or drag and drop the trajectory file.
#prepare data
gdown https://drive.google.com/file/d/15qjv-tREix2kJ5kvaRgxTXgoKQgHPq6L/view?usp=drive_link --fuzzy && unzip kit_ml_raw_processed.zip && rm kit_ml_raw_processed.zip && mv kit_ml_raw_processed data/kit_ml_raw_processed # KIT
gdown https://drive.google.com/file/d/1WCiuqQOeIpu2sFjnF20aJiNmYLFMkNxU/view?usp=drive_link --fuzzy && unzip humanml3d_processed.zip && rm humanml3d_processed.zip && mv humanml3d_processed data/humanml3d_processed # Human-ML
# train
python mimic.py --config configs/controller/human_0.01.yamlpython -u tools/train.py configs/planner/kit.py --work-dir=planner_kit # kit
python -u tools/train.py configs/planner/human.py --work-dir=planner_human # humanml@article{ren2023insactor,
title={InsActor: Instruction-driven Physics-based Characters},
author={Ren, Jiawei and Zhang, Mingyuan and Yu, Cunjun and Ma, Xiao and Pan, Liang and Liu, Ziwei},
journal={NeurIPS},
year={2023}
}
Explore More Motrix Projects
- [SMPL-X] [TPAMI'25] SMPLest-X: An extended version of SMPLer-X with stronger foundation models.
- [SMPL-X] [NeurIPS'23] SMPLer-X: Scaling up EHPS towards a family of generalist foundation models.
- [SMPL-X] [ECCV'24] WHAC: World-grounded human pose and camera estimation from monocular videos.
- [SMPL-X] [CVPR'24] AiOS: An all-in-one-stage pipeline combining detection and 3D human reconstruction.
- [SMPL-X] [NeurIPS'23] RoboSMPLX: A framework to enhance the robustness of whole-body pose and shape estimation.
- [SMPL-X] [ICML'25] ADHMR: A framework to align diffusion-based human mesh recovery methods via direct preference optimization.
- [SMPL-X] MKA: Full-body 3D mesh reconstruction from single- or multi-view RGB videos.
- [SMPL] [ICCV'23] Zolly: 3D human mesh reconstruction from perspective-distorted images.
- [SMPL] [IJCV'26] PointHPS: 3D HPS from point clouds captured in real-world settings.
- [SMPL] [NeurIPS'22] HMR-Benchmarks: A comprehensive benchmark of HPS datasets, backbones, and training strategies.
- [SMPL-X] [ICLR'26] ViMoGen: A comprehensive framework that transfers knowledge from ViGen to MoGen across data, modeling, and evaluation.
- [SMPL-X] [ECCV'24] LMM: Large Motion Model for Unified Multi-Modal Motion Generation.
- [SMPL-X] [NeurIPS'23] FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing.
- [SMPL] InfiniteDance: A large-scale 3D dance dataset and an MLLM-based music-to-dance model designed for robust in-the-wild generalization.
- [SMPL] [NeurIPS'23] InsActor: Generating physics-based human motions from language and waypoint conditions via diffusion policies.
- [SMPL] [ICCV'23] ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model.
- [SMPL] [TPAMI'24] MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model.
