Haiyi Mei¹, Chi Sing Leung², Ziwei Liu⁴, Lei Yang¹,⁵, Zhongang Cai✉ ¹,⁴,⁵
³International Digital Economy Academy (IDEA), ⁴S-Lab, Nanyang Technological University, ⁵Shanghai AI Laboratory
AiOS performs human localization and SMPL-X estimation in a progressive manner. It is composed of (1) the Body Localization stage, which predicts coarse human locations; (2) the Body Refinement stage, which refines body features and produces face and hand locations; and (3) the Whole-body Refinement stage, which refines whole-body features and regresses SMPL-X parameters.
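The three stages can be read as a single progressive forward pass. Below is a minimal, hypothetical Python sketch of that control flow; the stage and function names are illustrative only and do not correspond to the actual classes in this repository.

```python
# Hypothetical sketch of AiOS's progressive pipeline (names are illustrative, not repo APIs).
def aios_forward(image_features, body_localizer, body_refiner, wholebody_refiner):
    # (1) Body Localization: predict coarse per-person locations.
    body_locations = body_localizer(image_features)

    # (2) Body Refinement: refine body features and predict face and hand locations.
    body_features, face_locations, hand_locations = body_refiner(
        image_features, body_locations)

    # (3) Whole-body Refinement: refine whole-body features and regress SMPL-X parameters.
    smplx_params = wholebody_refiner(
        image_features, body_features, face_locations, hand_locations)
    return smplx_params
```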
- download all datasets
- process all datasets into HumanData format. We provide the processed npz files, which can be downloaded from here (see the sanity-check sketch after this list).
- download SMPL-X
- download AiOS checkpoint
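Once the cache files are in place, a quick way to sanity-check them is to open one with numpy. This is only a suggested check, assuming the caches are standard .npz archives; the field names inside depend on the HumanData format, so print them rather than relying on specific keys.

```python
# Suggested sanity check for a preprocessed HumanData cache (not part of the official tooling).
import numpy as np

cache = np.load("data/preprocessed_npz/cache/coco_train_cache_080824.npz", allow_pickle=True)
print(cache.files)  # list the fields stored in the archive
for key in cache.files:
    value = cache[key]
    print(key, getattr(value, "shape", None), getattr(value, "dtype", None))
```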
The file structure should be like:
AiOS/
├── config/
└── data/
    ├── body_models/
    │   ├── smplx/
    │   │   ├── MANO_SMPLX_vertex_ids.pkl
    │   │   ├── SMPL-X__FLAME_vertex_ids.npy
    │   │   ├── SMPLX_NEUTRAL.pkl
    │   │   ├── SMPLX_to_J14.pkl
    │   │   ├── SMPLX_NEUTRAL.npz
    │   │   ├── SMPLX_MALE.npz
    │   │   └── SMPLX_FEMALE.npz
    │   └── smpl/
    │       ├── SMPL_FEMALE.pkl
    │       ├── SMPL_MALE.pkl
    │       └── SMPL_NEUTRAL.pkl
    ├── preprocessed_npz/
    │   └── cache/
    │       ├── agora_train_3840_w_occ_cache_2010.npz
    │       ├── bedlam_train_cache_080824.npz
    │       ├── ...
    │       └── coco_train_cache_080824.npz
    ├── checkpoint/
    │   ├── edpose_r50_coco.pth
    │   └── aios_checkpoint.pth
    └── datasets/
        ├── agora/
        │   └── 3840x2160/
        │       ├── train/
        │       └── test/
        ├── bedlam/
        │   ├── train_images/
        │   └── test_images/
        ├── ARCTIC/
        │   ├── s01/
        │   ├── s02/
        │   ├── ...
        │   └── s10/
        ├── EgoBody/
        │   ├── egocentric_color/
        │   └── kinect_color/
        └── UBody/
            └── images/
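Before launching training or inference, the expected layout can be verified with a short script. The helper below is only an illustrative convenience (not part of the repository); it checks that a few of the paths from the tree above exist.

```python
# Illustrative helper: verify that the expected data layout exists (paths taken from the tree above).
from pathlib import Path

REQUIRED = [
    "data/body_models/smplx/SMPLX_NEUTRAL.npz",
    "data/body_models/smpl/SMPL_NEUTRAL.pkl",
    "data/preprocessed_npz/cache",
    "data/checkpoint/aios_checkpoint.pth",
    "data/datasets",
]

missing = [p for p in REQUIRED if not Path(p).exists()]
if missing:
    print("Missing paths:")
    for p in missing:
        print("  -", p)
else:
    print("Data layout looks complete.")
```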
# Create a conda virtual environment and activate it.
conda create -n aios python=3.8 -y
conda activate aios
# Install PyTorch and torchvision.
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
# Install Pytorch3D
git clone -b v0.6.1 https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
pip install -v -e .
cd ..
# Install MMCV, build from source
git clone -b v1.6.1 https://github.com/open-mmlab/mmcv.git
cd mmcv
export MMCV_WITH_OPS=1
export FORCE_MLU=1
pip install -v -e .
cd ..
# Install other dependencies
conda install -c conda-forge ffmpeg
pip install -r requirements.txt
# Build deformable detr
cd models/aios/ops
python setup.py build install
cd ../../..
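After the build steps finish, a quick import check can confirm the environment is usable. The following is a suggested sanity check, assuming the versions installed above; it is not one of the repository's scripts.

```python
# Suggested environment sanity check (not an official script).
import torch
import torchvision
import mmcv
import pytorch3d

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("mmcv:", mmcv.__version__)
print("pytorch3d:", pytorch3d.__version__)
```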
- Place the mp4 video for inference under AiOS/demo/
- Prepare the pretrained models to be used for inference under AiOS/data/checkpoint
- Inference output will be saved in AiOS/demo/{INPUT_VIDEO}_out
# CHECKPOINT: checkpoint path
# INPUT_VIDEO: input video path
# OUTPUT_DIR: output path
# NUM_PERSON: maximum number of persons. This parameter sets the expected number of persons to be detected in the input (image or video).
# The default value is 1, meaning the algorithm will try to detect at least one person. If you know the maximum number of persons
# that can appear simultaneously, set this variable to that number to optimize detection (a lower score threshold is recommended as well).
# THRESHOLD: score threshold for person detection. The default value is 0.5.
# Detections with a confidence score below this threshold are discarded.
# Adjusting this threshold helps filter out false positives or keep only high-confidence detections.
# GPU_NUM: number of GPUs to use.
sh scripts/inference.sh {CHECKPOINT} {INPUT_VIDEO} {OUTPUT_DIR} {NUM_PERSON} {THRESHOLD} {GPU_NUM}
# For running inference on short_video.mp4 with output directory demo/short_video_out
sh scripts/inference.sh data/checkpoint/aios_checkpoint.pth short_video.mp4 demo 2 0.1 8

Results on the BEDLAM and AGORA benchmarks (errors in mm; FB = full body, B = body, F = face, LH/RH = left/right hand):

|          | NMVE |    | NMJE |    | MVE |    |    |       | MPJPE |    |    |       |
|----------|------|----|------|----|-----|----|----|-------|-------|----|----|-------|
| DATASETS | FB   | B  | FB   | B  | FB  | B  | F  | LH/RH | FB    | B  | F  | LH/RH |
| BEDLAM | 87.6 | 57.7 | 85.8 | 57.7 | 83.2 | 54.8 | 26.2 | 28.1/30.8 | 81.5 | 54.8 | 26.2 | 25.9/28.0 |
| AGORA-Test | 102.9 | 63.4 | 100.7 | 62.5 | 98.8 | 60.9 | 27.7 | 42.5/43.4 | 96.7 | 60.0 | 29.2 | 40.1/41.0 |
| AGORA-Val | 105.1 | 60.9 | 102.2 | 61.4 | 100.9 | 60.9 | 30.6 | 43.9/45.6 | 98.1 | 58.9 | 32.7 | 41.5/43.4 |
a. Make the test_result dir
mkdir test_result

b. AGORA Validation
Run the following command and it will generate a predictions/ folder that can be evaluated with the AGORA evaluation tool.
sh scripts/test_agora_val.sh data/checkpoint/aios_checkpoint.pth agora_val

c. AGORA Test Leaderboard
Run the following command and it will generate a predictions.zip that can be submitted to the AGORA leaderboard.
sh scripts/test_agora.sh data/checkpoint/aios_checkpoint.pth agora_test

d. BEDLAM
Run the following command and it will generate a predictions.zip that can be submitted to the BEDLAM leaderboard.
sh scripts/test_bedlam.sh data/checkpoint/aios_checkpoint.pth bedlam_test

Some of the code is based on MMHuman3D, ED-Pose, and SMPLer-X.
@InProceedings{Sun_2024_CVPR,
author = {Sun, Qingping and Wang, Yanjun and Zeng, Ailing and Yin, Wanqi and Wei, Chen and Wang, Wenjia and Mei, Haiyi and Leung, Chi-Sing and Liu, Ziwei and Yang, Lei and Cai, Zhongang},
title = {AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {1834-1843}
}
Explore More Motrix Projects
- [SMPL-X] [TPAMI'25] SMPLest-X: An extended version of SMPLer-X with stronger foundation models.
- [SMPL-X] [NeurIPS'23] SMPLer-X: Scaling up EHPS towards a family of generalist foundation models.
- [SMPL-X] [ECCV'24] WHAC: World-grounded human pose and camera estimation from monocular videos.
- [SMPL-X] [NeurIPS'23] RoboSMPLX: A framework to enhance the robustness of whole-body pose and shape estimation.
- [SMPL-X] [ICML'25] ADHMR: A framework to align diffusion-based human mesh recovery methods via direct preference optimization.
- [SMPL-X] MKA: Full-body 3D mesh reconstruction from single- or multi-view RGB videos.
- [SMPL] [ICCV'23] Zolly: 3D human mesh reconstruction from perspective-distorted images.
- [SMPL] [IJCV'26] PointHPS: 3D HPS from point clouds captured in real-world settings.
- [SMPL] [NeurIPS'22] HMR-Benchmarks: A comprehensive benchmark of HPS datasets, backbones, and training strategies.
- [SMPL-X] [ICLR'26] ViMoGen: A comprehensive framework that transfers knowledge from ViGen to MoGen across data, modeling, and evaluation.
- [SMPL-X] [ECCV'24] LMM: Large Motion Model for Unified Multi-Modal Motion Generation.
- [SMPL-X] [NeurIPS'23] FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing.
- [SMPL] InfiniteDance: A large-scale 3D dance dataset and an MLLM-based music-to-dance model designed for robust in-the-wild generalization.
- [SMPL] [NeurIPS'23] InsActor: Generating physics-based human motions from language and waypoint conditions via diffusion policies.
- [SMPL] [ICCV'23] ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model.
- [SMPL] [TPAMI'24] MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model.