Conversation

@deruyter92 (Collaborator) commented Feb 10, 2026

Introduces fmpose3d/fmpose3d.py:

A clean, high-level inference API that wraps the full FMPose3D pipeline (2D pose estimation → 3D lifting) into a single module.

  • Requires merging this branch together with PR #13 (Add configs and model registry), which introduces a model registry and configuration classes. (Ideally the entire repository would use these configurations rather than argparse and hard-coded paths to Python files for loading models, but the current additions work in parallel with the existing code.)
  • First version of a configurable API for easy 3D pose prediction from image paths, arrays, or 2D pose keypoints.
  • Integration with Hugging Face and DeepLabCut will follow soon.

Key components

  • A new HRNetPose2d module inside the hrnet lib that is configurable without argparse and can handle image arrays instead of paths
  • HRNetEstimator — a thin wrapper around HRNetPose2d + YOLO that lazily loads models and converts COCO keypoints to the H36M format, following the demo script
  • FMPose3DInference — Main entry point with a two-step workflow:
    • prepare_2d(source) — Runs HRNet to produce 2D keypoints from flexible input (image path, directory of images, numpy array, or list of paths/arrays).
    • pose_3d(keypoints_2d, image_size) — Lifts 2D keypoints to 3D via Euler ODE sampling with optional flip-augmentation and camera-to-world transform.
    • predict(source) — Convenience method that chains both steps end-to-end.
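(Each of these calls is demonstrated in the Example usage section below.)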

Design choices

  • All heavy resources (models, weights) are lazily loaded on first use via setup_runtime().
  • The 3D lifting loop faithfully mirrors the logic in demo/vis_in_the_wild.py (flip augmentation, independent noise samples, un-flip & average, root-zeroing, camera-to-world).
  • Supports an optional seed parameter for reproducible sampling and an optional progress callback for UI integration.
  • Configurable via ModelConfig, InferenceConfig, and HRNetConfig dataclasses.
  • Input ingestion (_ingest_input) — accepts str/Path (single image or directory), ndarray (single frame or batch), or lists thereof. Video files are explicitly rejected for now with NotImplementedError, but support can be added later; a sketch of the expected normalization follows below.
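For illustration, here is a minimal sketch of the kind of normalization _ingest_input performs. The helper below is an assumption based on the description above, not the actual implementation:

from pathlib import Path

import numpy as np

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
VIDEO_EXTS = {".mp4", ".avi", ".mov"}

def ingest_input_sketch(source):
    """Normalize `source` into a flat list of image paths or frames (assumed behavior)."""
    if isinstance(source, (str, Path)):
        path = Path(source)
        if path.suffix.lower() in VIDEO_EXTS:
            raise NotImplementedError("Video input is not supported yet.")
        if path.is_dir():
            return sorted(p for p in path.iterdir() if p.suffix.lower() in IMAGE_EXTS)
        return [path]
    if isinstance(source, np.ndarray):
        # (H, W, 3) is a single frame; (N, H, W, 3) is a batch of frames.
        return [source] if source.ndim == 3 else list(source)
    if isinstance(source, (list, tuple)):
        return [frame for item in source for frame in ingest_input_sketch(item)]
    raise TypeError(f"Unsupported source type: {type(source)!r}")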

Example usage

from fmpose3d import FMPose3DInference

# 1. Create the inference API (point it at your checkpoint)
api = FMPose3DInference(model_weights_path="path/to/FMpose3D_pretrained_weights.pth")

# 2. End-to-end: image → 3D poses (runs HRNet 2D detection + 3D lifting)
result = api.predict("photo.jpg", seed=42)
print(result.poses_3d.shape)        # (1, 17, 3)  root-relative
print(result.poses_3d_world.shape)  # (1, 17, 3)  world coordinates

# — or step-by-step for more control: —

# 2a. Estimate 2D keypoints
result_2d = api.prepare_2d("photo.jpg")
print(result_2d.keypoints.shape)    # (num_persons, 1, 17, 2)

# 2b. Lift to 3D
result_3d = api.pose_3d(
    result_2d.keypoints,
    image_size=result_2d.image_size,
    seed=42,
)
print(result_3d.poses_3d_world[0])  # (17, 3) world-coordinate skeleton
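The design choices above also mention an optional progress callback for UI integration. Below is a hedged sketch of how that might be wired in; the parameter name progress_callback and its signature are assumptions, not confirmed API:

# Hypothetical: the callback name and signature are assumed from the PR description.
def on_progress(step: int, total_steps: int) -> None:
    print(f"sampling step {step}/{total_steps}")

result = api.predict("photo.jpg", seed=42, progress_callback=on_progress)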

More elaborate script for comparing with demo

import copy

import numpy as np
import torch
from PIL import Image

from fmpose3d.inference import FMPose3DInference

# --------------------------------------------------------------------------
# Paths
# --------------------------------------------------------------------------
example_image = "./demo/images/running.png"
weights = "./pre_trained_models/fmpose3d_h36m/FMpose3D_pretrained_weights.pth"

# Saved demo predictions (produced by demo/vis_in_the_wild.py)
previous_3d_path = "./demo/predictions/running/pose3D/0000_3D.npz"
previous_2d_path = "./demo/predictions/running/input_2D/keypoints.npz"

# --------------------------------------------------------------------------
# Load previous demo results
# --------------------------------------------------------------------------
previous_3d = np.load(previous_3d_path)["pose3d"]          # (17, 3), world coords
previous_2d = np.load(previous_2d_path)["reconstruction"]  # (P, F, 17, 2)

# --------------------------------------------------------------------------
# Read image to get correct (H, W)
# --------------------------------------------------------------------------
_img = Image.open(example_image)
img_w, img_h = _img.size  # PIL returns (W, H)
print(f"Image size: H={img_h}, W={img_w}")

# --------------------------------------------------------------------------
# Run API
# --------------------------------------------------------------------------
SEED = 42

api = FMPose3DInference(model_weights_path=weights)

# -- 2D keypoints ---------------------------------------------------------
result_2d = api.prepare_2d(source=example_image)

print("\n--- 2D keypoints ---")
print("API keypoints shape :", result_2d.keypoints.shape)
print("Demo keypoints shape:", previous_2d.shape)
print("2D keypoints match  :", np.allclose(result_2d.keypoints, previous_2d))

# -- 3D pose (seeded API) -------------------------------------------------
result_3d = api.pose_3d(
    keypoints_2d=result_2d.keypoints,
    image_size=(img_h, img_w),
    seed=SEED,
)

print("\n--- 3D pose (API, seeded) ---")
print("poses_3d shape      :", result_3d.poses_3d.shape)
print("poses_3d_world shape:", result_3d.poses_3d_world.shape)

# -- 3D pose (reproducibility check) ----------------------------------------
# Re-run the API with the same seed to verify deterministic sampling.
result_3d_b = api.pose_3d(
    keypoints_2d=result_2d.keypoints,
    image_size=(img_h, img_w),
    seed=SEED,
)

print("\n--- Reproducibility check (same seed, two API runs) ---")
print("poses_3d match      :", np.allclose(result_3d.poses_3d, result_3d_b.poses_3d))
print("poses_3d_world match:", np.allclose(result_3d.poses_3d_world, result_3d_b.poses_3d_world))

# -- Compare against saved demo data (will differ due to different RNG state) --
print("\n--- Compare against saved demo output (different RNG state, expected False) ---")
print("poses_3d_world ≈ demo:", np.allclose(result_3d.poses_3d_world[0], previous_3d, atol=1e-4))

# -- Seeded demo-equivalent run --------------------------------------------
# To get a TRUE comparison, run the demo's exact code path with the same
# seed.  Below we replicate the demo logic inline:
print("\n--- Seeded demo-equivalent run ---")
from fmpose3d.common.camera import normalize_screen_coordinates, camera_to_world
from fmpose3d.common.config import FMPose3DConfig, ModelConfig
from fmpose3d.models import get_model
from fmpose3d.inference import euler_sample

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model (same way as demo)
model_cfg = FMPose3DConfig()
CFM = get_model(model_cfg.model_type)
model = CFM(model_cfg).to(device)
pre_dict = torch.load(weights, map_location=device, weights_only=True)
model_dict = model.state_dict()
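# Copy only the parameters that also exist in the checkpoint (partial load, as in the demo)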
for name in model_dict:
    if name in pre_dict:
        model_dict[name] = pre_dict[name]
model.load_state_dict(model_dict)
model.eval()

# Use same (non-revised) keypoints as the demo
keypoints = previous_2d            # (P, F, 17, 2)
input_2D_no = keypoints[0]         # (F=1, 17, 2)

joints_left  = [4, 5, 6, 11, 12, 13]
joints_right = [1, 2, 3, 14, 15, 16]

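# Normalize pixel coordinates to the model's input range and build a
# horizontally flipped copy (mirror x, swap left/right joints) for
# test-time flip augmentation.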
input_2D = normalize_screen_coordinates(input_2D_no, w=img_w, h=img_h)
input_2D_aug = copy.deepcopy(input_2D)
input_2D_aug[:, :, 0] *= -1
input_2D_aug[:, joints_left + joints_right] = input_2D_aug[:, joints_right + joints_left]
input_2D = np.concatenate(
    (np.expand_dims(input_2D, axis=0), np.expand_dims(input_2D_aug, axis=0)), 0
)
input_2D = input_2D[np.newaxis, :, :, :, :]
input_2D = torch.from_numpy(input_2D.astype("float32")).to(device)

# Seed and run — same seed as the API
torch.manual_seed(SEED)
with torch.no_grad():
    y = torch.randn(input_2D.size(0), input_2D.size(2), input_2D.size(3), 3, device=device)
    output_3D_non_flip = euler_sample(input_2D[:, 0], y, steps=3, model=model)

    y_flip = torch.randn(input_2D.size(0), input_2D.size(2), input_2D.size(3), 3, device=device)
    output_3D_flip = euler_sample(input_2D[:, 1], y_flip, steps=3, model=model)

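# Un-flip the augmented prediction (mirror x back, swap left/right joints)
# and average it with the non-flipped prediction.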
output_3D_flip[:, :, :, 0] *= -1
output_3D_flip[:, :, joints_left + joints_right, :] = output_3D_flip[:, :, joints_right + joints_left, :]
output_3D = (output_3D_non_flip + output_3D_flip) / 2
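# Keep the single frame and restore the frame dimension (mirrors the demo's indexing)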
output_3D = output_3D[0:, 0].unsqueeze(1)
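# Zero the root joint so the pose is root-relative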
output_3D[:, :, 0, :] = 0
demo_pose = output_3D[0, 0].cpu().detach().numpy()

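# Rotate from camera to world coordinates with the demo's fixed quaternion,
# then floor the skeleton so its lowest point sits at z = 0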
rot = np.array([0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088], dtype="float32")
demo_world = camera_to_world(demo_pose, R=rot, t=0)
demo_world[:, 2] -= np.min(demo_world[:, 2])

# Now run the API with the same seed and same keypoints
result_3d_api = api.pose_3d(
    keypoints_2d=previous_2d,      # same non-revised keypoints
    image_size=(img_h, img_w),
    seed=SEED,
)

print("poses_3d match (API vs demo-equivalent)      :", np.allclose(result_3d_api.poses_3d[0], demo_pose, atol=1e-6))
print("poses_3d_world match (API vs demo-equivalent) :", np.allclose(result_3d_api.poses_3d_world[0], demo_world, atol=1e-6))

xiu-cs and others added 30 commits February 5, 2026 21:07
- Extracted skeleton connection definitions and left/right color masks into constants for better maintainability.
- Updated the show2Dpose and show3Dpose functions to utilize these constants.
- Changed output image format from JPG to PNG for pose saving.
- Changed GPU ID from 1 to 0 for compatibility.
- Updated model and saved model paths to point to the fmpose3d_h36m directory.
- Renamed input_images_folder to target_path for clarity in specifying input sources.
- Removed commented-out test argument for clarity.
- Renamed model_path argument to model_weights_path for better specificity.
- Changed model_path to point to the fmpose3d_h36m directory.
- Updated saved_model_path to model_weights_path for consistency with recent refactoring.
- Adjusted test command to use the new model weights path.
- Renamed saved_model_path to model_weights_path for consistency with recent refactoring.
- Updated command-line argument to reflect the new model weights path.
- Revised model_path comment to reflect the correct package name as fmpose3d.
- Adjusted folder_name variable to improve clarity by removing 'Publish' from the name.
- Removed unused variable tau
- Cleaned up commented-out code for better readability.
- Introduced a new command-line argument --model_path for specifying the model file path.
- Removed the deprecated --saved_model_path argument for clarity and consistency.
- Updated the backup file logic to use the new model_weights_path instead of saved_model_path for consistency.
- Cleaned up commented-out code and streamlined the backup process for better readability and maintainability.
…ncy with recent refactoring. This change simplifies the script by eliminating an unused parameter.
… refactoring

- Changed model_path from args.saved_model_path to args.model_weights_path for consistency with other updates.
Updated README to reflect changes in project description, citation format, and demo section.
This is a security vulnerability and triggers deprecation warnings in PyTorch.
Variable has been deprecated since PyTorch 0.4 (2018). We should use tensors directly.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
xiu-cs and others added 10 commits February 9, 2026 14:50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- New ABC base_model as template.
- Easy access to defined set of models.
- Modularly extendable with new implementations.
- This is an adapter of the `gen_video_kpts` function
- It can read arrays instead of image paths
- It can be configured with HRNetConfig
@deruyter92 deruyter92 marked this pull request as draft February 10, 2026 13:41
@xiu-cs xiu-cs self-requested a review February 10, 2026 14:11
@xiu-cs xiu-cs marked this pull request as ready for review February 10, 2026 14:21
@xiu-cs (Collaborator) commented Feb 10, 2026

Looks good to me, thanks! @deruyter92

@xiu-cs xiu-cs merged commit 35a7c83 into main Feb 10, 2026
5 checks passed
@xiu-cs xiu-cs deleted the feat/add_api branch February 10, 2026 14:42
@xiu-cs xiu-cs added the enhancement label Feb 10, 2026