Conversation

@deruyter92 (Collaborator) commented Feb 10, 2026

Introduces fmpose3d/fmpose3d.py:

A clean, high-level inference API that wraps the full FMPose3D pipeline (2D pose estimation → 3D lifting) into a single module.

  • Requires merging this branch together with PR #13 (Add configs and model registry), which introduces a model registry and configuration classes. (Ideally the entire repository would use these configurations rather than argparse and hard-coded paths to Python files for loading models, but the current additions work in parallel with the existing code.)
  • First version of a configurable API for easy 3D pose prediction from image paths, arrays, or 2D pose keypoints.
  • Integration with Hugging Face and DeepLabCut will follow soon.

Key components

  • A new HRNetPose2d module inside the hrnet lib that is configurable without argparse and can handle image arrays instead of paths
  • HRNetEstimator — a thin wrapper around HRNetPose2d + YOLO that lazily loads models and converts COCO keypoints to the H36M format, following the demo script
  • FMPose3DInference — Main entry point with a two-step workflow:
    • prepare_2d(source) — Runs HRNet to produce 2D keypoints from flexible input (image path, directory of images, numpy array, or list of paths/arrays).
    • pose_3d(keypoints_2d, image_size) — Lifts 2D keypoints to 3D via Euler ODE sampling with optional flip-augmentation and camera-to-world transform.
    • predict(source) — Convenience method that chains both steps end-to-end.
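(Each of these calls is demonstrated in the Example usage section below.)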

Design choices

  • All heavy resources (models, weights) are lazily loaded on first use via setup_runtime().
  • The 3D lifting loop faithfully mirrors the logic in demo/vis_in_the_wild.py (flip augmentation, independent noise samples, un-flip & average, root-zeroing, camera-to-world).
  • Supports an optional seed parameter for reproducible sampling and an optional progress callback for UI integration.
  • Configurable via ModelConfig, InferenceConfig, and HRNetConfig dataclasses.
  • Input ingestion (_ingest_input) — accepts str/Path (single image or directory), ndarray (single frame or batch), or lists thereof. Video files are explicitly rejected for now with NotImplementedError, but support can be added later; a sketch of the expected normalization follows below.
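For illustration, here is a minimal sketch of the kind of normalization _ingest_input performs. The helper below is an assumption based on the description above, not the actual implementation:

from pathlib import Path

import numpy as np

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
VIDEO_EXTS = {".mp4", ".avi", ".mov"}

def ingest_input_sketch(source):
    """Normalize `source` into a flat list of image paths or frames (assumed behavior)."""
    if isinstance(source, (str, Path)):
        path = Path(source)
        if path.suffix.lower() in VIDEO_EXTS:
            raise NotImplementedError("Video input is not supported yet.")
        if path.is_dir():
            return sorted(p for p in path.iterdir() if p.suffix.lower() in IMAGE_EXTS)
        return [path]
    if isinstance(source, np.ndarray):
        # (H, W, 3) is a single frame; (N, H, W, 3) is a batch of frames.
        return [source] if source.ndim == 3 else list(source)
    if isinstance(source, (list, tuple)):
        return [frame for item in source for frame in ingest_input_sketch(item)]
    raise TypeError(f"Unsupported source type: {type(source)!r}")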

Example usage

from fmpose3d import FMPose3DInference

# 1. Create the inference API (point it at your checkpoint)
api = FMPose3DInference(model_weights_path="path/to/FMpose3D_pretrained_weights.pth")

# 2. End-to-end: image → 3D poses (runs HRNet 2D detection + 3D lifting)
result = api.predict("photo.jpg", seed=42)
print(result.poses_3d.shape)        # (1, 17, 3)  root-relative
print(result.poses_3d_world.shape)  # (1, 17, 3)  world coordinates

# — or step-by-step for more control: —

# 2a. Estimate 2D keypoints
result_2d = api.prepare_2d("photo.jpg")
print(result_2d.keypoints.shape)    # (num_persons, 1, 17, 2)

# 2b. Lift to 3D
result_3d = api.pose_3d(
    result_2d.keypoints,
    image_size=result_2d.image_size,
    seed=42,
)
print(result_3d.poses_3d_world[0])  # (17, 3) world-coordinate skeleton
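The design choices above also mention an optional progress callback for UI integration. Below is a hedged sketch of how that might be wired in; the parameter name progress_callback and its signature are assumptions, not confirmed API:

# Hypothetical: the callback name and signature are assumed from the PR description.
def on_progress(step: int, total_steps: int) -> None:
    print(f"sampling step {step}/{total_steps}")

result = api.predict("photo.jpg", seed=42, progress_callback=on_progress)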

More elaborate script for comparing with demo

import copy

import numpy as np
import torch
from PIL import Image

from fmpose3d.inference import FMPose3DInference

# --------------------------------------------------------------------------
# Paths
# --------------------------------------------------------------------------
example_image = "./demo/images/running.png"
weights = "./pre_trained_models/fmpose3d_h36m/FMpose3D_pretrained_weights.pth"

# Saved demo predictions (produced by demo/vis_in_the_wild.py)
previous_3d_path = "./demo/predictions/running/pose3D/0000_3D.npz"
previous_2d_path = "./demo/predictions/running/input_2D/keypoints.npz"

# --------------------------------------------------------------------------
# Load previous demo results
# --------------------------------------------------------------------------
previous_3d = np.load(previous_3d_path)["pose3d"]          # (17, 3), world coords
previous_2d = np.load(previous_2d_path)["reconstruction"]  # (P, F, 17, 2)

# --------------------------------------------------------------------------
# Read image to get correct (H, W)
# --------------------------------------------------------------------------
_img = Image.open(example_image)
img_w, img_h = _img.size  # PIL returns (W, H)
print(f"Image size: H={img_h}, W={img_w}")

# --------------------------------------------------------------------------
# Run API
# --------------------------------------------------------------------------
SEED = 42

api = FMPose3DInference(model_weights_path=weights)

# -- 2D keypoints ---------------------------------------------------------
result_2d = api.prepare_2d(source=example_image)

print("\n--- 2D keypoints ---")
print("API keypoints shape :", result_2d.keypoints.shape)
print("Demo keypoints shape:", previous_2d.shape)
print("2D keypoints match  :", np.allclose(result_2d.keypoints, previous_2d))

# -- 3D pose (seeded API) -------------------------------------------------
result_3d = api.pose_3d(
    keypoints_2d=result_2d.keypoints,
    image_size=(img_h, img_w),
    seed=SEED,
)

print("\n--- 3D pose (API, seeded) ---")
print("poses_3d shape      :", result_3d.poses_3d.shape)
print("poses_3d_world shape:", result_3d.poses_3d_world.shape)

# -- 3D pose (reproducibility check) ----------------------------------------
# Re-run the API with the same seed to verify deterministic sampling.
result_3d_b = api.pose_3d(
    keypoints_2d=result_2d.keypoints,
    image_size=(img_h, img_w),
    seed=SEED,
)

print("\n--- Reproducibility check (same seed, two API runs) ---")
print("poses_3d match      :", np.allclose(result_3d.poses_3d, result_3d_b.poses_3d))
print("poses_3d_world match:", np.allclose(result_3d.poses_3d_world, result_3d_b.poses_3d_world))

# -- Compare against saved demo data (will differ due to different RNG state) --
print("\n--- Compare against saved demo output (different RNG state, expected False) ---")
print("poses_3d_world ≈ demo:", np.allclose(result_3d.poses_3d_world[0], previous_3d, atol=1e-4))

# -- Seeded demo-equivalent run --------------------------------------------
# To get a TRUE comparison, run the demo's exact code path with the same
# seed.  Below we replicate the demo logic inline:
print("\n--- Seeded demo-equivalent run ---")
from fmpose3d.common.camera import normalize_screen_coordinates, camera_to_world
from fmpose3d.common.config import FMPose3DConfig, ModelConfig
from fmpose3d.models import get_model
from fmpose3d.inference import euler_sample

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model (same way as demo)
model_cfg = FMPose3DConfig()
CFM = get_model(model_cfg.model_type)
model = CFM(model_cfg).to(device)
pre_dict = torch.load(weights, map_location=device, weights_only=True)
model_dict = model.state_dict()
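# Copy only the parameters that also exist in the checkpoint (partial load, as in the demo)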
for name in model_dict:
    if name in pre_dict:
        model_dict[name] = pre_dict[name]
model.load_state_dict(model_dict)
model.eval()

# Use same (non-revised) keypoints as the demo
keypoints = previous_2d            # (P, F, 17, 2)
input_2D_no = keypoints[0]         # (F=1, 17, 2)

joints_left  = [4, 5, 6, 11, 12, 13]
joints_right = [1, 2, 3, 14, 15, 16]

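# Normalize pixel coordinates to the model's input range and build a
# horizontally flipped copy (mirror x, swap left/right joints) for
# test-time flip augmentation.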
input_2D = normalize_screen_coordinates(input_2D_no, w=img_w, h=img_h)
input_2D_aug = copy.deepcopy(input_2D)
input_2D_aug[:, :, 0] *= -1
input_2D_aug[:, joints_left + joints_right] = input_2D_aug[:, joints_right + joints_left]
input_2D = np.concatenate(
    (np.expand_dims(input_2D, axis=0), np.expand_dims(input_2D_aug, axis=0)), 0
)
input_2D = input_2D[np.newaxis, :, :, :, :]
input_2D = torch.from_numpy(input_2D.astype("float32")).to(device)

# Seed and run — same seed as the API
torch.manual_seed(SEED)
with torch.no_grad():
    y = torch.randn(input_2D.size(0), input_2D.size(2), input_2D.size(3), 3, device=device)
    output_3D_non_flip = euler_sample(input_2D[:, 0], y, steps=3, model=model)

    y_flip = torch.randn(input_2D.size(0), input_2D.size(2), input_2D.size(3), 3, device=device)
    output_3D_flip = euler_sample(input_2D[:, 1], y_flip, steps=3, model=model)

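# Un-flip the augmented prediction (mirror x back, swap left/right joints)
# and average it with the non-flipped prediction.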
output_3D_flip[:, :, :, 0] *= -1
output_3D_flip[:, :, joints_left + joints_right, :] = output_3D_flip[:, :, joints_right + joints_left, :]
output_3D = (output_3D_non_flip + output_3D_flip) / 2
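# Keep the single frame and restore the frame dimension (mirrors the demo's indexing)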
output_3D = output_3D[0:, 0].unsqueeze(1)
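# Zero the root joint so the pose is root-relative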
output_3D[:, :, 0, :] = 0
demo_pose = output_3D[0, 0].cpu().detach().numpy()

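# Rotate from camera to world coordinates with the demo's fixed quaternion,
# then floor the skeleton so its lowest point sits at z = 0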
rot = np.array([0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088], dtype="float32")
demo_world = camera_to_world(demo_pose, R=rot, t=0)
demo_world[:, 2] -= np.min(demo_world[:, 2])

# Now run the API with the same seed and same keypoints
result_3d_api = api.pose_3d(
    keypoints_2d=previous_2d,      # same non-revised keypoints
    image_size=(img_h, img_w),
    seed=SEED,
)

print("poses_3d match (API vs demo-equivalent)      :", np.allclose(result_3d_api.poses_3d[0], demo_pose, atol=1e-6))
print("poses_3d_world match (API vs demo-equivalent) :", np.allclose(result_3d_api.poses_3d_world[0], demo_world, atol=1e-6))

xiu-cs and others added 30 commits February 5, 2026 21:07
- Extracted skeleton connection definitions and left/right color masks into constants for better maintainability.
- Updated the show2Dpose and show3Dpose functions to utilize these constants.
- Changed output image format from JPG to PNG for pose saving.
- Changed GPU ID from 1 to 0 for compatibility.
- Updated model and saved model paths to point to the fmpose3d_h36m directory.
- Renamed input_images_folder to target_path for clarity in specifying input sources.
- Removed commented-out test argument for clarity.
- Renamed model_path argument to model_weights_path for better specificity.
- Changed model_path to point to the fmpose3d_h36m directory.
- Updated saved_model_path to model_weights_path for consistency with recent refactoring.
- Adjusted test command to use the new model weights path.
- Renamed saved_model_path to model_weights_path for consistency with recent refactoring.
- Updated command-line argument to reflect the new model weights path.
- Revised model_path comment to reflect the correct package name as fmpose3d.
- Adjusted folder_name variable to improve clarity by removing 'Publish' from the name.
- Removed unused variable tau
- Cleaned up commented-out code for better readability.
- Introduced a new command-line argument --model_path for specifying the model file path.
- Removed the deprecated --saved_model_path argument for clarity and consistency.
- Updated the backup file logic to use the new model_weights_path instead of saved_model_path for consistency.
- Cleaned up commented-out code and streamlined the backup process for better readability and maintainability.
…ncy with recent refactoring. This change simplifies the script by eliminating an unused parameter.
… refactoring

- Changed model_path from args.saved_model_path to args.model_weights_path for consistency with other updates.
Updated README to reflect changes in project description, citation format, and demo section.
This is a security vulnerability and triggers deprecation warnings in PyTorch.
Variable has been deprecated since PyTorch 0.4 (2018). We should use tensors directly.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
xiu-cs and others added 10 commits February 9, 2026 14:50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- New ABC base_model as template.
- Easy access to defined set of models.
- Modularly extendable with new implementations.
- This is an adapter of the `gen_video_kpts` function
- It can read arrays instead of image paths
- It can be configured with HRNetConfig
@deruyter92 deruyter92 marked this pull request as draft February 10, 2026 13:41
@xiu-cs xiu-cs self-requested a review February 10, 2026 14:11
@xiu-cs xiu-cs marked this pull request as ready for review February 10, 2026 14:21
@xiu-cs (Collaborator) commented Feb 10, 2026

Looks good to me, thanks! @deruyter92

@xiu-cs xiu-cs merged commit 35a7c83 into main Feb 10, 2026
5 checks passed
@xiu-cs xiu-cs deleted the feat/add_api branch February 10, 2026 14:42
@xiu-cs xiu-cs added the enhancement label Feb 10, 2026