LDPose: Towards Inclusive Human Pose Estimation for Limb-Deficient Individuals in the Wild
Abstract
Human pose estimation aims to predict the locations of body keypoints and enables a wide range of practical applications. However, existing research focuses solely on individuals with full physical bodies and overlooks those with limb deficiencies. As a result, current pose estimation methods fail to generalize to individuals with limb deficiencies. In this paper, we introduce the Limb-Deficient Pose Estimation task, which not only predicts the locations of standard human body keypoints but also estimates the endpoints of missing limbs. To support this task, we present Limb-Deficient Pose (LDPose), the first human pose estimation dataset for individuals with limb deficiencies. LDPose comprises over 28k images of approximately 73k individuals, spanning diverse limb deficiency types and ethnic backgrounds. The annotation process was guided by internationally accredited para-athletics classifiers to ensure high precision. In addition, we propose a Limb-Deficient Loss (LDLoss) that better distinguishes residual limb keypoints by contrasting them with intact limb keypoints. Furthermore, we design Limb-Deficient Metrics (LD Metrics) to quantitatively measure keypoint predictions for both residual and intact limbs, and benchmark our dataset with state-of-the-art human pose estimation methods. Experimental results indicate that LDPose is a challenging dataset, and we believe it will foster further research and ultimately support individuals with limb deficiencies worldwide.
Limb-Deficient Pose Estimation Task
Task Definition
Limb-Deficient Pose Estimation aims to detect both standard body keypoints and the endpoints of residual limbs in 2D RGB images. The output is a set of keypoint coordinates with confidence scores: 17 standard full-body keypoints plus 8 additional residual limb endpoints, for 25 keypoints in total.
25-Keypoint Annotation Schema

Figure: Illustration of the 25 keypoints used in LDPose. The 17 standard MSCOCO body keypoints are shown in green, and the 8 newly introduced residual limb keypoints are highlighted in blue. Real-world examples of residual limbs, both with and without prosthetics, are also presented.
Standard Body Keypoints (17)
Nose, Eyes (×2), Ears (×2), Shoulders (×2), Elbows (×2), Wrists (×2), Hips (×2), Knees (×2), Ankles (×2)
Residual Limb Endpoints (8)
Above Left/Right Elbow Residual Limb End (×2), Below Left/Right Elbow Residual Limb End (×2), Above Left/Right Knee Residual Limb End (×2), Below Left/Right Knee Residual Limb End (×2)
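The 25-keypoint schema above can be written down as a plain index table. The sketch below is a minimal Python mapping, assuming the COCO convention of 0-based indices with the 8 residual limb endpoints appended after the 17 standard keypoints; the constant names are illustrative, not from an official toolkit.

```python
# 25-keypoint schema: 17 standard COCO keypoints followed by 8 residual
# limb endpoints (0-based indices, left before right as in COCO).
LDPOSE_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
    # Residual limb endpoints occupy indices 17-24.
    "left_above_elbow_res", "right_above_elbow_res",
    "left_below_elbow_res", "right_below_elbow_res",
    "left_above_knee_res", "right_above_knee_res",
    "left_below_knee_res", "right_below_knee_res",
]

# Reverse lookup from keypoint name to index.
KPT_INDEX = {name: i for i, name in enumerate(LDPOSE_KEYPOINTS)}
```

With this mapping, e.g. `KPT_INDEX["left_above_elbow_res"]` resolves to index 17, matching the ordering in the dataset's category definition.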
Impact and Applications
Accessibility & Inclusion
Enables inclusive computer vision systems that recognize and support individuals with limb deficiencies in everyday applications, from smart homes to public spaces.
Healthcare & Rehabilitation
Supports physical therapy monitoring, prosthetic fitting assessment, and rehabilitation progress tracking through accurate pose analysis.
Para-Sports Analytics
Facilitates performance analysis for Paralympic athletes, enabling coaches to optimize training and technique for competitive sports.
Assistive Robotics
Powers intelligent assistive devices and robotic systems that can understand and adapt to users with diverse physical abilities.
Social Representation
Promotes representation and visibility of individuals with limb deficiencies in AI systems, reducing bias in computer vision technologies.
Research Advancement
Establishes a new research direction in human pose estimation, encouraging development of more inclusive and robust computer vision methods.
Key Innovation: This task bridges the gap between traditional pose estimation and real-world diversity, making computer vision systems more inclusive and applicable to the estimated 1 billion people worldwide living with disabilities.
Limb-Deficient Pose (LDPose) Dataset
Dataset Features
- Comprehensive Coverage: Major limb deficiency types, including upper limb, lower limb, and bilateral deficiencies
- Diverse Scenarios: Images collected from Paralympic competitions, training sessions, and daily life activities
- Expert Annotation: Guided by internationally accredited para-athletics classifiers to ensure medical accuracy
- Inclusive Representation: Diverse ethnic backgrounds and age groups to promote inclusivity
- High Quality Standards: Rigorous quality control with multiple annotation rounds and validation
Data Examples


Upper Limb Deficiency: Athletes in competitive sports


Lower Limb Deficiency: Daily life activities and mobility


Bilateral Deficiency: Paralympic training sessions
Data Format
{
  "info": {
    "description": "LDPose Dataset",
    "version": "1.0",
    "year": 2025,
    "contributor": "LDPose Team"
  },
  "images": [
    {
      "id": 1,
      "file_name": "000000001.jpg",
      "width": 640,
      "height": 480,
      "date_captured": "2024-01-01 12:00:00",
      "url": "http://example.com/images/000000001.jpg"
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [100, 50, 200, 350],
      "area": 70000,
      "iscrowd": 0,
      "keypoints": [
        320, 140, 2,  310, 135, 2,  330, 135, 2,  305, 140, 2,  335, 140, 2,
        290, 180, 2,  350, 180, 2,  280, 240, 2,  0, 0, 0,  270, 290, 0,
        0, 0, 0,  300, 280, 2,  340, 280, 2,  295, 380, 2,  345, 380, 2,
        290, 470, 2,  350, 470, 2,  0, 0, 0,  360, 200, 2,  0, 0, 0,
        0, 0, 0,  0, 0, 0,  0, 0, 0,  0, 0, 0,  0, 0, 0
      ],
      "num_keypoints": 14
    }
  ],
  "categories": [
    {
      "id": 1,
      "name": "LDHuman",
      "supercategory": "",
      "keypoints": [
        "nose", "left_eye", "right_eye", "left_ear", "right_ear",
        "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
        "left_wrist", "right_wrist", "left_hip", "right_hip",
        "left_knee", "right_knee", "left_ankle", "right_ankle",
        "left_above_elbow_res", "right_above_elbow_res",
        "left_below_elbow_res", "right_below_elbow_res",
        "left_above_knee_res", "right_above_knee_res",
        "left_below_knee_res", "right_below_knee_res"
      ],
      "skeleton": [
        [0,1],[1,3],[0,2],[2,4],[3,5],[4,6],[5,7],[6,8],[7,9],[8,10],
        [5,11],[6,12],[12,14],[11,13],[13,15],[14,16],
        [5,17],[6,18],[7,19],[8,20],[11,21],[12,22],[13,23],[14,24]
      ]
    }
  ]
}
Format Note: LDPose follows COCO format with 25 keypoints per person. Each keypoint is represented as [x, y, v] where v=0 (not labeled), v=1 (labeled but not visible), v=2 (labeled and visible). The 8 residual limb keypoints (indices 17-24) are appended after the standard 17 COCO keypoints.
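Given the COCO-style [x, y, v] triplet layout described in the format note, a small amount of code suffices to unflatten an annotation. The helper below is a hypothetical sketch, not part of any official LDPose toolkit; it decodes the flat list into per-keypoint tuples and counts labeled keypoints:

```python
def decode_keypoints(flat, num_kpts=25):
    """Decode a COCO-style flat keypoint list [x1, y1, v1, x2, y2, v2, ...]
    into a list of (x, y, v) tuples plus a count of labeled points (v > 0)."""
    assert len(flat) == num_kpts * 3, "expected 3 values per keypoint"
    pts = [(flat[3 * i], flat[3 * i + 1], flat[3 * i + 2])
           for i in range(num_kpts)]
    labeled = sum(1 for (_x, _y, v) in pts if v > 0)
    return pts, labeled
```

For an LDPose annotation, the decoded list's entries at indices 17-24 are the residual limb endpoints, so downstream code can slice `pts[17:25]` to inspect them separately from the standard keypoints.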
Limb-Deficient Loss (LDLoss)
Our proposed LDLoss (also called Limb Visibility Loss or paraLoss) is designed to better distinguish residual limb keypoints by contrasting them with intact limb keypoints. The loss operates on limb pairs, scoring the visibility predictions for both endpoints of each limb with a shared cross-entropy loss.
Algorithm 1: Limb-Deficient Loss (LDLoss)
Require: kpt_vis_preds ∈ ℝ^(B×K×C), vis_targets ∈ {0,1}^(B×K),
LimbPairs (P pairs), loss_weight
1: # Extract prediction vectors for both endpoints of each limb (expanded by pair)
2: PredPairs ← []
3: for each (i, j) in LimbPairs do
4: # Select visibility predictions for keypoints i and j, each with shape B×C
5: Pi ← kpt_vis_preds[:, i, :] # B×C
6: Pj ← kpt_vis_preds[:, j, :] # B×C
7: # Concatenate both endpoints of the same limb as a "two-sample batch" for CE
8: PredPairs.append( concat(Pi, Pj, dim=0) ) # 2B×C
9: end for
10: PredPairwiseConfidence ← stack(PredPairs, dim=0) # P×(2B)×C
11: PredPairwiseConfidence ← reshape(PredPairwiseConfidence, (P*2B, C))
12: # Generate corresponding target label sequence (consistent with concatenation order)
13: VisPairs ← []
14: for each (i, j) in LimbPairs do
15: Ti ← vis_targets[:, i] # B
16: Tj ← vis_targets[:, j] # B
17: VisPairs.append( concat(Ti, Tj, dim=0) ) # 2B
18: end for
19: Visibilities ← stack(VisPairs, dim=0) # P×(2B)
20: Visibilities ← reshape(Visibilities, (P*2B)) # P*2B
21: # Compute cross-entropy loss on the paired "sample batch"
22: Loss_CE ← CrossEntropyLoss(PredPairwiseConfidence, Visibilities)
23: return loss_weight × Loss_CE
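Algorithm 1 maps directly onto a few tensor operations. The PyTorch module below is a hedged sketch of that pseudocode, not the authors' released implementation; `limb_pairs` (the (i, j) endpoint index pairs), the number of visibility classes C, and the class name are assumptions carried over from the algorithm's inputs.

```python
import torch
import torch.nn as nn


class LDLoss(nn.Module):
    """Sketch of Algorithm 1: concatenate each limb's two endpoint
    visibility predictions into a paired sample batch and score them
    with a single cross-entropy loss."""

    def __init__(self, limb_pairs, loss_weight=1.0):
        super().__init__()
        self.limb_pairs = limb_pairs      # list of (i, j) keypoint indices
        self.loss_weight = loss_weight
        self.ce = nn.CrossEntropyLoss()

    def forward(self, kpt_vis_preds, vis_targets):
        # kpt_vis_preds: B x K x C logits; vis_targets: B x K class labels.
        preds, targets = [], []
        for i, j in self.limb_pairs:
            # Both endpoints of the same limb form a "two-sample batch" (2B x C).
            preds.append(torch.cat([kpt_vis_preds[:, i, :],
                                    kpt_vis_preds[:, j, :]], dim=0))
            targets.append(torch.cat([vis_targets[:, i],
                                      vis_targets[:, j]], dim=0))    # 2B
        # Stack over the P pairs, then flatten to (P*2B) x C and (P*2B).
        preds = torch.stack(preds, dim=0).reshape(-1, kpt_vis_preds.shape[-1])
        targets = torch.stack(targets, dim=0).reshape(-1)
        return self.loss_weight * self.ce(preds, targets.long())
```

For example, pairing the left/right elbows with the corresponding below-elbow residual endpoints would use `limb_pairs=[(7, 19), (8, 20)]` under the 0-based LDPose indexing.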
Benchmark
Experimental Environment
- GPU: NVIDIA RTX 4090 × 4
- CPU: Intel Xeon Gold 6248R
- Memory: 128GB DDR4
- Python 3.8
- PyTorch 1.12
- CUDA 11.3
- MMPose Framework: github.com/open-mmlab/mmpose
Limb-Deficient Metrics (LD Metrics)
COCO Metrics evaluate only the visible keypoints in the ground truth. This works well for individuals with complete bodies. However, for individuals with limb deficiencies, some keypoints do not exist. For example, if a person has lost their left upper limb, the left elbow and wrist are set to (0,0) with zero visibility. If a model predicts these points with high confidence, COCO Metrics ignore the error and yield a misleadingly high score.
To address this limitation, we propose LD Metrics, which extends COCO Metrics with adaptive keypoint selection based on the realistic distribution of related keypoints. We introduce an adaptive weight, γi, into the OKS calculation to enforce consistency among associated keypoints. The new OKS_LDPose is defined as follows:
Equation 1: Object Keypoint Similarity for LDPose

OKS_LDPose = [ Σi γi · exp( −di² / (2s²κi²) ) · δ(vi > 0) ] / [ Σi δ(vi > 0) ]
Here, di, κi, s, and δ(vi > 0) are defined as in the original COCO evaluation. The weight γi is set to 0 when the prediction of multiple related keypoints violates realistic anatomical configurations, even if the predicted keypoints are precise and marked as visible in annotation.
For instance, if keypoint 21 (left above-knee residual limb endpoint) is correctly predicted but any of the associated keypoints (i.e., keypoint 13 [left knee], keypoint 23 [left below-knee residual], or keypoint 15 [left ankle]) are simultaneously predicted in a manner that contradicts the expected limb structure (such as predicting both a residual limb and a natural joint), then γ21 is set to 0. Otherwise, γi is 1.
Key Innovation: This mechanism ensures that the model does not output mutually incompatible predictions, leading to a more fair and meaningful evaluation for pose estimation in individuals with limb deficiencies.
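To make the γi gating concrete, the snippet below sketches OKS_LDPose in plain Python. It assumes γ enters the numerator only, so an anatomically inconsistent keypoint contributes zero similarity while still counting toward the normalizer; the anatomical-consistency check that produces γ is application logic and is passed in precomputed here.

```python
import math


def oks_ldpose(d, kappa, s, v, gamma):
    """OKS with per-keypoint anatomical gates.

    d     : prediction-to-GT distances per keypoint
    kappa : per-keypoint constants (as in COCO evaluation)
    s     : object scale
    v     : ground-truth visibility flags (only v > 0 keypoints count)
    gamma : 0/1 gates; 0 marks anatomically inconsistent predictions
    """
    num = sum(g * math.exp(-di ** 2 / (2 * s ** 2 * ki ** 2))
              for di, ki, g, vi in zip(d, kappa, gamma, v) if vi > 0)
    den = sum(1 for vi in v if vi > 0)
    return num / den if den > 0 else 0.0
```

With two labeled keypoints predicted perfectly (d = 0), the score is 1.0; gating one of them with γ = 0, e.g. because a residual endpoint and a natural joint were predicted together, halves the score even though the coordinates are exact.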
Baseline Methods
We benchmarked our dataset using state-of-the-art human pose estimation methods:
YOLO-Pose
One-stage pose estimation with Darknet CSP backbone
RTMPose
Real-time multi-person pose estimation with CSPNeXt
ViTPose
Vision Transformer-based pose estimation framework
Experimental Results
Quantitative Results
We benchmark our LDPose dataset using state-of-the-art pose estimation methods. The table shows COCO metrics (AP, AP50, AP75, AR) and our proposed LD Metrics (AP, AP50, AP75, AR) for evaluating residual limb keypoint detection.
| Method | Backbone | Fine-tuning | LDLoss | COCO AP | COCO AP50 | COCO AP75 | COCO AR | LD AP | LD AP50 | LD AP75 | LD AR | Checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLO-Pose | Darknet_csp-d53-l | | | 47.6 | 59.2 | 50.1 | 51.3 | - | - | - | - | - |
| YOLO-Pose | Darknet_csp-d53-l | ✓ | | 51.9 | 72.1 | 53.9 | 56.5 | 44.4 | 71.2 | 46.2 | 51.1 | Download |
| YOLO-Pose | Darknet_csp-d53-l | ✓ | ✓ | 50.9 | 71.5 | 52.4 | 55.3 | 45.5 | 70.5 | 46.0 | 51.4 | Download |
| RTMPose | CSPNeXt-s | | | 69.1 | 86.5 | 73.8 | 71.5 | - | - | - | - | - |
| RTMPose | CSPNeXt-s | ✓ | | 77.1 | 91.4 | 81.8 | 79.1 | 61.8 | 88.3 | 64.6 | 66.4 | Download |
| RTMPose | CSPNeXt-s | ✓ | ✓ | 72.0 | 90.2 | 78.4 | 75.0 | 69.5 | 89.2 | 74.7 | 73.3 | Download |
| RTMPose | CSPNeXt-m | | | 72.9 | 88.6 | 77.1 | 75.0 | - | - | - | - | - |
| RTMPose | CSPNeXt-m | ✓ | | 76.6 | 91.4 | 81.9 | 78.7 | 62.2 | 88.4 | 65.8 | 67.1 | Download |
| RTMPose | CSPNeXt-m | ✓ | ✓ | 73.3 | 90.4 | 79.5 | 76.5 | 71.1 | 90.3 | 76.7 | 74.9 | Download |
| RTMPose | CSPNeXt-l | | | 74.2 | 89.6 | 78.3 | 76.2 | - | - | - | - | - |
| RTMPose | CSPNeXt-l | ✓ | | 80.8 | 92.5 | 85.1 | 82.7 | 69.1 | 90.5 | 75.3 | 73.5 | Download |
| RTMPose | CSPNeXt-l | ✓ | ✓ | 74.9 | 91.4 | 80.6 | 77.8 | 72.6 | 91.3 | 78.9 | 76.3 | Download |
| ViTPose | ViT-s | | | 74.3 | 90.4 | 78.0 | 76.4 | - | - | - | - | - |
| ViTPose | ViT-s | ✓ | | 76.8 | 91.4 | 81.7 | 79.3 | 66.8 | 90.3 | 72.3 | 71.9 | Download |
| ViTPose | ViT-s | ✓ | ✓ | 74.9 | 90.3 | 80.4 | 77.6 | 72.0 | 90.2 | 78.1 | 75.1 | Download |
| ViTPose | ViT-b | | | 76.9 | 91.5 | 81.0 | 79.1 | - | - | - | - | - |
| ViTPose | ViT-b | ✓ | | 78.2 | 92.4 | 82.8 | 80.5 | 69.6 | 90.4 | 76.1 | 74.3 | Download |
| ViTPose | ViT-b | ✓ | ✓ | 78.9 | 92.1 | 84.3 | 81.7 | 75.9 | 92.1 | 82.1 | 79.2 | Download |
| ViTPose | ViT-l | | | 80.0 | 92.6 | 84.2 | 82.1 | - | - | - | - | - |
| ViTPose | ViT-l | ✓ | | 82.6 | 93.7 | 87.3 | 84.7 | 75.5 | 92.7 | 82.1 | 79.6 | Download |
| ViTPose | ViT-l | ✓ | ✓ | 80.9 | 93.1 | 85.5 | 84.0 | 78.4 | 93.1 | 84.3 | 81.6 | Download |
Note: The best performing model is ViTPose with ViT-l backbone, fine-tuning, and LDLoss, achieving 80.9% AP on COCO metrics and 78.4% AP on LD metrics for residual limb keypoint detection. Results show that fine-tuning on LDPose and using our proposed LDLoss significantly improve performance on limb-deficient pose estimation.
Case Study Visualization

Figure: Case study with ViTPose (ViT-l backbone). First column: LDPose ground truth. Second column: pretrained ViTPose inference. Third column: ViTPose fine-tuned on LDPose. Fourth column: ViTPose fine-tuned on LDPose with LDLoss.
Resource Downloads
Paper PDF
Complete research paper with detailed method descriptions and experimental results
Download Paper

Citation
@inproceedings{ying2025ldpose,
title={LDPose: Towards Inclusive Human Pose Estimation for Limb-Deficient Individuals in the Wild},
author={Ying, Jiaying and Du, Heming and Zhang, Kaihao and Li, Lincheng and Yu, Xin},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={},
year={2025}
}