LDPose: Towards Inclusive Human Pose Estimation for Limb-Deficient Individuals in the Wild
Abstract
Human pose estimation aims to predict the locations of body keypoints and enables a wide range of practical applications. However, existing research focuses solely on individuals with full physical bodies and overlooks those with limb deficiencies. As a result, current pose estimation methods fail to generalize to individuals with limb deficiencies. In this paper, we introduce the Limb-Deficient Pose Estimation task, which not only predicts the locations of standard human body keypoints but also estimates the endpoints of missing limbs. To support this task, we present Limb-Deficient Pose (LDPose), the first human pose estimation dataset for individuals with limb deficiencies. LDPose comprises over 28k images of approximately 73k individuals, spanning diverse limb deficiency types and ethnic backgrounds. The annotation process was guided by internationally accredited para-athletics classifiers to ensure high precision. In addition, we propose a Limb-Deficient Loss (LDLoss) that better distinguishes residual limb keypoints by contrasting them with intact limb keypoints. Furthermore, we design Limb-Deficient Metrics (LD Metrics) to quantitatively measure keypoint predictions for both residual and intact limbs, and benchmark our dataset with state-of-the-art human pose estimation methods. Experimental results indicate that LDPose is a challenging dataset, and we believe it will foster further research and ultimately support individuals with limb deficiencies worldwide.
Limb-Deficient Pose Estimation Task
Task Definition
Limb-Deficient Pose Estimation aims to detect both standard body keypoints and the endpoints of residual limbs in 2D RGB images. The output is a set of keypoint coordinates with confidence scores: 17 standard full-body keypoints plus 8 additional residual limb endpoints, for 25 keypoints in total.
25-Keypoint Annotation Schema

Figure: Illustration of the 25 keypoints used in LDPose. The 17 standard MSCOCO body keypoints are shown in green, and the 8 newly introduced residual limb keypoints are highlighted in blue. Real-world examples of residual limbs, both with and without prosthetics, are also presented.
Standard Body Keypoints (17)
Nose, Eyes (×2), Ears (×2), Shoulders (×2), Elbows (×2), Wrists (×2), Hips (×2), Knees (×2), Ankles (×2)
Residual Limb Endpoints (8)
Above Left/Right Elbow Residual Limb End (×2), Below Left/Right Elbow Residual Limb End (×2), Above Left/Right Knee Residual Limb End (×2), Below Left/Right Knee Residual Limb End (×2)
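The 25-keypoint schema above can be written down as a plain index table. The sketch below is a minimal Python mapping, assuming the COCO convention of 0-based indices with the 8 residual limb endpoints appended after the 17 standard keypoints; the constant names are illustrative, not from an official toolkit.

```python
# 25-keypoint schema: 17 standard COCO keypoints followed by 8 residual
# limb endpoints (0-based indices, left before right as in COCO).
LDPOSE_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
    # Residual limb endpoints occupy indices 17-24.
    "left_above_elbow_res", "right_above_elbow_res",
    "left_below_elbow_res", "right_below_elbow_res",
    "left_above_knee_res", "right_above_knee_res",
    "left_below_knee_res", "right_below_knee_res",
]

# Reverse lookup from keypoint name to index.
KPT_INDEX = {name: i for i, name in enumerate(LDPOSE_KEYPOINTS)}
```

With this mapping, e.g. `KPT_INDEX["left_above_elbow_res"]` resolves to index 17, matching the ordering in the dataset's category definition.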
Impact and Applications
Accessibility & Inclusion
Enables inclusive computer vision systems that recognize and support individuals with limb deficiencies in everyday applications, from smart homes to public spaces.
Healthcare & Rehabilitation
Supports physical therapy monitoring, prosthetic fitting assessment, and rehabilitation progress tracking through accurate pose analysis.
Para-Sports Analytics
Facilitates performance analysis for Paralympic athletes, enabling coaches to optimize training and technique for competitive sports.
Assistive Robotics
Powers intelligent assistive devices and robotic systems that can understand and adapt to users with diverse physical abilities.
Social Representation
Promotes representation and visibility of individuals with limb deficiencies in AI systems, reducing bias in computer vision technologies.
Research Advancement
Establishes a new research direction in human pose estimation, encouraging development of more inclusive and robust computer vision methods.
Key Innovation: This task bridges the gap between traditional pose estimation and real-world diversity, making computer vision systems more inclusive and applicable to the estimated 1 billion people worldwide living with disabilities.
Limb-Deficient Pose (LDPose) Dataset
Dataset Features
- Comprehensive Coverage: Major limb deficiency types, including upper limb, lower limb, and bilateral deficiencies
- Diverse Scenarios: Images collected from Paralympic competitions, training sessions, and daily life activities
- Expert Annotation: Guided by internationally accredited para-athletics classifiers to ensure medical accuracy
- Inclusive Representation: Diverse ethnic backgrounds and age groups to promote inclusivity
- High Quality Standards: Rigorous quality control with multiple annotation rounds and validation
Data Examples


Upper Limb Deficiency: Athletes in competitive sports


Lower Limb Deficiency: Daily life activities and mobility


Bilateral Deficiency: Paralympic training sessions
Data Format
{
  "info": {
    "description": "LDPose Dataset",
    "version": "1.0",
    "year": 2025,
    "contributor": "LDPose Team"
  },
  "images": [
    {
      "id": 1,
      "file_name": "000000001.jpg",
      "width": 640,
      "height": 480,
      "date_captured": "2024-01-01 12:00:00",
      "url": "http://example.com/images/000000001.jpg"
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [100, 50, 200, 350],
      "area": 70000,
      "iscrowd": 0,
      "keypoints": [
        320, 140, 2,  310, 135, 2,  330, 135, 2,  305, 140, 2,  335, 140, 2,
        290, 180, 2,  350, 180, 2,  280, 240, 2,  0, 0, 0,  270, 290, 0,
        0, 0, 0,  300, 280, 2,  340, 280, 2,  295, 380, 2,  345, 380, 2,
        290, 470, 2,  350, 470, 2,  0, 0, 0,  360, 200, 2,  0, 0, 0,
        0, 0, 0,  0, 0, 0,  0, 0, 0,  0, 0, 0,  0, 0, 0
      ],
      "num_keypoints": 14
    }
  ],
  "categories": [
    {
      "id": 1,
      "name": "LDHuman",
      "supercategory": "",
      "keypoints": [
        "nose", "left_eye", "right_eye", "left_ear", "right_ear",
        "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
        "left_wrist", "right_wrist", "left_hip", "right_hip",
        "left_knee", "right_knee", "left_ankle", "right_ankle",
        "left_above_elbow_res", "right_above_elbow_res",
        "left_below_elbow_res", "right_below_elbow_res",
        "left_above_knee_res", "right_above_knee_res",
        "left_below_knee_res", "right_below_knee_res"
      ],
      "skeleton": [
        [0,1],[1,3],[0,2],[2,4],[3,5],[4,6],[5,7],[6,8],[7,9],[8,10],
        [5,11],[6,12],[12,14],[11,13],[13,15],[14,16],
        [5,17],[6,18],[7,19],[8,20],[11,21],[12,22],[13,23],[14,24]
      ]
    }
  ]
}
Format Note: LDPose follows COCO format with 25 keypoints per person. Each keypoint is represented as [x, y, v] where v=0 (not labeled), v=1 (labeled but not visible), v=2 (labeled and visible). The 8 residual limb keypoints (indices 17-24) are appended after the standard 17 COCO keypoints.
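Given the COCO-style [x, y, v] triplet layout described in the format note, a small amount of code suffices to unflatten an annotation. The helper below is a hypothetical sketch, not part of any official LDPose toolkit; it decodes the flat list into per-keypoint tuples and counts labeled keypoints:

```python
def decode_keypoints(flat, num_kpts=25):
    """Decode a COCO-style flat keypoint list [x1, y1, v1, x2, y2, v2, ...]
    into a list of (x, y, v) tuples plus a count of labeled points (v > 0)."""
    assert len(flat) == num_kpts * 3, "expected 3 values per keypoint"
    pts = [(flat[3 * i], flat[3 * i + 1], flat[3 * i + 2])
           for i in range(num_kpts)]
    labeled = sum(1 for (_x, _y, v) in pts if v > 0)
    return pts, labeled
```

For an LDPose annotation, the decoded list's entries at indices 17-24 are the residual limb endpoints, so downstream code can slice `pts[17:25]` to inspect them separately from the standard keypoints.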
Limb-Deficient Loss (LDLoss)
Our proposed LDLoss (also called Limb Visibility Loss or paraLoss) is designed to better distinguish residual limb keypoints by contrasting them with intact limb keypoints. The loss operates on limb pairs, scoring the visibility predictions for both endpoints of each limb with a shared cross-entropy loss.
Algorithm 1: Limb-Deficient Loss (LDLoss)
Require: kpt_vis_preds ∈ ℝ^(B×K×C), vis_targets ∈ {0,1}^(B×K),
LimbPairs (P pairs), loss_weight
1: # Extract prediction vectors for both endpoints of each limb (expanded by pair)
2: PredPairs ← []
3: for each (i, j) in LimbPairs do
4: # Select visibility predictions for keypoints i and j, each with shape B×C
5: Pi ← kpt_vis_preds[:, i, :] # B×C
6: Pj ← kpt_vis_preds[:, j, :] # B×C
7: # Concatenate both endpoints of the same limb as a "two-sample batch" for CE
8: PredPairs.append( concat(Pi, Pj, dim=0) ) # 2B×C
9: end for
10: PredPairwiseConfidence ← stack(PredPairs, dim=0) # P×(2B)×C
11: PredPairwiseConfidence ← reshape(PredPairwiseConfidence, (P*2B, C))
12: # Generate corresponding target label sequence (consistent with concatenation order)
13: VisPairs ← []
14: for each (i, j) in LimbPairs do
15: Ti ← vis_targets[:, i] # B
16: Tj ← vis_targets[:, j] # B
17: VisPairs.append( concat(Ti, Tj, dim=0) ) # 2B
18: end for
19: Visibilities ← stack(VisPairs, dim=0) # P×(2B)
20: Visibilities ← reshape(Visibilities, (P*2B)) # P*2B
21: # Compute cross-entropy loss on the paired "sample batch"
22: Loss_CE ← CrossEntropyLoss(PredPairwiseConfidence, Visibilities)
23: return loss_weight × Loss_CE
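Algorithm 1 maps directly onto a few tensor operations. The PyTorch module below is a hedged sketch of that pseudocode, not the authors' released implementation; `limb_pairs` (the (i, j) endpoint index pairs), the number of visibility classes C, and the class name are assumptions carried over from the algorithm's inputs.

```python
import torch
import torch.nn as nn


class LDLoss(nn.Module):
    """Sketch of Algorithm 1: concatenate each limb's two endpoint
    visibility predictions into a paired sample batch and score them
    with a single cross-entropy loss."""

    def __init__(self, limb_pairs, loss_weight=1.0):
        super().__init__()
        self.limb_pairs = limb_pairs      # list of (i, j) keypoint indices
        self.loss_weight = loss_weight
        self.ce = nn.CrossEntropyLoss()

    def forward(self, kpt_vis_preds, vis_targets):
        # kpt_vis_preds: B x K x C logits; vis_targets: B x K class labels.
        preds, targets = [], []
        for i, j in self.limb_pairs:
            # Both endpoints of the same limb form a "two-sample batch" (2B x C).
            preds.append(torch.cat([kpt_vis_preds[:, i, :],
                                    kpt_vis_preds[:, j, :]], dim=0))
            targets.append(torch.cat([vis_targets[:, i],
                                      vis_targets[:, j]], dim=0))    # 2B
        # Stack over the P pairs, then flatten to (P*2B) x C and (P*2B).
        preds = torch.stack(preds, dim=0).reshape(-1, kpt_vis_preds.shape[-1])
        targets = torch.stack(targets, dim=0).reshape(-1)
        return self.loss_weight * self.ce(preds, targets.long())
```

For example, pairing the left/right elbows with the corresponding below-elbow residual endpoints would use `limb_pairs=[(7, 19), (8, 20)]` under the 0-based LDPose indexing.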
Benchmark
Experimental Environment
- GPU: NVIDIA RTX 4090 × 4
- CPU: Intel Xeon Gold 6248R
- Memory: 128GB DDR4
- Python 3.8
- PyTorch 1.12
- CUDA 11.3
- MMPose Framework: github.com/open-mmlab/mmpose
Limb-Deficient Metrics (LD Metrics)
COCO Metrics evaluate only the visible keypoints in the ground truth. This works well for individuals with complete bodies. However, for individuals with limb deficiencies, some keypoints do not exist. For example, if a person has lost their left upper limb, the left elbow and wrist are set to (0,0) with zero visibility. If a model predicts these points with high confidence, COCO Metrics ignore the error and yield a misleadingly high score.
To address this limitation, we propose LD Metrics, which extends COCO Metrics with adaptive keypoint selection based on the realistic distribution of related keypoints. We introduce an adaptive weight, γi, into the OKS calculation to enforce consistency among associated keypoints. The new OKS_LDPose is defined as follows:
Equation 1: Object Keypoint Similarity for LDPose

OKS_LDPose = [ Σi γi · exp( −di² / (2s²κi²) ) · δ(vi > 0) ] / [ Σi δ(vi > 0) ]
Here, di, κi, s, and δ(vi > 0) are defined as in the original COCO evaluation. The weight γi is set to 0 when the prediction of multiple related keypoints violates realistic anatomical configurations, even if the predicted keypoints are precise and marked as visible in annotation.
For instance, if keypoint 21 (left above-knee residual limb endpoint) is correctly predicted but any of the associated keypoints (i.e., keypoint 13 [left knee], keypoint 23 [left below-knee residual], or keypoint 15 [left ankle]) are simultaneously predicted in a manner that contradicts the expected limb structure (such as predicting both a residual limb and a natural joint), then γ21 is set to 0. Otherwise, γi is 1.
Key Innovation: This mechanism ensures that the model does not output mutually incompatible predictions, leading to a more fair and meaningful evaluation for pose estimation in individuals with limb deficiencies.
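To make the γi gating concrete, the snippet below sketches OKS_LDPose in plain Python. It assumes γ enters the numerator only, so an anatomically inconsistent keypoint contributes zero similarity while still counting toward the normalizer; the anatomical-consistency check that produces γ is application logic and is passed in precomputed here.

```python
import math


def oks_ldpose(d, kappa, s, v, gamma):
    """OKS with per-keypoint anatomical gates.

    d     : prediction-to-GT distances per keypoint
    kappa : per-keypoint constants (as in COCO evaluation)
    s     : object scale
    v     : ground-truth visibility flags (only v > 0 keypoints count)
    gamma : 0/1 gates; 0 marks anatomically inconsistent predictions
    """
    num = sum(g * math.exp(-di ** 2 / (2 * s ** 2 * ki ** 2))
              for di, ki, g, vi in zip(d, kappa, gamma, v) if vi > 0)
    den = sum(1 for vi in v if vi > 0)
    return num / den if den > 0 else 0.0
```

With two labeled keypoints predicted perfectly (d = 0), the score is 1.0; gating one of them with γ = 0, e.g. because a residual endpoint and a natural joint were predicted together, halves the score even though the coordinates are exact.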
Baseline Methods
We benchmarked our dataset using state-of-the-art human pose estimation methods:
YOLO-Pose
One-stage pose estimation with Darknet CSP backbone
RTMPose
Real-time multi-person pose estimation with CSPNeXt
ViTPose
Vision Transformer-based pose estimation framework
Experimental Results
Quantitative Results
We benchmark our LDPose dataset using state-of-the-art pose estimation methods. The table shows COCO metrics (AP, AP50, AP75, AR) and our proposed LD Metrics (AP, AP50, AP75, AR) for evaluating residual limb keypoint detection.
| Method | Backbone | Fine-tuning | LDLoss | COCO AP | COCO AP50 | COCO AP75 | COCO AR | LD AP | LD AP50 | LD AP75 | LD AR | Checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLO-Pose | Darknet_csp-d53-l | | | 47.6 | 59.2 | 50.1 | 51.3 | - | - | - | - | - |
| YOLO-Pose | Darknet_csp-d53-l | ✓ | | 51.9 | 72.1 | 53.9 | 56.5 | 44.4 | 71.2 | 46.2 | 51.1 | Download |
| YOLO-Pose | Darknet_csp-d53-l | ✓ | ✓ | 50.9 | 71.5 | 52.4 | 55.3 | 45.5 | 70.5 | 46.0 | 51.4 | Download |
| RTMPose | CSPNeXt-s | | | 69.1 | 86.5 | 73.8 | 71.5 | - | - | - | - | - |
| RTMPose | CSPNeXt-s | ✓ | | 77.1 | 91.4 | 81.8 | 79.1 | 61.8 | 88.3 | 64.6 | 66.4 | Download |
| RTMPose | CSPNeXt-s | ✓ | ✓ | 72.0 | 90.2 | 78.4 | 75.0 | 69.5 | 89.2 | 74.7 | 73.3 | Download |
| RTMPose | CSPNeXt-m | | | 72.9 | 88.6 | 77.1 | 75.0 | - | - | - | - | - |
| RTMPose | CSPNeXt-m | ✓ | | 76.6 | 91.4 | 81.9 | 78.7 | 62.2 | 88.4 | 65.8 | 67.1 | Download |
| RTMPose | CSPNeXt-m | ✓ | ✓ | 73.3 | 90.4 | 79.5 | 76.5 | 71.1 | 90.3 | 76.7 | 74.9 | Download |
| RTMPose | CSPNeXt-l | | | 74.2 | 89.6 | 78.3 | 76.2 | - | - | - | - | - |
| RTMPose | CSPNeXt-l | ✓ | | 80.8 | 92.5 | 85.1 | 82.7 | 69.1 | 90.5 | 75.3 | 73.5 | Download |
| RTMPose | CSPNeXt-l | ✓ | ✓ | 74.9 | 91.4 | 80.6 | 77.8 | 72.6 | 91.3 | 78.9 | 76.3 | Download |
| ViTPose | ViT-s | | | 74.3 | 90.4 | 78.0 | 76.4 | - | - | - | - | - |
| ViTPose | ViT-s | ✓ | | 76.8 | 91.4 | 81.7 | 79.3 | 66.8 | 90.3 | 72.3 | 71.9 | Download |
| ViTPose | ViT-s | ✓ | ✓ | 74.9 | 90.3 | 80.4 | 77.6 | 72.0 | 90.2 | 78.1 | 75.1 | Download |
| ViTPose | ViT-b | | | 76.9 | 91.5 | 81.0 | 79.1 | - | - | - | - | - |
| ViTPose | ViT-b | ✓ | | 78.2 | 92.4 | 82.8 | 80.5 | 69.6 | 90.4 | 76.1 | 74.3 | Download |
| ViTPose | ViT-b | ✓ | ✓ | 78.9 | 92.1 | 84.3 | 81.7 | 75.9 | 92.1 | 82.1 | 79.2 | Download |
| ViTPose | ViT-l | | | 80.0 | 92.6 | 84.2 | 82.1 | - | - | - | - | - |
| ViTPose | ViT-l | ✓ | | 82.6 | 93.7 | 87.3 | 84.7 | 75.5 | 92.7 | 82.1 | 79.6 | Download |
| ViTPose | ViT-l | ✓ | ✓ | 80.9 | 93.1 | 85.5 | 84.0 | 78.4 | 93.1 | 84.3 | 81.6 | Download |
Note: The best performing model is ViTPose with ViT-l backbone, fine-tuning, and LDLoss, achieving 80.9% AP on COCO metrics and 78.4% AP on LD metrics for residual limb keypoint detection. Results show that fine-tuning on LDPose and using our proposed LDLoss significantly improve performance on limb-deficient pose estimation.
Case Study Visualization

Figure: Case study with ViTPose (ViT-l backbone). First column: LDPose ground truth. Second column: pretrained ViTPose inference. Third column: ViTPose fine-tuned on LDPose. Fourth column: ViTPose fine-tuned on LDPose with LDLoss.
Resource Downloads
Paper PDF
Complete research paper with detailed method descriptions and experimental results
Download Paper

Citation
@inproceedings{ying2025ldpose,
title={LDPose: Towards Inclusive Human Pose Estimation for Limb-Deficient Individuals in the Wild},
author={Ying, Jiaying and Du, Heming and Zhang, Kaihao and Li, Lincheng and Yu, Xin},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={},
year={2025}
}