Data Collection - InclusiveVidPose

Data Collection

Why Videos Instead of Curated Images?

In a single image, an occluded limb can appear identical to a missing one, creating ambiguity in annotation. Video sequences provide temporal continuity and changing viewpoints. By observing motion or shifts in perspective, annotators can distinguish a limb that is simply hidden from a limb that is truly absent. This capability leads to precise labeling of residual‐limb end keypoints and reduces errors in pose estimation for individuals with limb deficiencies.

Data Sourcing

We obtained raw footage from YouTube by searching for keywords such as “residual limb,” “prosthesis” and “limb difference.” We downloaded each video at its highest available resolution and preserved metadata (upload date, user handle and caption) for future reference.
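As an illustration of what preserving metadata alongside each download can look like, the following minimal sketch writes one JSON sidecar file per video. The `VideoMetadata` class, its field names, and the `save_metadata` and `load_metadata` helpers are hypothetical and chosen for this example; they do not describe the project's actual tooling or schema.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class VideoMetadata:
    # Field names are illustrative assumptions, not the dataset's actual schema.
    video_id: str
    upload_date: str     # e.g. "2021-03-14"
    user_handle: str
    caption: str
    search_keyword: str  # the query that surfaced the video


def save_metadata(record: VideoMetadata, directory: Path) -> Path:
    """Write the record to <directory>/<video_id>.json and return the path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{record.video_id}.json"
    text = json.dumps(asdict(record), ensure_ascii=False, indent=2)
    path.write_text(text, encoding="utf-8")
    return path


def load_metadata(path: Path) -> VideoMetadata:
    """Read a sidecar file back into a VideoMetadata record."""
    return VideoMetadata(**json.loads(path.read_text(encoding="utf-8")))
```

Keeping the record in a separate file next to the video, rather than encoding it in the file name, lets later curation steps look up the upload date, user handle and caption without touching the video itself.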

Data Filtering and Curation

We manually reviewed every downloaded video and extracted the segments that center on a person with a limb deficiency. We removed clips in which severe motion blur or heavy occlusion (for example, splashing water) prevented annotators from agreeing on keypoint placement. This process yielded 313 high-quality videos that span diverse actions, backgrounds and lighting conditions. These curated video segments form the core of our InclusiveVidPose dataset.

During filtering and curation we aimed to ensure:

  • Diverse representation of limb deficiencies
  • High-quality video resolution
  • Clear visibility of subjects
  • Appropriate lighting conditions
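The review described above was manual, and this page does not state how annotator disagreement was judged. Purely as a sketch of how such a criterion could be made concrete, the example below compares two annotators' marks for the same keypoints and flags a clip when the largest gap exceeds a fraction of the person's bounding-box diagonal. The function names, the keypoint names, the normalization and the 0.1 threshold are all assumptions for illustration.

```python
import math

Point = tuple[float, float]


def max_disagreement(ann_a: dict[str, Point], ann_b: dict[str, Point],
                     bbox_diag: float) -> float:
    """Largest distance between two annotators' marks on shared keypoints,
    expressed as a fraction of the person's bounding-box diagonal."""
    shared = ann_a.keys() & ann_b.keys()
    if not shared or bbox_diag <= 0:
        raise ValueError("need at least one shared keypoint and a positive diagonal")
    return max(math.dist(ann_a[k], ann_b[k]) for k in shared) / bbox_diag


def keep_clip(ann_a: dict[str, Point], ann_b: dict[str, Point],
              bbox_diag: float, threshold: float = 0.1) -> bool:
    """Keep the clip only if annotators agree within the (assumed) threshold."""
    return max_disagreement(ann_a, ann_b, bbox_diag) <= threshold


# Hypothetical annotations: the two annotators disagree on the residual-limb end.
a = {"left_elbow": (100.0, 100.0), "left_residual_end": (150.0, 120.0)}
b = {"left_elbow": (103.0, 104.0), "left_residual_end": (150.0, 180.0)}
print(keep_clip(a, b, bbox_diag=200.0))
```

Normalizing by the bounding-box diagonal keeps the check independent of video resolution, so one threshold can apply to clips of different sizes.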

Video Examples

Example 1: Arm Amputation

Example 2: Leg Amputation