EgoCoord: Self-calibrated Egocentric 3D Body Pose Estimation Using Pixel-Wise Coordinate Encoding
Published in Proceedings of the Asian Conference on Computer Vision (ACCV), 2024
The primary challenges for egocentric 3D human pose estimation are the perspective and radial distortions introduced by fisheye lenses. Previous methods relied on camera calibration to undistort images or trained neural networks to regress 3D human poses from distorted 2D poses. In this paper, we propose a novel approach that integrates a pixel-wise coordinate encoding technique for recognizing image distortion and uses a Vision Transformer (ViT) to extract distortion and pose tokens from the input image. The extracted tokens feed a 3D volumetric heatmap-based egocentric pose estimator, which predicts the 3D human pose from the pose tokens and corrects it using the distortion tokens. The approach combines CoordConv's positional encoding strategy, neural network-based camera calibration methods, and volumetric heatmap-based 3D human pose estimation. We evaluate the proposed model on our new evaluation dataset and compare it with state-of-the-art models. Additionally, we perform an ablation study to demonstrate the individual effect of each module in the proposed model.
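For intuition, below is a minimal sketch (not the paper's exact implementation) of CoordConv-style pixel-wise coordinate encoding in PyTorch: normalized (u, v) coordinate channels and a radial-distance channel are concatenated to the RGB input so a downstream backbone (e.g., a ViT patch embedding) can associate appearance with pixel position, which is what fisheye distortion depends on. The channel layout and normalization here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class PixelCoordEncoding(nn.Module):
    """Append per-pixel coordinate channels to an image tensor (illustrative sketch)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        # Normalized pixel coordinates in [-1, 1].
        ys = torch.linspace(-1.0, 1.0, h, device=x.device)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device)
        v, u = torch.meshgrid(ys, xs, indexing="ij")
        u = u.expand(b, 1, h, w)
        v = v.expand(b, 1, h, w)
        # Radial distance from the image center; fisheye distortion is
        # (approximately) radially symmetric, so this channel is a useful cue.
        r = torch.sqrt(u ** 2 + v ** 2)
        return torch.cat([x, u, v, r], dim=1)  # (B, 3 + 3, H, W)


if __name__ == "__main__":
    enc = PixelCoordEncoding()
    img = torch.randn(2, 3, 256, 256)  # dummy fisheye crops
    print(enc(img).shape)              # torch.Size([2, 6, 256, 256])
```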
Recommended citation: JB Lee, H Lee, BR Lee, BG Lee, WH Son, EgoCoord: Self-calibrated Egocentric 3D Body Pose Estimation Using Pixel-Wise Coordinate Encoding, Proceedings of the Asian Conference on Computer Vision, 1233-1249, 2024
Download Paper