A novel method to improve the precision of determining 6D object poses in intricate industrial settings

Pose estimation enables machines to perceive and interact with their surroundings. Specifically, 6D object pose estimation calculates an object’s position and orientation in three-dimensional space (three degrees of freedom for translation and three for rotation), which is crucial for object grasping and manipulation in robotics. Although significant progress has been made in recent years, accurately estimating object poses remains challenging, especially when the target object is partly or fully occluded in cluttered scenes.
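As a brief illustration of what a 6D pose encodes, the sketch below applies a rotation and a translation to a point on an object model to place it in the camera frame. It is a minimal example using only the Python standard library; the function names, the choice of a rotation about the z-axis and all numeric values are assumptions made for illustration, not details from the research project.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_pose(rotation, translation, point):
    """Map a point from the object frame to the camera frame: R * p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical pose: a quarter turn about z, then a shift in x and z (metres).
R = rotation_z(math.pi / 2)   # orientation: 3 rotational degrees of freedom
t = (0.5, 0.0, 1.0)           # position: 3 translational degrees of freedom

# Hypothetical point on the object model, expressed in the object frame.
corner = (0.1, 0.0, 0.0)
corner_in_camera = apply_pose(R, t, corner)
print(corner_in_camera)
```

Estimating the pose is the inverse problem: given observations of the object in the camera frame, recover the rotation and translation.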

To address this issue, Dr Wyman Wang, Assistant Professor in the School of Science and Technology at Hong Kong Metropolitan University, led a research project to develop a novel method for improving the accuracy of 6D object pose estimation in complex industrial environments. The research team integrated complementary information from RGB and depth images to refine appearance and geometry representation learning, and to let the two channels exchange information. Moreover, they observed that as an object rotates, semantic labels remain unchanged, while keypoint offset directions vary with the pose. This insight was used to make full use of the geometric knowledge in the depth image for pose estimation.

Building on these findings, the research team proposed an innovative representation learning network called SO(3)-Pose, where SO(3) refers to the special orthogonal group in three dimensions, that is, the group of 3D rotations. The network leverages both SO(3)-equivariant and SO(3)-invariant features to determine object poses. SO(3)-invariant features enhance object segmentation by learning more distinctive representations, while SO(3)-equivariant features communicate with RGB features to infer missing geometry for 3D keypoint detection. Comprehensive experiments demonstrated that SO(3)-Pose estimates 6D object poses more accurately and more robustly than other state-of-the-art methods.
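The difference between the two kinds of features can be shown with a small numerical sketch. A quantity is SO(3)-equivariant if rotating the input rotates the output in the same way, f(Rx) = R f(x), and SO(3)-invariant if rotating the input leaves the output unchanged, g(Rx) = g(x). The example below uses the offset from a surface point to a keypoint as the equivariant quantity and the length of that offset as the invariant one. It is only an illustration of the two properties, not the SO(3)-Pose network; the points, the rotation and the helper names are hypothetical.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotate(rotation, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(rotation[i][j] * v[j] for j in range(3)) for i in range(3))


def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


# Hypothetical surface point and keypoint in the object frame.
surface_point = (0.2, 0.0, 0.1)
keypoint = (0.0, 0.3, 0.1)
offset = sub(keypoint, surface_point)

# Rotate the whole object by 90 degrees about z and recompute the offset.
R = rotation_z(math.radians(90))
offset_after = sub(rotate(R, keypoint), rotate(R, surface_point))

# Equivariance: the recomputed offset equals the rotated original offset.
offset_rotated = rotate(R, offset)

# Invariance: the offset length does not change under the rotation.
length_before = norm(offset)
length_after = norm(offset_after)

direction_changed = any(abs(a - b) > 1e-9 for a, b in zip(offset, offset_after))
print(offset, offset_after, length_before, length_after, direction_changed)
```

The offset direction changes with the rotation while its length does not, which mirrors the observation that keypoint offset directions vary with the pose whereas semantic labels stay the same.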

Dr Wang’s study makes a significant contribution to 6D object pose estimation, with broad applications in areas such as robotics and augmented reality (AR). For example, using this method, a robot can accurately determine the poses of objects in its environment to grasp a target object and avoid collisions with other objects. An AR app can track objects in real time and overlay virtual objects on top of them to provide a better simulation experience for users. Dr Wang’s team plans to continue exploring other theories that can be used to optimise object pose estimation and to develop new techniques.

For more details, please refer to the following publication generated from the research project:
'SO(3)-Pose: SO(3)-Equivariance Learning for 6D Object Pose Estimation', Computer Graphics Forum.