When artificial intelligence systems encounter scenes where objects are not fully visible, it has to make estimations based only on the visible parts of the objects. This partial information leads to detection errors, and large training data is required to correctly recognize such scenes. Now, researchers at the Gwangju Institute of Science and Technology have developed a framework that allows robot vision to detect such objects successfully in the same way that we perceive them.
Robotic vision has come a long way, reaching a level of sophistication with applications in complex and demanding tasks, such as autonomous driving and object manipulation. However, it still struggles to identify individual objects in cluttered scenes where some objects are partially or completely hidden behind others. Typically, when dealing with such scenes, robotic vision systems are trained to identify the occluded object based only on its visible parts. But such training requires large datasets of objects and can be pretty tedious.
Associate Professor Kyoobin Lee and Ph.D. student Seunghyeok Back from the Gwangju Institute of Science and Technology (GIST) in Korea found themselves facing this problem when they were developing an artificial intelligence system to identify and sort objects in cluttered scenes. “We expect a robot to recognize and manipulate objects they have not encountered before or been trained to recognize. In reality, however, we need to manually collect and label data one by one as the generalizability of deep neural networks depends highly on the quality and quantity of the training dataset,” says Back.
In a new study accepted at the 2022 IEEE International Conference on Robotics and Automation, a research team led by Prof. Lee and Back developed a model called “unseen object amodal instance segmentation” (UOAIS) for detecting occluded objects in cluttered scenes. To train the model in identifying object geometry, they developed a database containing 45,000 photorealistic synthetic images containing depth information. With this (limited) training data, the model was able to detect a variety of occluded objects. Upon encountering a cluttered scene, it first picked out the object of interest and then determines if the object is occluded by segmenting the object into a “visible mask” and an “amodal mask.”
The researchers were excited by the results. “Previous methods are limited to either detecting only specific types of objects or detecting only the visible regions without explicitly reasoning over occluded areas. By contrast, our method can infer the hidden regions of occluded objects like a human vision system. This enables a reduction in data collection efforts while improving performance in a complex environment,” comments Back.
To enable “occlusion reasoning” in their system, the researchers introduced a “hierarchical occlusion modeling” (HOM) scheme, which assigned a hierarchy to the combination of multiple extracted features and their prediction order. By testing their model against three benchmarks, they validated the effectiveness of the HOM scheme, which achieved state-of-the-art performance.
The researchers are hopeful about the future prospects of their method. “Perceiving unseen objects in a cluttered environment is essential for amodal robotic manipulation. Our UOAIS method could serve as a baseline on this front,” says Back.