M3R-CNN: On effective multi-modal fusion of RGB and depth cues for instance segmentation in bin-picking

Sep 19, 2023

Takao Nishi

Shinya Kawasaki

Kosuke Iewaki

Fumio Okura

Damien Petit

Yoichi Takano

Kensuke Harada



Abstract

Picking tasks in logistics warehouses require handling many objects of various types, and the variety of objects increases daily. Object detection in bin-picking systems for logistics warehouses therefore requires high generalization performance, which conventional methods have yet to achieve. To this end, we propose Multi-modal Mask R-CNN (M3R-CNN) and a method for training it. M3R-CNN is an instance-segmentation network that takes RGB and depth as input and achieves high generalization performance from a small amount of training data. We trained this network on 561 scenes of training data using our proposed method and obtained a recognition accuracy of F1-score = 0.631 and mAP = 0.958 for unknown objects. We also conducted an object-grasping experiment with a robot using M3R-CNN and obtained an availability score of 0.97.
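For readers unfamiliar with the F1-score as applied to instance segmentation, the sketch below shows one common way to compute it: each predicted mask is matched to at most one ground-truth mask when their intersection-over-union (IoU) reaches a threshold, and precision and recall are derived from the matched and unmatched counts. The greedy matching rule, the 0.5 IoU threshold, and the helper names `mask_iou`, `instance_f1`, and `square` are illustrative assumptions. The abstract does not state the exact evaluation protocol used for M3R-CNN.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # a binary instance mask stored as the set of its foreground pixels


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def instance_f1(preds: List[Mask], gts: List[Mask], iou_thr: float = 0.5) -> float:
    """Instance-level F1-score with greedy one-to-one matching at an IoU threshold.

    ASSUMPTION: greedy highest-IoU-first matching and iou_thr=0.5 are
    illustrative choices, not necessarily the protocol used in the paper.
    """
    # Collect every prediction/ground-truth pair whose IoU clears the threshold.
    pairs = [
        (mask_iou(p, g), i, j)
        for i, p in enumerate(preds)
        for j, g in enumerate(gts)
    ]
    pairs = [t for t in pairs if t[0] >= iou_thr]
    pairs.sort(reverse=True)  # consider the highest-IoU pairs first

    used_p, used_g = set(), set()
    tp = 0
    for _, i, j in pairs:
        # Each prediction and each ground-truth instance may be matched only once.
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
            tp += 1

    if tp == 0:
        return 0.0
    fp = len(preds) - tp  # predictions with no matching ground truth
    fn = len(gts) - tp    # ground-truth instances that were missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def square(x0: int, y0: int, size: int) -> Mask:
    """Hypothetical helper: a square mask with top-left corner (x0, y0)."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


gts = [square(0, 0, 10), square(20, 20, 10)]
preds = [square(1, 0, 10), square(50, 50, 5)]  # one good match, one false positive
print(instance_f1(preds, gts))  # prints 0.5
```

Here one prediction overlaps a ground-truth object with IoU 90/110 ≈ 0.82, so it counts as a true positive. The other prediction and the second ground-truth object remain unmatched, which gives precision = recall = 0.5. mAP, the other metric reported above, instead averages precision over recall levels (and often over several IoU thresholds), which is one reason the two numbers can differ substantially.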

Type

Journal article

Publication

Advanced Robotics, 37(18):1143-1157

Last updated on Sep 19, 2023

Computer Vision · Bin Picking

Authors

Fumio Okura

Associate Professor
