M3R-CNN: On effective multi-modal fusion of RGB and depth cues for instance segmentation in bin-picking

Sep 19, 2023
Takao Nishi, Shinya Kawasaki, Kosuke Iewaki, Fumio Okura, Damien Petit, Yoichi Takano, Kensuke Harada
Abstract
Picking tasks in logistics warehouses require handling many objects of various types, and their number increases daily. Object detection in bin-picking systems for logistics warehouses therefore needs high generalization performance, but conventional methods have yet to meet this requirement. To that end, we propose a Multi-modal Mask R-CNN (M3R-CNN) and a method for training it. M3R-CNN is an instance-segmentation network that takes RGB and depth as input and achieves high generalizability from a small amount of training data. We trained this network on 561 scenes of training data using our proposed method and obtained a recognition accuracy of F1-score = 0.631 and mAP = 0.958 for unknown objects. We also performed an object-grasping experiment with a robot using M3R-CNN and obtained an availability score of 0.97.
Publication
Advanced Robotics, 37(18):1143-1157