M3R-CNN: On effective multi-modal fusion of RGB and depth cues for instance segmentation in bin-picking
Sep 19, 2023
Takao Nishi
Shinya Kawasaki
Kosuke Iewaki

Fumio Okura
Damien Petit
Yoichi Takano
Kensuke Harada
Abstract
Picking tasks in logistics warehouses require handling many objects of various types, whose number increases daily. Object detection in warehouse bin-picking systems therefore needs high generalization performance, but conventional methods have yet to meet this requirement. To this end, we propose a Multi-modal Mask R-CNN (M3R-CNN) and a method for training it. M3R-CNN is an instance-segmentation network that takes RGB and depth as input and achieves high generalizability from a small amount of training data. We trained this network on 561 scenes of training data using our proposed method and obtained a recognition accuracy of F1-score = 0.631 and mAP = 0.958 for unknown objects. We also performed an object-grasping experiment with a robot using M3R-CNN and obtained an availability score of 0.97.
Type
Publication
Advanced Robotics, 37(18):1143-1157