Abstract | Over the last decade, the availability of public image repositories and
recognition benchmarks has enabled rapid progress in visual object category and
instance detection. Today we are witnessing the birth of a new generation of sensing
technologies capable of providing high-quality synchronized videos of both color
and depth: the RGB-D (Kinect-style) camera. With its advanced sensing capabilities
and the potential for mass adoption, this technology represents an opportunity
to dramatically increase robotic object recognition, manipulation, navigation, and
interaction capabilities. We introduce a large-scale, hierarchical multi-view object
dataset collected using an RGB-D camera. The dataset consists of two parts: the
RGB-D Object Dataset containing views of 300 objects organized into 51 categories,
and the RGB-D Scenes Dataset containing 8 video sequences of office and
kitchen environments. The dataset has been made publicly available to the research
community so as to enable rapid progress based on this promising technology. We
describe the dataset collection procedure and present techniques for RGB-D object
recognition and detection of objects in scenes recorded using RGB-D videos,
demonstrating that combining color and depth information substantially improves
the quality of results.
|