In computer vision, material segmentation of natural scenes is challenging because materials have complex and diverse appearances. Traditional approaches often rely on RGB images, which can be deceptive because a material's appearance varies with lighting conditions. Other methods that employ polarization or spectral imagery differentiate materials more reliably, but their cost and limited accessibility restrict everyday use.
In this work, we propose a deep learning framework that bridges the gap between high-fidelity material segmentation and the practical constraints of data acquisition. Our approach uses a training strategy based on paired RGB-D and spectral data to incorporate spectral information directly within the neural network. This encoding is performed by a Spectral Feature Mapper (SFM) layer, a novel module that embeds unique spectral characteristics into the network, enabling it to infer materials from standard RGB-D images.
Once trained, the model performs material segmentation on widely available devices without requiring direct spectral input. In addition, we generate a 3D point cloud from the RGB-D image pair to provide richer spatial context for scene understanding. In simulations on available datasets and in real experiments conducted with an iPad Pro, our method outperforms other methods in material segmentation.
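The text does not say how the point cloud is computed from the RGB-D pair. As a rough illustration only, the sketch below uses the standard pinhole back-projection; the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and not taken from the paper.

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map (H, W) and an RGB image (H, W, 3)
    into an (N, 3) point array and an (N, 3) color array.

    Assumes a pinhole camera with focal lengths fx, fy and principal
    point (cx, cy); these intrinsics are illustrative assumptions.
    """
    h, w = depth.shape
    # Pixel coordinates: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    # Discard pixels with no depth measurement (depth <= 0).
    valid = points[:, 2] > 0
    return points[valid], colors[valid]

# Tiny example: a 2 x 3 depth map with one missing measurement.
depth = np.full((2, 3), 2.0)
depth[0, 0] = 0.0
rgb = np.zeros((2, 3, 3))
points, colors = depth_to_point_cloud(depth, rgb, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

Each surviving point keeps the color of its source pixel, so per-pixel material labels could be attached to the points in the same way.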
We introduce the SFM layer, which is designed to universally enhance encoder-decoder architectures.
The Spectral Feature Mapper (SFM) module plays a crucial role in analyzing hyperspectral data. The module uses a learnable parameter matrix of size \(C \times B\), where \(C\) is the number of classes and \(B\) is the number of spectral bands, so that each row acts as a learned spectral signature for one class. For every pixel in the hyperspectral cube, the module computes the spectral angle between the pixel's spectral signature and each class signature. Smaller angles indicate higher similarity, which translates into a higher probability for the corresponding class after the softmax function is applied. The result is a probability cube that provides one mask per class across the entire image.
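The spectral-angle computation described above can be sketched in NumPy as follows. This is a minimal forward-pass illustration, not the authors' implementation: the function name `spectral_feature_mapper` is hypothetical, the class matrix is a fixed array and is not trained, and negating the angles before the softmax is an assumption made so that smaller angles yield higher probabilities.

```python
import numpy as np

def spectral_feature_mapper(cube, class_signatures, eps=1e-8):
    """Map a hyperspectral cube (H, W, B) to class probabilities (H, W, C).

    class_signatures is the C x B matrix (learnable in the paper,
    fixed here for illustration).
    """
    # Normalize both sides so the dot product is the cosine of the angle.
    pix = cube / (np.linalg.norm(cube, axis=-1, keepdims=True) + eps)
    sig = class_signatures / (
        np.linalg.norm(class_signatures, axis=-1, keepdims=True) + eps
    )
    cos = np.clip(pix @ sig.T, -1.0, 1.0)  # shape (H, W, C)
    angles = np.arccos(cos)                # spectral angles in radians
    # Assumption: softmax over negated angles, so smaller angle -> higher probability.
    logits = -angles
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

# Example with C = 3 classes and B = 8 bands.
rng = np.random.default_rng(0)
signatures = rng.random((3, 8))
cube = rng.random((4, 5, 8))
cube[0, 0] = 2.0 * signatures[1]  # scaled copy of the class-1 signature
probs = spectral_feature_mapper(cube, signatures)
mask = probs.argmax(axis=-1)      # per-pixel class labels
```

Because the spectral angle ignores vector magnitude, the pixel set to a scaled copy of the class-1 signature is still assigned to class 1.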
@InProceedings{Perez_2024_CVPR,
author = {Perez, Fabian and Rueda-Chac\'on, Hoover},
title = {Beyond Appearances: Material Segmentation with Embedded Spectral Information from RGB-D imagery},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2024},
pages = {293--301}
}