TY - GEN
T1 - DiCo-NeRF: Difference of Cosine Similarity for Neural Rendering of Fisheye Driving Scenes
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
AU - Choi, Jiho
AU - Hwang, Gyutae
AU - Lee, Sang Jun
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Neural radiance fields (NeRFs) have emerged in the field of autonomous driving, where they improve perception of complex 3D environments through the reconstruction of geometry and appearance. Moving objects and the sky in outdoor environments make the NeRF model challenging to optimize. Previous work addresses these challenges through preprocessing such as masking; however, masking requires additional ground-truth data and a segmentation network. We propose DiCo-NeRF, an approach for driving scenes that leverages differences in the cosine similarity maps of a vision-language aligned model. DiCo-NeRF investigates the correlation between rendered patches and predefined text and adjusts the loss of challenging patches, such as those containing moving objects or the sky. Our neural radiance field uses embedding vectors from a pre-trained CLIP model to obtain the cosine similarity maps. We introduce SimLoss, a loss function that regulates the color field of the NeRF based on the quantified distribution differences between ground-truth and rendered similarity maps. Unlike previous NeRF models trained on driving datasets, our approach does not require additional inputs to the model, such as sensor data. Experimental results demonstrate that incorporating language semantic cues improves performance on the novel view synthesis task, particularly in complex driving environments. We conducted experiments on fisheye driving scenes from the KITTI-360 and real-world datasets. Our code is available at https://github.com/ziiho08/DiCoNeRF.
AB - Neural radiance fields (NeRFs) have emerged in the field of autonomous driving, where they improve perception of complex 3D environments through the reconstruction of geometry and appearance. Moving objects and the sky in outdoor environments make the NeRF model challenging to optimize. Previous work addresses these challenges through preprocessing such as masking; however, masking requires additional ground-truth data and a segmentation network. We propose DiCo-NeRF, an approach for driving scenes that leverages differences in the cosine similarity maps of a vision-language aligned model. DiCo-NeRF investigates the correlation between rendered patches and predefined text and adjusts the loss of challenging patches, such as those containing moving objects or the sky. Our neural radiance field uses embedding vectors from a pre-trained CLIP model to obtain the cosine similarity maps. We introduce SimLoss, a loss function that regulates the color field of the NeRF based on the quantified distribution differences between ground-truth and rendered similarity maps. Unlike previous NeRF models trained on driving datasets, our approach does not require additional inputs to the model, such as sensor data. Experimental results demonstrate that incorporating language semantic cues improves performance on the novel view synthesis task, particularly in complex driving environments. We conducted experiments on fisheye driving scenes from the KITTI-360 and real-world datasets. Our code is available at https://github.com/ziiho08/DiCoNeRF.
KW - Autonomous driving
KW - Fisheye camera
KW - Neural Radiance Fields
KW - Vision-Language models
UR - https://www.scopus.com/pages/publications/85206462575
U2 - 10.1109/CVPRW63382.2024.00781
DO - 10.1109/CVPRW63382.2024.00781
M3 - Conference paper
AN - SCOPUS:85206462575
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 7850
EP - 7858
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
PB - IEEE Computer Society
Y2 - 16 June 2024 through 22 June 2024
ER -