---
license: apache-2.0
pipeline_tag: image-to-3d
tags:
- dino
- scene-understanding
- semantic-scene-completion
- unsupervised
library_name: pytorch
---

<div align="center">
<h1>Feed-Forward <i>SceneDINO</i> for Unsupervised Semantic Scene Completion</h1>

[**Aleksandar Jevtić**](https://jev-aleks.github.io/)<sup>*1</sup> 
[**Christoph Reich**](https://christophreich1996.github.io/)<sup>*1,2,4,5</sup>
[**Felix Wimbauer**](https://fwmb.github.io/)<sup>1,4</sup>
[**Oliver Hahn**](https://olvrhhn.github.io/)<sup>2</sup>
[**Christian Rupprecht**](https://chrirupp.github.io/)<sup>3</sup>
[**Stefan Roth**](https://www.visinf.tu-darmstadt.de/visual_inference/people_vi/stefan_roth.en.jsp)<sup>2,5,6</sup>
[**Daniel Cremers**](https://cvg.cit.tum.de/members/cremers/)<sup>1,4,5</sup>

<sup>1</sup>TU Munich   <sup>2</sup>TU Darmstadt   <sup>3</sup>University of Oxford   <sup>4</sup>MCML   <sup>5</sup>ELIZA   <sup>6</sup>hessian.AI   *equal contribution

<a href="https://arxiv.org/abs/2507.06230"><img src='https://img.shields.io/badge/ArXiv-grey' alt='Paper PDF'></a>
<a href="https://visinf.github.io/scenedino/"><img src='https://img.shields.io/badge/Project Page-grey' alt='Project Page URL'></a>
<a href="https://huggingface.co/spaces/jev-aleks/SceneDINO"><img src='https://img.shields.io/badge/🤗 Demo-grey' alt='Demo'></a>
<a href="https://opensource.org/licenses/Apache-2.0"><img src='https://img.shields.io/badge/License-Apache%202.0-blue.svg' alt='License'></a>
[![Framework](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?&logo=PyTorch&logoColor=white)](https://pytorch.org/)

</div>

## Overview

SceneDINO is an unsupervised method that infers 3D geometry and 3D features from a single image in a feed-forward manner. It is trained using multi-view self-supervision. Distilling and clustering SceneDINO's 3D feature field yields unsupervised semantic scene completion predictions.
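
To make the clustering step more concrete, here is a minimal sketch of how per-voxel features could be grouped into pseudo-semantic classes. It is not the SceneDINO implementation: the function name `cluster_feature_field`, the dense `(X, Y, Z, C)` voxel layout, the use of NumPy, and the choice of cosine k-means with farthest-point initialization are all assumptions made for illustration. The distillation step is not shown.

```python
import numpy as np


def cluster_feature_field(features, num_classes, num_iters=20):
    """Cluster per-voxel features with cosine k-means (illustrative only).

    features: array of shape (X, Y, Z, C), a hypothetical dense feature grid.
    Returns an integer label grid of shape (X, Y, Z).
    """
    grid_shape = features.shape[:-1]
    flat = features.reshape(-1, features.shape[-1]).astype(np.float64)
    # L2-normalize so that dot products are cosine similarities.
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)

    # Deterministic farthest-point initialization: start from the first
    # voxel, then repeatedly add the voxel least similar to all centroids.
    centroids = [flat[0]]
    for _ in range(num_classes - 1):
        best_sim = np.max(flat @ np.stack(centroids).T, axis=1)
        centroids.append(flat[np.argmin(best_sim)])
    centroids = np.stack(centroids)

    for _ in range(num_iters):
        # Assignment step: each voxel goes to its most similar centroid.
        labels = np.argmax(flat @ centroids.T, axis=1)
        # Update step: re-estimate each centroid as the normalized mean.
        for k in range(num_classes):
            members = flat[labels == k]
            if len(members) > 0:
                mean = members.mean(axis=0)
                centroids[k] = mean / (np.linalg.norm(mean) + 1e-8)

    labels = np.argmax(flat @ centroids.T, axis=1)
    return labels.reshape(grid_shape)
```

Calling `cluster_feature_field(features, num_classes=K)` on a feature grid returns one cluster index per voxel. The indices carry no class names, so an unsupervised pipeline of this kind would still need a separate matching step to compare clusters against ground-truth categories during evaluation.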

## Installation & Quick Start

For installation instructions and a quick-start guide, please refer to our [GitHub repository](https://github.com/tum-vision/scenedino).

## Citation

If you find our work useful, please consider giving the repository a star ⭐ and citing our paper.

```bibtex
@inproceedings{Jevtic:2025:SceneDINO,
    author  = {Aleksandar Jevti{\'c} and
               Christoph Reich and
               Felix Wimbauer and
               Oliver Hahn and
               Christian Rupprecht and
               Stefan Roth and
               Daniel Cremers},
    title   = {Feed-Forward {SceneDINO} for Unsupervised Semantic Scene Completion},
    booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
    year    = {2025},
}
```