---
license: mit
pipeline_tag: image-to-3d
---

# LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging

LiteVGGT is a 3D vision foundation model that accelerates vanilla VGGT by up to 10× and substantially reduces its memory footprint. This makes 3D reconstruction of large-scale scenes (up to 1000 images) practical while maintaining high accuracy in camera pose and point cloud prediction. The method introduces a geometry-aware cached token merging strategy that optimizes anchor token selection and reuses merge indices, preserving key geometric information with minimal impact on accuracy.

This model was presented in the paper: [LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging](https://huggingface.co/papers/2512.04939).

- [Project Page](https://garlicba.github.io/LiteVGGT/)
- [Code](https://github.com/GarlicBa/LiteVGGT-repo)

## Overview

For 1000 input images, LiteVGGT achieves a **10× speedup** over VGGT while maintaining high accuracy in camera pose and point cloud prediction. Its scalability and robustness make large-scale scene reconstruction more efficient and reliable.

<p align="center">
  <img src="https://github.com/GarlicBa/LiteVGGT-repo/raw/main/assets/teaser.png" alt="teaser" width="100%">
</p>
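
The cached merging idea described above can be illustrated with a small sketch. It is a generic token-merging example in plain Python, not LiteVGGT's implementation: the function names, the cosine-similarity assignment, the averaging rule, and the hand-picked anchors are all assumptions made for illustration. What it shows is the caching pattern, where merge indices are computed once and then reused by later layers instead of being recomputed.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def compute_merge_indices(tokens, anchor_ids):
    """Assign every token to its most similar anchor (anchors map to themselves).

    Returns a list `assign` where assign[i] is the position in `anchor_ids`
    of the anchor that token i is merged into.
    """
    anchor_pos = {tid: k for k, tid in enumerate(anchor_ids)}
    assign = []
    for i, tok in enumerate(tokens):
        if i in anchor_pos:
            assign.append(anchor_pos[i])
        else:
            best = max(
                range(len(anchor_ids)),
                key=lambda k: cosine(tok, tokens[anchor_ids[k]]),
            )
            assign.append(best)
    return assign


def merge(tokens, assign, num_anchors):
    """Average all tokens that share an anchor, yielding num_anchors tokens."""
    dim = len(tokens[0])
    sums = [[0.0] * dim for _ in range(num_anchors)]
    counts = [0] * num_anchors
    for tok, k in zip(tokens, assign):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += tok[d]
    return [[s / counts[k] for s in sums[k]] for k in range(num_anchors)]


def unmerge(merged, assign):
    """Restore the original token count by copying each group's merged token."""
    return [list(merged[k]) for k in assign]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Hypothetical anchors; a geometry-aware score would choose these in practice.
    anchor_ids = [0, 2]
    cached = compute_merge_indices(tokens, anchor_ids)  # computed once
    for layer in range(3):
        # Later layers reuse `cached` rather than recomputing similarities.
        merged = merge(tokens, cached, len(anchor_ids))
        # A real model would run attention on `merged` here.
        tokens = unmerge(merged, cached)
    print(cached, len(merged), len(tokens))
```

In this toy setup the expensive step is the similarity search inside `compute_merge_indices`. Reusing its result across layers is what saves the work, at the cost of assuming that token groupings stay roughly stable from layer to layer.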

## Run Demo

To quickly try out LiteVGGT for 3D reconstruction, follow these steps:

First, create a virtual environment using Conda, clone this repository to your local machine, and install the required dependencies.

```bash
conda create -n litevggt python=3.10
conda activate litevggt
git clone git@github.com:GarlicBa/LiteVGGT-repo.git
cd LiteVGGT-repo
pip install -r requirements.txt
```

Install the Transformer Engine package following its official installation requirements (see https://github.com/NVIDIA/TransformerEngine):

```bash
export CC=your/gcc/path
export CXX=your/g++/path
pip install --no-build-isolation "transformer_engine[pytorch]"
```

Then, download our LiteVGGT checkpoint that has been **finetuned** and **TE-remapped**:

```bash
wget https://huggingface.co/ZhijianShu/LiteVGGT/resolve/main/te_dict.pt
```

Finally, run the demo:

```bash
python run_demo.py \
  --ckpt_path path/to/your/te_dict.pt \
  --img_dir path/to/your/img_dir \
  --output_dir ./recon_result
```
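
Before launching a long reconstruction, it can help to confirm that the checkpoint and image directory passed to `run_demo.py` actually exist. The following is a minimal pre-flight sketch, not part of the LiteVGGT repository: the `preflight` helper and the list of image suffixes are assumptions, and the formats that `run_demo.py` really accepts may differ.

```python
from pathlib import Path

# Assumption: common image suffixes; run_demo.py may accept a different set.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def preflight(ckpt_path, img_dir):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []

    ckpt = Path(ckpt_path)
    if not ckpt.is_file():
        problems.append(f"checkpoint not found: {ckpt}")
    elif ckpt.stat().st_size == 0:
        problems.append(f"checkpoint is empty: {ckpt}")

    images_dir = Path(img_dir)
    if not images_dir.is_dir():
        problems.append(f"image directory not found: {images_dir}")
    else:
        # Match suffixes case-insensitively so that e.g. ".PNG" also counts.
        images = [
            p for p in images_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        ]
        if not images:
            problems.append(f"no images found in {images_dir}")

    return problems


if __name__ == "__main__":
    for problem in preflight("path/to/your/te_dict.pt", "path/to/your/img_dir"):
        print("warning:", problem)
```

The check only inspects the filesystem; it does not load the checkpoint or validate image contents.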