Papers
arxiv:2511.08892

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Published on Nov 12
· Submitted by taesiri on Nov 13
#1 Paper of the day
Authors:
,
,
,
,
,
,
,
,
,
,
,
,
,

Abstract

Lumine, a vision-language model-based agent, completes complex missions in real-time across different 3D open-world environments with human-like efficiency and zero-shot cross-game generalization.

AI-generated summary

We introduce Lumine, the first open recipe for developing generalist agents capable of completing hours-long complex missions in real time within challenging 3D open-world environments. Lumine adopts a human-like interaction paradigm that unifies perception, reasoning, and action in an end-to-end manner, powered by a vision-language model. It processes raw pixels at 5 Hz to produce precise 30 Hz keyboard-mouse actions and adaptively invokes reasoning only when necessary. Trained in Genshin Impact, Lumine successfully completes the entire five-hour Mondstadt main storyline on par with human-level efficiency and follows natural language instructions to perform a broad spectrum of tasks in both 3D open-world exploration and 2D GUI manipulation across collection, combat, puzzle-solving, and NPC interaction. In addition to its in-domain performance, Lumine demonstrates strong zero-shot cross-game generalization. Without any fine-tuning, it accomplishes 100-minute missions in Wuthering Waves and the full five-hour first chapter of Honkai: Star Rail. These promising results highlight Lumine's effectiveness across distinct worlds and interaction dynamics, marking a concrete step toward generalist agents in open-ended environments.

Community

Paper submitter

Proposes Lumine, an open, end-to-end vision-language agent for generalist, long-horizon tasks in 3D open worlds, achieving human-level efficiency and zero-shot cross-game generalization without fine-tuning.

This work is so amazing!!!

Genshin mentioned

Amazing work!

原神,启动!

希望开源

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Excellent work! Interestingly, Lumine’s remarkable generalization ability in games like Wuthering Waves further proves that it’s essentially a textbook Genshin-like game, haha

原神 启动!!!

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2511.08892 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.08892 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.08892 in a Space README.md to link it from this page.

Collections including this paper 9