arxiv:2511.08892

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Published on Nov 12

Submitted by taesiri on Nov 13

#1 Paper of the day

ByteDance Seed



AI-generated summary

Lumine, a vision-language model-based agent, completes complex missions in real time across different 3D open-world environments with human-like efficiency and zero-shot cross-game generalization.

Abstract

We introduce Lumine, the first open recipe for developing generalist agents capable of completing hours-long complex missions in real time within challenging 3D open-world environments. Lumine adopts a human-like interaction paradigm that unifies perception, reasoning, and action in an end-to-end manner, powered by a vision-language model. It processes raw pixels at 5 Hz to produce precise 30 Hz keyboard-mouse actions and adaptively invokes reasoning only when necessary. Trained in Genshin Impact, Lumine successfully completes the entire five-hour Mondstadt main storyline on par with human-level efficiency and follows natural language instructions to perform a broad spectrum of tasks in both 3D open-world exploration and 2D GUI manipulation across collection, combat, puzzle-solving, and NPC interaction. In addition to its in-domain performance, Lumine demonstrates strong zero-shot cross-game generalization. Without any fine-tuning, it accomplishes 100-minute missions in Wuthering Waves and the full five-hour first chapter of Honkai: Star Rail. These promising results highlight Lumine's effectiveness across distinct worlds and interaction dynamics, marking a concrete step toward generalist agents in open-ended environments.
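The two rates in the abstract imply, on average, 30 / 5 = 6 keyboard-mouse actions for every processed frame, one roughly every 33.3 ms. The sketch below only illustrates that timing arithmetic. It assumes, without confirmation from the abstract, that each 5 Hz observation yields a fixed chunk of six evenly spaced actions; the names `schedule`, `ScheduledAction`, and `actions_per_observation` are hypothetical and this is not Lumine's implementation.

```python
from dataclasses import dataclass

OBS_HZ = 5      # rate at which raw pixels are processed (stated in the abstract)
ACTION_HZ = 30  # rate of keyboard-mouse actions (stated in the abstract)


@dataclass
class ScheduledAction:
    t_ms: float  # send time relative to episode start, in milliseconds
    frame: int   # index of the observation that produced this action
    slot: int    # position of the action inside that frame's chunk


def actions_per_observation(obs_hz: int, action_hz: int) -> int:
    """Number of actions per observation, assuming an integer ratio (hypothetical)."""
    if action_hz % obs_hz != 0:
        raise ValueError("action rate must be an integer multiple of observation rate")
    return action_hz // obs_hz


def schedule(num_frames: int, obs_hz: int = OBS_HZ,
             action_hz: int = ACTION_HZ) -> list[ScheduledAction]:
    """Lay out evenly spaced action slots for num_frames observations."""
    k = actions_per_observation(obs_hz, action_hz)
    frame_ms = 1000 / obs_hz   # 200 ms between observations at 5 Hz
    step_ms = 1000 / action_hz  # about 33.3 ms between actions at 30 Hz
    plan = []
    for f in range(num_frames):
        for s in range(k):
            plan.append(ScheduledAction(t_ms=f * frame_ms + s * step_ms, frame=f, slot=s))
    return plan


if __name__ == "__main__":
    for a in schedule(1):
        print(f"frame {a.frame} slot {a.slot}: t = {a.t_ms:.1f} ms")
```

Because six 30 Hz steps span exactly one 5 Hz frame interval, the action slots stay uniformly spaced across frame boundaries under this assumption.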


Community

taesiri

Paper submitter 2 days ago

Proposes Lumine, an open, end-to-end vision-language agent for generalist, long-horizon tasks in 3D open worlds, achieving human-level efficiency and zero-shot cross-game generalization without fine-tuning.

zwt963

2 days ago

This work is so amazing!!!

yamatazen

2 days ago

Genshin mentioned

Macro

2 days ago

Amazing work!

ZheyangH

2 days ago

Genshin Impact, launch!

grantsing

1 day ago

arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/lumine-an-open-recipe-for-building-generalist-agents-in-3d-open-worlds

monkeycc

1 day ago

Hoping it gets open-sourced.

librarian-bot

1 day ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API


LOSYIR

1 day ago

Excellent work! Interestingly, Lumine’s remarkable generalization to games like Wuthering Waves further proves that Wuthering Waves is essentially a textbook Genshin-like game, haha