Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge
Abstract
A vision-action policy using correlated noise for flow matching and learnable mixed-layer attention wins the 2025 BEHAVIOR Challenge, achieving a 26% q-score across 50 diverse long-horizon household tasks.
We present a vision-action policy that won 1st place in the 2025 BEHAVIOR Challenge, a large-scale benchmark featuring 50 diverse long-horizon household tasks in photo-realistic simulation that require bimanual manipulation, navigation, and context-aware decision making. Building on the Pi0.5 architecture, we introduce several innovations. Our primary contribution is correlated noise for flow matching, which improves training efficiency and enables correlation-aware inpainting for smooth action sequences. We also apply learnable mixed-layer attention and System 2 stage tracking for ambiguity resolution. Training employs multi-sample flow matching to reduce variance, while inference uses action compression and challenge-specific correction rules. Our approach achieves a 26% q-score across all 50 tasks on both the public and private leaderboards.
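To make the primary contribution concrete, the sketch below illustrates one way correlated noise can be plugged into a conditional flow-matching objective: instead of drawing i.i.d. Gaussian noise independently for each timestep of the action chunk, the noise follows an AR(1) process along the time axis, so adjacent noise vectors are correlated. This is a minimal sketch under our own assumptions; the AR(1) parameterization, the `rho` value, and the `policy(obs, x_tau, tau)` signature are illustrative and not the authors' released implementation.

```python
# Minimal sketch of flow matching with temporally correlated noise.
# Assumptions (not from the paper): AR(1) noise structure, rho=0.9,
# and a policy callable with signature policy(obs, x_tau, tau).
import torch

def sample_correlated_noise(batch, horizon, action_dim, rho=0.9):
    """Gaussian noise whose timesteps follow an AR(1) process, so
    adjacent noise vectors in the action chunk are correlated."""
    eps = torch.randn(batch, horizon, action_dim)
    noise = torch.empty_like(eps)
    noise[:, 0] = eps[:, 0]
    scale = (1.0 - rho ** 2) ** 0.5  # keeps the marginal variance at 1
    for t in range(1, horizon):
        noise[:, t] = rho * noise[:, t - 1] + scale * eps[:, t]
    return noise

def flow_matching_loss(policy, obs, actions, rho=0.9):
    """Standard conditional flow-matching objective, with the i.i.d.
    noise source swapped for the correlated sampler above."""
    b, h, d = actions.shape
    noise = sample_correlated_noise(b, h, d, rho)
    tau = torch.rand(b, 1, 1)                    # interpolation time in [0, 1)
    x_tau = tau * actions + (1.0 - tau) * noise  # noisy action chunk
    target_v = actions - noise                   # velocity target x1 - x0
    pred_v = policy(obs, x_tau, tau)             # predicted velocity field
    return ((pred_v - target_v) ** 2).mean()
```

Because the correlation structure is known in closed form, the same covariance can in principle be exploited at inference time for the correlation-aware inpainting of action sequences that the abstract mentions.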
Community
We present our 1st place solution to the 2025 NeurIPS BEHAVIOR Challenge, where a single Vision-Language-Action robotics policy is trained to perform 50 household manipulation tasks in a photorealistic simulator. The approach builds on Pi0.5 with several practical modifications to the architecture, training procedure, and inference pipeline. We open-source the full solution, including code, model weights, and a detailed technical report, for anyone exploring or adapting VLAs to real tasks.
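One of the training modifications named in the abstract is multi-sample flow matching to reduce variance. The sketch below shows one hedged reading of that idea: averaging the flow-matching loss over K independent noise draws per action chunk, so each gradient step sees a lower-variance estimate of the objective. The value of K, and the reuse of `sample_correlated_noise` and `policy` from the previous sketch, are assumptions for illustration.

```python
# Minimal sketch of multi-sample flow matching as variance reduction.
# Assumption (not from the paper): "multi-sample" means averaging the
# loss over K noise draws; K=4 and the reused helpers are illustrative.
import torch

def multi_sample_fm_loss(policy, obs, actions, K=4, rho=0.9):
    b, h, d = actions.shape
    losses = []
    for _ in range(K):
        noise = sample_correlated_noise(b, h, d, rho)  # from previous sketch
        tau = torch.rand(b, 1, 1)
        x_tau = tau * actions + (1.0 - tau) * noise
        pred_v = policy(obs, x_tau, tau)
        losses.append(((pred_v - (actions - noise)) ** 2).mean())
    return torch.stack(losses).mean()
```

The trade-off is K forward passes per batch in exchange for a smoother loss signal, which matters most when the action chunks are long and a single noise draw makes the per-step objective noisy.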