Papers
arxiv:2512.07469

Unified Video Editing with Temporal Reasoner

Published on Dec 8
ยท Submitted by Xiangpeng Yang on Dec 9
#3 Paper of the day
Authors:
,
,
,
,

Abstract

VideoCoF, a Chain-of-Frames approach, improves video editing precision and instruction-to-region mapping by using reasoning tokens without requiring user-provided masks.

AI-generated summary

Existing video editing methods face a critical trade-off: expert models offer precision but rely on task-specific priors like masks, hindering unification; conversely, unified temporal in-context learning models are mask-free but lack explicit spatial cues, leading to weak instruction-to-region mapping and imprecise localization. To resolve this conflict, we propose VideoCoF, a novel Chain-of-Frames approach inspired by Chain-of-Thought reasoning. VideoCoF enforces a ``see, reason, then edit" procedure by compelling the video diffusion model to first predict reasoning tokens (edit-region latents) before generating the target video tokens. This explicit reasoning step removes the need for user-provided masks while achieving precise instruction-to-region alignment and fine-grained video editing. Furthermore, we introduce a RoPE alignment strategy that leverages these reasoning tokens to ensure motion alignment and enable length extrapolation beyond the training duration. We demonstrate that with a minimal data cost of only 50k video pairs, VideoCoF achieves state-of-the-art performance on VideoCoF-Bench, validating the efficiency and effectiveness of our approach. Our code, weight, data are available at https://github.com/knightyxp/VideoCoF.

Community

Paper author Paper submitter

A Chain of Frames video editing method enbale temporal reasoning and 4x video length extrapolation with just 50k training pairs!
๐Ÿ  Page: videocof.github.io/
๐Ÿ“„ Paper: arxiv.org/abs/2512.07469
๐Ÿ’ป Code: github.com/knightyxp/VideoCoF

Dear @XiangpengYang in the attached video, there is a missing S at the front cover in the word Reasoner.

ยท
Paper author
This comment has been hidden (marked as Resolved)
Paper author Paper submitter
This comment has been hidden (marked as Off-Topic)
Paper author Paper submitter
This comment has been hidden (marked as Resolved)

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2512.07469 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2512.07469 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.