Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!
We're excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4x video length extrapolation, trained with only 50k video pairs.
What makes VideoCoF different?
- Chain-of-Frames reasoning: mimics the human thinking process of Seeing -> Reasoning -> Editing to apply edits accurately over time without external masks, ensuring physically plausible results (see the sketch below).
- Strong length generalization: trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4x).
- Unified fine-grained editing: Object Removal, Addition, Swap, and Local Style Transfer, with instance-level and part-level, spatial-aware control.
Fast inference update: ~20s per video on an H100 with 4-step inference, making high-quality video editing far more practical for real-world use.
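To make the Seeing -> Reasoning -> Editing idea concrete, here is a purely hypothetical sketch of a chain-of-frames editing pass; none of the function names below come from the VideoCoF release, only the ordering of the stages described above.

```python
# Purely hypothetical sketch of a chain-of-frames editing pass.
# None of these names come from the VideoCoF release; the point is the ordering:
# the model first produces reasoning frames (where/what to edit over time),
# then the edited frames conditioned on them, with no external masks supplied.
def chain_of_frames_edit(model, frames, instruction):
    # Seeing: encode the source clip and the text instruction.
    video_tokens = model.encode_video(frames)        # hypothetical encoder
    text_tokens = model.encode_text(instruction)     # hypothetical encoder

    # Reasoning: generate intermediate "reasoning frames" that localize the
    # edit across time, acting like an internally predicted mask sequence.
    reasoning = model.generate_reasoning_frames(video_tokens, text_tokens)

    # Editing: generate the edited frames conditioned on that reasoning,
    # which is what keeps edits temporally consistent without external masks.
    return model.generate_edited_frames(video_tokens, text_tokens, reasoning)
```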
The Qwen3-Next models come in Thinking and Instruct versions and use a new architecture that gives them ~10x faster inference than Qwen3-32B. Step-by-step guide: https://docs.unsloth.ai/models/qwen3-next
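A minimal sketch of loading such a checkpoint with Unsloth's standard API; the repo id below is an assumption, so check the guide above for the exact supported checkpoints.

```python
# Minimal sketch: loading a Qwen3-Next checkpoint with Unsloth.
# The repo id is an assumption; see the step-by-step guide for the real names.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-Next-80B-A3B-Instruct",  # assumed repo id
    max_seq_length=4096,
    load_in_4bit=True,  # 4-bit quantization to fit smaller GPUs
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

messages = [{"role": "user", "content": "Explain KV caching in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```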
The LLM by @karpathy is officially in the library, and we wrote a blog covering how we ported the model, how it differs from the original, and how to run or train it.
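Since the model is ported into transformers, the usual Auto classes should apply; the repo id in this sketch is a placeholder, and the blog post has the real one.

```python
# Sketch: running a transformers-ported model with the standard Auto classes.
# "karpathy/model-name" is a placeholder; use the repo id from the blog post.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("karpathy/model-name")
model = AutoModelForCausalLM.from_pretrained("karpathy/model-name")

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```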
Z-Image Turbo LoRA training with the Ostris AI Toolkit, the Z-Image Turbo Fun ControlNet Union, and 1-click downloads of the very best Z-Image Turbo presets. In this tutorial, I explain how to set up the Z-Image Turbo model properly on your local PC with SwarmUI, download the models, and use them at the highest quality via ready-made presets. I also show how to install the Z-Image Turbo Fun ControlNet Union to generate amazing-quality images with ControlNet preprocessors, and how to 1-click install the AI Toolkit from Ostris and train Z-Image Turbo LoRAs with the highest-quality configs, prepared for every GPU tier (8 GB, 12 GB, 24 GB, and so on). I did massive research to prepare these Z-Image Turbo training configurations.
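As a rough illustration of how VRAM-tiered training configs typically differ, here is a hypothetical sketch; the tier values below are illustrative assumptions of mine, not the actual presets from the tutorial.

```python
# Hypothetical sketch of VRAM-tiered LoRA training settings.
# These values are illustrative only, not the tutorial's actual presets;
# real configs would also cover optimizer, precision, and caching options.
TIERS = {
    8:  {"lora_rank": 16, "resolution": 512,  "batch_size": 1, "grad_checkpointing": True},
    12: {"lora_rank": 32, "resolution": 768,  "batch_size": 1, "grad_checkpointing": True},
    24: {"lora_rank": 64, "resolution": 1024, "batch_size": 2, "grad_checkpointing": False},
}

def pick_tier(vram_gb: int) -> dict:
    """Pick the largest preset that fits the available VRAM."""
    eligible = [v for v in TIERS if v <= vram_gb]
    if not eligible:
        raise ValueError(f"No preset fits {vram_gb} GB of VRAM")
    return TIERS[max(eligible)]

print(pick_tier(16))  # a 16 GB card falls back to the 12 GB preset
```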
Today, we announce Mistral 3, the next generation of Mistral models. Mistral 3 includes three state-of-the-art small, dense models (14B, 8B, and 3B) and Mistral Large 3, our most capable model to date: a sparse mixture-of-experts trained with 41B active and 675B total parameters.
All models are released under the Apache 2.0 license.
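For context on the mixture-of-experts figures, only a fraction of the total weights fires per token; a quick back-of-the-envelope from the announced numbers:

```python
# Back-of-the-envelope: share of weights active per token in Mistral Large 3.
active_b, total_b = 41, 675          # billions of parameters, from the announcement
print(f"{active_b / total_b:.1%}")   # ~6.1% of total parameters active per token
```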