new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Nov 14

PSyDUCK: Training-Free Steganography for Latent Diffusion

Recent advances in generative AI have opened promising avenues for steganography, which can securely protect sensitive information for individuals operating in hostile environments, such as journalists, activists, and whistleblowers. However, existing methods for generative steganography have significant limitations, particularly in scalability and their dependence on retraining diffusion models. We introduce PSyDUCK, a training-free, model-agnostic steganography framework specifically designed for latent diffusion models. PSyDUCK leverages controlled divergence and local mixing within the latent denoising process, enabling high-capacity, secure message embedding without compromising visual fidelity. Our method dynamically adapts embedding strength to balance accuracy and detectability, significantly improving upon existing pixel-space approaches. Crucially, PSyDUCK extends generative steganography to latent-space video diffusion models, surpassing previous methods in both encoding capacity and robustness. Extensive experiments demonstrate PSyDUCK's superiority over state-of-the-art techniques, achieving higher transmission accuracy and lower detectability rates across diverse image and video datasets. By overcoming the key challenges associated with latent diffusion model architectures, PSyDUCK sets a new standard for generative steganography, paving the way for scalable, real-world steganographic applications.

  • 6 authors
·
Jan 31

Learning to Generate Object Interactions with Physics-Guided Video Diffusion

Recent models for video generation have achieved remarkable progress and are now deployed in film, social media production, and advertising. Beyond their creative potential, such models also hold promise as world simulators for robotics and embodied decision making. Despite strong advances, however, current approaches still struggle to generate physically plausible object interactions and lack physics-grounded control mechanisms. To address this limitation, we introduce KineMask, an approach for physics-guided video generation that enables realistic rigid body control, interactions, and effects. Given a single image and a specified object velocity, our method generates videos with inferred motions and future object interactions. We propose a two-stage training strategy that gradually removes future motion supervision via object masks. Using this strategy we train video diffusion models (VDMs) on synthetic scenes of simple interactions and demonstrate significant improvements of object interactions in real scenes. Furthermore, KineMask integrates low-level motion control with high-level textual conditioning via predictive scene descriptions, leading to effective support for synthesis of complex dynamical phenomena. Extensive experiments show that KineMask achieves strong improvements over recent models of comparable size. Ablation studies further highlight the complementary roles of low- and high-level conditioning in VDMs. Our code, model, and data will be made publicly available.

  • 5 authors
·
Oct 2

MALT: Improving Reasoning with Multi-Agent LLM Training

Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. While LLMs are typically used as single-model generators, where humans critique and refine their outputs, the potential for jointly-trained collaborative models remains largely unexplored. Despite promising results in multi-agent communication and debate settings, little progress has been made in training models to work together on tasks. In this paper, we present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles: a generator, verifier, and refinement model iteratively solving problems. We propose a trajectory-expansion-based synthetic data generation process and a credit assignment strategy driven by joint outcome based rewards. This enables our post-training setup to utilize both positive and negative trajectories to autonomously improve each model's specialized capabilities as part of a joint sequential system. We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively over the same baseline model. This demonstrates an early advance in multi-agent cooperative capabilities for performance on mathematical and common sense reasoning questions. More generally, our work provides a concrete direction for research around multi-agent LLM training approaches.

  • 9 authors
·
Dec 2, 2024 4