Open to Collab

taesiri PRO

https://taesiri.ai/

AI & ML interests

AGI ... one linear layer at a time

Recent Activity

submitted a paper about 22 hours ago

WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

submitted a paper about 22 hours ago

MiMo-V2-Flash Technical Report

submitted a paper about 22 hours ago

DreamStyle: A Unified Framework for Video Stylization

upvoted a paper about 22 hours ago

NitroGen: An Open Foundation Model for Generalist Gaming Agents

Paper • 2601.02427 • Published 3 days ago • 22

upvoted a paper about 23 hours ago

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published 1 day ago • 42

upvoted 4 papers 2 days ago

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Paper • 2601.02204 • Published 2 days ago • 50

VINO: A Unified Visual Generator with Interleaved OmniModal Context

Paper • 2601.02358 • Published 2 days ago • 23

DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Paper • 2601.01425 • Published 4 days ago • 38

Nested Learning: The Illusion of Deep Learning Architectures

Paper • 2512.24695 • Published 8 days ago • 31

upvoted a paper 3 days ago

Deep Delta Learning

Paper • 2601.00417 • Published 6 days ago • 26

upvoted a paper 5 days ago

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published 9 days ago • 45

upvoted a paper 6 days ago

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 7 days ago • 219

upvoted 2 papers 8 days ago

End-to-End Test-Time Training for Long Context

Paper • 2512.23675 • Published 9 days ago • 15

KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta

Paper • 2512.23236 • Published 10 days ago • 3

upvoted 9 papers 9 days ago

Yume-1.5: A Text-Controlled Interactive World Generation Model

Paper • 2512.22096 • Published 12 days ago • 57

SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling

Paper • 2512.23162 • Published 10 days ago • 9

OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

Paper • 2512.23646 • Published 9 days ago • 14

Video-BrowseComp: Benchmarking Agentic Video Research on Open Web

Paper • 2512.23044 • Published 10 days ago • 9

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Paper • 2512.22238 • Published 15 days ago • 18

Web World Models

Paper • 2512.23676 • Published 9 days ago • 23

Monadic Context Engineering

Paper • 2512.22431 • Published 12 days ago • 8

An Information Theoretic Perspective on Agentic System Design

Paper • 2512.21720 • Published 13 days ago • 7

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Paper • 2512.22047 • Published 12 days ago • 26