Joya Chen PRO

chenjoya

https://chenjoya.github.io/

AI & ML interests

Video LLM

Recent Activity

upvoted a paper about 23 hours ago

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published 1 day ago • 19

upvoted 2 papers 3 days ago

Revisiting Multimodal Positional Encoding in Vision-Language Models

Paper • 2510.23095 • Published 12 days ago • 18

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published 4 days ago • 95

upvoted a paper 9 days ago

ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks

Paper • 2510.18455 • Published 18 days ago • 17

upvoted a paper 11 days ago

FARMER: Flow AutoRegressive Transformer over Pixels

Paper • 2510.23588 • Published 12 days ago • 56

upvoted a paper 26 days ago

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Paper • 2510.09608 • Published 29 days ago • 49

upvoted 2 papers about 1 month ago

Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6 • 110

Code2Video: A Code-centric Paradigm for Educational Video Generation

Paper • 2510.01174 • Published Oct 1 • 33

upvoted 4 papers 2 months ago

Robix: A Unified Model for Robot Interaction, Reasoning and Planning

Paper • 2509.01106 • Published Sep 1 • 48

Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4 • 189

Draw-In-Mind: Learning Precise Image Editing via Chain-of-Thought Imagination

Paper • 2509.01986 • Published Sep 2 • 4

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published Sep 2 • 123

upvoted a paper 3 months ago

Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published Aug 11 • 29

upvoted a paper 4 months ago

HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

Paper • 2506.21277 • Published Jun 26 • 15

upvoted 6 papers 5 months ago

Show-o2: Improved Native Unified Multimodal Models

Paper • 2506.15564 • Published Jun 18 • 28

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28 • 44

D-AR: Diffusion via Autoregressive Models

Paper • 2505.23660 • Published May 29 • 34

SWE-bench Goes Live!

Paper • 2505.23419 • Published May 29 • 21

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29 • 22