YJ

yjh415

AI & ML interests

None yet

Organizations

None yet

commented 13 papers 5 months ago

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 130

Shifting AI Efficiency From Model-Centric to Data-Centric Compression

Paper • 2505.19147 • Published May 25 • 144

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 96

Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model

Paper • 2505.17894 • Published May 23 • 219

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

Paper • 2505.11594 • Published May 16 • 75

TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations

Paper • 2505.18125 • Published May 23 • 112

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Paper • 2505.17612 • Published May 23 • 81

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

Paper • 2505.11594 • Published May 16 • 75

NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

Paper • 2505.16938 • Published May 22 • 120

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20 • 76

Flow-GRPO: Training Flow Matching Models via Online RL

Paper • 2505.05470 • Published May 8 • 85

commented 7 papers 6 months ago

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7 • 82

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 96

AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

Paper • 2505.11896 • Published May 17 • 58

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Paper • 2505.15277 • Published May 21 • 104

AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19 • 82

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 134

Chain-of-Model Learning for Language Model

Paper • 2505.11820 • Published May 17 • 121