Collections

Collections including paper arxiv:2507.06181

ByteDance Papers

ByteDance papers collection

about 13 hours ago

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26 • 40
Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3 • 26
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Paper • 2507.02025 • Published Jul 2 • 35
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published Jul 1 • 24

Reasoning Models

A Preliminary Study for GPT-4o on Image Restoration

Paper • 2505.05621 • Published May 8 • 11
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 43
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering

Paper • 2507.08776 • Published Jul 11 • 54

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 15
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 57
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Paper • 2504.01956 • Published Apr 2 • 41
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Paper • 2506.23219 • Published Jun 29 • 7

m-a-p/CriticLeanGPT-Qwen3-8B-RL

8B • Updated Jul 10 • 54 • 3
m-a-p/CriticLeanGPT-Qwen3-14B-RL

15B • Updated Jul 10 • 57
m-a-p/CriticLeanGPT-Qwen3-32B-RL

33B • Updated Jul 11 • 55
m-a-p/CriticLeanGPT-Qwen2.5-7B-RL

15B • Updated Jul 12 • 55 • 1

about 12 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 285 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published May 27, 2024 • 54
Solving Inequality Proofs with Large Language Models

Paper • 2506.07927 • Published Jun 9 • 20
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1 • 79
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 43

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Paper • 2309.14717 • Published Sep 26, 2023 • 45
PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Paper • 2310.09199 • Published Oct 13, 2023 • 29
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

Paper • 2310.08678 • Published Oct 12, 2023 • 14
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

Paper • 2310.09478 • Published Oct 14, 2023 • 21
