uhlo's Collections

interesting stuff

Chain-of-Verification Reduces Hallucination in Large Language Models • Paper 2309.11495 • 39 upvotes
Adapting Large Language Models via Reading Comprehension • Paper 2309.09530 • 81 upvotes
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages • Paper 2309.09400 • 85 upvotes
Language Modeling Is Compression • Paper 2309.10668 • 83 upvotes
Contrastive Decoding Improves Reasoning in Large Language Models • Paper 2309.09117 • 39 upvotes
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test • Paper 2309.13356 • 37 upvotes
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models • Paper 2309.15098 • 7 upvotes
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models • Paper 2309.14717 • 45 upvotes
Qwen Technical Report • Paper 2309.16609 • 37 upvotes
Effective Long-Context Scaling of Foundation Models • Paper 2309.16039 • 30 upvotes
Large Language Models Cannot Self-Correct Reasoning Yet • Paper 2310.01798 • 36 upvotes
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines • Paper 2310.03714 • 37 upvotes
Table-GPT: Table-tuned GPT for Diverse Table Tasks • Paper 2310.09263 • 41 upvotes
BitNet: Scaling 1-bit Transformers for Large Language Models • Paper 2310.11453 • 105 upvotes
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection • Paper 2310.11511 • 78 upvotes
H2O Open Ecosystem for State-of-the-art Large Language Models • Paper 2310.13012 • 9 upvotes
LLM-FP4: 4-Bit Floating-Point Quantized Transformers • Paper 2310.16836 • 14 upvotes
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models • Paper 2310.16795 • 27 upvotes
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time • Paper 2310.17157 • 14 upvotes
CodeFusion: A Pre-trained Diffusion Model for Code Generation • Paper 2310.17680 • 73 upvotes
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving • Paper 2310.19102 • 11 upvotes
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery • Paper 2310.18356 • 24 upvotes
Does GPT-4 Pass the Turing Test? • Paper 2310.20216 • 17 upvotes
CodePlan: Repository-level Coding using LLMs and Planning • Paper 2309.12499 • 79 upvotes
The Generative AI Paradox: "What It Can Create, It May Not Understand" • Paper 2311.00059 • 20 upvotes
E3 TTS: Easy End-to-End Diffusion-based Text to Speech • Paper 2311.00945 • 16 upvotes
Unveiling Safety Vulnerabilities of Large Language Models • Paper 2311.04124 • 10 upvotes
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks • Paper 2311.07463 • 15 upvotes
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers • Paper 2311.10642 • 26 upvotes
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • Paper 2311.13600 • 46 upvotes
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch • Paper 2311.03099 • 30 upvotes
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis • Paper 2312.03491 • 35 upvotes
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator • Paper 2312.04474 • 33 upvotes
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want • Paper 2312.03818 • 34 upvotes
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM • Paper 2401.02994 • 52 upvotes
Mixtral of Experts • Paper 2401.04088 • 159 upvotes
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models • Paper 2401.04658 • 27 upvotes
The Impact of Reasoning Step Length on Large Language Models • Paper 2401.04925 • 18 upvotes
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models • Paper 2401.06951 • 26 upvotes
Extending LLMs' Context Window with 100 Samples • Paper 2401.07004 • 16 upvotes
Self-Rewarding Language Models • Paper 2401.10020 • 151 upvotes
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation • Paper 2401.10838 • 9 upvotes
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text • Paper 2401.12070 • 45 upvotes
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence • Paper 2401.14196 • 66 upvotes
jinaai/jina-embeddings-v2-base-de • Feature Extraction • 0.2B params • 46.3k downloads • 80 likes (usage sketch at the end of this list)
SliceGPT: Compress Large Language Models by Deleting Rows and Columns • Paper 2401.15024 • 74 upvotes
Weaver: Foundation Models for Creative Writing • Paper 2401.17268 • 45 upvotes
TrustLLM: Trustworthiness in Large Language Models • Paper 2401.05561 • 69 upvotes
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models • Paper 2402.01739 • 28 upvotes
Shortened LLaMA: A Simple Depth Pruning for Large Language Models • Paper 2402.02834 • 17 upvotes
Self-Discover: Large Language Models Self-Compose Reasoning Structures • Paper 2402.03620 • 117 upvotes
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs • Paper 2402.04291 • 50 upvotes
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning • Paper 2402.06619 • 56 upvotes
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model • Paper 2402.07827 • 48 upvotes
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement • Paper 2402.07456 • 46 upvotes
Scaling Laws for Fine-Grained Mixture of Experts • Paper 2402.07871 • 14 upvotes
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models • Paper 2401.01335 • 68 upvotes
Computing Power and the Governance of Artificial Intelligence • Paper 2402.08797 • 15 upvotes
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts • Paper 2402.09727 • 38 upvotes
BitDelta: Your Fine-Tune May Only Be Worth One Bit • Paper 2402.10193 • 22 upvotes
Chain-of-Thought Reasoning Without Prompting • Paper 2402.10200 • 109 upvotes
How to Train Data-Efficient LLMs • Paper 2402.09668 • 42 upvotes
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows • Paper 2402.10379 • 31 upvotes
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling • Paper 2402.12226 • 45 upvotes
OneBit: Towards Extremely Low-bit Large Language Models • Paper 2402.11295 • 24 upvotes
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization • Paper 2402.13249 • 13 upvotes
Coercing LLMs to do and reveal (almost) anything • Paper 2402.14020 • 13 upvotes
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens • Paper 2402.13753 • 116 upvotes
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases • Paper 2402.14905 • 134 upvotes
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper 2402.17764 • 625 upvotes
GPTVQ: The Blessing of Dimensionality for LLM Quantization • Paper 2402.15319 • 22 upvotes
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect • Paper 2403.03853 • 66 upvotes
MoAI: Mixture of All Intelligence for Large Language and Vision Models • Paper 2403.07508 • 77 upvotes
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries • Paper 2401.15391 • 6 upvotes
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding • Paper 2403.12895 • 32 upvotes
RAFT: Adapting Language Model to Domain Specific RAG • Paper 2403.10131 • 72 upvotes
LLM Agent Operating System • Paper 2403.16971 • 72 upvotes
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression • Paper 2403.15447 • 16 upvotes
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions • Paper 2403.16627 • 21 upvotes
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation • Paper 2403.05313 • 9 upvotes
The Llama 3 Herd of Models • Paper 2407.21783 • 116 upvotes
SAM 2: Segment Anything in Images and Videos • Paper 2408.00714 • 116 upvotes
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model • Paper 2408.00754 • 24 upvotes
Gemma 2: Improving Open Language Models at a Practical Size • Paper 2408.00118 • 79 upvotes
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs • Paper 2408.07055 • 67 upvotes
LLM Pruning and Distillation in Practice: The Minitron Approach • Paper 2408.11796 • 57 upvotes
Automated Design of Agentic Systems • Paper 2408.08435 • 40 upvotes
ColPali: Efficient Document Retrieval with Vision Language Models • Paper 2407.01449 • 50 upvotes
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models • Paper 2408.15518 • 42 upvotes
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders • Paper 2408.15998 • 87 upvotes
Configurable Foundation Models: Building LLMs from a Modular Perspective • Paper 2409.02877 • 30 upvotes
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models • Paper 2409.17146 • 121 upvotes
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models • Paper 2409.18943 • 29 upvotes
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models • Paper 2409.17066 • 28 upvotes
Differential Transformer • Paper 2410.05258 • 179 upvotes
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations • Paper 2410.02707 • 48 upvotes
Pixtral 12B • Paper 2410.07073 • 68 upvotes
Falcon Mamba: The First Competitive Attention-free 7B Language Model • Paper 2410.05355 • 35 upvotes
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free • Paper 2410.10814 • 51 upvotes
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation • Paper 2410.13848 • 34 upvotes
Why Does the Effective Context Length of LLMs Fall Short? • Paper 2410.18745 • 18 upvotes
Continuous Speech Synthesis using per-token Latent Diffusion • Paper 2410.16048 • 29 upvotes
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting • Paper 2410.17856 • 51 upvotes
GPT-4o System Card • Paper 2410.21276 • 87 upvotes
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction • Paper 2410.21169 • 30 upvotes
Stealing User Prompts from Mixture of Experts • Paper 2410.22884 • 15 upvotes
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters • Paper 2410.23168 • 24 upvotes
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper 2412.09871 • 108 upvotes
How to Synthesize Text Data without Model Collapse? • Paper 2412.14689 • 52 upvotes
Qwen2.5 Technical Report • Paper 2412.15115 • 376 upvotes
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking • Paper 2501.04519 • 285 upvotes
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking • Paper 2501.09751 • 48 upvotes
GameFactory: Creating New Games with Generative Interactive Videos • Paper 2501.08325 • 67 upvotes
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper 2501.12948 • 422 upvotes
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper 2502.02737 • 246 upvotes
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling • Paper 2502.06703 • 153 upvotes
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU • Paper 2502.08910 • 148 upvotes
Large Language Diffusion Models • Paper 2502.09992 • 122 upvotes
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks • Paper 2502.08235 • 58 upvotes
Qwen2.5-VL Technical Report • Paper 2502.13923 • 207 upvotes
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? • Paper 2502.14502 • 91 upvotes
Thus Spake Long-Context Large Language Model • Paper 2502.17129 • 73 upvotes
Slamming: Training a Speech Language Model on One GPU in a Day • Paper 2502.15814 • 69 upvotes
START: Self-taught Reasoner with Tools • Paper 2503.04625 • 113 upvotes
EuroBERT: Scaling Multilingual Encoders for European Languages • Paper 2503.05500 • 79 upvotes
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation • Paper 2503.13288 • 51 upvotes
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion • Paper 2503.11576 • 117 upvotes
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders • Paper 2503.18878 • 119 upvotes
Qwen2.5-Omni Technical Report • Paper 2503.20215 • 166 upvotes
PaperBench: Evaluating AI's Ability to Replicate AI Research • Paper 2504.01848 • 36 upvotes
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training • Paper 2504.13161 • 93 upvotes
TTRL: Test-Time Reinforcement Learning • Paper 2504.16084 • 120 upvotes
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks • Paper 2504.15521 • 64 upvotes
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs • Paper 2504.18415 • 47 upvotes
Reinforcement Learning for Reasoning in Large Language Models with One Training Example • Paper 2504.20571 • 96 upvotes
Absolute Zero: Reinforced Self-play Reasoning with Zero Data • Paper 2505.03335 • 185 upvotes
Search-o1: Agentic Search-Enhanced Large Reasoning Models • Paper 2501.05366 • 102 upvotes
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models • Paper 2505.07591 • 11 upvotes
Qwen3 Technical Report • Paper 2505.09388 • 308 upvotes
Emerging Properties in Unified Multimodal Pretraining • Paper 2505.14683 • 134 upvotes
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models • Paper 2506.05176 • 74 upvotes
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models • Paper 2506.06751 • 71 upvotes
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights • Paper 2506.16406 • 126 upvotes
MemOS: A Memory OS for AI System • Paper 2507.03724 • 154 upvotes
SingLoRA: Low Rank Adaptation Using a Single Matrix • Paper 2507.05566 • 112 upvotes
A Survey on Latent Reasoning • Paper 2507.06203 • 92 upvotes
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination • Paper 2507.10532 • 88 upvotes
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs • Paper 2507.09477 • 84 upvotes
A Survey of Context Engineering for Large Language Models • Paper 2507.13334 • 258 upvotes
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning • Paper 2507.16784 • 120 upvotes
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models • Paper 2508.06471 • 188 upvotes
VibeVoice Technical Report • Paper 2508.19205 • 123 upvotes
Who's Your Judge? On the Detectability of LLM-Generated Judgments • Paper 2509.25154 • 29 upvotes
Cache-to-Cache: Direct Semantic Communication Between Large Language Models • Paper 2510.03215 • 94 upvotes
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA • Paper 2510.04849 • 110 upvotes
LightMem: Lightweight and Efficient Memory-Augmented Generation • Paper 2510.18866 • 107 upvotes
Kimi Linear: An Expressive, Efficient Attention Architecture • Paper 2510.26692 • 83 upvotes
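
Usage sketch for the one model entry in the list (jinaai/jina-embeddings-v2-base-de, Feature Extraction): a minimal example of pulling sentence embeddings with the sentence-transformers library, assuming that package is installed and that running the checkpoint's custom modeling code (trust_remote_code=True) is acceptable; the bilingual example sentences are arbitrary.

```python
# Minimal sketch: sentence embeddings ("Feature Extraction") with
# jinaai/jina-embeddings-v2-base-de via sentence-transformers.
# Assumptions: sentence-transformers is installed and the checkpoint's
# custom modeling code may be trusted (trust_remote_code=True).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer(
    "jinaai/jina-embeddings-v2-base-de",
    trust_remote_code=True,  # the repo ships custom model code
)

sentences = [
    "Wie ist das Wetter heute?",        # German
    "What is the weather like today?",  # English
]
embeddings = model.encode(sentences)  # one dense vector per input sentence

# Cosine similarity between the bilingual pair.
print(util.cos_sim(embeddings[0], embeddings[1]))
```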