Article • Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models • Sep 29
Collection • Speculative Decoding Draft Models: a collection of OpenVINO-optimized, efficient draft models for speculative decoding • 4 items • Updated Sep 16
Paper • RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation • 2408.02545 • Published Aug 5, 2024