Large language models (LLMs) are revolutionizing AI applications, but slow inference remains a significant challenge. Intel researchers, together with industry and university partners, are working to address this issue and improve the efficiency of LLMs. In a series of blog posts, Intel researchers introduce several novel works: a method that accelerates text generation by up to 2.7 times, a method that extends assisted generation to work with a small language model from any model family, and a technique that enables any small "draft" model to accelerate any LLM, regardless of vocabulary differences.
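The draft-model works mentioned above build on assisted generation (also known as speculative decoding). A small draft model cheaply proposes several tokens, and the large target model checks them, keeping the agreeing prefix plus one token of its own. The following is a minimal greedy sketch, not Intel's implementation. `target_next` and `draft_next` are hypothetical toy stand-ins for real models, and both are assumed to share one integer vocabulary, so the cross-vocabulary case is not covered here.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_next(seq: List[int]) -> int:
    """Hypothetical stand-in for the large, expensive target LLM."""
    return (sum(seq) * 7 + 3) % 10


def draft_next(seq: List[int]) -> int:
    """Hypothetical stand-in for a small draft model: usually agrees with the target."""
    guess = (sum(seq) * 7 + 3) % 10
    return (guess + 1) % 10 if len(seq) % 4 == 0 else guess


def greedy_generate(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive decoding: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq[len(prompt):]


def assisted_generate(target: NextToken, draft: NextToken,
                      prompt: List[int], n: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy assisted generation; returns (tokens, number of verification rounds)."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n:
        # 1. The draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The target verifies the proposal. A real implementation scores all
        #    proposed positions in one batched forward pass; this toy loop calls
        #    target() per position but counts the whole check as one round.
        rounds += 1
        accepted, ctx = [], list(seq)
        for tok in proposal:
            expected = target(ctx)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all k accepted: one extra target token

        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n], rounds


if __name__ == "__main__":
    out, rounds = assisted_generate(target_next, draft_next, [1, 2], n=12)
    print(out == greedy_generate(target_next, [1, 2], 12))  # True
    print(rounds)  # 4 verification rounds instead of 12 sequential target steps
```

Because every kept token is one the target would have chosen greedily, the output is identical to running the target alone; the speedup comes from verifying several draft tokens per target pass instead of producing one token per pass.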