Large language models (LLMs) are revolutionizing AI applications, but slow inference remains a significant challenge. Intel researchers, together with industry and university partners, are working to address this and make LLM inference faster. In a series of blog posts, Intel researchers introduce several novel works: a method that accelerates text generation by up to 2.7 times, a method that extends assisted generation to work with a small language model from any model family, and a technique that enables any small “draft” model to accelerate any LLM, regardless of vocabulary differences.