This evaluation shows materially higher concurrency and improved latency scaling when moving from a 64-core to a 96-core Intel® Xeon® configuration for Intel® AI for Enterprise RAG inference. The 96-core SKU doubles SLA-compliant concurrency for Llama-AWQ and Mistral-AWQ (32 → 64 users) across all workloads and increases Qwen-AWQ SLA concurrency by 33–50% (workload dependent) versus the 64-core system.
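The evaluation's headline metric is the maximum concurrency at which latency SLAs are still met. The exact benchmark harness is not shown here, but the following is a minimal sketch of how one might probe SLA-compliant concurrency against an OpenAI-compatible inference endpoint. The endpoint URL, model id, SLA threshold, and concurrency sweep below are illustrative assumptions, not values or tooling from the evaluation itself.

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import requests

# Illustrative assumptions -- not values from the evaluation itself.
ENDPOINT = "http://localhost:8000/v1/chat/completions"  # OpenAI-compatible server
MODEL = "llama-awq"                                     # hypothetical model id
SLA_P95_SECONDS = 5.0                                   # hypothetical end-to-end SLA
PROMPT = "Summarize the benefits of retrieval-augmented generation."

def one_request() -> float:
    """Send a single completion request and return its end-to-end latency."""
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 128,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

def p95_at_concurrency(users: int, requests_per_user: int = 4) -> float:
    """Run `users` concurrent workers and return the p95 request latency."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(one_request) for _ in range(users * requests_per_user)]
        latencies = [f.result() for f in futures]
    return statistics.quantiles(latencies, n=20)[18]  # 95th percentile cut point

if __name__ == "__main__":
    # Sweep concurrency levels and report the highest one that still meets the SLA.
    best = 0
    for users in (8, 16, 32, 64, 96):
        p95 = p95_at_concurrency(users)
        ok = p95 <= SLA_P95_SECONDS
        print(f"{users:>3} users: p95 = {p95:.2f}s  {'PASS' if ok else 'FAIL'}")
        if ok:
            best = users
    print(f"Max SLA-compliant concurrency: {best} users")
```

A production harness would typically stream responses and track time-to-first-token and inter-token latency separately rather than only end-to-end latency, since chat SLAs are usually defined on those per-token metrics.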