Scaling Intel® AI for Enterprise RAG Performance: 64-Core vs 96-Core Intel® Xeon®

This evaluation shows materially higher concurrency and improved latency scaling when moving from a 64-core to a 96-core Intel® Xeon® configuration for Intel® AI for Enterprise RAG inference. The 96-core SKU doubles SLA-compliant concurrency for Llama-AWQ and Mistral-AWQ (32 → 64 users) across all workloads and increases Qwen-AWQ SLA concurrency by 33–50% (workload dependent) versus the 64-core system.
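
To put the concurrency figures next to the core-count change, the short sketch below computes the relative uplift for the numbers stated above. The helper name `uplift_percent` is hypothetical and used only for illustration. Only the 32 → 64 user figures and the 64 and 96 core counts come from the text. No Qwen-AWQ baseline is assumed, since it is not given here.

```python
def uplift_percent(baseline: int, new: int) -> float:
    """Relative increase from baseline to new, in percent (illustrative helper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# SLA-compliant concurrency for Llama-AWQ and Mistral-AWQ, as stated in the text.
awq_uplift = uplift_percent(32, 64)

# Core-count increase between the two Xeon configurations.
core_uplift = uplift_percent(64, 96)

print(f"Concurrency uplift: {awq_uplift:.0f}%")
print(f"Core-count uplift:  {core_uplift:.0f}%")
```

On these figures, the concurrency gain for Llama-AWQ and Mistral-AWQ is larger than the proportional increase in cores, while the stated 33–50% gain for Qwen-AWQ is at or below it.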
