This evaluation shows materially higher concurrency and improved latency scaling when moving from a 64-core to a 96-core Intel® Xeon® configuration for Intel® AI for Enterprise RAG inference. The 96-core SKU doubles SLA-compliant concurrency for Llama-AWQ and Mistral-AWQ (32 → 64 users) across all workloads and increases Qwen-AWQ SLA concurrency by 33–50% (workload dependent) versus the 64-core system.
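The evaluation's headline metric is the maximum concurrency at which latency SLAs are still met. The exact benchmark harness is not shown here, but the following is a minimal sketch of how one might probe SLA-compliant concurrency against an OpenAI-compatible inference endpoint. The endpoint URL, model id, SLA threshold, and concurrency sweep below are illustrative assumptions, not values or tooling from the evaluation itself.

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import requests

# Illustrative assumptions -- not values from the evaluation itself.
ENDPOINT = "http://localhost:8000/v1/chat/completions"  # OpenAI-compatible server
MODEL = "llama-awq"                                     # hypothetical model id
SLA_P95_SECONDS = 5.0                                   # hypothetical end-to-end SLA
PROMPT = "Summarize the benefits of retrieval-augmented generation."

def one_request() -> float:
    """Send a single completion request and return its end-to-end latency."""
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 128,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

def p95_at_concurrency(users: int, requests_per_user: int = 4) -> float:
    """Run `users` concurrent workers and return the p95 request latency."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(one_request) for _ in range(users * requests_per_user)]
        latencies = [f.result() for f in futures]
    return statistics.quantiles(latencies, n=20)[18]  # 95th percentile cut point

if __name__ == "__main__":
    # Sweep concurrency levels and report the highest one that still meets the SLA.
    best = 0
    for users in (8, 16, 32, 64, 96):
        p95 = p95_at_concurrency(users)
        ok = p95 <= SLA_P95_SECONDS
        print(f"{users:>3} users: p95 = {p95:.2f}s  {'PASS' if ok else 'FAIL'}")
        if ok:
            best = users
    print(f"Max SLA-compliant concurrency: {best} users")
```

A production harness would typically stream responses and track time-to-first-token and inter-token latency separately rather than only end-to-end latency, since chat SLAs are usually defined on those per-token metrics.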