The first post in this series introduced vector search, its relevance in today’s world, and the important metrics used to characterize it. We can achieve dramatic gains in vector search systems by improving their internal vector representations, because most of the search runtime is spent fetching vectors from memory to compute their similarity with the query. The focus of this post, Locally-adaptive Vector Quantization (LVQ), accelerates search, lowers the memory footprint, and preserves the efficiency of the similarity computation.
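To make the idea concrete, below is a minimal NumPy sketch of the per-vector ("locally adaptive") scalar-quantization idea behind LVQ: remove the dataset mean, then quantize each vector with a step size derived from that vector's own minimum and maximum. The function names, the 8-bit setting, and the epsilon guard are illustrative assumptions, not the actual SVS/LVQ implementation.

```python
# Hedged sketch of locally adaptive scalar quantization (not the production LVQ code).
import numpy as np

def lvq_encode(X: np.ndarray, bits: int = 8):
    """Encode each row of X with its own scalar quantizer after mean removal."""
    mu = X.mean(axis=0)                       # dataset mean, removed from every vector
    R = X - mu                                # residuals to be quantized
    lo = R.min(axis=1, keepdims=True)         # per-vector lower bound
    hi = R.max(axis=1, keepdims=True)         # per-vector upper bound
    delta = (hi - lo) / (2**bits - 1)         # per-vector step size
    delta = np.maximum(delta, np.finfo(np.float32).eps)  # guard against constant rows
    codes = np.round((R - lo) / delta).astype(np.uint8)  # bits=8 fits in one byte
    return codes, lo, delta, mu

def lvq_decode(codes, lo, delta, mu):
    """Reconstruct approximate vectors from codes and per-vector constants."""
    return codes.astype(np.float32) * delta + lo + mu

# Usage: compress float32 vectors to roughly a quarter of their memory footprint.
X = np.random.randn(1000, 128).astype(np.float32)
codes, lo, delta, mu = lvq_encode(X)
X_hat = lvq_decode(codes, lo, delta, mu)
print("mean reconstruction error:", np.abs(X - X_hat).mean())
```

Because the bounds adapt to each vector individually, the quantization error stays small even when vector norms vary widely across the dataset, while similarity computations can run directly on the compact codes.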