One of the key challenges in Large Language Model (LLM) training is reducing memory requirements without sacrificing compute/communication efficiency or model accuracy. DeepSpeed [2] is a popular deep learning software library that facilitates memory-efficient training of large language models. It includes ZeRO (Zero Redundancy Optimizer), a memory-efficient approach to distributed training [5]. ZeRO offers multiple stages of memory optimization, and Habana’s SynapseAI® software currently supports ZeRO-1 and ZeRO-2. In this article, we explain what ZeRO is and why it is useful for training LLMs, with a brief technical overview of the ZeRO-1 and ZeRO-2 stages. More details on DeepSpeed support in Habana SynapseAI software can be found in the Habana DeepSpeed User Guide. Now, let us dive into why we need memory-efficient training for LLMs and how ZeRO can help achieve it.
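To give a rough sense of what the ZeRO stages save, here is a minimal back-of-the-envelope sketch. It assumes mixed-precision training with Adam, using the accounting from the ZeRO paper: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for fp32 optimizer states. The function name `zero_memory_bytes` is ours, for illustration only; it is not part of DeepSpeed or SynapseAI, and it ignores activations, buffers, and fragmentation.

```python
def zero_memory_bytes(num_params: int, num_devices: int, stage: int) -> float:
    """Approximate per-device memory for model states, in bytes.

    Assumes mixed-precision Adam: fp16 weights (2 B/param),
    fp16 gradients (2 B/param), fp32 optimizer states (12 B/param).
    stage 0 = plain data parallelism, 1 = ZeRO-1, 2 = ZeRO-2.
    """
    if stage not in (0, 1, 2):
        raise ValueError("this sketch only covers stages 0, 1 and 2")
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")

    weights = 2.0 * num_params
    gradients = 2.0 * num_params
    optimizer_states = 12.0 * num_params

    # ZeRO-1 partitions optimizer states across the data-parallel devices.
    if stage >= 1:
        optimizer_states /= num_devices
    # ZeRO-2 additionally partitions gradients.
    if stage >= 2:
        gradients /= num_devices

    return weights + gradients + optimizer_states


# Hypothetical example: a 7.5B-parameter model on 64 devices.
for stage in (0, 1, 2):
    gb = zero_memory_bytes(7_500_000_000, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.1f} GB per device")
```

Under these assumptions, the 7.5B-parameter example drops from about 120 GB per device with plain data parallelism to about 31.4 GB with ZeRO-1 and about 16.6 GB with ZeRO-2.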