Problems with Huawei's Ascend chips reportedly set back training of DeepSeek's upcoming R2 model.
Chinese AI company DeepSeek is reportedly encountering technical difficulties in using Huawei's Ascend AI chips to train its next-generation machine learning models.
DeepSeek's attempt to train its R2 model on Huawei's Ascend chips ran into unstable performance, slower chip-to-chip connectivity, and limitations in Huawei's CANN software toolkit. Although Huawei sent a dedicated engineering team to assist, DeepSeek was unable to complete a successful training run on Ascend hardware. As a result, the company reverted to Nvidia GPUs for training and now uses Huawei's chips only for inference.
This situation highlights broader challenges in China's drive to achieve technological self-sufficiency in AI hardware. Homegrown AI chips still lag behind established American competitors like Nvidia in key training tasks. DeepSeek was encouraged by Chinese authorities to switch from Nvidia to Huawei hardware for training, but persistent technical problems forced a switch back to Nvidia for that phase, underscoring ongoing gaps in China's AI chip development.
The Ascend 910C, Huawei's latest AI chip, offers more VRAM and more than twice the BF16 floating-point performance of Nvidia's H20 GPU. However, it falls slightly behind in memory bandwidth. That shortfall, combined with hardware instability and an immature software toolkit, complicates the deployment of next-generation Chinese AI models wholly on domestic AI silicon.
Training large-scale AI models like DeepSeek's LLMs involves distributing workloads across tens of thousands of chips. If any one component fails during training, the run has to be restarted from the last checkpoint, which makes hardware instability especially costly at that scale.
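To see why a single failing component matters so much at this scale, the following is a rough back-of-the-envelope sketch. It assumes chip failures are independent and exponentially distributed, which is a simplification, and the per-chip reliability figure in the usage note is purely hypothetical, not a measured value for any Huawei or Nvidia part.

```python
import math


def cluster_mtbf_hours(chip_mtbf_hours: float, num_chips: int) -> float:
    """Mean time between failures for the whole job.

    Assumes independent, exponentially distributed chip failures, so the
    failure rates of the individual chips simply add up.
    """
    if chip_mtbf_hours <= 0 or num_chips <= 0:
        raise ValueError("chip_mtbf_hours and num_chips must be positive")
    return chip_mtbf_hours / num_chips


def prob_failure_within(hours: float, chip_mtbf_hours: float, num_chips: int) -> float:
    """Probability that at least one chip fails within `hours`."""
    rate_per_hour = 1.0 / cluster_mtbf_hours(chip_mtbf_hours, num_chips)
    return 1.0 - math.exp(-rate_per_hour * hours)


def expected_lost_hours(checkpoint_interval_hours: float) -> float:
    """Average work lost per failure when resuming from the last checkpoint.

    A failure is equally likely at any point between two checkpoints, so on
    average half an interval of progress is discarded (restart overhead ignored).
    """
    return checkpoint_interval_hours / 2.0
```

With a hypothetical per-chip MTBF of 50,000 hours and 10,000 chips, `cluster_mtbf_hours(50_000, 10_000)` gives 5 hours between failures for the job as a whole, and `prob_failure_within(1, 50_000, 10_000)` is roughly 0.18. Even very reliable individual chips therefore produce frequent interruptions, and less stable hardware shortens that interval further.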
To improve future compatibility with Huawei and other Chinese AI accelerators, DeepSeek has updated its models to support a new low-precision datatype (UE8M0 FP8) tailored to emerging domestic chips. However, Huawei's current Ascend 910C chip does not natively support FP8, suggesting that new hardware generations may be needed before training performance improves substantially.
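For readers unfamiliar with the name, the sketch below illustrates what a UE8M0 value could look like. It assumes UE8M0 follows the E8M0 scale layout from the OCP Microscaling (MX) formats: unsigned, 8 exponent bits, no mantissa bits, an exponent bias of 127, and the byte `0xFF` reserved for NaN. Under that assumption every value is a pure power of two. The function names and the round-up policy are illustrative choices, not DeepSeek's or Huawei's implementation.

```python
import math

E8M0_BIAS = 127   # assumed exponent bias (OCP MX E8M0 convention)
E8M0_NAN = 0xFF   # assumed NaN encoding


def decode_ue8m0(byte: int) -> float:
    """Decode one UE8M0 byte into the power-of-two scale it represents."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("UE8M0 values occupy exactly one byte")
    if byte == E8M0_NAN:
        return math.nan
    return math.ldexp(1.0, byte - E8M0_BIAS)  # 2 ** (byte - 127)


def encode_ue8m0(scale: float) -> int:
    """Encode a positive scale, rounding up to the next power of two.

    Rounding up is an illustrative policy: a scale that is never smaller
    than requested keeps the scaled values from overflowing.
    """
    if not (scale > 0 and math.isfinite(scale)):
        raise ValueError("scale must be positive and finite")
    mantissa, exponent = math.frexp(scale)  # scale == mantissa * 2**exponent
    # mantissa == 0.5 means scale is already an exact power of two.
    power = exponent - 1 if mantissa == 0.5 else exponent
    return max(0, min(0xFE, power + E8M0_BIAS))  # clamp to the finite range
```

For example, `encode_ue8m0(3.0)` yields 129, which `decode_ue8m0` maps back to 4.0. With no mantissa bits, such a value can only scale a block of numbers by a power of two, which is cheap to apply in hardware but only useful if the chip handles the accompanying FP8 data natively.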
In summary, DeepSeek's difficulties stem from hardware instability, an immature software toolkit, and the performance gaps between Huawei's Ascend chips and Nvidia GPUs, all of which make it hard to deploy next-generation Chinese AI models wholly on domestic AI silicon. The episode underscores the ongoing challenges in China's pursuit of technological self-sufficiency in AI hardware.
- DeepSeek's attempt to train its R2 model on Huawei's Ascend AI chips suffered from unstable performance and slower chip-to-chip connectivity, forcing a switch back to Nvidia GPUs for training.
- The Ascend 910C, Huawei's latest AI chip, offers more VRAM and better BF16 floating-point performance than Nvidia's H20 GPU but lags behind in memory bandwidth, which complicates deploying next-generation Chinese AI models on domestic AI silicon.
- To improve compatibility with Huawei and other Chinese AI accelerators, DeepSeek has updated its models to support a new low-precision datatype tailored to emerging domestic chips, but Huawei's current Ascend 910C does not natively support it, suggesting that new hardware generations may be needed.
- DeepSeek's ongoing difficulties in training models on Huawei's Ascend chips highlight the challenges in China's pursuit of technological self-sufficiency in AI hardware, as homegrown AI chips still lag behind established competitors like Nvidia.