Immature Software Ecosystem and Insufficient Production: The Challenges Preventing China from Ditching Nvidia Hardware for AI Advancements
=================================================================
China's pursuit of AI self-reliance is gaining momentum, driven in part by U.S. sales restrictions on advanced GPUs such as Nvidia's HGX H20 and AMD's Instinct MI308.
The Chinese government's push for self-sufficiency in its semiconductor industry, including AI accelerators, has been ongoing since the mid-2010s. Notable advancements include the development of the CloudMatrix 384, a rack-scale AI supercomputer by Huawei, and the creation of several domestic AI accelerators in 2025.
However, China's domestic AI hardware is still years behind that of AMD and Nvidia. Progress has nonetheless been significant: some chips reportedly reach up to 85% of the Nvidia H20's performance, and there is a growing expectation that new models will close the gap by 2026-27.
China's domestic AI software stack, however, remains far less mature than Nvidia's CUDA ecosystem. The CUDA platform, a dominant factor in AI development globally, offers a highly optimized and widely adopted environment for AI model training and inference, with strong multi-GPU scaling through NVLink interconnects. Chinese domestic software toolkits, such as Huawei's Compute Architecture for Neural Networks (CANN), struggle with performance instability, limited interoperability, and weaker chip-to-chip communication.
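To make that gap concrete, below is a minimal sketch of the portability problem: most AI code assumes a CUDA device and NCCL collectives (which exploit NVLink), while a domestic accelerator needs a vendor plugin and a different collective-communication backend. The `torch_npu` plugin, the `npu` device string, and the `hccl` backend name follow Huawei's published PyTorch adapter, but treat them as illustrative assumptions rather than a tested recipe.

```python
import torch
import torch.distributed as dist


def pick_device_and_backend():
    """Choose an accelerator and a matching collective-communication backend."""
    if torch.cuda.is_available():
        # CUDA path: NCCL collectives ride NVLink/NVSwitch for fast GPU-to-GPU traffic.
        return torch.device("cuda"), "nccl"
    try:
        import torch_npu  # noqa: F401  -- Huawei's Ascend adapter for PyTorch (assumed installed)
        return torch.device("npu"), "hccl"  # HCCL: Ascend's collective library (assumed backend name)
    except ImportError:
        return torch.device("cpu"), "gloo"


def all_reduce_demo(rank: int, world_size: int) -> torch.Tensor:
    # Assumes MASTER_ADDR/MASTER_PORT are set in the environment, as usual for torch.distributed.
    device, backend = pick_device_and_backend()
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)
    # The same all-reduce call maps to very different interconnect behavior depending on
    # whether NCCL over NVLink or a domestic fabric sits underneath.
    grad = torch.ones(1024, device=device)
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    dist.destroy_process_group()
    return grad
```

The point is not that the call signatures differ, but that the performance and stability behind identical-looking calls do, which is precisely where CANN and similar stacks are reported to fall short today.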
This dependency on Nvidia's ecosystem is a significant bottleneck for China's full independence in AI development. Despite efforts by domestic firms to focus on compatible compute standards and cohesive software-hardware stacks, reaching parity with CUDA's mature ecosystem remains a critical challenge.
The situation is further complicated by the inability of Chinese manufacturers like SMIC to match TSMC's process technologies. Huawei had to obtain the vast majority of silicon for its Ascend 910B and Ascend 910C processors by deceiving TSMC.
Huawei is attempting to address this issue by opening up CANN, aiming to attract a broad community of developers to its platform for performance tuning and framework integration. The open-sourcing of CANN includes compilers, low-level APIs, libraries of AI operators, and a system-level runtime.
In an effort to build a fully localized AI stack in China, the Model-Chip Ecosystem Innovation Alliance was formed. This alliance aims to set common mid-level standards, such as shared model formats, operator definitions, and framework APIs, to address the issue of interoperability.
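The sketch below illustrates the kind of interoperability such mid-level standards target, using ONNX purely as a familiar stand-in for a shared model format; the alliance's actual formats and operator definitions are not detailed here.

```python
import torch
import torch.nn as nn

# A toy classifier trained (hypothetically) on one vendor's stack.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
example_input = torch.randn(1, 128)

# Export to a shared interchange format: any runtime that implements the same
# operator definitions (GPU, Ascend NPU, or another domestic accelerator) can
# load "classifier.onnx" without depending on the original training framework.
torch.onnx.export(model, (example_input,), "classifier.onnx", opset_version=17)
```

A common format solves only half the problem, though: each vendor's runtime still has to implement the agreed operator set efficiently, which is where shared operator definitions and framework APIs come in.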
The U.S. has also struck a deal under which AMD and Nvidia hand over 15% of the revenue from these chips' China sales in exchange for export licenses, further complicating the economics of relying on American hardware.
In summary, while China is making significant strides towards AI self-reliance, the immaturity of its domestic AI software stack relative to CUDA, compounded by limited access to leading-edge manufacturing, remains a critical impediment to full independence in AI development.