
    NVIDIA’s biggest customers reportedly delaying orders of Blackwell AI racks

    Hyperscalers pause multi-billion-dollar rack orders over overheating concerns; meanwhile, focus seems to be switching to next-next-generation tech

    Last November, NVIDIA’s latest Blackwell AI chip encountered an overheating problem, as reported by The Information. Observers thought this could affect major tech clients such as Microsoft and Meta, and that now seems to be the case, according to Reuters.

    Mizuho Securities, the investment bank, estimates that NVIDIA’s AI accelerators account for between 70% and 95% of the AI chip market, as cited in Technology magazine. This position is reinforced by NVIDIA’s flagship AI GPUs and its CUDA software, which is the industry standard for AI development.

    Major customers Microsoft, AWS, Google, and Meta Platforms have reportedly cut some orders of NVIDIA’s Blackwell GB200 racks. Although each has placed orders for Blackwell racks worth $10 billion or more, none is commenting on the issue, according to Reuters. Apparently, some are switching to NVIDIA’s older chips, while others are waiting for a more stable version of the Blackwell racks.

    Microsoft was initially planning to install GB200 racks with at least 50,000 Blackwell chips in one of its Phoenix facilities. SoftBank is slated to receive the world’s first NVIDIA DGX B200 systems, which will serve as the building blocks for its new NVIDIA DGX SuperPOD supercomputer.

    Serious problems?

    The overheating problems first reported last November are associated with servers customised to house 72 Blackwell chips. NVIDIA immediately began working with suppliers to refine their server rack designs, but the issues are substantial. A 72-GPU rack would normally require about 72kW to run and needs to dissipate an equivalent amount of heat energy, but in some liquid-cooled setups certain Blackwell GPUs can consume up to 1,200W each, which pushes GPU power draw alone past that budget.

    In comparison, an electric oven with a similar footprint consumes between 2kW and 5kW on average, depending on the model, which shows how tricky the cooling conundrum has become. However, NVIDIA has pointed out in the past that Blackwell can train the same model as its Hopper chip with a quarter of the number of GPUs and a quarter of the power. This clearly demonstrates Blackwell’s advantage, if configuration and cooling can be resolved.
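    As a rough illustration of the arithmetic above (the 72-GPU rack count and 1,200W per-GPU peak figure are from the reports; the totals here are simple multiplication, not measured data, and ignore CPUs, networking and cooling overhead):

    ```python
    # Back-of-envelope power maths for a 72-GPU Blackwell rack.
    # Figures taken from the reports cited in the article; this is an
    # illustrative sketch, not a measured or official power budget.

    GPUS_PER_RACK = 72
    WATTS_PER_GPU_PEAK = 1_200  # reported peak draw in some liquid-cooled setups

    # GPU power alone, in kW, at peak draw
    rack_gpu_power_kw = GPUS_PER_RACK * WATTS_PER_GPU_PEAK / 1_000
    print(f"GPU power per rack: {rack_gpu_power_kw:.1f} kW")  # 86.4 kW

    # For scale: a domestic electric oven draws roughly 2-5 kW
    oven_kw_low, oven_kw_high = 2, 5
    print(f"Equivalent ovens: {rack_gpu_power_kw / oven_kw_high:.0f} to "
          f"{rack_gpu_power_kw / oven_kw_low:.0f}")

    # NVIDIA's claim: Blackwell trains the same model on a quarter of the
    # GPUs at a quarter of the power of Hopper. Expressed as a ratio:
    blackwell_vs_hopper_power_ratio = 1 / 4
    print(f"Blackwell power vs Hopper for the same job: "
          f"{blackwell_vs_hopper_power_ratio:.2f}x")
    ```

    Note that 72 GPUs at 1,200W each already exceeds the nominal 72kW rack figure, which is why the per-GPU peak draw matters so much for cooling design.
    
    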

    Best form of defence is attack

    CEO Jensen Huang previously denied media reports that a flagship, liquid-cooled server containing 72 of the new chips experienced overheating during testing. Huang said in November that his company is “on track” to exceed an earlier target of several billion dollars in revenue from Blackwell chips in its fourth fiscal quarter.

    NVIDIA’s mood will not have lightened with news earlier in the day that the US government is to impose further restrictions on AI chip and technology exports. 

    Next-next-gen chips

    However, if next-gen chips are bringing you down, why not shift to next-next-gen? A report has emerged from South Korea that NVIDIA may be planning an early launch of its Rubin chip, the successor to Blackwell. The company has said that one of Rubin’s design goals is to control power consumption. The New Daily report suggests that NVIDIA had initially planned to launch Rubin in 2026, but the launch is now expected in the third quarter of 2025.

    Last autumn, NVIDIA asked Korean chipmaker SK hynix to expedite the development of HBM4 (high-bandwidth memory) chips, which are fundamental to AI. Now Samsung too is accelerating development of HBM4 as it races to complete the Production Readiness Approval (PRA) process within the first half of 2025.

    Samsung is still trying to pass NVIDIA’s verification for its HBM3E chips, which are also crucial AI components, but it seems the two Korean memory chip giants have shifted their focus to HBM4.

    In the meantime, data centre operators will be having a very close look at how to incorporate Blackwell into their AI plans. Fortunately for them, there is still a very large cloud services market.