Rumored Production, Design Issues Hit NVIDIA's High-Performance Blackwell AI-Accelerating GPUs

The double-die 208-billion-transistor behemoth built to power the AI revolution is in trouble, it seems — with a three-month delay likely.

UPDATE (10/24/2024): NVIDIA has confirmed that the problem delaying the launch of its Blackwell graphics processor products was a design flaw, and has accepted full responsibility, but says the flaw is now fixed, thanks to the help of manufacturing partner Taiwan Semiconductor (TSMC).

"We had a design flaw in Blackwell," NVIDIA founder and chief executive officer Jensen Huang admitted at an event this week, as reported by Reuters. "It was functional, but the design flaw caused the yield to be low. It was 100 per cent NVIDIA's fault.

"In order to make a Blackwell computer work, seven different types of chips were designed from scratch and had to be ramped into production at the same time. What TSMC did was to help us recover from that yield difficulty and resume the manufacturing of Blackwell at an incredible pace."

The company now plans to ship the delayed chips in the fourth quarter of 2024.

Original article continues below.

NVIDIA's next-generation graphics processor family, codenamed Blackwell, has reportedly hit a last-minute delay as the company struggles with rumored production issues — with big-name customers waiting on the hardware to power their artificial intelligence engines being asked to wait for at least another three months.

Sources identified as a Microsoft employee and "another person with direct knowledge" speaking to The Information this week claimed that NVIDIA was telling its customers that delivery of its Blackwell GPU products was to be pushed back by three months or more due to what journalists Qianer Liu and Anissa Gardizy claim are "design flaws" in the hardware. The Financial Times, meanwhile, positions the problem as "production issues" — though Bernstein analyst Mark Li told the paper that the company would "likely have to make a minor design tweak" as a result.

In either case, the claim is the same: a major delay to Blackwell, which NVIDIA unveiled back in March as its most powerful, yet energy-efficient, graphics processing platform to date, and one the company claimed was ideal for the growing compute demands of the artificial intelligence revolution it seeks to underpin.

With over 208 billion transistors split across two dice, connected via a 10TB/s interlink, the Blackwell family is ambitious indeed — and makes use of a new wafer system integration platform from fabricator Taiwan Semiconductor (TSMC) dubbed CoWoS-L, the first commercial hardware to do so. A report from Semi Analysis claims that issues with this technology are at the heart of the problem, with TSMC struggling to ramp production quickly enough and failures in the connection between the dice and the bridge being the cause of the delay — and requiring a redesign to resolve.

Semi Analysis further claims the company will be announcing a new GPU, the B200A, which uses TSMC's established CoWoS-S platform — bypassing the bridge problem and allowing for rapid production, with the parts being positioned for lower- and mid-range AI systems.

While NVIDIA has not commented on the rumors, other than to state that "production is on track to ramp," it is expected to break its silence soon — if for no better reason than its stock price having taken a hammering as a result of the issue.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.