Amazon Web Services Promises Four Times Faster ML and AI Training, Better Efficiency with Trainium2
Designed for training complex large language models (LLMs) and other models, Trainium2 is claimed to be a leap above its predecessor.
Amazon Web Services (AWS) has announced the release of a new generation of in-house processors, taking aim at the computationally taxing work of training machine learning and artificial intelligence (ML and AI) systems — and promising a quadrupling of performance.
"Silicon underpins every customer workload, making it a critical area of innovation for AWS," claims David Brown, vice president of compute and networking at AWS, in support of the launch. "By focusing our chip designs on real workloads that matter to customers, we’re able to deliver the most advanced cloud infrastructure to them. Graviton4 marks the fourth generation we’ve delivered in just five years, and is the most powerful and energy efficient chip we have ever built for a broad range of workloads. And with the surge of interest in generative AI, Tranium2 will help customers train their ML models faster, at a lower cost, and with better energy efficiency."
The Graviton4 processor, built around Arm cores and aimed at general-purpose workloads, is claimed to deliver a 30 per cent performance boost, 50 per cent more cores, and 75 per cent more memory bandwidth than its Graviton3 predecessor. It's the Trainium2 chip, though, that comes with the headline-grabbing claim of a full quadrupling of performance — as measured by the length of time it takes to complete a model's training process.
Claimed to offer "the highest performance, most energy-efficient AI model training infrastructure in the cloud," Amazon's hardware launch comes amid an explosion of interest in generative artificial intelligence (genAI) and large language model (LLM) technology. While it's possible to create genAI models suitable for deployment on resource-constrained devices, it's their training which takes time and resources — which is where Trainium2 is claimed to excel.
In the company's internal testing, Trainium2 delivered four times the performance and three times the memory capacity of the original Trainium parts while doubling energy efficiency. As with Amazon's other silicon efforts, the company won't be selling the chips to end-users but will instead make them available on-demand on the AWS platform — in clusters from 16 chips to 100,000 chips, depending on exactly how much compute you need and how much cash you can throw at the problem.
The Graviton4 and Trainium2 chips will be made available through Amazon's Elastic Compute Cloud (EC2), alongside — rather than in place of — chips from AMD, Intel, and NVIDIA, the company has confirmed.