Arm Announces a "Fundamental Shift" for Edge AI with the IoT-Focused Cortex-A320
Company boasts of a tenfold performance gain for machine learning, and enough power to run LLMs with up to a billion parameters on-device.
Arm has announced the launch of its latest IP for edge machine learning and artificial intelligence (ML and AI): the Arm Cortex-A320. The company claims the new core delivers up to 10 times the machine learning performance and 30 percent more general-purpose performance than its last-generation Cortex-A35, while also supporting on-device operation of large language models (LLMs) and vision language models (VLMs) with up to a billion parameters.
"This isn't just an incremental step forward," Arm's Paul Williamson told us during a pre-launch press briefing. "This is a fundamental shift for us in how we're approaching edge computing and AI processing. For the first time, we've designed a v9 CPU specifically optimized for IoT applications. We're bringing together the ultra efficiency and advanced AI capabilities in a way that hasn't been possible till now, and we've paired it with Ethos-U85 that allows us to see entirely new categories of edge AI applications. For me, there's never been a more exciting time to be in this industry, and we believe that the future of AI is at the edge."
The heart of the company's new platform is the Arm Cortex-A320, a successor to the Cortex-A35 and Cortex-A53 and positioned as an alternative to the Cortex-M85 for energy-conscious edge AI designs. Based on the Armv9 architecture, the Cortex-A320 is claimed to be optimized for the Internet of Things (IoT): up to 50 percent more power-efficient than the Cortex-A520, while offering a 30 percent boost in general-purpose compute over the Cortex-A35 and up to 10 times its machine learning performance when paired with the same Ethos-U85 neural coprocessor.
"This is arm's first [Arm]v9 ultra-efficiency core," Williamson claimed during the briefing. "It's been optimized for the IoT. It's revolutionizing AI at the edge, and it's at the heart of the new platform. It offers a huge jump in machine learning performance. It can be configured in clusters of [up to] four cores, and that allows [Cortex-A]320 to be scalable to fit a broad range of performance needs. Cortex-A320 takes advantage of the Arm 9 security and AI compute features, which you've already been seen established in other markets, and we're now bringing those into IoT."
The new IP slots in at the lower end of the company's Armv9 ecosystem, offering higher efficiency than the more performant Cortex-A520, which is, in turn, more efficient than the Cortex-A725, while the family still tops out with the Cortex-X925, Arm's highest-performance IP. At the same time, Arm has announced that it is expanding its Kleidi platform to the IoT, launching KleidiAI for the Cortex-A320, which optimizes ML and AI workloads for execution on the CPU and is claimed to boost performance of the Tiny Stories language model running on-device via Llama.cpp by up to 70 percent.
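For readers curious what "running an LLM on-device via Llama.cpp" looks like in practice, the following is a minimal sketch using llama-cpp-python, the Python bindings for Llama.cpp, to generate text from a small quantized model on an Arm CPU. The model file name and thread count are placeholders rather than anything Arm has published, and whether KleidiAI kernels are actually used depends on how the underlying Llama.cpp build was configured for the target hardware.

```python
# Minimal sketch: running a small (~1B-parameter) language model on-device
# with llama-cpp-python. KleidiAI acceleration, where available, is handled
# inside the underlying Llama.cpp/ggml build, not in this script.
from llama_cpp import Llama

llm = Llama(
    model_path="tinystories-1b-q4_0.gguf",  # hypothetical quantized model file
    n_ctx=512,      # small context window to keep memory use modest on edge devices
    n_threads=4,    # e.g. one thread per core in a four-core Cortex-A320 cluster
)

output = llm(
    "Once upon a time,",  # Tiny Stories-style prompt
    max_tokens=64,
)
print(output["choices"][0]["text"])
```

The point of the sketch is simply that a billion-parameter-class model, once quantized, fits comfortably within a CPU-only workflow of this kind; Arm's 70 percent figure refers to its own KleidiAI-optimized configuration, not this generic setup.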
As is usual for Arm, the launch did not come with public pricing information nor details of any early design wins. "We are expecting to see it in silicon, certainly, with availability next year," Williamson claimed, promising "an interesting range of products that are developed from the platform."
More information is available on the Arm website.