Cadence Unveils the Neo NPU, NeuroWeave SDK — And Promises Up to 20x the On-Device ML Performance
Neo neural processing unit IP comes with big gains in both performance per watt and performance per area, the company claims.
Cadence Design Systems has announced its Neo neural processing unit (NPU) technology and NeuroWeave software development kit (SDK), aiming to improve performance and efficiency for on-device machine learning and artificial intelligence (ML and AI) workloads at the edge.
"For two decades and with more than 60 billion processors shipped, industry-leading SoC [System-on-Chip] customers have relied on Cadence processor IP for their edge and on-device SoCs. Our Neo NPUs capitalize on this expertise, delivering a leap forward in AI processing and performance," claims Cadence's David Glasco.
"In today's rapidly evolving landscape," Glasco continues, "it's critical that our customers are able to design and deliver AI solutions based on their unique requirements and KPIs [Key Performance Indicators] without concern about whether future neural networks are supported. Toward this end, we've made significant investments in our new AI hardware platform and software toolchain to enable AI at every performance, power and cost point and to drive the rapid deployment of AI-enabled systems."
The Neo NPU cores are designed for high-performance machine learning at the edge, scaling from 8 giga-operations per second (GOPS) to 80 tera-operations per second (TOPS) of compute in a single core — and from there to "hundreds of TOPS" by integrating multiple cores into a single design. The company claims the design is built to support efficient offloading of workloads from any host processor, from application processors all the way down to microcontrollers and digital signal processors (DSPs), and offers support for FP16 floating-point and INT16, INT8, and INT4 integer precision.
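The integer precisions Cadence lists (INT16, INT8, INT4) trade numerical accuracy for throughput and memory footprint. A minimal, framework-agnostic sketch of the symmetric linear quantization such modes typically rely on — the function names and random weights here are illustrative, not part of any Cadence API:

```python
import numpy as np

def quantize_symmetric(x, bits=8):
    """Symmetric linear quantization of a float tensor to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for INT8, 7 for INT4
    scale = np.max(np.abs(x)) / qmax    # one scale factor per tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return q.astype(np.float32) * scale

# Illustrative random "weights" standing in for a real model tensor
rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)

for bits in (16, 8, 4):
    q, s = quantize_symmetric(weights, bits)
    err = np.mean(np.abs(weights - dequantize(q, s)))
    print(f"INT{bits}: mean absolute reconstruction error = {err:.5f}")
```

Narrower integer grids cut multiplier area and memory traffic roughly in proportion to bit width, which is why NPU vendors expose them — at the cost of the growing reconstruction error the loop above prints.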
Drawing a direct comparison to the company's first-generation AI hardware, Cadence claims the new Neo NPUs can deliver "up to 20X higher performance," with between two- and fivefold improvements in inferences per second per area (IPS/mm²) and between five- and tenfold improvements in inferences per second per watt (IPS/W). Actual performance is configurable depending on requirements, with Cadence claiming the IP can be configured from 256 up to 32K multiply-accumulate operations (MACs) per cycle to balance power, performance, and area requirements.
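As a rough sanity check on how a MAC-per-cycle figure maps to a headline compute number, peak throughput is conventionally estimated as MACs per cycle × 2 operations per MAC (multiply plus add) × clock frequency. The clock rate below is an assumption for illustration only, not a published Cadence specification:

```python
def peak_tops(macs_per_cycle: int, clock_ghz: float) -> float:
    """Peak throughput in TOPS: each MAC counts as two operations (multiply + add)."""
    return macs_per_cycle * 2 * clock_ghz / 1000.0  # GOPS -> TOPS

# Hypothetical 1.25 GHz clock, spanning a 256-to-32K MAC configuration range
for macs in (256, 32 * 1024):
    print(f"{macs:>6} MACs/cycle -> {peak_tops(macs, 1.25):.2f} TOPS")
```

Under that assumed clock, the largest configuration lands in the neighborhood of the 80 TOPS single-core figure quoted above, while the smallest sits well under 1 TOPS — consistent with an IP block meant to scale across very different power and area budgets.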
On the software side, Cadence is supporting the Neo IP with a software development kit dubbed NeuroWeave. This, the company promises, offers a "uniform, scalable, and configurable software stack" across both its Tensilica and Neo core IP, with support for a range of machine learning frameworks including TensorFlow, TensorFlow Lite, TensorFlow Lite Micro, ONNX, PyTorch, Caffe2, MXNet, and JAX, as well as the Android Neural Network Compiler.
More information on the Neo NPU IP is available on the Cadence website; the company has said it is targeting general availability in December this year, with unnamed "lead customers" having already begun "early engagements."