Quadric's Chimera Blends Neural Processing and Digital Signal Processing for Performance and Ease
Designed to do everything an ML accelerator and a DSP can do simultaneously, the Chimera promises to make developers' lives easier.
Machine learning specialist Quadric has unveiled the Chimera, which it's calling "the first family of general-purpose neural processors (GPNPUs)" — and claims it has combined the machine learning performance of a neural processing unit (NPU) with the programmability of a digital signal processor (DSP).
"Machine learning is infiltrating nearly all applications everywhere DSPs are traditionally used today for vision, audio, sound, communications, sensors, and so much more," claims Veerbhan Kheterpal, co-founder and chief executive of Quadric, in support of the company's launch. "Existing silicon solutions to the ML inference challenge have added accelerators as helper offload cores to existing DSPs or CPUs.
"The limitation of that approach is the clumsy way the programmer has to partition her code across the different cores in the system and then tune the interaction between those cores to get desired performance goals. The new Chimera GPNPU family creates a unified, single-core architecture for both ML inference and related conventional C++ processing of images, video, radar or other signals, eliminating multicore challenges."
The company's Chimera design does away with the distinction between NPU, CPU, and DSP, offering a unified architecture accessible from a single software stack, with scalar, vector, and matrix math all supported in one logical processing core. The result, the company claims, is a simpler approach with area, power, and efficiency gains over traditional multi-core alternatives.
The precise performance of the parts depends on which model in the family you pick. The entry-level QB1 offers 1 trillion operations per second (TOPS) of machine learning performance with 64 giga operations per second (GOPS) of DSP performance; the mid-range QB4 offers 4 TOPS machine learning and 256 GOPS DSP; and the range-topping QB16 manages 16 TOPS for machine learning workloads and 1 TOPS for DSP workloads. Although the design is built around the simplicity of a single core, Quadric says multiple Chimera cores can be linked together for increased performance.
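To put those headline figures side by side, a few lines of arithmetic are enough. The sketch below is purely illustrative: the dictionary and function names are made up for this example, it assumes 1 TOPS equals 1,000 GOPS, and the numbers are the vendor's quoted claims rather than measured results.

```python
# Headline throughput figures as quoted above, converted to GOPS.
# Assumption: 1 TOPS = 1,000 GOPS. All names here are hypothetical.
CHIMERA_FAMILY = {
    "QB1": {"ml_gops": 1_000, "dsp_gops": 64},
    "QB4": {"ml_gops": 4_000, "dsp_gops": 256},
    "QB16": {"ml_gops": 16_000, "dsp_gops": 1_000},  # DSP quoted as "1 TOPS"
}


def ml_to_dsp_ratio(core):
    """Return quoted ML throughput divided by quoted DSP throughput."""
    spec = CHIMERA_FAMILY[core]
    return spec["ml_gops"] / spec["dsp_gops"]


def scaling_vs_entry(core, key):
    """Return how many times larger a figure is than the QB1's."""
    return CHIMERA_FAMILY[core][key] / CHIMERA_FAMILY["QB1"][key]


for name in CHIMERA_FAMILY:
    print(
        f"{name}: ML/DSP ratio {ml_to_dsp_ratio(name):.3f}, "
        f"ML x{scaling_vs_entry(name, 'ml_gops'):.0f} and "
        f"DSP x{scaling_vs_entry(name, 'dsp_gops'):.1f} vs QB1"
    )
```

On these quoted numbers, machine learning throughput is roughly 16 times DSP throughput at every tier. The QB16's DSP figure is given only as "1 TOPS", so its ratio here reflects that rounded value.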
On the software side, the company claims the Chimera cores can run "any ML operator," with custom operators added by writing a C++ kernel using the bundled Chimera Compute Library (CCL). Quadric is not, however, making the parts available in silicon; instead, it's offering them as intellectual property to chip design teams — and claims it's ready for "immediate customer engagement" with those looking to begin building around the technology.
More information is available on the Quadric website.