What’s Neu in LLMs?

Neuchips has just announced a pair of new ASIC-based accelerators that speed up LLM inference while cutting costs and slashing energy use.

Nick Bild
The Raptor ASIC can efficiently run LLM inferences (📷: Neuchips)

The Consumer Electronics Show (CES) is once again in full swing in Las Vegas, and as usual, the latest breakthroughs across the entire tech landscape are on display. With the major product releases that have happened in the past year, it should come as no surprise that technologies incorporating artificial intelligence are front and center. Large Language Models (LLMs) in particular have had a breakout year, greatly improving in their capabilities as chatbots, digital assistants, control systems for robots, and much more.

But any conversation about the capabilities of LLMs will inevitably also turn to another important aspect of these models: their appetite for hardware resources. Despite many algorithmic advancements that have served to optimize LLMs, they are still notorious resource hogs, often requiring massive cloud computing resources just to run inferences. Naturally, this limits when and where these models can be used, hindering their incorporation into many commercial applications.
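To make that resource footprint concrete, here is a quick back-of-envelope calculation in Python. The model sizes are illustrative examples, not models Neuchips has named, and only the memory for storing weights is counted (activations, KV caches, and batching overhead all add more).

```python
# Back-of-envelope estimate of the memory needed just to hold LLM weights
# for inference. Model sizes here are illustrative examples, not figures
# quoted by Neuchips.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight footprint in gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for name, params in [("7B model", 7e9), ("70B model", 70e9)]:
    for precision in ("fp32", "fp16", "int8"):
        print(f"{name} @ {precision}: {weight_memory_gb(params, precision):.0f} GB")

# A 70B-parameter model at fp16 needs ~140 GB for weights alone -- well
# beyond a single consumer GPU, which is why inference so often gets
# pushed out to large cloud deployments.
```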

A company called Neuchips that focuses on developing Application-Specific Integrated Circuits (ASICs) for AI applications announced a pair of new hardware components at CES that may help LLMs to run on less powerful hardware platforms while consuming less energy. The products are named the Raptor Gen AI accelerator chip and the Evo PCIe accelerator card. Both of these devices were designed to help enterprises deploy LLMs at a fraction of current costs.

Each Raptor chip can perform up to 200 tera operations per second (TOPS), with native support for operations critical to modern machine learning algorithms, such as matrix multiplications and embedding table lookups. These capabilities extend beyond LLMs, benefiting a wide range of generative AI and transformer-based models. The Evo accelerator card combines the power of Raptor chips with 32 GB of LPDDR5 memory and an eight-lane PCIe Gen 5 interface to provide 64 GB/s of host I/O bandwidth.
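For a sense of what those two operations look like in practice, here is a minimal NumPy sketch of an embedding table lookup and a matrix multiplication, the building blocks named above. The dimensions are illustrative choices, not Neuchips specifications, and the code shows the math an accelerator offloads rather than any actual Neuchips API.

```python
# Minimal NumPy sketch of the two operations called out above: embedding
# table lookups and matrix multiplications. Dimensions are illustrative,
# not Neuchips specs; this is not Neuchips' API.
import numpy as np

rng = np.random.default_rng(0)

# Embedding table lookup: gather rows of a (vocab_size x d_model) table
# to turn token IDs into dense vectors.
vocab_size, d_model, seq_len = 32_000, 4_096, 16
embedding_table = rng.standard_normal((vocab_size, d_model), dtype=np.float32)
token_ids = rng.integers(0, vocab_size, size=seq_len)
hidden = embedding_table[token_ids]            # (seq_len, d_model)

# Matrix multiplication: the workhorse of every transformer layer,
# e.g. projecting hidden states through a weight matrix.
weights = rng.standard_normal((d_model, d_model), dtype=np.float32)
projected = hidden @ weights                   # (seq_len, d_model)

print(hidden.shape, projected.shape)
```

The bandwidth figure also checks out on the back of an envelope: PCIe Gen 5 signals at 32 GT/s per lane, which after encoding overhead leaves roughly 4 GB/s of usable data per lane in each direction, so eight lanes yield about 32 GB/s each way; the quoted 64 GB/s is consistent with counting both directions.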

The Neuchips team demonstrated their hardware at CES accelerating two popular AI models: the Whisper speech recognition model and the Llama LLM. Given the performance and energy efficiency of this hardware, it may help power a new generation of AI tools. Be on the lookout for more product releases from Neuchips in the second half of the year.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.