Chipping Away at Edge AI Inefficiencies

Engineers are developing AI-centric chips that integrate processing and memory into the same unit, enabling efficient AI on mobile devices.

An early prototype of the chip (📷: Hongyang Jia / Princeton University)

The latest and most powerful AI algorithms have reached a level of complexity and sophistication that demands significant computational resources to execute efficiently. These algorithms, often based on deep learning architectures such as convolutional neural networks or transformer models, typically run on powerful computers in cloud computing environments, which offer the scalability and resources needed to handle the intensive computational requirements of cutting-edge AI tasks.

To limit latency and protect sensitive information, mobile devices such as smartphones and tablets need to be capable of running these advanced algorithms locally to power the next generation of AI applications. But they have limited computational capabilities and energy budgets compared to the servers found in cloud environments, and these constraints have slowed the rollout of this critical technology where it is needed most.

Furthermore, traditional computing architectures, both in mobile devices and in servers, separate processing and memory units. This separation introduces a bottleneck that greatly limits processing speeds in data-intensive applications like AI, where large amounts of data need to be processed rapidly. Shuttling data to and from separate memory units adds latency and reduces overall efficiency, hindering the performance of AI algorithms even further.
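To put rough numbers on that bottleneck, here is a minimal back-of-the-envelope sketch. The memory bandwidth, compute rate, and layer size below are illustrative assumptions on my part, not figures from the Princeton chip; they simply show how the time spent fetching weights can dwarf the time spent on the actual arithmetic.

```python
# Back-of-the-envelope sketch (illustrative numbers, not measured figures):
# time spent moving weights vs. computing for one fully-connected layer.

def layer_time(rows, cols, bytes_per_weight=1,
               mem_bandwidth=25e9,      # bytes/s, assumed off-chip memory bandwidth
               compute_rate=2e12):      # multiply-accumulates/s, assumed
    """Rough time to fetch weights vs. to do the math for a matrix-vector multiply."""
    macs = rows * cols                          # one multiply-accumulate per weight
    weight_bytes = macs * bytes_per_weight      # every weight crosses the memory bus
    t_mem = weight_bytes / mem_bandwidth
    t_compute = macs / compute_rate
    return t_mem, t_compute

t_mem, t_compute = layer_time(4096, 4096)
print(f"memory: {t_mem * 1e6:.1f} us, compute: {t_compute * 1e6:.1f} us")
# With these assumptions, moving the data takes roughly 80x longer than the
# arithmetic itself, which is exactly the gap in-memory computing targets.
```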

To overcome these challenges and enable the widespread adoption of AI on mobile devices, many innovative solutions are being explored. Princeton University researchers are working with a startup called EnCharge AI on one such solution: a new type of AI-centric processing chip that delivers high performance while consuming very little power. By shrinking the hardware and cutting the energy the algorithms require, these chips have the potential to free AI from the cloud in the future.

Professor Naveen Verma is leading the effort to build the new chip (📷: Sameer A. Khan / Fotobuddy)

Achieving this goal required an entirely different way of looking at the problem. Rather than sticking with the tried-and-true von Neumann architecture that has powered our computer systems for decades, the researchers designed their chip so that processing and memory coexist in the same unit, eliminating the need to shuttle data between them over relatively low-bandwidth channels.
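Conceptually, the payoff looks something like the sketch below: if the weights live permanently inside the compute array, only the small input and output vectors ever cross a data bus. The tile class, sizes, and byte counts here are my own illustration of the general idea, not the team's design.

```python
import numpy as np

# Conceptual sketch of "compute where the data lives": weights are written into
# the array once and never move again; only inputs and outputs cross the bus.

class InMemoryTile:
    def __init__(self, weights):
        self.weights = np.asarray(weights)   # resident in the array, written once

    def matvec(self, x):
        # The dot products happen inside the array; only x and the result move.
        return self.weights @ np.asarray(x)

rows, cols = 256, 256
tile = InMemoryTile(np.random.randn(rows, cols))
y = tile.matvec(np.random.randn(cols))

bytes_per_value = 1
von_neumann_traffic = (rows * cols + cols + rows) * bytes_per_value  # weights + in + out
in_memory_traffic = (cols + rows) * bytes_per_value                  # in + out only
print(von_neumann_traffic, in_memory_traffic)   # 66,048 vs. 512 bytes per inference
```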

This is far from the first in-memory computing architecture to be introduced, but to date, existing solutions have been very limited in their capabilities. The computation needs to be highly efficient, because the hardware must fit within tiny memory cells. So rather than storing data in traditional binary form, the team encoded it in analog. This lets each storage location hold many more than two states, so data can be packed much more densely.
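The density argument comes down to simple arithmetic, sketched below. The level counts are illustrative assumptions rather than specifications of the chip: the more analog levels a single cell can hold, the fewer cells are needed to store the same weight.

```python
import math

# How many memory cells does one 8-bit weight need, as a function of how many
# distinct levels a single cell can hold? (Level counts are assumptions.)

def cells_needed(value_bits, levels_per_cell):
    """Memory cells required to hold a value of `value_bits` bits."""
    bits_per_cell = math.log2(levels_per_cell)
    return math.ceil(value_bits / bits_per_cell)

for levels in (2, 4, 16, 256):
    print(f"{levels:>3} levels/cell -> {cells_needed(8, levels)} cell(s) per 8-bit weight")
# Binary storage (2 levels) needs 8 cells; a cell resolving 256 analog levels needs 1.
```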

Working with analog signals proved challenging using traditional semiconductor devices like transistors. To guarantee accurate computations that are not thrown off by changing conditions like temperature, the researchers instead used a special type of capacitor, designed to switch on and off with precision, to store and process the analog data.
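In broad strokes, a switched-capacitor multiply-accumulate encodes weights as capacitor sizes and sums the resulting charge on a shared node. The sketch below models that idea numerically; the capacitor values, voltages, and bit widths are assumptions chosen for illustration, not parameters of the EnCharge AI design.

```python
import numpy as np

# Toy model of a charge-domain multiply-accumulate: weights set capacitor sizes,
# binary inputs decide which capacitors are driven, and the summed charge on a
# shared node represents the dot product. All values below are assumptions.

rng = np.random.default_rng(0)
weights = rng.integers(0, 4, size=64)     # weights encoded as capacitor sizes (unit caps)
inputs = rng.integers(0, 2, size=64)      # binary input activations

unit_cap = 1e-15                          # 1 fF unit capacitor (assumed)
v_drive = 1.0                             # drive voltage in volts (assumed)

caps = weights * unit_cap                 # each weight selects its capacitor size
charge = np.sum(caps * inputs * v_drive)  # Q = sum_i(C_i * x_i * V): dot product as charge
v_out = charge / np.sum(caps)             # charge sharing gives the voltage an ADC would read

digital_ref = np.dot(weights, inputs)     # what a digital MAC unit would compute
print(f"charge-domain result: {charge / (unit_cap * v_drive):.1f}, "
      f"digital reference: {digital_ref}, readout voltage: {v_out:.3f} V")
```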

Early prototypes of the chip have been developed and demonstrate the potential of the technology. Further work will still need to be done before it is ready for real-world use, however. With the team having recently received funding from DARPA, the chances of that work being completed successfully have risen.
