In the Blink of an AI
The Cascade platform was designed to deliver low-latency interactive intelligent computing applications on edge hardware.
The rise of machine learning applications has driven a surge in the use of powerful cloud computing clusters to handle the demanding computations required for training and inference. This centralized approach has several drawbacks, however. One major problem is latency: data must make a round trip between the user's device and remote cloud servers, introducing delays that make interactions feel sluggish and are particularly noticeable in real-time or interactive situations.
In addition, the cost of deploying machine learning models in the cloud can be prohibitive: training and serving models at scale demands substantial computational resources and, with them, substantial financial investment. This high cost of operation can put advanced machine learning capabilities out of reach for smaller organizations and projects.
Beyond economics, the environmental impact of running large-scale machine learning operations in the cloud is drawing increasing scrutiny. The massive energy consumption of data centers contributes to carbon emissions and adds to the environmental footprint of machine learning technologies.
Furthermore, the reliance on cloud-based solutions raises privacy and security concerns, especially when dealing with confidential or sensitive data. Users must trust third-party cloud service providers with their information, posing potential risks of data breaches or unauthorized access.
A multi-institutional team led by researchers at Cornell University recently released an open-source platform designed to address these issues. Created to foster the development of interactive intelligent computing applications, Cascade can significantly reduce per-event latency while maintaining acceptable levels of throughput. When deployed to edge hardware with Cascade, applications generally run between two and ten times faster than typical cloud-based applications, enabling near real-time interactions in many cases.
Existing platforms for deploying edge AI applications tend to prioritize throughput over latency, relying on high-latency components like REST and gRPC APIs as interconnects between nodes. Cascade gives low latency the highest priority, using fast technologies like remote direct memory access (RDMA) for inter-node communication. To remove another common bottleneck, data movement, Cascade co-locates data and compute on the same hardware. These features do not come at the expense of compatibility: Cascade's custom key/value API is compatible with the dataset APIs available in PyTorch, TensorFlow, and Spark, and the researchers noted that, in general, Cascade requires no changes at all to the AI software.
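That compatibility works because frameworks like PyTorch pull data through a generic dataset interface, so a key/value store can sit behind it unnoticed. The following is a minimal sketch of the idea, not Cascade's actual client API: the KVClient class below is a hypothetical in-memory stand-in for a key/value service whose objects live on the same nodes as the compute.

```python
# Sketch: existing AI code reads data through PyTorch's standard Dataset
# interface while the storage behind it is a key/value service.
# KVClient is a hypothetical stand-in, not Cascade's real client API.
import torch
from torch.utils.data import Dataset, DataLoader


class KVClient:
    """Hypothetical key/value service client (here backed by a dict)."""

    def __init__(self):
        # In a co-located design, these objects would live in the service's
        # memory on the same nodes that run the model; we fake it locally.
        self._store = {f"images/{i}": torch.randn(3, 32, 32) for i in range(8)}

    def get(self, key):
        return self._store[key]

    def keys(self, prefix):
        return sorted(k for k in self._store if k.startswith(prefix))


class KVDataset(Dataset):
    """Exposes key/value objects through PyTorch's Dataset interface,
    so the training or inference loop needs no changes."""

    def __init__(self, client, prefix):
        self.client = client
        self._keys = client.keys(prefix)

    def __len__(self):
        return len(self._keys)

    def __getitem__(self, idx):
        return self.client.get(self._keys[idx])


client = KVClient()
loader = DataLoader(KVDataset(client, "images/"), batch_size=4)
for batch in loader:
    print(batch.shape)  # torch.Size([4, 3, 32, 32])
```

Because the swap happens entirely behind the dataset abstraction, the model code above the DataLoader is identical whether the bytes come from local disk, a cloud object store, or a co-located key/value service.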
Taken together, these characteristics make Cascade well suited to applications where reaction times of a fraction of a second are required, such as smart traffic intersections, digital agriculture, smart power grids, and automated product inspection. Given the privacy-preserving aspects of keeping data on local hardware, many applications in medical diagnostics could also benefit.
A member of the team used the system to build a prototype smart traffic intersection that locates and tracks people, vehicles, bicycles, and other objects. If any of these objects are on a collision course, a warning is issued within milliseconds, while there may still be time to react. In another early application, the udders of cows are imaged as they are milked to look for signs of mastitis, an infection known to reduce milk production. The system can detect infections early, before they become severe enough to hinder production.
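As an illustration of the kind of geometric check such a pipeline performs on every frame, here is a minimal constant-velocity closest-point-of-approach sketch. The algorithm, the safety radius, and the warning horizon are all assumptions made for illustration, not details of the team's prototype.

```python
# Illustrative collision-course check (not the team's actual algorithm):
# given two tracked objects' positions and velocities, find the time and
# distance of closest approach and warn if they pass within a safety radius.
import numpy as np


def closest_approach(p1, v1, p2, v2):
    """Return (time, distance) of closest approach for two objects
    assumed to move at constant velocity."""
    dp = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    dv = np.asarray(v2, dtype=float) - np.asarray(v1, dtype=float)
    speed_sq = dv @ dv
    # If relative velocity is zero, the separation never changes;
    # clamp to t >= 0 so we ignore approaches that happened in the past.
    t = 0.0 if speed_sq == 0 else max(0.0, -(dp @ dv) / speed_sq)
    return t, float(np.linalg.norm(dp + t * dv))


# A pedestrian and a car on intersecting paths (metres, metres/second).
t, d = closest_approach(p1=(0, 0), v1=(2.0, 0), p2=(10, -10), v2=(0, 2.0))
if d < 2.0 and t < 6.0:  # safety radius and warning horizon are assumptions
    print(f"warning: closest approach {d:.1f} m in {t:.1f} s")
```

In a deployed system, the tracker would refresh these position and velocity estimates many times per second, which is why millisecond-scale end-to-end latency matters: the warning has to arrive while evasive action is still possible.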
The researchers hope that others will leverage their technology to make AI applications more accessible. Toward that goal, the source code has been released under a permissive license, and installation instructions are available in the project’s GitHub repository.